[ https://issues.apache.org/jira/browse/PHOENIX-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated PHOENIX-4094:
------------------------------
    Summary: ParallelWriterIndexCommitter incorrectly applies local updates to index tables for 4.x-HBase-0.98  (was: ParallelWriterIndexCommitter incorrectly applys local updates to index tables for 4.x-HBase-0.98)

> ParallelWriterIndexCommitter incorrectly applies local updates to index
> tables for 4.x-HBase-0.98
> -------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4094
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4094
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>            Reporter: chenglei
>            Assignee: chenglei
>             Fix For: 4.12.0
>
>         Attachments: PHOENIX-4094_4.x-HBase-0.98_v1.patch, PHOENIX-4094_v1.patch
>
>
> I used phoenix-4.x-HBase-0.98 in my HBase cluster. When I restarted the cluster at one point, I noticed that some RegionServers had plenty of {{WrongRegionException}}s, as follows:
> {code:java}
> 2017-08-01 11:53:10,669 WARN [rsync.slave005.bizhbasetest.sjs.ted,60020,1501511894174-index-writer--pool2-t786] regionserver.HRegion: Failed getting lock in batch put, row=\x10\x00\x00\x00913f0eed-6710-4de9-8bac-077a106bb9ae_0
> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for row lock on HRegion BIZARCH_NS_PRODUCT.BIZTRACER_SPAN,90ffd783-b0a3-4f8a-81ef-0a7535fea197_0,1490066612493.463220cd8fad7254481595911e62d74d., startKey='90ffd783-b0a3-4f8a-81ef-0a7535fea197_0', getEndKey()='917fc343-3331-47fa-907c-df83a6f302f7_0', row='\x10\x00\x00\x00913f0eed-6710-4de9-8bac-077a106bb9ae_0'
>         at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:3539)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:3557)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2394)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2261)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2213)
>         at org.apache.phoenix.util.IndexUtil.writeLocalUpdates(IndexUtil.java:671)
>         at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitter$1.call(ParallelWriterIndexCommitter.java:157)
>         at org.apache.phoenix.hbase.index.write.ParallelWriterIndexCommitter$1.call(ParallelWriterIndexCommitter.java:134)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> The problem is caused by the ParallelWriterIndexCommitter.write method: at line 151 below, if {{allowLocalUpdates}} is true, it writes the index mutations to the current data table region unconditionally, which is obviously inappropriate:
> {code:java}
> 150                        try {
> 151                            if (allowLocalUpdates && env != null) {
> 152                                try {
> 153                                    throwFailureIfDone();
> 154                                    IndexUtil.writeLocalUpdates(env.getRegion(), mutations, true);
> 155                                    return null;
> 156                                } catch (IOException ignord) {
> 157                                    // when it's failed we fall back to the standard & slow way
> 158                                    if (LOG.isDebugEnabled()) {
> 159                                        LOG.debug("indexRegion.batchMutate failed and fall back to HTable.batch(). Got error="
> 160                                                + ignord);
> 161                                    }
> 162                                }
> 163                            }
> {code}
> So if a data table has a global index table, then when we replay the WALs to the index table in the Indexer.postOpen method at line 691 below, where the {{allowLocalUpdates}} parameter is true, the {{updates}} intended for the global index table are incorrectly written to the current data table region:
> {code:java}
> 688            // do the usual writer stuff, killing the server again, if we can't manage to make the index
> 689            // writes succeed again
> 690            try {
> 691                writer.writeAndKillYourselfOnFailure(updates, true);
> 692            } catch (IOException e) {
> 693                LOG.error("During WAL replay of outstanding index updates, "
> 694                        + "Exception is thrown instead of killing server during index writing", e);
> 695            }
> 696        } finally {
> {code}
> However, the ParallelWriterIndexCommitter.write method in the master and the other 4.x branches is correct: before applying local updates, it checks that the target table of the mutations is the current region's own table, as in lines 150 and 151 below:
> {code:java}
> 147                        try {
> 148                            if (allowLocalUpdates
> 149                                    && env != null
> 150                                    && tableReference.getTableName().equals(
> 151                                        env.getRegion().getTableDesc().getNameAsString())) {
> 152                                try {
> 153                                    throwFailureIfDone();
> 154                                    IndexUtil.writeLocalUpdates(env.getRegion(), mutations, true);
> 155                                    return null;
> 156                                } catch (IOException ignord) {
> 157                                    // when it's failed we fall back to the standard & slow way
> 158                                    if (LOG.isDebugEnabled()) {
> 159                                        LOG.debug("indexRegion.batchMutate failed and fall back to HTable.batch(). Got error="
> 160                                                + ignord);
> 161                                    }
> 162                                }
> 163                            }
> {code}
> This inconsistency between branches was introduced by PHOENIX-1734 and PHOENIX-3018; because of the lack of unit tests or IT tests for Indexer.preWALRestore/postOpen, it went undetected.
> BTW, the TrackingParallelWriterIndexCommitter is correct for master and all the 4.x branches.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)