[ https://issues.apache.org/jira/browse/PHOENIX-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Taylor updated PHOENIX-4190: ---------------------------------- Labels: secondary_index (was: ) > Salted local index failure is causing region server to abort > ------------------------------------------------------------ > > Key: PHOENIX-4190 > URL: https://issues.apache.org/jira/browse/PHOENIX-4190 > Project: Phoenix > Issue Type: Bug > Reporter: Samarth Jain > Assignee: James Taylor > Labels: secondary_index > Fix For: 4.12.0 > > Attachments: PHOENIX-4190.patch > > > If you run just this case > {code} > { false, true, true, true, false, null} > {code} > in MutableIndexFailureIT on the 4.x-HBase-1.2 branch, [~rajeshbabu], you will > see the following NPE in logs: > {code} > 2017-09-11 00:27:08,119 WARN > [B.defaultRpcServer.handler=2,queue=0,port=63436] > org.apache.phoenix.index.PhoenixIndexFailurePolicy(143): handleFailure failed > java.lang.NullPointerException > at > org.apache.phoenix.util.SchemaUtil.getTableKeyFromFullName(SchemaUtil.java:707) > at > org.apache.phoenix.util.IndexUtil.updateIndexState(IndexUtil.java:717) > at > org.apache.phoenix.index.PhoenixIndexFailurePolicy.handleFailureWithExceptions(PhoenixIndexFailurePolicy.java:221) > at > org.apache.phoenix.index.PhoenixIndexFailurePolicy.handleFailure(PhoenixIndexFailurePolicy.java:140) > at > org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:155) > at > org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:139) > at > org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:651) > at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:608) > at > org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:591) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1034) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1705) > at > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1030) > at > org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3322) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2881) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2823) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:758) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:720) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2168) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2188) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133) > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108) > at java.lang.Thread.run(Thread.java:745) > {code} > This happens only for salted local indexes. If I remove the SALT_BUCKETS from > the table DDL, then the test passes fine. On looking closely at the code, it > seems like something is wrong with the computation of offset and subsequent > parsing of the index id from the row key here (in PhoenixIndexFailurePolicy): > {code} > int offset = > regionInfo.getStartKey().length == 0 ? > regionInfo.getEndKey().length > : regionInfo.getStartKey().length; > byte[] viewId = null; > for (Mutation mutation : mutations) { > viewId = > indexMaintainer.getViewIndexIdFromIndexRowKey( > new ImmutableBytesWritable(mutation.getRow(), > offset, > mutation.getRow().length - offset)); > String indexTableName = localIndexNames.get(new > ImmutableBytesWritable(viewId)); > indexTableNames.add(indexTableName); > } > {code} > Because of this NPE in PhoenixIndexFailurePolicy, we end up triggering the > KillServerOnFailurePolicy which ends up causing the region server to abort. > This region server abort is also the reason why our builds against the > 4.x-HBase-1.2 branch are hanging. I also believe once we fix this, we can > hopefully reenable back the parameters which were testing out rebuild of > local indexes for the 4.x-HBase-0.98, 4.x-HBase-1.1 and 4.x-HBase-1.2 > branches. On the master branch, because local index update is transactional > with data table update, we won' run into such failure scenarios (I think). > [~jamestaylor] - A bit orthogonal, but it seems like we can do better here. > Wouldn't a better option here would be to let HBase black list the Indexer > co-processor in cases of such bugs? Else, we run the risk of shutting down > the entire HBase cluster which is what happened here. -- This message was sent by Atlassian JIRA (v6.4.14#64029)