[ https://issues.apache.org/jira/browse/PHOENIX-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501533#comment-14501533 ]
James Taylor edited comment on PHOENIX-1870 at 4/18/15 7:09 PM:
----------------------------------------------------------------
Good questions, [~shuxi0ng].

bq. 1. What is Tuple used for?

Tuple is used to store the state of the current row being evaluated. It's mainly used at evaluation time when an expression is a reference to a column. If expression.isStateless() is true, the expression doesn't need to contact the server for any information. If expression.getDeterminism() == Determinism.ALWAYS, the expression gives the same output each time for the same input. Counterexamples are RAND(), CURRENT_DATE(), and NEXT VALUE FOR my_sequence. So if expression.isStateless() is true and expression.getDeterminism() is Determinism.ALWAYS, then we don't need a Tuple and can just pass null instead. It'd actually be nice if we could remove Tuple completely from the evaluate method, but that's a bigger change for another day.

bq. 2. In some functions, for example RegexpReplaceFunction, there are two arguments, a source string and a replacement string. How should the replace interface be designed?

Maybe the most consistent way would be to pass a byte[] buf, int offset, int length triple for each argument. You can then use the ImmutableBytesWritable passed into evaluate to evaluate each argument, assigning local variables for it before evaluating the next one.

bq. 2.1 replace(srcPtr, replacePtr): RegexpReplaceFunction has a class member replacePtr, which is used in evaluation and set to empty bytes after evaluation.

Oh, I missed that. That's potentially problematic, as these expressions may be used by multiple threads at the same time. I'd recommend changing that as I've described above.

bq. 2.2 replace(ptr, srcExpression, replaceExpression): Pass the expressions to the regex engine, and let srcStr and replaceStr use the same ptr. Which one is better?

See the response for (2), as that's probably the most consistent approach (even though it adds extra args).

bq. 3. For a function, REGEXP_REPLACE(PatternStr, SrcStr, ReplaceStr), if some of the three arguments are Determinism.ALWAYS, they can be pre-computed in init(). Why not try to cache the value?

Yes, correct. If childExpression.isStateless() is true and its determinism is Determinism.ALWAYS, then it can be cached. FYI, we have a check above this in ExpressionCompiler that'll evaluate a function at compile time if *all* of its arguments are constant.

bq. For example, with fun1(arg1, fun2(arg2)): if fun2(arg2) is Determinism.ALWAYS, then in every evaluation of fun1 we don't have to compute fun2 again, because we can pre-compute it in fun1.init().

Yes, correct. In this case, fun2(arg2) will come in already pre-computed.
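To make the pattern discussed in (2), (2.1) and (3) concrete, here is a rough sketch of evaluating each child into local (buf, offset, length) triples and of the pre-compute check. This is illustrative only, not the actual patch: the class name RegexReplaceSketch and both method names are invented, and it assumes the Expression, Determinism, Tuple and ImmutableBytesWritable APIs referenced in this comment.

{code}
// Illustrative sketch only; names such as RegexReplaceSketch, evaluateReplace and
// canPrecompute are invented for this comment and are not part of the patch.
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.phoenix.expression.Determinism;
import org.apache.phoenix.expression.Expression;
import org.apache.phoenix.schema.tuple.Tuple;

public class RegexReplaceSketch {

    // Evaluate each child into the shared ptr, then capture the result as a local
    // (buf, offset, length) triple before evaluating the next child, so no mutable
    // state (like a shared replacePtr member) lives on the expression instance.
    public static boolean evaluateReplace(Expression srcExpr, Expression replaceExpr,
                                          Tuple tuple, ImmutableBytesWritable ptr) {
        if (!srcExpr.evaluate(tuple, ptr)) {
            return false;
        }
        byte[] srcBuf = ptr.get();
        int srcOffset = ptr.getOffset();
        int srcLength = ptr.getLength();

        if (!replaceExpr.evaluate(tuple, ptr)) {
            return false;
        }
        byte[] replaceBuf = ptr.get();
        int replaceOffset = ptr.getOffset();
        int replaceLength = ptr.getLength();

        // A real implementation would hand both triples to the regex engine here
        // and point ptr at the replaced result; that part is elided.
        return true;
    }

    // Per (1) and (3): a stateless, always-deterministic child can be evaluated
    // once with a null Tuple (e.g. from init()) and its value cached.
    public static boolean canPrecompute(Expression child) {
        return child.isStateless() && child.getDeterminism() == Determinism.ALWAYS;
    }
}
{code}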
> Fix NPE occurring during regex processing when joni library not used
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1870
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1870
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Shuxiong Ye
>             Fix For: 5.0.0, 4.4.0
>
>         Attachments: 0001-PHOENIX-1870-Fix-NPE-occurring-during-regex-processi.patch, PHOENIX-1870-Regex-Expression-refinement.patch, PHOENIX-1870_v2.patch
>
>
> Tests in error:
> {code}
> IndexExpressionIT.testImmutableCaseSensitiveFunctionIndex:1277->helpTestCaseSensitiveFunctionIndex:1321 » Commit
> IndexExpressionIT.testImmutableLocalCaseSensitiveFunctionIndex:1282->helpTestCaseSensitiveFunctionIndex:1321 » Commit
> {code}
> Stack trace:
> {code}
> testImmutableCaseSensitiveFunctionIndex(org.apache.phoenix.end2end.index.IndexExpressionIT)  Time elapsed: 0.723 sec  <<< ERROR!
> org.apache.phoenix.execute.CommitException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 actions: org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed to build index for unexpected reason!
>     at org.apache.phoenix.hbase.index.util.IndexManagementUtil.rethrowIndexingException(IndexManagementUtil.java:180)
>     at org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:206)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$35.call(RegionCoprocessorHost.java:989)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1671)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1703)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:985)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2697)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2478)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2432)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2436)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:641)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:605)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1822)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NullPointerException
>     at org.apache.phoenix.expression.util.regex.JavaPattern.substr(JavaPattern.java:83)
>     at org.apache.phoenix.expression.function.RegexpSubstrFunction.evaluate(RegexpSubstrFunction.java:118)
>     at org.apache.phoenix.index.IndexMaintainer.buildRowKey(IndexMaintainer.java:453)
>     at org.apache.phoenix.index.IndexMaintainer.buildDeleteMutation(IndexMaintainer.java:842)
>     at org.apache.phoenix.index.PhoenixIndexCodec.getIndexUpdates(PhoenixIndexCodec.java:173)
>     at org.apache.phoenix.index.PhoenixIndexCodec.getIndexDeletes(PhoenixIndexCodec.java:119)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addDeleteUpdatesToMap(CoveredColumnsIndexBuilder.java:403)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addCleanupForCurrentBatch(CoveredColumnsIndexBuilder.java:287)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addMutationsForBatch(CoveredColumnsIndexBuilder.java:239)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.batchMutationAndAddUpdates(CoveredColumnsIndexBuilder.java:136)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.getIndexUpdate(CoveredColumnsIndexBuilder.java:99)
>     at org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
>     at org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:129)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     ... 1 more
> : 2 times,
>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
>     at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1563)
>     at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:928)
>     at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:942)
>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:422)
>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:439)
>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:436)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:436)
>     at org.apache.phoenix.end2end.index.IndexExpressionIT.helpTestCaseSensitiveFunctionIndex(IndexExpressionIT.java:1321)
>     at org.apache.phoenix.end2end.index.IndexExpressionIT.testImmutableCaseSensitiveFunctionIndex(IndexExpressionIT.java:1277)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)