[ https://issues.apache.org/jira/browse/PHOENIX-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501533#comment-14501533 ]
James Taylor edited comment on PHOENIX-1870 at 4/18/15 7:09 PM:
----------------------------------------------------------------
Good questions, [~shuxi0ng].

bq. 1. What is Tuple used for?

Tuple is used to store the state of the current row being evaluated. It's mainly used at evaluation time when an expression is a reference to a column. If expression.isStateless() is true, the expression doesn't need to contact the server for any information. If expression.getDeterminism() == Determinism.ALWAYS, the expression gives the same output each time for the same input. Counterexamples are RAND(), CURRENT_DATE(), and NEXT VALUE FOR my_sequence. So if expression.isStateless() is true and expression.getDeterminism() is Determinism.ALWAYS, then we don't need a Tuple and can just pass null instead. It'd actually be nice if we could remove Tuple completely from the evaluate method, but that's a bigger change for another day.

bq. 2. In some functions, for example RegexpReplaceFunction, there are two arguments, a source string and a replacement string. How should the replace interface be designed?

Maybe the most consistent way would be to pass a byte[] buf, int offset, int length triple for each argument. You can then use the ImmutableBytesWritable passed into evaluate to evaluate each argument, assigning local variables for it before evaluating the next one.

bq. 2.1 replace(srcPtr, replacePtr): RegexpReplaceFunction has a class member replacePtr, which is used in evaluation and set to empty bytes after evaluation.

Oh, I missed that. That's potentially problematic, as these expressions may be used by multiple threads at the same time. I'd recommend changing that as I've described above.

bq. 2.2 replace(ptr, srcExpression, replaceExpression): Pass the expressions to the regex engine, and let srcStr and replaceStr use the same ptr. Which one is better?

See the response for (2), as that's probably the most consistent approach (even though it adds extra args).

bq. 3. For a function, REGEXP_REPLACE(PatternStr, SrcStr, ReplaceStr), if some of the three arguments are Determinism.ALWAYS, they can be pre-computed in init(). Why not try to cache the value?

Yes, correct. If childExpression.isStateless() is true and its determinism is Determinism.ALWAYS, then it can be cached. FYI, we have a check above this in ExpressionCompiler that'll evaluate a function at compile time if *all* of its arguments are constant.

bq. For example, with fun1(arg1, fun2(arg2)): if fun2(arg2) is Determinism.ALWAYS, then in every evaluation of fun1 we don't have to compute fun2 again, because we can pre-compute it in fun1.init().

Yes, correct. In this case, fun2(arg2) will come in already pre-computed.
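To make the pattern discussed in (2), (2.1) and (3) concrete, here is a rough sketch of evaluating each child into local (buf, offset, length) triples and of the pre-compute check. This is illustrative only, not the actual patch: the class name RegexReplaceSketch and both method names are invented, and it assumes the Expression, Determinism, Tuple and ImmutableBytesWritable APIs referenced in this comment.

{code}
// Illustrative sketch only; names such as RegexReplaceSketch, evaluateReplace and
// canPrecompute are invented for this comment and are not part of the patch.
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.phoenix.expression.Determinism;
import org.apache.phoenix.expression.Expression;
import org.apache.phoenix.schema.tuple.Tuple;

public class RegexReplaceSketch {

    // Evaluate each child into the shared ptr, then capture the result as a local
    // (buf, offset, length) triple before evaluating the next child, so no mutable
    // state (like a shared replacePtr member) lives on the expression instance.
    public static boolean evaluateReplace(Expression srcExpr, Expression replaceExpr,
                                          Tuple tuple, ImmutableBytesWritable ptr) {
        if (!srcExpr.evaluate(tuple, ptr)) {
            return false;
        }
        byte[] srcBuf = ptr.get();
        int srcOffset = ptr.getOffset();
        int srcLength = ptr.getLength();

        if (!replaceExpr.evaluate(tuple, ptr)) {
            return false;
        }
        byte[] replaceBuf = ptr.get();
        int replaceOffset = ptr.getOffset();
        int replaceLength = ptr.getLength();

        // A real implementation would hand both triples to the regex engine here
        // and point ptr at the replaced result; that part is elided.
        return true;
    }

    // Per (1) and (3): a stateless, always-deterministic child can be evaluated
    // once with a null Tuple (e.g. from init()) and its value cached.
    public static boolean canPrecompute(Expression child) {
        return child.isStateless() && child.getDeterminism() == Determinism.ALWAYS;
    }
}
{code}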
> Fix NPE occurring during regex processing when joni library not used
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1870
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1870
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Shuxiong Ye
>             Fix For: 5.0.0, 4.4.0
>
>         Attachments: 0001-PHOENIX-1870-Fix-NPE-occurring-during-regex-processi.patch, PHOENIX-1870-Regex-Expression-refinement.patch, PHOENIX-1870_v2.patch
>
>
> Tests in error:
> {code}
> IndexExpressionIT.testImmutableCaseSensitiveFunctionIndex:1277->helpTestCaseSensitiveFunctionIndex:1321 » Commit
> IndexExpressionIT.testImmutableLocalCaseSensitiveFunctionIndex:1282->helpTestCaseSensitiveFunctionIndex:1321 » Commit
> {code}
> Stack trace:
> {code}
> testImmutableCaseSensitiveFunctionIndex(org.apache.phoenix.end2end.index.IndexExpressionIT)  Time elapsed: 0.723 sec  <<< ERROR!
> org.apache.phoenix.execute.CommitException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 actions: org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed to build index for unexpected reason!
>     at org.apache.phoenix.hbase.index.util.IndexManagementUtil.rethrowIndexingException(IndexManagementUtil.java:180)
>     at org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:206)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$35.call(RegionCoprocessorHost.java:989)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1671)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1703)
>     at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:985)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2697)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2478)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2432)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2436)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:641)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:605)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1822)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NullPointerException
>     at org.apache.phoenix.expression.util.regex.JavaPattern.substr(JavaPattern.java:83)
>     at org.apache.phoenix.expression.function.RegexpSubstrFunction.evaluate(RegexpSubstrFunction.java:118)
>     at org.apache.phoenix.index.IndexMaintainer.buildRowKey(IndexMaintainer.java:453)
>     at org.apache.phoenix.index.IndexMaintainer.buildDeleteMutation(IndexMaintainer.java:842)
>     at org.apache.phoenix.index.PhoenixIndexCodec.getIndexUpdates(PhoenixIndexCodec.java:173)
>     at org.apache.phoenix.index.PhoenixIndexCodec.getIndexDeletes(PhoenixIndexCodec.java:119)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addDeleteUpdatesToMap(CoveredColumnsIndexBuilder.java:403)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addCleanupForCurrentBatch(CoveredColumnsIndexBuilder.java:287)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.addMutationsForBatch(CoveredColumnsIndexBuilder.java:239)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.batchMutationAndAddUpdates(CoveredColumnsIndexBuilder.java:136)
>     at org.apache.phoenix.hbase.index.covered.CoveredColumnsIndexBuilder.getIndexUpdate(CoveredColumnsIndexBuilder.java:99)
>     at org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
>     at org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:129)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     ... 1 more
> : 2 times,
>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
>     at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1563)
>     at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:928)
>     at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:942)
>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:422)
>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:439)
>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:436)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:436)
>     at org.apache.phoenix.end2end.index.IndexExpressionIT.helpTestCaseSensitiveFunctionIndex(IndexExpressionIT.java:1321)
>     at org.apache.phoenix.end2end.index.IndexExpressionIT.testImmutableCaseSensitiveFunctionIndex(IndexExpressionIT.java:1277)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)