[
https://issues.apache.org/jira/browse/PHOENIX-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
XIANG,CHAO-ZHU updated PHOENIX-6808:
------------------------------------
Description:
create table && upsert values
{code:java}
CREATE TABLE IF NOT EXISTS "TEST_TABLE" (
"K" VARCHAR PRIMARY KEY,
"f"."JR1" UNSIGNED_INT DEFAULT 0,
"f"."JR2" UNSIGNED_INT DEFAULT 0,
"f"."ZN" UNSIGNED_INT DEFAULT 0,
"f"."AA" UNSIGNED_INT,
"f"."AB" UNSIGNED_INT DEFAULT 0,
"f"."AC" INTEGER DEFAULT 0,
"f"."AD" INTEGER
) column_encoded_bytes=0 SPLIT ON (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD)
VALUES('5:xiang:346358075:370562729', 270, null, 10846, null, null, null, null);
UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD)
VALUES('0:chaozhu:123123120:321321321' , 123, null, 10846, 23303, 123, 321,
111);
{code}
then execute query:
{code:java}
select * from "TEST_TABLE" where ( "ZN" in (10846) and ("JR1" in (2303) or
"JR2" in (10846) ));{code}
result:
!image-2022-10-10-18-41-09-059.png!
Above, neither `JR1` nor `JR2` matches query statement but result contains 2
rows.
And execute query (change the order):
{code:java}
select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and
"ZN" in (10846) ); {code}
result:
!image-2022-10-11-00-06-32-190.png!
Or execute query ( "ZN" in (10846) -> "AB" in ( 123 )):
{code:java}
select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2"
in (10846) ));
{code}
result:
!image-2022-10-10-23-10-01-059.png!
I try to debug it by docker and remote debug. Maybe I know the cause of this
problem.
In
{*}ComparisonExpression.java{*}(org/apache/phoenix/expression/ComparisonExpression.java)
{code:java}
@Override
public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
if (!children.get(0).evaluate(tuple, ptr)) {
return false;
}
if (ptr.getLength() == 0) { // null comparison evals to null
return true;
}
....
}{code}
because `JR2` value is null, so `ptr.getLength() == 0` is true then return true
directly.
In *AndOrExpression.java* (org/apache/phoenix/expression/AndOrExpression.java)
{code:java}
public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
boolean isNull = false;
for (int i = 0; i < children.size(); i++) {
Expression child = children.get(i);
// If partial state is available, then use that to know we've already
evaluated this
// child expression and do not need to do so again.
if (partialEvalState == null || !partialEvalState.get(i)) {
// Call through to child evaluate method matching parent call to
allow child to optimize
// evaluate versus getValue code path.
if (child.evaluate(tuple, ptr)) {
// Short circuit if we see our stop value
if (isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr,
child.getDataType()))) {
return true;
} else if (partialEvalState != null) {
partialEvalState.set(i);
}
} else {
isNull = true;
}
}
}
if (isNull) {
return false;
}
return true;
} {code}
because evaluate `JR2` return true so `("JR1" in (2303) or "JR2" in (10846))`
is true.
In *BooleanExpressionFilter.Java*
(org/apache/phoenix/filter/BooleanExpressionFilter.java)
{code:java}
@Override
public String toString() {
return expression.toString();
} @edu.umd.cs.findbugs.annotations.SuppressWarnings(
value="NP_BOOLEAN_RETURN_NULL",
justification="Returns null by design.")
protected Boolean evaluate(Tuple input) {
try {
if (!expression.evaluate(input, tempPtr)) {
return null;
}
} catch (IllegalDataException e) {
return Boolean.FALSE;
}
return (Boolean)expression.getDataType().toObject(tempPtr);
}{code}
now `(Boolean)expression.getDataType().toObject(tempPtr) = true` ( `"ZN" in
(10846) is true` cause ) **
In *MultiKeyValueComparisonFilter.java*
(org/apache/phoenix/filter/MultiKeyValueComparisonFilter.java)
{code:java}
public ReturnCode filterKeyValue(Cell cell) {
...
// We found a new column, so we can re-evaluate
// TODO: if we have row key columns in our expression, should
// we always evaluate or just wait until the end?
this.matchedColumn = this.evaluate(inputTuple);
if (this.matchedColumn == null) {
if (inputTuple.isImmutable()) {
this.matchedColumn = Boolean.FALSE;
} else {
return ReturnCode.INCLUDE_AND_NEXT_COL;
}
}
...
}
public boolean filterRow() {
if (this.matchedColumn == null && !inputTuple.isImmutable() &&
expression.requiresFinalEvaluation()) {
inputTuple.setImmutable();
this.matchedColumn = this.evaluate(inputTuple);
}
return ! (Boolean.TRUE.equals(this.matchedColumn));
}
{code}
Above, filterKeyValue -> this.matchedColumn = true -> filterRow -> return false
-> not filter row
Thus, the reason of query:
{code:java}
select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and
"ZN" in (10846) ); {code}
`("JR1" in (2303) or "JR2" in (10846)` is true and `isStopValue((Boolean)
PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` ->
`(Boolean)expression.getDataType().toObject(tempPtr) = null` ->
this.matchedColumn = false -> filterRow -> return true -> filter row
And, the reason of query:
{code:java}
select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2"
in (10846) )); {code}
hbase read cell and filter it by alphabetical order.
first read "AB" and `"AB" in (123)` is true then `partialEvalState.set(0)`,
then read "JR1" and set subfilter `partialEvalState.set(0)`
final read "JR2" -> skip index 0 since partialEvalState.get(0) -> ("JR1" in
(2303) or "JR2" in (10846) ) is true and PBoolean.INSTANCE.toObject(ptr,
child.getDataType())) = true` ->
`(Boolean)expression.getDataType().toObject(tempPtr) = null`
-> this.matchedColumn = false -> filterRow -> return true -> filter row
was:
create table && upsert values
{code:java}
CREATE TABLE IF NOT EXISTS "TEST_TABLE" (
"K" VARCHAR PRIMARY KEY,
"f"."JR1" UNSIGNED_INT DEFAULT 0,
"f"."JR2" UNSIGNED_INT DEFAULT 0,
"f"."ZN" UNSIGNED_INT DEFAULT 0,
"f"."AA" UNSIGNED_INT,
"f"."AB" UNSIGNED_INT DEFAULT 0,
"f"."AC" INTEGER DEFAULT 0,
"f"."AD" INTEGER
) column_encoded_bytes=0 SPLIT ON (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD)
VALUES('5:xiang:346358075:370562729', 270, null, 10846, null, null, null, null);
UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD)
VALUES('0:chaozhu:123123120:321321321' , 123, null, 10846, 23303, 123, 321,
111);
{code}
then execute query:
{code:java}
select * from "TEST_TABLE" where ( "ZN" in (10846) and ("JR1" in (2303) or
"JR2" in (10846) ));{code}
result:
!image-2022-10-10-18-41-09-059.png!
Above, neither `JR1` nor `JR2` matches query statement but result contains 2
rows.
And execute query (change the order):
{code:java}
select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and
"ZN" in (10846) ); {code}
result:
!image-2022-10-10-22-24-52-310.png!
Or execute query ( "ZN" in (10846) -> "AB" in ( 123 )):
{code:java}
select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2"
in (10846) ));
{code}
result:
!image-2022-10-10-23-10-01-059.png!
I try to debug it by docker and remote debug. Maybe I know the cause of this
problem.
In
{*}ComparisonExpression.java{*}(org/apache/phoenix/expression/ComparisonExpression.java)
{code:java}
@Override
public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
if (!children.get(0).evaluate(tuple, ptr)) {
return false;
}
if (ptr.getLength() == 0) { // null comparison evals to null
return true;
}
....
}{code}
because `JR2` value is null, so `ptr.getLength() == 0` is true then return true
directly.
In *AndOrExpression.java* (org/apache/phoenix/expression/AndOrExpression.java)
{code:java}
public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
boolean isNull = false;
for (int i = 0; i < children.size(); i++) {
Expression child = children.get(i);
// If partial state is available, then use that to know we've already
evaluated this
// child expression and do not need to do so again.
if (partialEvalState == null || !partialEvalState.get(i)) {
// Call through to child evaluate method matching parent call to
allow child to optimize
// evaluate versus getValue code path.
if (child.evaluate(tuple, ptr)) {
// Short circuit if we see our stop value
if (isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr,
child.getDataType()))) {
return true;
} else if (partialEvalState != null) {
partialEvalState.set(i);
}
} else {
isNull = true;
}
}
}
if (isNull) {
return false;
}
return true;
} {code}
because evaluate `JR2` return true so `("JR1" in (2303) or "JR2" in (10846))`
is true.
In *BooleanExpressionFilter.Java*
(org/apache/phoenix/filter/BooleanExpressionFilter.java)
{code:java}
@Override
public String toString() {
return expression.toString();
} @edu.umd.cs.findbugs.annotations.SuppressWarnings(
value="NP_BOOLEAN_RETURN_NULL",
justification="Returns null by design.")
protected Boolean evaluate(Tuple input) {
try {
if (!expression.evaluate(input, tempPtr)) {
return null;
}
} catch (IllegalDataException e) {
return Boolean.FALSE;
}
return (Boolean)expression.getDataType().toObject(tempPtr);
}{code}
now `(Boolean)expression.getDataType().toObject(tempPtr) = true` ( `"ZN" in
(10846) is true` cause ) **
In *MultiKeyValueComparisonFilter.java*
(org/apache/phoenix/filter/MultiKeyValueComparisonFilter.java)
{code:java}
public ReturnCode filterKeyValue(Cell cell) {
...
// We found a new column, so we can re-evaluate
// TODO: if we have row key columns in our expression, should
// we always evaluate or just wait until the end?
this.matchedColumn = this.evaluate(inputTuple);
if (this.matchedColumn == null) {
if (inputTuple.isImmutable()) {
this.matchedColumn = Boolean.FALSE;
} else {
return ReturnCode.INCLUDE_AND_NEXT_COL;
}
}
...
}
public boolean filterRow() {
if (this.matchedColumn == null && !inputTuple.isImmutable() &&
expression.requiresFinalEvaluation()) {
inputTuple.setImmutable();
this.matchedColumn = this.evaluate(inputTuple);
}
return ! (Boolean.TRUE.equals(this.matchedColumn));
}
{code}
Above, filterKeyValue -> this.matchedColumn = true -> filterRow -> return false
-> not filter row
Thus, the reason of query:
{code:java}
select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and
"ZN" in (10846) ); {code}
`("JR1" in (2303) or "JR2" in (10846)` is true and `isStopValue((Boolean)
PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` ->
`(Boolean)expression.getDataType().toObject(tempPtr) = null` ->
this.matchedColumn = false -> filterRow -> return true -> filter row
And, the reason of query:
{code:java}
select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2"
in (10846) )); {code}
hbase read cell and filter it by alphabetical order.
first read "AB" and `"AB" in (123)` is true then `partialEvalState.set(0)`,
then read "JR1" and set subfilter `partialEvalState.set(0)`
final read "JR2" -> skip index 0 since partialEvalState.get(0) -> ("JR1" in
(2303) or "JR2" in (10846) ) is true and PBoolean.INSTANCE.toObject(ptr,
child.getDataType())) = true` ->
`(Boolean)expression.getDataType().toObject(tempPtr) = null`
-> this.matchedColumn = false -> filterRow -> return true -> filter row
> query filter cannot get correct result
> --------------------------------------
>
> Key: PHOENIX-6808
> URL: https://issues.apache.org/jira/browse/PHOENIX-6808
> Project: Phoenix
> Issue Type: Bug
> Components: 4.x, core
> Affects Versions: 4.14.1
> Environment: hbase-1.4
> phoenix-4.14.1-HBase-1.4
> Reporter: XIANG,CHAO-ZHU
> Priority: Major
> Attachments: image-2022-10-10-18-41-09-059.png,
> image-2022-10-10-22-41-51-879.png, image-2022-10-10-23-10-01-059.png,
> image-2022-10-11-00-06-32-190.png
>
>
> create table && upsert values
>
> {code:java}
> CREATE TABLE IF NOT EXISTS "TEST_TABLE" (
> "K" VARCHAR PRIMARY KEY,
> "f"."JR1" UNSIGNED_INT DEFAULT 0,
> "f"."JR2" UNSIGNED_INT DEFAULT 0,
> "f"."ZN" UNSIGNED_INT DEFAULT 0,
> "f"."AA" UNSIGNED_INT,
> "f"."AB" UNSIGNED_INT DEFAULT 0,
> "f"."AC" INTEGER DEFAULT 0,
> "f"."AD" INTEGER
> ) column_encoded_bytes=0 SPLIT ON (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
> UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD)
> VALUES('5:xiang:346358075:370562729', 270, null, 10846, null, null, null,
> null);
> UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD)
> VALUES('0:chaozhu:123123120:321321321' , 123, null, 10846, 23303, 123, 321,
> 111);
> {code}
> then execute query:
>
> {code:java}
> select * from "TEST_TABLE" where ( "ZN" in (10846) and ("JR1" in (2303) or
> "JR2" in (10846) ));{code}
> result:
> !image-2022-10-10-18-41-09-059.png!
> Above, neither `JR1` nor `JR2` matches query statement but result contains
> 2 rows.
> And execute query (change the order):
>
> {code:java}
> select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and
> "ZN" in (10846) ); {code}
> result:
> !image-2022-10-11-00-06-32-190.png!
> Or execute query ( "ZN" in (10846) -> "AB" in ( 123 )):
>
> {code:java}
> select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or
> "JR2" in (10846) ));
> {code}
> result:
>
> !image-2022-10-10-23-10-01-059.png!
>
> I try to debug it by docker and remote debug. Maybe I know the cause of this
> problem.
>
> In
> {*}ComparisonExpression.java{*}(org/apache/phoenix/expression/ComparisonExpression.java)
>
> {code:java}
> @Override
> public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
> if (!children.get(0).evaluate(tuple, ptr)) {
> return false;
> }
> if (ptr.getLength() == 0) { // null comparison evals to null
> return true;
> }
> ....
> }{code}
> because `JR2` value is null, so `ptr.getLength() == 0` is true then return
> true directly.
>
> In *AndOrExpression.java* (org/apache/phoenix/expression/AndOrExpression.java)
> {code:java}
> public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
> boolean isNull = false;
> for (int i = 0; i < children.size(); i++) {
> Expression child = children.get(i);
> // If partial state is available, then use that to know we've already
> evaluated this
> // child expression and do not need to do so again.
> if (partialEvalState == null || !partialEvalState.get(i)) {
> // Call through to child evaluate method matching parent call to
> allow child to optimize
> // evaluate versus getValue code path.
> if (child.evaluate(tuple, ptr)) {
> // Short circuit if we see our stop value
> if (isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr,
> child.getDataType()))) {
> return true;
> } else if (partialEvalState != null) {
> partialEvalState.set(i);
> }
> } else {
> isNull = true;
> }
> }
> }
> if (isNull) {
> return false;
> }
> return true;
> } {code}
>
> because evaluate `JR2` return true so `("JR1" in (2303) or "JR2" in (10846))`
> is true.
>
> In *BooleanExpressionFilter.Java*
> (org/apache/phoenix/filter/BooleanExpressionFilter.java)
>
> {code:java}
> @Override
> public String toString() {
> return expression.toString();
> } @edu.umd.cs.findbugs.annotations.SuppressWarnings(
> value="NP_BOOLEAN_RETURN_NULL",
> justification="Returns null by design.")
> protected Boolean evaluate(Tuple input) {
> try {
> if (!expression.evaluate(input, tempPtr)) {
> return null;
> }
> } catch (IllegalDataException e) {
> return Boolean.FALSE;
> }
> return (Boolean)expression.getDataType().toObject(tempPtr);
> }{code}
> now `(Boolean)expression.getDataType().toObject(tempPtr) = true` ( `"ZN" in
> (10846) is true` cause ) **
>
> In *MultiKeyValueComparisonFilter.java*
> (org/apache/phoenix/filter/MultiKeyValueComparisonFilter.java)
>
> {code:java}
> public ReturnCode filterKeyValue(Cell cell) {
> ...
> // We found a new column, so we can re-evaluate
> // TODO: if we have row key columns in our expression, should
> // we always evaluate or just wait until the end?
> this.matchedColumn = this.evaluate(inputTuple);
> if (this.matchedColumn == null) {
> if (inputTuple.isImmutable()) {
> this.matchedColumn = Boolean.FALSE;
> } else {
> return ReturnCode.INCLUDE_AND_NEXT_COL;
> }
> }
> ...
> }
> public boolean filterRow() {
> if (this.matchedColumn == null && !inputTuple.isImmutable() &&
> expression.requiresFinalEvaluation()) {
> inputTuple.setImmutable();
> this.matchedColumn = this.evaluate(inputTuple);
> }
>
> return ! (Boolean.TRUE.equals(this.matchedColumn));
> }
> {code}
> Above, filterKeyValue -> this.matchedColumn = true -> filterRow -> return
> false -> not filter row
>
>
> Thus, the reason of query:
>
> {code:java}
> select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and
> "ZN" in (10846) ); {code}
> `("JR1" in (2303) or "JR2" in (10846)` is true and `isStopValue((Boolean)
> PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` ->
> `(Boolean)expression.getDataType().toObject(tempPtr) = null` ->
> this.matchedColumn = false -> filterRow -> return true -> filter row
>
> And, the reason of query:
>
> {code:java}
> select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or
> "JR2" in (10846) )); {code}
> hbase read cell and filter it by alphabetical order.
> first read "AB" and `"AB" in (123)` is true then `partialEvalState.set(0)`,
> then read "JR1" and set subfilter `partialEvalState.set(0)`
> final read "JR2" -> skip index 0 since partialEvalState.get(0) -> ("JR1" in
> (2303) or "JR2" in (10846) ) is true and PBoolean.INSTANCE.toObject(ptr,
> child.getDataType())) = true` ->
> `(Boolean)expression.getDataType().toObject(tempPtr) = null`
> -> this.matchedColumn = false -> filterRow -> return true -> filter row
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)