[jira] [Updated] (KYLIN-4260) When using server side PreparedStatement cache, the query result are not match on TopN scenario

Marc Wu (Jira) Sat, 30 Nov 2019 00:18:34 -0800


     [ 
https://issues.apache.org/jira/browse/KYLIN-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Marc Wu updated KYLIN-4260:
---------------------------
    Description: 
Hi Kylin team,

I found an issue while server side PreparedStatement enabled. The second time 
query and after's result will be different from the first when query TopN, and 
the result is not right.

Part of Cube info:

 
 Dimensions
 TRANS_ID
 PART_DT
 SELLER_ID
 BUYER_ID

Measures:
 SUM(PRICE)
 MAX(PRICE)
 TOPN(PRICE) Group By:KYLIN_SALES.SELLER_ID,KYLIN_SALES.BUYER_ID

 

SQL:
{code:java}
{"sql":"select seller_id, buyer_id, sum(PRICE) from glaucus.kylin_sales where 
PART_DT >= ? and PART_DT <= ? group by seller_id, buyer_id order by sum(PRICE) 
desc limit 20","project":"DDTFORTEST_Analytics", "params":[{"className": 
"java.lang.String","value": "2012-01-01"},{"className": 
"java.lang.String","value": "2012-01-10"}]}
{code}
The First query result:

!image-2019-11-18-15-55-00-312.png!

The Second and after:

!image-2019-11-18-15-55-11-906.png|width=2046,height=1096!

 -----------------------------------------------
h2. -Root Cause-

Cached preparedContext is changed when doing preparedStatement.executeQuery, 
and losing groupByColumns. So the first execution result is correct, the second 
and the after will be incorrect.

!image-2019-11-18-19-29-34-489.png!

!image-2019-11-18-19-29-42-721.png!
h2. Real Root Cause

The first time we entered PreparedStatement logic, we'll try to borrow 
preparedContext from cache pool, of course there isn't any, but the cache pool 
will execute create method to create a new preparedContext, and then loaned it 
to us.

I didn't figure out how adjustSqlDigest works before, and try to remove code
{code:java}
sqlDigest.groupbyColumns.removeAll(topnLiteralCol){code}
but it's not right. Top-N isn't like some other measures, the dimensions aren't 
as part of row key, they stored in measures in design, so it's why the 
adjustSqlDigest matters, especially those codes.
{code:java}
sqlDigest.groupbyColumns.removeAll(topnLiteralCol);
sqlDigest.metricColumns.addAll(topnLiteralCol);
{code}
The root cause for this issue is because the adjustSqlDigest is execute again 
after we store it in cache, so the digest changed.
{code:java}
# This is from the first time
fact table GLAUCUS.KYLIN_SALES,group by [],filter on 
[GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=TOP_N, 
parameter=GLAUCUS.KYLIN_SALES.PRICE,GLAUCUS.KYLIN_SALES.SELLER_ID,GLAUCUS.KYLIN_SALES.BUYER_ID,
 returnType=topn(5000,8)]].

# This is the second one
fact table GLAUCUS.KYLIN_SALES,group by [],filter on 
[GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=SUM, 
parameter=GLAUCUS.KYLIN_SALES.PRICE, returnType=decimal(19,4)]].

{code}
So the second time and after we execute the same query or same pattern, the 
expression will be changed to SUM instead of TOPN, that's why the strange 
result show up.

  was:
Hi Kylin team,

I found an issue while server side PreparedStatement enabled. The second time 
query and after's result will be different from the first when query TopN, and 
the result is not right.

Part of Cube info:

 
 Dimensions
 TRANS_ID
 PART_DT
 SELLER_ID
 BUYER_ID

Measures:
 SUM(PRICE)
 MAX(PRICE)
 TOPN(PRICE) Group By:KYLIN_SALES.SELLER_ID,KYLIN_SALES.BUYER_ID

 

SQL:
{code:java}
{"sql":"select seller_id, buyer_id, sum(PRICE) from glaucus.kylin_sales where 
PART_DT >= ? and PART_DT <= ? group by seller_id, buyer_id order by sum(PRICE) 
desc limit 20","project":"DDTFORTEST_Analytics", "params":[{"className": 
"java.lang.String","value": "2012-01-01"},{"className": 
"java.lang.String","value": "2012-01-10"}]}
{code}
The First query result:

!image-2019-11-18-15-55-00-312.png!

The Second and after:

!image-2019-11-18-15-55-11-906.png|width=2046,height=1096!

 -----------------------------------------------
h2. -Root Cause-

Cached preparedContext is changed when doing preparedStatement.executeQuery, 
and losing groupByColumns. So the first execution result is correct, the second 
and the after will be incorrect.

!image-2019-11-18-19-29-34-489.png!

!image-2019-11-18-19-29-42-721.png!
h2.  Root Cause

The first time we entered PreparedStatement logic, we'll try to borrow 
preparedContext from cache pool, of course there isn't any, but the cache pool 
will execute create method to create a new preparedContext, and then loaned it 
to us.

I didn't figure out how adjustSqlDigest works before, and try to remove code
{code:java}
sqlDigest.groupbyColumns.removeAll(topnLiteralCol){code}
but it's not right. Top-N isn't like some other measures, the dimensions aren't 
as part of row key, they stored in measures in design, so it's why the 
adjustSqlDigest matters.


> When using server side PreparedStatement cache, the query result are not 
> match on TopN scenario
> -----------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4260
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4260
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v3.0.0-alpha2, v2.6.4
>            Reporter: Marc Wu
>            Assignee: Marc Wu
>            Priority: Major
>             Fix For: v3.0.0, v2.6.5
>
>         Attachments: image-2019-11-18-15-55-00-312.png, 
> image-2019-11-18-15-55-11-906.png, image-2019-11-18-19-29-34-489.png, 
> image-2019-11-18-19-29-42-721.png
>
>
> Hi Kylin team,
> I found an issue while server side PreparedStatement enabled. The second time 
> query and after's result will be different from the first when query TopN, 
> and the result is not right.
> Part of Cube info:
>  
>  Dimensions
>  TRANS_ID
>  PART_DT
>  SELLER_ID
>  BUYER_ID
> Measures:
>  SUM(PRICE)
>  MAX(PRICE)
>  TOPN(PRICE) Group By:KYLIN_SALES.SELLER_ID,KYLIN_SALES.BUYER_ID
>  
> SQL:
> {code:java}
> {"sql":"select seller_id, buyer_id, sum(PRICE) from glaucus.kylin_sales where 
> PART_DT >= ? and PART_DT <= ? group by seller_id, buyer_id order by 
> sum(PRICE) desc limit 20","project":"DDTFORTEST_Analytics", 
> "params":[{"className": "java.lang.String","value": 
> "2012-01-01"},{"className": "java.lang.String","value": "2012-01-10"}]}
> {code}
> The First query result:
> !image-2019-11-18-15-55-00-312.png!
> The Second and after:
> !image-2019-11-18-15-55-11-906.png|width=2046,height=1096!
>  -----------------------------------------------
> h2. -Root Cause-
> Cached preparedContext is changed when doing preparedStatement.executeQuery, 
> and losing groupByColumns. So the first execution result is correct, the 
> second and the after will be incorrect.
> !image-2019-11-18-19-29-34-489.png!
> !image-2019-11-18-19-29-42-721.png!
> h2. Real Root Cause
> The first time we entered PreparedStatement logic, we'll try to borrow 
> preparedContext from cache pool, of course there isn't any, but the cache 
> pool will execute create method to create a new preparedContext, and then 
> loaned it to us.
> I didn't figure out how adjustSqlDigest works before, and try to remove code
> {code:java}
> sqlDigest.groupbyColumns.removeAll(topnLiteralCol){code}
> but it's not right. Top-N isn't like some other measures, the dimensions 
> aren't as part of row key, they stored in measures in design, so it's why the 
> adjustSqlDigest matters, especially those codes.
> {code:java}
> sqlDigest.groupbyColumns.removeAll(topnLiteralCol);
> sqlDigest.metricColumns.addAll(topnLiteralCol);
> {code}
> The root cause for this issue is because the adjustSqlDigest is execute again 
> after we store it in cache, so the digest changed.
> {code:java}
> # This is from the first time
> fact table GLAUCUS.KYLIN_SALES,group by [],filter on 
> [GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=TOP_N, 
> parameter=GLAUCUS.KYLIN_SALES.PRICE,GLAUCUS.KYLIN_SALES.SELLER_ID,GLAUCUS.KYLIN_SALES.BUYER_ID,
>  returnType=topn(5000,8)]].
> # This is the second one
> fact table GLAUCUS.KYLIN_SALES,group by [],filter on 
> [GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=SUM, 
> parameter=GLAUCUS.KYLIN_SALES.PRICE, returnType=decimal(19,4)]].
> {code}
> So the second time and after we execute the same query or same pattern, the 
> expression will be changed to SUM instead of TOPN, that's why the strange 
> result show up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (KYLIN-4260) When using server side PreparedStatement cache, the query result are not match on TopN scenario

Reply via email to