[jira] [Comment Edited] (KUDU-1644) Simplify IN-list predicate values based on tablet partition key or rowset PK bounds

wangningito (Jira) Wed, 04 Dec 2019 22:55:11 -0800


    [ 
https://issues.apache.org/jira/browse/KUDU-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974290#comment-16974290
 ]


wangningito edited comment on KUDU-1644 at 12/5/19 6:54 AM:
------------------------------------------------------------

I have submitted an implementation for token-based scans, covering the case 
where there is only one hash partition and it contains only one key.  
[https://gerrit.cloudera.org/c/14706/ |https://gerrit.cloudera.org/c/14706/]

This implementation lives in the client module. It filters the values to be 
pushed down while the scan tokens are being built, requires only a very small 
modification to the current code, and has only a slight impact on performance.

The existing pruneHashComponent method already calculates the hash buckets of 
all the rows. I implemented the idea simply by collecting those bucket IDs and 
replacing the IN-list predicate values with the filtered values, so there is 
almost no performance impact on other cases. I placed the change in the client 
rather than in the tablet because the performance improvement then comes from 
two sources: fewer values are transported over the network, and the later 
binary search runs over a shorter list, so its cost shrinks logarithmically.
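
The filtering step described above can be sketched roughly as follows. This is 
a minimal illustration and not the code from the Gerrit change: the class 
InListBucketFilter, the methods bucketOf and groupByBucket, and the hash 
function are all hypothetical stand-ins (Kudu's real bucket hash is different). 
It only shows the idea of grouping IN-list values by hash bucket so that each 
scan token carries just the values that can match rows in its own tablet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: groups IN-list values by hash bucket so that each scan token
// can carry just the values that may match rows in its own tablet.
class InListBucketFilter {

    // Stand-in hash (an assumption for this sketch). Kudu's real bucket
    // hash is different; this one only makes the example runnable.
    static int bucketOf(long value, int numBuckets) {
        return Math.floorMod(Long.hashCode(value), numBuckets);
    }

    // Returns bucket id -> the IN-list values that hash into that bucket.
    // Buckets that receive no values are absent from the map, which
    // corresponds to tablets that need no scan token at all.
    static Map<Integer, List<Long>> groupByBucket(List<Long> inListValues,
                                                  int numBuckets) {
        Map<Integer, List<Long>> byBucket = new TreeMap<>();
        for (long v : inListValues) {
            byBucket.computeIfAbsent(bucketOf(v, numBuckets),
                                     b -> new ArrayList<>()).add(v);
        }
        return byBucket;
    }

    public static void main(String[] args) {
        List<Long> inList = Arrays.asList(1L, 2L, 3L, 25L, 26L, 49L);
        Map<Integer, List<Long>> byBucket = groupByBucket(inList, 24);
        // Each entry is the reduced IN-list for one bucket's scan token.
        byBucket.forEach((bucket, values) ->
            System.out.println("bucket " + bucket + " -> " + values));
    }
}
```

With 24 buckets and this stand-in hash, the six values fall into three buckets, 
so only three tokens would carry an IN-list and each list is shorter than the 
original. As a rough estimate, assuming values spread evenly, an IN-list of 
100000 values over 24 partitions leaves about 4167 values per tablet, which 
shortens each binary search from about 17 comparisons to about 13.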

Here are some performance benchmarks for this implementation.

Hardware:

Client: 4 cores, 8 GB memory

Server: 4 cores, 8 GB memory

IN-list size: 100000; all queries are served from cache.

The table scanned by the IN-list query contains 10M rows and 30 dense columns, 
whose cells are randomly either BIGINT or STRING. The table has 24 partitions.

Before tuning:
 !image-2019-12-05-14-54-03-485.png! 

After tuning:

 !image-2019-12-05-14-53-57-741.png! 


> Simplify IN-list predicate values based on tablet partition key or rowset PK 
> bounds
> -----------------------------------------------------------------------------------
>
>                 Key: KUDU-1644
>                 URL: https://issues.apache.org/jira/browse/KUDU-1644
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: perf, tablet
>            Reporter: Dan Burkert
>            Priority: Major
>         Attachments: image-2019-12-05-14-52-05-846.png, 
> image-2019-12-05-14-52-18-487.png, image-2019-12-05-14-53-51-175.png, 
> image-2019-12-05-14-53-57-741.png, image-2019-12-05-14-54-03-485.png
>
>
> When new scans are optimized by the tablet, the tablet's partition key bounds 
> aren't taken into account in order to remove predicates from the scan.  One 
> of the most important such optimizations is that IN-list predicates could 
> remove values based on the tablet's constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
