guiyanakuang commented on pull request #915:
URL: https://github.com/apache/orc/pull/915#issuecomment-936629718


   I have completed a benchmark (dba9a1a) against the current implementation, 
for the time being, to show the benefits of custom statistics.
   I added OptimizeFilterBenchmark. It compares the default query 
(noUseTDigest) with a query whose filter conditions are re-ordered according 
to the percentage of values each condition matches, as estimated by TDigest 
(useTDigest).
   
   proportion: ratio of cardinality between columns
   quota: minimum cardinality
   
   ```
   Benchmark                             (proportion)  (quota)  Mode  Cnt     Score     Error  Units
   OptimizeFilterBenchmark.noUseTDigest             2       10  avgt   20  1052.305 ±  10.632  us/op
   OptimizeFilterBenchmark.noUseTDigest             2      100  avgt   20  1109.375 ±  10.162  us/op
   OptimizeFilterBenchmark.noUseTDigest             2     1000  avgt   20  1173.790 ±  11.696  us/op
   OptimizeFilterBenchmark.noUseTDigest             3       10  avgt   20  1056.139 ±   8.359  us/op
   OptimizeFilterBenchmark.noUseTDigest             3      100  avgt   20  1154.665 ±   9.152  us/op
   OptimizeFilterBenchmark.noUseTDigest             3     1000  avgt   20  1168.113 ±   9.115  us/op
   OptimizeFilterBenchmark.useTDigest               2       10  avgt   20  1116.076 ±   6.330  us/op
   OptimizeFilterBenchmark.useTDigest               2      100  avgt   20  1162.956 ±   9.865  us/op
   OptimizeFilterBenchmark.useTDigest               2     1000  avgt   20  1220.028 ±  22.544  us/op
   OptimizeFilterBenchmark.useTDigest               3       10  avgt   20  1114.617 ±  10.220  us/op
   OptimizeFilterBenchmark.useTDigest               3      100  avgt   20  1219.488 ± 138.798  us/op
   OptimizeFilterBenchmark.useTDigest               3     1000  avgt   20   651.001 ±  20.784  us/op
   ```
   
   The tests show some performance loss from reading the custom statistics 
structures: roughly 45 to 65 us/op in five of the six configurations above. 
When the filter conditions contain sparse values, reordering them yields a 
much larger gain because it allows early pruning: with proportion 3 and 
quota 1000, the average time drops from 1168.113 to 651.001 us/op.
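
To illustrate why reordering helps, here is a minimal sketch in plain Java, not the code from this pull request. It assumes each condition of an AND filter carries an estimated pass fraction (in the PR this estimate would come from the TDigest-based statistics; here it is hard-coded), sorts the conditions so the one expected to reject the most rows runs first, and counts how many condition evaluations short-circuiting saves. The names `FilterOrderSketch`, `Condition`, `orderBySelectivity` and `countMatches` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class FilterOrderSketch {

    /** One conjunct of an AND filter with an assumed estimated pass fraction in [0, 1]. */
    static final class Condition {
        final String name;
        final double estimatedPassFraction; // hypothetical: would be derived from column statistics
        final IntPredicate test;
        int evaluations = 0; // how many times this condition was actually evaluated

        Condition(String name, double estimatedPassFraction, IntPredicate test) {
            this.name = name;
            this.estimatedPassFraction = estimatedPassFraction;
            this.test = test;
        }

        boolean eval(int value) {
            evaluations++;
            return test.test(value);
        }
    }

    /** Returns a sorted copy: the condition expected to pass the fewest rows comes first. */
    static List<Condition> orderBySelectivity(List<Condition> conditions) {
        List<Condition> ordered = new ArrayList<>(conditions);
        ordered.sort(Comparator.comparingDouble(c -> c.estimatedPassFraction));
        return ordered;
    }

    /** Evaluates the conjunction row by row, stopping at the first failing condition. */
    static int countMatches(List<Condition> conditions, int rows) {
        int matches = 0;
        for (int row = 0; row < rows; row++) {
            boolean pass = true;
            for (Condition c : conditions) {
                if (!c.eval(row)) {
                    pass = false;
                    break; // early pruning: later conditions are skipped for this row
                }
            }
            if (pass) {
                matches++;
            }
        }
        return matches;
    }

    /** Sums the evaluation counters of all conditions. */
    static int totalEvaluations(List<Condition> conditions) {
        int total = 0;
        for (Condition c : conditions) {
            total += c.evaluations;
        }
        return total;
    }

    /** Builds the two example conditions: a frequent match first, a sparse match second. */
    static List<Condition> exampleConditions() {
        List<Condition> list = new ArrayList<>();
        list.add(new Condition("common", 0.5, v -> v % 2 == 0));
        list.add(new Condition("rare", 0.001, v -> v % 1000 == 0));
        return list;
    }

    public static void main(String[] args) {
        List<Condition> original = exampleConditions();
        int matchesOriginal = countMatches(original, 1000);

        List<Condition> reordered = orderBySelectivity(exampleConditions());
        int matchesReordered = countMatches(reordered, 1000);

        System.out.println("matches: " + matchesOriginal + " / " + matchesReordered);
        System.out.println("evaluations, original order:  " + totalEvaluations(original));
        System.out.println("evaluations, reordered:       " + totalEvaluations(reordered));
    }
}
```

With these two conditions over 1000 rows, both orders find the same single match, but the original order performs 1500 condition evaluations and the reordered one 1001. The saving grows as the most selective condition becomes sparser, which is consistent with the gain appearing only in the high-cardinality configuration of the benchmark.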



