[ 
https://issues.apache.org/jira/browse/ORC-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041980#comment-17041980
 ] 

Panagiotis Garefalakis edited comment on ORC-597 at 4/9/20, 11:52 AM:
----------------------------------------------------------------------

The row-filter benchmark uses the existing datasets (github, sales, taxi) with
configurable filter_percentage and projected_columns values.
It seems that even filtering out 90% of the rows reduces runtime by about a
second, while filtering out as little as 20% performs on par with no filtering
at all.

 
{code:java}
Benchmark                                                 (compression)  (dataset)  (filter_percentage)  (projected_columns)  Mode  Cnt          Score          Error  Units
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                 0.01                  all  avgt    5   11475225.464 ± 1623255.254  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                 0.01                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                 0.01                  all  avgt    5          0.459 ±       0.065  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                 0.01                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                 0.01                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.1                  all  avgt    5   11675996.797 ± 2018888.900  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.1                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.1                  all  avgt    5          0.467 ±       0.081  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.1                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.1                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.4                  all  avgt    5   11435162.159 ± 2618968.876  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.4                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.4                  all  avgt    5          0.457 ±       0.105  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.4                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.4                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.8                  all  avgt    5   11310452.698 ±  716395.472  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.8                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.8                  all  avgt    5          0.452 ±       0.029  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.8                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.8                  all  avgt    5  125000000.000                    #

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

RowFilterProjectionBenchmark.orcRowFilter                          none      sales                 0.01                  all  avgt    5   10555379.527 ± 2636332.098  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                 0.01                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                 0.01                  all  avgt    5          0.422 ±       0.105  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                 0.01                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                 0.01                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.1                  all  avgt    5   10568755.756 ± 2958742.985  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.1                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.1                  all  avgt    5          0.423 ±       0.118  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.1                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.1                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.4                  all  avgt    5   10775518.795 ±  807832.612  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.4                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.4                  all  avgt    5          0.431 ±       0.032  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.4                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.4                  all  avgt    5  125000000.000                    #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.8                  all  avgt    5   11479177.704 ±  957484.991  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.8                  all  avgt    5        623.538                    #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.8                  all  avgt    5          0.459 ±       0.038  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.8                  all  avgt    5        895.000                    #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.8                  all  avgt    5  125000000.000                    #

{code}
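
As a reading aid for the numbers above, here is a minimal sketch that turns
the reported average times (us/op) into a per-setting difference between
orcNoFilter and orcRowFilter. The class and method names (RowFilterDelta,
deltaSeconds, relativeChangePercent) are made up for illustration and are not
part of orc-benchmarks; the scores are copied from the table.

```java
import java.util.Locale;

// Hypothetical helper, not part of orc-benchmarks: compares the JMH average
// times (us/op) reported above for orcNoFilter and orcRowFilter.
public class RowFilterDelta {

    /** Time saved by the row-filter run, in seconds (negative means slower). */
    public static double deltaSeconds(double noFilterUs, double rowFilterUs) {
        return (noFilterUs - rowFilterUs) / 1_000_000.0;
    }

    /** Time saved by the row-filter run, as a percentage of the no-filter time. */
    public static double relativeChangePercent(double noFilterUs, double rowFilterUs) {
        return 100.0 * (noFilterUs - rowFilterUs) / noFilterUs;
    }

    public static void main(String[] args) {
        // Scores copied from the benchmark output (dataset = sales, compression = none).
        double[] filterPercentage = {0.01, 0.1, 0.4, 0.8};
        double[] noFilterUs  = {11475225.464, 11675996.797, 11435162.159, 11310452.698};
        double[] rowFilterUs = {10555379.527, 10568755.756, 10775518.795, 11479177.704};

        for (int i = 0; i < filterPercentage.length; i++) {
            System.out.printf(Locale.ROOT,
                "filter_percentage=%.2f  saved=%.2f s  (%.1f%%)%n",
                filterPercentage[i],
                deltaSeconds(noFilterUs[i], rowFilterUs[i]),
                relativeChangePercent(noFilterUs[i], rowFilterUs[i]));
        }
    }
}
```

With these inputs the difference is roughly 0.9 s, 1.1 s and 0.7 s for
filter_percentage 0.01, 0.1 and 0.4, and slightly negative (about -0.2 s) for
0.8. The reported errors (± 0.7 to 3.0 s) are of the same order as these
differences, so the comparison is indicative rather than conclusive.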


was (Author: pgaref):
Row-filter benchmark uses existing datasets (github, sales, taxi) with 
configurable filter_percentages and projected columns.
It seems that even filtering out 10% of the rows can drop runtime by a second 
while filtering out as little as 20% performs on par with no filtering at all.

 

> Row-level Filtering bench
> -------------------------
>
>                 Key: ORC-597
>                 URL: https://issues.apache.org/jira/browse/ORC-597
>             Project: ORC
>          Issue Type: Sub-task
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>         Attachments: RowFilterBenchBoolean.out, RowFilterBenchDecimal.out, 
> RowFilterBenchDouble.out, RowFilterBenchString.out, 
> RowFilterBenchTimestamp.out
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Extend orc-benchmarks for row-level filtering



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
