[
https://issues.apache.org/jira/browse/ORC-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041980#comment-17041980
]
Panagiotis Garefalakis edited comment on ORC-597 at 4/9/20, 11:52 AM:
----------------------------------------------------------------------
The row-filter benchmark uses existing datasets (github, sales, taxi) with
configurable filter_percentages and projected columns.
It seems that filtering out 90% of the rows (filter_percentage 0.1) drops runtime
by only about a second (roughly 11.7 s to 10.6 s per op), while filtering out as
little as 20% (filter_percentage 0.8) performs on par with no filtering at all.
{code:java}
Benchmark                                                 (compression)  (dataset)  (filter_percentage)  (projected_columns)  Mode  Cnt          Score          Error  Units
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                 0.01                  all  avgt    5   11475225.464  ± 1623255.254  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                 0.01                  all  avgt    5        623.538                 #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                 0.01                  all  avgt    5          0.459  ±       0.065  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                 0.01                  all  avgt    5        895.000                 #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                 0.01                  all  avgt    5  125000000.000                 #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.1                  all  avgt    5   11675996.797  ± 2018888.900  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.1                  all  avgt    5        623.538                 #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.1                  all  avgt    5          0.467  ±       0.081  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.1                  all  avgt    5        895.000                 #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.1                  all  avgt    5  125000000.000                 #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.4                  all  avgt    5   11435162.159  ± 2618968.876  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.4                  all  avgt    5        623.538                 #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.4                  all  avgt    5          0.457  ±       0.105  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.4                  all  avgt    5        895.000                 #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.4                  all  avgt    5  125000000.000                 #
RowFilterProjectionBenchmark.orcNoFilter                           none      sales                  0.8                  all  avgt    5   11310452.698  ±  716395.472  us/op
RowFilterProjectionBenchmark.orcNoFilter:bytesPerRecord            none      sales                  0.8                  all  avgt    5        623.538                 #
RowFilterProjectionBenchmark.orcNoFilter:perRecord                 none      sales                  0.8                  all  avgt    5          0.452  ±       0.029  us/op
RowFilterProjectionBenchmark.orcNoFilter:reads                     none      sales                  0.8                  all  avgt    5        895.000                 #
RowFilterProjectionBenchmark.orcNoFilter:records                   none      sales                  0.8                  all  avgt    5  125000000.000                 #
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                 0.01                  all  avgt    5   10555379.527  ± 2636332.098  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                 0.01                  all  avgt    5        623.538                 #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                 0.01                  all  avgt    5          0.422  ±       0.105  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                 0.01                  all  avgt    5        895.000                 #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                 0.01                  all  avgt    5  125000000.000                 #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.1                  all  avgt    5   10568755.756  ± 2958742.985  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.1                  all  avgt    5        623.538                 #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.1                  all  avgt    5          0.423  ±       0.118  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.1                  all  avgt    5        895.000                 #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.1                  all  avgt    5  125000000.000                 #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.4                  all  avgt    5   10775518.795  ±  807832.612  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.4                  all  avgt    5        623.538                 #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.4                  all  avgt    5          0.431  ±       0.032  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.4                  all  avgt    5        895.000                 #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.4                  all  avgt    5  125000000.000                 #
RowFilterProjectionBenchmark.orcRowFilter                          none      sales                  0.8                  all  avgt    5   11479177.704  ±  957484.991  us/op
RowFilterProjectionBenchmark.orcRowFilter:bytesPerRecord           none      sales                  0.8                  all  avgt    5        623.538                 #
RowFilterProjectionBenchmark.orcRowFilter:perRecord                none      sales                  0.8                  all  avgt    5          0.459  ±       0.038  us/op
RowFilterProjectionBenchmark.orcRowFilter:reads                    none      sales                  0.8                  all  avgt    5        895.000                 #
RowFilterProjectionBenchmark.orcRowFilter:records                  none      sales                  0.8                  all  avgt    5  125000000.000                 #
{code}
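To make the "about a second" reading of the table concrete, here is a minimal sketch that compares the mean orcRowFilter and orcNoFilter scores for each filter_percentage. The scores are copied from the output above; the dictionary and function names are illustrative only and are not part of orc-benchmarks.

```python
# Mean scores in us/op, copied from the JMH output above, keyed by filter_percentage.
# The names below are hypothetical helpers for this sketch, not orc-benchmarks code.
NO_FILTER = {0.01: 11475225.464, 0.1: 11675996.797, 0.4: 11435162.159, 0.8: 11310452.698}
ROW_FILTER = {0.01: 10555379.527, 0.1: 10568755.756, 0.4: 10775518.795, 0.8: 11479177.704}


def runtime_delta_seconds(filter_percentage):
    """Seconds saved per op by orcRowFilter vs orcNoFilter (negative means slower)."""
    return (NO_FILTER[filter_percentage] - ROW_FILTER[filter_percentage]) / 1e6


def relative_change(filter_percentage):
    """Fractional runtime reduction relative to the orcNoFilter mean score."""
    return 1.0 - ROW_FILTER[filter_percentage] / NO_FILTER[filter_percentage]


if __name__ == "__main__":
    for fp in sorted(NO_FILTER):
        print(f"filter_percentage={fp}: saved {runtime_delta_seconds(fp):+.3f} s/op "
              f"({relative_change(fp):+.1%})")
```

On these means the saving is roughly 1.1 s per op at filter_percentage 0.1 and slightly negative at 0.8. The reported error intervals of the two benchmarks overlap at every setting, so the differences should be read as indicative rather than conclusive.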
was (Author: pgaref):
Row-filter benchmark uses existing datasets (github, sales, taxi) with
configurable filter_percentages and projected columns.
It seems that even filtering out 10% of the rows can drop runtime by a second
while filtering-out as low as 20% performs on par with no-filtering at all.
> Row-level Filtering bench
> -------------------------
>
> Key: ORC-597
> URL: https://issues.apache.org/jira/browse/ORC-597
> Project: ORC
> Issue Type: Sub-task
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: RowFilterBenchBoolean.out, RowFilterBenchDecimal.out,
> RowFilterBenchDouble.out, RowFilterBenchString.out,
> RowFilterBenchTimestamp.out
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Extend orc-benchmarks for row-level filtering
--
This message was sent by Atlassian Jira
(v8.3.4#803005)