[ 
https://issues.apache.org/jira/browse/IMPALA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826924#comment-16826924
 ] 

Daniel Becker commented on IMPALA-8381:
---------------------------------------

We ran some benchmarks comparing the old and the new implementations. The 
benchmark code is not part of the patch for this Jira, because the benchmarks 
were originally written for the delta encoding (implementation in progress: 
[https://gerrit.cloudera.org/#/c/12621/]).

The benchmarks read plain-encoded values into a "scratch batch" with the given 
stride. `OutType` is the type of the values in the scratch batch. We use the 
Impala benchmark framework. For all reported values (p10, p50 and p90), higher 
is better.
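
To make the setup concrete, here is a minimal sketch of what reading into a 
strided scratch batch could look like. It assumes the stride is measured in 
bytes; the names `ReadPlainStrided` and `ValueAt` are made up for illustration 
and are not taken from the Impala code or from the benchmark.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Impala's API). Copies `num_values` plain-encoded
// values of type OutType from `src` to `dst`, writing value i at byte offset
// i * stride. Assumption: the stride is in bytes. The caller must ensure that
// stride >= sizeof(OutType) and that both buffers are large enough.
template <typename OutType>
void ReadPlainStrided(const uint8_t* src, int64_t num_values, int64_t stride,
    uint8_t* dst) {
  for (int64_t i = 0; i < num_values; ++i) {
    // memcpy avoids unaligned and type-punned accesses.
    std::memcpy(dst + i * stride, src + i * sizeof(OutType), sizeof(OutType));
  }
}

// Hypothetical helper: reads value `i` back from a strided batch.
template <typename OutType>
OutType ValueAt(const uint8_t* batch, int64_t i, int64_t stride) {
  OutType v;
  std::memcpy(&v, batch + i * stride, sizeof(OutType));
  return v;
}
```

With a larger stride, consecutive output values lie further apart in memory, 
so each write touches a smaller share of useful bytes per cache line.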

Baseline (old implementation):
||ParquetType||OutType||Stride||p10||p50||p90||
|INT32|int32_t|8|136|140|141|
|INT32|int32_t|12|136|139|139|
|INT32|int32_t|16|137|139|140|
|INT32|int32_t|20|137|139|140|
|INT32|int32_t|30|125|128|129|
|INT32|int32_t|40|92.3|93.9|94.7|
|INT32|int32_t|50|72|72.7|73.1|
|INT32|int32_t|80|54|54.4|55|
|INT32|int32_t|100|51.7|52.2|52.7|
|INT32|int32_t|120|49.4|49.9|50.4|
|INT32|int32_t|150|46.3|46.6|47|
|INT32|int32_t|180|46.9|47.6|48.4|
|INT32|int32_t|200|46.9|47.9|48.6|
|INT32|int32_t|400|51.7|52.9|54.4|
||ParquetType||OutType||Stride||p10||p50||p90||
|INT64|int64_t|8|137|140|140|
|INT64|int64_t|12|135|140|141|
|INT64|int64_t|16|137|139|141|
|INT64|int64_t|20|132|138|139|
|INT64|int64_t|30|115|117|117|
|INT64|int64_t|40|88.2|89.5|90.1|
|INT64|int64_t|50|69.5|70|70.5|
|INT64|int64_t|80|52.7|53|53.4|
|INT64|int64_t|100|48|48.5|48.9|
|INT64|int64_t|120|48.1|48.6|49.1|
|INT64|int64_t|150|43.1|43.6|43.8|
|INT64|int64_t|180|42.2|42.4|43|
|INT64|int64_t|200|43.9|44.4|44.8|
|INT64|int64_t|400|42.5|42.8|43.2|

New implementation:
||ParquetType||OutType||Stride||p10||p50||p90||
|INT32|int32_t|8|281|284|286|
|INT32|int32_t|12|257|261|263|
|INT32|int32_t|16|237|240|242|
|INT32|int32_t|20|209|212|215|
|INT32|int32_t|30|136|138|139|
|INT32|int32_t|40|95.7|96.5|97.2|
|INT32|int32_t|50|73.2|73.9|74.5|
|INT32|int32_t|80|54|54.5|55.2|
|INT32|int32_t|100|51.9|52.4|52.8|
|INT32|int32_t|120|49.6|50|50.4|
|INT32|int32_t|150|45.9|46.6|47|
|INT32|int32_t|180|47.3|48.6|49.5|
|INT32|int32_t|200|47.4|48.6|49.7|
|INT32|int32_t|400|50.7|52.9|54.4|
||ParquetType||OutType||Stride||p10||p50||p90||
|INT64|int64_t|8|264|268|271|
|INT64|int64_t|12|221|224|226|
|INT64|int64_t|16|216|217|219|
|INT64|int64_t|20|184|186|188|
|INT64|int64_t|30|120|122|123|
|INT64|int64_t|40|90.2|91.3|91.9|
|INT64|int64_t|50|69.3|69.9|70.4|
|INT64|int64_t|80|52.7|53.2|53.6|
|INT64|int64_t|100|48.2|48.8|49.1|
|INT64|int64_t|120|48.2|48.7|49|
|INT64|int64_t|150|43.2|43.6|44|
|INT64|int64_t|180|42.2|42.7|43|
|INT64|int64_t|200|44|44.5|44.7|
|INT64|int64_t|400|43.9|44.5|45.1|

> Remove branch from ParquetPlainEncoder::Decode()
> ------------------------------------------------
>
>                 Key: IMPALA-8381
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8381
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Daniel Becker
>            Priority: Minor
>              Labels: newbie, parquet, performance, ramp-up
>
> Removing the "if" at
> https://github.com/apache/impala/blob/5670f96b828d57f9e36510bb9af02bcc31de775c/be/src/exec/parquet/parquet-common.h#L203
> can lead to a 1.5x speedup in plain decoding (type=int32, stride=16). For 
> primitive types, the same check can be done once for a whole batch, so the 
> speedup can be gained for large batches without losing safety. The only 
> Parquet type where this check is needed per element is BYTE_ARRAY (typically 
> used for STRING columns), which already has a template specialization for 
> ParquetPlainEncoder::Decode().
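
The idea in the description above can be sketched roughly as follows. This is 
an illustration only, not the code from the patch: `DecodeChecked` and 
`DecodeBatch` are hypothetical names, and the sketch assumes a contiguous 
output array and that a batch that does not fit is rejected as a whole.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical names throughout; a sketch, not the Impala implementation.

// Per-value variant: the bounds check (a branch) runs for every value.
// Returns the number of bytes consumed, or -1 if the buffer is too short.
template <typename T>
int DecodeChecked(const uint8_t* buffer, const uint8_t* buffer_end, T* v) {
  if (buffer_end - buffer < static_cast<int64_t>(sizeof(T))) return -1;
  std::memcpy(v, buffer, sizeof(T));
  return static_cast<int>(sizeof(T));
}

// Batched variant for fixed-size types: the number of bytes needed is known
// up front, so a single check covers the whole batch and the loop body has
// no bounds check. Returns the number of bytes consumed, or -1 if the buffer
// cannot hold `num_values` values.
template <typename T>
int64_t DecodeBatch(const uint8_t* buffer, const uint8_t* buffer_end,
    int64_t num_values, T* out) {
  const int64_t bytes_needed = num_values * static_cast<int64_t>(sizeof(T));
  if (buffer_end - buffer < bytes_needed) return -1;
  for (int64_t i = 0; i < num_values; ++i) {
    std::memcpy(out + i, buffer + i * sizeof(T), sizeof(T));
  }
  return bytes_needed;
}
```

The single up-front check only works when every value has the same encoded 
size. BYTE_ARRAY values are variable-length, so the space needed for a batch 
is not known before decoding, which is why the check stays per element there.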


