[jira] [Commented] (IMPALA-8381) Remove branch from ParquetPlainEncoder::Decode()

2019-04-26 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827373#comment-16827373
 ] 

ASF subversion and git services commented on IMPALA-8381:
-

Commit 6a703741d8fdc359833a0d593ca8b121cd5d890d in impala's branch 
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6a70374 ]

IMPALA-8381: Optimize ParquetPlainEncoder::DecodeBatch() for simple types

Refactored the ParquetPlainEncoder::Decode() and
ParquetPlainEncoder::DecodeBatch() methods to increase performance in
batch decoding.

The `Decode` and `DecodeBatch` methods retain their behaviour and
outward interface, but the internal structure changes.

We change how we split up the `Decode` template specialisations. The
generic unspecialised template is used for numerical parquet types
(INT32, INT64, INT96, FLOAT and DOUBLE) and various specialisations are
used for BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY.

We add a new method template, DecodeNoCheck, which does the actual
decoding without bounds checking. The generic Decode method template
calls it internally after its own bounds check. For all parquet types
except BYTE_ARRAY, DecodeBatch performs a single bounds check for the
whole batch and then calls DecodeNoCheck for each value, so we save the
cost of bounds checking every decoded value. For BYTE_ARRAY this cannot
be done, because the values are variable-length, so we have to perform
the checks for every value.
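
A minimal sketch of the idea for fixed-size types, not the code from the
patch: the method names follow the commit message, but the signatures,
return conventions and the use of free functions are assumptions made for
illustration. It also assumes a little-endian host, so that a plain
memcpy matches Parquet's plain encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical signature: copies one fixed-size value without any bounds
// check. The caller must guarantee sizeof(T) readable bytes at `buffer`.
template <typename T>
inline int DecodeNoCheck(const uint8_t* buffer, T* v) {
  std::memcpy(v, buffer, sizeof(T));
  return static_cast<int>(sizeof(T));
}

// Single-value decode: one bounds check, then the unchecked copy.
// Returns the number of bytes consumed, or -1 if the buffer is too short.
template <typename T>
inline int Decode(const uint8_t* buffer, const uint8_t* buffer_end, T* v) {
  if (buffer_end - buffer < static_cast<std::ptrdiff_t>(sizeof(T))) return -1;
  return DecodeNoCheck(buffer, v);
}

// Batch decode: the bounds check is done once for the whole batch, so the
// loop body contains no branch on the buffer size.
template <typename T>
inline int64_t DecodeBatch(const uint8_t* buffer, const uint8_t* buffer_end,
                           int64_t num_values, T* out) {
  assert(num_values >= 0);
  const int64_t total_bytes = num_values * static_cast<int64_t>(sizeof(T));
  if (buffer_end - buffer < total_bytes) return -1;
  for (int64_t i = 0; i < num_values; ++i) {
    buffer += DecodeNoCheck(buffer, out + i);
  }
  return total_bytes;
}
```

This only works because every value has the same known size; for
BYTE_ARRAY the size of each value is read from the data itself, so the
total cannot be checked up front.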

In the non-BYTE_ARRAY version of DecodeBatch, we explicitly unroll the
loop in batches of 8 to increase performance.
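
As an illustration of what unrolling in batches of 8 can look like (a
sketch under assumptions, not the code from the patch): `DecodeUnrolled`
and `CopyValue` are hypothetical names, the caller is assumed to have
already verified that the input holds `num_values` values, and a
little-endian host is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical unchecked copy of one fixed-size value.
template <typename T>
inline void CopyValue(const uint8_t* in, T* out) {
  std::memcpy(out, in, sizeof(T));
}

// Decodes num_values values with the main loop manually unrolled by 8.
// No bounds checks here: the caller has checked the whole batch already.
template <typename T>
inline void DecodeUnrolled(const uint8_t* buffer, int64_t num_values, T* out) {
  assert(num_values >= 0);
  constexpr int64_t kUnroll = 8;
  const int64_t unrolled_end = num_values / kUnroll * kUnroll;
  int64_t i = 0;
  for (; i < unrolled_end; i += kUnroll) {
    const uint8_t* in = buffer + i * sizeof(T);
    CopyValue(in + 0 * sizeof(T), out + i + 0);
    CopyValue(in + 1 * sizeof(T), out + i + 1);
    CopyValue(in + 2 * sizeof(T), out + i + 2);
    CopyValue(in + 3 * sizeof(T), out + i + 3);
    CopyValue(in + 4 * sizeof(T), out + i + 4);
    CopyValue(in + 5 * sizeof(T), out + i + 5);
    CopyValue(in + 6 * sizeof(T), out + i + 6);
    CopyValue(in + 7 * sizeof(T), out + i + 7);
  }
  // Remainder loop for the last num_values % 8 values.
  for (; i < num_values; ++i) {
    CopyValue(buffer + i * sizeof(T), out + i);
  }
}
```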

The speedup is up to 2x for small strides (8 bytes, INT32), shrinks as
the stride increases, and disappears at around 40 bytes. With larger
strides, performance is the same as in the previous implementation.

Testing:
  Added tests to parquet-plain-test.cc to test the `Decode` and the
  `DecodeBatch` methods both in single-value decoding and batch
  decoding.

Change-Id: I57b7d2573bb6dfd038e581acb3bd8ea1565aa20d
Reviewed-on: http://gerrit.cloudera.org:8080/12985
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Remove branch from ParquetPlainEncoder::Decode()
> 
>
> Key: IMPALA-8381
> URL: https://issues.apache.org/jira/browse/IMPALA-8381
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Csaba Ringhofer
> Assignee: Daniel Becker
> Priority: Minor
> Labels: newbie, parquet, performance, ramp-up
>
> Removing the "if" at
> https://github.com/apache/impala/blob/5670f96b828d57f9e36510bb9af02bcc31de775c/be/src/exec/parquet/parquet-common.h#L203
> can lead to a 1.5x speedup in plain decoding (type=int32, stride=16). For
> primitive types, the same check can be done once for a whole batch, so the
> speedup can be gained for large batches without losing safety. The only
> Parquet type where this check is needed per element is BYTE_ARRAY (typically
> used for STRING columns), which already has a template specialization for
> ParquetPlainEncoder::Decode().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8381) Remove branch from ParquetPlainEncoder::Decode()

2019-04-26 Thread Daniel Becker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826924#comment-16826924
 ] 

Daniel Becker commented on IMPALA-8381:
---

We ran some benchmarks comparing the old and the new implementations. The
benchmark code is not part of the patch for this Jira, as the benchmarks were
originally written for the delta encoding (implementation in progress:
[https://gerrit.cloudera.org/#/c/12621/]).

The benchmarks read plain-encoded values into a "scratch batch" with the given
stride (in bytes). `OutType` is the type of the values in the scratch batch. We
use the Impala benchmark framework. For the reported values, bigger is better.
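
To make the role of the stride concrete, here is a small sketch of a strided
write into a scratch buffer. It is an illustration only, not the benchmark
code: `DecodeStrided` is a hypothetical name, and the sketch assumes a
little-endian host and a destination buffer of at least
(num_values - 1) * stride + sizeof(T) bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads num_values densely packed values of type T from `in` and writes the
// i-th value at byte offset i * stride in `scratch`. With stride == sizeof(T)
// the output is dense; larger strides leave gaps between the values.
template <typename T>
void DecodeStrided(const uint8_t* in, int64_t num_values, int64_t stride,
                   uint8_t* scratch) {
  assert(num_values >= 0);
  assert(stride >= static_cast<int64_t>(sizeof(T)));
  for (int64_t i = 0; i < num_values; ++i) {
    std::memcpy(scratch + i * stride, in + i * sizeof(T), sizeof(T));
  }
}
```

With a larger stride each written value touches memory further from the
previous one, which is one plausible reason the per-value branch matters less
as the stride grows.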

Baseline (old implementation):
||ParquetType||OutType||Stride||p10||p50||p90||
|INT32|int32_t|8|136|140|141|
|INT32|int32_t|12|136|139|139|
|INT32|int32_t|16|137|139|140|
|INT32|int32_t|20|137|139|140|
|INT32|int32_t|30|125|128|129|
|INT32|int32_t|40|92.3|93.9|94.7|
|INT32|int32_t|50|72|72.7|73.1|
|INT32|int32_t|80|54|54.4|55|
|INT32|int32_t|100|51.7|52.2|52.7|
|INT32|int32_t|120|49.4|49.9|50.4|
|INT32|int32_t|150|46.3|46.6|47|
|INT32|int32_t|180|46.9|47.6|48.4|
|INT32|int32_t|200|46.9|47.9|48.6|
|INT32|int32_t|400|51.7|52.9|54.4|
||ParquetType||OutType||Stride||p10||p50||p90||
|INT64|int64_t|8|137|140|140|
|INT64|int64_t|12|135|140|141|
|INT64|int64_t|16|137|139|141|
|INT64|int64_t|20|132|138|139|
|INT64|int64_t|30|115|117|117|
|INT64|int64_t|40|88.2|89.5|90.1|
|INT64|int64_t|50|69.5|70|70.5|
|INT64|int64_t|80|52.7|53|53.4|
|INT64|int64_t|100|48|48.5|48.9|
|INT64|int64_t|120|48.1|48.6|49.1|
|INT64|int64_t|150|43.1|43.6|43.8|
|INT64|int64_t|180|42.2|42.4|43|
|INT64|int64_t|200|43.9|44.4|44.8|
|INT64|int64_t|400|42.5|42.8|43.2|

New implementation:
||ParquetType||OutType||Stride||p10||p50||p90||
|INT32|int32_t|8|281|284|286|
|INT32|int32_t|12|257|261|263|
|INT32|int32_t|16|237|240|242|
|INT32|int32_t|20|209|212|215|
|INT32|int32_t|30|136|138|139|
|INT32|int32_t|40|95.7|96.5|97.2|
|INT32|int32_t|50|73.2|73.9|74.5|
|INT32|int32_t|80|54|54.5|55.2|
|INT32|int32_t|100|51.9|52.4|52.8|
|INT32|int32_t|120|49.6|50|50.4|
|INT32|int32_t|150|45.9|46.6|47|
|INT32|int32_t|180|47.3|48.6|49.5|
|INT32|int32_t|200|47.4|48.6|49.7|
|INT32|int32_t|400|50.7|52.9|54.4|
||ParquetType||OutType||Stride||p10||p50||p90||
|INT64|int64_t|8|264|268|271|
|INT64|int64_t|12|221|224|226|
|INT64|int64_t|16|216|217|219|
|INT64|int64_t|20|184|186|188|
|INT64|int64_t|30|120|122|123|
|INT64|int64_t|40|90.2|91.3|91.9|
|INT64|int64_t|50|69.3|69.9|70.4|
|INT64|int64_t|80|52.7|53.2|53.6|
|INT64|int64_t|100|48.2|48.8|49.1|
|INT64|int64_t|120|48.2|48.7|49|
|INT64|int64_t|150|43.2|43.6|44|
|INT64|int64_t|180|42.2|42.7|43|
|INT64|int64_t|200|44|44.5|44.7|
|INT64|int64_t|400|43.9|44.5|45.1|
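
The speedup per stride can be read off the tables by dividing the new p50 by
the baseline p50. A small sketch of that arithmetic for a few INT32 rows (the
numbers are copied from the tables above; the `Row` struct and function names
are made up for this example):

```cpp
#include <cassert>
#include <cstdio>

struct Row {
  int stride;       // stride in bytes
  double old_p50;   // baseline implementation, p50
  double new_p50;   // new implementation, p50
};

// INT32 / int32_t rows taken from the two tables above.
constexpr Row kInt32Rows[] = {
    {8, 140, 284}, {16, 139, 240}, {30, 128, 138},
    {40, 93.9, 96.5}, {80, 54.4, 54.5},
};

// Ratio of new to old throughput; values above 1 mean the new code is faster.
inline double Speedup(const Row& r) {
  assert(r.old_p50 > 0);
  return r.new_p50 / r.old_p50;
}

inline void PrintSpeedups() {
  for (const Row& r : kInt32Rows) {
    std::printf("stride %3d: %.2fx\n", r.stride, Speedup(r));
  }
}
```

For these rows the ratio is about 2.03x at stride 8, 1.73x at stride 16 and
1.03x at stride 40.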




[jira] [Commented] (IMPALA-8381) Remove branch from ParquetPlainEncoder::Decode()

2019-04-11 Thread Daniel Becker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815439#comment-16815439
 ] 

Daniel Becker commented on IMPALA-8381:
---

https://gerrit.cloudera.org/#/c/12985/




[jira] [Commented] (IMPALA-8381) Remove branch from ParquetPlainEncoder::Decode()

2019-04-04 Thread Daniel Becker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809886#comment-16809886
 ] 

Daniel Becker commented on IMPALA-8381:
---

Some measurements:

Running the following query in the database tpch_parquet:
{code:sql}
set num_nodes=1;
select max(l_orderkey) from lineitem;
{code}
we found the following results averaging the MaterializeTupleTime(*) values 
over 100 runs with and without the "if":

Without "if": 14.3464ms

With "if": 16.42624ms

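Removing the "if" reduces MaterializeTupleTime by about 2.08ms (roughly 13%) in 
this query; put the other way, the version with the "if" is about 14% slower.

The total query time was 0.11s, so the ~2ms gain is a little less than 2% of it.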
