[ https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613667#comment-16613667 ]
Csaba Ringhofer commented on IMPALA-7367:
-----------------------------------------

About TimestampValue and boost: it is also possible to replace the boost
types with uint64_t/int32_t, or with wrapper structs that have the same
layout (nanoseconds from midnight / days since some epoch). The layout
itself cannot be changed easily, because Parquet uses the same 96-bit
struct for timestamps and memcpy is used to read them. So if we find a
better way to store timestamps, or boost is updated and these classes
change, the current layout still has to be kept to be able to read/write
Parquet.

boost::posix_time::time_duration and boost::gregorian::date have some
useful functions, but they also cause problems:
- adding a duration can overflow, which leads to a lot of wrapper code
  (an example is https://gerrit.cloudera.org/#/c/11390 )
- the limitations of the boost classes are the reason behind the limited
  timestamp range in Impala (1400..9999); uint32_t could be enough for a
  much larger range, e.g. starting from 0001
- some functions can throw exceptions if a value is out of range, and
  Impala is not very good at catching them
- some functions are slow compared to manipulating the integer
  representations directly

I am not saying that the boost classes should be replaced in the scope of
this Jira, I just wanted to add that the tendency is to move away from
boost's time types. Replacing boost/date_time/local_time/tz_database with
CCTZ was a big step in this direction.

Also note that Impala's TPCH tables store timestamps as strings (as far as
I know), so packing TimestampValue shouldn't affect TPCH performance.
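To make the "wrapper struct with the same layout" idea concrete, here is a
minimal sketch, not Impala's actual code. PackedTimestamp, ReadTimestamp and
AddDays are hypothetical names, and the meaning of the day number (which
epoch it counts from) is left open as an assumption. The sketch only shows
that a 12-byte struct of plain integers can be filled with memcpy from a
96-bit value, and that day arithmetic can report overflow through a return
value instead of an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical boost-free timestamp, NOT Impala's TimestampValue.
// Assumed layout: 8 bytes of nanoseconds since midnight followed by
// 4 bytes of day number (the epoch is an assumption left unspecified).
#pragma pack(push, 1)
struct PackedTimestamp {
  uint64_t nanos_of_day;
  int32_t day_number;
};
#pragma pack(pop)

// Without the pack pragma the struct would be padded to 16 bytes.
static_assert(sizeof(PackedTimestamp) == 12, "must be exactly 96 bits");
static_assert(offsetof(PackedTimestamp, day_number) == 8,
              "day number must follow the 8-byte time of day");

// Copies 12 raw bytes (e.g. from a file buffer) into the struct.
// memcpy places no alignment requirement on 'buf'.
inline PackedTimestamp ReadTimestamp(const uint8_t* buf) {
  assert(buf != nullptr);
  PackedTimestamp ts;
  std::memcpy(&ts, buf, sizeof(ts));
  return ts;
}

// Adds a signed number of days. Returns false and leaves '*ts' unchanged
// if the result does not fit in int32_t; it never throws.
inline bool AddDays(PackedTimestamp* ts, int64_t days) {
  if (days < INT32_MIN || days > INT32_MAX) return false;
  // Both operands fit in 32 bits, so the 64-bit sum cannot overflow.
  int64_t result = static_cast<int64_t>(ts->day_number) + days;
  if (result < INT32_MIN || result > INT32_MAX) return false;
  ts->day_number = static_cast<int32_t>(result);
  return true;
}
```

A caller would check the result of AddDays and turn a failure into a NULL
or an error status, which avoids having to catch an exception on the hot
path. A real replacement would also need to validate nanos_of_day and the
supported date range, which this sketch leaves out.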
> Pack StringValue, CollectionValue and TimestampValue slots
> ----------------------------------------------------------
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Pooja Nilangekar
> Priority: Major
> Labels: perfomance
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789
> didn't actually fully pack the memory layout because StringValue,
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12
> bytes of actual data. This results in a higher memory footprint, which leads
> to higher memory requirements and worse performance. We don't get any benefit
> from the padding since the majority of tuples are not actually aligned in
> memory anyway.
> I did a quick version of the change for StringValue only which improves TPC-H
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +----------+-----------------------+---------+------------+------------+----------------+
> | Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
> +----------+-----------------------+---------+------------+------------+----------------+
> | TPCH(10) | parquet / none / none | 2.69    | -4.78%     | 2.09       | -3.11%         |
> +----------+-----------------------+---------+------------+------------+----------------+
> +----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> | Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
> +----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94   | 0.93        | +0.75%     | 3.37%      | 2.84%          | 1           | 30    |
> | TPCH(10) | TPCH-Q13 | parquet / none / none | 3.32   | 3.32        | +0.13%     | 1.74%      | 2.09%          | 1           | 30    |
> | TPCH(10) | TPCH-Q11 | parquet / none / none | 0.99   | 0.99        | -0.02%     | 3.74%      | 3.16%          | 1           | 30    |
> | TPCH(10) | TPCH-Q5  | parquet / none / none | 2.30   | 2.33        | -0.96%     | 2.15%      | 2.45%          | 1           | 30    |
> | TPCH(10) | TPCH-Q2  | parquet / none / none | 1.55   | 1.57        | -1.45%     | 1.65%      | 1.49%          | 1           | 30    |
> | TPCH(10) | TPCH-Q8  | parquet / none / none | 2.89   | 2.93        | -1.51%     | 2.69%      | 1.34%          | 1           | 30    |
> | TPCH(10) | TPCH-Q9  | parquet / none / none | 5.96   | 6.06        | -1.63%     | 1.34%      | 1.82%          | 1           | 30    |
> | TPCH(10) | TPCH-Q20 | parquet / none / none | 1.58   | 1.61        | -1.85%     | 2.28%      | 2.16%          | 1           | 30    |
> | TPCH(10) | TPCH-Q16 | parquet / none / none | 1.18   | 1.21        | -2.11%     | 3.68%      | 4.72%          | 1           | 30    |
> | TPCH(10) | TPCH-Q3  | parquet / none / none | 2.13   | 2.18        | -2.31%     | 2.09%      | 1.92%          | 1           | 30    |
> | TPCH(10) | TPCH-Q15 | parquet / none / none | 1.86   | 1.90        | -2.52%     | 2.06%      | 2.22%          | 1           | 30    |
> | TPCH(10) | TPCH-Q17 | parquet / none / none | 1.85   | 1.90        | -2.86%     | 10.00%     | 8.02%          | 1           | 30    |
> | TPCH(10) | TPCH-Q10 | parquet / none / none | 2.58   | 2.66        | -2.93%     | 1.68%      | 6.49%          | 1           | 30    |
> | TPCH(10) | TPCH-Q14 | parquet / none / none | 1.37   | 1.42        | -3.22%     | 3.35%      | 6.24%          | 1           | 30    |
> | TPCH(10) | TPCH-Q18 | parquet / none / none | 4.99   | 5.17        | -3.38%     | 1.75%      | 3.82%          | 1           | 30    |
> | TPCH(10) | TPCH-Q6  | parquet / none / none | 0.66   | 0.69        | -3.73%     | 5.04%      | 4.12%          | 1           | 30    |
> | TPCH(10) | TPCH-Q4  | parquet / none / none | 1.07   | 1.12        | -3.97%     | 1.79%      | 2.85%          | 1           | 30    |
> | TPCH(10) | TPCH-Q1  | parquet / none / none | 2.29   | 2.39        | -4.34%     | 2.70%      | 3.43%          | 1           | 30    |
> | TPCH(10) | TPCH-Q7  | parquet / none / none | 6.17   | 6.45        | -4.42%     | 2.15%      | 1.69%          | 1           | 30    |
> | TPCH(10) | TPCH-Q19 | parquet / none / none | 1.85   | 1.93        | -4.42%     | 2.76%      | 2.26%          | 1           | 30    |
> | TPCH(10) | TPCH-Q12 | parquet / none / none | 1.32   | 1.42        | -6.75%     | 2.55%      | 3.56%          | 1           | 30    |
> | TPCH(10) | TPCH-Q21 | parquet / none / none | 10.43  | 12.08       | -13.69%    | * 14.21% * | * 10.76% *     | 1           | 30    |
> +----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org