[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. IMPALA-5560: always store CHAR(N) inline in tuple This is done to simplify the CHAR(N) logic. I believe this is overall an improvement - any benefits of the out-of-line storage that motivated this optimisation originally were outweighed by the added complexity. This also avoids IMPALA-5559 (fe/be have different notions of var-len), which will unblock IMPALA-3200. Pros: * Reduce the number of code paths and improve test coverage. (e.g. avoids IMPALA-5559: fe/be have different notions of var-len) * Reduced memory to store non-NULL data (saves 12-byte StringValue) * Fewer branches in code -> save CPU cycles. * If CHAR(N) performance is important, reduced complexity makes it easier to implement codegen. Cons: * Requires N bytes to store a NULL value. * May hurt cache locality (although this is speculative in my mind). The change is mostly mechanical - I removed MAX_CHAR_INLINE_LENGTH and then removed branches that depended on that. Testing: Ran exhaustive build. Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Reviewed-on: http://gerrit.cloudera.org:8080/7303 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner-test.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/anyval-util.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/slot-ref.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/string-value.h M be/src/runtime/string-value.inline.h M be/src/runtime/types.h M be/src/service/fe-support.cc M be/src/service/hs2-util.cc M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M testdata/workloads/functional-query/queries/QueryTest/spilling.test 19 files changed, 69 insertions(+), 130 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 7: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/815/ -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 7: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Dan Hecht has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7303 to look at the new patch set (#6). Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. IMPALA-5560: always store CHAR(N) inline in tuple This is done to simplify the CHAR(N) logic. I believe this is overall an improvement - any benefits of the out-of-line storage that motivated this optimisation originally were outweighed by the added complexity. This also avoids IMPALA-5559 (fe/be have different notions of var-len), which will unblock IMPALA-3200. Pros: * Reduce the number of code paths and improve test coverage. (e.g. avoids IMPALA-5559: fe/be have different notions of var-len) * Reduced memory to store non-NULL data (saves 12-byte StringValue) * Fewer branches in code -> save CPU cycles. * If CHAR(N) performance is important, reduced complexity makes it easier to implement codegen. Cons: * Requires N bytes to store a NULL value. * May hurt cache locality (although this is speculative in my mind). The change is mostly mechanical - I removed MAX_CHAR_INLINE_LENGTH and then removed branches that depended on that. Testing: Ran exhaustive build. Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 --- M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner-test.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/anyval-util.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/slot-ref.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/string-value.h M be/src/runtime/string-value.inline.h M be/src/runtime/types.h M be/src/service/fe-support.cc M be/src/service/hs2-util.cc M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M testdata/workloads/functional-query/queries/QueryTest/spilling.test 19 files changed, 69 insertions(+), 130 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/7303/6 -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7303/3/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS3, Line 523: dst > I think this would be clearer if we say "the slot 'dst'". or rename 'dst' t Done -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Dan Hecht has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 5: Code-Review+2 Meant to +2 but please see previous comment. -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Dan Hecht has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7303/3/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS3, Line 523: dst I think this would be clearer if we say "the slot 'dst'". or rename 'dst' to 'slot' -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 5: A couple of tests referred to the in/out-of-line char optimisation in comments. Fixed those. -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. IMPALA-5560: always store CHAR(N) inline in tuple This is done to simplify the CHAR(N) logic. I believe this is overall an improvement - any benefits of the out-of-line storage that motivated this optimisation originally were outweighed by the added complexity. This also avoids IMPALA-5559 (fe/be have different notions of var-len), which will unblock IMPALA-3200. Pros: * Reduce the number of code paths and improve test coverage. (e.g. avoids IMPALA-5559: fe/be have different notions of var-len) * Reduced memory to store non-NULL data (saves 12-byte StringValue) * Fewer branches in code -> save CPU cycles. * If CHAR(N) performance is important, reduced complexity makes it easier to implement codegen. Cons: * Requires N bytes to store a NULL value. * May hurt cache locality (although this is speculative in my mind). The change is mostly mechanical - I removed MAX_CHAR_INLINE_LENGTH and then removed branches that depended on that. Testing: Ran exhaustive build. Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 --- M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner-test.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/anyval-util.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/slot-ref.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/string-value.h M be/src/runtime/string-value.inline.h M be/src/runtime/types.h M be/src/service/fe-support.cc M be/src/service/hs2-util.cc M fe/src/main/java/org/apache/impala/catalog/ScalarType.java M testdata/workloads/functional-query/queries/QueryTest/spilling.test 19 files changed, 68 insertions(+), 129 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/7303/5 -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. IMPALA-5560: always store CHAR(N) inline in tuple This is done to simplify the CHAR(N) logic. I believe this is overall an improvement - any benefits of the out-of-line storage that motivated this optimisation originally were outweighed by the added complexity. This also avoids IMPALA-5559 (fe/be have different notions of var-len), which will unblock IMPALA-3200. Pros: * Reduce the number of code paths and improve test coverage. (e.g. avoids IMPALA-5559: fe/be have different notions of var-len) * Reduced memory to store non-NULL data (saves 12-byte StringValue) * Fewer branches in code -> save CPU cycles. * If CHAR(N) performance is important, reduced complexity makes it easier to implement codegen. Cons: * Requires N bytes to store a NULL value. * May hurt cache locality (although this is speculative in my mind). The change is mostly mechanical - I removed MAX_CHAR_INLINE_LENGTH and then removed branches that depended on that. Testing: Ran exhaustive build. Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 --- M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner-test.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/anyval-util.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/slot-ref.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/string-value.h M be/src/runtime/string-value.inline.h M be/src/runtime/types.h M be/src/service/fe-support.cc M be/src/service/hs2-util.cc M fe/src/main/java/org/apache/impala/catalog/ScalarType.java 18 files changed, 65 insertions(+), 126 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/7303/3 -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7303/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: PS2, Line 72: 128 bytes > update Done http://gerrit.cloudera.org:8080/#/c/7303/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS2, Line 571: StringValue sv > what's the point of this StringValue now? This seems confusing and round ab Done. Also fixed the dst type weirdness - that was actually an easy fix, unlike the other place that does the same weird thing. -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Dan Hecht has posted comments on this change. Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. Patch Set 2: (2 comments) nice http://gerrit.cloudera.org:8080/#/c/7303/2/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: PS2, Line 72: 128 bytes update http://gerrit.cloudera.org:8080/#/c/7303/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS2, Line 571: StringValue sv what's the point of this StringValue now? This seems confusing and round about. (It's also weird that dst type is StringValue*, but that's a pre-existing weirdness). Seems more straightforward to just do: char* dst_char = reinterpret_cast(dst); and use dst_char wherever sv.ptr is used. -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5560: always store CHAR(N) inline in tuple
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-5560: always store CHAR(N) inline in tuple .. IMPALA-5560: always store CHAR(N) inline in tuple This is done to simplify the CHAR(N) logic. I believe this is overall an improvement - any benefits of the out-of-line storage that motivated this optimisation originally were outweighed by the added complexity. This also avoids IMPALA-5559 (fe/be have different notions of var-len), which will unblock IMPALA-3200. Pros: * Reduce the number of code paths and improve test coverage. (e.g. avoids IMPALA-5559: fe/be have different notions of var-len) * Reduced memory to store non-NULL data (saves 12-byte StringValue) * Fewer branches in code -> save CPU cycles. * If CHAR(N) performance is important, reduced complexity makes it easier to implement codegen. Cons: * Requires N bytes to store a NULL value. * May hurt cache locality (although this is speculative in my mind). The change is mostly mechanical - I removed MAX_CHAR_INLINE_LENGTH and then removed branches that depended on that. Testing: Ran exhaustive build. Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 --- M be/src/exec/hash-table.cc M be/src/exec/hdfs-avro-scanner-test.cc M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exprs/agg-fn-evaluator.cc M be/src/exprs/anyval-util.h M be/src/exprs/scalar-expr-evaluator.cc M be/src/exprs/slot-ref.cc M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/raw-value.inline.h M be/src/runtime/string-value.h M be/src/runtime/string-value.inline.h M be/src/runtime/types.h M be/src/service/fe-support.cc M be/src/service/hs2-util.cc M fe/src/main/java/org/apache/impala/catalog/ScalarType.java 18 files changed, 55 insertions(+), 118 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/7303/2 -- To view, visit http://gerrit.cloudera.org:8080/7303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c0b823ccff6b0c37f5267c548d096c29b8caac3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong