[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has uploaded a new patch set (#21). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 166.241us 166.241us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 71.695us 71.695us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s971ms3s809ms3 13.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s533ms1s535ms 28.80M 28.80M 532.28 MB 1.88
[Impala-ASF-CR] Pass build type to Impala LZO.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/6446 Change subject: Pass build type to Impala LZO. .. Pass build type to Impala LZO. Before, the build type used for Impala LZO was always debug. Now, the build type is passed from the Impala CMakeLists.txt. This patch needs corresponding changes to Impala LZO. Testing: I tested locally with these build types: DEBUG, RELEASE, and ADDRESS_SANITIZER. Change-Id: Ia83e594409ad5938662ca210c810d5d31b8637b0 --- M CMakeLists.txt 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/6446/1 -- To view, visit http://gerrit.cloudera.org:8080/6446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia83e594409ad5938662ca210c810d5d31b8637b0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5088: Fix heap buffer overflow .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/401/ -- To view, visit http://gerrit.cloudera.org:8080/6441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Dan Hecht has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 20: (2 comments) http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 263: (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state)); > Unfortunately that wouldn't work. We could make it work but would require changing how row-batches work a little. I agree it's not worth it, so let's leave this alone. http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 129: return first_materialized_child_idx_ <= child_idx_ && child_idx_ < children_.size(); > Reordered it as you suggested. (By the way, in Python you can actually writ The condition you have here doesn't tell "if there are still rows to be returned from children than need materialization". Your condition tells you whether we are currently processing children that need passthrough. The condition that makes sense for this function is the one I wrote earlier, read the comment to see why: // We have children that need materialization and haven't processed them all yet. first_materizlied_child_idx_ != children_.size() && child_idx_ < children_.size() -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Reviewed-on: http://gerrit.cloudera.org:8080/6436 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 3 insertions(+), 3 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has uploaded a new patch set (#20). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 166.241us 166.241us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 71.695us 71.695us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s971ms3s809ms3 13.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s533ms1s535ms 28.80M 28.80M 532.28 MB 1.88
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. IMPALA-5072: Fix test_recover_partitions on S3 On S3, directory creation is a nop, so we have to actually create values to materialize the partitions. Was able to merge this against my fork and successfully ran the metadata/test_recover_partitions test on S3. Also ran locally to verify there is no slowdown for HDFS. Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Reviewed-on: http://gerrit.cloudera.org:8080/6408 Reviewed-by: Michael Brown Reviewed-by: Dimitris Tsirogiannis Tested-by: Impala Public Jenkins --- M tests/metadata/test_recover_partitions.py 1 file changed, 7 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Michael Brown: Looks good to me, approved Dimitris Tsirogiannis: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5816 to look at the new patch set (#20). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 166.241us 166.241us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 71.695us 71.695us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s971ms3s809ms3 13.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 19: (14 comments) http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 111: DCHECK_LT(child_idx_, children_.size()); > this DCHECK should come before any use of child_idx_ since if it's violated Done Line 114: DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc())); > what do these dchecks have to do with child_eos_? i.e. shouldn't they true Yes, They should be true. I think the idea was to call this once per child. I actually think it's safer to call this every time because currently it does not get called for the first child (which gets opened in Open()). Changed it so that it gets called every time. PS19, Line 132: child_idx_++ > ++child_idx_ per our style. Done PS19, Line 147: child_idx_ < children_.size() > Would make more sense as HasMoreMaterialized() Done. Even though this does an extra unnecessary check (first_materialized_child_idx_ <= child_idx_), this is cleaner. PS19, Line 150: children > children that need materialization Done PS19, Line 153: Row > Child row batch Done Line 168: // There are only 3 ways of getting out of this loop: > it would be helpful to concisely say what this loop is responsible for give Done Line 193: COUNTER_SET(rows_returned_counter_, num_rows_returned_); > should we do: child_batch_.reset() here? we shouldn't ever reference it aga I see. So the invariant should be if child_batch references something (it's not reset), then it corresponds to an open materialized child. Added reset() here. By the way, reset() will be called twice, once here and once in Close(). PS19, Line 200: // We end up here iff one of the following is true (or both). : // 1. We are done consuming all batches from the current child and we need to move on : //to the next child. : // 2. The output row batch is at capacity. > this could be summarized for a quicker read: Changed the structure along the lines of what you suggested, and cleaned this up. PS19, Line 204: In other words, the only way to not end up here if we entered the outer loop is if : // the limit is reached. > rather than say that in a comment, how about: Done Line 263: (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state)); > shouldn't we put lines 254-263 inside a do-while loop, so that we don't ret Unfortunately that wouldn't work. For passthrough, we simply forward as they are. Even if they are partially filled, we just forward them. That's kind of the point of this patch. Maybe we could put materialized and const in a while loop, but that would make the code messier and the benefit would be minimal (up to 1 fewer partially filled row batch per union node per entire cluster). http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h File be/src/exec/union-node.h: PS19, Line 115: child_idx > 'child_idx' Done Line 129: return child_idx_ >= first_materialized_child_idx_ && child_idx_ < children_.size(); > this suggestion is okay to ignore, but i find these kind of conditions easi Reordered it as you suggested. (By the way, in Python you can actually write "w < x <= y < z") first_materialized_child_idx_ points to the first materialized child. So everything after it is materialized. that means if child_idx_ is greater or equal to materialized_child_idx_ then it's materialized. It's the exact opposite of passthrough (where we check if child_idx_ is less than). http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc File be/src/exprs/slot-ref.cc: Line 94: DCHECK(false); > this is confusing. either it's an invariant or it's not. it's even more con Removed. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3203: Part 1: Free list implementation .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3203: Part 1: Free list implementation .. IMPALA-3203: Part 1: Free list implementation We will have a single free list per size class. Free buffers are stored in a heap, ordered by the memory address of the buffer, to help reduce address space fragmentation. A follow-up patch will use the free lists in BufferPool. Currently TCMalloc has thread-local caches with somewhat similar purpose. However, these are not suitable for our purposes for several reasons: * They are designed for caching small allocations - large allocations like most buffers are served from a global page heap protected by a global lock. * We intend to move away from using TCMalloc for buffers: IMPALA-5073 * Thread caches are ineffective for the producer-consumer pattern where one thread allocates memory and another thread frees it. * TCMalloc gives us limited control over how and when memory is actually released to the OS. Testing: Added unit tests for sanity checks and verification of behaviour that is trickier to check in integration or system tests. The cost will be exercised more thoroughly via BufferPool in Part 2. Performance: Includes a benchmark that demonstrates the scalability of the free lists under concurrency. When measuring pure throughput of free list operations, having a free list per core is significantly faster than a shared free list, or allocating directly from TCMalloc. On 8 cores, if the memory allocated is actually touched then for 64KB+ buffers, memory access is the bottleneck rather than lock contention. The benchmark also showed that non-inlined constructors and move operators of BufferHandle were taking significant CPU cycles, so I inlined those. This suggests that having a free list per core is more than sufficient (however, we will need to validate this with system concurrency benchmarks once we switch to using this during query execution). Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Reviewed-on: http://gerrit.cloudera.org:8080/6410 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/free-lists-benchmark.cc M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h A be/src/runtime/bufferpool/free-list-test.cc A be/src/runtime/bufferpool/free-list.h M be/src/util/benchmark-test.cc M be/src/util/benchmark.cc M be/src/util/benchmark.h 12 files changed, 813 insertions(+), 64 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5816 to look at the new patch set (#20). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 166.241us 166.241us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 71.695us 71.695us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s971ms3s809ms3 13.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Dan Hecht has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 263: (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state)); > shouldn't we put lines 254-263 inside a do-while loop, so that we don't ret I guess we can't really make this work easily when there are multiple passthrough children, so let's not worry about doing this. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder
Laurel Hale has posted comments on this change. Change subject: [DOCS] Replace LZO setup instructions with placeholder .. Patch Set 3: Code-Review+1 Builds cleanly and renders well in both html & pdf. -- To view, visit http://gerrit.cloudera.org:8080/6439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Genericize references to ODBC driver versions
Laurel Hale has posted comments on this change. Change subject: [DOCS] Genericize references to ODBC driver versions .. Patch Set 1: Code-Review+1 Builds cleanly and changes render well in both HTML & PDF. I agree with Jim that it would read well to remove the phrase, "versions of" in both places identified by him. -- To view, visit http://gerrit.cloudera.org:8080/6440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ida0fa4f25934eaebf095b427413f20944f1a4f71 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow
Henry Robinson has posted comments on this change. Change subject: IMPALA-5088: Fix heap buffer overflow .. Patch Set 2: As if to prove the point about this not just being a hypothetical bug, I just hit this, in unison, on five nodes at once in a twenty node cluster :) Thanks for fixing it. -- To view, visit http://gerrit.cloudera.org:8080/6441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5088: Fix heap buffer overflow .. Patch Set 2: Code-Review+2 Thanks for the review, Tim. Updated the commit message to mention the commit that introduced the bug (for backporting), forwarding the +2. I'll start the GVO after my private ASAN build passes. -- To view, visit http://gerrit.cloudera.org:8080/6441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow
Taras Bobrovytsky has uploaded a new patch set (#2). Change subject: IMPALA-5088: Fix heap buffer overflow .. IMPALA-5088: Fix heap buffer overflow In a recent patch (IMPALA-4787), we introduced a change where we are reading past the end of a buffer during a copy. Even though we would never read the data that was copied from past the end of the buffer, this could cause a crash if the allocation is sitting on the edge of an unmapped area of memory. This also caused the ASAN build to fail. The issue is fixed by never accessing memory that is past the end of the buffer. Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 --- M be/src/exprs/aggregate-functions-ir.cc 1 file changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6441/2 -- To view, visit http://gerrit.cloudera.org:8080/6441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6441 to look at the new patch set (#2). Change subject: IMPALA-5088: Fix heap buffer overflow .. IMPALA-5088: Fix heap buffer overflow In a recent patch (IMPALA-4787), we introduced a change where we are reading past the end of a buffer during a copy. Even though we would never read the data that was copied from past the end of the buffer, this could cause a crash if the allocation is sitting on the edge of an unmapped area of memory. This also caused the ASAN build to fail. The issue is fixed by never accessing memory that is past the end of the buffer. Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 --- M be/src/exprs/aggregate-functions-ir.cc 1 file changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6441/2 -- To view, visit http://gerrit.cloudera.org:8080/6441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Dan Hecht has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 19: (14 comments) nice cleanup http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 111: DCHECK_LT(child_idx_, children_.size()); this DCHECK should come before any use of child_idx_ since if it's violated, it doesn't make sense to call e.g. IsChildPassthrough() with chidl_idx_. Line 114: DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc())); what do these dchecks have to do with child_eos_? i.e. shouldn't they true regardless of the value of child_eos_? PS19, Line 132: child_idx_++ ++child_idx_ per our style. PS19, Line 147: child_idx_ < children_.size() Would make more sense as HasMoreMaterialized() PS19, Line 150: children children that need materialization PS19, Line 153: Row Child row batch Line 168: // There are only 3 ways of getting out of this loop: it would be helpful to concisely say what this loop is responsible for given it's complexity. Something like: This loop fetches row batches from a single child and materializes each output row, until one of these conditions: 1) ... Line 193: COUNTER_SET(rows_returned_counter_, num_rows_returned_); should we do: child_batch_.reset() here? we shouldn't ever reference it again, but seems cleaner to keep the invariant that child_batch_ always corresponds to the open child (otherwise its memory references aren't valid). PS19, Line 200: // We end up here iff one of the following is true (or both). : // 1. We are done consuming all batches from the current child and we need to move on : //to the next child. : // 2. The output row batch is at capacity. this could be summarized for a quicker read: // Either we've finished the current child or the output batch is at capacity, or both. PS19, Line 204: In other words, the only way to not end up here if we entered the outer loop is if : // the limit is reached. rather than say that in a comment, how about: DCHECK(!ReachedLimit()); Line 263: (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state)); shouldn't we put lines 254-263 inside a do-while loop, so that we don't return partially filled row-batches when we still have output to produce? Returning partially filled row-batches is legal but can be confusing to clients (admittedly it can happen for other reasons currently, but it'd be nice to avoid doing that unless there's a good reason). do { ... while (!*eos && !row_batch->AtCapacity()); http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h File be/src/exec/union-node.h: PS19, Line 115: child_idx 'child_idx' Line 129: return child_idx_ >= first_materialized_child_idx_ && child_idx_ < children_.size(); this suggestion is okay to ignore, but i find these kind of conditions easier to read as: x <= y && y < z so that it looks more close to how you'd read it in math: x <= y < z. That said, I don't see how the condition: first_materialized_child_idx_ <= child_idx_ < children_.size() is correct for whether there are more materialized children. i.e. when child_idx_ < first_materialized_child_idx, we still have more materalized children, right? So, shouldn't this be: // We have children that need materialization and haven't processed them all yet. first_materizlied_child_idx_ != children_.size() && child_idx_ < children_.size() http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc File be/src/exprs/slot-ref.cc: Line 94: DCHECK(false); this is confusing. either it's an invariant or it's not. it's even more confusing because now it makes this case and the one at line 79 different for no good reason, and future code readers won't know why. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5088: Fix heap buffer overflow .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5088: Fix heap buffer overflow
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/6441 Change subject: IMPALA-5088: Fix heap buffer overflow .. IMPALA-5088: Fix heap buffer overflow In a recent patch, we introduced a change where we are incorrectly copying a buffer. We are reading past the end of a buffer during the copy. Even though we would never read the data that was copied from past the end of the buffer, this could cause a crash if the allocation is sitting on the edge of an unmapped area of memory. This also caused the ASAN build to fail. The issue is fixed by never accessing memory that is past the end of the buffer Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 --- M be/src/exprs/aggregate-functions-ir.cc 1 file changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/6441/1 -- To view, visit http://gerrit.cloudera.org:8080/6441 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia68d69b35d53934e6f1aae2e1696503d17660ca2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views
Zach Amsden has uploaded a new patch set (#6). Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views .. IMPALA-5003: Constant propagation in scan nodes and inline views When conjuncts are pushed into table refs and inline views, they can be considered for constant progagation within that node. Note: currently broken due to optimizing away column refs for HBase nodes when FALSE conjuncts are present. Working on a fix. Testing: Expanded the test cases for the planner to achieve constant propagation. More tests being added (for HBase and Kudu) Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test A testdata/workloads/visa 13 files changed, 219 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/6 -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] [DOCS] Genericize references to ODBC driver versions
Jim Apple has posted comments on this change. Change subject: [DOCS] Genericize references to ODBC driver versions .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6440/1/docs/topics/impala_ports.xml File docs/topics/impala_ports.xml: PS1, Line 94: versions of elide here and below -- To view, visit http://gerrit.cloudera.org:8080/6440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ida0fa4f25934eaebf095b427413f20944f1a4f71 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. Patch Set 2: Code-Review+2 Seems reasonable so long as it doesn't conflict with other in-flight version bumps. -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] PREVIEW: IMPALA-4678: port backend exec to use buffer pool
Tim Armstrong has uploaded a new patch set (#5). Change subject: PREVIEW: IMPALA-4678: port backend exec to use buffer pool .. PREVIEW: IMPALA-4678: port backend exec to use buffer pool Always create global BufferPool at startup using 80% of memory and limit reservations to 80% of query memory (same as BufferedBlockMgr). Each ExecNode has to declare its memory requirements at Prepare() time. Convert HashTable to use the new BufferPool via a Suballocator. Make PAGG memory consumption more efficient (avoid wasting buffers): * Allow preaggs to execute with 0 reservation - if streams and hash tables cannot be allocated, it will pass through rows. * Halve the buffer requirement for spilling aggs - avoid allocating buffers for aggregated and unaggregated streams simultaneously. Convert Sorter to use BufferPool. TODO in this patch: * some of the DCHECKS may be too aggressive. With the current memory transfer model, operators that accumulate batches, i.e. NLJ, can "steal" reservation. We need a test to reproduce this problem. We can probably fix by having NLJ copy if it sees an attached buffer. * Consider renaming buffer_pool_page_size, e.g. to spillable_page_size * We hit a delete_on_read_ DCHECK in PHJ. This is currently too strict. TODO in follow-up patches: * Rename BufferedTupleStreamV2 to BufferedTupleStream * Remove the old hash join and aggregation nodes Testing: * Updated tests to reflect new memory requirements * TODO: recalibrate limits in test_mem_usage_scaling * TODO: more tests to exercise new code paths Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/CMakeLists.txt D be/src/runtime/buffered-block-mgr-test.cc D be/src/runtime/buffered-block-mgr.cc D be/src/runtime/buffered-block-mgr.h D be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h D be/src/runtime/buffered-tuple-stream.cc D be/src/runtime/buffered-tuple-stream.h D be/src/runtime/buffered-tuple-stream.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter.h M be/src/util/memory-metrics.h M be/src/util/static-asserts.cc M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test D tests/custom_cluster/test_spilling.py M tests/query_test/test_mem_usage_scaling.py M tests/query_test/test_sort.py A tests/query_test/test_spilling.py 61 files changed, 1,669 insertions(+), 7,647 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/5801/5 -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] PREVIEW: IMPALA-4678: port backend exec to use buffer pool
Tim Armstrong has uploaded a new patch set (#3). Change subject: PREVIEW: IMPALA-4678: port backend exec to use buffer pool .. PREVIEW: IMPALA-4678: port backend exec to use buffer pool Always create global BufferPool at startup using 80% of memory and limit reservations to 80% of query memory (same as BufferedBlockMgr). Each ExecNode has to declare its memory requirements at Prepare() time. Convert HashTable to use the new BufferPool via a Suballocator. Make PAGG memory consumption more efficient (avoid wasting buffers): * Allow preaggs to execute with 0 reservation - if streams and hash tables cannot be allocated, it will pass through rows. * Halve the buffer requirement for spilling aggs - avoid allocating buffers for aggregated and unaggregated streams simultaneously. Convert Sorter to use BufferPool. TODO in this patch: * some of the DCHECKS may be too aggressive. With the current memory transfer model, operators that accumulate batches, i.e. NLJ, can "steal" reservation. We need a test to reproduce this problem. We can probably fix by having NLJ copy if it sees an attached buffer. * Consider renaming buffer_pool_page_size, e.g. to spillable_page_size * We hit a delete_on_read_ DCHECK in PHJ. This is currently too strict. TODO in follow-up patches: * Rename BufferedTupleStreamV2 to BufferedTupleStream * Remove the old hash join and aggregation nodes Testing: * Updated tests to reflect new memory requirements * TODO: recalibrate limits in test_mem_usage_scaling * TODO: more tests to exercise new code paths Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/analytic-eval-node.cc M be/src/exec/analytic-eval-node.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-table-test.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h M be/src/exec/hash-table.inline.h M be/src/exec/partitioned-aggregation-node-ir.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/exec/partitioned-hash-join-node.inline.h M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/CMakeLists.txt D be/src/runtime/buffered-block-mgr-test.cc D be/src/runtime/buffered-block-mgr.cc D be/src/runtime/buffered-block-mgr.h D be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream-v2.cc M be/src/runtime/buffered-tuple-stream-v2.h D be/src/runtime/buffered-tuple-stream.cc D be/src/runtime/buffered-tuple-stream.h D be/src/runtime/buffered-tuple-stream.inline.h M be/src/runtime/disk-io-mgr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/plan-fragment-executor.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/bloom-filter.h M be/src/util/memory-metrics.h M be/src/util/static-asserts.cc M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/analytic-fns.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters_phj.test M testdata/workloads/functional-query/queries/QueryTest/spilling.test D tests/custom_cluster/test_spilling.py M tests/query_test/test_mem_usage_scaling.py M tests/query_test/test_sort.py A tests/query_test/test_spilling.py 61 files changed, 1,669 insertions(+), 7,722 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/5801/3 -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-5003: Generic constant propagation in planner
Zach Amsden has posted comments on this change. Change subject: IMPALA-5003: Generic constant propagation in planner .. Patch Set 3: (12 comments) I'll post a WIP update to this diff as there was some refactoring that turned out to be necessary, and some new revelations about HBase key filtering have come up. http://gerrit.cloudera.org:8080/#/c/6389/5//COMMIT_MSG Commit Message: Line 7: IMPALA-5003: [DRAFT] Generic constant propagation in planner > Instead of "Generic constant propagation in planner" I'd propose something Done Line 9: Rather than specialize the constant propagation framework to be specific > Maybe rephrase to what the new code does and where it is applied, e.g., Done http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java: Line 115 > Looks wrong. We cannot just drop the 'false' predicate. I think we should e No, you missed the ! - we certainly can't drop the false predicate, but we do want to drop constant true predicates. I can probably (e.isConstant() && Expr.IS_TRUE_LITERAL) to make this more clear - this skips dropping constants that don't evaluate to bools, but there shouldn't be any such conjuncts anyway. http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: Line 1217: ConstantPropagator.optimizeConjuncts(conjuncts, analyzer, > Did it not come out cleaner? It did, but required some refactoring. http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: Line 1217: ConstantPropagator.optimizeConjuncts(conjuncts, analyzer, > Have you tried running a few queries that end up with a constant 'false' pr In practice this is more complicated. In many places we've already created a ScanNode of some sort and end up adding impossible false filters on to it. For HBase key filtering, this is actually made worse by optimizing (key = 'a' && false) => (false), and then the key = 'a' conjunct is not picked up by key range optimization and we end up with an unbounded key range scan across all data on all nodes :( I'm trying to rework so that this converts to an EmptySetNode, but this is actually more difficult than it should be. Ideally we would pull the conjunct optimization to figure out impossible scans first and have a generic way to create an EmptySetNode similar across all scan types, but some more refactoring would be needed to do that cleanly (turns out this already works for inline views). http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java: Line 48: public enum RuleSet { > This enum seems to add an unnecessary indirection. It's the same as "enable Done http://gerrit.cloudera.org:8080/#/c/6389/5/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test: Line 16 > Add a similar test for Kudu in one of the Kudu .test files. Will do. Also, this word 'propagation' is my nemesis of all words in the English language. Line 18 > flip the lhs/rhs of some predicates to test the expr normalization within t Done Line 27 > whitespace I noticed this too after the diff was sent Line 43 > What's the impossibility? copypasta mistake Line 56 > Can we move this into an HBase .test file? It's sometimes better to keep th Done Line 67 > Add negative test cases with arbitrary exprs, e.g. slot refs with explicit I think we also want to test with analytic expressions and functions returning bool as conjuncts -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/400/ -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress .. IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress The test does not work as intended and would likely not provide very good coverage of the different spilling paths, because it only runs a single simple query. The stress test (tests/stress/concurrent_select.py) provides much better coverage. test_mem_usage_scaling.py probably also provides better coverage that TestSpillStress in its current form. Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 Reviewed-on: http://gerrit.cloudera.org:8080/6437 Reviewed-by: Michael Brown Tested-by: Impala Public Jenkins --- M tests/custom_cluster/test_spilling.py 1 file changed, 0 insertions(+), 75 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Brown: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6436 to look at the new patch set (#2). Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee --- M bin/impala-config.sh 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/6436/2 -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. IMPALA-4846: Upgrade Snappy to 1.1.4 Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2 Reviewed-on: http://gerrit.cloudera.org:8080/6428 Reviewed-by: Tim Armstrong Reviewed-by: Henry Robinson Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Henry Robinson: Looks good to me, approved Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Bump Kudu client version to 16dd6e4
Tim Armstrong has posted comments on this change. Change subject: Bump Kudu client version to 16dd6e4 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6423 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9a8074640a6b93b5a65a1e0e2f22c7fe78754ad1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/399/ -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Michael Brown has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Sorry for noise. Somehow two Gerrit jobs started. -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/398/ -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/397/ -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/397/ -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/398/ -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Zach Amsden has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Yes, Michael, if you could start a build, that should unbreak S3 tests. :) -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] Allow BlockingQueue and ThreadPool to accept rvalue args
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/6442 Change subject: Allow BlockingQueue and ThreadPool to accept rvalue args .. Allow BlockingQueue and ThreadPool to accept rvalue args Previously the BlockingQueue and ThreadPool APIs only accepted const lvalue references, so the argument was always copied into the queue. Very often we create a thin wrapper for each work item we submit to a thread pool, and will not want to use that object again, so moving it into the pool rather than copying makes the most sense. Note the introduction of an extra template parameter into Offer() and BlockingPut*(). To enable perfect-forwarding (i.e. allow the methods to accept rvalue or lvalues and pass them through), we need to use a) rvalue references (V&&) and b) do so in a 'type-deducing context' [1]. Having the enclosing class be template-parameterized does not count as type-deducing, so we add the dummy V parameter and the compiler will ensure that V is properly compatible with the original T type. [1] http://eli.thegreenplace.net/2014/perfect-forwarding-and-universal-references-in-c/ Change-Id: I1791870576cb269e86495034f92555de48f92f10 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/util/blocking-queue.h M be/src/util/thread-pool.h 3 files changed, 15 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/6442/1 -- To view, visit http://gerrit.cloudera.org:8080/6442 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I1791870576cb269e86495034f92555de48f92f10 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3203: Part 1: Free list implementation .. Patch Set 10: Code-Review+2 Missed updating benchmark-test to reflect the change in optional arguments -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6261 to look at the new patch set (#8). Change subject: IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT .. IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT Create external table can be used for two different operations, creating a new managed table, and importing an existing unmanaged external table. Previously, it was only possible to create managed external tables with CTAS for Kudu. This change makes it possible to create new, unmanaged external kudu tables from SELECT statements. This is useful for importing data from one table to another persistent table in Kudu. In the catalog executor, we detect if a table creation is new by inferring that from the presence of column definitions, not by whether it is external / managed. Front-end code is changed to allow this through the parser. This means any attempts to specify table metadata for Kudu tables are interpreted as a table creation. Testing: Manual testing, expanded ParserTest and AnalyzeDDLTest and added test cases to query_test to validate Kudu support. Ran the folowing query: create external table test2_name primary key(id) partition by hash(id) partitions 10 stored as kudu as select * from test_name; Which successfully created and inserted 3 rows of data. Tried many other things like changing primary keys and forms of partitioning. Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 --- M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M shell/impala_shell.py M tests/query_test/test_kudu.py 8 files changed, 92 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/6261/8 -- To view, visit http://gerrit.cloudera.org:8080/6261 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3203: Part 1: Free list implementation .. Patch Set 10: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/396/ -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Hello Impala Public Jenkins, Jim Apple, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6410 to look at the new patch set (#10). Change subject: IMPALA-3203: Part 1: Free list implementation .. IMPALA-3203: Part 1: Free list implementation We will have a single free list per size class. Free buffers are stored in a heap, ordered by the memory address of the buffer, to help reduce address space fragmentation. A follow-up patch will use the free lists in BufferPool. Currently TCMalloc has thread-local caches with somewhat similar purpose. However, these are not suitable for our purposes for several reasons: * They are designed for caching small allocations - large allocations like most buffers are served from a global page heap protected by a global lock. * We intend to move away from using TCMalloc for buffers: IMPALA-5073 * Thread caches are ineffective for the producer-consumer pattern where one thread allocates memory and another thread frees it. * TCMalloc gives us limited control over how and when memory is actually released to the OS. Testing: Added unit tests for sanity checks and verification of behaviour that is trickier to check in integration or system tests. The cost will be exercised more thoroughly via BufferPool in Part 2. Performance: Includes a benchmark that demonstrates the scalability of the free lists under concurrency. When measuring pure throughput of free list operations, having a free list per core is significantly faster than a shared free list, or allocating directly from TCMalloc. On 8 cores, if the memory allocated is actually touched then for 64KB+ buffers, memory access is the bottleneck rather than lock contention. The benchmark also showed that non-inlined constructors and move operators of BufferHandle were taking significant CPU cycles, so I inlined those. This suggests that having a free list per core is more than sufficient (however, we will need to validate this with system concurrency benchmarks once we switch to using this during query execution). Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/free-lists-benchmark.cc M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h A be/src/runtime/bufferpool/free-list-test.cc A be/src/runtime/bufferpool/free-list.h M be/src/util/benchmark-test.cc M be/src/util/benchmark.cc M be/src/util/benchmark.h 12 files changed, 813 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6410/10 -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Henry Robinson has posted comments on this change. Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. Patch Set 1: All the Kudu tests failed, not clear why one of the table servers failed to start. Unfortunately, looks hard to get on that machine to get at the tablet logs. As expected, one sub-build failed because Snappy 1.1.3 is no longer available in the toolchain. Will wait for that build to finish before retrying this. -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. Patch Set 1: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/392/ -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT
Zach Amsden has posted comments on this change. Change subject: IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/6261/7//COMMIT_MSG Commit Message: Line 10: new by inferring that from the presence of column definitions, not > what does 'if a table creation is new' mean? New rather that existing table. Terminology here is confusing, as managed != external, yet in many places of code and documentation, that is implied. Will rephrase. http://gerrit.cloudera.org:8080/#/c/6261/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 2264: "tblproperties('kudu.table_name'='t')", "Table partitioning must be " + > why is that? wouldn't the kudu table/metadata be able to provide that? Specifying any part of the metadata (x int) now implies a table creation, as we use the presence of column definitions to infer that this is a table creation. Since the assumption is that the table is being created, we need to know the partitioning scheme to create it. I tried to make the error message suggest this. I'll add a positive test case to illustrate the required syntax. http://gerrit.cloudera.org:8080/#/c/6261/6/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: PS6, Line 360: kudu_tbl_name = KuduTestSuite.to_kudu_table_name(unique_database, "foo2") : assert kudu_client.table_exists(kudu_tbl_name) : cursor.execute("SELECT * FROM %s.foo2" % (unique_database)) : assert len(cursor.fetchall()) == 1 : cursor.execute("DROP TABLE %s.foo2" % (unique_database)) : assert kudu_client.table_exists(kudu_tbl_name) > foo2 in kudu won't be deleted when you drop the database in impala though, Good point. -- To view, visit http://gerrit.cloudera.org:8080/6261 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Genericize references to ODBC driver versions
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/6440 Change subject: [DOCS] Genericize references to ODBC driver versions .. [DOCS] Genericize references to ODBC driver versions Make the statement about who uses which ports for Impala less specific w.r.t. the Cloudera ODBC driver. Change-Id: Ida0fa4f25934eaebf095b427413f20944f1a4f71 --- M docs/topics/impala_ports.xml 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/6440/1 -- To view, visit http://gerrit.cloudera.org:8080/6440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ida0fa4f25934eaebf095b427413f20944f1a4f71 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3203: Part 1: Free list implementation .. Patch Set 9: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/395/ -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Michael Brown has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Zach, let me know if you would like a build started for this patch, and I can do it. -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file
Impala Public Jenkins has posted comments on this change. Change subject: [DOCS] Rewrite github URL for timezone source file .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file
Impala Public Jenkins has submitted this change and it was merged. Change subject: [DOCS] Rewrite github URL for timezone source file .. [DOCS] Rewrite github URL for timezone source file Use the corresponding Apache github URL instead of the Cloudera one. Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2 Reviewed-on: http://gerrit.cloudera.org:8080/6438 Reviewed-by: Jim Apple Reviewed-by: Laurel Hale Tested-by: Impala Public Jenkins --- M docs/impala_keydefs.ditamap M docs/topics/impala_fixed_issues.xml 2 files changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Laurel Hale: Looks good to me, but someone else must approve Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Michael Brown has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Zach Amsden has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Yes, updated diff attached. -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6408 to look at the new patch set (#3). Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. IMPALA-5072: Fix test_recover_partitions on S3 On S3, directory creation is a nop, so we have to actually create values to materialize the partitions. Was able to merge this against my fork and successfully ran the metadata/test_recover_partitions test on S3. Also ran locally to verify there is no slowdown for HDFS. Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c --- M tests/metadata/test_recover_partitions.py 1 file changed, 7 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6408/3 -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file
Impala Public Jenkins has posted comments on this change. Change subject: [DOCS] Rewrite github URL for timezone source file .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/103/ -- To view, visit http://gerrit.cloudera.org:8080/6438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Alex Behm has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 19: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: Line 228: if (FLAGS_use_statestore && statestore_port > 0) { > This is a good idea - but please leave it for now, as one of the KRPC commi let's please not file a jira for something this small, the time involved in dealing with the jira will exceed the time spent on the code. please leave a todo so that henry can do this as part of the krpc changes. -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Henry Robinson has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: Line 228: if (FLAGS_use_statestore && statestore_port > 0) { > please move all of the duplicated code, incl. initializer list, into a shar This is a good idea - but please leave it for now, as one of the KRPC commits would conflict significantly with this. Maybe file a JIRA to clean this up (pretty sure we can remove the second c'tor). -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 2: (10 comments) http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: Line 183: request_pool_service_.get(), metrics_.get(), backend_address_)); no need to pass in backend_address_, you can get that with ExecEnv::GetInstance()->backend_address(). same for scheduler c'tor. Line 228: if (FLAGS_use_statestore && statestore_port > 0) { please move all of the duplicated code, incl. initializer list, into a shared c'tor or something like that, there's too much copy & paste at this point. Line 349: if (admission_controller_ != NULL) RETURN_IF_ERROR(admission_controller_->Init()); replace all NULL w/ nullptr in this file (and any other file you touch where that hasn't happened yet) http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: Line 38: DECLARE_bool(is_coordinator); unused http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: Line 210: /// Subscription manager owned? http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 341: bind(mem_fn(&ImpalaServer::CatalogUpdateCallback), this, _1, _2); use lambda, unless impractical. you can also inline that in the call. Line 1484: // Check if the local backend descriptor is in the list of known this function is already too long. break this up. where did this block come from? was it moved? Line 1903: // Only for coordinators, initialize the HS2 and Beeswax services. rephrase to fix grammar http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 955: /// Impala server is a coordinator (dictated from the is_coordinator flag). "indicated by the .. flag" http://gerrit.cloudera.org:8080/#/c/6344/2/tests/custom_cluster/test_coordinators.py File tests/custom_cluster/test_coordinators.py: Line 42: client = worker.service.create_beeswax_client() same for hs2? -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder
Jim Apple has posted comments on this change. Change subject: [DOCS] Replace LZO setup instructions with placeholder .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6439/2/docs/topics/impala_txtfile.xml File docs/topics/impala_txtfile.xml: PS2, Line 562: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : IN particular this section and one at the top seem likely to be universal, no matter which LZO library is used -- To view, visit http://gerrit.cloudera.org:8080/6439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder
Jim Apple has posted comments on this change. Change subject: [DOCS] Replace LZO setup instructions with placeholder .. Patch Set 2: I thought the plan was to leave in as much as possible about LZO? -- To view, visit http://gerrit.cloudera.org:8080/6439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder
John Russell has posted comments on this change. Change subject: [DOCS] Replace LZO setup instructions with placeholder .. Patch Set 3: Patch set 2 = fix a stray 'cdh' inside the same impala_txtfile page. Patch set 3 = fix a reference to the old package name including -cdh4 that I had genericized elsewhere. -- To view, visit http://gerrit.cloudera.org:8080/6439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder
John Russell has uploaded a new patch set (#3). Change subject: [DOCS] Replace LZO setup instructions with placeholder .. [DOCS] Replace LZO setup instructions with placeholder An "under construction" banner since non-Cloudera-specific URLs are not available at this time, and the information itself may be out of date / inapplicable for Apache Impala project users. Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b --- M docs/topics/impala_incompatible_changes.xml M docs/topics/impala_txtfile.xml 2 files changed, 4 insertions(+), 121 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/6439/3 -- To view, visit http://gerrit.cloudera.org:8080/6439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder
John Russell has uploaded a new patch set (#2). Change subject: [DOCS] Replace LZO setup instructions with placeholder .. [DOCS] Replace LZO setup instructions with placeholder An "under construction" banner since non-Cloudera-specific URLs are not available at this time, and the information itself may be out of date / inapplicable for Apache Impala project users. Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b --- M docs/topics/impala_txtfile.xml 1 file changed, 3 insertions(+), 120 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/6439/2 -- To view, visit http://gerrit.cloudera.org:8080/6439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder
John Russell has posted comments on this change. Change subject: [DOCS] Replace LZO setup instructions with placeholder .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6439/1/docs/topics/impala_txtfile.xml File docs/topics/impala_txtfile.xml: PS1, Line 538: cdh4-0.4.15 Replace this 'cdh' portion of the filename with a generic placeholder. -- To view, visit http://gerrit.cloudera.org:8080/6439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Replace LZO setup instructions with placeholder
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/6439 Change subject: [DOCS] Replace LZO setup instructions with placeholder .. [DOCS] Replace LZO setup instructions with placeholder An "under construction" banner since non-Cloudera-specific URLs are not available at this time, and the information itself may be out of date / inapplicable for Apache Impala project users. Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b --- M docs/topics/impala_txtfile.xml 1 file changed, 1 insertion(+), 118 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/6439/1 -- To view, visit http://gerrit.cloudera.org:8080/6439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Idea5a28a73287b6b3a023843ddada88204db7b1b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file
Laurel Hale has posted comments on this change. Change subject: [DOCS] Rewrite github URL for timezone source file .. Patch Set 1: Code-Review+1 Builds cleanly, displays perfectly in both html & pdf. -- To view, visit http://gerrit.cloudera.org:8080/6438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has uploaded a new patch set (#19). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 166.241us 166.241us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 71.695us 71.695us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s971ms3s809ms3 13.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s533ms1s535ms 28.80M 28.80M 532.28 MB 1.88
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5816 to look at the new patch set (#19). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 166.241us 166.241us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 71.695us 71.695us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s971ms3s809ms3 13.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 18: (5 comments) http://gerrit.cloudera.org:8080/#/c/5816/18/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 61: /// Index of the first non-passthrough child; i.e. a child that needs materializing and > i.e. a child that needs materialization (remove the "and evaluating its exp Done Line 62: /// evaluating its exprs. When all children are materialized, this should be zero. When > Shrink like this: Done http://gerrit.cloudera.org:8080/#/c/5816/18/fe/src/main/java/org/apache/impala/planner/UnionNode.java File fe/src/main/java/org/apache/impala/planner/UnionNode.java: Line 180:* children come before the children that need to be materialized and evaluated. Also > remove "and evaluated" Done Line 181:* reorders 'resultExprLists_'. This is done in order to simplify the implementation in > Instead of 'This' be explicit since the reference could be misunderstood. S Done http://gerrit.cloudera.org:8080/#/c/5816/18/tests/query_test/test_queries.py File tests/query_test/test_queries.py: Line 80: query_string = ("select count(c) from ( " > Why not add this at the end of union.test? Seems odd to have a single naked Done -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5031: Suppress noisy UBSAN errors from Thrift.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5031: Suppress noisy UBSAN errors from Thrift. .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6425 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I74ece2157048e8cd24c2536c0a292d9c21f719b9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5031: Suppress noisy UBSAN errors from Thrift.
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5031: Suppress noisy UBSAN errors from Thrift. .. IMPALA-5031: Suppress noisy UBSAN errors from Thrift. This suppresses a Thrift undefined behavior error in which a negative value is left-shifted. See THRIFT-2026 for tracking. One example backtrace from the expr-test backend test: /thrift/protocol/TCompactProtocol.tcc:375:13: runtime error: left shift of negative value -1 #0 0x2b02fe247996 in apache::thrift::protocol::TCompactProtocolT::i64ToZigzag(long) /thrift/protocol/TCompactProtocol.tcc:375:13 #1 0x2b02fe247674 in apache::thrift::protocol::TCompactProtocolT::writeI64(long) /thrift/protocol/TCompactProtocol.tcc:242:24 #2 0x2b02fe239504 in apache::thrift::protocol::TVirtualProtocol, apache::thrift::protocol::TProtocolDefaults>::writeI64_virt(long) /thrift/protocol/TVirtualProtocol.h:409:12 #3 0x2b03014a3a06 in apache::thrift::protocol::TProtocol::writeI64(long) /thrift/protocol/TProtocol.h:453:12 #4 0x2b0301d55c02 in impala::TRuntimeProfileNode::write(apache::thrift::protocol::TProtocol*) const /generated-sources/gen-cpp/RuntimeProfile_types.cpp:827:11 #5 0x2b0301d59e34 in impala::TRuntimeProfileTree::write(apache::thrift::protocol::TProtocol*) const /generated-sources/gen-cpp/RuntimeProfile_types.cpp:1017:15 #6 0x2b02f6f8c7be in impala::Status impala::ThriftSerializer::Serialize(impala::TRuntimeProfileTree*, unsigned int*, unsigned char**) /src/rpc/thrift-util.h:67:7 #7 0x2b02f6f0d23e in impala::Status impala::ThriftSerializer::Serialize(impala::TRuntimeProfileTree*, std::vector >*) /src/rpc/thrift-util.h:54:31 #8 0x2b02f6eec934 in impala::RuntimeProfile::SerializeToArchiveString(std::basic_stringstream, std::allocator >*) const /src/util/runtime-profile.cc:727:19 #9 0x2b02f6eec640 in impala::RuntimeProfile::SerializeToArchiveString() const /src/util/runtime-profile.cc:718:3 #10 0x2b02feda626c in impala::ImpalaServer::ArchiveQuery(impala::ImpalaServer::QueryExecState const&) /src/service/impala-server.cc:671:39 #11 0x2b02fedb57b0 in impala::ImpalaServer::UnregisterQuery(impala::TUniqueId const&, bool, impala::Status const*) /src/service/impala-server.cc:972:3 #12 0x2b02ff15b666 in impala::ImpalaServer::close(beeswax::QueryHandle const&) /src/service/impala-beeswax-server.cc:236:29 #13 0x2b030177dc14 in beeswax::BeeswaxServiceProcessor::process_close(int, apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, void*) /generated-sources/gen-cpp/BeeswaxService.cpp:3543:5 #14 0x2b0301763825 in beeswax::BeeswaxServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, std::string const&, int, void*) /generated-sources/gen-cpp/BeeswaxService.cpp:2952:3 #15 0x2b03016bc2d6 in impala::ImpalaServiceProcessor::dispatchCall(apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, std::string const&, int, void*) /generated-sources/gen-cpp/ImpalaService.cpp:1673:12 #16 0x2b02f650138e in apache::thrift::TDispatchProcessor::process(boost::shared_ptr, boost::shared_ptr, void*) /thrift/TDispatchProcessor.h:121:12 #17 0x1d4080a in apache::thrift::server::TThreadPoolServer::Task::run() (/build/debug/exprs/expr-test+0x1d4080a) #18 0x1d23e38 in apache::thrift::concurrency::ThreadManager::Worker::run() (/build/debug/exprs/expr-test+0x1d23e38) #19 0x2b02fe2be520 in impala::ThriftThread::RunRunnable(boost::shared_ptr, impala::Promise*) /src/rpc/thrift-thread.cc:64:3 #20 0x2b02fe2c4c6b in boost::_mfi::mf2, impala::Promise*>::operator()(impala::ThriftThread*, boost::shared_ptr, impala::Promise*) const /boost/bind/mem_fn_template.hpp:280:16 #21 0x2b02fe2c498a in void boost::_bi::list3, boost::_bi::value >, boost::_bi::value*> >::operator(), impala::Promise*>, boost::_bi::list0>(boost::_bi::type, boost::_mfi::mf2, impala::Promise*>&, boost::_bi::list0&, int) /boost/bind/bind.hpp:392:9 #22 0x2b02fe2c444b in boost::_bi::bind_t, impala::Promise*>, boost::_bi::list3, boost::_bi::value >, boost::_bi::value*> > >::operator()() /boost/bind/bind_template.hpp:20:16 #23 0x2b02fe2c3709 in boost::detail::function::void_function_obj_invoker0, impala::Promise*>, boost::_bi::list3, boost::_bi::value >, boost::_bi::value*> > >, void>::invoke(boost::detail::function::function_buffer&) /boost/function/function_template.hpp:153:11 #24 0x2b02f70085d4 in boost::function0::operator()() const /boost/function/function_template.hpp:766:14 #25 0x2b02f6ff9710 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function, impala::Promise*) /src/util/thread.cc:325:3 #26 0x2b02f7021783 in void boost::_bi::list4, boost::_bi::value, boost::_bi::value >, boost::_bi::value*> >::operator(), impala::Promise*), b
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5816 to look at the new patch set (#19). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 166.241us 166.241us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 71.695us 71.695us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s971ms3s809ms3 13.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5816 to look at the new patch set (#19). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. Removed operand reordering in the FE because it's simpler and safer to handle all passthrough children before non-passthrough children in the BE. The recent improvements to memory management allowed us to remove this requirement. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 19s139ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 166.241us 166.241us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 71.695us 71.695us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s971ms3s809ms3 13.08 MB 10.00 MB 00:UNION 3 207.956ms 222.846ms 288.01M 288.01M 0
[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file
Jim Apple has posted comments on this change. Change subject: [DOCS] Rewrite github URL for timezone source file .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Rewrite github URL for timezone source file
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/6438 Change subject: [DOCS] Rewrite github URL for timezone source file .. [DOCS] Rewrite github URL for timezone source file Use the corresponding Apache github URL instead of the Cloudera one. Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2 --- M docs/impala_keydefs.ditamap M docs/topics/impala_fixed_issues.xml 2 files changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6438/1 -- To view, visit http://gerrit.cloudera.org:8080/6438 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I40bff9286d08f1de11abd9ab4b804855f0330cc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3203: Part 1: Free list implementation .. Patch Set 9: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/395/ -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3203: Part 1: Free list implementation .. Patch Set 9: Code-Review+2 Rebase -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Hello Jim Apple, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6410 to look at the new patch set (#9). Change subject: IMPALA-3203: Part 1: Free list implementation .. IMPALA-3203: Part 1: Free list implementation We will have a single free list per size class. Free buffers are stored in a heap, ordered by the memory address of the buffer, to help reduce address space fragmentation. A follow-up patch will use the free lists in BufferPool. Currently TCMalloc has thread-local caches with somewhat similar purpose. However, these are not suitable for our purposes for several reasons: * They are designed for caching small allocations - large allocations like most buffers are served from a global page heap protected by a global lock. * We intend to move away from using TCMalloc for buffers: IMPALA-5073 * Thread caches are ineffective for the producer-consumer pattern where one thread allocates memory and another thread frees it. * TCMalloc gives us limited control over how and when memory is actually released to the OS. Testing: Added unit tests for sanity checks and verification of behaviour that is trickier to check in integration or system tests. The cost will be exercised more thoroughly via BufferPool in Part 2. Performance: Includes a benchmark that demonstrates the scalability of the free lists under concurrency. When measuring pure throughput of free list operations, having a free list per core is significantly faster than a shared free list, or allocating directly from TCMalloc. On 8 cores, if the memory allocated is actually touched then for 64KB+ buffers, memory access is the bottleneck rather than lock contention. The benchmark also showed that non-inlined constructors and move operators of BufferHandle were taking significant CPU cycles, so I inlined those. This suggests that having a free list per core is more than sufficient (however, we will need to validate this with system concurrency benchmarks once we switch to using this during query execution). Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/free-lists-benchmark.cc M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h A be/src/runtime/bufferpool/free-list-test.cc A be/src/runtime/bufferpool/free-list.h M be/src/util/benchmark.cc M be/src/util/benchmark.h 11 files changed, 812 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6410/9 -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/394/ -- To view, visit http://gerrit.cloudera.org:8080/6437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Henry Robinson has posted comments on this change. Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/393/ -- To view, visit http://gerrit.cloudera.org:8080/6428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Henry Robinson has posted comments on this change. Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. Patch Set 1: With Snappy 1.1.3, that query against a 96 million row table took on average 6.1s. With Snappy 1.1.4, the query took 5.8s. Confirmed that these were release builds, same query options etc. -- To view, visit http://gerrit.cloudera.org:8080/6428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Dan Hecht has posted comments on this change. Change subject: IMPALA-3203: Part 1: Free list implementation .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
Michael Brown has posted comments on this change. Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress .. Patch Set 2: Code-Review+2 Thanks for the commit message explaining better test alternatives. -- To view, visit http://gerrit.cloudera.org:8080/6437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress .. IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress The test does not work as intended and would likely not provide very good coverage of the different spilling paths, because it only runs a single simple query. The stress test (tests/stress/concurrent_select.py) provides much better coverage. test_mem_usage_scaling.py probably also provides better coverage that TestSpillStress in its current form. Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 --- M tests/custom_cluster/test_spilling.py 1 file changed, 0 insertions(+), 75 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6437/2 -- To view, visit http://gerrit.cloudera.org:8080/6437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6437 Change subject: IMPALA-IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress .. IMPALA-IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress The test does not work as intended and would likely not provide very good coverage of the different spilling paths, because it only runs a single simple query. The stress test (tests/stress/concurrent_select.py) provides much better coverage. test_mem_usage_scaling.py probably also provides better coverage that TestSpillStress in its current form. Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 --- M tests/custom_cluster/test_spilling.py 1 file changed, 0 insertions(+), 75 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6437/1 -- To view, visit http://gerrit.cloudera.org:8080/6437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Henry Robinson has posted comments on this change. Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. Patch Set 1: I'll try and do this now. Because of the way the toolchain works, we can't get any other version changes in until this one is committed (all subsequent builds build snappy 1.1.4 and not 1.1.3). We could do a historical build, but better to make forward progress. -- To view, visit http://gerrit.cloudera.org:8080/6428 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f3439019ae22bbcf4db7f731e45ba5f7899fcc2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/392/ -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0)
Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/6436 Change subject: IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) .. IMPALA-5057: Upgrade glog (0.3.4-p2) and gflags (2.2.0) Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee --- M bin/impala-config.sh 1 file changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/6436/1 -- To view, visit http://gerrit.cloudera.org:8080/6436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If5c8718c67f021e18b3b95178b077fc147d6fcee Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java 14 files changed, 654 insertions(+), 301 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/2 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 1: (15 comments) http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG Commit Message: Line 11: serialization library. > Any benchmark numbers? Just curious. Also I think it is good to add them he Done http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS1, Line 24: > Trailing white spaces at multiple places in this file. Done Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b) > Can this be moved outside the loop like JAVA_FE_ARGS Done http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: PS1, Line 20: > extraneous white spaces. Done PS1, Line 54: file_name_prefix_idx > Didn't understand what this prefix means till I read HdfsTable.fileNamePref Done Line 70: root_type FbFileDesc; > What is the significance of this? I'm reading about FlatBuffers for the fir Yeah, it is mostly used if you read from JSON. Removed. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 83: storageIdGenerator.put(host, new Short(diskId)); > Drive-by comment: using 'new' here is pessimising memory consumption. If yo Good point thanks. Done http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: PS1, Line 318: REPLICA_HOST_IDX_MASK > Looks like it used for UNMASKing, rename may be? It's a bit mask, so I'd rather keep the name as such :) Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF; > Document these? Done PS1, Line 357: getHostIdx > how about just making hostIdx_ a short rather than typecasting everytime? Done PS1, Line 360: REPLICA_HOST_CACHE_MASK > Shouldn't this be short rather than int? May be it works fine this way, jus For convenience. Java short is signed. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 315: Preconditions.checkNotNull(fd); > I don't think this FileDescriptor.create() can ever return null? If yes, wo Done Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' and returns a > I like the idea but I'm wondering if this kind of optimization is a little That's not entirely true. Multiple filenames from a single write do have a common prefix. Line 420: prefixCompressFileName(currentFileName, prevFileName); > Also I think it may not be optimal to construct the prefixes this way by co In principal you're right, this is not the best way to compress strings. However, due to the way these file names are generated (random) this is not meant to be a generic file name compressor. Here we take advantage of the pattern that file names of a single write exhibit to avoid storing the common prefix multiple times. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 96: protected static EnumSet SUPPORTED_TABLE_TYPES = EnumSet.of( > May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by Not sure how this ended up here. Removed. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes