[Impala-ASF-CR] Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().
Impala Public Jenkins has submitted this change and it was merged. Change subject: Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct(). .. Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct(). This caching is causing problems when multiple threads are involved in query startup, which is being implemented as part of the move to a per-query exec rpc (IMPALA-2550). Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06 Reviewed-on: http://gerrit.cloudera.org:8080/6511 Reviewed-by: Marcel Kornacker Tested-by: Impala Public Jenkins --- M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h 2 files changed, 40 insertions(+), 46 deletions(-) Approvals: Marcel Kornacker: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().
Impala Public Jenkins has posted comments on this change. Change subject: Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct(). .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4883: Union Codegen .. Patch Set 4: I ran the query above on Patch Set 4 as well, complete results: Apache/master: 17.60s Patch set 4: 10.58s Patch set 5: 9.98s -- To view, visit http://gerrit.cloudera.org:8080/6459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4883: Union Codegen .. Patch Set 5: I reran the benchmark on patch 5 on a larger table where we select only 1 column: SELECT COUNT(c) FROM ( select fnv_hash(ss_sold_time_sk) c from tpcds_10_parquet.store_sales_unpartitioned_big union all select fnv_hash(ss_sold_time_sk) c from tpcds_10_parquet.store_sales_unpartitioned_big union all select fnv_hash(ss_sold_time_sk) c from tpcds_10_parquet.store_sales_unpartitioned_big union all select fnv_hash(ss_sold_time_sk) c from tpcds_10_parquet.store_sales_unpartitioned_big ) t Before: 17.6s After: 9.98s Not a huge difference. I think the bottleneck is scanning (not union), that's why the improvement is not as big. Maybe the difference will be more significant on a large cluster? -- To view, visit http://gerrit.cloudera.org:8080/6459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4883: Union Codegen .. IMPALA-4883: Union Codegen For each non-passthrough child of the Union node, codegen the loop that does per row tuple materialization. Testing: Ran test_queries.py test locally in exchaustive mode. Benchmark: Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned store_sales table. SELECT COUNT(c), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: 39s704ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 194.504us 194.504us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 17.284us 17.284us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s202ms2s934ms3 1 115.00 KB 10.00 MB 00:UNION 3 32s514ms 34s926ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 158.373ms 216.085ms 28.80M 28.80M 489.71 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS3 167.002ms 171.738ms 28.80M 28.80M 489.74 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS3 125.331ms 145.496ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS3 148.478ms 194.311ms 28.80M 28.80M 489.69 MB 1.88 GB tpcds_10_parquet.store_sales |--06:SCAN HDFS3 143.995ms 162.781ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--07:SCAN HDFS3 169.731ms 250.201ms 28.80M 28.80M 489.58 MB 1.88 GB tpcds_10_parquet.store_sales |--08:SCAN HDFS3 164.110ms 254.374ms 28.80M 28.80M 489.61 MB 1.88 GB tpcds_10_parquet.store_sales |--09:SCAN HDFS3 135.631ms 162.117ms 28.80M 28.80M 489.63 MB 1.88 GB tpcds_10_parquet.store_sales |--10:SCAN HDFS3 138.736ms 167.778ms 28.80M 28.80M 489.67 MB 1.88 GB tpcds_10_parquet.store_sales 01:SCAN HDFS 3 202.015ms 248.728ms 28.80M 28.80M 489.68 MB 1.88 GB tpcds_10_parquet.store_sales After: 20s664ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 167.757us 167.757us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 16.592us 16.592us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s924ms3s715ms3 1 115.00 KB 10.00 MB 00:UNION 34s971ms6s082ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS31s189ms1s588ms 28.80M 28.80M 483.82 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS31s117ms1s157ms 28.80M 28.80M 484.85 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS31s226ms1s454ms 28.80M 28.80M 483.00 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS31s141ms1s3
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4883: Union Codegen .. IMPALA-4883: Union Codegen For each non-passthrough child of the Union node, codegen the loop that does per row tuple materialization. Testing: Ran test_queries.py test locally in exchaustive mode. Benchmark: Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned store_sales table. SELECT COUNT(c), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: 39s704ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 194.504us 194.504us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 17.284us 17.284us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s202ms2s934ms3 1 115.00 KB 10.00 MB 00:UNION 3 32s514ms 34s926ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 158.373ms 216.085ms 28.80M 28.80M 489.71 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS3 167.002ms 171.738ms 28.80M 28.80M 489.74 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS3 125.331ms 145.496ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS3 148.478ms 194.311ms 28.80M 28.80M 489.69 MB 1.88 GB tpcds_10_parquet.store_sales |--06:SCAN HDFS3 143.995ms 162.781ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--07:SCAN HDFS3 169.731ms 250.201ms 28.80M 28.80M 489.58 MB 1.88 GB tpcds_10_parquet.store_sales |--08:SCAN HDFS3 164.110ms 254.374ms 28.80M 28.80M 489.61 MB 1.88 GB tpcds_10_parquet.store_sales |--09:SCAN HDFS3 135.631ms 162.117ms 28.80M 28.80M 489.63 MB 1.88 GB tpcds_10_parquet.store_sales |--10:SCAN HDFS3 138.736ms 167.778ms 28.80M 28.80M 489.67 MB 1.88 GB tpcds_10_parquet.store_sales 01:SCAN HDFS 3 202.015ms 248.728ms 28.80M 28.80M 489.68 MB 1.88 GB tpcds_10_parquet.store_sales After: 20s664ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 167.757us 167.757us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 16.592us 16.592us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s924ms3s715ms3 1 115.00 KB 10.00 MB 00:UNION 34s971ms6s082ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS31s189ms1s588ms 28.80M 28.80M 483.82 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS31s117ms1s157ms 28.80M 28.80M 484.85 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS31s226ms1s454ms 28.80M 28.80M 483.00 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS31s141ms1s3
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4883: Union Codegen .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/6459/4/be/src/exec/union-node-ir.cc File be/src/exec/union-node-ir.cc: Line 28: while (!dst_batch->AtCapacity() && child_row_idx_ < child_batch_->num_rows()) { > The remaining optimisation is to pull all references to member variables ou Done. I think I took everything out. Let me know if you see any other ways to further improve performance. Line 34: if (ReachedLimit()) break; > We can avoid checking limits for each row if we check it at the end and tru Good idea, done. (We already do a similar thing in the passthrough case) -- To view, visit http://gerrit.cloudera.org:8080/6459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has uploaded a new patch set (#7). Change subject: IMPALA-4883: Union Codegen .. IMPALA-4883: Union Codegen For each non-passthrough child of the Union node, codegen the loop that does per row tuple materialization. Testing: Ran test_queries.py test locally in exchaustive mode. Benchmark: Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned store_sales table. SELECT COUNT(c), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: 39s704ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 194.504us 194.504us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 17.284us 17.284us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s202ms2s934ms3 1 115.00 KB 10.00 MB 00:UNION 3 32s514ms 34s926ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 158.373ms 216.085ms 28.80M 28.80M 489.71 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS3 167.002ms 171.738ms 28.80M 28.80M 489.74 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS3 125.331ms 145.496ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS3 148.478ms 194.311ms 28.80M 28.80M 489.69 MB 1.88 GB tpcds_10_parquet.store_sales |--06:SCAN HDFS3 143.995ms 162.781ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--07:SCAN HDFS3 169.731ms 250.201ms 28.80M 28.80M 489.58 MB 1.88 GB tpcds_10_parquet.store_sales |--08:SCAN HDFS3 164.110ms 254.374ms 28.80M 28.80M 489.61 MB 1.88 GB tpcds_10_parquet.store_sales |--09:SCAN HDFS3 135.631ms 162.117ms 28.80M 28.80M 489.63 MB 1.88 GB tpcds_10_parquet.store_sales |--10:SCAN HDFS3 138.736ms 167.778ms 28.80M 28.80M 489.67 MB 1.88 GB tpcds_10_parquet.store_sales 01:SCAN HDFS 3 202.015ms 248.728ms 28.80M 28.80M 489.68 MB 1.88 GB tpcds_10_parquet.store_sales After: 20s664ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 167.757us 167.757us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 16.592us 16.592us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s924ms3s715ms3 1 115.00 KB 10.00 MB 00:UNION 34s971ms6s082ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS31s189ms1s588ms 28.80M 28.80M 483.82 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS31s117ms1s157ms 28.80M 28.80M 484.85 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS31s226ms1s454ms 28.80M 28.80M 483.00 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS31s141ms1s3
[Impala-ASF-CR] IMPALA-4893: Efficiently update the rows read counter for sequence file
anujphadke has uploaded a new change for review. http://gerrit.cloudera.org:8080/6522 Change subject: IMPALA-4893: Efficiently update the rows read counter for sequence file .. IMPALA-4893: Efficiently update the rows read counter for sequence file Update the rows read counter after processing the scan range instead of updating it after reading every row for sequence files to save CPU cycles. Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 --- M be/src/exec/hdfs-sequence-scanner.cc 1 file changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6522/1 -- To view, visit http://gerrit.cloudera.org:8080/6522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie42c97a36e46172884cc497aa645036c2c11f541 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: anujphadke
[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views
Zach Amsden has uploaded a new patch set (#10). Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views .. IMPALA-5003: Constant propagation in scan nodes and inline views N.B. - preview diff again. I added an inequality collation phase which is not yet tested or finished, but should combine conjuncts such as a < k1, a < k2 into a < min(k1, k2). When conjuncts are pushed into table refs and inline views, they can be considered for constant progagation within that node. In certain cases, we might end up with a FALSE conditional and now we can convert ScanNodes to EmptySet nodes when that occurs. Testing: Expanded the test cases for the planner to achieve constant propagation. Added Kudu, datasource, Hdfs and HBase tests to validate we can create EmptySetNodes Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/ValueRange.java M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test 18 files changed, 607 insertions(+), 110 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/10 -- To view, visit http://gerrit.cloudera.org:8080/6389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has uploaded a new patch set (#6). Change subject: IMPALA-4883: Union Codegen .. IMPALA-4883: Union Codegen For each non-passthrough child of the Union node, codegen the loop that does per row tuple materialization. Testing: Ran test_queries.py test locally in exchaustive mode. Benchmark: Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned store_sales table. SELECT COUNT(c), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: 39s704ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 194.504us 194.504us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 17.284us 17.284us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s202ms2s934ms3 1 115.00 KB 10.00 MB 00:UNION 3 32s514ms 34s926ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 158.373ms 216.085ms 28.80M 28.80M 489.71 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS3 167.002ms 171.738ms 28.80M 28.80M 489.74 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS3 125.331ms 145.496ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS3 148.478ms 194.311ms 28.80M 28.80M 489.69 MB 1.88 GB tpcds_10_parquet.store_sales |--06:SCAN HDFS3 143.995ms 162.781ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--07:SCAN HDFS3 169.731ms 250.201ms 28.80M 28.80M 489.58 MB 1.88 GB tpcds_10_parquet.store_sales |--08:SCAN HDFS3 164.110ms 254.374ms 28.80M 28.80M 489.61 MB 1.88 GB tpcds_10_parquet.store_sales |--09:SCAN HDFS3 135.631ms 162.117ms 28.80M 28.80M 489.63 MB 1.88 GB tpcds_10_parquet.store_sales |--10:SCAN HDFS3 138.736ms 167.778ms 28.80M 28.80M 489.67 MB 1.88 GB tpcds_10_parquet.store_sales 01:SCAN HDFS 3 202.015ms 248.728ms 28.80M 28.80M 489.68 MB 1.88 GB tpcds_10_parquet.store_sales After: 20s664ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 167.757us 167.757us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 16.592us 16.592us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s924ms3s715ms3 1 115.00 KB 10.00 MB 00:UNION 34s971ms6s082ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS31s189ms1s588ms 28.80M 28.80M 483.82 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS31s117ms1s157ms 28.80M 28.80M 484.85 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS31s226ms1s454ms 28.80M 28.80M 483.00 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS31s141ms1s3
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has uploaded a new patch set (#5). Change subject: IMPALA-4883: Union Codegen .. IMPALA-4883: Union Codegen For each non-passthrough child of the Union node, codegen the loop that does per row tuple materialization. Testing: Ran test_queries.py test locally in exchaustive mode. Benchmark: Ran a local benchmark on a local 10 GB TPCDS dataset on an unpartitioned store_sales table. SELECT COUNT(c), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned union all select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: 39s704ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 194.504us 194.504us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 17.284us 17.284us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s202ms2s934ms3 1 115.00 KB 10.00 MB 00:UNION 3 32s514ms 34s926ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 158.373ms 216.085ms 28.80M 28.80M 489.71 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS3 167.002ms 171.738ms 28.80M 28.80M 489.74 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS3 125.331ms 145.496ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS3 148.478ms 194.311ms 28.80M 28.80M 489.69 MB 1.88 GB tpcds_10_parquet.store_sales |--06:SCAN HDFS3 143.995ms 162.781ms 28.80M 28.80M 489.57 MB 1.88 GB tpcds_10_parquet.store_sales |--07:SCAN HDFS3 169.731ms 250.201ms 28.80M 28.80M 489.58 MB 1.88 GB tpcds_10_parquet.store_sales |--08:SCAN HDFS3 164.110ms 254.374ms 28.80M 28.80M 489.61 MB 1.88 GB tpcds_10_parquet.store_sales |--09:SCAN HDFS3 135.631ms 162.117ms 28.80M 28.80M 489.63 MB 1.88 GB tpcds_10_parquet.store_sales |--10:SCAN HDFS3 138.736ms 167.778ms 28.80M 28.80M 489.67 MB 1.88 GB tpcds_10_parquet.store_sales 01:SCAN HDFS 3 202.015ms 248.728ms 28.80M 28.80M 489.68 MB 1.88 GB tpcds_10_parquet.store_sales After: 20s664ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 167.757us 167.757us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 16.592us 16.592us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s924ms3s715ms3 1 115.00 KB 10.00 MB 00:UNION 34s971ms6s082ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS31s189ms1s588ms 28.80M 28.80M 483.82 MB 1.88 GB tpcds_10_parquet.store_sales |--03:SCAN HDFS31s117ms1s157ms 28.80M 28.80M 484.85 MB 1.88 GB tpcds_10_parquet.store_sales |--04:SCAN HDFS31s226ms1s454ms 28.80M 28.80M 483.00 MB 1.88 GB tpcds_10_parquet.store_sales |--05:SCAN HDFS31s141ms1s3
[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors .. IMPALA-5137: pt1, Refactor TimestampValue constructors In preparation for supporting Kudu TIMESTAMPs, some cleanup of the TimestampValue code will be helpful first. This change just refactors some of the TimestampValue constructors which do more than simple construction, e.g. parsing or converting, to static factory methods. Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1 --- M be/src/benchmarks/parse-timestamp-benchmark.cc M be/src/exec/data-source-scan-node.cc M be/src/exec/text-converter.inline.h M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/cast-functions-ir.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-test.cc M be/src/exprs/expr-value.h M be/src/exprs/literal.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.cc M be/src/runtime/raw-value-test.cc M be/src/runtime/runtime-state.cc M be/src/runtime/timestamp-test.cc M be/src/runtime/timestamp-value.cc M be/src/runtime/timestamp-value.h M be/src/service/impala-http-handler.cc M be/src/service/impala-server.cc M be/src/service/query-exec-state.cc M be/src/statestore/statestore-subscriber.cc M be/src/util/dict-test.cc 21 files changed, 254 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6510/2 -- To view, visit http://gerrit.cloudera.org:8080/6510 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5137: pt1, Refactor TimestampValue constructors
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5137: pt1, Refactor TimestampValue constructors .. Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/exec/data-source-scan-node.cc File be/src/exec/data-source-scan-node.cc: PS1, Line 284: FromUnixTimeNanos > does it make sense that this conversion is subjected to use local tz flag? Hm I'm not sure, though if so I'll file a separate JIRA so it can be tracked externally as this would be a breaking change (which probably affects nearly nobody). http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: PS1, Line 354: TimestampValue > maybe use "const TimetampValue& tv = ..." here (and elsewhere), though mayb Done http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/exprs/timestamp-functions-ir.cc File be/src/exprs/timestamp-functions-ir.cc: PS1, Line 81: lexical_cast(t) > how about getting rid of this implicit code as well by defining a ToString( yeah this is weird, agreed http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/runtime/timestamp-test.cc File be/src/runtime/timestamp-test.cc: PS1, Line 632: coverted > converted Done PS1, Line 641: EXPECT_TRUE(leap_tv1.ToUnixTime(&leap_time_t)); : EXPECT_EQ(915148800, leap_time_t); : // Converted to the Unix time representation : EXPECT_EQ("1999-01-01 00:00:00", leap_tv1.DebugString()); > maybe do these cases on leap_ptime2 as well, just to be sure everything is Done http://gerrit.cloudera.org:8080/#/c/6510/1/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS1, Line 91: Conversion to local time will be done if > Return the corresponding timestamp in the local timezone if Done PS1, Line 92: . > Otherwise, return the corresponding timestamp in UTC. Done PS1, Line 97: 'FromUnixTime' > For functions, we usually write: FromUnixTime() Done PS1, Line 105: unix_time > 'unix_time' Done PS1, Line 287: operator > do we need this other than that lexical_cast I commented on? I think we could work around it, but this does get used in a few places where we branch on the type and directly use this operator, e.g. literal.cc and raw-value.cc -- To view, visit http://gerrit.cloudera.org:8080/6510 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id25e19f7984e5ebf9073d9c569faf69cec142fa1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3637,IMPALA-3636: refactor codegen constant replacement
Tim Armstrong has abandoned this change. Change subject: IMPALA-3637,IMPALA-3636: refactor codegen constant replacement .. Abandoned Abandoning for now since there wasn't a consensus about it and I don't have time right now to push it forward. I think we should consider going forward with it even if some of it will be superseded by other mechanisms because it provides immediate performance benefits by inlining CpuInfo::IsSupported() and speeding up the interpreted path. -- To view, visit http://gerrit.cloudera.org:8080/3843 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I6ba029ed8589698eb15dbfb0a20dd2a7ea752635 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3203: Part 2: per-core free lists in buffer pool
Dan Hecht has posted comments on this change. Change subject: IMPALA-3203: Part 2: per-core free lists in buffer pool .. Patch Set 15: (14 comments) Next batch. I've gone through buffer-allocator.h/.cc. http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/bufferpool/buffer-allocator.cc File be/src/runtime/bufferpool/buffer-allocator.cc: PS15, Line 137: Return the buffer size for a buffer of the given length. doesn't match PS15, Line 138: GetBuffersOfSize GetListsForSize? PS15, Line 419: FreeBuffers this name is a bit confusing now that i see it in use since "free buffers" is also used as a noun. Anyway, this code might be clearer if instead of this method, the free-list exposes a way to get N buffers from the tail, and then this class calls the system_allocator_->Free(). That way, all system_allocator_ Allocate()/Free() happens in one layer and FreeList doesn't have to know about SystemAllocator. (You'd have to remove FreeAll() also, but that seems like you could just use the same routine by passing the number of buffers in the list as the count). PS15, Line 421: so that other threads : // can claim the memory. not sure what this comment is saying. PS15, Line 423: bytes_freed > 0 when would this be false? PS15, Line 478: !al.owns_lock() && why have this condition here? it's still correct to skip the whole block if the lock is already held, right? Line 488: // Figure out how many buffers we should free of this size. i was misled by this comment since at this point, buffers_to_free is only how many buffers from the free list (not how many buffers we should free of this size). PS15, Line 489: bytes_to_free this is so similarly named to buffers_to_free and buffer_bytes_to_free but has a different meaning in that it's the target, not the accumulated. That makes the loop a bit harder to understand. So, how about renaming this to 'target_bytes' or something? Line 496: // that they are freed based on memory address in the expected order. this is basically the same as evicting the pages, right? it would help to use that terminology here to make the connection explicit. PS15, Line 525: int64_t max_to_claim = claim_memory ? bytes_to_free : 0; : if (bytes_freed > max_to_claim) { : // Add back the extra for other threads before releasing the lock to avoid race : // where the other thread may not be able to find enough buffers. : parent_->system_bytes_remaining_.Add(bytes_freed - max_to_claim); : bytes_freed = max_to_claim; : } as mentioned earlier in the function comment, this is pretty confusing. PS15, Line 551: BufferHandle buffer; : { : lock_guard pl(page->buffer_lock); : buffer = move(page->buffer); : } : lists->free_buffers.AddFreeBuffer(move(buffer)); : lists->num_free_buffers.Add(1); why doesn't this need to do the other stuff that FreeBufferArena::AddFreeBuffer() does? Why not just remove 'claim_buffer' and this block, and then have the buffer-pool just always call back down into BufferAllocator::Free(). Oh we just talked about that and you pointed out that then a client would temporarily be over its reservation while the lock is dropped. But, shouldn't this block do everything FreeBufferArena::AddFreeBuffer() does? PS15, Line 569: touched needed? Line 571: // least one, we guarantee that an idle list will shrink to zero entries. can you add some motivation for why we need to care about the previous two intervals, rather than, say, just the last interval? PS15, Line 576: bytes_freed > 0 when would this be false? -- To view, visit http://gerrit.cloudera.org:8080/6414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I612bd1cd0f0e87f7d8186e5bedd53a22f2d80832 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Michael Ho has posted comments on this change. Change subject: IMPALA-3079: Fix sequence file writer .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3079: Fix sequence file writer
Michael Ho has posted comments on this change. Change subject: IMPALA-3079: Fix sequence file writer .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/6107/4/be/src/exec/hdfs-sequence-table-writer.cc File be/src/exec/hdfs-sequence-table-writer.cc: PS4, Line 179: : > The old sequence file writer had multiple issues: Thanks for listing them out. Please also put this list in the commit message. http://gerrit.cloudera.org:8080/#/c/6107/5/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: Line 214: // Returns size of the encoded long value, including the 1 byte for length. for val < -112 or val > 127. PS5, Line 228: nit: long line PS5, Line 245: -(num_bytes + 111); nit: help to comment why it's 119 here (which is different from 120 in the comment above) as we already include the length of the extra 1 byte to differentiate between the case of -112 <= v <= 127 vs other values. -- To view, visit http://gerrit.cloudera.org:8080/6107 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0db642ad35132a9a5a6611810a6cafbbe26e7487 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [SECURITY] Add security library to build
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4669: [SECURITY] Add security library to build .. Patch Set 5: (1 comment) Some of the same questions/comments apply from the util/ review http://gerrit.cloudera.org:8080/#/c/5717/5/be/src/kudu/security/token_verifier.cc File be/src/kudu/security/token_verifier.cc: Line 160: DCHECK(false); seems reasonable to add this upstream -- To view, visit http://gerrit.cloudera.org:8080/5717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4f327810066bee7f3ac107b0295480fb9ed45e14 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Add kudu util library to the build.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Add kudu_util library to the build. .. Patch Set 5: (5 comments) 1) Brining in all the gflags is maybe concerning. will any conflict with ours? even if not, they will crowd our already large number of gflags. perhaps we need to look more closely at them. 2) I'm curious to know how this (and more particularly the actual code import change) affects the binary size. 3) It'd be good if we could push some of these changes upstream if possible. Some additional comments inline. http://gerrit.cloudera.org:8080/#/c/5715/5/be/CMakeLists.txt File be/CMakeLists.txt: Line 315: maintenance_manager_proto It might be nice to keep kudu specific libs in a separate list and merge that list in. http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/CMakeLists.txt File be/src/kudu/util/CMakeLists.txt: Line 208: if(NOT NO_NVM_SUPPORT) I think the NO_NVM_SUPPORT flag would probably be useful to add to Kudu, and of course if its upstream it's less to maintain in our codebase in the future. Line 331: target_link_libraries(protoc-gen-insertions gutil glog gflags protobuf protoc ${KUDU_BASE_LIBS}) can this be added upstream? http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/env.cc File be/src/kudu/util/env.cc: Line 10: #include "kudu/util/status.h" seems like they're missing this include, can this be added upstream? http://gerrit.cloudera.org:8080/#/c/5715/5/be/src/kudu/util/logging.cc File be/src/kudu/util/logging.cc: Line 47: DECLARE_string(log_filename); what does this mean for our log files? will this code start writing to our log? -- To view, visit http://gerrit.cloudera.org:8080/5715 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8da798ee55506d6f969416b17c191eb03cb215f5 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().
Marcel Kornacker has posted comments on this change. Change subject: Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct(). .. Patch Set 2: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/6511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct().
Impala Public Jenkins has posted comments on this change. Change subject: Remove caching of llvm StructType by TupleDescriptor::GetLlvmStruct(). .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/424/ -- To view, visit http://gerrit.cloudera.org:8080/6511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6f765048f36b9e596aeb4e4482d38b94f5ffba06 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4701: make distcc work reliably with clang
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4701: make distcc work reliably with clang .. Patch Set 7: Any more comments? -- To view, visit http://gerrit.cloudera.org:8080/6493 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I199b245fb14b6c3484b66339a7d4b37d74755af7 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts, : so the SORT node will contain all partitioning columns first, followed : by the SORT BY columns. > in general, the commit message shouldn't act as documentation. for things l That's fine by me. Feel free to simplify the description here and move it to the JIRA. I just wanted to understand the implemented semantics instead of trying to infer them from the code. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4883: Union Codegen .. Patch Set 4: (2 comments) Hmm, I guess the per-row overhead is probably not as significant for the case when we're returning a bunch of columns. There might be a more pronounced effect if we're just returning a handful of columns. I think we can still squeeze some more cycles out of this with minimal effort, but if we stop seeing a measurable improvement for a query that returns a smaller number of columns we could consider stopping then. http://gerrit.cloudera.org:8080/#/c/6459/4/be/src/exec/union-node-ir.cc File be/src/exec/union-node-ir.cc: Line 28: while (!dst_batch->AtCapacity() && child_row_idx_ < child_batch_->num_rows()) { The remaining optimisation is to pull all references to member variables out of the loop. I.e. child_row_idx_, child_batch_, child_row_idx_, child_expr_lists_, tuple_desc_->byte_size(). This will reduce the number of loads and stores quite a bit. E.g. int child_row_idx = child_row_idx_; int tuple_byte_size = tuple_desc_->byte_size; while (...) { } child_row_idx_ = child_row_idx; Currently it will do a load and store to variables like 'child_row_idx_' via the 'this' pointer on every loop iteration. The compiler could do that automatically if it could deduce that the values aren't modified via a different pointer, but I don't think it's deducible in this case because the compiler has to generate code that's "correct" in a weird case like 'this' and *tuple_buf pointing to the same memory. Line 34: if (ReachedLimit()) break; We can avoid checking limits for each row if we check it at the end and truncate the batch using RowBatch::set_num_rows. E.g. see SortNode::GetNext(). -- To view, visit http://gerrit.cloudera.org:8080/6459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts, : so the SORT node will contain all partitioning columns first, followed : by the SORT BY columns. > Done in general, the commit message shouldn't act as documentation. for things like that, we should use the jira, which is more accessible as a public record. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4889: Use client sidecars for Thrift RPCs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4889: Use client sidecars for Thrift RPCs .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6473 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5127: Add history max option
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5127: Add history_max option .. Patch Set 2: (1 comment) Thanks, this will be very handy. What do you think about also bumping up the default value? I suspect most people won't think to change the config but might benefit from a longer history. We should also make sure to document this in docs/topics/impala_shell_options.xml. http://gerrit.cloudera.org:8080/#/c/6335/2/shell/impala_shell.py File shell/impala_shell.py: Line 183: self.readline.set_history_length(int(options.history_max)) It would be nice to produce a human-friendly error message. I get the following, which hints at the problem but has a lot of noise: tarmstrong@tarmstrong-box:~/Impala/incubator-impala$ impala-shell.sh Starting Impala Shell without Kerberos authentication Traceback (most recent call last): File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line 1382, in shell = ImpalaShell(options) File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line 183, in __init__ self.readline.set_history_length(int(options.history_max)) ValueError: invalid literal for int() with base 10: '100k' -- To view, visit http://gerrit.cloudera.org:8080/6335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5128: BE Test Infrastructure
Michael Ho has posted comments on this change. Change subject: IMPALA-5128: BE Test Infrastructure .. Patch Set 2: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/6477/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: Line 2093: if (FLAGS_run_exhaustive) nit: missing { } http://gerrit.cloudera.org:8080/#/c/6477/2/be/src/testutil/gtest-util.h File be/src/testutil/gtest-util.h: PS2, Line 25: DECLARE_bool(run_exhaustive); This can use a one line comment. http://gerrit.cloudera.org:8080/#/c/6477/2/bin/run-backend-tests.sh File bin/run-backend-tests.sh: Line 30: # parse command line options Not your change but it helps to add a comment that -E and -R are ctest options to exclude and include tests matching the given regular expression -- To view, visit http://gerrit.cloudera.org:8080/6477 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71fe6de39dff21b21d322daf0232b0578bdff297 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[Impala-ASF-CR] Increase the robustness of pip download.py
Tim Armstrong has posted comments on this change. Change subject: Increase the robustness of pip_download.py .. Patch Set 1: Is there a JIRA for this? I'm not opposed to this as a workaround but this seems like a stopgap and it would be nice to have a more robust solution. It would be nice if we could remove the dependence on the main python.org. The information that we're getting in JSON format is available in a HTML format on all mirrors (https://www.python.org/dev/peps/pep-0503/). E.g. https://pypi.python.org/simple/kudu-python/ -- To view, visit http://gerrit.cloudera.org:8080/6518 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie2d923e4b061a4197be69e3259aad881aca7b519 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports
Michael Brown has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports .. Patch Set 1: (3 comments) I checked impala_fixed_issues down to "Issues Fixed in Impala 2.1.1" and found a couple problems. Other reviewers should build the docs and check the links for the rest of that page. http://gerrit.cloudera.org:8080/#/c/6515/1/docs/topics/impala_fixed_issues.xml File docs/topics/impala_fixed_issues.xml: PS1, Line 59: https://issues.apache.org/jira/issues/?jql= This works, but why not make it jira_list_280? PS1, Line 573: This didn't work for me. It seems as if there is no fixed_issues_232 anchor. http://gerrit.cloudera.org:8080/#/c/6515/1/docs/topics/impala_known_issues.xml File docs/topics/impala_known_issues.xml: PS1, Line 96: https://issues.apache.org/jira/IMPALA-2093?filter= Doesn't work. What about this? https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20priority%20in%20(blocker%2C%20critical)%20AND%20status%20in%20(open%2C%20Reopened)%20AND%20labels%20%3D%20correctness%20ORDER%20BY%20priority%20DESC If Gerrit muddles this, let me know and I'll email it to you. -- To view, visit http://gerrit.cloudera.org:8080/6515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4669: [KUTIL] Import kudu util library from kudu@a1bfd7b
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4669: [KUTIL] Import kudu_util library from kudu@a1bfd7b .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5714 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d89384730b60354b5fae2b1472183d2a561d170 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 4: (7 comments) Thank you for the reviews. Please see my comments and PS5. http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 9: SORT BY (...) clauses to CREATE : TABLE and ALTER TABLE statements > It would be helpful to have the formal specification of the new syntax here Done PS3, Line 13: TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) : PARTITIONS 8 SORT BY(i) STORED AS KUDU; : CREATE TABLE t SORT BY > Can you expand the comment a little bit on the semantics of this? A few hig Done http://gerrit.cloudera.org:8080/#/c/6495/3/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS3, Line 382: Optiona > Optional? Done http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS3, Line 1132: // This extra production is necessary since without it the parser will not be able to : // parse "CREATE TABLE A LIKE B". > I don't understand this comment, since the sort_col_cols here is optional. At first I tried adding sort_by_cols to the production above, but then the parser would not pick it up for queries like "CREATE TABLE A LIKE B" and instead try to parse it only as a "CREATE TABLE LIKE FILE" query. I could not figure out why it does that, but copying the rule fixes it. I agree that it should work. I'll ask Alex for help. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 452: > I think this only makes sense in the context of testing? If so, note that h Done http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS3, Line 539: s' property and the query has a 'noclu > I think that you need a '&& !hasNoClusteredHint()' here. Done, also added a comment to explain that the hint will be ignored if there are sort by columns. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" + > Is 'sort.by.columns'='' handled? Yes, added a test. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#5). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 856 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/5 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#4). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 856 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/4 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] Increase the robustness of pip download.py
Taras Bobrovytsky has uploaded a new change for review. http://gerrit.cloudera.org:8080/6518 Change subject: Increase the robustness of pip_download.py .. Increase the robustness of pip_download.py There were some build failures due to a failure to download a JSON file for some Python package. In this patch, we increase the number of download attempts and randomly vary the amount of time between each download attempt. Change-Id: Ie2d923e4b061a4197be69e3259aad881aca7b519 --- M infra/python/deps/pip_download.py 1 file changed, 6 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/6518/1 -- To view, visit http://gerrit.cloudera.org:8080/6518 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie2d923e4b061a4197be69e3259aad881aca7b519 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports
Jim Apple has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (6 comments) Just a heads up, this is going to have some conflicts with IMPALA-3742 (partition and sort Kudu inserts) which I'll have a patch out for soon, but this will probably go in first so I guess that's my problem. http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 9: SORT BY (...) clauses to CREATE : TABLE and ALTER TABLE statements It would be helpful to have the formal specification of the new syntax here, to show where the clause would go, e.g.: ALTER TABLE name SORT BY (col_spec[, col_spec...]) etc. http://gerrit.cloudera.org:8080/#/c/6495/3/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS3, Line 382: Options Optional? http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS3, Line 1132: // This extra production is necessary since without it the parser will not be able to : // parse "CREATE TABLE A LIKE B". I don't understand this comment, since the sort_col_cols here is optional. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 452: public void setNumClusteringCols(int n) { numClusteringCols_ = n; } I think this only makes sense in the context of testing? If so, note that here. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS3, Line 539: !insertStmt.getSortByExprs().isEmpty() I think that you need a '&& !hasNoClusteredHint()' here. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 515: AnalysisError("alter table functional.alltypes set tblproperties(" + Is 'sort.by.columns'='' handled? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4226, IMPALA-4227: Bump Breakpad version
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4226, IMPALA-4227: Bump Breakpad version .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I89cbcc3f5cc742a0f5700f3985c02482bdfd03d0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4226, IMPALA-4227: Bump Breakpad version
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4226, IMPALA-4227: Bump Breakpad version .. IMPALA-4226, IMPALA-4227: Bump Breakpad version Change-Id: I89cbcc3f5cc742a0f5700f3985c02482bdfd03d0 Reviewed-on: http://gerrit.cloudera.org:8080/6513 Reviewed-by: Lars Volker Tested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Lars Volker: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I89cbcc3f5cc742a0f5700f3985c02482bdfd03d0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/6515 Change subject: IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports .. IMPALA-4643: [DOCS] Change URLs / set up keydefs for JIRA reports In addition to switching URLs for individual JIRAs to point to issues.apache.org, also construct alternative URLs for JIRA reports (usually 1 per release, summarizing the issues fixed in that release). There is some inconsistency in which releases have associated JIRA reports. For releases where we never published a link to a JIRA report, I added a tag that could be used to construct a report in future, but didn't fill in any URL. Some releases do have a link to a JIRA report in the release notes, however the report is empty. Possibly there was some strangeness around the release process for those, or there could be some missing info in the JIRA system. I just transcribed the links in those cases and didn't try to reconstruct the history to debug the empty reports. Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525 --- M docs/impala_keydefs.ditamap M docs/topics/impala_fixed_issues.xml M docs/topics/impala_known_issues.xml 3 files changed, 212 insertions(+), 66 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/6515/1 -- To view, visit http://gerrit.cloudera.org:8080/6515 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I007e634f9da57289674683dd5bf64e3e3ca8f525 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-4889: Use client sidecars for Thrift RPCs
Henry Robinson has posted comments on this change. Change subject: IMPALA-4889: Use client sidecars for Thrift RPCs .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6473/1/be/src/rpc/rpc.h File be/src/rpc/rpc.h: Line 169: int idx = -1; > what's undefined about it? Hm, looks like compilers won't warn in this case (I thought they'd be more conservative), but turns out: void t(int* p) { } int foo() { int i; t(&i); if (i) { } } doesn't give a warning even though i is uninitialized (whereas if you omit the call to t(), it does). Long story short, I removed the initialization. Line 179: RETURN_IF_ERROR(DeserializeThriftMsg(sidecar.data(), &len, true, resp)); > how long is sidecar.data() valid? As long as the controller_ object is valid, so as long as the Rpc is valid (which is usually not very long after this method returns). -- To view, visit http://gerrit.cloudera.org:8080/6473 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4889: Use client sidecars for Thrift RPCs
Henry Robinson has uploaded a new patch set (#3). Change subject: IMPALA-4889: Use client sidecars for Thrift RPCs .. IMPALA-4889: Use client sidecars for Thrift RPCs This patch changes the way Thrift structures are serialized with KRPC. Instead of serializing them to a byte stream, and then writing that stream to a Protobuf object which is serialized again en route to the wire, the Thrift objects are serialized to byte streams which are then directly written to the wire as a 'sidecar'. This saves a copy and serialization step both on the sender and receiver sides of the RPC. Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95 --- M be/src/rpc/common.proto M be/src/rpc/rpc-mgr-test.cc M be/src/rpc/rpc.h M be/src/rpc/thrift-util.h M be/src/statestore/statestore-subscriber.cc M be/src/statestore/statestore-test.cc M be/src/statestore/statestore.cc 7 files changed, 125 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/6473/3 -- To view, visit http://gerrit.cloudera.org:8080/6473 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0fcc87c6b67aa167d70ae022663b14bc90261c95 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry Robinson Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-5140: improve docs building guidelines
Jim Apple has posted comments on this change. Change subject: IMPALA-5140: improve docs building guidelines .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6512/4/docs/README.md File docs/README.md: Line 6: * Open a terminal window and run the following commands to get the Impala documentation source files from Git: Could yo wrap long lines to help with the gerrit display to help ease reviewing? -- To view, visit http://gerrit.cloudera.org:8080/6512 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I71ae79ecd346045697fe225140ee9a317c5a337f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Brown Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (1 comment) Lars, a few high level questions before I start the next review iteration. Thanks http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts, : so the SORT node will contain all partitioning columns first, followed : by the SORT BY columns. Can you expand the comment a little bit on the semantics of this? A few high level questions are: 1. What is the connection between this and the sortby() hints? Can I use both? What if there is a conflict? Same for clustered hint. 2. Why does sort by enables clustering as well? 3. Can I specify any table columns (even the partitioning columns) in the SORT BY clause or are there some restrictions? 4. I suppose this works for all table types (including Kudu)? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT
Zach Amsden has posted comments on this change. Change subject: IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/6261/11/shell/impala_shell.py File shell/impala_shell.py: Line 49: HISTORY_LENGTH = 1 > revert that Whoops, this was supposed to be a local only change. -- To view, visit http://gerrit.cloudera.org:8080/6261 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT
Hello Matthew Jacobs, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6261 to look at the new patch set (#12). Change subject: IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT .. IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT Create external table can be used for two different operations, creating a new table by selecting data from another, and importing an existing unmanaged external table. Previously, for Kudu tables, it was only possible to issue CTAS for managed tables, and external tables were assumed to be imports of already existing data. This change allows the creation of new, unmanaged (external) kudu tables from SELECT statements. In the catalog executor, we detect if a table creation is new by inferring that from the presence of column definitions, not by whether it is external / managed. Front-end code is changed to allow this through the parser. This means any attempts to specify table metadata for Kudu tables are now interpreted as a table creation. Testing: Manual testing, expanded ParserTest and AnalyzeDDLTest and added test cases to query_test to validate Kudu support. Ran the folowing query: create external table test2_name primary key(id) partition by hash(id) partitions 10 stored as kudu as select * from test_name; Which successfully created and inserted 3 rows of data. Tried many other things like changing primary keys and forms of partitioning. Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 --- M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M tests/query_test/test_kudu.py 7 files changed, 91 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/6261/12 -- To view, visit http://gerrit.cloudera.org:8080/6261 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT
Dan Hecht has posted comments on this change. Change subject: IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/6261/11/shell/impala_shell.py File shell/impala_shell.py: Line 49: HISTORY_LENGTH = 1 revert that -- To view, visit http://gerrit.cloudera.org:8080/6261 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3203: Part 2: per-core free lists in buffer pool
Dan Hecht has posted comments on this change. Change subject: IMPALA-3203: Part 2: per-core free lists in buffer pool .. Patch Set 15: (15 comments) http://gerrit.cloudera.org:8080/#/c/6414/14/be/src/runtime/bufferpool/buffer-allocator.cc File be/src/runtime/bufferpool/buffer-allocator.cc: Line 347: lock_guard pl(page->buffer_lock); > 'client_lock' would guard against the page being moved to a different arena Yup, realized that after writing this comment. oops. http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/bufferpool/buffer-allocator.cc File be/src/runtime/bufferpool/buffer-allocator.cc: PS15, Line 59: . should be more explicit that it might return less memory than 'target_bytes'. Maybe say "Free as much memory as possible up to 'target_bytes' from this area back to the system allocator. PS15, Line 64: returns the total amount freed hmm, it's kinda confusing that the return value has this subtle difference in meaning depending on claim_memory. It seems the most intuitive return value would be "Returns the number of bytes of system memory claimed for the caller. Memory returned to the system in excess of that amount is added into 'system_bytes_remaining_'. Can the claim_memory=false path accommodate something like that? If we can't reconcile it to have a single return value, then it seems we really have two different abstractions bunched into one call. PS15, Line 70: FreeMemory do you think it makes sense to call this FreeSystemMemory() to make it clearer that this is about pushing memory back to the system? PS15, Line 254: ffinally typo Line 261: if (delta == len) break; since we can't get rid of the other DecreaseSystemBytesRemaining() probably better to move this back to where it was. PS15, Line 263: lock_arenas thought this would become 'final_attempt' Line 304: vector> arena_locks; let's add a comment explaining all of this. Something like: There are two strategies for scavenging buffers: 1) Fast, opportunistic: Each arena is searched in succession. Although reservations guarantee that the memory we need is available somewhere, this can fail because we can race with another thread that returned buffers to an arena that we've already searched and took the buffers from an arena we haven't yet searched. 2) Slow, guaranteed to succeed: In order to ensure that we can find the memory in a single pass, we hold on to the locks for arenas we've already examined. That way, if another thread can't take the memory that we need from an arena that we haven't yet examined (or from 'system_bytes_available_') because in order to do so, it would have had to return the equivalent amount of memory to an earlier arena or added it back into 'systems_bytes_reamining_'. The former can't happen since we're still holding those locks, and the latter is solved by trying DecreaseSystemBytesRemaining() at the end. PS15, Line 317: // It's possible that 'system_bytes_remaining_' was incremented since we last checked : // it. maybe this sentence would now be explained in the proposed comment above. PS15, Line 319: another thread between when that thread released memory to the system and : // incremented 'system_bytes_remaining_'. I think it'd be good to explicitly say: ... since that happens while holding one of the arena locks. or similar. though the exception is when AllocateInternal() we add it back in without holding a lock, which isn't a correctness problem since that memory couldn't be needed by another client, but makes the statement a little confusing. PS15, Line 322: false why false instead of true? I guess it doesn't matter since it should succeed to get all of it, but 'true' seems more intuitive to me. http://gerrit.cloudera.org:8080/#/c/6414/15/be/src/runtime/bufferpool/buffer-allocator.h File be/src/runtime/bufferpool/buffer-allocator.h: Line 47: /// required memory from the OS). I think explaining a couple of more things would help: 1) The fact that BufferAllocator relies on the reservations (which are managed by the BufferPool & client) and the BufferPool's eviction policy, for correctness, to ensure that a client that has unused reservation can call Allocate() and the BufferAllocator() will always be able to find the memory to back the buffer somewhere. 2) Some implementation notes explaining that available memory is sort of kept in three forms: buffers in the free lists, buffers in the clean pages lists, and available memory that can be called up from the system. This information is kind of in the above paragraphs, but I think explaining it in that way will help understand the implementation. Probably paragraph two above can be moved into this section since the class abstraction itself doesn't expose arenas, right? PS15, Line 91: void does the caller care if it couldn't release 'bytes_to_free' bytes? PS15, Line 133:
[Impala-ASF-CR] IMPALA-4226, IMPALA-4227: Bump Breakpad version
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4226, IMPALA-4227: Bump Breakpad version .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/423/ -- To view, visit http://gerrit.cloudera.org:8080/6513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I89cbcc3f5cc742a0f5700f3985c02482bdfd03d0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db
Lars Volker has posted comments on this change. Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db .. Patch Set 7: Thanks for fixing this! Could you close the Jira? -- To view, visit http://gerrit.cloudera.org:8080/6503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db
Zach Amsden has posted comments on this change. Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db .. Patch Set 7: While I did forget to remove the std:: on vector, but I think enough effort has gone into this. Back to bug-hunting. -- To view, visit http://gerrit.cloudera.org:8080/6503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db .. IMPALA-5123: Fix ASAN use after free in timezone_db The issue is that the string temporary returned by .string goes out of scope immediately after being created. Also, the API to mkstemp is unclear on whether it modifies the string in place. Just strdup() the c_str() to be safe - this is not performance critical code. Testing: ASAN build, running expr-test be test; ASAN fails before, passes after this change. Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d Reviewed-on: http://gerrit.cloudera.org:8080/6503 Reviewed-by: Lars Volker Tested-by: Impala Public Jenkins --- M be/src/exprs/timezone_db.cc 1 file changed, 15 insertions(+), 11 deletions(-) Approvals: Impala Public Jenkins: Verified Lars Volker: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zach Amsden
[native-toolchain-CR] Ported native-toolchain to work on ppc64le
Valencia Edna Serrao has posted comments on this change. Change subject: Ported native-toolchain to work on ppc64le .. Patch Set 1: (1 comment) I'm ready with almost all required changes, except, for crcutil. Before I push them on gerrit, I need to know your views on patching mechanism usage issue w.r.t crcutil below. http://gerrit.cloudera.org:8080/#/c/6468/1/source/crcutil/build.sh File source/crcutil/build.sh: Line 42: wrap autoreconf -i; > The patch is applied on Makefile.am which is generated only after autogen.s Patching mechanism usage for 2nd patch file. Please let me know your views on this. -- To view, visit http://gerrit.cloudera.org:8080/6468 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7931752ca497bc7a5e3cc574bbb54637f382c72a Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Valencia Edna Serrao Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Valencia Edna Serrao Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5123: Fix ASAN use after free in timezone db
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5123: Fix ASAN use after free in timezone_db .. Patch Set 6: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/422/ -- To view, visit http://gerrit.cloudera.org:8080/6503 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I490f741403ea2004bc51394aa1251577337b1e1d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No