[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6338 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links .. IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links This change primarily sets up the infrastructure in the keydefs file to allow shortening and genericizing cross-doc links within the Impala library. Many links to CDH-related pages such as cdh_*.xml and cm_*.xml will be replaced by generic plain text. Most of that will be in the next phase. For phase 1, you can verify that things are working by viewing the HTML output pages: impala_new_features.html impala_perf_hdfs_caching.html and doublechecking that at the spots where formerly was a link to a CDH page describing HDFS caching, now there is only the plain text phrase: "see the documentation for HDFS caching". Blanking out all the Cloudera doc links (in the next phase) will remove many instances of www.cloudera.com that appear both in the doc source and in the HTML output. Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686 Reviewed-on: http://gerrit.cloudera.org:8080/6338 Reviewed-by: Laurel HaleReviewed-by: John Russell Tested-by: Impala Public Jenkins --- M docs/impala_keydefs.ditamap M docs/topics/impala_new_features.xml M docs/topics/impala_perf_hdfs_caching.xml 3 files changed, 377 insertions(+), 126 deletions(-) Approvals: Impala Public Jenkins: Verified Laurel Hale: Looks good to me, but someone else must approve John Russell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6338 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale
[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
John Russell has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6338 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/67/ -- To view, visit http://gerrit.cloudera.org:8080/6338 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable .. Patch Set 2: Code-Review+2 (1 comment) I feel pretty confident +2ing - we have good test coverage of the in-memory joins and this patch is only changing the internal implementation of deprecated code. I assume you did a test run using this code path but if not it would be good to do before merging. http://gerrit.cloudera.org:8080/#/c/6263/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: Line 1458: // Load _expr_ctxs_[0] Stray comment. -- To view, visit http://gerrit.cloudera.org:8080/6263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6263/1//COMMIT_MSG Commit Message: Line 16: > I looked at that instance at ScalarFnCall(). It t should be fine for MT as I looked and it seemed like the fallback would never be used since the GetCodegendComputeFn() should never return OK + NULL. Not a big deal anyway. -- To view, visit http://gerrit.cloudera.org:8080/6263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: partitions DMLs for Kudu tables .. Patch Set 6: (13 comments) http://gerrit.cloudera.org:8080/#/c/6037/6/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS6, Line 446: TupleRow* current_row = batch->GetRow(i); : uint32_t partition; : RETURN_IF_ERROR(partitioner_->Partition(current_row, )); : RETURN_IF_ERROR(channels_[partition % num_channels]->AddRow(current_row)); As we discussed, I think Henry is right that we need to avoid the v-f-calls per row. I think we can consider two approaches: 1) we can continue branching separately for kudu and for hash partitioning here, and calling non virtual fns on the partitioner. It's still good to have pulled all the Kudu code into a separate class. 2) the partitioner class you added can take row batches at a time as well as well as the channels_, and handle the per-row calls internally. #2 sounds cleaner but I'm open to suggestions. http://gerrit.cloudera.org:8080/#/c/6037/6/bin/impala-config.sh File bin/impala-config.sh: PS6, Line 124: 4cd7d85 btw this won't be in your change, right? we still need to get the Kudu API in http://gerrit.cloudera.org:8080/#/c/6037/6/common/thrift/Partitions.thrift File common/thrift/Partitions.thrift: PS6, Line 42: TKuduDataPartition brief comment PS6, Line 43: target_table_id mention this is necessary for the kudu Partitioner API http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 683: // Hdfs folder structure correctly. , or the indexes to construct rows passed to the Kudu partitioning API. (Or something similar) http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: PS6, Line 46: partition partitioning PS6, Line 52: number index Line 55: private DataPartition( nit: comment should be called only by the static factory method for Kudu partitioned tables http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS6, Line 274: List partitionExprs = Lists.newArrayList(modifyStmt.getPartitionExprs()); : List targetColIdxs = Lists.newArrayList(modifyStmt.getTargetColIdxs()); maybe just create these inline as params to the static constructor method below, since they're not needed otherwise. http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 153: } else { a brief comment here may be helpful http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test: PS6, Line 64: [KUDU(a.id)] seems to be printing the alias rather than the base table name as we see in the line above http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test: PS6, Line 29: 01:EXCHANGE [KUDU(1)] hm this looks fishy with unions. i'm not sure if the approach doesn't make sense with unions, at least those with constant operands. can you add a test case which has unions with selecting from a table, e.g. upsert into foo select x,y,z from a union all select x,y,z from b PS6, Line 117: upsert into functional_kudu.testtbl /*+ clustered */ : select * from functional_kudu.testtbl : PLAN : UPSERT INTO KUDU [functional_kudu.testtbl] : | : 01:SORT : | order by: id ASC NULLS LAST : | : 00:SCAN KUDU [functional_kudu.testtbl] : DISTRIBUTEDPLAN : UPSERT INTO KUDU [functional_kudu.testtbl] : | : 02:SORT : | order by: id DESC NULLS LAST : | : 01:EXCHANGE [KUDU(functional_kudu.testtbl.id)] : | : 00:SCAN KUDU [functional_kudu.testtbl] we'll wanna do something similar without the clustered hint later, as we discussed -- To view, visit http://gerrit.cloudera.org:8080/6037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
Laurel Hale has posted comments on this change. Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links .. Patch Set 1: Code-Review+1 Builds cleanly and the cdh_ig_hdfs_caching token replaces beautifully in the built content. Looks good. -- To view, visit http://gerrit.cloudera.org:8080/6338 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Laurel Hale Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5039: Fix variability in parquet dictionary filtering test
Alex Behm has posted comments on this change. Change subject: IMPALA-5039: Fix variability in parquet dictionary filtering test .. Patch Set 3: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6301/2/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: Line 537: for expected_line in expected_lines: > You can think of each of these as sparse lists of mutually exclusive things Thanks, make sense. Works for me. http://gerrit.cloudera.org:8080/#/c/6301/3/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: Line 571: (function, field, expected_value, actual_value)) I think we should dump the complete actual profile as well for better debugging. -- To view, visit http://gerrit.cloudera.org:8080/6301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable
Michael Ho has posted comments on this change. Change subject: IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/6263/1//COMMIT_MSG Commit Message: Line 16: > I did a grep for CastPtrToLlvmPtr() to make sure there were no other instan I looked at that instance at ScalarFnCall(). It t should be fine for MT as Expr objects are shared across fragment instances. My understanding of that path is that it's a fall back to the interpretation path if codegen fails for a child expression so they aren't dead code but I think there are other ways of generating those code without cross-compiling the static stub functions in expr-ir.cc http://gerrit.cloudera.org:8080/#/c/6263/1/be/src/exec/hash-table-ir.cc File be/src/exec/hash-table-ir.cc: Line 26: ExprContext* const* HashTableCtx::GetBuildExprCtxs() const { > Long lines Done http://gerrit.cloudera.org:8080/#/c/6263/1/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: Line 736: > Comment that this is pointer to the first ExprContext* pointer? Done Line 1099: BasicBlock* null_block = BasicBlock::Create(context, "null", *fn); > Maybe comment that this is a pointer to the first ExprContext*? Done Line 1121: CodegenAnyVal result = CodegenAnyVal::CreateCallWrapped(codegen, , > No longer accurate It's still accurate in a way but I rephrased it to: expr_ctx = ctx_vector[i]; http://gerrit.cloudera.org:8080/#/c/6263/1/be/src/exec/old-hash-table.cc File be/src/exec/old-hash-table.cc: Line 325: llvm_loc = builder.CreatePointerCast(llvm_loc, > Remove TODO if you're not going to do this in this patch? Done. old-hash-table's day is numbered anyway. Line 651: > What do you think about adding a CallFunction() helper to LlvmCodegen to ma Done -- To view, visit http://gerrit.cloudera.org:8080/6263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable
Michael Ho has uploaded a new patch set (#2). Change subject: IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable .. IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable To support sharing generated code across fragment instances, no fragment instance specific states should be baked into the IR. All cases were addressed previously except for the old hash tables used for legacy agg/join. This change fixes the old hash tables to not bake its fields in the generated IR functions. Similar to previous patches, some cross-compiled thin wrappers are introduced to access the fields of interest from an OldHashTable object. Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8 --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/CMakeLists.txt M be/src/exec/hash-table-ir.cc M be/src/exec/hash-table.cc M be/src/exec/hash-table.h A be/src/exec/old-hash-table-ir.cc M be/src/exec/old-hash-table.cc M be/src/exec/old-hash-table.h 11 files changed, 185 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6263/2 -- To view, visit http://gerrit.cloudera.org:8080/6263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. IMPALA-4929: Safe concurrent access to IR function call graph Previously, LlvmCodeGen creates a map (fn_refs_map_) which maps an IR function to its set of callees. That map was initialized once at Impalad's start time and it is supposed to be read-only afterwards. However, the map was unintentionally modified by its user LlvmCodeGen::MaterializeFunctionHelper() due to the use of operator []. In particular, that operator may create a new entry in the map if it doesn't exist already. This is possible because the map initially only contains entries for functions with at least one callee function. Using the operator [] with functions with no callee function may modify the map. Since the map is shared by all fragment instances, such unsafe concurrent modification can cause missing or wrong callee functions to be materialized, leading to failure to resolve symbols in LLVM. This change fixes the problem above by: 1. Create an entry for all functions even if it has no callee. In fact, LlvmCodeGen::LinkModule() assumes all functions defined in main module will have entries in the map. 2. Introduce a new class CodegenCallGraph which encapsulates the map described above. The map was established once at initialization time and there is no interface to modify the map afterwards, thus preventing any accidental modification to the map at run time. Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Reviewed-on: http://gerrit.cloudera.org:8080/6326 Reviewed-by: Michael HoTested-by: Impala Public Jenkins --- M be/src/codegen/CMakeLists.txt A be/src/codegen/codegen-callgraph.cc A be/src/codegen/codegen-callgraph.h M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h 5 files changed, 188 insertions(+), 81 deletions(-) Approvals: Impala Public Jenkins: Verified Michael Ho: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5039: Fix variability in parquet dictionary filtering test
Joe McDonnell has uploaded a new patch set (#3). Change subject: IMPALA-5039: Fix variability in parquet dictionary filtering test .. IMPALA-5039: Fix variability in parquet dictionary filtering test The tests for dictionary filtering look at how many row groups are processed and how many are filtered by matching text in the profile. However, the number of row groups processed and filtered by any individual fragment depends on how the work is split and how many impalads are running. This causes variability in the test output. To fix this, the test needs a way to aggregate the results across fragments. This fix introduces the following syntax for specifying these aggregates: aggregate(function_name, field_name): expected_value This searches the runtime profile for lines that contain 'field_name: number'. It skips the averaged fragment, as this is derived from all the other fragments. Currently, only SUM is implemented, and the expected_value is required to be an integer. It should be easy to implement other interesting functions like COUNT and MIN/MAX. It would also be possible to extend it to floats. Switching the dictionary filtering tests over to this new syntax eliminates the variability in the tests. Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb --- D testdata/workloads/functional-query/queries/QueryTest/mt-dop-parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test M tests/common/test_result_verifier.py M tests/query_test/test_mt_dop.py M tests/query_test/test_scanners.py 5 files changed, 134 insertions(+), 317 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/6301/3 -- To view, visit http://gerrit.cloudera.org:8080/6301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5039: Fix variability in parquet dictionary filtering test
Joe McDonnell has posted comments on this change. Change subject: IMPALA-5039: Fix variability in parquet dictionary filtering test .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6301/2/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: Line 495: print field_regex > why print? does this go anywhere useful? Removed. This was just leftover debugging. Line 537: expected_regexes.append(try_compile_regex(expected_line)) > What's the benefit of having one entry per expected_line in both these list You can think of each of these as sparse lists of mutually exclusive things. An expected line is either a regex, an aggregation or an exact match. None entries indicate the absence of a regex or aggregation. Including the None entries means the lists are all the same size and have the same indices. So, for a given index into the expected lines, the expected_regexes tells you whether it is a regex or not. The expected_aggregations tells you whether it is an aggregation or not. If it is neither, then it is an exact match. This also allows you to walk through the expected lines in order and have a single "matched" list across all expected lines. I worked up a version that had separate lists, but it ended up not being much cleaner than the current code. I hope this makes sense. -- To view, visit http://gerrit.cloudera.org:8080/6301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Joe McDonnell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/6338 Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links .. IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links This change primarily sets up the infrastructure in the keydefs file to allow shortening and genericizing cross-doc links within the Impala library. Many links to CDH-related pages such as cdh_*.xml and cm_*.xml will be replaced by generic plain text. Most of that will be in the next phase. For phase 1, you can verify that things are working by viewing the HTML output pages: impala_new_features.html impala_perf_hdfs_caching.html and doublechecking that at the spots where formerly was a link to a CDH page describing HDFS caching, now there is only the plain text phrase: "see the documentation for HDFS caching". Blanking out all the Cloudera doc links (in the next phase) will remove many instances of www.cloudera.com that appear both in the doc source and in the HTML output. Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686 --- M docs/impala_keydefs.ditamap M docs/topics/impala_new_features.xml M docs/topics/impala_perf_hdfs_caching.xml 3 files changed, 377 insertions(+), 126 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6338/1 -- To view, visit http://gerrit.cloudera.org:8080/6338 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/6261/4//COMMIT_MSG Commit Message: PS4, Line 9: No reason not to allow this. Not necessary. Also, I find this whole paragraph confusing - for example what thrift protocol? Why would it need to be changed? (not that I'm saying you should explain this, it may be an unnecessary detail). Obviously, you should assume that anyone reading this has no context for the work that you've been doing up to this point. http://gerrit.cloudera.org:8080/#/c/6261/4/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 240:* Analyzes and checks parameters specified for ingested external Kudu tables. Since ingested isn't a term we use anywhere else (AFAIK), you should explain what you mean by it, something like: "Analyzes and checks parameters specified for external tables that already exist in Kudu." -- To view, visit http://gerrit.cloudera.org:8080/6261 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info .. IMPALA-3401: [DOCS] Physically remove Cloudera Manager info Followup from Laurel's code reviews, to physically remove references to Cloudera Manager that were hidden. Remove a few stray instances of Cloudera Manager that I found still remaining in the source. Fix up trailing spaces introduced during earlier Cloudera Manager-related edits. Also remove stray 'Cloudera' references, or stale/commented Cloudera-specific info, noticed near other spots being edited. Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b Reviewed-on: http://gerrit.cloudera.org:8080/6325 Reviewed-by: Ambreen KaziReviewed-by: John Russell Tested-by: Impala Public Jenkins --- M docs/shared/impala_common.xml M docs/topics/impala_admission.xml M docs/topics/impala_auditing.xml M docs/topics/impala_authorization.xml M docs/topics/impala_breakpad.xml M docs/topics/impala_config_options.xml M docs/topics/impala_config_performance.xml M docs/topics/impala_faq.xml M docs/topics/impala_fixed_issues.xml M docs/topics/impala_impala_shell.xml M docs/topics/impala_incompatible_changes.xml M docs/topics/impala_isilon.xml M docs/topics/impala_kerberos.xml M docs/topics/impala_logging.xml M docs/topics/impala_new_features.xml M docs/topics/impala_noncm_installation.xml M docs/topics/impala_perf_resources.xml M docs/topics/impala_perf_skew.xml M docs/topics/impala_perf_testing.xml M docs/topics/impala_prereqs.xml M docs/topics/impala_proxy.xml M docs/topics/impala_resource_management.xml M docs/topics/impala_s3.xml M docs/topics/impala_scalability.xml M docs/topics/impala_schema_design.xml M docs/topics/impala_timeouts.xml M docs/topics/impala_troubleshooting.xml M docs/topics/impala_txtfile.xml M docs/topics/impala_udf.xml M docs/topics/impala_webui.xml 30 files changed, 123 insertions(+), 839 deletions(-) Approvals: Impala Public Jenkins: Verified Ambreen Kazi: Looks good to me, but someone else must approve John Russell: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6325 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell
[native-toolchain-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6336 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4fe07399e892837b9417f959bdaf967ca995be28 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Laszlo GaalGerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3748: minimum buffer requirements in planner
Tim Armstrong has uploaded a new patch set (#9). Change subject: IMPALA-3748: minimum buffer requirements in planner .. IMPALA-3748: minimum buffer requirements in planner Compute the minimum buffer requirement for spilling nodes and per-host estimates for the entire plan tree. This builds on top of the existing resource estimation code, which computes the sets of plan nodes that can execute concurrently. This is cleaned up so that the process of producing resource requirements is clearer. It also removes the unused VCore estimates. Fixes various bugs and other issues: * computeCosts() was not called for unpartitioned fragments, so the per-operator memory estimate was not visible. * Nested loop join was not treated as a blocking join. * The TODO comment about union was misleading * Fix the computation for mt_dop > 1 by distinguishing per-instance and per-host estimates. * Always generate an estimate instead of unpredictably returning -1/"unavailable" in many circumstances - there was little rhyme or reason to when this happened. * Remove the special "trivial plan" estimates. With the rest of the cleanup we generate estimates <= 10MB for those trivial plans through the normal code path. I left one bug (IMPALA-4862) unfixed because it is subtle, will affect estimates for many plans and will be easier to review once we have the test infra in place. Testing: Added basic planner tests for resource requirements in both the MT and non-MT cases. Re-enabled the explain_level tests, which appears to be the only coverage for many of these estimates. Removed the complex and brittle test cases and replaced with a couple of much simpler end-to-end tests. Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 --- M be/src/exec/analytic-eval-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/sorter.cc M be/src/scheduling/query-schedule.cc M be/src/service/query-exec-state.cc M common/thrift/Frontend.thrift M docs/shared/impala_common.xml M docs/topics/impala_explain.xml M docs/topics/impala_explain_level.xml M docs/topics/impala_explain_plan.xml M docs/topics/impala_hbase.xml M docs/topics/impala_known_issues.xml M docs/topics/impala_optimize_partition_key_scans.xml M docs/topics/impala_perf_joins.xml M docs/topics/impala_perf_stats.xml M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/common/PrintUtils.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java A fe/src/main/java/org/apache/impala/planner/ResourceProfile.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test A testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test M
[Impala-ASF-CR] IMPALA-3748: estimate minimum buffers in planner
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3748: estimate minimum buffers in planner .. Patch Set 8: (15 comments) http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java File fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java: Line 48: public class ResourceRequirement { > these aren't requirements if they include the existing "estimates". Resourc Done Line 115: node.computeCosts(queryOptions); > they're not costs renamed to computeResourceConsumption(). I don't feel strongly about the name - please let me know if you have a better idea. Line 156: hdfsScanMemEstimate + hbaseScanMemEstimate + nonScanMemEstimate + dataSinkMemEstimate); > long line Done http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: Line 209: public int getNumInstancesPerHost(TQueryOptions queryOptions) { > passing in the dop directly is preferable, because it's easier to see what' Done Line 232: public long getNumDistinctValues(TQueryOptions queryOptions, List exprs) { > rename to include "per instance" in the name to clarify. you might also wan Done http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: Line 120: // estimated per-instance memory usage for this plan node in bytes; > the "estimates" don't really have defined semantics (max? desirable?). we m I added a comment to point to IMPALA-5013, which tracks that problem. Some of the estimates (HDFS scanner, join, agg) are reasonable and we might want to use them as a basis for estimating "ideal" memory once this evolves further. Reworded to avoid "usage". Line 124: // Minimum number of buffers required to execute spilling operator. > why restrict this to spilling? doesn't every operator have a defined requir That's true. Currently the requirement is 0 for non-spilling operators but that will change as IMPALA-4834 progresses. Updated the comment accordingly to be clearer. Line 281: TQueryOptions queryOptions, TExplainLevel detailLevel) { > what's the role of the query options? It's needed to get mt_dop. For this particular interface it's at a high-level of abstraction so seems a little specific to be passing in mt_dop as an int. Line 317: PrintUtils.printInstances(" ", fragment_.getNumInstances(queryOptions))); > why do we want this? I think it makes it much easier to interpret the plans with mt_dop > 1. Otherwise it's not immediately obvious what parallelism each node executes with, and particularly how the per-instance estimates relate to the total per-host number. You suggested in a different comment that we should move it to plan fragment, so I did that. Line 322: PrintUtils.printPerInstanceMinBuffers(" ", perInstanceMinBufferBytes_)); > why make this conditional? The output would be confusing if it was user-visible because the backend exec nodes often execute with less memory than them minimum buffers (because of the small buffers optimisation, etc). It'll be more actionable once we have true reservations. http://gerrit.cloudera.org:8080/#/c/5847/8/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: Line 10: | hosts=1 instances=1 est-per-instance-mem=0B > how come the number of hosts changed? Because it's a single-node plan - I adjusted the approach so that # hosts more accurately reflects how the planner expects the plan to execute. It's now based on PlanFragment.getNumNodes() (the # of instances of the fragment expected) vs PlanNode.getNumNodes() (an estimate that feeds into the decision how to partition a fragment). I think this will be even clearer if I move hosts/instances to the fragment level. The previous approach led to various anomalies, e.g. hosts could be different for PlanNodes in the same fragment. http://gerrit.cloudera.org:8080/#/c/5847/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test: Line 15: hosts=1 instances=1 est-per-instance-mem=0B > regarding the number of fragment instances: that's really a per-fragment (r I ended up making this change at explain levels 2 and 3. I've updated some of affected tests so you can see how the new output looks. If we agree I can go through and fix the remaining tests (I don't have a working kudu instance locally so it's a bit labour-intensive to update some of the tests).
[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
John Russell has posted comments on this change. Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6325 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
John Russell has posted comments on this change. Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_logging.xml File docs/topics/impala_logging.xml: PS2, Line 453: : See : http://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html; scope="external" format="html"/> : for details about how to enable this feature and set : up the regular expressions to detect and redact sensitive information within SQL statement text. : > This should've been set to hidden and removed -- the content at that link i That'll be an item for the next iteration where we deal with external links. -- To view, visit http://gerrit.cloudera.org:8080/6325 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
Ambreen Kazi has posted comments on this change. Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6325 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-4846: Upgrade Snappy to 1.1.4
Laszlo Gaal has uploaded a new change for review. http://gerrit.cloudera.org:8080/6336 Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4 .. IMPALA-4846: Upgrade Snappy to 1.1.4 Snappy 1.1.4 (released Jan 25th, 2017) claims significant performance improvements: - improved compression performance by 5% - improved decompression performance by 20% Change-Id: I4fe07399e892837b9417f959bdaf967ca995be28 --- M buildall.sh 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/36/6336/1 -- To view, visit http://gerrit.cloudera.org:8080/6336 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4fe07399e892837b9417f959bdaf967ca995be28 Gerrit-PatchSet: 1 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal
[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
John Russell has posted comments on this change. Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info .. Patch Set 3: (3 comments) Removed a couple more 'Cloudera' instances. http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_logging.xml File docs/topics/impala_logging.xml: PS2, Line 372: > Cloudera ref - there's at least one more on this page. Done http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_txtfile.xml File docs/topics/impala_txtfile.xml: PS2, Line 489: > will this continue to be hosted on archive.cloudera? Flagging this because Done http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_udf.xml File docs/topics/impala_udf.xml: PS2, Line 386: /cloudera/ > cloudera repo link to be replaced? That'll be a discussion topic for the next iteration, not doing this time. -- To view, visit http://gerrit.cloudera.org:8080/6325 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: John Russell Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
John Russell has uploaded a new patch set (#3). Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info .. IMPALA-3401: [DOCS] Physically remove Cloudera Manager info Followup from Laurel's code reviews, to physically remove references to Cloudera Manager that were hidden. Remove a few stray instances of Cloudera Manager that I found still remaining in the source. Fix up trailing spaces introduced during earlier Cloudera Manager-related edits. Also remove stray 'Cloudera' references, or stale/commented Cloudera-specific info, noticed near other spots being edited. Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b --- M docs/shared/impala_common.xml M docs/topics/impala_admission.xml M docs/topics/impala_auditing.xml M docs/topics/impala_authorization.xml M docs/topics/impala_breakpad.xml M docs/topics/impala_config_options.xml M docs/topics/impala_config_performance.xml M docs/topics/impala_faq.xml M docs/topics/impala_fixed_issues.xml M docs/topics/impala_impala_shell.xml M docs/topics/impala_incompatible_changes.xml M docs/topics/impala_isilon.xml M docs/topics/impala_kerberos.xml M docs/topics/impala_logging.xml M docs/topics/impala_new_features.xml M docs/topics/impala_noncm_installation.xml M docs/topics/impala_perf_resources.xml M docs/topics/impala_perf_skew.xml M docs/topics/impala_perf_testing.xml M docs/topics/impala_prereqs.xml M docs/topics/impala_proxy.xml M docs/topics/impala_resource_management.xml M docs/topics/impala_s3.xml M docs/topics/impala_scalability.xml M docs/topics/impala_schema_design.xml M docs/topics/impala_timeouts.xml M docs/topics/impala_troubleshooting.xml M docs/topics/impala_txtfile.xml M docs/topics/impala_udf.xml M docs/topics/impala_webui.xml 30 files changed, 123 insertions(+), 839 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6325/3 -- To view, visit http://gerrit.cloudera.org:8080/6325 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi
[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
Lars Volker has posted comments on this change. Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader() .. Patch Set 1: My proposal is to keep the signature as it is, but initialize the status parameter to Status::OK() and do a private perf run to see if it has any negative impact on performance. If it does not, then I'd advocate to initialize it, since it may help prevent similar bugs in the future. -- To view, visit http://gerrit.cloudera.org:8080/6328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] Impala-Shell: Add history max option
Sailesh Mukil has posted comments on this change. Change subject: Impala-Shell: Add history_max option .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6335/1//COMMIT_MSG Commit Message: Line 7: Impala-Shell: Add history_max option I think it will be useful to create a JIRA and refer that here, so Impala users will find it easier to refer back to this. -- To view, visit http://gerrit.cloudera.org:8080/6335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] Impala-Shell: Add history max option
Zach Amsden has uploaded a new change for review. http://gerrit.cloudera.org:8080/6335 Change subject: Impala-Shell: Add history_max option .. Impala-Shell: Add history_max option Allow users to keep a longer history of queries if desired. I personally find it useful to keep a long history of queries to reference and want to bump this up to a very large value, but keep the default reasonable. Testing: Created .impalarc as follows, now getting more history saved. [impala] history_max=1000 Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f --- M shell/impala_shell.py M shell/impala_shell_config_defaults.py 2 files changed, 21 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6335/1 -- To view, visit http://gerrit.cloudera.org:8080/6335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden
[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
Ambreen Kazi has posted comments on this change. Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_logging.xml File docs/topics/impala_logging.xml: PS2, Line 372: Cloudera Cloudera ref - there's at least one more on this page. PS2, Line 453: : See : http://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html; scope="external" format="html"/> : for details about how to enable this feature and set : up the regular expressions to detect and redact sensitive information within SQL statement text. : This should've been set to hidden and removed -- the content at that link is very strictly geared towards CM. We need clarification on how redaction works with standalone Impala on hadoop. How it can be enabled and what the default setting is. http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_txtfile.xml File docs/topics/impala_txtfile.xml: PS2, Line 489: Cloudera will this continue to be hosted on archive.cloudera? Flagging this because Cloudera was removed from the previous paragraph that talks about this package, but not here. http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_udf.xml File docs/topics/impala_udf.xml: PS2, Line 386: /cloudera/ cloudera repo link to be replaced? -- To view, visit http://gerrit.cloudera.org:8080/6325 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Alex Behm has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 13: (5 comments) http://gerrit.cloudera.org:8080/#/c/5816/13/be/src/exec/union-node.cc File be/src/exec/union-node.cc: Line 156: // It is OK to close the child here because all RowBatches have already been ... because all tuple data has been copied, and we will not be calling ... Line 174: child_batch_.reset(); move this before ++child_idx_ to have them clustered? http://gerrit.cloudera.org:8080/#/c/5816/13/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 67: /// Children that can be passed through, without evaluating and materializing their single line Line 109: inline bool AtEos(int per_fragment_instance_idx) const { Maybe it's clearer to make GetNextPassThrough() not set eos, and instead have all the *eos setting be done directly inside GetNext(). That way each place different place can do the appropriate checks. Line 119: DCHECK_LE(child_idx_, children_.size()); Checking of the child_idx_ is dependent on the aller pattern of advancing the child_idx_ and calling into this AtEos(), another argument for maybe inlining the checks into GetNext() to make this subtle point less disconnected. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
Lars Volker has posted comments on this change. Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader() .. Patch Set 1: Since we expect no performance penalty and consumers shouldn't rely on status being untouched, can initialize it to OK() in GetBytes and do a private perf run to see whether it has a perf impact? I think that would eliminate a potential cause for future problems. -- To view, visit http://gerrit.cloudera.org:8080/6328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader() .. Patch Set 1: Code-Review+2 I spoke to Joe offline and he mentioned that changing the return type to Status involved touching many more things. Given that I'm happy with this fix. -- To view, visit http://gerrit.cloudera.org:8080/6328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/367/ -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Michael Ho has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 3: Code-Review+2 Carry Tim's +2 forward. -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Michael Ho has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6326/2/be/src/codegen/codegen-callgraph.cc File be/src/codegen/codegen-callgraph.cc: PS2, Line 71: std:: > std:: not needed. Done http://gerrit.cloudera.org:8080/#/c/6326/2/be/src/codegen/codegen-callgraph.h File be/src/codegen/codegen-callgraph.h: Line 57: CallGraph call_graph_; > Maybe this is a bit overkill, but moving this (and the Init() function) to Thanks for the idea. The public interfaces of this class: GetCallees() and fns_referenced_by_gv() both have const attribute in their return type so it should get most of what we need for preventing mutation outside of the class. Of course, one can always do const_cast() but I guess we should catch them in code review. The bool inited_ will catch re-initialization of the map as this should not occur. -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Zach Amsden has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6326/2/be/src/codegen/codegen-callgraph.h File be/src/codegen/codegen-callgraph.h: Line 57: bool inited_; Maybe this is a bit overkill, but moving this (and the Init() function) to a nested class that only allows access to the map through a const reference (returned by .get()) would make it impossible to mutate the map after initialization, and the compiler will catch you at it if you try. -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6326 to look at the new patch set (#3). Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. IMPALA-4929: Safe concurrent access to IR function call graph Previously, LlvmCodeGen creates a map (fn_refs_map_) which maps an IR function to its set of callees. That map was initialized once at Impalad's start time and it is supposed to be read-only afterwards. However, the map was unintentionally modified by its user LlvmCodeGen::MaterializeFunctionHelper() due to the use of operator []. In particular, that operator may create a new entry in the map if it doesn't exist already. This is possible because the map initially only contains entries for functions with at least one callee function. Using the operator [] with functions with no callee function may modify the map. Since the map is shared by all fragment instances, such unsafe concurrent modification can cause missing or wrong callee functions to be materialized, leading to failure to resolve symbols in LLVM. This change fixes the problem above by: 1. Create an entry for all functions even if it has no callee. In fact, LlvmCodeGen::LinkModule() assumes all functions defined in main module will have entries in the map. 2. Introduce a new class CodegenCallGraph which encapsulates the map described above. The map was established once at initialization time and there is no interface to modify the map afterwards, thus preventing any accidental modification to the map at run time. Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a --- M be/src/codegen/CMakeLists.txt A be/src/codegen/codegen-callgraph.cc A be/src/codegen/codegen-callgraph.h M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h 5 files changed, 188 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6326/3 -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Taras Bobrovytsky has uploaded a new patch set (#13). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. A new query option DISABLE_UNION_PASSTHROUGH was added in this patch as a precaution and for testing purposes. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 20s660ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 187.474us 187.474us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 15.238us 15.238us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s958ms3s749ms3 13.08 MB 10.00 MB 00:UNION 3 211.510ms 224.667ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s637ms1s734ms 28.80M 28.80M 528.68 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS31s697ms1s708ms 28.80M 28.80M 528.48 MB
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Hello Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5816 to look at the new patch set (#13). Change subject: IMPALA-3586: Implement union passthrough .. IMPALA-3586: Implement union passthrough The union node acts as pass through operator and forwards row batches from it's children without materializing. This is done in the case when the child's tuple layout is identical to union node tuple layout and no functions need to be applied to the child row batches. A new query option DISABLE_UNION_PASSTHROUGH was added in this patch as a precaution and for testing purposes. Testing: - Added new planner and end to end tests that cover the new functionality. - Updated existing tests to reflect the new behavior. Perf: Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned version of the store_sales table. There was over a 2x performance improvement for the following query: SELECT COUNT(ss_sold_time_sk), COUNT(ss_item_sk), COUNT(ss_customer_sk), COUNT(ss_cdemo_sk), COUNT(ss_hdemo_sk), COUNT(ss_addr_sk), COUNT(ss_store_sk), COUNT(ss_promo_sk), COUNT(ss_ticket_number), COUNT(ss_quantity), COUNT(ss_wholesale_cost), COUNT(ss_list_price), COUNT(ss_sales_price), COUNT(ss_ext_discount_amt), COUNT(ss_ext_sales_price), COUNT(ss_ext_wholesale_cost), COUNT(ss_ext_list_price), COUNT(ss_ext_tax), COUNT(ss_coupon_amt), COUNT(ss_net_paid), COUNT(ss_net_paid_inc_tax), COUNT(ss_net_profit), COUNT(ss_sold_date_sk) FROM ( select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned union all select * from tpcds_10_parquet.store_sales_unpartitioned ) t Before: Total Time: 43s164ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 224.721us 224.721us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 24.578us 24.578us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s402ms3s060ms3 1 119.00 KB 10.00 MB 00:UNION 3 35s380ms 37s846ms 288.01M 288.01M3.08 MB 0 |--02:SCAN HDFS3 184.197ms 219.931ms 28.80M 28.80M 535.03 MB 1.88 GB store_sales_unpartitioned |--03:SCAN HDFS3 131.956ms 153.401ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--04:SCAN HDFS3 178.456ms 247.721ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--05:SCAN HDFS3 189.398ms 242.251ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--06:SCAN HDFS3 122.786ms 156.528ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned |--07:SCAN HDFS3 147.467ms 183.391ms 28.80M 28.80M 535.13 MB 1.88 GB store_sales_unpartitioned |--08:SCAN HDFS3 147.502ms 186.273ms 28.80M 28.80M 535.01 MB 1.88 GB store_sales_unpartitioned |--09:SCAN HDFS3 130.086ms 154.682ms 28.80M 28.80M 535.04 MB 1.88 GB store_sales_unpartitioned |--10:SCAN HDFS3 122.701ms 161.056ms 28.80M 28.80M 534.89 MB 1.88 GB store_sales_unpartitioned 01:SCAN HDFS 3 287.863ms 330.436ms 28.80M 28.80M 534.98 MB 1.88 GB store_sales_unpartitioned After: Total Time: 20s660ms Summary: Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail -- 13:AGGREGATE 1 187.474us 187.474us1 1 28.00 KB -1.00 B FINALIZE 12:EXCHANGE1 15.238us 15.238us3 1 0 -1.00 B UNPARTITIONED 11:AGGREGATE 32s958ms3s749ms3 13.08 MB 10.00 MB 00:UNION 3 211.510ms 224.667ms 288.01M 288.01M 0 0 |--02:SCAN HDFS31s637ms1s734ms 28.80M 28.80M 528.68 MB 1.88 GB
[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4996: Single-threaded KuduScanNode .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-base.h File be/src/exec/kudu-scan-node-base.h: Line 35: /// are used to retrieve the rows for this scan. You should make this comment more specific to KuduScanNodeBase, eg. "Base class for single and multi-threaded versions of..." and "TODO: remove this once all scanners support multithreaded execution..." PS1, Line 98: // /// -- To view, visit http://gerrit.cloudera.org:8080/6312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5041: AuthManager::Init() is not idempotent
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/6333 Change subject: IMPALA-5041: AuthManager::Init() is not idempotent .. IMPALA-5041: AuthManager::Init() is not idempotent We use a global variable to track the completion of the kerberos environment setup which shouldn't be the case since we would want to re-setup the environment if we call AuthManager::Init() with a different configuration. Also, the global variable 'env_setup_complete_' was used so that we don't set the enviornment twice (once for internal transport and once for external). This patch ensures that it's still the case. Change-Id: I12cc210aa422f077446ea728ebf76921351417b0 --- M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/authentication.h 3 files changed, 11 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/6333/1 -- To view, visit http://gerrit.cloudera.org:8080/6333 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I12cc210aa422f077446ea728ebf76921351417b0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.cc File be/src/codegen/codegen-callgraph.cc: Line 87: call_graph_[key].insert(fn_name); > The entry may not exist already. Note the entry creation above is for 'fn_n Ah got it http://gerrit.cloudera.org:8080/#/c/6326/2/be/src/codegen/codegen-callgraph.cc File be/src/codegen/codegen-callgraph.cc: PS2, Line 71: http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.h File be/src/codegen/codegen-callgraph.h: Line 21: #include > Good point but it appears that two closely related files llvm-codegen.cc an I believe the std:: and boost:: versions have different perf characteristics because the c++ standard has some restrictions around how iterators behave. So they might not be a simple drop-in replacement. E.g. https://tinodidriksen.com/2010/04/cpp-set-performance/ -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Michael Ho has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.cc File be/src/codegen/codegen-callgraph.cc: Line 27: using namespace strings; > You can use common/names.h instead of the string and unordered_* usings. Done Line 35: Status status = > Can combine this with the below line. Too long after converting to nullptr. PS1, Line 71: boost:: > I believe this creates two unordered_sets (the Lvalue and Rvalue), then cop Done Line 87: call_graph_[key].insert(fn_name); > I think we can use find(key)->insert here, since we don't really want the b The entry may not exist already. Note the entry creation above is for 'fn_name'. Auto-creating entries here is fine as we are constructing the map. We just don't want this to happen after initialization. http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.h File be/src/codegen/codegen-callgraph.h: Line 21: #include > Use std::unordered_set and _map when possible Good point but it appears that two closely related files llvm-codegen.cc and libcache.cc are already using boost::unordered_* so I'd avoid mixing both to avoid confusion or to further extend the scope of this change. We should clean up all these places in one shot instead. Line 29: class CodegenCallGraph { > One line class comment? Done PS1, Line 39: inline > I believe "inline" is implied for all methods defined in the class body: Done Line 43: if (LIKELY(iter != call_graph_.end())) { > Consider using ? : to make this a one-liner Done PS1, Line 46: NULL > nullptr? Done PS1, Line 51: inline > unnecessary inline Done Line 71: /// and return them in 'users'. > "append them to 'users'", to make it clear that it doesn't clear existing c Done http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS1, Line 290: NULL > nit: we've mostly standardised on nullptr Done PS1, Line 594: (*callees) > nit: unnecessary parens Done -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Michael Ho has uploaded a new patch set (#2). Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. IMPALA-4929: Safe concurrent access to IR function call graph Previously, LlvmCodeGen creates a map (fn_refs_map_) which maps an IR function to its set of callees. That map was initialized once at Impalad's start time and it is supposed to be read-only afterwards. However, the map was unintentionally modified by its user LlvmCodeGen::MaterializeFunctionHelper() due to the use of operator []. In particular, that operator may create a new entry in the map if it doesn't exist already. This is possible because the map initially only contains entries for functions with at least one callee function. Using the operator [] with functions with no callee function may modify the map. Since the map is shared by all fragment instances, such unsafe concurrent modification can cause missing or wrong callee functions to be materialized, leading to failure to resolve symbols in LLVM. This change fixes the problem above by: 1. Create an entry for all functions even if it has no callee. In fact, LlvmCodeGen::LinkModule() assumes all functions defined in main module will have entries in the map. 2. Introduce a new class CodegenCallGraph which encapsulates the map described above. The map was established once at initialization time and there is no interface to modify the map afterwards, thus preventing any accidental modification to the map at run time. Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a --- M be/src/codegen/CMakeLists.txt A be/src/codegen/codegen-callgraph.cc A be/src/codegen/codegen-callgraph.h M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h 5 files changed, 188 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6326/2 -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6328/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 802: bool success = stream_->GetBytes( > What would be the benefit of the bool* if we always return a Status? We use this pattern when things are retryable (e.g. in a lot of spilling code, false + OK status means "we didn't succeed, but you should free memory and retry). You make a good point - GetBytes() doesn't actually do that. We could just change it to return a Status. I wonder if this is some vestige of an optimisation from when Status was much more expensive. -- To view, visit http://gerrit.cloudera.org:8080/6328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6328/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 802: bool success = stream_->GetBytes( I feel like this pattern (returning a bool and setting status on error) is pretty error-prone. In a couple of perf-critical places we use it because it's faster than returning a Status (e.g. partitioned-hash-join-node), but I don't see a reason to use it here. We could consider inverting it for GetBytes() - returning Status and setting a bool* to indicate success. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/6328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Jim Apple has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.h File be/src/codegen/codegen-callgraph.h: Line 21: #include Use std::unordered_set and _map when possible -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
Lars Volker has posted comments on this change. Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader() .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6328/1//COMMIT_MSG Commit Message: Line 13: I verified that the other uses of status are ok. Most do not check status.ok() I think we should change GetBytes() to always initialize status. The function comment even seems vague on this, so we should update it, too. Does any of the other uses rely on the fact that status does not get changed if no error occurs? If so, I think we should revisit that code. -- To view, visit http://gerrit.cloudera.org:8080/6328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()
Joe McDonnell has uploaded a new change for review. http://gerrit.cloudera.org:8080/6328 Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader() .. IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader() GetBytes only sets status in the case of an error. This means that ReadPageHeader needs to initialize the status variable so that the status.ok() check is accurate after the GetBytes call. I verified that the other uses of status are ok. Most do not check status.ok() directly, but rely on the return value of the function setting status. Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e --- M be/src/exec/parquet-column-readers.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6328/1 -- To view, visit http://gerrit.cloudera.org:8080/6328 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4929: Safe concurrent access to IR function call graph .. Patch Set 1: (12 comments) Thanks for the cleanup - I think this makes it easier to understand the callgraph http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.cc File be/src/codegen/codegen-callgraph.cc: Line 27: using namespace strings; You can use common/names.h instead of the string and unordered_* usings. Line 35: Status status = Can combine this with the below line. PS1, Line 71: boost:: I believe this creates two unordered_sets (the Lvalue and Rvalue), then copies the rvalue to the lvalue then destroys the rvalue. I think if you use emplace(fn_name) that just initialises the set inside the map: http://en.cppreference.com/w/cpp/container/unordered_map/emplace The boost:: package name is also not needed. Line 87: call_graph_[key].insert(fn_name); I think we can use find(key)->insert here, since we don't really want the behaviour of auto-creating entries. http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.h File be/src/codegen/codegen-callgraph.h: Line 29: class CodegenCallGraph { One line class comment? PS1, Line 39: inline I believe "inline" is implied for all methods defined in the class body: https://isocpp.org/wiki/faq/inline-functions#inline-member-fns-more Line 43: if (LIKELY(iter != call_graph_.end())) { Consider using ? : to make this a one-liner PS1, Line 46: NULL nullptr? PS1, Line 51: inline unnecessary inline Line 71: /// and return them in 'users'. "append them to 'users'", to make it clear that it doesn't clear existing contents. http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: PS1, Line 290: NULL nit: we've mostly standardised on nullptr PS1, Line 594: (*callees) nit: unnecessary parens -- To view, visit http://gerrit.cloudera.org:8080/6326 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3973: optional 2nd and 3rd arguments for instr().
Jim Apple has posted comments on this change. Change subject: IMPALA-3973: optional 2nd and 3rd arguments for instr(). .. Patch Set 2: > > Since I got the +1 from Zoltan, and I'd classify the subsequent > > suggestions as nice-to-have but not essential, any objection to > my > > +2'ing it now and revisiting improvements later? > > Yes. The way things work on the code side is that all comments are > addressed before submitting. TO be even more specific: on the code side, all comments are addressed in the opinion of reviewer, not the patch author. -- To view, visit http://gerrit.cloudera.org:8080/5589 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Reviewer: zi+z...@cloudera.com Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3973: optional 2nd and 3rd arguments for instr().
Jim Apple has posted comments on this change. Change subject: IMPALA-3973: optional 2nd and 3rd arguments for instr(). .. Patch Set 2: > Since I got the +1 from Zoltan, and I'd classify the subsequent > suggestions as nice-to-have but not essential, any objection to my > +2'ing it now and revisiting improvements later? Yes. The way things work on the code side is that all comments are addressed before submitting. -- To view, visit http://gerrit.cloudera.org:8080/5589 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Reviewer: zi+z...@cloudera.com Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5025: Update binutils to 2.26.1
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5025: Update binutils to 2.26.1 .. IMPALA-5025: Update binutils to 2.26.1 This release includes the fix for IMPALA-3507 and the fix for https://sourceware.org/bugzilla/show_bug.cgi?id=19520, which made libImpalaUdf.a unusable by older linkers. Change-Id: Idcc00efc13b1ff4fe8ce3aafcff0b7880908ad5b Reviewed-on: http://gerrit.cloudera.org:8080/6287 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6287 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Idcc00efc13b1ff4fe8ce3aafcff0b7880908ad5b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5025: Update binutils to 2.26.1
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5025: Update binutils to 2.26.1 .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6287 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idcc00efc13b1ff4fe8ce3aafcff0b7880908ad5b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No