[Impala-ASF-CR] IMPALA-6084: Avoid using of global namespace for llvm
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8489 ) Change subject: IMPALA-6084: Avoid using of global namespace for llvm .. Patch Set 4: Code-Review+1 (12 comments) Looks good to me, just some cleanup left to do http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/codegen/codegen-anyval.cc@156 PS4, Line 156: // DecimalVal, you can fit this in the previous line http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/codegen/codegen-callgraph.cc File be/src/codegen/codegen-callgraph.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/codegen/codegen-callgraph.cc@41 PS4, Line 41: for (llvm::Use& u : val->uses()) I think we usually stick to using no space between ":" like "for (llvm::Use& u: val->uses())" but I see exceptions in the codebase to this rule. Can someone please confirm? http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/codegen/instruction-counter.cc File be/src/codegen/instruction-counter.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/codegen/instruction-counter.cc@43 PS4, Line 43: // Create all instruction counter and put them into counters_. Any : // InstructionCount that has instructions delegated to it in : // InstructionCounter::visit(const Instruction ) can fit in 2 lines http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/codegen/llvm-codegen.cc@1127 PS4, Line 1127: llvm:: search/replace gone wrong http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/hash-table.cc@40 PS4, Line 40: using llvm::APFloat; : using llvm::ArrayRef; : using llvm::BasicBlock; : using llvm::ConstantFP; : using llvm::Function; : using llvm::LLVMContext; : using llvm::PHINode; : using llvm::PointerType; : using llvm::Type; : using llvm::Value; you can remove these http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/partitioned-hash-join-builder.cc@49 PS4, Line 49: using llvm::ConstantInt; : using llvm::Function; : using llvm::LLVMContext; : using llvm::PointerType; : using llvm::Type; : using llvm::Value you can remove these http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/partitioned-hash-join-node.cc@50 PS4, Line 50: using llvm::BasicBlock; : using llvm::ConstantInt; : using llvm::Function; : using llvm::GlobalValue; : using llvm::LLVMContext; : using llvm::PointerType; : using llvm::Type; : using llvm::Value; you can remove these http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/partitioned-hash-join-node.cc@131 PS4, Line 131: llvm:: search/replace gone wrong http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exec/select-node.cc@30 PS4, Line 30: using llvm::Function; you can remove this http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exprs/scalar-expr.cc@260 PS4, Line 260: llvm:: search/replace gone wrong, after you fix that, you can fit this into one line http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exprs/scalar-fn-call.cc File be/src/exprs/scalar-fn-call.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/exprs/scalar-fn-call.cc@46 PS4, Line 46: using llvm::ArrayType; : using llvm::BasicBlock; : using llvm::Function; : using llvm::GlobalVariable; : using llvm::PointerType; : using llvm::Type; : using llvm::Value; you can remove these http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: http://gerrit.cloudera.org:8080/#/c/8489/4/be/src/runtime/descriptors.cc@40 PS4, Line 40: using llvm::Constant; : using llvm::ConstantAggregateZero; : using llvm::ConstantInt; : using llvm::ConstantStruct; : using llvm::StructType; you can remove these -- To view, visit http://gerrit.cloudera.org:8080/8489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master
[Impala-ASF-CR] IMPALA-6084: Avoid using of global namespace for llvm
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8489 ) Change subject: IMPALA-6084: Avoid using of global namespace for llvm .. Patch Set 4: > I'm ok with using llvm:: in those files, but was going to let > Bikram also weigh in. I agree with Tim and am ok with using llvm:: as well. I was waiting to make a complete pass before I replied, as I initially noticed many search/replace gone wrong, but seems like you fixed those in the latest patch. Let me take another pass to see if i missed something else. -- To view, visit http://gerrit.cloudera.org:8080/8489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7098baff5f746bcc2965583ca99f6188997b733a Gerrit-Change-Number: 8489 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 10 Nov 2017 00:56:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6170: Remove broken backend test from llvm-codegen-test
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8505 Change subject: IMPALA-6170: Remove broken backend test from llvm-codegen-test .. IMPALA-6170: Remove broken backend test from llvm-codegen-test Remove backend test that expects hdfs service to be running, which is not an expectation backend test should have. This caused test runs to fail that use the local filesystem as their default filesystem. Also the code path exercised by that test is already covered in the end to end tests, like in test_udfs.py for cases where the functions used in the query live in the same lib file. Change-Id: Iaed0109f5163343427015d571d6d24233b9d3fdc --- M be/src/codegen/llvm-codegen-test.cc 1 file changed, 0 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/8505/1 -- To view, visit http://gerrit.cloudera.org:8080/8505 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iaed0109f5163343427015d571d6d24233b9d3fdc Gerrit-Change-Number: 8505 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-6084: Avoid using of global namespace for llvm
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8489 ) Change subject: IMPALA-6084: Avoid using of global namespace for llvm .. Patch Set 3: does it make sense to import llvm namespace globally for the files under "be/src/codegen/" since they primarily work with code from that namespace? that might help avoid code bloat for those files -- To view, visit http://gerrit.cloudera.org:8080/8489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7098baff5f746bcc2965583ca99f6188997b733a Gerrit-Change-Number: 8489 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 08 Nov 2017 21:41:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5999: Fix LLVM linkage errors due LibCache sync issues
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8487 to look at the new patch set (#2). Change subject: IMPALA-5999: Fix LLVM linkage errors due LibCache sync issues .. IMPALA-5999: Fix LLVM linkage errors due LibCache sync issues Some test runs are currently failing randomly in test_ir_functions due to LLVM linkage error. This happens when impala tries to link a function from a lib file on hdfs, but it somehow got removed from the lib cache before it could link. This results in a new file being cached but with a slightly different local filename, and since impala only uses local filenames to check for collision for linking of LLVM modules, it ends up linking the same file twice and hence encounters an error. This patch fixes this issue by using the hdfs file path to check for collision of lib files. Testing: Added a backend test that tries to link the same lib twice. Change-Id: Iefa23bbf47998fe7e84011e1edf8e794e94a1757 --- M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h 5 files changed, 39 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/8487/2 -- To view, visit http://gerrit.cloudera.org:8080/8487 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iefa23bbf47998fe7e84011e1edf8e794e94a1757 Gerrit-Change-Number: 8487 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5999: Fix LLVM linkage errors due LibCache sync issues
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8487 ) Change subject: IMPALA-5999: Fix LLVM linkage errors due LibCache sync issues .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8487/1/be/src/codegen/llvm-codegen-test.cc File be/src/codegen/llvm-codegen-test.cc: http://gerrit.cloudera.org:8080/#/c/8487/1/be/src/codegen/llvm-codegen-test.cc@485 PS1, Line 485: NULL > nullptr Done http://gerrit.cloudera.org:8080/#/c/8487/1/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/8487/1/be/src/codegen/llvm-codegen.h@584 PS1, Line 584: LinkModule > Maybe LinkModuleFromLocalFs() or something like that to disambiguate it the Done -- To view, visit http://gerrit.cloudera.org:8080/8487 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iefa23bbf47998fe7e84011e1edf8e794e94a1757 Gerrit-Change-Number: 8487 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 Nov 2017 21:15:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 11: Code-Review+2 Carrying over +2, rebased, fixed clang errors -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 11 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 07 Nov 2017 01:11:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#11). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc M testdata/data/README A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 19 files changed, 634 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/11 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 11 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5999: Fix LLVM linkage errors due LibCache sync issues
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8487 Change subject: IMPALA-5999: Fix LLVM linkage errors due LibCache sync issues .. IMPALA-5999: Fix LLVM linkage errors due LibCache sync issues Some test runs are currently failing randomly in test_ir_functions due to LLVM linkage error. This happens when impala tries to link a function from a lib file on hdfs, but it somehow got removed from the lib cache before it could link. This results in a new file being cached but with a slightly different local filename, and since impala only uses local filenames to check for collision for linking of LLVM modules, it ends up linking the same file twice and hence encounters an error. This patch fixes this issue by using the hdfs file path to check for collision of lib files. Testing: Added a backend test that tries to link the same lib twice. Change-Id: Iefa23bbf47998fe7e84011e1edf8e794e94a1757 --- M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h 5 files changed, 33 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/8487/1 -- To view, visit http://gerrit.cloudera.org:8080/8487 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iefa23bbf47998fe7e84011e1edf8e794e94a1757 Gerrit-Change-Number: 8487 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 8: (2 comments) @Tim, can you take a last look at the changes and carry over the +2 if they are ok? http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33 PS7, Line 33: int Encode(const InternalType& t, int encoded_byte_size, uint8_t* buffer, > Oh ok, thanks for the explanation. I think we should also add a comment men Done http://gerrit.cloudera.org:8080/#/c/7822/8/testdata/data/binary_decimal_dictionary.parquet File testdata/data/binary_decimal_dictionary.parquet: PS8: > We should also mention how these files were generated in the README Done -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 04 Nov 2017 00:23:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#9). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc M testdata/data/README A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 19 files changed, 639 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/9 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#8). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 18 files changed, 629 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/8 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-common.h@391 PS7, Line 391: fixed_len_size > Looked again. The variable name (and recycling the argument storage) is con Done http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/7/be/src/exec/parquet-plain-test.cc@33 PS7, Line 33: int EncodeVarLenDecimal(const DECIMAL_TYPE& t, int fixed_len_size, uint8_t* buffer){ > I took another look at the standard and it says that the minimum number of This test suite basically takes a test value and its expected size and first encodes it then decodes it, in order to make minimal changes to the test suite I reused the function signature and passed the "minimum number of bytes required to store the unscaled value" as the expected size which is passed to the encode methods names as "fixed_len_size". As per you suggestion in a previous comment, I believe this will be more clear if i change the name to encoded_byte_size. Also, I gave an explanation of what "fixed_len_size" for decimals stored as BYTE_ARRAY as a comment at line 46. I believe it'll be better to move that comment right above this method instead. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 01 Nov 2017 23:21:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6123: Fix column order of a query test in test inline view limit
Hello Lars Volker, Tim Armstrong, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8409 to look at the new patch set (#2). Change subject: IMPALA-6123: Fix column order of a query test in test_inline_view_limit .. IMPALA-6123: Fix column order of a query test in test_inline_view_limit Currently a "select *" query in test_inline_view_limit fails during exhaustive testing because Impala returns columns from HBase tables in a different order (IMPALA-886) than the one expected. This fix ensures the column order is consistent by specifying the output columns in the right order in the select query. Testing: Tested locally, with and without exhaustive exploration strategy. Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330 --- M testdata/workloads/functional-query/queries/QueryTest/inline-view-limit.test 1 file changed, 3 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8409/2 -- To view, visit http://gerrit.cloudera.org:8080/8409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330 Gerrit-Change-Number: 8409 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6123: Fix column order of a query test in test inline view limit
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8409 ) Change subject: IMPALA-6123: Fix column order of a query test in test_inline_view_limit .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8409/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8409/1//COMMIT_MSG@10 PS1, Line 10: because the output column order is different from : the expected one for it corresponding hbase table > how about: ... because Impala returns columns from HBase tables in a differ yes this sounds better. -- To view, visit http://gerrit.cloudera.org:8080/8409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330 Gerrit-Change-Number: 8409 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 28 Oct 2017 02:29:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6123: Fix column order of a query test in test inline view limit
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8409 Change subject: IMPALA-6123: Fix column order of a query test in test_inline_view_limit .. IMPALA-6123: Fix column order of a query test in test_inline_view_limit Currently a "select *" query in test_inline_view_limit fails during exhaustive testing because the output column order is different from the expected one for it corresponding hbase table. This fix ensures the column order is consistent by specifying the output columns in the right order in the select query. Testing: Tested locally, with and without exhaustive exploration strategy. Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330 --- M testdata/workloads/functional-query/queries/QueryTest/inline-view-limit.test 1 file changed, 3 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/8409/1 -- To view, visit http://gerrit.cloudera.org:8080/8409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I11667872b8788a8b0040bf9252bf07b987b5d330 Gerrit-Change-Number: 8409 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-6093: diagnostics for flaky TestHashJoinTimer
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8403 ) Change subject: IMPALA-6093: diagnostics for flaky TestHashJoinTimer .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic33296931db807abb960db43b99e5fd0f256 Gerrit-Change-Number: 8403 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 22:04:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6093: diagnostics for flaky TestHashJoinTimer
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8403 ) Change subject: IMPALA-6093: diagnostics for flaky TestHashJoinTimer .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/8403/2/tests/query_test/test_hash_join_timer.py File tests/query_test/test_hash_join_timer.py: http://gerrit.cloudera.org:8080/#/c/8403/2/tests/query_test/test_hash_join_timer.py@103 PS2, Line 103: for impalad in ImpalaCluster().impalads: : verifier = MetricVerifier(impalad.service) : verifier.wait_for_metric("impala-server.num-fragments-in-flight", 0) nice addition! -- To view, visit http://gerrit.cloudera.org:8080/8403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic33296931db807abb960db43b99e5fd0f256 Gerrit-Change-Number: 8403 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 17:46:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#7). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 18 files changed, 633 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/7 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-column-readers.cc@207 PS6, Line 207: InternalType, parquet::Type::type PARQUET_TYPE > you should explain these template parameters Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@40 PS6, Line 40: IMPALA_TO_PARQUET_TYPES I think it makes sense to rename this to INTERNAL_TO_PARQUET_TYPES to be consistent with our naming of template parameters. http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@88 PS6, Line 88: InternalType > let's explain what that is Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@177 PS6, Line 177: static int DecodeByParquetType(const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, > nit: long line Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@333 PS6, Line 333: inline int EncodeDecimal(const T& v, int fixed_len_size, uint8_t* buffer){ > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc@34 PS6, Line 34: int > int32_t for consistency? Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc@79 PS6, Line 79: LOGICAL_TYPE > LogicalType here and below Done -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 00:32:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 10: Code-Review+2 Rebased. carrying over Michael's +2 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 10 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 24 Oct 2017 21:17:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/8196/8/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8196/8/be/src/exec/select-node.cc@72 PS8, Line 72: Status( > It's probably quite clear what the call path is since this is the codegen f both the message in the Status obj created and the information logged by VerifyFunction() provide context related to the failed finalize call. http://gerrit.cloudera.org:8080/#/c/8196/9/tests/query_test/test_codegen.py File tests/query_test/test_codegen.py: http://gerrit.cloudera.org:8080/#/c/8196/9/tests/query_test/test_codegen.py@52 PS9, Line 52: assert_codegen_enabled(result.runtime_profile, [1]) > Optionally, you can consider adding a test in which codegen fails. Not stri Do we already have tests that check for codegen failed? Also is there a way I can induce a failure in codegen? -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 24 Oct 2017 20:45:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#6). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 17 files changed, 619 insertions(+), 273 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/6 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/hdfs-parquet-scanner.h@45 PS5, Line 45: INTERNAL_TYPE > I should have mentioned this earlier, but I think we mostly use CamelCase f Done http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@235 PS5, Line 235: template <> : inline int ParquetPlainEncoder::ByteSize(const Decimal4Value&) { : DCHECK(false); : return -1; : } : template <> : inline int ParquetPlainEncoder::ByteSize(const Decimal8Value&) { : DCHECK(false); : return -1; : } : template <> : inline int ParquetPlainEncoder::ByteSize(const Decimal16Value&) { : DCHECK(false); : return -1; : } fixed code duplication here too. http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@284 PS5, Line 284: template <> : inline int ParquetPlainEncoder::Encode( : const int8_t& v, int fixed_len_size, uint8_t* buffer) { : int32_t val = v; : memcpy(buffer, , sizeof(int32_t)); : return ByteSize(v); : } : : template <> : inline int ParquetPlainEncoder::Encode( : const int16_t& v, int fixed_len_size, uint8_t* buffer) { : int32_t val = v; : memcpy(buffer, , sizeof(int32_t)); : return ByteSize(v); : } Also, duplicated methods for both int8 and int16, since both are written out as int32 http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@328 PS5, Line 328: inline int ParquetPlainEncoder::Encode( > It looks like we could also deduplicate the Encode/Decode methods for FIXED Done http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-common.h@350 PS5, Line 350: Decode(const uint8_t* buffer, > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc@112 PS5, Line 112: //Decimal4Value: General test case, > I find this a bit confusing still because the values being encoded aren't i As discussed, moving these test cases back to its original form http://gerrit.cloudera.org:8080/#/c/7822/5/be/src/exec/parquet-plain-test.cc@200 PS5, Line 200: sizeof(int32_t) > Is there any reason to keep the encoding overhead separate from the encoded see comment above. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 5 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 24 Oct 2017 00:29:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Hello Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8196 to look at the new patch set (#9). Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Testing: Added test case to verify that CopyRows in select node is successfully codegened. Improved test coverage for select node with limit. Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 1 and l_linenumber > 1 and l_comment >'foo0' and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h M testdata/workloads/functional-query/queries/QueryTest/inline-view-limit.test M tests/query_test/test_codegen.py 8 files changed, 131 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/9 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/8196/8/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8196/8/be/src/exec/select-node.cc@a69 PS8, Line 69: : > Why are these not needed anymore ? Done http://gerrit.cloudera.org:8080/#/c/8196/8/be/src/exec/select-node.cc@72 PS8, Line 72: Status( > Will it be sufficient to use Status::Expected() for this ? I believe finalize function does not fail frequently, so shouldn't affect perf if it writes to log. Also if it ever does fail, its probably worth printing the stacktrace to the logs. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 23 Oct 2017 23:03:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/hdfs-parquet-scanner.h@352 PS10, Line 352: return mem_tracker_; > I tried that earlier but I was getting the below error for parquet table wr I see, that probably happens because HdfsTableSink class only has a forward declaration in hdfs-table-write.h If you would like to give this another try: You can try including the hdfs-table-sink.h in the hdfs-parquet-table-writer.h and it should work. OR you can move the implementation of dict_mem_tracker() to hdfs-parquet-table-writers.cc and add the include there instead in order to minimize the includes in the header file. http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/parquet-column-readers.h@356 PS10, Line 356: if (dict_decoder_base_ != nullptr) dict_decoder_base_->Close(); > GetDictionaryDecoder() returns dict_decoder_base_ it should be the same cal Sorry, I should have been more clearer about what I was suggesting. What I meant was, instead of changing the method GetDictionaryDecoder() in parquet-column-readers.h where you return dict_decoder_base_, we can instead get rid of dict_decoder_base_ member var altogether and use GetDictionaryDecoder() because the overridden method in ScalarColumnReader would return the actual dict_decoder_ object that we can use directly and in all other cases the default implementation will return a nullptr. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 11 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 21 Oct 2017 00:25:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/hdfs-parquet-scanner.h@352 PS10, Line 352: return mem_tracker_; if this is the same mem tracker in scan_node_ perhaps we can just do scan_node_->mem_tracker(); over here and avoid adding a new member variable? Same in the hdfs-table-writer. http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/hdfs-parquet-scanner.cc@337 PS10, Line 337: if (mem_tracker_ != nullptr) { : mem_tracker_ = nullptr; : } nit: one line if (mem_tracker_ != nullptr) mem_tracker_ = nullptr; http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/exec/parquet-column-readers.h@356 PS10, Line 356: if (dict_decoder_base_ != nullptr) dict_decoder_base_->Close(); instead of dict_decoder_base_ we can just use the virtual function GetDictionaryDecoder() to get the dict if it exists http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/util/dict-encoding.h@206 PS10, Line 206: nit: extra line http://gerrit.cloudera.org:8080/#/c/8034/10/be/src/util/dict-encoding.h@324 PS10, Line 324: nit: extra line -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 10 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 19 Oct 2017 21:41:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 4: (16 comments) http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-column-readers.cc@1277 PS4, Line 1277: switch (node.element->type) { > This case is only going to get bigger with the follow on patches - it would Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-common.h@376 PS4, Line 376: inline int ParquetPlainEncoder::Decode( > I think you can avoid stamping this out if you leave it still parameterized Unfortunately, the call to Decode is explicit in our code, eg Decode therefore it will always look for a specialization of those template parameters. Similarly, in the dummy program you wrote, if I explicitly call foo (, ), this will look for a specialization of the double templated method and return 0; Since partial specialization is not possible, the only way to reduce redundant code will be to do what Lars suggested earlier, i.e., create a helper method that is only templatized on and call that for each full specialization. I will make changes accordingly in the next iteration of this patch. http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h@237 PS2, Line 237: > I think this particular call here will always return sizeof(int32_t) (line should we do that everywhere else in this file wherever we specialize a method(both Decode and encode)? http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h@338 PS2, Line 338: return fixed_len_size; > I'm not sure I'm following: it looks like the next three methods are exactl Oh I see what you mean now. Done! http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@66 PS4, Line 66: bool IsSupportedPhysicalType(PrimitiveType impala_type, > let's make it static - we don't want to export the symbol to be linked with Done. enclosed in an anonymous namespace. http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@70 PS4, Line 70: ( > unnecessary parens Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@173 PS4, Line 173: // TODO: Is this check too strict? > I don't see why this shouldn't match the file metadata, this seems valid to Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@182 PS4, Line 182: parquet::SchemaElement* > Why not pass by const ref like above? Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-metadata-utils.cc@209 PS4, Line 209: stringstream ss; > Can you convert to using Substitute() while we're here? We've been very gra Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@176 PS4, Line 176: TestType (Decimal4Value(test_val), > It would be nice to combine the FIXED_LEN_BYTE_ARRAY and BYTE_ARRAY tests. Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@193 PS4, Line 193: int32_t > Any reason to use sizeof(int..) instead of sizeof(Decimal*Value) like above for var len decimals, its encoded size is bytes required to store size + least num of bytes required to store num. in this case bytes required to store numeric_limits::max() is sizeof(int32_t) which is less than the sizeof(Decimal8Value) since Decimal8Value is stored using int64 I will add a comment at the top of these byte_array test to clarify what expected size is. would you recommend I add more comments before limit tests? http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/exec/parquet-plain-test.cc@212 PS4, Line 212: for (int i = 1; i <=16; ++i) { > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h@104 PS4, Line 104: PHYSICAL_TYPE > Isn't this PHYSICAL_TYPE the internal type, which is called LOGICAL_TYPE in reverted to 'T' to minimize change from previous code http://gerrit.cloudera.org:8080/#/c/7822/4/be/src/util/dict-encoding.h@199 PS4, Line 199: PHYSICAL_TYPE > It doesn't seem
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#5). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 17 files changed, 568 insertions(+), 256 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/5 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 5 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 7: Code-Review+2 Rebasing. Carrying over +2. -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 18 Oct 2017 19:10:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 6: Code-Review+2 Carrying over Tim's +2 -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 17 Oct 2017 20:49:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Hello Philip Zeyliger, anujphadke, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8233 to look at the new patch set (#5). Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors Currently if llvm encounters an error it calls exit() and this kills the impalad process. This patch adds a diagnostic handler that llvm can use to pass the error up to impala instead of crashing it. Testing: Added a test which induces an error in llvm and makes sure that its passed up to impala code and handled correctly. Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff --- M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/testutil/gtest-util.h M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h 6 files changed, 93 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/8233/5 -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 5 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc@308 PS4, Line 308: if (!diagnostic_err.empty()) ss <<" "<< diagnostic_err; > nit: Do we do spacing around "<<" operators? Done http://gerrit.cloudera.org:8080/#/c/8233/4/be/src/codegen/llvm-codegen.cc@1625 PS4, Line 1625: diagnostic_printer <<"LLVM diagnostic error: "; > nit: space around "<<"? Done -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 17 Oct 2017 20:29:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Hello Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8196 to look at the new patch set (#8). Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Testing: Added test case to verify that CopyRows in select node is successfully codegened. Improved test coverage for select node with limit. Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 1 and l_linenumber > 1 and l_comment >'foo0' and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h M testdata/workloads/functional-query/queries/QueryTest/inline-view-limit.test M tests/query_test/test_codegen.py 8 files changed, 131 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/8 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 8 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 7: (3 comments) http://gerrit.cloudera.org:8080/#/c/8196/7/be/src/exec/select-node-ir.cc File be/src/exec/select-node-ir.cc: http://gerrit.cloudera.org:8080/#/c/8196/7/be/src/exec/select-node-ir.cc@28 PS7, Line 28: limit_ > This could return extra rows. E.g. if the limit for the node is 1025, we re Done. also added a test case that checks if limit is enforced by select node. planner tests already exist that check if limit is pushed to a select node, so I have only added a query test. http://gerrit.cloudera.org:8080/#/c/8196/7/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8196/7/be/src/exec/select-node.cc@91 PS7, Line 91: DCHECK(*eos == false); > This isn't safe - the GetNext() API doesn't require that the caller initial Done http://gerrit.cloudera.org:8080/#/c/8196/7/be/src/exec/select-node.cc@112 PS7, Line 112: > nit: remove extra vertical whitespace Done -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 17 Oct 2017 03:02:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Hello Philip Zeyliger, anujphadke, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8233 to look at the new patch set (#4). Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors Currently if llvm encounters an error it calls exit() and this kills the impalad process. This patch adds a diagnostic handler that llvm can use to pass the error up to impala instead of crashing it. Testing: Added a test which induces an error in llvm and makes sure that its passed up to impala code and handled correctly. Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff --- M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/testutil/gtest-util.h M be/src/util/filesystem-util.cc M be/src/util/filesystem-util.h 6 files changed, 93 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/8233/4 -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py File tests/query_test/test_codegen.py: http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py@45 PS1, Line 45: def test_codegen_diagnostic_handler(self, vector): > You could probably test this in be/src/codegen/llvm-codegen-test.cc with co Done -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 16 Oct 2017 22:57:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@308 PS3, Line 308: if (!diagnostic_err.empty()) ss << diagnostic_err; > nit: This might look like "... to main module.Bla bla." I.e., no space afte Done http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@1628 PS3, Line 1628: LOG(INFO) << diagnostic_handler->error_str_; > Include the query id for context, which should be accessible via state_? Wi Done http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@1634 PS3, Line 1634: (std:: > Should be able to drop the std:: - we import it into the namespace in commo Done http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@1635 PS3, Line 1635: error_str_.clear(); > I might be confused about the C++11 stuff, but given that you just std::mov Done http://gerrit.cloudera.org:8080/#/c/8233/3/be/src/codegen/llvm-codegen.cc@1635 PS3, Line 1635: error_str_.clear(); > Yup Done -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 16 Oct 2017 22:56:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Hello Philip Zeyliger, anujphadke, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8233 to look at the new patch set (#3). Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors Currently if llvm encounters an error it calls exit() and this kills the impalad process. This patch adds a diagnostic handler that llvm can use to pass the error up to impala instead of crashing it. Testing: Added a test which induces an error in llvm and makes sure that its passed up to impala code and handled correctly. Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/query_test/test_codegen.py 3 files changed, 75 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/8233/3 -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@781 PS2, Line 781: /// Ensures that an attempt to read diagnostic state would reset the object. > Comment is kinda cryptic. Maybe needs some reference to the LLVM Diagnostic Done http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@784 PS2, Line 784: > nit: extra blank line. removed http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@785 PS2, Line 785: DiagnosticHandler() : encountered_error_(false) {} > Let's initialise the member variables inline. Removed http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@787 PS2, Line 787: /// Returns the error string if an error was encountered and resets the state in > Can you give some hint about when a caller should call this? I.e. after an Done http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@789 PS2, Line 789: string > std::string in a header (I guess there's a rogue "using std::string" somewh Done http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@791 PS2, Line 791: /// Handler function that sets the state on an instance of this class > I think the comments could make the relationship between these two function Done http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@796 PS2, Line 796: string error_str_; > Can you briefly document the member variables and the relationship between removed http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@796 PS2, Line 796: string > std::string. Done http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.h@799 PS2, Line 799: /// Maintains state set by diagnostic handler. > Comment is kinda cryptic. It may not be necessary to have both a class and Done http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.cc@1621 PS2, Line 1621: DiagnosticHandler* diagnostic_handler = reinterpret_cast(context); > Can context be a nullptr in any scenarios? We should check for that before I dont think there is any scenario as far as how we are using this handler, will a DCHECK(context != nullptr) suffice instead? http://gerrit.cloudera.org:8080/#/c/8233/2/be/src/codegen/llvm-codegen.cc@1627 PS2, Line 1627: error_msg.flush(); > I feel like we should also log the error at LOG(INFO) level so that it does Done -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 11 Oct 2017 00:05:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Hello Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8196 to look at the new patch set (#7). Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Testing: Added test case to verify that CopyRows in select node is successfully codegened. Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 1 and l_linenumber > 1 and l_comment >'foo0' and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h M tests/query_test/test_codegen.py 7 files changed, 121 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/7 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node-ir.cc File be/src/exec/select-node-ir.cc: http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node-ir.cc@29 PS6, Line 29: while (child_row_idx_ < child_batch_num_rows) > Have you looked into using FOREACH_ROW_LIMIT() ? May have some complication Done http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.h File be/src/exec/select-node.h: http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.h@40 PS6, Line 40: virtual void Codegen(RuntimeState* state); > While you are at it, would you mind adding override to these functions ? Done http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.h@63 PS6, Line 63: codegend_copy_rows_fn_; > Feel free to ignore but I think we have recently switched to initializing f I should do that change for all the member variables. Do you recommend I make that change in this commit? http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.h@69 PS6, Line 69: /// Codegen CopyRows(). > May help to explain that this is mostly codegen'ing the conjuncts evaluatio Done http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc@91 PS6, Line 91: if (ReachedLimit() || (child_row_idx_ == child_row_batch_->num_rows() && child_eos_)) { > This is still a bit wonky (not your change but we should try to leave this Done http://gerrit.cloudera.org:8080/#/c/8196/6/be/src/exec/select-node.cc@107 PS6, Line 107: child_row_batch_->TransferResourceOwnership(row_batch); > Sorry for the additional churn but I think the resource transfer is overly Done -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 10 Oct 2017 21:46:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Hello Philip Zeyliger, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8233 to look at the new patch set (#2). Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors Currently if llvm encounters an error it calls exit() and this kills the impalad process. This patch adds a diagnostic handler that llvm can use to pass the error up to impala instead of crashing it. Testing: Added a test which induces an error in llvm and makes sure that its passed up to impala code and handled correctly. Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/query_test/test_codegen.py 3 files changed, 72 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/8233/2 -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8233 ) Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.h@783 PS1, Line 783: class DiagnosticState { > Based on http://llvm.org/doxygen/DiagnosticHandler_8h_source.html, it looks that DiagnosticHandler was only added recently (20 days ago) and is not available in the llvm version we use. currently the only way is to set a handler callback function. http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@186 PS1, Line 186: >diagnostic_state_, true); > Could you comment on what this third argument does? Thanks for highlighting this. There are some cases where the diagnostic handler will not be called for some remarks based on filters set using commandline flags. I had initially set this to true to receive all diagnostic messages regardless of filters, but since we are only interested in errors at the moment, we can do away with not receiving those remarks. hence I will revert this back to the default value of false. http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@304 PS1, Line 304: if (error || diagnostic_state_.EncounteredError()) { > So do the diagnostic messages get printed anywhere? Done, also removed this condition from the if statement because it seems that error is true (Linker::linkModules returns true) if a diagnostic handler is set and an error is encountered http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@1621 PS1, Line 1621: DiagnosticPrinterRawOStream diagnostic_printer(llvm::errs()); > We talked about this interface offline and whether it was possible to get t Done http://gerrit.cloudera.org:8080/#/c/8233/1/be/src/codegen/llvm-codegen.cc@1622 PS1, Line 1622: info.print(diagnostic_printer); > Does this make it to our logs? Done http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py File tests/query_test/test_codegen.py: http://gerrit.cloudera.org:8080/#/c/8233/1/tests/query_test/test_codegen.py@45 PS1, Line 45: def test_codegen_diagnostic_handler(self, vector): > You could probably test this in be/src/codegen/llvm-codegen-test.cc with co I tried moving this to llvm-codegen-test.cc but it turns out that a lot ugly code needs to be added to induce a linkage error. I had 2 options there: 1) add a Gtest that calles LlvmCodeGen::LoadFunction twice on the same lib, but I need 2 different names to the same file to induce an error because there are checks to see if a file has already been linked. I am not sure I can do hdfs calls to create a copy of a lib through the backend test. 2) Since the methods related to this were private to LlvmCodegen class, I wrote a Test method in LlvmCodeGenTest to load module from a lib twice and call LLVM linker directly to those 2. but it turns out, that this adds alot of ugle code to the class LlvmCodeGenTest. I think another alternative would be to figure out how we can induce another kind of error through a simpler way, but I'll have to look more into it. would really appreciate any suggestions here. Thanks -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 10 Oct 2017 16:56:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Hello Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8196 to look at the new patch set (#6). Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Testing: Added test case to verify that CopyRows in select node is successfully codegened. Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 1 and l_linenumber > 1 and l_comment >'foo0' and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h M tests/query_test/test_codegen.py 7 files changed, 110 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/6 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Hello Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8196 to look at the new patch set (#5). Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Testing: Added test case to verify that CopyRows in select node is successfully codegened. Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 1 and l_linenumber > 1 and l_comment >'foo0' and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h M tests/query_test/test_codegen.py 7 files changed, 111 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/5 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 5 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8196/4/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8196/4/be/src/exec/select-node.cc@113 PS4, Line 113: copy_rows_result; > I think it would still be better to have a name that makes the below contro removed. see below comment. http://gerrit.cloudera.org:8080/#/c/8196/4/be/src/exec/select-node.cc@120 PS4, Line 120: if (copy_rows_result) { > I think there's an equivalent but simpler way of expressing the control flo You are absolutely right. I have refactored the code which now also make the bool returned by CopyRows() redundant. Therefore I've changed CopyRows()'s return type to void. -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 07 Oct 2017 00:53:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8233 Change subject: IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors .. IMPALA-6002: Add a LLVM diagnostic handler for LLVM linker errors Currently if llvm encounters an error it calls exit() and this kills the impalad process. This patch adds a diagnostic handler that llvm can use to pass the error up to impala instead of crashing it. Testing: Added a test which induces an error in llvm and makes sure that its passed up to impala code and handled correctly. Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M tests/query_test/test_codegen.py 3 files changed, 67 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/8233/1 -- To view, visit http://gerrit.cloudera.org:8080/8233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I907ff1f8975c02d422b4a669afbfc2e536ddf1ff Gerrit-Change-Number: 8233 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-5129: Use KRPC's Kinit code to avoid expensive fork
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use KRPC's Kinit code to avoid expensive fork .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 5 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 06 Oct 2017 18:42:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-scanner.h@485 PS6, Line 485: /// Tracks all the memory allocation used by dictionary in DictDecoder. : struct DictMemTrack { : std::unique_ptr mem_tracker; : : DictMemTrack(MemTracker *parent_memtrack):mem_tracker(new MemTracker(-1, : "Decoder parquet dict", parent_memtrack, false)) { }; : : MemTracker* GetMemTracker() { : return mem_tracker.get(); : } : : ~DictMemTrack() { : mem_tracker->CloseAndUnregisterFromParent(); : } : }; I dont think we need to add a new struct for this, you can just call CloseAndUnregisterFromParent() on the dict_mem_tracker in HdfsParquetScanner::Close Similarly for the writer http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@113 PS6, Line 113: // Bytes used for memory tracking by dictionary in DictEncoder is decremented. : virtual void Cleanup() { : if (dict_encoder_base_ != nullptr) { : dict_encoder_base_->Cleanup(); : } : } you can move this to the BaseColumnWriter::Close() method http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/exec/hdfs-parquet-table-writer.cc@767 PS6, Line 767: // Below code will decrement the bytes used for memory tracking by dictionary in : // DictEncoder class for each ColumnWriter. : for (int i = 0; i < columns_.size(); i++) { : if (columns_[i] != nullptr) { :columns_[i]->Cleanup(); : } : } similarly you can move this to the HdfsParquetTableWriter::Close() method http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@138 PS6, Line 138: SizeofDict I think we can use a more precise name here, SizeOfDict might be confused with returning the number of elements in the Dictionary. Something on the lines of DictByteSize? http://gerrit.cloudera.org:8080/#/c/8034/6/be/src/util/dict-encoding.h@438 PS6, Line 438: bytes_alloc += sizeof(value); can you switch to what joe suggested earlier, that is, instead of bytes_alloc, using something like the function SizeofDict() that you wrote to update ConsumeBytes at the end. Refer Joe's first comment. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 6 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Wed, 04 Oct 2017 22:29:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#4). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 17 files changed, 622 insertions(+), 259 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/4 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Hello Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8196 to look at the new patch set (#4). Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Testing: Added test case to verify that CopyRows in select node is successfully codegened. Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 1 and l_linenumber > 1 and l_comment >'foo0' and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h M tests/query_test/test_codegen.py 7 files changed, 111 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/4 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8196/3/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8196/3/be/src/exec/select-node.cc@113 PS3, Line 113: copy_rows_success > (I'm not sure if that description I just gave is accurate or the best way t true! the description is a bit more involved to capture it in a variable name. I think a generic name like "copy_row_result" is good enough and the function description in the header file is sufficient to understand that result. what do u think? http://gerrit.cloudera.org:8080/#/c/8196/3/be/src/exec/select-node.cc@126 PS3, Line 126: if (child_eos_) { > The control flow here is confusing to me. It took me a while to figure out Done http://gerrit.cloudera.org:8080/#/c/8196/3/tests/query_test/test_codegen.py File tests/query_test/test_codegen.py: http://gerrit.cloudera.org:8080/#/c/8196/3/tests/query_test/test_codegen.py@46 PS3, Line 46: codegned > missing an e Done http://gerrit.cloudera.org:8080/#/c/8196/3/tests/query_test/test_codegen.py@51 PS3, Line 51: assert len(exec_options) > This is a matter of taste but generally we prefer being explcit about the c Done -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 04 Oct 2017 00:46:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 7: Code-Review+2 Rebase to get the fix for IMPALA-6009. carrying over +2 -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Tue, 03 Oct 2017 23:10:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8196/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8196/1//COMMIT_MSG@8 PS1, Line 8: > is there anyway we can make sure through a test that codegen was successful as discussed, I have added the test to check the query profile -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 03 Oct 2017 22:54:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Hello Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8196 to look at the new patch set (#3). Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Testing: Added test case to verify that CopyRows in select node is successfully codegened. Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 1 and l_linenumber > 1 and l_comment >'foo0' and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h M tests/query_test/test_codegen.py 7 files changed, 110 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/3 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5628: (Draft) Add support for reading additional Parquet Decimal Types
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-5628: (Draft) Add support for reading additional Parquet Decimal Types .. Patch Set 3: > What's the next steps here? Is this blocked on something? will be submitting a full fledged patch tomorrow that adds support only for Binary (IMPALA-2494) since adding support for all will make this single patch quite huge. will add support for others in separate patches after this gets in. -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 03 Oct 2017 22:48:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Hello Michael Ho, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8196 to look at the new patch set (#2). Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Performance: Queries used (num_nodes set to 1): 500 Predicates: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 and l_extendedprice > 1 and l_linenumber > 1 and l_comment >'foo0' and l_comment >'foo500' order by l_orderkey limit 10; 1 Predicate: select * from (select * from tpch_parquet.lineitem limit 6001215) t1 where l_partkey > 10 order by l_orderkey limit 10; +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h 6 files changed, 99 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/2 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 1: (11 comments) http://gerrit.cloudera.org:8080/#/c/8196/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8196/1//COMMIT_MSG@8 PS1, Line 8: > Can you add a testing section? It would be good to mention what test covera is there anyway we can make sure through a test that codegen was successful for this node? http://gerrit.cloudera.org:8080/#/c/8196/1//COMMIT_MSG@9 PS1, Line 9: Preformance > Performance Done http://gerrit.cloudera.org:8080/#/c/8196/1//COMMIT_MSG@10 PS1, Line 10: perdicates > predicates Done http://gerrit.cloudera.org:8080/#/c/8196/1//COMMIT_MSG@11 PS1, Line 11: [column_name > value AND] > Can you include a concrete example of the query to make it easier for other Done http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node-ir.cc File be/src/exec/select-node-ir.cc: http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node-ir.cc@29 PS1, Line 29: child_row_batch_->num_rows() > This probably doesn't matter, but you could save a few instructions by hois Done http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node-ir.cc@41 PS1, Line 41: COUNTER_SET(rows_returned_counter_, num_rows_returned_); > COUNTER_SET can be pretty expensive because of the atomic operation. We can Done http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node.cc File be/src/exec/select-node.cc: http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node.cc@55 PS1, Line 55: NULL > Use nullptr in new code (here and below). Done http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node.cc@65 PS1, Line 65: if (codegen_status.ok()) { > Instead of this branch the control flow may be easier to follow if you fact Done http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node.cc@71 PS1, Line 71: DCHECK(copy_rows_fn != NULL); > We should probably check if this is NULL and fail codegen if so. I'm not su Done http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node.cc@99 PS1, Line 99: bool copy_rows_success = false; > Just declare it inside the loop where it's needed? When I see it declared u Done http://gerrit.cloudera.org:8080/#/c/8196/1/be/src/exec/select-node.cc@114 PS1, Line 114: copy_rows_success = false; > It's assigned on both branches so this isn't necessary. Done -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 03 Oct 2017 20:52:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h@412 PS3, Line 412: ConsumeBytes(sizeof(value)); > For strings, the StringValue is a pointer and a length. In the case of the yes I agree -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Tue, 03 Oct 2017 18:59:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8196 Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. IMPALA-4236: Codegen CopyRows() for select nodes Preformance: Query run on tpch_parquet.lineitem with perdicates of the form [column_name > value AND] +--+-+ | | 500 Predicates | 1 Predicate| | ++-++-+ | | After| Before| After| Before| +--++-++-+ | Select Node | 12s385ms | 1m1s| 234ms | 797ms | | Codegen time | 2s619ms| 1s962ms | 200ms | 181ms | +--++-++-+ Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/CMakeLists.txt A be/src/exec/select-node-ir.cc M be/src/exec/select-node.cc M be/src/exec/select-node.h 6 files changed, 96 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/8196/1 -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-3504: [DOCS] Document utc timestamp()
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8190 ) Change subject: IMPALA-3504: [DOCS] Document utc_timestamp() .. Patch Set 1: all these seem to be added in the may-july 2017 time-frame utc_to_unix_micros() was added as a part of IMPALA-5137 in may 2017 timestamp_from_unix_micros() added in IMPALA-5338 (may 2017) was renamed to unix_micros_to_utc_timestamp() in IMPALA-5539 (july 2017) -- To view, visit http://gerrit.cloudera.org:8080/8190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia2662fc79d588f22a24a5067429a57b3c0d0f0f0 Gerrit-Change-Number: 8190 Gerrit-PatchSet: 1 Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Greg Rahn Gerrit-Comment-Date: Mon, 02 Oct 2017 18:02:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 5: Code-Review+2 rebased, carried +2 -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 5 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 02 Oct 2017 17:16:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 4: Code-Review+2 rebased, carried +2 -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 4 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Fri, 29 Sep 2017 22:23:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8168 to look at the new patch set (#3). Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. IMPALA-4951: Fix database visibility for user with only column privilege Currently a database is not visible to a user that only has column level privileges for tables in that database. This patch will make the database visible, which is the expected behavior in this case. Testing: added a test case to verify the same. Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a --- M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test 2 files changed, 73 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/8168/3 -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@744 PS2, Line 744: revoke role grant_revoke_test_ROOT from group root > Is this needed for your test? Done http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@750 PS2, Line 750: QUERY : show current roles : RESULTS: VERIFY_IS_SUBSET : 'grant_revoke_test_ALL_SERVER' : TYPES : STRING : > No need for that. Let's remove to make the test a bit smaller. Done http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@808 PS2, Line 808: # IMPALA-4951: make sure database is visible to a user having only column level access : # to a table in the database > I think it may be best to move this comment at the beginning of your change Done -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Fri, 29 Sep 2017 17:20:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-encoding.h@412 PS3, Line 412: ConsumeBytes(sizeof(value)); > The parquet DictionaryPageHeader contains a num_values field. Look where we using num_values * size of type might not work when dealing with variable sized type like string. I would recommend keeping count of the bytes used in a local variable and then do a ConsumeBytes when we exit the loop. This is because mtrackp loops on all its parent trackers to update mem usage every time Consume(num_bytes) is called. Not a huge optimization but I think it might be worth avoiding another loop inside the hot path. http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-test.cc File be/src/util/dict-test.cc: http://gerrit.cloudera.org:8080/#/c/8034/3/be/src/util/dict-test.cc@39 PS3, Line 39: tracker can you add test cases that verify that the encoder/decoder is keeping track correctly. You can do this by using tracker.consumption() to get the num of bytes consumed and compare it to the expected size you calculate separately. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 3 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Comment-Date: Fri, 29 Sep 2017 01:16:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8168 to look at the new patch set (#2). Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. IMPALA-4951: Fix database visibility for user with only column privilege Currently a database is not visible to a user that only has column level privileges for tables in that database. This patch will make the database visible, which is the expected behavior in this case. Testing: added a test case to verify the same. Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a --- M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test 2 files changed, 90 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/8168/2 -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@786 PS1, Line 786: # IMPALA-4951: make sure database is visible to a user having only column level access : # to a table in the database : show databases : RESULTS : 'default','Default Hive database' : 'grant_rev_db','' : TYPES : STRING,STRING : > I think the test for this jira should work as follows: Done -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 28 Sep 2017 23:25:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5129: Use KRPC's Kinit code to avoid expensive fork
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use KRPC's Kinit code to avoid expensive fork .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7938/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7938/4//COMMIT_MSG@21 PS4, Line 21: Converted existing tests in thrift-server-test to run with and : without kerberos. we can also add tests that run with and without FLAGS_use_krpc_kinit set to make sure both paths work and also make sure that KRPC's Kinit was successful. http://gerrit.cloudera.org:8080/#/c/7938/4/be/src/rpc/thrift-server-test.cc File be/src/rpc/thrift-server-test.cc: http://gerrit.cloudera.org:8080/#/c/7938/4/be/src/rpc/thrift-server-test.cc@129 PS4, Line 129: ASSERT_TRUE(kudu::Env::Default()->GetExecutablePath(_executable_path).ok()) since this assert is done a bunch of times here and would probably be done more in the future as we have added the KPRC code, would it make sense to add a macro for this (for kudu:status object) like we have ASSERT_OK for the impala:status object? -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 4 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 28 Sep 2017 21:38:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@761 PS1, Line 761: root > Why are you using root and not $GROUP_NAME for this test? I wanted to use a different user for this test, otherwise I would have to grant $GROUP_NAME a grant_revoke_test_ALL_SERVER role to create db and assign priv, then revoke grant_revoke_test_ALL_SERVER to do the actual test, and lastly grant grant_revoke_test_ALL_SERVER again to delete the database. I used root as I was sure that root user would always exist on a system. is there any other user/group that I can use? or should I fall back on using $GROUP_NAME throughout? -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 28 Sep 2017 20:55:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8168 Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. IMPALA-4951: Fix database visibility for user with only column privilege Currently a database is not visible to a user that only has column level privileges for tables in that database. This patch will make the database visible, which is the expected behavior in this case. Testing: added a test case to verify the same. Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a --- M fe/src/main/java/org/apache/impala/service/Frontend.java M testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test 2 files changed, 61 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/8168/1 -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-5812: Fix NPE when joining on empty const select
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5812: Fix NPE when joining on empty const select .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7971/1//COMMIT_MSG Commit Message: Line 7: IMPALA-5812: Fix NPE when joining on empty const select > Suggest: Done http://gerrit.cloudera.org:8080/#/c/7971/1/testdata/workloads/functional-planner/queries/PlannerTest/joins.test File testdata/workloads/functional-planner/queries/PlannerTest/joins.test: Line 2524 > Please move this into empty.test which tests various cases with EmptySetNod Done -- To view, visit http://gerrit.cloudera.org:8080/7971 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e425dbcb442aeeac687e103774823d3f50e6436 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5812: Fix NPE when joining on empty const select
Bikramjeet Vig has uploaded a new patch set (#2). Change subject: IMPALA-5812: Fix NPE when joining on empty const select .. IMPALA-5812: Fix NPE when joining on empty const select A NPE is thrown during the creation of the single node plan of a query consisting of a cross join with a constant select that returns an empty result set. This happens because when an empty-set plan node is created, its tupleIds_ and tblRefIds_ are initialized with the tuple ID of a newly create tuple that does not map to any existing tableRefs. This causes a null pre-check to fail during the creation of the join node when it tries to fetch the tableRef from that new tuple Id in the empty-set node but doesn't find one. Testing: Added a planner test. Change-Id: I6e425dbcb442aeeac687e103774823d3f50e6436 --- M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/empty.test 2 files changed, 19 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/7971/2 -- To view, visit http://gerrit.cloudera.org:8080/7971 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e425dbcb442aeeac687e103774823d3f50e6436 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5812: NPE when x joining on const select having an empty result
Bikramjeet Vig has uploaded a new change for review. http://gerrit.cloudera.org:8080/7971 Change subject: IMPALA-5812: NPE when x joining on const select having an empty result .. IMPALA-5812: NPE when x joining on const select having an empty result A NPE is thrown during the creation of the single node plan of a query consisting of a cross join with a constant select that returns an empty result set. This happens because when an empty-set plan node is created, its tupleIds_ and tblRefIds_ are initialized with the tuple ID of a newly create tuple that does not map to any existing tableRefs. This causes a null pre-check to fail during the creation of the join node when it tries to fetch the tableRef from that new tuple Id in the empty-set node but doesn't find one. Testing: Added a planner test. Change-Id: I6e425dbcb442aeeac687e103774823d3f50e6436 --- M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M testdata/workloads/functional-planner/queries/PlannerTest/joins.test 2 files changed, 19 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/7971/1 -- To view, visit http://gerrit.cloudera.org:8080/7971 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6e425dbcb442aeeac687e103774823d3f50e6436 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-5628: (Draft) Add support for reading additional Parquet Decimal Types
Bikramjeet Vig has uploaded a new patch set (#3). Change subject: IMPALA-5628: (Draft) Add support for reading additional Parquet Decimal Types .. IMPALA-5628: (Draft) Add support for reading additional Parquet Decimal Types Added support for reading Parquet decimal Types encoded in : 1) Int32 2) Int64 3) Binary (Variable size byte array) A few points to look at: 1) the test table I generated to read INT64 encoded decimals shows null values for the whole row if a decimal column exists with precision less than or eqaul to 9 and encoded in Int64. If I alter table and change it to 10 or above, it displays the data correctly. One can infer that there is some issue with impala doing some internal conversion after reading it from the parquet reader, since a precision less than 10 should be stored using a Decimal4 (stores value in int32) but it gets a Decimal8 from the parquet reader. Any idea on what might be going wrong? 2) Currently the test for ParquetPlainEncoder contains templatized test methods that encode and decode a passed value. Since we had encode and decode methods for all types that we supported in both writer and reader, this was fine. now to support the types added in this commit, where only the reader supports it (hence only decode method exists), I would either have to write a completely new test method or specialize the existing test method for every permutation of. In any case a lot of new code will be added, which option do you think is better? 3) Waiting on confirmation that column stats should ne stored with the exact same format (same as the column). If we do get a confirmation that this is the correct. I would have to pass a SchemaElement object of that column to ColumnStatsBase::ReadFromThrift for handeling the case where the type is . We would be able to use the type_length attribute to use the correctly sized DecimalXValue in DecodePlainValue for reading the stat. 4) Will perf regression test be good enough or do you think we need to do perf tests for the newly added types too? Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 17 files changed, 550 insertions(+), 244 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/3 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Rough draft for IMPALA-5628
Bikramjeet Vig has posted comments on this change. Change subject: Rough draft for IMPALA-5628 .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 1254: // where to put that check? refer dcheck added below > This could possibly go with the other metadata checks, after initializing t makes sense, will do http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-column-stats.inline.h File be/src/exec/parquet-column-stats.inline.h: Line 89: switch(parquet_type){ > This switch on the parquet type looks like it may fit better in the Parquet Do you mean to have a Decode wrapper around the templatized Decode methods? or just keep the current single templated Decode and switch on parquet_type inside each specialized Decode? http://gerrit.cloudera.org:8080/#/c/7822/2/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: PS2, Line 237: ByteSize > These calls could now be removed and replaced with the bytesize derived fro Do you mean to have something like - template int ParquetPlainEncoder::ByteSize() , and then use overloading to select which type is passed, not sure how much perf impact will there be for overloading here OR - template int ParquetPlainEncoder::ByteSize() Line 338: template <> > Can this be deduplicated? thats kinda difficult because DecimalUtil::DecodeFromFixedLenByteArray is also templated and extracting the code before that into a common inlined function is not worth it because we need the 3 variables defined there in our call to DecodeFromFixedLenByteArray and to return, so would have pass 3 pointers to that method (and probably call it init or something like that) which might make the code ugly -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Rough draft for IMPALA-5628
Bikramjeet Vig has uploaded a new change for review. http://gerrit.cloudera.org:8080/7822 Change subject: Rough draft for IMPALA-5628 .. Rough draft for IMPALA-5628 Request for general comments on approch. Many changes pending, will keep updating. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-plain-test.cc M be/src/runtime/types.cc M be/src/runtime/types.h M be/src/util/dict-encoding.h 12 files changed, 255 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/2 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Alex Behm, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#11). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 131 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/11 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7560/10/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: PS10, Line 170: table :* set, but no ot > This isn't true, for HDFS tables we set the partition map (to the 'default Done -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5784: Separate planner and user set query options in profile
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5784: Separate planner and user set query options in profile .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7721/3/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: PS3, Line 149: manually > Seems okay. Or slight tweak: Done -- To view, visit http://gerrit.cloudera.org:8080/7721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5784: Separate planner and user set query options in profile
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7721 to look at the new patch set (#4). Change subject: IMPALA-5784: Separate planner and user set query options in profile .. IMPALA-5784: Separate planner and user set query options in profile This separation will help the user better understand the query runtime profile. Testing: Modified an existing test case. Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876 --- M be/src/service/client-request-state.cc M tests/custom_cluster/test_admission_controller.py M tests/query_test/test_observability.py 3 files changed, 13 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/7721/4 -- To view, visit http://gerrit.cloudera.org:8080/7721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#10). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 129 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/10 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 8: (9 comments) http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS8, Line 444: public void loadSchemaFromKudu() throws ImpalaRuntimeException > Move this above loadSchema() so we have these functions clustered. Done http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java: Line 513: public boolean hasStorageLayerConjuncts() { > single line? (and elsewhere) Done http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: Line 526: public boolean hasStorageLayerConjuncts() { > single line? Done http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 217:* Returns true if the node contains any conjuncts to be processed by Impala > Returns true if this node has conjuncts to be evaluated by Impala against t Done Line 224:* Returns true if the node contains any conjuncts pushed to the underlying storage > Returns true if this node has conjuncts to be evaluated by the underlying s Done Line 228: // Derived classes must override this method if they have any pushed conjuncts > don't think we need this, then single line Done http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java: Line 169:* Add a new dummy HDFS or external Kudu table to the catalog based on the given CREATE > Remove the "HDFS or external Kudu" part here. You already explain the limit oops, forgot to remove this. Line 199: }else if (dummyTable instanceof KuduTable) { > space after "}" Done PS8, Line 200: if (!Table.isExternalTable(msTbl)) { : fail("Failed to add table, external kudu table expected:\n" + createTableSql); : } > My concern with this approach is that this now does depend on Kudu, and I'm This wont work for managed kudu tables because this method requires we only create a dummy table in the catalog and no actual table is created in kudu. This breaks planner code because when it creates a kudu scan node, the init method in the scan node connects to kudu client and checks if the table exists or not then throws an exception if it doesn't. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#9). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 129 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/9 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7560/8/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: PS8, Line 444: public void loadSchemaFromKudu() throws ImpalaRuntimeException Extracted relevant code from KuduTable.load() for loading metadata from kudu into a separate public method and using that in FrontendTestBase.addTestTable() to create a temp table in catalog. Please let me know if this change is acceptable for adding support for kudu tables in FrontendTestBase.addTestTable(). If not, I will revert back to the previous patch set. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#8). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: Added frontend planner tests. Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 10 files changed, 141 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/8 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5784 : Separate planner and user set query options in profile
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7721 to look at the new patch set (#2). Change subject: IMPALA-5784 : Separate planner and user set query options in profile .. IMPALA-5784 : Separate planner and user set query options in profile This separation will help the user better understand the query runtime profile. Testing: Modified an existing test case. Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876 --- M be/src/service/client-request-state.cc M tests/custom_cluster/test_admission_controller.py M tests/query_test/test_observability.py 3 files changed, 14 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/7721/2 -- To view, visit http://gerrit.cloudera.org:8080/7721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5784 : Separate planner and user set query options in profile
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5784 : Separate planner and user set query options in profile .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7721/1/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: Line 151: summary_profile_.AddInfoString("Query Options (non default, after planning)", > I'd change 'before planning' and 'after planning' to 'manually set' and 'pl Done -- To view, visit http://gerrit.cloudera.org:8080/7721 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibfc7832963fa0bd278a45c06a5a54e1bf40d8876 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#7). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: For (A): Added a frontend planner test for kudu tables For (B): Added an end to end planner test for kudu and a query test for datasource tables Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M tests/query_test/test_kudu.py 8 files changed, 92 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/7 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 219: // Derived classes should override this method if they have any pushed conjuncts > Fair point. To me it's not confusing, but I'll defer to MJ. Instead of addi As discussed, using hasScanConjuncts to denote conjuncts being used by impala, and hasStorageLayerConjuncts to denote those pushed down to the storage layer. http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java File fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java: Line 57: valid_ = false; > MJ, does this change even make sense? Suppose we have a query with a limit as discussed, this code is correct as we need multiple impala instances to make kudu scans faster http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: Line 132: QUERY > I don't think that's true. This is what I get: Done http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used if table stats are > I understand. My question is whether you have tried extending addTestTable( Still working on a way to make this work. What I have found till now: addTestTable must be able to create a table only in the catalog and not in the actual storage layer. The problem is that when a kuduScan node is created in the planning phase it connects to the kudu client to make sure that the table exists. Now to get around this, if we create an external table, we will still have to copy code out of KuduTable.load() in order to load the columns from the kudu client. KuduTable.load() cannot be used directly as it requires a handle to HMS instance. Either this, or we refactor the code in KuduTable.load() to extract the required code into a separate method and call that. -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs
Hello Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7714 to look at the new patch set (#4). Change subject: IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs .. IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs Fixed a bug where impala crashes during execution of an aggregation query using nondeterministic grouping expressions. This happens when it tries to rebuild a spilled partition that can fit in memory and rows get re-hashed to a partition other than the spilled one due to the use of nondeterministic expressions. Testing: Added a query test to verify successful execution. Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 --- M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M testdata/workloads/functional-query/queries/QueryTest/spilling.test 3 files changed, 42 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7714/4 -- To view, visit http://gerrit.cloudera.org:8080/7714 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7714/3/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1160: // partition index. > Repartitioning should work ok in that each repartitioning step will reduce Just to add to this, we would also have to make the same effort for hash join node to stay consistent -- To view, visit http://gerrit.cloudera.org:8080/7714 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/7714/3/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1156: > nit: to get more code on the screen, i think we can do without blank lines will do in the next patch set. Thanks Line 1160: // partition index. > in the case where we do repartition (and prior to IMPALA-2708), what happen Yes, in case of repartition, the rows do get shuffled around in each iteration. I think the non-deterministic nature of the group by exprs justify this behavior. Although it makes it harder to be specific about what the meaningful way is. I discussed this with Tim, and he pointed out that it might become prohibitive to figure out if the query is non-deterministic, specially in case a non-deterministic UDF is used as a group by expr. We can also put a check to identify if the rows hash to a different partition on each iteration, but that would add code to the hot path and doesn't seem worth it. For the last part of your question, I am not aware of any queries that do this, but this special case might very well exist, if not intentionally, in someone's workload. This way we ensure it continues to behave the same. - Request Tim to pitch in if I missed something. Thanks http://gerrit.cloudera.org:8080/#/c/7714/3/be/src/exec/partitioned-aggregation-node.h File be/src/exec/partitioned-aggregation-node.h: PS3, Line 338: all pointers in this vector will point to a single : /// in-memory partition > outside of the context of this bug fix, it's hard to see why things would b You are absolutely right. This is very specific to our bug fix. would it make sense if I add the Jira ID in the comment? This would be able to provide more context in case some one trying to understand this part needs it. -- To view, visit http://gerrit.cloudera.org:8080/7714 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java File fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java: Line 367: public boolean hasPushedConjuncts() { return !acceptedConjuncts_.isEmpty(); } > Do we have the same issue with 'filters_' in the HBaseScanNode? No, for HBaseScanNode, if filters are created they are not removed from the conjuncts member variable http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/main/java/org/apache/impala/planner/ScanNode.java File fe/src/main/java/org/apache/impala/planner/ScanNode.java: Line 219: public boolean hasPushedConjuncts() { > It seems strange and unnecessary to distinguish the "pushed" from "non-push Makes sense, but as Matt pointed out earlier in a discussion that having similar names like hasConjuncts() or hasScanConjuncts() might make the interface to this class confusing while comparing hasConjuncts() and getConjuncts().isEmpty(). Would it make more sense to have a more specific name like he suggested earlier: getEffectiveScanConjuncts() ? http://gerrit.cloudera.org:8080/#/c/7560/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 317: TQueryOptions options = defaultQueryOptions(); > What do we need this change for? This adds a lot expected test output. Done http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 774: DISTRIBUTEDPLAN > We don't need explain level VERBOSE to test this. Done http://gerrit.cloudera.org:8080/#/c/7560/5/testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test File testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test: Line 132: QUERY > a PlannerTest is more suitable This code change has no effect on the plan created since for a data source a single node is used in the plan regardless of other optimizations. So, to make sure that small query optimization is not used, I had to check the runtime profile instead and make sure query options set by small query optimization dont appear http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 1010: """IMPALA-5602: Test that 'small query' optimization is not used if table stats are > Have you tried to see if we can modify FrontendTestBase.addTestTable() to Unfortunately FrontendTestBase.addTestTable() can only be used to add hdfs tables. http://gerrit.cloudera.org:8080/#/c/7560/5/tests/query_test/test_queries.py File tests/query_test/test_queries.py: Line 164: # Reset this exec_option to check default behaviour of any planner optimization tests > Why? This way we lose coverage of the other case. My understanding is that Done -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5602: Fix query optimization for kudu and datasource tables
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7560 to look at the new patch set (#6). Change subject: IMPALA-5602: Fix query optimization for kudu and datasource tables .. IMPALA-5602: Fix query optimization for kudu and datasource tables Fix a bug where the following queries on kudu and datasource tables were incorrectly being optimized as a 'small query' and therefore running on a single node with a single scanner thread: (A) that have all their predicates pushed to the underlying storage layer and have a limit (B) table stats missing + Conditions in (A) Testing: For (A): Added a frontend planner test for kudu tables For (B): Added an end to end planner test for kudu and a query test for datasource tables Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 --- M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/util/MaxRowsProcessedVisitor.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test M tests/query_test/test_kudu.py 7 files changed, 70 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/7560/6 -- To view, visit http://gerrit.cloudera.org:8080/7560 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93822d67ebda41d5d0456095c429e3915a3f40c4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs
Bikramjeet Vig has uploaded a new patch set (#3). Change subject: IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs .. IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs Fixed a bug where impala crashes during execution of an aggregation query using nondeterministic grouping expressions. This happens when it tries to rebuild a spilled partition that can fit in memory and rows get re-hashed to a partition other than the spilled one due to the use of nondeterministic expressions. Testing: Added a query test to verify successful execution. Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 --- M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M testdata/workloads/functional-query/queries/QueryTest/spilling.test 3 files changed, 43 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7714/3 -- To view, visit http://gerrit.cloudera.org:8080/7714 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs
Bikramjeet Vig has posted comments on this change. Change subject: IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/7714/2/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1160: // partition index > nit: period at end of sentence for consistency. Done PS2, Line 1163: hashTable > hash_table. Although I don't think we really need this temporary variable i Done Line 1407: // We might be dealing with a rebuilt spilled partition, where all partitions are > Maybe add something to the start of this to explain the broader purpose (ot Done Line 1409: // with it as well > nit: end with period for consistency. Done Line 1426: // right partition > nit: end with period for consistency. Done PS2, Line 1487: AggFnEvaluator::FreeLocalAllocations(partition->agg_fn_evals); Just want to make sure: 1. Will this method do the wrong thing if it runs multiple times on the same partition for our rebuilt spilled partition case? 2. Also, AggFnEvaluator::FreeLocalAllocations executes a lot of code, so will a check like "if(i != partition->idx) continue;" to avoid rerunning it on the same partition help with perf? http://gerrit.cloudera.org:8080/#/c/7714/2/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: Line 352: group by 1,2,3,4,5, random() > nit: add spaces after commas for consistency (this was from my original slo Done Line 354: RUNTIME_PROFILE > It would be nice to have a RESULTS section to verify that the query isn't r Exactly. There is no way to check for consistent results, so I just put a check to see if the query runs to completion. Also, is there any other profile stat you think I should add to the check? -- To view, visit http://gerrit.cloudera.org:8080/7714 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs
Bikramjeet Vig has uploaded a new patch set (#2). Change subject: IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs .. IMPALA-5788: Fix agg node crash when grouping by nondeterministic exprs Fixed a bug where impala crashes during execution of an aggregation query using nondeterministic grouping expressions. This happens when it tries to rebuild a spilled partition that can fit in memory and rows get re-hashed to a partition other than the spilled one due to the use of nondeterministic expressions. Testing: Added a query test to verify successful execution. Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 --- M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-aggregation-node.h M testdata/workloads/functional-query/queries/QueryTest/spilling.test 3 files changed, 43 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/7714/2 -- To view, visit http://gerrit.cloudera.org:8080/7714 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibdb09239577b3f0a19d710b0d148e882b0b73e23 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong