[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8146 ) Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/8146/13/be/src/runtime/tuple.h File be/src/runtime/tuple.h: http://gerrit.cloudera.org:8080/#/c/8146/13/be/src/runtime/tuple.h@175 PS13, Line 175: /// Materialize the var-len string data in this tuple into the provided memory : /// pool. > the name and the comment didn't make it clear that this also rewrites this Renamed to CopyStrings(). Agree it's more accurate. -- To view, visit http://gerrit.cloudera.org:8080/8146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233 Gerrit-Change-Number: 8146 Gerrit-PatchSet: 13 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 05:22:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro
Hello Michael Ho, Thomas Tauber-Marshall, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8146 to look at the new patch set (#14). Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro .. IMPALA-5307: Part 2: copy out strings in uncompressed Avro The approach is to re-materialize strings in those tuples that survive conjunct evaluation and may reference disk I/O buffers directly. This means that perf should not regress for the following cases: * Compressed Avro files. * Non-string columns. * Selective scans where the majority of tuples are filtered out. This approach will also work for the Sequence and Text scanners. Includes some improvements to Avro codegen to replace more constants to help win back some performance (with limited success): replaced InitTuple() with an optimised version and substituted tuple_byte_size() with a constant. Removes dead code for handling CHAR(n) - CHAR(n) is now always fixed length. Perf: Did microbenchmarks on uncompressed Avro files, one with all columns from lineitem and one with only l_comment. Tests were run with: set num_scanner_threads=1; I ran the query 5 times and extracted MaterializeTupleTime from the profile to measure CPU cost of materialization. Overall string materialization got significantly slower, mainly because of the extra memcpy() calls required. Selecting one string from a table with multiple columns: select min(l_comment) from biglineitem_avro 1.814 -> 2.096 Selecting one string from a table with one column: select min(l_comment) from biglineitem_comment; profile; 1.708 -> 3.7 Selecting one string from a table with one column with predicate: select min(l_comment) from biglineitem_comment where length(l_comment) > 1; 1.691 -> 1.449 Selecting all columns: select min(l_orderkey), min(l_partkey), min(l_suppkey), min(l_linenumber), min(l_quantity), min(l_extendedprice), min(l_discount), min(l_tax), min(l_returnflag), min(l_linestatus), min(l_shipdate), min(l_commitdate), min(l_receiptdate), min(l_shipinstruct), min(l_shipmode), min(l_comment) from biglineitem_avro; profile; 2.335 -> 3.711 Selecting an int column (no strings): select min(l_linenumber) from biglineitem_avro 1.806 -> 1.819 Testing: Ran exhaustive tests. Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233 --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/common/status.cc M be/src/common/status.h M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/runtime/CMakeLists.txt M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h A be/src/runtime/tuple-ir.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h 19 files changed, 337 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8146/14 -- To view, visit http://gerrit.cloudera.org:8080/8146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233 Gerrit-Change-Number: 8146 Gerrit-PatchSet: 14 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Hello Sailesh Mukil, Joe McDonnell, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8303 to look at the new patch set (#13). Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. IMPALA-1575: Part 1: eagerly release query exec resources Release of backend resources for query execution on the coordinator daemon should occur eagerly as soon as query execution is finished or cancelled. Before this patch some resources managed by QueryState, like scratch files, were only released when the query was closed and all of the query's control structures were torn down. These resources are referred to as "ExecResources" in various places to distinguish them from resources associated with the client request (like the result cache) that are still required after the query finishes executing. This first change does not solve the admission control problem for two reasons: * We don't release the "admitted" memory on the coordinator until the query is unregistered. * Admission control still considers the memory reserved until the query memtracker is unregistered, which happens only when the QueryState is destroyed: see MemTracker::GetPoolMemReserved(). The flow is mostly similar to initial_reservation_refcnt_, except the coordinator also holds onto a reference count, which is released when either the final row is returned or cancellation is initiated. After the coordinator releases its refcount, the resources can be freed as soon as local fragments also release their refcounts. Also clean up Coordinator slightly by preventing runtime_state() from leaking out to ClientRequestState - instead it's possible to log the query MemTracker by following parent links in the MemTracker. This patch is partially based on Joe McDonnell's IMPALA-1575 patch. Testing: Ran core tests. Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 --- M be/src/common/status.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-state.cc M be/src/runtime/test-env.cc M be/src/service/client-request-state.cc 11 files changed, 154 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8303/13 -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 13 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 11: (8 comments) http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc@328 PS11, Line 328: RuntimeState* state > definitely not this change, but should we just get rid of this parameter. d I think the status propagation is a lot more reliable that it used to be. One potential advantage of logging it is if we hit the memory limit in multiple places around the same time we would get more diagnostics in the log instead of whichever error won the relate. Unclear if that's useful or important. We could also consider making the memtracker dump in the status more concise - it's pretty verbose now. http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc@359 PS11, Line 359: was not available > does that comment still make sense? isn't the query_tracker==nullptr case n We do call it on the process MemTracker in QueryState::Init(). http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@205 PS11, Line 205: backend > what do we mean by "backend" here? is that the same thing as "execution"? Yeah I don't think that word added anything. Removed. http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@206 PS11, Line 206: Resources for this query > How about: Query wide resources ... Done http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@258 PS11, Line 258: backend > same question Done http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc@95 PS11, Line 95: if (!released_exec_resources_) ReleaseExecResources(); > we've been trying to move away from doing resource releasing from destructo My initial concern was that handling it in Init() would get messy in the case when the coordinator was already holding a resource refcount. I came up with a reasonable solution - acquiring the refcount always and then releasing it on the error path. http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc@122 PS11, Line 122: TExecQueryFInstancesParams& non_const_params = I needed to remove this declaration because of the goto crossing the declaration. http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/runtime-state.cc@88 PS11, Line 88: local_query_state_->AcquireExecResourceRefcount(); // Decremented in ReleaseResources(). > why is that needed? what resource needs to be held by the runtime state in The resources used for evaluating exprs, etc - this is used in various test environments and in fe-support. We could also expose QueryState::ReleaseExecResources() and call it directly, but it seemed better to use the same API here as in the main query execution flow. -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 11 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 04:39:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Hello Sailesh Mukil, Joe McDonnell, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8303 to look at the new patch set (#12). Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. IMPALA-1575: Part 1: eagerly release query exec resources Release of backend resources for query execution on the coordinator daemon should occur eagerly as soon as query execution is finished or cancelled. Before this patch some resources managed by QueryState, like scratch files, were only released when the query was closed and all of the query's control structures were torn down. These resources are referred to as "ExecResources" in various places to distinguish them from resources associated with the client request (like the result cache) that are still required after the query finishes executing. This first change does not solve the admission control problem for two reasons: * We don't release the "admitted" memory on the coordinator until the query is unregistered. * Admission control still considers the memory reserved until the query memtracker is unregistered, which happens only when the QueryState is destroyed: see MemTracker::GetPoolMemReserved(). The flow is mostly similar to initial_reservation_refcnt_, except the coordinator also holds onto a reference count, which is released when either the final row is returned or cancellation is initiated. After the coordinator releases its refcount, the resources can be freed as soon as local fragments also release their refcounts. Also clean up Coordinator slightly by preventing runtime_state() from leaking out to ClientRequestState - instead it's possible to log the query MemTracker by following parent links in the MemTracker. This patch is partially based on Joe McDonnell's IMPALA-1575 patch. Testing: Ran core tests. Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 --- M be/src/common/status.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-state.cc M be/src/runtime/test-env.cc M be/src/service/client-request-state.cc 11 files changed, 150 insertions(+), 97 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8303/12 -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 12 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8404 ) Change subject: IMPALA-5017: Error on decimal overflow .. Patch Set 1: Ooops, canceled -- To view, visit http://gerrit.cloudera.org:8080/8404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Gerrit-Change-Number: 8404 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 27 Oct 2017 03:32:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8404 ) Change subject: IMPALA-5017: Error on decimal overflow .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1402/ -- To view, visit http://gerrit.cloudera.org:8080/8404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Gerrit-Change-Number: 8404 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 27 Oct 2017 03:32:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1400/ -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 5 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 02:44:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5017: Error on decimal overflow
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8404 Change subject: IMPALA-5017: Error on decimal overflow .. IMPALA-5017: Error on decimal overflow Before this patch, decimal operations would either silently overflow (in the case of sum() and avg()), or produce a warning. In this patch, the beharior is changed so that an error is produced in the case of overflow when DECIMAL_v2 is enabled. Decimal v1 behavior is unchanged. We introduce overflow checks when computing sum() and avg(). This results in a ~30% performance regression when we are in decimal v2 mode compared to decimal v1. Benchmarks: Query: select sum(dec_38_19) from decimal_tbl Decimal v1: 13.09s Decimal v2: 17.10s Query: select avg(dec_38_19) from decimal_tbl Decimal v1: 13.60s Decimal v2: 19.10s Testing: - Added several end to end tests. - Updated Expr tests to check for error in case of overflow. Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/expr-codegen-test.cc M be/src/exprs/expr-test.cc M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test M testdata/workloads/functional-query/queries/QueryTest/decimal.test 6 files changed, 186 insertions(+), 69 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/8404/1 -- To view, visit http://gerrit.cloudera.org:8080/8404 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id98a92c9a9469ec8cf14e518c741a2dab7053019 Gerrit-Change-Number: 8404 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Hello Lars Volker, Matthew Jacobs, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7822 to look at the new patch set (#7). Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. IMPALA-2494: Support for byte array encoded decimals in Parquet scanner Extendes parquet column reader and associated classes to allow for more than one possible physical type for a given logical type. This patch only adds support for variable sized byte array encoded decimals and more will be added in upcoming commits. Also, column level metadata verification which was currently being done per row group will now only be done once per column per file. Testing: Added backend test for verifying newly added decimal types are decoded correctly. Added Query test that decodes both plain and dictionary-encoded decimals using binary encoding. Performance: Initial perf testing using tpcds_1000 shows no regression. Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e --- M be/src/exec/data-source-scan-node.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-stats.cc M be/src/exec/parquet-column-stats.h M be/src/exec/parquet-column-stats.inline.h M be/src/exec/parquet-common.h M be/src/exec/parquet-metadata-utils.cc M be/src/exec/parquet-metadata-utils.h M be/src/exec/parquet-plain-test.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc A testdata/data/binary_decimal_dictionary.parquet A testdata/data/binary_decimal_no_dictionary.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-formats.test M tests/query_test/test_scanners.py 18 files changed, 633 insertions(+), 283 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/7822/7 -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 7 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2494: Support for byte array encoded decimals in Parquet scanner
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/7822 ) Change subject: IMPALA-2494: Support for byte array encoded decimals in Parquet scanner .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-column-readers.cc@207 PS6, Line 207: InternalType, parquet::Type::type PARQUET_TYPE > you should explain these template parameters Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@40 PS6, Line 40: IMPALA_TO_PARQUET_TYPES I think it makes sense to rename this to INTERNAL_TO_PARQUET_TYPES to be consistent with our naming of template parameters. http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@88 PS6, Line 88: InternalType > let's explain what that is Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@177 PS6, Line 177: static int DecodeByParquetType(const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size, > nit: long line Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-common.h@333 PS6, Line 333: inline int EncodeDecimal(const T& v, int fixed_len_size, uint8_t* buffer){ > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc File be/src/exec/parquet-plain-test.cc: http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc@34 PS6, Line 34: int > int32_t for consistency? Done http://gerrit.cloudera.org:8080/#/c/7822/6/be/src/exec/parquet-plain-test.cc@79 PS6, Line 79: LOGICAL_TYPE > LogicalType here and below Done -- To view, visit http://gerrit.cloudera.org:8080/7822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2c0e881045109f337fecba53fec21f9cfb9e619e Gerrit-Change-Number: 7822 Gerrit-PatchSet: 6 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 27 Oct 2017 00:32:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. Patch Set 14: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 14 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Fri, 27 Oct 2017 00:19:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork Impala currently kinits by forking off a child process. This has proved to be expensive in many cases since the subprocess tries to reserve as much memory as Impala is currently using which can be quite a lot. This patch adds a flag called 'use_kudu_kinit' that defaults to true. When it's true, it uses the Kudu security library's kinit code that programatically uses the krb5 library to kinit. When it's false, we run our current path which kicks off the kinit-thread and forks off a kinit process periodically to reacquire tickets based on FLAGS_kerberos_reinit_interval. Converted existing tests in thrift-server-test to run with and without kerberos. We now run this BE test with kerberos by using Kudu's MiniKdc utility. This introduces a new dependency on some kerberos binaries that are checked through FindKerberosPrograms.cmake. Note that this is only a test dependency and not a dependency for the impalad binaries and friends. Compilation will still succeed if the kerberos binaries for the MiniKdc are not found, however, the thrift-server-test will fail. We run with and without the 'use_kudu_kinit' flag. TODO: Since the setting up and tearing down of our security code isn't idempotent, we can run only any one test in a process with Kerberos now (IMPALA-6085). Updated bin/bootstrap_system.sh to install new sasl-gssapi modules and the kerberos binaries required for the MiniKdc. Also fixed a bug that didn't transfer the environment into 'sudo' in bin/bootstrap_system.sh. Testing: Verified with thrift-server-test and also manually on a live kerberized cluster. Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Reviewed-on: http://gerrit.cloudera.org:8080/7938 Reviewed-by: Sailesh MukilTested-by: Impala Public Jenkins --- M CMakeLists.txt M be/src/exec/kudu-util.h M be/src/kudu/security/CMakeLists.txt M be/src/kudu/security/test/mini_kdc.cc M be/src/rpc/CMakeLists.txt M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M bin/bootstrap_system.sh A cmake_modules/FindKerberosPrograms.cmake 10 files changed, 234 insertions(+), 21 deletions(-) Approvals: Sailesh Mukil: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 15 Gerrit-Owner: Sailesh Mukil Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5307: Part 2: copy out strings in uncompressed Avro
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8146 ) Change subject: IMPALA-5307: Part 2: copy out strings in uncompressed Avro .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/8146/13/be/src/runtime/tuple.h File be/src/runtime/tuple.h: http://gerrit.cloudera.org:8080/#/c/8146/13/be/src/runtime/tuple.h@175 PS13, Line 175: /// Materialize the var-len string data in this tuple into the provided memory : /// pool. the name and the comment didn't make it clear that this also rewrites this tuple. i.e. in that sense, this is different than MaterializeExprs(). In this case, the slots (and strings) are already materialized -- it's re-materializing (or copying) the strings, right? -- To view, visit http://gerrit.cloudera.org:8080/8146 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1fc78790d778c874f5aafa5958c3c045a88d233 Gerrit-Change-Number: 8146 Gerrit-PatchSet: 13 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 23:46:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc@328 PS11, Line 328: RuntimeState* state definitely not this change, but should we just get rid of this parameter. do we really need to explicitly do the LogError() below? If we still rely on that then it seems we should fix status propagation. http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/mem-tracker.cc@359 PS11, Line 359: was not available does that comment still make sense? isn't the query_tracker==nullptr case now mean that it's part of the query hierarchy? though should we even call this function in that case? http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@205 PS11, Line 205: backend what do we mean by "backend" here? is that the same thing as "execution"? http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@206 PS11, Line 206: Resources for this query How about: Query wide resources ... since some FIS-local resources are also resources used for this query but those should be owned and released solely by the FIS thread. http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.h@258 PS11, Line 258: backend same question http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/query-state.cc@95 PS11, Line 95: if (!released_exec_resources_) ReleaseExecResources(); we've been trying to move away from doing resource releasing from destructors, to make the lifetimes super clear. shouldn't this case just be handled in the caller of Init() when !status.ok() (which already has to handle releasing the QS ref count, and the thread creation path already handles this case. we could similarly release the resource ref count if we acquired that sooner before Init can fail). http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/runtime-state.cc File be/src/runtime/runtime-state.cc: http://gerrit.cloudera.org:8080/#/c/8303/11/be/src/runtime/runtime-state.cc@88 PS11, Line 88: local_query_state_->AcquireExecResourceRefcount(); // Decremented in ReleaseResources(). why is that needed? what resource needs to be held by the runtime state in this case? -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 11 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 23:26:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5607: Add additional units to EXTRACT, DATE PART, TRUNC
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8311 ) Change subject: IMPALA-5607: Add additional units to EXTRACT, DATE_PART, TRUNC .. Patch Set 2: Looks like we want NANOSECONDS after all, so the BIGINT change is needed. -- To view, visit http://gerrit.cloudera.org:8080/8311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If23ea310ddcc5c1c806b2b1ac24d24a4bd6aa4d9 Gerrit-Change-Number: 8311 Gerrit-PatchSet: 2 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 26 Oct 2017 23:24:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1400/ -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 5 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 23:15:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 5 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 23:15:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5607: Add additional units to EXTRACT, DATE PART, TRUNC
Zach Amsden has posted comments on this change. ( http://gerrit.cloudera.org:8080/8311 ) Change subject: IMPALA-5607: Add additional units to EXTRACT, DATE_PART, TRUNC .. Patch Set 2: (6 comments) Overall this looks good - and thank you for the tests. It is worth debating whether NANOSECONDs should be supported at all - let's do that in the JIRA. http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/expr-test.cc@5959 PS2, Line 5959: TestStringValue( I don't think the additional test cases add any value unless you add greater precision, i.e. "cast(trunc(cast('2012-09-10 07:02:03.1234' as timestamp), 'SS') as string)" http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc File be/src/exprs/udf-builtins-ir.cc: http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc@120 PS2, Line 120: return time.fractional_seconds() + time.seconds() * 1000 * 1000 * 1000; > The seconds field, including fractional parts, multiplied by 1 000 000 000; Use NANOS_PER_SEC http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc@125 PS2, Line 125: return ExtractNanosecond(time) / 1000; Instead of dividing the result of another function, these should be implement directly, i.e. time.fractional_seconds() / NANOS_PER_MICRO + time.seconds() * MICROS_PER_SEC http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc@130 PS2, Line 130: return ExtractMicrosecond(time) / 1000; same comment applies http://gerrit.cloudera.org:8080/#/c/8311/2/be/src/exprs/udf-builtins-ir.cc@154 PS2, Line 154: BigIntVal > Return type should be changed from INT to BIGINT because INT data type's ra We only need to change this if we actually implement nanosecond units for EXTRACT; no other referenced system in the JIRA currently supports that, so it is worth debating whether we should or not. http://gerrit.cloudera.org:8080/#/c/8311/2/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: http://gerrit.cloudera.org:8080/#/c/8311/2/common/thrift/Exprs.thrift@92 PS2, Line 92: MILLISECONDS, We shouldn't have redundant field definitions here; instead we should be lenient with parsing and convert MILISECONDS to MILISECOND. -- To view, visit http://gerrit.cloudera.org:8080/8311 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If23ea310ddcc5c1c806b2b1ac24d24a4bd6aa4d9 Gerrit-Change-Number: 8311 Gerrit-PatchSet: 2 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 26 Oct 2017 23:13:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.h@482 PS4, Line 482: bool output_null_aware_probe_rows_running_; > Longer-term we should think about ways to reduce the sheer number of statef Yeah many of the members are just used by one state. I think we should summarize the states into enums or put the stateful members into std::variant/boost::variant altogether. http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@949 PS4, Line 949: RowBatch null_build_batch(child(1)->row_desc(), state->batch_size(), mem_tracker()); > This can fit on one line. Done http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@962 PS4, Line 962: RETURN_IF_CANCELLED(state); > We should add a RETURN_IF_CANCELLED() check here, similar to the one below Done http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@1074 PS4, Line 1074: RowBatch build_batch(child(1)->row_desc(), state->batch_size(), mem_tracker()); > This can fit on one line. Done -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 5 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 22:55:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Hello Thomas Tauber-Marshall, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8226 to look at the new patch set (#5). Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. IMPALA-2758: Remove BufferedTupleStream::GetRows This patch removes BufferedTupleStream::GetRows. This function pins a stream and reads all the rows into a single batch. It is not a good API since it creates an arbitrarily large row batch. In this patch the call sites pin the stream and then directly use GetNext to retrieve a single batch at a time. Testing: It passes existing tests. A test case for GetRows is removed. Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 --- M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h 5 files changed, 63 insertions(+), 99 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8226/5 -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 5 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 11: Code-Review+1 (1 comment) Thanks for the explanations. This LGTM. http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095 PS10, Line 1095: query_state_ != nullptr) > Sure, so what you are saying is that query_state_ being not null is not an Makes sense. Better not to make decisions based on non-contractual current implementation details that may change at any point. Seems like we have the same pattern for a lot of other objects too. -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 11 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 22:49:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4236: Codegen CopyRows() for select nodes
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8196 ) Change subject: IMPALA-4236: Codegen CopyRows() for select nodes .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/8196/9/tests/query_test/test_codegen.py File tests/query_test/test_codegen.py: http://gerrit.cloudera.org:8080/#/c/8196/9/tests/query_test/test_codegen.py@52 PS9, Line 52: assert_codegen_enabled(result.runtime_profile, [1]) > Do we already have tests that check for codegen failed? I will try something which involves a char column. -- To view, visit http://gerrit.cloudera.org:8080/8196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0d496d004418468e16b6f564f90f45ebbf87c1e Gerrit-Change-Number: 8196 Gerrit-PatchSet: 11 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 22:48:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. Patch Set 4: Code-Review+2 (4 comments) Looks good, just did a final pass and caught a couple more things. If you fix those and rebase then I can start the merge. http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.h@482 PS4, Line 482: bool output_null_aware_probe_rows_running_; Longer-term we should think about ways to reduce the sheer number of stateful member variables in this class and make the state machine in GetNext() easier to understand. http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@949 PS4, Line 949: RowBatch null_build_batch(child(1)->row_desc(), state->batch_size(), This can fit on one line. http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@962 PS4, Line 962: null_build_batch.Reset(); We should add a RETURN_IF_CANCELLED() check here, similar to the one below (I think this was an oversight in an earlier patch). http://gerrit.cloudera.org:8080/#/c/8226/4/be/src/exec/partitioned-hash-join-node.cc@1074 PS4, Line 1074: RowBatch build_batch(child(1)->row_desc(), state->batch_size(), This can fit on one line. -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 22:22:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095 PS10, Line 1095: query_state_ != nullptr) > But if the query_state_ were null, then we wouldn't release the resources, That would require that query_state_ was not initialised but we acquired a resource refcount. With the code as it is now, it's trivial to convince ourselves that it's not possible. I can remove the null check but my concern was that it's a brittle assumption that there are no failure paths where a reference to the query_state_ is not obtained. I had to look at the implementation of two functions across two files to convince myself that it was true. It seems more like to me that code would move around to invalidate the second assumption than the first. -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 22:14:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8401 ) Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml File docs/topics/impala_ssl.xml: http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml@158 PS1, Line 158: tlsv1: Allow any TLS version of 1.0 or higher. mention this the default may be? -- To view, visit http://gerrit.cloudera.org:8080/8401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e Gerrit-Change-Number: 8401 Gerrit-PatchSet: 1 Gerrit-Owner: John RussellGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Russell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 22:06:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095 PS10, Line 1095: query_state_ != nullptr) > Your reasoning appears to be correct. My tendency is to keep the null check But if the query_state_ were null, then we wouldn't release the resources, which in itself is a bug. So, wouldn't it be better to know that such a bug existed? -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 22:05:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 11: Code-Review+1 (4 comments) Thanks for the comments - I had to think hard about each one of them. http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095 PS10, Line 1095: query_state_ != nullptr) > How would query_state_ be NULL if the coordinator hasn't release its refere Your reasoning appears to be correct. My tendency is to keep the null check though - usually unnecessary checks like this just add confusion but on cleanup logic I find it's a good practice to write defensive code like this since code tends to change, it's hard to reason about the different possible failure modes and test coverage isn't always complete. I'm open to removing it though, I think it would need a comment to explain why it's valid though. http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/mem-tracker.cc@322 PS10, Line 322: while (tracker != nullptr && !tracker->is_query_mem_tracker_) { : tracker = tracker->parent_; : } > What's the thread safety story here? What if a any one of the trackers touc The parent_ pointer is never modified so the only race possible is if intervening MemTrackers are destroyed. This is not possible since the MemTracker hierarchy is now all torn down at once when the query ends. The thing about the parent pointer isn't documented anywhere so I made the pointer const and added a comment. http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc@96 PS10, Line 96: if (query_mem_tracker_ != nullptr) { : // Disconnect the query MemTracker hierarchy from the global hierarchy. After this : // point nothing must touch this query's MemTracker and all tracked memory associated : // with the query must be released. The whole query subtree of MemTrackers can : // therefore be safely destroyed. : > I'm still not clear why this moved from ReleaseExecResources() to ~QuerySta It happens at the same time as before - previously ReleaseResources() ran immediately before the destructor. After this patch everything in ReleaseExecResources() can happen much sooner before the destructor. However this one action of disconnecting the query MemTracker couldn't be moved earlier. I tried to improve the comment a bit to emphasise that this was disconnecting a control structure. I also thought about making a separate Close() method that did this kind of teardown, but it seemed unnecessary given that it would run immediately before the destructor. I.e. query_state->Close(); delete query_state http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc@135 PS10, Line 135: initial_reservations_->Init(query_id(), rpc_params.min_reservation_bytes)); > Can we move this to the start of Init() and make L95 a DCHECK? I like the idea but after trying it out I think it's significantly complicated by the fact that the coordinator grabs a resource refcount without calling Init() - in some cases when Init() fails the refcount is already non-zero so it wouldn't be valid to release the resources yet. -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 11 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 21:59:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Hello Sailesh Mukil, Joe McDonnell, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8303 to look at the new patch set (#11). Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. IMPALA-1575: Part 1: eagerly release query exec resources Release of backend resources for query execution on the coordinator daemon should occur eagerly as soon as query execution is finished or cancelled. Before this patch some resources managed by QueryState, like scratch files, were only released when the query was closed and all of the query's control structures were torn down. These resources are referred to as "ExecResources" in various places to distinguish them from resources associated with the client request (like the result cache) that are still required after the query finishes executing. This first change does not solve the admission control problem for two reasons: * We don't release the "admitted" memory on the coordinator until the query is unregistered. * Admission control still considers the memory reserved until the query memtracker is unregistered, which happens only when the QueryState is destroyed: see MemTracker::GetPoolMemReserved(). The flow is mostly similar to initial_reservation_refcnt_, except the coordinator also holds onto a reference count, which is released when either the final row is returned or cancellation is initiated. After the coordinator releases its refcount, the resources can be freed as soon as local fragments also release their refcounts. Also clean up Coordinator slightly by preventing runtime_state() from leaking out to ClientRequestState - instead it's possible to log the query MemTracker by following parent links in the MemTracker. This patch is partially based on Joe McDonnell's IMPALA-1575 patch. Testing: Ran core tests. Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 --- M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-state.cc M be/src/runtime/test-env.cc M be/src/service/client-request-state.cc 10 files changed, 129 insertions(+), 88 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/8303/11 -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 11 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. IMPALA-5429: Multi threaded block metadata loading Implements multi threaded block metadata loading on the Catalog server where we fetch block metadata for multiple partitions of a single table in parallel. Number of threads to load the metadata is controlled by the following two parameters (set on the Catalog server startup and applies for each table load) -max_hdfs_partitions_parallel_load(default=5) -max_nonhdfs_partitions_parallel_load(default=20) We use different thread pool sizes for HDFS and non-HDFS tables since non-HDFS supports much higher throughput of RPC calls for listStatus /listFiles. Based on our experiments, S3 showed a linear speed up (up to ~113x) with increasing number of loading threads where as the HDFS throughput was limited to ~5x in un-secure clusters and up to ~3.7x in secure clusters. We narrowed it down to scalability bottlenecks in HDFS RPC implementation (HADOOP-14558) on both the server and the client side. One thing to note here is that the thread pool based metadata fetching is implemented only for loading HDFS block metadata and not for loading HMS partition information. Our experiments showed that while loading large partitioned tables, ~90% of the time is spent in connecting to NN and loading the HDFS block information and optimizing the rest ~10% makes the code unnecessarily complex without much gain. Additional notes: - The multithreading approach is implemented for * INVALIDATE (loading from scratch), * REFRESH (reusing existing md) code paths, * ALTER TABLE ADD/RECOVER PARTITIONS. - This patch makes the implementation of ListMap thread-safe since we use that data structure as a shared state between multiple partition metadata loding threads. Testing and Results: - This patch doesn't add any new tests since there is enough test coverage already. Passed core/exhaustive runs with HDFS/S3. - We noticed up to ~113x speedup on S3 tables(thread_pool_size=160) and up to ~5x speed up in un-secure HDFS clusters and ~3.7x in secure HDFS clusters. - Synthesized the following two large tables on HDFS and S3 and noticed significant reduction in my test DDL queries. (1) 100K partitions + 1 million files (2) 80 partitions + 250K files 100K-PARTITIONS-1M-FILES-CUSTOM-11-REFRESH-PARTITION I -16.4% 100K-PARTITIONS-1M-FILES-CUSTOM-08-ADD-PARTITION I -17.25% 80-PARTITIONS-250K-FILES-11-REFRESH-PARTITIONI -23.57% 80-PARTITIONS-250K-FILES-S3-08-ADD-PARTITION I -23.87% 80-PARTITIONS-250K-FILES-09-INVALIDATE I -24.88% 80-PARTITIONS-250K-FILES-03-RECOVER I -35.90% 80-PARTITIONS-250K-FILES-07-REFRESH I -43.03% 100K-PARTITIONS-1M-FILES-CUSTOM-12-QUERY-PARTITIONS I -43.93% 100K-PARTITIONS-1M-FILES-CUSTOM-05-QUERY-AFTER-INV I -46.59% 80-PARTITIONS-250K-FILES-10-REFRESH-AFTER-ADD-PARTITION I -48.71% 100K-PARTITIONS-1M-FILES-CUSTOM-07-REFRESH I -49.02% 80-PARTITIONS-250K-FILES-05-QUERY-AFTER-INV I -49.05% 100K-PARTITIONS-1M-FILES-CUSTOM-10-REFRESH-AFTER-ADD-PARTI -51.87% 80-PARTITIONS-250K-FILES-S3-03-RECOVER I -67.17% 80-PARTITIONS-250K-FILES-S3-05-QUERY-AFTER-INV I -76.45% 80-PARTITIONS-250K-FILES-S3-07-REFRESH I -87.04% 80-PARTITIONS-250K-FILES-S3-10-REFRESH-AFTER-ADD-PARTI -88.57% Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Reviewed-on: http://gerrit.cloudera.org:8080/8235 Reviewed-by: Bharath VissapragadaTested-by: Impala Public Jenkins --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/ListMap.java 9 files changed, 462 insertions(+), 242 deletions(-) Approvals: Bharath Vissapragada: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 12 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 11 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 26 Oct 2017 21:56:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8401 ) Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml File docs/topics/impala_ssl.xml: http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml@145 PS1, Line 145: impalad, impalad, : and impalad I assume you meant catalogd and statestored in 2 of these? -- To view, visit http://gerrit.cloudera.org:8080/8401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e Gerrit-Change-Number: 8401 Gerrit-PatchSet: 1 Gerrit-Owner: John RussellGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Russell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 21:41:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8366 ) Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than the footer size .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da Gerrit-Change-Number: 8366 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 21:32:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8366 ) Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than the footer size .. IMPALA-1291: Parquet read fails if io buffer size is less than the footer size This commit introduces the compile-time constant READ_SIZE_MIN_VALUE in the newly created common/global-flags.h A static_assert checks if READ_SIZE_MIN_VALUE is greater than or equal to HdfsParquetScanner::FOOTER_SIZE. Also, moved the definition of read_size flag to global-flags.cc, and declared it in global-flags.h. Runtime checks test if read_size is greater than or eaqual to READ_SIZE_MIN_VALUE. If not, the process is terminated. Change-Id: Ib7364147e7daf5380f11368871f8479646b448da Reviewed-on: http://gerrit.cloudera.org:8080/8366 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M be/src/common/global-flags.cc A be/src/common/global-flags.h M be/src/common/init.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/runtime/disk-io-mgr.cc M be/src/util/backend-gflag-util.cc 6 files changed, 57 insertions(+), 9 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da Gerrit-Change-Number: 8366 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8401 ) Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml File docs/topics/impala_ssl.xml: http://gerrit.cloudera.org:8080/#/c/8401/1/docs/topics/impala_ssl.xml@177 PS1, Line 177: using the --ssl_cipher_list configuration setting Elaborate on the default state a little bit: "Default is empty and then Impala uses the default cipher list for that platform." -- To view, visit http://gerrit.cloudera.org:8080/8401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e Gerrit-Change-Number: 8401 Gerrit-PatchSet: 1 Gerrit-Owner: John RussellGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Russell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 21:16:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 8: > > Patch Set 7: > > > > > > Patch Set 7: > > > > > > > > Perf results: > > > > ... > > > > > > I'm surprised that only a few queries saw significant > speedups. Is > > > this in line with what you saw with Parquet runtime filters on > > > TPC-H? Or are we losing a lot by using min/max instead of > bloom or > > > in-list style filters? > > > > Not sure about bloom filters perf, though I can run those numbers > for comparison. > > I haven't looked at this patch, but had a question about the > design: > > Are we still pushing blooms across a join to prevent shuffling of > data? Or are we now pushing _only_ min/max? > > It seems there is value in pushing both: the bloom for evaluation > on the other side of the join to prevent shuffling, and the min/max > to push all the way to the scanner to reduce I/O. > > Not sure if the patch is already doing this. Impala only evaluates runtime filters in the scan. Even prior to this patch, the Kudu scanner was not evaluating bloom filters (and hash joins with Kudu scan targets don't build bloom filters). It certainly could be useful to evaluate bloom filters on the Impala side of a Kudu scan, but I believe our thinking was that it wasn't worth it to implement that - better to just wait until bloom filters can be pushed all the way down into Kudu. If bloom filters in Kudu are a long way off, though, we should maybe reevaluate that. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 8 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 26 Oct 2017 20:54:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options
John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8401 ) Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options .. Patch Set 1: I added a couple of questions under IMPALA-6065 for things I wasn't sure about. -- To view, visit http://gerrit.cloudera.org:8080/8401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e Gerrit-Change-Number: 8401 Gerrit-PatchSet: 1 Gerrit-Owner: John RussellGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: John Russell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 20:53:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 8: (10 comments) > I noticed that - BloomFilterBytes is always 0 in the query > profiles. > Can you please veirfy? Yes. This line is always printed to the profile, even if there are no bloom filters being built. We don't expose the mem used by min-max filters in the profile as its negligible. http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7793/7//COMMIT_MSG@26 PS7, Line 26: > Might be worth mentioning why the runtime filters were renumbered in all th Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h File be/src/exec/filter-context.h: http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/exec/filter-context.h@106 PS7, Line 106: Status CloneFrom(const FilterContext& from, ObjectPool* pool, RuntimeState* state, > Long lines Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@140 PS7, Line 140: virtual void* GetMax() override { return _; } > Nit: not sure why this is using underscores while AlwaysTrue() is using cam Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.h@176 PS7, Line 176: StringValue min_; > Can you briefly mention what it means when these are empty - that the value Done http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@291 PS6, Line 291: out->__isset.min = true; > Oh ok, thanks for clarifying. It looked superficially like a last line copy Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc File be/src/util/min-max-filter.cc: http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@137 PS7, Line 137: DCHECK(thrift.__isset.max); > Do we have end-to-end tests for these code paths? I think we could generall I've now added e2e and unit tests for all of the truncation scenarios. http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143 PS7, Line 143: CopyToBuffer(_buffer_, _, max_.len); > This has a fairly large number of edge cases - it would be good to unit tes Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@143 PS7, Line 143: uff > static_cast instead of int()? Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@150 PS7, Line 150: > I would have expected that we would modify the trailing values that overflo Done http://gerrit.cloudera.org:8080/#/c/7793/7/be/src/util/min-max-filter.cc@207 PS7, Line 207: out->min.__set_string_val(in.min.string_val); > It seems tricky to test these various error paths in end-to-end tests. Coul As you say, its difficult to test the oom scenario in an e2e test, since the max amount of mem being used here is so small, but its covered with a unit test now. -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 8 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 26 Oct 2017 20:52:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5473: [DOCS] Document TLS min version & cipher options
John Russell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8401 Change subject: IMPALA-5473: [DOCS] Document TLS min version & cipher options .. IMPALA-5473: [DOCS] Document TLS min version & cipher options Under the doc JIRA IMPALA-6065. Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e --- M docs/topics/impala_ssl.xml 1 file changed, 52 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/8401/1 -- To view, visit http://gerrit.cloudera.org:8080/8401 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia1705262f8c01e38c616541d1c48f5d0cad5498e Gerrit-Change-Number: 8401 Gerrit-PatchSet: 1 Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Hello Michael Ho, Lars Volker, Matthew Jacobs, Anonymous Coward #345, Tim Armstrong, Todd Lipcon, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7793 to look at the new patch set (#8). Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. IMPALA-4252: Min-max runtime filters for Kudu This patch implements min-max filters for runtime filters. Each runtime filter generates a bloom filter or a min-max filter, depending on if it has HDFS or Kudu targets, respectively. In RuntimeFilterGenerator in the planner, each partitioned hash join generates a bloom and min-max filter, but only those filters that end up being assigned to a target make it into the final plan. Min-max filters are only assigned to Kudu scans if the target expr is a column, as Kudu doesn't support bounds on general exprs, and only if the join op is '=' and not 'is distinct from', as Kudu doesn't support returning NULLs if a bound is set. Min-max filters are inserted into by the PartitionedHashJoinBuilder. Codegen is used to eliminate branching on the type of filter. String min-max filters truncate their bounds at 1024 chars, so that the max amount of memory used by min-max filters is negligible. For now, min-max filters are only applied at the KuduScanner, which passes them into the Kudu client. Future work will address applying min-max filters at HDFS scan nodes and applying bloom filters at Kudu scan nodes. Functional Testing: - Added new planner tests and updated the old ones. (in old tests, a lot of runtime filters are renumbered as we always generate min-max filters even if they don't end up getting assigned and they take up some of the RF ids). - Updated existing runtime filter tests to work with Kudu. - Added e2e tests for min-max filter specific functionality. Perf Testing: - All tests run on Kudu stress cluster (10 nodes) and tpch_100_kudu, timings are averages of 3 runs. - Ran a contrived query with a filter that does eliminate any rows (full self join of lineitem). The difference in running time was negligible - 24.46s with filters on, 24.15s with filters off for a ~1% slowdown. - Ran a contrived query with a filter that elimiates all rows (self join on lineitem with a join condition that never matches). The filters resulted in a significant speedup - 0.26s with filters on, 1.46s with filters off for a ~5.6x speedup. This query is added to targeted-perf. Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-mt.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exec/partitioned-hash-join-builder-ir.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator-filter-state.h M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-filter-bank.cc M be/src/runtime/runtime-filter-bank.h M be/src/runtime/runtime-filter-ir.cc M be/src/runtime/runtime-filter.cc M be/src/runtime/runtime-filter.h M be/src/runtime/runtime-filter.inline.h M be/src/runtime/timestamp-value.h M be/src/service/impala-internal-service.cc M be/src/util/CMakeLists.txt A be/src/util/min-max-filter-ir.cc A be/src/util/min-max-filter-test.cc A be/src/util/min-max-filter.cc A be/src/util/min-max-filter.h M common/thrift/Data.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M
[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/7793 ) Change subject: IMPALA-4252: Min-max runtime filters for Kudu .. Patch Set 7: I know that He Lifu at NetEase also prototyped some support for pushing blooms down all the way into Kudu (not just the Impala scanner). On his TPCH benchmarks he got a ~50% speedup on Q17 and Q18, Q9, Q8, which I don't see in the results here. I wonder if this is due to a different partitioning scheme or whether the big gain actually comes from pushing all the way down into the Kudu scanner. Any idea, given the plans for those queries? -- To view, visit http://gerrit.cloudera.org:8080/7793 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c Gerrit-Change-Number: 7793 Gerrit-PatchSet: 7 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Anonymous Coward #345 Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 26 Oct 2017 20:35:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. Patch Set 14: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1398/ -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 14 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 26 Oct 2017 20:29:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. Patch Set 14: Code-Review+2 Fix minor clang-tidy issues. Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 14 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 26 Oct 2017 20:29:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Hello Michael Ho, Michael Brown, Jim Apple, Bikramjeet Vig, Dan Hecht, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7938 to look at the new patch set (#14). Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork Impala currently kinits by forking off a child process. This has proved to be expensive in many cases since the subprocess tries to reserve as much memory as Impala is currently using which can be quite a lot. This patch adds a flag called 'use_kudu_kinit' that defaults to true. When it's true, it uses the Kudu security library's kinit code that programatically uses the krb5 library to kinit. When it's false, we run our current path which kicks off the kinit-thread and forks off a kinit process periodically to reacquire tickets based on FLAGS_kerberos_reinit_interval. Converted existing tests in thrift-server-test to run with and without kerberos. We now run this BE test with kerberos by using Kudu's MiniKdc utility. This introduces a new dependency on some kerberos binaries that are checked through FindKerberosPrograms.cmake. Note that this is only a test dependency and not a dependency for the impalad binaries and friends. Compilation will still succeed if the kerberos binaries for the MiniKdc are not found, however, the thrift-server-test will fail. We run with and without the 'use_kudu_kinit' flag. TODO: Since the setting up and tearing down of our security code isn't idempotent, we can run only any one test in a process with Kerberos now (IMPALA-6085). Updated bin/bootstrap_system.sh to install new sasl-gssapi modules and the kerberos binaries required for the MiniKdc. Also fixed a bug that didn't transfer the environment into 'sudo' in bin/bootstrap_system.sh. Testing: Verified with thrift-server-test and also manually on a live kerberized cluster. Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 --- M CMakeLists.txt M be/src/exec/kudu-util.h M be/src/kudu/security/CMakeLists.txt M be/src/kudu/security/test/mini_kdc.cc M be/src/rpc/CMakeLists.txt M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M bin/bootstrap_system.sh A cmake_modules/FindKerberosPrograms.cmake 10 files changed, 234 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7938/14 -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 14 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. Patch Set 13: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1394/ -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 13 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 26 Oct 2017 19:34:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1575: Part 1: eagerly release query exec resources
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/8303 ) Change subject: IMPALA-1575: Part 1: eagerly release query exec resources .. Patch Set 10: (4 comments) Just have a few minor comments. http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/coordinator.cc@1095 PS10, Line 1095: query_state_ != nullptr) How would query_state_ be NULL if the coordinator hasn't release its reference to it yet? We always create a QueryState for a Coordinator and there's no point of failure between when a Coordinator object is created and a QueryState for that Coordinator object is created. http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/mem-tracker.cc@322 PS10, Line 322: while (tracker != nullptr && !tracker->is_query_mem_tracker_) { : tracker = tracker->parent_; : } What's the thread safety story here? What if a any one of the trackers touched here are unregistered from their parent in the middle of this loop? http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc@96 PS10, Line 96: if (query_mem_tracker_ != nullptr) { : // After this point nothing should be touching this query's MemTracker and all : // tracked memory associated with the query must be released. The whole query : // subtree of MemTrackers can be removed from the global tree and destroyed. : query_mem_tracker_->CloseAndUnregisterFromParent(); : } I'm still not clear why this moved from ReleaseExecResources() to ~QueryState(). Doesn't this mean we're releasing the tracked memory later? http://gerrit.cloudera.org:8080/#/c/8303/10/be/src/runtime/query-state.cc@135 PS10, Line 135: AcquireExecResourceRefcount(); // Decremented in QueryExecMgr::StartQueryHelper(). Can we move this to the start of Init() and make L95 a DCHECK? -- To view, visit http://gerrit.cloudera.org:8080/8303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41ff374b0403f10a145f7fee9b3145953ee32341 Gerrit-Change-Number: 8303 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 18:37:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1397/ -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 11 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 26 Oct 2017 18:08:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 11: Code-Review+2 Rebased, carrying +2. Thanks Dimitris. -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 11 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 26 Oct 2017 18:07:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8368 ) Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. Patch Set 2: (7 comments) Thanks for tackling this! As a user, I hate that this was broken. I think your changes to impala_shell.py look good. I'm a bit more wary of having another superclass for tests with helper functions. http://gerrit.cloudera.org:8080/#/c/8368/2/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8368/2/shell/impala_shell.py@729 PS2, Line 729: def _validate_database(self, immediately=False): Is there a case where you ever want immediately=True? i.e., could we get rid of the two paths here and converge to just one? http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py File tests/common/cluster_controller.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@33 PS2, Line 33: class ClusterController(CustomClusterTestSuite): I'm generally not fond of the inheritance and multiple-inheritance in these tests. Is this actually distinct from CustomClusterTestSuite? http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@64 PS2, Line 64: def start_cluster_with_args(self, **kwargs): This could go straight into the super class. It's tightly coupled with _start_impala_cluster. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py File tests/custom_cluster/test_shell_interactive_reconnect.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@25 PS2, Line 25: TMP_HISTORY_FILE = os.path.expanduser("~/.impalahistorytmp") Perhaps use an actual tempfile created with tempfile.[something] http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@27 PS2, Line 27: class TestShellInteractiveReconnect(ClusterController): Any reason not to add this simply to tests/shell/test_shell_interactive.py? I think because you need CustomCluster to do the restart? If so, please add a comment about this. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/custom_cluster/test_shell_interactive_reconnect.py@40 PS2, Line 40: @pytest.mark.execute_serially I think all CustomCluster tests run serially? This one seems like it could be parallel. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py File tests/shell/util.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py@116 PS2, Line 116: """ Moves history file to given filepath """ The underlying bug here is that the shell doesn't have a historypath option, but I think this is fine... -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:58:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8368 ) Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. Patch Set 2: (5 comments) Thanks for doing this refactoring, I had some questions about whether we could improve it further, but I think the direction looks good. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py File tests/common/cluster_controller.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@33 PS2, Line 33: class ClusterController(CustomClusterTestSuite): Woh, nice. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@34 PS2, Line 34: breakpad Comment needs update. I guess now it's used for all custom cluster tests that kill processes and restart the cluster? http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@42 PS2, Line 42: self.tmp_dir = tempfile.mkdtemp() I'm thinking about whether this is factored in the best way and whether the additional layer of inheritance is needed. I think a lot of the below utility functions for starting the cluster and killing processes could be moved into CustomClusterTestSuite. Other aspects seem specific to the breakpad tests - disabling core dumps, the temporary directory manipulation. Maybe it makes sense to have those in a Breakpad test base class. What do you think? http://gerrit.cloudera.org:8080/#/c/8368/2/tests/common/cluster_controller.py@85 PS2, Line 85: def kill_processes(self, processes, signal): kill_processes() and wait_for_all_processes_dead() I think could just be standalone utility functions - they aren't specific to cluster tests. http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py File tests/shell/util.py: http://gerrit.cloudera.org:8080/#/c/8368/2/tests/shell/util.py@115 PS2, Line 115: move_history maybe move_shell_history()/restore_shell_history()? It's more verbose but I think is easier to understand at the callsites. -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:58:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. Patch Set 4: Tim, can you do the +2 review for this one? -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:49:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc@948 PS3, Line 948: DCHECK(got_reservation) << "Should have been pinned"; > Maybe add a DCHECK message like '<< "Should have been pinned"' to explain t Done http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc@1073 PS3, Line 1073: DCHECK(got_reservation) << "Should have been pinned"; > Same here Done -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:44:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Hello Thomas Tauber-Marshall, Tim Armstrong, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8226 to look at the new patch set (#4). Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. IMPALA-2758: Remove BufferedTupleStream::GetRows This patch removes BufferedTupleStream::GetRows. This function pins a stream and reads all the rows into a single batch. It is not a good API since it creates an arbitrarily large row batch. In this patch the call sites pin the stream and then directly use GetNext to retrieve a single batch at a time. Testing: It passes existing tests. A test case for GetRows is removed. Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 --- M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h 5 files changed, 64 insertions(+), 99 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/8226/4 -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8366 ) Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than the footer size .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1395/ -- To view, visit http://gerrit.cloudera.org:8080/8366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da Gerrit-Change-Number: 8366 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:40:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8366 ) Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than the footer size .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da Gerrit-Change-Number: 8366 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:40:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8366 ) Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than the footer size .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da Gerrit-Change-Number: 8366 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:39:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2758: Remove BufferedTupleStream::GetRows
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8226 ) Change subject: IMPALA-2758: Remove BufferedTupleStream::GetRows .. Patch Set 3: Code-Review+1 (2 comments) Change looks good to me - glad to see the code removed. Can the other reviewers take another look (or let me know if they don't intend to)? http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc@948 PS3, Line 948: DCHECK(got_reservation); Maybe add a DCHECK message like '<< "Should have been pinned"' to explain the DCHECK. http://gerrit.cloudera.org:8080/#/c/8226/3/be/src/exec/partitioned-hash-join-node.cc@1073 PS3, Line 1073: DCHECK(got_reservation); Same here -- To view, visit http://gerrit.cloudera.org:8080/8226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3831c38994da2b69775a9809ff01de5d23584414 Gerrit-Change-Number: 8226 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 26 Oct 2017 17:36:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 13 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 26 Oct 2017 17:06:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Alex Behm has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Reviewed-on: http://gerrit.cloudera.org:8080/7564 Tested-by: Impala Public Jenkins Reviewed-by: Alex Behm--- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test 12 files changed, 775 insertions(+), 108 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 14 Gerrit-Owner: Attila Jeges Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1394/ -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 13 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 26 Oct 2017 15:46:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 13 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 26 Oct 2017 13:25:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1291: Parquet read fails if io buffer size is less than the footer size
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8366 to look at the new patch set (#3). Change subject: IMPALA-1291: Parquet read fails if io buffer size is less than the footer size .. IMPALA-1291: Parquet read fails if io buffer size is less than the footer size This commit introduces the compile-time constant READ_SIZE_MIN_VALUE in the newly created common/global-flags.h A static_assert checks if READ_SIZE_MIN_VALUE is greater than or equal to HdfsParquetScanner::FOOTER_SIZE. Also, moved the definition of read_size flag to global-flags.cc, and declared it in global-flags.h. Runtime checks test if read_size is greater than or eaqual to READ_SIZE_MIN_VALUE. If not, the process is terminated. Change-Id: Ib7364147e7daf5380f11368871f8479646b448da --- M be/src/common/global-flags.cc A be/src/common/global-flags.h M be/src/common/init.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/runtime/disk-io-mgr.cc M be/src/util/backend-gflag-util.cc 6 files changed, 57 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/8366/3 -- To view, visit http://gerrit.cloudera.org:8080/8366 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib7364147e7daf5380f11368871f8479646b448da Gerrit-Change-Number: 8366 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2235: Fix current db when shell auto-reconnects
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8368 Change subject: IMPALA-2235: Fix current db when shell auto-reconnects .. IMPALA-2235: Fix current db when shell auto-reconnects When precmd tested the connection it didn't validate that if we are using the previously selected DB. The _validate_database method is responsible for that, but it only appended the "use " command to the cmdqueue (command queue of Cmd class). But, at this point we might already have commands in the command queue that will run before the "use " command. Also, the command processed by precmd can entirely skip the cmdqueue, therefore it is not enough to insert the "use " command to the front of cmdqueue. We need to issue the USE command with the onecmd() method to execute it immediately. I extended the _validate_database method with an "immediately" flag. If this is true, _validate_database will use the onecmd() method. Otherwise, it will append the USE command to the cmdqueue to maintain the previous behaviour. I added a new automated test suite named test_shell_interactive_reconnect.py to the "custom cluster" tests. It sets the default database, and after reconnection it checks if the shell set it again automatically. One test case checks if the shell set the default db after manually reconnecting to the impala daemon by issuing the CONNECT command. The other test case checks if the shell set the default db after automatic reconnection due to cluster restart. I needed to start/restart the cluster in these tests. That functionality was already implemented in class TestBreakpadBase, but I didn't want the new tests to depend on code from an other test suite, therefore I moved TestBreakpadBase class to tests/common/cluster_controller.py and renamed it to ClusterController. I also needed to backup the impala shell history file because I didn't want to pollute it by the test cases (just like the way it is done in tests/shell/test_shell_interactive.py). I created utility functions for this in tests/shell/util.py and now test_shell_interactive.py and the newly created test suite are using these utility functions. Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e --- M shell/impala_shell.py A tests/common/cluster_controller.py M tests/custom_cluster/test_breakpad.py A tests/custom_cluster/test_shell_interactive_reconnect.py M tests/shell/test_shell_interactive.py M tests/shell/util.py 6 files changed, 243 insertions(+), 121 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/8368/2 -- To view, visit http://gerrit.cloudera.org:8080/8368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I40dfa00ba0314d356fe8617446f516505c925e5e Gerrit-Change-Number: 8368 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 13: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 13 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 26 Oct 2017 10:50:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1393/ -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 13 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 26 Oct 2017 07:01:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 13: > Uploaded patch set 13: Patch Set 12 was rebased. Thanks for the review. I think it needed a rebase. Could you restart the job? -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 13 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 26 Oct 2017 06:11:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7938 ) Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/7938/12/bin/bootstrap_system.sh File bin/bootstrap_system.sh: http://gerrit.cloudera.org:8080/#/c/7938/12/bin/bootstrap_system.sh@111 PS12, Line 111: if ! { service --status-all | grep -E '^ \[ \+ \] ssh$'; } > The apt-get on line 106 is calling the function on line 79. Maybe the probl Aha, looks like that was the problem. When running sudo apt-get in L79, we didn't transfer the environment variables into the sudo scope. I fixed that by adding -E. I also tested that it works. -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 13 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Thu, 26 Oct 2017 06:10:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork
Hello Michael Ho, Michael Brown, Jim Apple, Bikramjeet Vig, Dan Hecht, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7938 to look at the new patch set (#13). Change subject: IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork .. IMPALA-5129: Use Kudu's Kinit code to avoid expensive fork Impala currently kinits by forking off a child process. This has proved to be expensive in many cases since the subprocess tries to reserve as much memory as Impala is currently using which can be quite a lot. This patch adds a flag called 'use_kudu_kinit' that defaults to true. When it's true, it uses the Kudu security library's kinit code that programatically uses the krb5 library to kinit. When it's false, we run our current path which kicks off the kinit-thread and forks off a kinit process periodically to reacquire tickets based on FLAGS_kerberos_reinit_interval. Converted existing tests in thrift-server-test to run with and without kerberos. We now run this BE test with kerberos by using Kudu's MiniKdc utility. This introduces a new dependency on some kerberos binaries that are checked through FindKerberosPrograms.cmake. Note that this is only a test dependency and not a dependency for the impalad binaries and friends. Compilation will still succeed if the kerberos binaries for the MiniKdc are not found, however, the thrift-server-test will fail. We run with and without the 'use_kudu_kinit' flag. TODO: Since the setting up and tearing down of our security code isn't idempotent, we can run only any one test in a process with Kerberos now (IMPALA-6085). Updated bin/bootstrap_system.sh to install new sasl-gssapi modules and the kerberos binaries required for the MiniKdc. Also fixed a bug that didn't transfer the environment into 'sudo' in bin/bootstrap_system.sh. Testing: Verified with thrift-server-test and also manually on a live kerberized cluster. Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 --- M CMakeLists.txt M be/src/exec/kudu-util.h M be/src/kudu/security/CMakeLists.txt M be/src/kudu/security/test/mini_kdc.cc M be/src/rpc/CMakeLists.txt M be/src/rpc/auth-provider.h M be/src/rpc/authentication.cc M be/src/rpc/thrift-server-test.cc M bin/bootstrap_system.sh A cmake_modules/FindKerberosPrograms.cmake 10 files changed, 234 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/7938/13 -- To view, visit http://gerrit.cloudera.org:8080/7938 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9cea56cc6e7412d87f4c2e92399a2f91ea6af6c7 Gerrit-Change-Number: 7938 Gerrit-PatchSet: 13 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Attila Jeges has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. IMPALA-3548: Prune runtime filters based on query options in the FE Currently, the FE generates a number of runtime filters and assigns them to the single node plan without taking the value of RUNTIME_FILTER_MODE and DISABLE_ROW_RUNTIME_FILTERING query options into account. The backend then removes filters from exec nodes, based on the following rules: 1. If DISABLE_ROW_RUNTIME_FILTERING is set, filters are removed from the exec nodes that are marked as targets not bound by partition columns. 2. If RUNTIME_FILTER_MODE is set to LOCAL, filters are removed from the exec nodes that are marked as remote targets. This may cause some confusion to users because they may see runtime filters in the output of explain that are not applied when the query is executed. This change moves the logic of runtime filter pruning to the planner in the FE. The runtime filter assignment is done on the distributed plan and the above constraints are enforced there directly. Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb --- M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/service/fe-support.cc M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M fe/src/main/java/org/apache/impala/service/FeSupport.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestFileParser.java A testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test 12 files changed, 775 insertions(+), 108 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/7564/13 -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 13 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall