[Impala-ASF-CR] IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8861 ) Change subject: IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be a performance bottleneck. CTR mode is also stream cipher and is secure .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1687/ -- To view, visit http://gerrit.cloudera.org:8080/8861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9debc240615dd8cdbf00ec8730cff62ffef52aff Gerrit-Change-Number: 8861 Gerrit-PatchSet: 5 Gerrit-Owner: Xianda Ke Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Xianda Ke Gerrit-Comment-Date: Tue, 09 Jan 2018 05:17:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8548 ) Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet .. IMPALA-5052: Read and write signed integer logical types in Parquet This patch maps a signed integer logical type in parquet to a supported Impala column type. This change introduces the following mapping - INT_8 -> TINYINT INT_16 -> SMALLINT INT_32 -> INT INT_64 -> BIGINT Also, added a parquet file with the following schema for testing - schema { optional int32 id; optional int32 tinyint_col (INT_8); optional int32 smallint_col (INT_16); optional int32 int_col; optional int64 bigint_col; } Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Reviewed-on: http://gerrit.cloudera.org:8080/8548 Reviewed-by: Tim Armstrong Reviewed-by: Tianyi Wang Tested-by: Impala Public Jenkins --- M be/src/exec/hdfs-parquet-table-writer.cc M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M testdata/data/README A testdata/data/signed_integer_logical_types.parquet M tests/query_test/test_insert_parquet.py 5 files changed, 99 insertions(+), 1 deletion(-) Approvals: Tim Armstrong: Looks good to me, but someone else must approve Tianyi Wang: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Gerrit-Change-Number: 8548 Gerrit-PatchSet: 4 Gerrit-Owner: anujphadke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 ) Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Gerrit-Change-Number: 8548 Gerrit-PatchSet: 3 Gerrit-Owner: anujphadke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 09 Jan 2018 04:55:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3887: Wait for HDFS replication in data loading
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8916 ) Change subject: IMPALA-3887: Wait for HDFS replication in data loading .. IMPALA-3887: Wait for HDFS replication in data loading When the data loading finishes, it is possible for some HDFS blocks to be under replicated. If impala gets the metadata before the replication is done, some tests may fail. This patch adds a replication waiting step in the data loading script. Resubmitted with filesystem type check. Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8 Reviewed-on: http://gerrit.cloudera.org:8080/8916 Reviewed-by: Vuk Ercegovac Reviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M testdata/bin/create-load-data.sh 1 file changed, 19 insertions(+), 0 deletions(-) Approvals: Vuk Ercegovac: Looks good to me, but someone else must approve Alex Behm: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/8916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8 Gerrit-Change-Number: 8916 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-3887: Wait for HDFS replication in data loading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8916 ) Change subject: IMPALA-3887: Wait for HDFS replication in data loading .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8 Gerrit-Change-Number: 8916 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 09 Jan 2018 03:24:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8914 ) Change subject: IMPALA-6193: Track memory of incoming data streams .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/rpc-mgr.h File be/src/rpc/rpc-mgr.h: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/rpc-mgr.h@165 PS3, Line 165: /// Passed to new services. : MemTracker* mem_tracker_; Is this still needed ? http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc@343 PS3, Line 343: nit: extra blank line http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-mgr.cc File be/src/runtime/krpc-data-stream-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-mgr.cc@175 PS3, Line 175: ctx->serialized_size() Wouldn't the majority of the changes be not needed if we add a new interface ctx->GetTransferSize() instead ? IMHO, this may be a more generic interface which may be used by other RPCs to be added in the future. Sorry for not providing better advice earlier. http://gerrit.cloudera.org:8080/#/c/8914/2/be/src/runtime/krpc-data-stream-mgr.inline.h File be/src/runtime/krpc-data-stream-mgr.inline.h: http://gerrit.cloudera.org:8080/#/c/8914/2/be/src/runtime/krpc-data-stream-mgr.inline.h@25 PS2, Line 25: : : const TransmitDataRequestPB* TransmitDataCtx::request() const { : DCHECK(initialized_); : return request_; : } : : TransmitDataResponsePB* TransmitDataCtx::response() { : DCHECK(initialized_); : return response_; : } : : kudu::rpc::RpcContext* TransmitDataCtx::rpc_context() { : DCHECK(initialized_); : return rpc_context_; : } : : const kudu::Slice& TransmitDataCtx::tuple_offsets() const { : DCHECK(initialized_); : return tuple_offsets_; : } : : const kudu::Slice& TransmitDataCtx::tuple_data() const { : DCHECK(initialized_); : return tuple_data_; : } : : int64_t TransmitDataCtx::serialized_size() const { : DCHECK(initialized_); : return serialized_size_; : } : : int64_t TransmitDataCtx::deserialized_size() const { : DCHECK(initialized_); : return deserialized_size_; : } : : void TransmitDataCtx::RespondStatus(const Status& status) { : status.ToProto(response_->mutable_status()); : rpc_context_->RespondSuccess(); : } : : void EndDataStreamCtx::RespondStatus(const Status& status) { : status.ToProto(response_->mutable_status()); : rpc_context_->RespondSuccess(); : } > My intention was to make the code safer by adding the DCHECKs, which makes These changes may not be needed with the TransferSize() changes. -- To view, visit http://gerrit.cloudera.org:8080/8914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4 Gerrit-Change-Number: 8914 Gerrit-PatchSet: 3 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Tue, 09 Jan 2018 02:26:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 ) Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@177 PS2, Line 177: ss << "Codec bad, corrupted "; Can you include a bit more detail, i.e. mention that it's RCFile and include the name of the codec? http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@337 PS2, Line 337: ss << "Invalid bytes read col_idx: " << col_idx; This could also do with a bit more detail. http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@344 PS2, Line 344: void HdfsRCFileScanner::GetCurrentKeyBuffer(int col_idx, bool skip_col_data, How does this avoid buffer overflows if we don't pass in the length of the 'key_buf_ptr' buffer? I think this should be defensively checking if it reads past the end of the buffer. http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@348 PS2, Line 348: GetVInt These GetVInt() and GetVLong() interfaces seems fundamentally unsafe - they don't take the length of the buffer as input! I think we might need to change them so that the length of the buffer is passed in and they check the bounds. -- To view, visit http://gerrit.cloudera.org:8080/8936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 8936 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 09 Jan 2018 01:47:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6190/6246: Add instances tab and event sequence
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8758 ) Change subject: IMPALA-6190/6246: Add instances tab and event sequence .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/8758/10/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/8758/10/be/src/runtime/fragment-instance-state.cc@222 PS10, Line 222: RETURN_IF_ERROR(codegen->FinalizeModule()); > Would you mind elaborating on the point about "deliberately had the heavywe Yeah exactly, Prepare() isn't cancellable. I can't think of a specific reason why codegen should be split between Open() and Prepare(). -- To view, visit http://gerrit.cloudera.org:8080/8758 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I626456b6afa9101ffd5cda10c4096d63d7f9 Gerrit-Change-Number: 8758 Gerrit-PatchSet: 10 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 09 Jan 2018 01:38:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8861 ) Change subject: IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be a performance bottleneck. CTR mode is also stream cipher and is secure .. Patch Set 5: Code-Review+2 Looks like it build ok on all the OSes (including CentOS 6) -- To view, visit http://gerrit.cloudera.org:8080/8861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9debc240615dd8cdbf00ec8730cff62ffef52aff Gerrit-Change-Number: 8861 Gerrit-PatchSet: 5 Gerrit-Owner: Xianda Ke Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Xianda Ke Gerrit-Comment-Date: Tue, 09 Jan 2018 01:36:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8861 ) Change subject: IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be a performance bottleneck. CTR mode is also stream cipher and is secure .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1687/ -- To view, visit http://gerrit.cloudera.org:8080/8861 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9debc240615dd8cdbf00ec8730cff62ffef52aff Gerrit-Change-Number: 8861 Gerrit-PatchSet: 5 Gerrit-Owner: Xianda Ke Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Xianda Ke Gerrit-Comment-Date: Tue, 09 Jan 2018 01:36:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3651: Adds murmur hash() built-in function
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8893 ) Change subject: IMPALA-3651: Adds murmur_hash() built-in function .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8893 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14d56ffb8fab256f3f66a2669271fd4b3c50cc29 Gerrit-Change-Number: 8893 Gerrit-PatchSet: 5 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 09 Jan 2018 01:35:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 ) Change subject: IMPALA-5152: Introduce metadata loading phase .. Patch Set 1: (11 comments) Patch looks good to me overall, have a few suggestions. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@367 PS1, Line 367: IMO, this method should be documented, given this is the starting point of analysis. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@309 PS1, Line 309: private final Map loadedTables; nit: My feeling is that we can call this referencedTables_ (and backup with a comment saying they are loaded). Reason being, we are separating loading and analysis part, so anything that reached here means it is already loaded. Please ignore if you disagree, I'm not too strong about this either. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2372 PS1, Line 2372: if (table.isLoaded() && table instanceof IncompleteTable) { Preconditions.checkState(table.isLoaded())? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@835 PS1, Line 835: // Database does not exist. Maybe catch DatabaseNotFoundEx? CatalogEx seems generic (I checked Catalog#getTable() seems to be throwing DBNotFoundEx after you removed ImpaladCatalog#getTable()). http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@838 PS1, Line 838: if (!tbl.isLoaded()) { Should we handle tbl.isLoaded() && tbl instanceof IncompleteTable here and fail fast instead? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@859 PS1, Line 859: private Set collectTableCandidates(StatementBase stmt, String defaultDb) { I feel like this method belongs to the AnalysisCtx() and not frontend? Then we could just refactor requestTableLoadAndWait() to requestTableLoadAndWait(List) http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@910 PS1, Line 910: if (missingTbls.isEmpty()) return new LoadedTables(catalog, loadedTbls); Should we update something in the timeline to reflect that all the metadata is locally available? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@931 PS1, Line 931: currCatalog != catalog Not fully familiar with != implementation, so I could be wrong on this one. Shouldn't we do something like (currCatalog.serviceId != catalog.serviceId) ? Else, any change to the catalog, like a new updates etc, can alter the hashCode() resulting in an incorrect equals() ? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@942 PS1, Line 942: LOG.warn(String.format("Waiting for table metadata. " + Should we dump current missingTables() too? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@951 PS1, Line 951: getMissingTables(catalog, missingTbls, defaultDb, loadedTbls); Make getMissingTables() always use getCatalog() instead of passing it around? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@958 PS1, Line 958: issueLoadRequest Not sure I follow the need for this. L960 makes sure missingTbls is up to date and we always need to load the remaining ones right? I don't see a case when issueLoadRequest is false and missingTbls is not empty. In other words, every time we pass the loop condition, we need to load something. -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Comment-Date: Tue, 09 Jan 2018 01:21:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 ) Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1686/ -- To view, visit http://gerrit.cloudera.org:8080/8548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Gerrit-Change-Number: 8548 Gerrit-PatchSet: 3 Gerrit-Owner: anujphadke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Tue, 09 Jan 2018 01:16:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5014: Part 2: Round when casting decimal to timestamp
Taras Bobrovytsky has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8969 Change subject: IMPALA-5014: Part 2: Round when casting decimal to timestamp .. IMPALA-5014: Part 2: Round when casting decimal to timestamp When there are too many digits to the right of the dot in a decimal, we would always truncate when casting to timestamp. In this patch we change the behavior to round instead of truncating when decimal_v2 is enabled. Testing: - Added some EE tests, ran BE tests on my machine. Change-Id: I8fb3a7d976ab980b8572d7e9524850572bad57da --- M be/src/exprs/decimal-operators-ir.cc M be/src/exprs/decimal-operators.h M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test 3 files changed, 65 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8969/1 -- To view, visit http://gerrit.cloudera.org:8080/8969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8fb3a7d976ab980b8572d7e9524850572bad57da Gerrit-Change-Number: 8969 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-3887: Wait for HDFS replication in data loading
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8916 ) Change subject: IMPALA-3887: Wait for HDFS replication in data loading .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1685/ -- To view, visit http://gerrit.cloudera.org:8080/8916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8 Gerrit-Change-Number: 8916 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 08 Jan 2018 23:44:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 ) Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Gerrit-Change-Number: 8548 Gerrit-PatchSet: 3 Gerrit-Owner: anujphadke Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 08 Jan 2018 23:43:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 ) Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Gerrit-Change-Number: 8548 Gerrit-PatchSet: 3 Gerrit-Owner: anujphadke Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 08 Jan 2018 23:11:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 ) Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@310 PS2, Line 310: if line_split[0] == "id": > Can you make this an elif chain and assert that the column name is one of t Done http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@339 PS2, Line 339: if line_split[0] == "id": > Same here Done http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@349 PS2, Line 349: > nit: missing space after % Done http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@351 PS2, Line 351: c > same here Done -- To view, visit http://gerrit.cloudera.org:8080/8548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Gerrit-Change-Number: 8548 Gerrit-PatchSet: 3 Gerrit-Owner: anujphadke Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 08 Jan 2018 22:57:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Hello Tianyi Wang, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8548 to look at the new patch set (#3). Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet .. IMPALA-5052: Read and write signed integer logical types in Parquet This patch maps a signed integer logical type in parquet to a supported Impala column type. This change introduces the following mapping - INT_8 -> TINYINT INT_16 -> SMALLINT INT_32 -> INT INT_64 -> BIGINT Also, added a parquet file with the following schema for testing - schema { optional int32 id; optional int32 tinyint_col (INT_8); optional int32 smallint_col (INT_16); optional int32 int_col; optional int64 bigint_col; } Change-Id: I47a8371858c9597c6a440808cf6f933532468927 --- M be/src/exec/hdfs-parquet-table-writer.cc M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M testdata/data/README A testdata/data/signed_integer_logical_types.parquet M tests/query_test/test_insert_parquet.py 5 files changed, 99 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8548/3 -- To view, visit http://gerrit.cloudera.org:8080/8548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927 Gerrit-Change-Number: 8548 Gerrit-PatchSet: 3 Gerrit-Owner: anujphadke Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[native-toolchain-CR] Bump LLVM to 5.0.0
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8932 ) Change subject: Bump LLVM to 5.0.0 .. Patch Set 1: They've released 5.0.1. Does it make sense to go straight there? (I've not looked up the changes.) -- To view, visit http://gerrit.cloudera.org:8080/8932 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9b7b97cb135d202eaa9a0bae03e722a2505b712 Gerrit-Change-Number: 8932 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 08 Jan 2018 22:11:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
Hello Michael Ho, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8349 to look at the new patch set (#7). Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters. .. IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters. This patch generalizes ltrim()/rtrim() functions to accept a second argument that specifies the set of characters to be removed from the leading/trailing end of the target string: ltrim(string text[, characters text]) rtrim(string text[, characters text]) A common string trimming method has been added to StringFunctions, which is called from the general ltrim/rtrim/btrim functions. The functions also share prepare and close operations. New StringFunctions tests have been added to ExprTest for the new forms of ltrim() and rtrim(). New tests to cover handling of special characters have also been added. Note that our string handling functions only work with the ASCII character set. Handling of other character sets lies outside the scope of this patch. The existing ltrim()/rtrim()/trim() functions that take only one argument have been updated to use the more general methods. Testing: Queries like the following were run on a 1.5-billion row tpch_parquet.lineitem table, with the old and new implementations to ensure there is no performance regression: 1. select count(trim(l_shipinstruct)), count(trim(l_returnflag)), ... 2. select count(*) from t where trim(l_shipinstruct) = '' and ... Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79 --- M be/src/exprs/expr-test.cc M be/src/exprs/string-functions-ir.cc M be/src/exprs/string-functions.h M common/function-registry/impala_functions.py 4 files changed, 169 insertions(+), 88 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8349/7 -- To view, visit http://gerrit.cloudera.org:8080/8349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79 Gerrit-Change-Number: 8349 Gerrit-PatchSet: 7 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8349 ) Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters. .. Patch Set 6: (4 comments) Changed implementation to use templates. Also ran perf measurements to check for regressions. Please see patch set #7 http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@475 PS4, Line 475: tionContext* context, > Sure, we can keep passing StringVal(" ") for the no-arg case. It's not used There are test cases that pass NULL for chars_to_trim, so that's taken care. http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@451 PS6, Line 451: direction == LEADING || direction == BOTH > To avoid a logic or, we can consider doing: Changed this to a templatized implementation to avoid branching overhead. http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@464 PS6, Line 464: //return DoTrimStringImpl(str, unique_chars, direction); > delete Done http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@466 PS6, Line 466: : StringVal StringFunctions::Trim(FunctionContext* context, const StringVal& str) { : return DoTrimString(context, str, StringVal(" "), BOTH); : } : : StringVal StringFunctions::Ltrim(FunctionContext* context, const StringVal& str) { : return DoTrimString(context, str, StringVal(" "), LEADING); : } : : StringVal StringFunctions::Rtrim(FunctionContext* context, const StringVal& str) { : return DoTrimString(context, str, StringVal(" "), TRAILING); : } > Can you please do a quick benchmark to make sure we didn't regress the perf Have measured the performance of the new code and old trim/ltrim/rtrim on a tpch_parquet table that has 1.5 billion rows. The test query basically scans all the string columns of the table (8 of them) and calls trim/ltrim/rtrim on each element, where each operation is something like select count(trim(l_shipinstruct)), etc. The idea is to check for any unacceptable perf regression due to how the code has been refactored. Time spent in these function calls aggregated by count()s are the same in the old and new code, within margin of error. -- To view, visit http://gerrit.cloudera.org:8080/8349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79 Gerrit-Change-Number: 8349 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 08 Jan 2018 22:06:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront
Hello Tianyi Wang, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8707 to look at the new patch set (#14). Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront .. IMPALA-4835: Part 2: Allocate scan range buffers upfront This change is a step towards reserving memory for buffers from the buffer pool and constraining per-scanner memory requirements. This change restructures the DiskIoMgr code so that each ScanRange operates with a fixed set of buffers that are allocated upfront and recycled as the I/O mgr works through the ScanRange. One major change is that ScanRanges get blocked when a buffer is not available and get unblocked when a client returns a buffer via ReturnBuffer(). I was able to remove the logic to maintain the blocked_ranges_ list by instead adding a separate set for all ranges that are active. I plan to merge this along with the actual buffer pool switch, but separated it out to allow review of the DiskIoMgr changes separate from other aspects of the buffer pool switchover. Testing: * Ran core and exhaustive tests. Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/io/disk-io-mgr-stress-test.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/bit-util-test.cc M be/src/util/bit-util.h M be/src/util/runtime-profile-counters.h 19 files changed, 776 insertions(+), 425 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/8707/14 -- To view, visit http://gerrit.cloudera.org:8080/8707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f Gerrit-Change-Number: 8707 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3887: Wait for HDFS replication in data loading
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8916 ) Change subject: IMPALA-3887: Wait for HDFS replication in data loading .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8 Gerrit-Change-Number: 8916 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 08 Jan 2018 21:32:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8898 ) Change subject: IMPALA-6231: Implement decimal_v2 fuzz test .. Patch Set 3: Code-Review+1 Zach, since you were taking a look, too, do you want to give +2 when you're satisfied? -- To view, visit http://gerrit.cloudera.org:8080/8898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85 Gerrit-Change-Number: 8898 Gerrit-PatchSet: 3 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Mon, 08 Jan 2018 21:29:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test
Taras Bobrovytsky has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/8898 ) Change subject: IMPALA-6231: Implement decimal_v2 fuzz test .. IMPALA-6231: Implement decimal_v2 fuzz test Implement a test that generates random decimal numbers in the pytest framework, performs a random mathemtaical operation in Impala and verifies that the result is correct by doing the same operating using the Python decimal module. We try to generate not only completely random decimal numbers, but also numbers that have interesting properties, such as the number being a power of two. Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85 --- A tests/query_test/test_decimal_fuzz.py 1 file changed, 248 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8898/3 -- To view, visit http://gerrit.cloudera.org:8080/8898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85 Gerrit-Change-Number: 8898 Gerrit-PatchSet: 3 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/8898 ) Change subject: IMPALA-6231: Implement decimal_v2 fuzz test .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py File tests/query_test/test_decimal_fuzz.py: http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@46 PS1, Line 46: create_exec_option_dimension_from_dict({'_': ['_']}) > I have seen this done using two methods. Done -- To view, visit http://gerrit.cloudera.org:8080/8898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85 Gerrit-Change-Number: 8898 Gerrit-PatchSet: 1 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Mon, 08 Jan 2018 21:24:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8707 ) Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront .. Patch Set 12: Discovered a bug with partial reads. PS12 extends the relevant unit test and fixes it. -- To view, visit http://gerrit.cloudera.org:8080/8707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f Gerrit-Change-Number: 8707 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 08 Jan 2018 20:59:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront
Hello Tianyi Wang, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8707 to look at the new patch set (#12). Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront .. IMPALA-4835: Part 2: Allocate scan range buffers upfront This change is a step towards reserving memory for buffers from the buffer pool and constraining per-scanner memory requirements. This change restructures the DiskIoMgr code so that each ScanRange operates with a fixed set of buffers that are allocated upfront and recycled as the I/O mgr works through the ScanRange. One major change is that ScanRanges get blocked when a buffer is not available and get unblocked when a client returns a buffer via ReturnBuffer(). I was able to remove the logic to maintain the blocked_ranges_ list by instead adding a separate set for all ranges that are active. I plan to merge this along with the actual buffer pool switch, but separated it out to allow review of the DiskIoMgr changes separate from other aspects of the buffer pool switchover. Testing: * Ran core tests. * TODO: run exhaustive tests. Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/io/disk-io-mgr-stress-test.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/bit-util-test.cc M be/src/util/bit-util.h M be/src/util/runtime-profile-counters.h 19 files changed, 776 insertions(+), 425 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/8707/12 -- To view, visit http://gerrit.cloudera.org:8080/8707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f Gerrit-Change-Number: 8707 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8801 ) Change subject: IMPALA-5191: Standardize column alias behavior .. Patch Set 8: (13 comments) http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@273 PS8, Line 273: protected void checkForAmbiguousAliases(Expr expr, String errorPrefix) Let's inline this check into the single caller. http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@283 PS8, Line 283: corresponding "corresponding" is not really explained, I think we should explain ordinal and alias substitution a little more. Returns the substitution of top-level integer NumericLiteral with the corresponding select-list expr http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@285 PS8, Line 285:* 'expr' is analyzed in this function regardless of whether The returned expr is analyzed regardless of whether substitution was performed. http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@288 PS8, Line 288: protected Expr substituteOrdinalAlias(Expr expr, String errorPrefix, substituteOrdinalOrAlias() http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@290 PS8, Line 290: checkForAmbiguousAliases(expr, errorPrefix); Let's move this after attempting to substitute ordinals to make the flow clearer. http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@291 PS8, Line 291: // We can substitute either by ordinal or by alias. Integrate this into the function comment. http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@296 PS8, Line 296: if (substituteExpr == null) { Slightly easier for me to read/reason with inverted logic, i.e. if (substituteExpr != null) return substituteExpr; if (expr instanceof SlotRef) ... http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@314 PS8, Line 314: protected void substituteOrdinalsAliases(List exprs, String errorPrefix, substituteOrdinalsAndAliases() http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@316 PS8, Line 316: ListIterator i = exprs.listIterator(); Less code to use a loop over indexes: for (int i = 0; ...) { exprs.set(i, substitute(...)); } http://gerrit.cloudera.org:8080/#/c/8801/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8801/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@548 PS7, Line 548: substituteOrdi > substituteOrdinalsAliases() takes a list of Exprs as param, while we only h Thanks for confirming and fixing! http://gerrit.cloudera.org:8080/#/c/8801/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/8801/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@985 PS7, Line 985: // IMPALA-5191: In GROUP BY, HAVING, ORDER BY, aliases and ordinals must only be > I found it cumbersome to combine the tests, because some of them pass, some That's fine, maybe it's not worth consolidating the tests. I'll think more carefully about it. Good catch with the weird ordinal substitution. It's great that we keep finding/fixing more edge cases! To address it I think we should change SelectStmt.rewriteExprs() to not accept an expr rewrite in the GROUP BY, HAVING, and ORDER BY clauses if the original expression was not a NumericLiteral eligible to be substituted, but the rewritten expression is a NumericLiteral eligible for ordinal substitution. Here's an example that demonstrates inconsistent behavior in the GROUP BY clause: select int_col, count(*) from functional.alltypes group by 1*2, int_col; The query throws an error with enable_expr_rewrites=true, but works with enable_expr_rewrites=false What do you think? http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@987 PS8, Line 987: AnalyzesOk("with w_test as (select '1' as one, 2 as two, '3' as three) " Let's move this complex test to the end. Easier to follow the tests if we start with the simple cases first. nit: Let's be consistent abo
[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8890 ) Change subject: IMPALA-3526: update FE tests to pass on S3 .. Patch Set 2: changed the test query file names. -- To view, visit http://gerrit.cloudera.org:8080/8890 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141 Gerrit-Change-Number: 8890 Gerrit-PatchSet: 2 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 08 Jan 2018 20:28:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3
Hello Thomas Tauber-Marshall, Taras Bobrovytsky, Lars Volker, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8890 to look at the new patch set (#3). Change subject: IMPALA-3526: update FE tests to pass on S3 .. IMPALA-3526: update FE tests to pass on S3 Plans for some queries/same data may differ when the same data is stored on S3 vs. HDFS. This is due to block size differences used to enumerate range scans on the different file systems. As a result, FE tests have been disabled for S3 configurations. This has also led to staleness in the tests that were specific to S3. Most of the broken tests are due to staleness, but several are due to true plan differences. This change fixes stale tests and separates out those test queries whose plans differ across S3 and HDFS. Tests: - ran FE tests on S3 Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141 --- M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/S3PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/join-order-hdfs.test A testdata/workloads/functional-planner/queries/PlannerTest/join-order-s3.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/s3.test M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test A testdata/workloads/functional-planner/queries/PlannerTest/tpch-hdfs.test A testdata/workloads/functional-planner/queries/PlannerTest/tpch-s3.test 9 files changed, 2,171 insertions(+), 1,064 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/8890/3 -- To view, visit http://gerrit.cloudera.org:8080/8890 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141 Gerrit-Change-Number: 8890 Gerrit-PatchSet: 3 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/8930 ) Change subject: IMPALA-6307: CTAS statement fails with duplicate column exception. .. Patch Set 1: (3 comments) Thanks for the comments. Please have a look at patch set #2. http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@223 PS1, Line 223: Preconditions.checkState(colDefs.size() + partitionColDefs.size() == types.size()); : for (int i = 0; i < colDefs.size(); ++i) colDefs.get(i).setType(types.get(i)); : for (int i = 0; i < partitionColDefs.size(); ++i) { : partitionColDefs.get(i).setType(types.get(i + colDefs.size())); : } > I believe it would be nice to add a comment in AnalysisContext.java L405 to Done http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/TableDef.java@160 PS1, Line 160: dataLayout_.getPartitionColumnDefs().clear(); > I would add a reset() function in TableDataLayout() and put that there. Done http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1510 PS1, Line 1510: AnalyzesOk("create table functional.ctas_tbl partitioned by (year) as " + : "with tmp as (select a.timestamp_col, a.year from functional.alltypes a " + : "left join functional.alltypes b " + : "on b.timestamp_col between a.timestamp_col and a.timestamp_col) " + : "select a.timestamp_col, a.year from tmp a"); > Add a comment. It will not be clear to everyone what this thing is testing. Done -- To view, visit http://gerrit.cloudera.org:8080/8930 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2 Gerrit-Change-Number: 8930 Gerrit-PatchSet: 1 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 08 Jan 2018 19:52:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8930 to look at the new patch set (#2). Change subject: IMPALA-6307: CTAS statement fails with duplicate column exception. .. IMPALA-6307: CTAS statement fails with duplicate column exception. A CTAS statement with a 'partition by' clause causes the statement to fail with a duplicate column name exception. This is happening because on expression rewrite, the partition defs state is not reset. IMPALA-5796 added TableDef::reset(). This patch expands the method by adding calls to reset ColumnDefs and PartitionColumnDefs. Testing: * Regression test added to AnalyzeDDLTest. * Exhaustive Jenkins build and test. Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2 --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 5 files changed, 19 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8930/2 -- To view, visit http://gerrit.cloudera.org:8080/8930 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2 Gerrit-Change-Number: 8930 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8890 ) Change subject: IMPALA-3526: update FE tests to pass on S3 .. Patch Set 2: For an explanation for the plan difference, pls have a look at this comment I posted on the JIRA: https://issues.apache.org/jira/browse/IMPALA-3526?focusedCommentId=16296008&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16296008 I think the "all" is there to contrast with other tpch tests (kudu, nested, views). I may have misread that one though. My take is that there is a separate design discussion for this issue (see the JIRA comment). This patch is just for enabling FE tests for S3. I'm also keeping ADLS testing as separate for now. -- To view, visit http://gerrit.cloudera.org:8080/8890 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141 Gerrit-Change-Number: 8890 Gerrit-PatchSet: 2 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 08 Jan 2018 19:28:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3282: Adds regexp escape built-in function
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8900 ) Change subject: IMPALA-3282: Adds regexp_escape built-in function .. Patch Set 4: (1 comment) Once this makes it in, we should be sure to update the documentation to add it to our list of functions. http://gerrit.cloudera.org:8080/#/c/8900/4/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8900/4/be/src/exprs/string-functions-ir.cc@625 PS4, Line 625: for (char const *c = start_ptr; c < end_ptr; c++) { I think this is a re-implementation of be/src/gutil/strings/escaping.cc:BackslashEscape(). The implementations are pretty-similar, except that BackslashEscape() uses a bitmap rather than the linear search that std::find() probably does. My instinct is that at least measuring the performance of this RegexpEscape() versus one that uses a bitmap is useful. I don't know if the IR functions can use the gutil functions directly without CMake changes. I only see Substitute() used, but it's on the error paths for the most part, and may be inlined. -- To view, visit http://gerrit.cloudera.org:8080/8900 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I84c3e0ded26f6eb20794c38b75be9b25cd111e4b Gerrit-Change-Number: 8900 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 08 Jan 2018 19:17:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1767: [DOCS] Document new Boolean operators
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8942 ) Change subject: IMPALA-1767: [DOCS] Document new Boolean operators .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/8942/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8942/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-1767: [DOCS] Document new Boolean operators please make this more informative (just include mention of IS [NOT] TRUE | FALSE | UNKNOWN). http://gerrit.cloudera.org:8080/#/c/8942/2/docs/shared/impala_common.xml File docs/shared/impala_common.xml: http://gerrit.cloudera.org:8080/#/c/8942/2/docs/shared/impala_common.xml@779 PS2, Line 779: IS [NOT] FALSE as equivalents for the functions "builtin functions" instead of "functions" to match the blurb in the detailed section. http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml File docs/topics/impala_operators.xml: http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1251 PS2, Line 1251: lets nit: let http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1320 PS2, Line 1320: lets nit: let http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1327 PS2, Line 1327: These operators are equivalent to the built-in conditional functions sync this blurb with the one in the index (see comment there). http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1337 PS2, Line 1337: query error must be same error as for "IS [NOT] NULL" for complex types? perhaps its less surprising if this is pointed out (or linked to)? http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1345 PS2, Line 1345: select assertion, b, b is true, b is false, b is unknown : from boolean_test; just for my own info, but why is this lower-case whereas sql inlined in text is upper-case? -- To view, visit http://gerrit.cloudera.org:8080/8942 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iefebf210418ec2d47b154bd37166b76720f085bb Gerrit-Change-Number: 8942 Gerrit-PatchSet: 2 Gerrit-Owner: John Russell Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: John Russell Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 08 Jan 2018 18:57:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3193: Show table's comment on show tables
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8851 ) Change subject: IMPALA-3193: Show table's comment on show tables .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8851/4/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/8851/4/fe/src/main/java/org/apache/impala/service/JniFrontend.java@304 PS4, Line 304: for (Table table: tables) { : org.apache.hadoop.hive.metastore.api.Table msTable : = table.getMetaStoreTable(msClient); : if (msTable != null) { : String comment = msTable.getParameters().get("comment"); : comments.add(comment != null ? comment : ""); : } : } > Regarding your suggestion, there was no loaded tables when I attached debug The catalog server is the only entity responsible for loading metadata and making calls to HMS (for that purpose). On the other hand, Impalad's catalog cache should not be doing anything like that (which is what you do in this block). JniFronted::getTableNames() retrieves and returns the names of tables that the local catalog cache knows of (no external communication with either the catalog server or the HMS). So, what you should do is the following: If the table is fully loaded and has a comment, you can retrieve it directly from the Table object. If the table is not fully loaded, you don't do anything (return empty string) even if the table has a comment stored in HMS. If the table is loaded but has no comment, return an empty string. If the concepts of fully loaded metadata, catalog server and impalad catalog cache are not clear, I suggest you spend some time at the code before working on this change. Impala's metadata management system is a bit complex, so it takes some time to grasp all the details and the information flow. -- To view, visit http://gerrit.cloudera.org:8080/8851 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I44f814af05db6f3c027718ade9f474f8b8153bcd Gerrit-Change-Number: 8851 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Kim Jin Chul Gerrit-Comment-Date: Mon, 08 Jan 2018 18:34:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront
Hello Tianyi Wang, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8707 to look at the new patch set (#11). Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront .. IMPALA-4835: Part 2: Allocate scan range buffers upfront This change is a step towards reserving memory for buffers from the buffer pool and constraining per-scanner memory requirements. This change restructures the DiskIoMgr code so that each ScanRange operates with a fixed set of buffers that are allocated upfront and recycled as the I/O mgr works through the ScanRange. One major change is that ScanRanges get blocked when a buffer is not available and get unblocked when a client returns a buffer via ReturnBuffer(). I was able to remove the logic to maintain the blocked_ranges_ list by instead adding a separate set for all ranges that are active. I plan to merge this along with the actual buffer pool switch, but separated it out to allow review of the DiskIoMgr changes separate from other aspects of the buffer pool switchover. Testing: * Ran core tests. * TODO: run exhaustive tests. Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/io/disk-io-mgr-stress-test.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/bit-util-test.cc M be/src/util/bit-util.h M be/src/util/runtime-profile-counters.h 19 files changed, 721 insertions(+), 407 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/8707/11 -- To view, visit http://gerrit.cloudera.org:8080/8707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f Gerrit-Change-Number: 8707 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 ) Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners .. Patch Set 2: It's ok if we don't have a reproducible test case for bugs caught by the fuzzer, unless the fuzzing revealed a really big hole in test coverage. It's good if we have a test case but for cases where we weren't handling corrupt data correctly there are so many ways files could be corrupted it wasn't feasible to have a standalone test for each case. -- To view, visit http://gerrit.cloudera.org:8080/8936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 8936 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 08 Jan 2018 17:41:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/8898 ) Change subject: IMPALA-6231: Implement decimal_v2 fuzz test .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py File tests/query_test/test_decimal_fuzz.py: http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@46 PS1, Line 46: > I'm just trying to remove all test dimensions that we normally use (such as I have seen this done using two methods. 1. Do not include "vector" in the test method's formal parameter list. 2. Use create_single_exec_option_dimension() http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@70 PS1, Line 70: > Yes, I think we want 38. It's ok that 38 shows up in both extreme precision Done http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@230 PS1, Line 230: expected_result = decimal.Decimal(value1) + decimal.Decimal(value2) : elif op == '-': : expected_result = decimal.Decimal(value1) - decimal.Decimal(value2) : elif op == '*': : expected_result = decimal.Decimal(value1) * decimal.Decimal(value2) : elif op == '/': : expected_result = decimal.Decimal(value1) / decimal.Decimal(value2) : elif op == '%': : expected_result = decimal.Decimal(value1) % decimal.Decimal(value2) : else: > Are you suggesting to create a mapping m that maps, for example, "+" to ope Done http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@248 PS1, Line 248: for _ in xrange(self.itera > Cool tip, but I decided against making each iteration a separate test. Too Done -- To view, visit http://gerrit.cloudera.org:8080/8898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85 Gerrit-Change-Number: 8898 Gerrit-PatchSet: 2 Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Mon, 08 Jan 2018 17:22:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 ) Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners .. Patch Set 2: > Can we test all the cases fixed this with 100% reproducible cases? > For Ex: > I commented out the change in decompress.cc and ran the fuzz test > and it passed. > Can we test all the cases fixed this with 100% reproducible cases? > For Ex: > I commented out the change in decompress.cc and ran the fuzz test > and it passed. If you run a couple of times without the fix the assertion will be hit. I'll see if I can change the test so that we hit the assert every time -- To view, visit http://gerrit.cloudera.org:8080/8936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 8936 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Mon, 08 Jan 2018 16:59:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters
Adam Holley has restored this change. ( http://gerrit.cloudera.org:8080/8959 ) Change subject: IMPALA-6371: Additional check for delimiters .. Restored -- To view, visit http://gerrit.cloudera.org:8080/8959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: restore Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde Gerrit-Change-Number: 8959 Gerrit-PatchSet: 3 Gerrit-Owner: Adam Holley Gerrit-Reviewer: Adam Holley
[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters
Adam Holley has abandoned this change. ( http://gerrit.cloudera.org:8080/8959 ) Change subject: IMPALA-6371: Additional check for delimiters .. Abandoned Misplaced length check. -- To view, visit http://gerrit.cloudera.org:8080/8959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde Gerrit-Change-Number: 8959 Gerrit-PatchSet: 3 Gerrit-Owner: Adam Holley Gerrit-Reviewer: Adam Holley
[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters
Adam Holley has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/8959 ) Change subject: IMPALA-6371: Additional check for delimiters .. IMPALA-6371: Additional check for delimiters The check validates the codepoint of the Java char. Testing: - Added tests for valid/invalid unicode in HdfsStorageDescriptorTest. Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde --- M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java M fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java 2 files changed, 24 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/8959/3 -- To view, visit http://gerrit.cloudera.org:8080/8959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde Gerrit-Change-Number: 8959 Gerrit-PatchSet: 3 Gerrit-Owner: Adam Holley Gerrit-Reviewer: Adam Holley
[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters
Adam Holley has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8959 ) Change subject: IMPALA-6371: Additional check for delimiters .. IMPALA-6371: Additional check for delimiters The check validates the codepoint of the Java char. Testing: - Added tests for valid/invalid unicode in HdfsStorageDescriptorTest. Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde --- M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java M fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java 2 files changed, 21 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/8959/2 -- To view, visit http://gerrit.cloudera.org:8080/8959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde Gerrit-Change-Number: 8959 Gerrit-PatchSet: 2 Gerrit-Owner: Adam Holley Gerrit-Reviewer: Adam Holley
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 6: (25 comments) http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@368 PS6, Line 368: /// This is called from WriteAlignedTuples. Explain what error_file_offsets does in comment http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382 PS6, Line 382: column Wouldn't this report an invalid value (column inside row)? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382 PS6, Line 382: /// Utility function to append an error message for an invalid column. It seems confusing to me that there is now a LogColumnParseError and ReportColumnParseError. Do we need both? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@391 PS6, Line 391: // - error_offsets is an out array. error_offsets[i] will be set to file offset Start comments with three /// like the other lines. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@393 PS6, Line 393: is an out 8 byte integer No need to mention its type since it's the same as in the declaration itself. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@408 PS6, Line 408: uint8_t* error_in_row, int64_t* error_file_offsets, int64_t* curr_file_offset) This looks like it could have some performance impact. Did you do any perf experiments? Have you considered not setting error_file_offsets inside WriteCompleteTuple and computing it from the field lengths during error reporting? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@233 PS6, Line 233: uint8_t* error_fields, uint8_t* error_in_row, nit: wrap at 90 characters. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@254 PS6, Line 254: *curr_offset += len + 1; Is +1 for the delimiter? If so, can you add a brief comment? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@578 PS6, Line 578: builder.CreateAdd(builder.CreateZExt(len, codegen->GetType(TYPE_BIGINT)), Please indent function calls 4 characters, here and elsewhere. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@759 PS6, Line 759: "Error parsing row: file: $0, (file offset: $1)", nit: line wrapping. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@761 PS6, Line 761: state_->LogError(ErrorMsg(TErrorCode::GENERAL, s)); Why does this not check state_->HasLogSpace() but the method below does? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@768 PS6, Line 768: at I think the message was clearer before the change. http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README@136 PS6, Line 136: problematic_rows_impala_5993 Can you think of a name that captures the nature of the problems? Most of the files in here are somewhat problematic. It could be "invalid_int.csv" (if it is an invalid int column). http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@30 PS6, Line 30: from os.path import join as join_path I think elsewhere we just use "os.path.join()" when we need it, so for consistency it would be good to do the same here. http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@841 PS6, Line 841: Tests that wrong file offset in value parsing error messages when : scanning text files. nit: missing word? http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@842 PS6, Line 842: scanning text files. When the text scanner hits an error, it always prints the end of : the file as the offset, even if the error occurs in the middle of the file. Rephrase to say what it should do, not what the problem was. http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@852 PS6, Line 852: if pytest.config.option.namenode_http_address is None: If you change this to "if p.c.o...: " it will also cover the empty string. We often return at the end of the if-branch and put the on the top level. You can move the initialization of hdfs_loc up to achieve that. http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@860 PS6, Line 860: def prepare(self): I think it might be cleaner if every test prepared their own test data separately. http://gerrit.cloudera.org:8080/#/c/8747/6/test
[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters
Adam Holley has posted comments on this change. ( http://gerrit.cloudera.org:8080/8959 ) Change subject: IMPALA-6371: Additional check for delimiters .. Patch Set 1: Jenkins build successfully run here: https://jenkins.impala.io/job/gerrit-verify-dryrun-external/58/. -- To view, visit http://gerrit.cloudera.org:8080/8959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde Gerrit-Change-Number: 8959 Gerrit-PatchSet: 1 Gerrit-Owner: Adam Holley Gerrit-Reviewer: Adam Holley Gerrit-Comment-Date: Mon, 08 Jan 2018 15:07:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8400 ) Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT .. Patch Set 6: (8 comments) Apologies for the delayed review. I had a few more questions. Let's discuss the parser changes in person if you feel I misunderstood the approach. http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@19 PS6, Line 19: more add "more memory efficient"? That makes it more obvious why it is more efficient to do additional work (sorting). http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@27 PS6, Line 27: Add exchange node before insert even in case of unpartitioned tables. I don't think this is true, is it? For unpartitioned tables, what would the shuffle / exchange do? Instead "shuffle" forces an exchange, "noshuffle" prevents adding one, and leaving it out results in the planner making a cost-based decision (DistributedPlanner.java:218). http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@35 PS6, Line 35: was moved from tbl_def_without_col_defs to statement rules. Mention how you tested it in the commit message. http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055 PS6, Line 1055: plan_hints:hints Does opt_plan_hints not work here? http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69 PS6, Line 69:* Helper class for parsing. Is this class still needed or a left-over from earlier patch-sets? Can you expand the class comment to explain why we need it and what it does? http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1701 PS6, Line 1701: Multiple non-conflicting Can you add one that has multiple distinct hints, like shuffle, clustered? http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@233 PS6, Line 233: shuffle Is that true? Without specifying a hint there should be cases where the cost based heuristic decides against adding an exchange node. http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344 PS6, Line 344: 01:EXCHANGE [UNPARTITIONED] At first I didn't understand what this one does (I saw that insert.test:574 has the same). On my machine this exchange collects all rows at the coordinator. Maybe you want to add that in a comment? -- To view, visit http://gerrit.cloudera.org:8080/8400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0 Gerrit-Change-Number: 8400 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 08 Jan 2018 14:04:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 10: (4 comments) Thanks Dimitris for the review! http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@124 PS9, Line 124: if (!Table.isExternalTable(table_.getMetaStoreTable()) && !setsToExternal) { : AnalysisUtils.throwI > nit: check if they fit in a single line (I could be wrong) Indeed, it fits :) Done http://gerrit.cloudera.org:8080/#/c/8820/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: http://gerrit.cloudera.org:8080/#/c/8820/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@315 PS5, Line 315: throw new AnalysisExce > In principal yes, in this case no. The name of this function doesn't reveal renamed to analyzeManagedKuduTableName(). Hope that is a bit more suitable here :) http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@67 PS9, Line 67: public CreateTableStmt(TableDef tableDef) { : Preconditions.checkNotNull(tableDef); > Since all the tblproperties live in TableDef, I believe it makes more sense Sure, done. http://gerrit.cloudera.org:8080/#/c/8820/9/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/8820/9/tests/query_test/test_kudu.py@327 PS9, Line 327: try: : cursor.execute("ALTER TABLE %s.foo SET TBLPROPERTIES('kudu.table_name'='blah')" : % (unique_database)) : except Exception as e: : expected_error = "AnalysisException: Not allowed to set 'kudu.table_name' " + \ : "manually for managed Kudu tables" : assert expected_error in str(e) > That should be an analysis test. There is also a test for this in AnalyzeDDLTest.java. Do you think this one should be dropped then? -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 08 Jan 2018 10:39:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Hello Laszlo Gaal, Zoltan Borok-Nagy, Attila Jeges, Dimitris Tsirogiannis, Tim Armstrong, Csaba Ringhofer, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8820 to look at the new patch set (#10). Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE This change disallows explicitly setting the Kudu table name property for managed Kudu tables in a CREATE TABLE statement. The Kudu table name property gets a generated value as the following: 'impala::db_name.table_name' where table_name is the one given in the CREATE TABLE statement. Providing the Kudu table name property when creating a managed Kudu table results in an error without creating the table. E.g.: CREATE TABLE t (i INT) STORED AS KUDU TBLPROPERTIES('kudu.table_name'='some_name'); Alongside the CREATE TABLE statement also the ALTER TABLE statement is changed not to allow the modification of Kudu.table_name of managed Kudu tables. Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test M tests/query_test/test_kudu.py 7 files changed, 167 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8820/10 -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy