[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17061 ) Change subject: IMPALA-10503: testdata load hits hive memory limit errors during hive inserts .. Patch Set 10: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6941/ -- To view, visit http://gerrit.cloudera.org:8080/17061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2 Gerrit-Change-Number: 17061 Gerrit-PatchSet: 10 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Tue, 09 Mar 2021 04:42:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17122 ) Change subject: IMPALA-10549: Register transactions from external frontend DML .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8313/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db Gerrit-Change-Number: 17122 Gerrit-PatchSet: 12 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 09 Mar 2021 01:19:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 ) Change subject: IMPALA-7712: Support Google Cloud Storage .. Patch Set 5: (4 comments) Here's my first pass. I'm also going to look at the ABFS change and see if anything is different. http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc@186 PS5, Line 186: true We'll need to double check that GCS file handles work with the file handle cache. It would be ok to default to false until we validate it. In previous cases like ABFS, we started out with this disabled. We would need to verify that unbuffer() is implemented and everything behaves well when there are thousands of these sitting around in the cache. It might just work. http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@863 PS5, Line 863: if (fs instanceof GoogleHadoopFileSystem) { : curIter_.hasNext(); : } I have seen an issue like this before on older versions of the S3 connector as well for recursively_list_partitions=false. If recursively_list_partitions=false, we won't go through this codepath. We would have a FilterIterator that uses a fs.listStatusIterator(p) directly. See listStatus() in this file. It might make sense for us to put this in the constructor for FilterIterator and do it for all filesystems. http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh File testdata/bin/load-test-warehouse-snapshot.sh: http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh@80 PS5, Line 80: hadoop fs -rm -r -skipTrash ${FILESYSTEM_PREFIX}${TEST_WAREHOUSE_DIR} I'm assuming that this command to remove any existing warehouse works for GCS, because it looks like this is what we would use for ABFS/ADLS. http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py File tests/stress/test_insert_stress.py: http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81 PS5, Line 81: @SkipIfGCS.jira(reason="IMPALA-10563") Does IMPALA-10563 consistently reproduce? Do we have any idea if it is specific to the minicluster environment or if it could happen on a real cluster? -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 09 Mar 2021 01:04:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17122 ) Change subject: IMPALA-10549: Register transactions from external frontend DML .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/17122/12/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/17122/12/fe/src/main/java/org/apache/impala/service/JniFrontend.java@695 PS12, Line 695: public void addTransaction(byte[] thriftQueryContext) throws TransactionException, ImpalaException { line too long (102 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db Gerrit-Change-Number: 17122 Gerrit-PatchSet: 12 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 09 Mar 2021 01:01:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17122 ) Change subject: IMPALA-10549: Register transactions from external frontend DML .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/17122/11/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/17122/11/fe/src/main/java/org/apache/impala/service/Frontend.java@2028 PS11, Line 2028: /** :* Opens a new transaction and registers it to the keepalive object. :* @param queryCtx context of the query that requires the transaction. :* @return the transaction id. :* @throws TransactionException :*/ > This comment is really for openTransaction and should be moved, addTransact Done -- To view, visit http://gerrit.cloudera.org:8080/17122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db Gerrit-Change-Number: 17122 Gerrit-PatchSet: 11 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Tue, 09 Mar 2021 01:00:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML
Hello Thomas Tauber-Marshall, Joe McDonnell, John Sherman, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17122 to look at the new patch set (#12). Change subject: IMPALA-10549: Register transactions from external frontend DML .. IMPALA-10549: Register transactions from external frontend DML This change registers transactions that were started by an external frontend so that coordinator keepalive can track them properly. Testing: manually tested using DMLs from external frontend Reviewed-by: John Sherman Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db --- M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/service/impala-server.cc M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java 5 files changed, 43 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/17122/12 -- To view, visit http://gerrit.cloudera.org:8080/17122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db Gerrit-Change-Number: 17122 Gerrit-PatchSet: 12 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-8680: Docker-based tests fail to archive the minicluster component logs
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15898 ) Change subject: IMPALA-8680: Docker-based tests fail to archive the minicluster component logs .. Patch Set 5: I'm running the docker based tests with this to look at the output, then I'll review. -- To view, visit http://gerrit.cloudera.org:8080/15898 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e Gerrit-Change-Number: 15898 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Garaguly Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Zoltan Garaguly Gerrit-Comment-Date: Tue, 09 Mar 2021 00:20:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17104 ) Change subject: IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements .. Patch Set 13: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6940/ -- To view, visit http://gerrit.cloudera.org:8080/17104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iace716dd67290f08441857dc02d2428b0e335eaa Gerrit-Change-Number: 17104 Gerrit-PatchSet: 13 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 23:30:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17061 ) Change subject: IMPALA-10503: testdata load hits hive memory limit errors during hive inserts .. Patch Set 10: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6941/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2 Gerrit-Change-Number: 17061 Gerrit-PatchSet: 10 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Mon, 08 Mar 2021 23:08:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17061 ) Change subject: IMPALA-10503: testdata load hits hive memory limit errors during hive inserts .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2 Gerrit-Change-Number: 17061 Gerrit-PatchSet: 10 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Mon, 08 Mar 2021 23:07:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17122 ) Change subject: IMPALA-10549: Register transactions from external frontend DML .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/17122/11/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/17122/11/fe/src/main/java/org/apache/impala/service/Frontend.java@2028 PS11, Line 2028: /** :* Opens a new transaction and registers it to the keepalive object. :* @param queryCtx context of the query that requires the transaction. :* @return the transaction id. :* @throws TransactionException :*/ This comment is really for openTransaction and should be moved, addTransaction needs its own comment -- To view, visit http://gerrit.cloudera.org:8080/17122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db Gerrit-Change-Number: 17122 Gerrit-PatchSet: 11 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 23:02:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17140 ) Change subject: IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8312/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd9aea4bf2fff634ea9a30bf6e87987be4e1c611 Gerrit-Change-Number: 17140 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 08 Mar 2021 22:28:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 4: (3 comments) Thanks Joe for the feedback http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12 PS4, Line 12: - When this is set, the external frontend is responsible for : for moving and managing the results > I assume the external frontend is also handling any metadata operations as This is correct. I'll update the commit message to make that clearer. http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127 PS4, Line 127: if (HasExternalStagingDir()) { : // When an external FE has provided a staging path, we use it directly. : staging_dir_ = external_staging_dir_; : } else { : staging_dir_ = Substitute("$0/_impala_insert_staging/$1", : table_desc_->hdfs_base_dir(), PrintId(state->query_id(), "_")); : } > The external staging directory is a staging directory from the point of vie I can certainly change everything to be external output rather than external staging. I'll verify why I'm modifying staging_dir_ here - reading the code I assumed I initially did this so BuildHdfsFileNames would use it correctly to build tmp_hdfs_* variables but indeed I don't think these paths would utilize them. http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@279 PS4, Line 279: // The 0 padding on base and delta is to match the behavior of Hive since various : // systems will expect a certain format for dynamic partition creation. Additionally : // include an 0 statement id for delta directory so various Hive AcidUtils detect : // the directory (such as AcidUtils.baseOrDeltaSubdir()). Multiple statements in a : // single transaction is not supported. : if (overwrite_) { : output_partition->final_hdfs_file_name_prefix += StringPrintf("/base_%07ld/", : write_id_); : } else { : output_partition->final_hdfs_file_name_prefix += StringPrintf( : "/delta_%07ld_%07ld_/", write_id_, write_id_); : } > Should this be triggered by its own variable independently from the externa It is a bit orthogonal. I think ideally Impala/Hive and external frontend implementations would standardize on something consistent. I took the paranoid approach of only applying this to external frontends as I do not want to cause any inadvertent regressions. I'm not sure it is currently worth it to plumb through a flag indicating which format is expected. If another implementation of a frontend shows up,I think it might be worth it. Feel free to push back if you feel strongly. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 22:15:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError
Yida Wu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17140 ) Change subject: IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError .. IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError The DCHECK error happens when there is an IO error during the spilling process if the scratch directory is in a remote filesystem and doing an error recovery(rewrite). Because currently the DCHECK only consider the file number of local scratch files, it leads to a file number requirement mismatch in the DCHECK. Because the implementation of spilling to the local fs and the remote fs are quite different, for simplify, we don't recover write error for spilling to a remote fs in the current version. Instead, the errors generated during spilling to remote would be returned directly to the upper layer. So, we avoid the DCHECK logic for spilling to remote. Tests: * Added a unit test: TmpFileMgrTest::TestRemoteRemoveBuffer. * Ran Unit Tests: $IMPALA_HOME/be/build/latest/runtime/tmp-file-mgr-test Change-Id: Ifd9aea4bf2fff634ea9a30bf6e87987be4e1c611 --- M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc 2 files changed, 38 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/17140/4 -- To view, visit http://gerrit.cloudera.org:8080/17140 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd9aea4bf2fff634ea9a30bf6e87987be4e1c611 Gerrit-Change-Number: 17140 Gerrit-PatchSet: 4 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 3 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 21:21:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. Patch Set 16: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8311/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 16 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Mon, 08 Mar 2021 21:20:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. Patch Set 15: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8310/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Comment-Date: Mon, 08 Mar 2021 21:12:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Qifan Chen has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. IMPALA-10494: Making use of the min/max column stats to improve min/max filters This patch adds the functionality to compute the minimal and the maximal value for a column of type integers, float or double for parquet tables, and to make use of the new stats to discard the min/max filters whose coverage are too close to the actual range. The computation and dislay of the new column min/max stats can be controlled by two new Boolean query options (default to false): 1. compute_column_minmax_stats 2. show_column_minmax_stats When enabled, two new columns 'Min' and 'Max' are added in the output of the show column command as shown below. set show_column_minmax_stats=true; show column stats tpcds_parquet.store_sales; +---+--+-...---+-+-+ | Column| Type | #Falses | Min | Max | +---+--+-...---+-+-+ | ss_sold_time_sk | INT | -1 | 28800 | 75599 | | ss_item_sk| BIGINT | -1 | 1 | 18000 | | ss_customer_sk| INT | -1 | 1 | 10 | | ss_cdemo_sk | INT | -1 | 15 | 1920797 | | ss_hdemo_sk | INT | -1 | 1 | 7200| | ss_addr_sk| INT | -1 | 1 | 5 | | ss_store_sk | INT | -1 | 1 | 10 | | ss_promo_sk | INT | -1 | 1 | 300 | | ss_ticket_number | BIGINT | -1 | 1 | 24 | | ss_quantity | INT | -1 | 1 | 100 | | ss_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_discount_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_tax| DECIMAL(7,2) | -1 | -1 | -1 | | ss_coupon_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid_inc_tax | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_profit | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sold_date_sk | INT | -1 | 2450816 | 2452642 | +---+--+-...---+-+-+ Only the min/max values for non-partition columns are stored in HMS. The min/max values for partition columns are computed in coordinator. The min-max filters, in C++ class or protobuf form, are augmented to deal with the always true state better. Once always true is set, the min and max in the filter is no longer populated. Testing: - Added new compute/show stats tests for integers, float and double column data types in compute-stats-column-minmax.test; - Added new tests in overlap_min_max_filters.test to demonstrate the usefulness of column stats to quickly disable useless filters; - Added tests in min-max-filter-test.cc to demonstrate method Or(), ToProtobuf() and constructor can deal with always true flag well. - core tests. TODO: 1. Test compute stats for timestamp and date columns; Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df --- M be/src/exec/catalog-op-executor.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-scanner.h M be/src/exec/incr-stats-util-test.cc M be/src/exec/incr-stats-util.cc M be/src/exec/incr-stats-util.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/service/hs2-util.cc M be/src/service/hs2-util.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/CatalogObjects.thrift M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Qifan Chen has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. IMPALA-10494: Making use of the min/max column stats to improve min/max filters This patch adds the functionality to compute the minimal and the maximal value for a column of type integers, float or double for parquet tables, and to make use of the new stats to discard the min/max filters whose coverage are too close to the actual range. The computation and dislay of the new column min/max stats can be controlled by two new Boolean query options (default to false): 1. compute_column_minmax_stats 2. show_column_minmax_stats When enabled, two new columns 'Min' and 'Max' are added in the output of the show column command as shown below. set show_column_minmax_stats=true; show column stats tpcds_parquet.store_sales; +---+--+-...---+-+-+ | Column| Type | #Falses | Min | Max | +---+--+-...---+-+-+ | ss_sold_time_sk | INT | -1 | 28800 | 75599 | | ss_item_sk| BIGINT | -1 | 1 | 18000 | | ss_customer_sk| INT | -1 | 1 | 10 | | ss_cdemo_sk | INT | -1 | 15 | 1920797 | | ss_hdemo_sk | INT | -1 | 1 | 7200| | ss_addr_sk| INT | -1 | 1 | 5 | | ss_store_sk | INT | -1 | 1 | 10 | | ss_promo_sk | INT | -1 | 1 | 300 | | ss_ticket_number | BIGINT | -1 | 1 | 24 | | ss_quantity | INT | -1 | 1 | 100 | | ss_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_discount_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_tax| DECIMAL(7,2) | -1 | -1 | -1 | | ss_coupon_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid_inc_tax | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_profit | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sold_date_sk | INT | -1 | 2450816 | 2452642 | +---+--+-...---+-+-+ Only the min/max values for non-partition columns are stored in HMS. The min/max values for partition columns are computed in coordinator. The min-max filters, in C++ class or protobuf form, are augmented to deal with the always true state better. Once always true is set, the min and max in the filter is no longer populated. Testing: - Added new compute/show stats tests for integers, float and double column data types in compute-stats-column-minmax.test; - Added new tests in overlap_min_max_filters.test to demonstrate the usefulness of column stats to quickly disable useless filters; - Added tests in min-max-filter-test.cc to demonstrate method Or(), ToProtobuf() and constructor can deal with always true flag well. - core tests. TODO: 1. Test compute stats for timestamp and date columns; Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df --- M be/src/exec/catalog-op-executor.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-scanner.h M be/src/exec/incr-stats-util-test.cc M be/src/exec/incr-stats-util.cc M be/src/exec/incr-stats-util.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/runtime-filter-bank.cc M be/src/service/hs2-util.cc M be/src/service/hs2-util.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/CatalogObjects.thrift M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12 PS4, Line 12: - When this is set, the external frontend is responsible for : for moving and managing the results I assume the external frontend is also handling any metadata operations as well. Is that right? http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127 PS4, Line 127: if (HasExternalStagingDir()) { : // When an external FE has provided a staging path, we use it directly. : staging_dir_ = external_staging_dir_; : } else { : staging_dir_ = Substitute("$0/_impala_insert_staging/$1", : table_desc_->hdfs_base_dir(), PrintId(state->query_id(), "_")); : } The external staging directory is a staging directory from the point of view of an external frontend, but in this file, it doesn't use the staging codepath. It uses the direct writing codepath with an alternate location (i.e. ShouldSkipStaging() returns true and it customizes final_hdfs_file_name_prefix). So, I think there is no need to set staging_dir_ to something different. That doesn't interact with the external staging directory codepath. Separately, what do you think about calling this something other than a staging directory? Custom output location? External output directory? http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@279 PS4, Line 279: // The 0 padding on base and delta is to match the behavior of Hive since various : // systems will expect a certain format for dynamic partition creation. Additionally : // include an 0 statement id for delta directory so various Hive AcidUtils detect : // the directory (such as AcidUtils.baseOrDeltaSubdir()). Multiple statements in a : // single transaction is not supported. : if (overwrite_) { : output_partition->final_hdfs_file_name_prefix += StringPrintf("/base_%07ld/", : write_id_); : } else { : output_partition->final_hdfs_file_name_prefix += StringPrintf( : "/delta_%07ld_%07ld_/", write_id_, write_id_); : } Should this be triggered by its own variable independently from the external staging directory? It seems orthogonal to where the output goes. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 20:52:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 4: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262 PS2, Line 262: // When an external FE has provided a staging directory we use that directly. > Sure, I can add a comment here. I didn't initially because if I added a com Done -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 19:38:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10522: Support external use of frontend libraries
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17115 ) Change subject: IMPALA-10522: Support external use of frontend libraries .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17115 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4e3a84721ba196ec00773ce2923b19610b90edd9 Gerrit-Change-Number: 17115 Gerrit-PatchSet: 9 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 19:18:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17104 ) Change subject: IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6940/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iace716dd67290f08441857dc02d2428b0e335eaa Gerrit-Change-Number: 17104 Gerrit-PatchSet: 13 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 17:47:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17104 ) Change subject: IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17104 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iace716dd67290f08441857dc02d2428b0e335eaa Gerrit-Change-Number: 17104 Gerrit-PatchSet: 13 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 17:45:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8309/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 17:26:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10546: Add ImpalaServer interface to retrieve BackendConfig from impalad
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17116 ) Change subject: IMPALA-10546: Add ImpalaServer interface to retrieve BackendConfig from impalad .. Patch Set 16: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14a3cee29f1fc91f4431b7ea89053bb3fbfa5e69 Gerrit-Change-Number: 17116 Gerrit-PatchSet: 16 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 17:20:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h@249 PS2, Line 249: > Done Done http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262 PS2, Line 262: output_partition->final_hdfs_file_name_prefix = Substitute("$0/$1/", > Sure, I can add a comment here. I didn't initially because if I added a com I added a comment here and to the commit message for this change set. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 2 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 17:07:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17145 to look at the new patch set (#4). Change subject: IMPALA-10552: External Frontend CTAS support .. IMPALA-10552: External Frontend CTAS support - Adds the concept of an external staging dir to HdfsTableSink - This allows an external to specify the destination of the sink - When this is set, the external frontend is responsible for for moving and managing the results - External frontends may optionally supply a partition depth which acts as a hint to skip a certain number of partitions while creating directories for partitions. This is for when the external frontend has pre-created a certain number of the directories in staging (usually the static portion of a partition specification)/ - Modifies delta/base naming to include 0 prefix padding to match Hive for dynamic partitioning detection - External frontends are responsible for doing authorization checks against the staging directory and it is assumed the external frontend service is not exposed directly to users. Co-authored-by: Kurt Deschler Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Reviewed-by: Kurt Deschler --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java 5 files changed, 131 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/4 -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1 .. Patch Set 12: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8308/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 12 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 08 Mar 2021 16:58:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262 PS2, Line 262: output_partition->final_hdfs_file_name_prefix = Substitute("$0/$1/", > Please add a comment to this effect. We may want to at least validate the p Sure, I can add a comment here. I didn't initially because if I added a comment everywhere where this is possible it'd be a lot of comments. The commit message for the external planner port makes it a bit clearer too "This allows different security policy to be applied to each type of connection. The external_fe_port should be considered a privileged service and should only be exposed to an external frontend that does user authentication and does authorization checks on generated plans" In any case, I do not mind being paranoid via comments in such a case. http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243 PS2, Line 243: if (externalStagingDir_ != null) { > Actually was thinking C++. This is Java an you are correct. Done http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803 PS2, Line 1803: public static void addFinalizationParamsForInsert( > No comment added? I think you are looking at patch set 2 and not 3 -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 2 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 16:51:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1 .. Patch Set 12: (182 comments) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h File be/src/thirdparty/xxhash/xxhash.h: http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@70 PS12, Line 70: https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735 line too long (112 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@92 PS12, Line 92: * https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@113 PS12, Line 113: # elif defined (__cplusplus) || (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */) line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@243 PS12, Line 243: # define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_64bits_reset_withSecret) line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@253 PS12, Line 253: # define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSeed) line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@254 PS12, Line 254: # define XXH3_128bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret) line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@270 PS12, Line 270: #define XXH_VERSION_NUMBER (XXH_VERSION_MAJOR *100*100 + XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE) line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@429 PS12, Line 429: * @param statePtr A pointer to an @ref XXH32_state_t allocated with @ref XXH32_createState(). line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@441 PS12, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, const XXH32_state_t* src_state); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@476 PS12, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* statePtr, const void* input, size_t length); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@628 PS12, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, const XXH64_state_t* src_state); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@631 PS12, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* statePtr, const void* input, size_t length); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@700 PS12, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* data, size_t len, XXH64_hash_t seed); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@724 PS12, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize); line too long (120 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@743 PS12, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, const XXH3_state_t* src_state); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@756 PS12, Line 756: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed); line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@766 PS12, Line 766: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t secretSize); line too long (121 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@768 PS12, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* statePtr, const void* input, size_t length); line too long (107 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@791 PS12, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* data, size_t len, XXH64_hash_t seed); line too long (100 > 90) http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@792 PS12, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize); line too long (122 > 90)
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Daniel Becker has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1 .. IMPALA-9470: Use Parquet Bloom filters - Part 1 This change adds read support for Parquet Bloom filters for some types. The supported Parquet type - Impala type pairs are the following: --- |Parquet type | Impala type| |---| |INT32| TINYINT, SMALLINT, INT | |INT64| BIGINT | |FLOAT| FLOAT | |DOUBLE | DOUBLE | |BYTE_ARRAY | STRING | --- If a Bloom filter is available for a column that is fully dictionary encoded, the Bloom filter is not used as the dictionary can give exact results in filtering. Testing: - Added tests/query_test/test_parquet_bloom_filter.py that tests that Parquet Bloom filtering works for the supported types and that we do not incorrectly discard row groups for the unsupported type VARCHAR. Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 --- M LICENSE.txt M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exprs/expr-value.h M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/kudu/util/block_bloom_filter.cc M be/src/kudu/util/block_bloom_filter.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h A be/src/thirdparty/xxhash/README.md A be/src/thirdparty/xxhash/xxhash.h M be/src/util/CMakeLists.txt M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/impala-bloom-filter-buffer-allocator.cc A be/src/util/impala-bloom-filter-buffer-allocator.h A be/src/util/parquet-bloom-filter.cc A be/src/util/parquet-bloom-filter.h M bin/rat_exclude_files.txt M common/thrift/parquet.thrift A testdata/data/parquet-bloom-filtering.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test A tests/query_test/test_parquet_bloom_filter.py 26 files changed, 6,820 insertions(+), 122 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/12 -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 12 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262 PS2, Line 262: output_partition->final_hdfs_file_name_prefix = Substitute("$0/$1/", > For now we are treating the external planner port as a trusted/protected se Please add a comment to this effect. We may want to at least validate the prefix and depth arguments are consistent with what is expected here. http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243 PS2, Line 243: if (externalStagingDir_ != null) { > externalStagingDir_ is a String object and if it is not set I do believe it Actually was thinking C++. This is Java an you are correct. http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803 PS2, Line 1803: public static void addFinalizationParamsForInsert( > Done No comment added? -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 2 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 15:56:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1 .. Patch Set 11: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/8307/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 08 Mar 2021 15:44:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6939/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 3 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 15:39:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 ) Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1 .. Patch Set 11: (185 comments) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/exec/parquet/hdfs-parquet-scanner.cc File be/src/exec/parquet/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@1576 PS11, Line 1576: status = DeserializeThriftMsg(buffer->buffer(), header_size, true, bloom_filter_header); line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@1819 PS11, Line 1819: "Bloom filter data for file '$1'.", bloom_filter_header.numBytes, filename())); line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h File be/src/thirdparty/xxhash/xxhash.h: http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@70 PS11, Line 70: https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735 line too long (112 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@92 PS11, Line 92: * https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@113 PS11, Line 113: # elif defined (__cplusplus) || (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */) line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@243 PS11, Line 243: # define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_64bits_reset_withSecret) line too long (93 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@253 PS11, Line 253: # define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSeed) line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@254 PS11, Line 254: # define XXH3_128bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret) line too long (95 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@270 PS11, Line 270: #define XXH_VERSION_NUMBER (XXH_VERSION_MAJOR *100*100 + XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE) line too long (103 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@429 PS11, Line 429: * @param statePtr A pointer to an @ref XXH32_state_t allocated with @ref XXH32_createState(). line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@441 PS11, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, const XXH32_state_t* src_state); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@476 PS11, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* statePtr, const void* input, size_t length); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@628 PS11, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, const XXH64_state_t* src_state); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@631 PS11, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* statePtr, const void* input, size_t length); line too long (102 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@700 PS11, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* data, size_t len, XXH64_hash_t seed); line too long (98 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@724 PS11, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize); line too long (120 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@743 PS11, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, const XXH3_state_t* src_state); line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@756 PS11, Line 756: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed); line too long (99 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@766 PS11, Line 766: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t secretSize); line too long (121 > 90) http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@768 PS11, Line 768: XXH_PUBLIC_API XXH_errorcode
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17026 Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1 .. IMPALA-9470: Use Parquet Bloom filters - Part 1 This change adds read support for Parquet Bloom filters for some types. The supported Parquet type - Impala type pairs are the following: --- |Parquet type | Impala type| |---| |INT32| TINYINT, SMALLINT, INT | |INT64| BIGINT | |FLOAT| FLOAT | |DOUBLE | DOUBLE | |BYTE_ARRAY | STRING | --- If a Bloom filter is available for a column that is fully dictionary encoded, the Bloom filter is not used as the dictionary can give exact results in filtering. Testing: - Added tests/query_test/test_parquet_bloom_filter.py that tests that Parquet Bloom filtering works for the supported types and that we do not incorrectly discard row groups for the unsupported type VARCHAR. Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 --- M LICENSE.txt M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exprs/expr-value.h M be/src/exprs/literal.cc M be/src/exprs/literal.h M be/src/kudu/util/block_bloom_filter.cc M be/src/kudu/util/block_bloom_filter.h M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h A be/src/thirdparty/xxhash/README.md A be/src/thirdparty/xxhash/xxhash.h M be/src/util/CMakeLists.txt M be/src/util/bloom-filter.cc M be/src/util/bloom-filter.h A be/src/util/impala-bloom-filter-buffer-allocator.cc A be/src/util/impala-bloom-filter-buffer-allocator.h A be/src/util/parquet-bloom-filter.cc A be/src/util/parquet-bloom-filter.h M common/thrift/parquet.thrift A testdata/data/parquet-bloom-filtering.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test A tests/query_test/test_parquet_bloom_filter.py 25 files changed, 6,815 insertions(+), 122 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/11 -- To view, visit http://gerrit.cloudera.org:8080/17026 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287 Gerrit-Change-Number: 17026 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8306/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 3 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 15:27:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 ) Change subject: IMPALA-7712: Support Google Cloud Storage .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8305/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 08 Mar 2021 15:15:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/17145/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17145/1//COMMIT_MSG@22 PS1, Line 22: > Add co-author for Kurt Done http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h@249 PS2, Line 249: // Stores the indices into the list of non-clustering columns of the target table that > Extra line Done http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.h@249 PS1, Line 249: // Stores the indices into the list of non-clustering columns of the target table that > delete Done http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@131 PS1, Line 131: staging_dir_ = Substitute("$0/_impala_insert_staging/$1", > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@232 PS1, Line 232: void HdfsTableSink::BuildHdfsFileNames( > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@506 PS1, Line 506: for (int j = 0; j < partition_key_expr_evals_.size(); ++j) { > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@549 PS1, Line 549: // partition_name_ss now holds the unique descriptor for this partition, > line too long (95 > 90) Done http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262 PS2, Line 262: // When an external FE has provided a staging directory we use that directly. > Do we need any validation on the path here to avoid security issues writing For now we are treating the external planner port as a trusted/protected service meant for deployment in scenarios you can lock down access to said port. We should certainly in the (near) future look into doing similar things that communication between impalads do that prevent users from impersonating a coordinator. http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@90 PS2, Line 90: > I would say "specifies depth" rather than hint as this needs to be enforced Done http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243 PS2, Line 243: hdfsTableSink.setSort_columns(sortColumns_); > externalStagingDir_ is not a pointer.. externalStagingDir_ is a String object and if it is not set I do believe it can be null. (Maybe you read it as the int externalStagingPartitionDepth_?) http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803 PS2, Line 1803: // This is public to allow external frontends to utilize this method to fill in the > Add comment why this is public static. Done -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 3 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 15:11:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 ) Change subject: IMPALA-7712: Support Google Cloud Storage .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8304/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 08 Mar 2021 15:10:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17145 to look at the new patch set (#3). Change subject: IMPALA-10552: External Frontend CTAS support .. IMPALA-10552: External Frontend CTAS support - Adds the concept of an external staging dir to HdfsTableSink - This allows an external to specify the destination of the sink - When this is set, the external frontend is responsible for for moving and managing the results - External frontends may optionally supply a partition depth which acts as a hint to skip a certain number of partitions while creating directories for partitions. This is for when the external frontend has pre-created a certain number of the directories in staging (usually the static portion of a partition specification)/ - Modifies delta/base naming to include 0 prefix padding to match Hive for dynamic partitioning detection Co-authored-by: Kurt Deschler Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Reviewed-by: Kurt Deschler --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java 5 files changed, 129 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/3 -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 3 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17121 to look at the new patch set (#5). Change subject: IMPALA-7712: Support Google Cloud Storage .. IMPALA-7712: Support Google Cloud Storage This patch adds support for GCS(Google Cloud Storage). Using the gcs-connector, the implementation is similar to other remote FileSystems. New flags for GCS: - num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16. - cache_gcs_file_handles: Enable the file handle cache for GCS files. Defaults to true. Follow-up: - Support for spilling to GCS will be addressed in IMPALA-10561. - Some tests are skipped for further investigation (IMPALA-10562, IMPALA-10563). Tests: - Compile and create hdfs test data on a GCE instance. Upload test data to a GCS bucket. Modify all locations in HMS DB to point to the GCS bucket. Remove some hdfs caching params. Run CORE tests. - Compile and load snapshot data to a GCS bucket. Run CORE tests. Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b --- M be/src/exec/hdfs-table-sink.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr.cc M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M java/executor-deps/pom.xml M java/pom.xml M testdata/bin/create-load-data.sh M testdata/bin/load-test-warehouse-snapshot.sh M testdata/bin/run-all.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py M tests/authorization/test_ranger.py M tests/common/impala_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_admission_controller.py M tests/custom_cluster/test_coordinators.py M tests/custom_cluster/test_event_processing.py M tests/custom_cluster/test_hdfs_fd_caching.py M tests/custom_cluster/test_hive_parquet_codec_interop.py M tests/custom_cluster/test_hive_text_codec_interop.py M tests/custom_cluster/test_insert_behaviour.py M tests/custom_cluster/test_lineage.py M tests/custom_cluster/test_local_catalog.py M tests/custom_cluster/test_local_tz_conversion.py M tests/custom_cluster/test_metadata_replicas.py M tests/custom_cluster/test_parquet_max_page_header.py M tests/custom_cluster/test_permanent_udfs.py M tests/custom_cluster/test_query_retries.py M tests/custom_cluster/test_restart_services.py M tests/custom_cluster/test_topic_update_frequency.py M tests/data_errors/test_data_errors.py M tests/failure/test_failpoints.py M tests/metadata/test_catalogd_debug_actions.py M tests/metadata/test_compute_stats.py M tests/metadata/test_ddl.py M tests/metadata/test_hdfs_encryption.py M tests/metadata/test_hdfs_permissions.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/metadata/test_refresh_partition.py M tests/metadata/test_reset_metadata.py M tests/metadata/test_stale_metadata.py M tests/metadata/test_testcase_builder.py M tests/metadata/test_views_compatibility.py M tests/query_test/test_acid.py M tests/query_test/test_aggregation.py M tests/query_test/test_date_queries.py M tests/query_test/test_hbase_queries.py M tests/query_test/test_hdfs_caching.py M tests/query_test/test_insert_behaviour.py M tests/query_test/test_insert_parquet.py M tests/query_test/test_insert_permutation.py M tests/query_test/test_join_queries.py M tests/query_test/test_nested_types.py M tests/query_test/test_observability.py M tests/query_test/test_partitioning.py M tests/query_test/test_resource_limits.py M tests/query_test/test_scanners.py M tests/shell/test_shell_commandline.py M tests/stress/test_acid_stress.py M tests/stress/test_ddl_stress.py M tests/stress/test_insert_stress.py M tests/util/filesystem_utils.py 70 files changed, 299 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/5 -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 ) Change subject: IMPALA-7712: Support Google Cloud Storage .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17121/4/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/17121/4/tests/shell/test_shell_commandline.py@656 PS4, Line 656: . flake8: E501 line too long (91 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 08 Mar 2021 14:52:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17121 to look at the new patch set (#4). Change subject: IMPALA-7712: Support Google Cloud Storage .. IMPALA-7712: Support Google Cloud Storage This patch adds support for GCS(Google Cloud Storage). Using the gcs-connector, the implementation is similar to other remote FileSystems. New flags for GCS: - num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16. - cache_gcs_file_handles: Enable the file handle cache for GCS files. Defaults to true. Follow-up: - Support for spilling to GCS will be addressed in IMPALA-10561. - Some tests are skipped for further investigation (IMPALA-10562, IMPALA-10563). Tests: - Compile and create hdfs test data on a GCE instance. Upload test data to a GCS bucket. Modify all locations in HMS DB to point to the GCS bucket. Remove some hdfs caching params. Run CORE tests. - Compile and load snapshot data to a GCS bucket. Run CORE tests. Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b --- M be/src/exec/hdfs-table-sink.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr.cc M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M java/executor-deps/pom.xml M java/pom.xml M testdata/bin/create-load-data.sh M testdata/bin/load-test-warehouse-snapshot.sh M testdata/bin/run-all.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py M tests/authorization/test_ranger.py M tests/common/impala_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_admission_controller.py M tests/custom_cluster/test_coordinators.py M tests/custom_cluster/test_event_processing.py M tests/custom_cluster/test_hdfs_fd_caching.py M tests/custom_cluster/test_hive_parquet_codec_interop.py M tests/custom_cluster/test_hive_text_codec_interop.py M tests/custom_cluster/test_insert_behaviour.py M tests/custom_cluster/test_lineage.py M tests/custom_cluster/test_local_catalog.py M tests/custom_cluster/test_local_tz_conversion.py M tests/custom_cluster/test_metadata_replicas.py M tests/custom_cluster/test_parquet_max_page_header.py M tests/custom_cluster/test_permanent_udfs.py M tests/custom_cluster/test_query_retries.py M tests/custom_cluster/test_restart_services.py M tests/custom_cluster/test_topic_update_frequency.py M tests/data_errors/test_data_errors.py M tests/failure/test_failpoints.py M tests/metadata/test_catalogd_debug_actions.py M tests/metadata/test_compute_stats.py M tests/metadata/test_ddl.py M tests/metadata/test_hdfs_encryption.py M tests/metadata/test_hdfs_permissions.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/metadata/test_refresh_partition.py M tests/metadata/test_reset_metadata.py M tests/metadata/test_stale_metadata.py M tests/metadata/test_testcase_builder.py M tests/metadata/test_views_compatibility.py M tests/query_test/test_acid.py M tests/query_test/test_aggregation.py M tests/query_test/test_date_queries.py M tests/query_test/test_hbase_queries.py M tests/query_test/test_hdfs_caching.py M tests/query_test/test_insert_behaviour.py M tests/query_test/test_insert_parquet.py M tests/query_test/test_insert_permutation.py M tests/query_test/test_join_queries.py M tests/query_test/test_nested_types.py M tests/query_test/test_observability.py M tests/query_test/test_partitioning.py M tests/query_test/test_resource_limits.py M tests/query_test/test_scanners.py M tests/shell/test_shell_commandline.py M tests/stress/test_acid_stress.py M tests/stress/test_ddl_stress.py M tests/stress/test_insert_stress.py M tests/util/filesystem_utils.py 70 files changed, 299 insertions(+), 61 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/4 -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 ) Change subject: IMPALA-10552: External Frontend CTAS support .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h@249 PS2, Line 249: Extra line http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262 PS2, Line 262: output_partition->final_hdfs_file_name_prefix = Substitute("$0/$1/", Do we need any validation on the path here to avoid security issues writing to an arbitrary HDFS directory? Mainly concerned because the path prefix is supplied externally. http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@90 PS2, Line 90: // This acts as a hint to the sink on how deep into the partition specification in I would say "specifies depth" rather than hint as this needs to be enforced. http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243 PS2, Line 243: if (externalStagingDir_ != null) { externalStagingDir_ is not a pointer.. http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803 PS2, Line 1803: public static void addFinalizationParamsForInsert( Add comment why this is public static. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 2 Gerrit-Owner: John Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Mon, 08 Mar 2021 13:07:23 + Gerrit-HasComments: Yes