Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 )
Change subject: IMPALA-7712: Support Google Cloud Storage ...................................................................... Patch Set 5: (4 comments) Here's my first pass. I'm also going to look at the ABFS change and see if anything is different. http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc@186 PS5, Line 186: true We'll need to double check that GCS file handles work with the file handle cache. It would be ok to default to false until we validate it. In previous cases like ABFS, we started out with this disabled. We would need to verify that unbuffer() is implemented and everything behaves well when there are thousands of these sitting around in the cache. It might just work. http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@863 PS5, Line 863: if (fs instanceof GoogleHadoopFileSystem) { : curIter_.hasNext(); : } I have seen an issue like this before on older versions of the S3 connector as well for recursively_list_partitions=false. If recursively_list_partitions=false, we won't go through this codepath. We would have a FilterIterator that uses a fs.listStatusIterator(p) directly. See listStatus() in this file. It might make sense for us to put this in the constructor for FilterIterator and do it for all filesystems. http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh File testdata/bin/load-test-warehouse-snapshot.sh: http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh@80 PS5, Line 80: hadoop fs -rm -r -skipTrash ${FILESYSTEM_PREFIX}${TEST_WAREHOUSE_DIR} I'm assuming that this command to remove any existing warehouse works for GCS, because it looks like this is what we would use for ABFS/ADLS. http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py File tests/stress/test_insert_stress.py: http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81 PS5, Line 81: @SkipIfGCS.jira(reason="IMPALA-10563") Does IMPALA-10563 consistently reproduce? Do we have any idea if it is specific to the minicluster environment or if it could happen on a real cluster? -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Comment-Date: Tue, 09 Mar 2021 01:04:49 +0000 Gerrit-HasComments: Yes