Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
......................................................................


Patch Set 5:

(4 comments)

Here's my first pass. I'm also going to look at the ABFS change and see if 
anything is different.

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc@186
PS5, Line 186: true
We'll need to double check that GCS file handles work with the file handle 
cache. It would be ok to default to false until we validate it. In previous 
cases like ABFS, we started out with this disabled.

We would need to verify that unbuffer() is implemented and everything behaves 
well when there are thousands of these sitting around in the cache. It might 
just work.


http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@863
PS5, Line 863:       if (fs instanceof GoogleHadoopFileSystem) {
             :         curIter_.hasNext();
             :       }
I have seen an issue like this before on older versions of the S3 connector as 
well for recursively_list_partitions=false.

If recursively_list_partitions=false, we won't go through this codepath. We 
would have a FilterIterator that uses a fs.listStatusIterator(p) directly. See 
listStatus() in this file.

It might make sense for us to put this in the constructor for FilterIterator 
and do it for all filesystems.


http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh@80
PS5, Line 80:       hadoop fs -rm -r -skipTrash 
${FILESYSTEM_PREFIX}${TEST_WAREHOUSE_DIR}
I'm assuming that this command to remove any existing warehouse works for GCS, 
because it looks like this is what we would use for ABFS/ADLS.


http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
Does IMPALA-10563 consistently reproduce? Do we have any idea if it is specific 
to the minicluster environment or if it could happen on a real cluster?



--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Tue, 09 Mar 2021 01:04:49 +0000
Gerrit-HasComments: Yes

Reply via email to