Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
......................................................................


Patch Set 2:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/6910/2/bin/impala-config.sh
File bin/impala-config.sh:

PS2, Line 243: azure_tenant_id
> nit: uppercase for consistency?
I kept it lower case to be consistent with the ADLS Python client:
https://github.com/Azure/azure-data-lake-store-python/blob/985ba77f0a4c5d6eb31ff8a7a66e5cb7d159c1e8/docs/source/index.rst


PS2, Line 284: fi
> Should we have a dummy check like the one in L268 to make sure we can acces
We would need to download a command line tool to do that. I checked it out and 
it looks like it may add a lot of dependencies.

Also the documentation for the CLI seems thin and doesn't state what kind of 
OSes it supports, etc.
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-cli

Which is why I didn't add that check. What do you think?


http://gerrit.cloudera.org:8080/#/c/6910/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 790: .
> If there is a jira for this you may want to add it in the comment.
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

PS2, Line 306: fs instanceof LocalFileSystem ||
             :              fs instanceof AdlFileSystem);
> nit: fix indentation
Done


PS2, Line 325: a
> nit: remove
Done


PS2, Line 332: a
> nit: remove
Done


PS2, Line 483: FileSystemUtil.isLocalFileSystem(path) ||
             :             FileSystemUtil.isS3AFileSystem(path) ||
             :             FileSystemUtil.isADLFileSystem(path));
> nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/infra/python/bootstrap_virtualenv.py
File infra/python/bootstrap_virtualenv.py:

PS2, Line 63: Cffi
> huh? :)
Clarified it. "C Foreign Function Interface"


PS2, Line 221: CentOS
> How about Ubuntu? If there is a minimum version, we should mention that too
I've only tested it on Ubuntu 14.04 and 16.04. I don't know what the lowest 
supported version is though, since the documentation for the ADLS python client 
doesn't mention anything.

I found the CentOS 6.6 manually through testing on different CentOS machines 
(which was pretty time consuming). I will try to do the same for Ubuntu 
machines later when there is more time.


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

Line 54: from tests.util.filesystem_utils import IS_S3, S3_BUCKET_NAME, 
IS_ADLS, ADLS_STORE_NAME, \
> This is a nit, but once the import list becomes this long, a common idiom i
Done


Line 76:   assert IS_ADLS == False, "Need the ADLSClient for testing with ADLS"
> Is covering the ImportError with an AssertionError funky? Might it be bette
Done


PS2, Line 137: if IS_S3: cls.filesystem_client = S3Client(S3_BUCKET_NAME)
             :     elif IS_ADLS: cls.filesystem_client = 
ADLSClient(ADLS_STORE_NAME)
> nit: I think the format should be:
Lol! Done.


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

PS2, Line 61: will
> may?
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

PS2, Line 48: slow_client
> is this the right tag? Isn't this the consistency issue described earlier? 
renamed to 'eventually_consistent'


PS2, Line 377: slow_client
> same here
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 164:   #@SkipIfADLS.hdfs_block_size
> now it's commented out
Oops, my bad. I commented it out to do some local testing and forgot to remove 
it. Done.


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

PS2, Line 25: from tests.common.skip import SkipIfOldAggsJoins, SkipIfIsilon, 
SkipIfS3, SkipIfADLS, SkipIfLocal
> long line
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS2, Line 35: from tests.common.skip import SkipIfS3, SkipIfADLS, SkipIfIsilon, 
SkipIfOldAggsJoins, SkipIfLocal
> long line
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/util/adls_util.py
File tests/util/adls_util.py:

Line 29:   @classmethod
> This seems really unusual to me. __init__ methods typically don't need a @c
Thanks for the explanation. I've removed @classmethod.


PS2, Line 29: @classmethod
> Why decorate __init__() with @classmethod? As a result, all 'ADLSClient' in
No, this was a mistake on my part. I've removed @classmethod.


PS2, Line 40: f.close()
> You're correct -- "with  ... as foo" should close foo for you.
Done


PS2, Line 40: f.close()
> I believe this is redundant but I could be wrong.
Done


Line 45:     self.adlsclient.mkdir(path)
> What is the return value of self.adlsclient.mkdir(path)? Does it make sense
It looks like it just throws an exception:
https://github.com/Azure/azure-data-lake-store-python/blob/master/azure/datalake/store/core.py#L444

It has no return value. Do you think it should be wrapped in a try catch?


Line 72:     files = self.ls(path)
> Super insignificant nit: you could put this inside of the list comprehensio
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to