Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS ......................................................................
Patch Set 2: (24 comments) http://gerrit.cloudera.org:8080/#/c/6910/2/bin/impala-config.sh File bin/impala-config.sh: PS2, Line 243: azure_tenant_id > nit: uppercase for consistency? I kept it lower case to be consistent with the ADLS Python client: https://github.com/Azure/azure-data-lake-store-python/blob/985ba77f0a4c5d6eb31ff8a7a66e5cb7d159c1e8/docs/source/index.rst PS2, Line 284: fi > Should we have a dummy check like the one in L268 to make sure we can acces We would need to download a command line tool to do that. I checked it out and it looks like it may add a lot of dependencies. Also the documentation for the CLI seems thin and doesn't state what kind of OSes it supports, etc. https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-cli Which is why I didn't add that check. What do you think? http://gerrit.cloudera.org:8080/#/c/6910/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS2, Line 790: . > If there is a jira for this you may want to add it in the comment. Done http://gerrit.cloudera.org:8080/#/c/6910/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: PS2, Line 306: fs instanceof LocalFileSystem || : fs instanceof AdlFileSystem); > nit: fix indentation Done PS2, Line 325: a > nit: remove Done PS2, Line 332: a > nit: remove Done PS2, Line 483: FileSystemUtil.isLocalFileSystem(path) || : FileSystemUtil.isS3AFileSystem(path) || : FileSystemUtil.isADLFileSystem(path)); > nit: indentation Done http://gerrit.cloudera.org:8080/#/c/6910/2/infra/python/bootstrap_virtualenv.py File infra/python/bootstrap_virtualenv.py: PS2, Line 63: Cffi > huh? :) Clarified it. "C Foreign Function Interface" PS2, Line 221: CentOS > How about Ubuntu? If there is a minimum version, we should mention that too I've only tested it on Ubuntu 14.04 and 16.04. I don't know what the lowest supported version is though, since the documentation for the ADLS python client doesn't mention anything. I found the CentOS 6.6 manually through testing on different CentOS machines (which was pretty time consuming). I will try to do the same for Ubuntu machines later when there is more time. http://gerrit.cloudera.org:8080/#/c/6910/2/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: Line 54: from tests.util.filesystem_utils import IS_S3, S3_BUCKET_NAME, IS_ADLS, ADLS_STORE_NAME, \ > This is a nit, but once the import list becomes this long, a common idiom i Done Line 76: assert IS_ADLS == False, "Need the ADLSClient for testing with ADLS" > Is covering the ImportError with an AssertionError funky? Might it be bette Done PS2, Line 137: if IS_S3: cls.filesystem_client = S3Client(S3_BUCKET_NAME) : elif IS_ADLS: cls.filesystem_client = ADLSClient(ADLS_STORE_NAME) > nit: I think the format should be: Lol! Done. http://gerrit.cloudera.org:8080/#/c/6910/2/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: PS2, Line 61: will > may? Done http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: PS2, Line 48: slow_client > is this the right tag? Isn't this the consistency issue described earlier? renamed to 'eventually_consistent' PS2, Line 377: slow_client > same here Done http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: Line 164: #@SkipIfADLS.hdfs_block_size > now it's commented out Oops, my bad. I commented it out to do some local testing and forgot to remove it. Done. http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_nested_types.py File tests/query_test/test_nested_types.py: PS2, Line 25: from tests.common.skip import SkipIfOldAggsJoins, SkipIfIsilon, SkipIfS3, SkipIfADLS, SkipIfLocal > long line Done http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: PS2, Line 35: from tests.common.skip import SkipIfS3, SkipIfADLS, SkipIfIsilon, SkipIfOldAggsJoins, SkipIfLocal > long line Done http://gerrit.cloudera.org:8080/#/c/6910/2/tests/util/adls_util.py File tests/util/adls_util.py: Line 29: @classmethod > This seems really unusual to me. __init__ methods typically don't need a @c Thanks for the explanation. I've removed @classmethod. PS2, Line 29: @classmethod > Why decorate __init__() with @classmethod? As a result, all 'ADLSClient' in No, this was a mistake on my part. I've removed @classmethod. PS2, Line 40: f.close() > You're correct -- "with ... as foo" should close foo for you. Done PS2, Line 40: f.close() > I believe this is redundant but I could be wrong. Done Line 45: self.adlsclient.mkdir(path) > What is the return value of self.adlsclient.mkdir(path)? Does it make sense It looks like it just throws an exception: https://github.com/Azure/azure-data-lake-store-python/blob/master/azure/datalake/store/core.py#L444 It has no return value. Do you think it should be wrapped in a try catch? Line 72: files = self.ls(path) > Super insignificant nit: you could put this inside of the list comprehensio Done -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: David Knupp <dkn...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-HasComments: Yes