On Fri, May 19, 2017 at 6:01 PM, Sailesh Mukil (Code Review) < ger...@cloudera.org> wrote:
> Sailesh Mukil has posted comments on this change. > > Change subject: IMPALA-5333: Add support for Impala to work with ADLS > ...................................................................... > > > Patch Set 3: > > > I have a few high level questions about this patch. This patch > > treats S3 and ADL the same way but after looking at the HDFS > > classes and the ADL API I see some differences between ADL and S3 > > API. For instance, I don't see a way in ADL to recursively > > enumerate all entries under a particular directory (something that > > exists in S3). Our code today for S3 sort of relies on that > > property so I am wondering if our code works as expected or if I am > > missing something (quite possible). Also, the AdlFileSystem class > > has functions for returning the block locations, which means that > > we don't have to call the synthesize metadata calls as we do for > > S3. In the long run, HDFS may expose replica locations from the ADL > > local tiers (e.g. Cosmos), but I guess this is not relevant today. > > Thoughts? > > Very good questions, Dimitris. > > - Could you point me to the exact API that does the recursive enumeration? > I was referring to the listFiles() method in S3AFileSystem. I couldn't find an equivalent call that recursively enumerates all files under a path in the ADLS API. See the code in HdfsTable::loadBlockMetadata() for uses of listFiles(). > > - It's true that we don't need to synthesize the block metadata. However, > I was in conversation with the ADLS connector folks and ADLS doesn't > actually expose the block locations internally. So if you look at the > getFileBlockLocations() code in AdlFileSystem, it actually itself > synthesizes the block metadata: > http://github.mtv.cloudera.com/CDH/hadoop/blob/ > 4da5de1a7c965f63f0b798e9ec4ec72ed7249f18/hadoop-tools/ > hadoop-azure-datalake/src/main/java/org/apache/hadoop/ > fs/adl/AdlFileSystem.java#L893 > > Exactly. I was thinking that if there is a call to do that, we don't have to do it ourselves. > So functionally it's the same. But I agree, we should call the relevant > API if there it is "supported" in some form or the other. I will update the > patch. > > -- > To view, visit http://gerrit.cloudera.org:8080/6910 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 > Gerrit-PatchSet: 3 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com> > Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com> > Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> > Gerrit-Reviewer: David Knupp <dkn...@cloudera.com> > Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> > Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> > Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> > Gerrit-HasComments: No >