Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs. ......................................................................
Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/6868/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 809: public HdfsPartition cloneNewFds(List<FileDescriptor> newFds) { i don't think this is used anymore. http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 1961: > I tried integrating the sampling into computeScanRangeLocations() but ran i i'd say it's cleaner than partition cloning, because the way the cloning works is incomprehensible outside of the context of file sampling. but you're right, returning as your map value a list<integer> and then constructing a list<filedescriptor> from it later is inferior to simply returning list<filedescriptor> values here directly. http://gerrit.cloudera.org:8080/#/c/6868/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 1940: * The 'percentBytes' parameter must be between 0 and 100. i'd also point out that it allocates something like 12+ bytes per existing (pre-sampled) file, and that the goal was to minimize that amount. just so the next person doesn't come in and "simplify" this by allocating larger structures instead of the two scalar arrays. i'm just worried that this will end up generating millions of intermittent objects for very large tables (where it'll clearly be used, and where the sampling percentage itself might be fairly small, ie, the query itself would run quickly). -- To view, visit http://gerrit.cloudera.org:8080/6868 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-HasComments: Yes