Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6868/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 809:   public HdfsPartition cloneNewFds(List<FileDescriptor> newFds) {
i don't think this is used anymore.


http://gerrit.cloudera.org:8080/#/c/6868/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1961: 
> I tried integrating the sampling into computeScanRangeLocations() but ran i
i'd say it's cleaner than partition cloning, because the way the cloning works 
is incomprehensible outside of the context of file sampling.

but you're right, returning as your map value a list<integer> and then 
constructing a list<filedescriptor> from it later is inferior to simply 
returning list<filedescriptor> values here directly.


http://gerrit.cloudera.org:8080/#/c/6868/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1940:    * The 'percentBytes' parameter must be between 0 and 100.
i'd also point out that it allocates something like 12+ bytes per existing 
(pre-sampled) file, and that the goal was to minimize that amount. just so the 
next person doesn't come in and "simplify" this by allocating larger structures 
instead of the two scalar arrays.

i'm just worried that this will end up generating millions of intermittent 
objects for very large tables (where it'll clearly be used, and where the 
sampling percentage itself might be fairly small, ie, the query itself would 
run quickly).


-- 
To view, visit http://gerrit.cloudera.org:8080/6868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to