[
https://issues.apache.org/jira/browse/HIVE-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558084#comment-13558084
]
Phabricator commented on HIVE-2780:
-----------------------------------
ashutoshc has requested changes to the revision "HIVE-2780 [jira] Implement
more restrictive table sampler".
A few comments.
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:489 This
config needs to be added to HiveConf.java and to hive-site.xml.template with a
description. Also, indicate that an alternate sampler is available if someone
wants to use it.
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:583
Instead of growing this file further, I think it makes sense to move this
class into its own Java file. Also, can you please add comments on the
algorithm this sampler follows?
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:620
Instead of growing this file further, I think it makes sense to move this
class into its own Java file. Also, can you please add comments on the
algorithm this sampler follows?
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:657 I
assume a split is splittable only if the input format is either
FileInputFormat or uncompressed TextInputFormat. Is that correct? If so, I
think this logic will be easier to read if it is written as follows:
  if (inputFormat instanceof FileIF || inputFormat instanceof mapreduce.FileIF
      || (inputFormat instanceof TextIF && !compressed))
    return true
  else
    return false
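As a rough illustration only, the condition suggested for line 657 can be
collapsed into a single boolean return. The sketch below assumes the intent is
that text input counts as splittable only when it is uncompressed. Every type
in it (InputFormat, OldApiFileFormat, NewApiFileFormat, TextFormat,
OtherFormat) and the isSplittable helper are hypothetical stand-ins, not the
Hadoop or Hive classes; they are deliberately unrelated to each other, so the
real class hierarchy may make some of these checks overlap.

```java
public class SplittableSketch {
    // Hypothetical stand-ins for the input format types named in the review.
    // None of these are real Hadoop or Hive classes.
    public interface InputFormat {}
    public static final class OldApiFileFormat implements InputFormat {}  // "FileIF"
    public static final class NewApiFileFormat implements InputFormat {}  // "mapreduce.FileIF"
    public static final class TextFormat implements InputFormat {}        // "TextIF"
    public static final class OtherFormat implements InputFormat {}       // anything else

    // Returns the condition directly instead of branching to
    // "return true" / "return false".
    // Assumption: text input is splittable only when it is NOT compressed.
    public static boolean isSplittable(InputFormat inputFormat, boolean compressed) {
        return inputFormat instanceof OldApiFileFormat
            || inputFormat instanceof NewApiFileFormat
            || (inputFormat instanceof TextFormat && !compressed);
    }
}
```

Returning the expression directly keeps the three cases on adjacent lines,
which makes it easier to check them against the intended rule.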
ql/src/java/org/apache/hadoop/hive/ql/io/SplitSampler.java:34 Please document
the contract of this interface.
ql/src/test/results/clientpositive/split_sample_sampler.q.out:27 The value of
count(*) should not change just because the sampler is different, should it?
Yet you got 8 with HeadSampler but 118 with the default sampler.
ql/src/test/results/clientpositive/split_sample_sampler.q.out:36 The value of
count(*) should not change just because the sampler is different, should it?
Yet you got 8 with HeadSampler but 20 with the default sampler. Also, the
default sampler generated the same number, 118, for both percent and bytes,
but HeadSampler got different values. What is the reason for that?
REVISION DETAIL
https://reviews.facebook.net/D1623
BRANCH
DPAL-722
To: JIRA, ashutoshc, navis
> Implement more restrictive table sampler
> ----------------------------------------
>
> Key: HIVE-2780
> URL: https://issues.apache.org/jira/browse/HIVE-2780
> Project: Hive
> Issue Type: Improvement
> Reporter: Navis
> Assignee: Navis
> Priority: Trivial
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2780.D1623.1.patch,
> ASF.LICENSE.NOT.GRANTED--HIVE-2780.D1623.2.patch, HIVE-2780.D1623.3.patch
>
>
> Current table sampling scans the whole block, so more rows are included than
> expected, especially for small tables.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira