[jira] [Commented] (HIVE-2780) Implement more restrictive table sampler

Phabricator (JIRA) Sat, 19 Jan 2013 12:02:15 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558084#comment-13558084
 ]


Phabricator commented on HIVE-2780:
-----------------------------------

ashutoshc has requested changes to the revision "HIVE-2780 [jira] Implement 
more restrictive table sampler".

  A few comments.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:489 This 
config needs to be added to HiveConf.java and to hive-site.xml.template with a 
description. Also, indicate that an alternate sampler is available if someone 
wants to use it.
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:583 
Instead of growing this file further, I think it makes sense to put this 
class in its own Java file. Also, can you please add comments on the 
algorithm this sampler follows?
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:620 
Instead of growing this file further, I think it makes sense to put this 
class in its own Java file. Also, can you please add comments on the 
algorithm this sampler follows?
  ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java:657 I 
assume a split is splittable only if its input format is either FileInputFormat 
or an uncompressed TextInputFormat. Is that correct? If so, I think this logic 
would be easier to read if it were written as follows:

  if (inputFormat instanceof FileIF || inputFormat instanceof mapreduce.FileIF
      || (inputFormat instanceof TextIF && !compressed))
    return true
  else
    return false
  ql/src/java/org/apache/hadoop/hive/ql/io/SplitSampler.java:34 Please document 
the contract of this interface.
  ql/src/test/results/clientpositive/split_sample_sampler.q.out:27 Should the 
value of count(*) change just because the sampler is different? You got 8 
with HeadSampler, but 118 with Default Sampler.
  ql/src/test/results/clientpositive/split_sample_sampler.q.out:36 Should the 
value of count(*) change just because the sampler is different? You got 8 
with HeadSampler, but 20 with Default Sampler. Also, Default Sampler generated 
the same number, 118, for both percent and bytes, but HeadSampler got 
different values. What is the reason for that?
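
For reference, the splittability check suggested in the comment on 
CombineHiveInputFormat.java:657 could be sketched as a single boolean 
expression. This is only an illustration under stated assumptions, not the 
code in the patch: FileIF, MapreduceFileIF, TextIF and OtherIF below are 
hypothetical stand-in classes (not the real Hadoop types, and deliberately 
unrelated to each other so the three cases stay distinct), and the 
`compressed` flag stands in for however the patch detects compression.

```java
public class SplittableSketch {
    // Hypothetical stand-ins for the input format types named in the review.
    interface InputFormat {}
    static class FileIF implements InputFormat {}
    static class MapreduceFileIF implements InputFormat {}
    static class OtherIF implements InputFormat {}
    static class TextIF implements InputFormat {
        final boolean compressed; // assumption: compression is known up front
        TextIF(boolean compressed) { this.compressed = compressed; }
    }

    // Returns true for either file-based format, or for a text format
    // whose data is not compressed; false for everything else.
    static boolean isSplittable(InputFormat inputFormat) {
        return inputFormat instanceof FileIF
            || inputFormat instanceof MapreduceFileIF
            || (inputFormat instanceof TextIF
                && !((TextIF) inputFormat).compressed);
    }

    public static void main(String[] args) {
        System.out.println(isSplittable(new FileIF()));      // file-based
        System.out.println(isSplittable(new TextIF(false))); // uncompressed text
        System.out.println(isSplittable(new TextIF(true)));  // compressed text
        System.out.println(isSplittable(new OtherIF()));     // anything else
    }
}
```

Returning the condition directly avoids the if/else that only returns true or 
false, which is the readability point the comment is making.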

REVISION DETAIL
  https://reviews.facebook.net/D1623

BRANCH
  DPAL-722

To: JIRA, ashutoshc, navis

                
> Implement more restrictive table sampler
> ----------------------------------------
>
>                 Key: HIVE-2780
>                 URL: https://issues.apache.org/jira/browse/HIVE-2780
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2780.D1623.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2780.D1623.2.patch, HIVE-2780.D1623.3.patch
>
>
> Current table sampling scans whole blocks, so more rows are included than 
> expected, especially for small tables.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
