[jira] [Commented] (HIVE-2121) Input Sampling By Splits

Namit Jain (JIRA) Tue, 26 Apr 2011 22:30:51 -0700

    [ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025614#comment-13025614 ]


Namit Jain commented on HIVE-2121:
----------------------------------

A few other comments:

1. The sampling behavior should be reflected in the output of
   explain extended. Yongqiang recently added a similar plan change
   (pathtoAlias etc.) in HIVE-2126. Take a look at that as an example.
2. The feature would be more useful with a new session variable that
   specifies the split percentage. It would apply to all tables.
   (This can also be done in a follow-up.)
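
To make point 2 concrete, here is a minimal sketch of what a
session-wide split percentage could mean. It is not taken from the
attached patches: the Split type, the sample_splits function, and the
seed parameter are hypothetical names, and the sketch assumes that
"percentage" means a share of total input bytes rather than a share of
the split count. It shuffles the splits and keeps taking them until the
requested share of bytes is covered, which is loose sampling in the
sense of the issue description.

```python
import random
from typing import List, NamedTuple


class Split(NamedTuple):
    """Hypothetical stand-in for an input split."""
    path: str    # file backing this split
    length: int  # split size in bytes


def sample_splits(splits: List[Split], percent: float, seed: int = 0) -> List[Split]:
    """Return a subset of splits covering roughly `percent` of the total bytes.

    Assumption: the percentage applies to bytes, not to the number of splits.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")

    total = sum(s.length for s in splits)
    target = total * percent / 100.0

    # Shuffle a copy so the caller's list is untouched; a fixed seed keeps
    # the chosen subset reproducible between runs.
    shuffled = list(splits)
    random.Random(seed).shuffle(shuffled)

    picked: List[Split] = []
    covered = 0
    for s in shuffled:
        if covered >= target:
            break
        picked.append(s)
        covered += s.length
    return picked


if __name__ == "__main__":
    all_splits = [Split(f"part-{i}", 100) for i in range(10)]
    for s in sample_splits(all_splits, 30):
        print(s.path, s.length)
```

Because whole splits are taken, the covered size can overshoot the
target by up to one split, which seems acceptable given that the issue
does not ask for strict sampling.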

> Input Sampling By Splits
> ------------------------
>
>                 Key: HIVE-2121
>                 URL: https://issues.apache.org/jira/browse/HIVE-2121
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch, HIVE-2121.3.patch, 
> HIVE-2121.4.patch
>
>
> We need better input sampling to serve at least two purposes:
> 1. let users test their queries against a smaller data set
> 2. understand more about what the data looks like without scanning the
> whole table.
> A simple function that returns a subset of the splits will help in those
> cases. It doesn't have to be strict sampling.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
