[ 
https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839004#action_12839004
 ] 

Namit Jain commented on HIVE-1193:
----------------------------------

There are two separate JIRAs: one for ensuring the bucketing properties and 
one for ensuring the sorting properties.

Currently, even though a table can be declared as sorted and bucketed at 
creation time, those properties are not enforced.
It is up to the user to make sure the data is bucketed/sorted appropriately 
while loading.
Since the properties are not enforced, the optimizer cannot take advantage of 
them, because it doesn't know whether the data is actually sorted.
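
To make the point concrete: a declared sort order is only metadata, so anything 
that wants to rely on it has to either trust it or verify it. The following is 
a minimal sketch in Python, not Hive code; the name is_sorted_by and the row 
layout are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Iterable


def is_sorted_by(rows: Iterable[Any], key: Callable[[Any], Any]) -> bool:
    """Return True if rows are in non-decreasing order of key(row)."""
    it = iter(rows)
    try:
        prev = key(next(it))
    except StopIteration:
        # An empty input is trivially sorted.
        return True
    for row in it:
        cur = key(row)
        if cur < prev:
            # One out-of-order row is enough to invalidate the property.
            return False
        prev = cur
    return True


# Hypothetical rows of (sort key, payload), as they might sit in one bucket.
loaded_rows = [(1, "a"), (2, "b"), (2, "c"), (5, "d")]
print(is_sorted_by(loaded_rows, key=lambda r: r[0]))
```

A full scan like this is exactly the cost that enforcing the property at insert 
time would avoid.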

An earlier JIRA took advantage of the fact that the data is sorted when 
processing group by.
That behavior is controlled by configuration parameters.

Going forward, we want to use the sorting properties for joins, specifically 
for sort merge joins.
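
As an illustration of why guaranteed sort order matters for this kind of join, 
here is a minimal sketch of a textbook sort merge (inner) join in Python. It is 
not Hive's implementation; the function name and the tuple layout are 
assumptions made for the example. It is only correct if both inputs really are 
sorted by the join key.

```python
from typing import Any, Callable, Iterator, Sequence, Tuple


def sort_merge_join(
    left: Sequence[Any],
    right: Sequence[Any],
    key: Callable[[Any], Any],
) -> Iterator[Tuple[Any, Any]]:
    """Inner-join two sequences that are already sorted by key."""
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1  # left key too small, advance left
        elif kl > kr:
            j += 1  # right key too small, advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Emit the cross product of the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    yield (l_row, r_row)
            i, j = i_end, j_end


# Hypothetical inputs, both sorted by the first field.
left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_rows = [(2, "x"), (3, "y"), (4, "z")]
for pair in sort_merge_join(left_rows, right_rows, key=lambda r: r[0]):
    print(pair)
```

Each input is read once, in order, with no sort step of its own, which is why 
the join can only be used when sortedness is actually guaranteed.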

@Edward, currently we do not do any skipping based on the sorting properties.

Currently, we create an additional map-reduce job for bucketing/sorting.
Even if there is a cluster by, and the data is already bucketed/sorted by the 
correct key, we don't use that. There will still be another map-reduce job. 
This can be optimized in the future.

Merging of map-only jobs is disabled, but the same should also be done for 
map-reduce jobs. I will file a follow-up JIRA for that.


> ensure sorting properties for a table
> -------------------------------------
>
>                 Key: HIVE-1193
>                 URL: https://issues.apache.org/jira/browse/HIVE-1193
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1193.1.patch
>
>
> If a table is sorted, and data is being inserted into that - currently, we 
> don't make sure that the data is sorted. That might be useful for some 
> downstream operations.
> This cannot be made the default due to backward compatibility, but an option 
> can be added for the same.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
