[ 
https://issues.apache.org/jira/browse/HIVE-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880120#action_12880120
 ] 

Jeff Zhang commented on HIVE-1402:
----------------------------------

Hi, I have made a draft implementation for one special case. It works, but 
because it covers only that one case, it contains some hard-coding. I hope 
someone can give me some help or guidance on the next step. 
One big problem with parallel ORDER BY is that the output key type of ExecMapper 
is HiveKey, which has already been serialized by LazyBinarySerDe, so the original 
column type is lost at that point. But sampling and partitioning both need the 
original column type.

The following is my initial design.

1. During the parse stage, extract a SampleOperator that has two children: 
TableScanOperator and SelectOperator. (I am not familiar with Hive's parse 
stage, and the code is not clear to me. Could anyone give some help or recommend 
some documentation about the Hive parser?)

2. Modify the TotalOrderPartitioner: add a Deserializer that converts the 
HiveKey back to its original column type, and deserialize the HiveKey in the 
getPartition() method. 
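
To make the two steps concrete, here is a minimal sketch of the idea in plain Java. It does not use Hive or Hadoop classes. The names TypedRangePartitioner, chooseSplitPoints, serializeKey and deserializeKey are hypothetical, and the key is assumed to be a single int column stored as 4 big-endian bytes, which is only a stand-in for the real HiveKey serialization. A real version would call a Hive Deserializer inside TotalOrderPartitioner.getPartition() instead.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Sketch only: range partitioning on typed keys recovered from serialized bytes.
 * All names here are hypothetical; the int encoding is an assumption made for
 * illustration and is not Hive's actual key format.
 */
public class TypedRangePartitioner {
    // Sorted split points; partition i holds keys below splitPoints[i].
    private final int[] splitPoints;

    public TypedRangePartitioner(int[] sortedSplitPoints) {
        this.splitPoints = sortedSplitPoints.clone();
    }

    /** Sampling step: pick numPartitions - 1 evenly spaced split points from typed sample keys. */
    public static int[] chooseSplitPoints(int[] sample, int numPartitions) {
        if (sample.length == 0 || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Index is always < sorted.length because i < numPartitions.
            splits[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
        return splits;
    }

    /** Stand-in for the map-side serializer (assumed encoding: 4 big-endian bytes). */
    public static byte[] serializeKey(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Stand-in for the Deserializer that recovers the original column value. */
    public static int deserializeKey(byte[] serialized) {
        return ByteBuffer.wrap(serialized).getInt();
    }

    /** Partition step: deserialize first, then binary-search the typed split points. */
    public int getPartition(byte[] serializedKey) {
        int key = deserializeKey(serializedKey);
        int pos = Arrays.binarySearch(splitPoints, key);
        // A key equal to a split point goes to the partition above it;
        // otherwise the insertion point is the partition index.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

For example, the split points chosen from the sample {30, -5, 12, 7, 21, 0} with 3 partitions are {7, 21}, so the keys -5 and 0 go to partition 0, the keys 7 and 12 to partition 1, and the keys 21 and 30 to partition 2. Because the comparison is done on the deserialized int, negative keys sort before positive ones, which a raw comparison of these particular bytes would not guarantee.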

Any comments and help are welcome.



> Add parallel ORDER BY to Hive
> -----------------------------
>
>                 Key: HIVE-1402
>                 URL: https://issues.apache.org/jira/browse/HIVE-1402
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.5.0
>            Reporter: Jeff Hammerbacher
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
