[ 
https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839575#action_12839575
 ] 

He Yongqiang commented on HIVE-1197:
------------------------------------

Looks very good overall, congrats!

Just a few minor comments:
1. Can you change inputFormatClassName to use getter and setter methods?
2. There is some code duplicated from HiveInputFormat; can we reuse it?
3. In BucketizedHiveRecordReader's next, I think we should remove the silent 
check of "curReader == null". Instead we should throw an exception if 
curReader == null, since that means the reader has been closed.
4. I think we should remove line 207 in BucketizedHiveInputFormat:
newjob.setInputFormat(inputFormat.getClass());
5. In HiveRecordReader,
5.1 progress is calculated as (number of splits done) / (total split 
number); can we make it more accurate? Let's say the work is evenly divided 
among all splits. Something like this: ((number of splits done) + 
currReader.getProgress()) / (total split number);
5.2 getPos should return currReader.getPos().
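
To make points 3 and 5 concrete, here is a minimal sketch. It does not use the 
Hadoop RecordReader interface or the classes in the patch: SplitReader, 
FixedReader and MultiSplitReaderSketch are hypothetical names, and 
IllegalStateException is only one possible choice of exception. The progress 
formula assumes the work is evenly divided among splits and scales the current 
reader's fraction by 1 / (total split number), so the result stays in [0, 1].

```java
// Hypothetical stand-in for a per-split reader; NOT the Hadoop RecordReader API.
interface SplitReader {
    float getProgress(); // fraction of the current split consumed, in [0, 1]
    long getPos();       // position within the current split
}

// Trivial reader with fixed values, used only to exercise the sketch.
class FixedReader implements SplitReader {
    private final float progress;
    private final long pos;

    FixedReader(float progress, long pos) {
        this.progress = progress;
        this.pos = pos;
    }

    public float getProgress() { return progress; }
    public long getPos() { return pos; }
}

// Sketch of a reader that walks several splits in order (names are assumptions).
class MultiSplitReaderSketch {
    private final int totalSplits;
    private int splitsDone;        // splits fully consumed so far
    private SplitReader curReader; // null once the reader has been closed

    MultiSplitReaderSketch(int totalSplits, SplitReader first) {
        if (totalSplits <= 0) {
            throw new IllegalArgumentException("totalSplits must be positive");
        }
        this.totalSplits = totalSplits;
        this.curReader = first;
    }

    // Called when the current split is exhausted and the next one is opened.
    void advance(SplitReader next) {
        ensureOpen();
        splitsDone++;
        curReader = next;
    }

    void close() {
        curReader = null;
    }

    // Point 3: fail loudly instead of silently tolerating a closed reader.
    private void ensureOpen() {
        if (curReader == null) {
            throw new IllegalStateException("reader has been closed");
        }
    }

    // Point 5.1: finished splits plus the fraction of the current split.
    float getProgress() {
        ensureOpen();
        float p = (splitsDone + curReader.getProgress()) / totalSplits;
        return Math.min(1.0f, p);
    }

    // Point 5.2: delegate to the current reader.
    long getPos() {
        ensureOpen();
        return curReader.getPos();
    }
}
```

With 4 splits, one split done and the current reader halfway through, this 
reports (1 + 0.5) / 4 rather than 1 / 4.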

One more thing: do you think it would be a good idea to have 
BucketizedHiveInputFormat extend HiveInputFormat? That way, the code would be 
clearer. We should also put the RecordReader and InputSplit in the same file 
as BucketizedHiveInputFormat.

> create a new input format where a mapper spans a file
> -----------------------------------------------------
>
>                 Key: HIVE-1197
>                 URL: https://issues.apache.org/jira/browse/HIVE-1197
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Siying Dong
>             Fix For: 0.6.0
>
>         Attachments: hive.1197.1.patch
>
>
> This will be needed for Sort merge joins.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
