[ https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839575#action_12839575 ]
He Yongqiang commented on HIVE-1197:
------------------------------------
Looks very good overall, congrats!
Just a few minor comments:
1. Can you change inputFormatClassName to use getter and setter methods?
2. Some code is duplicated from HiveInputFormat; can we reuse it?
3. In BucketizedHiveRecordReader's next(), I think the existing
"curReader == null" check should go. Instead, we should throw an exception
when curReader == null, since that means the reader has been closed.
4. I think we should remove line 207 in BucketizedHiveInputFormat:
newjob.setInputFormat(inputFormat.getClass());
5. In HiveRecordReader:
5.1 Progress is calculated as (number of splits done) / (total number of
splits). Can we make it more accurate? Assuming the work is evenly divided
among all splits, something like this: (number of splits done +
currReader.getProgress()) / (total number of splits).
5.2 getPos should return currReader.getPos().
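
To make comments 3, 5.1 and 5.2 concrete, here is a minimal sketch. It is not
taken from the patch: SplitReader, CountingReader and ChainedReader are
hypothetical stand-ins, and it does not use the Hadoop RecordReader interface.
It assumes the work is evenly divided among splits, so the current reader's
progress is weighted by 1 / (total number of splits) to keep the result in
[0, 1].

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a per-split reader; not Hadoop's RecordReader.
interface SplitReader {
    boolean next() throws IOException;  // true if a record was read
    float getProgress();                // fraction of this split consumed, 0..1
    long getPos();                      // position within this split
    void close() throws IOException;
}

// Hypothetical reader over a fixed number of records, used only for the demo.
class CountingReader implements SplitReader {
    private final int total;
    private int read = 0;
    CountingReader(int total) { this.total = total; }
    public boolean next() { if (read >= total) return false; read++; return true; }
    public float getProgress() { return total == 0 ? 1.0f : (float) read / total; }
    public long getPos() { return read; }
    public void close() { }
}

// Reads several splits one after another, as a bucketized reader might.
class ChainedReader {
    private final List<SplitReader> readers;
    private int idx = 0;            // index of the current split
    private SplitReader curReader;  // null once close() has been called

    ChainedReader(List<SplitReader> readers) {
        if (readers.isEmpty()) throw new IllegalArgumentException("no splits");
        this.readers = readers;
        this.curReader = readers.get(0);
    }

    boolean next() throws IOException {
        // Comment 3: a null reader means the chain was closed, so fail loudly.
        if (curReader == null) throw new IOException("reader is closed");
        while (!curReader.next()) {
            if (idx == readers.size() - 1) return false;  // last split exhausted
            curReader.close();
            curReader = readers.get(++idx);
        }
        return true;
    }

    // Comment 5.1: completed splits plus the fraction of the current split.
    float getProgress() {
        float cur = (curReader == null) ? 0.0f : curReader.getProgress();
        return (idx + cur) / readers.size();
    }

    // Comment 5.2: delegate the position to the current reader.
    long getPos() throws IOException {
        if (curReader == null) throw new IOException("reader is closed");
        return curReader.getPos();
    }

    void close() throws IOException {
        if (curReader != null) { curReader.close(); curReader = null; }
    }
}

class ChainedReaderDemo {
    public static void main(String[] args) throws IOException {
        ChainedReader r = new ChainedReader(
            Arrays.<SplitReader>asList(new CountingReader(2), new CountingReader(2)));
        while (r.next()) {
            System.out.println("progress=" + r.getProgress() + " pos=" + r.getPos());
        }
        r.close();
    }
}
```

With this weighting, progress advances within a split instead of jumping only
at split boundaries, and it reaches 1.0 exactly when the last split is
exhausted.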
One more thing: do you think it is a good idea to let
BucketizedHiveInputFormat extend HiveInputFormat? That way, the code would be
clearer. We should also put the RecordReader and InputSplit in the same file
as BucketizedHiveInputFormat.
> create a new input format where a mapper spans a file
> -----------------------------------------------------
>
> Key: HIVE-1197
> URL: https://issues.apache.org/jira/browse/HIVE-1197
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Siying Dong
> Fix For: 0.6.0
>
> Attachments: hive.1197.1.patch
>
>
> This will be needed for Sort merge joins.