[ https://issues.apache.org/jira/browse/HIVE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839575#action_12839575 ]
He Yongqiang commented on HIVE-1197:
------------------------------------

Looks very good overall, congrats! Just a few minor comments:

1. Can you change inputFormatClassName to use getter and setter methods?
2. Some code is duplicated from HiveInputFormat; can we reuse it?
3. In BucketizedHiveRecordReader's next(), I think we should remove the "curReader == null" check. We should throw an exception if curReader == null, since that means the reader has been closed.
4. I think we should remove line 207 in BucketizedHiveInputFormat:
   newjob.setInputFormat(inputFormat.getClass());
5. In HiveRecordReader:
   5.1 Progress is calculated as (number of splits done) / (total split number). Can we make it more accurate? Let's say the work is evenly divided among all splits. Something like this:
       (number of splits done) / (total split number) + currReader.getProgress();
   5.2 getPos should return currReader.getPos().

Another question: do you think it is a good idea to let BucketizedHiveInputFormat extend HiveInputFormat? That way, the code would be clearer. We should also put the RecordReader and InputSplit in the same file as BucketizedHiveInputFormat.

> create a new input format where a mapper spans a file
> -----------------------------------------------------
>
>                 Key: HIVE-1197
>                 URL: https://issues.apache.org/jira/browse/HIVE-1197
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Siying Dong
>             Fix For: 0.6.0
>
>         Attachments: hive.1197.1.patch
>
>
> This will be needed for Sort merge joins.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
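
On comment 5.1, here is a rough sketch of how the finer-grained progress could be computed, assuming the work is evenly divided among splits as the comment proposes. The class name SplitProgress and the method overall are hypothetical and not taken from the patch. One caveat: the current reader's progress is a fraction of a single split, so the sketch scales it by 1 / (total split number) rather than adding it directly, which keeps the result within [0, 1]. Treating an empty split list as complete is also an assumption.

```java
/** Hypothetical helper (not part of hive.1197.1.patch) illustrating comment 5.1. */
public final class SplitProgress {
    private SplitProgress() {}

    /**
     * Combines finished splits with the current reader's progress.
     *
     * @param splitsDone      number of splits that have been fully read
     * @param totalSplits     total number of splits handled by this reader
     * @param currentProgress progress of the current split in [0, 1],
     *                        e.g. the value returned by RecordReader.getProgress()
     * @return overall progress in [0, 1]
     */
    public static float overall(int splitsDone, int totalSplits, float currentProgress) {
        if (totalSplits <= 0) {
            return 1.0f; // assumption: nothing to read counts as complete
        }
        if (splitsDone >= totalSplits) {
            return 1.0f; // all splits consumed
        }
        // Clamp in case the underlying reader reports a value outside [0, 1].
        float clamped = Math.max(0.0f, Math.min(1.0f, currentProgress));
        // Each split contributes 1 / totalSplits of the overall work.
        return (splitsDone + clamped) / totalSplits;
    }

    public static void main(String[] args) {
        // One of four splits done, current split half read.
        System.out.println(overall(1, 4, 0.5f));
    }
}
```

In the record reader, getProgress() would then pass its own split counters and the wrapped reader's getProgress() value to a calculation like this one.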