[ 
https://issues.apache.org/jira/browse/HIVE-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723657#comment-15723657
 ] 

Yongzhi Chen commented on HIVE-15359:
-------------------------------------

The current skip-footer feature needs each file to map to exactly one split to work 
properly. The split must be a single unit not only logically but also physically, 
which means the underlying file must be unsplittable. The issue can be reproduced 
with a data file of 140M. In Hadoop, it is stored in two blocks with lengths 128M 
and 12M (128M is dfs.block.size). For this query, Hive uses CombineHiveInputSplit 
to handle the split: although logically there is only one CombineHiveInputSplit 
(so one mapper), the split has two paths (the same path with different start 
positions and lengths: 128M and 12M).
When CombineHiveRecordReader consumes the split, it generates two FileSplits for 
the two blocks. The skip-footer code in HiveContextAwareRecordReader assumes each 
FileSplit is a physically independent file: it skips the footer at the end of the 
first block and does nothing in the second block. As a result, records in the 
middle of the file are wrongly skipped as the footer, while the real footer 
remains in the result. 
Fix the issue by carrying the footer buffer across FileSplits of the same file, 
which makes footer skipping work correctly in the one-mapper case.
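The effect of the fix can be illustrated with a minimal, self-contained sketch. 
This is not the actual Hive code: the class, the two reader methods, and the 
row lists are hypothetical, and each split is modeled as a simple list of rows. 
readBuggy mimics the described behavior (the footer is skipped at the end of the 
first FileSplit only), while readWithSharedBuffer models the fix: a buffer of the 
last skip.footer.line.count rows is carried across FileSplits of the same file, so 
only the rows left in the buffer at end of file are dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class FooterSkipSketch {

    // Buggy model: treat the first FileSplit as if it were the whole file,
    // so its last footerCount rows are dropped; later splits pass through.
    static List<String> readBuggy(List<List<String>> splits, int footerCount) {
        List<String> out = new ArrayList<>();
        boolean first = true;
        for (List<String> split : splits) {
            if (first) {
                out.addAll(split.subList(0, Math.max(0, split.size() - footerCount)));
                first = false;
            } else {
                out.addAll(split);
            }
        }
        return out;
    }

    // Fixed model: one footer buffer shared across all FileSplits of the file.
    // A row is emitted only once footerCount newer rows have been seen, so it
    // provably cannot be part of the footer; the buffer's leftovers at end of
    // file are the real footer and are discarded.
    static List<String> readWithSharedBuffer(List<List<String>> splits, int footerCount) {
        List<String> out = new ArrayList<>();
        Deque<String> buffer = new ArrayDeque<>();
        for (List<String> split : splits) {
            for (String row : split) {
                buffer.addLast(row);
                if (buffer.size() > footerCount) {
                    out.add(buffer.removeFirst());
                }
            }
        }
        return out; // rows still in buffer are the footer: dropped
    }

    public static void main(String[] args) {
        // A 6-row file with a 1-line footer, cut mid-file into two FileSplits,
        // like the 128M/12M blocks described above.
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("r1", "r2", "r3"),
            Arrays.asList("r4", "r5", "FOOTER"));
        System.out.println(readBuggy(splits, 1));            // r3 lost, FOOTER kept
        System.out.println(readWithSharedBuffer(splits, 1)); // r1..r5, FOOTER dropped
    }
}
```

In the buggy path, the mid-file row r3 is skipped as a "footer" and the real 
FOOTER row survives, matching the symptom described above; the shared buffer 
returns exactly r1 through r5.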

> skip.footer.line.count doesnt work properly for certain situations
> ------------------------------------------------------------------
>
>                 Key: HIVE-15359
>                 URL: https://issues.apache.org/jira/browse/HIVE-15359
>             Project: Hive
>          Issue Type: Bug
>          Components: Reader
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>
> The reproduction of this issue is very similar to HIVE-12718, but the data file 
> is larger than 128M. In this case, even when only one mapper is used, the 
> footer is still wrongly skipped. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
