[ https://issues.apache.org/jira/browse/PIG-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712200#action_12712200 ]
Pradeep Kamath commented on PIG-814:
------------------------------------

The patch also contains a simple fix to enable split by 'file' in the load statement. In this case, Pig should not try to split the input file by block size, but should process the entire file in a single map.

> Make Binstorage more robust when data contains record markers
> -------------------------------------------------------------
>
>                 Key: PIG-814
>                 URL: https://issues.apache.org/jira/browse/PIG-814
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.2.1
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>             Fix For: 0.3.0
>
>         Attachments: PIG-814.patch
>
>
> When the input stream for BinStorage is at a position where the data contains
> the record marker sequence, the code incorrectly assumes that it is at the
> beginning of a record (tuple) and calls DataReaderWriter.readDatum() to try
> to read the tuple. The problem is more likely when RandomSampleLoader (used
> in the order by implementation) skips ahead in the input stream for sampling
> and then calls BinStorage.getNext(). The code should be more robust in such
> cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
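
To make the failure mode in the issue description concrete, here is a minimal sketch of one way a reader could resynchronize after landing in the middle of a record. It is not the attached patch and does not use BinStorage's real on-disk layout. The record format, the class name MarkerResync, the marker bytes, and TUPLE_TAG are all assumptions made up for illustration. The idea is that a marker match alone is not trusted: the bytes after it must also decode as a well-formed record, otherwise scanning resumes one byte later.

```java
import java.util.Arrays;

// Toy record format, assumed for illustration only (NOT BinStorage's layout):
//   [0x01 0x02 0x03] [TUPLE_TAG] [payload length: 1 byte] [payload bytes]
class MarkerResync {
    static final byte[] MARKER = {0x01, 0x02, 0x03}; // hypothetical marker
    static final byte TUPLE_TAG = 0x54;              // hypothetical type tag

    /** Returns the payload of the first well-formed record found at or after start, or null. */
    static byte[] nextRecord(byte[] buf, int start) {
        for (int pos = Math.max(start, 0); pos + MARKER.length <= buf.length; pos++) {
            if (!matchesMarker(buf, pos)) {
                continue;
            }
            byte[] payload = tryDecode(buf, pos + MARKER.length);
            if (payload != null) {
                return payload;
            }
            // The marker bytes were part of some record's data, not a record
            // boundary. Keep scanning from the next byte instead of failing.
        }
        return null;
    }

    static boolean matchesMarker(byte[] buf, int pos) {
        for (int i = 0; i < MARKER.length; i++) {
            if (buf[pos + i] != MARKER[i]) {
                return false;
            }
        }
        return true;
    }

    /** Decodes tag, length, and payload at pos; returns null if the bytes are not a valid record body. */
    static byte[] tryDecode(byte[] buf, int pos) {
        if (pos + 2 > buf.length || buf[pos] != TUPLE_TAG) {
            return null;
        }
        int len = buf[pos + 1] & 0xFF;
        int from = pos + 2;
        if (from + len > buf.length) {
            return null; // truncated: cannot be a complete record
        }
        return Arrays.copyOfRange(buf, from, from + len);
    }

    /** Encodes one payload (at most 255 bytes) in the toy format. */
    static byte[] encode(byte[] payload) {
        if (payload.length > 255) {
            throw new IllegalArgumentException("payload too long for toy format");
        }
        byte[] out = new byte[MARKER.length + 2 + payload.length];
        System.arraycopy(MARKER, 0, out, 0, MARKER.length);
        out[MARKER.length] = TUPLE_TAG;
        out[MARKER.length + 1] = (byte) payload.length;
        System.arraycopy(payload, 0, out, MARKER.length + 2, payload.length);
        return out;
    }
}
```

In this sketch, starting a scan inside a record whose payload happens to contain the marker bytes skips that false match and returns the next real record. The extra validation lowers the chance of a false match but cannot remove it entirely, since a payload could still contain the marker followed by a plausible tag and length.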