[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1231:
----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Patch committed to both trunk and 0.6 branch.

> Default DataBagIterator.hasNext() should be idempotent in all cases
> -------------------------------------------------------------------
>
>                 Key: PIG-1231
>                 URL: https://issues.apache.org/jira/browse/PIG-1231
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.6.0
>
>         Attachments: PIG-1231-1.patch, PIG-1231-2.patch
>
>
> DefaultDataBagIterator.hasNext() is not repeatable when both of the
> conditions below are met:
> 1. There are no more tuples in the last spill file
> 2. There are no tuples in memory (all contents have been spilled to files)
> This is not acceptable, because the name hasNext() implies that it is
> idempotent.
> In BagFormat, we misuse DataBagIterator.hasNext() on the assumption that
> hasNext() is always idempotent, which leads to some mysterious errors.
> Condition 2 seems very restrictive, but when the databag is really big and
> memory can hold fewer than a couple of tuples, the chance of hitting it is
> high enough.
> Here is one error we saw:
> Caused by: java.io.IOException: Stream closed
>         at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
>         at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
>         ... 20 more
> This happens because we call hasNext(), which reaches EOF, and we close the
> file. Then we call hasNext() again on the assumption that it is idempotent.
> However, the stream is already closed, so we get this error message.
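
For illustration only, here is a minimal sketch of one way to make hasNext() repeatable after EOF. It is not the committed patch and not Pig code: the class SpillIterator and its helper of(...) are hypothetical, and plain ints stand in for tuples. The idea is that hasNext() caches the value it read ahead and remembers that the stream was closed, so a repeated call never touches the closed stream. A BufferedInputStream is used so that an unguarded read after close would fail with "Stream closed", as in the trace above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a bag iterator that reads from a single spill stream. */
class SpillIterator implements Iterator<Integer> {
    private DataInputStream in;  // set to null once the stream has been closed
    private Integer lookahead;   // value read ahead by hasNext(), or null if none

    SpillIterator(DataInputStream in) {
        this.in = in;
    }

    @Override
    public boolean hasNext() {
        if (lookahead != null) {
            return true;         // a value is already buffered; do not read again
        }
        if (in == null) {
            return false;        // EOF was seen earlier; never touch the closed stream
        }
        try {
            lookahead = in.readInt();
            return true;
        } catch (EOFException eof) {
            closeQuietly();      // remember exhaustion by nulling the stream
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer value = lookahead;
        lookahead = null;
        return value;
    }

    private void closeQuietly() {
        try {
            in.close();
        } catch (IOException ignored) {
            // nothing useful to do in this sketch
        }
        in = null;
    }

    /** Hypothetical helper: builds an iterator over an in-memory "spill file". */
    static SpillIterator of(int... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int v : values) {
                out.writeInt(v);
            }
            out.flush();
            return new SpillIterator(new DataInputStream(
                    new BufferedInputStream(new ByteArrayInputStream(bytes.toByteArray()))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this guard, a caller such as the BagFormat code described above could call hasNext() any number of times before or after the last element without changing the iterator's state.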