[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831400#action_12831400 ]
Hadoop QA commented on PIG-1231:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12435230/PIG-1231-1.patch
  against trunk revision 907760.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/206/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/206/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/206/console

This message is automatically generated.

> Default DataBagIterator.hasNext() should be idempotent in all cases
> -------------------------------------------------------------------
>
>                 Key: PIG-1231
>                 URL: https://issues.apache.org/jira/browse/PIG-1231
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.6.0
>
>         Attachments: PIG-1231-1.patch
>
>
> DefaultDataBagIterator.hasNext() is not repeatable when the following
> conditions are met:
> 1. There are no more tuples in the last spill file
> 2. There are no tuples in memory (all contents are spilled to files)
> This is not acceptable because the name hasNext() implies that it is idempotent.
> In BagFormat, we do misuse DataBagIterator.hasNext() because of the
> assumption that hasNext() is always idempotent, which leads to some
> mysterious errors.
> Condition 2 seems very restrictive, but when the databag is really big
> and memory can hold fewer than a couple of tuples, the chance of hitting
> condition 2 is high enough.
> Here is one error we saw:
> Caused by: java.io.IOException: Stream closed
>         at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
>         at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
>         ... 20 more
> This happens because we call hasNext(), which reaches EOF, and we close the
> file. Then we call hasNext() again on the assumption that it is idempotent.
> However, the stream is closed, so we get this error message.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
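
For readers following the issue, the failure mode described above (a second hasNext() call reading from a stream that the first call already closed at EOF) can be illustrated with a minimal sketch. This is not the code in PIG-1231-1.patch and not Pig's DefaultDataBagIterator; the class name IdempotentSpillIterator, the helper spillOf, and the use of plain ints in place of tuples are all assumptions made for illustration. The idea is that hasNext() caches what it learned (a lookahead record or an "exhausted" flag), so repeated calls never touch the closed stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a bag iterator that reads records from one spill stream. */
class IdempotentSpillIterator implements Iterator<Integer> {
    private DataInputStream in;   // set to null once the stream has been closed
    private Integer lookahead;    // record read ahead by hasNext(), or null if none
    private boolean exhausted;    // true once EOF has been observed

    IdempotentSpillIterator(DataInputStream in) {
        this.in = in;
    }

    @Override
    public boolean hasNext() {
        if (lookahead != null) {
            return true;          // answer from the cached record
        }
        if (exhausted) {
            return false;         // never read from the closed stream again
        }
        try {
            lookahead = in.readInt();
            return true;
        } catch (EOFException eof) {
            exhausted = true;     // remember EOF so later calls are safe
            closeQuietly();
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer result = lookahead;
        lookahead = null;
        return result;
    }

    private void closeQuietly() {
        try {
            in.close();
        } catch (IOException ignored) {
            // nothing useful to do if close fails in this sketch
        }
        in = null;
    }

    /** Builds an in-memory "spill file" holding the given ints (illustrative helper). */
    static DataInputStream spillOf(int... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // BufferedInputStream rejects reads after close(), like the stream in the trace.
        return new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }
}
```

Without the exhausted flag, a second hasNext() after EOF would call readInt() on the closed BufferedInputStream and fail with "Stream closed", which matches the shape of the trace quoted in the issue.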