[ https://issues.apache.org/jira/browse/PIG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1231:
----------------------------

    Attachment: PIG-1231-1.patch

DefaultDataBagIterator is the only DataBag iterator that has this problem. Other
DataBag implementations handle this through different mechanisms.

> DataBagIterator.hasNext() should be idempotent
> ----------------------------------------------
>
>                 Key: PIG-1231
>                 URL: https://issues.apache.org/jira/browse/PIG-1231
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.6.0
>
>         Attachments: PIG-1231-1.patch
>
>
> DataBagIterator.hasNext() is not repeatable in some situations. This is not
> acceptable because the name hasNext() implies that it is idempotent. While
> hasNext() returns true it is repeatable, but once it returns false it is
> not. In BagFormat, we misuse DataBagIterator.hasNext() because we assume
> that hasNext() is always idempotent, which leads to some mysterious errors.
> Here is one error we saw:
> Caused by: java.io.IOException: Stream closed
>         at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:278)
>         at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.readFromFile(DefaultDataBag.java:237)
>         ... 20 more
> This happens because we call hasNext(), which reaches EOF, and we close the
> file. Then we call hasNext() again, assuming that it is idempotent. However,
> the stream is already closed, so we get this error.
> This fix will go into DefaultDataBagIterator, DistinctDataBagIterator,
> CachedBagIterator, and SortedDataBagIterator.
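
A minimal sketch of an idempotent hasNext(), assuming a hypothetical iterator
over int records read from a stream (SpilledIntIterator and its encode helper
are illustrative names, not Pig's DefaultDataBagIterator or the attached
patch). hasNext() reads one record ahead and caches it, and a done flag set at
EOF keeps later calls from touching the closed stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical iterator over spilled int records (illustration only, not Pig code). */
class SpilledIntIterator implements Iterator<Integer> {
    private final DataInputStream in;
    private Integer buffered;   // record read ahead by hasNext(), not yet returned by next()
    private boolean done;       // set once EOF was seen and the stream was closed

    SpilledIntIterator(InputStream source) {
        // BufferedInputStream throws "Stream closed" if read after close(),
        // like the stream in the stack trace above.
        this.in = new DataInputStream(new BufferedInputStream(source));
    }

    @Override
    public boolean hasNext() {
        if (buffered != null) return true;  // repeated call: reuse the cached record
        if (done) return false;             // repeated call after EOF: never re-read
        try {
            buffered = in.readInt();
            return true;
        } catch (EOFException eof) {
            done = true;                    // remember EOF before closing
            closeQuietly();
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        Integer result = buffered;
        buffered = null;                    // consume the cached record
        return result;
    }

    private void closeQuietly() {
        try {
            in.close();
        } catch (IOException ignored) {
            // nothing useful to do on close failure in this sketch
        }
    }

    /** Example helper: serialize ints in the format the iterator reads. */
    static byte[] encode(int... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) out.writeInt(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Without the done flag, a second hasNext() after EOF would call readInt() on the
already-closed BufferedInputStream and fail with an IOException ("Stream
closed"), which is the failure mode described above.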

-- 
This message is automatically generated by JIRA.