[ https://issues.apache.org/jira/browse/PIG-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mohit Sabharwal updated PIG-4542:
---------------------------------
    Attachment: PIG-4542.patch

> OutputConsumerIterator should flush buffered records
> ----------------------------------------------------
>
>                 Key: PIG-4542
>                 URL: https://issues.apache.org/jira/browse/PIG-4542
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4542.patch
>
>
> Certain operators may buffer their output. When we encounter the last input
> record, we need to flush the last set of records from such operators before
> calling getNextTuple() for the last time.
>
> Currently, to flush the last set of records, we compute RDD.count() and
> compare the count with a running counter to determine whether we have reached
> the last record. This is unnecessary and inefficient, since RDD.count() is a
> Spark action that runs an additional job over the input.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
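The end-of-input check can be done without RDD.count(). A record just read from a partition's input iterator is the last one exactly when `hasNext()` returns false afterwards. The following is a minimal Java sketch under that assumption. `BufferingOperator`, `FlushingIterator`, `PairingOperator` and `drain` are hypothetical names, not Pig's actual classes. The `isLast` argument stands in for marking end of input before the final getNextTuple() call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FlushingIteratorDemo {

    // Hypothetical operator contract (not Pig's API): processes one record and,
    // when isLast is true, also emits everything it is still buffering.
    interface BufferingOperator<I, O> {
        void process(I record, boolean isLast, Deque<O> out);
    }

    // Hypothetical iterator wrapper: detects the last input record with
    // input.hasNext() instead of comparing a running counter to RDD.count().
    static final class FlushingIterator<I, O> implements Iterator<O> {
        private final Iterator<I> input;
        private final BufferingOperator<I, O> op;
        private final Deque<O> ready = new ArrayDeque<>();

        FlushingIterator(Iterator<I> input, BufferingOperator<I, O> op) {
            this.input = input;
            this.op = op;
        }

        @Override
        public boolean hasNext() {
            // Feed input records until the operator produces output or input runs out.
            while (ready.isEmpty()) {
                if (!input.hasNext()) {
                    return false;
                }
                I record = input.next();
                // The record just read is the last one iff the input has nothing left.
                boolean isLast = !input.hasNext();
                op.process(record, isLast, ready);
            }
            return true;
        }

        @Override
        public O next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return ready.poll();
        }
    }

    // Hypothetical operator that emits records in pairs, so an odd trailing
    // record stays buffered until it is told the input has ended.
    static final class PairingOperator implements BufferingOperator<Integer, String> {
        private Integer pending = null;

        @Override
        public void process(Integer record, boolean isLast, Deque<String> out) {
            if (pending == null) {
                pending = record;
            } else {
                out.add(pending + "," + record);
                pending = null;
            }
            if (isLast && pending != null) {
                out.add(String.valueOf(pending)); // flush the buffered record
                pending = null;
            }
        }
    }

    // Runs the pairing operator over the given records and collects all output.
    public static List<String> drain(List<Integer> records) {
        List<String> result = new ArrayList<>();
        Iterator<String> it = new FlushingIterator<>(records.iterator(), new PairingOperator());
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : drain(Arrays.asList(1, 2, 3, 4, 5))) {
            System.out.println(s);
        }
    }
}
```

With input 1 to 5, `drain` returns `["1,2", "3,4", "5"]`. Without the `isLast` branch, the trailing `5` would stay buffered and be lost. The check costs one extra `hasNext()` call per record rather than a separate counting job.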