[
https://issues.apache.org/jira/browse/PIG-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy reassigned PIG-5357:
---------------------------------------
Assignee: Jacob Tolar
Hadoop Flags: Reviewed
Fix Version/s: 0.18.0
{quote}All of the internal code now uses InternalDistinctBag instead of
DistinctDataBag.
{quote}
The difference is that InternalDistinctBag proactively spills based on memory
usage and the configured caching limit. It also spills when spill() is called,
provided a read has not already started. DistinctDataBag does not spill
proactively, but it handles spilling even when spill() is called in the middle
of a read. So it is fine to still use it.
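To make the contrast concrete, here is a toy model of the two spill policies. It is only a sketch: ToyDistinctBag, internalStyle() and defaultStyle() are made-up names for illustration, the "spill files" are just in-memory sets, and the cache limit is a simple tuple count rather than Pig's real memory accounting.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the two spill policies described above; these are not Pig's real classes. */
class SpillPolicySketch {

    /** Hypothetical stand-in for a distinct bag that holds tuples in memory until spilled. */
    static class ToyDistinctBag {
        private final Set<String> inMemory = new LinkedHashSet<>();
        private final List<Set<String>> spilledRuns = new ArrayList<>(); // stands in for spill files
        private final int cacheLimit;          // <= 0 disables proactive spilling
        private final boolean spillDuringRead; // may spill() act once a read has started?
        private boolean readStarted = false;

        ToyDistinctBag(int cacheLimit, boolean spillDuringRead) {
            this.cacheLimit = cacheLimit;
            this.spillDuringRead = spillDuringRead;
        }

        void add(String tuple) {
            inMemory.add(tuple);
            // Proactive spill: the bag itself decides to spill once the limit is reached.
            if (cacheLimit > 0 && inMemory.size() >= cacheLimit) {
                doSpill();
            }
        }

        /** External spill request (e.g. from a memory manager). Returns the number of tuples spilled. */
        long spill() {
            if (readStarted && !spillDuringRead) {
                return 0; // this policy ignores spill requests once a read is in progress
            }
            return doSpill();
        }

        void startRead() {
            readStarted = true;
        }

        int inMemoryCount() {
            return inMemory.size();
        }

        int spilledRunCount() {
            return spilledRuns.size();
        }

        private long doSpill() {
            long n = inMemory.size();
            if (n > 0) {
                spilledRuns.add(new LinkedHashSet<>(inMemory));
                inMemory.clear();
            }
            return n;
        }
    }

    /** Policy modeled on the InternalDistinctBag description: proactive, no spill mid-read. */
    static ToyDistinctBag internalStyle(int cacheLimit) {
        return new ToyDistinctBag(cacheLimit, false);
    }

    /** Policy modeled on the DistinctDataBag description: not proactive, spills even mid-read. */
    static ToyDistinctBag defaultStyle() {
        return new ToyDistinctBag(0, true);
    }
}
```

In this model a bag from internalStyle(2) spills by itself on the second add() and ignores spill() after startRead(), while a bag from defaultStyle() never spills during add() but honors spill() at any time.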
+1. Committed to trunk. Thanks [~jtolar] for this enhancement.
> BagFactory interface should support creating a distinct bag from a set
> ----------------------------------------------------------------------
>
> Key: PIG-5357
> URL: https://issues.apache.org/jira/browse/PIG-5357
> Project: Pig
> Issue Type: Improvement
> Reporter: Jacob Tolar
> Assignee: Jacob Tolar
> Priority: Minor
> Fix For: 0.18.0
>
> Attachments: PIG-5357-1.patch, PIG-5357-2.patch
>
>
> It would be nice if BagFactory supported creating a distinct bag from a set
> of tuples, similar to:
> {code:java}
> newDefaultBag(List<Tuple> listOfTuples);
> {code}
> [https://github.com/apache/pig/blob/trunk/src/org/apache/pig/data/BagFactory.java]
>
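For illustration, the requested factory method could be sketched roughly as below. This is not the committed patch: the name newDistinctBag and its signature are assumptions made by analogy with newDefaultBag(List<Tuple>), and ToyDistinctBag with List<Object> as the tuple type are stand-ins so the example compiles without Pig on the classpath.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Sketch of a "distinct bag from a set" factory method; the types are stand-ins, not Pig's. */
class DistinctBagFactorySketch {

    /** Hypothetical stand-in for a distinct DataBag; a List<Object> plays the role of a Tuple. */
    static final class ToyDistinctBag implements Iterable<List<Object>> {
        private final Set<List<Object>> contents;

        ToyDistinctBag(Set<List<Object>> contents) {
            this.contents = contents;
        }

        long size() {
            return contents.size();
        }

        @Override
        public Iterator<List<Object>> iterator() {
            return contents.iterator();
        }
    }

    /**
     * Assumed method name, mirroring newDefaultBag(List<Tuple>).
     * A Set already holds no duplicates, so this sketch wraps it directly
     * instead of adding the tuples one by one and de-duplicating again.
     */
    static ToyDistinctBag newDistinctBag(Set<List<Object>> setOfTuples) {
        return new ToyDistinctBag(setOfTuples);
    }
}
```

The point of such a method is that a caller who has already collected tuples in a Set can hand it over in one call. Whether the real implementation takes ownership of the set or copies it is not stated in this thread, so the wrap-without-copy choice here is only an assumption.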