DIFF does not work in types branch
----------------------------------
Key: PIG-511
URL: https://issues.apache.org/jira/browse/PIG-511
Project: Pig
Issue Type: Bug
Components: data
Affects Versions: types_branch
Environment: CentOS 5, hadoop 0.18.0, pig built from types branch
Reporter: Cristian Ivascu
using DIFF(bag1, bag2) always returns an empty bag
Reason: in the compute_diff, the input bags are discarded, and the actual
operations are done against two newly created, empty bags
fix: make sure the compute_diff(bag1, bag2, output) does its work on bag 1 and
bag2, instead of d1 and d2.
Currently:
DataBag d1 = mBagFactory.newDistinctBag();
DataBag d2 = mBagFactory.newDistinctBag();
Iterator<Tuple> i1 = d1.iterator();
Iterator<Tuple> i2 = d2.iterator();
while (i1.hasNext()) d1.add(i1.next());
while (i2.hasNext()) d2.add(i2.next());
Should be:
DataBag d1 = mBagFactory.newDistinctBag();
DataBag d2 = mBagFactory.newDistinctBag();
Iterator<Tuple> i1 = bag1.iterator();
Iterator<Tuple> i2 = bag2.iterator();
while (i1.hasNext()) d1.add(i1.next());
while (i2.hasNext()) d2.add(i2.next());
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.