Ah, now it makes more sense.

I think I have the solution. I was adding tuples to a List<Tuple> and then 
creating a DataBag from that list. Instead, I should create a bag and keep 
adding tuples to it directly. Is that correct? 
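A minimal sketch of that pattern, assuming the goal from the original question (drop the 3rd field of every tuple). To keep it self-contained it uses plain java.util types instead of Pig's: each row is a List<Object>, and the Consumer passed to projectInto stands in for DataBag.add on a bag obtained from BagFactory.getInstance().newDefaultBag(). The class and method names (DropFieldSketch, dropField, projectInto) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class DropFieldSketch {

    /** Returns a new row containing every field of {@code row} except the one at {@code index}. */
    static List<Object> dropField(List<Object> row, int index) {
        List<Object> out = new ArrayList<>(row.size());
        for (int i = 0; i < row.size(); i++) {
            if (i != index) {
                out.add(row.get(i));
            }
        }
        return out;
    }

    /**
     * Projects each input row and hands it straight to the sink (a stand-in for
     * DataBag.add), so this method never builds its own list of all output rows.
     */
    static void projectInto(Iterable<List<Object>> input, int index, Consumer<List<Object>> sink) {
        for (List<Object> row : input) {
            sink.accept(dropField(row, index));
        }
    }

    public static void main(String[] args) {
        List<List<Object>> input = Arrays.asList(
                Arrays.<Object>asList(111, 222, 3, 121),
                Arrays.<Object>asList(112, 223, 2, 131),
                Arrays.<Object>asList(113, 224, 4, 141));
        // A plain List is used as the sink only because this demo is tiny;
        // in a Pig UDF the sink would be a (spillable) DataBag.
        List<List<Object>> output = new ArrayList<>();
        projectInto(input, 2, output::add); // index 2 = the 3rd field
        System.out.println(output); // [[111, 222, 121], [112, 223, 131], [113, 224, 141]]
    }
}
```

The memory benefit comes from the sink, not from this loop: adding each row to a DataBag as it is produced lets the bag spill to disk, whereas collecting everything in a List<Tuple> first keeps it all on the heap.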
Thanks Alan. 

Thanks
-- Prasanth

On Sep 5, 2012, at 9:24 PM, Alan Gates <ga...@hortonworks.com> wrote:

> You cannot modify a bag once it is written.  The implementation is written 
> around the assumption that bags are immutable after they are written.  
> 
> Creating a new bag should not create an OOM exception, as bags are built to 
> spill when they grow too large.  In fact it's this spilling feature that 
> makes in place modification impossible.
> 
> Alan.
> 
> On Sep 5, 2012, at 6:08 PM, Prasanth J wrote:
> 
>> Hello devs
>> 
>> I have a specific case where I need to modify the contents of a DataBag 
>> (remove a field from each tuple), but I want to do it in place rather than 
>> create another DataBag with a new set of tuples. 
>> For example, say I have the following input bag for a UDF:
>> 
>> {(111,222,3,121), (112,223,2,131), (113,224,4,141)}
>> 
>> I want to iterate through this bag and generate an output bag with the 
>> 3rd field removed from each tuple, giving the following output:
>> {(111,222,121), (112,223,131), (113,224,141)}
>> 
>> Since the number of tuples in this bag is expected to be large, I cannot 
>> create a new set of tuples and build a bag from them, as this will cause an 
>> OOM exception. 
>> 
>> Also, I do not want to flatten this bag, as it will be passed to the 
>> DISTINCT operator to compute the distinct elements in the bag.
>> From the javadocs for DataBag, there seems to be no way to transform a bag 
>> on the fly. Is there any other way to solve this?
>> 
>> Thanks
>> -- Prasanth
>> 
> 
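To illustrate Alan's point that spilling is what rules out in-place modification, here is a deliberately simplified toy model. It is not Pig's actual DataBag implementation: ToySpillingBag, its maxInMemory threshold, and its use of Java serialization are all assumptions made for illustration. Once rows have been spilled, the bag holds serialized copies on disk, so changing a row object you still hold a reference to has no effect on what the bag returns.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, NOT Pig's code) of a bag that spills rows to temp files. */
public class ToySpillingBag {
    private final int maxInMemory;
    private final List<int[]> inMemory = new ArrayList<>();
    private final List<File> spillFiles = new ArrayList<>();

    ToySpillingBag(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    /** Adds a row; once more than maxInMemory rows are held, all of them are spilled. */
    void add(int[] row) throws IOException {
        inMemory.add(row);
        if (inMemory.size() > maxInMemory) {
            spill();
        }
    }

    /** Serializes the in-memory rows to a new temp file and drops them from memory. */
    private void spill() throws IOException {
        File f = File.createTempFile("toybag", ".bin");
        f.deleteOnExit();
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(new ArrayList<>(inMemory));
        }
        spillFiles.add(f);
        inMemory.clear();
    }

    /** Returns all rows: spilled rows come back from disk as fresh copies. */
    @SuppressWarnings("unchecked")
    List<int[]> readAll() throws IOException, ClassNotFoundException {
        List<int[]> all = new ArrayList<>();
        for (File f : spillFiles) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
                all.addAll((List<int[]>) in.readObject());
            }
        }
        all.addAll(inMemory);
        return all;
    }

    public static void main(String[] args) throws Exception {
        ToySpillingBag bag = new ToySpillingBag(1);
        int[] first = {111, 222, 3, 121};
        bag.add(first);
        bag.add(new int[] {112, 223, 2, 131}); // exceeds the limit, so both rows spill
        first[2] = 0; // attempt an "in-place" change after the row was added
        System.out.println(bag.readAll().get(0)[2]); // prints 3: the spilled copy is unchanged
    }
}
```

Making such a change stick would require rewriting the spill files, which is consistent with Alan's explanation of why bags are treated as immutable once written.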
