Ahh, now it makes more sense, and I think I have the solution. I was adding tuples to a List<Tuple> and only at the end creating a DataBag from that list. Instead, I should create the bag up front and keep adding tuples to it. Is that correct? Thanks, Alan.
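A stdlib-only Java sketch can show why adding each tuple straight to a spillable bag bounds memory, while building a List<Tuple> first does not. It is an illustration under stated assumptions, not Pig code. `SpillableBag` and `DropField` are hypothetical toy classes. The spill policy here is a fixed tuple count, which is a simplification: Pig's own bags decide when to spill based on memory usage. In a real UDF the output bag would come from `BagFactory.getInstance().newDefaultBag()` and each new tuple from `TupleFactory`. Because bags are immutable once written, the sketch builds a new, projected tuple for every input tuple instead of editing the input in place.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.function.Consumer;

class DropField {
    /** Streams every input tuple into a fresh bag, dropping the field at fieldIndex. */
    static SpillableBag dropField(Iterable<? extends List<Integer>> input, int fieldIndex, int memLimit) {
        SpillableBag out = new SpillableBag(memLimit);
        for (List<Integer> t : input) {
            List<Integer> projected = new ArrayList<>(t); // copy; the input tuple is left untouched
            projected.remove(fieldIndex);                 // remove(int) removes by position
            out.add(projected);                           // add straight to the bag, no intermediate List
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> in = List.of(
                List.of(111, 222, 3, 121), List.of(112, 223, 2, 131), List.of(113, 224, 4, 141));
        try (SpillableBag out = dropField(in, 2, 2)) { // memLimit 2 forces the third tuple to spill
            System.out.println(out.toList());          // prints [[111, 222, 121], [112, 223, 131], [113, 224, 141]]
        }
    }
}

/** Hypothetical toy bag: keeps up to memLimit tuples on the heap and appends the rest to a temp file. */
class SpillableBag implements AutoCloseable {
    private final int memLimit;
    private final List<List<Integer>> inMemory = new ArrayList<>();
    private File spillFile;          // created lazily on the first spill
    private PrintWriter spillWriter;
    private long size;

    SpillableBag(int memLimit) { this.memLimit = memLimit; }

    void add(List<Integer> tuple) {
        if (inMemory.size() < memLimit) {
            inMemory.add(tuple);
        } else {
            try {
                if (spillWriter == null) {
                    spillFile = File.createTempFile("bag", ".spill");
                    spillFile.deleteOnExit();
                    spillWriter = new PrintWriter(new BufferedWriter(new FileWriter(spillFile)));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            StringJoiner line = new StringJoiner(","); // one tuple per line; non-null ints only
            for (Integer f : tuple) line.add(String.valueOf(f));
            spillWriter.println(line);
        }
        size++;
    }

    long size() { return size; }

    int inMemoryCount() { return inMemory.size(); }

    /** Visits the in-memory tuples, then re-reads the spilled ones from disk. */
    void forEachTuple(Consumer<List<Integer>> action) {
        inMemory.forEach(action);
        if (spillWriter == null) return;
        spillWriter.flush();
        try (BufferedReader r = new BufferedReader(new FileReader(spillFile))) {
            for (String l = r.readLine(); l != null; l = r.readLine()) {
                List<Integer> t = new ArrayList<>();
                if (!l.isEmpty()) for (String f : l.split(",")) t.add(Integer.parseInt(f));
                action.accept(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Materializes everything on the heap; meant only for small bags and tests. */
    List<List<Integer>> toList() {
        List<List<Integer>> all = new ArrayList<>();
        forEachTuple(all::add);
        return all;
    }

    @Override
    public void close() {
        if (spillWriter != null) {
            spillWriter.close();
            spillFile.delete();
        }
    }
}
```

Only the first `memLimit` tuples stay on the heap; the rest are written out as they arrive. Collecting everything in a List<Tuple> first would hold all tuples at once before the bag ever had a chance to spill.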
Thanks
-- Prasanth

On Sep 5, 2012, at 9:24 PM, Alan Gates <ga...@hortonworks.com> wrote:

> You cannot modify a bag once it is written. The implementation is written
> around the assumption that bags are immutable after they are written.
>
> Creating a new bag should not cause an OOM exception, as bags are built to
> spill when they grow too large. In fact, it's this spilling feature that
> makes in-place modification impossible.
>
> Alan.
>
> On Sep 5, 2012, at 6:08 PM, Prasanth J wrote:
>
>> Hello devs,
>>
>> I have a specific case where I need to modify the contents of a DataBag
>> (remove a field from each tuple), but I want to do it in place and do not
>> want to create another DataBag with a new set of tuples.
>> The situation is this: say I have the following bag as input to a UDF
>>
>> {(111,222,3,121), (112,223,2,131), (113,224,4,141)}
>>
>> I want to iterate through this bag and generate an output bag with the
>> 3rd field of each tuple removed, giving the following output:
>>
>> {(111,222,121), (112,223,131), (113,224,141)}
>>
>> Since the number of tuples in this bag is expected to be large, I cannot
>> create a new set of tuples and build a bag from them, as this will cause
>> an OOM exception.
>>
>> I also do not want to flatten this bag, as it will be passed to the
>> DISTINCT operator to compute the distinct elements in the bag.
>> As seen from the javadocs for DataBag, there is no way to convert a bag on
>> the fly. I wonder whether there is any other way to solve this.
>>
>> Thanks
>> -- Prasanth