ok, it sounds like I have a plan. So I need to write a UDF from tuple to bag (t2b) and a UDF from bag to tuple (b2t), and then I do

    exploded = foreach foo generate id, FLATTEN(t2b(field1, field2, field3));
    implode = group exploded by id;
    implode = foreach implode generate group as id, FLATTEN(b2t(exploded));

to (almost) recover the original table, except that the field order may be messed up. Is there a way to write a UDF like flatten that preserves order?

Thanks!

On Mon, Feb 22, 2010 at 9:57 AM, Dmitriy Ryaboy <[email protected]> wrote:

> Same thing -- a udf to convert a tuple into a bag, then flatten.
> Don't rely on any order you see in bags during testing -- there is
> explicitly no guarantee there, it may change on you version to version and
> execution to execution.
>
> -D
>
> On Mon, Feb 22, 2010 at 9:45 AM, hc busy <[email protected]> wrote:
>
> > Thanks, Dmitriy and Rekha. So I understand the flatten on bag explodes to
> > multiple rows now.
> >
> > The BagConcat seems to work. Actually, doing a simple example using the
> > group by, it would appear that the bag contains the results in the order
> > that they were before entering the group by. (so, if I group after an order
> > by x desc, then when I dump the table it prints the bag, but contents are
> > reversed)... So, actually, for my purposes, not having results in order is
> > okay.
> >
> > what about instead of charsplit, the data I have is this:
> >
> > 1,a,b,c,d
> > 2,a,s,d,f
> >
> > and I want to explode it into
> > 1,a
> > 1,b
> > 1,c
> > 1,d
> > 2,a
> > 2,s
> > 2,d
> > 2,f
> >
> > (sorry, I made a mistake in the original question, the string is not a
> > string but a tuple.) I think I may be able to get it into:
> >
> > 1, (a,b,c,d)
> > 2, (a,s,d,f)
> >
> > but still, I need to explode it into several rows to operate on them
> > separately.
> >
> > On Sun, Feb 21, 2010 at 8:03 PM, Rekha Joshi <[email protected]>
> > wrote:
> >
> > > You would require a udf for this. Please check if you already have an
> > > existing one in latest pig-udf.jar.
> > > Or since this is a pretty simple one, you can write one yourself - take
> > > the tuple, assess the type, append the strings and return it from your
> > > exec() method.
> > >
> > > Cheers,
> > > /R
> > >
> > >
> > > On 2/19/10 11:51 PM, "hc busy" <[email protected]> wrote:
> > >
> > > Guys, I know this must be a common use case, but how do you explode and
> > > implode in pig?
> > >
> > > so, I have a file like this...
> > >
> > > 1, asdf
> > > 2, qewrty
> > > 3, zcxvb
> > >
> > >
> > > and I want to apply an explode operation to it:
> > >
> > > 1, a
> > > 1, s
> > > 1, d
> > > 1, f
> > > 2, q
> > > 2, e
> > > 2, w
> > > 2, r
> > > 2, t
> > > 2, y
> > > 3, z
> > > 3, c
> > > 3, x
> > > 3, v
> > > 3, b
> > >
> > > and after some work... I have this file:
> > >
> > > 1, aa
> > > 1, ss
> > > 1, dd
> > > 1, ff
> > > 2, qq
> > > 2, ee
> > > 2, ww
> > > 2, rr
> > > 2, tt
> > > 2, yy
> > > 3, zz
> > > 3, cc
> > > 3, xx
> > > 3, vv
> > > 3, bb
> > >
> > >
> > > and I want to perform an implode:
> > >
> > > 1, aassddff
> > > 2, qqeewwrrttyy
> > > 3, zzccxxvvbb
> > >
> > >
> > > well, obviously this is a dumb example, but I'd like to do those things. Can
> > > somebody help me with this? I looked in the piggy bank and didn't see
> > > anything that would do this for me.
> > >
> > > Thanks!
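
On the order question at the top of this message: since bags carry no ordering guarantee, one possible approach is to have the tuple-to-bag step tag each value with its original position, and have the bag-to-tuple step sort on that position before rebuilding the tuple. The sketch below shows only that logic in plain Java. It does not use Pig's UDF API, and the names ExplodeImplodeSketch, Row, explode and implode are made up for illustration; a real t2b/b2t pair would need to do the equivalent inside their exec() methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration of order-preserving explode/implode (hypothetical names, not Pig API).
class ExplodeImplodeSketch {

    // One exploded row: record id, original field position, field value.
    static final class Row {
        final int id;
        final int pos;
        final String value;

        Row(int id, int pos, String value) {
            this.id = id;
            this.pos = pos;
            this.value = value;
        }
    }

    // "t2b" step: one row per field, each tagged with the field's position.
    static List<Row> explode(int id, List<String> fields) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            rows.add(new Row(id, i, fields.get(i)));
        }
        return rows;
    }

    // "b2t" step: rows for one id may arrive in any order; sort by position to restore it.
    static List<String> implode(List<Row> rowsForOneId) {
        List<Row> sorted = new ArrayList<>(rowsForOneId);
        sorted.sort((Row a, Row b) -> Integer.compare(a.pos, b.pos));
        List<String> fields = new ArrayList<>();
        for (Row r : sorted) {
            fields.add(r.value);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Row> rows = explode(1, Arrays.asList("a", "b", "c", "d"));
        Collections.shuffle(rows);          // simulate a bag with no ordering guarantee
        System.out.println(implode(rows));  // prints [a, b, c, d]
    }
}
```

The cost is one extra field per exploded row, but the result no longer depends on whatever order the bag happens to come back in.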
