OK, it sounds like I have a plan. I need to write one UDF from tuple to
bag (t2b) and one from bag to tuple (b2t), and then I do:

exploded = foreach foo generate id, FLATTEN(t2b(field1, field2, field3));
grouped = group exploded by id;
-- after GROUP the key is named 'group' and the bag is named 'exploded'
imploded = foreach grouped generate group as id, FLATTEN(b2t(exploded));

to (almost) recover the original table, except that the field order may be
messed up. Is there a way to write a UDF like flatten that preserves order?
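
One possible approach, sketched here as an assumption rather than something confirmed in this thread: have t2b emit each field together with its original position, and have b2t sort by that position before rebuilding the tuple. The sketch below models only the logic in plain Java. The names OrderedExplode, Indexed, explode and implode are hypothetical, and no Pig classes are used; a real UDF would wrap the same logic in Pig's UDF interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of the idea; hypothetical names, no Pig API involved.
final class OrderedExplode {

    // One exploded row: the field's original position plus its value.
    static final class Indexed {
        final int pos;
        final String value;

        Indexed(int pos, String value) {
            this.pos = pos;
            this.value = value;
        }
    }

    // Stands in for "t2b": turn a tuple's fields into a bag of (pos, value) pairs.
    static List<Indexed> explode(List<String> fields) {
        List<Indexed> bag = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            bag.add(new Indexed(i, fields.get(i)));
        }
        return bag;
    }

    // Stands in for "b2t": rebuild the tuple. The bag is sorted by pos first,
    // because the order of elements inside a bag is not guaranteed.
    static List<String> implode(List<Indexed> bag) {
        List<Indexed> sorted = new ArrayList<>(bag);
        sorted.sort(Comparator.comparingInt(e -> e.pos));
        List<String> fields = new ArrayList<>();
        for (Indexed e : sorted) {
            fields.add(e.value);
        }
        return fields;
    }
}
```

With this shape, the exploded rows would carry three columns (id, pos, value), so any per-row work in between has to pass pos through unchanged for the implode step to restore the order.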


Thanks!




On Mon, Feb 22, 2010 at 9:57 AM, Dmitriy Ryaboy <[email protected]> wrote:

> Same thing -- a udf to convert a tuple into a bag, then flatten.
> Don't rely on any order you see in bags during testing -- there is
> explicitly no guarantee there, it may change on you version to version and
> execution to execution.
>
> -D
>
> On Mon, Feb 22, 2010 at 9:45 AM, hc busy <[email protected]> wrote:
>
> > Thanks, Dmitriy and Rekha. So I understand the flatten on bag explodes to
> > multiple rows now.
> >
> > The BagConcat seems to work. Actually, doing a simple example using the
> > group by, it would appear that the bag contains the results in the order
> > that they were before entering the group by. (so, if I group after an
> > order by x desc, then when I dump the table it prints the bag, but
> > contents are reversed)... So, actually, for my purposes, not having
> > results in order is okay.
> >
> > what about instead of charsplit, the data I have is this:
> >
> > 1,a,b,c,d
> > 2,a,s,d,f
> >
> > and I want to explode it into
> > 1,a
> > 1,b
> > 1,c
> > 1,d
> > 2,a
> > 2,s
> > 2,d
> > 2,f
> >
> > (sorry, I made a mistake in the original question, the string is not a
> > string but a tuple.) I think I may be able to get it into:
> >
> > 1, (a,b,c,d)
> > 2, (a,s,d,f)
> >
> > but still, I need to explode it into several rows to operate on them
> > separately.
> >
> >
> >
> > On Sun, Feb 21, 2010 at 8:03 PM, Rekha Joshi <[email protected]>
> > wrote:
> >
> > > You would require a udf for this. Please check if you already have an
> > > existing one in latest pig-udf.jar.
> > > Or since this is a pretty simple one, you can write one yourself - take
> > > the tuple, assess the type, append the strings and return it from your
> > > exec() method.
> > >
> > > Cheers,
> > > /R
> > >
> > >
> > > On 2/19/10 11:51 PM, "hc busy" <[email protected]> wrote:
> > >
> > > Guys, I know this must be a common use case, but how do you explode and
> > > implode in pig?
> > >
> > > so, I have a file like this...
> > >
> > > 1, asdf
> > > 2, qewrty
> > > 3, zcxvb
> > >
> > >
> > > and I want to apply an explode operation to it:
> > >
> > > 1, a
> > > 1, s
> > > 1, d
> > > 1, f
> > > 2, q
> > > 2, e
> > > 2, w
> > > 2, r
> > > 2, t
> > > 2, y
> > > 3, z
> > > 3, c
> > > 3, x
> > > 3, v
> > > 3, b
> > >
> > > and after some work... I have this file:
> > >
> > > 1, aa
> > > 1, ss
> > > 1, dd
> > > 1, ff
> > > 2, qq
> > > 2, ee
> > > 2, ww
> > > 2, rr
> > > 2, tt
> > > 2, yy
> > > 3, zz
> > > 3, cc
> > > 3, xx
> > > 3, vv
> > > 3, bb
> > >
> > >
> > > and I want to perform an implode:
> > >
> > > 1, aassddff
> > > 2, qqeewwrrttyy
> > > 3, zzccxxvvbb
> > >
> > >
> > > well, obviously this is a dumb example, but I'd like to do those things.
> > > Can somebody help me with this? I looked in the piggy bank and didn't see
> > > anything that would do this for me.
> > >
> > > Thanks!
> > >
> > >
> >
>
