Explode: exploded = foreach foo generate id, FLATTEN(CharSplit(string));
-- CharSplit is an EvalFunc<DataBag> that takes a string and splits it into characters
-- FLATTEN on a bag creates multiple new rows, one per element in the bag

Implode: imploded = group exploded by id;
imploded = foreach imploded generate group as id, BagConcat(exploded.$1);
-- BagConcat is an EvalFunc<String> that takes a bag of one-field tuples and
-- returns a string that's a concatenation of all the strings in the tuples in the bag
-- exploded.$1 projects the second column (the character), so the bag holds one-field tuples

Do note that bags do not guarantee order, so if you have an order requirement, you may need to enforce it in BagConcat.

-D

On Fri, Feb 19, 2010 at 10:21 AM, hc busy <[email protected]> wrote:

> Guys, I know this must be a common use case, but how do you explode and
> implode in pig?
>
> so, I have a file like this...
>
> 1, asdf
> 2, qewrty
> 3, zcxvb
>
> and I want to apply an explode operation to it:
>
> 1, a
> 1, s
> 1, d
> 1, f
> 2, q
> 2, e
> 2, w
> 2, r
> 2, t
> 2, y
> 3, z
> 3, c
> 3, x
> 3, v
> 3, b
>
> and after some work... I have this file:
>
> 1, aa
> 1, ss
> 1, dd
> 1, ff
> 2, qq
> 2, ee
> 2, ww
> 2, rr
> 2, tt
> 2, yy
> 3, zz
> 3, cc
> 3, xx
> 3, vv
> 3, bb
>
> and I want to perform an implode:
>
> 1, aassddff
> 2, qqeewwrrttyy
> 3, zzccxxvvbb
>
> well, obviously this is a dumb example, but I'd like to do those things.
> Can somebody help me with this? I looked in the piggy bank and didn't see
> anything that would do this for me.
>
> Thanks!
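For anyone who wants to see what the two UDFs would actually do, here is a rough sketch of just the core logic in plain Java. It is not a working Pig UDF: the EvalFunc wrapper (exec, Tuple, DataBag) is left out on purpose, and the class and method names (ExplodeImplodeSketch, charSplit, bagConcat) are made up for illustration. Treating a null input as empty is also my assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the logic inside the two UDFs. The Pig EvalFunc
// wrappers are deliberately omitted; all names here are hypothetical.
class ExplodeImplodeSketch {

    // What CharSplit would put in its output bag: one single-character
    // string per char of the input (UTF-16 code units, so surrogate
    // pairs are not handled).
    static List<String> charSplit(String s) {
        List<String> out = new ArrayList<>();
        if (s == null) {
            return out; // assumption: null input yields an empty bag
        }
        for (int i = 0; i < s.length(); i++) {
            out.add(String.valueOf(s.charAt(i)));
        }
        return out;
    }

    // What BagConcat would return: the strings joined in iteration order.
    // A real Pig bag has no guaranteed order, so anything order-sensitive
    // has to sort before concatenating.
    static String bagConcat(Iterable<String> pieces) {
        StringBuilder sb = new StringBuilder();
        for (String p : pieces) {
            if (p != null) {
                sb.append(p);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> chars = charSplit("asdf");
        System.out.println(chars); // [a, s, d, f]

        // Stand-in for the "some work" step: double every character.
        List<String> doubled = new ArrayList<>();
        for (String c : chars) {
            doubled.add(c + c);
        }
        System.out.println(bagConcat(doubled)); // aassddff
    }
}
```

In a real UDF, charSplit's result would be wrapped as one single-field tuple per character inside a DataBag, and bagConcat would iterate over the bag's tuples and read field 0 of each.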
