I would say the additional nesting level is a bug. But we should check if we break stuff with this change.

Cheers,
--
Gianmarco

On Thu, Apr 5, 2012 at 01:36, Jonathan Coveney <jcove...@gmail.com> wrote:

> Pig folks: it seems like it defies the expectation if TOBAG is run on a
> single TUPLE and you don't get a bag. I can patch it, but seem like a fair
> change?
>
> 2012/4/4 Eli Finkelshteyn <iefin...@gmail.com>
>
> > Nah, doesn't work because it doubles up the tuple, so that:
> >
> > TOBAG(('hello', 'howdy', 'hi'))
> > returns
> > {(('hello', 'howdy', 'hi'))}
> >
> > And so,
> >
> > FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
> > gets me
> >
> > ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >
> > which is just what I started with.
> >
> > Anyway, to solve this problem, what I did was make a quick python udf to
> > make a bag from a tuple without doubling up the tuple, and then ran
> > FLATTEN on that, which looks like:
> >
> > bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)),
> >     FLATTEN(py_udfs.tupleToBag(t2));
> >
> > Where the Python udf I'm using is:
> >
> > @outputSchema("b:bag{}")
> > def tupleToBag(tup):
> >     b = [tupify(i) for i in tupify(tup)]
> >     return b
> >
> > def tupify(tup):
> >     if isinstance(tup, tuple):
> >         return tup
> >     return (tup,)
> >
> > I'll add that into Python PiggyBank as soon as I get a chance to finish
> > that stuff up.
> >
> > Eli
> >
> > On 4/4/12 2:43 PM, Jonathan Coveney wrote:
> >
> >> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
> >>
> >> 2012/4/4 Eli Finkelshteyn <iefin...@gmail.com>
> >>
> >>> That's for a relation only. Unless I'm missing something, it does not
> >>> work for tuples. What I'm doing would require a FOREACH, I'm thinking.
> >>>
> >>> Eli
> >>>
> >>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
> >>>
> >>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
> >>>>
> >>>> -Prashant
> >>>>
> >>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn
> >>>> <iefin...@gmail.com> wrote:
> >>>>
> >>>>> Hi Folks,
> >>>>>
> >>>>> I'm currently trying to do something I figured would be trivial, but
> >>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
> >>>>> missing something. All I want to do is get a cross product of two
> >>>>> tuples. So for example, given an input of:
> >>>>>
> >>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
> >>>>>
> >>>>> I'd get:
> >>>>>
> >>>>> ('hello', 'hola')
> >>>>> ('hello', 'bonjour')
> >>>>> ('howdy', 'hola')
> >>>>> ('howdy', 'bonjour')
> >>>>> ('hi', 'hola')
> >>>>> ('hi', 'bonjour')
> >>>>>
> >>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but
> >>>>> that's no good cause the tuples are first themselves put into new
> >>>>> tuples. So, what I'm left with now is writing a dirty and slow python
> >>>>> udf for this. Is there really no better way to do this? I'd think it
> >>>>> would be a pretty standard task.
> >>>>>
> >>>>> Eli
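
For reference, the behaviour discussed in this thread can be sketched in plain Python, outside Pig. This is only an illustration under stated assumptions: a Pig bag is modelled as a list of tuples, the names `tuple_to_bag`, `nested_bag`, and `cross` are hypothetical, and `nested_bag` merely mimics the extra nesting level Eli reports for TOBAG on a single tuple. It is not the Pig implementation.

```python
from itertools import product


def tupify(value):
    """Return tuples unchanged; wrap any other value in a 1-tuple."""
    if isinstance(value, tuple):
        return value
    return (value,)


def tuple_to_bag(tup):
    """Model of the thread's tupleToBag UDF: one 1-field tuple per field.

    A bag is represented here as a plain list of tuples (an assumption).
    """
    return [tupify(field) for field in tupify(tup)]


def nested_bag(tup):
    """Model of the reported TOBAG result: the whole tuple wrapped once more."""
    return [(tup,)]


def cross(t1, t2):
    """Pair every field of t1 with every field of t2.

    Mimics FLATTEN(bag1), FLATTEN(bag2) in a single GENERATE, which
    yields the cross product of the two flattened bags.
    """
    return [a + b for a, b in product(tuple_to_bag(t1), tuple_to_bag(t2))]


if __name__ == "__main__":
    t1 = ('hello', 'howdy', 'hi')
    t2 = ('hola', 'bonjour')
    print(nested_bag(t1))      # the doubled-up form reported in the thread
    for pair in cross(t1, t2):
        print(pair)
```

With the thread's input, `cross` produces the six pairs listed in the original question, while `nested_bag` holds a single element, which is why flattening it hands back the original tuple.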