Pig folks: it seems to defy expectations that running TOBAG on a single
tuple doesn't give you a bag of that tuple's fields. I can patch it, but
does that seem like a fair change?

2012/4/4 Eli Finkelshteyn <iefin...@gmail.com>

> Nah, doesn't work because it doubles up the tuple, so that:
>
> TOBAG(('hello', 'howdy', 'hi'))
> returns
> {(('hello', 'howdy', 'hi'))}
>
> And so,
>
> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
> gets me
>
> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>
> which is just what I started with.
>
> Anyway, to solve this problem, what I did was write a quick Python UDF that
> makes a bag from a tuple without doubling up the tuple, and then ran FLATTEN
> on that, which looks like:
>
> bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)),
> FLATTEN(py_udfs.tupleToBag(t2));
>
> Where the Python udf I'm using is:
>
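> @outputSchema("b:bag{}")
> def tupleToBag(tup):
>     # One bag entry per field of the input tuple, each wrapped as a tuple.
>     b = [tupify(i) for i in tupify(tup)]
>     return b
>
> def tupify(tup):
>     # Leave tuples alone; wrap any other value in a one-element tuple.
>     if isinstance(tup, tuple):
>         return tup
>     return (tup,)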
>
> I'll add that into Python PiggyBank as soon as I get a chance to finish
> that stuff up.
>
> Eli
>
>
>
> On 4/4/12 2:43 PM, Jonathan Coveney wrote:
>
>> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>>
>> 2012/4/4 Eli Finkelshteyn <iefin...@gmail.com>
>>
>>> That's for a relation only. Unless I'm missing something, it does not work
>>> for tuples. What I'm doing would require a FOREACH, I'm thinking.
>>>
>>> Eli
>>>
>>>
>>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>>
>>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>>>
>>>> -Prashant
>>>>
>>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <iefin...@gmail.com> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I'm currently trying to do something I figured would be trivial, but
>>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>>>> missing something. All I want to do is get a cross product of two
>>>>> tuples. So for example, given an input of:
>>>>>
>>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>>>
>>>>> I'd get:
>>>>>
>>>>> ('hello', 'hola')
>>>>> ('hello', 'bonjour')
>>>>> ('howdy', 'hola')
>>>>> ('howdy', 'bonjour')
>>>>> ('hi', 'hola')
>>>>> ('hi', 'bonjour')
>>>>>
>>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's no
>>>>> good because the tuples are first themselves put into new tuples. So, what
>>>>> I'm left with now is writing a dirty and slow Python UDF for this. Is there
>>>>> really no better way to do this? I'd think it would be a pretty standard
>>>>> task.
>>>>>
>>>>> Eli
>>>>>
>>>>>
>>>>>
>
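
For reference, the behaviour of the tupleToBag UDF quoted above, and the
cross product Eli was after, can be modelled in plain Python outside Pig.
This is only a minimal sketch under stated assumptions: a Pig bag is stood
in for by a Python list, the side-by-side FLATTENs are stood in for by
itertools.product, and tuple_to_bag and cross_fields are hypothetical helper
names, not Pig or PiggyBank functions.

```python
from itertools import product


def tupify(value):
    """Leave tuples alone; wrap any other value in a one-element tuple."""
    return value if isinstance(value, tuple) else (value,)


def tuple_to_bag(tup):
    """Model a bag as a list with one single-field tuple per field of tup."""
    return [tupify(field) for field in tupify(tup)]


def cross_fields(t1, t2):
    """Pair every field of t1 with every field of t2.

    Hypothetical stand-in for two FLATTENs in one GENERATE, which
    produce the cross product of the two bags.
    """
    return [a + b for a, b in product(tuple_to_bag(t1), tuple_to_bag(t2))]


t1 = ('hello', 'howdy', 'hi')
t2 = ('hola', 'bonjour')

pairs = cross_fields(t1, t2)
for pair in pairs:
    print(pair)
# ('hello', 'hola')
# ('hello', 'bonjour')
# ('howdy', 'hola')
# ('howdy', 'bonjour')
# ('hi', 'hola')
# ('hi', 'bonjour')
```

The six pairs match the output asked for in the original question. Wrapping
each field in its own tuple is what lets the two flattened bags combine
field by field instead of tuple by tuple.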
