That's a fuzzy join: the two tables are joined not on equality, but on
one table's column value matching a pattern generated dynamically from
the other table's column (here, a LIKE substring match). I don't know
of an efficient way of doing that in MR, be it Pig or Hive. What is
Hive's execution plan for that?

The only thing that comes to mind is a fairly elaborate UDF that loads
one table completely into memory and applies the match against all of
its entries as the other table is streamed through. Of course, every
streamed row gets compared against every in-memory entry, so that
would be quite expensive if the lookup table is of any respectable
size.
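
To make the idea concrete, here is a rough sketch in plain Python
rather than an actual Pig or Hive UDF. The function names load_lookup
and fuzzy_join are made up for illustration, and it assumes the
t3.col2 values contain no LIKE wildcards of their own, so the LIKE
'%...%' test reduces to a plain substring check.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def load_lookup(col2_values: Iterable[Optional[str]]) -> List[str]:
    """Hold the lookup table's column in memory, dropping NULLs
    (the equivalent of t3.col2 IS NOT NULL)."""
    return [v for v in col2_values if v is not None]


def fuzzy_join(
    stream_col1: Iterable[Optional[str]], lookup: List[str]
) -> Iterator[Tuple[str, str]]:
    """Stream the other table and emit (col1, col2) whenever col2
    occurs as a substring of col1."""
    for col1 in stream_col1:
        if col1 is None:
            # NULL LIKE anything is never true, so skip the row.
            continue
        for col2 in lookup:
            # Stands in for: col1 LIKE CONCAT('%', col2, '%'),
            # assuming col2 has no wildcard characters.
            if col2 in col1:
                yield (col1, col2)
```

The nested loop is the whole problem: the work grows with
(rows streamed) x (entries held in memory), and the lookup side has to
fit in memory on every task.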

D



On Wed, Oct 3, 2012 at 11:32 AM, J. Rottinghuis <jrottingh...@gmail.com> wrote:
> <moved common-user@hadoop.apache.org to bcc and added u...@pig.apache.org>
>
> Best asked on the Pig users list.
>
> Cheers,
>
> Joep
>
> On Wed, Oct 3, 2012 at 7:04 AM, Abhishek <abhishek.dod...@gmail.com> wrote:
>
>> Hi all,
>>
>> How can I write the Hive query below in Pig Latin?
>>
>> select t2.col1, t3.col2
>> from table2 t2
>> join table3 t3
>> WHERE t3.col2 IS NOT NULL
>> AND t2.col1 LIKE CONCAT(CONCAT('%',t3.col2),'%')
>>
>> Regards
>> Abhi
>>
>>
>>
