That's a fuzzy match (join two tables not on equality, but on one table's column value matching a dynamically generated regex based on another column). I don't know of efficient ways of doing that in MR, be it Pig or Hive.. what is Hive's execution plan for that?
The only thing that comes to mind for me is a pretty fancy udf which loads up one table completely in memory, and applies the match to all entries as the other table is streamed through. But of course that would be quite expensive if the lookup table is of any respectable size. D On Wed, Oct 3, 2012 at 11:32 AM, J. Rottinghuis <jrottingh...@gmail.com> wrote: > <moved common-user@hadoop.apache.org to bcc and added u...@pig.apache.org> > > Best asked on the Pig users list. > > Cheers, > > Joep > > On Wed, Oct 3, 2012 at 7:04 AM, Abhishek <abhishek.dod...@gmail.com> wrote: > >> Hi all, >> >> Below hive query in pig latin how to do that. >> >> select t2.col1, t3.col2 >> >> from table2 t2 >> >> join table3 t3 >> >> WHERE t3.col2 IS NOT NULL >> >> AND t2.col1 LIKE CONCAT(CONCAT('%',t3.col2),'%') >> >> Regards >> Abhi >> >> >>