Hi,

I recently found Pig, really like it, and want to use it for one of our real projects. Getting the basics running was easy, but now I am struggling with a problem.

I am trying to get the customers whose email is not blacklisted. Blacklist entries can be specified either as a full address ([email protected]) or as a wildcarded domain (@domain.de).

In SQL I would solve this with:

----
select *
from customer c
left join blacklist b on c.email like concat("%", b.email)
where b.email is null
----

This is the structure of my input files:

----
raw_customer = LOAD 'customer.csv' USING PigStorage('\t') AS (id: long, email: chararray);
raw_blacklist = LOAD 'blacklist.csv' USING PigStorage('\t') AS (email: chararray);
----

How would I solve this using Pig, especially the "like %" condition? I have already looked into UDFs, but I need some advice on how to implement this.

Any help would be really appreciated.

Regards,
Jan
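
PS: In case the matching rule is unclear, here is a minimal sketch in plain Python (not Pig) of the result I am after. The function name `is_blacklisted` and the sample addresses are made up for illustration, and the case-insensitive comparison is an assumption on my side. It only shows the intended semantics: a customer is dropped when their email ends with any blacklist entry, which is what `like concat("%", b.email)` does in the SQL above.

```python
def is_blacklisted(email, blacklist):
    """Return True if the email ends with any blacklist entry.

    Mirrors `c.email LIKE CONCAT('%', b.email)`: a plain suffix match.
    Comparing in lower case is an assumption, not part of the original query.
    """
    email = email.strip().lower()
    return any(
        email.endswith(entry.strip().lower())
        for entry in blacklist
        if entry.strip()  # skip empty lines, which would match every email
    )


# Hypothetical sample data in the same shape as the two input files.
customers = [
    (1, "alice@example.com"),
    (2, "bob@domain.de"),
    (3, "carol@example.org"),
]
blacklist = ["carol@example.org", "@domain.de"]

# Keep only the customers that match no blacklist entry (the anti-join).
allowed = [c for c in customers if not is_blacklisted(c[1], blacklist)]
print(allowed)
```

One caveat of the pure suffix match, in SQL as well as here: a full-address entry like `carol@example.org` would also hit `xcarol@example.org`. That is acceptable for my data, but worth knowing.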
