hi,

i recently found pig, really like it and want to use it for one of our actual 
projects.

getting the basics running was easy, but now i am struggling with a problem.

i am trying to get customers whose email is not blacklisted.

blacklist entries can be specified as:

[email protected]

or wildcarded

@domain.de

in sql i would solve this by:

----

select
  * 
from 
  customer c
left join blacklist b
on
  c.email like concat("%",b.email)
where
  b.email is null

----
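
to make the expected result concrete, here is a small python sketch of the matching rule i am after. the function name and the sample addresses are made up for illustration and are not from the real files. it treats a blacklist entry as a plain suffix, so unlike sql "like" it is case-sensitive and does not interpret "%" or "_" inside an entry.

```python
def is_blacklisted(email, blacklist):
    """Return True if the email ends with any blacklist entry.

    Approximates the sql condition c.email like concat("%", b.email):
    an entry such as "@domain.de" matches every address in that domain,
    while a full address matches itself.
    """
    return any(email.endswith(entry) for entry in blacklist)


# made-up sample data (hypothetical, for illustration only)
customers = [
    (1, "alice@example.com"),
    (2, "bob@domain.de"),
    (3, "carol@other.org"),
]
blacklist = ["alice@example.com", "@domain.de"]

# keep only customers with no matching blacklist entry,
# which is what the left join ... where b.email is null does
allowed = [(cid, email) for cid, email in customers
           if not is_blacklisted(email, blacklist)]
print(allowed)
```

with this sample data only customer 3 should remain.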

this is the structure of my input files:

raw_customer = LOAD 'customer.csv' USING PigStorage('\t') AS (id: long, email: chararray);
raw_blacklist = LOAD 'blacklist.csv' USING PigStorage('\t') AS (email: chararray);


how would i solve this using pig, especially the "like %" condition?

i already looked into udfs, but need some advice on how to implement this.


any help would be really appreciated.

regards,
jan
