Tamir,

x2 = FILTER x1 BY ( IsEmpty(p3) AND (IsEmpty(rdt1) OR (rdt1.to matches '.*com')) );

Here, projecting the column 'to' from the bag 'rdt1' gives you a bag of chararray, not a single chararray, which is why the regex operand is rejected. You could write a UDF that takes this bag, iterates over its contents, and does a regex match on each item.

Thanks,
Santhosh

-----Original Message-----
From: Tamir Kamara [mailto:[email protected]]
Sent: Thursday, March 26, 2009 12:12 AM
To: [email protected]
Subject: Regex operand - chararray only

Hi,

Following a COGROUP I would like to filter results by one of the fields, but I'm getting an error:
Operand of Regex can be CharArray only.

The relevant lines in my script are:
x1 = COGROUP p3 BY domain, rdt1 BY from, f4 BY target;
x2 = FILTER x1 BY ( IsEmpty(p3) AND (IsEmpty(rdt1) OR (rdt1.to matches '.*com')) );
x3 = FOREACH x2 GENERATE flatten(f4);

describe of x1
x1: {group: chararray,p3: {domain: chararray},rdt1: {from: chararray,to: chararray},f4: {source: chararray,target: chararray}}

I'm not sure why the error occurs. Is it because rdt1 inside x1 is a bag, so multiple rdt1 tuples can exist in the same group?

I can get around this with this script:
x1 = COGROUP p3 BY domain, rdt1 BY from, f4 BY target parallel 32;
x2 = FOREACH x1 GENERATE flatten(f4), COUNT(p3) as p3_count, COUNT(rdt1) as rdt1_count, flatten(rdt1.to);
x3 = FILTER x2 BY ( p3_count==0 AND (rdt1_count==0 OR (to matches '.com')) );
x4 = FOREACH x3 GENERATE source, target;

but it seems too complicated to me. Is there a way to make my first version work?

Thanks in advance,
Tamir
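
The core of the UDF suggested in the reply (iterate over the bag's items and regex-match each one) might look roughly like the sketch below. It is only an illustration: the class name BagRegexMatch and the method anyMatches are hypothetical, and the bag is modeled as a plain Iterable<String> so that the code compiles without the Pig jar. In a real UDF this loop would presumably sit inside the exec(Tuple) method of a class extending org.apache.pig.FilterFunc, with the items read from the DataBag passed in the input tuple. The sketch assumes whole-string matching, as with Pig's matches operator.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: models the per-item regex check
// that a bag-aware filter UDF would perform.
class BagRegexMatch {

    // Returns true if at least one non-null item matches the regex in full.
    // A null or empty collection yields false, so the IsEmpty(rdt1) case
    // still has to be handled separately, as in the original FILTER.
    static boolean anyMatches(Iterable<String> items, String regex) {
        if (items == null) {
            return false;
        }
        // Compile once and reuse the pattern for every item in the bag.
        Pattern pattern = Pattern.compile(regex);
        for (String item : items) {
            if (item != null && pattern.matcher(item).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

For example, BagRegexMatch.anyMatches(java.util.Arrays.asList("a.com", "b.org"), ".*com") is true because "a.com" matches in full, while the same call with the pattern '.com' is false, since that pattern only matches four-character strings.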
