Hi,
I have written a Pig script that processes sequence files given as
input.
It works fine, except for one problem, described below.
My Pig script contains repetitive statements like these:
- Filtered_Data_1 = FILTER BagName BY ($0 matches 'RegEx-1');
-
Load the regex patterns from a file (one pattern per line), CROSS that
relation with BagName, and then use the SelectFieldByName UDF to pick the
regex pattern out of the regex relation.
https://issues.apache.org/jira/plugins/servlet/mobile#issue/DATAFU-69
I believe you can use a field name against
Are these regexes static? If yes, this is easily achieved by embedding your
script in Java or any other language that Pig supports:
http://pig.apache.org/docs/r0.13.0/cont.html
You could also write a UDF that loops through all the regexes and
returns the result.
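
To illustrate the embedding idea, here is a minimal Python sketch that generates the repetitive FILTER statements from a static list of patterns instead of writing them by hand. The function name, its parameters, and the escaping rule (backslash-escaping inside Pig single-quoted strings) are assumptions for illustration, not something stated in this thread; the relation and alias names come from the original question.

```python
def build_filter_statements(patterns, relation="BagName", prefix="Filtered_Data_"):
    """Return one Pig FILTER statement per regex pattern (hypothetical helper)."""
    statements = []
    for index, pattern in enumerate(patterns, start=1):
        # Assumption: Pig string literals need backslashes and single quotes escaped.
        escaped = pattern.replace("\\", "\\\\").replace("'", "\\'")
        statements.append(
            "%s%d = FILTER %s BY ($0 matches '%s');" % (prefix, index, relation, escaped)
        )
    return statements


if __name__ == "__main__":
    # Placeholder patterns taken from the question; real ones would go here.
    for statement in build_filter_statements(["RegEx-1", "RegEx-2"]):
        print(statement)
```

In an embedded (Jython) Pig script, the generated text could then be joined and handed to Pig for compilation; that step is left out here because it only runs inside Pig.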
On Mon, Oct 6, 2014 at 12:44
Hi Ankur,
Is the list of regular expressions static or dynamic? If it's a static
list, you can collapse all the filter operators into a single operator and
use the AND keyword to combine them.
E.g.
Filtered_Data = FILTER BagName BY ($0 matches 'RegEx-1') AND ($0 matches
'RegEx-2') AND ($0
Actually, I don't think you need SelectFieldByValue. Just use the name of
the field directly.
On Monday, October 6, 2014, Prashant Kommireddi prash1...@gmail.com wrote:
Are these regex static? If yes, this is easily achieved with embedding your
script in Java or any other language that Pig
In case you haven't seen this already, take a look at
http://pig.apache.org/docs/r0.13.0/perf.html for some basic strategies on
optimizing your Pig scripts.
On Mon, Oct 6, 2014 at 1:08 PM, Russell Jurney russell.jur...@gmail.com
wrote:
Actually, I don't think you need SelectFieldByValue. Just
Thanks for replying, everyone. A few comments on everyone's suggestions.
1. I am processing a sequence file which consists of many CSV files. I need
to extract only a few of those CSVs. That is why I am using
'SelectFieldByValue' on the file name in my case, rather than on a field
directly.
2. All
It looks like the best option at this point is to write a custom UDF that
loads a set of regular expressions from a file and runs the data
through all of them.
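
The core logic of such a UDF might look like the Python sketch below. It is only an outline under stated assumptions: the function names are hypothetical, the pattern file is assumed to hold one regex per line, and the code is not yet wired up as a Pig UDF (registration and output schema are omitted). Note also that Python's regex dialect differs in places from the Java one used by Pig's `matches`.

```python
import re


def compile_patterns(lines):
    """Compile one regex per non-blank line (hypothetical helper)."""
    compiled = []
    for line in lines:
        pattern = line.strip()
        if pattern:
            # \Z forces a whole-string match, mirroring Pig's 'matches' semantics.
            compiled.append(re.compile("(?:%s)\\Z" % pattern))
    return compiled


def load_patterns(path):
    """Read the pattern file once; assumed layout is one regex per line."""
    with open(path) as handle:
        return compile_patterns(handle.readlines())


def matching_indexes(value, compiled):
    """Return the positions of every pattern that matches the whole value."""
    return [index for index, regex in enumerate(compiled) if regex.match(value)]
```

Loading and compiling the patterns once, rather than per record, keeps the per-row cost down to the matching itself.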
On Mon, Oct 6, 2014 at 1:44 PM, Ankur Kasliwal ankur.kasliwal...@gmail.com
wrote:
Thanks for replying everyone. Few comments to
If you can describe the layout of your input files more thoroughly, it
would help.
On Monday, October 6, 2014, Pradeep Gollakota pradeep...@gmail.com wrote:
It looks like the best option at this point is to write a custom UDF that
takes loads a set of regular expressions from file and runs the