RE: Trouble with REGEX in PIG

2013-12-04 Thread Watrous, Daniel
Pradeep,

Does the documentation here need to be updated?
http://pig.apache.org/docs/r0.12.0/func.html#regex-extract

It suggests that the function can be run directly against a string literal and
should return the expected value.

I did confirm that I can use REGEX_EXTRACT on values loaded from a file. 

Thank you,
Daniel

-Original Message-
From: Pradeep Gollakota [mailto:pradeep...@gmail.com] 
Sent: Wednesday, December 04, 2013 11:28 AM
To: user@pig.apache.org
Subject: Re: Trouble with REGEX in PIG

It's not valid PigLatin...

The Grunt shell doesn't let you try out functions and UDFs as you're trying to 
use them.

A = LOAD 'data' USING PigStorage() as (ip: chararray);
B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1);
DUMP B;

You always have to load a dataset and work with said dataset(s).
You can create a file called 'data' (per the above script) and put
"192.168.1.5:8020" in the file and try the above set of commands in the grunt
shell.



RE: Trouble with REGEX in PIG

2013-12-04 Thread Watrous, Daniel
That's what I was trying first, but then I tried defining it too.

-Original Message-
From: Ankit Bhatnagar [mailto:ank...@yahoo-inc.com] 
Sent: Wednesday, December 04, 2013 11:15 AM
To: user@pig.apache.org; Watrous, Daniel
Subject: Re: Trouble with REGEX in PIG

Are you planning to use org.apache.pig.builtin.REGEX_EXTRACT?




Re: Trouble with REGEX in PIG

2013-12-04 Thread Pradeep Gollakota
It's not valid PigLatin...

The Grunt shell doesn't let you try out functions and UDFs as you're
trying to use them.

A = LOAD 'data' USING PigStorage() as (ip: chararray);
B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1);
DUMP B;

You always have to load a dataset and work with said dataset(s).
You can create a file called 'data' (per the above script) and put
"192.168.1.5:8020" in the file and try the above set of commands in the
grunt shell.
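
For anyone following along, the steps above can be sketched as a short shell
session. This is only a sketch: it assumes `pig` is on the PATH and that the
commands are run from a scratch directory. The script name `regex_test.pig` is
made up for illustration and does not come from this thread.

```shell
# Create the one-line input file that the script loads.
printf '192.168.1.5:8020\n' > data

# Save the three Pig Latin statements to a script file
# (regex_test.pig is a hypothetical name).
cat > regex_test.pig <<'EOF'
A = LOAD 'data' USING PigStorage() AS (ip: chararray);
B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1);
DUMP B;
EOF

# Run the script in local mode, as in the original session.
# The guard only avoids a hard failure when pig is not installed.
if command -v pig >/dev/null 2>&1; then
    pig -x local regex_test.pig
else
    echo "pig not found on PATH; skipping run" >&2
fi
```

Since group 1 of '(.*):(.*)' is everything before the colon, DUMP should show
one tuple holding 192.168.1.5. The same three statements can also be typed at
the grunt> prompt one at a time.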




Re: Trouble with REGEX in PIG

2013-12-04 Thread Ankit Bhatnagar
Are you planning to use org.apache.pig.builtin.REGEX_EXTRACT?

On 12/4/13 9:28 AM, "Watrous, Daniel"  wrote:

>Hi,
>
>I'm trying to use regular expressions in PIG, but it's failing. Based on
>the documentation 
>http://pig.apache.org/docs/r0.12.0/func.html#regex-extract I am trying
>this:
>
>[watrous@c0003913 ~]$ pig -x local
>which: no hadoop in (/opt/krb5/sbin/64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/sbin:/usr/sbin:/usr/bin:/opt/pb/bin:/opt/perf/bin:/bin:/usr/local/bin:/home/watrous/bin:/home/watrous/pig-0.12.0/bin)
>2013-12-04 17:15:15,398 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
>2013-12-04 17:15:15,398 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/watrous/pig_1386177315394.log
>2013-12-04 17:15:15,425 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/watrous/.pigbootup not found
>2013-12-04 17:15:15,599 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
>grunt> REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);
>2013-12-04 17:16:59,753 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:  Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
>Details at logfile: /home/watrous/pig_1386177315394.log
>
>Here's the relevant bit from the log file:
>Pig Stack Trace
>---
>ERROR 1200:  Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
>
>Failed to parse:  Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
>at org.apache.pig.parser.PigMacro.macroInline(PigMacro.java:455)
>at org.apache.pig.parser.QueryParserDriver.inlineMacro(QueryParserDriver.java:298)
>at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:287)
>at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
>at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1648)
>at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1621)
>at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
>at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
>at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
>at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
>at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
>at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>at org.apache.pig.Main.run(Main.java:541)
>at org.apache.pig.Main.main(Main.java:156)
>
>I attempted to define the macro (following this tutorial
>http://aws.amazon.com/articles/2729). However, piggybank.jar doesn't
>define org.apache.pig.piggybank.evaluation.string.EXTRACT, so I located
>the most likely file in the current version of the jar.
>
>grunt> register /home/watrous/pig-0.12.0/contrib/piggybank/java/piggybank.jar
>grunt> DEFINE REGEX_EXTRACT org.apache.pig.piggybank.evaluation.string.RegexExtract;
>grunt> REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);
>2013-12-04 17:23:20,383 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200:  Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
>Details at logfile: /home/watrous/pig_1386177315394.log
>
>I get the same stack trace with the only change being a reference to
> instead of .
>
>Any idea how I can get this working?
>
>Daniel