RE: Trouble with REGEX in PIG
Pradeep,

Does the documentation here need to be updated: http://pig.apache.org/docs/r0.12.0/func.html#regex-extract

It suggests that the function can run against a string and should return the expected value. I did confirm that I can use REGEX_EXTRACT on values loaded from a file.

Thank you,
Daniel

-----Original Message-----
From: Pradeep Gollakota [mailto:pradeep...@gmail.com]
Sent: Wednesday, December 04, 2013 11:28 AM
To: user@pig.apache.org
Subject: Re: Trouble with REGEX in PIG

It's not valid PigLatin... The Grunt shell doesn't let you try out functions and UDFs as you're trying to use them.

A = LOAD 'data' USING PigStorage() as (ip: chararray);
B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1);
DUMP B;

You always have to load a dataset and work with said dataset(s). You can create a file called 'data' (per the above script) and put " 192.168.1.5:8020" in the file and try the above set of commands in the grunt shell.

On Wed, Dec 4, 2013 at 10:15 AM, Ankit Bhatnagar wrote:

> R u planning to use org.apache.pig.builtin.REGEX_EXTRACT?
>
> On 12/4/13 9:28 AM, "Watrous, Daniel" wrote:
>
> >Hi,
> >
> >I'm trying to use regular expressions in PIG, but it's failing. Based on
> >the documentation http://pig.apache.org/docs/r0.12.0/func.html#regex-extract
> >I am trying this:
> >
> >[watrous@c0003913 ~]$ pig -x local
> >which: no hadoop in (/opt/krb5/sbin/64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/sbin:/usr/sbin:/usr/bin:/opt/pb/bin:/opt/perf/bin:/bin:/usr/local/bin:/home/watrous/bin:/home/watrous/pig-0.12.0/bin)
> >2013-12-04 17:15:15,398 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
> >2013-12-04 17:15:15,398 [main] INFO org.apache.pig.Main - Logging error messages to: /home/watrous/pig_1386177315394.log
> >2013-12-04 17:15:15,425 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/watrous/.pigbootup not found
> >2013-12-04 17:15:15,599 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> >grunt> REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);
> >2013-12-04 17:16:59,753 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >Details at logfile: /home/watrous/pig_1386177315394.log
> >
> >Here's the relevant bit from the log file:
> >
> >Pig Stack Trace
> >---
> >ERROR 1200: Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >
> >Failed to parse: Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >at org.apache.pig.parser.PigMacro.macroInline(PigMacro.java:455)
> >at org.apache.pig.parser.QueryParserDriver.inlineMacro(QueryParserDriver.java:298)
> >at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:287)
> >at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
> >at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1648)
> >at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1621)
> >at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
> >at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
> >at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
> >at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
> >at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
> >at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> >at org.apache.pig.Main.run(Main.java:541)
> >at org.apache.pig.Main.main(Main.java:156)
> >
> >I attempted to define the macro (following this tutorial
> >http://aws.amazon.com/articles/2729). However, piggybank.jar doesn't
> >define org.apache.pig.piggybank.evaluation.string.EXTRACT, so I located
> >the most likely file in the current version of the jar.
> >
> >grunt> register /home/watrous/pig-0.12.0/contrib/piggybank/java/piggybank.jar
> >grunt> DEFINE REGEX_EXTRACT org.apache.pig.piggybank.evaluation.string.RegexExtract;
> >grunt> REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);
> >2013-12-04 17:23:20,383 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Cannot expand macro 'REGEX_EXTRACT'. Reason: Macro must be defined before expansion.
> >Details at logfile: /home/watrous/pig_1386177315394.log
> >
> >I get the same stack trace with the only change being a reference to
> > instead of .
> >
> >Any idea how I can get this working?
> >
> >Daniel
RE: Trouble with REGEX in PIG
That's what I was trying first, but then I tried defining it too.

-----Original Message-----
From: Ankit Bhatnagar [mailto:ank...@yahoo-inc.com]
Sent: Wednesday, December 04, 2013 11:15 AM
To: user@pig.apache.org; Watrous, Daniel
Subject: Re: Trouble with REGEX in PIG

R u planning to use org.apache.pig.builtin.REGEX_EXTRACT?
Re: Trouble with REGEX in PIG
It's not valid PigLatin... The Grunt shell doesn't let you try out functions and UDFs as you're trying to use them.

A = LOAD 'data' USING PigStorage() as (ip: chararray);
B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1);
DUMP B;

You always have to load a dataset and work with said dataset(s). You can create a file called 'data' (per the above script) and put " 192.168.1.5:8020" in the file and try the above set of commands in the grunt shell.

On Wed, Dec 4, 2013 at 10:15 AM, Ankit Bhatnagar wrote:

> R u planning to use org.apache.pig.builtin.REGEX_EXTRACT?
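
Since Grunt won't evaluate a bare function call, one way to check what the pattern captures from a literal string, without creating a data file, is a few lines of plain Java. This is only a sketch: it assumes Pig's REGEX_EXTRACT is built on java.util.regex and searches with find(), and the class name RegexCheck and the method extract are made up for illustration (they are not part of Pig or piggybank).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: approximates REGEX_EXTRACT
// under the assumption that it relies on java.util.regex.
public class RegexCheck {

    // Returns the capture group at `index`, or null when the pattern is not found.
    public static String extract(String input, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // Using find() is an assumption; Pig may apply the pattern differently.
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String sample = "192.168.1.5:8020";
        System.out.println(extract(sample, "(.*):(.*)", 1)); // prints 192.168.1.5
        System.out.println(extract(sample, "(.*):(.*)", 2)); // prints 8020
    }
}
```

Note that if the line in the data file starts with a space, as in the quoted " 192.168.1.5:8020", the first group in this sketch keeps that leading space, so it may be worth trimming the input before comparing results.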
Re: Trouble with REGEX in PIG
R u planning to use org.apache.pig.builtin.REGEX_EXTRACT?