Thanks for confirming, Dmitriy. I've reported this as: PIG-1581 https://issues.apache.org/jira/browse/PIG-1581
-Chris

On Mon, Aug 30, 2010 at 2:50 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> That's a parser bug, it's treating all semicolons as semicolons, without
> paying attention to quoting.
>
> Please open a ticket on the Pig Jira.
>
> In the meantime, there's a workaround:
>
> delimited = FOREACH lines GENERATE FLATTEN (REGEXEXTRACTALL(line,
> '^(\\d+)\\u003B(\\w+)$')) AS (digit:int,word:chararray)
>
> On Mon, Aug 30, 2010 at 11:38 AM, Christopher Hackman <
> christopher.hack...@gmail.com> wrote:
>
> > I'm attempting to parse some log files using the RegexExtractAll function
> > in the piggybank. Everything was going along swimmingly until I tried to
> > include an expression which contains a semi-colon.
> >
> > Here's the short, reproducible version of what I'm trying to do...
> >
> > Given an input file:
> >
> > /test1.txt (in the hdfs)
> > 1;a
> > 2;b
> > 3;c
> > 4;d
> > 5;e
> >
> >
> > And the following Pig script:
> >
> > REGISTER /tmp/piggybank.jar ;
> > DEFINE REGEXEXTRACTALL
> >     org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
> > lines = LOAD '/test1.txt' AS (line:chararray);
> > delimited = FOREACH lines GENERATE FLATTEN (
> >     REGEXEXTRACTALL(line, '^(\\d+);(\\w+)$')
> > ) AS (
> >     digit:int,
> >     word:chararray
> > );
> > DUMP delimited;
> >
> >
> > I receive the following error:
> >
> > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing.
> > Lexical error at line 5, column 40. Encountered: <EOF> after :
> > "\'^(\\\\d+);"
> >
> >
> > If I change the source file to use commas (or pipes, or dashes, etc...)
> > and change the regex accordingly, it works as expected. It looks to me
> > like Pig is not parsing the regex string correctly, and is assuming that
> > the semi-colon (even though it's part of a quoted string) is an EOL
> > character. I've tried escaping the semi-colon, putting another at the end
> > of the REGEXEXTRACTALL line, etc... nothing seems to prevent Pig from
> > dying.
> >
> > Can anyone tell me if I'm missing something obvious?
> >
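
For anyone finding this thread later: the workaround quoted above appears to work because the regex engine, not the Pig parser, is what turns \u003B back into a semicolon, so no literal ";" ever shows up in the script text. Below is a minimal sketch of that idea in plain Java. It assumes the pattern string reaches java.util.regex unchanged once Pig has unescaped the doubled backslashes; it does not call Pig or the piggybank at all, and the class and method names (SemicolonEscapeDemo, extract) are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemicolonEscapeDemo {
    // Pattern from the workaround: the regex text contains a backslash-u
    // escape (code point 003B), which java.util.regex reads as ';'.
    static final Pattern ESCAPED = Pattern.compile("^(\\d+)\\u003B(\\w+)$");

    // Pattern from the original script, with a literal semicolon.
    static final Pattern LITERAL = Pattern.compile("^(\\d+);(\\w+)$");

    // Returns {digits, word} when the whole line matches, or null otherwise.
    static String[] extract(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Same sample rows as /test1.txt in the original report.
        String[] lines = { "1;a", "2;b", "3;c", "4;d", "5;e" };
        for (String line : lines) {
            String[] viaEscape = extract(ESCAPED, line);
            String[] viaLiteral = extract(LITERAL, line);
            System.out.println(line + " -> " + Arrays.toString(viaEscape)
                    + " (same as literal pattern: "
                    + Arrays.equals(viaEscape, viaLiteral) + ")");
        }
    }
}
```

Both patterns should capture the same two groups for every sample row, which is why swapping the literal semicolon for the escape is a safe stopgap until PIG-1581 is resolved.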