Thanks for confirming, Dmitriy. I've reported this as PIG-1581:

https://issues.apache.org/jira/browse/PIG-1581

-Chris

On Mon, Aug 30, 2010 at 2:50 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> That's a parser bug: it treats every semicolon as a statement terminator,
> without paying attention to quoting.
>
> Please open a ticket on the Pig Jira.
>
> In the meantime, there's a workaround:
>
> delimited = FOREACH lines GENERATE FLATTEN (
>        REGEXEXTRACTALL(line, '^(\\d+)\\u003B(\\w+)$')
> ) AS (digit:int, word:chararray);
>
> On Mon, Aug 30, 2010 at 11:38 AM, Christopher Hackman <
> christopher.hack...@gmail.com> wrote:
>
> > I'm attempting to parse some log files using the RegexExtractAll function in
> > the piggybank. Everything was going along swimmingly until I tried to
> > include an expression that contains a semicolon.
> >
> > Here's the short, reproducible version of what I'm trying to do...
> >
> > Given an input file:
> >
> > /test1.txt (in HDFS)
> > 1;a
> > 2;b
> > 3;c
> > 4;d
> > 5;e
> >
> >
> > And the following Pig script:
> >
> > REGISTER /tmp/piggybank.jar ;
> > DEFINE REGEXEXTRACTALL
> > org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
> > lines = LOAD '/test1.txt' AS (line:chararray);
> > delimited = FOREACH lines GENERATE FLATTEN (
> >        REGEXEXTRACTALL(line, '^(\\d+);(\\w+)$')
> > ) AS (
> >        digit:int,
> >        word:chararray
> > );
> > DUMP delimited;
> >
> >
> > I receive the following error:
> >
> > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing.
> > Lexical error at line 5, column 40.  Encountered: <EOF> after :
> > "\'^(\\\\d+);"
> >
> >
> > If I change the source file to use commas (or pipes, or dashes, etc.) and
> > change the regex accordingly, it works as expected. It looks to me like Pig
> > is not parsing the regex string correctly, and is treating the semicolon
> > (even though it's part of a quoted string) as an end-of-statement
> > character. I've tried escaping the semicolon, putting another one at the
> > end of the REGEXEXTRACTALL line, etc.; nothing seems to prevent Pig from
> > dying.
> >
> > Can anyone tell me if I'm missing something obvious?
> >
>
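Why the `\\u003B` workaround can help, as a minimal sketch: the Pig lexer never sees a literal `;` inside the quoted string, so it does not end the statement early. Assuming Pig unescapes `\\` to `\` in string literals (as it does for `\\d`) and that RegexExtractAll hands the pattern to java.util.regex, the regex engine receives `\u003B`, which java.util.regex.Pattern reads as a Unicode escape for the semicolon (U+003B). The check below runs outside Pig, uses plain java.util.regex, and confirms that the escaped pattern captures the same groups as the literal-semicolon pattern on the sample input. The class name and the `extractAll` helper are hypothetical and only approximate what the piggybank UDF returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemicolonRegexCheck {
    // Pattern text as the regex engine is assumed to see it after Pig's
    // unescaping. In this Java source, the doubled backslash before u003B
    // yields a single backslash in the string, which the regex engine then
    // interprets as a Unicode escape for the semicolon character.
    static final Pattern ESCAPED = Pattern.compile("^(\\d+)\\u003B(\\w+)$");

    // The pattern the original script wanted to write, with a literal semicolon.
    static final Pattern LITERAL = Pattern.compile("^(\\d+);(\\w+)$");

    // Returns all capture groups of a full-line match, or an empty list when
    // the line does not match (a rough stand-in for the UDF's output).
    static List<String> extractAll(Pattern p, String line) {
        List<String> groups = new ArrayList<>();
        Matcher m = p.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Same contents as /test1.txt in the original message.
        String[] lines = {"1;a", "2;b", "3;c", "4;d", "5;e"};
        for (String line : lines) {
            List<String> escaped = extractAll(ESCAPED, line);
            List<String> literal = extractAll(LITERAL, line);
            System.out.println(line + " -> " + escaped
                    + " (matches literal pattern: " + escaped.equals(literal) + ")");
        }
    }
}
```

Note that this only shows the regex side of the workaround; whether a given Pig release unescapes the literal exactly this way is worth confirming against the PIG-1581 discussion.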
