[ https://issues.apache.org/jira/browse/PIG-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-1622:
-----------------------------

    Attachment: PIG-1622.patch

Test-patch run:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.


> DEFINE streaming options are ill defined and not properly documented
> --------------------------------------------------------------------
>
>                 Key: PIG-1622
>                 URL: https://issues.apache.org/jira/browse/PIG-1622
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Alan Gates
>            Assignee: Corinne Chandel
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1622.patch
>
>
> According to the documentation 
> (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#DEFINE) the 
> syntax for DEFINE when used to define a streaming command is:
> DEFINE cmd INPUT(stdin|path) OUTPUT(stdout|stderr|path) SHIP(path [, path, 
> ...]) CACHE (path [, path, ...])
> However, the actual parser accepts something pretty different.  Consider the 
> following script:
> {code}
> define strm `wc -l` INPUT(stdin) 
>                     CACHE('/Users/gates/.vimrc#myvim') 
>                     OUTPUT(stdin)
>                     INPUT('/tmp/fred') 
>                     OUTPUT('/tmp/bob')
>                     SHIP('/Users/gates/.bashrc') 
>                     SHIP('/Users/gates/.vimrc') 
>                     CACHE('/Users/gates/.bashrc#mybash')
>                     stderr('/tmp/errors' limit 10);
> A = load '/Users/gates/test/data/studenttab10';
> B = stream A through strm;
> dump B;
> {code}
> The above actually parses.  I see several issues here:
> # What do multiple INPUT and OUTPUT statements mean in the context of 
> streaming?  These should not be allowed.
> # The documentation implies an order (INPUT, OUTPUT, SHIP, CACHE) that is not 
> enforced by the parser.  We should either enforce the order in the parser or 
> update the documentation.  Most likely the latter to avoid breaking existing 
> scripts.
> # Why are multiple SHIP and CACHE clauses allowed when each can take multiple 
> paths?  It seems we should only allow one of each.
> # The error clause is completely different than what is given in the 
> documentation.  I suspect this is a documentation error and the grammar 
> supported by the parser here is what we want.
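
For illustration only, and not part of the original report: if the restrictions proposed in points 1 and 3 were adopted (a single INPUT, a single OUTPUT, and one SHIP and one CACHE clause that each list all of their paths), the script above might be consolidated roughly as follows. This is a sketch that reuses the paths from the example, keeps the stdin form of INPUT, assumes OUTPUT(stdout) was the intent, and drops the duplicate file-based INPUT and OUTPUT clauses. It has not been run against any Pig release. The clauses appear in the documented order, which the parser described above does not require.

```pig
-- Assumption: one clause of each kind, with SHIP and CACHE listing every path.
-- Assumption: OUTPUT(stdout) replaces the OUTPUT(stdin) seen in the report.
define strm `wc -l` INPUT(stdin)
                    OUTPUT(stdout)
                    SHIP('/Users/gates/.bashrc', '/Users/gates/.vimrc')
                    CACHE('/Users/gates/.vimrc#myvim', '/Users/gates/.bashrc#mybash')
                    stderr('/tmp/errors' limit 10);
A = load '/Users/gates/test/data/studenttab10';
B = stream A through strm;
dump B;
```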

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
