[ https://issues.apache.org/jira/browse/PIG-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated PIG-1622:
-----------------------------
Attachment: PIG-1622-1.patch
Added a minor change for a test case.
> DEFINE streaming options are ill defined and not properly documented
> --------------------------------------------------------------------
>
> Key: PIG-1622
> URL: https://issues.apache.org/jira/browse/PIG-1622
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Alan Gates
> Assignee: Corinne Chandel
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-1622-1.patch, PIG-1622.patch
>
>
> According to the documentation
> (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#DEFINE) the
> syntax for DEFINE when used to define a streaming command is:
> DEFINE cmd INPUT(stdin|path) OUTPUT(stdout|stderr|path) SHIP(path [, path,
> ...]) CACHE (path [, path, ...])
> However, the actual parser accepts something pretty different. Consider the
> following script:
> {code}
> define strm `wc -l` INPUT(stdin)
> CACHE('/Users/gates/.vimrc#myvim')
> OUTPUT(stdin)
> INPUT('/tmp/fred')
> OUTPUT('/tmp/bob')
> SHIP('/Users/gates/.bashrc')
> SHIP('/Users/gates/.vimrc')
> CACHE('/Users/gates/.bashrc#mybash')
> stderr('/tmp/errors' limit 10);
> A = load '/Users/gates/test/data/studenttab10';
> B = stream A through strm;
> dump B;
> {code}
> The above actually parses. I see several issues here:
> # What do multiple INPUT and OUTPUT statements mean in the context of
> streaming? These should not be allowed.
> # The documentation implies an order (INPUT, OUTPUT, SHIP, CACHE) that is not
> enforced by the parser. We should either enforce the order in the parser or
> update the documentation. Most likely the latter to avoid breaking existing
> scripts.
> # Why are multiple SHIP and CACHE clauses allowed when each can take multiple
> paths? It seems we should only allow one of each.
> # The error clause is completely different from what is given in the
> documentation. I suspect this is a documentation error and the grammar
> supported by the parser here is what we want.
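>
> For comparison, here is a sketch of how the same definition might look if
> each clause appeared only once, in the documented order. It assumes that
> SHIP and CACHE accept comma-separated path lists as the documentation
> states, that OUTPUT(stdout) is the intended output, and that the stderr
> clause keeps the form the parser accepts today. It has not been run against
> any Pig release.
> {code}
> -- Hypothetical single-clause form; paths are the ones from the script above.
> define strm `wc -l`
> INPUT(stdin)
> OUTPUT(stdout)
> SHIP('/Users/gates/.bashrc', '/Users/gates/.vimrc')
> CACHE('/Users/gates/.vimrc#myvim', '/Users/gates/.bashrc#mybash')
> stderr('/tmp/errors' limit 10);
> A = load '/Users/gates/test/data/studenttab10';
> B = stream A through strm;
> dump B;
> {code}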
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira