[ https://issues.apache.org/jira/browse/PIG-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated PIG-1622:
-----------------------------

    Attachment: PIG-1622-1.patch

Added a minor change for a test case.

> DEFINE streaming options are ill defined and not properly documented
> --------------------------------------------------------------------
>
>                 Key: PIG-1622
>                 URL: https://issues.apache.org/jira/browse/PIG-1622
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Alan Gates
>            Assignee: Corinne Chandel
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1622-1.patch, PIG-1622.patch
>
>
> According to the documentation
> (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#DEFINE) the
> syntax for DEFINE when used to define a streaming command is:
> DEFINE cmd INPUT(stdin|path) OUTPUT(stdout|stderr|path) SHIP(path [, path,
> ...]) CACHE (path [, path, ...])
> However, the actual parser accepts something quite different. Consider the
> following script:
> {code}
> define strm `wc -l` INPUT(stdin)
> CACHE('/Users/gates/.vimrc#myvim')
> OUTPUT(stdin)
> INPUT('/tmp/fred')
> OUTPUT('/tmp/bob')
> SHIP('/Users/gates/.bashrc')
> SHIP('/Users/gates/.vimrc')
> CACHE('/Users/gates/.bashrc#mybash')
> stderr('/tmp/errors' limit 10);
> A = load '/Users/gates/test/data/studenttab10';
> B = stream A through strm;
> dump B;
> {code}
> The above actually parses. I see several issues here:
> # What do multiple INPUT and OUTPUT clauses mean in the context of
> streaming? These should not be allowed.
> # The documentation implies an order (INPUT, OUTPUT, SHIP, CACHE) that is not
> enforced by the parser. We should either enforce the order in the parser or
> update the documentation. Most likely the latter, to avoid breaking existing
> scripts.
> # Why are multiple SHIP and CACHE clauses allowed when each can take multiple
> paths? It seems we should allow only one of each.
> # The error clause is completely different from what is given in the
> documentation. I suspect this is a documentation error and the grammar
> supported by the parser here is what we want.
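
The "only one of each clause" rule proposed in the first and third points can be illustrated with a small standalone check. This is a hedged sketch in Python, not Pig's parser and not the attached patch: the helper names `clause_counts` and `repeated_clauses` are hypothetical, and the regex-based scan is an assumption that only handles simple statements like the one in the report.

```python
import re
from collections import Counter

# The DEFINE statement from the report, unchanged.
SCRIPT = """define strm `wc -l` INPUT(stdin)
CACHE('/Users/gates/.vimrc#myvim')
OUTPUT(stdin)
INPUT('/tmp/fred')
OUTPUT('/tmp/bob')
SHIP('/Users/gates/.bashrc')
SHIP('/Users/gates/.vimrc')
CACHE('/Users/gates/.bashrc#mybash')
stderr('/tmp/errors' limit 10);"""


def clause_counts(define_stmt: str) -> Counter:
    """Count how often each streaming clause keyword opens a clause."""
    # Drop the backquoted command and the contents of quoted paths so
    # that text inside them is never mistaken for a clause keyword.
    body = re.sub(r"`[^`]*`", "", define_stmt)
    body = re.sub(r"'[^']*'", "''", body)
    # A clause is a keyword immediately followed by "(", so the bare
    # word stderr inside OUTPUT(stderr) is not counted as a clause.
    names = re.findall(
        r"\b(INPUT|OUTPUT|SHIP|CACHE|STDERR)\s*\(", body, flags=re.IGNORECASE
    )
    return Counter(name.upper() for name in names)


def repeated_clauses(define_stmt: str) -> list:
    """Return the clause names that appear more than once, sorted."""
    counts = clause_counts(define_stmt)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(repeated_clauses(SCRIPT))  # prints ['CACHE', 'INPUT', 'OUTPUT', 'SHIP']
```

Run against the script from the report, the check flags INPUT, OUTPUT, SHIP and CACHE as repeated. A statement with one clause of each kind, such as one using a single SHIP with two paths, yields an empty list.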