[jira] Subscription: PIG patch available

2013-03-13 Thread jira
Issue Subscription
Filter: PIG patch available (37 issues)

Subscriber: pigdaily

Key Summary
PIG-3247 Piggybank functions to mimic OVER clause in SQL
https://issues.apache.org/jira/browse/PIG-3247
PIG-3244 Make PIG_HOME configurable
https://issues.apache.org/jira/browse/PIG-3244
PIG-3238 Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.
https://issues.apache.org/jira/browse/PIG-3238
PIG-3237 Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by "," characters) consisting of the strings that have the corresponding bit in the first argument
https://issues.apache.org/jira/browse/PIG-3237
PIG-3235 Enable DEBUG log messages in unit tests by default
https://issues.apache.org/jira/browse/PIG-3235
PIG-3233 Deploy a Piggybank Jar
https://issues.apache.org/jira/browse/PIG-3233
PIG-3215 [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
https://issues.apache.org/jira/browse/PIG-3215
PIG-3210 Pig fails to start when it cannot write log to log files
https://issues.apache.org/jira/browse/PIG-3210
PIG-3208 [zebra] TFile should not set io.compression.codec.lzo.buffersize
https://issues.apache.org/jira/browse/PIG-3208
PIG-3205 Passing arguments to python script does not work with -f option
https://issues.apache.org/jira/browse/PIG-3205
PIG-3198 Let users use any function from PigType -> PigType as if it were builtin
https://issues.apache.org/jira/browse/PIG-3198
PIG-3194 Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2
https://issues.apache.org/jira/browse/PIG-3194
PIG-3190 Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
https://issues.apache.org/jira/browse/PIG-3190
PIG-3183 rm or rmf commands should respect globbing/regex of path
https://issues.apache.org/jira/browse/PIG-3183
PIG-3172 Partition filter push down does not happen when there is a non partition key map column filter
https://issues.apache.org/jira/browse/PIG-3172
PIG-3166 Update eclipse .classpath according to ivy library.properties
https://issues.apache.org/jira/browse/PIG-3166
PIG-3164 Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.
https://issues.apache.org/jira/browse/PIG-3164
PIG-3141 Giving CSVExcelStorage an option to handle header rows
https://issues.apache.org/jira/browse/PIG-3141
PIG-3123 Simplify Logical Plans By Removing Unnecessary Identity Projections
https://issues.apache.org/jira/browse/PIG-3123
PIG-3122 Operators should not implicitly become reserved keywords
https://issues.apache.org/jira/browse/PIG-3122
PIG-3114 Duplicated macro name error when using pigunit
https://issues.apache.org/jira/browse/PIG-3114
PIG-3105 Fix TestJobSubmission unit test failure.
https://issues.apache.org/jira/browse/PIG-3105
PIG-3088 Add a builtin udf which removes prefixes
https://issues.apache.org/jira/browse/PIG-3088
PIG-3077 TestMultiQueryLocal should not write in /tmp
https://issues.apache.org/jira/browse/PIG-3077
PIG-3069 Native Windows Compatibility for Pig E2E Tests and Harness
https://issues.apache.org/jira/browse/PIG-3069
PIG-3028 testGrunt dev test needs some command filters to run correctly without cygwin
https://issues.apache.org/jira/browse/PIG-3028
PIG-3027 pigTest unit test needs a newline filter for comparisons of golden multi-line
https://issues.apache.org/jira/browse/PIG-3027
PIG-3026 Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
https://issues.apache.org/jira/browse/PIG-3026
PIG-3024 TestEmptyInputDir unit test - hadoop version detection logic is brittle
https://issues.apache.org/jira/browse/PIG-3024
PIG-3015 Rewrite of AvroStorage
https://issues.apache.org/jira/browse/PIG-3015
PIG-3010 Allow UDF's to flatten themselves
https://issues.apache.org/jira/browse/PIG-3010
PIG-2959 Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2955 Fix bunch of Pig e2e tests on Windows
https://issues.apache.org/jira/browse/PIG-2955
PIG-2643 Use bytecode generation to make a performance replacement for InvokeForLong, InvokeForString, etc
https://issues.apache.org/jira/browse/PIG-2643
PIG-2641 Create toJSON function for all complex types: tuples, bags and maps
https://issues.apache.org/jira/browse/PIG-2641
PIG-2591 Unit
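Two of the summaries in this list describe UDF behavior precisely enough to sketch. The Java below is a hypothetical illustration, not the code attached to PIG-3238 or PIG-3237: `stuff` assumes 1-based positions and returns null for out-of-range input (as SQL Server's STUFF does), and `makeSet` assumes bit 0 selects the first string and that null strings are skipped (as MySQL's MAKE_SET does).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers illustrating the semantics summarized in PIG-3238 (Stuff)
// and PIG-3237 (MakeSet); not the patches attached to those issues.
final class StringUdfSketch {

    // Stuff: delete `length` characters starting at 1-based position `start`,
    // then insert `replacement` at that position. Returns null for an
    // out-of-range start or a negative length (assumed, mirroring SQL Server STUFF).
    static String stuff(String s, int start, int length, String replacement) {
        if (s == null || start < 1 || start > s.length() || length < 0) {
            return null;
        }
        int from = start - 1;
        int to = Math.min(s.length(), from + length); // deletion stops at the end of the string
        return s.substring(0, from) + replacement + s.substring(to);
    }

    // MakeSet: join with "," every string whose bit is set in `bits`
    // (bit 0 selects the first string, bit 1 the second, and so on).
    static String makeSet(long bits, String... strings) {
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < strings.length && i < 64; i++) { // a long has only 64 bits
            if ((bits & (1L << i)) != 0 && strings[i] != null) {
                picked.add(strings[i]);
            }
        }
        return String.join(",", picked);
    }

    public static void main(String[] args) {
        System.out.println(stuff("abcdef", 2, 3, "ijklmn"));          // aijklmnef
        System.out.println(makeSet(1 | 4, "hello", "nice", "world")); // hello,world
    }
}
```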

[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3247:


Fix Version/s: 0.12
   Status: Patch Available  (was: Open)

> Piggybank functions to mimic OVER clause in SQL
> ---
>
> Key: PIG-3247
> URL: https://issues.apache.org/jira/browse/PIG-3247
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.12
>
> Attachments: Over.patch
>
>
> In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
> OVER clause.  I thought they would be useful to share.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3247:


Attachment: Over.patch

> Piggybank functions to mimic OVER clause in SQL
> ---
>
> Key: PIG-3247
> URL: https://issues.apache.org/jira/browse/PIG-3247
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: Over.patch
>
>
> In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
> OVER clause.  I thought they would be useful to share.



[jira] [Commented] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601801#comment-13601801
 ] 

Alan Gates commented on PIG-3247:
-

Basic OVER functionality can be accomplished in Pig using GROUP BY and FOREACH 
FLATTEN.  For example:

{code}
select s, min(i) over (partition by s) from T
{code}

is done in Pig as:

{code}
A = load 'T' as (s, i);
B = group A by s;
C = foreach B generate flatten(A), MIN(A.i) as min;
D = foreach C generate A::s, min;
{code}
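A minimal Java sketch of what the SQL query and the Pig script compute, assuming rows are plain (s, i) pairs rather than Pig tuples: one pass computes a MIN per group (the GROUP step), and a second pass emits every original row with its group's minimum (the FLATTEN step). The class and method names are hypothetical. Unlike the Pig script, this keeps rows in input order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "select s, min(i) over (partition by s) from T".
final class PartitionMinSketch {
    static final class Row {
        final String s;
        final int i;
        Row(String s, int i) { this.s = s; this.i = i; }
    }

    static List<String> minOverPartition(List<Row> rows) {
        // Pass 1 (like GROUP A BY s + MIN(A.i)): one minimum per partition key.
        Map<String, Integer> minByKey = new HashMap<>();
        for (Row r : rows) {
            minByKey.merge(r.s, r.i, Integer::min);
        }
        // Pass 2 (like FLATTEN(A)): every input row survives, annotated with its partition's minimum.
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(r.s + "," + minByKey.get(r.s));
        }
        return out;
    }
}
```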

But as soon as a windowing clause is added, this no longer works, because the 
function needs to be called once for each row in the bag, and each call should 
receive only a subset of the bag.  To address this, I've added two new 
functions:
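To make the per-row call concrete, here is a hypothetical Java sketch of a ROWS BETWEEN n PRECEDING AND CURRENT ROW frame over a single partition that is assumed to be already sorted. The aggregate is invoked once per row on a different sublist, which is what a single GROUP/FLATTEN pass cannot express.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a SQL window frame; not part of Pig or piggybank.
final class WindowFrameSketch {
    // Applies `aggregate` once per row of an already-ordered partition, each time
    // to the frame [row - preceding, row] (clipped at the start of the partition).
    static <T, R> List<R> overRowsPreceding(List<T> orderedPartition, int preceding,
                                            Function<List<T>, R> aggregate) {
        List<R> out = new ArrayList<>(orderedPartition.size());
        for (int k = 0; k < orderedPartition.size(); k++) {
            int from = Math.max(0, k - preceding);
            // subList(from, k + 1): up to `preceding` earlier rows plus the current row.
            out.add(aggregate.apply(orderedPartition.subList(from, k + 1)));
        }
        return out;
    }

    public static void main(String[] args) {
        // min(i) over (order by ... rows between 1 preceding and current row)
        List<Integer> mins = overRowsPreceding(Arrays.asList(5, 3, 8, 1), 1, w -> Collections.min(w));
        System.out.println(mins); // [5, 3, 3, 1]
    }
}
```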

Stitch - Given multiple bags, this stitches them together row by row.  So if you 
have two bags:

{code}
bag A:
{ (1, 2), 
  (3, 4) }
bag B:
{ (a, b),
  (c, d) }
{code}

Then Stitch(A, B) will return
{code}
{ (1, 2, a, b),
  (3, 4, c, d) }
{code}
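The same row-by-row stitching, sketched with plain Java lists standing in for Pig bags and tuples. This is an illustration under assumptions, not the piggybank Stitch UDF: how the real function treats bags of unequal size is not stated here, and this sketch simply stops at the shortest bag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of Stitch: bags are List<List<Object>>, tuples are List<Object>.
final class StitchSketch {
    @SafeVarargs
    static List<List<Object>> stitch(List<List<Object>>... bags) {
        // Assumption: output length is the length of the shortest input bag.
        int rows = bags.length == 0 ? 0 : Integer.MAX_VALUE;
        for (List<List<Object>> bag : bags) {
            rows = Math.min(rows, bag.size());
        }
        List<List<Object>> out = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            // Concatenate the r-th tuple of every bag into one wider tuple.
            List<Object> row = new ArrayList<>();
            for (List<List<Object>> bag : bags) {
                row.addAll(bag.get(r));
            }
            out.add(row);
        }
        return out;
    }
}
```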

Over - Implements the standard SQL windowing and analytic functions, including 
rank, dense_rank, cume_dist, percent_rank, ntile, first_value, last_value, 
lead, and lag.  Together, Stitch and Over can be used to express SQL windowing 
and analytic functions in Pig.

Pig already has rank and dense_rank, and this is in no way meant to replace 
them.  This is meant to mimic the SQL functionality exactly.  Also, these 
functions make no allowance for large sets that don't fit in memory on a single 
reducer.  
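For readers unfamiliar with the analytic functions, here is a hypothetical Java sketch of rank, dense_rank and lag over one partition already sorted by its ORDER BY key (not the Over UDF from Over.patch). rank and dense_rank differ only in whether ties leave a gap in the numbering, and, like the functions described above, everything here holds the whole partition in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of three SQL analytic functions over one sorted partition.
final class AnalyticSketch {
    // rank(): tied keys share a rank, and the next distinct key skips ahead (1, 2, 2, 4).
    static List<Integer> rank(List<Integer> sortedKeys) {
        List<Integer> out = new ArrayList<>();
        for (int k = 0; k < sortedKeys.size(); k++) {
            if (k > 0 && sortedKeys.get(k).equals(sortedKeys.get(k - 1))) {
                out.add(out.get(k - 1));   // tie: reuse the previous rank
            } else {
                out.add(k + 1);            // new key: 1-based row position
            }
        }
        return out;
    }

    // dense_rank(): tied keys share a rank, with no gaps (1, 2, 2, 3).
    static List<Integer> denseRank(List<Integer> sortedKeys) {
        List<Integer> out = new ArrayList<>();
        int current = 0;
        for (int k = 0; k < sortedKeys.size(); k++) {
            if (k == 0 || !sortedKeys.get(k).equals(sortedKeys.get(k - 1))) {
                current++;
            }
            out.add(current);
        }
        return out;
    }

    // lag(value, offset): the value `offset` rows earlier, or null outside the
    // partition. A negative offset looks ahead, like lead().
    static <T> List<T> lag(List<T> values, int offset) {
        List<T> out = new ArrayList<>();
        for (int k = 0; k < values.size(); k++) {
            int src = k - offset;
            out.add(src >= 0 && src < values.size() ? values.get(src) : null);
        }
        return out;
    }
}
```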

> Piggybank functions to mimic OVER clause in SQL
> ---
>
> Key: PIG-3247
> URL: https://issues.apache.org/jira/browse/PIG-3247
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
> OVER clause.  I thought they would be useful to share.



[jira] [Created] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-13 Thread Alan Gates (JIRA)
Alan Gates created PIG-3247:
---

 Summary: Piggybank functions to mimic OVER clause in SQL
 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates


In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
OVER clause.  I thought they would be useful to share.



[jira] [Created] (PIG-3246) not possible to use remote filesystems (S3) in a pig script

2013-03-13 Thread Moritz Moeller (JIRA)
Moritz Moeller created PIG-3246:
---

 Summary: not possible to use remote filesystems (S3) in a pig script
 Key: PIG-3246
 URL: https://issues.apache.org/jira/browse/PIG-3246
 Project: Pig
  Issue Type: Bug
 Environment: Apache Pig version 0.10.0-cdh4.2.0 (rexported)
Hadoop 2.0.0-cdh4.2.0
Reporter: Moritz Moeller


My Hadoop cluster is configured using hdfs://namenode/; hdfs dfs commands and 
Pig scripts work fine.
Now I want to read data from S3. hdfs dfs -ls s3n://mybucket/file.csv works 
fine.
A Pig script doing LOAD 's3n://mybucket/test.csv', however, fails; it looks as 
if Pig is performing the LOAD request using an HDFS FileSystem.
I wasn't sure whether to mark this as a bug or an improvement, as I do not know 
whether this is supposed to be possible, but since it is a basic Hadoop feature 
I would expect it to work in Pig, too.


org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.net.UnknownHostException: mybucket
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:452)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:469)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:233)
at java.lang.Thread.run(Thread.java:722)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: sdfa
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:295)
at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:247)
at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:468)
at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:452)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:205)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:269)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
... 13 more
Caused by: java.net.UnknownHostException: mybucket
... 25 more
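The trace suggests the bucket name is being resolved as a network host while delegation tokens are collected for the input path (FileSystem.getCanonicalServiceName calling SecurityUtil.buildDTServiceName). A small sketch using only java.net.URI (not Hadoop's Path or FileSystem classes) shows why that is plausible: in an s3n:// location the bucket occupies the same authority slot that holds the NameNode host in an hdfs:// URI. The helper name is hypothetical.

```java
import java.net.URI;

// Hypothetical helper: shows how a load location splits into scheme, host and path.
final class S3UriSketch {
    static String describe(String location) {
        URI uri = URI.create(location);
        return uri.getScheme() + "|" + uri.getHost() + "|" + uri.getPath();
    }

    public static void main(String[] args) {
        // The bucket lands where an HDFS URI names the NameNode host.
        System.out.println(describe("s3n://mybucket/test.csv")); // s3n|mybucket|/test.csv
        System.out.println(describe("hdfs://namenode/data/file.csv"));
    }
}
```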






[jira] [Commented] (PIG-3077) TestMultiQueryLocal should not write in /tmp

2013-03-13 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601720#comment-13601720
 ] 

Prashant Kommireddi commented on PIG-3077:
--

Hi [~dreambird], thanks for working on this. I have a comment:

{code}
String tdir = System.getProperty("user.dir") + "/build/test/tmp/";
{code}
user.dir is not required here; setting tdir to "build/test/tmp" should 
work.
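A small sketch of the point, assuming the path is consumed through java.io.File: a relative File is resolved against the user.dir system property, so both forms below name the same directory. Other consumers of the string (for example, a Hadoop FileSystem) may resolve relative paths differently, so treat this as an illustration of the local case only. The class name is hypothetical.

```java
import java.io.File;

// Hypothetical comparison of a relative test directory with an explicit user.dir prefix.
final class TestDirSketch {
    // Relative path: java.io.File resolves it against System.getProperty("user.dir").
    static String relativeTdir() {
        return new File("build/test/tmp").getAbsolutePath();
    }

    // Explicit prefix, as in the patch under review.
    static String explicitTdir() {
        return new File(System.getProperty("user.dir"), "build/test/tmp").getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println(relativeTdir().equals(explicitTdir())); // true
    }
}
```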




> TestMultiQueryLocal should not write in /tmp
> 
>
> Key: PIG-3077
> URL: https://issues.apache.org/jira/browse/PIG-3077
> Project: Pig
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Johnny Zhang
> Attachments: PIG-3077.patch.txt
>
>
> Temporary files from tests should be under build/test so that they are 
> cleaned by "ant clean".
> Currently, two test suites running on the same machine step on each other and 
> create flaky test results.



[jira] [Updated] (PIG-3077) TestMultiQueryLocal should not write in /tmp

2013-03-13 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated PIG-3077:
--

Attachment: PIG-3077.patch.txt

This patch respects 'pig.temp.dir' first, in case we want to put temp files in 
another location. Otherwise, it sets the location to 
System.getProperty("user.dir") + "/build/test/tmp/". 

It uses FileLocalizer to get the temp dir from PigContext, and then replaces all 
/tmp references in the test code.

I verified it, and TestMultiQueryLocal passes for me.
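A hypothetical sketch of the lookup order described in this patch comment. java.util.Properties stands in for the PigContext properties, and the FileLocalizer call the patch actually uses is not reproduced; only the 'pig.temp.dir' property name and the build/test/tmp fallback come from the comment.

```java
import java.io.File;
import java.util.Properties;

// Hypothetical model of "respect pig.temp.dir first, else use build/test/tmp".
final class TempDirSketch {
    static File testTempDir(Properties props) {
        String configured = props.getProperty("pig.temp.dir");
        if (configured != null && !configured.isEmpty()) {
            return new File(configured);                          // explicit override wins
        }
        return new File(System.getProperty("user.dir"), "build/test/tmp"); // default under the build tree
    }
}
```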

> TestMultiQueryLocal should not write in /tmp
> 
>
> Key: PIG-3077
> URL: https://issues.apache.org/jira/browse/PIG-3077
> Project: Pig
>  Issue Type: Test
>Reporter: Julien Le Dem
> Attachments: PIG-3077.patch.txt
>
>
> Temporary files from tests should be under build/test so that they are 
> cleaned by "ant clean".
> Currently, two test suites running on the same machine step on each other and 
> create flaky test results.



[jira] [Updated] (PIG-3077) TestMultiQueryLocal should not write in /tmp

2013-03-13 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated PIG-3077:
--

Status: Patch Available  (was: Open)



[jira] [Assigned] (PIG-3077) TestMultiQueryLocal should not write in /tmp

2013-03-13 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang reassigned PIG-3077:
-

Assignee: Johnny Zhang



[jira] [Updated] (PIG-3239) Unable to return multiple values from a macro using SPLIT

2013-03-13 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3239:
---

Status: Open  (was: Patch Available)

[~dreambird], thank you for the fix. I think your fix is correct. Can you 
please add a unit test case for this?

TestMacroExpansion.java has splitTest, but that doesn't cover OTHERWISE. You 
might want to expand that test case, or add a new test case.

Thanks!

> Unable to return multiple values from a macro using SPLIT
> -
>
> Key: PIG-3239
> URL: https://issues.apache.org/jira/browse/PIG-3239
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.0
> Environment: Apache Pig version 0.10.0-cdh4.2.0 (rexported) 
> compiled Feb 15 2013, 12:19:17
> Linux 3.2.0-38-generic #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013 x86_64 
> x86_64 x86_64 GNU/Linux
>Reporter: Luis Belloch
>Assignee: Johnny Zhang
>Priority: Minor
> Attachments: PIG-3239.patch.txt
>
>
> Hi, I'm unable to return multiple values from a macro when values come from a 
> SPLIT. Here is a small example:
> {code}
> DEFINE my_macro(seq) RETURNS valid, invalid {
> added = FOREACH $seq GENERATE $0 * 2, $1;
> SPLIT added INTO $valid IF $1 == true, $invalid OTHERWISE;
> }
> data = LOAD 'case.csv' USING PigStorage(',') AS (value: int, valid: boolean);
> P, Q = my_macro(data);
> DUMP P;
> DUMP Q;
> {code}
> Pig is unable to recognize the {{OTHERWISE}} side. Error is: {{ERROR 
> org.apache.pig.tools.grunt.Grunt - ERROR 1200:  Invalid 
> macro definition: . Reason: Macro 'my_macro' missing return alias: invalid}}
> A simple workaround is to force {{$invalid}} to be returned as a {{FOREACH}} 
> result:
> {code}
> SPLIT added INTO $valid IF $1 == true, tmp_invalid OTHERWISE;
> $invalid = FOREACH tmp_invalid GENERATE *;
> {code}
> Samples and logs attached to the issue.
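A plain-Java sketch of what the macro body is meant to compute, with hypothetical names: each row has its first field doubled, then goes to the IF branch when its boolean field is true and to the OTHERWISE branch when no condition matched. It models only the data flow and says nothing about the macro-expansion bug itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of: added = FOREACH $seq GENERATE $0 * 2, $1;
//                        SPLIT added INTO $valid IF $1 == true, $invalid OTHERWISE;
final class SplitOtherwiseSketch {
    static final class Row {
        final int value;
        final boolean valid;
        Row(int value, boolean valid) { this.value = value; this.valid = valid; }
    }

    // Returns two lists: index 0 is the IF branch ($valid), index 1 the OTHERWISE branch ($invalid).
    static List<List<Row>> myMacro(List<Row> seq) {
        List<Row> valid = new ArrayList<>();
        List<Row> invalid = new ArrayList<>();
        for (Row r : seq) {
            Row added = new Row(r.value * 2, r.valid); // FOREACH ... GENERATE $0 * 2, $1
            if (added.valid) {
                valid.add(added);                      // IF $1 == true
            } else {
                invalid.add(added);                    // OTHERWISE: no IF condition matched
            }
        }
        List<List<Row>> out = new ArrayList<>();
        out.add(valid);
        out.add(invalid);
        return out;
    }
}
```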



[jira] [Updated] (PIG-3239) Unable to return multiple values from a macro using SPLIT

2013-03-13 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated PIG-3239:
--

Status: Patch Available  (was: Open)



[jira] [Commented] (PIG-3239) Unable to return multiple values from a macro using SPLIT

2013-03-13 Thread Luis Belloch (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600949#comment-13600949
 ] 

Luis Belloch commented on PIG-3239:
---

Thanks! We'll test it internally.


