[jira] [Updated] (PIG-4838) Fix test TestBuiltin

2016-03-19 Thread Pallavi Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Rao updated PIG-4838:
-
Status: Patch Available  (was: Open)

[~xuefuz], please commit.

> Fix test TestBuiltin
> 
>
> Key: PIG-4838
> URL: https://issues.apache.org/jira/browse/PIG-4838
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4838.patch
>
>
> In https://builds.apache.org/job/Pig-spark/316/, the following unit tests fail:
> org.apache.pig.test.TestBuiltin.testRANDOMWithJob
> org.apache.pig.test.TestBuiltin.testUniqueID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4838) Fix test TestBuiltin

2016-03-19 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-4838:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, Liyun!






Jenkins build is still unstable: Pig-trunk-commit #2303

2016-03-19 Thread Apache Jenkins Server
See 



[jira] [Commented] (PIG-4837) TestNativeMapReduce test fix

2016-03-19 Thread Xianda Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200866#comment-15200866
 ] 

Xianda Ke commented on PIG-4837:


Hi [~kellyzly], 
how about moving the static function executeCommand() to a utility class, such 
as org.apache.pig.impl.util.Utils or SparkUtil?
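A minimal sketch of what such a shared helper could look like, assuming a plain ProcessBuilder-based implementation. The class name CommandUtil and its signature are hypothetical; they are not taken from PIG-4837.patch, and the real executeCommand() may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical shared helper: runs an external command and returns its
 * standard output (with stderr merged in). Not the PIG-4837 code.
 */
final class CommandUtil {
    private CommandUtil() {}

    static String executeCommand(String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout so a single reader drains everything
        // and the child cannot block on a full stderr pipe.
        pb.redirectErrorStream(true);
        Process process = pb.start();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }

        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("Command '" + String.join(" ", command)
                    + "' exited with status " + exit);
        }
        return out.toString(StandardCharsets.UTF_8.name()).trim();
    }
}
```

Callers would then use CommandUtil.executeCommand(...) instead of keeping a private copy in each test class.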

> TestNativeMapReduce test fix
> 
>
> Key: PIG-4837
> URL: https://issues.apache.org/jira/browse/PIG-4837
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4837.patch, build23.PNG
>
>






[jira] [Commented] (PIG-4837) TestNativeMapReduce test fix

2016-03-19 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200926#comment-15200926
 ] 

liyunzhang_intel commented on PIG-4837:
---

[~xuefuz]: This issue cannot be reproduced on my own Jenkins now. Please 
first check in PIG-4837.patch so that we can output the ARG_MAX value in the log.
If the ARG_MAX of the Jenkins server is very small, that explains the failure. If 
ARG_MAX is large (e.g. 262144), this may be a random issue.
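A minimal sketch of logging the limit from a test, assuming a POSIX system with the getconf utility on the PATH. ArgMaxProbe and its methods are hypothetical names, not the code in PIG-4837.patch; parsing is kept separate so it can be checked without running a process.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for reading the system's ARG_MAX via getconf. */
final class ArgMaxProbe {
    private ArgMaxProbe() {}

    /** Parses the single number printed by "getconf ARG_MAX". */
    static long parseArgMax(String getconfOutput) {
        return Long.parseLong(getconfOutput.trim());
    }

    /** Runs "getconf ARG_MAX" and returns the limit in bytes. */
    static long readArgMax() throws IOException, InterruptedException {
        Process p = new ProcessBuilder("getconf", "ARG_MAX").start();
        String line;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            line = r.readLine();
        }
        if (p.waitFor() != 0 || line == null) {
            throw new IOException("getconf ARG_MAX failed");
        }
        return parseArgMax(line);
    }
}
```

A test could then log "ARG_MAX = " + ArgMaxProbe.readArgMax() before launching the native MapReduce job, so a small limit on the build machine shows up directly in the Jenkins output.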







[jira] [Updated] (PIG-4837) TestNativeMapReduce test fix

2016-03-19 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4837:
--
Attachment: build23.PNG
PIG-4837.patch







[jira] [Commented] (PIG-4837) TestNativeMapReduce test fix

2016-03-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200990#comment-15200990
 ] 

Xuefu Zhang commented on PIG-4837:
--

Committed. Thanks, Liyun! I will keep this JIRA open for now.







[jira] [Commented] (PIG-4837) TestNativeMapReduce test fix

2016-03-19 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200804#comment-15200804
 ] 

liyunzhang_intel commented on PIG-4837:
---

[~pallavi.rao]: Yes, I also hit the "Argument list too long" problem on my Jenkins 
server. But when I used the attached patch to output the value of ARG_MAX on my 
Jenkins server, it showed that ARG_MAX is 2621440, and the error disappeared. All 
the TestNativeMapReduce unit tests pass on my Jenkins (see build23.PNG).

> TestNativeMapReduce test fix
> 
>
> Key: PIG-4837
> URL: https://issues.apache.org/jira/browse/PIG-4837
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Fix For: spark-branch
>
>






[jira] [Commented] (PIG-4837) TestNativeMapReduce test fix

2016-03-19 Thread Pallavi Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197234#comment-15197234
 ] 

Pallavi Rao commented on PIG-4837:
--

These tests pass on my machine. On the build machine, the error is:
{noformat}
Stack trace: java.io.IOException: Cannot run program "bash" (in directory "/home/jenkins/jenkins-slave/workspace/Pig-spark/target/PigMiniCluster/PigMiniCluster-localDir-nm-0_3/usercache/jenkins/appcache/application_1457627184976_0002/container_1457627184976_0002_01_04"): error=7, Argument list too long
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)

Caused by: java.io.IOException: error=7, Argument list too long
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
	... 10 more
{noformat}

I believe this is due to the ARG_MAX limit (see 
http://www.in-ulm.de/~mascheck/various/argmax/).

On my machine:
{noformat}
$ getconf ARG_MAX
262144
{noformat}
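error=7 is E2BIG, which execve() returns when the arguments plus the environment handed to the new process are too large. A rough sketch for estimating that size for a ProcessBuilder so it can be compared with the getconf value. The accounting is an approximation and an assumption: it counts each string's bytes plus its terminating NUL, and conservatively adds one 8-byte pointer per string (64-bit system assumed). The class name ArgListSize is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.List;
import java.util.Map;

/** Hypothetical, approximate size of an exec() argument + environment list. */
final class ArgListSize {
    // Assumes a 64-bit system; counted once per argv/envp entry as a margin.
    private static final int POINTER_BYTES = 8;

    private ArgListSize() {}

    static long estimate(List<String> argv, Map<String, String> env, Charset cs) {
        long total = 0;
        for (String arg : argv) {
            // String bytes + terminating NUL + pointer slot.
            total += arg.getBytes(cs).length + 1 + POINTER_BYTES;
        }
        for (Map.Entry<String, String> e : env.entrySet()) {
            // Each environment entry is passed to the child as "NAME=value".
            String entry = e.getKey() + "=" + e.getValue();
            total += entry.getBytes(cs).length + 1 + POINTER_BYTES;
        }
        return total;
    }
}
```

For a given ProcessBuilder pb, ArgListSize.estimate(pb.command(), pb.environment(), Charset.defaultCharset()) gives a number to log next to ARG_MAX; a large environment (for example a very long CLASSPATH) counts toward the limit just like the command line does.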


> TestNativeMapReduce test fix
> 
>
> Key: PIG-4837
> URL: https://issues.apache.org/jira/browse/PIG-4837
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: liyunzhang_intel
>Assignee: Xianda Ke
> Fix For: spark-branch
>
>






[jira] [Created] (PIG-4842) Collected group doesn't work in some cases

2016-03-19 Thread Xianda Ke (JIRA)
Xianda Ke created PIG-4842:
--

 Summary: Collected group doesn't work in some cases
 Key: PIG-4842
 URL: https://issues.apache.org/jira/browse/PIG-4842
 Project: Pig
  Issue Type: Sub-task
Reporter: Xianda Ke


Scenario:
1. input data:
cat collectedgroup1
1
1
2

2. pig script:
A = LOAD 'collectedgroup1' USING myudfs.DummyCollectableLoader() AS (id);
B = GROUP A by $0 USING 'collected';
C = GROUP B by $0 USING 'collected';
DUMP C;

The expected output:
(1,{(1,{(1),(1)})})
(2,{(2,{(2)})})

The actual output:
(1,{(1,{(1),(1)})})
(1,)
(2,{(2,{(2)})})








[jira] [Commented] (PIG-4796) Authenticate with Kerberos using a keytab file

2016-03-19 Thread Niels Basjes (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199372#comment-15199372
 ] 

Niels Basjes commented on PIG-4796:
---

[~rohini] & [~daijy] Do I need to do something to get this included?

> Authenticate with Kerberos using a keytab file
> --
>
> Key: PIG-4796
> URL: https://issues.apache.org/jira/browse/PIG-4796
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.15.0
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>  Labels: feature, kerberos, security
> Attachments: 2016-02-18-1510-PIG-4796.patch, 
> 2016-02-18-PIG-4796-rough-proof-of-concept.patch, PIG-4796-2016-02-23.patch, 
> PIG-4796-4.patch
>
>
> When running in a Kerberos-secured environment, users are faced with the 
> limitation that their jobs cannot run longer than the (remaining) lifetime 
> of their Kerberos tickets. In the environment I work in, these tickets 
> expire after 10 hours, thus limiting the maximum job duration to at most 10 
> hours, which is a problem.
> The Hadoop tooling has a feature that lets you authenticate using a 
> Kerberos keytab file (essentially a file that contains the Kerberos 
> principal and the encrypted form of its password). Using this, the running 
> application can request new tickets from the Kerberos server when the 
> initial tickets expire.
> In my Java/Hadoop applications I commonly include these two calls:
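> {code}
> import org.apache.hadoop.security.UserGroupInformation;
> // Point the JVM at the Kerberos configuration, then log in from the keytab.
> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
> UserGroupInformation.loginUserFromKeytab("nbas...@xx.net",
>     "/home/nbasjes/.krb/nbasjes.keytab");
> {code}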
> This way, I have run an Apache Flink-based application for more than 170 hours 
> (about a week) on the Kerberos-secured YARN cluster.
> What I propose is a feature that lets me set the relevant Kerberos 
> values in my Pig script, so that a Pig job can run for many days 
> on the secured cluster.
> A proposal for how this could look in a Pig script:
> {code}
> SET java.security.krb5.conf '/etc/krb5.conf';
> SET job.security.krb5.principal 'nbas...@xx.net';
> SET job.security.krb5.keytab '/home/nbasjes/.krb/nbasjes.keytab';
> {code}
> So if all of these are set (or at least the last two), the 
> aforementioned UserGroupInformation.loginUserFromKeytab method is called 
> before the job is submitted to the cluster.
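A minimal sketch of the proposed decision logic, under stated assumptions: the property names are taken from the proposal above (the committed patch may use different ones), and KeytabLogin is a hypothetical seam so the logic runs without Hadoop on the classpath. In real code one would pass UserGroupInformation::loginUserFromKeytab, whose (String user, String path) signature matches the interface.

```java
import java.io.IOException;
import java.util.Properties;

/** Illustrative sketch of "log in from keytab if configured". */
final class KeytabLoginSketch {
    /** Hypothetical seam; UserGroupInformation::loginUserFromKeytab fits it. */
    interface KeytabLogin {
        void login(String principal, String keytabPath) throws IOException;
    }

    // Names as written in the proposal; assumptions, not confirmed Pig keys.
    static final String KRB5_CONF = "java.security.krb5.conf";
    static final String PRINCIPAL = "job.security.krb5.principal";
    static final String KEYTAB = "job.security.krb5.keytab";

    private KeytabLoginSketch() {}

    /** Returns true if a keytab login was performed. */
    static boolean maybeLogin(Properties props, KeytabLogin login) throws IOException {
        String principal = props.getProperty(PRINCIPAL);
        String keytab = props.getProperty(KEYTAB);
        if (principal == null || keytab == null) {
            return false; // Without both values, do nothing and keep the current login.
        }
        String krb5Conf = props.getProperty(KRB5_CONF);
        if (krb5Conf != null) {
            // Set before logging in so the Kerberos code sees this config file.
            System.setProperty(KRB5_CONF, krb5Conf);
        }
        login.login(principal, keytab);
        return true;
    }
}
```

Keeping the Hadoop call behind an interface is only a testing convenience in this sketch; the essential point is that the login happens before the job is submitted.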





[jira] [Updated] (PIG-4796) Authenticate with Kerberos using a keytab file

2016-03-19 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4796:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 0.16.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Niels!

> Authenticate with Kerberos using a keytab file
> --
>
> Key: PIG-4796
> URL: https://issues.apache.org/jira/browse/PIG-4796
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.15.0
>Reporter: Niels Basjes
>Assignee: Niels Basjes
>  Labels: feature, kerberos, security
> Fix For: 0.16.0
>
> Attachments: 2016-02-18-1510-PIG-4796.patch, 
> 2016-02-18-PIG-4796-rough-proof-of-concept.patch, PIG-4796-2016-02-23.patch, 
> PIG-4796-4.patch
>
>





[jira] [Assigned] (PIG-4837) TestNativeMapReduce test fix

2016-03-19 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel reassigned PIG-4837:
-

Assignee: liyunzhang_intel  (was: Xianda Ke)







[jira] Subscription: PIG patch available

2016-03-19 Thread jira
Issue Subscription
Filter: PIG patch available (31 issues)

Subscriber: pigdaily

Key Summary
PIG-4841 Inline-op with schema declaration fails with syntax error
https://issues.apache.org/jira/browse/PIG-4841
PIG-4796 Authenticate with Kerberos using a keytab file
https://issues.apache.org/jira/browse/PIG-4796
PIG-4788 the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
https://issues.apache.org/jira/browse/PIG-4788
PIG-4745 DataBag should protect content of passed list of tuples
https://issues.apache.org/jira/browse/PIG-4745
PIG-4734 TOMAP schema inferring breaks some scripts in type checking for bincond
https://issues.apache.org/jira/browse/PIG-4734
PIG-4684 Exception should be changed to warning when job diagnostics cannot be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656 Improve String serialization and comparator performance in BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4641 Print the instance of Object without using toString()
https://issues.apache.org/jira/browse/PIG-4641
PIG-4598 Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4581 thread safe issue in NodeIdGenerator
https://issues.apache.org/jira/browse/PIG-4581
PIG-4551 Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539 New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4526 Make setting up the build environment easier
https://issues.apache.org/jira/browse/PIG-4526
PIG-4515 org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4455 Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
https://issues.apache.org/jira/browse/PIG-4455
PIG-4341 Add CMX support to pig.tmpfilecompression.codec
https://issues.apache.org/jira/browse/PIG-4341
PIG-4323 PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313 StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251 Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4111 Make Pig compiles with avro-1.7.7
https://issues.apache.org/jira/browse/PIG-4111
PIG-4002 Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952 PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911 Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3906 ant site errors out
https://issues.apache.org/jira/browse/PIG-3906
PIG-3877 Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873 Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3866 Create ThreadLocal classloader per PigContext
https://issues.apache.org/jira/browse/PIG-3866
PIG-3864 ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
https://issues.apache.org/jira/browse/PIG-3864
PIG-3851 Upgrade jline to 2.11
https://issues.apache.org/jira/browse/PIG-3851
PIG-3668 COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3587 add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384


[jira] Subscription: PIG patch available

2016-03-19 Thread jira
Issue Subscription
Filter: PIG patch available (30 issues)

Subscriber: pigdaily

Key Summary
PIG-4841 Inline-op with schema declaration fails with syntax error
https://issues.apache.org/jira/browse/PIG-4841
PIG-4788 the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
https://issues.apache.org/jira/browse/PIG-4788
PIG-4745 DataBag should protect content of passed list of tuples
https://issues.apache.org/jira/browse/PIG-4745
PIG-4734 TOMAP schema inferring breaks some scripts in type checking for bincond
https://issues.apache.org/jira/browse/PIG-4734
PIG-4684 Exception should be changed to warning when job diagnostics cannot be fetched
https://issues.apache.org/jira/browse/PIG-4684
PIG-4656 Improve String serialization and comparator performance in BinInterSedes
https://issues.apache.org/jira/browse/PIG-4656
PIG-4641 Print the instance of Object without using toString()
https://issues.apache.org/jira/browse/PIG-4641
PIG-4598 Allow user defined plan optimizer rules
https://issues.apache.org/jira/browse/PIG-4598
PIG-4581 thread safe issue in NodeIdGenerator
https://issues.apache.org/jira/browse/PIG-4581
PIG-4551 Partition filter is not pushed down in case of SPLIT
https://issues.apache.org/jira/browse/PIG-4551
PIG-4539 New PigUnit
https://issues.apache.org/jira/browse/PIG-4539
PIG-4526 Make setting up the build environment easier
https://issues.apache.org/jira/browse/PIG-4526
PIG-4515 org.apache.pig.builtin.Distinct throws ClassCastException
https://issues.apache.org/jira/browse/PIG-4515
PIG-4455 Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
https://issues.apache.org/jira/browse/PIG-4455
PIG-4341 Add CMX support to pig.tmpfilecompression.codec
https://issues.apache.org/jira/browse/PIG-4341
PIG-4323 PackageConverter hanging in Spark
https://issues.apache.org/jira/browse/PIG-4323
PIG-4313 StackOverflowError in LIMIT operation on Spark
https://issues.apache.org/jira/browse/PIG-4313
PIG-4251 Pig on Storm
https://issues.apache.org/jira/browse/PIG-4251
PIG-4111 Make Pig compiles with avro-1.7.7
https://issues.apache.org/jira/browse/PIG-4111
PIG-4002 Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952 PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911 Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3906 ant site errors out
https://issues.apache.org/jira/browse/PIG-3906
PIG-3877 Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873 Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3866 Create ThreadLocal classloader per PigContext
https://issues.apache.org/jira/browse/PIG-3866
PIG-3864 ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
https://issues.apache.org/jira/browse/PIG-3864
PIG-3851 Upgrade jline to 2.11
https://issues.apache.org/jira/browse/PIG-3851
PIG-3668 COR built-in function when atleast one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3587 add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587



[jira] [Assigned] (PIG-4842) Collected group doesn't work in some cases

2016-03-19 Thread Xianda Ke (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianda Ke reassigned PIG-4842:
--

Assignee: Xianda Ke

> Collected group doesn't work in some cases
> --
>
> Key: PIG-4842
> URL: https://issues.apache.org/jira/browse/PIG-4842
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
>
> Scenario:
> 1. input data:
> cat collectedgroup1
> 1
> 1
> 2
> 2. pig script:
> A = LOAD 'collectedgroup1' USING myudfs.DummyCollectableLoader() AS (id);
> B = GROUP A by $0 USING 'collected';
> C = GROUP B by $0 USING 'collected';
> DUMP C;
> The expected output:
> (1,{(1,{(1),(1)})})
> (2,{(2,{(2)})})
> The actual output:
> (1,{(1,{(1),(1)})})
> (1,)
> (2,{(2,{(2)})})




