[jira] Subscription: PIG patch available

2014-10-14 Thread jira
Issue Subscription
Filter: PIG patch available (18 issues)

Subscriber: pigdaily

Key Summary
PIG-4184 UDF backward compatibility issue after POStatus.STATUS_NULL refactoring
https://issues.apache.org/jira/browse/PIG-4184
PIG-4160 -forcelocaljars / -j flag when using a remote url for a script
https://issues.apache.org/jira/browse/PIG-4160
PIG-4111 Make Pig compile with avro-1.7.7
https://issues.apache.org/jira/browse/PIG-4111
PIG-4103 Fix TestRegisteredJarVisibility (after PIG-4083)
https://issues.apache.org/jira/browse/PIG-4103
PIG-4084 Port TestPigRunner to Tez
https://issues.apache.org/jira/browse/PIG-4084
PIG-4066 An optimization for ROLLUP operation in Pig
https://issues.apache.org/jira/browse/PIG-4066
PIG-4004 Upgrade the Pigmix queries from the (old) mapred API to mapreduce
https://issues.apache.org/jira/browse/PIG-4004
PIG-4002 Disable combiner when map-side aggregation is used
https://issues.apache.org/jira/browse/PIG-4002
PIG-3952 PigStorage accepts '-tagSplit' to return full split information
https://issues.apache.org/jira/browse/PIG-3952
PIG-3911 Define unique fields with @OutputSchema
https://issues.apache.org/jira/browse/PIG-3911
PIG-3877 Getting Geo Latitude/Longitude from Address Lines
https://issues.apache.org/jira/browse/PIG-3877
PIG-3873 Geo distance calculation using Haversine
https://issues.apache.org/jira/browse/PIG-3873
PIG-3866 Create ThreadLocal classloader per PigContext
https://issues.apache.org/jira/browse/PIG-3866
PIG-3861 Duplicate jars get added to distributed cache
https://issues.apache.org/jira/browse/PIG-3861
PIG-3668 COR built-in function when at least one of the coefficient values is NaN
https://issues.apache.org/jira/browse/PIG-3668
PIG-3635 Fix e2e tests for Hadoop 2.X on Windows
https://issues.apache.org/jira/browse/PIG-3635
PIG-3587 Add functionality for rolling over dates
https://issues.apache.org/jira/browse/PIG-3587
PIG-3441 Allow Pig to use default resources from Configuration objects
https://issues.apache.org/jira/browse/PIG-3441

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384


[jira] [Updated] (PIG-4151) Pig Cannot Write Empty Maps to HBase

2014-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4151:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and 0.14 branch.

> Pig Cannot Write Empty Maps to HBase
> 
>
> Key: PIG-4151
> URL: https://issues.apache.org/jira/browse/PIG-4151
> Project: Pig
>  Issue Type: Bug
>  Components: internal-udfs
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4151-1.patch
>
>
> Pig is unable to write empty maps to HBase. Instructions to reproduce:
> Input file pig_data_bad.txt:
> {code}
> row1;Homer;Morrison;[1#Silvia,2#Stacy]
> row2;Sheila;Fletcher;[1#Becky,2#Salvador,3#Lois]
> row4;Andre;Morton;[1#Nancy]
> row3;Sonja;Webb;[]
> {code}
> Create table in hbase:
> create 'test', 'info', 'friends'
> Pig script:
> {code}
> source = LOAD '/pig_data_bad.txt' USING PigStorage(';') AS (row:chararray, 
> first_name:chararray, last_name:chararray, friends:map[]);
> STORE source INTO 'hbase://test' USING 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:fname info:lname 
> friends:*');
> {code}
> Stack:
> java.lang.NullPointerException
> at 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:880)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
> at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
> at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
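The stack above points at HBaseStorage.putNext failing on the empty map in row3. As a rough illustration only (Python, with hypothetical helper names, not the actual Java patch), writing a Pig map to a wildcard column family needs a guard for a null or empty map:

```python
def put_map_columns(cells, family, pig_map):
    """Append one (family, qualifier, value) cell per map entry.

    pig_map may be None or {} for rows like row3;Sonja;Webb;[] --
    iterating it without a guard is the kind of assumption that
    produces a NullPointerException in a putNext-style method.
    """
    if not pig_map:          # empty map: write no cells, but don't crash
        return cells
    for qualifier, value in pig_map.items():
        cells.append((family, qualifier.encode(), value.encode()))
    return cells

print(put_map_columns([], "friends", {"1": "Nancy"}))  # [('friends', b'1', b'Nancy')]
print(put_map_columns([], "friends", None))            # []
```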



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4151) Pig Cannot Write Empty Maps to HBase

2014-10-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171981#comment-14171981
 ] 

Daniel Dai commented on PIG-4151:
-

Thanks Thejas for review!

> Pig Cannot Write Empty Maps to HBase
> 
>
> Key: PIG-4151
> URL: https://issues.apache.org/jira/browse/PIG-4151
> Project: Pig
>  Issue Type: Bug
>  Components: internal-udfs
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4151-1.patch
>





[jira] [Comment Edited] (PIG-4233) Package pig along with dependencies into a fat jar while job submission to Spark cluster

2014-10-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171977#comment-14171977
 ] 

Rohini Palaniswamy edited comment on PIG-4233 at 10/15/14 4:51 AM:
---

[~praveenr019],
   The pig-withouthadoop.jar should not contain Spark-specific jars; it should 
contain only Pig dependencies. The actual fat jar containing Hadoop dependencies 
was removed entirely in 0.14. If you need a fat jar with Spark dependencies, 
create a separate build target.


was (Author: rohini):
[~praveenr019],
   The  pig-withouthadoop.jar should not contain spark specific jars and should 
only be pig dependencies. The actual fat jar containing hadoop dependencies has 
been totally removed in 0.14. 

> Package pig along with dependencies into a fat jar while job submission to 
> Spark cluster
> 
>
> Key: PIG-4233
> URL: https://issues.apache.org/jira/browse/PIG-4233
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
> Attachments: PIG-4233.patch
>
>
> Currently we have a fat jar created in legacy directory which contains pig 
> along with dependencies. 
> Would need to modify build.xml to add spark dependency jars to include in 
> legacy fat jar.
> Running job on Spark cluster:
> 1. export SPARK_HOME=/path/to/spark
> 2. export 
> SPARK_PIG_JAR=$PIG_HOME/legacy/pig-0.14.0-SNAPSHOT-withouthadoop-h1.jar
> 3. export SPARK_MASTER=spark://localhost:7077
> 4 export HADOOP_HOME=/path/to/hadoop
> 5. Launch the job using ./bin/pig -x spark





[jira] [Commented] (PIG-4233) Package pig along with dependencies into a fat jar while job submission to Spark cluster

2014-10-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171977#comment-14171977
 ] 

Rohini Palaniswamy commented on PIG-4233:
-

[~praveenr019],
   The pig-withouthadoop.jar should not contain Spark-specific jars; it should 
contain only Pig dependencies. The actual fat jar containing Hadoop dependencies 
was removed entirely in 0.14. 

> Package pig along with dependencies into a fat jar while job submission to 
> Spark cluster
> 
>
> Key: PIG-4233
> URL: https://issues.apache.org/jira/browse/PIG-4233
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
> Attachments: PIG-4233.patch
>
>





[jira] [Commented] (PIG-4235) Fix unit test failures on Windows

2014-10-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171976#comment-14171976
 ] 

Rohini Palaniswamy commented on PIG-4235:
-

A few comments:
  -  Can remove the line 
  -  You can use either line or value with jvmarg 
(http://ant.apache.org/manual/using.html#arg), so the new jvmarg value 
overrides the existing jvmarg line. For which unit test failure do we need to 
add ${hadoop.root}\bin to the library path?
  - Do we need both Util.generateURI and Util.encodeEscape? Can we inline or 
call Util.encodeEscape in Util.generateURI?
  - TestUnion.java - Can we also inline or call Util.encodeEscape in 
Util.createInputFile? This would help avoid having to fix new tests for 
Windows and would also let us remove Util.encodeEscape from existing tests.

> Fix unit test failures on Windows
> -
>
> Key: PIG-4235
> URL: https://issues.apache.org/jira/browse/PIG-4235
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4235-1.patch
>
>
> A bunch of unit tests fail on trunk and the 0.14 branch; we need to fix them.





[jira] [Commented] (PIG-4151) Pig Cannot Write Empty Maps to HBase

2014-10-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171925#comment-14171925
 ] 

Thejas M Nair commented on PIG-4151:


+1

> Pig Cannot Write Empty Maps to HBase
> 
>
> Key: PIG-4151
> URL: https://issues.apache.org/jira/browse/PIG-4151
> Project: Pig
>  Issue Type: Bug
>  Components: internal-udfs
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4151-1.patch
>





[jira] [Commented] (PIG-4004) Upgrade the Pigmix queries from the (old) mapred API to mapreduce

2014-10-14 Thread Keren Ouaknine (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171880#comment-14171880
 ] 

Keren Ouaknine commented on PIG-4004:
-

You are right, a join can be done in a single MR job even in the old API by 
using the reporter object.
Another reason I used the new API is that the code is shorter and easier to 
understand.

The MR queries in the current Pig trunk are (i) failing at scale (due to 
memory overuse), (ii) using three MR jobs to express one join, and (iii) 
using an old API. 
The patch solves all of the above :)

Thanks,
Keren


> Upgrade the Pigmix queries from the (old) mapred API to mapreduce
> -
>
> Key: PIG-4004
> URL: https://issues.apache.org/jira/browse/PIG-4004
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.12.1
>Reporter: Keren Ouaknine
> Fix For: 0.15.0
>
> Attachments: PIG-4004.patch
>
>
> Until now, the Pigmix queries were written using the old mapred API. 
> As a result, some queries were expressed with three concatenated MR jobs 
> instead of one. I rewrote all the queries to match the newer mapreduce API 
> and optimized them along the way. 
> This is a continuation of the work in PIG-3915.





[jira] [Commented] (PIG-4227) Streaming Python UDF handles bag outputs incorrectly

2014-10-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171649#comment-14171649
 ] 

Daniel Dai commented on PIG-4227:
-

e2e tests StreamingPythonUDFs_5 and StreamingPythonUDFs_12 also fail due to 
this.

> Streaming Python UDF handles bag outputs incorrectly
> 
>
> Key: PIG-4227
> URL: https://issues.apache.org/jira/browse/PIG-4227
> Project: Pig
>  Issue Type: Bug
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.14.0
>
> Attachments: PIG-4227-1.patch
>
>
> I have a UDF that generates different outputs when run as jython and as 
> streaming python.
> {code:title=jython}
> {([[BBC Worldwide]])}
> {code} 
> {code:title=streaming python}
> {(BC Worldwid)}
> {code}
> The problem is that streaming python encodes a bag output incorrectly. For 
> this particular example, it serializes the output string as follows-
> {code}
> |{_[[BBC Worldwide]]|}_
> {code}
> where '|' and '\_' wrap bag delimiters '\{' and '\}'. i.e. '\{' => '|\{\_' 
> and '\}' => '|\}\_'.
> But this is wrong because a bag must contain tuples, not chararrays; the 
> correct encoding is as follows-
> {code}
> |{_|(_[[BBC Worldwide]]|)_|}_
> {code}
> where '|' and '_' wrap tuple delimiters '(' and ')' as well as bag delimiters.
> This results in truncated outputs.
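The wrapping scheme described above can be sketched as follows (illustrative Python; the function names are mine, not Pig's actual serializer):

```python
# Streaming delimiters are wrapped as '|<delim>_', per the description above:
# '{' -> '|{_',  '}' -> '|}_',  '(' -> '|(_',  ')' -> '|)_'
def encode_tuple(fields):
    return '|(_' + ','.join(fields) + '|)_'

def encode_bag(tuples):
    # Correct: every bag element is itself an encoded tuple.
    return '|{_' + ','.join(encode_tuple(t) for t in tuples) + '|}_'

def encode_bag_buggy(items):
    # Buggy form from the report: chararrays dropped straight into the bag.
    return '|{_' + ','.join(items) + '|}_'

print(encode_bag([['[[BBC Worldwide]]']]))      # |{_|(_[[BBC Worldwide]]|)_|}_
print(encode_bag_buggy(['[[BBC Worldwide]]']))  # |{_[[BBC Worldwide]]|}_
```

In the buggy form the reader presumably strips what it takes to be delimiter characters from the raw chararray, which would explain the truncation of "[[BBC Worldwide]]" to "BC Worldwid".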





[jira] [Reopened] (PIG-4227) Streaming Python UDF handles bag outputs incorrectly

2014-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reopened PIG-4227:
-

This seems to break TestStreamingUDF. I am looking into it.

> Streaming Python UDF handles bag outputs incorrectly
> 
>
> Key: PIG-4227
> URL: https://issues.apache.org/jira/browse/PIG-4227
> Project: Pig
>  Issue Type: Bug
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.14.0
>
> Attachments: PIG-4227-1.patch
>
>





[jira] [Updated] (PIG-4210) Drop support for JDK 6 from Pig 0.14

2014-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4210:

  Component/s: build
Fix Version/s: (was: 0.14.0)
   0.15.0

Though we are dropping support for JDK 6 in Pig 0.14 and no longer test JDK 6 
regularly, we don't need to break it immediately. In fact, JDK 6 should still 
mostly work as of now. I would like to keep the current setting for another 
version.

> Drop support for JDK 6 from Pig 0.14
> 
>
> Key: PIG-4210
> URL: https://issues.apache.org/jira/browse/PIG-4210
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohini Palaniswamy
> Fix For: 0.15.0
>
>
> Change default javac.version to 1.7 and publish jdk7 compiled jars to maven 
> from 0.14 release





[jira] [Updated] (PIG-4199) Mapreduce ACLs should be translated to Tez ACLs

2014-10-14 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4199:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to branch-0.14 and trunk.

> Mapreduce ACLs should be translated to Tez ACLs
> ---
>
> Key: PIG-4199
> URL: https://issues.apache.org/jira/browse/PIG-4199
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.14.0
>
> Attachments: PIG-4199-1.patch
>
>
> MR job view and modify ACLs need to be translated and set on Tez





[jira] [Assigned] (PIG-3977) Get TezStats working for Oozie

2014-10-14 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy reassigned PIG-3977:
---

Assignee: Rohini Palaniswamy

> Get TezStats working for Oozie
> --
>
> Key: PIG-3977
> URL: https://issues.apache.org/jira/browse/PIG-3977
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.14.0
>
>
>  The methods below, which Oozie accesses, now throw 
> UnsupportedOperationException in TezStats:
> {code}
> jobStatsGroup.put("PROACTIVE_SPILL_COUNT_OBJECTS", 
> Long.toString(jobStats.getProactiveSpillCountObjects()));
> jobStatsGroup.put("PROACTIVE_SPILL_COUNT_RECS", 
> Long.toString(jobStats.getProactiveSpillCountRecs()));
> jobStatsGroup.put("SMMS_SPILL_COUNT", 
> Long.toString(jobStats.getSMMSpillCount()));
> {code}





[jira] [Assigned] (PIG-4039) New interface for resetting static variables

2014-10-14 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy reassigned PIG-4039:
---

Assignee: Rohini Palaniswamy

> New interface for resetting static variables
> 
>
> Key: PIG-4039
> URL: https://issues.apache.org/jira/browse/PIG-4039
> Project: Pig
>  Issue Type: Sub-task
>  Components: tez
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.14.0
>
>
> In Tez, when there is container reuse, static variables, thread locals, etc. 
> have to be reinitialized to avoid memory leaks or stale values. For the 
> short term, we ended up making some of the static variables public or adding 
> a destroy method to each class, which is hacky. It also does not help users 
> who want something similar done in their UDFs or LoadFuncs. We need to 
> define an interface with a reset/destroy method, find all loaded classes 
> implementing that interface, and call destroy on them in PigProcessor.close(). 
> ServiceLoader and annotations are some of the ways to find classes 
> implementing an interface, and there are other libraries as well. We need to 
> find the best and fastest way to do that.
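One registry-style shape for this idea can be sketched as follows (Python, with assumed names; the eventual Pig interface may look quite different): code with static state registers a reset hook, and the processor drains the registry on close instead of hard-coding per-class destroy methods.

```python
class StaticDataCleanup:
    """Hypothetical reset registry for static state (not Pig's actual API)."""
    _hooks = []

    @classmethod
    def register(cls, hook):
        cls._hooks.append(hook)

    @classmethod
    def reset_all(cls):
        # What a PigProcessor.close()-style method would call on container reuse.
        for hook in cls._hooks:
            hook()

class SomeUdf:
    cache = {'stale': 'value'}   # stands in for a static field

StaticDataCleanup.register(SomeUdf.cache.clear)
StaticDataCleanup.reset_all()    # simulate container reuse
print(SomeUdf.cache)             # {}
```

A ServiceLoader- or annotation-based scan, as the description suggests, would discover the implementing classes automatically instead of requiring explicit register() calls.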





[jira] [Created] (PIG-4235) Fix unit test failures on Windows

2014-10-14 Thread Daniel Dai (JIRA)
Daniel Dai created PIG-4235:
---

 Summary: Fix unit test failures on Windows
 Key: PIG-4235
 URL: https://issues.apache.org/jira/browse/PIG-4235
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0
 Attachments: PIG-4235-1.patch

A bunch of unit tests fail on trunk and the 0.14 branch; we need to fix them.





[jira] [Updated] (PIG-4235) Fix unit test failures on Windows

2014-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4235:

Attachment: PIG-4235-1.patch

> Fix unit test failures on Windows
> -
>
> Key: PIG-4235
> URL: https://issues.apache.org/jira/browse/PIG-4235
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.14.0
>
> Attachments: PIG-4235-1.patch
>
>
> A bunch of unit tests fail on trunk and the 0.14 branch; we need to fix them.





[jira] [Updated] (PIG-3749) PigPerformance - data in the map gets lost during parsing

2014-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3749:

Fix Version/s: (was: 0.14.0)
   0.15.0

> PigPerformance - data in the map gets lost during parsing
> -
>
> Key: PIG-3749
> URL: https://issues.apache.org/jira/browse/PIG-3749
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Keren Ouaknine
>Assignee: Keren Ouaknine
> Fix For: 0.15.0
>
> Attachments: PIG-3749.patch
>
>
> Create a Pigmix sample dataset which looks as follows:
> keren 1   2   qt  3   4   5.0 aaaabbbb 
> mccccddddeeeedmffffgggghhhh
> Launch the following query:
> A = load 'page_views_sample.txt' using 
> org.apache.pig.test.pigmix.udf.PigPerformanceLoader()
> as (user, action, timespent, query_term, ip_addr, timestamp, 
> estimated_revenue, page_info, page_links);
> store A into 'L1out_A';
> B = foreach A generate user, (int)action as action, (map[])page_info as 
> page_info, flatten((bag{tuple(map[])})page_links) as page_links;
> store B into 'L1out_B';
> The result looks like this: 
> keren 1   [b#bbb,a#aaa]   [d#,e#eee,c#ccc]
> keren 1   [b#bbb,a#aaa]   [f#fff,g#ggg,h#hhh
> It is missing the 'ddd' value and a closing bracket.
> Thanks,
> Keren





[jira] [Commented] (PIG-3749) PigPerformance - data in the map gets lost during parsing

2014-10-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171322#comment-14171322
 ] 

Daniel Dai commented on PIG-3749:
-

I tried something similar but was not able to reproduce it.

It seems your patch deals with a 0x00 in the bytearray. Is it in the middle of 
the bytearray or at the end? I checked DataGenerator, and it does not seem we 
generate 0x00 in the middle. If it is at the end, shouldn't it also be bounded 
by b.length?

Can you upload your page_views_sample with the offending record?
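The bounding concern raised above can be made concrete with a small sketch (a hypothetical helper, not the actual patch):

```python
def field_end(b, start):
    """Scan for a 0x00 field terminator, bounded by len(b) so a field
    that runs to the end of the bytearray (no terminator present)
    stops at the buffer edge instead of reading past it."""
    i = start
    while i < len(b) and b[i] != 0x00:
        i += 1
    return i

data = bytes([0x61, 0x62, 0x00, 0x63])   # 'a', 'b', NUL, 'c'
print(field_end(data, 0))   # 2: stops at the 0x00
print(field_end(data, 3))   # 4: no 0x00 left, bounded by len(b)
```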

> PigPerformance - data in the map gets lost during parsing
> -
>
> Key: PIG-3749
> URL: https://issues.apache.org/jira/browse/PIG-3749
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Keren Ouaknine
>Assignee: Keren Ouaknine
> Fix For: 0.14.0
>
> Attachments: PIG-3749.patch
>
>





[jira] [Commented] (PIG-4115) Timezone inconsistency in Pig Oozie action fails with "(org.apache.pig.builtin.ToDate2ARGS)[datetime]"

2014-10-14 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171241#comment-14171241
 ] 

Rohini Palaniswamy commented on PIG-4115:
-

 I just had something similar reported yesterday (the timezone is UTC though), 
and the user said it worked with joda-time 2.5. 

{code}
java.lang.IllegalArgumentException: Cannot parse "20141012": Illegal instant 
due to time zone offset transition (America/Santiago)
at 
org.joda.time.format.DateTimeParserBucket.computeMillis(DateTimeParserBucket.java:336)
at 
org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:662)
{code}

Can you recompile Pig with the joda-time version changed to 2.5 in 
ivy/libraries.properties and check whether that works for you? joda-time is 
bundled into pig-withouthadoop.jar in Pig versions before 0.14. 
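The "Illegal instant" failure occurs when a wall-clock time falls into a DST spring-forward gap for the JVM's zone. A minimal check for that condition, using Python's zoneinfo as a stand-in for joda-time (the dates are chosen around Europe/Zurich's 2014 spring-forward; this only models the failure mode, the reported fix was simply upgrading joda-time):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def exists_locally(naive, zone):
    """True if the wall-clock time exists in the zone; False inside a
    spring-forward gap -- the case joda-time rejects as an
    'Illegal instant due to time zone offset transition'."""
    tz = ZoneInfo(zone)
    # A nonexistent local time does not survive a round trip through UTC.
    roundtrip = naive.replace(tzinfo=tz).astimezone(timezone.utc).astimezone(tz)
    return roundtrip.replace(tzinfo=None) == naive

# Europe/Zurich jumped 02:00 -> 03:00 on 2014-03-30, so 02:30 never existed:
print(exists_locally(datetime(2014, 3, 30, 2, 30), "Europe/Zurich"))  # False
print(exists_locally(datetime(2014, 3, 30, 12, 0), "Europe/Zurich"))  # True
```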

> Timezone inconsistency in Pig Oozie action fails with 
> "(org.apache.pig.builtin.ToDate2ARGS)[datetime]"
> --
>
> Key: PIG-4115
> URL: https://issues.apache.org/jira/browse/PIG-4115
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Michalis Kongtongk
>Assignee: Rohini Palaniswamy
>
> Running a Pig Action in Oozie : 
> CentOS
> - mv /etc/localtime /etc/localtime.mv #backup your current tz
> - ln -sf /usr/share/zoneinfo/Europe/Zurich /etc/localtime # set tz to 
> Europe/Zurich
> - create a hdfs://tmp/file.txt in hdfs with content "1"
> In Pig do:
> A = load '/tmp/file.txt' as (a:chararray);
> B = foreach A generate *, ToDate('02/11/1940', 'dd/MM/') ;
> dump B;
> In an Oozie Pig Action, 
> reproduce the same script in workflow.xml and execute it; 
> this is where it will fail.
> {code}
> ...
> ERROR 0: Exception while executing [POUserFunc (Name: 
> POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-254 Operator 
> Key: scope-254) children: null at []]: 
> java.lang.IllegalArgumentException: Cannot parse "02/11/1940": Illegal 
> instant due to time zone offset transition (Europe/Zurich) 
> ...
> {code}
> Since Oozie is using Pig as a library, I believe they should behave the same.
> We notice this inconsistency, when the OS is set to
> $ date +%Z # timezone name 
> CEST 





[jira] [Assigned] (PIG-4231) Make rank work with Spark

2014-10-14 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park reassigned PIG-4231:
--

Assignee: Carlos Balduz

> Make rank work with Spark
> -
>
> Key: PIG-4231
> URL: https://issues.apache.org/jira/browse/PIG-4231
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Carlos Balduz
>Assignee: Carlos Balduz
>  Labels: spork
>
> Rank does not work with Spark since PORank and POCounter have not been 
> implemented yet.
> Pig Stack Trace
> ---
> ERROR 0: java.lang.IllegalArgumentException: Spork unsupported 
> PhysicalOperator: (Name: DATA: POCounter[tuple] - scope-146 Operator Key: 
> scope-146)
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> java.lang.IllegalArgumentException: Spork unsupported PhysicalOperator: 
> (Name: DATA: POCounter[tuple] - scope-146 Operator Key: scope-146)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:285)
>   at org.apache.pig.PigServer.launchPlan(PigServer.java:1378)
>   at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1363)
>   at org.apache.pig.PigServer.execute(PigServer.java:1352)
>   at org.apache.pig.PigServer.executeBatch(PigServer.java:403)
>   at org.apache.pig.PigServer.executeBatch(PigServer.java:386)
>   at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:170)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:233)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:204)
>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
>   at org.apache.pig.Main.run(Main.java:482)
>   at org.apache.pig.Main.main(Main.java:164)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Error "ERROR 2088: Fetch failed. Couldn't retrieve result" happened during "HCatLoader() then DUMP"

2014-10-14 Thread lulynn_2008
Hi All,
I was running HCatStorer and HCatLoader in the Pig grunt shell, but encountered 
"ERROR 2088: Fetch failed. Couldn't retrieve result". Please take a look and 
share your suggestions. Thanks.

Test case:
1. Create table in hive:
create table junit_unparted_basic(a int, b string) stored as RCFILE 
tblproperties('hcat.isd'='org.apache.hive.hcatalog.rcfile.RCFileInputDriver','hcat.osd'='org.apache.hive.hcatalog.rcfile.RCFileOutputDriver');
2. copy basic.input.data file into hdfs, here is the content in file:
1S1S
1S2S
1S3S
2S1S
2S2S
2S3S
3S1S
3S2S
3S3S

3. run Pig: pig -useHCatalog
4. grunt> A = load 'basic.input.data' as (a:int, b:chararray);
5. grunt> store A into 'junit_unparted_basic' using 
org.apache.hive.hcatalog.pig.HCatStorer();
6. grunt> X = load 'junit_unparted_basic' using 
org.apache.hive.hcatalog.pig.HCatLoader();

7. grunt> dump X

Error Log:

Pig Stack Trace
---
ERROR 2088: Fetch failed. Couldn't retrieve result

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias X
at org.apache.pig.PigServer.openIterator(PigServer.java:912)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:752)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:542)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:619)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias X
at org.apache.pig.PigServer.storeEx(PigServer.java:1015)
at org.apache.pig.PigServer.store(PigServer.java:974)
at org.apache.pig.PigServer.openIterator(PigServer.java:887)
... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2088: 
Fetch failed. Couldn't retrieve result
at 
org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:180)
at 
org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:275)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
at org.apache.pig.PigServer.storeEx(PigServer.java:1011)
... 14 more
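One thing worth trying, based on FetchLauncher appearing in the trace (an assumption on my part: DUMP of a simple plan goes through Pig's fetch optimization instead of launching a full job), is to disable fetch and re-run the dump. At minimum it tells you whether the failure is specific to the fetch path:

```pig
grunt> set opt.fetch false;
grunt> dump X;
```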




[jira] [Updated] (PIG-4234) Order By error after Group By in Spark

2014-10-14 Thread Carlos Balduz (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlos Balduz updated PIG-4234:
---
Summary: Order By error after Group By in Spark  (was: Order By error after 
Group By)

> Order By error after Group By in Spark
> --
>
> Key: PIG-4234
> URL: https://issues.apache.org/jira/browse/PIG-4234
> Project: Pig
>  Issue Type: Bug
>  Components: spark
>Reporter: Carlos Balduz
>  Labels: spork
>
> Trying to sort after a Group By produces the following error:
> 2014-10-14 16:04:55,189 [Executor task launch worker-0] ERROR 
> org.apache.spark.executor.Executor - Exception in task 3.0 in stage 0.0 (TID 
> 4)
> java.io.NotSerializableException: 
> org.apache.pig.data.SelfSpillBag$MemoryLimits
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> at java.util.ArrayList.writeObject(ArrayList.java:742)
> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Operations such as Rank By are not possible with this error, since they 
> need to sort right after grouping the data.
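The failure pattern here is generic Java serialization: any Serializable object graph that reaches a field whose class does not implement Serializable fails exactly like this. A minimal sketch follows; the class and field names are illustrative stand-ins loosely modeled on SelfSpillBag holding a MemoryLimits, and `transient` is only one possible fix (making the inner class Serializable is the other).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerDemo {
    static class MemoryLimits { long maxMemory = 100; } // NOT Serializable

    // Fails to serialize: the reachable MemoryLimits field is not Serializable.
    static class BadBag implements Serializable {
        MemoryLimits limits = new MemoryLimits();
    }

    // Serializes: transient fields are skipped by Java serialization (they
    // must be rebuilt after deserialization); alternatively MemoryLimits
    // could itself implement Serializable.
    static class GoodBag implements Serializable {
        transient MemoryLimits limits = new MemoryLimits();
    }

    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) { // NotSerializableException extends IOException
            return false;
        }
    }
}
```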



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4234) Order By error after Group By

2014-10-14 Thread Carlos Balduz (JIRA)
Carlos Balduz created PIG-4234:
--

 Summary: Order By error after Group By
 Key: PIG-4234
 URL: https://issues.apache.org/jira/browse/PIG-4234
 Project: Pig
  Issue Type: Bug
  Components: spark
Reporter: Carlos Balduz


Trying to sort after a Group By produces the following error:

2014-10-14 16:04:55,189 [Executor task launch worker-0] ERROR 
org.apache.spark.executor.Executor - Exception in task 3.0 in stage 0.0 (TID 4)
java.io.NotSerializableException: org.apache.pig.data.SelfSpillBag$MemoryLimits
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at java.util.ArrayList.writeObject(ArrayList.java:742)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at 
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
at 
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Operations such as Rank By are not possible with this error, since they need 
to sort right after grouping the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4233) Package pig along with dependencies into a fat jar while job submission to Spark cluster

2014-10-14 Thread Praveen Rachabattuni (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen Rachabattuni updated PIG-4233:
--
Description: 
Currently we have a fat jar created in the legacy directory which contains Pig 
along with its dependencies. 
We would need to modify build.xml so that the Spark dependency jars are 
included in the legacy fat jar.

Running job on Spark cluster:

1. export SPARK_HOME=/path/to/spark
2. export 
SPARK_PIG_JAR=$PIG_HOME/legacy/pig-0.14.0-SNAPSHOT-withouthadoop-h1.jar
3. export SPARK_MASTER=spark://localhost:7077
4. export HADOOP_HOME=/path/to/hadoop
5. Launch the job using ./bin/pig -x spark

  was:
Currently we have a fat jar created in legacy directory which contains pig 
along with dependencies. 
Would need to modify build.xml to add spark dependency jars to include in 
legacy fat jar.


> Package pig along with dependencies into a fat jar while job submission to 
> Spark cluster
> 
>
> Key: PIG-4233
> URL: https://issues.apache.org/jira/browse/PIG-4233
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
> Attachments: PIG-4233.patch
>
>
> Currently we have a fat jar created in legacy directory which contains pig 
> along with dependencies. 
> Would need to modify build.xml to add spark dependency jars to include in 
> legacy fat jar.
> Running job on Spark cluster:
> 1. export SPARK_HOME=/path/to/spark
> 2. export 
> SPARK_PIG_JAR=$PIG_HOME/legacy/pig-0.14.0-SNAPSHOT-withouthadoop-h1.jar
> 3. export SPARK_MASTER=spark://localhost:7077
> 4. export HADOOP_HOME=/path/to/hadoop
> 5. Launch the job using ./bin/pig -x spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4233) Package pig along with dependencies into a fat jar while job submission to Spark cluster

2014-10-14 Thread Praveen Rachabattuni (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen Rachabattuni updated PIG-4233:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Package pig along with dependencies into a fat jar while job submission to 
> Spark cluster
> 
>
> Key: PIG-4233
> URL: https://issues.apache.org/jira/browse/PIG-4233
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
> Attachments: PIG-4233.patch
>
>
> Currently we have a fat jar created in legacy directory which contains pig 
> along with dependencies. 
> Would need to modify build.xml to add spark dependency jars to include in 
> legacy fat jar.
> Running job on Spark cluster:
> 1. export SPARK_HOME=/path/to/spark
> 2. export 
> SPARK_PIG_JAR=$PIG_HOME/legacy/pig-0.14.0-SNAPSHOT-withouthadoop-h1.jar
> 3. export SPARK_MASTER=spark://localhost:7077
> 4. export HADOOP_HOME=/path/to/hadoop
> 5. Launch the job using ./bin/pig -x spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4233) Package pig along with dependencies into a fat jar while job submission to Spark cluster

2014-10-14 Thread Praveen Rachabattuni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170869#comment-14170869
 ] 

Praveen Rachabattuni commented on PIG-4233:
---

Attached patch. Dependencies are currently added from build.xml using pattern 
matching (i.e. scala*.jar) rather than pinning exact versions, which could be 
addressed in another JIRA later.

Committed to the spark branch and closing this issue for now.
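For reference, pattern-based jar inclusion in an Ant build typically looks like the fragment below. This is an illustrative sketch only, not the actual patch; the property names and directory layout are assumptions.

```xml
<!-- Hypothetical build.xml fragment: fold Spark dependency jars into the
     legacy fat jar by pattern (e.g. scala*.jar) instead of pinning versions. -->
<jar destfile="${legacy.dir}/pig-${version}-withouthadoop-h1.jar">
    <zipgroupfileset dir="${ivy.lib.dir}" includes="scala*.jar, spark-*.jar"/>
</jar>
```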

> Package pig along with dependencies into a fat jar while job submission to 
> Spark cluster
> 
>
> Key: PIG-4233
> URL: https://issues.apache.org/jira/browse/PIG-4233
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
> Attachments: PIG-4233.patch
>
>
> Currently we have a fat jar created in legacy directory which contains pig 
> along with dependencies. 
> Would need to modify build.xml to add spark dependency jars to include in 
> legacy fat jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4233) Package pig along with dependencies into a fat jar while job submission to Spark cluster

2014-10-14 Thread Praveen Rachabattuni (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen Rachabattuni updated PIG-4233:
--
Attachment: PIG-4233.patch

> Package pig along with dependencies into a fat jar while job submission to 
> Spark cluster
> 
>
> Key: PIG-4233
> URL: https://issues.apache.org/jira/browse/PIG-4233
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
> Attachments: PIG-4233.patch
>
>
> Currently we have a fat jar created in legacy directory which contains pig 
> along with dependencies. 
> Would need to modify build.xml to add spark dependency jars to include in 
> legacy fat jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4233) Package pig along with dependencies into a fat jar while job submission to Spark cluster

2014-10-14 Thread Praveen Rachabattuni (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen Rachabattuni updated PIG-4233:
--
Status: Patch Available  (was: Open)

> Package pig along with dependencies into a fat jar while job submission to 
> Spark cluster
> 
>
> Key: PIG-4233
> URL: https://issues.apache.org/jira/browse/PIG-4233
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Praveen Rachabattuni
>Assignee: Praveen Rachabattuni
>
> Currently we have a fat jar created in legacy directory which contains pig 
> along with dependencies. 
> Would need to modify build.xml to add spark dependency jars to include in 
> legacy fat jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4233) Package pig along with dependencies into a fat jar while job submission to Spark cluster

2014-10-14 Thread Praveen Rachabattuni (JIRA)
Praveen Rachabattuni created PIG-4233:
-

 Summary: Package pig along with dependencies into a fat jar while 
job submission to Spark cluster
 Key: PIG-4233
 URL: https://issues.apache.org/jira/browse/PIG-4233
 Project: Pig
  Issue Type: Sub-task
  Components: spark
Reporter: Praveen Rachabattuni
Assignee: Praveen Rachabattuni


Currently we have a fat jar created in the legacy directory which contains Pig 
along with its dependencies. 
We would need to modify build.xml so that the Spark dependency jars are 
included in the legacy fat jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4231) Make rank work with Spark

2014-10-14 Thread Praveen Rachabattuni (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170790#comment-14170790
 ] 

Praveen Rachabattuni commented on PIG-4231:
---

Could you add Carlos Balduz to the contributor list, so I can assign the
issue to him?

Thanks,
Praveen R

On Tue, Oct 14, 2014 at 4:42 PM, Carlos Balduz (JIRA) 



> Make rank work with Spark
> -
>
> Key: PIG-4231
> URL: https://issues.apache.org/jira/browse/PIG-4231
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Carlos Balduz
>  Labels: spork
>
> Rank does not work with Spark since PORank and POCounter have  not been 
> implemented yet.
> Pig Stack Trace
> ---
> ERROR 0: java.lang.IllegalArgumentException: Spork unsupported 
> PhysicalOperator: (Name: DATA: POCounter[tuple] - scope-146 Operator Key: 
> scope-146)
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> java.lang.IllegalArgumentException: Spork unsupported PhysicalOperator: 
> (Name: DATA: POCounter[tuple] - scope-146 Operator Key: scope-146)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:285)
>   at org.apache.pig.PigServer.launchPlan(PigServer.java:1378)
>   at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1363)
>   at org.apache.pig.PigServer.execute(PigServer.java:1352)
>   at org.apache.pig.PigServer.executeBatch(PigServer.java:403)
>   at org.apache.pig.PigServer.executeBatch(PigServer.java:386)
>   at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:170)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:233)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:204)
>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
>   at org.apache.pig.Main.run(Main.java:482)
>   at org.apache.pig.Main.main(Main.java:164)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4231) Make rank work with Spark

2014-10-14 Thread Carlos Balduz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170783#comment-14170783
 ] 

Carlos Balduz commented on PIG-4231:


Could you please assign this task to me [~praveenr019]?

> Make rank work with Spark
> -
>
> Key: PIG-4231
> URL: https://issues.apache.org/jira/browse/PIG-4231
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Carlos Balduz
>  Labels: spork
>
> Rank does not work with Spark since PORank and POCounter have  not been 
> implemented yet.
> Pig Stack Trace
> ---
> ERROR 0: java.lang.IllegalArgumentException: Spork unsupported 
> PhysicalOperator: (Name: DATA: POCounter[tuple] - scope-146 Operator Key: 
> scope-146)
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: 
> java.lang.IllegalArgumentException: Spork unsupported PhysicalOperator: 
> (Name: DATA: POCounter[tuple] - scope-146 Operator Key: scope-146)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:285)
>   at org.apache.pig.PigServer.launchPlan(PigServer.java:1378)
>   at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1363)
>   at org.apache.pig.PigServer.execute(PigServer.java:1352)
>   at org.apache.pig.PigServer.executeBatch(PigServer.java:403)
>   at org.apache.pig.PigServer.executeBatch(PigServer.java:386)
>   at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:170)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:233)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:204)
>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
>   at org.apache.pig.Main.run(Main.java:482)
>   at org.apache.pig.Main.main(Main.java:164)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4115) Timezone inconsistency in Pig Oozie action fails with "(org.apache.pig.builtin.ToDate2ARGS)[datetime]"

2014-10-14 Thread Giovanni Ruggiero (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170754#comment-14170754
 ] 

Giovanni Ruggiero commented on PIG-4115:


Yes, we also ran into this bug (using STORE), and we are not using Oozie.

> Timezone inconsistency in Pig Oozie action fails with 
> "(org.apache.pig.builtin.ToDate2ARGS)[datetime]"
> --
>
> Key: PIG-4115
> URL: https://issues.apache.org/jira/browse/PIG-4115
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Michalis Kongtongk
>Assignee: Rohini Palaniswamy
>
> Running a Pig Action in Oozie : 
> CentOS
> - mv /etc/localtime /etc/localtime.mv #backup your current tz
> - ln -sf /usr/share/zoneinfo/Europe/Zurich /etc/localtime # set tz to 
> Europe/Zurich
> - create a hdfs://tmp/file.txt in hdfs with content "1"
> In Pig do:
> A = load '/tmp/file.txt' as (a:chararray);
> B = foreach A generate *, ToDate('02/11/1940', 'dd/MM/') ;
> dump B;
> In an Oozie Pig Action, 
> put the same script in workflow.xml and execute it; 
> this is where it will fail.
> {code}
> ...
> ERROR 0: Exception while executing [POUserFunc (Name: 
> POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-254 Operator 
> Key: scope-254) children: null at []]: 
> java.lang.IllegalArgumentException: Cannot parse "02/11/1940": Illegal 
> instant due to time zone offset transition (Europe/Zurich) 
> ...
> {code}
> Since Oozie uses Pig as a library, I believe they should behave the same.
> We noticed this inconsistency when the OS timezone is set to:
> $ date +%Z # timezone name 
> CEST 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Jenkins build is back to normal : Pig-trunk #1681

2014-10-14 Thread Apache Jenkins Server
See 



[jira] [Commented] (PIG-4115) Timezone inconsistency in Pig Oozie action fails with "(org.apache.pig.builtin.ToDate2ARGS)[datetime]"

2014-10-14 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170743#comment-14170743
 ] 

Harsh J commented on PIG-4115:
--

Just curious - does this happen if you use STORE instead of DUMP as well? 
If it does not, then the issue belongs in the OOZIE project.

> Timezone inconsistency in Pig Oozie action fails with 
> "(org.apache.pig.builtin.ToDate2ARGS)[datetime]"
> --
>
> Key: PIG-4115
> URL: https://issues.apache.org/jira/browse/PIG-4115
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Michalis Kongtongk
>Assignee: Rohini Palaniswamy
>
> Running a Pig Action in Oozie : 
> CentOS
> - mv /etc/localtime /etc/localtime.mv #backup your current tz
> - ln -sf /usr/share/zoneinfo/Europe/Zurich /etc/localtime # set tz to 
> Europe/Zurich
> - create a hdfs://tmp/file.txt in hdfs with content "1"
> In Pig do:
> A = load '/tmp/file.txt' as (a:chararray);
> B = foreach A generate *, ToDate('02/11/1940', 'dd/MM/') ;
> dump B;
> In an Oozie Pig Action, 
> put the same script in workflow.xml and execute it; 
> this is where it will fail.
> {code}
> ...
> ERROR 0: Exception while executing [POUserFunc (Name: 
> POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-254 Operator 
> Key: scope-254) children: null at []]: 
> java.lang.IllegalArgumentException: Cannot parse "02/11/1940": Illegal 
> instant due to time zone offset transition (Europe/Zurich) 
> ...
> {code}
> Since Oozie uses Pig as a library, I believe they should behave the same.
> We noticed this inconsistency when the OS timezone is set to:
> $ date +%Z # timezone name 
> CEST 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions

2014-10-14 Thread Remi Catherinot (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170728#comment-14170728
 ] 

Remi Catherinot commented on PIG-3259:
--

Make SanityChecker thread-safe. The current implementation is stateful 
(because of the numDots field) and is not used within a synchronized block, so 
it is not thread-safe. Change sanityCheckIntegerLongDecimal so that it returns 
a byte: 0 would mean long/integer/byte/short, 1 would mean double, and 2 would 
mean NaN. Doing so would make it thread-safe without slowing the 
implementation down.

Another small speed-up: an expression of the form
if (str.charAt(i) >= '0' && str.charAt(i) <= '9' && ...)
can be replaced by declaring a char before the test and using it inside the 
test:
char c;
if ((c = str.charAt(i)) >= '0' && c <= '9' && ...)
because this version calls charAt only once.

Also beware: it seems to me that you are changing the contract of the method. 
The current one tries its best to find a Long; if that fails, it falls back on 
JVM parsing (and so on the full number-format specs), which is what causes the 
performance degradation on bad input (mostly because of the exception). In the 
optimized one, null is returned as soon as the check fails. We can only do 
this if we are fully confident that the checker strictly follows all of the 
JVM number-format specs (for example octal long values, or hexadecimal values, 
which use 'p' rather than 'e' as their exponent operator, etc.).

Maybe a good approach would be to take the code from the src.jar shipped with 
the JVM and replace the "throw NumberFormatException" behavior with "return 
null, plus rounding in the case of an implicit double-to-long cast", which is 
what you want to achieve. The JVM is slow on bad input because of the 
exception but is the fastest on good input; this only changes the failure 
behavior.
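To make the suggestions concrete, here is a small self-contained sketch of both ideas: a stateless (hence thread-safe) checker that returns a byte instead of mutating a field, and the cached-charAt loop. All names are illustrative, not Pig's actual code, and the caveat above applies: this checker deliberately rejects exponent and hex forms that Double.valueOf would accept.

```java
public class NumberSanity {
    // Stateless pre-check: 0 = integer-like, 1 = decimal-like, 2 = not a number.
    // No fields are mutated, so concurrent callers are safe.
    static byte sanityCheck(String str) {
        boolean sawDigit = false, sawDot = false;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i); // cache charAt in a local: called only once
            if (c >= '0' && c <= '9') {
                sawDigit = true;
            } else if (c == '.' && !sawDot) {
                sawDot = true;
            } else if ((c == '+' || c == '-') && i == 0) {
                // leading sign is fine
            } else {
                return 2; // rejects exponents, hex, etc. -- see caveat above
            }
        }
        return !sawDigit ? 2 : sawDot ? (byte) 1 : (byte) 0;
    }

    // Exception-free conversion: returns null instead of letting
    // NumberFormatException drive the bad-input path.
    static Long bytesToLong(String str) {
        switch (sanityCheck(str)) {
            case 0:  return Long.valueOf(str);
            case 1:  return (long) Double.parseDouble(str); // truncates toward zero
            default: return null;
        }
    }
}
```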

> Optimize byte to Long/Integer conversions
> -
>
> Key: PIG-3259
> URL: https://issues.apache.org/jira/browse/PIG-3259
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.11, 0.11.1
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.15.0
>
> Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric 
> (1234abcd), the code calls Double.valueOf(String) regardless before finally 
> returning null. Any script that inadvertently (user's mistake or not) tries 
> to cast a non-numeric column to int or long would result in many wasteful 
> calls. 
> We can avoid this by handling only the cases where the input is a decimal 
> number (1234.56), and returning null otherwise, even before trying 
> Double.valueOf(String).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)