[jira] Subscription: PIG patch available

2015-09-17 Thread jira
Issue Subscription
Filter: PIG patch available (30 issues)

Subscriber: pigdaily

Key       Summary
PIG-4677  Display failure information on stop on failure
          https://issues.apache.org/jira/browse/PIG-4677
PIG-4670  Embedded Python scripts still parse line by line
          https://issues.apache.org/jira/browse/PIG-4670
PIG-4667  Enable Pig on Spark to run on Yarn Client mode
          https://issues.apache.org/jira/browse/PIG-4667
PIG-4663  HBaseStorage should allow the MaxResultsPerColumnFamily limit to
          avoid memory or scan timeout issues
          https://issues.apache.org/jira/browse/PIG-4663
PIG-4656  Improve String serialization and comparator performance in
          BinInterSedes
          https://issues.apache.org/jira/browse/PIG-4656
PIG-4644  PORelationToExprProject.clone() is broken
          https://issues.apache.org/jira/browse/PIG-4644
PIG-4598  Allow user defined plan optimizer rules
          https://issues.apache.org/jira/browse/PIG-4598
PIG-4581  thread safe issue in NodeIdGenerator
          https://issues.apache.org/jira/browse/PIG-4581
PIG-4539  New PigUnit
          https://issues.apache.org/jira/browse/PIG-4539
PIG-4515  org.apache.pig.builtin.Distinct throws ClassCastException
          https://issues.apache.org/jira/browse/PIG-4515
PIG-4468  Pig's jackson version conflicts with that of hadoop 2.6.0
          https://issues.apache.org/jira/browse/PIG-4468
PIG-4455  Should use DependencyOrderWalker instead of DepthFirstWalker in
          MRPrinter
          https://issues.apache.org/jira/browse/PIG-4455
PIG-4417  Pig's register command should support automatic fetching of jars
          from repo.
          https://issues.apache.org/jira/browse/PIG-4417
PIG-4373  Implement PIG-3861 in Tez
          https://issues.apache.org/jira/browse/PIG-4373
PIG-4341  Add CMX support to pig.tmpfilecompression.codec
          https://issues.apache.org/jira/browse/PIG-4341
PIG-4323  PackageConverter hanging in Spark
          https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4111  Make Pig compiles with avro-1.7.7
          https://issues.apache.org/jira/browse/PIG-4111
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
          https://issues.apache.org/jira/browse/PIG-3866
PIG-3864  ToDate(userstring, format, timezone) computes DateTime with strange
          handling of Daylight Saving Time with location based timezones
          https://issues.apache.org/jira/browse/PIG-3864
PIG-3851  Upgrade jline to 2.11
          https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is
          NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
          https://issues.apache.org/jira/browse/PIG-3635
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384


[jira] [Created] (PIG-4680) Enable pig job graphs to resume from last successful state

2015-09-17 Thread Abhishek Agarwal (JIRA)
Abhishek Agarwal created PIG-4680:
-

         Summary: Enable pig job graphs to resume from last successful state
             Key: PIG-4680
             URL: https://issues.apache.org/jira/browse/PIG-4680
         Project: Pig
      Issue Type: Improvement
      Components: impl
        Reporter: Abhishek Agarwal


Pig scripts can have multiple ETL jobs in the DAG, which may take hours to 
finish. If a transient error occurs, the whole job fails, and when it is rerun, 
every node in the job graph reruns, including nodes that had already completed 
successfully. These redundant runs waste cluster capacity and delay pipelines.

On failure, we can persist the graph state. In the next run, only the 
failed nodes and their successors will rerun. This is of course subject to 
preconditions such as:
 - The Pig script has not changed
 - Input locations have not changed
 - Output data from the previous run is intact
 - Configuration has not changed
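
The resume rule described above can be sketched in a few lines. This is only an illustration in Python, not Pig code and not the design proposed for this issue: the names fingerprint, nodes_to_rerun and plan_resume, the shape of the saved state, and the use of SHA-256 are all assumptions made for the example. It covers three of the preconditions (script, input locations, configuration) through a fingerprint and does not check whether the previous output data is intact.

```python
import hashlib
import json
from collections import deque


def fingerprint(script_text, input_paths, config):
    """Hash the things that must be unchanged for a resume to be safe.

    Hypothetical helper: covers script, input locations and configuration.
    """
    payload = json.dumps(
        {"script": script_text, "inputs": sorted(input_paths), "config": config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def nodes_to_rerun(successors, failed):
    """Return the failed nodes plus every node downstream of them."""
    rerun = set(failed)
    queue = deque(failed)
    while queue:
        node = queue.popleft()
        for child in successors.get(node, ()):
            if child not in rerun:
                rerun.add(child)
                queue.append(child)
    return rerun


def plan_resume(successors, saved_state, current_fingerprint):
    """Decide which nodes to run, given the state persisted by the last run.

    saved_state is assumed to look like
    {"fingerprint": "...", "succeeded": ["node", ...]} or None.
    """
    all_nodes = set(successors) | {c for cs in successors.values() for c in cs}
    # Any changed precondition invalidates the saved state: rerun everything.
    if saved_state is None or saved_state["fingerprint"] != current_fingerprint:
        return all_nodes
    # Treat every node that did not succeed (failed or never started) as failed.
    not_done = all_nodes - set(saved_state["succeeded"])
    return nodes_to_rerun(successors, not_done)
```

Treating "not succeeded" as "failed" keeps the sketch conservative: a node that never started is rerun along with everything that depends on it.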



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Pig-trunk #1825

2015-09-17 Thread Apache Jenkins Server
See 

Changes:

[daijy] PIG-4673: Built In UDF - REPLACE_MULTI : For a given string, search and 
replace all occurrences of search keys with replacement values

[daijy] PIG-4674: TOMAP should infer schema

[daijy] PIG-4679: Performance degradation due to InputSizeReducerEstimator 
since PIG-3754

[daijy] PIG-4676: Upgrade Hive to 1.2.1 (PIG-4676-fixtest.patch)

--
[...truncated 67 lines...]
[ivy:configure] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = 


ivy-resolve:

ivy-compile:
[ivy:cachepath] DEPRECATED: 'ivy.conf.file' is deprecated, use 
'ivy.settings.file' instead
[ivy:cachepath] :: loading settings :: file = 


init:
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

 [move] Moving 1 file to 


cc-compile:
   [javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
   [javacc] (type "javacc" with no arguments for help)
   [javacc] Reading from file 

 . . .
   [javacc] File "TokenMgrError.java" does not exist.  Will create one.
   [javacc] File "ParseException.java" does not exist.  Will create one.
   [javacc] File "Token.java" does not exist.  Will create one.
   [javacc] File "JavaCharStream.java" does not exist.  Will create one.
   [javacc] Parser generated successfully.
   [javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
   [javacc] (type "javacc" with no arguments for help)
   [javacc] Reading from file 

 . . .
   [javacc] Warning: Lookahead adequacy checking not being performed since 
option LOOKAHEAD is more than 1.  Set option FORCE_LA_CHECK to true to force 
checking.
   [javacc] File "TokenMgrError.java" does not exist.  Will create one.
   [javacc] File "ParseException.java" does not exist.  Will create one.
   [javacc] File "Token.java" does not exist.  Will create one.
   [javacc] File "JavaCharStream.java" does not exist.  Will create one.
   [javacc] Parser generated with 0 errors and 1 warnings.
   [javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
   [javacc] (type "javacc" with no arguments for help)
   [javacc] Reading from file 

 . . .
   [javacc] File "TokenMgrError.java" is being rebuilt.
   [javacc] File "ParseException.java" is being rebuilt.
   [javacc] File "Token.java" is being rebuilt.
   [javacc] File "JavaCharStream.java" is being rebuilt.
   [javacc] Parser generated successfully.
   [jjtree] Java Compiler Compiler Version 4.2 (Tree Builder)
   [jjtree] (type "jjtree" with no arguments for help)
   [jjtree] Reading from file 

 . . .
   [jjtree] File "Node.java" does not exist.  Will create one.
   [jjtree] File "SimpleNode.java" does not exist.  Will create one.
   [jjtree] File "DOTParserTreeConstants.java" does not exist.  Will create one.
   [jjtree] File "JJTDOTParserState.java" does not exist.  Will create one.
   [jjtree] Annotated grammar generated successfully in 

   [javacc] Java Compiler Compiler Version 4.2 (Parser Generator)
   [javacc] (type "javacc" with no arguments for help)
   [javacc] Reading from file 

 . . .
   [javacc] File "TokenMgrError.java" does not exist.  Will create one.
   [javacc] File "ParseException.java" does not exist.  Will create one.
   [javacc] File "Token.java" does not exist.  Will create one.
   [javacc] File "SimpleCharStream.java" does not exist.  Will create one.
   [javacc] Parser generated successfully.

prepare:
[mkdir] Created dir: 

[jira] [Commented] (PIG-4680) Enable pig job graphs to resume from last successful state

2015-09-17 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14802876#comment-14802876
 ] 

Srikanth Sundarrajan commented on PIG-4680:
---

This can be quite handy, particularly when Pig scripts are launched via Oozie 
and the launcher fails and the attempt is retried.



[jira] [Updated] (PIG-4680) Enable pig job graphs to resume from last successful state

2015-09-17 Thread Srikanth Sundarrajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sundarrajan updated PIG-4680:
--
Assignee: Abhishek Agarwal



[jira] [Commented] (PIG-4674) TOMAP should infer schema

2015-09-17 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803365#comment-14803365
 ] 

Rohini Palaniswamy commented on PIG-4674:
-

+1 for the 
https://issues.apache.org/jira/secure/attachment/12757130/PIG-4674-fixtest.patch.

> TOMAP should infer schema
> -
>
> Key: PIG-4674
> URL: https://issues.apache.org/jira/browse/PIG-4674
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.16.0
>
> Attachments: PIG-4674-1.patch, PIG-4674-2.patch, PIG-4674-3.patch, 
> PIG-4674-fixtest.patch
>
>
> TOMAP schema is map only without map value schema. This should be inferred if 
> available.





[jira] [Updated] (PIG-4677) Display failure information on stop on failure

2015-09-17 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated PIG-4677:
---
Status: Open  (was: Patch Available)

Oh.
I will update the patch.

> Display failure information on stop on failure
> --
>
> Key: PIG-4677
> URL: https://issues.apache.org/jira/browse/PIG-4677
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: PIG-4677.patch
>
>
> When stop on failure option is specified, pig abruptly exits without 
> displaying any job stats or failed job information which it usually does in 
> case of failures.
> {code}
> 2015-06-04 20:35:38,170 [uber-SubtaskRunner] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>   - 9% complete
> 2015-06-04 20:35:38,171 [uber-SubtaskRunner] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>   - Running jobs are 
> [job_1428329756093_3741748,job_1428329756093_3741752,job_1428329756093_3741753,job_1428329756093_3741754,job_1428329756093_3741756]
> 2015-06-04 20:35:40,201 [uber-SubtaskRunner] ERROR 
> org.apache.pig.tools.grunt.Grunt  - ERROR 6017: Job failed!
> Hadoop Job IDs executed by Pig: 
> job_1428329756093_3739816,job_1428329756093_3741752,job_1428329756093_3739814,job_1428329756093_3741748,job_1428329756093_3741756,job_1428329756093_3741753,job_1428329756093_3741754
> <<< Invocation of Main class completed <<<
> {code}





Jenkins build is back to normal : Pig-trunk #1826

2015-09-17 Thread Apache Jenkins Server
See 



[jira] [Updated] (PIG-4674) TOMAP should infer schema

2015-09-17 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4674:

Attachment: PIG-4674-fixtest.patch

Commit PIG-4674-fixtest.patch to fix unit test failure.



[jira] [Created] (PIG-4681) Enable Pig on Spark to run on Yarn Cluster mode

2015-09-17 Thread Srikanth Sundarrajan (JIRA)
Srikanth Sundarrajan created PIG-4681:
-

         Summary: Enable Pig on Spark to run on Yarn Cluster mode
             Key: PIG-4681
             URL: https://issues.apache.org/jira/browse/PIG-4681
         Project: Pig
      Issue Type: Sub-task
      Components: spark
Affects Versions: spark-branch
        Reporter: Srikanth Sundarrajan
        Assignee: Srikanth Sundarrajan
         Fix For: spark-branch








[jira] [Updated] (PIG-4681) Enable Pig on Spark to run on Yarn Cluster mode

2015-09-17 Thread Srikanth Sundarrajan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Sundarrajan updated PIG-4681:
--
Labels: spork  (was: spark)



[jira] [Updated] (PIG-4667) Enable Pig on Spark to run on Yarn Client mode

2015-09-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-4667:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, [~sriksun].

> Enable Pig on Spark to run on Yarn Client mode
> --
>
> Key: PIG-4667
> URL: https://issues.apache.org/jira/browse/PIG-4667
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Srikanth Sundarrajan
>Assignee: Srikanth Sundarrajan
> Fix For: spark-branch
>
> Attachments: PIG-4667-logs.tgz, PIG-4667-v1.patch, PIG-4667.patch
>
>






[jira] [Updated] (PIG-4554) Compress pig.script before encoding

2015-09-17 Thread Sandeep Samdaria (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Samdaria updated PIG-4554:
--
Status: Patch Available  (was: Open)

> Compress pig.script before encoding
> ---
>
> Key: PIG-4554
> URL: https://issues.apache.org/jira/browse/PIG-4554
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>  Labels: newbie
> Fix For: 0.16.0
>
> Attachments: PIG-4554.patch
>
>
>   Currently we truncate the pig script (maxScriptSize = 10240) and base64 
> encode it and store in config. We should remove the truncation and store the 
> full script by compressing and then doing base64 encoding. We already do that 
> for udfcontext, etc. It will save space as it will compress really well and 
> will also give the full pig script while debugging.
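
The difference between the current behaviour and the proposal can be illustrated with a short sketch. Pig itself is Java; the Python below is not Pig's implementation and does not reproduce what is done for udfcontext. It uses zlib as a stand-in compressor, the function names are hypothetical, and it assumes the 10240 limit counts characters.

```python
import base64
import zlib

MAX_SCRIPT_SIZE = 10240  # truncation limit quoted in the issue


def encode_truncated(script):
    """Current behaviour as described: truncate, then base64 encode."""
    truncated = script[:MAX_SCRIPT_SIZE]
    return base64.b64encode(truncated.encode("utf-8")).decode("ascii")


def encode_compressed(script):
    """Proposed behaviour: keep the full script, compress, then base64 encode."""
    compressed = zlib.compress(script.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decode_compressed(encoded):
    """Recover the full script from the compressed, base64-encoded form."""
    return zlib.decompress(base64.b64decode(encoded)).decode("utf-8")
```

For a long, repetitive script the compressed form is typically much smaller than the truncated one, and decode_compressed returns the whole script rather than only its first 10240 characters.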





[jira] [Updated] (PIG-4554) Compress pig.script before encoding

2015-09-17 Thread Sandeep Samdaria (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Samdaria updated PIG-4554:
--
Attachment: PIG-4554.patch

Attaching the patch.



Re: PIG-4554 related question.

2015-09-17 Thread Sandeep Samdaria
Hi,
I have attached the patch to the JIRA PIG-4554. The patch contains the
following:

   - Truncation of the script has been removed.
   - The script is encoded using ObjectSerializer.serialize(script).

Could you please review it and let me know if I have missed anything?

Thanks,
Sandeep.

On Sun, Sep 13, 2015 at 9:42 PM, Sandeep Samdaria  wrote:

> Hi,
> I was looking at the newbie ticket PIG-4554 and wanted to provide a
> fix for it. After setting up the environment and debugging, I found that
> the fix needs to be made inside setScript(String script) of the
> ScriptState class.
>
> The JIRA says: "We should remove the truncation and store the
> full script by compressing and then doing base64 encoding. We already do
> that for udfcontext, etc."
>
> Could somebody please give me a pointer to the
> implementation for udfcontext, so that I can have a look at it and
> reuse it, if possible?
>
> Another question, not related to the JIRA:
> I executed "exec test.pig" in the grunt console in Eclipse (debugging
> org.apache.pig.Main from Eclipse). I observed that the workflow is
> different (setScript is not called) compared to a "pig test.pig"
> execution. Is there a reason why "exec test.pig" from grunt and "pig
> test.pig" behave differently?
> If this behavior is expected, is it possible to attach the
> Eclipse debugger to the Pig source code when I execute "pig test.pig" from
> the terminal?
>
> Thanks,
> Sandeep.
>


[jira] [Updated] (PIG-4554) Compress pig.script before encoding

2015-09-17 Thread Sandeep Samdaria (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Samdaria updated PIG-4554:
--
Status: Open  (was: Patch Available)

I misunderstood "Submit Patch" as the workflow step for submitting the patch. 
Reverting the status to Open.
