[jira] [Commented] (PIG-4554) Compress pig.script before encoding

2015-09-18 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876225#comment-14876225
 ] 

Rohini Palaniswamy commented on PIG-4554:
-

bq. Misunderstood "Submit Patch" as the workflow to submit the patch. 
  You were right the first time. Once you upload the patch, click "Submit 
Patch" to move the issue to "Patch Available".

I have a few comments based on the new feature we added to show the uncompressed 
Pig script in the Tez UI.

1) ScriptState.java
 - Can you also retain the old truncated script in a new variable?
 - Can you rename the current getScript() to getSerializedScript() and have 
TezScriptState and MRScriptState refer to that?
 - Can you change getScript() to now return the truncated original script?
2) TezJobCompiler.java
Change

String script = new String(Base64.decodeBase64(TezScriptState.get().getScript()));
tezDag.setDAGInfo(createDagInfo(script));

to

// The truncated, uncompressed script is shown in the Tez DAG UI. I have seen a
// lot of huge scripts, so it is better to show the truncated one here and have
// folks decompress pig.script if the full script is needed.
tezDag.setDAGInfo(createDagInfo(TezScriptState.get().getScript()));
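
As a rough illustration of the accessor split requested in 1), here is a minimal sketch. ScriptStateSketch is a hypothetical stand-in, not the real ScriptState class; the encoding step is passed in as a function so the sketch stays focused on which getter returns what, and truncating by character count is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for ScriptState; not the actual Pig class.
final class ScriptStateSketch {
    // The limit quoted in the issue description for the existing truncation.
    static final int MAX_SCRIPT_SIZE = 10240;

    private final String truncatedScript;   // readable text, meant for UIs
    private final String serializedScript;  // encoded full script, meant for the config

    ScriptStateSketch(String fullScript, UnaryOperator<String> serializer) {
        // Assumption: truncation counts characters, not encoded bytes.
        this.truncatedScript = fullScript.length() > MAX_SCRIPT_SIZE
                ? fullScript.substring(0, MAX_SCRIPT_SIZE)
                : fullScript;
        this.serializedScript = serializer.apply(fullScript);
    }

    /** Truncated original script, suitable for display. */
    String getScript() {
        return truncatedScript;
    }

    /** Full script in whatever encoded form the serializer produced. */
    String getSerializedScript() {
        return serializedScript;
    }

    public static void main(String[] args) {
        // Plain Base64 is only a placeholder serializer for this demo.
        UnaryOperator<String> base64Only = s ->
                Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
        ScriptStateSketch state = new ScriptStateSketch("a = LOAD 'input';", base64Only);
        System.out.println(state.getScript());
        System.out.println(state.getSerializedScript());
    }
}
```

With this shape, a caller that wants text for a UI uses getScript() directly, and only code that writes to or reads from the config touches getSerializedScript().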


> Compress pig.script before encoding
> ---
>
> Key: PIG-4554
> URL: https://issues.apache.org/jira/browse/PIG-4554
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>  Labels: newbie
> Fix For: 0.16.0
>
> Attachments: PIG-4554.patch
>
>
>   Currently we truncate the pig script (maxScriptSize = 10240) and base64 
> encode it and store in config. We should remove the truncation and store the 
> full script by compressing and then doing base64 encoding. We already do that 
> for udfcontext, etc. It will save space as it will compress really well and 
> will also give the full pig script while debugging.
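
For illustration, the compress-then-encode step described above could look roughly like the sketch below. It uses gzip and java.util.Base64 from the JDK as assumptions; the class and method names are hypothetical, and the actual patch may well use a different compressor or Base64 implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of Pig.
final class ScriptCodecSketch {
    /** Compresses the script with gzip, then Base64-encodes the result. */
    static String encode(String script) {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            gzip.write(script.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return Base64.getEncoder().encodeToString(compressed.toByteArray());
    }

    /** Reverses encode(): Base64-decodes, then gunzips back to the script text. */
    static String decode(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A made-up, repetitive script to compare encoded sizes.
        StringBuilder script = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            script.append("rel").append(i).append(" = FOREACH input GENERATE $0, $1;\n");
        }
        String plainBase64 = Base64.getEncoder()
                .encodeToString(script.toString().getBytes(StandardCharsets.UTF_8));
        String compressedBase64 = encode(script.toString());
        System.out.println("plain Base64 length:      " + plainBase64.length());
        System.out.println("compressed Base64 length: " + compressedBase64.length());
        System.out.println("round trip ok: " + decode(compressedBase64).equals(script.toString()));
    }
}
```

Base64 alone grows the data by about a third, so compressing first is what makes storing the untruncated script affordable; decode() is the step someone debugging would run on the stored value to get the full script back.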



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4682) Update Pig

2015-09-18 Thread Olaf Flebbe (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805294#comment-14805294
 ] 

Olaf Flebbe commented on PIG-4682:
--

Sorry, wrong queue.

> Update Pig
> --
>
> Key: PIG-4682
> URL: https://issues.apache.org/jira/browse/PIG-4682
> Project: Pig
>  Issue Type: Bug
>Reporter: Olaf Flebbe
>Assignee: Olaf Flebbe
>
> * Update Pig to 0.15
> * Incorporate PIG-4676
> * fix other small issues
> Needed since hive update will break PIG as it is now





[jira] [Resolved] (PIG-4682) Update Pig

2015-09-18 Thread Olaf Flebbe (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olaf Flebbe resolved PIG-4682.
--
Resolution: Invalid






[jira] [Created] (PIG-4682) Update Pig

2015-09-18 Thread Olaf Flebbe (JIRA)
Olaf Flebbe created PIG-4682:


 Summary: Update Pig
 Key: PIG-4682
 URL: https://issues.apache.org/jira/browse/PIG-4682
 Project: Pig
  Issue Type: Bug
Reporter: Olaf Flebbe
Assignee: Olaf Flebbe


* Update Pig to 0.15
* Incorporate PIG-4676
* fix other small issues

Needed since hive update will break PIG as it is now





[jira] [Commented] (PIG-4554) Compress pig.script before encoding

2015-09-18 Thread Sandeep Samdaria (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876305#comment-14876305
 ] 

Sandeep Samdaria commented on PIG-4554:
---

I have added a new variable called {{truncatedScript}} and am storing the 
truncated script in it. 
I have one question: how can I test and verify that the Tez DAG UI is showing 
the truncated script?
I will upload the patch once I am able to verify the above.






[jira] [Commented] (PIG-4673) Built In UDF - REPLACE_MULTI : For a given string, search and replace all occurrences of search keys with replacement values.

2015-09-18 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876190#comment-14876190
 ] 

Rohini Palaniswamy commented on PIG-4673:
-

Good feature [~murali.k.h@gmail.com]. Would you be interested in enhancing 
this UDF for better performance in a new jira?

http://stackoverflow.com/questions/7661460/replace-multiple-substrings-at-once/7661573#7661573
 - You can basically compile the Pattern once and cache it (with a limit on the 
cache if the search strings are variable rather than constant) and do the 
multiple replacements in one go. I have seen a lot of jobs suffer in 
performance because of UDFs that do regex matching without reusing the 
compiled Pattern. 
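
A minimal sketch of that idea, assuming literal (non-regex) search keys: the keys are quoted and joined into one alternation, the compiled Pattern is kept in a small size-capped cache, and all replacements happen in a single pass over the input. The class name, method name and cache limit are hypothetical and not taken from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; not the REPLACE_MULTI implementation from the patch.
final class MultiReplaceSketch {
    private static final int MAX_CACHED_PATTERNS = 100; // arbitrary cap, an assumption

    // Access-ordered LinkedHashMap used as a small LRU cache of compiled patterns.
    private final Map<String, Pattern> patternCache =
            new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                    return size() > MAX_CACHED_PATTERNS;
                }
            };

    /** Replaces every occurrence of each key with its value in one pass over source. */
    String replaceAll(String source, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return source;
        }
        // Quote each key so it is matched literally, then join the keys as alternatives.
        String regex = replacements.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Compile only when this exact set of keys has not been seen before.
        Pattern pattern = patternCache.computeIfAbsent(regex, Pattern::compile);

        Matcher matcher = pattern.matcher(source);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = replacements.get(matcher.group());
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

When the search keys are constant across rows, the cache holds a single entry and Pattern.compile runs once per task rather than once per record. Note that a single pass never rescans text it has already replaced, which can differ from chaining separate replace calls.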

> Built In UDF - REPLACE_MULTI : For a given string, search and replace all 
> occurrences of search keys with replacement values. 
> --
>
> Key: PIG-4673
> URL: https://issues.apache.org/jira/browse/PIG-4673
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Affects Versions: site
>Reporter: Murali Rao
>Assignee: Murali Rao
>Priority: Minor
>  Labels: None
> Fix For: 0.16.0
>
> Attachments: PIG-4673-1.patch, replace_multi_udf.patch
>
>
> Let's say we have a string = 'A1B2C3D4'. Our objective is to replace A with 1, 
> B with 2, C with 3 and D with 4 to derive the string 11223344. 
> Using the existing REPLACE method: 
> REPLACE(REPLACE(REPLACE(REPLACE('A1B2C3D4','A','1'),'B','2'),'C','3'),'D','4')
>  
> With the proposed UDF, REPLACE_MULTI: 
> General syntax: 
> REPLACE_MULTI ( sourceString,  [  search1#replacement1, ... ] )
> REPLACE_MULTI ( 'A1B2C3D4',  [ 'A'#'1','B'#'2', 'C'#'3', 'D'#'4' ] )
> Advantages: 
>   1. Function calls are reduced. 
>   2. Easier to code and more readable.
>   
> Let me know your thoughts/inputs on having this UDF in Piggy Bank. I will take 
> this up based on the feedback.
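
To make the intended result concrete, here is a small sketch that applies the search/replacement pairs one after another, which is what the nested REPLACE calls in the description amount to. It uses literal String.replace and a hypothetical class name; whether REPLACE_MULTI applies its pairs sequentially like this or all at once is not stated in the description, so treat that as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the UDF from the attached patches.
final class SequentialReplaceSketch {
    /** Applies each pair in map order, like nesting one replace call per pair. */
    static String replaceSequentially(String source, Map<String, String> replacements) {
        String result = source;
        for (Map.Entry<String, String> pair : replacements.entrySet()) {
            result = result.replace(pair.getKey(), pair.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("A", "1");
        pairs.put("B", "2");
        pairs.put("C", "3");
        pairs.put("D", "4");
        System.out.println(replaceSequentially("A1B2C3D4", pairs)); // prints 11223344
    }
}
```

With sequential application, later pairs also see text produced by earlier ones: the pairs A to B and then B to C turn "AB" into "CC". That ordering behavior is worth pinning down in the UDF's documentation.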





[jira] [Updated] (PIG-4554) Compress pig.script before encoding

2015-09-18 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4554:

Assignee: Sandeep Samdaria

> Compress pig.script before encoding
> ---
>
> Key: PIG-4554
> URL: https://issues.apache.org/jira/browse/PIG-4554
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Sandeep Samdaria
>  Labels: newbie
> Fix For: 0.16.0
>
> Attachments: PIG-4554.patch
>
>
>   Currently we truncate the pig script (maxScriptSize = 10240) and base64 
> encode it and store in config. We should remove the truncation and store the 
> full script by compressing and then doing base64 encoding. We already do that 
> for udfcontext, etc. It will save space as it will compress really well and 
> will also give the full pig script while debugging.





[jira] [Commented] (PIG-4673) Built In UDF - REPLACE_MULTI : For a given string, search and replace all occurrences of search keys with replacement values.

2015-09-18 Thread Murali Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876840#comment-14876840
 ] 

Murali Rao commented on PIG-4673:
-

[~rohini] : Thanks for your inputs, will check on the performance and will take 
necessary actions. 






[jira] [Updated] (PIG-4554) Compress pig.script before encoding

2015-09-18 Thread Sandeep Samdaria (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Samdaria updated PIG-4554:
--
Attachment: PIG-4554-2.patch

Made changes as suggested by Rohini.

> Compress pig.script before encoding
> ---
>
> Key: PIG-4554
> URL: https://issues.apache.org/jira/browse/PIG-4554
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Sandeep Samdaria
>  Labels: newbie
> Fix For: 0.16.0
>
> Attachments: PIG-4554-2.patch, PIG-4554.patch
>
>
>   Currently we truncate the pig script (maxScriptSize = 10240) and base64 
> encode it and store in config. We should remove the truncation and store the 
> full script by compressing and then doing base64 encoding. We already do that 
> for udfcontext, etc. It will save space as it will compress really well and 
> will also give the full pig script while debugging.





[jira] [Updated] (PIG-4677) Display failure information on stop on failure

2015-09-18 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated PIG-4677:
---
Attachment: PIG-4677.2.patch

Updated the patch.

> Display failure information on stop on failure
> --
>
> Key: PIG-4677
> URL: https://issues.apache.org/jira/browse/PIG-4677
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: PIG-4677.2.patch, PIG-4677.patch
>
>
> When the stop-on-failure option is specified, Pig exits abruptly without 
> displaying the job stats or failed-job information that it usually shows in 
> case of failures.
> {code}
> 2015-06-04 20:35:38,170 [uber-SubtaskRunner] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - 9% complete
> 2015-06-04 20:35:38,171 [uber-SubtaskRunner] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - Running jobs are [job_1428329756093_3741748,job_1428329756093_3741752,job_1428329756093_3741753,job_1428329756093_3741754,job_1428329756093_3741756]
> 2015-06-04 20:35:40,201 [uber-SubtaskRunner] ERROR org.apache.pig.tools.grunt.Grunt  - ERROR 6017: Job failed!
> Hadoop Job IDs executed by Pig: job_1428329756093_3739816,job_1428329756093_3741752,job_1428329756093_3739814,job_1428329756093_3741748,job_1428329756093_3741756,job_1428329756093_3741753,job_1428329756093_3741754
> <<< Invocation of Main class completed <<<
> {code}





[jira] [Updated] (PIG-4677) Display failure information on stop on failure

2015-09-18 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated PIG-4677:
---
Status: Patch Available  (was: Open)






[jira] [Updated] (PIG-4554) Compress pig.script before encoding

2015-09-18 Thread Sandeep Samdaria (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Samdaria updated PIG-4554:
--
Status: Patch Available  (was: Open)






[jira] [Commented] (PIG-4677) Display failure information on stop on failure

2015-09-18 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876652#comment-14876652
 ] 

Rohini Palaniswamy commented on PIG-4677:
-

It should still be assertFalse(server.existsFile("done")); . With this change, 
the script still does not stop executing when it is compiled in two phases due 
to fs statements. You will have to make checkStopOnFailure return a boolean 
instead of void and, if it returned true, throw new 
ExecException(msg.toString(), errCode, PigException.REMOTE_ENVIRONMENT); at the 
end of launchPig instead of returning pigStats. 
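
The control flow being suggested can be sketched as follows. Everything here is a simplified stand-in: the nested ExecException only mimics the shape of Pig's exception (the real one also takes an error source), the stats are reduced to a string, and the error code is simply the 6017 seen in the log excerpt of the issue description.

```java
import java.util.List;

// Hypothetical stand-in for the launcher; not the real MapReduceLauncher.
final class StopOnFailureSketch {
    /** Simplified stand-in for Pig's ExecException. */
    static final class ExecException extends Exception {
        final int errorCode;

        ExecException(String message, int errorCode) {
            super(message);
            this.errorCode = errorCode;
        }
    }

    private final boolean stopOnFailure;

    StopOnFailureSketch(boolean stopOnFailure) {
        this.stopOnFailure = stopOnFailure;
    }

    /** Returns true when execution should stop: the option is set and a job failed. */
    boolean checkStopOnFailure(List<String> failedJobIds) {
        return stopOnFailure && !failedJobIds.isEmpty();
    }

    /** Returns a stats summary, or throws when checkStopOnFailure says to stop. */
    String launchPig(List<String> failedJobIds) throws ExecException {
        // The stats are built before the check so they are available either way.
        String stats = "failed jobs: " + failedJobIds;
        if (checkStopOnFailure(failedJobIds)) {
            // Throwing, rather than returning, lets the caller abort later phases too.
            throw new ExecException("Stopping execution on job failure. " + stats, 6017);
        }
        return stats;
    }
}
```

Because the failure now surfaces as an exception from launchPig, a caller that runs the script in several phases has a natural point at which to stop before starting the next phase.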



