[jira] [Commented] (PIG-4554) Compress pig.script before encoding
[ https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876225#comment-14876225 ]

Rohini Palaniswamy commented on PIG-4554:
-----------------------------------------

bq. Misunderstood "Submit Patch" as the workflow to submit the patch.

You were right the first time. Once you upload the patch, you click on "Submit Patch" to make it "Patch Available".

I have a few comments based on the new feature we added to show the uncompressed Pig script in the Tez UI.

1) ScriptState.java
- Can you also retain the old truncated script in a new variable?
- Can you rename the current getScript() to getSerializedScript() and have TezScriptState and MRScriptState refer to that?
- Can you change getScript() to now return the truncated original script?

2) TezJobCompiler.java
Change
    String script = new String(Base64.decodeBase64(TezScriptState.get().getScript()));
    tezDag.setDAGInfo(createDagInfo(script));
to
    tezDag.setDAGInfo(createDagInfo(TezScriptState.get().getScript())); // The truncated uncompressed script is shown in the Tez DAG UI.

I have seen a lot of huge scripts, so it is better to show the truncated one here and have folks decompress pig.script if the full script is needed.

> Compress pig.script before encoding
> -----------------------------------
>
> Key: PIG-4554
> URL: https://issues.apache.org/jira/browse/PIG-4554
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.14.0
> Reporter: Rohini Palaniswamy
> Labels: newbie
> Fix For: 0.16.0
>
> Attachments: PIG-4554.patch
>
>
> Currently we truncate the pig script (maxScriptSize = 10240) and base64
> encode it and store in config. We should remove the truncation and store the
> full script by compressing and then doing base64 encoding. We already do that
> for udfcontext, etc. It will save space as it will compress really well and
> will also give the full pig script while debugging.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
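The two ideas in the PIG-4554 discussion above (compress the full script before Base64-encoding it, and keep a separate truncated copy for display) can be sketched roughly as follows. This is only an illustration under stated assumptions: `ScriptStateSketch`, `compressAndEncode`, and `decodeAndDecompress` are hypothetical names, not Pig's classes or methods. Gzip and `java.util.Base64` were picked to keep the example self-contained; the compression and Base64 helper Pig actually uses may differ. Truncating by character count is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for ScriptState; not Pig's actual class.
class ScriptStateSketch {
    // Size limit quoted in the issue description (maxScriptSize = 10240).
    static final int MAX_SCRIPT_SIZE = 10240;

    private final String truncatedScript;   // plain text, meant for display
    private final String serializedScript;  // full script, gzip + Base64

    ScriptStateSketch(String fullScript) {
        // Assumption: truncate by character count.
        this.truncatedScript = fullScript.length() > MAX_SCRIPT_SIZE
                ? fullScript.substring(0, MAX_SCRIPT_SIZE)
                : fullScript;
        this.serializedScript = compressAndEncode(fullScript);
    }

    // Returns the truncated, uncompressed script.
    String getScript() {
        return truncatedScript;
    }

    // Returns the full script, compressed and then Base64-encoded.
    String getSerializedScript() {
        return serializedScript;
    }

    static String compressAndEncode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static String decodeAndDecompress(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this split, a UI caller would read `getScript()` directly, while anyone needing the full text would pass the stored value through `decodeAndDecompress`.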
[jira] [Commented] (PIG-4682) Update Pig
[ https://issues.apache.org/jira/browse/PIG-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805294#comment-14805294 ]

Olaf Flebbe commented on PIG-4682:
----------------------------------

Sorry wrong queue

> Update Pig
> ----------
>
> Key: PIG-4682
> URL: https://issues.apache.org/jira/browse/PIG-4682
> Project: Pig
> Issue Type: Bug
> Reporter: Olaf Flebbe
> Assignee: Olaf Flebbe
>
> * Update Pig to 0.15
> * Incorporate PIG-4676
> * fix other small issues
> Needed since hive update will break PIG as it is now
[jira] [Resolved] (PIG-4682) Update Pig
[ https://issues.apache.org/jira/browse/PIG-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olaf Flebbe resolved PIG-4682.
------------------------------
Resolution: Invalid
[jira] [Created] (PIG-4682) Update Pig
Olaf Flebbe created PIG-4682:
-----------------------------

Summary: Update Pig
Key: PIG-4682
URL: https://issues.apache.org/jira/browse/PIG-4682
Project: Pig
Issue Type: Bug
Reporter: Olaf Flebbe
Assignee: Olaf Flebbe

* Update Pig to 0.15
* Incorporate PIG-4676
* fix other small issues

Needed since hive update will break PIG as it is now
[jira] [Commented] (PIG-4554) Compress pig.script before encoding
[ https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876305#comment-14876305 ]

Sandeep Samdaria commented on PIG-4554:
---------------------------------------

I have added a new variable called {{truncatedScript}} that stores the truncated script.

I have one question: how can I test and verify that the Tez DAG UI is showing the truncated script? I will upload the patch once I am able to verify that.
[jira] [Commented] (PIG-4673) Built In UDF - REPLACE_MULTI : For a given string, search and replace all occurrences of search keys with replacement values.
[ https://issues.apache.org/jira/browse/PIG-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876190#comment-14876190 ]

Rohini Palaniswamy commented on PIG-4673:
-----------------------------------------

Good feature [~murali.k.h@gmail.com]. Would you be interested in enhancing this UDF for better performance in a new jira?

http://stackoverflow.com/questions/7661460/replace-multiple-substrings-at-once/7661573#7661573 - You can basically compile the Pattern once and cache it (with a limit on the cache size if the search strings are variable rather than constant) and do the multiple replacements in one go. I have seen a lot of jobs suffer in performance because of UDFs that do regex matching without reusing the compiled Pattern.

> Built In UDF - REPLACE_MULTI : For a given string, search and replace all occurrences of search keys with replacement values.
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-4673
> URL: https://issues.apache.org/jira/browse/PIG-4673
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: site
> Reporter: Murali Rao
> Assignee: Murali Rao
> Priority: Minor
> Labels: None
> Fix For: 0.16.0
>
> Attachments: PIG-4673-1.patch, replace_multi_udf.patch
>
>
> Let's say we have a string = 'A1B2C3D4'. Our objective is to replace A with 1,
> B with 2, C with 3 and D with 4 to derive the string 11223344.
> Using existing REPLACE method
> REPLACE(REPLACE(REPLACE(REPLACE('A1B2C3D4','A','1'),'B','2'),'C','3'),'D','4')
>
> With proposed UDF : REPLACE_MULTI method
> General Syntax :
> REPLACE_MULTI ( sourceString, [ search1#replacement1, ... ] )
> REPLACE_MULTI ( 'A1B2C3D4', [ 'A'#'1','B'#'2', 'C'#'3', 'D'#'4' ] )
> Advantage :
> 1. Function calls are reduced.
> 2. Easier to code and more readable.
>
> Let me know your thoughts/ inputs on having this UDF in Piggy Bank. Will take
> this up based on this.
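The suggestion in the comment above (compile one Pattern for all search keys, cache it with a size limit, and replace everything in a single pass) might look roughly like this in plain Java. It is a sketch, not the PIG-4673 patch: `ReplaceMultiSketch`, `replaceMulti`, and the cache limit of 64 are hypothetical, and treating search keys as literal strings (via `Pattern.quote`) is an assumption about the intended semantics. Replacement values are assumed to be non-null, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating a cached, single-pass multi-replace.
class ReplaceMultiSketch {
    // Arbitrary illustrative bound on the number of cached patterns.
    static final int MAX_CACHED_PATTERNS = 64;

    // Access-ordered LinkedHashMap used as a small LRU cache.
    private final Map<String, Pattern> patternCache =
        new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                return size() > MAX_CACHED_PATTERNS;
            }
        };

    String replaceMulti(String source, Map<String, String> replacements) {
        if (source == null || replacements == null || replacements.isEmpty()) {
            return source;
        }
        // Sort keys (longest first, then alphabetically) so the regex is
        // deterministic and longer keys win over their own prefixes.
        List<String> keys = new ArrayList<>(replacements.keySet());
        keys.sort((a, b) -> a.length() != b.length()
                ? Integer.compare(b.length(), a.length())
                : a.compareTo(b));

        StringBuilder regex = new StringBuilder();
        for (String key : keys) {
            if (key.isEmpty()) {
                continue; // an empty key would match at every position
            }
            if (regex.length() > 0) {
                regex.append('|');
            }
            regex.append(Pattern.quote(key)); // literal match, no regex syntax
        }
        if (regex.length() == 0) {
            return source;
        }

        // Compile only when this key set has not been seen recently.
        Pattern pattern = patternCache.computeIfAbsent(regex.toString(), Pattern::compile);

        Matcher matcher = pattern.matcher(source);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = replacements.get(matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

One design consequence worth noting: a single pass never re-scans text it has already produced, so the result can differ from nested REPLACE calls when a replacement value itself contains a later search key.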
[jira] [Updated] (PIG-4554) Compress pig.script before encoding
[ https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4554:
------------------------------------
Assignee: Sandeep Samdaria
[jira] [Commented] (PIG-4673) Built In UDF - REPLACE_MULTI : For a given string, search and replace all occurrences of search keys with replacement values.
[ https://issues.apache.org/jira/browse/PIG-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876840#comment-14876840 ]

Murali Rao commented on PIG-4673:
---------------------------------

[~rohini] : Thanks for your input. I will check the performance and take the necessary action.
[jira] [Updated] (PIG-4554) Compress pig.script before encoding
[ https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Samdaria updated PIG-4554:
----------------------------------
Attachment: PIG-4554-2.patch

Made changes as suggested by Rohini.
[jira] [Updated] (PIG-4677) Display failure information on stop on failure
[ https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated PIG-4677:
---------------------------
Attachment: PIG-4677.2.patch

Updated the patch.

> Display failure information on stop on failure
> ----------------------------------------------
>
> Key: PIG-4677
> URL: https://issues.apache.org/jira/browse/PIG-4677
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Mit Desai
> Assignee: Mit Desai
> Attachments: PIG-4677.2.patch, PIG-4677.patch
>
>
> When stop on failure option is specified, pig abruptly exits without
> displaying any job stats or failed job information which it usually does in
> case of failures.
> {code}
> 2015-06-04 20:35:38,170 [uber-SubtaskRunner] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 9% complete
> 2015-06-04 20:35:38,171 [uber-SubtaskRunner] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1428329756093_3741748,job_1428329756093_3741752,job_1428329756093_3741753,job_1428329756093_3741754,job_1428329756093_3741756]
> 2015-06-04 20:35:40,201 [uber-SubtaskRunner] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6017: Job failed!
> Hadoop Job IDs executed by Pig: job_1428329756093_3739816,job_1428329756093_3741752,job_1428329756093_3739814,job_1428329756093_3741748,job_1428329756093_3741756,job_1428329756093_3741753,job_1428329756093_3741754
> <<< Invocation of Main class completed <<<
> {code}
[jira] [Updated] (PIG-4677) Display failure information on stop on failure
[ https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai updated PIG-4677:
---------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (PIG-4554) Compress pig.script before encoding
[ https://issues.apache.org/jira/browse/PIG-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Samdaria updated PIG-4554:
----------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (PIG-4677) Display failure information on stop on failure
[ https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876652#comment-14876652 ]

Rohini Palaniswamy commented on PIG-4677:
-----------------------------------------

It should still be assertFalse(server.existsFile("done"));. With this change, it is still not stopping execution of the script when it is compiled in two phases due to fs statements.

You will have to make checkStopOnFailure return a boolean instead of void and, if checkStopOnFailure returned true, throw new ExecException(msg.toString(), errCode, PigException.REMOTE_ENVIRONMENT); at the end of launchPig instead of returning pigStats.
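The control flow suggested in the comment above can be pictured with a small standalone sketch: the check reports whether execution must stop, the launcher prints the failure information first, and only then throws instead of returning the stats. Everything here is a simplified stand-in: `ExecExceptionSketch` and `LauncherSketch` are hypothetical, the stats are just a string, and the real `launchPig`, `checkStopOnFailure`, and `ExecException` signatures in Pig are not reproduced. The error code 6017 is taken from the log in the issue description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Pig's ExecException.
class ExecExceptionSketch extends Exception {
    final int errorCode;

    ExecExceptionSketch(String message, int errorCode) {
        super(message);
        this.errorCode = errorCode;
    }
}

// Hypothetical launcher showing only the stop-on-failure control flow.
class LauncherSketch {
    private final boolean stopOnFailure;
    private final List<String> failedJobs = new ArrayList<>();

    LauncherSketch(boolean stopOnFailure) {
        this.stopOnFailure = stopOnFailure;
    }

    void recordFailure(String jobId) {
        failedJobs.add(jobId);
    }

    // Returns true when the caller must abort, rather than returning void.
    boolean checkStopOnFailure() {
        return stopOnFailure && !failedJobs.isEmpty();
    }

    // Stand-in for launchPig: the returned string plays the role of pigStats.
    String launchPig() throws ExecExceptionSketch {
        boolean mustStop = checkStopOnFailure();
        String stats = "Failed jobs: " + failedJobs;

        // Display the failure information before deciding how to exit.
        System.out.println(stats);

        if (mustStop) {
            // Throwing here keeps a later phase of the script from running.
            throw new ExecExceptionSketch("Job failed! " + stats, 6017);
        }
        return stats;
    }
}
```

Because the exception is raised only after the stats are printed, the user still sees which jobs failed, and a script compiled in two phases does not continue into the second phase.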