[jira] [Commented] (PIG-3461) Rewrite PartitionFilterOptimizer to make it work for all the cases
[ https://issues.apache.org/jira/browse/PIG-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770466#comment-13770466 ]

Aniket Mokashi commented on PIG-3461:
-------------------------------------

RB: https://reviews.apache.org/r/14196/

> Rewrite PartitionFilterOptimizer to make it work for all the cases
> ------------------------------------------------------------------
>
>                 Key: PIG-3461
>                 URL: https://issues.apache.org/jira/browse/PIG-3461
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11.1
>            Reporter: Aniket Mokashi
>            Assignee: Aniket Mokashi
>             Fix For: 0.12
>
>         Attachments: PIG-3461-2.patch, PIG-3461-4.patch
>
>
> Current algorithm for Partition Filter pushdown identification fails in several corner cases. We need to rewrite its logic so that it works in all cases and does the maximum possible filter pushdown.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-3461) Rewrite PartitionFilterOptimizer to make it work for all the cases
[ https://issues.apache.org/jira/browse/PIG-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3461:
--------------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (PIG-3461) Rewrite PartitionFilterOptimizer to make it work for all the cases
[ https://issues.apache.org/jira/browse/PIG-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3461:
--------------------------------

    Attachment: PIG-3461-4.patch
[jira] [Created] (PIG-3465) Fix problems with LogicalExpressionSimplifier and turn on the optimizer by default
Aniket Mokashi created PIG-3465:
-----------------------------------

             Summary: Fix problems with LogicalExpressionSimplifier and turn on the optimizer by default
                 Key: PIG-3465
                 URL: https://issues.apache.org/jira/browse/PIG-3465
             Project: Pig
          Issue Type: Bug
            Reporter: Aniket Mokashi
[jira] [Updated] (PIG-3464) Mark ExecType and ExecutionEngine interfaces as evolving
[ https://issues.apache.org/jira/browse/PIG-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3464:
-------------------------------

    Attachment: PIG-3464-1.patch

The attached patch adds the evolving annotation to the new interfaces: ExecType and ExecutionEngine.

> Mark ExecType and ExecutionEngine interfaces as evolving
> --------------------------------------------------------
>
>                 Key: PIG-3464
>                 URL: https://issues.apache.org/jira/browse/PIG-3464
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.12
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12
>
>         Attachments: PIG-3464-1.patch
>
[jira] [Updated] (PIG-3464) Mark ExecType and ExecutionEngine interfaces as evolving
[ https://issues.apache.org/jira/browse/PIG-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3464:
-------------------------------

    Status: Patch Available  (was: Open)
[jira] [Created] (PIG-3464) Mark ExecType and ExecutionEngine interfaces as evolving
Cheolsoo Park created PIG-3464:
----------------------------------

             Summary: Mark ExecType and ExecutionEngine interfaces as evolving
                 Key: PIG-3464
                 URL: https://issues.apache.org/jira/browse/PIG-3464
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.12
            Reporter: Cheolsoo Park
            Assignee: Cheolsoo Park
             Fix For: 0.12
[jira] [Commented] (PIG-3457) Provide backward compatibility for PigStatsUtil and JobStats
[ https://issues.apache.org/jira/browse/PIG-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770286#comment-13770286 ]

Cheolsoo Park commented on PIG-3457:
------------------------------------

[~daijy], isn't your patch ready to commit? It looks good to me.

> Provide backward compatibility for PigStatsUtil and JobStats
> ------------------------------------------------------------
>
>                 Key: PIG-3457
>                 URL: https://issues.apache.org/jira/browse/PIG-3457
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.12
>
>         Attachments: PIG-3457-1.patch
>
>
> PIG-3419 restructured PigStatsUtil, which breaks downstream projects such as Oozie. Oozie uses PigStatsUtil.{HDFS_BYTES_WRITTEN, MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS, REDUCE_INPUT_RECORDS, REDUCE_OUTPUT_RECORDS}. We need to provide a backward-compatible way to access them.
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (15 issues)

Subscriber: pigdaily

Key         Summary
PIG-3455    Pig 0.11.1 OutOfMemory error
            https://issues.apache.org/jira/browse/PIG-3455
PIG-3451    EvalFunc ctor reflection to determine value of type param T is brittle
            https://issues.apache.org/jira/browse/PIG-3451
PIG-3449    Move JobCreationException to org.apache.pig.backend.hadoop.executionengine
            https://issues.apache.org/jira/browse/PIG-3449
PIG-3448    Tez backend layout
            https://issues.apache.org/jira/browse/PIG-3448
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3434    Null subexpression in bincond nullifies outer tuple (or bag)
            https://issues.apache.org/jira/browse/PIG-3434
PIG-3388    No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage
            https://issues.apache.org/jira/browse/PIG-3388
PIG-3367    Add assert keyword (operator) in pig
            https://issues.apache.org/jira/browse/PIG-3367
PIG-3325    Adding a tuple to a bag is slow
            https://issues.apache.org/jira/browse/PIG-3325
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-3455) Pig 0.11.1 OutOfMemory error
[ https://issues.apache.org/jira/browse/PIG-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770256#comment-13770256 ]

Bill Graham commented on PIG-3455:
----------------------------------

+1, much better.

> Pig 0.11.1 OutOfMemory error
> ----------------------------
>
>                 Key: PIG-3455
>                 URL: https://issues.apache.org/jira/browse/PIG-3455
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Shubham Chopra
>            Priority: Critical
>             Fix For: 0.12, 0.11.2
>
>         Attachments: PIG-3455-1.patch
>
>
> When running Pig on a relatively large script (around 1.5k lines, 85 assignments), Pig fails with the following error even before any jobs are fired:
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error. Java heap space
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2882)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>         at java.lang.StringBuilder.append(StringBuilder.java:119)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirstLP(LogicalPlanPrinter.java:83)
>         at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.visit(LogicalPlanPrinter.java:69)
>         at org.apache.pig.newplan.logical.relational.LogicalPlan.getSignature(LogicalPlan.java:122)
>         at org.apache.pig.PigServer.execute(PigServer.java:1237)
>         at org.apache.pig.PigServer.executeBatch(PigServer.java:333)
>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:604)
>         at org.apache.pig.Main.main(Main.java:157)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> The same script works fine with Pig-0.10.1.
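The stack trace above shows the heap running out while a StringBuilder grows inside LogicalPlanPrinter.depthFirstLP, called from LogicalPlan.getSignature. One general way to keep memory flat in that situation is to feed each node's text into a digest as it is visited instead of concatenating the whole plan first. The following is only a sketch of that idea, assuming a signature needs to be stable rather than human-readable. PlanNode and StreamingSignature are hypothetical names, not Pig classes, and this is not necessarily what PIG-3455-1.patch does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a plan operator; not a Pig class. */
final class PlanNode {
    final String description;
    final List<PlanNode> children = new ArrayList<>();

    PlanNode(String description) {
        this.description = description;
    }

    /** Adds a child and returns this node so calls can be chained. */
    PlanNode add(PlanNode child) {
        children.add(child);
        return this;
    }
}

final class StreamingSignature {
    private StreamingSignature() {}

    /** Walks the plan depth first and returns a hex SHA-256 of the visited nodes. */
    static String signature(PlanNode root) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        visit(root, 0, digest);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    private static void visit(PlanNode node, int depth, MessageDigest digest) {
        // Only the text of the current node is held in memory at any time.
        String line = depth + ":" + node.description + "\n";
        digest.update(line.getBytes(StandardCharsets.UTF_8));
        for (PlanNode child : node.children) {
            visit(child, depth + 1, digest);
        }
    }
}
```

With this shape the output is always 64 hex characters regardless of plan size. The walk still visits every node, so it bounds memory, not time.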
[jira] [Updated] (PIG-3455) Pig 0.11.1 OutOfMemory error
[ https://issues.apache.org/jira/browse/PIG-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3455:
------------------------------------

    Attachment: PIG-3455-1.patch
[jira] [Updated] (PIG-3455) Pig 0.11.1 OutOfMemory error
[ https://issues.apache.org/jira/browse/PIG-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3455:
------------------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3367:
--------------------------------

    Status: Open  (was: Patch Available)

> Add assert keyword (operator) in pig
> ------------------------------------
>
>                 Key: PIG-3367
>                 URL: https://issues.apache.org/jira/browse/PIG-3367
>             Project: Pig
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Aniket Mokashi
>            Assignee: Aniket Mokashi
>             Fix For: 0.12
>
>         Attachments: PIG-3367-2.patch, PIG-3367.patch
>
>
> The assert operator can be used for data validation. With assert you can write a script such as the following:
> {code}
> a = load 'something' as (a0:int, a1:int);
> assert a by a0 > 0, 'a cant be negative for reasons';
> {code}
> This script will fail if the assertion is violated.
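The semantics described in the issue (the script fails as soon as a record violates the condition) can be illustrated outside Pig with a few lines of plain Java. This is a sketch of the behaviour only, assuming rows are held in a list. AssertSketch and assertAll are hypothetical names and say nothing about how the patch implements the operator.

```java
import java.util.List;
import java.util.function.Predicate;

final class AssertSketch {
    private AssertSketch() {}

    /**
     * Returns the rows unchanged when every row satisfies the condition;
     * otherwise fails with the user-supplied message.
     */
    static <T> List<T> assertAll(List<T> rows, Predicate<T> condition, String message) {
        for (T row : rows) {
            if (!condition.test(row)) {
                throw new IllegalStateException("Assertion violated: " + message);
            }
        }
        return rows;
    }
}
```

A call such as `AssertSketch.assertAll(rows, r -> r[0] > 0, "a cant be negative for reasons")` mirrors the `assert a by a0 > 0, '...'` statement quoted above.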
[jira] [Updated] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3367:
--------------------------------

    Attachment: PIG-3367-2.patch

Incorporated code review comments
[jira] [Updated] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3367:
--------------------------------

    Status: Patch Available  (was: Open)
[jira] [Commented] (PIG-3287) MultiQueryOptimizer can prevent CombinerOptimizer from working
[ https://issues.apache.org/jira/browse/PIG-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770022#comment-13770022 ]

Christon DeWan commented on PIG-3287:
-------------------------------------

This is a very long workflow and needs both optimizations to work effectively. For now I've refactored my flow to avoid needing both at once.

> MultiQueryOptimizer can prevent CombinerOptimizer from working
> --------------------------------------------------------------
>
>                 Key: PIG-3287
>                 URL: https://issues.apache.org/jira/browse/PIG-3287
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Christon DeWan
>
> The CombinerOptimizer does not operate on the script below. As a result, all work is done in the reducer(s), killing performance. Removing one STORE or refactoring the query to use a single FOREACH after the group allows the CombinerOptimizer to work.
> {noformat}
> %declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i 5; done) | hadoop fs -put - /tmp/test_data.tsv; true'`
> s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);
> grouped = GROUP s BY g;
> counted = FOREACH grouped GENERATE flatten($0), COUNT_STAR($1);
> STORE counted INTO '/tmp/test_count';
> summed = FOREACH grouped GENERATE flatten($0), SUM($1.n);
> STORE summed INTO '/tmp/test_sum';
> FS -rmr /tmp/test_{data.tsv,count,sum}
> {noformat}
[jira] [Commented] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770005#comment-13770005 ]

Daniel Dai commented on PIG-2417:
---------------------------------

Thanks. Wish to check this in before branch.

> Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2417
>                 URL: https://issues.apache.org/jira/browse/PIG-2417
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.12
>            Reporter: Jeremy Karn
>             Fix For: 0.12
>
>         Attachments: PIG-2417-4.patch, PIG-2417-5.patch, PIG-2417-6.patch, PIG-2417-7.patch, PIG-2417-8.patch, PIG-2417-e2e.patch, streaming2.patch, streaming3.patch, streaming.patch
>
>
> The goal of Streaming UDFs is to allow users to easily write UDFs in scripting languages with no JVM implementation or a limited JVM implementation. The initial proposal is outlined here: https://cwiki.apache.org/confluence/display/PIG/StreamingUDFs.
> In order to implement this we need new syntax to distinguish a streaming UDF from an embedded JVM UDF. I'd propose something like the following (although I'm not sure 'language' is the best term to be using):
> {code}define my_streaming_udfs language('python') ship('my_streaming_udfs.py'){code}
> We'll also need a language-specific controller script that gets shipped to the cluster and is responsible for reading the input stream, deserializing the input data, passing it to the user-written script, serializing that script's output, and writing that to the output stream.
> Finally, we'll need to add a StreamingUDF class that extends EvalFunc. This class will likely share some of the existing code in POStream and ExecutableManager (where it makes sense to pull out shared code) to stream data to/from the controller script.
> One alternative approach to creating the StreamingUDF EvalFunc is to use the POStream operator directly. This would involve inserting the POStream operator instead of the POUserFunc operator whenever we encountered a streaming UDF while building the physical plan. This approach seemed problematic because there would need to be a lot of changes in order to support POStream in all of the places we want to be able to use UDFs (for example, to operate on a single field inside of a foreach statement).
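The controller loop described in the issue (read a record, deserialize it, call the user function, serialize the result, write it out) has a simple shape. The sketch below shows that shape in Java purely for illustration; a real controller would be written in the target scripting language. It assumes newline-delimited records with tab-separated fields, and ControllerSketch and run are hypothetical names that do not come from the patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

final class ControllerSketch {
    private ControllerSketch() {}

    /** Reads records from in, applies udf to each, and writes the results to out. */
    static void run(InputStream in, OutputStream out, Function<String[], String[]> udf) {
        try {
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", -1);   // deserialize one record
                String[] result = udf.apply(fields);      // call the user-written function
                writer.write(String.join("\t", result));  // serialize the output
                writer.write('\n');
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing `-1` to `split` keeps trailing empty fields, so a record ending in a tab still yields the expected number of columns.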
[jira] [Updated] (PIG-3461) Rewrite PartitionFilterOptimizer to make it work for all the cases
[ https://issues.apache.org/jira/browse/PIG-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3461:
--------------------------------

    Assignee: Aniket Mokashi
      Status: Patch Available  (was: Open)
[jira] [Commented] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769981#comment-13769981 ]

Jeremy Karn commented on PIG-2417:
----------------------------------

I'll update the patch (probably tomorrow) to take advantage of PIG-3255. I think the only outstanding comment in the review board is how the logging works with Hadoop2. I'm hoping to get a chance to test that in the next couple of days.
[jira] [Created] (PIG-3463) Pig should use hadoop local mode for small jobs
Aniket Mokashi created PIG-3463:
-----------------------------------

             Summary: Pig should use hadoop local mode for small jobs
                 Key: PIG-3463
                 URL: https://issues.apache.org/jira/browse/PIG-3463
             Project: Pig
          Issue Type: New Feature
          Components: impl
    Affects Versions: 0.11.1
            Reporter: Aniket Mokashi
             Fix For: 0.12

Pig should use Hadoop local mode for small jobs: few mappers, few reducers, and a few MB of data.
[jira] [Commented] (PIG-3367) Add assert keyword (operator) in pig
[ https://issues.apache.org/jira/browse/PIG-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769985#comment-13769985 ]

Julien Le Dem commented on PIG-3367:
------------------------------------

Looks good to me. Is there a way you can factor out some of the content of buildAssertOp()? It looks like some of this would be common with other methods.
[jira] [Updated] (PIG-3461) Rewrite PartitionFilterOptimizer to make it work for all the cases
[ https://issues.apache.org/jira/browse/PIG-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3461:
--------------------------------

    Attachment: PIG-3461-2.patch
[jira] [Commented] (PIG-2417) Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
[ https://issues.apache.org/jira/browse/PIG-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769970#comment-13769970 ]

Daniel Dai commented on PIG-2417:
---------------------------------

[~jeremykarn] With PIG-3255 check in, do you want to add this optimization? Also can you respond to my comments in review board.
[jira] [Updated] (PIG-3461) Rewrite PartitionFilterOptimizer to make it work for all the cases
[ https://issues.apache.org/jira/browse/PIG-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-3461:
--------------------------------
    Status: Open (was: Patch Available)

Some extra code (EvalFunc) was mistakenly added to the patch. I will resubmit a refactored patch soon. Canceling the patch in the meantime.

> Rewrite PartitionFilterOptimizer to make it work for all the cases
> -----
>
> Key: PIG-3461
> URL: https://issues.apache.org/jira/browse/PIG-3461
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.11.1
> Reporter: Aniket Mokashi
> Assignee: Aniket Mokashi
> Fix For: 0.12
>
> Attachments: PIG-3461-2.patch
>
>
> Current algorithm for Partition Filter pushdown identification fails in several corner cases. We need to rewrite its logic so that it works in all cases and does the maximum possible filter pushdown.
[jira] [Updated] (PIG-3255) Avoid extra byte array copies in streaming
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3255:
------------------------------------
    Summary: Avoid extra byte array copies in streaming (was: Avoid extra byte array copy in streaming deserialize)

Committed to trunk. Thanks Alan, Daniel and Jeremy

> Avoid extra byte array copies in streaming
> -----
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch, PIG-3255-4.patch, PIG-3255-5.patch
>
>
> PigStreaming.java:
> public Tuple deserialize(byte[] bytes) throws IOException {
>     Text val = new Text(bytes);
>     return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the bytes
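The idea in the description above (build the fields straight from the incoming byte array instead of first copying it into a Text) can be illustrated with a minimal, self-contained sketch. This is not the committed PIG-3255 patch: the class name DelimitedBytes and its split method are hypothetical, the delimiter is assumed to be a single byte, and fields are returned as Java strings rather than Pig tuple fields so that the example runs without Pig or Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Pig.
final class DelimitedBytes {

    // Scans the array once and cuts a field at every delimiter byte.
    // The input array is never copied as a whole; only the bytes of
    // each individual field are decoded.
    static List<String> split(byte[] bytes, byte delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == delimiter) {
                fields.add(new String(bytes, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // Everything after the last delimiter is the final field.
        fields.add(new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8));
        return fields;
    }

    public static void main(String[] args) {
        byte[] row = "a\tb\t\tc".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(row, (byte) '\t'));
    }
}
```

The single pass keeps offsets into the original array, so the only allocations are the per-field values themselves. An empty field between two adjacent delimiters is kept as an empty string.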
[jira] [Updated] (PIG-3255) Avoid extra byte array copies in streaming
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3255:
------------------------------------
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

> Avoid extra byte array copies in streaming
> -----
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch, PIG-3255-4.patch, PIG-3255-5.patch
>
>
> PigStreaming.java:
> public Tuple deserialize(byte[] bytes) throws IOException {
>     Text val = new Text(bytes);
>     return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the bytes
[jira] [Created] (PIG-3462) POForEach evaluates POProject one by one
Rohini Palaniswamy created PIG-3462:
---------------------------------------

Summary: POForEach evaluates POProject one by one
Key: PIG-3462
URL: https://issues.apache.org/jira/browse/PIG-3462
Project: Pig
Issue Type: Improvement
Affects Versions: 0.11.1
Reporter: Rohini Palaniswamy

A = load '/tmp/data' using PigStorage() as (a1, a2, a3);
B = foreach A generate a1,a2,a3;

generates the plan as:

B: New For Each(false,false,false)[bag] - scope-45
|   |
|   Project[bytearray][0] - scope-39
|   |
|   Project[bytearray][1] - scope-41
|   |
|   Project[bytearray][2] - scope-43
|
|---A: Load(/tmp/data:PigStorage()) - scope-38

It would be good to change the generated plan to combine all of these and fetch all projected columns at once, instead of looping and projecting them one by one. POUserFunc, POCast, etc. in the foreach cannot be combined and will have to remain separate.
[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize
[ https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769786#comment-13769786 ]

Alan Gates commented on PIG-3255:
---------------------------------

I gave my +1 above, so we're good from my viewpoint.

> Avoid extra byte array copy in streaming deserialize
> -----
>
> Key: PIG-3255
> URL: https://issues.apache.org/jira/browse/PIG-3255
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch, PIG-3255-4.patch, PIG-3255-5.patch
>
>
> PigStreaming.java:
> public Tuple deserialize(byte[] bytes) throws IOException {
>     Text val = new Text(bytes);
>     return StorageUtil.textToTuple(val, fieldDel);
> }
> Should remove new Text(bytes) copy and construct the tuple directly from the bytes
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769734#comment-13769734 ]

Cheolsoo Park commented on PIG-3419:
------------------------------------

I will open a jira to add "unstable" annotations. I am also reviewing PIG-3457 now.

> Pluggable Execution Engine
> -----
>
> Key: PIG-3419
> URL: https://issues.apache.org/jira/browse/PIG-3419
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.12
> Reporter: Achal Soni
> Assignee: Achal Soni
> Priority: Minor
> Fix For: 0.12
>
> Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch, updated-8-23-2013-exec-engine.patch, updated-8-27-2013-exec-engine.patch, updated-8-28-2013-exec-engine.patch, updated-8-29-2013-exec-engine.patch
>
>
> In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major, as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend.
> Some of the changes were to reinstate an ExecutionEngine interface that ties together the frontend and backend, to make Pig delegate to the EE when necessary, and to create an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, the same way MapReduce chooses the framework to use between local and distributed mode). I also tried to make ScriptState, JobStats, and PigStats as abstract as possible in their current state. I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
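The ServiceLoader-based selection mentioned in the description can be sketched roughly as follows. This is only an illustration of the general java.util.ServiceLoader pattern, not the code in the attached patches: the ExecEngineProvider interface, its accepts and name methods, and the EngineLookup class are all hypothetical names invented for this example.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical service interface; the names in the real patch may differ.
interface ExecEngineProvider {
    boolean accepts(String execTypeName);
    String name();
}

final class EngineLookup {

    // Returns the first candidate that claims the requested exec type.
    static Optional<ExecEngineProvider> find(String execTypeName,
                                             Iterable<ExecEngineProvider> candidates) {
        for (ExecEngineProvider candidate : candidates) {
            if (candidate.accepts(execTypeName)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // ServiceLoader lazily instantiates every implementation listed in a
    // META-INF/services/<interface name> file found on the classpath.
    static Optional<ExecEngineProvider> fromClasspath(String execTypeName) {
        return find(execTypeName, ServiceLoader.load(ExecEngineProvider.class));
    }

    public static void main(String[] args) {
        Optional<ExecEngineProvider> engine = fromClasspath("mapreduce");
        System.out.println(engine.map(ExecEngineProvider::name).orElse("no engine registered"));
    }
}
```

With this pattern, adding a new backend means shipping a jar that contains an implementation plus its META-INF/services registration file; the frontend code that performs the lookup does not need to change. Run as-is, with no registration file present, the lookup finds nothing and the fallback message is printed.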
[jira] [Resolved] (PIG-3065) pig output format/committer should support recovery for hadoop 0.23
[ https://issues.apache.org/jira/browse/PIG-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-3065.
-----------------------------
    Resolution: Fixed
    Hadoop Flags: Reviewed

Patch committed to trunk.

> pig output format/committer should support recovery for hadoop 0.23
> -----
>
> Key: PIG-3065
> URL: https://issues.apache.org/jira/browse/PIG-3065
> Project: Pig
> Issue Type: New Feature
> Reporter: Rohini Palaniswamy
> Assignee: Daniel Dai
> Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-3065-3.patch, PIG-3065.patch.txt, PIG-3065.patch.txt
>
>
> In hadoop 0.23 the output committer can optionally support recovery to handle the application master getting restarted (failing some # of attempts). If it's possible, the pig outputformat/committer should support recovery.
[jira] [Assigned] (PIG-3065) pig output format/committer should support recovery for hadoop 0.23
[ https://issues.apache.org/jira/browse/PIG-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reassigned PIG-3065:
---------------------------------------
    Assignee: Daniel Dai

> pig output format/committer should support recovery for hadoop 0.23
> -----
>
> Key: PIG-3065
> URL: https://issues.apache.org/jira/browse/PIG-3065
> Project: Pig
> Issue Type: New Feature
> Reporter: Rohini Palaniswamy
> Assignee: Daniel Dai
> Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-3065-3.patch, PIG-3065.patch.txt, PIG-3065.patch.txt
>
>
> In hadoop 0.23 the output committer can optionally support recovery to handle the application master getting restarted (failing some # of attempts). If it's possible, the pig outputformat/committer should support recovery.
Rounding with 8 places
Hi,

I don't know how to round a number to 8 decimal places in Pig.

Eg:
Number => Rounded
0 => 0.
3 => 3.
3.1 => 3.1000
-10.18 => -10.180

Any suggestions?

Regards,
Arun Prakash
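One possible approach, sketched here under stated assumptions rather than as an established answer from this thread, is to do the rounding in a small Java UDF. A double cannot carry trailing zeros, so a value padded to exactly 8 decimal places has to be produced as a string (a chararray in Pig). The class EightPlaces and its format method below are hypothetical names; the sketch shows only the rounding logic, which one would call from the exec method of an EvalFunc subclass, and it assumes half-up rounding is wanted.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper showing only the rounding logic; the Pig UDF wrapper is omitted.
final class EightPlaces {

    // Rounds the decimal text to 8 places (half-up) and pads with zeros.
    static String format(String number) {
        return new BigDecimal(number)
                .setScale(8, RoundingMode.HALF_UP)
                .toPlainString();
    }

    public static void main(String[] args) {
        for (String n : new String[] {"0", "3", "3.1", "-10.18", "1.234567895"}) {
            System.out.println(n + " => " + format(n));
        }
    }
}
```

Parsing from the textual form avoids the binary representation error that new BigDecimal(double) would expose, and toPlainString avoids scientific notation for values such as zero.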
[jira] [Updated] (PIG-3388) No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lorand Bendig updated PIG-3388:
-------------------------------
    Fix Version/s: 0.12

> No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage
> -----
>
> Key: PIG-3388
> URL: https://issues.apache.org/jira/browse/PIG-3388
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.11, 0.11.1
> Reporter: vikram s
> Assignee: Lorand Bendig
> Fix For: 0.12
>
> Attachments: PIG-3388.patch
>
>
> Currently, the scan operation with a row filter supports gt, lt, gte, etc. However, there is no support for regular expressions.