[jira] [Updated] (PIG-3764) Compile physical operators to bytecode
[ https://issues.apache.org/jira/browse/PIG-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3764: Description: I started a prototype here: https://github.com/julienledem/pig/compare/trunk...compile_physical_plan The current physical plan is relatively inefficient at evaluating expressions. In the context of a better execution engine (Tez, Spark, ...), compiling expressions to bytecode would be a significant speedup. This is a candidate project for Google summer of code 2014. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2014 was: I started a prototype here: https://github.com/julienledem/pig/compare/trunk...compile_physical_plan The current physical plan is relatively inefficient at evaluating expressions. In the context of a better execution engine (Tez, Spark, ...), compiling expressions to bytecode would be a significant speedup. > Compile physical operators to bytecode > -- > > Key: PIG-3764 > URL: https://issues.apache.org/jira/browse/PIG-3764 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Julien Le Dem > Labels: GSOC2014 > > I started a prototype here: > https://github.com/julienledem/pig/compare/trunk...compile_physical_plan > The current physical plan is relatively inefficient at evaluating expressions. > In the context of a better execution engine (Tez, Spark, ...), compiling > expressions to bytecode would be a significant speedup. > This is a candidate project for Google summer of code 2014. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2014 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-2784) Framework for dynamic query optimization
[ https://issues.apache.org/jira/browse/PIG-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2784: Description: We need a framework to implement dynamic query optimization, i.e. changing the query plan at runtime. Currently we support estimating the number of reducers dynamically, which works well as a first step but is not perfectly implemented. In the near future, we'll support more dynamic optimizations, such as [removing sample job for order-by|https://issues.apache.org/jira/browse/PIG-483], [removing limit job|https://issues.apache.org/jira/browse/PIG-2675], dynamically detecting skew and using skew-join, etc. Currently, reducer estimation is implemented in JobControlCompiler, after MRCompiler compiles all the MapReduceOperators and generates the complete MRPlan. One place (discussed with Thejas) to implement the framework is in MRCompiler, where the MRPlan would be generated in batches and adjusted dynamically. Any comments? This is a candidate project for Google summer of code 2014. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2014 was: We need a framework to implement dynamic query optimization, i.e. changing the query plan at runtime. Currently we support estimating the number of reducers dynamically, which works well as a first step but is not perfectly implemented. In the near future, we'll support more dynamic optimizations, such as [removing sample job for order-by|https://issues.apache.org/jira/browse/PIG-483], [removing limit job|https://issues.apache.org/jira/browse/PIG-2675], dynamically detecting skew and using skew-join, etc. Currently, reducer estimation is implemented in JobControlCompiler, after MRCompiler compiles all the MapReduceOperators and generates the complete MRPlan. One place (discussed with Thejas) to implement the framework is in MRCompiler, where the MRPlan would be generated in batches and adjusted dynamically. Any comments?
> Framework for dynamic query optimization > > > Key: PIG-2784 > URL: https://issues.apache.org/jira/browse/PIG-2784 > Project: Pig > Issue Type: New Feature > Reporter: Jie Li > Assignee: Aniket Mokashi > Labels: GSOC2014 > > We need a framework to implement dynamic query optimization, i.e. changing > the query plan at runtime. Currently we support estimating the number of > reducers dynamically, which works well as a first step but is not > perfectly implemented. In the near future, we'll support more dynamic > optimizations, such as [removing sample job for > order-by|https://issues.apache.org/jira/browse/PIG-483], [removing limit > job|https://issues.apache.org/jira/browse/PIG-2675], dynamically detecting > skew and using skew-join, etc. > Currently, reducer estimation is implemented in JobControlCompiler, after > MRCompiler compiles all the MapReduceOperators and generates the complete > MRPlan. One place (discussed with Thejas) to implement the framework is in > MRCompiler, where the MRPlan would be generated in batches and adjusted > dynamically. > Any comments? > This is a candidate project for Google summer of code 2014. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2014 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
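For readers unfamiliar with the reducer estimation mentioned in PIG-2784, here is a rough sketch of what an input-size heuristic could look like. It assumes the estimate is simply the total input size divided by a target number of bytes per reducer, clamped to a maximum. The class name `ReducerEstimate`, the method `estimate`, and the numbers in the usage note are hypothetical and are not taken from Pig's source; Pig's actual estimator may differ.

```java
final class ReducerEstimate {
    private ReducerEstimate() {}

    /**
     * Hypothetical heuristic: one reducer per bytesPerReducer of input,
     * rounded up, never fewer than 1 and never more than maxReducers.
     */
    public static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (totalInputBytes < 0 || bytesPerReducer <= 0 || maxReducers < 1) {
            throw new IllegalArgumentException("invalid estimation parameters");
        }
        // Ceiling division, written so it cannot overflow for large inputs.
        long needed = totalInputBytes / bytesPerReducer
                + (totalInputBytes % bytesPerReducer == 0 ? 0 : 1);
        // Clamp into the range [1, maxReducers].
        return (int) Math.max(1L, Math.min(needed, (long) maxReducers));
    }
}
```

With an illustrative 1 GB per reducer and a cap of 999, 2.5 GB of input would yield 3 reducers. A dynamic framework of the kind proposed in the issue could re-run such an estimate between batches of jobs, once the real size of intermediate output is known.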
[jira] [Updated] (PIG-2599) Mavenize Pig
[ https://issues.apache.org/jira/browse/PIG-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2599: Description: Switch Pig build system from ant to maven. This is a candidate project for Google summer of code 2014. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2014 was: Switch Pig build system from ant to maven. This is a candidate project for Google summer of code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013 > Mavenize Pig > > > Key: PIG-2599 > URL: https://issues.apache.org/jira/browse/PIG-2599 > Project: Pig > Issue Type: New Feature > Components: build >Reporter: Daniel Dai > Labels: gsoc2014 > Fix For: 0.13.0 > > Attachments: maven-pig.1.zip > > > Switch Pig build system from ant to maven. > This is a candidate project for Google summer of code 2014. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2014 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-2597) Move grunt from javacc to ANTLR
[ https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2597: Description: Currently, the parser for queries is in ANTLR, but Grunt is still javacc. The javacc parser is very difficult to work with, and next to impossible to understand or modify. ANTLR provides a much cleaner, more standard way to generate parsers/lexers/ASTs/etc, and moving Grunt from javacc to ANTLR would be huge as we continue to add features to Pig. This is a candidate project for Google summer of code 2014. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2014 was: Currently, the parser for queries is in ANTLR, but Grunt is still javacc. The javacc parser is very difficult to work with, and next to impossible to understand or modify. ANTLR provides a much cleaner, more standard way to generate parsers/lexers/ASTs/etc, and moving Grunt from javacc to ANTLR would be huge as we continue to add features to Pig. This is a candidate project for Google summer of code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013 > Move grunt from javacc to ANTLR > --- > > Key: PIG-2597 > URL: https://issues.apache.org/jira/browse/PIG-2597 > Project: Pig > Issue Type: Improvement > Reporter: Jonathan Coveney > Labels: gsoc2014 > Attachments: pig02.diff > > > Currently, the parser for queries is in ANTLR, but Grunt is still javacc. The > javacc parser is very difficult to work with, and next to impossible to > understand or modify. ANTLR provides a much cleaner, more standard way to > generate parsers/lexers/ASTs/etc, and moving Grunt from javacc to ANTLR would > be huge as we continue to add features to Pig. > This is a candidate project for Google summer of code 2014. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2014 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-2597) Move grunt from javacc to ANTLR
[ https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-2597: Labels: gsoc2014 (was: gsoc2013) > Move grunt from javacc to ANTLR > --- > > Key: PIG-2597 > URL: https://issues.apache.org/jira/browse/PIG-2597 > Project: Pig > Issue Type: Improvement > Reporter: Jonathan Coveney > Labels: gsoc2014 > Attachments: pig02.diff > > > Currently, the parser for queries is in ANTLR, but Grunt is still javacc. The > javacc parser is very difficult to work with, and next to impossible to > understand or modify. ANTLR provides a much cleaner, more standard way to > generate parsers/lexers/ASTs/etc, and moving Grunt from javacc to ANTLR would > be huge as we continue to add features to Pig. > This is a candidate project for Google summer of code 2013. More information > about the program can be found at > https://cwiki.apache.org/confluence/display/PIG/GSoc2013 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (10 issues)
Subscriber: pigdaily

Key         Summary
PIG-3757    Make scalar work
            https://issues.apache.org/jira/browse/PIG-3757
PIG-3737    Bundle dependent jars in distribution in %PIG_HOME%/lib folder
            https://issues.apache.org/jira/browse/PIG-3737
PIG-3735    UDF to data cleanse the dirty data with expected pattern
            https://issues.apache.org/jira/browse/PIG-3735
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3613    UDF for SimilarityMatching between strings with matching scores
            https://issues.apache.org/jira/browse/PIG-3613
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3456    Reduce threadlocal conf access in backend for each record
            https://issues.apache.org/jira/browse/PIG-3456
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3373    XMLLoader returns non-matching nodes when a tag name spans through the block boundary
            https://issues.apache.org/jira/browse/PIG-3373

You may edit this subscription at: https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Updated] (PIG-3774) Piggybank Over UDF get wrong result
[ https://issues.apache.org/jira/browse/PIG-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3774: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to trunk and 0.12 branch. > Piggybank Over UDF get wrong result > --- > > Key: PIG-3774 > URL: https://issues.apache.org/jira/browse/PIG-3774 > Project: Pig > Issue Type: Bug > Components: piggybank >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.1, 0.13.0 > > Attachments: PIG-3774-1.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-3774) Piggybank Over UDF get wrong result
[ https://issues.apache.org/jira/browse/PIG-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907765#comment-13907765 ] Alan Gates commented on PIG-3774: - +1. > Piggybank Over UDF get wrong result > --- > > Key: PIG-3774 > URL: https://issues.apache.org/jira/browse/PIG-3774 > Project: Pig > Issue Type: Bug > Components: piggybank >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.1, 0.13.0 > > Attachments: PIG-3774-1.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-3774) Piggybank Over UDF get wrong result
[ https://issues.apache.org/jira/browse/PIG-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3774: Attachment: PIG-3774-1.patch > Piggybank Over UDF get wrong result > --- > > Key: PIG-3774 > URL: https://issues.apache.org/jira/browse/PIG-3774 > Project: Pig > Issue Type: Bug > Components: piggybank >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.1, 0.13.0 > > Attachments: PIG-3774-1.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-3774) Piggybank Over UDF get wrong result
[ https://issues.apache.org/jira/browse/PIG-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3774: Status: Patch Available (was: Open) > Piggybank Over UDF get wrong result > --- > > Key: PIG-3774 > URL: https://issues.apache.org/jira/browse/PIG-3774 > Project: Pig > Issue Type: Bug > Components: piggybank >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.12.1, 0.13.0 > > Attachments: PIG-3774-1.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (PIG-3774) Piggybank Over UDF get wrong result
Daniel Dai created PIG-3774: --- Summary: Piggybank Over UDF get wrong result Key: PIG-3774 URL: https://issues.apache.org/jira/browse/PIG-3774 Project: Pig Issue Type: Bug Components: piggybank Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.1, 0.13.0 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907473#comment-13907473 ] Aniket Mokashi commented on PIG-2672: - Thanks [~brocknoland]! Looks like it existed even before this patch; see https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java#L1524. Let me open another jira to fix it. > Optimize the use of DistributedCache > > > Key: PIG-2672 > URL: https://issues.apache.org/jira/browse/PIG-2672 > Project: Pig > Issue Type: Improvement > Reporter: Rohini Palaniswamy > Assignee: Aniket Mokashi > Fix For: 0.13.0 > > Attachments: PIG-2672-10.patch, PIG-2672-5.patch, PIG-2672-7.patch, > PIG-2672.patch > > > Pig currently copies jar files to a temporary location in HDFS and then adds > them to DistributedCache for each job launched. This is inefficient in terms > of > * Space - The jars are distributed to task trackers for every job, taking > up a lot of local temporary space in tasktrackers. > * Performance - The jar distribution impacts the job launch time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-2672) Optimize the use of DistributedCache
[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907447#comment-13907447 ] Brock Noland commented on PIG-2672: --- FYI, in HIVE-860 a reviewer asked me if the following code (copied from this patch) closed the stream: {noformat} String checksum = DigestUtils.shaHex(url.openStream()); {noformat} It doesn't look like it does, according to the commons-codec source. Therefore I think Pig has a file descriptor leak. > Optimize the use of DistributedCache > > > Key: PIG-2672 > URL: https://issues.apache.org/jira/browse/PIG-2672 > Project: Pig > Issue Type: Improvement > Reporter: Rohini Palaniswamy > Assignee: Aniket Mokashi > Fix For: 0.13.0 > > Attachments: PIG-2672-10.patch, PIG-2672-5.patch, PIG-2672-7.patch, > PIG-2672.patch > > > Pig currently copies jar files to a temporary location in HDFS and then adds > them to DistributedCache for each job launched. This is inefficient in terms > of > * Space - The jars are distributed to task trackers for every job, taking > up a lot of local temporary space in tasktrackers. > * Performance - The jar distribution impacts the job launch time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
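The leak described in the comment above comes from handing `url.openStream()` to a helper that reads the stream but does not close it. A minimal sketch of one way to avoid that is shown below: the caller opens the stream in a try-with-resources block so it is closed even if hashing fails. To keep the snippet self-contained it swaps in the JDK's `MessageDigest` for commons-codec's `DigestUtils`; the class name `StreamChecksum` and its methods are hypothetical, and this is not necessarily the fix that was applied to Pig.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamChecksum {
    private StreamChecksum() {}

    /** Hashes everything readable from 'in'. Does NOT close it: the caller owns the stream. */
    public static String sha1Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Opens the URL, hashes its content, and always closes the stream. */
    public static String sha1Hex(URL url) {
        try (InputStream in = url.openStream()) {
            return sha1Hex(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same try-with-resources wrapper would work around a `DigestUtils` call, since the point is only that whoever opens the stream is also responsible for closing it.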
[jira] [Updated] (PIG-3675) Documentation for AccumuloStorage
[ https://issues.apache.org/jira/browse/PIG-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-3675: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks Josh! > Documentation for AccumuloStorage > - > > Key: PIG-3675 > URL: https://issues.apache.org/jira/browse/PIG-3675 > Project: Pig > Issue Type: Bug > Components: documentation >Reporter: Daniel Dai >Assignee: Josh Elser > Fix For: 0.13.0 > > Attachments: > 0001-PIG-3675-Initial-documentation-for-AccumuloStorage.patch, > 0001-PIG-3675-Initial-documentation-for-AccumuloStorage.patch.2 > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-3773) Grunt ERROR 2017 with RANK and two output paths
[ https://issues.apache.org/jira/browse/PIG-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ville Weijo updated PIG-3773: - Attachment: pig_1392907156669.log Stack trace > Grunt ERROR 2017 with RANK and two output paths > --- > > Key: PIG-3773 > URL: https://issues.apache.org/jira/browse/PIG-3773 > Project: Pig > Issue Type: Bug >Affects Versions: 0.12.0, 0.11.1 >Reporter: Ville Weijo > Attachments: pig_1392907156669.log > > > Execution of Pig script > {code} > A = LOAD 'input.txt'; > B = RANK A; > STORE B INTO 'output1.txt'; > STORE A INTO 'output2.txt'; > {code} > crashes with > {code} > [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error > creating job configuration. > {code} > If "STORE A INTO 'output2.txt'" is removed, the script works fine. Content of > 'input.txt' does not seem to matter much, except it cannot be empty > (apparently triggers bug > [PIG-3726|https://issues.apache.org/jira/browse/PIG-3726]). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (PIG-3773) Grunt ERROR 2017 with RANK and two output paths
Ville Weijo created PIG-3773: Summary: Grunt ERROR 2017 with RANK and two output paths Key: PIG-3773 URL: https://issues.apache.org/jira/browse/PIG-3773 Project: Pig Issue Type: Bug Affects Versions: 0.11.1, 0.12.0 Reporter: Ville Weijo Execution of Pig script {code} A = LOAD 'input.txt'; B = RANK A; STORE B INTO 'output1.txt'; STORE A INTO 'output2.txt'; {code} crashes with {code} [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration. {code} If "STORE A INTO 'output2.txt'" is removed, the script works fine. Content of 'input.txt' does not seem to matter much, except it cannot be empty (apparently triggers bug [PIG-3726|https://issues.apache.org/jira/browse/PIG-3726]). -- This message was sent by Atlassian JIRA (v6.1.5#6160)