[jira] [Assigned] (PIG-2597) Move grunt from javacc to ANTLR
[ https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-2597:

    Assignee: Daniel Dai

Move grunt from javacc to ANTLR

    Key: PIG-2597
    URL: https://issues.apache.org/jira/browse/PIG-2597
    Project: Pig
    Issue Type: Improvement
    Reporter: Jonathan Coveney
    Assignee: Daniel Dai
    Labels: gsoc2014
    Attachments: pig02.diff

Currently, the parser for queries is in ANTLR, but Grunt is still javacc. The Grunt parser is very difficult to work with, and next to impossible to understand or modify. ANTLR provides a much cleaner, more standard way to generate parsers/lexers/ASTs/etc., and moving Grunt from javacc to ANTLR would be huge as we continue to add features to Pig.

This is a candidate project for Google Summer of Code 2014. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2014

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PIG-2597) Move grunt from javacc to ANTLR
[ https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925567#comment-13925567 ]

Daniel Dai commented on PIG-2597:

[~vimuth], I should be available to mentor this.
[jira] [Updated] (PIG-3603) Add counters to TezStats
[ https://issues.apache.org/jira/browse/PIG-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3603:

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Committed to tez branch. Thank you Rohini for the review!

Add counters to TezStats

    Key: PIG-3603
    URL: https://issues.apache.org/jira/browse/PIG-3603
    Project: Pig
    Issue Type: Sub-task
    Components: tez
    Affects Versions: tez-branch
    Reporter: Cheolsoo Park
    Assignee: Cheolsoo Park
    Fix For: tez-branch
    Attachments: PIG-3603-1.patch

Counters are now supported by Tez (TEZ-12). We should add counters to TezStats.
[jira] [Created] (PIG-3804) Limit Optimizer getLimitPlan broken
Íñigo Goiri created PIG-3804:

    Summary: Limit Optimizer getLimitPlan broken
    Key: PIG-3804
    URL: https://issues.apache.org/jira/browse/PIG-3804
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.11.1, 0.12.0, 0.10.1, 0.11, 0.10.0, 0.9.2, 0.8.1, 0.8.0
    Reporter: Íñigo Goiri
    Priority: Minor
    Fix For: 0.13.0

The LimitOptimizer cases don't seem right (starting at line 142 of src/org/apache/pig/newplan/logical/rules/LimitOptimizer.java). It looks like one optimization was added incorrectly, and one extra bracket had to be added at the end. There's a TODO note, but it looks like some parts weren't commented properly.
[jira] [Commented] (PIG-3782) PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment
[ https://issues.apache.org/jira/browse/PIG-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925947#comment-13925947 ]

Koji Noguchi commented on PIG-3782:

bq. I attach another patch only fix PushDownForEachFlatten. I reuse your test case though. Can you check if it looks better?

Thanks [~daijy]. I can see how your patch fixes the issue, but I'm not sure if it's better. Maybe it'll help me understand if you can teach me why we want to avoid my approach of getting rid of the extra copy inside the LOGenerate. To me, fixing the issue at the source (producer) by storing UIDs inside mUserDefinedSchema is better than fixing it on the receiver side with a workaround.

PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment

    Key: PIG-3782
    URL: https://issues.apache.org/jira/browse/PIG-3782
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.10.1, 0.12.0, 0.11.1
    Reporter: Koji Noguchi
    Assignee: Koji Noguchi
    Attachments: PIG-3782-2.patch, pig-3782-v01.patch

{noformat}
a = load '1.txt' as (a0:int, a1, a2:bag{});
b = load '2.txt' as (b0:int, b1);
c = foreach a generate a0, flatten(a2) as (q1, q2);
d = join c by a0, b by b0;
e = foreach d generate a0, q1, q2;
f = foreach e generate a0, (int)q1, (int)q2;
store f into 'output';
{noformat}

This pig script fails with

2014-02-27 11:49:45,657 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 13 Input: 0 Column: 1)
[jira] [Commented] (PIG-3794) pig -useHCatalog fails using pig command line interface on HDInsight
[ https://issues.apache.org/jira/browse/PIG-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926147#comment-13926147 ]

Eric Hanson commented on PIG-3794:

I just know that HCAT_HOME is not defined on HDINSIGHT or our Windows one-box setup.

pig -useHCatalog fails using pig command line interface on HDInsight

    Key: PIG-3794
    URL: https://issues.apache.org/jira/browse/PIG-3794
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.11, 0.12.0, 0.13.0
    Environment: Windows Azure HDINSIGHT c:\apps\dist\pig-0.11.0.1.3.2.0-05
    Reporter: Eric Hanson
    Assignee: Eric Hanson
    Fix For: 0.13.0
    Attachments: PIG-3794.01.patch, PIG-3794.02.patch, PIG-3794.03.patch

When you connect to an HDP 1.3 version HDINSIGHT cluster with remote desktop, if you try this:

    c:\apps\dist\pig-0.11.0.1.3.2.0-05\bin\pig -useHCatalog

you get this:

    HCAT_HOME should be defined

but you should not get an error. It appears that pig.cmd should use HCATALOG_HOME instead of HCAT_HOME. The same problem exists on the 1.3 one-box installation for Windows. A quick look at the source code indicates it is still a problem on trunk. Equivalent Monarch Pig JIRA is PIG-125.

In addition, if you set hive.metastore.uris to '', this is supposed to create an embedded metastore instead of going to the metastore service. This fails on Azure HDINSIGHT and Windows because of missing datanucleus and sqljdbc4 jars. This is covered in Monarch Pig JIRA Pig-126.

Finally, if you submit a pig job from WebHCat, due to argument quoting for Windows, -useHCatalog comes in as "-useHCatalog" into pig.cmd. This causes -useHCatalog to never work on Windows from WebHCat.
[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema
[ https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926350#comment-13926350 ]

Julien Le Dem commented on PIG-3801:

I would use properties.getProperty(MAPREDUCE_FRAMEWORK_NAME).equals(LOCAL) to decide if it's running locally, but otherwise this looks good to me.

Auto local mode does not call storeSchema

    Key: PIG-3801
    URL: https://issues.apache.org/jira/browse/PIG-3801
    Project: Pig
    Issue Type: Bug
    Reporter: Aniket Mokashi
    Assignee: Aniket Mokashi
    Attachments: PIG-3801.patch

https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L481

Pig code explicitly runs PigOutputCommitter.storeCleanup for local jobs. We also need to add this for auto-local jobs.

To repro this problem, run:

    a = load '2.txt' as (a0:chararray, a1:int);
    store a into 'a' using PigStorage(',','-schema');

This creates a .pig_schema file in pig -x local mode, but does not create a .pig_schema file in auto-local mode.
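The check suggested in the comment above could look roughly like the following. This is only a sketch, not the code from PIG-3801.patch: the class name is hypothetical, and the values assigned to MAPREDUCE_FRAMEWORK_NAME and LOCAL are assumptions, since the thread does not show how the patch defines them. The comparison is written constant-first so that an unset property yields false instead of a NullPointerException.

```java
import java.util.Properties;

// Hypothetical helper; not part of Pig.
class LocalModeCheck {
    // Assumed values: the thread names these constants but does not show their definitions.
    static final String MAPREDUCE_FRAMEWORK_NAME = "mapreduce.framework.name";
    static final String LOCAL = "local";

    // Returns true only when the framework-name property is set to the local value.
    // LOCAL.equals(...) is used so a missing property does not throw.
    static boolean isLocal(Properties properties) {
        return LOCAL.equals(properties.getProperty(MAPREDUCE_FRAMEWORK_NAME));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(isLocal(props)); // property unset

        props.setProperty(MAPREDUCE_FRAMEWORK_NAME, LOCAL);
        System.out.println(isLocal(props)); // property set to the local value
    }
}
```

Calling equals on the result of getProperty directly, as written in the comment, would throw if the property were missing, which is why the sketch flips the operands.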
[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema
[ https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926351#comment-13926351 ]

Julien Le Dem commented on PIG-3801:

+1
[jira] [Created] (PIG-3805) ToString(datetime [, format string]) doesn't work without the second argument
Jenny Thompson created PIG-3805:

    Summary: ToString(datetime [, format string]) doesn't work without the second argument
    Key: PIG-3805
    URL: https://issues.apache.org/jira/browse/PIG-3805
    Project: Pig
    Issue Type: Bug
    Components: internal-udfs
    Affects Versions: 0.12.0
    Reporter: Jenny Thompson
    Priority: Minor

The exec function of ToString is written to handle 1 or 2 arguments (it defaults to ISO, which is consistent with ToDate). However, the getArgToFuncMapping function returns only one FuncSpec, which requires the formatString argument.

To fix: just add another FuncSpec to the list returned by getArgToFuncMapping.
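To illustrate why a single registered signature rejects the one-argument call, here is a small self-contained model. It does not use Pig's classes: each inner list of type names is a hypothetical stand-in for the input schema of one FuncSpec, and resolves() is an assumed, simplified exact-match lookup rather than Pig's actual function resolution.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of argument-to-function mapping; not Pig code.
class ArgMappingSketch {
    // Mirrors the reported state: only the (datetime, chararray) signature is registered.
    static List<List<String>> currentMappings() {
        return Arrays.asList(Arrays.asList("datetime", "chararray"));
    }

    // Mirrors the proposed fix: the one-argument signature is registered as well.
    static List<List<String>> fixedMappings() {
        return Arrays.asList(
                Arrays.asList("datetime"),
                Arrays.asList("datetime", "chararray"));
    }

    // Simplified resolution: a call resolves only if its argument types
    // exactly match one registered signature.
    static boolean resolves(List<List<String>> mappings, List<String> callArgs) {
        return mappings.contains(callArgs);
    }

    public static void main(String[] args) {
        List<String> oneArgCall = Arrays.asList("datetime");
        System.out.println(resolves(currentMappings(), oneArgCall));
        System.out.println(resolves(fixedMappings(), oneArgCall));
    }
}
```

In this model the one-argument call fails against the current mappings and succeeds once the second signature is added, while the two-argument call resolves in both cases.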
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (15 issues)
Subscriber: pigdaily

Key         Summary
PIG-3801    Auto local mode does not call storeSchema
            https://issues.apache.org/jira/browse/PIG-3801
PIG-3794    pig -useHCatalog fails using pig command line interface on HDInsight
            https://issues.apache.org/jira/browse/PIG-3794
PIG-3783    We can predict when small local jobs will cause an OOM and change io.sort.mb in that case
            https://issues.apache.org/jira/browse/PIG-3783
PIG-3782    PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment
            https://issues.apache.org/jira/browse/PIG-3782
PIG-3771    Piggybank Avrostorage makes a lot of namenode calls in the backend
            https://issues.apache.org/jira/browse/PIG-3771
PIG-3757    Make scalar work
            https://issues.apache.org/jira/browse/PIG-3757
PIG-3737    Bundle dependent jars in distribution in %PIG_HOME%/lib folder
            https://issues.apache.org/jira/browse/PIG-3737
PIG-3735    UDF to data cleanse the dirty data with expected pattern
            https://issues.apache.org/jira/browse/PIG-3735
PIG-3668    COR built-in function when at least one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3613    UDF for SimilarityMatching between strings with matching scores
            https://issues.apache.org/jira/browse/PIG-3613
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3456    Reduce threadlocal conf access in backend for each record
            https://issues.apache.org/jira/browse/PIG-3456
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3373    XMLLoader returns non-matching nodes when a tag name spans through the block boundary
            https://issues.apache.org/jira/browse/PIG-3373

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Updated] (PIG-3783) We can predict when small local jobs will cause an OOM and change io.sort.mb in that case
[ https://issues.apache.org/jira/browse/PIG-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated PIG-3783:

    Resolution: Duplicate
    Status: Resolved (was: Patch Available)

Duplicating to PIG-3731

We can predict when small local jobs will cause an OOM and change io.sort.mb in that case

    Key: PIG-3783
    URL: https://issues.apache.org/jira/browse/PIG-3783
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.12.0
    Reporter: Ravi Prakash
    Labels: newbie
    Attachments: PIG-3783.branch-0.12.patch

Seems like a lot of new users run into this problem:
http://stackoverflow.com/questions/10165648/apache-pig-outofmemory-exception-with-simple-group-by-in-local-mode
http://sumedha.blogspot.in/2012/01/solving-apache-pig-javalangoutofmemorye.html
http://stackoverflow.com/questions/16499432/pig-local-mode-group-or-join-java-lang-outofmemoryerror-java-heap-space
[jira] [Commented] (PIG-3783) We can predict when small local jobs will cause an OOM and change io.sort.mb in that case
[ https://issues.apache.org/jira/browse/PIG-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929917#comment-13929917 ]

Ravi Prakash commented on PIG-3783:

Thanks Aniket! I'll defer to your better judgement and close this JIRA as a duplicate of PIG-3731.
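For readers wondering what "predict an OOM and change io.sort.mb" might look like in practice, one possible heuristic is sketched below. It is not taken from PIG-3783.branch-0.12.patch or from PIG-3731, neither of which is shown in this thread. The class name, the method names, and the 50% heap fraction are all hypothetical; the only things assumed from Hadoop are the io.sort.mb property name and its default of 100 MB.

```java
import java.util.Properties;

// Hypothetical guard; the real patch may work quite differently.
class SortBufferGuard {
    static final String IO_SORT_MB = "io.sort.mb";
    static final int DEFAULT_IO_SORT_MB = 100;    // assumed Hadoop default
    static final double MAX_HEAP_FRACTION = 0.5;  // hypothetical threshold

    // Caps the requested sort buffer at a fraction of the available heap (at least 1 MB).
    static int safeSortMb(int requestedMb, long maxHeapBytes) {
        long heapMb = maxHeapBytes / (1024L * 1024L);
        long ceilingMb = Math.max(1L, (long) (heapMb * MAX_HEAP_FRACTION));
        return (int) Math.min((long) requestedMb, ceilingMb);
    }

    // Lowers io.sort.mb in the given configuration when it would not fit in this JVM's heap.
    static void adjust(Properties conf) {
        int requested = Integer.parseInt(
                conf.getProperty(IO_SORT_MB, String.valueOf(DEFAULT_IO_SORT_MB)));
        int safe = safeSortMb(requested, Runtime.getRuntime().maxMemory());
        if (safe < requested) {
            conf.setProperty(IO_SORT_MB, String.valueOf(safe));
        }
    }

    public static void main(String[] args) {
        // A 64 MB heap cannot hold a 100 MB sort buffer, so the value is reduced.
        System.out.println(safeSortMb(100, 64L * 1024 * 1024));
        // A 1 GB heap leaves the requested value untouched.
        System.out.println(safeSortMb(100, 1024L * 1024 * 1024));
    }
}
```

The idea is simply that in local mode the sort buffer is allocated inside the same JVM that runs Pig, so a buffer larger than some share of the heap is a reasonable predictor of the OutOfMemoryError that the linked questions describe.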