[jira] [Assigned] (PIG-2597) Move grunt from javacc to ANTLR

2014-03-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-2597:
---

Assignee: Daniel Dai

 Move grunt from javacc to ANTLR
 ---

 Key: PIG-2597
 URL: https://issues.apache.org/jira/browse/PIG-2597
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Daniel Dai
  Labels: gsoc2014
 Attachments: pig02.diff


 Currently, the parser for queries is in ANTLR, but Grunt is still javacc. The 
 javacc parser is very difficult to work with, and next to impossible to 
 understand or modify. ANTLR provides a much cleaner, more standard way to 
 generate parsers/lexers/ASTs/etc, and moving Grunt from javacc to ANTLR would 
 be a big help as we continue to add features to Pig.
 This is a candidate project for Google Summer of Code 2014. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2014



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-2597) Move grunt from javacc to ANTLR

2014-03-10 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925567#comment-13925567
 ] 

Daniel Dai commented on PIG-2597:
-

[~vimuth], I should be available to mentor this.



[jira] [Updated] (PIG-3603) Add counters to TezStats

2014-03-10 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3603:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to tez branch. Thank you Rohini for the review!

 Add counters to TezStats
 

 Key: PIG-3603
 URL: https://issues.apache.org/jira/browse/PIG-3603
 Project: Pig
  Issue Type: Sub-task
  Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: tez-branch

 Attachments: PIG-3603-1.patch


 Counters are now supported by Tez (TEZ-12). We should add counters to 
 TezStats.





[jira] [Created] (PIG-3804) Limit Optimizer getLimitPlan broken

2014-03-10 Thread JIRA
Íñigo Goiri created PIG-3804:


 Summary: Limit Optimizer getLimitPlan broken
 Key: PIG-3804
 URL: https://issues.apache.org/jira/browse/PIG-3804
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.11.1, 0.12.0, 0.10.1, 0.11, 0.10.0, 0.9.2, 0.8.1, 0.8.0
Reporter: Íñigo Goiri
Priority: Minor
 Fix For: 0.13.0


The LimitOptimizer cases don't look right (starting at line 142 in 
src/org/apache/pig/newplan/logical/rules/LimitOptimizer.java). It looks like 
one optimization was added incorrectly and one extra bracket had to be added at 
the end. There's a TODO note, but it looks like some parts weren't commented 
properly.








[jira] [Commented] (PIG-3782) PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment

2014-03-10 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925947#comment-13925947
 ] 

Koji Noguchi commented on PIG-3782:
---

bq. I attach another patch only fix PushDownForEachFlatten. I reuse your test 
case though. Can you check if it looks better?

Thanks [~daijy]. I can see how your patch fixes the issue, but I'm not sure 
it's better. 
It would help me understand if you could explain why we want to avoid my 
approach of getting rid of the extra copy inside the LOGenerate.
To me, fixing the issue at the source (producer) by storing UIDs inside 
mUserDefinedSchema is better than fixing it on the receiver side with a 
workaround. 


 PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing 
 due to incorrect UID assignment
 ---

 Key: PIG-3782
 URL: https://issues.apache.org/jira/browse/PIG-3782
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.1, 0.12.0, 0.11.1
Reporter: Koji Noguchi
Assignee: Koji Noguchi
 Attachments: PIG-3782-2.patch, pig-3782-v01.patch


 {noformat}
 a = load '1.txt' as (a0:int, a1, a2:bag{});
 b = load '2.txt' as (b0:int, b1);
 c = foreach a generate a0, flatten(a2) as (q1, q2);
 d = join c by a0, b by b0;
 e = foreach d generate a0, q1, q2;
 f = foreach e generate a0, (int)q1, (int)q2;
 store f into 'output';
 {noformat}
 This pig script fails with 
 2014-02-27 11:49:45,657 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2229: Couldn't find matching uid -1 for project (Name: Project Type: 
 bytearray Uid: 13 Input: 0 Column: 1)





[jira] [Commented] (PIG-3794) pig -useHCatalog fails using pig command line interface on HDInsight

2014-03-10 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926147#comment-13926147
 ] 

Eric Hanson commented on PIG-3794:
--

All I know is that HCAT_HOME is not defined on HDINSIGHT or on our Windows 
one-box setup. 

 pig -useHCatalog fails using pig command line interface on HDInsight
 

 Key: PIG-3794
 URL: https://issues.apache.org/jira/browse/PIG-3794
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12.0, 0.13.0
 Environment: Windows Azure HDINSIGHT
 c:\apps\dist\pig-0.11.0.1.3.2.0-05
Reporter: Eric Hanson
Assignee: Eric Hanson
 Fix For: 0.13.0

 Attachments: PIG-3794.01.patch, PIG-3794.02.patch, PIG-3794.03.patch


 When you connect to an HDP 1.3 version HDINSIGHT cluster with remote desktop, 
 if you try this:
 c:\apps\dist\pig-0.11.0.1.3.2.0-05\bin\pig -useHCatalog
 you get this:
 HCAT_HOME should be defined
 but you should not get an error.
 It appears that pig.cmd should use HCATALOG_HOME instead of HCAT_HOME.
 The same problem exists on the 1.3 one-box installation for Windows. A quick 
 look at the source code indicates it is still a problem on trunk.
 
 The equivalent Monarch Pig JIRA is PIG-125.
 In addition, if you set hive.metastore.uris to '', this is supposed to create 
 an embedded metastore instead of going to the metastore service. This fails 
 on Azure HDINSIGHT and Windows because of missing datanucleus and sqljdbc4 
 jars. This is covered in Monarch Pig JIRA PIG-126.
 Finally, if you submit a pig job from WebHCat, due to argument quoting on 
 Windows, -useHCatalog comes in as "-useHCatalog" into pig.cmd. This causes 
 -useHCatalog to never work on Windows from WebHCat.
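
 The two launcher problems described above (which environment variable names
 the HCatalog home, and a flag that arrives wrapped in quotes) can be sketched
 in Java as follows. This is only an illustration of the logic, not the
 pig.cmd fix itself: the real change would be made in the batch script, and
 the class name, the method names, and the choice to fall back from HCAT_HOME
 to HCATALOG_HOME (rather than replace one with the other) are assumptions
 made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; none of these names come from Pig or from pig.cmd.
class HCatLaunchSketch {

    // Prefer HCAT_HOME, then fall back to HCATALOG_HOME (assumed policy).
    // Returns null when neither variable is set to a non-empty value.
    static String resolveHCatHome(Map<String, String> env) {
        String home = env.get("HCAT_HOME");
        if (home == null || home.isEmpty()) {
            home = env.get("HCATALOG_HOME");
        }
        return (home == null || home.isEmpty()) ? null : home;
    }

    // Strip one pair of surrounding double quotes, if present.
    static String unquote(String arg) {
        if (arg.length() >= 2 && arg.startsWith("\"") && arg.endsWith("\"")) {
            return arg.substring(1, arg.length() - 1);
        }
        return arg;
    }

    // Recognize the flag whether or not the caller quoted it.
    static boolean isUseHCatalog(String arg) {
        return "-useHCatalog".equals(unquote(arg));
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("HCATALOG_HOME", "c:\\apps\\dist\\hcatalog");  // made-up path
        System.out.println(resolveHCatHome(env));
        System.out.println(isUseHCatalog("\"-useHCatalog\""));
    }
}
```

 With a fallback like this, a machine that only defines HCATALOG_HOME would
 still resolve a home directory, and a quoted flag would be treated the same
 as an unquoted one.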





[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema

2014-03-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926350#comment-13926350
 ] 

Julien Le Dem commented on PIG-3801:


I would use properties.getProperty(MAPREDUCE_FRAMEWORK_NAME).equals(LOCAL) to 
decide if it's running locally, but otherwise this looks good to me.
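
A minimal sketch of that check, assuming MAPREDUCE_FRAMEWORK_NAME stands for
the Hadoop key "mapreduce.framework.name" and LOCAL for the value "local"
(both constants are redefined here so the example is self-contained; the class
and method names are hypothetical and not taken from the patch). The
comparison is written with the constant on the left so that an unset property
yields false instead of a NullPointerException:

```java
import java.util.Properties;

// Hypothetical helper, not part of PIG-3801.patch.
class LocalModeCheck {
    // Assumed values of the constants named in the comment above.
    static final String MAPREDUCE_FRAMEWORK_NAME = "mapreduce.framework.name";
    static final String LOCAL = "local";

    // True only when the framework name is explicitly set to "local".
    // getProperty returns null for a missing key; LOCAL.equals(null) is false.
    static boolean isLocal(Properties properties) {
        return LOCAL.equals(properties.getProperty(MAPREDUCE_FRAMEWORK_NAME));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(isLocal(props));   // property not set
        props.setProperty(MAPREDUCE_FRAMEWORK_NAME, LOCAL);
        System.out.println(isLocal(props));   // property set to "local"
    }
}
```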

 Auto local mode does not call storeSchema
 -

 Key: PIG-3801
 URL: https://issues.apache.org/jira/browse/PIG-3801
 Project: Pig
  Issue Type: Bug
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: PIG-3801.patch


 https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java#L481
 Pig code explicitly runs PigOutputCommitter.storeCleanup for local jobs. We 
 also need to add this for auto-local jobs.
 To reproduce this problem, run:
   a = load '2.txt' as (a0:chararray, a1:int);
   store a into 'a' using PigStorage(',','-schema');
 This creates a .pig_schema file in pig -x local mode, but does not create 
 one in auto-local mode.





[jira] [Commented] (PIG-3801) Auto local mode does not call storeSchema

2014-03-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926351#comment-13926351
 ] 

Julien Le Dem commented on PIG-3801:


+1



[jira] [Created] (PIG-3805) ToString(datetime [, format string]) doesn't work without the second argument

2014-03-10 Thread Jenny Thompson (JIRA)
Jenny Thompson created PIG-3805:
---

 Summary: ToString(datetime [, format string]) doesn't work without 
the second argument
 Key: PIG-3805
 URL: https://issues.apache.org/jira/browse/PIG-3805
 Project: Pig
  Issue Type: Bug
  Components: internal-udfs
Affects Versions: 0.12.0
Reporter: Jenny Thompson
Priority: Minor


The exec function of ToString is written to handle 1 or 2 arguments (the 
format defaults to ISO, which is consistent with ToDate).

However, the getArgToFuncMapping function returns only one FuncSpec, which 
requires the formatString argument.

To fix: just add another FuncSpec to getArgToFuncMapping.
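
The idea can be sketched with simplified stand-in types. This is not the Pig
API: Spec is a hypothetical replacement for FuncSpec plus its input schema,
argument types are plain strings, and the lookup is a plain exact match, all
so that the example compiles without Pig on the classpath. It only shows why
a one-argument call resolves once a second entry is registered:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the proposed fix; all names here are hypothetical.
class ToStringMappingSketch {

    // Stand-in for a FuncSpec: a function name and the argument types it accepts.
    static final class Spec {
        final String className;
        final List<String> argTypes;

        Spec(String className, String... argTypes) {
            this.className = className;
            this.argTypes = Arrays.asList(argTypes);
        }
    }

    // Registers both call shapes: ToString(datetime) and
    // ToString(datetime, chararray). The report describes only the second
    // shape being registered today.
    static List<Spec> argToFuncMapping() {
        List<Spec> specs = new ArrayList<>();
        specs.add(new Spec("ToString", "datetime"));
        specs.add(new Spec("ToString", "datetime", "chararray"));
        return specs;
    }

    // A call resolves only if some registered spec matches its argument types.
    static boolean accepts(List<String> callArgTypes) {
        for (Spec spec : argToFuncMapping()) {
            if (spec.argTypes.equals(callArgTypes)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts(Arrays.asList("datetime")));
        System.out.println(accepts(Arrays.asList("datetime", "chararray")));
    }
}
```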





[jira] Subscription: PIG patch available

2014-03-10 Thread jira
Issue Subscription
Filter: PIG patch available (15 issues)

Subscriber: pigdaily

Key         Summary
PIG-3801    Auto local mode does not call storeSchema
            https://issues.apache.org/jira/browse/PIG-3801
PIG-3794    pig -useHCatalog fails using pig command line interface on HDInsight
            https://issues.apache.org/jira/browse/PIG-3794
PIG-3783    We can predict when small local jobs will cause an OOM and change 
            io.sort.mb in that case
            https://issues.apache.org/jira/browse/PIG-3783
PIG-3782    PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema 
            failing due to incorrect UID assignment
            https://issues.apache.org/jira/browse/PIG-3782
PIG-3771    Piggybank Avrostorage makes a lot of namenode calls in the backend
            https://issues.apache.org/jira/browse/PIG-3771
PIG-3757    Make scalar work
            https://issues.apache.org/jira/browse/PIG-3757
PIG-3737    Bundle dependent jars in distribution in %PIG_HOME%/lib folder
            https://issues.apache.org/jira/browse/PIG-3737
PIG-3735    UDF to data cleanse the dirty data with expected pattern
            https://issues.apache.org/jira/browse/PIG-3735
PIG-3668    COR built-in function when atleast one of the coefficient values is 
            NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3613    UDF for SimilarityMatching between strings with matching scores
            https://issues.apache.org/jira/browse/PIG-3613
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3456    Reduce threadlocal conf access in backend for each record
            https://issues.apache.org/jira/browse/PIG-3456
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441
PIG-3373    XMLLoader returns non-matching nodes when a tag name spans through 
            the block boundary
            https://issues.apache.org/jira/browse/PIG-3373

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-3783) We can predict when small local jobs will cause an OOM and change io.sort.mb in that case

2014-03-10 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated PIG-3783:
--

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Resolving as a duplicate of PIG-3731.

 We can predict when small local jobs will cause an OOM and change io.sort.mb 
 in that case
 -

 Key: PIG-3783
 URL: https://issues.apache.org/jira/browse/PIG-3783
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Ravi Prakash
  Labels: newbie
 Attachments: PIG-3783.branch-0.12.patch


 Seems like a lot of new users run into this problem:
 http://stackoverflow.com/questions/10165648/apache-pig-outofmemory-exception-with-simple-group-by-in-local-mode
 http://sumedha.blogspot.in/2012/01/solving-apache-pig-javalangoutofmemorye.html
 http://stackoverflow.com/questions/16499432/pig-local-mode-group-or-join-java-lang-outofmemoryerror-java-heap-space





[jira] [Commented] (PIG-3783) We can predict when small local jobs will cause an OOM and change io.sort.mb in that case

2014-03-10 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929917#comment-13929917
 ] 

Ravi Prakash commented on PIG-3783:
---

Thanks Aniket! I'll defer to your better judgement and close this JIRA as a 
duplicate of PIG-3731.
