[jira] Commented: (PIG-1209) Port POJoinPackage to proactively spill

2010-02-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828540#action_12828540
 ] 

Hadoop QA commented on PIG-1209:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434483/pig-1209.patch
  against trunk revision 905377.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/196/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/196/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/196/console

This message is automatically generated.

 Port POJoinPackage to proactively spill
 ---

 Key: PIG-1209
 URL: https://issues.apache.org/jira/browse/PIG-1209
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1209.patch


 POPackage proactively spills the bag whereas POJoinPackage still uses the 
 SpillableMemoryManager. We should port this to use InternalCacheBag which 
 proactively spills.
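
To illustrate what "proactively spills" means here, a minimal sketch follows. It is not Pig's implementation: the class name, the item-count limit, and the use of a temporary file are all assumptions made for illustration. The idea shown is that the bag itself decides to write to disk once it holds more than a fixed number of items, rather than waiting for a memory manager to ask it to spill.

```python
import pickle
import tempfile


class ProactiveSpillBag:
    """Hypothetical bag: keeps at most max_in_memory items in memory, spills the rest."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self._memory = []        # items held in memory
        self._spill_file = None  # created lazily on the first spill
        self._spilled = 0        # number of items written to disk

    def add(self, item):
        if len(self._memory) < self.max_in_memory:
            self._memory.append(item)
            return
        # Over the limit: write straight to disk instead of waiting
        # for a low-memory notification.
        if self._spill_file is None:
            self._spill_file = tempfile.TemporaryFile()
        self._spill_file.seek(0, 2)  # always append at the end of the file
        pickle.dump(item, self._spill_file)
        self._spilled += 1

    def __len__(self):
        return len(self._memory) + self._spilled

    def __iter__(self):
        # In-memory items first, then spilled items in insertion order.
        # Adding items while iterating is not supported in this sketch.
        yield from self._memory
        if self._spill_file is not None:
            self._spill_file.seek(0)
            for _ in range(self._spilled):
                yield pickle.load(self._spill_file)
```

With a limit of 2, adding five tuples keeps two in memory and writes three to the temporary file; iteration still returns all five in insertion order.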

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-02-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-20.patch

Fixes a bug in MergeJoin when the index has only one entry.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, 
 PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2010-02-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-598:
---

Fix Version/s: 0.7.0

 Parameter substitution ($PARAMETER) should not be performed in comments
 ---

 Key: PIG-598
 URL: https://issues.apache.org/jira/browse/PIG-598
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: David Ciemiewicz
Assignee: Thejas M Nair
 Fix For: 0.7.0

 Attachments: PIG-598.1.patch, PIG-598.patch


 Compiling the following code example will generate an error that 
 $NOT_A_PARAMETER is an Undefined Parameter.
 This is problematic as sometimes you want to comment out parts of your code, 
 including parameters so that you don't have to define them.
 Thus I think it would be really good if parameter substitution were not 
 performed in comments.
 {code}
 -- $NOT_A_PARAMETER
 {code}
 {code}
 -bash-3.00$ pig -exectype local -latest comment.pig
 USING: /grid/0/gs/pig/current
 java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER
 at 
 org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221)
 at 
 org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106)
 at 
 org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86)
 at org.apache.pig.Main.runParamPreprocessor(Main.java:394)
 at org.apache.pig.Main.main(Main.java:296)
 {code}
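
A minimal sketch of the requested behavior, assuming a line-based preprocessor. The function and variable names are hypothetical and this is not the code in ParameterSubstitutionPreprocessor. It substitutes `$NAME` only in the part of a line before `--` and leaves the comment untouched. It does not handle `--` inside a quoted string or `/* */` block comments, which a real fix would also need to consider.

```python
import re

# $NAME where NAME starts with a letter or underscore, so positional
# references such as $0 are left alone.
PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(line, params):
    """Replace $NAME in the code part of a line; keep a trailing -- comment as is."""
    code, sep, comment = line.partition("--")

    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError("Undefined parameter : " + name)
        return params[name]

    return PARAM.sub(lookup, code) + sep + comment
```

With this approach, the one-line script above (`-- $NOT_A_PARAMETER`) passes through unchanged, while an undefined parameter in actual code still raises an error.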




[jira] Created: (PIG-1215) Make Hadoop jobId more prominent in the client log

2010-02-02 Thread Olga Natkovich (JIRA)
Make Hadoop jobId more prominent in the client log
--

 Key: PIG-1215
 URL: https://issues.apache.org/jira/browse/PIG-1215
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0


This is a request from applications that want to be able to programmatically 
parse client logs to find Hadoop job IDs.

They would like to see each job id on a separate line in the following format:

hadoopJobId: job_123456789

They would also like to see the jobs in the order they are executed.
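
As a sketch of the kind of client-side parsing this format would enable (the function name and the sample log text are assumptions, not part of any existing tool), an application could pull the ids out of a log like this:

```python
import re

# One job id per line, in exactly the requested format.
JOB_LINE = re.compile(r"^hadoopJobId: (job_\w+)$")


def extract_job_ids(log_text):
    """Return job ids in the order they appear, one per matching line."""
    ids = []
    for line in log_text.splitlines():
        match = JOB_LINE.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids
```

Because the ids are returned in file order, writing the jobs to the log in execution order is enough for the caller to recover that order.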




[jira] Updated: (PIG-377) Grunt parser doesn't handle escape codes correctly

2010-02-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-377:
---

Priority: Major  (was: Critical)

Not sure why this issue was marked critical

 Grunt parser doesn't handle escape codes correctly
 --

 Key: PIG-377
 URL: https://issues.apache.org/jira/browse/PIG-377
 Project: Pig
  Issue Type: Bug
  Components: grunt
 Environment: Pig Trunk  01-Aug-2008
 Hadoop 17.1
 Linux Ubuntu
Reporter: Rafael Turk

 Grunt parser doesn't handle escape codes correctly such as \s \n \\..
 Example, using:
  raw_filtered = FILTER raw BY ngram matches '^[a-zA-Z0-9\s]$';
 OR
  raw_filtered = FILTER raw BY ngram matches ^[a-zA-Z0-9\s]$;
 I get the following error:
 org.apache.pig.impl.logicalLayer.parser.TokenMgrError: Lexical error at line 
 1, column 57.  Encountered: s (115), after : \'^[a-zA-Z0-9\\
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1623)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_consume_token(QueryParser.java:4744)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.PUnaryCond(QueryParser.java:1117)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.PAndCond(QueryParser.java:1055)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.POrCond(QueryParser.java:1005)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.PCond(QueryParser.java:973)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.FilterClause(QueryParser.java:941)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:686)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:81)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:60)
 at org.apache.pig.Main.main(Main.java:294)




[jira] Updated: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1131:


Priority: Major  (was: Critical)

Not sure why this issue was marked critical

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig


 I have a simple script, which does a JOIN.
 {code}
 input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
 describe input1;
 input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
 describe input2;
 joineddata = JOIN input1 by $0, input2 by $0;
 describe joineddata;
 store joineddata into 'result';
 {code}
 The input data contains empty lines.  
 The join fails in the Map phase with the following error in 
 POLocalRearrange.java:
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
   at org.apache.hadoop.mapred.Child.main(Child.java:159)
 I am surprised that the test cases did not detect this error. Could we add 
 this data which contains empty lines to the testcases?
 Viraj
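
A small sketch of the failure mode and one defensive way around it. The function names are hypothetical, and treating a row with no key as non-matching is an assumption here, not a statement of how Pig resolves this issue. An empty line yields a row with fewer fields than the join key index expects, so the key lookup has to tolerate short rows instead of indexing blindly.

```python
def split_fields(line, delimiter):
    """Split one input line into fields; an empty line yields an empty list."""
    return line.split(delimiter) if line else []


def field_or_none(fields, index):
    """Return fields[index], or None when the row is too short."""
    return fields[index] if index < len(fields) else None


def join_on_first_field(left_lines, right_lines, left_delim, right_delim):
    """Inner join on field 0; rows whose key is missing are skipped."""
    right_by_key = {}
    for line in right_lines:
        fields = split_fields(line, right_delim)
        key = field_or_none(fields, 0)
        if key is not None:
            right_by_key.setdefault(key, []).append(fields)
    joined = []
    for line in left_lines:
        fields = split_fields(line, left_delim)
        key = field_or_none(fields, 0)
        if key is None:
            continue  # empty line: no key, so it cannot match
        for other in right_by_key.get(key, []):
            joined.append(fields + other)
    return joined
```

Input of this shape, with empty lines in both files, is also the kind of data a regression test for this issue could use.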




[jira] Updated: (PIG-723) Pig generates incorrect schema for generated bags after FOREACH.

2010-02-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-723:
---

Description: 
grunt> rf_src = LOAD 'rf_test.txt' USING PigStorage(',') AS (lhs:chararray, 
rhs:chararray, r:float, p:float, c:float);
grunt> rf_grouped = GROUP rf_src BY rhs;

grunt> lhs_grouped = FOREACH rf_grouped GENERATE group as rhs, rf_src.(lhs, r) 
as lhs, MAX(rf_src.p) as p, MAX(rf_src.c) AS c;
grunt> describe lhs_grouped;
lhs_grouped: {rhs: chararray,lhs: {lhs: chararray,r: float},p: float,c: float}

I think it should be:
lhs_grouped: {rhs: chararray,lhs: {(lhs: chararray,r: float)},p: float,c: float}

Because of this, we are not able to perform UNION on 2 sets because union on 
incompatible schemas is causing a complete loss of schema information, making 
further processing impossible.

This is what we want to UNION with:

grunt> asrc = LOAD 'atest.txt' USING PigStorage(',') AS (rhs:chararray, a:int); 

grunt> aa = FOREACH asrc GENERATE rhs, (bag{tuple(chararray,float)}) null as 
lhs, -10F as p, -10F as c;
grunt> describe aa;
aa: {rhs: chararray,lhs: {(chararray,float)},p: float,c: float}

If there is something wrong with what I am trying to do, please let me know.


  was:

grunt> rf_src = LOAD 'rf_test.txt' USING PigStorage(',') AS (lhs:chararray, 
rhs:chararray, r:float, p:float, c:float);
grunt> rf_grouped = GROUP rf_src BY rhs;

grunt> lhs_grouped = FOREACH rf_grouped GENERATE group as rhs, rf_src.(lhs, r) 
as lhs, MAX(rf_src.p) as p, MAX(rf_src.c) AS c;
grunt> describe lhs_grouped;
lhs_grouped: {rhs: chararray,lhs: {lhs: chararray,r: float},p: float,c: float}

I think it should be:
lhs_grouped: {rhs: chararray,lhs: {(lhs: chararray,r: float)},p: float,c: float}

Because of this, we are not able to perform UNION on 2 sets because union on 
incompatible schemas is causing a complete loss of schema information, making 
further processing impossible.

This is what we want to UNION with:

grunt> asrc = LOAD 'atest.txt' USING PigStorage(',') AS (rhs:chararray, a:int); 

grunt> aa = FOREACH asrc GENERATE rhs, (bag{tuple(chararray,float)}) null as 
lhs, -10F as p, -10F as c;
grunt> describe aa;
aa: {rhs: chararray,lhs: {(chararray,float)},p: float,c: float}

If there is something wrong with what I am trying to do, please let me know.


   Priority: Major  (was: Critical)

Not sure why this issue was marked as critical

 Pig generates incorrect schema for generated bags after FOREACH.
 

 Key: PIG-723
 URL: https://issues.apache.org/jira/browse/PIG-723
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.1.0
 Environment: Linux
 $pig --version
 Apache Pig version 0.1.0-dev (r750430)
 compiled Mar 07 2009, 09:20:13
Reporter: Dhruv M

 grunt> rf_src = LOAD 'rf_test.txt' USING PigStorage(',') AS (lhs:chararray, 
 rhs:chararray, r:float, p:float, c:float);
 grunt> rf_grouped = GROUP rf_src BY rhs;

 grunt> lhs_grouped = FOREACH rf_grouped GENERATE group as rhs, rf_src.(lhs, 
 r) as lhs, MAX(rf_src.p) as p, MAX(rf_src.c) AS c;
 grunt> describe lhs_grouped;
 lhs_grouped: {rhs: chararray,lhs: {lhs: chararray,r: float},p: float,c: float}
 I think it should be:
 lhs_grouped: {rhs: chararray,lhs: {(lhs: chararray,r: float)},p: float,c: 
 float}
 Because of this, we are not able to perform UNION on 2 sets because union on 
 incompatible schemas is causing a complete loss of schema information, making 
 further processing impossible.
 This is what we want to UNION with:
 grunt> asrc = LOAD 'atest.txt' USING PigStorage(',') AS (rhs:chararray, 
 a:int);
 grunt> aa = FOREACH asrc GENERATE rhs, (bag{tuple(chararray,float)}) null as 
 lhs, -10F as p, -10F as c;
 grunt> describe aa;
 aa: {rhs: chararray,lhs: {(chararray,float)},p: float,c: float}
 If there is something wrong with what I am trying to do, please let me know.




[jira] Updated: (PIG-1046) join algorithm specification is within double quotes

2010-02-02 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1046:
--

Status: Open  (was: Patch Available)

The collected hint for Group needs a similar fix, so I am canceling this patch. 
I will upload a new patch with that included.

 join algorithm specification is within double quotes
 

 Key: PIG-1046
 URL: https://issues.apache.org/jira/browse/PIG-1046
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1046.patch, pig-1046_1.patch


 This fails -
 j = join l1 by $0, l2 by $0 using 'skewed';
 This works -
 j = join l1 by $0, l2 by $0 using "skewed";
 String constants are single-quoted in pig-latin. If the algorithm 
 specification is supposed to be a string, specifying it within single quotes 
 should be supported.
 Alternatively, we should be using identifiers here: since these are 
 pre-defined in Pig, users will not be specifying arbitrary values that might 
 not be valid identifiers.
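
One way to picture the requested leniency, as a sketch only: the function name is hypothetical, and the set of known hints contains just the two names mentioned in this thread, not Pig's full list. The hint token is normalized by stripping one pair of matching quotes, so the single-quoted, double-quoted, and bare-identifier forms all resolve to the same value, which is then checked against the pre-defined names.

```python
# Only the hints named in this thread; the real set is an assumption.
KNOWN_HINTS = {"skewed", "collected"}


def normalize_hint(token):
    """Strip one pair of matching single or double quotes, then validate."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        token = token[1:-1]
    hint = token.lower()
    if hint not in KNOWN_HINTS:
        raise ValueError("unknown hint: " + token)
    return hint
```

Since the value is validated against a fixed set after normalization, accepting several spellings does not widen what users can actually specify.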




[jira] Updated: (PIG-1046) join algorithm specification is within double quotes

2010-02-02 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1046:
--

Status: Patch Available  (was: Open)

 join algorithm specification is within double quotes
 

 Key: PIG-1046
 URL: https://issues.apache.org/jira/browse/PIG-1046
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch


 This fails -
 j = join l1 by $0, l2 by $0 using 'skewed';
 This works -
 j = join l1 by $0, l2 by $0 using "skewed";
 String constants are single-quoted in pig-latin. If the algorithm 
 specification is supposed to be a string, specifying it within single quotes 
 should be supported.
 Alternatively, we should be using identifiers here: since these are 
 pre-defined in Pig, users will not be specifying arbitrary values that might 
 not be valid identifiers.




[jira] Updated: (PIG-1046) join algorithm specification is within double quotes

2010-02-02 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1046:
--

Attachment: pig-1046_2.patch

Patch which includes same fix for collected hint for Group by. Test cases 
included.

 join algorithm specification is within double quotes
 

 Key: PIG-1046
 URL: https://issues.apache.org/jira/browse/PIG-1046
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch


 This fails -
 j = join l1 by $0, l2 by $0 using 'skewed';
 This works -
 j = join l1 by $0, l2 by $0 using "skewed";
 String constants are single-quoted in pig-latin. If the algorithm 
 specification is supposed to be a string, specifying it within single quotes 
 should be supported.
 Alternatively, we should be using identifiers here: since these are 
 pre-defined in Pig, users will not be specifying arbitrary values that might 
 not be valid identifiers.




[jira] Updated: (PIG-1214) Pig/Zebra 0.6 patch - docs

2010-02-02 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-1214:
-

Attachment: pig-1214.patch

Patch file.

 Pig/Zebra 0.6 patch - docs
 --

 Key: PIG-1214
 URL: https://issues.apache.org/jira/browse/PIG-1214
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.6.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Attachments: pig-1214.patch


 Pig Docs
  piglatin_ref2.xml - Update PigStorage function to include information about 
  '/r' delimiter
 Zebra Docs
  zebra_pig.xml - Add new section, Sorting Data: Zebra only supports tables 
  sorted in ascending (ASC) order; tables sorted in descending (DESC) order 
  are treated as unsorted tables




[jira] Updated: (PIG-1214) Pig/Zebra 0.6 patch - docs

2010-02-02 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel updated PIG-1214:
-

Status: Patch Available  (was: Open)

(1) No new test code required; changes to documentation only.

(2) Apply this patch to

 Pig TRUNK  -  http://svn.apache.org/repos/asf/hadoop/pig/trunk

 Pig branch-0.6  -  
 http://svn.apache.org/repos/asf/hadoop/pig/branches/branch-0.6/

 Pig/Zebra 0.6 patch - docs
 --

 Key: PIG-1214
 URL: https://issues.apache.org/jira/browse/PIG-1214
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.6.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Attachments: pig-1214.patch


 Pig Docs
  piglatin_ref2.xml - Update PigStorage function to include information about 
  '/r' delimiter
 Zebra Docs
  zebra_pig.xml - Add new section, Sorting Data: Zebra only supports tables 
  sorted in ascending (ASC) order; tables sorted in descending (DESC) order 
  are treated as unsorted tables




[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828822#action_12828822
 ] 

Ankit Modi commented on PIG-1154:
-

It looks like the problem is caused by an overwritten value of 
mapred.system.dir from mapred-default.xml; the path mentioned above, 
/mapredsystem/hadoop/mapredsystem/, may not exist.

This cannot be solved in local mode as it is not possible to change classpath 
at runtime.

I'll provide a patch which would
   * Provide a warning whenever classpath contains mapred-site.xml or 
hdfs-site.xml.
   * Exit pig with an error message if the above case is encountered.

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0


 In local mode, the hadoop configuration should not be taken from the 
 classpath.




[jira] Updated: (PIG-1201) [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra

2010-02-02 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-1201:
---


Patch looks good +1

 [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all 
 JobConf contents including those unused by zebra
 --

 Key: PIG-1201
 URL: https://issues.apache.org/jira/browse/PIG-1201
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1201.patch, PIG-1201.patch, PIG-1201.patch







[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828826#action_12828826
 ] 

Alan Gates commented on PIG-1154:
-

I'm assuming you mean it will provide a warning and error out when the user is 
in local mode and mapred-site.xml or hdfs-site.xml are found in the classpath?

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0


 In local mode, the hadoop configuration should not be taken from the 
 classpath.




[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828831#action_12828831
 ] 

Ankit Modi commented on PIG-1154:
-

It will provide a warning whenever the files are encountered in Local Mode.

On top of that, it will exit with an error if mapred.system.dir is different 
from the default one and does not exist.
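
The two checks described above can be sketched as follows. This is not the attached patch: the function names, the message wording, and the way the default directory is passed in are assumptions made for illustration. Only classpath entries that are directories are scanned in this sketch.

```python
import os

SITE_FILES = ("mapred-site.xml", "hdfs-site.xml")


def find_site_files(classpath):
    """Return paths of Hadoop site files found in classpath directories."""
    found = []
    for entry in classpath.split(os.pathsep):
        if not os.path.isdir(entry):
            continue  # jars and missing entries are ignored in this sketch
        for name in SITE_FILES:
            candidate = os.path.join(entry, name)
            if os.path.isfile(candidate):
                found.append(candidate)
    return found


def check_local_mode(classpath, system_dir, default_system_dir):
    """Return warnings; raise if a non-default mapred.system.dir does not exist."""
    warnings = ["Hadoop config file on classpath in local mode: " + path
                for path in find_site_files(classpath)]
    if system_dir != default_system_dir and not os.path.exists(system_dir):
        raise RuntimeError("mapred.system.dir does not exist: " + system_dir)
    return warnings
```

Separating the warning from the hard failure means a site file on the classpath alone only produces a message, while an unusable mapred.system.dir stops the run.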

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0


 In local mode, the hadoop configuration should not be taken from the 
 classpath.




[jira] Updated: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1154:


Attachment: pig_1154.patch

Patch according to comments mentioned above.

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: pig_1154.patch


 In local mode, the hadoop configuration should not be taken from the 
 classpath.




[jira] Updated: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1154:


Status: Patch Available  (was: Open)

This patch affects only Local Mode in Pig.

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: pig_1154.patch


 In local mode, the hadoop configuration should not be taken from the 
 classpath.




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-02-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828839#action_12828839
 ] 

Alan Gates commented on PIG-366:


I talked to Shubham (the original author).  He indicated that the code in 
pigpen.patch from 11-12-2008 is the latest code.

 PigPen - Eclipse plugin for a graphical PigLatin editor
 ---

 Key: PIG-366
 URL: https://issues.apache.org/jira/browse/PIG-366
 Project: Pig
  Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Attachments: org.apache.pig.pigpen_0.0.1.jar, 
 org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
 pigpen.patch, pigPen.patch, PigPen.tgz


 This is an Eclipse plugin that provides a GUI that can help users create 
 PigLatin scripts and see the example generator outputs on the fly and submit 
 the jobs to hadoop clusters.




[jira] Commented: (PIG-1046) join algorithm specification is within double quotes

2010-02-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828851#action_12828851
 ] 

Hadoop QA commented on PIG-1046:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434583/pig-1046_2.patch
  against trunk revision 905377.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/187/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/187/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/187/console

This message is automatically generated.

 join algorithm specification is within double quotes
 

 Key: PIG-1046
 URL: https://issues.apache.org/jira/browse/PIG-1046
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch


 This fails -
 j = join l1 by $0, l2 by $0 using 'skewed';
 This works -
 j = join l1 by $0, l2 by $0 using "skewed";
 String constants are single-quoted in pig-latin. If the algorithm 
 specification is supposed to be a string, specifying it within single quotes 
 should be supported.
 Alternatively, we should be using identifiers here: since these are 
 pre-defined in Pig, users will not be specifying arbitrary values that might 
 not be valid identifiers.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-02-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: (was: PIG-1090-20.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, 
 PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-02-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1090:


Attachment: PIG-1090-20.patch

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-16.patch, 
 PIG-1090-17.patch, PIG-1090-18.patch, PIG-1090-19.patch, PIG-1090-2.patch, 
 PIG-1090-20.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Created: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front

2010-02-02 Thread Alan Gates (JIRA)
New load store design does not allow Pig to validate inputs and outputs up front


 Key: PIG-1216
 URL: https://issues.apache.org/jira/browse/PIG-1216
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates


In Pig 0.6 and before, Pig attempts to verify existence of inputs and 
non-existence of outputs during parsing to avoid run time failures when inputs 
don't exist or outputs can't be overwritten.  The downside to this was that Pig 
assumed all inputs and outputs were HDFS files, which made implementation 
harder for non-HDFS based load and store functions.  In the load store redesign 
(PIG-966) this was delegated to InputFormats and OutputFormats to avoid this 
problem and to make use of the checks already being done in those 
implementations.  Unfortunately, for Pig Latin scripts that run more than one 
MR job, this does not work well.  MR does not do input/output verification on 
all the jobs at once.  It does them one at a time.  So if a Pig Latin script 
results in 10 MR jobs and the file to store to at the end already exists, the 
first 9 jobs will be run before the 10th job discovers that the whole thing was 
doomed from the beginning.  

To avoid this, a validate call needs to be added to the new LoadFunc and 
StoreFunc interfaces.  Pig needs to pass this method enough information that 
the load function implementer can delegate to InputFormat.getSplits() and the 
store function implementer to OutputFormat.checkOutputSpecs() if s/he decides 
to.  Since 90% of all load and store functions use HDFS and PigStorage will 
also need to, the Pig team should implement a default file existence check on 
HDFS and make it available as a static method to other Load/Store function 
implementers.  
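
To make the idea concrete, here is a minimal sketch in Python of checking a
whole multi-job plan before anything runs. It is only an illustration: the
function validate_plan, the (inputs, outputs) job representation, and the set
of existing paths are hypothetical and are not Pig or Hadoop APIs.

```python
# Hypothetical sketch of up-front validation across a multi-job plan.
# None of these names are Pig or Hadoop APIs.

def validate_plan(jobs, existing_paths):
    """Check every job's inputs and outputs before any job runs.

    jobs: list of (inputs, outputs) pairs in execution order.
    existing_paths: set of paths present before the script starts.
    Returns a list of error strings; an empty list means the plan may run.
    """
    errors = []
    produced = set()  # outputs that earlier jobs in this plan will create
    for index, (inputs, outputs) in enumerate(jobs, start=1):
        for path in inputs:
            # An input is fine if it exists now or an earlier job creates it.
            if path not in existing_paths and path not in produced:
                errors.append("job %d: input does not exist: %s" % (index, path))
        for path in outputs:
            # An output must not already exist and must not be written twice.
            if path in existing_paths or path in produced:
                errors.append("job %d: output already exists: %s" % (index, path))
            produced.add(path)
    return errors
```

The point of the sketch is that the final store location is checked together
with the first load, so a script whose last output already exists fails
before the first job is submitted rather than after the ninth.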




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-02 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-1178:


Status: Open  (was: Patch Available)

I found a bug in the code so I'll be releasing another patch for the same.

I'll keep this patch in the JIRA until I replace it with a new one so everyone 
can review it.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those decisions and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Commented: (PIG-1188) Padding nulls to the input tuple according to input schema

2010-02-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828896#action_12828896
 ] 

Alan Gates commented on PIG-1188:
-

After further thought I want to change my position on this.

There are two cases to consider, when schema is present and when it isn't.  The 
problem is by the time Pig is trying to access the missing field (in the 
backend), it has no idea whether the schema exists or not.  So at runtime, Pig 
should just return a null if it gets ArrayOutOfBoundsException.

How to pad missing data should be left up to the load function.  Perhaps 
certain load functions do know how to pad missing data, or are OK with the 
pad-at-the-end scheme proposed here.  If the load function does not check, then 
Pig would effectively pad at the end, given the proposal above.  If the load 
function implementer does not want this to happen, s/he can check each tuple 
being read from the input to ensure it matches the schema, and then decide to 
pad the tuple with nulls, reject the tuple, or return a tuple full of nulls.

In the case of PigStorage, checking each tuple for a match against the schema 
is too expensive.  Ideally I would like it to, because I think that when the 
user gives a schema it's an error if the data doesn't match.  But I don't want 
to pay the performance penalty in this case.  
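
A minimal sketch in Python of the two pieces described in this comment, under
stated assumptions: field_or_null stands in for the backend returning null
when a field is missing, and check_tuple stands in for a load function that
chooses to verify each tuple. Both names and the on_mismatch values are
hypothetical, not part of Pig.

```python
# Hypothetical sketch; these functions are not Pig APIs.

def field_or_null(fields, index):
    """Runtime access: return None instead of failing on a short tuple.

    index is assumed to be non-negative.
    """
    try:
        return fields[index]
    except IndexError:
        return None


def check_tuple(fields, width, on_mismatch="pad"):
    """Optional per-tuple check a load function could apply.

    on_mismatch selects one of the three choices named in the comment:
    "pad" pads a short tuple with None at the end, "reject" returns None
    to signal that the tuple should be dropped, and "nulls" returns a
    tuple of `width` None values.
    """
    if len(fields) == width:
        return tuple(fields)
    if on_mismatch == "pad":
        # Longer tuples are left unchanged in this sketch.
        return tuple(fields) + (None,) * max(0, width - len(fields))
    if on_mismatch == "reject":
        return None
    if on_mismatch == "nulls":
        return (None,) * width
    raise ValueError("unknown policy: " + on_mismatch)
```

A load function that skips check_tuple entirely still behaves as if short
tuples were padded at the end, because every out-of-range access goes through
something like field_or_null.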

 Padding nulls to the input tuple according to input schema
 --

 Key: PIG-1188
 URL: https://issues.apache.org/jira/browse/PIG-1188
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
 Fix For: 0.7.0


 Currently, the number of fields in the input tuple is determined by the data. 
 When we have a schema, we should generate input tuples according to the 
 schema, padding with nulls if necessary. Here is one example:
 Pig script:
 {code}
 a = load '1.txt' as (a0, a1);
 dump a;
 {code}
 Input file:
 {code}
 1   2
 1   2   3
 1
 {code}
 Current result:
 {code}
 (1,2)
 (1,2,3)
 (1)
 {code}
 Desired result:
 {code}
 (1,2)
 (1,2)
 (1, null)
 {code}
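
The desired result above amounts to truncating extra fields and padding
missing ones to the schema width. A minimal sketch in Python, assuming
tab-delimited input and a two-field schema; fit_to_schema is a hypothetical
helper, not PigStorage code, and fields are kept as strings.

```python
# Hypothetical sketch of fitting each input line to the schema width.

def fit_to_schema(line, width, delimiter="\t"):
    """Split a line, drop fields beyond `width`, and pad with None."""
    fields = line.split(delimiter)[:width]
    fields += [None] * (width - len(fields))
    return tuple(fields)


# The three input lines from the example, assuming tab as the delimiter.
rows = ["1\t2", "1\t2\t3", "1"]
result = [fit_to_schema(row, 2) for row in rows]
```

Here result holds one two-field tuple per input line, with None standing in
for null in the last one.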




[jira] Commented: (PIG-1214) Pig/Zebra 0.6 patch - docs

2010-02-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828929#action_12828929
 ] 

Hadoop QA commented on PIG-1214:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434607/pig-1214.patch
  against trunk revision 905377.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/197/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/197/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/197/console

This message is automatically generated.

 Pig/Zebra 0.6 patch - docs
 --

 Key: PIG-1214
 URL: https://issues.apache.org/jira/browse/PIG-1214
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.6.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Attachments: pig-1214.patch


 Pig Docs
  piglatin_ref2.xml - Update PigStorage function to include information about 
  '/r' delimiter
 Zebra Docs
  zebra_pig.xml - Add new section, Sorting Data: Zebra only supports tables 
  sorted in ascending (ASC) order; tables sorted in descending (DESC) order 
  are treated as unsorted tables




[jira] Commented: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-02-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828966#action_12828966
 ] 

Hadoop QA commented on PIG-1154:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434613/pig_1154.patch
  against trunk revision 905377.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/188/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/188/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/188/console

This message is automatically generated.

 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: pig_1154.patch


 In local mode, the hadoop configuration should not be taken from the 
 classpath.




RE: Private variables are not eco-friendly

2010-02-02 Thread Pradeep Kamath
Would it be better to make them protected when a use case for
inheritance arises rather than begin as protected? 

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Tuesday, February 02, 2010 7:35 PM
To: pig-dev@hadoop.apache.org
Subject: Private variables are not eco-friendly

Hi all,
I keep running into problems trying to extend Pig due to variables
being declared private. The latest time around it was in PigSlice --
one can't inherit it and do much meaningful overriding of methods
because the input streams are private rather than protected, so I
can't change how it gets created. I wound up having to copy+paste the
class wholesale, which is unfortunate. I know the Slice/Slicer
interfaces are going away, but as a general rule -- can we be mindful
of folks trying to extend classes, and make inner members protected,
rather than private or package?

Thanks
-Dmitriy