[jira] Updated: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-17 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-566:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Yes, my mistake. Thanks Mridul. Fortunately Gianmarco doesn't listen to me :) I 
manually tested the patch; all tests pass. Committed to trunk. Thanks Gianmarco!

 Dump and store outputs do not match for PigStorage
 --

 Key: PIG-566
 URL: https://issues.apache.org/jira/browse/PIG-566
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Santhosh Srinivasan
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-566.patch, PIG-566.patch, PIG-566.patch, 
 PIG-566.patch, PIG-566.patch


 The dump and store formats for PigStorage do not match for longs and floats.
 {code}
 grunt> y = foreach x generate {(2985671202194220139L)};
 grunt> describe y;
 y: {{(long)}}
 grunt> dump y;
 ({(2985671202194220139L)})
 grunt> store y into 'y';
 grunt> cat y
 {(2985671202194220139)}
 {code}
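The mismatch can be illustrated with a small Python sketch (hypothetical helper functions, not Pig's actual code): the interactive dump path tags long values with an "L" suffix, while the plain-text storage path writes only the digits, so the two outputs diverge.

```python
def dump_format(value):
    # Illustrative: an interactive dump tags longs with an "L" suffix
    # so the type is visible on screen.
    if isinstance(value, int):
        return str(value) + "L"
    return str(value)

def store_format(value):
    # Illustrative: a plain-text storer writes only the digits,
    # with no type suffix.
    return str(value)

v = 2985671202194220139
print(dump_format(v))   # 2985671202194220139L
print(store_format(v))  # 2985671202194220139
```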

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867898#action_12867898
 ] 

Daniel Dai commented on PIG-566:


Seems Hudson is down; I will run the tests manually.





[jira] Updated: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1381:


Fix Version/s: (was: 0.7.0)

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch, PIG-1381-5.patch


 Currently, Pig reads the first pig.properties found on the CLASSPATH. Pig ships 
 a default pig.properties, so if a user supplies a different pig.properties there 
 is a conflict, since only one can be read. There are a couple of ways to solve it:
 1. Add a command-line option that lets the user pass an additional property file
 2. Rename the default pig.properties to pig-default.properties, so that a 
 user-supplied pig.properties can override it
 3. Further, consider using pig-default.xml/pig-site.xml, which seems more 
 natural for the Hadoop community. If so, we should provide backward 
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml. 
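Option 2 above amounts to simple key-level layering. A Python sketch of the idea (the dicts stand in for the two property files; names are illustrative):

```python
def effective_properties(defaults, site):
    # Start from the shipped pig-default.properties (here a plain dict)
    # and let a user-supplied pig.properties override individual keys
    # instead of replacing the whole file.
    merged = dict(defaults)
    merged.update(site)
    return merged

defaults = {"pig.logfile": "/tmp/pig.log", "verbose": "false"}
site = {"verbose": "true"}
print(effective_properties(defaults, site))
```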




[jira] Updated: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-566:
---

Fix Version/s: (was: 0.7.0)





[jira] Updated: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1391:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

 pig unit tests leave behind files in temp directory because MiniCluster files 
 don't get deleted
 ---

 Key: PIG-1391
 URL: https://issues.apache.org/jira/browse/PIG-1391
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.7.0, 0.8.0, 0.6.0

 Attachments: minicluster.patch, PIG-1391.06.2.patch, 
 PIG-1391.06.patch, PIG-1391.07.patch, PIG-1391.trunk.patch


 Pig unit test runs leave behind files in the temp dir (/tmp), and over time 
 there are too many files in the directory.
 Most of the files are left behind by MiniCluster: it shuts down the 
 MiniDFSCluster, MiniMRCluster, and the FileSystem created in its constructor 
 only in finalize(), and Java does not guarantee that finalize() will ever be 
 called. 
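The general fix, in Python terms, is an explicit shutdown hook rather than a finalizer. A minimal sketch (the class name is made up; it only models the temp-dir cleanup, not a real cluster):

```python
import atexit
import os
import shutil
import tempfile

class MiniClusterSketch:
    # Register an explicit shutdown hook instead of relying on a
    # finalizer, which the runtime may never call.
    def __init__(self):
        self.tmpdir = tempfile.mkdtemp(prefix="minicluster-")
        atexit.register(self.shutdown)   # runs even if nobody calls shutdown()

    def shutdown(self):
        # Idempotent cleanup of everything the constructor created.
        shutil.rmtree(self.tmpdir, ignore_errors=True)

cluster = MiniClusterSketch()
created = cluster.tmpdir
cluster.shutdown()
print(os.path.exists(created))  # False
```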




[jira] Resolved: (PIG-1417) Site changes for 0.7

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1417.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

 Site changes for 0.7
 

 Key: PIG-1417
 URL: https://issues.apache.org/jira/browse/PIG-1417
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0







[jira] Closed: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-598.
--


 Parameter substitution ($PARAMETER) should not be performed in comments
 ---

 Key: PIG-598
 URL: https://issues.apache.org/jira/browse/PIG-598
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: David Ciemiewicz
Assignee: Thejas M Nair
 Fix For: 0.7.0

 Attachments: PIG-598.1.patch, PIG-598.patch


 Compiling the following code example will generate an error that 
 $NOT_A_PARAMETER is an undefined parameter.
 This is problematic, as sometimes you want to comment out parts of your code, 
 including parameters, so that you don't have to define them.
 Thus, I think it would be really good if parameter substitution were not 
 performed in comments.
 {code}
 -- $NOT_A_PARAMETER
 {code}
 {code}
 -bash-3.00$ pig -exectype local -latest comment.pig
 USING: /grid/0/gs/pig/current
 java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER
 at 
 org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221)
 at 
 org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106)
 at 
 org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86)
 at org.apache.pig.Main.runParamPreprocessor(Main.java:394)
 at org.apache.pig.Main.main(Main.java:296)
 {code}
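A Python sketch of the requested behavior (a deliberately simplified line-level substitutor, not Pig's preprocessor; a real implementation would also have to avoid splitting on "--" inside string literals):

```python
import re

def substitute_params(line, params):
    # Strip a trailing "--" Pig comment before substituting $NAMEs, so a
    # parameter that appears only inside a comment is never treated as
    # undefined.
    code, _, _comment = line.partition("--")

    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise RuntimeError("Undefined parameter : " + name)
        return params[name]

    return re.sub(r"\$(\w+)", lookup, code)

print(substitute_params("-- $NOT_A_PARAMETER", {}))              # comment only: no error
print(substitute_params("store y into '$OUT';", {"OUT": "foo"}))
```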




[jira] Closed: (PIG-617) Using SUM with basic type fails

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-617.
--


 Using SUM with basic type fails
 ---

 Key: PIG-617
 URL: https://issues.apache.org/jira/browse/PIG-617
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
 Fix For: 0.7.0


 SUM is an aggregate function that expects a bag as an argument. When basic 
 types are used as arguments to SUM, Pig fails at run time. The 
 typechecker should catch this error and fail earlier.
 An example is given below:
 {code}
 grunt> a = load 'one' as (i: int);
 grunt> b = foreach a generate SUM(i);
 grunt> dump b;
 2009-01-12 14:11:47,595 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-01-12 14:12:12,617 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Map reduce job failed
 2009-01-12 14:12:12,618 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Job failed!
 2009-01-12 14:12:12,623 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
 message from task (map) 
 task_200812151518_9683_m_00java.lang.ClassCastException: 
 java.lang.Integer cannot be cast to org.apache.pig.data.DataBag
 2009-01-12 14:12:12,623 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
 message from task (map) 
 task_200812151518_9683_m_00java.lang.ClassCastException: 
 java.lang.Integer cannot be cast to org.apache.pig.data.DataBag
 at org.apache.pig.builtin.IntSum.sum(IntSum.java:141)
 at org.apache.pig.builtin.IntSum.exec(IntSum.java:41)
 at org.apache.pig.builtin.IntSum.exec(IntSum.java:36)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:185)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:247)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:265)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:197)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:187)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:175)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
 at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
 ...
 2009-01-12 14:12:12,629 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1066: Unable to open iterator for alias b
 2009-01-12 14:12:12,629 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 org.apache.pig.impl.logicalLayer.FrontendException: Unable to open iterator 
 for alias b
 at org.apache.pig.PigServer.openIterator(PigServer.java:425)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:271)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:72)
 at org.apache.pig.Main.main(Main.java:302)
 Caused by: java.io.IOException: Job terminated with anomalous status FAILED
 at org.apache.pig.PigServer.openIterator(PigServer.java:419)
 ... 5 more
 {code}
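The requested front-end check is simple to state. A Python sketch (illustrative, not Pig's type checker; the returned result type is just an example for an int-bag SUM):

```python
def typecheck_sum(arg_type):
    # Reject scalar argument types at compile time instead of letting the
    # job die later with a ClassCastException in the map task.
    if arg_type != "bag":
        raise TypeError("SUM expects a bag as its argument, got: " + arg_type)
    return "long"   # illustrative result type

print(typecheck_sum("bag"))
```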




[jira] Closed: (PIG-257) Allow usage of custom Hadoop InputFormat in Pig

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-257.
--


 Allow usage of custom Hadoop InputFormat in Pig
 ---

 Key: PIG-257
 URL: https://issues.apache.org/jira/browse/PIG-257
 Project: Pig
  Issue Type: New Feature
Reporter: Pi Song
 Fix For: 0.7.0


 This very cool idea sprang out of a discussion on the mailing list (thanks 
 Manish Shah).
 There is a semantic issue: a Hadoop InputFormat generally produces K,V pairs, but 
 Pig expects Tuples. We can solve this by sticking K and V in as fields of a Tuple. 
 Provided that we've got rich built-in string/binary manipulation functions, 
 Hadoop users shouldn't find it too costly to use Pig. This should definitely 
 help accelerate the Pig adoption process.
 After a brief look at the current code, this new feature will require changes 
 in the Map Reduce execution engine, so I will wait until the type branch is 
 complete before starting work on this (if nobody expresses interest in doing 
 it :) ) 




[jira] Closed: (PIG-518) LOBinCond exception in LogicalPlanValidationExecutor when providing default values for bag

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-518.
--


 LOBinCond  exception in LogicalPlanValidationExecutor when providing default 
 values for bag
 ---

 Key: PIG-518
 URL: https://issues.apache.org/jira/browse/PIG-518
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
 Fix For: 0.7.0

 Attachments: queries.txt, sports_views.txt


 The following Pig script, which provides a default bag value {('','')} when 
 COUNT returns 0, fails with the error below. (Note: the files used in this 
 script are attached to this JIRA.)
 
 a = load 'sports_views.txt' as (col1, col2, col3);
 b = load 'queries.txt' as (colb1,colb2,colb3);
 mycogroup = cogroup a by col1 inner, b by colb1;
 mynewalias = foreach mycogroup generate flatten(a), flatten((COUNT(b) > 0L ? 
 b.(colb2,colb3) : {('','')}));
 dump mynewalias;
 
 java.io.IOException: Unable to open iterator for alias: mynewalias [Unable to 
 store for alias: mynewalias [Can't overwrite cause]]
  at java.lang.Throwable.initCause(Throwable.java:320)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1494)
  at org.apache.pig.impl.logicalLayer.LOBinCond.visit(LOBinCond.java:85)
  at org.apache.pig.impl.logicalLayer.LOBinCond.visit(LOBinCond.java:28)
  at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
  at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.checkInnerPlan(TypeCheckingVisitor.java:2345)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2252)
  at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:121)
  at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:40)
  at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
  at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
  at 
 org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
  at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
  at 
 org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:
 79)
  at org.apache.pig.PigServer.compileLp(PigServer.java:684)
  at org.apache.pig.PigServer.compileLp(PigServer.java:655)
  at org.apache.pig.PigServer.store(PigServer.java:433)
  at org.apache.pig.PigServer.store(PigServer.java:421)
  at org.apache.pig.PigServer.openIterator(PigServer.java:384)
  at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
  at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
  at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
  at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Unable to store for alias: mynewalias [Can't 
 overwrite cause]
  ... 26 more
 Caused by: java.lang.IllegalStateException: Can't overwrite cause
  ... 26 more
 




[jira] Closed: (PIG-756) UDFs should have API for transparently opening and reading files from HDFS or from local file system with only relative path

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-756.
--


 UDFs should have API for transparently opening and reading files from HDFS or 
 from local file system with only relative path
 

 Key: PIG-756
 URL: https://issues.apache.org/jira/browse/PIG-756
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz
 Fix For: 0.7.0


 I have a utility function util.INSETFROMFILE() to which I pass a file name at 
 initialization.
 {code}
 define inQuerySet util.INSETFROMFILE('analysis/queries');
 A = load 'logs' using PigStorage() as ( date int, query chararray );
 B = filter A by inQuerySet(query);
 {code}
 This provides a computationally inexpensive way to effect map-side joins for 
 small sets plus functions of this style provide the ability to encapsulate 
 more complex matching rules.
 For rapid development and debugging purposes, I want this code to run without 
 modification both on my local file system when I do pig -exectype local and 
 on HDFS.
 Pig needs to provide an API for UDFs which allows them to either:
 1) know when they are in local or HDFS mode, and open and read 
 files themselves as appropriate, or
 2) just issue file name and read statements and have Pig transparently 
 manage local or HDFS opens and reads for the UDF.
 UDFs need to read configuration information off the file system, and it 
 simplifies the process if one can just flip the switch of -exectype local.
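Option 2 is essentially a dispatch decision made by the framework rather than the UDF. A Python sketch (all names are made up; `hdfs_open` stands in for a real HDFS client call, and only the dispatch logic is the point):

```python
def open_resource(name, exec_type, hdfs_open, local_open=open):
    # The framework, not the UDF, decides which file system a relative
    # name refers to, so the same script runs unmodified in both modes.
    if exec_type == "local":
        return local_open(name)
    return hdfs_open(name)
```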




[jira] Closed: (PIG-758) Converting load/store locations into fully qualified absolute paths

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-758.
--


 Converting load/store locations into fully qualified absolute paths
 ---

 Key: PIG-758
 URL: https://issues.apache.org/jira/browse/PIG-758
 Project: Pig
  Issue Type: Bug
Reporter: Gunther Hagleitner
 Fix For: 0.7.0


 As part of the multiquery optimization work there is a need to use absolute 
 paths for load and store operations (because the current directory changes 
 during the execution of the script). In order to do so, we are suggesting a 
 change to the semantics of the location/filename string used in LoadFunc and 
 Slicer/Slice.
 The proposed change is:
* Load locations without a scheme part are expected to be hdfs (mapreduce 
 mode) or local (local mode) paths
* Any hdfs or local path will be translated to a fully qualified absolute 
 path before it is handed to either a LoadFunc or Slicer
* Any scheme other than file or hdfs will result in the load path to 
 be passed through to the LoadFunc or Slicer without any modification.
 Example:
 If you have a LoadFunc that reads from a database, in the current system the 
 following could be used:
 {noformat}
 a = load 'table' using DBLoader();
 {noformat}
 With the proposed changes, 'table' would be translated into an hdfs path 
 (hdfs:///table), which is probably not what the DBLoader would want to see. In 
 order to make it work one could use:
 {noformat}
 a = load 'sql://table' using DBLoader();
 {noformat}
 Now the DBLoader would see the unchanged string 'sql://table'.
 This is an incompatible change, but hopefully one that does not affect many 
 existing Loaders/Slicers. Since this is needed for the multiquery feature, the 
 behavior can be reverted with the no_multiquery pig flag.
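The proposed rule can be sketched in a few lines of Python (the `default_fs` and `cwd` values are made-up stand-ins for cluster configuration; this is the scheme-dispatch idea, not Pig's actual path-resolution code):

```python
from urllib.parse import urlparse

def resolve_load_location(location, default_fs="hdfs://namenode", cwd="/user/alice"):
    # Scheme-less paths become fully qualified absolute hdfs paths;
    # any scheme other than file/hdfs passes through untouched.
    scheme = urlparse(location).scheme
    if scheme not in ("", "file", "hdfs"):
        return location                          # e.g. 'sql://table' for a DBLoader
    if scheme:                                   # already fully qualified
        return location
    path = location if location.startswith("/") else cwd + "/" + location
    return default_fs + path

print(resolve_load_location("table"))        # hdfs://namenode/user/alice/table
print(resolve_load_location("sql://table"))  # sql://table
```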




[jira] Closed: (PIG-613) Casting complex type(tuple/bag/map) does not take effect

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-613.
--


 Casting complex type(tuple/bag/map) does not take effect
 

 Key: PIG-613
 URL: https://issues.apache.org/jira/browse/PIG-613
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: myfloatdata.txt, PIG-613-1.patch, PIG-613-2.patch, 
 SQUARE.java


 Consider the following Pig script, which casts the return values of the SQUARE 
 UDF, tuples of doubles, to long. The describe output of B shows it is 
 long; however, the result is still double.
 {code}
 register statistics.jar;
 A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double);
 B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as 
 squares:(loadtimesq);
 describe B;
 explain B;
 dump B;
 {code}
 ===
 Describe output of B:
 B: {squares: (loadtimesq: long)}
 ===
 Sample output of B:
 ((7885.44))
 ((792098.2200010001))
 ((1497360.9268889998))
 ((50023.7956))
 ((0.972196))
 ((0.30980356))
 ((9.9760144E-7))
 ===
 Cause: The cast for Tuples has not been implemented in POCast.java
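The missing behavior, in miniature: casting a tuple means converting each field to its target type, not merely relabeling the schema. A Python sketch (the caster table is illustrative, not Pig's real conversion matrix):

```python
def cast_tuple(value, target_types):
    # Convert each field of the tuple to its declared target type.
    casters = {"long": int, "double": float, "chararray": str}
    return tuple(casters[t](v) for v, t in zip(value, target_types))

print(cast_tuple((7885.44,), ("long",)))   # (7885,)
```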




[jira] Closed: (PIG-726) Stop printing scope as part of Operator.toString()

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-726.
--


 Stop printing scope as part of Operator.toString()
 --

 Key: PIG-726
 URL: https://issues.apache.org/jira/browse/PIG-726
 Project: Pig
  Issue Type: Improvement
Reporter: Thejas M Nair
Assignee: Gunther Hagleitner
 Fix For: 0.7.0


 When an operator is printed in pig, it prints a string with the user name and 
 date at which the grunt shell was started. This information is not useful and 
 makes the output very verbose.
 For example, a line in explain is like -
 ForEach tejas-Thu Mar 19 11:25:23 PDT 2009-4 Schema: {themap: map[ ]} Type: 
 bag
 I am proposing that it should change to -
 ForEach (id:4) Schema: {themap: map[ ]} Type: bag
 That string comes from the scope field of the OperatorKey class. We don't make 
 use of it anywhere, so we should stop printing it. The change is only in 
 OperatorKey.toString().
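The before/after rendering can be sketched in Python (a toy model of the proposal, not the Java change itself):

```python
def operator_key_str(scope, op_id, include_scope=False):
    # Old, noisy form keeps the "user-date" scope string; the proposed
    # form keeps only the operator id.
    if include_scope:
        return scope + "-" + str(op_id)
    return "(id:" + str(op_id) + ")"

print(operator_key_str("tejas-Thu Mar 19 11:25:23 PDT 2009", 4))  # (id:4)
```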




[jira] Closed: (PIG-829) DECLARE statement stop processing after special characters such as dot . , + % etc..

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-829.
--


 DECLARE statement stop processing after special characters such as dot . , 
 + % etc..
 --

 Key: PIG-829
 URL: https://issues.apache.org/jira/browse/PIG-829
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
 Fix For: 0.7.0


 The Pig script below does not work correctly when special characters are used 
 in the DECLARE statement.
 {code}
 %DECLARE OUT foo.bar
 x = LOAD 'something' as (a:chararray, b:chararray);
 y = FILTER x BY ( a MATCHES '^.*yahoo.*$' );
 STORE y INTO '$OUT';
 {code}
 When the above script is run in dry-run mode, the substituted file does 
 not contain the special characters.
 {code}
 java -cp pig.jar:/homes/viraj/hadoop-0.18.0-dev/conf -Dhod.server='' 
 org.apache.pig.Main -r declaresp.pig
 {code}
 Resulting file: declaresp.pig.substituted
 {code}
 x = LOAD 'something' as (a:chararray, b:chararray);
 y = FILTER x BY ( a MATCHES '^.*yahoo.*$' );
 STORE y INTO 'foo';
 {code}
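The symptom is consistent with a value pattern that stops at the first non-word character. A Python sketch contrasting the reported behavior with a fix (hypothetical regexes, not Pig's actual preprocessor grammar):

```python
import re

def parse_declare(line):
    # Fix sketch: take the whole whitespace-delimited token as the value,
    # so characters like '.' and '+' survive.
    m = re.match(r"%DECLARE\s+(\w+)\s+(\S+)", line)
    return m.group(1), m.group(2)

def parse_declare_buggy(line):
    # Reported behavior sketch: a \w+ value pattern stops at the first
    # special character, turning 'foo.bar' into 'foo'.
    m = re.match(r"%DECLARE\s+(\w+)\s+(\w+)", line)
    return m.group(1), m.group(2)

print(parse_declare("%DECLARE OUT foo.bar"))        # ('OUT', 'foo.bar')
print(parse_declare_buggy("%DECLARE OUT foo.bar"))  # ('OUT', 'foo')
```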




[jira] Closed: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-760.
--


 Serialize schemas for PigStorage() and other storage types.
 ---

 Key: PIG-760
 URL: https://issues.apache.org/jira/browse/PIG-760
 Project: Pig
  Issue Type: New Feature
Reporter: David Ciemiewicz
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: pigstorageschema-2.patch, pigstorageschema.patch, 
 pigstorageschema_3.patch, pigstorageschema_4.patch, pigstorageschema_5.patch, 
 pigstorageschema_7.patch, 
 TEST-org.apache.pig.piggybank.test.TestPigStorageSchema.txt


 I'm finding PigStorage() really convenient for storage and data interchange 
 because it compresses well and imports into Excel and other analysis 
 environments well.
 However, it is a pain when it comes to maintenance because the columns are in 
 fixed locations and I'd like to add columns in some cases.
 It would be great if load PigStorage() could read a default schema from a 
 .schema file stored with the data and if store PigStorage() could store a 
 .schema file with the data.
 I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
 will ignore a file called .schema in a directory of part files.
 So, for example, if I have a chain of Pig scripts I execute such as:
 {code}
 A = load 'data-1' using PigStorage() as ( a: int , b: int );
 store A into 'data-2' using PigStorage();
 B = load 'data-2' using PigStorage();
 describe B;
 {code}
 describe B should output something like { a: int, b: int }
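The sidecar-file idea can be sketched in Python (illustrative only: JSON is used here as a stand-in serialization, and the directory plays the role of a part-file directory):

```python
import json
import os
import tempfile

def store_schema(directory, schema):
    # Write the schema as a dot-file sidecar named ".schema" next to the
    # part files, which directory scans of part files ignore.
    with open(os.path.join(directory, ".schema"), "w") as f:
        json.dump(schema, f)

def load_schema(directory):
    path = os.path.join(directory, ".schema")
    if not os.path.exists(path):
        return None          # fall back to unnamed bytearray fields
    with open(path) as f:
        return json.load(f)

d = tempfile.mkdtemp()
store_schema(d, {"a": "int", "b": "int"})
print(load_schema(d))   # {'a': 'int', 'b': 'int'}
```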




[jira] Closed: (PIG-803) Pig Latin Reference Manual - discussion of Pig streaming is incomplete

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-803.
--


 Pig Latin Reference Manual - discussion of Pig streaming is incomplete
 --

 Key: PIG-803
 URL: https://issues.apache.org/jira/browse/PIG-803
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: David Ciemiewicz
Assignee: Corinne Chandel
 Fix For: 0.7.0


 The Pig Latin Reference Manual section on STREAM is missing broad swaths of 
 information such as a discussion of the ship() clause.
 http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_STREAM_
 A more complete definition seems to be here:
 http://wiki.apache.org/pig/PigStreamingFunctionalSpec
 However, it discusses auto-shipping of scripts, which doesn't seem to be 
 working.




[jira] Closed: (PIG-887) document use of expressions in join,group,cogroup

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-887.
--


 document use of expressions in join,group,cogroup
 -

 Key: PIG-887
 URL: https://issues.apache.org/jira/browse/PIG-887
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Thejas M Nair
Assignee: Olga Natkovich
 Fix For: 0.7.0


 For the join, group, and cogroup relational operators, Pig allows expressions to 
 be used in place of the field aliases in the syntax documented in 
 http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm .
 But this feature is not documented in the manual.




[jira] Closed: (PIG-834) incorrect plan when algebraic functions are nested

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-834.
--


 incorrect plan when algebraic functions are nested
 --

 Key: PIG-834
 URL: https://issues.apache.org/jira/browse/PIG-834
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch


 {code}
 a = load 'students.txt' as (c1,c2,c3,c4);
 c = group a by c2;
 f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));
 {code}
 Notice that the Distinct udf is missing in the combine and reduce stages. As a 
 result distinct does not function, and incorrect results are produced.
 Distinct should have been evaluated in all 3 stages, and the output of Distinct 
 should be given to COUNT in the reduce stage.
 {code}
 # Map Reduce Plan  
 #--
 MapReduce node 1-122
 Map Plan
 Local Rearrange[tuple]{bytearray}(false) - 1-139
 |   |
 |   Project[bytearray][1] - 1-140
 |
 |---New For Each(false,false)[bag] - 1-127
 |   |
 |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
 |   |
 |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
 |   |
 |   |---Project[bag][2] - 1-123
 |   |
 |   |---Project[bag][1] - 1-124
 |   |
 |   Project[bytearray][0] - 1-133
 |
 |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
 |
 
 |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage)
  - 1-111
 Combine Plan
 Local Rearrange[tuple]{bytearray}(false) - 1-143
 |   |
 |   Project[bytearray][1] - 1-144
 |
 |---New For Each(false,false)[bag] - 1-132
 |   |
 |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
 |   |
 |   |---Project[bag][0] - 1-135
 |   |
 |   Project[bytearray][1] - 1-134
 |
 |---POCombinerPackage[tuple]{bytearray} - 1-137
 Reduce Plan
 Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
 |
 |---New For Each(false)[bag] - 1-120
 |   |
 |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
 |   |
 |   |---Project[bag][0] - 1-136
 |
 |---POCombinerPackage[tuple]{bytearray} - 1-145
 Global sort: false
 {code}
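What the correct plan computes can be modeled in a few lines of Python (a toy model of the three-stage evaluation, not Pig's combiner machinery):

```python
def count_distinct(partitions):
    # Distinct runs per map partition, partial results are merged in the
    # combine/reduce stages, and only then does COUNT see Distinct's output.
    # Dropping Distinct from the later stages would overcount values that
    # appear in more than one partition.
    partials = [set(p) for p in partitions]      # map stage: per-partition Distinct
    merged = set().union(*partials)              # combine + reduce: merge Distincts
    return len(merged)                           # COUNT over Distinct's output

print(count_distinct([[1, 2, 2], [2, 3]]))   # 3
```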




[jira] Closed: (PIG-843) PERFORMANCE: improvements in memory management

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-843.
--


 PERFORMANCE: improvements in memory management
 --

 Key: PIG-843
 URL: https://issues.apache.org/jira/browse/PIG-843
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
 Fix For: 0.7.0


 Currently, Pig uses way too much memory. We need to understand where the 
 memory goes and come up with a strategy to minimize the memory footprint.




[jira] Closed: (PIG-844) PERFORMANCE: streaming data to the UDFs in foreach

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-844.
--


 PERFORMANCE: streaming data to the UDFs in foreach
 --

 Key: PIG-844
 URL: https://issues.apache.org/jira/browse/PIG-844
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
 Fix For: 0.7.0


 Currently, Pig places the data passed to UDFs into a bag. This can cause the 
 process to use more memory than actually needed as in many cases it would be 
 better to push the data one tuple at a time to the UDFs.
 For the case where combiner is invoked, this might not be that important; 
 however, for non-algebraic UDFs as well as other cases where combiner can't 
 be used, this can provide significant memory improvement.
 Another possible use case is where the data is already grouped going into pig 
 and we don't need to group it again.
 How this will affect the UDF interface needs to be further discussed.
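One possible shape for such an interface is an accumulator-style UDF that receives tuples one at a time instead of a materialized bag; the names below (AccumulatingUdf, StreamingCount) are hypothetical, a minimal sketch of the streaming idea rather than Pig's actual UDF contract:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: a hypothetical accumulator-style UDF contract that consumes
 * one tuple at a time, so the runtime never has to materialize a full bag.
 */
interface AccumulatingUdf<T, R> {
    void accumulate(T tuple);  // called once per tuple as data streams by
    R getValue();              // called after all tuples have been seen
}

/** COUNT written against the streaming contract: O(1) memory per group. */
class StreamingCount implements AccumulatingUdf<Object, Long> {
    private long count = 0;

    @Override public void accumulate(Object tuple) { count++; }
    @Override public Long getValue() { return count; }
}

public class StreamingUdfDemo {
    public static void main(String[] args) {
        StreamingCount count = new StreamingCount();
        List<String> group = Arrays.asList("a", "b", "c");
        for (String t : group) {
            count.accumulate(t);  // runtime pushes tuples one by one
        }
        System.out.println(count.getValue()); // prints 3
    }
}
```

A non-algebraic UDF that only needs one pass (COUNT, SUM, MAX) fits this shape naturally; one that truly needs the whole bag at once would still have to materialize it.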




[jira] Closed: (PIG-872) use distributed cache for the replicated data set in FR join

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-872.
--


 use distributed cache for the replicated data set in FR join
 

 Key: PIG-872
 URL: https://issues.apache.org/jira/browse/PIG-872
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
 Fix For: 0.7.0

 Attachments: PIG_872.patch.1


 Currently, the replicated file is read directly from DFS by all maps. If the 
 number of concurrent maps is huge, we can overwhelm the NameNode with 
 open calls.
 Using the distributed cache will address the issue and might also give a 
 performance boost, since the file will be copied locally once and then reused 
 by all tasks running on the same machine.
 The basic approach would be to use cacheArchive to place the file into the 
 cache on the frontend; on the backend, the tasks would need to refer to 
 the data using the path from the cache.
 Note that cacheArchive does not work in Hadoop local mode. (Not a problem for 
 us right now as we don't use it.)
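The copy-once-then-reuse behavior described above can be illustrated in plain Java; the LocalCache class below is hypothetical, a sketch of the caching idea only, while the real mechanism is Hadoop's distributed cache (cacheArchive):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustration only: the copy-once idea behind caching the replicated
 * input of an FR join on each machine. Names are hypothetical; the real
 * mechanism is Hadoop's distributed cache.
 */
public class LocalCache {
    private final Map<String, Path> localCopies = new HashMap<>();
    int physicalCopies = 0; // counts actual copies made, for demonstration

    /** Return a local copy of the file, copying it at most once. */
    public Path getLocal(Path remote) {
        return localCopies.computeIfAbsent(remote.toString(), key -> {
            try {
                Path local = Files.createTempFile("frjoin-", ".cache");
                Files.copy(remote, local, StandardCopyOption.REPLACE_EXISTING);
                physicalCopies++;
                return local;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("replicated-", ".txt");
        Files.write(src, "small relation".getBytes());
        LocalCache cache = new LocalCache();
        Path first = cache.getLocal(src);
        Path second = cache.getLocal(src); // served from cache, no new copy
        System.out.println(first.equals(second) && cache.physicalCopies == 1);
    }
}
```

With N tasks per machine, the NameNode sees one open per machine instead of N, which is the load reduction the issue is after.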




[jira] Closed: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-879.
--


 Pig should provide a way for input location string in load statement to be 
 passed as-is to the Loader
 -

 Key: PIG-879
 URL: https://issues.apache.org/jira/browse/PIG-879
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Pradeep Kamath
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-879.patch, PIG-879.patch, PIG-879.patch, 
 PIG-879.patch, PIG-879.patch


  Due to multiquery optimization, Pig always converts the filenames to 
 absolute URIs (see 
 http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section 
 about Incompatible Changes - Path Names and Schemes). This is necessary since 
 the script may have "cd .." statements between load or store statements, and 
 if the load statements have relative paths, we would need to convert them to 
 absolute paths to know where to load/store from. To do this, 
 QueryParser.massageFilename() has the code below[1], which basically gives the 
 fully qualified HDFS path.
  
 However the issue with this approach is that if the filename string is 
 something like 
 hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2,
  the code below[1] actually translates this to 
 hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2
  and throws an exception that it is an incorrect path.
  
 Some loaders may want to interpret the filenames (the input location string 
 in the load statement) in any way they wish and may want Pig to not make 
 absolute paths out of them.
  
 There are a few options to address this:
 1) A command line switch to indicate to Pig that pathnames in the script 
 are all absolute and hence Pig should not alter them and pass them as-is to 
 Loaders and Storers.
 2) A keyword in the load and store statements to indicate the same intent 
 to Pig.
 3) A property which users can supply on the command line or in pig.properties 
 to indicate the same intent.
 4) A method in LoadFunc - relativeToAbsolutePath(String filename, String 
 curDir) - which does the conversion to absolute; this way the Loader can 
 choose to implement it as a no-op.
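Option 4 amounts to logic like the following plain-Java sketch (hypothetical helper, not Pig code): a location that already carries a scheme, including a comma-separated list of such locations, is passed through untouched, so a loader wanting the raw string effectively gets a no-op, while bare relative paths are resolved against the current directory:

```java
import java.net.URI;

/**
 * Sketch of option 4: a relativeToAbsolutePath-style hook. Scheme-qualified
 * (or comma-separated scheme-qualified) locations are returned unchanged;
 * only bare relative paths are resolved against the current directory.
 * Hypothetical helper, not the actual Pig implementation.
 */
public class PathResolver {
    public static String relativeToAbsolutePath(String location, String curDir) {
        // Leave scheme-qualified location strings alone so loaders can
        // interpret them however they wish.
        if (location.contains("://")) {
            return location;
        }
        return URI.create(curDir).resolve(location).toString();
    }

    public static void main(String[] args) {
        String cur = "hdfs://nn.example.com/user/bla/";
        System.out.println(relativeToAbsolutePath("1.txt", cur));
        // hdfs://nn.example.com/user/bla/1.txt
        System.out.println(relativeToAbsolutePath(
                "hdfs://a/1,hdfs://a/2", cur)); // passed through untouched
    }
}
```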
 Thoughts?
  




[jira] Closed: (PIG-933) broken link in pig-latin reference manual to hadoop file glob pattern documentation

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-933.
--


 broken link in pig-latin reference manual to hadoop file glob pattern 
 documentation
 ---

 Key: PIG-933
 URL: https://issues.apache.org/jira/browse/PIG-933
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Thejas M Nair
Assignee: Olga Natkovich
Priority: Minor
 Fix For: 0.7.0


 http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_LOAD
 has a link to
 http://lucene.apache.org/hadoop/api/org/apache/hadoop/fs/FileSystem.html#globPaths(org.apache.hadoop.fs.Path),
 but it should be
 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)




[jira] Closed: (PIG-937) Task get stuck in BasicTable's BTScaner's atEnd() method

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-937.
--


 Task get stuck in BasicTable's BTScaner's atEnd() method
 

 Key: PIG-937
 URL: https://issues.apache.org/jira/browse/PIG-937
 Project: Pig
  Issue Type: Bug
Reporter: He Yongqiang
 Fix For: 0.7.0


 It seems to be caused by the infinite loop in the code:
 BasicTable, Line 698
 {noformat}
 while (true)
 {
   int index = random.nextInt(cgScanners.length - 1) + 1;
   if (cgScanners[index] != null) {
     if (cgScanners[index].atEnd() != ret) {
       throw new IOException(
           "atEnd() failed: Column Groups are not evenly positioned.");
     }
     break;
   }
 }
 {noformat}
 I think it's fine to just use a for loop here, like:
 {noformat}
 for (int index = 0; index < cgScanners.length; index++) {
   if (cgScanners[index] != null) {
     if (cgScanners[index].atEnd() != ret) {
       throw new IOException(
           "atEnd() failed: Column Groups are not evenly positioned.");
     }
     break;
   }
 }
 {noformat}
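The reason the random probe can hang: random.nextInt(cgScanners.length - 1) + 1 only yields indices in [1, length - 1], so a sole non-null scanner at index 0 is never examined (and nextInt(0) throws outright when the array has length 1). A self-contained sketch of the deterministic replacement, with Boolean values standing in for scanners' atEnd() results (a hypothetical mock, not the Zebra code itself):

```java
/**
 * Demonstrates the deterministic scan that replaces the random probe in
 * BTScanner.atEnd(). Mock scanners are Booleans holding their atEnd()
 * value; null means the column group has no scanner. The buggy while loop
 * drew indices in [1, length - 1], so a lone non-null entry at index 0
 * was never probed and the loop spun forever.
 */
public class AtEndProbe {
    /** Scan all slots and check the first non-null scanner against ret. */
    static boolean atEndConsistent(Boolean[] scanners, boolean ret) {
        for (int index = 0; index < scanners.length; index++) {
            if (scanners[index] != null) {
                if (scanners[index] != ret) {
                    return false; // column groups not evenly positioned
                }
                break;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Only index 0 is non-null: the original random probe would never
        // examine it; the for loop finds it immediately.
        Boolean[] scanners = {Boolean.TRUE, null, null};
        System.out.println(atEndConsistent(scanners, true));  // true
        System.out.println(atEndConsistent(scanners, false)); // false
    }
}
```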




[jira] Closed: (PIG-940) Cross site HDFS access using the default.fs.name not possible in Pig

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-940.
--


 Cross site HDFS access using the default.fs.name not possible in Pig
 

 Key: PIG-940
 URL: https://issues.apache.org/jira/browse/PIG-940
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.5.0
 Environment: Hadoop 20
Reporter: Viraj Bhat
 Fix For: 0.7.0


 I have a script which does the following: access data from a remote HDFS 
 location (via an HDFS installed at hdfs://remotemachine1.company.com/), as I 
 do not want to copy this huge amount of data between HDFS locations.
 However, I want my Pig script to write data to the HDFS running on 
 localmachine.company.com.
 Currently Pig does not support that behavior and complains that 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist:
 {code}
 A = LOAD 'hdfs://remotemachine1.company.com/user/viraj/A1.txt' as (a, b); 
 B = LOAD 'hdfs://remotemachine1.company.com/user/viraj/B1.txt' as (c, d); 
 C = JOIN A by a, B by c; 
 store C into 'output' using PigStorage();  
 {code}
 ===
 2009-09-01 00:37:24,032 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localmachine.company.com:8020
 2009-09-01 00:37:24,277 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localmachine.company.com:50300
 2009-09-01 00:37:24,567 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
  - Rewrite: POPackage-POForEach to POJoinPackage
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2009-09-01 00:37:24,573 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2009-09-01 00:37:26,197 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2009-09-01 00:37:26,249 [Thread-9] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-09-01 00:37:26,746 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2009-09-01 00:37:26,747 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map reduce job(s) failed!
 2009-09-01 00:37:26,756 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed to produce result in: 
 hdfs:/localmachine.company.com/tmp/temp-1470407685/tmp-510854480
 2009-09-01 00:37:26,756 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-09-01 00:37:26,758 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2100: hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 Details at logfile: /home/viraj/pigscripts/pig_1251765443851.log
 ===
 The error file in Pig contains:
 ===
 ERROR 2998: Unhandled internal error. 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2100: 
 hdfs://localmachine.company.com/user/viraj/A1.txt does not exist.
 at 
 org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:126)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at 
 org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:228)
 at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
 at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
 at 
 

[jira] Closed: (PIG-948) [Usability] Relating pig script with MR jobs

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-948.
--


 [Usability] Relating pig script with MR jobs
 

 Key: PIG-948
 URL: https://issues.apache.org/jira/browse/PIG-948
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.4.0
Reporter: Ashutosh Chauhan
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.7.0

 Attachments: pig-948-2.patch, pig-948-3.patch, PIG-948-4.patch, 
 PIG-948-5.patch, PIG-948-6.patch, pig-948.patch


 Currently it's hard to find a way to relate a pig script with a specific MR 
 job. In a loaded cluster with multiple simultaneous job submissions, it's not 
 easy to figure out which specific MR jobs were launched for a given pig 
 script. If Pig can provide this info, it will be useful for debugging and 
 monitoring the jobs resulting from a pig script.
 At the very least, Pig should be able to provide user the following 
 information
 1) Job id of the launched job.
 2) Complete web url of jobtracker running this job. 




[jira] Closed: (PIG-977) exit status does not account for JOB_STATUS.TERMINATED

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-977.
--


 exit status does not account for JOB_STATUS.TERMINATED
 --

 Key: PIG-977
 URL: https://issues.apache.org/jira/browse/PIG-977
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-977.patch


 For determining the exit status of a pig query, only JOB_STATUS.FAILED is 
 used, and the TERMINATED status is ignored.
 I think the reason for this is that in ExecJob.JOB_STATUS only FAILED and 
 COMPLETED are used anywhere; the rest are unused. I think we should comment 
 out the unused parts for now to indicate that, or fix the code for 
 determining success/failure in GruntParser.executeBatch:
 {code}
 public enum JOB_STATUS {
 QUEUED,
 RUNNING,
 SUSPENDED,
 TERMINATED,
 FAILED,
 COMPLETED,
 }
 {code}
 {code}
 private void executeBatch() throws IOException {
     if (mPigServer.isBatchOn()) {
         if (mExplain != null) {
             explainCurrentBatch();
         }
         if (!mLoadOnly) {
             List<ExecJob> jobs = mPigServer.executeBatch();
             for (ExecJob job : jobs) {
 ==>             if (job.getStatus() == ExecJob.JOB_STATUS.FAILED) {
                     mNumFailedJobs++;
                     if (job.getException() != null) {
                         LogUtils.writeLog(
                             job.getException(),
                             mPigServer.getPigContext().getProperties().getProperty("pig.logfile"),
                             log,
                             "true".equalsIgnoreCase(mPigServer.getPigContext().getProperties().getProperty("verbose")),
                             "Pig Stack Trace");
                     }
                 } else {
                     mNumSucceededJobs++;
                 }
             }
         }
     }
 }
 {code}
 Any opinions ?
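The fix direction can be sketched with a hypothetical helper (not the actual GruntParser change): treat TERMINATED, not just FAILED, as a failure when tallying the exit status:

```java
/**
 * Sketch of the fix direction: count TERMINATED as a failure instead of
 * testing FAILED alone. JobStatusCheck and isFailure are hypothetical
 * names; the enum mirrors the ExecJob.JOB_STATUS quoted in the issue.
 */
public class JobStatusCheck {
    enum JOB_STATUS { QUEUED, RUNNING, SUSPENDED, TERMINATED, FAILED, COMPLETED }

    static boolean isFailure(JOB_STATUS status) {
        // A job that reached a terminal state without completing has failed.
        return status == JOB_STATUS.FAILED || status == JOB_STATUS.TERMINATED;
    }

    public static void main(String[] args) {
        System.out.println(isFailure(JOB_STATUS.TERMINATED)); // true
        System.out.println(isFailure(JOB_STATUS.COMPLETED));  // false
    }
}
```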




[jira] Closed: (PIG-952) [Zebra] Make Zebra Version Same as Pig Version

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-952.
--


 [Zebra] Make Zebra Version Same as Pig Version
 --

 Key: PIG-952
 URL: https://issues.apache.org/jira/browse/PIG-952
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.4.0
Reporter: Gaurav Jain
Assignee: Gaurav Jain
Priority: Minor
 Fix For: 0.7.0


 Zebra release versions need to be the same as Pig release versions for 
 consistency.




[jira] Closed: (PIG-961) Integration with Hadoop 21

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-961.
--


 Integration with Hadoop 21
 --

 Key: PIG-961
 URL: https://issues.apache.org/jira/browse/PIG-961
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
Assignee: Ying He
 Fix For: 0.7.0

 Attachments: hadoop21.jar, PIG-961.patch, PIG-961.patch2


 Hadoop 21 is not yet released, but we know that a switch to the new MR API is 
 coming there. This JIRA is for early integration with this API.




[jira] Closed: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-966.
--


 Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
 ---

 Key: PIG-966
 URL: https://issues.apache.org/jira/browse/PIG-966
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.7.0


 I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces 
 significantly.  See http://wiki.apache.org/pig/LoadStoreRedesignProposal for 
 full details




[jira] Closed: (PIG-990) Provide a way to pin LogicalOperator Options

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-990.
--


 Provide a way to pin LogicalOperator Options
 

 Key: PIG-990
 URL: https://issues.apache.org/jira/browse/PIG-990
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: pinned_options.patch, pinned_options_2.patch, 
 pinned_options_3.patch, pinned_options_4.patch, pinned_options_5.patch, 
 pinned_options_6.patch


 This is a proactive patch, setting up the groundwork for adding an optimizer.
 Some of the LogicalOperators have options. For example, LOJoin has a variety 
 of join types (regular, fr, skewed, merge), which can be set by the user or 
 chosen by a hypothetical optimizer.  If a user selects a join type, Pig 
 philosophy guides us to always respect the user's choice and not explore 
 alternatives.  Therefore, we need a way to pin options.
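Option pinning can be sketched minimally as follows; PinnableOperator and the option id are hypothetical stand-ins for the patch's additions to LogicalOperator, shown only to illustrate the mechanism:

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Minimal sketch of option pinning (hypothetical class): a pinned option
 * records a user choice, e.g. an explicit join type, that an optimizer
 * must not revisit.
 */
public class PinnableOperator {
    public static final int OPTION_JOIN_TYPE = 1; // illustrative option id

    private final Set<Integer> pinnedOptions = new HashSet<>();

    public void pinOption(int option) { pinnedOptions.add(option); }
    public boolean isPinnedOption(int option) { return pinnedOptions.contains(option); }

    public static void main(String[] args) {
        PinnableOperator join = new PinnableOperator();
        join.pinOption(OPTION_JOIN_TYPE); // user wrote: JOIN ... USING 'skewed'
        // An optimizer checks the pin before exploring alternative join types:
        System.out.println(join.isPinnedOption(OPTION_JOIN_TYPE)); // true
    }
}
```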




[jira] Closed: (PIG-973) type resolution inconsistency

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-973.
--


 type resolution inconsistency
 -

 Key: PIG-973
 URL: https://issues.apache.org/jira/browse/PIG-973
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-973.patch


 This script works:
 A = load 'test' using PigStorage(':') as (name: chararray, age: int, gpa: 
 float);
 B = group A by age;
 C = foreach B {
    D = filter A by gpa > 2.5;
E = order A by name;
F = A.age;
describe F;
G = distinct F;
generate group, COUNT(D), MAX (E.name), MIN(G.$0);}
 dump C;
 This one produces an error:
 A = load 'test' using PigStorage(':') as (name: chararray, age: int, gpa: 
 float);
 B = group A by age;
 C = foreach B {
    D = filter A by gpa > 2.5;
E = order A by name;
F = A.age;
G = distinct F;
generate group, COUNT(D), MAX (E.name), MIN(G);}
 dump C;
 Notice the difference in how MIN is passed the data.




[jira] Closed: (PIG-980) Optimizing nested order bys

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-980.
--


 Optimizing nested order bys
 ---

 Key: PIG-980
 URL: https://issues.apache.org/jira/browse/PIG-980
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Fix For: 0.7.0


 Pig needs to take advantage of secondary sort in Hadoop to optimize nested 
 order bys.




[jira] Closed: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1022.
---


 optimizer pushes filter before the foreach that generates column used by 
 filter
 ---

 Key: PIG-1022
 URL: https://issues.apache.org/jira/browse/PIG-1022
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1022-1.patch


 grunt> l = load 'students.txt' using PigStorage() as (name:chararray, 
 gender:chararray, age:chararray, score:chararray);
 grunt> f = foreach l generate name, gender, age, score, '200' as 
 gid:chararray;
 grunt> g = group f by (name, gid);
 grunt> f2 = foreach g generate group.name as name: chararray, group.gid as 
 gid: chararray;
 grunt> filt = filter f2 by gid == '200';
 grunt> explain filt;
 In the generated plan, filt is pushed up after the load and before the first 
 foreach, even though the filter is on gid, which is generated in the first 
 foreach.
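The safety check the optimizer is missing can be sketched as simple set logic (hypothetical helper, not the actual Pig rule code): a filter may be pushed in front of a foreach only if every column the filter references already exists in the foreach's input:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch of the missing pushdown safety check (hypothetical helper): a
 * filter is pushable above an operator only when the operator's input
 * already contains every column the filter references.
 */
public class FilterPushdown {
    static boolean canPushBefore(Set<String> filterColumns,
                                 Set<String> foreachInputColumns) {
        return foreachInputColumns.containsAll(filterColumns);
    }

    public static void main(String[] args) {
        Set<String> loadColumns =
                new HashSet<>(Arrays.asList("name", "gender", "age", "score"));
        // gid only comes into existence in the first foreach ('200' as gid),
        // so the filter on gid must stay after that foreach.
        System.out.println(canPushBefore(
                new HashSet<>(Arrays.asList("gid")), loadColumns));  // false
        System.out.println(canPushBefore(
                new HashSet<>(Arrays.asList("name")), loadColumns)); // true
    }
}
```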




[jira] Closed: (PIG-1045) Integration with Hadoop 20 New API

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1045.
---


 Integration with Hadoop 20 New API
 --

 Key: PIG-1045
 URL: https://issues.apache.org/jira/browse/PIG-1045
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1045.patch, PIG-1045.patch


 Hadoop 21 is not yet released, but we know that a switch to the new MR API is 
 coming there. This JIRA is for early integration with the portion of this API 
 that has been implemented in Hadoop 20.




[jira] Closed: (PIG-1053) Consider moving to Hadoop for local mode

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1053.
---


 Consider moving to Hadoop for local mode
 

 Key: PIG-1053
 URL: https://issues.apache.org/jira/browse/PIG-1053
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: hadoopLocal.patch


 We need to consider moving Pig to use Hadoop's local mode instead of its own.




[jira] Closed: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1072.
---


 ReversibleLoadStoreFunc interface should be removed to enable different load 
 and store implementation classes to be used in a reversible manner
 ---

 Key: PIG-1072
 URL: https://issues.apache.org/jira/browse/PIG-1072
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1072.patch







[jira] Closed: (PIG-1079) Modify merge join to use distributed cache to maintain the index

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1079.
---


 Modify merge join to use distributed cache to maintain the index
 

 Key: PIG-1079
 URL: https://issues.apache.org/jira/browse/PIG-1079
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1079.patch, PIG-1079.patch







[jira] Closed: (PIG-1086) Nested sort by * throw exception

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1086.
---


 Nested sort by * throw exception
 

 Key: PIG-1086
 URL: https://issues.apache.org/jira/browse/PIG-1086
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Daniel Dai
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1086.patch


 The following script fails:
 A = load '1.txt' as (a0, a1, a2);
 B = group A by a0;
 C = foreach B { D = order A by *; generate group, D;};
 explain C;
 Here is the stack:
 Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
 at java.util.ArrayList.get(ArrayList.java:324)
 at 
 org.apache.pig.impl.logicalLayer.schema.Schema.getField(Schema.java:752)
 at 
 org.apache.pig.impl.logicalLayer.LOSort.getSortInfo(LOSort.java:332)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1365)
 at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:176)
 at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:43)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:69)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1274)
 at 
 org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:130)
 at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:45)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:234)
 at org.apache.pig.PigServer.compilePp(PigServer.java:864)
 at org.apache.pig.PigServer.explain(PigServer.java:583)
 ... 8 more




[jira] Closed: (PIG-1093) pig.properties file is missing from distributions

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1093.
---


 pig.properties file is missing from distributions
 -

 Key: PIG-1093
 URL: https://issues.apache.org/jira/browse/PIG-1093
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.5.0, 0.6.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.7.0

 Attachments: PIG-1093.patch


 pig.properties (in fact the entire conf directory) is not included in the 
 jars distributed as part of the 0.5 release.




[jira] Closed: (PIG-1075) Error in Cogroup when key fields types don't match

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1075.
---


 Error in Cogroup when key fields types don't match
 --

 Key: PIG-1075
 URL: https://issues.apache.org/jira/browse/PIG-1075
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1075.patch


 When Cogrouping 2 relations on multiple key fields, pig throws an error if 
 the corresponding types don't match. 
 Consider the following script:-
 A = LOAD 'data' USING PigStorage() as (a:chararray, b:int, c:int);
 B = LOAD 'data' USING PigStorage() as (a:chararray, b:chararray, c:int);
 C = CoGROUP A BY (a,b,c), B BY (a,b,c);
 D = FOREACH C GENERATE FLATTEN(A), FLATTEN(B);
 describe D;
 dump D;
 The complete stack trace of the error thrown is
 Pig Stack Trace
 ---
 ERROR 1051: Cannot cast to Unknown
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1001: Unable to 
 describe schema for alias D
 at org.apache.pig.PigServer.dumpSchema(PigServer.java:436)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:233)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:253)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:397)
 Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An 
 unexpected exception caused the validation to stop
 at 
 org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104)
 at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
 at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
 at 
 org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:83)
 at org.apache.pig.PigServer.compileLp(PigServer.java:821)
 at org.apache.pig.PigServer.dumpSchema(PigServer.java:428)
 ... 6 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1060: Cannot resolve COGroup output schema
 at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2463)
 at 
 org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:372)
 at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
 ... 11 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1051: Cannot cast to Unknown
 at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertAtomicCastForCOGroupInnerPlan(TypeCheckingVisitor.java:2552)
 at 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2451)
 ... 16 more
 The error message does not help the user in identifying the issue clearly, 
 especially if the pig script is large and complex.




[jira] Closed: (PIG-1101) Pig parser does not recognize its own data type in LIMIT statement

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1101.
---


 Pig parser does not recognize its own data type in LIMIT statement
 --

 Key: PIG-1101
 URL: https://issues.apache.org/jira/browse/PIG-1101
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
Priority: Minor
 Fix For: 0.7.0

 Attachments: pig-1101.patch


 I have a Pig script in which I specify the number of records to limit as a 
 long type. 
 {code}
 A = LOAD '/user/viraj/echo.txt' AS (txt:chararray);
 B = LIMIT A 10L;
 DUMP B;
 {code}
 I get a parser error:
 2009-11-21 02:25:51,100 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Encountered <LONGINTEGER> "10L" at line 3, 
 column 13.
 Was expecting:
 <INTEGER> ...
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.generateParseException(QueryParser.java:8963)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.jj_consume_token(QueryParser.java:8839)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LimitClause(QueryParser.java:1656)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1280)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:682)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017)
 In fact 10L seems to work in the foreach generate construct.
 Viraj
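 As a workaround, the limit can be expressed with a plain integer literal, 
 which the parser accepts per the "Was expecting: INTEGER" hint above (a 
 sketch, assuming the limit value fits in an int):
 {code}
 A = LOAD '/user/viraj/echo.txt' AS (txt:chararray);
 B = LIMIT A 10;  -- plain integer literal; 10L fails to parse
 DUMP B;
 {code}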

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1088.
---


 change merge join and merge join indexer to work with new LoadFunc interface
 

 Key: PIG-1088
 URL: https://issues.apache.org/jira/browse/PIG-1088
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.7.0

 Attachments: PIG-1088.1.patch, PIG-1088.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1106) FR join should not spill

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1106.
---


 FR join should not spill
 

 Key: PIG-1106
 URL: https://issues.apache.org/jira/browse/PIG-1106
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: frjoin-nonspill.patch


 Currently, the values for the replicated side of the data are placed in a 
 spillable bag (POFRJoin near line 275). This does not make sense because the 
 whole point of the optimization is that the data on one side fits into 
 memory. We already have a non-spillable bag implemented 
 (NonSpillableDataBag.java), and we need to change the FRJoin code to use it. Of 
 course, we also need to do lots of testing to make sure that we don't spill but 
 instead die when we run out of memory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1099) [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1099.
---


 [zebra] version on APACHE trunk should be 0.7.0 to be in pace with PIG
 --

 Key: PIG-1099
 URL: https://issues.apache.org/jira/browse/PIG-1099
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Trivial
 Fix For: 0.7.0

 Attachments: PIG_1099.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1102) Collect number of spills per job

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1102.
---


 Collect number of spills per job
 

 Key: PIG-1102
 URL: https://issues.apache.org/jira/browse/PIG-1102
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
 Fix For: 0.7.0

 Attachments: PIG_1102.patch, PIG_1102.patch.1


 Memory shortage is one of the main performance issues in Pig. Knowing when we 
 spill to the disk is useful for understanding query performance and also for 
 seeing how certain changes in Pig affect that.
 Other interesting stats to collect would be average CPU usage and max mem 
 usage but I am not sure if this information is easily retrievable.
 Using Hadoop counters for this would make sense.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1103) refactor test-commit

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1103.
---


 refactor test-commit
 

 Key: PIG-1103
 URL: https://issues.apache.org/jira/browse/PIG-1103
 Project: Pig
  Issue Type: Task
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.7.0

 Attachments: PIG-1103.patch


 Due to the changes to the local mode, many tests are now taking longer. Need 
 to make sure that test-commit still finishes within 10 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1110) Handle compressed file formats -- Gz, BZip with the new proposal

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1110.
---


 Handle compressed file formats -- Gz, BZip with the new proposal
 

 Key: PIG-1110
 URL: https://issues.apache.org/jira/browse/PIG-1110
 Project: Pig
  Issue Type: Sub-task
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1110.patch, PIG-1110.patch, PIG_1110_Jeff.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1115) [zebra] temp files are not cleaned.

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1115.
---


 [zebra] temp files are not cleaned.
 ---

 Key: PIG-1115
 URL: https://issues.apache.org/jira/browse/PIG-1115
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Hong Tang
Assignee: Gaurav Jain
 Fix For: 0.7.0

 Attachments: PIG-1115.patch


 Temp files created by zebra during table creation are not cleaned up when 
 there is a task failure, which wastes disk space.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1117) Pig reading hive columnar rc tables

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1117.
---


 Pig reading hive columnar rc tables
 ---

 Key: PIG-1117
 URL: https://issues.apache.org/jira/browse/PIG-1117
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Gerrit Jansen van Vuuren
Assignee: Gerrit Jansen van Vuuren
 Fix For: 0.7.0

 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
 PIG-1117-0.7.0-new.patch, PIG-1117-0.7.0-reviewed.patch, 
 PIG-1117-0.7.0-reviewed.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, 
 PIG-117-v.0.7.0.patch


 I've coded a LoadFunc implementation that can read from Hive Columnar RC 
 tables. This is needed for a project that I'm working on because all our data 
 is stored using the Hive thrift-serialized Columnar RC format. I have looked 
 at the piggybank but did not find any implementation that could do this. 
 We've been running it on our cluster for the last week and have worked out 
 most bugs.
  
 There are still some improvements I would like to make, such as setting 
 the number of mappers based on date partitioning. It's been optimized to 
 read only specific columns, and with this improvement it can churn through a 
 data set almost 8 times faster because not all column data is read.
 I would like to contribute the class to the piggybank; can you guide me on 
 what I need to do?
 I've used Hive-specific classes to implement this; is it possible to add them 
 to the piggybank build via Ivy for automatic download of the dependencies?
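 For reference, a hypothetical usage sketch of the loader (the class name 
 follows the attached patches; the package, jar name, and constructor argument 
 are assumptions, not confirmed in this issue):
 {code}
 REGISTER piggybank.jar;
 a = LOAD '/data/hive_rc_table'
     USING org.apache.pig.piggybank.storage.HiveColumnarLoader('f1 string,f2 int');
 -- reading only specific columns is where the speedup comes from
 b = FOREACH a GENERATE f1;
 {code}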

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1122.
---


 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1122.patch


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1131.
---


 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, pig-1131.patch, pig-1131.patch, 
 simplejoinscript.pig


 I have a simple script, which does a JOIN.
 {code}
 input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
 describe input1;
 input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
 describe input2;
 joineddata = JOIN input1 by $0, input2 by $0;
 describe joineddata;
 store joineddata into 'result';
 {code}
 The input data contains empty lines.  
 The join fails in the Map phase with the following error in 
 POLocalRearrange.java:
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
   at org.apache.hadoop.mapred.Child.main(Child.java:159)
 I am surprised that the test cases did not detect this error. Could we add 
 this data, which contains empty lines, to the test cases?
 Viraj
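 One possible workaround until the fix lands (untested, and assuming the empty 
 lines load as tuples whose first field is null) is to filter out empty 
 records before the join:
 {code}
 input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
 input1f = filter input1 by $0 is not null;
 input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
 input2f = filter input2 by $0 is not null;
 joineddata = JOIN input1f by $0, input2f by $0;
 store joineddata into 'result';
 {code}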

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1140.
---


 [zebra] Use of Hadoop 2.0 APIs  
 

 Key: PIG-1140
 URL: https://issues.apache.org/jira/browse/PIG-1140
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Xuefu Zhang
 Fix For: 0.7.0

 Attachments: zebra.0209, zebra.0211, zebra.0212, zebra.0213


 Currently, Zebra is still using the already deprecated Hadoop 1.8 APIs and 
 needs to upgrade to the 2.0 APIs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1146) Inconsistent column pruning in LOUnion

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1146.
---


 Inconsistent column pruning in LOUnion
 --

 Key: PIG-1146
 URL: https://issues.apache.org/jira/browse/PIG-1146
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1146-1.patch, PIG-1146-2.patch


 This happens when we do a union of two relations where one column comes from a 
 loader, the matching column comes from a constant, and this column gets 
 pruned. We prune the one from the loader but do not prune the constant, which 
 leaves the union in an inconsistent state. Here is a script:
 {code}
 a = load '1.txt' as (a0, a1:chararray, a2);
 b = load '2.txt' as (b0, b2);
 c = foreach b generate b0, 'hello', b2;
 d = union a, c;
 e = foreach d generate $0, $2;
 dump e;
 {code}
 1.txt: 
 {code}
 ulysses thompson64  1.90
 katie carson25  3.65
 {code}
 2.txt:
 {code}
 luke king   0.73
 holly davidson  2.43
 {code}
 expected output:
 (ulysses thompson,1.90)
 (katie carson,3.65)
 (luke king,0.73)
 (holly davidson,2.43)
 real output:
 (ulysses thompson,)
 (katie carson,)
 (luke king,0.73)
 (holly davidson,2.43)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1124) Unable to set Custom Job Name using the -Dmapred.job.name parameter

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1124.
---


 Unable to set Custom Job Name using the -Dmapred.job.name parameter
 ---

 Key: PIG-1124
 URL: https://issues.apache.org/jira/browse/PIG-1124
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
Priority: Minor
 Fix For: 0.7.0

 Attachments: pig-1124.patch


 As a Hadoop user I want to control the job name for my analysis via the 
 command line using the following construct:
 java -cp pig.jar:$HADOOP_HOME/conf -Dmapred.job.name=hadoop_junkie 
 org.apache.pig.Main broken.pig
 -Dmapred.job.name should normally set my Hadoop Job name, but somehow during 
 the formation of the job.xml in Pig this information is lost and the job name 
 turns out to be:
 PigLatin:broken.pig
 The current workaround seems to be wiring it into the script itself, using the 
 following (or using parameter substitution):
 set job.name 'my job'
 Viraj
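 Combining that workaround with parameter substitution, the job name can still 
 be controlled from the command line (the parameter name jobname is 
 illustrative, not part of Pig):
 {code}
 -- invoked as: pig -param jobname=hadoop_junkie broken.pig
 set job.name '$jobname';
 A = LOAD 'input' AS (txt:chararray);
 DUMP A;
 {code}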

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1149.
---


 Allow instantiation of SampleLoaders with parametrized LoadFuncs
 

 Key: PIG-1149
 URL: https://issues.apache.org/jira/browse/PIG-1149
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: pig_1149.patch, pig_1149_lsr-branch.patch


 Currently, it is not possible to instantiate a SampleLoader with something 
 like PigStorage(':').  We should allow passing parameters to the loaders 
 being sampled.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1136.
---


 [zebra] Map Split of Storage info do not allow for leading underscore char '_'
 --

 Key: PIG-1136
 URL: https://issues.apache.org/jira/browse/PIG-1136
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Priority: Minor
 Fix For: 0.7.0

 Attachments: pig-1136-xuefu-new.patch


 There is some user need to support that type of map key. Pig's columns do 
 not allow a leading underscore, but apparently no such restriction is placed 
 on map keys.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1153) [zebra] splitting columns at different levels in a complex record column into different column groups throws exception

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1153.
---


 [zebra] splitting columns at different levels in a complex record column into 
 different column groups throws exception
 -

 Key: PIG-1153
 URL: https://issues.apache.org/jira/browse/PIG-1153
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Xuefu Zhang
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1153.patch, PIG-1153.patch


 The following code sample:
   String strSch = "r1:record(f1:int, f2:int), r2:record(f5:int, 
 r3:record(f3:float, f4))";
   String strStorage = "[r1.f1, r2.r3.f3, r2.f5]; [r1.f2, r2.r3.f4]";
   Partition p = new Partition(schema.toString(), strStorage, null);
 gives the following exception:
 org.apache.hadoop.zebra.parser.ParseException: Different Split Types Set 
 on the same field: r2.f5

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1141) Make streaming work with the new load-store interfaces

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1141.
---


 Make streaming work with the new load-store interfaces 
 ---

 Key: PIG-1141
 URL: https://issues.apache.org/jira/browse/PIG-1141
 Project: Pig
  Issue Type: Sub-task
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1141.patch, PIG-1141.patch, PIG-1141.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1154) local mode fails when hadoop config directory is specified in classpath

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1154.
---


 local mode fails when hadoop config directory is specified in classpath
 ---

 Key: PIG-1154
 URL: https://issues.apache.org/jira/browse/PIG-1154
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Ankit Modi
 Fix For: 0.7.0

 Attachments: pig_1154.patch


 In local mode, the Hadoop configuration should not be taken from the 
 classpath.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1148) Move splitable logic from pig latin to InputFormat

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1148.
---


 Move splitable logic from pig latin to InputFormat
 --

 Key: PIG-1148
 URL: https://issues.apache.org/jira/browse/PIG-1148
 Project: Pig
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0

 Attachments: PIG-1148.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1157) Successive replicated joins do not generate Map Reduce plan and fail due to OOM

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1157.
---


 Successive replicated joins do not generate Map Reduce plan and fail due to 
 OOM
 ---

 Key: PIG-1157
 URL: https://issues.apache.org/jira/browse/PIG-1157
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, 
 replicatedjoinexplain.log


 Hi all,
  I have a script which does 2 replicated joins in succession. Please note 
 that the inputs do not exist on the HDFS.
 {code}
 A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c);
 A1 = FOREACH A GENERATE a;
 B = GROUP A1 BY a;
 C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y);
 D = JOIN C BY x, B BY group USING replicated;
 E = JOIN A BY a, D by x USING replicated;
 dump E;
 {code}
 2009-12-16 19:12:00,253 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 4
 2009-12-16 19:12:00,254 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - Merged 1 map-only splittees.
 2009-12-16 19:12:00,254 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - Merged 1 map-reduce splittees.
 2009-12-16 19:12:00,254 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - Merged 2 out of total 2 splittees.
 2009-12-16 19:12:00,254 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 2
 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2998: Unhandled internal error. unable to create new native thread
 Details at logfile: pig_1260990666148.log
 Looking at the log file:
 Pig Stack Trace
 ---
 ERROR 2998: Unhandled internal error. unable to create new native thread
 java.lang.OutOfMemoryError: unable to create new native thread
 at java.lang.Thread.start0(Native Method)
 at java.lang.Thread.start(Thread.java:597)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
 at org.apache.pig.PigServer.store(PigServer.java:522)
 at org.apache.pig.PigServer.openIterator(PigServer.java:458)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:397)
 
 If we look at the explain output, we find that no Map Reduce plan is 
 generated. Why is the M/R plan not generated?
 Attaching the script and explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1161) Add missing apache headers to a few classes

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1161.
---


 Add missing apache headers to a few classes
 ---

 Key: PIG-1161
 URL: https://issues.apache.org/jira/browse/PIG-1161
 Project: Pig
  Issue Type: Task
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Trivial
 Fix For: 0.7.0

 Attachments: pig_missing_licenses.patch


 The following java classes are missing Apache License headers:
 StoreConfig
 MapRedUtil
 SchemaUtil
 TestDataBagAccess
 TestNullConstant
 TestSchemaUtil
 We should add the missing headers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1169.
---


 Top-N queries produce incorrect results when a store statement is added 
 between order by and limit statement
 

 Key: PIG-1169
 URL: https://issues.apache.org/jira/browse/PIG-1169
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1169.patch


 ??We tried to get top N results after a groupby and sort, and got different 
 results with or without storing the full sorted results. Here is a skeleton 
 of our pig script.??
 {code}
 raw_data = Load 'input_files' AS (f1, f2, ..., fn);
 grouped = group raw_data by (f1, f2);
 data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
 ordered = order data by value DESC parallel 10;
 topn = limit ordered 10;
 store ordered into 'outputdir/full';
 store topn into 'outputdir/topn';
 {code}
 ??With the statement 'store ordered ...', top N results are incorrect, but 
 without the statement, results are correct. Has anyone seen this before? I 
 know a similar bug has been fixed in the multi-query release. We are on Pig 
 0.4 and Hadoop 0.20.1.??

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1156.
---


 Add aliases to ExecJobs and PhysicalOperators
 -

 Key: PIG-1156
 URL: https://issues.apache.org/jira/browse/PIG-1156
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: pig_batchAliases.patch


 Currently, the way to use multi-query from Java is as follows:
 1. pigServer.setBatchOn();
 2. register your queries with pigServer
 3. List<ExecJob> jobs = pigServer.executeBatch();
 4. for (ExecJob job : jobs) { Iterator<Tuple> results = job.getResults(); }
 This will cause all stores to get evaluated in a single batch. However, there 
 is no way to identify which of the ExecJobs corresponds to which store. We 
 should add the aliases by which the stored relations are known to ExecJob in 
 order to allow the user to identify what the jobs correspond to.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1158) pig command line -M option doesn't support table union correctly (comma separated paths)

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1158.
---


 pig command line -M option doesn't support table union correctly (comma 
 separated paths)
 

 Key: PIG-1158
 URL: https://issues.apache.org/jira/browse/PIG-1158
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1158.patch


 For example: load '1.txt,2.txt' USING 
 org.apache.hadoop.zebra.pig.TableLoader();
 I see this error on standard out:
 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: 
 hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/1.txt,2.txt does not 
 exist.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1176) Column Pruner issues in union of loader with and without schema

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1176.
---


 Column Pruner issues in union of loader with and without schema
 ---

 Key: PIG-1176
 URL: https://issues.apache.org/jira/browse/PIG-1176
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1176-1.patch


 Column pruning for a union could fail if one source of the union has a schema 
 and the other does not. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2);
 b = foreach a generate a0;
 c = load '2.txt';
 d = foreach c generate $0;
 e = union b, d;
 dump e;
 {code}
 However, this issue is in trunk only and is not applicable to the 0.6 branch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1164) [zebra]smoke test

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1164.
---


 [zebra]smoke test
 -

 Key: PIG-1164
 URL: https://issues.apache.org/jira/browse/PIG-1164
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: PIG-1164.patch, PIG-SMOKE.patch, smoke.patch


 Change the zebra build.xml file to add a smoke target, and add env.sh and a 
 run script under the zebra/src/test/smoke dir.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1170) [zebra] end to end test and stress test

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1170.
---


 [zebra] end to end test and stress test
 ---

 Key: PIG-1170
 URL: https://issues.apache.org/jira/browse/PIG-1170
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: e2eStress.patch


 Add test cases for the zebra end-to-end test, the stress test, and the stress 
 test verification tool. 
 No unit test is needed for this jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1187.
---


 UTF-8 (international code) breaks with loader when load with schema is 
 specified
 

 Key: PIG-1187
 URL: https://issues.apache.org/jira/browse/PIG-1187
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0


 I have a set of Pig statements which dump an international dataset.
 {code}
 INPUT_OBJECT = load 'internationalcode';
 describe INPUT_OBJECT;
 dump INPUT_OBJECT;
 {code}
 Sample output
 (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)})
 It works and dumps results, but when I use a schema for loading it fails.
 {code}
 INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag 
 {T: tuple(label:chararray)});
 describe INPUT_OBJECT;
 {code}
 The error message is as follows:
 2010-01-14 02:23:27,320 FATAL 
 org.apache.hadoop.mapred.Child: Error running child : 
 org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop 
 caused by repeated empty string matches at line 1, column 21.
   at 
 org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620)
   at 
 org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569)
   at 
 org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651)
   at 
 org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152)
   at 
 org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100)
   at 
 org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382)
   at 
 org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
   at 
 org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68)
   at 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
   at org.apache.hadoop.mapred.Child.main(Child.java:159)
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1171) Top-N queries produce incorrect results when followed by a cross statement

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1171.
---


 Top-N queries produce incorrect results when followed by a cross statement
 --

 Key: PIG-1171
 URL: https://issues.apache.org/jira/browse/PIG-1171
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1171.patch


 ??I am not sure if this is a bug, or something more subtle, but here is the 
 problem that I am having.??
 ??When I LOAD a dataset, change it with an ORDER, LIMIT it, then CROSS it 
 with itself, the results are not correct. I expect to see the cross of the 
 limited, ordered dataset, but instead I see the cross of the limited dataset. 
 Effectively, it's as if the LIMIT is being excluded.??
 ??Example code follows:??
 {code}
 A = load 'foo' as (f1:int, f2:int, f3:int);
 B = load 'foo' as (f1:int, f2:int, f3:int);
 a = ORDER A BY f1 DESC;
 b = ORDER B BY f1 DESC;
 aa = LIMIT a 1;
 bb = LIMIT b 1;
 C = CROSS aa, bb;
 DUMP C;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1173) pig cannot be built without an internet connection

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1173.
---


 pig cannot be built without an internet connection
 --

 Key: PIG-1173
 URL: https://issues.apache.org/jira/browse/PIG-1173
 Project: Pig
  Issue Type: Bug
Reporter: Jeff Hodges
Assignee: Jeff Hodges
Priority: Minor
 Fix For: 0.7.0

 Attachments: offlinebuild-v2.patch, offlinebuild.patch


 Pig's build.xml does not allow for offline building, even when Pig has been 
 built before. This is because the ivy-download target has no conditional 
 associated with it to turn it off. Hadoop appears to address this by adding an 
 unless=offline attribute to its ivy-download target.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1184) PruneColumns optimization does not handle the case of foreach flatten correctly if flattened bag is not used later

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1184.
---


 PruneColumns optimization does not handle the case of foreach flatten 
 correctly if flattened bag is not used later
 --

 Key: PIG-1184
 URL: https://issues.apache.org/jira/browse/PIG-1184
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Pradeep Kamath
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1184-1.patch, PIG-1184-2.patch


 The following script :
 {noformat}
 -e a = load 'input.txt' as (f1:chararray, f2:chararray, 
 f3:bag{t:tuple(id:chararray)}, f4:bag{t:tuple(loc:chararray)});
 b = foreach a generate f1, f2, flatten(f3), flatten(f4), 10;
 b = foreach b generate f1, f2, \$4;
 dump b;
 {noformat}
 gives the following result:
 (oiue,M,10)
 {noformat}
 cat input.txt:
 oiueM   {(3),(4)}   {(toronto),(montreal)}
 {noformat}
 If the PruneColumns optimization is disabled, we get the right result:
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 (oiue,M,10)
 The flatten results in 4 records - so the output should contain 4 records.
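 The four records come from the cross product of the two flattened bags. A 
 minimal Python sketch of what FLATTEN(f3), FLATTEN(f4) does to this row 
 (names hypothetical; this is not Pig's implementation):

```python
from itertools import product

# One input row: f1, f2, and two bags, each holding single-field tuples.
row = ("oiue", "M", [("3",), ("4",)], [("toronto",), ("montreal",)])

def flatten_two_bags(r):
    """Cross-product the two bags, as FLATTEN(f3), FLATTEN(f4) does."""
    f1, f2, bag3, bag4 = r
    return [(f1, f2, t3[0], t4[0], 10) for t3, t4 in product(bag3, bag4)]

flattened = flatten_two_bags(row)
# The later projection of f1, f2, $4 must still see all 4 records:
projected = [(t[0], t[1], t[4]) for t in flattened]
```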

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1189.
---


 StoreFunc UDF should ship to the backend automatically without register
 -

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: multimapstore.pig, multireducestore.pig, 
 PIG-1189-1.patch, PIG-1189-2.patch, PIG-1189-3.patch, singlemapstore.pig, 
 singlereducestore.pig


 Pig should ship the store UDF to the backend even if the user does not use 
 register. The prerequisite is that the UDF is on the classpath on the 
 frontend. We made that work for load UDFs in PIG-881 
 (https://issues.apache.org/jira/browse/PIG-881); we should do the same for 
 store UDFs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1194) ERROR 2055: Received Error while processing the map plan

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1194.
---


 ERROR 2055: Received Error while processing the map plan
 

 Key: PIG-1194
 URL: https://issues.apache.org/jira/browse/PIG-1194
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.5.0, 0.6.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: inputdata.txt, PIG-1194.patch, PIG-1294_1.patch


 I have a simple Pig script which takes 3 columns, one of which is null. 
 {code}
 input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3);
 a = GROUP input BY (((double) col3)/((double) col2)  .001 OR col1  11 ? 
 col1 : -1);
 b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, 
 SUM(input.col3) as  col3;
 store b into 'finalresult';
 {code}
 When I run this script I get the following error:
 ERROR 2055: Received Error while processing the map plan.
 org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received 
 Error while processing the map plan.
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 
 A more useful error message for the purpose of debugging would be helpful.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1200) Using TableInputFormat in HBaseStorage

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1200.
---


 Using TableInputFormat in HBaseStorage
 --

 Key: PIG-1200
 URL: https://issues.apache.org/jira/browse/PIG-1200
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0

 Attachments: Pig_1200.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1190.
---


 Handling of quoted strings in pig-latin/grunt commands
 --

 Key: PIG-1190
 URL: https://issues.apache.org/jira/browse/PIG-1190
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: correct-testcase.patch, pig-1190.patch, pig-1190_1.patch


 There is some inconsistency in the way quoted strings are handled in 
 pig-latin.
 In load/store and define-ship commands, files are specified as quoted 
 strings, and the file name is the content within the quotes.  But in the case 
 of register, set, and file system commands, if a string is specified in 
 quotes, the quotes are also included as part of the string. This is not only 
 inconsistent, it is also unintuitive. 
 It is also inconsistent with the way the hdfs command line (or the bash 
 shell) interprets file names.
 For example, currently with the command - 
 set job.name 'job123'
 the job name is set to 'job123' (including the quotes), not job123.
 This needs to be fixed, and the above command should be considered equivalent 
 to - set job.name job123. 
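 The intended behavior amounts to stripping one pair of surrounding quotes 
 before using the value; a minimal sketch (hypothetical helper, not Pig's 
 actual parser code):

```python
def strip_quotes(arg):
    """Strip one pair of surrounding quotes, if present, else return arg as-is.

    With this, set job.name 'job123' and set job.name job123 behave the same,
    matching what load/store already do with quoted file names.
    """
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in ("'", '"'):
        return arg[1:-1]
    return arg
```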

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1198) [zebra] performance improvements

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1198.
---


 [zebra] performance improvements
 

 Key: PIG-1198
 URL: https://issues.apache.org/jira/browse/PIG-1198
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1198.patch, PIG-1198.patch, PIG-1198.patch


 Current input split generation is row-based, splitting individual TFiles. 
 This has the undesired effect that even for TFiles smaller than one block, 
 one split is still generated for each. Consequently, many mappers, in many 
 waves, are needed to handle the many small TFiles generated by the 
 mappers/reducers that wrote the data. This can be addressed by generating 
 input splits that include multiple TFiles. 
 For sorted tables, the per-table key distribution, which is used to generate 
 proper input splits, includes key distributions from column groups even if 
 they are not in the projection. This incurs the cost of unnecessary 
 computation and, worse, produces unreasonable input split generation results. 
 For unsorted tables, when a row split is generated on a union of tables, 
 FileSplits are generated for each table and then lumped together to form the 
 final list of splits handed to Map/Reduce. This has the undesirable effect 
 that the number of splits depends on the number of tables in the union and is 
 not just controlled by the number of splits requested by the Map/Reduce 
 framework. 
 The input split's goal size is calculated over all column groups, even if 
 some of them are not in the projection. 
 For input splits spanning multiple files in one column group, all files are 
 opened at startup. This is unnecessary and holds resources from start to end; 
 files should be opened when needed and closed when not. 
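 The multi-TFile split idea can be sketched as a greedy packing of file sizes 
 into splits up to a goal size (all names hypothetical; zebra's actual 
 implementation is in Java):

```python
def combine_files_into_splits(file_sizes, goal_size):
    """Greedily pack small TFiles into one split until goal_size is reached."""
    splits, current, current_size = [], [], 0
    for i, size in enumerate(file_sizes):
        # Start a new split once adding this file would exceed the goal size.
        if current and current_size + size > goal_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        splits.append(current)
    return splits

# Ten 10 MB TFiles with a 64 MB goal size collapse into 2 splits, not 10.
splits = combine_files_into_splits([10 * 2**20] * 10, 64 * 2**20)
```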

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1203.
---


 Temporarily disable failed unit test in load-store-redesign branch which have 
 external dependency
 -

 Key: PIG-1203
 URL: https://issues.apache.org/jira/browse/PIG-1203
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1203-1.patch


 In the load-store-redesign branch, two test suites, TestHBaseStorage and 
 TestCounters, always fail. TestHBaseStorage depends on 
 https://issues.apache.org/jira/browse/PIG-1200, and TestCounters depends on a 
 future version of hadoop. We disable these two test suites temporarily, and 
 will re-enable them once the dependent issues are resolved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1209) Port POJoinPackage to proactively spill

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1209.
---


 Port POJoinPackage to proactively spill
 ---

 Key: PIG-1209
 URL: https://issues.apache.org/jira/browse/PIG-1209
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1209.patch


 POPackage proactively spills its bag, whereas POJoinPackage still uses the 
 SpillableMemoryManager. We should port the latter to use InternalCachedBag, 
 which proactively spills.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1212) LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1212.
---


 LogicalPlan.replaceAndAddSucessors produce wrong result when successors are 
 null
 

 Key: PIG-1212
 URL: https://issues.apache.org/jira/browse/PIG-1212
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1212-1.patch, PIG-1212-2.patch


 The following script throws an NPE:
 {code}
 a = load '1.txt' as (a0:chararray);
 b = load '2.txt' as (b0:chararray);
 c = join a by a0, b by b0;
 d = filter c by a0 == 'a';
 explain d;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1204) Pig hangs when joining two streaming relations in local mode

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1204.
---


 Pig hangs when joining two streaming relations in local mode
 

 Key: PIG-1204
 URL: https://issues.apache.org/jira/browse/PIG-1204
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1204.patch


 The following script hangs when run in local mode if the input files contain 
 many lines (e.g. 10K). The same script works when running in MR mode.
 {code}
 A = load 'input1' as (a0, a1, a2);
 B = stream A through `head -1` as (a0, a1, a2);
 C = load 'input2' as (a0, a1, a2);
 D = stream C through `head -1` as (a0, a1, a2);
 E = join B by a0, D by a0;
 dump E
 {code}  
 Here is one stack trace:
 Thread-13 prio=10 tid=0x09938400 nid=0x1232 in Object.wait() 
 [0x8fffe000..0x8030]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on 0x9b8e0a40 (a 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream)
 at java.lang.Object.wait(Object.java:485)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNextHelper(POStream.java:291)
 - locked 0x9b8e0a40 (a 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNext(POStream.java:214)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:272)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1207) [zebra] Data sanity check should be performed at the end of writing instead of later at query time

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1207.
---


 [zebra] Data sanity check should be performed at the end  of writing instead 
 of later at query time
 ---

 Key: PIG-1207
 URL: https://issues.apache.org/jira/browse/PIG-1207
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1207.patch, PIG-1207.patch


 Currently, the check that the number of rows is equal across different column 
 groups is performed at query time. The error info is sketchy: it only emits 
 "Column groups are not evenly distributed" or, worse, throws an 
 IndexOutOfBound exception from CGScanner.getCGValue. This is because 
 BasicTable.atEnd and BasicTable.getKey, which are called just before 
 BasicTable.getValue, only check the first column group in the projection, so 
 any discrepancy in the number of rows per file across the projected column 
 groups can have BasicTable.atEnd return false and BasicTable.getKey return a 
 key normally, while another column group has already exhausted its current 
 file and the call to its CGScanner.getCGValue throws the exception. 
 This check should also be performed at the end of writing, and the error info 
 should be more informative.
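 The write-time check amounts to verifying that every column group recorded 
 the same number of rows, failing with an informative message on mismatch (a 
 sketch; names hypothetical, zebra's actual check is in Java):

```python
def check_row_counts(cg_row_counts):
    """Fail fast at the end of writing if column groups disagree on row count."""
    counts = set(cg_row_counts.values())
    if len(counts) > 1:
        # Name every column group and its count, rather than a sketchy message.
        detail = ", ".join(f"{cg}={n}" for cg, n in sorted(cg_row_counts.items()))
        raise ValueError(f"Column groups have mismatched row counts: {detail}")
    return next(iter(counts)) if counts else 0
```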

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1215) Make Hadoop jobId more prominent in the client log

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1215.
---


 Make Hadoop jobId more prominent in the client log
 --

 Key: PIG-1215
 URL: https://issues.apache.org/jira/browse/PIG-1215
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1215.patch, pig-1215.patch, pig-1215_1.patch, 
 pig-1215_3.patch, pig-1215_4.patch


 This is a request from applications that want to be able to programmatically 
 parse client logs to find hadoop job ids.
 They would like to see each job id on a separate line, in the following 
 format:
 hadoopJobId: job_123456789
 They would also like to see the jobs in the order they are executed.
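 With that one-line format, a consuming application could pull the job ids out 
 of the client log with a single regex (a sketch of the consuming side, not 
 Pig code; the log text here is made up):

```python
import re

log = """\
2010-02-04 10:00:01 [main] INFO  ... launching job
hadoopJobId: job_201002040001_0001
hadoopJobId: job_201002040001_0002
2010-02-04 10:05:44 [main] INFO  ... all jobs completed
"""

# One id per line, preserved in execution order, as the request asks for.
job_ids = re.findall(r"^hadoopJobId: (job_\S+)$", log, re.MULTILINE)
```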

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1216.
---


 New load store design does not allow Pig to validate inputs and outputs up 
 front
 

 Key: PIG-1216
 URL: https://issues.apache.org/jira/browse/PIG-1216
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1216.patch, pig-1216_1.patch


 In Pig 0.6 and before, Pig attempts to verify existence of inputs and 
 non-existence of outputs during parsing to avoid run time failures when 
 inputs don't exist or outputs can't be overwritten.  The downside to this was 
 that Pig assumed all inputs and outputs were HDFS files, which made 
 implementation harder for non-HDFS based load and store functions.  In the 
 load store redesign (PIG-966) this was delegated to InputFormats and 
 OutputFormats to avoid this problem and to make use of the checks already 
 being done in those implementations.  Unfortunately, for Pig Latin scripts 
 that run more than one MR job, this does not work well.  MR does not do 
 input/output verification on all the jobs at once.  It does them one at a 
 time.  So if a Pig Latin script results in 10 MR jobs and the file to store 
 to at the end already exists, the first 9 jobs will be run before the 10th 
 job discovers that the whole thing was doomed from the beginning.  
 To avoid this a validate call needs to be added to the new LoadFunc and 
 StoreFunc interfaces.  Pig needs to pass this method enough information that 
 the load function implementer can delegate to InputFormat.getSplits() and the 
 store function implementer to OutputFormat.checkOutputSpecs() if s/he decides 
 to.  Since 90% of all load and store functions use HDFS and PigStorage will 
 also need to, the Pig team should implement a default file existence check on 
 HDFS and make it available as a static method to other Load/Store function 
 implementers.  
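 The point of validating up front can be sketched abstractly: check every 
 job's inputs and outputs before running any of them, so a doomed final output 
 rejects the whole plan immediately (all names hypothetical, not Pig's API):

```python
def validate_plan(jobs, existing_paths):
    """Check all inputs/outputs of an MR job chain before running any job."""
    errors, available = [], set(existing_paths)
    for job in jobs:
        for path in job["inputs"]:
            if path not in available:
                errors.append(f"input does not exist: {path}")
        for path in job["outputs"]:
            if path in available:
                errors.append(f"output already exists: {path}")
            # Outputs of earlier jobs are valid inputs for later ones.
            available.add(path)
    return errors

jobs = [{"inputs": ["in1"], "outputs": ["tmp1"]},
        {"inputs": ["tmp1"], "outputs": ["final"]}]
# 'final' already exists, so the plan is rejected before job 1 ever runs.
errors = validate_plan(jobs, existing_paths={"in1", "final"})
```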

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1217.
---


 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff, fix_top_udf.diff, fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1230) Streaming input in POJoinPackage should use nonspillable bag to collect tuples

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1230.
---


 Streaming input in POJoinPackage should use nonspillable bag to collect tuples
 --

 Key: PIG-1230
 URL: https://issues.apache.org/jira/browse/PIG-1230
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1230.patch, pig-1230_1.patch, pig-1230_2.patch


 The last table of a join statement is streamed through instead of collecting 
 all its tuples in a bag. As a further optimization, tuples of that relation 
 are collected in chunks in a bag. Since we don't want to spill the tuples 
 from this bag, NonSpillableBag should be used to hold tuples for this 
 relation. Initially, DefaultDataBag was used, which was later changed to 
 InternalCachedBag as a part of PIG-1209.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1218) Use distributed cache to store samples

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1218.
---


 Use distributed cache to store samples
 --

 Key: PIG-1218
 URL: https://issues.apache.org/jira/browse/PIG-1218
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1218.patch, PIG-1218_2.patch, PIG-1218_3.patch


 Currently, in the case of skew join and order by, we use a sample that is 
 just written to the dfs (not the distributed cache) and, as a result, it gets 
 opened and copied around more than necessary. This impacts query performance 
 and also places unnecessary load on the name node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1220) Document unknown keywords as missing or to do in future

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1220.
---


 Document unknown keywords as missing or to do in future
 ---

 Key: PIG-1220
 URL: https://issues.apache.org/jira/browse/PIG-1220
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.7.0


 To get help at the grunt shell I do the following:
 grunt> touchz
 2010-02-04 00:59:28,714 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Encountered  IDENTIFIER touchz  at line 1, 
 column 1.
 Was expecting one of:
 EOF 
 cat ...
 fs ...
 cd ...
 cp ...
 copyFromLocal ...
 copyToLocal ...
 dump ...
 describe ...
 aliases ...
 explain ...
 help ...
 kill ...
 ls ...
 mv ...
 mkdir ...
 pwd ...
 quit ...
 register ...
 rm ...
 rmf ...
 set ...
 illustrate ...
 run ...
 exec ...
 scriptDone ...
  ...
 EOL ...
 ; ...
 I looked at the code and found that we do nothing for 
 scriptDone: is there some future value to that command?
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1241) Accumulator is turned on when a map is used with a non-accumulative UDF

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1241.
---


 Accumulator is turned on when a map is used with a non-accumulative UDF
 ---

 Key: PIG-1241
 URL: https://issues.apache.org/jira/browse/PIG-1241
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.7.0

 Attachments: accum.patch


 An exception is thrown for a script like the following:
 {code}
 register /homes/yinghe/owl/string.jar;
 a = load 'a.txt' as (id, url);
 b = group a by (id, url);
 c = foreach b generate COUNT(a), (CHARARRAY) 
 string.URLPARSE(group.url)#'url';
 dump c;
 {code}
 In this query, URLPARSE() is not accumulative, and it returns a map. 
 The accumulator optimizer failed to check the UDF in this case and tried to 
 run the job in accumulative mode. A ClassCastException is thrown when trying 
 to cast the UDF to the Accumulator interface.
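 The fix boils down to an instance check before the optimizer commits to 
 accumulative mode; a language-neutral sketch in Python (Pig's actual check is 
 in Java, and these class names are hypothetical):

```python
class Accumulator:
    """Marker for UDFs that can consume their input incrementally."""

class Count(Accumulator):
    pass

class UrlParse:
    # Not accumulative: returns a map, consumed later via #'url'.
    pass

def can_run_accumulative(udfs):
    """Only switch the plan to accumulative mode if every UDF supports it."""
    return all(isinstance(u, Accumulator) for u in udfs)
```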

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1224) Collected group should change to use new (internal) bag

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1224.
---


 Collected group should change to use new (internal) bag
 ---

 Key: PIG-1224
 URL: https://issues.apache.org/jira/browse/PIG-1224
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1224.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (PIG-1240) [Zebra] suggestion to have zebra manifest file contain version and svn-revision etc.

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1240.
---


 [Zebra]  suggestion to have zebra manifest file contain version and 
 svn-revision etc.
 -

 Key: PIG-1240
 URL: https://issues.apache.org/jira/browse/PIG-1240
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Gaurav Jain
Assignee: Gaurav Jain
Priority: Minor
 Fix For: 0.7.0

 Attachments: PIG-1240.patch


 Zebra jars' manifest file should contain version and svn-revision etc.




[jira] Closed: (PIG-1226) Need to be able to register jars on the command line

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1226.
---


 Need to be able to register jars on the command line
 

 Key: PIG-1226
 URL: https://issues.apache.org/jira/browse/PIG-1226
 Project: Pig
  Issue Type: Bug
Reporter: Alan Gates
Assignee: Thejas M Nair
 Fix For: 0.7.0

 Attachments: PIG-1126.patch


 Currently 'register' can only be done inside a Pig Latin script.  Users often 
 run their scripts in different environments, so jar locations or versions may 
 change.  But they don't want to edit their script to fit each environment.  
 Instead they could register on the command line, something like:
 pig -Dpig.additional.jars=my.jar:your.jar script.pig
 These would not override registers in the Pig Latin script itself.




[jira] Closed: (PIG-1248) [piggybank] useful String functions

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1248.
---


 [piggybank] useful String functions
 ---

 Key: PIG-1248
 URL: https://issues.apache.org/jira/browse/PIG-1248
 Project: Pig
  Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: PIG_1248.diff, PIG_1248.diff, PIG_1248.diff


 Pig ships with very few evalFuncs for working with strings. This jira is for 
 adding a few more.




[jira] Closed: (PIG-1233) NullPointerException in AVG

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1233.
---


 NullPointerException in AVG 
 

 Key: PIG-1233
 URL: https://issues.apache.org/jira/browse/PIG-1233
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur
Assignee: Ankur
 Fix For: 0.7.0

 Attachments: jira-1233.patch


 The overridden method getValue() in AVG throws a NullPointerException when 
 accumulate() is not called, leaving the variable 'intermediateCount' 
 initialized to null. Java then throws the exception when it tries to 
 unbox the value for numeric comparison.
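 The fix amounts to checking the boxed count before unboxing it. The sketch 
 below illustrates the failure mode and a null guard; SafeAvg and its field 
 names are illustrative, not the actual Pig AVG implementation.

```java
// Hypothetical sketch of a null-safe getValue(), mirroring the PIG-1233 symptom:
// if accumulate() is never called, the boxed count stays null and unboxing it NPEs.
public class SafeAvg {
    private Long intermediateCount;  // null until accumulate() first runs
    private Double intermediateSum;

    void accumulate(double value) {
        intermediateCount = (intermediateCount == null) ? 1L : intermediateCount + 1;
        intermediateSum = (intermediateSum == null) ? value : intermediateSum + value;
    }

    // Guarding the null Long before any comparison/unboxing avoids the NPE;
    // returning null matches Pig's convention for "no input seen".
    Double getValue() {
        if (intermediateCount == null || intermediateCount == 0) {
            return null;
        }
        return intermediateSum / intermediateCount;
    }

    public static void main(String[] args) {
        SafeAvg avg = new SafeAvg();
        System.out.println(avg.getValue());  // null, not a NullPointerException
        avg.accumulate(2.0);
        avg.accumulate(4.0);
        System.out.println(avg.getValue());  // 3.0
    }
}
```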




[jira] Closed: (PIG-1250) Make StoreFunc an abstract class and create a mirror interface called StoreFuncInterface

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1250.
---


 Make StoreFunc an abstract class and create a mirror interface called 
 StoreFuncInterface
 

 Key: PIG-1250
 URL: https://issues.apache.org/jira/browse/PIG-1250
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1250-2.patch, PIG-1250.patch







[jira] Closed: (PIG-1238) Dump does not respect the schema

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1238.
---


 Dump does not respect the schema
 

 Key: PIG-1238
 URL: https://issues.apache.org/jira/browse/PIG-1238
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1238.patch


 For complex data types and certain sequences of operations, dump produces 
 results with a non-existent field in the relation.




[jira] Closed: (PIG-1243) Passing Complex map types to and from streaming causes a problem

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1243.
---


 Passing Complex map types to and from streaming causes a problem
 

 Key: PIG-1243
 URL: https://issues.apache.org/jira/browse/PIG-1243
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.7.0


 I have a program which generates different types of Maps fields and stores it 
 into PigStorage.
 {code}
 A = load '/user/viraj/three.txt' using PigStorage();
 B = foreach A generate ['a'#'12'] as b:map[], ['b'#['c'#'12']] as c, 
 ['c'#{(['d'#'15']),(['e'#'16'])}] as d;
 store B into '/user/viraj/pigtest' using PigStorage();
 {code}
 Now I test the previous output in the script below to make sure I have the 
 right results. I also pass this data to a Perl script, and I observe that the 
 complex Map types I have generated are lost when I get the result back.
 {code}
 DEFINE CMD `simple.pl` SHIP('simple.pl');
 A = load '/user/viraj/pigtest' using PigStorage() as (simpleFields, 
 mapFields, mapListFields);
 B = foreach A generate $0, $1, $2;
 dump B;
 C = foreach A generate  (chararray)simpleFields#'a' as value, $0,$1,$2;
 D = stream C through CMD as (a0:map[], a1:map[], a2:map[]);
 dump D;
 {code}
 dumping B results in:
 ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}])
 ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}])
 ([a#12],[b#[c#12]],[c#{([d#15]),([e#16])}])
 dumping D results in:
 ([a#12],,)
 ([a#12],,)
 ([a#12],,)
 The Perl script used here is:
 {code}
 #!/usr/local/bin/perl
 use warnings;
 use strict;
 while (<STDIN>) {
 my ($bc, $s, $m, $l) = split /\t/;
 print "$s\t$m\t$l";
 }
 {code}
 Is there an issue with handling of complex Map fields within streaming? How 
 can I fix this to obtain the right result?
 Viraj




[jira] Closed: (PIG-1234) Unable to create input slice for har:// files

2010-05-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1234.
---


 Unable to create input slice for har:// files
 -

 Key: PIG-1234
 URL: https://issues.apache.org/jira/browse/PIG-1234
 Project: Pig
  Issue Type: Bug
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1234.patch


 Tried to load har:// files
 {noformat}
 grunt> a = LOAD 'har://hdfs-namenode/user/tsz/t20.har/t20' USING 
 PigStorage('\n') AS (line);
 grunt> dump 
 {noformat}
 but pig says
 {noformat}
 2010-02-10 18:42:20,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2118:
  Unable to create input slice for: har://hdfs-namenode/user/tsz/t20.har/t20
 {noformat}



