[jira] Commented: (PIG-919) Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText when doing simple group

2009-08-13 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742748#action_12742748
 ] 

Ankur commented on PIG-919:
---

I have seen this issue in other places when the value coming out of a map[] is 
used in a group/cogroup/join. Pig throws the same error. And Viraj is right, 
explicit casting to chararray alleviates the issue. But this is confusing for 
users. Pig should be converting "NullableText" to "NullableBytesWritable" 
automatically. Here is another sample script that throws an error. Explicit 
casting to chararray resolves the issue.

data = LOAD 'mydata' USING CustomLoader() AS (f1:double, f2:map[]);

dataProjected = FOREACH data GENERATE f2#'Url' AS url, f1 AS rank;

data2 = LOAD 'urlList' AS (url:bytearray);

grouped = COGROUP dataProjected BY url, data2 BY url PARALLEL 10;

STORE grouped INTO 'results';


> Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText when doing simple group
> --
>
> Key: PIG-919
> URL: https://issues.apache.org/jira/browse/PIG-919
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
> Fix For: 0.3.0
>
> Attachments: GenHashList.java, mapscript.pig, mymapudf.jar
>
>
> I have a Pig script, which takes in a student file and generates a bag of 
> maps.  I later want to group on the value of the key "name0" which 
> corresponds to the first name of the student.
> {code}
> register mymapudf.jar;
> data = LOAD '/user/viraj/studenttab10k' AS 
> (somename:chararray,age:long,marks:float);
> genmap = foreach data generate flatten(mymapudf.GenHashList(somename,' ')) as 
> bp:map[], age, marks;
> getfirstnames = foreach genmap generate bp#'name0' as firstname, age, marks;
> filternonnullfirstnames = filter getfirstnames by firstname is not null;
> groupgenmap = group filternonnullfirstnames by firstname;
> dump groupgenmap;
> {code}
> When I execute this code, I get an error in the Map Phase:
> ===
> java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:242)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
> ===

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-919) Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText when doing simple group

2009-08-13 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742846#action_12742846
 ] 

Jeff Zhang commented on PIG-919:


Viraj,
The error occurs because Pig Latin does not support declaring the types of a 
map's keys and values.
In your UDF, the type of the map's values is String.
But from your script, Pig cannot infer the type of the map's values, so it 
defaults to bytearray, which is inconsistent with the actual type.

If you change your UDF to:
{code}
// Wrap the value in a DataByteArray so its runtime type matches the default bytearray.
HashMap pairs = new HashMap();
pairs.put(key, new DataByteArray(names[i]));
{code}
then it will be OK.


BTW, it is really a shortcoming of Pig that Pig Latin does not support declaring 
the types of a map's keys and values.
Hopefully it will support this in the future.
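
The mismatch described in this comment can be pictured with a small stand-alone sketch. It is an illustration only: BytesKey and TextKey are hypothetical stand-ins for NullableBytesWritable and NullableText, and collect() merely imitates the kind of runtime class check that produces the error in the stack trace. It is not Hadoop's or Pig's actual code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustration only: hypothetical stand-ins, not Hadoop or Pig classes.
final class KeyTypeCheckSketch {
    static final class BytesKey {
        final byte[] value;
        BytesKey(byte[] value) { this.value = value; }
    }

    static final class TextKey {
        final String value;
        TextKey(String value) { this.value = value; }
    }

    // The job is set up for one key class; a key of any other class is rejected.
    static void collect(Class<?> expected, Object key) throws IOException {
        if (key.getClass() != expected) {
            throw new IOException("Type mismatch in key from map: expected "
                    + expected.getName() + ", received " + key.getClass().getName());
        }
    }

    // Converts a String value to the bytes-based key the plan assumed.
    static BytesKey asBytesKey(String s) {
        return new BytesKey(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        try {
            // The plan assumed bytearray, but the value is really text.
            collect(BytesKey.class, new TextKey("alice"));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        // After conversion the runtime type matches and the check passes.
        collect(BytesKey.class, asBytesKey("alice"));
    }
}
```

In this picture, both the UDF change above and an explicit cast in the script have the same effect: they make the declared type and the runtime type agree before the key reaches the check.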








[jira] Created: (PIG-920) optimizing diamond queries

2009-08-13 Thread Olga Natkovich (JIRA)
optimizing diamond queries
--

 Key: PIG-920
 URL: https://issues.apache.org/jira/browse/PIG-920
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich


The following query

A = load 'foo';
B = filter A by $0 > 1;
C = filter A by $1 == 'foo';
D = COGROUP C by $0, B by $0;
...

is not executed efficiently. Currently, it runs a map-only job that basically 
reads and writes the same data before doing the query processing.

A query where the data is loaded twice actually executes more efficiently.

This is not an uncommon query and we should fix this issue.




[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-08-13 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742914#action_12742914
 ] 

Pradeep Kamath commented on PIG-845:


A couple of comments on new patch:
In MRCompiler.java, earlier there was code:

{code}
if(rightMROpr == null || rightMROpr.equals(curMROp))
 throw new MRCompilerException("Successor of right input not ...
{code}

In the new patch the code is:


{code}
if(curMROp.equals(rightMROpr)){
 int errCode = 2170;...
{code}

Do you also need to check rightMROpr == null here?
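
For reference on the null question: under the general contract of Object.equals, x.equals(null) returns false for a well-behaved class, so the second form neither throws nor takes the error path when rightMROpr is null. A minimal sketch, in which Op is a hypothetical stand-in for the MR operator class and the two methods mirror the two conditions quoted above:

```java
// Op is a hypothetical stand-in for an MR operator; only equals() matters here.
final class NullCheckSketch {
    static final class Op {
        final String name;
        Op(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Op && ((Op) o).name.equals(name);
        }

        @Override
        public int hashCode() { return name.hashCode(); }
    }

    // Earlier condition: a null right operator is treated as an error.
    static boolean oldCheck(Op rightMROpr, Op curMROp) {
        return rightMROpr == null || rightMROpr.equals(curMROp);
    }

    // Later condition: a null right operator does not trigger the error.
    static boolean newCheck(Op rightMROpr, Op curMROp) {
        return curMROp.equals(rightMROpr);
    }

    public static void main(String[] args) {
        Op cur = new Op("cur");
        System.out.println(oldCheck(null, cur)); // prints "true"
        System.out.println(newCheck(null, cur)); // prints "false"
    }
}
```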

If index is empty it could mean one of the following two things:
1) Data for right input only has null for join key(s)
2) right input is empty
Are there any other reasons why the index would be empty?
In both of these cases the join output would be empty, but currently the code 
throws an exception.
Should this change?
A unit test where right side input is empty would be a good one to add.
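
To make the expected behavior concrete, here is a minimal sketch of a merge join over two key lists that are each sorted ascending. It is not the code in merge-join.patch; the class and method names are hypothetical, and it works on bare keys rather than tuples. With an empty right input, or a right input holding only null keys, it returns an empty result instead of failing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a merge join on sorted keys; not the patch's code.
final class MergeJoinSketch {
    // Returns one entry per matching (left, right) pair of equal keys.
    // Both inputs must be sorted ascending; null keys never match.
    static List<Integer> mergeJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            Integer l = left.get(i);
            Integer r = right.get(j);
            if (l == null) { i++; continue; }
            if (r == null) { j++; continue; }
            int c = l.compareTo(r);
            if (c < 0) {
                i++;
            } else if (c > 0) {
                j++;
            } else {
                // Find the run of equal keys on the right side.
                int runEnd = j;
                while (runEnd < right.size() && l.equals(right.get(runEnd))) {
                    runEnd++;
                }
                // Pair every equal left key with every key in that run.
                while (i < left.size() && l.equals(left.get(i))) {
                    for (int k = j; k < runEnd; k++) {
                        out.add(l);
                    }
                    i++;
                }
                j = runEnd;
            }
        }
        return out;
    }
}
```

A unit test along the lines suggested above would call mergeJoin with an empty right list and assert that the result is empty.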






> PERFORMANCE: Merge Join
> ---
>
> Key: PIG-845
> URL: https://issues.apache.org/jira/browse/PIG-845
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Ashutosh Chauhan
> Attachments: merge-join.patch
>
>
> This join would work if the data for both tables is sorted on the join key.




[jira] Commented: (PIG-917) [zebra]some issues on compression

2009-08-13 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742919#action_12742919
 ] 

Jing Huang commented on PIG-917:


Another issue is that the Pig loader should read the compression information 
from jobConf. 

> [zebra]some issues on compression
> -
>
> Key: PIG-917
> URL: https://issues.apache.org/jira/browse/PIG-917
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Jing Huang
> Fix For: 0.4.0
>
>
> These are Zebra compression-related issues:
> 1. ColumnGoupParser only recognizes "gzip", not "gz". For example, if the user 
> specifies "compress by gz", it throws 
> org.apache.hadoop.zebra.types.ParseException.
> 2. BasicTable.dumpInfo is wrong. It always prints "Compressor: lzo2" even 
> if the default compressor is "gz" or the user specifies "compress by gzip".
> So we cannot verify whether the default compressor can actually be overridden. 
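
One way to picture a fix for the first item is to map every accepted spelling to a single canonical name before parsing. The sketch below is an assumption-laden illustration, not Zebra's code: the class, the alias table, and the choice of "gz" as the canonical name are all hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical alias table; Zebra's real parser and names may differ.
final class CompressorNameSketch {
    private static final Map<String, String> ALIASES = new HashMap<String, String>();
    static {
        ALIASES.put("gz", "gz");
        ALIASES.put("gzip", "gz");
    }

    // Returns the canonical name for a user-supplied compressor name,
    // or throws if the name is not a known spelling.
    static String canonical(String name) {
        String c = ALIASES.get(name.trim().toLowerCase(Locale.ROOT));
        if (c == null) {
            throw new IllegalArgumentException("Unknown compressor: " + name);
        }
        return c;
    }
}
```

With a table like this, "compress by gz" and "compress by gzip" would resolve to the same compressor, and only genuinely unknown names would be rejected.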




[jira] Commented: (PIG-917) [zebra]some issues on compression

2009-08-13 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742920#action_12742920
 ] 

Jing Huang commented on PIG-917:


Oops, pig store not pig loader. :)




[jira] Commented: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-08-13 Thread Chao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742927#action_12742927
 ] 

Chao Wang commented on PIG-918:
---

Already reviewed the fix.

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang. An improper range of random numbers for the index 
> into the array of column groups always skips the first element, so if none of 
> the other column groups is used, the loop keeps running without a chance 
> to break.  
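
The failure mode described above can be reproduced in a few lines. This is a hedged sketch, not the Zebra code: the class and method names are hypothetical, the exact expression used for the random index in Zebra is not given in the report, and a maxTries bound is added here so the demonstration terminates where the real loop would spin.

```java
import java.util.Random;

// Hypothetical reproduction of the index-range problem; not Zebra's code.
final class IndexRangeSketch {
    // Tries up to maxTries random indexes and returns the first one whose
    // column group is in use, or -1 if none was hit. With skipFirst set,
    // indexes come from [1, n) and element 0 can never be chosen
    // (this mode requires at least two column groups).
    static int pickUsed(boolean[] used, Random rnd, int maxTries, boolean skipFirst) {
        int n = used.length;
        for (int t = 0; t < maxTries; t++) {
            int idx = skipFirst ? 1 + rnd.nextInt(n - 1) : rnd.nextInt(n);
            if (used[idx]) {
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Only the first column group is used by the projection.
        boolean[] used = {true, false, false};
        System.out.println(pickUsed(used, new Random(42), 1000, true));
        System.out.println(pickUsed(used, new Random(42), 1000, false));
    }
}
```

When only the first column group is used, the skipFirst variant exhausts every attempt without a hit, which corresponds to the hang; drawing from the full range [0, n) finds index 0.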




[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-913:
---

Status: Open  (was: Patch Available)

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913-2.patch, PIG-913.patch
>
>
> I have a very simple script which fails at parsetime due to the schema I 
> specified in the loader.
> {code}
> data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
> dataSmall = limit data 100;
> bb = GROUP dataSmall by $0;
> dump bb;
> {code}
> =
> 2009-08-06 18:47:56,297 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: 
> /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 2009-08-06 18:47:56,459 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop 
> file system at: hdfs://localhost:9000
> 2009-08-06 18:47:56,694 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to 
> map-reduce job tracker at: localhost:9001
> 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1002: Unable to store alias bb
> 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> =
> =
> Pig Stack Trace
> ---
> ERROR 1002: Unable to store alias bb
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias bb
> at org.apache.pig.PigServer.openIterator(PigServer.java:481)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: 
> Unable to store alias bb
> at org.apache.pig.PigServer.store(PigServer.java:536)
> at org.apache.pig.PigServer.openIterator(PigServer.java:464)
> ... 6 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
> at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187)
> at org.apache.pig.PigServer.compileLp(PigServer.java:854)
> at org.apache.pig.PigServer.compileLp(PigServer.java:791)
> at org.apache.pig.PigServer.store(PigServer.java:509)
> ... 7 more
> =




[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-913:
---

Attachment: PIG-913-2.patch

The new patch includes a test case.




[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-913:
---

Status: Patch Available  (was: Open)




[jira] Updated: (PIG-845) PERFORMANCE: Merge Join

2009-08-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-845:
-

Status: Open  (was: Patch Available)




[jira] Updated: (PIG-845) PERFORMANCE: Merge Join

2009-08-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-845:
-

Status: Patch Available  (was: Open)

Running through hudson. Release audit warning can be ignored.

> PERFORMANCE: Merge Join
> ---
>
> Key: PIG-845
> URL: https://issues.apache.org/jira/browse/PIG-845
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Ashutosh Chauhan
> Attachments: merge-join.patch, merge-join.patch
>
>
> This join would work if the data for both tables is sorted on the join key.




[jira] Updated: (PIG-845) PERFORMANCE: Merge Join

2009-08-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-845:
-

Attachment: merge-join.patch

{code}
if(rightMROpr == null || rightMROpr.equals(curMROp))
 throw new MRCompilerException("Successor of right input not ...
{code}

Do you also need to check rightMROpr == null here?
>> I removed the null check because a null there would mean that two preceding 
>> MROperators exist but one of them is null. This is highly unlikely, and 
>> MRCompiler probably would have thrown an exception while compiling those 
>> preceding physical operators. But I added the check back in any case.

If index is empty it could mean one of the following two things:
1) Data for right input only has null for join key(s)
2) right input is empty
Are there any other reasons why the index would be empty?
In both these cases, join output would be empty - currently the code throws an 
exception
Should this change?
A unit test where right side input is empty would be a good one to add.
>> The exception thrown at that point is correct, because if you get a null 
>> object after reading the index, it's a bug. But there was nonetheless a 
>> problem dealing with an empty right file. I fixed that and added a test case 
>> for it as well.

Additionally, fixed a findbugs warning.
The release audit warning is caused by the gold file added for testing. The 
Apache header can't be added to it, so the warning can be ignored.




[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743014#action_12743014
 ] 

Hadoop QA commented on PIG-913:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416483/PIG-913-2.patch
  against trunk revision 803377.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 162 release audit warnings 
(more than the trunk's current 161 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/161/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/161/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/161/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/161/console

This message is automatically generated.


Build failed in Hudson: Pig-Patch-minerva.apache.org #161

2009-08-13 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/161/

--
[...truncated 103138 lines...]
 [exec] [junit] 09/08/13 23:11:34 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:48924
 [exec] [junit] 09/08/13 23:11:34 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:47519
 [exec] [junit] 09/08/13 23:11:34 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/08/13 23:11:34 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/08/13 23:11:34 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:41305 to delete  blk_5449192116454872859_1004
 [exec] [junit] 09/08/13 23:11:34 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:35151 to delete  blk_-8414325160659496849_1005 
blk_5449192116454872859_1004 blk_-5536604562819844795_1006
 [exec] [junit] 09/08/13 23:11:35 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/13 23:11:35 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908132310_0002/job.jar. 
blk_-5542368915389250723_1012
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Receiving block 
blk_-5542368915389250723_1012 src: /127.0.0.1:33371 dest: /127.0.0.1:36269
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Receiving block 
blk_-5542368915389250723_1012 src: /127.0.0.1:60275 dest: /127.0.0.1:37552
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Receiving block 
blk_-5542368915389250723_1012 src: /127.0.0.1:38985 dest: /127.0.0.1:35151
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Received block 
blk_-5542368915389250723_1012 of size 1477824 from /127.0.0.1
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: PacketResponder 0 
for block blk_-5542368915389250723_1012 terminating
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:35151 is added to 
blk_-5542368915389250723_1012 size 1477824
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Received block 
blk_-5542368915389250723_1012 of size 1477824 from /127.0.0.1
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:37552 is added to 
blk_-5542368915389250723_1012 size 1477824
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: PacketResponder 1 
for block blk_-5542368915389250723_1012 terminating
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Received block 
blk_-5542368915389250723_1012 of size 1477824 from /127.0.0.1
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: PacketResponder 2 
for block blk_-5542368915389250723_1012 terminating
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:36269 is added to 
blk_-5542368915389250723_1012 size 1477824
 [exec] [junit] 09/08/13 23:11:35 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908132310_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/13 23:11:35 INFO fs.FSNamesystem: Reducing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908132310_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908132310_0002/job.split. 
blk_8713625961704254739_1013
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Receiving block 
blk_8713625961704254739_1013 src: /127.0.0.1:33374 dest: /127.0.0.1:36269
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Receiving block 
blk_8713625961704254739_1013 src: /127.0.0.1:60278 dest: /127.0.0.1:37552
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Receiving block 
blk_8713625961704254739_1013 src: /127.0.0.1:39952 dest: /127.0.0.1:41305
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Received block 
blk_8713625961704254739_1013 of size 1837 from /127.0.0.1
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: PacketResponder 0 
for block blk_8713625961704254739_1013 terminating
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41305 is added to 
blk_8713625961704254739_1013 size 1837
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: Received block 
blk_8713625961704254739_1013 of size 1837 from /127.0.0.1
 [exec] [junit] 09/08/13 23:11:35 INFO dfs.DataNode: PacketResponder 1 
fo

[jira] Updated: (PIG-823) Hadoop Metadata Service

2009-08-13 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated PIG-823:
-

Attachment: owl.patch.gz
owl.filelist

owl.filelist: the output of an svn add contrib/owl, after which the patch was 
generated (it also lists the binaries that are yet to be attached).
owl.patch.gz: Owl-0.1 patch, to be applied from outside the contrib directory 
in Pig.

(Libraries and other binaries are still to be attached.)

> Hadoop Metadata Service
> ---
>
> Key: PIG-823
> URL: https://issues.apache.org/jira/browse/PIG-823
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
> Attachments: owl.filelist, owl.patch.gz
>
>
> This JIRA is created to track development of a metadata system for Hadoop. 
> The goal of the system is to allow users and applications to register data 
> stored on HDFS, search for the data available on HDFS, and associate metadata 
> such as schema, statistics, etc. with a particular data unit or a data set 
> stored on HDFS. The initial goal is to provide a fairly generic, low-level 
> abstraction that any user or application on HDFS can use to store and retrieve 
> metadata. Over time, higher-level abstractions closely tied to particular 
> applications or tools can be developed.
> Over time, it would make sense for the metadata service to become a 
> subproject within Hadoop. For now, the proposal is to make it a contrib to 
> Pig since Pig SQL is likely to be the first user of the system.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-823) Hadoop Metadata Service

2009-08-13 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated PIG-823:
-

Attachment: owl_libdeps.tgz

owl_libdeps.tgz : libraries that extract to contrib/owl/java/lib/

> Hadoop Metadata Service
> ---
>
> Key: PIG-823
> URL: https://issues.apache.org/jira/browse/PIG-823
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
> Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz
>
>
> This JIRA is created to track development of a metadata system for Hadoop. 
> The goal of the system is to allow users and applications to register data 
> stored on HDFS, search for the data available on HDFS, and associate metadata 
> such as schema, statistics, etc. with a particular data unit or a data set 
> stored on HDFS. The initial goal is to provide a fairly generic, low-level 
> abstraction that any user or application on HDFS can use to store and retrieve 
> metadata. Over time, higher-level abstractions closely tied to particular 
> applications or tools can be developed.
> Over time, it would make sense for the metadata service to become a 
> subproject within Hadoop. For now, the proposal is to make it a contrib to 
> Pig since Pig SQL is likely to be the first user of the system.




[jira] Updated: (PIG-823) Hadoop Metadata Service

2009-08-13 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated PIG-823:
-

Attachment: owl_otherdeps.tgz

owl_otherdeps.tgz : Other binaries 

> Hadoop Metadata Service
> ---
>
> Key: PIG-823
> URL: https://issues.apache.org/jira/browse/PIG-823
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
> Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, 
> owl_otherdeps.tgz
>
>
> This JIRA is created to track development of a metadata system for Hadoop. 
> The goal of the system is to allow users and applications to register data 
> stored on HDFS, search for the data available on HDFS, and associate metadata 
> such as schema, statistics, etc. with a particular data unit or a data set 
> stored on HDFS. The initial goal is to provide a fairly generic, low-level 
> abstraction that any user or application on HDFS can use to store and retrieve 
> metadata. Over time, higher-level abstractions closely tied to particular 
> applications or tools can be developed.
> Over time, it would make sense for the metadata service to become a 
> subproject within Hadoop. For now, the proposal is to make it a contrib to 
> Pig since Pig SQL is likely to be the first user of the system.




[jira] Updated: (PIG-823) Hadoop Metadata Service

2009-08-13 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated PIG-823:
-

Status: Patch Available  (was: Open)

owl.filelist: the output of an svn add contrib/owl, after which the patch was 
generated (it also lists the binaries that are yet to be attached).
owl.patch.gz: Owl-0.1 patch, to be applied from outside the contrib directory 
in Pig.
owl_libdeps.tgz : libraries that extract to contrib/owl/java/lib/
owl_otherdeps.tgz : Other binaries

--

All the .tgz files can be extracted from your pig root dir (they extract to 
contrib/owl from the working dir) after applying the patch.

This is our initial upload of the 0.1 implementation of owl.

> Hadoop Metadata Service
> ---
>
> Key: PIG-823
> URL: https://issues.apache.org/jira/browse/PIG-823
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
> Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, 
> owl_otherdeps.tgz
>
>
> This JIRA is created to track development of a metadata system for Hadoop. 
> The goal of the system is to allow users and applications to register data 
> stored on HDFS, search for the data available on HDFS, and associate metadata 
> such as schema, statistics, etc. with a particular data unit or a data set 
> stored on HDFS. The initial goal is to provide a fairly generic, low-level 
> abstraction that any user or application on HDFS can use to store and retrieve 
> metadata. Over time, higher-level abstractions closely tied to particular 
> applications or tools can be developed.
> Over time, it would make sense for the metadata service to become a 
> subproject within Hadoop. For now, the proposal is to make it a contrib to 
> Pig since Pig SQL is likely to be the first user of the system.




[jira] Created: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode

2009-08-13 Thread Viraj Bhat (JIRA)
Strange use case for Join which produces different results in local and map 
reduce mode
---

 Key: PIG-921
 URL: https://issues.apache.org/jira/browse/PIG-921
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
 Environment: Hadoop 18 and Hadoop 20
Reporter: Viraj Bhat
 Fix For: 0.3.0


I have a script of the following form, which loads from two files, A.txt and B.txt:
{code}
A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
C = JOIN A by a.a1, B by b.b1;
DESCRIBE C;
DUMP C;
{code}

A.txt contains the following lines:
{code}
(1,a)
(2,aa)
{code}


B.txt contains the following lines:
{code}
(1,b)
(2,bb)
{code}

Running the above script in local and map-reduce mode on Hadoop 18 and Hadoop 
20 produces the following:

Hadoop 18
=
(1,1)
(2,2)
=
Hadoop 20
=
(1,1)
(2,2)
=
Local Mode: Pig with Hadoop 18 jar release 
=
2009-08-13 17:15:13,473 [main] INFO  org.apache.pig.Main - Logging error 
messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
09/08/13 17:15:13 INFO pig.Main: Logging error messages to: 
/homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1002: Unable to store alias C
09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
Details at logfile: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
=
Caused by: java.lang.NullPointerException
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at 
org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
at 
org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
at 
org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
at 
org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
... 9 more
=
Local Mode: Pig with Hadoop 20 jar release
=
((1,a),(1,b))
((2,aa),(2,bb))
=
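
For comparison, the rows one would expect from an inner join on a.a1 = b.b1 can be worked out by hand. The following is only a minimal Python sketch of the join semantics over the sample data above, not Pig's implementation; the helper name inner_join and the in-memory row layout are assumptions made for illustration. Each loaded row is modelled as a one-field record whose single field is a tuple, matching the schema C: {a: (a1, a2), b: (b1, b2)} printed by DESCRIBE.

```python
from collections import defaultdict

# Rows as loaded from A.txt and B.txt: one tuple-typed field per row.
A = [((1, "a"),), ((2, "aa"),)]
B = [((1, "b"),), ((2, "bb"),)]


def inner_join(left, right, left_key, right_key):
    """Hash-based inner join (hypothetical helper, for illustration only)."""
    index = defaultdict(list)
    for row in right:
        index[right_key(row)].append(row)
    joined = []
    for row in left:
        # Concatenate the fields of every matching pair of rows.
        for match in index.get(left_key(row), []):
            joined.append(row + match)
    return joined


# JOIN A BY a.a1, B BY b.b1: the key is the first element of the tuple field.
C = inner_join(A, B, lambda row: row[0][0], lambda row: row[0][0])
for row in C:
    print(row)
```

Under these assumptions the joined rows carry both full tuples, which agrees with the local-mode output on the Hadoop 20 jar. The (1,1) and (2,2) rows seen in map-reduce mode look like the join keys alone.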




[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode

2009-08-13 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-921:
---

Attachment: joinusecase.pig
B.txt
A.txt

Script with test data.

> Strange use case for Join which produces different results in local and map 
> reduce mode
> ---
>
> Key: PIG-921
> URL: https://issues.apache.org/jira/browse/PIG-921
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
> Environment: Hadoop 18 and Hadoop 20
>Reporter: Viraj Bhat
> Fix For: 0.3.0
>
> Attachments: A.txt, B.txt, joinusecase.pig
>
>
> I have a script of the following form, which loads from two files, A.txt and B.txt:
> {code}
> A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
> B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
> C = JOIN A by a.a1, B by b.b1;
> DESCRIBE C;
> DUMP C;
> {code}
> A.txt contains the following lines:
> {code}
> (1,a)
> (2,aa)
> {code}
> B.txt contains the following lines:
> {code}
> (1,b)
> (2,bb)
> {code}
> Running the above script in local and map-reduce mode on Hadoop 18 and 
> Hadoop 20 produces the following:
> Hadoop 18
> =
> (1,1)
> (2,2)
> =
> Hadoop 20
> =
> (1,1)
> (2,2)
> =
> Local Mode: Pig with Hadoop 18 jar release 
> =
> 2009-08-13 17:15:13,473 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
> 09/08/13 17:15:13 INFO pig.Main: Logging error messages to: 
> /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
> C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
> 2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1002: Unable to store alias C
> 09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
> Details at logfile: 
> /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
> =
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
> at 
> org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
> at 
> org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
> at 
> org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
> at 
> org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
> ... 9 more
> =
> Local Mode: Pig with Hadoop 20 jar release
> =
> ((1,a),(1,b))
> ((2,aa),(2,bb))
> =




Build failed in Hudson: Pig-Patch-minerva.apache.org #162

2009-08-13 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/162/

--
[...truncated 111497 lines...]
 [exec] [junit] 09/08/14 01:04:12 INFO dfs.DataNode: PacketResponder 0 
for block blk_-8444068200244587298_1011 terminating
 [exec] [junit] 09/08/14 01:04:12 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:46204 is added to 
blk_-8444068200244587298_1011 size 6
 [exec] [junit] 09/08/14 01:04:12 INFO dfs.DataNode: Received block 
blk_-8444068200244587298_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/14 01:04:12 INFO dfs.DataNode: PacketResponder 1 
for block blk_-8444068200244587298_1011 terminating
 [exec] [junit] 09/08/14 01:04:12 INFO dfs.DataNode: Received block 
blk_-8444068200244587298_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/14 01:04:12 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:53076 is added to 
blk_-8444068200244587298_1011 size 6
 [exec] [junit] 09/08/14 01:04:12 INFO dfs.DataNode: PacketResponder 2 
for block blk_-8444068200244587298_1011 terminating
 [exec] [junit] 09/08/14 01:04:12 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54103 is added to 
blk_-8444068200244587298_1011 size 6
 [exec] [junit] 09/08/14 01:04:12 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:39451
 [exec] [junit] 09/08/14 01:04:12 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:37275
 [exec] [junit] 09/08/14 01:04:12 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/08/14 01:04:12 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:54103 to delete  blk_2041339510669703375_1006 
blk_-6501275518667471236_1004
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:53076 to delete  blk_2041339510669703375_1006 
blk_-1345684358638650445_1005
 [exec] [junit] 09/08/14 01:04:13 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/14 01:04:13 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908140103_0002/job.jar. 
blk_-1132408246207808536_1012
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: Receiving block 
blk_-1132408246207808536_1012 src: /127.0.0.1:48256 dest: /127.0.0.1:53076
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: Receiving block 
blk_-1132408246207808536_1012 src: /127.0.0.1:49559 dest: /127.0.0.1:46204
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: Receiving block 
blk_-1132408246207808536_1012 src: /127.0.0.1:46577 dest: /127.0.0.1:39689
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: Received block 
blk_-1132408246207808536_1012 of size 1492596 from /127.0.0.1
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: PacketResponder 0 
for block blk_-1132408246207808536_1012 terminating
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: Received block 
blk_-1132408246207808536_1012 of size 1492596 from /127.0.0.1
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:39689 is added to 
blk_-1132408246207808536_1012 size 1492596
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: PacketResponder 1 
for block blk_-1132408246207808536_1012 terminating
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: Received block 
blk_-1132408246207808536_1012 of size 1492596 from /127.0.0.1
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.DataNode: PacketResponder 2 
for block blk_-1132408246207808536_1012 terminating
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:46204 is added to 
blk_-1132408246207808536_1012 size 1492596
 [exec] [junit] 09/08/14 01:04:13 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:53076 is added to 
blk_-1132408246207808536_1012 size 1492596
 [exec] [junit] 09/08/14 01:04:13 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908140103_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/14 01:04:13 INFO fs.FSNamesystem: Reducing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908140103_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/14 01:04:14 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/had

[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743045#action_12743045
 ] 

Hadoop QA commented on PIG-845:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416501/merge-join.patch
  against trunk revision 803377.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 13 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 162 release audit warnings 
(more than the trunk's current 161 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/162/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/162/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/162/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/162/console

This message is automatically generated.

> PERFORMANCE: Merge Join
> ---
>
> Key: PIG-845
> URL: https://issues.apache.org/jira/browse/PIG-845
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Ashutosh Chauhan
> Attachments: merge-join.patch, merge-join.patch
>
>
> This join would work if the data for both tables is sorted on the join key.
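
The precondition in the description (both inputs sorted on the join key) is what lets a merge join run in a single forward pass over each input. The following is a generic Python sketch of the textbook sort-merge technique, not the code in merge-join.patch; the function name merge_join and the row layout are assumptions made for illustration.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists that are ALREADY sorted by key, in one pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1          # left key is behind: advance left
        elif kl > kr:
            j += 1          # right key is behind: advance right
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Emit the cross product of the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append((l_row, r_row))
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "aa"), (2, "ab"), (4, "x")]
right = [(1, "b"), (2, "bb"), (3, "c")]
result = merge_join(left, right)
```

Neither cursor ever moves backwards, so unsorted input would silently drop matches. That is why the sort order is a requirement rather than an optimization hint.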




Build failed in Hudson: Pig-Patch-minerva.apache.org #163

2009-08-13 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/163/

--
started
Building remotely on minerva.apache.org (Ubuntu)
Updating http://svn.apache.org/repos/asf/hadoop/pig/trunk
Fetching 'http://svn.apache.org/repos/asf/hadoop/nightly/test-patch' at -1 into 
'http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/trunk/test/bin'
 
At revision 804068
At revision 804068
no change for http://svn.apache.org/repos/asf/hadoop/pig/trunk since the 
previous build
no change for http://svn.apache.org/repos/asf/hadoop/nightly/test-patch since 
the previous build
[Pig-Patch-minerva.apache.org] $ /bin/bash /tmp/hudson3426357161446270002.sh
/home/hudson/tools/java/latest1.6/bin/java
Buildfile: build.xml

check-for-findbugs:

findbugs.check:

java5.check:

forrest.check:

hudson-test-patch:
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Testing patch for PIG-823.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Reverted 'test/org/apache/pig/test/TestMRCompiler.java'
 [exec] Reverted 'test/org/apache/pig/test/utils/LogicalPlanTester.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapBase.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MultiQueryOptimizer.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceOper.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/LogToPhyTranslationVisitor.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLoad.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POFRJoin.java'
 [exec] Reverted 
'src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java'
 [exec] Reverted 'src/org/apache/pig/impl/builtin/RandomSampleLoader.java'
 [exec] Reverted 
'src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt'
 [exec] Reverted 'src/org/apache/pig/impl/logicalLayer/LOJoin.java'
 [exec] Reverted 'src/org/apache/pig/impl/util/MultiMap.java'
 [exec] 
 [exec] Fetching external item into 'test/bin'
 [exec] Atest/bin/test-patch.sh
 [exec] Updated external to revision 804068.
 [exec] 
 [exec] Updated to revision 804068.
 [exec] PIG-823 patch is being downloaded at Fri Aug 14 02:05:59 UTC 2009 
from
 [exec] 
http://issues.apache.org/jira/secure/attachment/12416510/owl_otherdeps.tgz
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Pre-building trunk to determine trunk number
 [exec] of release audit, javac, and Findbugs warnings.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] /home/hudson/tools/ant/latest/bin/ant  
-Djava5.home=/home/hudson/tools/java/latest1.5 
-Dforrest.home=/home/nigel/tools/forrest/latest -DPigPatchProcess= releaseaudit 
> 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/patchprocess/trunkReleaseAuditWarnings.txt
  2>&1
 [exec] /home/hudson/tools/ant/latest/bin/ant  -Djavac.args=-Xlint 
-Xmaxwarns 1000 -Declipse.home=/home/nigel/tools/eclipse/latest 
-Djava5.home=/home/hudson/tools/java/latest1.5 
-Dforrest.home=/home/nigel/tools/forrest/latest -DPigPatchProcess= clean tar > 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/ws/patchprocess/trunkJavacWarnings.txt
  2>&1
 [exec] /home/hudson/tools/ant/latest/bin/ant  
-Dfindbugs.home=/home/nigel/tools/findbugs/latest 
-Djava5.home=/home/hudson/tools/java/latest1.5 
-Dforrest.home=/home/nigel/tools/forrest/latest -DPigPatchProcess= findbugs > 
/dev/null 2>&1
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
=

[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743053#action_12743053
 ] 

Hadoop QA commented on PIG-823:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416510/owl_otherdeps.tgz
  against trunk revision 803377.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/163/console

This message is automatically generated.

> Hadoop Metadata Service
> ---
>
> Key: PIG-823
> URL: https://issues.apache.org/jira/browse/PIG-823
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
> Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, 
> owl_otherdeps.tgz
>
>
> This JIRA is created to track development of a metadata system for Hadoop. 
> The goal of the system is to allow users and applications to register data 
> stored on HDFS, search for the data available on HDFS, and associate metadata 
> such as schema, statistics, etc. with a particular data unit or a data set 
> stored on HDFS. The initial goal is to provide a fairly generic, low-level 
> abstraction that any user or application on HDFS can use to store and retrieve 
> metadata. Over time, higher-level abstractions closely tied to particular 
> applications or tools can be developed.
> Over time, it would make sense for the metadata service to become a 
> subproject within Hadoop. For now, the proposal is to make it a contrib to 
> Pig since Pig SQL is likely to be the first user of the system.
