[jira] Created: (PIG-1170) [zebra] end to end test and stress test

2009-12-23 Thread Jing Huang (JIRA)
[zebra] end to end test and stress test
---

 Key: PIG-1170
 URL: https://issues.apache.org/jira/browse/PIG-1170
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


Add test cases for the zebra end-to-end test, the stress test, and the stress 
test verification tool. 
No unit test is needed for this JIRA.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1164) [zebra]smoke test

2009-12-23 Thread Jing Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1164:


Attachment: smoke.patch

Patch for the zebra smoke test.

 [zebra]smoke test
 -

 Key: PIG-1164
 URL: https://issues.apache.org/jira/browse/PIG-1164
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: smoke.patch


 Change the zebra build.xml file to add a smoke target, 
 and add env.sh and a run script under the zebra/src/test/smoke dir.




[jira] Updated: (PIG-1170) [zebra] end to end test and stress test

2009-12-23 Thread Jing Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1170:


Attachment: e2eStress.patch

Zebra e2e and stress test patch.
No unit test is needed. 

 [zebra] end to end test and stress test
 ---

 Key: PIG-1170
 URL: https://issues.apache.org/jira/browse/PIG-1170
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: e2eStress.patch


 Add test cases for the zebra end-to-end test, the stress test, and the stress 
 test verification tool. 
 No unit test is needed for this JIRA.




[jira] Updated: (PIG-1164) [zebra]smoke test

2009-12-21 Thread Jing Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1164:


Attachment: (was: smoke.patch)

 [zebra]smoke test
 -

 Key: PIG-1164
 URL: https://issues.apache.org/jira/browse/PIG-1164
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


 Change the zebra build.xml file to add a smoke target, 
 and add env.sh and a run script under the zebra/src/test/smoke dir.




[jira] Created: (PIG-1164) [zebra]smoke test

2009-12-18 Thread Jing Huang (JIRA)
[zebra]smoke test
-

 Key: PIG-1164
 URL: https://issues.apache.org/jira/browse/PIG-1164
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


Change the zebra build.xml file to add a smoke target, 
and add env.sh and a run script under the zebra/src/test/smoke dir.





[jira] Updated: (PIG-1164) [zebra]smoke test

2009-12-18 Thread Jing Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1164:


Attachment: smoke.patch

Patch for the zebra smoke test. 
No unit test needed for this patch. 
Only changed build.xml to add smoke target and added environment setup file. 

 [zebra]smoke test
 -

 Key: PIG-1164
 URL: https://issues.apache.org/jira/browse/PIG-1164
 Project: Pig
  Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0

 Attachments: smoke.patch


 Change the zebra build.xml file to add a smoke target, 
 and add env.sh and a run script under the zebra/src/test/smoke dir.




[jira] Created: (PIG-1158) pig command line -M option doesn't support table union correctly (comma separated paths)

2009-12-16 Thread Jing Huang (JIRA)
pig command line -M option doesn't support table union correctly (comma 
separated paths)


 Key: PIG-1158
 URL: https://issues.apache.org/jira/browse/PIG-1158
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


For example, with load (1.txt,2.txt) USING org.apache.hadoop.zebra.pig.TableLoader()
I see this error on standard output:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: 
hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/1.txt,2.txt does not exist.
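
The error above suggests that the whole string "1.txt,2.txt" is resolved as one path. As a rough illustration of the expected behaviour (this is not Pig's or Zebra's code), the Python sketch below splits a comma-separated location and resolves each entry on its own. The helper name expand_locations and the base directory hdfs://namenode/user/hadoopqa are hypothetical.

```python
def expand_locations(location, base_dir):
    """Split a comma-separated location string and resolve each entry.

    Hypothetical helper for illustration only; it is not part of Pig or Zebra.
    """
    paths = []
    for part in location.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing comma
        if "://" in part or part.startswith("/"):
            paths.append(part)  # already qualified, keep unchanged
        else:
            paths.append(base_dir.rstrip("/") + "/" + part)
    return paths


if __name__ == "__main__":
    # Each entry becomes its own path under the (assumed) base directory.
    for p in expand_locations("1.txt,2.txt", "hdfs://namenode/user/hadoopqa"):
        print(p)
```

With this kind of expansion, each table in the union would be checked for existence separately instead of as a single file name that contains a comma.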





[jira] Created: (PIG-1159) merge join right side table does not support comma separated paths

2009-12-16 Thread Jing Huang (JIRA)
merge join right side table does not support comma separated paths
--

 Key: PIG-1159
 URL: https://issues.apache.org/jira/browse/PIG-1159
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


For example, this is my script (join_jira1.pig):

register /grid/0/dev/hadoopqa/jars/zebra.jar;

--a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
--a2 = load '2.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);

--sort1 = order a1 by a parallel 6;
--sort2 = order a2 by a parallel 5;

--store sort1 into 'asort1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
--store sort2 into 'asort2' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
--store sort1 into 'asort3' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
--store sort2 into 'asort4' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');

joinl = LOAD 'asort1,asort2' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted');

joinr = LOAD 'asort3,asort4' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted');


joina = join joinl by a, joinr by a using merge ;
dump joina;


==
Here is the log:
Backend error message
-
java.lang.IllegalArgumentException: Pathname 
/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 
from 
hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4
 is not a valid DFS filename.
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
at 
org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534)
at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:398)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:184)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)

Pig Stack Trace
---
ERROR 6015: During execution, encountered a Hadoop error.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias joina
at org.apache.pig.PigServer.openIterator(PigServer.java:482)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:386)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: 
During execution, encountered a Hadoop error.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534)
at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338)
at 

[jira] Commented: (PIG-1145) [zebra] merge join on large table (100,000,000 rows zebra table) failed

2009-12-11 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789188#action_12789188
 ] 

Jing Huang commented on PIG-1145:
-

Found another failure on merge join.
This merge join script failed:
register $zebraJar;
--fs -rmr $outputDir


--a1 = LOAD '$inputDir/unsorted1' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');
--a2 = LOAD '$inputDir/unsorted2' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');

--sort1 = order a1 by byte2;
--sort2 = order a2 by byte2;

--store sort1 into '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2];[byte2]');
--store sort2 into '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2];[byte2]');

rec1 = load '$outputDir/100Msortedbyte21' using 
org.apache.hadoop.zebra.pig.TableLoader('','sorted');
rec2 = load '$outputDir/100Msortedbyte22' using 
org.apache.hadoop.zebra.pig.TableLoader('','sorted');

joina = join rec1 by byte2, rec2 by byte2 using merge ;

E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as str2, 
$4 as byte2;

store E into '$outputDir/bad1' using 
org.apache.hadoop.zebra.pig.TableStorer('');
=
In contrast, this similar script works with the previous patch:
register $zebraJar;
--fs -rmr $outputDir


a1 = LOAD '$inputDir/unsorted1' USING 
org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');
a2 = LOAD '$inputDir/unsorted2' USING 
org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');

sort1 = order a1 by byte2;
sort2 = order a2 by byte2;

store sort1 into '$outputDir/100Msortedbyte21' using 
org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2,byte2]');
store sort2 into '$outputDir/100Msortedbyte22' using 
org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2,byte2]');

rec1 = load '$outputDir/100Msortedbyte21' using 
org.apache.hadoop.zebra.pig.TableLoader('','sorted');
rec2 = load '$outputDir/100Msortedbyte22' using 
org.apache.hadoop.zebra.pig.TableLoader('','sorted');

joina = join rec1 by byte2, rec2 by byte2 using merge ;

E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as str2, 
$4 as byte2;

store E into '$outputDir/join3' using 
org.apache.hadoop.zebra.pig.TableStorer('');

Here is stack trace:
Backend error message
-
org.apache.pig.backend.executionengine.ExecException: ERROR 2176: Error 
processing right input during merge join
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.throwProcessingException(POMergeJoin.java:453)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:443)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:337)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.EOFException: No key-value to read
at 
org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.checkKey(TFile.java:1590)
at 
org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.entry(TFile.java:1611)
at 
org.apache.hadoop.zebra.io.ColumnGroup$Reader$TFileScanner.getKey(ColumnGroup.java:854)
at 
org.apache.hadoop.zebra.io.ColumnGroup$Reader$CGScanner.getCGKey(ColumnGroup.java:1035)
at 
org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getKey(BasicTable.java:1083)
at 
org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableRecordReader.java:105)
at org.apache.hadoop.zebra.pig.TableLoader.getNext(TableLoader.java:414)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:415)
... 9 more
=
This is how I ran it (I disabled pruning to simplify the possible problem):
java -cp 
/grid/0/dev/hadoopqa/jing1234/conf:/grid/0/dev/hadoopqa/jars/pig.jar:/grid/0/dev/hadoopqa/jars/tfile.jar:/grid/0/dev/hadoopqa/jars/zebra.jar
 org.apache.pig.Main -m config -M -t PruneColumns bad_join.pig 



 [zebra] 

[jira] Commented: (PIG-1142) Got NullPointerException merge join with pruning

2009-12-10 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788949#action_12788949
 ] 

Jing Huang commented on PIG-1142:
-

Verified fix on pig 0.6.0 branch. 
Fix works.

 Got NullPointerException merge join with pruning
 

 Key: PIG-1142
 URL: https://issues.apache.org/jira/browse/PIG-1142
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1142-1.patch


 Here is my pig script:
 register $zebraJar;
 --fs -rmr $outputDir
 a1 = LOAD '$inputDir/small1' USING 
 org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
 a2 = LOAD '$inputDir/small2' USING 
 org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
 sort1 = order a1 by str2;
 sort2 = order a2 by str2;
 --store sort1 into '$outputDir/smallsorted11' using 
 org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
 --store sort2 into '$outputDir/smallsorted21' using 
 org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
 rec1 = load '$outputDir/smallsorted11' using 
 org.apache.hadoop.zebra.pig.TableLoader();
 rec2 = load '$outputDir/smallsorted21' using 
 org.apache.hadoop.zebra.pig.TableLoader();
 joina = join rec1 by str2, rec2 by str2 using merge ;
 E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as 
 str2;
 --limitedVals = LIMIT E 5;
 --dump limitedVals;
 store E into '$outputDir/smalljoin2' using 
 org.apache.hadoop.zebra.pig.TableStorer('');
 
 Here is the stacktrace:
 java.lang.NullPointerException
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:312)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.extractKeysFromTuple(POMergeJoin.java:464)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:341)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:159)




[jira] Commented: (PIG-1145) [zebra] merge join on large table (100,000,000 rows zebra table) failed

2009-12-10 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788952#action_12788952
 ] 

Jing Huang commented on PIG-1145:
-

I verified fix. 
It works.

 [zebra] merge join on large table (100,000,000 rows zebra table) failed
 

 Key: PIG-1145
 URL: https://issues.apache.org/jira/browse/PIG-1145
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1145.patch


 Pig script :
 register $zebraJar;
 --fs -rmr $outputDir
 a1 = LOAD '$inputDir/unsorted1' USING 
 org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
 a2 = LOAD '$inputDir/unsorted2' USING 
 org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
 sort1 = order a1 by str2;
 sort2 = order a2 by str2;
 --store sort1 into '$outputDir/sorted11' using 
 org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
 --store sort2 into '$outputDir/sorted21' using 
 org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
 rec1 = load '$outputDir/sorted11' using 
 org.apache.hadoop.zebra.pig.TableLoader();
 rec2 = load '$outputDir/sorted21' using 
 org.apache.hadoop.zebra.pig.TableLoader();
 joina = join rec1 by str2, rec2 by str2 using merge ;
 --E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as 
 str2;
 store joina into '$outputDir/join1' using 
 org.apache.hadoop.zebra.pig.TableStorer('');
 ==
 stacktrace:
 org.apache.pig.backend.executionengine.ExecException: ERROR 2176: Error processing right input during merge join
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.throwProcessingException(POMergeJoin.java:453)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:443)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:337)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:159)
 Caused by: java.io.EOFException: No key-value to read
 at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.checkKey(TFile.java:1590)
 at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.entry(TFile.java:1611)
 at org.apache.hadoop.zebra.io.ColumnGroup$Reader$TFileScanner.getKey(ColumnGroup.java:854)
 at org.apache.hadoop.zebra.io.ColumnGroup$Reader$CGScanner.getCGKey(ColumnGroup.java:1035)
 at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getKey(BasicTable.java:1082)
 at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableRecordReader.java:105)
 at org.apache.hadoop.zebra.pig.TableLoader.getNext(TableLoader.java:414)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:415)
 ... 7 more




[jira] Created: (PIG-1142) Got NullPointerException merge join with pruning

2009-12-09 Thread Jing Huang (JIRA)
Got NullPointerException merge join with pruning


 Key: PIG-1142
 URL: https://issues.apache.org/jira/browse/PIG-1142
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


Here is my pig script:
register $zebraJar;
--fs -rmr $outputDir


a1 = LOAD '$inputDir/small1' USING 
org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
a2 = LOAD '$inputDir/small2' USING 
org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');

sort1 = order a1 by str2;
sort2 = order a2 by str2;

--store sort1 into '$outputDir/smallsorted11' using 
org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
--store sort2 into '$outputDir/smallsorted21' using 
org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');

rec1 = load '$outputDir/smallsorted11' using 
org.apache.hadoop.zebra.pig.TableLoader();
rec2 = load '$outputDir/smallsorted21' using 
org.apache.hadoop.zebra.pig.TableLoader();

joina = join rec1 by str2, rec2 by str2 using merge ;

E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as str2;

--limitedVals = LIMIT E 5;
--dump limitedVals;

store E into '$outputDir/smalljoin2' using 
org.apache.hadoop.zebra.pig.TableStorer('');




Here is the stacktrace:
java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:312)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.extractKeysFromTuple(POMergeJoin.java:464)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:341)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)





[jira] Created: (PIG-1145) [zebra] merge join on large table (100,000,000 rows zebra table) failed

2009-12-09 Thread Jing Huang (JIRA)
[zebra] merge join on large table (100,000,000 rows zebra table) failed


 Key: PIG-1145
 URL: https://issues.apache.org/jira/browse/PIG-1145
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang


Pig script :
register $zebraJar;
--fs -rmr $outputDir


a1 = LOAD '$inputDir/unsorted1' USING 
org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
a2 = LOAD '$inputDir/unsorted2' USING 
org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');

sort1 = order a1 by str2;
sort2 = order a2 by str2;

--store sort1 into '$outputDir/sorted11' using 
org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
--store sort2 into '$outputDir/sorted21' using 
org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');

rec1 = load '$outputDir/sorted11' using 
org.apache.hadoop.zebra.pig.TableLoader();
rec2 = load '$outputDir/sorted21' using 
org.apache.hadoop.zebra.pig.TableLoader();

joina = join rec1 by str2, rec2 by str2 using merge ;

--E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as 
str2;


store joina into '$outputDir/join1' using 
org.apache.hadoop.zebra.pig.TableStorer('');
==
stacktrace:
org.apache.pig.backend.executionengine.ExecException: ERROR 2176: Error processing right input during merge join
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.throwProcessingException(POMergeJoin.java:453)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:443)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:337)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.EOFException: No key-value to read
at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.checkKey(TFile.java:1590)
at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.entry(TFile.java:1611)
at org.apache.hadoop.zebra.io.ColumnGroup$Reader$TFileScanner.getKey(ColumnGroup.java:854)
at org.apache.hadoop.zebra.io.ColumnGroup$Reader$CGScanner.getCGKey(ColumnGroup.java:1035)
at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getKey(BasicTable.java:1082)
at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableRecordReader.java:105)
at org.apache.hadoop.zebra.pig.TableLoader.getNext(TableLoader.java:414)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:415)
... 7 more






[jira] Updated: (PIG-1121) [zebra] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully

2009-12-07 Thread Jing Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1121:


Fix Version/s: (was: 0.6.0)
   0.7.0

 [zebra] zebra user forces pig script to have 'as xxx' in foreach statement in 
 order to be able to store successfully
 

 Key: PIG-1121
 URL: https://issues.apache.org/jira/browse/PIG-1121
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


 In the following pig script, if the user writes 
 b =  foreach a generate m1#'a' ; 
 then describe b gives:
 b: {bytearray}
 The zebra store will fail because no field name is passed to zebra, and zebra 
 needs both the type and the name in order to store. 
 =
 If the user writes 
 b =  foreach a generate m1#'a' as ms1;
 then describe b gives:
 b: {ms1: bytearray}
 and the zebra store succeeds. 
 =
 Here is the full pig script. 
 register /grid/0/dev/hadoopqa/jars/zebra.jar;
 a = load '1.txt' as (a:int, 
 b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 b =  foreach a generate m1#'a' as ms1;
 describe b;
 store b into 'map1' using org.apache.hadoop.zebra.pig.TableStorer('');
 
 So, we should either fix it or document it. 
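
 If the choice is to fix it by failing early with a clear message, the check could look something like the Python sketch below. It only illustrates the idea of finding fields that have a type but no name; the (name, type) pair representation and the function name unnamed_fields are assumptions for this example, not Pig's or Zebra's schema API.

```python
def unnamed_fields(schema):
    """Return the positions of fields that have a type but no name.

    `schema` is a list of (name, type) pairs. This representation is an
    assumption made for the example, not Pig's or Zebra's schema class.
    """
    return [i for i, (name, _type) in enumerate(schema) if not name]


# b = foreach a generate m1#'a';         describe b -> b: {bytearray}
without_alias = [(None, "bytearray")]
# b = foreach a generate m1#'a' as ms1;  describe b -> b: {ms1: bytearray}
with_alias = [("ms1", "bytearray")]

if __name__ == "__main__":
    print(unnamed_fields(without_alias))
    print(unnamed_fields(with_alias))
```

 A storer could run such a check before writing and report which column positions need an 'as xxx' alias.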




[jira] Updated: (PIG-1120) [zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint

2009-12-07 Thread Jing Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1120:


Fix Version/s: (was: 0.6.0)
   0.7.0

 [zebra] should support  using org.apache.hadoop.zebra.pig.TableStorer() if 
 user does not want to specify storage hint
 -

 Key: PIG-1120
 URL: https://issues.apache.org/jira/browse/PIG-1120
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


 If the user doesn't want to specify a storage hint, the current zebra implementation 
 only supports  using org.apache.hadoop.zebra.pig.TableStorer('')  (note the empty 
 string in TableStorer('')).
 We should also support  using 
 org.apache.hadoop.zebra.pig.TableStorer(), as we do for  using 
 org.apache.hadoop.zebra.pig.TableLoader().
 Sample pig script:
 register /grid/0/dev/hadoopqa/jars/zebra.jar;
 a = load '1.txt' as (a:int, 
 b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 b = load '2.txt' as (a:int, 
 b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 c = join a by a, b by a;
 d = foreach c generate a::a, a::b, b::c;
 describe d;
 dump d;
 store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer('');
 --this will fail
 --store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer( );
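
 The requested behaviour amounts to treating a missing storage hint the same as an empty one. A minimal Python sketch of that idea follows; TableStorerSketch is a toy stand-in invented for this example and is not the Zebra TableStorer class.

```python
class TableStorerSketch:
    """Toy stand-in for a storer; not the Zebra TableStorer class."""

    def __init__(self, storage_hint=""):
        # With no argument, fall back to the empty hint, so that
        # TableStorerSketch() and TableStorerSketch('') behave the same.
        self.storage_hint = storage_hint


if __name__ == "__main__":
    print(TableStorerSketch().storage_hint == TableStorerSketch("").storage_hint)
```

 In the Java implementation, the analogous change would presumably be a no-argument constructor that delegates to the existing one with an empty string, but that is an assumption about how the fix might be written.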




[jira] Commented: (PIG-1120) [zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint

2009-12-07 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787083#action_12787083
 ] 

Jing Huang commented on PIG-1120:
-

 For apache trunk

 [zebra] should support  using org.apache.hadoop.zebra.pig.TableStorer() if 
 user does not want to specify storage hint
 -

 Key: PIG-1120
 URL: https://issues.apache.org/jira/browse/PIG-1120
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


 If the user doesn't want to specify a storage hint, the current zebra implementation 
 only supports  using org.apache.hadoop.zebra.pig.TableStorer('')  (note the empty 
 string in TableStorer('')).
 We should also support  using 
 org.apache.hadoop.zebra.pig.TableStorer(), as we do for  using 
 org.apache.hadoop.zebra.pig.TableLoader().
 Sample pig script:
 register /grid/0/dev/hadoopqa/jars/zebra.jar;
 a = load '1.txt' as (a:int, 
 b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 b = load '2.txt' as (a:int, 
 b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 c = join a by a, b by a;
 d = foreach c generate a::a, a::b, b::c;
 describe d;
 dump d;
 store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer('');
 --this will fail
 --store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer( );




[jira] Commented: (PIG-1121) [zebra] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully

2009-12-07 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787084#action_12787084
 ] 

Jing Huang commented on PIG-1121:
-

for apache trunk

 [zebre] zebra user forces pig script to have 'as xxx' in foreach statement in 
 order to be able to store successfully
 

 Key: PIG-1121
 URL: https://issues.apache.org/jira/browse/PIG-1121
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.7.0


 In the following pig script, if the user does 
 b =  foreach a generate m1#'a' ; 
 describe b will be:
 b: {bytearray}
 The zebra store will fail, since no name is passed to zebra, and zebra needs 
 not only the type but also the name in order to store. 
 =
 If the user does 
 b =  foreach a generate m1#'a' as ms1;
 describe b will be:
 b: {ms1: bytearray}
 Then the zebra store succeeds. 
 =
 Here is the full pig script. 
 register /grid/0/dev/hadoopqa/jars/zebra.jar;
 a = load '1.txt' as (a:int, 
 b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 b =  foreach a generate m1#'a' as ms1;
 describe b;
 store b into 'map1' using org.apache.hadoop.zebra.pig.TableStorer('');
 
 So, we should either fix it or document it. 




[jira] Commented: (PIG-1119) [zebra] group is a Pig preserved word, zebra needs to use other string for table's group information

2009-12-03 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785662#action_12785662
 ] 

Jing Huang commented on PIG-1119:
-

Here is the pig script for the test (group_wrong.pig):

register /grid/0/dev/hadoopqa/jars/zebra.jar;
A = load 'filter.txt' as (name:chararray, age:int);
B = group A by name;
C = foreach B generate group, COUNT(A.name) as cnt;
Store C into 'group1' using 
org.apache.hadoop.zebra.pig.TableStorer('[group];[cnt]');

===
Here is the error message before the fix:
Backend error message during job submission
---
java.io.IOException: ColumnGroup.Writer constructor failed : Partition 
constructor failed :Encountered  group group  at line 1, column 2.
Was expecting one of:
IDENTIFIER ...
] ...

    at org.apache.hadoop.zebra.io.BasicTable$Writer.<init>(BasicTable.java:1259)
    at org.apache.hadoop.zebra.pig.TableOutputFormat.checkOutputSpecs(TableStorer.java:135)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:619)

Pig Stack Trace
---
ERROR 2997: Unable to recreate exception from backend error: 
java.io.IOException: ColumnGroup.Writer constructor failed : Partition 
constructor failed :Encountered  group group  at line 1, column 2.

org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to 
recreate exception from backend error: java.io.IOException: ColumnGroup.Writer 
constructor failed : Partition constructor failed :Encountered  group group 
 at line 1, column 2.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:176)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:253)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
    at org.apache.pig.PigServer.execute(PigServer.java:773)
    at org.apache.pig.PigServer.access$100(PigServer.java:89)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:951)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:248)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:386)


Now with the patch we can successfully create the table with the same script.
$HADOOP_HOME/bin/hadoop fs -cat group1/.btschema
[group];[cnt]cnt:long
 group:strincnt:long 
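
For illustration only, here is a minimal sketch of splitting a storage hint such as '[group];[cnt]' into column groups while treating group as an ordinary column name. The class StorageHintSketch and its parse method are hypothetical; this is not the actual zebra parser and not the content of PIG-1119.patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of zebra: splits "[a,b];[c]" into column groups.
class StorageHintSketch {
    static List<List<String>> parse(String hint) {
        List<List<String>> groups = new ArrayList<List<String>>();
        for (String part : hint.split(";")) {
            String body = part.trim();
            if (body.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            if (!body.startsWith("[") || !body.endsWith("]")) {
                throw new IllegalArgumentException("Expected [col,...] but got: " + body);
            }
            List<String> columns = new ArrayList<String>();
            // Every name between the brackets is taken literally, including "group".
            for (String col : body.substring(1, body.length() - 1).split(",")) {
                if (!col.trim().isEmpty()) {
                    columns.add(col.trim());
                }
            }
            groups.add(columns);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(parse("[group];[cnt]"));
    }
}
```

The point of the sketch is only that no column name is special-cased; the real fix may work differently.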

 [zebra] group is a Pig preserved word, zebra needs to use other string for 
 table's group information
 --

 Key: PIG-1119
 URL: https://issues.apache.org/jira/browse/PIG-1119
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.6.0

 Attachments: PIG-1119.patch







[jira] Created: (PIG-1120) [zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint

2009-12-02 Thread Jing Huang (JIRA)
[zebra] should support  using org.apache.hadoop.zebra.pig.TableStorer() if user 
does not want to specify storage hint
-

 Key: PIG-1120
 URL: https://issues.apache.org/jira/browse/PIG-1120
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.6.0


If the user doesn't want to specify a storage hint, the current zebra 
implementation only supports  using org.apache.hadoop.zebra.pig.TableStorer('')  
(note the empty string passed to TableStorer('')).

We should also support the format  using 
org.apache.hadoop.zebra.pig.TableStorer(), as we already do for  using 
org.apache.hadoop.zebra.pig.TableLoader().

sample pig script:
register /grid/0/dev/hadoopqa/jars/zebra.jar;
a = load '1.txt' as (a:int, 
b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);

b = load '2.txt' as (a:int, 
b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);


c = join a by a, b by a;
d = foreach c generate a::a, a::b, b::c;
describe d;
dump d;
store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer('');
--this will fail
--store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer( );
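
One possible shape of the requested behaviour, sketched under assumptions: a no-argument constructor that delegates to the existing one-argument form with an empty hint. The class StorerSketch is hypothetical and is not the org.apache.hadoop.zebra.pig.TableStorer source.

```java
// Hypothetical sketch, not the real TableStorer.
class StorerSketch {
    private final String storageHint;

    // Requested form: TableStorer() behaves exactly like TableStorer('').
    StorerSketch() {
        this("");
    }

    // Existing form: an explicit (possibly empty) storage hint.
    StorerSketch(String storageHint) {
        this.storageHint = storageHint == null ? "" : storageHint.trim();
    }

    String getStorageHint() {
        return storageHint;
    }

    public static void main(String[] args) {
        System.out.println(new StorerSketch().getStorageHint().isEmpty());
    }
}
```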





[jira] Created: (PIG-1121) [zebre] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully

2009-12-02 Thread Jing Huang (JIRA)
[zebre] zebra user forces pig script to have 'as xxx' in foreach statement in 
order to be able to store successfully


 Key: PIG-1121
 URL: https://issues.apache.org/jira/browse/PIG-1121
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.6.0


In the following pig script, if the user does 
b =  foreach a generate m1#'a' ; 

describe b will be:
b: {bytearray}
The zebra store will fail, since no name is passed to zebra, and zebra needs 
not only the type but also the name in order to store. 

=
If the user does 
b =  foreach a generate m1#'a' as ms1;

describe b will be:
b: {ms1: bytearray}

Then the zebra store succeeds. 

=
Here is the full pig script. 
register /grid/0/dev/hadoopqa/jars/zebra.jar;
a = load '1.txt' as (a:int, 
b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);

b =  foreach a generate m1#'a' as ms1;
describe b;

store b into 'map1' using org.apache.hadoop.zebra.pig.TableStorer('');



So, we should either fix it or document it. 
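
If the choice is to document or report the problem rather than fix it, a check along these lines could give a clearer error before the store starts. This is a sketch under assumptions: ColumnNameCheckSketch and unnamedColumns are hypothetical names, and the list of aliases stands in for whatever schema object Pig passes to zebra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of zebra or Pig.
class ColumnNameCheckSketch {
    // Returns the positions of columns that have no alias (null or blank).
    static List<Integer> unnamedColumns(List<String> aliases) {
        List<Integer> missing = new ArrayList<Integer>();
        for (int i = 0; i < aliases.size(); i++) {
            String alias = aliases.get(i);
            if (alias == null || alias.trim().isEmpty()) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // b: {bytearray} has one column without a name; b: {ms1: bytearray} has none.
        System.out.println(unnamedColumns(Arrays.asList((String) null)));
        System.out.println(unnamedColumns(Arrays.asList("ms1")));
    }
}
```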




[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

2009-12-01 Thread Jing Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-1111:


Description: 
Zebra enables applications to stream data into different zebra table instances.

New interface added:

setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, Class<? 
extends ZebraOutputPartitioner> theClass)

Zebra maintains a list of table instances based on commaSeparatedLocation (in 
that order).

The ZebraOutputPartitioner interface has a getOutputPartition method, which is 
implemented by the application. It returns an index into the list, and Zebra 
writes to that instance.

We also introduce a new mapred property for setting multiple outputs:

mapred.lib.table.multi.output.dirs
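
As a rough illustration of the routing described above, the sketch below splits the comma-separated locations into an ordered list and uses the index returned by a partitioner to pick the target table. The nested OutputPartitioner interface, its String record argument, and the route method are assumptions made for this example; the real ZebraOutputPartitioner signature is not given in this description.

```java
import java.util.Arrays;
import java.util.List;

class MultipleOutputsSketch {
    // Hypothetical stand-in for ZebraOutputPartitioner; the real signature may differ.
    interface OutputPartitioner {
        int getOutputPartition(String record);
    }

    // Picks the table location for a record using the index the partitioner returns.
    static String route(String commaSeparatedLocation, OutputPartitioner partitioner, String record) {
        List<String> tables = Arrays.asList(commaSeparatedLocation.split(","));
        int index = partitioner.getOutputPartition(record);
        if (index < 0 || index >= tables.size()) {
            throw new IllegalArgumentException("Partition index out of range: " + index);
        }
        return tables.get(index).trim();
    }

    public static void main(String[] args) {
        // Example application rule (made up): records starting with "us" go to the first table.
        OutputPartitioner byPrefix = new OutputPartitioner() {
            public int getOutputPartition(String record) {
                return record.startsWith("us") ? 0 : 1;
            }
        };
        System.out.println(route("/tables/us,/tables/other", byPrefix, "us-123"));
    }
}
```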
 

  was:

Zebra enables applications to stream data into different zebra table instances.

New interface added:

setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, Class<? 
extends ZebraOutputPartitioner> theClass)

Zebra maintains a list of table instances based on commaSeparatedLocation (in 
that order).

The ZebraOutputPartitioner interface has a getOutputPartition method, which is 
implemented by the application. It returns an index into the list, and Zebra 
writes to that instance.

We also introduce a new mapred property for setting multiple outputs:

mapred.lib.table.multi.output.dirs
 

Summary: [Zebra] multiple outputs support  (was: [Zebra])

 [Zebra] multiple outputs support
 

 Key: PIG-1111
 URL: https://issues.apache.org/jira/browse/PIG-1111
 Project: Pig
  Issue Type: New Feature
Reporter: Gaurav Jain
Assignee: Gaurav Jain
 Fix For: 0.6.0, 0.7.0


 Zebra enables applications to stream data into different zebra table instances.
 New interface added:
 setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, Class<? 
 extends ZebraOutputPartitioner> theClass)
 Zebra maintains a list of table instances based on commaSeparatedLocation (in 
 that order).
 The ZebraOutputPartitioner interface has a getOutputPartition method, which is 
 implemented by the application. It returns an index into the list, and Zebra 
 writes to that instance.
 We also introduce a new mapred property for setting multiple outputs:
 mapred.lib.table.multi.output.dirs
  




[jira] Created: (PIG-1119) [zebra] group is a Pig preserved word, zebra needs to use other string for table's group information

2009-12-01 Thread Jing Huang (JIRA)
[zebra] group is a Pig preserved word, zebra needs to use other string for 
table's group information
--

 Key: PIG-1119
 URL: https://issues.apache.org/jira/browse/PIG-1119
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.6.0







[jira] Created: (PIG-1056) table can not be loaded after store

2009-10-27 Thread Jing Huang (JIRA)
table can not be loaded after store
---

 Key: PIG-1056
 URL: https://issues.apache.org/jira/browse/PIG-1056
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang


Pig Stack Trace
---
ERROR 1018: Problem determining schema during load

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
parsing. Problem determining schema during load
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1023)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:967)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:716)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:397)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem determining schema during load
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017)
    ... 8 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: Problem determining schema during load
    at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732)
    ... 10 more
Caused by: java.io.IOException: No table specified for input
    at org.apache.hadoop.zebra.pig.TableLoader.checkConf(TableLoader.java:238)
    at org.apache.hadoop.zebra.pig.TableLoader.determineSchema(TableLoader.java:258)
    at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148)
    ... 11 more


script:
register /grid/0/dev/hadoopqa/hadoop/lib/zebra.jar;
A = load 'filter.txt' as (name:chararray, age:int);

B = filter A by age  20;
--dump B;
store B into 'filter1' using 
org.apache.hadoop.zebra.pig.TableStorer('[name];[age]');
rec1 = load 'B' using org.apache.hadoop.zebra.pig.TableLoader();
dump rec1;





[jira] Commented: (PIG-996) [zebra] Zebra build script does not have findbugs and clover targets.

2009-10-20 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767905#action_12767905
 ] 

Jing Huang commented on PIG-996:


+1
New patch reviewed.

 [zebra] Zebra build script does not have findbugs and clover targets.
 -

 Key: PIG-996
 URL: https://issues.apache.org/jira/browse/PIG-996
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: patch_build, patch_build


 Zebra build script does not have findbugs and clover targets, leading hudson 
 build process to fail on Zebra.
 This jira is to fix this by adding these two targets.




[jira] Commented: (PIG-1026) [zebra] map split returns null

2009-10-20 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767983#action_12767983
 ] 

Jing Huang commented on PIG-1026:
-

Created a customer scenario with this schema and storage hint 
(TestJira1026.java):

 final static String STR_SCHEMA = "bcookie:bytes,yuid:bytes, ip:bytes,query_term:bytes,clickinfo:map(String),demog:map(String),page_params:map(String),viewinfo:collection(f1:map(String))";

 final static String STR_STORAGE = "[bcookie,yuid,ip,query_term];[clickinfo#{pos|sec|slk|targurl|cost|gpos},page_params#{ipc|vtestid|frcode|pagenum|query}];[clickinfo,page_params,demog];[viewinfo]";

Got a NullPointerException.

 [zebra] map split returns null
 --

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0

 Attachments: MultipleKeyInMapSplitException.patch


 Here is the test scenario:
  final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
  //final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
  final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
 projection: String projection2 = new String("m1#{b}, m2#{x|z}");
 The user got a null pointer exception on reading m1#{b}.
 Yan, please refer to the test class:
 TestNonDefaultWholeMapSplit.java 




[jira] Created: (PIG-1026) [zebra] map split returns null

2009-10-16 Thread Jing Huang (JIRA)
[zebra] map split returns null
--

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0


Here is the test scenario:
 final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
 //final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
 final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";

projection: String projection2 = new String("m1#{b}, m2#{x|z}");
The user got a null pointer exception on reading m1#{b}.

Yan, please refer to the test class:
TestNonDefaultWholeMapSplit.java 
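
To make the scenario easier to reason about, the following sketch lists which column groups of a storage hint name a given map key explicitly. It is an illustration under assumptions, not zebra code: MapSplitSketch and explicitGroups are hypothetical, and the sketch says nothing about how zebra resolves a key that appears in more than one group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of zebra.
class MapSplitSketch {
    // Returns the indices of the column groups that list `key` of `map` explicitly,
    // for example the group "[m1#{b}, m2#{z}]" lists key b of m1.
    static List<Integer> explicitGroups(String storageHint, String map, String key) {
        List<Integer> result = new ArrayList<Integer>();
        String prefix = map + "#{";
        String[] groups = storageHint.split(";");
        for (int i = 0; i < groups.length; i++) {
            String body = groups[i].trim();
            if (body.length() < 2) {
                continue;
            }
            body = body.substring(1, body.length() - 1); // drop the enclosing [ and ]
            for (String column : body.split(",")) {
                String c = column.trim();
                if (c.startsWith(prefix) && c.endsWith("}")) {
                    // Keys inside the braces are separated by '|'.
                    for (String k : c.substring(prefix.length(), c.length() - 1).split("\\|")) {
                        if (k.trim().equals(key)) {
                            result.add(i);
                            break;
                        }
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String hint = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
        System.out.println(explicitGroups(hint, "m1", "b"));
        System.out.println(explicitGroups(hint, "m2", "x"));
    }
}
```

With the storage hint from this report, key b of m1 is named in one group only, while key x of m2 is named in two.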




[jira] Commented: (PIG-996) [zebra] Zebra build script does not have findbugs and clover targets.

2009-10-07 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763324#action_12763324
 ] 

Jing Huang commented on PIG-996:


+1 
Patch reviewed.

 [zebra] Zebra build script does not have findbugs and clover targets.
 -

 Key: PIG-996
 URL: https://issues.apache.org/jira/browse/PIG-996
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: patch_build


 Zebra build script does not have findbugs and clover targets, leading hudson 
 build process to fail on Zebra.
 This jira is to fix this by adding these two targets.




[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-11 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754353#action_12754353
 ] 

Jing Huang commented on PIG-949:


Thanks Alok. 
I am able to reproduce the problem. 
I was using only the i/o layer (not the pig loader) to test the map split. 
This is what I did:
  final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
  final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
... create table and insert data ...

load:  String projection = new String("m1#{a}");

Only null was returned. 

Without the storage hint [m1], everything works fine, i.e. 
  final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}]";
... create table and insert data ...
load:  String projection = new String("m1#{a}");
I am able to get the value of m1#{a}. 

The Zebra team is working on the fix.



 Zebra Bug: splitting map into multiple column group using storage hint causes 
 unexpected behaviour
 --

 Key: PIG-949
 URL: https://issues.apache.org/jira/browse/PIG-949
 Project: Pig
  Issue Type: Bug
 Environment: linux
Reporter: Alok Singh

 Hi,
 The storage hint specification plays an important part in whether the output 
 table is readable or not.
 Say we have a map named 'map'.
 One can split the map into a column group using [map#{k1}, map#{k2}...]; 
 the remaining map fields will automatically be added to the default group.
 If the user tries to create a new column group for the remaining fields as 
 follows:
 [map#{k1}, map#{k2}, ..][map], i.e. creates a separate column group,
 the table writer will create the table.
 However, if one tries to load the created table via pig or via map reduce 
 using TableInputFormat, then the reader has a problem reading the map.
 We get the following stack trace:
 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : 
 attempt_200908191538_33939_m_21_2, Status : FAILED
 java.io.IOException: getValue() failed: null
     at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
     at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
     at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
     at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Alok




[jira] Commented: (PIG-833) Storage access layer

2009-09-01 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750093#action_12750093
 ] 

Jing Huang commented on PIG-833:


Hi Yongqiang, 
Sorry for the late reply. I was out of town last week. 
Right, SF_F is not defined in the schema. Querying a non-existent column is 
allowed, and it returns null.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-833) Storage access layer

2009-08-19 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745125#action_12745125
 ] 

Jing Huang commented on PIG-833:


Zebra supports int, long, float, double, bool, collection (equivalent to Pig 
Bag), map, record (equivalent to Pig Tuple), string, bytes (equivalent to Pig 
Bytearray)
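
The three equivalences named above can be written down as a small lookup table. This is only a sketch: ZebraPigTypesSketch and pigEquivalent are hypothetical names, and the table deliberately contains nothing beyond the pairs stated in this comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup, not part of zebra.
class ZebraPigTypesSketch {
    private static final Map<String, String> PIG_EQUIVALENT = new HashMap<String, String>();
    static {
        PIG_EQUIVALENT.put("collection", "bag");
        PIG_EQUIVALENT.put("record", "tuple");
        PIG_EQUIVALENT.put("bytes", "bytearray");
    }

    // Returns the Pig name for a Zebra type, or null when no equivalent is named above.
    static String pigEquivalent(String zebraType) {
        return PIG_EQUIVALENT.get(zebraType.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(pigEquivalent("collection"));
        System.out.println(pigEquivalent("record"));
        System.out.println(pigEquivalent("bytes"));
    }
}
```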

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-917) [zebra]some issues on compression

2009-08-13 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742920#action_12742920
 ] 

Jing Huang commented on PIG-917:


Oops, pig store not pig loader. :)

 [zebra]some issues on compression
 -

 Key: PIG-917
 URL: https://issues.apache.org/jira/browse/PIG-917
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Jing Huang
 Fix For: 0.4.0


 These are zebra compression related issues:
 1. ColumnGoupParser only recognizes gzip, not gz. For example, if the user 
 specifies compress by gz, it throws 
 org.apache.hadoop.zebra.types.ParseException.
 2. BasicTable.dumpInfo is wrong. It always prints Compressor: lzo2, even 
 if the default compressor is gz or the user specifies compress by gzip.
 So we cannot verify whether the default compressor can actually be overridden. 
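
For the first issue, one way to accept both spellings is to normalize the compressor name before it is used. The sketch below is an assumption-laden illustration: CompressorNameSketch and normalize are hypothetical, the accepted names are just the ones mentioned in this issue, and treating gz as the canonical spelling is a guess rather than zebra's documented behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of zebra.
class CompressorNameSketch {
    private static final Map<String, String> CANONICAL = new HashMap<String, String>();
    static {
        CANONICAL.put("gz", "gz");
        CANONICAL.put("gzip", "gz");   // alias, so "compress by gzip" and "compress by gz" agree
        CANONICAL.put("lzo2", "lzo2");
    }

    // Maps a user-supplied compressor name to one canonical spelling.
    static String normalize(String name) {
        String canonical = CANONICAL.get(name.trim().toLowerCase());
        if (canonical == null) {
            throw new IllegalArgumentException("Unknown compressor: " + name);
        }
        return canonical;
    }

    public static void main(String[] args) {
        System.out.println(normalize("gzip").equals(normalize("gz")));
    }
}
```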




[jira] Updated: (PIG-917) [zebra]some issues on compression

2009-08-12 Thread Jing Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Huang updated PIG-917:
---

Affects Version/s: (was: 0.1.0)
   0.3.0
Fix Version/s: (was: 0.2.0)
   0.4.0

 [zebra]some issues on compression
 -

 Key: PIG-917
 URL: https://issues.apache.org/jira/browse/PIG-917
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Jing Huang
 Fix For: 0.4.0


 These are zebra compression related issues:
 1. ColumnGoupParser only recognizes gzip, not gz. For example, if the user 
 specifies compress by gz, it throws 
 org.apache.hadoop.zebra.types.ParseException.
 2. BasicTable.dumpInfo is wrong. It always prints Compressor: lzo2, even 
 if the default compressor is gz or the user specifies compress by gzip.
 So we cannot verify whether the default compressor can actually be overridden. 
