[jira] Created: (PIG-1170) [zebra] end to end test and stress test
[zebra] end to end test and stress test
Key: PIG-1170
URL: https://issues.apache.org/jira/browse/PIG-1170
Project: Pig
Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0

Add test cases for the Zebra end-to-end test, the stress test, and the stress test verification tool. No unit test is needed for this JIRA.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1164) [zebra]smoke test
[ https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Huang updated PIG-1164:
Attachment: smoke.patch

Patch for the Zebra smoke test.

[zebra]smoke test
Key: PIG-1164
URL: https://issues.apache.org/jira/browse/PIG-1164
Project: Pig
Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0
Attachments: smoke.patch

Change the Zebra build.xml file to add a smoke target, and add env.sh and a run script under the zebra/src/test/smoke directory.
[jira] Updated: (PIG-1170) [zebra] end to end test and stress test
[ https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Huang updated PIG-1170:
Attachment: e2eStress.patch

Zebra e2e and stress test patch. No unit test is needed.

[zebra] end to end test and stress test
Key: PIG-1170
URL: https://issues.apache.org/jira/browse/PIG-1170
Project: Pig
Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0
Attachments: e2eStress.patch

Add test cases for the Zebra end-to-end test, the stress test, and the stress test verification tool. No unit test is needed for this JIRA.
[jira] Updated: (PIG-1164) [zebra]smoke test
[ https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Huang updated PIG-1164:
Attachment: (was: smoke.patch)

[zebra]smoke test
Key: PIG-1164
URL: https://issues.apache.org/jira/browse/PIG-1164
Project: Pig
Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0

Change the Zebra build.xml file to add a smoke target, and add env.sh and a run script under the zebra/src/test/smoke directory.
[jira] Created: (PIG-1164) [zebra]smoke test
[zebra]smoke test
Key: PIG-1164
URL: https://issues.apache.org/jira/browse/PIG-1164
Project: Pig
Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0

Change the Zebra build.xml file to add a smoke target, and add env.sh and a run script under the zebra/src/test/smoke directory.
[jira] Updated: (PIG-1164) [zebra]smoke test
[ https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Huang updated PIG-1164:
Attachment: smoke.patch

Patch for the Zebra smoke test. No unit test is needed for this patch: it only changes build.xml to add a smoke target and adds an environment setup file.

[zebra]smoke test
Key: PIG-1164
URL: https://issues.apache.org/jira/browse/PIG-1164
Project: Pig
Issue Type: Test
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0
Attachments: smoke.patch

Change the Zebra build.xml file to add a smoke target, and add env.sh and a run script under the zebra/src/test/smoke directory.
[jira] Created: (PIG-1158) pig command line -M option doesn't support table union correctly (comma seperated paths)
pig command line -M option doesn't support table union correctly (comma seperated paths)
Key: PIG-1158
URL: https://issues.apache.org/jira/browse/PIG-1158
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0

For example, with load '1.txt,2.txt' USING org.apache.hadoop.zebra.pig.TableLoader(), I see this error on standard output:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/1.txt,2.txt does not exist.
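Note: the error shows the whole location 1.txt,2.txt being checked as a single path. As a minimal sketch (not the patch that resolved PIG-1158), a location string could be split into individual paths before each one is checked. LocationSplitter is a hypothetical name; the brace handling assumes Hadoop-style glob alternatives such as {a,b} must stay intact, similar to the brace-aware rule Hadoop's FileInputFormat applies to comma-separated input paths.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a LOAD location such as "1.txt,2.txt" into individual paths. */
final class LocationSplitter {
    private LocationSplitter() {}

    /**
     * Splits on commas that are not inside a {...} glob group, so "a,{b,c}"
     * yields ["a", "{b,c}"]. Segments are trimmed and empty ones are dropped.
     */
    static List<String> split(String location) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0;
        for (char ch : location.toCharArray()) {
            if (ch == '{') {
                braceDepth++;
            } else if (ch == '}' && braceDepth > 0) {
                braceDepth--;
            } else if (ch == ',' && braceDepth == 0) {
                // Top-level comma: close the current path and start a new one.
                addIfNotBlank(paths, current);
                current.setLength(0);
                continue;
            }
            current.append(ch);
        }
        addIfNotBlank(paths, current);
        return paths;
    }

    private static void addIfNotBlank(List<String> paths, StringBuilder segment) {
        String trimmed = segment.toString().trim();
        if (!trimmed.isEmpty()) {
            paths.add(trimmed);
        }
    }
}
```

For example, LocationSplitter.split("1.txt,2.txt") returns the two entries 1.txt and 2.txt, each of which can then be checked for existence on its own.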
[jira] Created: (PIG-1159) merge join right side table does not support comma seperated paths
merge join right side table does not support comma seperated paths
Key: PIG-1159
URL: https://issues.apache.org/jira/browse/PIG-1159
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0

For example, this is my script (join_jira1.pig):

register /grid/0/dev/hadoopqa/jars/zebra.jar;
--a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
--a2 = load '2.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
--sort1 = order a1 by a parallel 6;
--sort2 = order a2 by a parallel 5;
--store sort1 into 'asort1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
--store sort2 into 'asort2' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
--store sort1 into 'asort3' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
--store sort2 into 'asort4' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
joinl = LOAD 'asort1,asort2' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted');
joinr = LOAD 'asort3,asort4' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted');
joina = join joinl by a, joinr by a using merge ;
dump joina;

Here is the log:

Backend error message
---------------------
java.lang.IllegalArgumentException: Pathname /user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 from hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
    at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
    at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534)
    at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:398)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:184)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)

Pig Stack Trace
---------------
ERROR 6015: During execution, encountered a Hadoop error.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joina
    at org.apache.pig.PigServer.openIterator(PigServer.java:482)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:386)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error.
    at .apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158)
    at .apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
    at .apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
    at .apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
    at .apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
    at .apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
    at .apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534)
    at .apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338)
    at
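Note: the log shows the right-side location asort3,asort4 being qualified as one path name, which glues a second hdfs: URI onto the first. A minimal sketch, assuming each comma-separated entry should instead be resolved against the working directory on its own, uses java.net.URI.resolve. LocationQualifier and the hdfs://namenode/... base are hypothetical, the sketch ignores {a,b} globs, and it is not the change made for PIG-1159.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: qualifies each entry of a comma-separated location separately. */
final class LocationQualifier {
    private LocationQualifier() {}

    /**
     * Resolves every comma-separated entry against the working directory, so
     * "asort3,asort4" yields two fully qualified URIs rather than one invalid
     * path name. The working directory must end with "/" for relative names to
     * land inside it. Assumes entries contain no commas of their own.
     */
    static List<URI> qualify(URI workingDir, String location) {
        List<URI> result = new ArrayList<>();
        for (String entry : location.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                // Relative names join the working directory; absolute paths keep
                // their own path but inherit the scheme and authority.
                result.add(workingDir.resolve(trimmed));
            }
        }
        return result;
    }
}
```

With a base of hdfs://namenode/user/hadoopqa/, the entries asort3 and asort4 resolve to hdfs://namenode/user/hadoopqa/asort3 and hdfs://namenode/user/hadoopqa/asort4, and each can be opened independently.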
[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed
[ https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789188#action_12789188 ]

Jing Huang commented on PIG-1145:

Found another failure in merge join. This merge join script failed:

register $zebraJar;
--fs -rmr $outputDir
--a1 = LOAD '$inputDir/unsorted1' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');
--a2 = LOAD '$inputDir/unsorted2' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');
--sort1 = order a1 by byte2;
--sort2 = order a2 by byte2;
--store sort1 into '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2];[byte2]');
--store sort2 into '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2];[byte2]');
rec1 = load '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableLoader('','sorted');
rec2 = load '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableLoader('','sorted');
joina = join rec1 by byte2, rec2 by byte2 using merge ;
E = foreach joina generate $0 as count, $1 as seed, $2 as int1, $3 as str2, $4 as byte2;
store E into '$outputDir/bad1' using org.apache.hadoop.zebra.pig.TableStorer('');

By contrast, this similar script works with the previous patch:

register $zebraJar;
--fs -rmr $outputDir
a1 = LOAD '$inputDir/unsorted1' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');
a2 = LOAD '$inputDir/unsorted2' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');
sort1 = order a1 by byte2;
sort2 = order a2 by byte2;
store sort1 into '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2,byte2]');
store sort2 into '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2,byte2]');
rec1 = load '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableLoader('','sorted');
rec2 = load '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableLoader('','sorted');
joina = join rec1 by byte2, rec2 by byte2 using merge ;
E = foreach joina generate $0 as count, $1 as seed, $2 as int1, $3 as str2, $4 as byte2;
store E into '$outputDir/join3' using org.apache.hadoop.zebra.pig.TableStorer('');

Here is the stack trace:

Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 2176: Error processing right input during merge join
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.throwProcessingException(POMergeJoin.java:453)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:443)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:337)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.EOFException: No key-value to read
    at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.checkKey(TFile.java:1590)
    at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.entry(TFile.java:1611)
    at org.apache.hadoop.zebra.io.ColumnGroup$Reader$TFileScanner.getKey(ColumnGroup.java:854)
    at org.apache.hadoop.zebra.io.ColumnGroup$Reader$CGScanner.getCGKey(ColumnGroup.java:1035)
    at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getKey(BasicTable.java:1083)
    at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableRecordReader.java:105)
    at org.apache.hadoop.zebra.pig.TableLoader.getNext(TableLoader.java:414)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:415)
    ... 9 more

This is how I ran it (I disabled pruning to simplify the problem):

java -cp /grid/0/dev/hadoopqa/jing1234/conf:/grid/0/dev/hadoopqa/jars/pig.jar:/grid/0/dev/hadoopqa/jars/tfile.jar:/grid/0/dev/hadoopqa/jars/zebra.jar org.apache.pig.Main -m config -M -t PruneColumns bad_join.pig

[zebra]
[jira] Commented: (PIG-1142) Got NullPointerException merge join with pruning
[ https://issues.apache.org/jira/browse/PIG-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788949#action_12788949 ]

Jing Huang commented on PIG-1142:

Verified the fix on the Pig 0.6.0 branch. The fix works.

Got NullPointerException merge join with pruning
Key: PIG-1142
URL: https://issues.apache.org/jira/browse/PIG-1142
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: PIG-1142-1.patch
[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed
[ https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788952#action_12788952 ]

Jing Huang commented on PIG-1145:

I verified the fix. It works.

[zebra] merge join on large table ( 100,000.000 rows zebra table) failed
Key: PIG-1145
URL: https://issues.apache.org/jira/browse/PIG-1145
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: Jing Huang
Assignee: Yan Zhou
Fix For: 0.6.0, 0.7.0
Attachments: PIG-1145.patch
[jira] Created: (PIG-1142) Got NullPointerException merge join with pruning
Got NullPointerException merge join with pruning
Key: PIG-1142
URL: https://issues.apache.org/jira/browse/PIG-1142
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0

Here is my Pig script:

register $zebraJar;
--fs -rmr $outputDir
a1 = LOAD '$inputDir/small1' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
a2 = LOAD '$inputDir/small2' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
sort1 = order a1 by str2;
sort2 = order a2 by str2;
--store sort1 into '$outputDir/smallsorted11' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
--store sort2 into '$outputDir/smallsorted21' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
rec1 = load '$outputDir/smallsorted11' using org.apache.hadoop.zebra.pig.TableLoader();
rec2 = load '$outputDir/smallsorted21' using org.apache.hadoop.zebra.pig.TableLoader();
joina = join rec1 by str2, rec2 by str2 using merge ;
E = foreach joina generate $0 as count, $1 as seed, $2 as int1, $3 as str2;
--limitedVals = LIMIT E 5;
--dump limitedVals;
store E into '$outputDir/smalljoin2' using org.apache.hadoop.zebra.pig.TableStorer('');

Here is the stack trace:

java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:312)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.extractKeysFromTuple(POMergeJoin.java:464)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:341)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)
[jira] Created: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed
[zebra] merge join on large table ( 100,000.000 rows zebra table) failed
Key: PIG-1145
URL: https://issues.apache.org/jira/browse/PIG-1145
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang

Pig script:

register $zebraJar;
--fs -rmr $outputDir
a1 = LOAD '$inputDir/unsorted1' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
a2 = LOAD '$inputDir/unsorted2' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
sort1 = order a1 by str2;
sort2 = order a2 by str2;
--store sort1 into '$outputDir/sorted11' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
--store sort2 into '$outputDir/sorted21' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
rec1 = load '$outputDir/sorted11' using org.apache.hadoop.zebra.pig.TableLoader();
rec2 = load '$outputDir/sorted21' using org.apache.hadoop.zebra.pig.TableLoader();
joina = join rec1 by str2, rec2 by str2 using merge ;
--E = foreach joina generate $0 as count, $1 as seed, $2 as int1, $3 as str2;
store joina into '$outputDir/join1' using org.apache.hadoop.zebra.pig.TableStorer('');

Stack trace:

org.apache.pig.backend.executionengine.ExecException: ERROR 2176: Error processing right input during merge join
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.throwProcessingException(POMergeJoin.java:453)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:443)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:337)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.EOFException: No key-value to read
    at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.checkKey(TFile.java:1590)
    at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.entry(TFile.java:1611)
    at org.apache.hadoop.zebra.io.ColumnGroup$Reader$TFileScanner.getKey(ColumnGroup.java:854)
    at org.apache.hadoop.zebra.io.ColumnGroup$Reader$CGScanner.getCGKey(ColumnGroup.java:1035)
    at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getKey(BasicTable.java:1082)
    at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableRecordReader.java:105)
    at org.apache.hadoop.zebra.pig.TableLoader.getNext(TableLoader.java:414)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:415)
    ... 7 more
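Note: in TFile, "No key-value to read" is the EOFException thrown when a key is requested from a scanner that is already positioned at its end; here it surfaces while the merge join reads its right input. The sketch below only illustrates that failure mode and the usual guard (check atEnd() before reading a key, then advance()), using a hypothetical in-memory SortedKeyScanner. It does not reproduce Zebra's classes and is not the PIG-1145 fix.

```java
import java.io.EOFException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a sorted key scanner (not Zebra's API). */
final class SortedKeyScanner {
    private final List<String> keys;
    private int position = 0;

    SortedKeyScanner(List<String> keys) {
        this.keys = new ArrayList<>(keys);
    }

    boolean atEnd() {
        return position >= keys.size();
    }

    /** Mirrors the failure in the trace: asking for a key at the end throws. */
    String getKey() throws EOFException {
        if (atEnd()) {
            throw new EOFException("No key-value to read");
        }
        return keys.get(position);
    }

    void advance() {
        if (!atEnd()) {
            position++;
        }
    }

    /** Guarded read loop: checks atEnd() before every key read. */
    static List<String> readAll(SortedKeyScanner scanner) throws EOFException {
        List<String> out = new ArrayList<>();
        while (!scanner.atEnd()) {
            out.add(scanner.getKey());
            scanner.advance();
        }
        return out;
    }
}
```

Calling getKey() on a scanner that readAll has already drained reproduces the same EOFException message, while the guarded loop itself never triggers it.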
[jira] Updated: (PIG-1121) [zebre] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully
[ https://issues.apache.org/jira/browse/PIG-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Huang updated PIG-1121:
Fix Version/s: 0.7.0 (was: 0.6.0)

[zebre] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully
Key: PIG-1121
URL: https://issues.apache.org/jira/browse/PIG-1121
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0

In the following Pig script, if the user writes

b = foreach a generate m1#'a' ;

then describe b gives

b: {bytearray}

and the Zebra store fails, because no field name is passed to Zebra: Zebra needs not only the type but also the name of each field in order to store it.

If the user writes

b = foreach a generate m1#'a' as ms1;

then describe b gives

b: {ms1: bytearray}

and the Zebra store succeeds.

Here is the full Pig script:

register /grid/0/dev/hadoopqa/jars/zebra.jar;
a = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
b = foreach a generate m1#'a' as ms1;
describe b;
store b into 'map1' using org.apache.hadoop.zebra.pig.TableStorer('');

So we should either fix this or document it.
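Note: since Zebra needs a name as well as a type for every stored field, a schema like b: {bytearray} cannot be stored. A minimal sketch of an up-front check that reports unnamed fields with a readable message (one way the "document it" option could be backed by a clear error) follows. StoreSchemaCheck and its Field type are hypothetical stand-ins, not Pig's schema classes or Zebra's TableStorer logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check: every field must carry a name before a Zebra-style store. */
final class StoreSchemaCheck {
    private StoreSchemaCheck() {}

    /** Minimal stand-in for a schema field: an optional alias plus a type name. */
    static final class Field {
        final String alias; // null when the script gave no "as" name
        final String type;

        Field(String alias, String type) {
            this.alias = alias;
            this.type = type;
        }
    }

    /** Returns the positions of fields that have no usable name. */
    static List<Integer> unnamedFields(List<Field> fields) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            String alias = fields.get(i).alias;
            if (alias == null || alias.isEmpty()) {
                missing.add(i);
            }
        }
        return missing;
    }

    /** Fails early with a readable message instead of failing inside the store. */
    static void requireNames(List<Field> fields) {
        List<Integer> missing = unnamedFields(fields);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Fields at positions " + missing
                    + " have no name; add 'as <name>' in the foreach statement");
        }
    }
}
```

For the schema b: {bytearray}, unnamedFields reports position 0; for b: {ms1: bytearray}, it reports nothing and requireNames passes.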
[jira] Updated: (PIG-1120) [zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint
[ https://issues.apache.org/jira/browse/PIG-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Huang updated PIG-1120:
Fix Version/s: 0.7.0 (was: 0.6.0)

[zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint
Key: PIG-1120
URL: https://issues.apache.org/jira/browse/PIG-1120
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0

If the user doesn't want to specify a storage hint, the current Zebra implementation only supports using org.apache.hadoop.zebra.pig.TableStorer('') (note the empty string argument). We should also support org.apache.hadoop.zebra.pig.TableStorer(), as we already do for org.apache.hadoop.zebra.pig.TableLoader().

Sample Pig script:

register /grid/0/dev/hadoopqa/jars/zebra.jar;
a = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
b = load '2.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
c = join a by a, b by a;
d = foreach c generate a::a, a::b, b::c;
describe d;
dump d;
store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer('');
--this will fail
--store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer( );
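Note: the requested behavior amounts to a no-argument constructor that means the same thing as an empty storage hint, since the USING clause TableStorer() needs a matching constructor on the storer class. A minimal sketch with a hypothetical StorerSketch class (not Zebra's TableStorer) follows. Treating a blank hint such as ' ' the same as '' is a design assumption of this sketch.

```java
/**
 * Hypothetical storer illustrating the request: the no-argument constructor
 * behaves exactly like passing an empty storage hint.
 */
final class StorerSketch {
    private final String storageHint;

    /** Equivalent to StorerSketch(""): no storage hint supplied. */
    StorerSketch() {
        this("");
    }

    StorerSketch(String storageHint) {
        // Assumption: null or blank hints are normalized to "no hint".
        this.storageHint = storageHint == null ? "" : storageHint.trim();
    }

    String storageHint() {
        return storageHint;
    }

    boolean hasStorageHint() {
        return !storageHint.isEmpty();
    }
}
```

Delegating from the no-argument constructor to the one-argument constructor keeps a single code path, so TableStorer() and TableStorer('') cannot drift apart.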
[jira] Commented: (PIG-1120) [zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint
[ https://issues.apache.org/jira/browse/PIG-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787083#action_12787083 ]

Jing Huang commented on PIG-1120:

For Apache trunk.

[zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint
Key: PIG-1120
URL: https://issues.apache.org/jira/browse/PIG-1120
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0
[jira] Commented: (PIG-1121) [zebre] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully
[ https://issues.apache.org/jira/browse/PIG-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12787084#action_12787084 ]

Jing Huang commented on PIG-1121:

For Apache trunk.

[zebre] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully
Key: PIG-1121
URL: https://issues.apache.org/jira/browse/PIG-1121
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Fix For: 0.7.0
[jira] Commented: (PIG-1119) [zebra] group is a Pig preserved word, zebra needs to use other string for table's group information
[ https://issues.apache.org/jira/browse/PIG-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785662#action_12785662 ] Jing Huang commented on PIG-1119: - Here is the Pig script for the test (group_wrong.pig):
register /grid/0/dev/hadoopqa/jars/zebra.jar;
A = load 'filter.txt' as (name:chararray, age:int);
B = group A by name;
C = foreach B generate group, COUNT(A.name) as cnt;
Store C into 'group1' using org.apache.hadoop.zebra.pig.TableStorer('[group];[cnt]');
Here is the error message before the fix:
Backend error message during job submission --- java.io.IOException: ColumnGroup.Writer constructor failed : Partition constructor failed :Encountered group group at line 1, column 2. Was expecting one of: IDENTIFIER ... ] ... at org.apache.hadoop.zebra.io.BasicTable$Writer.init(BasicTable.java:1259) at org.apache.hadoop.zebra.pig.TableOutputFormat.checkOutputSpecs(TableStorer.java:135) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730) at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247) at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279) at java.lang.Thread.run(Thread.java:619) Pig Stack Trace --- ERROR 2997: Unable to recreate exception from backend error: java.io.IOException: ColumnGroup.Writer constructor failed : Partition constructor failed :Encountered group group at line 1, column 2. org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backend error: java.io.IOException: ColumnGroup.Writer constructor failed : Partition constructor failed :Encountered group group at line 1, column 2. 
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:176) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:253) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780) at org.apache.pig.PigServer.execute(PigServer.java:773) at org.apache.pig.PigServer.access$100(PigServer.java:89) at org.apache.pig.PigServer$Graph.execute(PigServer.java:951) at org.apache.pig.PigServer.executeBatch(PigServer.java:248) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:386)
Now, with the patch, we can successfully create the table using the same script:
$HADOOP_HOME/bin/hadoop fs -cat group1/.btschema
[group];[cnt]cnt:long group:strincnt:long
[zebra] group is a Pig preserved word, zebra needs to use other string for table's group information -- Key: PIG-1119 URL: https://issues.apache.org/jira/browse/PIG-1119 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Fix For: 0.6.0 Attachments: PIG-1119.patch
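To illustrate why a column name such as group should be acceptable inside a storage hint, here is a minimal sketch of a hint parser that matches every bracketed name as a plain identifier, so no Pig keyword can collide with it. It is hypothetical, handles only the simple '[a,b];[c]' form used in this report (no map keys, no compression clauses), and is not Zebra's actual grammar.

```java
import java.util.ArrayList;
import java.util.List;

final class StorageHintSketch {
    /**
     * Parses a hint such as "[group];[cnt]" into column groups.
     * Names are matched as identifiers only; there is no keyword table,
     * so "group" is treated like any other column name.
     */
    static List<List<String>> parse(String hint) {
        List<List<String>> groups = new ArrayList<>();
        for (String part : hint.split(";")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            if (!p.startsWith("[") || !p.endsWith("]")) {
                throw new IllegalArgumentException("Expected [..] but got: " + p);
            }
            List<String> cols = new ArrayList<>();
            for (String name : p.substring(1, p.length() - 1).split(",")) {
                String n = name.trim();
                if (!n.matches("[A-Za-z_][A-Za-z0-9_]*")) {
                    throw new IllegalArgumentException("Bad column name: '" + n + "'");
                }
                cols.add(n);
            }
            groups.add(cols);
        }
        return groups;
    }
}
```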
[jira] Created: (PIG-1120) [zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint
[zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint - Key: PIG-1120 URL: https://issues.apache.org/jira/browse/PIG-1120 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Fix For: 0.6.0 If the user doesn't want to specify a storage hint, the current Zebra implementation only supports org.apache.hadoop.zebra.pig.TableStorer(''), i.e. with an empty string argument. We should also support org.apache.hadoop.zebra.pig.TableStorer(), as we already do for org.apache.hadoop.zebra.pig.TableLoader(). Sample Pig script:
register /grid/0/dev/hadoopqa/jars/zebra.jar;
a = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
b = load '2.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
c = join a by a, b by a;
d = foreach c generate a::a, a::b, b::c;
describe d;
dump d;
store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer('');
--this will fail:
--store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer( );
[jira] Created: (PIG-1121) [zebra] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully
[zebra] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully Key: PIG-1121 URL: https://issues.apache.org/jira/browse/PIG-1121 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Fix For: 0.6.0 In the following Pig script, if the user writes b = foreach a generate m1#'a'; then describe b shows b: {bytearray}, and the Zebra store fails: no field name is passed to Zebra, and Zebra needs both a type and a name for each field in order to store it. If the user instead writes b = foreach a generate m1#'a' as ms1; then describe b shows b: {ms1: bytearray}, and the Zebra store succeeds. Here is the full Pig script:
register /grid/0/dev/hadoopqa/jars/zebra.jar;
a = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
b = foreach a generate m1#'a' as ms1;
describe b;
store b into 'map1' using org.apache.hadoop.zebra.pig.TableStorer('');
So, we should either fix this or document it.
[jira] Updated: (PIG-1111) [Zebra] multiple outputs support
[ https://issues.apache.org/jira/browse/PIG-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Huang updated PIG-: Description: Zebra enables an application to stream data into different Zebra table instances. New interface added: setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, Class<? extends ZebraOutputPartitioner> theClass). Zebra maintains a list of table instances based on the locations in commaSeparatedLocation (in that order). The ZebraOutputPartitioner interface has a getOutputPartition method, which the application implements; it returns an index into that list, and Zebra writes to the table instance at that index. We also introduce a new mapred property for setting multiple outputs: mapred.lib.table.multi.output.dirs. Summary: [Zebra] multiple outputs support (was: [Zebra]) [Zebra] multiple outputs support Key: PIG- URL: https://issues.apache.org/jira/browse/PIG- Project: Pig Issue Type: New Feature Reporter: Gaurav Jain Assignee: Gaurav Jain Fix For: 0.6.0, 0.7.0 Zebra enables an application to stream data into different Zebra table instances. New interface added: setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, Class<? extends ZebraOutputPartitioner> theClass). Zebra maintains a list of table instances based on the locations in commaSeparatedLocation (in that order). The ZebraOutputPartitioner interface has a getOutputPartition method, which the application implements; it returns an index into that list, and Zebra writes to the table instance at that index. We also introduce a new mapred property for setting multiple outputs: mapred.lib.table.multi.output.dirs.
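A minimal sketch of the routing described above, assuming the partitioner is consulted once per record and its return value indexes the locations in the order they were given. Partitioner, MultiOutputSketch, and route are hypothetical names; the real ZebraOutputPartitioner signature may differ, and the mapred.lib.table.multi.output.dirs property is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiOutputSketch {
    /** Hypothetical analogue of ZebraOutputPartitioner; the real signature may differ. */
    interface Partitioner<K, V> {
        int getOutputPartition(K key, V value);
    }

    private final List<String> locations = new ArrayList<>();

    /** Keeps the table locations in the order given, skipping empty entries. */
    MultiOutputSketch(String commaSeparatedLocation) {
        for (String loc : commaSeparatedLocation.split(",")) {
            if (!loc.trim().isEmpty()) {
                locations.add(loc.trim());
            }
        }
    }

    /** Asks the application for an index and returns the table location to write to. */
    <K, V> String route(Partitioner<K, V> p, K key, V value) {
        int idx = p.getOutputPartition(key, value);
        if (idx < 0 || idx >= locations.size()) {
            throw new IndexOutOfBoundsException(
                "Partition " + idx + " not in [0, " + locations.size() + ")");
        }
        return locations.get(idx);
    }

    List<String> locations() {
        return locations;
    }
}
```

For example, with the (hypothetical) locations "/out/even,/out/odd" and a partitioner (k, v) -> k % 2, even keys go to /out/even and odd keys to /out/odd. Checking the returned index keeps a buggy partitioner from silently writing to the wrong table.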
[jira] Created: (PIG-1119) [zebra] group is a Pig preserved word, zebra needs to use other string for table's group information
[zebra] group is a Pig preserved word, zebra needs to use other string for table's group information -- Key: PIG-1119 URL: https://issues.apache.org/jira/browse/PIG-1119 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Fix For: 0.6.0
[jira] Created: (PIG-1056) table can not be loaded after store
table can not be loaded after store --- Key: PIG-1056 URL: https://issues.apache.org/jira/browse/PIG-1056 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Pig Stack Trace --- ERROR 1018: Problem determining schema during load org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Problem determining schema during load at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1023) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:967) at org.apache.pig.PigServer.registerQuery(PigServer.java:383) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:716) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:397) Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem determining schema during load at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017) ... 8 more Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: Problem determining schema during load at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732) ... 10 more Caused by: java.io.IOException: No table specified for input at org.apache.hadoop.zebra.pig.TableLoader.checkConf(TableLoader.java:238) at org.apache.hadoop.zebra.pig.TableLoader.determineSchema(TableLoader.java:258) at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148) ... 
11 more
Script:
register /grid/0/dev/hadoopqa/hadoop/lib/zebra.jar;
A = load 'filter.txt' as (name:chararray, age:int);
B = filter A by age 20;
--dump B;
store B into 'filter1' using org.apache.hadoop.zebra.pig.TableStorer('[name];[age]');
rec1 = load 'B' using org.apache.hadoop.zebra.pig.TableLoader();
dump rec1;
[jira] Commented: (PIG-996) [zebra] Zebra build script does not have findbugs and clover targets.
[ https://issues.apache.org/jira/browse/PIG-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767905#action_12767905 ] Jing Huang commented on PIG-996: +1 New patch reviewed. [zebra] Zebra build script does not have findbugs and clover targets. - Key: PIG-996 URL: https://issues.apache.org/jira/browse/PIG-996 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.4.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.6.0 Attachments: patch_build, patch_build The Zebra build script does not have findbugs and clover targets, which causes the Hudson build process to fail on Zebra. This jira fixes that by adding these two targets.
[jira] Commented: (PIG-1026) [zebra] map split returns null
[ https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767983#action_12767983 ] Jing Huang commented on PIG-1026: - Created a customer scenario with this schema and storage hint (TestJira1026.java):
final static String STR_SCHEMA = "bcookie:bytes,yuid:bytes, ip:bytes,query_term:bytes,clickinfo:map(String),demog:map(String),page_params:map(String),viewinfo:collection(f1:map(String))";
final static String STR_STORAGE = "[bcookie,yuid,ip,query_term];[clickinfo#{pos|sec|slk|targurl|cost|gpos},page_params#{ipc|vtestid|frcode|pagenum|query}];[clickinfo,page_params,demog];[viewinfo]";
Got a NullPointerException. [zebra] map split returns null -- Key: PIG-1026 URL: https://issues.apache.org/jira/browse/PIG-1026 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Assignee: Yan Zhou Fix For: 0.6.0 Attachments: MultipleKeyInMapSplitException.patch Here is the test scenario:
final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
//final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
Projection:
String projection2 = new String("m1#{b}, m2#{x|z}");
The user got a NullPointerException when reading m1#{b}. Yan, please refer to the test class TestNonDefaultWholeMapSplit.java.
[jira] Created: (PIG-1026) [zebra] map split returns null
[zebra] map split returns null -- Key: PIG-1026 URL: https://issues.apache.org/jira/browse/PIG-1026 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Assignee: Yan Zhou Fix For: 0.6.0 Here is the test scenario:
final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
//final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
Projection:
String projection2 = new String("m1#{b}, m2#{x|z}");
The user got a NullPointerException when reading m1#{b}. Yan, please refer to the test class TestNonDefaultWholeMapSplit.java.
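The map-split notation used in these projections, name#{k1|k2}, can be made concrete with a small parser that turns a projection string into "map column to requested keys". This is a hypothetical helper for reading the test case, not Zebra's projection parser; a plain column name (no #{...}) is recorded with an empty key set.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MapProjectionSketch {
    // One projected item: a column name, optionally followed by #{key|key|...}.
    private static final Pattern ITEM =
        Pattern.compile("\\s*([A-Za-z_][A-Za-z0-9_]*)(?:#\\{([^}]*)\\})?\\s*");

    /** Parses e.g. "m1#{b}, m2#{x|z}" into {m1=[b], m2=[x, z]}, keeping input order. */
    static Map<String, Set<String>> parse(String projection) {
        Map<String, Set<String>> out = new LinkedHashMap<>();
        for (String item : projection.split(",")) {
            Matcher m = ITEM.matcher(item);
            if (!m.matches()) {
                throw new IllegalArgumentException("Cannot parse: '" + item + "'");
            }
            Set<String> keys = out.computeIfAbsent(m.group(1), k -> new LinkedHashSet<>());
            if (m.group(2) != null) {
                // Keys inside the braces are separated by '|'; blanks are ignored.
                for (String key : m.group(2).split("\\|")) {
                    if (!key.trim().isEmpty()) {
                        keys.add(key.trim());
                    }
                }
            }
        }
        return out;
    }
}
```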
[jira] Commented: (PIG-996) [zebra] Zebra build script does not have findbugs and clover targets.
[ https://issues.apache.org/jira/browse/PIG-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763324#action_12763324 ] Jing Huang commented on PIG-996: +1 Patch reviewed. [zebra] Zebra build script does not have findbugs and clover targets. - Key: PIG-996 URL: https://issues.apache.org/jira/browse/PIG-996 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.4.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.6.0 Attachments: patch_build The Zebra build script does not have findbugs and clover targets, which causes the Hudson build process to fail on Zebra. This jira fixes that by adding these two targets.
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754353#action_12754353 ] Jing Huang commented on PIG-949: Thanks Alok. I am able to reproduce the problem. I had only been using the I/O layer (not the Pig loader) to test map splits. This is what I did:
final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
...create table and insert data...
Load: String projection = new String("m1#{a}");
I got only null back. Without the [m1] column group in the storage hint, everything works fine, i.e.:
final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}]";
...create table and insert data...
Load: String projection = new String("m1#{a}");
I am able to get the value of m1#{a}. The Zebra team is working on the fix. Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Environment: linux Reporter: Alok Singh Hi, the storage hint specification plays an important part in whether the output table is readable or not. Say we have the map 'map'. One can split the map into column groups using [map#{k1}, map#{k2}...]; the remaining map keys are then automatically added to the default group. If the user tries to create a new column group for the remaining keys, as follows: [map#{k1}, map#{k2}, ..][map], i.e. creates a separate column group, the table writer will create the table.
However, if one tries to load the created table via Pig, or via MapReduce using TableInputFormat, the reader has problems reading the map. We get the following stack trace:
09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170)
Alok
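The layout discussed in this issue (explicitly named keys go to their own column groups, and every other key of the map falls into a leftover group) can be sketched as a routing function from each key to a group index, modeled for one map column at a time. This is a hypothetical model of the expected behavior, assuming that a group naming the whole map (such as [m1]) receives the leftover keys and that otherwise they go to the default group; it describes what the reporters expected, not what Zebra did at the time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MapSplitSketch {
    /** Hypothetical index meaning "the table's default column group". */
    static final int DEFAULT_GROUP = -1;

    // For each column group: the explicit keys it holds for this map column,
    // or null if the group lists the whole map (e.g. [m1]).
    private final List<Set<String>> groups = new ArrayList<>();

    /** Adds a group holding specific keys, e.g. keys("a") for [m1#{a}]. */
    MapSplitSketch keys(String... ks) {
        groups.add(new HashSet<>(Arrays.asList(ks)));
        return this;
    }

    /** Adds a group that names the whole map, e.g. [m1]. */
    MapSplitSketch wholeMap() {
        groups.add(null);
        return this;
    }

    /** Returns the group index where a given key is expected to be stored. */
    int groupFor(String key) {
        int wholeMapGroup = DEFAULT_GROUP;
        for (int i = 0; i < groups.size(); i++) {
            Set<String> g = groups.get(i);
            if (g == null) {
                if (wholeMapGroup == DEFAULT_GROUP) {
                    wholeMapGroup = i; // first whole-map group collects leftovers
                }
            } else if (g.contains(key)) {
                return i; // explicitly split-out key
            }
        }
        return wholeMapGroup; // leftovers: whole-map group if any, else default
    }
}
```

For m1 under the hint "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]", the model is keys("a").keys().keys("b").wholeMap(): key a should be read from group 0, which is where the reported projection of m1#{a} came back null.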
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750093#action_12750093 ] Jing Huang commented on PIG-833: Hi Yongqiang, sorry for the late reply; I was out of town last week. Right, SF_F is not defined in the schema. Querying a non-existent column is allowed, and it returns null. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high-level data access abstraction and a tabular view of data in Hadoop, which could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a Hadoop subproject later on.
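The null-for-missing-column behavior described in the comment can be sketched with a row modeled as a Map<String, Object>. NullProjectionSketch and the map-based row are hypothetical and only mirror the stated semantics; they are not Zebra's reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class NullProjectionSketch {
    /**
     * Projects the requested columns out of one row. A column that is not in
     * the row (for example SF_F when it is absent from the schema) yields null
     * instead of an error.
     */
    static List<Object> project(Map<String, Object> row, List<String> columns) {
        List<Object> out = new ArrayList<>(columns.size());
        for (String c : columns) {
            out.add(row.get(c)); // Map.get returns null for a missing key
        }
        return out;
    }
}
```

One consequence of this design: a caller cannot tell a missing column apart from a present column whose value is null without consulting the schema.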
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12745125#action_12745125 ] Jing Huang commented on PIG-833: Zebra supports the following types: int, long, float, double, bool, collection (equivalent to Pig Bag), map, record (equivalent to Pig Tuple), string, and bytes (equivalent to Pig Bytearray). Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high-level data access abstraction and a tabular view of data in Hadoop, which could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a Hadoop subproject later on.
[jira] Commented: (PIG-917) [zebra]some issues on compression
[ https://issues.apache.org/jira/browse/PIG-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742920#action_12742920 ] Jing Huang commented on PIG-917: Oops, Pig store, not Pig loader. :) [zebra]some issues on compression - Key: PIG-917 URL: https://issues.apache.org/jira/browse/PIG-917 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Jing Huang Fix For: 0.4.0 These are Zebra compression-related issues:
1. ColumnGoupParser recognizes only gzip, not gz. For example, if the user specifies compress by gz, it throws org.apache.hadoop.zebra.types.ParseException.
2. BasicTable.dumpInfo is wrong. It always prints Compressor: lzo2, even if the default compressor is gz or the user specifies compress by gzip, so we cannot verify whether the default compressor can actually be overridden.
[jira] Updated: (PIG-917) [zebra]some issues on compression
[ https://issues.apache.org/jira/browse/PIG-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Huang updated PIG-917: --- Affects Version/s: (was: 0.1.0) 0.3.0 Fix Version/s: (was: 0.2.0) 0.4.0 [zebra]some issues on compression - Key: PIG-917 URL: https://issues.apache.org/jira/browse/PIG-917 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Jing Huang Fix For: 0.4.0 These are Zebra compression-related issues:
1. ColumnGoupParser recognizes only gzip, not gz. For example, if the user specifies compress by gz, it throws org.apache.hadoop.zebra.types.ParseException.
2. BasicTable.dumpInfo is wrong. It always prints Compressor: lzo2, even if the default compressor is gz or the user specifies compress by gzip, so we cannot verify whether the default compressor can actually be overridden.
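Issue 1 above can be illustrated with a sketch that normalizes the user's compress by value through an alias table, so gz and gzip resolve to the same canonical name. The alias table, including the choice of "gz" as the canonical spelling and the inclusion of "lzo", is an assumption for illustration; it is not Zebra's parser.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CompressorNameSketch {
    // Hypothetical alias table: both spellings of gzip map to one canonical name.
    private static final Map<String, String> ALIASES = new HashMap<>();

    static {
        ALIASES.put("gz", "gz");
        ALIASES.put("gzip", "gz");
        ALIASES.put("lzo", "lzo");
    }

    /** Maps a user-supplied 'compress by' value to a canonical name, or throws. */
    static String canonical(String userValue) {
        String key = userValue.trim().toLowerCase(Locale.ROOT);
        String name = ALIASES.get(key);
        if (name == null) {
            throw new IllegalArgumentException("Unsupported compressor: " + userValue);
        }
        return name;
    }
}
```

If dumpInfo then reported the canonical name actually used for each column group, rather than a fixed string, issue 2 would become verifiable as well.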