TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to estimateNumberOfReducers bug
--------------------------------------------------------------------------------------------------------

                 Key: PIG-1652
                 URL: https://issues.apache.org/jira/browse/PIG-1652
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Daniel Dai
             Fix For: 0.8.0

TestSortedTableUnion and TestSortedTableUnionMergeJoin fail on trunk due to the input size estimation. Here is the stack trace of TestSortedTableUnionMergeJoin:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias records3
    at org.apache.pig.PigServer.storeEx(PigServer.java:877)
    at org.apache.pig.PigServer.store(PigServer.java:815)
    at org.apache.pig.PigServer.openIterator(PigServer.java:727)
    at org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer(TestSortedTableUnionMergeJoin.java:203)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:326)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1197)
    at org.apache.pig.PigServer.storeEx(PigServer.java:873)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 69: org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
    at org.apache.hadoop.fs.Path.initialize(Path.java:140)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.fs.Path.<init>(Path.java:50)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:963)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:966)
    at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:902)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:866)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:844)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:715)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:688)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visitMROp(SampleOptimizer.java:140)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:246)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:41)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:71)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:52)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SampleOptimizer.visit(SampleOptimizer.java:69)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:491)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:116)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:301)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 69: org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:
    at java.net.URI$Parser.fail(URI.java:2809)
    at java.net.URI$Parser.checkChars(URI.java:2982)
    at java.net.URI$Parser.parse(URI.java:3009)
    at java.net.URI.<init>(URI.java:736)
    at org.apache.hadoop.fs.Path.initialize(Path.java:137)

The reason is that we call globStatus on a location string that is actually a comma-separated list of URLs rather than a single URL. Here is the location we get in JobControlCompiler.getTotalInputFileSize:

file:///homes/jianyong/pig2/build/contrib/zebra/test/data/org.apache.hadoop.zebra.pig.TestSortedTableUnion.testStorer1,file:///homes/jianyong/pig2/build/contrib/zebra/test/data/org.apache.hadoop.zebra.pig.TestSortedTableUnion.testStorer2

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
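
To illustrate the failure mode described above, here is a minimal, self-contained sketch in plain Java. It does not use Hadoop or Pig, and it is not the patch for this issue. The class InputLocationSplitter and its methods are hypothetical names introduced only for this example. It assumes that a location string may hold several comma-separated URLs, and that commas inside curly braces belong to a glob pattern and must not be treated as separators. The sketch shows that a path component that still carries the ",file:" of the next URL is rejected by java.net.URI, while each location parses on its own once the list has been split.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Pig or Hadoop. */
class InputLocationSplitter {

    /**
     * Splits a comma-separated location string into individual locations.
     * Commas inside curly braces are kept, because {a,b} is glob syntax.
     */
    static List<String> splitLocations(String locations) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int braceDepth = 0;
        for (char c : locations.toCharArray()) {
            if (c == '{') {
                braceDepth++;
            } else if (c == '}' && braceDepth > 0) {
                braceDepth--;
            }
            if (c == ',' && braceDepth == 0) {
                // Top-level comma: close the current location, skip empty ones.
                if (current.length() > 0) {
                    result.add(current.toString());
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }

    /** Returns true if java.net.URI accepts the string. */
    static boolean isParsableUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String base = "file:///homes/jianyong/pig2/build/contrib/zebra/test/data/";
        String combined = base + "org.apache.hadoop.zebra.pig.TestSortedTableUnion.testStorer1,"
                + base + "org.apache.hadoop.zebra.pig.TestSortedTableUnion.testStorer2";

        // The component named in the stack trace: the text before the colon
        // is read as a URI scheme, and a comma is not legal in a scheme.
        String badComponent =
                "org.apache.hadoop.zebra.pig.TestSortedTableUnionMergeJoin.testStorer1,file:";
        System.out.println("bad component parsable: " + isParsableUri(badComponent));

        // After splitting, each location can be handled separately.
        for (String location : splitLocations(combined)) {
            System.out.println(location + " parsable: " + isParsableUri(location));
        }
    }
}
```

Under these assumptions, a size estimate would glob each split location separately and sum the results, instead of passing the whole list to globStatus at once.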