[ https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648078#comment-13648078 ]
Johnny Zhang commented on PIG-3223:
-----------------------------------

[~rohini], could you please review the latest patch? https://issues.apache.org/jira/secure/attachment/12581645/PIG-3223.patch.txt
The newly added test cases in TestAvroStorage also pass. Please let me know of any concerns about the implementation, and I will revise it as soon as possible. Let me know if you would like me to post another patch for the 0.11 branch too.

I also tried Viraj's patch 'PIG-3223.viraj.txt'. It does not apply cleanly on trunk:
{noformat}
patching file contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
patching file contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
Hunk #2 succeeded at 45 (offset 1 line).
Hunk #3 succeeded at 72 (offset 1 line).
Hunk #4 succeeded at 91 with fuzz 1 (offset 1 line).
Hunk #5 succeeded at 1005 (offset 19 lines).
{noformat}
Also, TestAvroStorage fails in testComma1:
{noformat}
<error message="Error during parsing. java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:" type="org.apache.pig.impl.logicalLayer.FrontendException">org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. 
java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1670)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1608)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:565)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:578)
	at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testAvroStorage(TestAvroStorage.java:1058)
	at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testAvroStorage(TestAvroStorage.java:1051)
	at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testComma1(TestAvroStorage.java:1020)
Caused by: Failed to parse: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1661)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
	at org.apache.hadoop.fs.Path.initialize(Path.java:148)
	at org.apache.hadoop.fs.Path.<init>(Path.java:126)
	at org.apache.hadoop.fs.Path.<init>(Path.java:50)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1084)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
	at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1023)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:987)
	at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.getAllSubDirs(AvroStorageUtils.java:120)
	at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:387)
	at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
	at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:88)
	at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:856)
	at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3256)
	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1335)
	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:819)
	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:537)
	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:412)
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:181)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
	at java.net.URI$Parser.fail(URI.java:2809)
	at java.net.URI$Parser.checkChars(URI.java:2982)
	at java.net.URI$Parser.parse(URI.java:3009)
	at java.net.URI.<init>(URI.java:736)
	at org.apache.hadoop.fs.Path.initialize(Path.java:145)
{noformat}

> AvroStorage does not handle comma separated input paths
> -------------------------------------------------------
>
>                 Key: PIG-3223
>                 URL: https://issues.apache.org/jira/browse/PIG-3223
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Michael Kramer
>            Assignee: Johnny Zhang
>         Attachments: AvroStorage.patch, AvroStorage.patch-2, AvroStorageUtils.patch, AvroStorageUtils.patch-2, PIG-3223.patch.txt, PIG-3223.patch.txt, PIG-3223.patch.txt, PIG-3223.patch.txt, PIG-3223.viraj.txt
>
>
> In Pig 0.11, a patch was issued to AvroStorage to support globs and comma-separated input paths (PIG-2492). While this works fine for glob-formatted input paths, it fails when given a standard comma-separated list of paths. fs.globStatus does not seem to be able to parse such a list, and a java.net.URISyntaxException is thrown when toURI is called on the path.
> I have a working fix for this, but it's extremely ugly (basically checking if the string of input paths is globbed, otherwise splitting on ","). I'm sure there's a more elegant solution. I'd be happy to post the relevant methods and "fixes" if necessary.
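The reporter's workaround (detect a glob, otherwise split on ",") can be generalized into a single brace-aware split. The sketch below is a minimal illustration, not the code from any attached patch: the class `CommaPathSplitter` and its methods are hypothetical, and it assumes that commas inside a Hadoop glob alternation such as `{a,b}` must not be treated as path separators. It also uses plain `java.net.URI` to show that the unsplit string fails the same scheme check seen in the trace (the `_` at index 4 of the would-be scheme `test_glob1.avro,file` is not a legal scheme character), while each piece parses on its own.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of AvroStorage or any PIG-3223 patch.
public class CommaPathSplitter {

    // Splits a location string on commas that are NOT inside a {...} glob
    // alternation, so "a/{x,y}.avro,b.avro" yields ["a/{x,y}.avro", "b.avro"].
    public static List<String> splitTopLevelCommas(String location) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // current {...} nesting level
        int start = 0; // start index of the path currently being scanned
        for (int i = 0; i < location.length(); i++) {
            char c = location.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--; // an unmatched '}' is ignored
            } else if (c == ',' && depth == 0) {
                parts.add(location.substring(start, i).trim());
                start = i + 1;
            }
        }
        parts.add(location.substring(start).trim()); // last (or only) path
        return parts;
    }

    // Returns true if java.net.URI accepts the string unchanged.
    public static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed example input modeled on the paths in testComma1's error message.
        String location = "test_glob1.avro,file:/tmp/test_glob2.avro";
        // The whole string is rejected: "test_glob1.avro,file" is read as a scheme.
        System.out.println(parsesAsUri(location));
        // After splitting, each piece is a valid (relative or file:) URI.
        for (String p : splitTopLevelCommas(location)) {
            System.out.println(p + " -> " + parsesAsUri(p));
        }
    }
}
```

Running `main` prints `false`, then `test_glob1.avro -> true` and `file:/tmp/test_glob2.avro -> true`. In a real fix each piece would then be handed to `FileSystem.globStatus` separately; that wiring is left out of this sketch.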