[ 
https://issues.apache.org/jira/browse/PIG-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648078#comment-13648078
 ] 

Johnny Zhang commented on PIG-3223:
-----------------------------------

[~rohini], could you please review the latest patch 
https://issues.apache.org/jira/secure/attachment/12581645/PIG-3223.patch.txt ?
The newly added test cases in TestAvroStorage also pass cleanly. Please let me 
know of any concerns regarding the implementation, and I will revise it as soon 
as possible. Let me know if you want me to post another patch for the 0.11 
branch too.

I also tried Viraj's patch 'PIG-3223.viraj.txt'; it does not apply cleanly on trunk:
{noformat}
patching file contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
patching file contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
Hunk #2 succeeded at 45 (offset 1 line).
Hunk #3 succeeded at 72 (offset 1 line).
Hunk #4 succeeded at 91 with fuzz 1 (offset 1 line).
Hunk #5 succeeded at 1005 (offset 19 lines).
{noformat}

Also, the TestAvroStorage test failed:
{noformat}
    <error message="Error during parsing. java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:" type="org.apache.pig.impl.logicalLayer.FrontendException">org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1670)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1608)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:565)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:578)
        at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testAvroStorage(TestAvroStorage.java:1058)
        at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testAvroStorage(TestAvroStorage.java:1051)
        at org.apache.pig.piggybank.test.storage.avro.TestAvroStorage.testComma1(TestAvroStorage.java:1020)
Caused by: Failed to parse: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1661)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
        at org.apache.hadoop.fs.Path.initialize(Path.java:148)
        at org.apache.hadoop.fs.Path.&lt;init&gt;(Path.java:126)
        at org.apache.hadoop.fs.Path.&lt;init&gt;(Path.java:50)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1084)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
        at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1023)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:987)
        at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.getAllSubDirs(AvroStorageUtils.java:120)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:387)
        at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
        at org.apache.pig.newplan.logical.relational.LOLoad.&lt;init&gt;(LOLoad.java:88)
        at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:856)
        at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3256)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1335)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:819)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:537)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:412)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:181)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 4: test_glob1.avro,file:
        at java.net.URI$Parser.fail(URI.java:2809)
        at java.net.URI$Parser.checkChars(URI.java:2982)
        at java.net.URI$Parser.parse(URI.java:3009)
        at java.net.URI.&lt;init&gt;(URI.java:736)
        at org.apache.hadoop.fs.Path.initialize(Path.java:145)
{noformat}
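
For anyone reading the trace, the innermost exception can be reproduced with java.net.URI alone, without Hadoop. The sketch below is only an illustration: it assumes the single-string URI constructor (the trace does not show which constructor Path.initialize uses), and the class and method names are made up for this example. Because the comma-joined string is handed over as a single path, everything before the first ':' is read as a URI scheme, and the '_' at index 4 of "test_glob1.avro,file:" is not a legal scheme character.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of Pig or Hadoop.
public class CommaPathUriDemo {

    // Returns the index at which URI parsing fails, or -1 if the string parses.
    static int uriFailureIndex(String candidate) {
        try {
            new URI(candidate);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // The fragment quoted in the stack trace: two paths joined by a comma,
        // cut off right after the second path's "file:" scheme.
        String joined = "test_glob1.avro,file:";
        System.out.println("joined -> fails at index " + uriFailureIndex(joined));

        // A single path with a scheme parses without error (path is illustrative).
        String single = "file:/tmp/test_glob1.avro";
        System.out.println("single -> " + uriFailureIndex(single));
    }
}
```

This matches the "index 4" in the message above, which suggests the whole comma-separated list reaches the glob code as one path.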
                
> AvroStorage does not handle comma separated input paths
> -------------------------------------------------------
>
>                 Key: PIG-3223
>                 URL: https://issues.apache.org/jira/browse/PIG-3223
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Michael Kramer
>            Assignee: Johnny Zhang
>         Attachments: AvroStorage.patch, AvroStorage.patch-2, 
> AvroStorageUtils.patch, AvroStorageUtils.patch-2, PIG-3223.patch.txt, 
> PIG-3223.patch.txt, PIG-3223.patch.txt, PIG-3223.patch.txt, PIG-3223.viraj.txt
>
>
> In pig 0.11, a patch was issued to AvroStorage to support globs and comma 
> separated input paths (PIG-2492).  While this function works fine for 
> glob-formatted input paths, it fails when issued a standard comma separated 
> list of paths.  fs.globStatus does not seem to be able to parse out such a 
> list, and a java.net.URISyntaxException is thrown when toURI is called on the 
> path.  
> I have a working fix for this, but it's extremely ugly (basically checking if 
> the string of input paths is globbed, otherwise splitting on ",").  I'm sure 
> there's a more elegant solution.  I'd be happy to post the relevant methods 
> and "fixes" if necessary.
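
The workaround described in the issue (check whether the input string is globbed, otherwise split on ",") could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from any attached patch: the class name, method names, and the set of characters treated as glob syntax are all hypothetical, and it deliberately avoids Hadoop classes so it can run on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative, not taken from AvroStorage.
public class InputPathSplitter {

    // Characters treated as glob syntax in this sketch (an assumption).
    private static final String GLOB_CHARS = "*?[]{}";

    // True if the location contains any character from GLOB_CHARS.
    static boolean looksGlobbed(String location) {
        for (int i = 0; i < location.length(); i++) {
            if (GLOB_CHARS.indexOf(location.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Globbed input is returned untouched as one entry; anything else is
    // split on commas, with empty pieces dropped.
    static List<String> splitInputPaths(String location) {
        List<String> paths = new ArrayList<>();
        if (looksGlobbed(location)) {
            paths.add(location);
            return paths;
        }
        for (String piece : location.split(",")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Illustrative paths only.
        System.out.println(splitInputPaths("test_glob1.avro,file:/tmp/test_glob2.avro"));
        System.out.println(splitInputPaths("/data/{a,b}/*.avro"));
    }
}
```

One limitation of this approach is that an input mixing both forms, such as a glob followed by a comma and a plain path, is left unsplit, which may be part of why the reporter calls the workaround ugly.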

