[ 
https://issues.apache.org/jira/browse/PIG-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419964#comment-13419964
 ] 

Cheolsoo Park commented on PIG-2492:
------------------------------------

Attached [^PIG-2492-4.patch] is the newest patch.

There is one thing that I'd like to mention, although I already discussed it on 
the review board.

I changed the type of the first parameter of AvroStorageUtils.getAllSubDirs() 
from URI to hadoop.fs.Path. This is needed because '{' and '}' are not allowed 
in a URI, so URI.create() throws an IllegalArgumentException (wrapping a 
URISyntaxException) on a glob pattern that contains those characters.

But these characters are automatically escaped when constructing a Path, so 
I construct a Path from the given glob pattern string and get a URI from that 
Path with Path.toUri().
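
To illustrate the difference without pulling in Hadoop, here is a minimal 
sketch that uses only java.net.URI. It assumes that Path escapes the glob 
characters by going through the multi-argument URI constructor, which 
percent-encodes characters that are illegal in a path. The class name, helper 
names, and sample glob are made up for this example and are not part of the 
patch.

```java
import java.net.URI;
import java.net.URISyntaxException;

class GlobUriDemo {
    // Returns true if URI.create() accepts the string as-is.
    static boolean parsesAsUri(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException e) {
            // URI.create() wraps the underlying URISyntaxException.
            return false;
        }
    }

    // Builds a URI from a bare path with the multi-argument constructor,
    // which quotes characters such as '{' and '}' instead of rejecting them.
    static URI toEscapedUri(String path) throws URISyntaxException {
        return new URI(null, null, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        String glob = "/data/{2012-06,2012-07}/*.avro"; // hypothetical glob
        System.out.println("URI.create accepts glob: " + parsesAsUri(glob));
        URI uri = toEscapedUri(glob);
        System.out.println("raw path:     " + uri.getRawPath());
        System.out.println("decoded path: " + uri.getPath());
    }
}
```

The decoded path is the original glob string again, so nothing is lost by 
the escaping; only the raw form of the URI changes.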

In fact, this reverts some changes made by PIG-2540 
(https://issues.apache.org/jira/browse/PIG-2540). However, this does not break 
S3 support because inside AvroStorageUtils.getAllSubDirs(), the file system is 
still constructed from a URI (now obtained via path.toUri()), and globStatus() 
is called on that file system.

{code}
FileSystem fs = FileSystem.get(path.toUri(), job.getConfiguration());
FileStatus[] matchedFiles = fs.globStatus(path);
{code}

So if path is an S3 URI, the S3 file system will be used.
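
As a rough check of that reasoning, the sketch below again uses only 
java.net.URI and shows that escaping the glob characters leaves the scheme 
and authority untouched, which are the parts FileSystem.get() looks at when 
choosing a file system. The bucket name, glob, and class name are 
hypothetical, and the multi-argument constructor stands in for what I assume 
Path does internally.

```java
import java.net.URI;
import java.net.URISyntaxException;

class S3GlobUriDemo {
    // Assembles a URI from its parts; illegal path characters are quoted,
    // while the scheme and authority are carried over unchanged.
    static URI escape(String scheme, String authority, String path)
            throws URISyntaxException {
        return new URI(scheme, authority, path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical bucket and glob pattern.
        URI uri = escape("s3", "my-bucket", "/logs/{a,b}/*.avro");
        System.out.println("scheme:    " + uri.getScheme());
        System.out.println("authority: " + uri.getAuthority());
        System.out.println("uri:       " + uri);
    }
}
```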

Please let me know if I am wrong. Thanks!
                
> AvroStorage should recognize globs and commas
> ---------------------------------------------
>
>                 Key: PIG-2492
>                 URL: https://issues.apache.org/jira/browse/PIG-2492
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.9.1, 0.10.0
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>         Attachments: AvroStorage.patch, AvroStorageUtils.patch, 
> PIG-2492-2.patch, PIG-2492-3.patch, PIG-2492-4.patch, PIG-2492.patch, 
> avro_test_files-2.tar.gz, avro_test_files.tar.gz
>
>
> I've patched AvroStorage and AvroStorageUtils to support the same file input 
> syntax as currently supported
> by hadoop's FileInputFormat.  Specifically, globs and commas are supported.
> Somebody should write some unit tests for these changes; I am currently 
> pressed for time. 


        
