[ 
https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincenz Priesnitz updated AVRO-867:
-----------------------------------

    Affects Version/s: 1.7.5
         Release Note: avro-tools can now access any Hadoop-supported filesystem 
when started via hadoop jar.
               Status: Patch Available  (was: Open)

The attached patch changes the Utils class to use the Hadoop FileSystem 
class. Most tools can now read input from, and write output to, any filesystem 
that Hadoop supports. 

Without any extra configuration, the tools behave as before:
{noformat}
# reads from local file system by default
# supports relative paths
java -jar avro-tools-1.7.5.jar tojson ~/myDir/myData.avro
{noformat}

If invoked via hadoop jar, the tools support more filesystems. Different 
filesystems can be used in a single call. Furthermore, any default filesystem 
that might be specified in core-site.xml is respected.
{noformat}
# combines an ftp file and a local file and writes the result file
# combinedData.avro directly to the default hdfs server.
hadoop jar avro-tools-1.7.5.jar concat ftp://myFtpServer/data1.avro \
    file:///home/user/data2.avro combinedData.avro
{noformat}
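
To illustrate the resolution rule described above, here is a minimal sketch in plain Java. It is not the code from the patch, which relies on Hadoop's FileSystem and Path classes. The class name PathQualifier, the method qualify, the namenode address, and the working directory are all hypothetical and chosen for the example only. The sketch also assumes the path contains no characters that are illegal in a URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of avro-tools: mimics how an unqualified
// path could be resolved against a default filesystem URI.
final class PathQualifier {

    private PathQualifier() {}

    /**
     * Returns a fully qualified URI for the given path.
     * Paths that already carry a scheme (hdfs://, ftp://, file://) are
     * returned unchanged. Unqualified paths take the scheme and authority
     * of defaultFs; relative ones are first placed under workingDir
     * (assumed to be absolute and without a trailing slash).
     */
    static URI qualify(String path, URI defaultFs, String workingDir) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // already fully qualified, leave as-is
        }
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        try {
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(),
                           absolute, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for fs.defaultFS and the user's
        // working directory on that filesystem.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        String workingDir = "/user/alice";
        String[] paths = {
            "ftp://myFtpServer/data1.avro",
            "file:///home/user/data2.avro",
            "combinedData.avro"
        };
        for (String p : paths) {
            System.out.println(p + " -> " + qualify(p, defaultFs, workingDir));
        }
    }
}
```

With a default filesystem of file:/// the same helper keeps unqualified paths on the local disk, which corresponds to the behaviour when the tools are started with java -jar.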

Remote files can now be inspected directly, e.g.:
{noformat}
hadoop jar avro-tools-1.7.5.jar getschema Data_on_hdfs.avro
hadoop jar avro-tools-1.7.5.jar tojson ftp://server-address/Data_on_ftp.avro
{noformat}

The following tools now use Utils for accessing files: concat, fragtojson, 
fromjson, fromtext, getmeta, getschema, jsontofrag, recodec, tojson, totext.
                
> Allow tools to read files via hadoop FileSystem class
> -----------------------------------------------------
>
>                 Key: AVRO-867
>                 URL: https://issues.apache.org/jira/browse/AVRO-867
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.5
>            Reporter: Joe Crobak
>            Assignee: Joe Crobak
>
> It would be great if I could use the various tools to read/parse files that 
> are in HDFS, S3, etc via the 
> [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html]
>  api. We could retain backwards compatibility by assuming that unqualified 
> urls are "file://" but allow reading of files from fully qualified urls such 
> as hdfs://. The required apis are already part of the avro-tools uber jar to 
> support the TetherTool.
