I don't believe a Hadoop FileSystem is a Java OutputStream?

--- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
> From: Doug Cutting <cutt...@apache.org>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Tuesday, February 5, 2013, 5:27 PM
>
> It will work on an OutputStream that supports append.
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>
> So it depends on how well HDFS implements FileSystem#append(), not on
> any changes in Avro.
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>
> I have no recent personal experience with append in HDFS. Does anyone
> else here?
>
> Doug
>
> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelma...@yahoo.com> wrote:
> > My understanding is that will append to a file on the local filesystem, but not to a file on HDFS.
> >
> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
> >
> >> From: Doug Cutting <cutt...@apache.org>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: user@avro.apache.org
> >> Date: Tuesday, February 5, 2013, 5:08 PM
> >>
> >> The Jira is:
> >>
> >> https://issues.apache.org/jira/browse/AVRO-1035
> >>
> >> It is possible to append to an existing Avro file:
> >>
> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >>
> >> Should we close that issue as "fixed"?
> >>
> >> Doug
> >>
> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
> >> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
> >> >
> >> > What is the status of such a capability, a year out from when the issue below was raised?
> >> >
> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
> >> >
> >> >> Thanks for your reply, I suspected this.
> >> >>
> >> >> I will create a JIRA ticket.
> >> >>
> >> >> Vyacheslav
> >> >>
> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >> >>
> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
> >> >>>
> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
> >> >>>> interested how to append to a file on the arbitrary file system, not
> >> >>>> only on the local one.
> >> >>>>
> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
> >> >>>> implementation and then pass it for appending to avro methods.
> >> >>>>
> >> >>>> Is that possible?
> >> >>>
> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
> >> >>> ticket.
> >> >>>
> >> >>> It could not simply append to an OutputStream, since it must either:
> >> >>> * Seek to the start to validate the schemas match and find the sync
> >> >>>   marker, or
> >> >>> * Trust that the schemas match and find the sync marker from the last
> >> >>>   block
> >> >>>
> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> >> >>> could add something to the mapred module that takes a Path and
> >> >>> FileSystem and returns something that implements an interface that
> >> >>> DataFileWriter can append to. This would be something that is both a
> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >> >>> and an OutputStream, or has both an InputStream from the start of the
> >> >>> existing file and an OutputStream at the end.
> >> >>>
> >> >>>> Thanks,
> >> >>>> Vyacheslav
> >> >>>>
> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >> >>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> Use the appendTo feature of the DataFileWriter.
> >> >>>>> See
> >> >>>>>
> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >>>>>
> >> >>>>> For a quick setup example, read also:
> >> >>>>>
> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >> >>>>>
> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
> >> >>>>> <vyacheslav.zholu...@gmail.com> wrote:
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> is it possible to append to an already existing avro file when it was
> >> >>>>>> written and closed before?
> >> >>>>>>
> >> >>>>>> If I use
> >> >>>>>> outputStream = fs.append(avroFilePath);
> >> >>>>>>
> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
> >> >>>>>>
> >> >>>>>> Probably because the schema is written twice and some other issues.
> >> >>>>>>
> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
> >> >>>>>> gets overwritten.
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Vyacheslav
> >> >>>>>
> >> >>>>> --
> >> >>>>> Harsh J
> >> >>>>> Customer Ops. Engineer
> >> >>>>> Cloudera | http://tiny.cloudera.com/about
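
For reference, the approach Doug describes might look roughly like the sketch below. It is not a tested recipe. It assumes an Avro release that provides DataFileWriter.appendTo(SeekableInput, OutputStream), the FsInput class from the avro-mapred module as the SeekableInput, and an HDFS deployment where FileSystem#append() is enabled and actually works (the open question in this thread). The class and method names HdfsAvroAppend and appendRecords are made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class; the name is illustrative only.
public class HdfsAvroAppend {

    // Appends records to an existing, closed Avro data file on a Hadoop FileSystem.
    // Assumption: the file system behind avroFilePath supports append().
    public static void appendRecords(Configuration conf, Path avroFilePath,
                                     Iterable<GenericRecord> records) throws IOException {
        FileSystem fs = avroFilePath.getFileSystem(conf);

        // FsInput gives Avro a seekable view of the existing file so that
        // appendTo can read the schema and sync marker from the header.
        // fs.append() returns an FSDataOutputStream, which is a java.io.OutputStream
        // positioned at the end of the file.
        try (FsInput in = new FsInput(avroFilePath, conf);
             OutputStream out = fs.append(avroFilePath);
             DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>())
                     .appendTo(in, out)) {
            for (GenericRecord record : records) {
                // Each record must conform to the schema already stored in the file.
                writer.append(record);
            }
        } // closing the writer flushes the last block and closes the output stream
    }
}
```

Because appendTo reads the existing header first, the new blocks reuse the file's sync marker, unlike wrapping fs.append(avroFilePath) in a fresh writer created with create(), which writes a second header and leads to the "Invalid sync!" error reported at the start of the thread.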