I don't believe a Hadoop FileSystem is a Java OutputStream?

--- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:

> From: Doug Cutting <cutt...@apache.org>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Tuesday, February 5, 2013, 5:27 PM
> It will work on an OutputStream that supports append.
> 
> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
> 
> So it depends on how well HDFS implements FileSystem#append(), not on
> any changes in Avro.
> 
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
> 
> I have no recent personal experience with append in HDFS.  Does anyone
> else here?
> 
> Doug
> 
> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelma...@yahoo.com> wrote:
> > My understanding is that will append to a file on the local filesystem, but not to a file on HDFS.
> >
> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
> >
> >> From: Doug Cutting <cutt...@apache.org>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: user@avro.apache.org
> >> Date: Tuesday, February 5, 2013, 5:08 PM
> >> The Jira is:
> >>
> >> https://issues.apache.org/jira/browse/AVRO-1035
> >>
> >> It is possible to append to an existing Avro file:
> >>
> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >>
> >> Should we close that issue as "fixed"?
> >>
> >> Doug
> >>
> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
> >> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
> >> >
> >> > What is the status of such a capability, a year out from when the issue below was raised?
> >> >
> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com> wrote:
> >> >
> >> >> Thanks for your reply, I suspected this.
> >> >>
> >> >> I will create a JIRA ticket.
> >> >>
> >> >> Vyacheslav
> >> >>
> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
> >> >>
> >> >>>
> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
> >> >>>> interested in how to append to a file on an arbitrary file system, not
> >> >>>> only on the local one.
> >> >>>>
> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
> >> >>>> implementation and then pass it to the Avro methods for appending.
> >> >>>>
> >> >>>> Is that possible?
> >> >>>
> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
> >> >>> ticket.
> >> >>>
> >> >>> It could not simply append to an OutputStream, since it must either:
> >> >>> * Seek to the start to validate that the schemas match and find the sync
> >> >>> marker, or
> >> >>> * Trust that the schemas match and find the sync marker from the last
> >> >>> block
> >> >>>
> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
> >> >>> could add something to the mapred module that takes a Path and
> >> >>> FileSystem and returns something that implements an interface that
> >> >>> DataFileWriter can append to.  This would be something that is both a
> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
> >> >>> and an OutputStream, or has both an InputStream from the start of the
> >> >>> existing file and an OutputStream at the end.
> >> >>>
> >> >>>> Thanks,
> >> >>>> Vyacheslav
> >> >>>>
> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
> >> >>>>
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> Use the appendTo feature of the DataFileWriter. See
> >> >>>>>
> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
> >> >>>>>
> >> >>>>> For a quick setup example, read also:
> >> >>>>>
> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
> >> >>>>>
> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
> >> >>>>> <vyacheslav.zholu...@gmail.com> wrote:
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> is it possible to append to an already existing avro file when it was
> >> >>>>>> written and closed before?
> >> >>>>>>
> >> >>>>>> If I use
> >> >>>>>> outputStream = fs.append(avroFilePath);
> >> >>>>>>
> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
> >> >>>>>>
> >> >>>>>> Probably because the schema is written twice and some other issues.
> >> >>>>>>
> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
> >> >>>>>> gets overwritten.
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Vyacheslav
> >> >>>>>
> >> >>>>> --
> >> >>>>> Harsh J
> >> >>>>> Customer Ops. Engineer
> >> >>>>> Cloudera | http://tiny.cloudera.com/about
> >> >
> >>
>
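
On the question at the top of the thread: in the Hadoop API, FileSystem#append(Path) returns an FSDataOutputStream, which is a subclass of java.io.OutputStream, so the FileSystem itself is not the stream but it can hand one out. Scott Carey's point quoted above is why that stream alone is not enough: whoever appends also has to read the existing header to learn the file's sync marker. The sketch below illustrates only that idea with a made-up in-memory container. ToyContainer, its layout, and all method names are hypothetical; this is neither the Avro file format nor how DataFileWriter is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy layout (NOT Avro): [MAGIC][16-byte sync], then blocks of [int length][payload][sync]. */
public class ToyContainer {
    public static final byte[] MAGIC = {'T', 'O', 'Y', '1'};
    public static final int SYNC_SIZE = 16;

    /** Encodes records as blocks, each terminated by the given sync marker. */
    public static byte[] blocks(byte[] sync, List<String> records) {
        List<byte[]> payloads = new ArrayList<>();
        int size = 0;
        for (String r : records) {
            byte[] p = r.getBytes(StandardCharsets.UTF_8);
            payloads.add(p);
            size += 4 + p.length + SYNC_SIZE;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] p : payloads) {
            buf.putInt(p.length).put(p).put(sync);
        }
        return buf.array();
    }

    public static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    /** Writes a new file: header followed by blocks. */
    public static byte[] create(byte[] sync, List<String> records) {
        return concat(concat(MAGIC, sync), blocks(sync, records));
    }

    /** Reads the sync marker out of an existing header (the "seek to the start" step). */
    public static byte[] readSync(byte[] file) {
        if (file.length < MAGIC.length + SYNC_SIZE
                || !Arrays.equals(Arrays.copyOf(file, MAGIC.length), MAGIC)) {
            throw new IllegalStateException("Not a toy container");
        }
        return Arrays.copyOfRange(file, MAGIC.length, MAGIC.length + SYNC_SIZE);
    }

    /** Append: reuse the marker from the header and add only new blocks at the end. */
    public static byte[] append(byte[] existing, List<String> records) {
        return concat(existing, blocks(readSync(existing), records));
    }

    /** Reads every record, checking the marker after each block. */
    public static List<String> readAll(byte[] file) {
        byte[] sync = readSync(file);
        ByteBuffer buf = ByteBuffer.wrap(file);
        buf.position(MAGIC.length + SYNC_SIZE);
        List<String> out = new ArrayList<>();
        byte[] marker = new byte[SYNC_SIZE];
        while (buf.hasRemaining()) {
            byte[] payload = new byte[buf.getInt()];
            buf.get(payload);
            buf.get(marker);
            if (!Arrays.equals(marker, sync)) {
                throw new IllegalStateException("Invalid sync!");
            }
            out.add(new String(payload, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] syncA = new byte[SYNC_SIZE];
        Arrays.fill(syncA, (byte) 0xA);
        byte[] syncB = new byte[SYNC_SIZE];
        Arrays.fill(syncB, (byte) 0xB);

        byte[] file = create(syncA, Arrays.asList("r1", "r2"));
        System.out.println(readAll(append(file, Arrays.asList("r3"))));

        // A writer that never read the header has to invent its own marker.
        byte[] broken = concat(file, blocks(syncB, Arrays.asList("r3")));
        try {
            readAll(broken);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy, readSync plays the part of the SeekableInput argument and the bytes added by append play the part of the OutputStream positioned at the end of the file, which is the pairing that appendTo(SeekableInput, OutputStream) asks for. Whether HDFS append behaves well enough in a given Hadoop release is a separate question that this sketch does not address.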
