I confess to being a user rather than a developer of open source, but could 
you elaborate on what "depends on" means here and what the consequences would 
be?

Isn't it -- or couldn't it be made -- a run-time binding, so that only those 
who try to use the HDFS append functionality would be required to also include 
the HDFS Jars in their classpath?
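
To make concrete what I mean by a run-time binding, here is a minimal sketch in plain Java. It is only an illustration of the general technique, not anything Avro does today: the class RuntimeBindingSketch and its methods are names I made up, and the only real identifier is org.apache.hadoop.fs.FileSystem, which I use purely as a probe for whether the Hadoop Jars are on the classpath.

```java
// Hypothetical helper, not part of Avro or Hadoop: illustrates run-time binding only.
final class RuntimeBindingSketch {

    // A real Hadoop class name, used here only to probe for the Hadoop jars.
    static final String HDFS_PROBE_CLASS = "org.apache.hadoop.fs.FileSystem";

    private RuntimeBindingSketch() {
    }

    // Returns true if the named class can be located on the classpath.
    // The class is looked up by name, so nothing here needs Hadoop at compile time.
    static boolean isPresent(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, RuntimeBindingSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Only callers that ask for HDFS append would ever hit the "unavailable" branch.
    static String describeAppendSupport(String probeClass) {
        if (isPresent(probeClass)) {
            return "HDFS append available";
        }
        return "HDFS append unavailable: add the Hadoop jars to the classpath";
    }

    public static void main(String[] args) {
        System.out.println(describeAppendSupport(HDFS_PROBE_CLASS));
    }
}
```

With something along these lines, code that never touches HDFS append would never trigger the lookup, and only the append path would report the missing Jars. Whether that fits Avro's build and packaging is exactly what I am asking.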

Or is the issue more of a bookkeeping one, whereby every update to HDFS will 
require an Avro regression test?

Now that Hive supports Avro as of the Jan. 11 release of Hive 0.10, the use 
case of ingesting data into Avro on HDFS is only going to get more popular, and 
appending is very handy for ingesting, especially for real-time or 
near-real-time data.  So it seems to me that if the inconveniences are minor or 
can be worked around, Avro perhaps should indeed "depend on" HDFS.

--- On Thu, 2/7/13, Harsh J <ha...@cloudera.com> wrote:

> From: Harsh J <ha...@cloudera.com>
> Subject: Re: Is it possible to append to an already existing avro file
> To: user@avro.apache.org
> Date: Thursday, February 7, 2013, 9:28 AM
> I assume by non-trivial you meant the extra Seekable stuff I needed to
> wrap around the DFS output streams to let Avro take it as append-able?
> I don't think it's possible for Avro to carry it, since Avro (core) does
> not reverse-depend on Hadoop. Should we document it somewhere though?
> Do you have any ideas on the best place to do that?
> 
> On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <michaelma...@yahoo.com> wrote:
> > Thanks so much for the code -- it works great!
> >
> > Since it is a non-trivial amount of code required to
> > achieve append, I suggest attaching that code to AVRO-1035,
> > in the hopes that someone will come up with an interface
> > that requires just one line of user code to achieve append.
> >
> > --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:
> >
> >> From: Harsh J <ha...@cloudera.com>
> >> Subject: Re: Is it possible to append to an already existing avro file
> >> To: user@avro.apache.org
> >> Date: Wednesday, February 6, 2013, 11:17 AM
> >> Hey Michael,
> >>
> >> It does implement the regular Java OutputStream interface, as seen in
> >> the API:
> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
> >>
> >> Here's a sample program that works on Hadoop 2.x in my tests:
> >> https://gist.github.com/QwertyManiac/4724582
