Re: Hadoop File API v.s. Commons VFS

2020-03-10 Thread David Mollitor
And by "wow" I mean to say, "your input was awesome and generous," not "wow
that's a lot of work."

On Tue, Mar 10, 2020, 9:24 PM David Mollitor  wrote:

> Wow. Thanks for that starting point.
>
> On Tue, Mar 10, 2020, 8:48 PM Owen O'Malley wrote:
>
>> It would be a lot of work. Of course there is a lot of overlap, but they
>> have different use cases, so there are significant differences. From the
>> big data side, there are a lot of blockers.
>>
>>    1. CVFS does not have the concept of replication, so there is no way
>>    to get or set a file's replication.
>>    2. It doesn't look like CVFS supports appending to files.
>>    3. CVFS doesn't support data locality.
>>    4. CVFS positioned reads are difficult/inefficient. The equivalent of
>>    file.readFully(seekPos, buffer, offset, length) is
>>       1. FileContent fc = file.getContent();
>>       2. RandomAccessContent random = fc.getRandomAccessContent();
>>       3. random.seek(seekPos);
>>       4. InputStream stream = random.getInputStream()
>>       5. loop until stream.read(buffer, offset, length) returns enough
>>       bytes.
>>
>> .. Owen
>>
>> On Tue, Mar 10, 2020 at 3:57 PM David Mollitor  wrote:
>>
>>> I just see a lot of overlap and doubling of effort here.  Would be nice if
>>> we can all be working in tandem.
>>>
>>> On Tue, Mar 10, 2020, 6:36 PM Aaron Fabbri  wrote:
>>>
>>> > It is a good question. I'm not familiar with Apache commons VFS (which I
>>> > assume you are talking about, versus the BSD/Unix VFS layer). There no
>>> > doubt will be semantic differences between Hadoop FS interface and VFS. It
>>> > would be an interesting exercise to implement a connector that bridges the
>>> > gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone else
>>> > looked at this or have experience with Apache VFS?
>>> >
>>> > On Fri, Feb 28, 2020 at 6:42 AM David Mollitor wrote:
>>> >
>>> >> Hello,
>>> >>
>>> >> I'm curious to know what the history of Hadoop File API is in relationship
>>> >> to VFS.  Hadoop supports several file schemes and so does VFS.  Why are
>>> >> there two projects working on this same effort and what are the pros/cons
>>> >> of each?
>>> >>
>>> >> Thanks.
>>> >>
>>> >
>>>
>>


Re: Hadoop File API v.s. Commons VFS

2020-03-10 Thread David Mollitor
Wow. Thanks for that starting point.

On Tue, Mar 10, 2020, 8:48 PM Owen O'Malley  wrote:

> It would be a lot of work. Of course there is a lot of overlap, but they
> have different use cases, so there are significant differences. From the
> big data side, there are a lot of blockers.
>
>    1. CVFS does not have the concept of replication, so there is no way
>    to get or set a file's replication.
>    2. It doesn't look like CVFS supports appending to files.
>    3. CVFS doesn't support data locality.
>    4. CVFS positioned reads are difficult/inefficient. The equivalent of
>    file.readFully(seekPos, buffer, offset, length) is
>       1. FileContent fc = file.getContent();
>       2. RandomAccessContent random = fc.getRandomAccessContent();
>       3. random.seek(seekPos);
>       4. InputStream stream = random.getInputStream()
>       5. loop until stream.read(buffer, offset, length) returns enough
>       bytes.
>
> .. Owen
>
> On Tue, Mar 10, 2020 at 3:57 PM David Mollitor  wrote:
>
>> I just see a lot of overlap and doubling of effort here.  Would be nice if
>> we can all be working in tandem.
>>
>> On Tue, Mar 10, 2020, 6:36 PM Aaron Fabbri  wrote:
>>
>> > It is a good question. I'm not familiar with Apache commons VFS (which I
>> > assume you are talking about, versus the BSD/Unix VFS layer). There no
>> > doubt will be semantic differences between Hadoop FS interface and VFS. It
>> > would be an interesting exercise to implement a connector that bridges the
>> > gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone else
>> > looked at this or have experience with Apache VFS?
>> >
>> > On Fri, Feb 28, 2020 at 6:42 AM David Mollitor wrote:
>> >
>> >> Hello,
>> >>
>> >> I'm curious to know what the history of Hadoop File API is in relationship
>> >> to VFS.  Hadoop supports several file schemes and so does VFS.  Why are
>> >> there two projects working on this same effort and what are the pros/cons
>> >> of each?
>> >>
>> >> Thanks.
>> >>
>> >
>>
>


Re: Hadoop File API v.s. Commons VFS

2020-03-10 Thread Owen O'Malley
It would be a lot of work. Of course there is a lot of overlap, but they
have different use cases, so there are significant differences. From the
big data side, there are a lot of blockers.

   1. CVFS does not have the concept of replication, so there is no way to
   get or set a file's replication.
   2. It doesn't look like CVFS supports appending to files.
   3. CVFS doesn't support data locality.
   4. CVFS positioned reads are difficult/inefficient. The equivalent of
   file.readFully(seekPos, buffer, offset, length) is
      1. FileContent fc = file.getContent();
      2. RandomAccessContent random = fc.getRandomAccessContent();
      3. random.seek(seekPos);
      4. InputStream stream = random.getInputStream()
      5. loop until stream.read(buffer, offset, length) returns enough
      bytes.

.. Owen

On Tue, Mar 10, 2020 at 3:57 PM David Mollitor  wrote:

> I just see a lot of overlap and doubling of effort here.  Would be nice if
> we can all be working in tandem.
>
> On Tue, Mar 10, 2020, 6:36 PM Aaron Fabbri  wrote:
>
> > It is a good question. I'm not familiar with Apache commons VFS (which I
> > assume you are talking about, versus the BSD/Unix VFS layer). There no
> > doubt will be semantic differences between Hadoop FS interface and VFS. It
> > would be an interesting exercise to implement a connector that bridges the
> > gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone else
> > looked at this or have experience with Apache VFS?
> >
> > On Fri, Feb 28, 2020 at 6:42 AM David Mollitor wrote:
> >
> >> Hello,
> >>
> >> I'm curious to know what the history of Hadoop File API is in relationship
> >> to VFS.  Hadoop supports several file schemes and so does VFS.  Why are
> >> there two projects working on this same effort and what are the pros/cons
> >> of each?
> >>
> >> Thanks.
> >>
> >
>


Re: Hadoop File API v.s. Commons VFS

2020-03-10 Thread David Mollitor
I just see a lot of overlap and doubling of effort here.  Would be nice if
we can all be working in tandem.

On Tue, Mar 10, 2020, 6:36 PM Aaron Fabbri  wrote:

> It is a good question. I'm not familiar with Apache commons VFS (which I
> assume you are talking about, versus the BSD/Unix VFS layer). There no
> doubt will be semantic differences between Hadoop FS interface and VFS. It
> would be an interesting exercise to implement a connector that bridges the
> gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone else
> looked at this or have experience with Apache VFS?
>
> On Fri, Feb 28, 2020 at 6:42 AM David Mollitor  wrote:
>
>> Hello,
>>
>> I'm curious to know what the history of Hadoop File API is in relationship
>> to VFS.  Hadoop supports several file schemes and so does VFS.  Why are
>> there two projects working on this same effort and what are the pros/cons
>> of each?
>>
>> Thanks.
>>
>


Re: Hadoop File API v.s. Commons VFS

2020-03-10 Thread Aaron Fabbri
It is a good question. I'm not familiar with Apache commons VFS (which I
assume you are talking about, versus the BSD/Unix VFS layer). There no
doubt will be semantic differences between Hadoop FS interface and VFS. It
would be an interesting exercise to implement a connector that bridges the
gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone else
looked at this or have experience with Apache VFS?

On Fri, Feb 28, 2020 at 6:42 AM David Mollitor  wrote:

> Hello,
>
> I'm curious to know what the history of Hadoop File API is in relationship
> to VFS.  Hadoop supports several file schemes and so does VFS.  Why are
> there two projects working on this same effort and what are the pros/cons
> of each?
>
> Thanks.
>


Hadoop File API v.s. Commons VFS

2020-02-28 Thread David Mollitor
Hello,

I'm curious to know what the history of Hadoop File API is in relationship
to VFS.  Hadoop supports several file schemes and so does VFS.  Why are
there two projects working on this same effort and what are the pros/cons
of each?

Thanks.