Re: Hadoop File API v.s. Commons VFS
And by "wow" I mean to say, "your input was awesome and generous," not "wow, that's a lot of work."

On Tue, Mar 10, 2020, 9:24 PM David Mollitor wrote:

> Wow. Thanks for that starting point.
Re: Hadoop File API v.s. Commons VFS
Wow. Thanks for that starting point.

On Tue, Mar 10, 2020, 8:48 PM Owen O'Malley wrote:

> It would be a lot of work. Of course there is a lot of overlap, but they
> have different use cases, so there are significant differences. From the
> big data side, there are a lot of blockers.
>
> 1. CVFS does not have the concept of replication, so there is no way
>    to get or set a file's replication.
> 2. It doesn't look like CVFS supports appending to files.
> 3. CVFS doesn't support data locality.
> 4. CVFS positioned reads are difficult/inefficient. The equivalent of
>    file.readFully(seekPos, buffer, offset, length) is
>    1. FileContent fc = file.getContent();
>    2. RandomAccessContent random = fc.getRandomAccessContent();
>    3. random.seek(seekPos);
>    4. InputStream stream = random.getInputStream()
>    5. loop until stream.read(buffer, offset, length) returns enough
>       bytes.
>
> .. Owen
Re: Hadoop File API v.s. Commons VFS
It would be a lot of work. Of course there is a lot of overlap, but they have different use cases, so there are significant differences. From the big data side, there are a lot of blockers.

1. CVFS does not have the concept of replication, so there is no way to get or set a file's replication.
2. It doesn't look like CVFS supports appending to files.
3. CVFS doesn't support data locality.
4. CVFS positioned reads are difficult/inefficient. The equivalent of file.readFully(seekPos, buffer, offset, length) is
   1. FileContent fc = file.getContent();
   2. RandomAccessContent random = fc.getRandomAccessContent();
   3. random.seek(seekPos);
   4. InputStream stream = random.getInputStream()
   5. loop until stream.read(buffer, offset, length) returns enough bytes.

.. Owen

On Tue, Mar 10, 2020 at 3:57 PM David Mollitor wrote:

> I just see a lot of overlap and doubling of effort here. Would be nice if
> we can all be working in tandem.
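The last CVFS step in Owen's message ("loop until stream.read(...) returns enough bytes") is where the extra work sits: a single InputStream.read call may return fewer bytes than requested, so the caller has to keep its own count. Here is a minimal sketch of that loop using only java.io, so it runs without Hadoop or Commons VFS on the classpath. The class name PositionedReadSketch and its readFully helper are hypothetical and belong to neither library; skip() stands in for random.seek(seekPos), and the stream would come from RandomAccessContent.getInputStream() in real CVFS code. Note also that in Commons VFS 2.x, getRandomAccessContent takes a RandomAccessMode argument (for example RandomAccessMode.READ).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Hadoop or Commons VFS. */
final class PositionedReadSketch {

    /**
     * Emulates a positioned readFully on a plain InputStream: advance to
     * seekPos, then loop until exactly `length` bytes have been copied
     * into buffer[offset .. offset + length).
     */
    static void readFully(InputStream stream, long seekPos,
                          byte[] buffer, int offset, int length) throws IOException {
        // Stand-in for random.seek(seekPos). skip() may also skip fewer
        // bytes than requested, so it needs its own loop.
        long toSkip = seekPos;
        while (toSkip > 0) {
            long skipped = stream.skip(toSkip);
            if (skipped <= 0) {
                // skip() made no progress: read one byte to tell EOF from a stall.
                if (stream.read() < 0) {
                    throw new EOFException("end of stream before position " + seekPos);
                }
                skipped = 1;
            }
            toSkip -= skipped;
        }

        // The "loop until enough bytes" step: read() may return a short count.
        int done = 0;
        while (done < length) {
            int n = stream.read(buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("end of stream after " + done + " of " + length + " bytes");
            }
            done += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] buffer = new byte[8];
        // Read 4 bytes starting at stream position 3 into buffer[2..6).
        readFully(new ByteArrayInputStream(data), 3, buffer, 2, 4);
        System.out.println(new String(buffer, 2, 4, StandardCharsets.US_ASCII)); // prints 3456
    }
}
```

On the Hadoop side the same operation is the single call FSDataInputStream.readFully(position, buffer, offset, length), which is the contrast Owen is drawing.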
Re: Hadoop File API v.s. Commons VFS
I just see a lot of overlap and doubling of effort here. Would be nice if we can all be working in tandem.

On Tue, Mar 10, 2020, 6:36 PM Aaron Fabbri wrote:

> It is a good question. I'm not familiar with Apache Commons VFS (which I
> assume you are talking about, versus the BSD/Unix VFS layer). There no
> doubt will be semantic differences between the Hadoop FS interface and VFS.
> It would be an interesting exercise to implement a connector that bridges
> the gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone
> else looked at this or have experience with Apache VFS?
Re: Hadoop File API v.s. Commons VFS
It is a good question. I'm not familiar with Apache Commons VFS (which I assume you are talking about, versus the BSD/Unix VFS layer). There no doubt will be semantic differences between the Hadoop FS interface and VFS. It would be an interesting exercise to implement a connector that bridges the gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone else looked at this or have experience with Apache VFS?

On Fri, Feb 28, 2020 at 6:42 AM David Mollitor wrote:

> Hello,
>
> I'm curious to know what the history of the Hadoop File API is in relation
> to VFS. Hadoop supports several file schemes and so does VFS. Why are
> there two projects working on this same effort and what are the pros/cons
> of each?
>
> Thanks.
Hadoop File API v.s. Commons VFS
Hello,

I'm curious to know what the history of the Hadoop File API is in relation to VFS. Hadoop supports several file schemes and so does VFS. Why are there two projects working on this same effort and what are the pros/cons of each?

Thanks.