Hi Pei,

Thinking about this again, I understand that the purpose of the Beam filesystem is to avoid pulling a bunch of dependencies into the core. That makes perfect sense.

So, I agree that a Beam filesystem abstraction is fine.

My point is that we should provide a HadoopFilesystem extension/plugin for the Beam filesystem asap: that would help us support a good range of filesystems quickly.

Just my $0.01 ;)

Regards
JB

On 11/17/2016 08:18 PM, Pei He wrote:
Hi JB,
My proposals are based on the current IOChannelFactory and how it is
used in FileBasedSink.

Let me spend more time investigating the Hadoop FileSystem interface.
--
Pei

On Thu, Nov 17, 2016 at 1:21 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

By the way, Pei, for the record: why introduce BeamFileSystem instead of
using the Hadoop FileSystem interface?

Thanks
Regards
JB

On 11/17/2016 01:09 AM, Pei He wrote:

Hi,

I am working on BEAM-59
<https://issues.apache.org/jira/browse/BEAM-59> "IOChannelFactory
redesign". The goals are:

1. Support file-based IOs (TextIO, AvroIO) with user-defined file systems.

2. Support configuring any user-defined file system.

And, I drafted the design proposal in two parts to address them in order:

Part 1: IOChannelFactory Redesign
<https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#>

Summary:

Old API: WritableByteChannel create(String spec, String mimeType);

New API: WritableByteChannel create(URI uri, CreateOptions options);

Notable proposed changes:

   1. Include the options parameter in most methods to specify behaviors.
   2. Replace String with URI to include the scheme in file/directory
      locations.
   3. Require file systems to provide a SeekableByteChannel for reads.
   4. Add methods such as getMetadata(), rename(), etc.
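To make the proposed shape easier to discuss, here is a minimal sketch of what such an interface could look like. It is an illustration only, not the API from the design doc: FileSystemSketch, LocalFileSystemSketch, Metadata, the open() method name, and the mimeType field on CreateOptions are all hypothetical names. Only create(URI, CreateOptions), getMetadata(), rename(), and the SeekableByteChannel requirement come from the summary above. The local implementation relies on java.nio.file and assumes "file" URIs.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.channels.SeekableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical options holder; the mimeType field is illustrative only. */
final class CreateOptions {
  final String mimeType;

  CreateOptions(String mimeType) {
    this.mimeType = mimeType;
  }
}

/** Hypothetical metadata holder returned by getMetadata(). */
final class Metadata {
  final URI uri;
  final long sizeBytes;

  Metadata(URI uri, long sizeBytes) {
    this.uri = uri;
    this.sizeBytes = sizeBytes;
  }
}

/** Sketch of the proposed shape; this is not the actual Beam interface. */
interface FileSystemSketch {
  // Locations are URIs, so the scheme travels with the path.
  WritableByteChannel create(URI uri, CreateOptions options) throws IOException;

  // Reads must be seekable (method name "open" is an assumption).
  SeekableByteChannel open(URI uri) throws IOException;

  Metadata getMetadata(URI uri) throws IOException;

  void rename(URI source, URI destination) throws IOException;
}

/** Local-disk implementation for "file" URIs, backed by java.nio.file. */
final class LocalFileSystemSketch implements FileSystemSketch {
  @Override
  public WritableByteChannel create(URI uri, CreateOptions options) throws IOException {
    // The options are accepted but unused in this sketch.
    return Files.newByteChannel(
        Paths.get(uri),
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING,
        StandardOpenOption.WRITE);
  }

  @Override
  public SeekableByteChannel open(URI uri) throws IOException {
    return Files.newByteChannel(Paths.get(uri), StandardOpenOption.READ);
  }

  @Override
  public Metadata getMetadata(URI uri) throws IOException {
    return new Metadata(uri, Files.size(Paths.get(uri)));
  }

  @Override
  public void rename(URI source, URI destination) throws IOException {
    Files.move(Paths.get(source), Paths.get(destination), StandardCopyOption.REPLACE_EXISTING);
  }
}
```

A sink could then write to a temporary URI through create() and move the result into place with rename(), which is roughly how a file-based sink would be expected to use these methods.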


Part 2: Configurable BeamFileSystem
<https://docs.google.com/document/d/1-7vo9nLRsEEzDGnb562PuL4q9mUiq_ZVpCAiyyJw8p8/edit#heading=h.p3gc3colc2cs>

Summary:

Old API: IOChannelUtils.getFactory(glob).match(glob);

New API: BeamFileSystems.getFileSystem(glob, config).match(glob);
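As a rough illustration of the lookup in the new API, the sketch below resolves a file system from the scheme of the glob and passes a configuration through to it. Everything here is an assumption made for discussion: FileSystemRegistrySketch, FileSystemHandle, register(), schemeOf(), the Map-based config, and the fallback to "file" for specs without a scheme are hypothetical and not taken from the design doc. The scheme is parsed by hand because glob characters such as "{" or "[" are not valid in a java.net.URI.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a configured file system instance. */
final class FileSystemHandle {
  final String scheme;
  final Map<String, String> config;

  FileSystemHandle(String scheme, Map<String, String> config) {
    this.scheme = scheme;
    this.config = config;
  }
}

/** Hypothetical scheme-keyed registry; names are illustrative, not Beam's API. */
final class FileSystemRegistrySketch {
  private final Map<String, Function<Map<String, String>, FileSystemHandle>> factories =
      new HashMap<>();

  /** Registers a factory that builds a file system for one scheme from a config. */
  void register(String scheme, Function<Map<String, String>, FileSystemHandle> factory) {
    factories.put(scheme.toLowerCase(Locale.ROOT), factory);
  }

  /** Picks the factory by the scheme of the spec and applies the config to it. */
  FileSystemHandle getFileSystem(String spec, Map<String, String> config) {
    String scheme = schemeOf(spec);
    Function<Map<String, String>, FileSystemHandle> factory = factories.get(scheme);
    if (factory == null) {
      throw new IllegalArgumentException("No file system registered for scheme: " + scheme);
    }
    return factory.apply(config);
  }

  /** Extracts the scheme before "://"; specs without one are treated as local files. */
  static String schemeOf(String spec) {
    int idx = spec.indexOf("://");
    return idx > 0 ? spec.substring(0, idx).toLowerCase(Locale.ROOT) : "file";
  }
}
```

With a "gs" factory registered, a call like getFileSystem("gs://bucket/logs/*.txt", config) would hand the config to that factory, and an unregistered scheme fails fast with an IllegalArgumentException.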


Looking for comments and feedback.

Thanks

--

Pei


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



