Hi Dipti!

It sounds like there are two possible implementation options:
1. HdfsIO that is implemented using HadoopInputFormatIO
2. HdfsIO that is implemented using IOChannelFactory (I think
BeamFileSystem is the new name?)

Either way, I agree that it makes sense to have one module that contains
the IO transforms that rely on Hadoop, so it sounds like merging them is a
good path forward.

What I'm not sure about is whether we agree that Dipti should submit the
change without having to refactor HdfsIO (apart from some obvious, easy
refactoring, e.g. to use WritableCoder/SerializableSplit). I don't want to
create too much additional work, but if the correct implementation is #1
(HdfsIO uses HIFIO), then the right time to do that refactoring is probably
now, given how much code is shared there. If the correct answer is #2, then
I don't think we should do that refactoring now.

S

On Wed, Feb 15, 2017 at 11:27 AM Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi
>
> I guess you saw my comment in the PR. Basically, I was waiting for the
> refactoring of IOChannelFactory to refactor the HDFS IO as a Hadoop file
> format on top of IOChannelFactory. I would have waited a bit, and I would
> be more than happy to help you on the PR.
>
> Regards
> JB
>
> On Feb 15, 2017, at 14:55, Dipti Kulkarni <
> [email protected]> wrote:
> >Hello there!
> >I am working on writing a Read IO for Hadoop InputFormat. This will
> >enable reading from any data source that supports Hadoop InputFormat,
> >i.e. any source that provides an InputFormat for integration with
> >Hadoop.
> >It makes sense for the HadoopInputFormatIO to share some code with the
> >HdfsIO - WritableCoder in particular, but also some helper classes like
> >SerializableSplit etc. I was wondering if we could move HDFS and
> >HadoopInputFormat into a shared module for Hadoop IO in general instead
> >of maintaining them separately.
> >Do let me know what you think, and please let me know if you have any
> >other ideas too.
> >
> >Thanks,
> >Dipti
>
