Hi Eugene,

To reuse HadoopInputFormatIO, this is what I have in mind:

1. Extend HadoopInputFormatBoundedSource to create HCatalogBoundedSource.
2. Override the necessary methods in HCatalogBoundedSource to perform
HCatalog-specific steps (overriding computeSplitsIfNecessary() should be
enough, as far as I can see now).
3. Use HCatalogBoundedSource and HadoopInputFormatReader in the HCatalog
wrapper class to perform IO.
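
To make steps 1 and 2 concrete, here is a rough, self-contained sketch. The
classes are simplified stand-ins, not the real Beam classes: BaseSource plays
the role of HadoopInputFormatBoundedSource, splits are plain strings rather
than Hadoop InputSplits, and the constructor parameters (database, table,
partitionCount) are hypothetical. It also assumes computeSplitsIfNecessary()
can be overridden by a subclass, which is not the case today.

```java
import java.util.ArrayList;
import java.util.List;

public class HCatalogSourceSketch {

  // Stand-in for HadoopInputFormatBoundedSource (NOT the real Beam class).
  static class BaseSource {
    // The real source would hold Hadoop InputSplit objects; strings keep
    // this sketch free of external dependencies.
    protected List<String> splits;

    // Assumption: this hook is visible to subclasses so it can be overridden.
    protected void computeSplitsIfNecessary() {
      if (splits != null) {
        return; // already computed
      }
      splits = new ArrayList<>();
      splits.add("generic-split-0");
    }

    public List<String> getSplits() {
      computeSplitsIfNecessary();
      return splits;
    }
  }

  // Stand-in for the proposed HCatalogBoundedSource.
  static class HCatalogSource extends BaseSource {
    private final String database;      // hypothetical parameter
    private final String table;         // hypothetical parameter
    private final int partitionCount;   // hypothetical parameter

    HCatalogSource(String database, String table, int partitionCount) {
      this.database = database;
      this.table = table;
      this.partitionCount = partitionCount;
    }

    // Only the split computation is HCatalog-specific; everything else
    // is inherited from the base source.
    @Override
    protected void computeSplitsIfNecessary() {
      if (splits != null) {
        return; // already computed
      }
      splits = new ArrayList<>();
      for (int i = 0; i < partitionCount; i++) {
        splits.add(database + "." + table + "/partition-" + i);
      }
    }
  }

  public static void main(String[] args) {
    BaseSource source = new HCatalogSource("default", "events", 2);
    // prints [default.events/partition-0, default.events/partition-1]
    System.out.println(source.getSplits());
  }
}
```

The point of the sketch is that the reader and the rest of the source stay
untouched; only the split computation changes.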

Initially I started this way, but since it involves modifying
HadoopInputFormatReader / HadoopInputFormatBoundedSource to make them
public / extensible, I wasn't sure whether this fits the Beam authoring
guidelines, and hence came up with the solution I shared in my earlier note.

Please let me know your thoughts!

HadoopInputFormatIO -
https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L172

HadoopInputFormatBoundedSource -
https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L367

HadoopInputFormatReader -
https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L584

On Thu, May 11, 2017 at 4:57 PM, Seshadri Raghunathan <sesh...@gmail.com>
wrote:

> Thanks Eugene, that makes sense. This solution borrows heavily from
> HadoopInputFormatIO, with a tweak for HCatalog (and related parameters).
> I will try to reuse HadoopInputFormatIO rather than the current approach.
>
> On Thu, May 11, 2017 at 4:44 PM, Eugene Kirpichov <
> kirpic...@google.com.invalid> wrote:
>
>> Thanks Seshadri! This seems to have a great deal of copy-paste from
>> HadoopInputFormatIO. Is it possible to instead implement this connector as
>> a wrapper around it, rather than copy-paste?
>>
>> On Thu, May 11, 2017 at 4:41 PM Seshadri Raghunathan <sesh...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > Here is a draft implementation of this proposal -
>> >
>> > https://github.com/seshadri-cr/beam/commit/78cdf8772f2cd5bb9cd018b1c99c3ad0854157c1
>> >
>> > Many thanks to Ismaël Mejía, who helped with a high-level review and
>> > follow-up of this design / approach.
>> >
>> > Looking forward to further review/comments from the wider community to
>> > move forward on this proposal.
>> >
>> > Thanks,
>> > Seshadri
>> >
>> >
>> > On Wed, May 10, 2017 at 3:05 PM, Madhusudan Borkar <mbor...@etouch.net>
>> > wrote:
>> >
>> > > Hi all,
>> > > Thank you for your response to the earlier proposal. Taking into
>> > > account all the suggestions, we are making a new proposal for a Hive
>> > > connector. Please let us know your feedback.
>> > >
>> > > [1] https://docs.google.com/document/d/1aeQRLXjVr38Z03_zWkHO9YQhtnj0jHoCfhsSNm-wxtA/edit?usp=sharing
>> > >
>> > > [2] https://issues.apache.org/jira/browse/BEAM-1158
>> > >
>> > > Madhu Borkar
>> > >
>> >
>>
>
>
