Hi Eugene,

In order to reuse HadoopInputFormatIO, this is what I am thinking:

1. Extend HadoopInputFormatBoundedSource to create HCatalogBoundedSource.
2. Override the necessary methods in HCatalogBoundedSource to perform HCatalog-specific steps (overriding the computeSplitsIfNecessary() method should be enough, as I see it now).
3. Use HCatalogBoundedSource and HadoopInputFormatReader in the HCatalog wrapper class to perform IO.

Initially I started this way, but since it involves modifying HadoopInputFormatReader / HadoopInputFormatBoundedSource to make them public / extensible, I wasn't sure whether this fits with the Beam authoring guidelines, and hence came up with the solution I shared in my earlier note.

Please let me know your thoughts!

HadoopInputFormatIO - https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L172
HadoopInputFormatBoundedSource - https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L367
HadoopInputFormatReader - https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L584

On Thu, May 11, 2017 at 4:57 PM, Seshadri Raghunathan <sesh...@gmail.com> wrote:

> Thanks Eugene, that makes sense. This solution heavily borrows on
> HadoopInputFormatIO with a tweak for HCatalog (and related parameters).
> I will try to re-use HadoopInputFormatIO rather than the current approach.
>
> On Thu, May 11, 2017 at 4:44 PM, Eugene Kirpichov <
> kirpic...@google.com.invalid> wrote:
>
>> Thanks Seshadri! This seems to have a great deal of copy-paste from
>> HadoopInputFormatIO. Is it possible to instead implement this connector as
>> a wrapper around it, rather than copy-paste?
>>
>> On Thu, May 11, 2017 at 4:41 PM Seshadri Raghunathan <sesh...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > Here is a draft implementation of this proposal -
>> >
>> > https://github.com/seshadri-cr/beam/commit/78cdf8772f2cd5bb9cd018b1c99c3ad0854157c1
>> >
>> > Many thanks to Ismaël Mejía who helped in a high level review &
>> > follow-up of this design / approach.
>> >
>> > Looking forward for further review/comments from wider community to move
>> > forward on this proposal.
>> >
>> > Thanks,
>> > Seshadri
>> >
>> >
>> > On Wed, May 10, 2017 at 3:05 PM, Madhusudan Borkar <mbor...@etouch.net>
>> > wrote:
>> >
>> > > Hi all,
>> > > Thank you for your response to the earlier proposal. Taking into
>> > > account all the suggestions, we are making a new proposal for Hive
>> > > connector. Please, let us know your feedback.
>> > >
>> > > [1]
>> > > https://docs.google.com/document/d/1aeQRLXjVr38Z03_zWkHO9YQhtnj0jHoCfhsSNm-wxtA/edit?usp=sharing
>> > >
>> > > [2] https://issues.apache.org/jira/browse/BEAM-1158
>> > >
>> > > Madhu Borkar
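
P.S. To make steps 1-3 at the top of this message more concrete, here is a rough sketch of the subclass-and-override idea. It deliberately uses simplified stand-in classes, not the real Beam types: the class names ending in StandIn, the String-based "splits", and the constructor parameters are all assumptions made for illustration. The real HadoopInputFormatBoundedSource is nested inside HadoopInputFormatIO and its method signatures differ, so treat this only as an outline of the shape of the change.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HCatalogSourceSketch {

    /** Hypothetical stand-in for HadoopInputFormatBoundedSource (not the real Beam class). */
    static class InputFormatBoundedSourceStandIn {
        // The real source would hold Hadoop InputSplits; plain strings keep the sketch small.
        protected List<String> splits;

        /** Extension point: compute the splits once and cache them. */
        protected void computeSplitsIfNecessary() {
            if (splits != null) {
                return; // already computed
            }
            splits = new ArrayList<>();
            splits.add("generic-split-0");
        }

        /** Shared logic that every subclass inherits unchanged. */
        public final List<String> getSplits() {
            computeSplitsIfNecessary();
            return Collections.unmodifiableList(splits);
        }
    }

    /** Hypothetical HCatalog variant: only the split computation is overridden. */
    static class HCatalogBoundedSourceStandIn extends InputFormatBoundedSourceStandIn {
        private final String database;
        private final String table;
        private final int partitionCount; // assumption: one split per partition

        HCatalogBoundedSourceStandIn(String database, String table, int partitionCount) {
            this.database = database;
            this.table = table;
            this.partitionCount = partitionCount;
        }

        @Override
        protected void computeSplitsIfNecessary() {
            if (splits != null) {
                return; // already computed
            }
            // HCatalog-specific step: derive the splits from table metadata.
            splits = new ArrayList<>();
            for (int i = 0; i < partitionCount; i++) {
                splits.add(database + "." + table + "/part-" + i);
            }
        }
    }

    public static void main(String[] args) {
        InputFormatBoundedSourceStandIn source =
                new HCatalogBoundedSourceStandIn("default", "events", 3);
        for (String split : source.getSplits()) {
            System.out.println(split);
        }
    }
}
```

The point of the sketch is that the reader and the rest of the source logic stay in the base class, so only the split computation would need to be HCatalog-aware. It also shows why the base class would have to be made public / extensible for this to work.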