Hi! Why do you need to override methods like computeSplitsIfNecessary at all - is HCatalogIO so substantially different from other Hadoop InputFormats that it cannot be handled by the generic code of HadoopInputFormatIO? I looked at the implementation in your commit and it seems identical except for one line, "HCatInputFormat.setInput(conf.getHadoopConfiguration(), database, table, filter)" - but that line simply specifies the Configuration for HadoopInputFormatIO, which can be done via HadoopInputFormatIO.withConfiguration().
I.e. so far it seems like HCatalogIO can be implemented by *configuring* HadoopInputFormatIO, rather than extending it. Am I missing something?

On Fri, May 12, 2017 at 12:11 PM Seshadri Raghunathan <sesh...@gmail.com> wrote:

> Hi Eugene,
>
> In order to reuse HadoopInputFormatIO, this is what I am thinking:
>
> 1. Extend HadoopInputFormatBoundedSource to create HCatalogBoundedSource.
> 2. Override the necessary methods in HCatalogBoundedSource to perform the
> HCatalog-specific steps (overriding the computeSplitsIfNecessary() method
> should be enough, as I see it now).
> 3. Use HCatalogBoundedSource and HadoopInputFormatReader in the HCatalog
> wrapper class to perform IO.
>
> I initially started this way, but since it involves modifying
> HadoopInputFormatReader / HadoopInputFormatBoundedSource to make them
> public / extensible, I wasn't sure whether that fits the Beam authoring
> guidelines, and hence came up with the solution I shared in my earlier
> note.
>
> Please let me know your thoughts!
>
> HadoopInputFormatIO -
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L172
>
> HadoopInputFormatBoundedSource -
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L367
>
> HadoopInputFormatReader -
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L584
>
> On Thu, May 11, 2017 at 4:57 PM, Seshadri Raghunathan <sesh...@gmail.com> wrote:
>
> > Thanks Eugene, that makes sense. This solution borrows heavily from
> > HadoopInputFormatIO, with a tweak for HCatalog (and related parameters).
> > I will try to reuse HadoopInputFormatIO rather than keep the current
> > approach.
> >
> > On Thu, May 11, 2017 at 4:44 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
> >
> >> Thanks Seshadri! This seems to have a great deal of copy-paste from
> >> HadoopInputFormatIO. Is it possible to instead implement this connector
> >> as a wrapper around it, rather than copy-paste?
> >>
> >> On Thu, May 11, 2017 at 4:41 PM Seshadri Raghunathan <sesh...@gmail.com> wrote:
> >>
> >> > Hi all,
> >> >
> >> > Here is a draft implementation of this proposal -
> >> > https://github.com/seshadri-cr/beam/commit/78cdf8772f2cd5bb9cd018b1c99c3ad0854157c1
> >> >
> >> > Many thanks to Ismaël Mejía, who helped with a high-level review and
> >> > follow-up of this design / approach.
> >> >
> >> > Looking forward to further review / comments from the wider community
> >> > to move this proposal forward.
> >> >
> >> > Thanks,
> >> > Seshadri
> >> >
> >> > On Wed, May 10, 2017 at 3:05 PM, Madhusudan Borkar <mbor...@etouch.net> wrote:
> >> >
> >> > > Hi all,
> >> > > Thank you for your response to the earlier proposal. Taking into
> >> > > account all the suggestions, we are making a new proposal for the
> >> > > Hive connector. Please let us know your feedback.
> >> > >
> >> > > [1]
> >> > > https://docs.google.com/document/d/1aeQRLXjVr38Z03_zWkHO9YQhtnj0jHoCfhsSNm-wxtA/edit?usp=sharing
> >> > >
> >> > > [2] https://issues.apache.org/jira/browse/BEAM-1158
> >> > >
> >> > > Madhu Borkar
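
[For readers following along: a rough sketch of the configuration-based approach Eugene suggests - driving HadoopInputFormatIO at HCatalog purely through a Hadoop Configuration, with no subclassing. The HadoopInputFormatIO and HCatInputFormat calls match the APIs referenced in this thread; the database, table, and filter values are placeholders, and the required configuration keys ("mapreduce.job.inputformat.class", "key.class", "value.class") are assumptions based on what HadoopInputFormatIO expects.]

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hadoop.inputformat.HadoopInputFormatIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

public class HCatalogViaHadoopInputFormatIO {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Point the generic connector at HCatalog via configuration alone.
    conf.setClass("mapreduce.job.inputformat.class",
        HCatInputFormat.class, org.apache.hadoop.mapreduce.InputFormat.class);
    conf.setClass("key.class", WritableComparable.class, Object.class);
    conf.setClass("value.class", HCatRecord.class, Object.class);

    // The one HCatalog-specific line from the draft implementation;
    // database/table/filter values here are placeholders.
    HCatInputFormat.setInput(conf, "my_database", "my_table", "part='2017'");

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<KV<WritableComparable, HCatRecord>> rows =
        p.apply(HadoopInputFormatIO.<WritableComparable, HCatRecord>read()
            .withConfiguration(conf));
    p.run().waitUntilFinish();
  }
}
```

If this works, HCatalogIO could reduce to a thin builder that assembles this Configuration, leaving split computation and reading entirely to HadoopInputFormatIO.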