Hey Matthias, to add on to what Chamikara mentioned, we have lots of info in the generic IO authoring guide [1], the Python IO authoring guide [2] and the PTransform Style Guide [3]. The PTransform Style Guide may not sound like it applies, but it has a lot of specific tips from lessons we've learned in past I/O work.
If you plan on contributing it back to the community, I'd also suggest opening a JIRA issue and updating the Beam website (e.g. [4]) to show that you're working on this (those steps are pretty trivial). We've recently been trying out branches when we add new I/Os, since the PRs tend to get bigger than we'd like for a single PR.

Please feel free to email the dev mailing list if you have questions! We are excited and happy to help out with thinking about design, etc. (e.g., as Cham hinted at, should you use a Source or regular ParDo transforms?)

S

[1] https://beam.apache.org/documentation/io/authoring-overview/
[2] https://beam.apache.org/documentation/sdks/python-custom-io/
[3] https://beam.apache.org/contribute/ptransform-style-guide/
[4] https://github.com/apache/beam-site/pull/250

On Sun, May 28, 2017 at 5:32 PM Chamikara Jayalath <[email protected]> wrote:

> Thanks for offering to help. I would suggest looking into the existing Java
> BigTableIO connector and the currently available Python client library for
> Cloud BigTable to see how feasible it is to develop an efficient BigTable
> connector at this point. From the Python SDK's perspective you can use the
> iobase.BoundedSource API (wrapped by a PTransform) to develop a read
> PTransform with support for dynamic/static splitting. Sinks are usually
> developed as PTransforms (the iobase.Sink interface is deprecated, so I
> suggest not using it). I would be happy to review any PRs related to this.
>
> Thanks,
> Cham
>
> On Sun, May 28, 2017 at 2:30 AM Matthias Baetens <[email protected]> wrote:
>
> > Hey guys,
> >
> > We have been using Beam for quite a few months now, so we (my colleague
> > Robert & I) thought it might be cool to contribute a bit as well.
> >
> > The challenge we want to take up is writing the BigTableIO for the Python
> > SDK (which is not yet in the works according to the website
> > <https://github.com/apache/beam-site/blob/asf-site/src/documentation/io/built-in.md>).
> >
> > I have searched JIRA for the BigTableIO issue and did not find it, so I
> > suppose this is the first step we take.
> >
> > Any pointers or feedback more than welcome!
> >
> > Best,
> >
> > Matthias
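
To make the "static splitting" idea from Cham's note a bit more concrete, here is a rough sketch in plain Python. It deliberately does not import Beam, so it runs anywhere. The class `FakeTableSource`, the helper `read_all`, and the in-memory "table" are all hypothetical stand-ins. The method names loosely mirror iobase.BoundedSource (estimate_size, split, read), but the signatures are simplified: a real source would subclass iobase.BoundedSource, and its read() would work through a range tracker rather than explicit start/stop arguments.

```python
# Stand-in sketch only: plain Python, no Beam import.
# Names and signatures are simplified assumptions, not the Beam API.


class FakeTableSource:
    """Hypothetical bounded source over an in-memory, key-sorted 'table'."""

    def __init__(self, rows):
        # rows: iterable of (row_key, value) tuples; kept sorted by row_key.
        self._rows = sorted(rows)

    def estimate_size(self):
        # Rough size estimate; here simply the number of rows.
        return len(self._rows)

    def split(self, desired_bundle_size):
        # Static splitting: cut the index range [0, n) into chunks of at
        # most desired_bundle_size rows, yielding (start, stop) pairs.
        if desired_bundle_size <= 0:
            raise ValueError("desired_bundle_size must be positive")
        n = len(self._rows)
        for start in range(0, n, desired_bundle_size):
            yield (start, min(start + desired_bundle_size, n))

    def read(self, start, stop):
        # Read one bundle: rows with index in [start, stop).
        for i in range(start, stop):
            yield self._rows[i]


def read_all(source, desired_bundle_size):
    # Reads every bundle in order; a runner could instead hand each
    # bundle to a different worker.
    out = []
    for start, stop in source.split(desired_bundle_size):
        out.extend(source.read(start, stop))
    return out
```

The point of the sketch is only that each bundle can be read independently, which is what lets a runner parallelize a read. Whether a real BigTable connector is better served by a Source or by regular ParDo transforms is the design question raised above, and this sketch does not answer it.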
