Hi Cham, Stephen,

Thanks a lot for the input; it's really useful for getting started.
We'll probably start by implementing the Source, since it looks the most straightforward. I made a working document <https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing> to organise and track our progress; we're happy to discuss or receive feedback there as well. We also created a JIRA issue <https://issues.apache.org/jira/browse/BEAM-2395>; should we get assigned to it?

About writing the Sink: are there any examples of how this was done previously that we could draw inspiration from? I think it would be good to discuss this in more detail once we finish writing the Source.

Matthias

On Tue, May 30, 2017 at 7:28 PM, Stephen Sisk <s...@google.com.invalid> wrote:

> Hey Matthias,
>
> to add on to what Chamikara mentioned, we have lots of info in the generic
> IO authoring guide [1], the Python IO authoring guide [2] and the
> PTransform Style Guide [3]. The PTransform style guide doesn't sound like
> it applies, but it has a lot of specific tips from lessons we've learned in
> the past from I/O work.
>
> If you plan on contributing it back to the community, I'd also suggest
> opening up a JIRA issue & updating the beam website (eg [4]) to note that
> you're working on this (those steps are pretty trivial.)
>
> We've recently been trying out using branches when we add new I/Os since
> the PRs tend to get bigger than we like for a single PR.
>
> Please feel free to email the dev mailing list if you have questions! We
> are excited and happy to help out with thinking about design/etc... (eg, as
> cham hinted at, should you use a Source vs. use regular ParDo transforms?)
>
> S
>
> [1] https://beam.apache.org/documentation/io/authoring-overview/
> [2] https://beam.apache.org/documentation/sdks/python-custom-io/
> [3] https://beam.apache.org/contribute/ptransform-style-guide/
> [4] https://github.com/apache/beam-site/pull/250
>
> On Sun, May 28, 2017 at 5:32 PM Chamikara Jayalath <chamik...@apache.org>
> wrote:
>
> > Thanks for offering to help. I would suggest looking into the existing Java
> > BigTableIO connector and the currently available Python client library for
> > Cloud BigTable to see how feasible it is to develop an efficient BigTable
> > connector at this point. From the Python SDK's perspective, you can use the
> > iobase.BoundedSource API (wrapped by a PTransform) to develop a read
> > PTransform with support for dynamic/static splitting. Sinks are usually
> > developed as PTransforms (the iobase.Sink interface is deprecated, so I
> > suggest not using it). I would be happy to review any PRs related to this.
> >
> > Thanks,
> > Cham
> >
> > On Sun, May 28, 2017 at 2:30 AM Matthias Baetens <
> > matthias.baet...@datatonic.com> wrote:
> >
> > > Hey guys,
> > >
> > > We have been using Beam for quite a few months now, so we (my colleague
> > > Robert & I) thought it might be cool to contribute a bit as well.
> > >
> > > The challenge we want to take up is writing the BigTableIO for the Python
> > > SDK (which is not yet in the works according to the website
> > > <https://github.com/apache/beam-site/blob/asf-site/src/documentation/io/built-in.md>).
> > > I have searched JIRA for the BigTableIO issue and did not find it, so I
> > > suppose this is the first step we take.
> > >
> > > Any pointers or feedback are more than welcome!
> > >
> > > Best,
> > >
> > > Matthias
> > >

--
*Matthias Baetens*
*datatonic | data power unleashed*
office +44 203 668 3680 | mobile +44 74 918 20646
Level24 | 1 Canada Square | Canary Wharf | E14 5AB London

We've been announced <https://blog.google/topics/google-cloud/investing-vibrant-google-cloud-ecosystem-new-programs-and-partnerships/> as one of the top global Google Cloud Machine Learning partners.