Thanks. I added some comments to the doc.

Davor should be able to assign this JIRA to you. Also, Solomon, who
implemented the Java BigTable connector, might have more input here.

- Cham

On Thu, Jun 1, 2017 at 2:19 AM Matthias Baetens <
[email protected]> wrote:

> Hi Cham, Stephan,
>
> Thanks a lot for the input, really useful to get started.
>
> We'll probably start with implementing the Source (looks the most
> straightforward).
> I made a working document
> <https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing>
> to organise and track our progress a bit; we are happy to discuss or receive
> feedback there as well. We made a JIRA issue
> <https://issues.apache.org/jira/browse/BEAM-2395> as well; should we get
> assigned to it?
>
> About writing the Sink: are there any examples of how this was done
> previously that we can draw inspiration from? I think it would be good
> to discuss this in more detail once we finish writing the Source.
>
> Matthias
>
> On Tue, May 30, 2017 at 7:28 PM, Stephen Sisk <[email protected]>
> wrote:
>
> > Hey Matthias,
> >
> > to add on to what Chamikara mentioned, we have lots of info in the generic
> > IO authoring guide [1], the Python IO authoring guide [2] and the
> > PTransform Style Guide [3]. The PTransform Style Guide doesn't sound like
> > it applies, but it has a lot of specific tips from lessons we've learned
> > in the past from I/O work.
> >
> > If you plan on contributing it back to the community, I'd also suggest
> > opening a JIRA issue and updating the Beam website (e.g. [4]) to note that
> > you're working on this (those steps are pretty trivial).
> >
> > We've recently been trying out using branches when we add new I/Os since
> > the PRs tend to get bigger than we like for a single PR.
> >
> > Please feel free to email the dev mailing list if you have questions! We
> > are excited and happy to help out with thinking about design, etc. (e.g.,
> > as Cham hinted at, should you use a Source vs. regular ParDo transforms?)
> >
> > S
> >
> > [1] https://beam.apache.org/documentation/io/authoring-overview/
> > [2] https://beam.apache.org/documentation/sdks/python-custom-io/
> > [3] https://beam.apache.org/contribute/ptransform-style-guide/
> > [4] https://github.com/apache/beam-site/pull/250
> >
> > On Sun, May 28, 2017 at 5:32 PM Chamikara Jayalath <[email protected]>
> > wrote:
> >
> > > Thanks for offering to help. I would suggest looking into the existing
> > > Java BigTableIO connector and the currently available Python client
> > > library for Cloud BigTable to see how feasible it is to develop an
> > > efficient BigTable connector at this point. From the Python SDK's
> > > perspective, you can use the iobase.BoundedSource API (wrapped by a
> > > PTransform) to develop a read PTransform with support for dynamic/static
> > > splitting. Sinks are usually developed as PTransforms (the iobase.Sink
> > > interface is deprecated, so I suggest not using it). I would be happy to
> > > review any PRs related to this.
> > >
> > > Thanks,
> > > Cham
> > >
> > > On Sun, May 28, 2017 at 2:30 AM Matthias Baetens <
> > > [email protected]> wrote:
> > >
> > > > Hey guys,
> > > >
> > > > We have been using Beam for quite a few months now, so we (my colleague
> > > > Robert & I) thought it might be cool to contribute a bit as well.
> > > >
> > > > The challenge we want to take up is writing the BigTableIO for the Python
> > > > SDK (which is not yet in the works according to the website
> > > > <https://github.com/apache/beam-site/blob/asf-site/src/documentation/io/built-in.md>).
> > > > I have searched JIRA for the BigTableIO issue and did not find it, so I
> > > > suppose this is the first step we take.
> > > >
> > > > Any pointers or feedback more than welcome!
> > > >
> > > > Best,
> > > >
> > > > Matthias
> > > >
> > >
> >
>
>
>
> --
>
>
> *Matthias Baetens*
>
>
> *datatonic | data power unleashed*
>
> office +44 203 668 3680  |  mobile +44 74 918 20646
>
> Level24 | 1 Canada Square | Canary Wharf | E14 5AB London
>
>
