Re: Python SDK: BigTableIO

Matthias Baetens Thu, 01 Jun 2017 02:19:47 -0700

Hi Cham, Stephen,

Thanks a lot for the input, really useful to get started.


We'll probably start by implementing the Source (it looks the most
straightforward). I made a working document
<https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing>
to organise and track our progress; happy to discuss or receive feedback
there as well. We also opened a JIRA issue
<https://issues.apache.org/jira/browse/BEAM-2395>; should we get
assigned to it?
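As a first, hypothetical sketch of the static-splitting part of the Source (following Cham's suggestion of an iobase.BoundedSource wrapped in a PTransform), this is how we imagine split() could carve the table's row-key space into bundles. It is stdlib-only and not Beam or Bigtable API: KeySample, KeyRangeBundle and split_key_range are our own names, and we assume the samples look like Bigtable's sample-row-keys output (sorted keys, offset_bytes approximating the bytes stored before that key, and the last entry being the empty end-of-table key), as returned by something like the Python client's Table.sample_row_keys().

```python
from typing import List, NamedTuple


class KeySample(NamedTuple):
    """One row-key sample (assumed shape): offset_bytes ~ size of all rows before row_key."""
    row_key: bytes
    offset_bytes: int


class KeyRangeBundle(NamedTuple):
    """Hypothetical stand-in for a SourceBundle: a [start_key, stop_key) range and its weight.

    An empty start_key means "start of table"; an empty stop_key means "end of table".
    """
    start_key: bytes
    stop_key: bytes
    weight: int


def split_key_range(samples: List[KeySample], desired_bundle_size: int) -> List[KeyRangeBundle]:
    """Group consecutive sampled ranges into bundles of roughly desired_bundle_size bytes.

    Assumes samples are sorted by row_key and end with the empty end-of-table key.
    """
    if desired_bundle_size <= 0:
        raise ValueError("desired_bundle_size must be positive")
    if not samples or samples[-1].row_key != b"":
        raise ValueError("expected the last sample to be the end-of-table key b''")

    total_bytes = samples[-1].offset_bytes
    bundles: List[KeyRangeBundle] = []
    start_key, start_offset = b"", 0
    for sample in samples[:-1]:
        size = sample.offset_bytes - start_offset
        # Close the current range once it is big enough; the sample key becomes the split point.
        if size >= desired_bundle_size:
            bundles.append(KeyRangeBundle(start_key, sample.row_key, size))
            start_key, start_offset = sample.row_key, sample.offset_bytes
    # Whatever is left (possibly smaller than desired) runs to the end of the table.
    bundles.append(KeyRangeBundle(start_key, b"", total_bytes - start_offset))
    return bundles
```

With samples at offsets 40, 90 and 130 bytes in a 200-byte table, split_key_range(samples, 50) yields two contiguous ranges of 90 and 110 bytes. In an actual BoundedSource, split() would yield each range as an iobase.SourceBundle(weight, source, start_position, stop_position) and read() would scan only that range; dynamic splitting would additionally need a range tracker over row keys.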

About writing the Sink: are there any previous examples we could take
inspiration from? I think it would be good to discuss this in more detail
once we finish writing the Source.
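Since Cham suggests writing the Sink as a plain PTransform rather than using iobase.Sink, one pattern we are considering is a DoFn that buffers rows per bundle and writes them in batches. The sketch is hypothetical and runs without Beam: BatchingWriteFn and flush_fn are our own names, the methods only mirror the DoFn start_bundle/process/finish_bundle lifecycle, and in a real connector flush_fn would wrap a Bigtable client call (something like the Python client's Table.mutate_rows).

```python
from typing import Callable, List


class BatchingWriteFn:
    """Plain-Python stand-in for a beam.DoFn that writes elements in batches.

    Hypothetical sketch: a real sink would subclass apache_beam.DoFn and be applied
    inside a PTransform's expand(); flush_fn stands in for the Bigtable client call.
    """

    def __init__(self, flush_fn: Callable[[List[object]], None], max_batch_size: int = 100):
        if max_batch_size <= 0:
            raise ValueError("max_batch_size must be positive")
        self._flush_fn = flush_fn
        self._max_batch_size = max_batch_size
        self._batch: List[object] = []

    def start_bundle(self) -> None:
        # Every bundle starts with an empty buffer.
        self._batch = []

    def process(self, element: object) -> None:
        # Buffer the element and write once the batch is full; nothing is emitted downstream.
        self._batch.append(element)
        if len(self._batch) >= self._max_batch_size:
            self._flush()

    def finish_bundle(self) -> None:
        # Write the remainder so nothing is left buffered when the bundle ends.
        self._flush()

    def _flush(self) -> None:
        if self._batch:
            self._flush_fn(self._batch)
            # Rebind rather than clear(), so flush_fn may keep the list it was given.
            self._batch = []
```

In a pipeline, the Write PTransform's expand() would then mostly amount to applying beam.ParDo with such a DoFn to the input PCollection. Batching keeps the number of write RPCs down, and flushing in finish_bundle ensures no rows are left behind at the end of a bundle.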

Matthias

On Tue, May 30, 2017 at 7:28 PM, Stephen Sisk <[email protected]>
wrote:

> Hey Matthias,
>
> To add to what Chamikara mentioned, we have lots of info in the generic
> IO authoring guide [1], the Python IO authoring guide [2] and the
> PTransform Style Guide [3]. The PTransform style guide doesn't sound like
> it applies, but it has a lot of specific tips from lessons we've learned in
> the past from I/O work.
>
> If you plan on contributing it back to the community, I'd also suggest
> opening a JIRA issue and updating the Beam website (e.g. [4]) to note that
> you're working on this (those steps are pretty trivial).
>
> We've recently been trying out using branches when we add new I/Os, since
> the PRs tend to get bigger than we like for a single PR.
>
> Please feel free to email the dev mailing list if you have questions! We
> are excited and happy to help out with thinking about design, etc. (e.g., as
> Cham hinted at, should you use a Source or regular ParDo transforms?)
>
> S
>
> [1] https://beam.apache.org/documentation/io/authoring-overview/
> [2] https://beam.apache.org/documentation/sdks/python-custom-io/
> [3] https://beam.apache.org/contribute/ptransform-style-guide/
> [4] https://github.com/apache/beam-site/pull/250
>
> On Sun, May 28, 2017 at 5:32 PM Chamikara Jayalath <[email protected]>
> wrote:
>
> > Thanks for offering to help. I would suggest looking into the existing Java
> > BigTableIO connector and the currently available Python client library for
> > Cloud BigTable to see how feasible it is to develop an efficient BigTable
> > connector at this point. From the Python SDK's perspective, you can use the
> > iobase.BoundedSource API (wrapped by a PTransform) to develop a read
> > PTransform with support for dynamic/static splitting. Sinks are usually
> > developed as PTransforms (the iobase.Sink interface is deprecated, so I
> > suggest not using it). I would be happy to review any PRs related to this.
> >
> > Thanks,
> > Cham
> >
> > On Sun, May 28, 2017 at 2:30 AM Matthias Baetens <[email protected]> wrote:
> >
> > > Hey guys,
> > >
> > > We have been using Beam for quite a few months now, so we (my colleague
> > > Robert & I) thought it might be cool to contribute a bit as well.
> > >
> > > The challenge we want to take up is writing the BigTableIO for the Python
> > > SDK (which is not yet in the works according to the website
> > > <https://github.com/apache/beam-site/blob/asf-site/src/documentation/io/built-in.md>).
> > > I have searched JIRA for the BigTableIO issue and did not find it, so I
> > > suppose this is the first step we take.
> > >
> > > Any pointers or feedback more than welcome!
> > >
> > > Best,
> > >
> > > Matthias
> > >
> >
>



-- 


*Matthias Baetens*


*datatonic | data power unleashed*

office +44 203 668 3680  |  mobile +44 74 918 20646

Level24 | 1 Canada Square | Canary Wharf | E14 5AB London


We've been announced
<https://blog.google/topics/google-cloud/investing-vibrant-google-cloud-ecosystem-new-programs-and-partnerships/>
as one of the top global Google Cloud Machine Learning partners.
