On Wed, Jan 15, 2020 at 8:12 AM Kamil Wasilewski < [email protected]> wrote:
> Based on your feedback, I think it'd be fine to deal with the problem as
> follows:
> * for Python: put the transforms into `sdks/python/apache_beam/io/gcp/ai`
> * for Java: create a `google-cloud-platform-ai` module in
> `sdks/java/extensions` folder
>
> As for cross language, we expect those transforms to be quite simple, so
> the cost of implementing them twice is not that high.
>

One option would be to implement inference in a library like tfx_bsl [1].
It comes with a generalized Beam transform that can do inference either
from a saved model file or by using a service endpoint. The service
endpoint API option is there and could support cloud AI APIs. If we
utilize tfx_bsl, we will leverage the existing TFX integration and would
avoid creating a parallel set of transforms. Then for Java, we could
enable the same interface with cross language transform and offer a
unified inference API for both languages.

[1]
https://github.com/tensorflow/tfx-bsl/blob/a9f5b6128309595570cc6212f8076e7a20063ac2/tfx_bsl/beam/run_inference.py#L78

>
> Thanks for your input,
> Kamil
>
> On Wed, Jan 15, 2020 at 7:58 AM Alex Van Boxel <[email protected]> wrote:
>
>> If it's in Java also be careful to align with the current Google Cloud
>> IOs, certainly its dependencies. The Google IOs are not depending on the
>> newest client libraries and that's something we're sometimes struggling
>> with when we depend on our own client libraries. So make sure to align them.
>>
>> Also note that although gRPC is vendored, the Google IOs do still have
>> their own dependency on gRPC and this is the biggest reason for trouble.
>>
>> _/
>> _/ Alex Van Boxel
>>
>>
>> On Wed, Jan 15, 2020 at 1:18 AM Luke Cwik <[email protected]> wrote:
>>
>>> It depends on what language the client libraries are exposed in.
>>> For example, if the client libraries are in Java, sdks/java/extensions
>>> makes sense while if it's Python then integrating it within the gcp
>>> extension within sdks/python/apache_beam makes sense.
>>>
>>> Adding additional dependencies is ok depending on the licensing and the
>>> process is slightly different for each language.
>>>
>>> For transforms that are complicated, there is a cross language effort
>>> going on so that one can execute one language's transforms within another
>>> language's pipeline which may remove the need to write the transforms more
>>> than once.
>>>
>>> On Tue, Jan 14, 2020 at 7:43 AM Ismaël Mejía <[email protected]> wrote:
>>>
>>>> Nice idea, IO looks like a good place for them but there is another
>>>> path that could fit this case: `sdks/java/extensions`, some module like
>>>> `google-cloud-platform-ai` in that folder or something like that, no?
>>>>
>>>> In any case great initiative. +1
>>>>
>>>> On Tue, Jan 14, 2020 at 4:22 PM Kamil Wasilewski <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We’d like to implement a set of PTransforms that would allow users to
>>>>> use some of the Google Cloud AI services in Beam pipelines.
>>>>>
>>>>> Here's the full list of services and functionalities we’d like to
>>>>> integrate Beam with:
>>>>>
>>>>> * Video Intelligence [1]
>>>>> * Cloud Natural Language [2]
>>>>> * Cloud AI Platform Prediction [3]
>>>>> * Data Masking/Tokenization [4]
>>>>> * Inspecting image data for sensitive information using Cloud Vision
>>>>> [5]
>>>>>
>>>>> However, we're not sure whether to put those transforms directly into
>>>>> Beam, because they would require some additional GCP dependencies. One of
>>>>> our ideas is a separate library, that depends on Beam and that can be
>>>>> installed optionally, stored somewhere in the beam repository (e.g. in the
>>>>> BEAM_ROOT/extras directory). Do you think it is a reasonable approach?
>>>>> Or maybe it is totally fine to put them into SDKs, just like other IOs?
>>>>>
>>>>> If you have any other thoughts, do not hesitate to let us know.
>>>>>
>>>>> Best,
>>>>>
>>>>> Kamil
>>>>>
>>>>> [1] https://cloud.google.com/video-intelligence/
>>>>>
>>>>> [2] https://cloud.google.com/natural-language/
>>>>>
>>>>> [3] https://cloud.google.com/ml-engine/docs/prediction-overview
>>>>>
>>>>> [4]
>>>>> https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#dlptexttobigquerystreaming
>>>>>
>>>>> [5] https://cloud.google.com/vision/
>>>>>
>>>>
