On Wed, Jan 15, 2020 at 8:12 AM Kamil Wasilewski <
[email protected]> wrote:

> Based on your feedback, I think it'd be fine to deal with the problem as
> follows:
> * for Python: put the transforms into `sdks/python/apache_beam/io/gcp/ai`
> * for Java: create a `google-cloud-platform-ai` module in
> `sdks/java/extensions` folder
>
> As for cross language, we expect those transforms to be quite simple, so
> the cost of implementing them twice is not that high.
>

One option would be to implement inference in a library like tfx_bsl [1].
It comes with a generalized Beam transform that can do inference either
from a saved model file or by using a service endpoint. The service
endpoint option is there and could support the Cloud AI APIs. If we used
tfx_bsl, we would leverage the existing TFX integration and avoid creating
a parallel set of transforms. Then for Java, we could expose the same
interface through a cross-language transform and offer a unified inference
API for both languages.

[1]
https://github.com/tensorflow/tfx-bsl/blob/a9f5b6128309595570cc6212f8076e7a20063ac2/tfx_bsl/beam/run_inference.py#L78
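
To make the "saved model file or service endpoint" idea a bit more concrete,
here is a rough sketch of the shape such an interface could take. It is
plain Python, not Beam or tfx_bsl code, and every name in it
(`SavedModelSpec`, `ServiceEndpointSpec`, `make_predictor`, `run_inference`,
and the `load_model` / `call_service` callbacks) is made up for illustration.
A real transform would wrap this kind of dispatch in a PTransform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Union

# NOTE: all names here are hypothetical; this is not the tfx_bsl API.


@dataclass(frozen=True)
class SavedModelSpec:
    """Inference from a model stored on disk."""
    model_path: str


@dataclass(frozen=True)
class ServiceEndpointSpec:
    """Inference by calling a remote prediction service."""
    endpoint: str


InferenceSpec = Union[SavedModelSpec, ServiceEndpointSpec]
Predictor = Callable[[List[str]], List[str]]


def make_predictor(
    spec: InferenceSpec,
    load_model: Callable[[str], Predictor],
    call_service: Callable[[str, List[str]], List[str]],
) -> Predictor:
    """Pick the inference backend based on the type of the spec."""
    if isinstance(spec, SavedModelSpec):
        # Load the model once; the returned callable is reused per batch.
        return load_model(spec.model_path)
    if isinstance(spec, ServiceEndpointSpec):
        return lambda batch: call_service(spec.endpoint, batch)
    raise TypeError(f"Unsupported inference spec: {spec!r}")


def run_inference(
    examples: Iterable[str],
    spec: InferenceSpec,
    load_model: Callable[[str], Predictor],
    call_service: Callable[[str, List[str]], List[str]],
    batch_size: int = 2,
) -> Iterator[str]:
    """Batch the examples and send each batch to the chosen backend."""
    predict = make_predictor(spec, load_model, call_service)
    batch: List[str] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield from predict(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield from predict(batch)
```

The point of the sketch is only that callers pass a spec and never care which
backend runs, which is what would let a Java wrapper reuse the same interface.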



>
> Thanks for your input,
> Kamil
>
> On Wed, Jan 15, 2020 at 7:58 AM Alex Van Boxel <[email protected]> wrote:
>
>> If it's in Java, also be careful to align with the current Google Cloud
>> IOs, especially their dependencies. The Google IOs do not depend on the
>> newest client libraries, and that's something we sometimes struggle
>> with when we depend on our own client libraries. So make sure to align them.
>>
>> Also note that although gRPC is vendored, the Google IOs still have
>> their own dependency on gRPC, and this is the biggest source of trouble.
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Wed, Jan 15, 2020 at 1:18 AM Luke Cwik <[email protected]> wrote:
>>
>>> It depends on what language the client libraries are exposed in. For
>>> example, if the client libraries are in Java, sdks/java/extensions makes
>>> sense, while if it's Python, then integrating it within the gcp extension
>>> in sdks/python/apache_beam makes sense.
>>>
>>> Adding additional dependencies is OK, depending on the licensing; the
>>> process is slightly different for each language.
>>>
>>> For transforms that are complicated, there is a cross-language effort
>>> going on so that one can execute one language's transforms within another
>>> language's pipeline, which may remove the need to write the transforms more
>>> than once.
>>>
>>> On Tue, Jan 14, 2020 at 7:43 AM Ismaël Mejía <[email protected]> wrote:
>>>
>>>> Nice idea, IO looks like a good place for them but there is another
>>>> path that could fit this case: `sdks/java/extensions`, some module like
>>>> `google-cloud-platform-ai` in that folder or something like that, no?
>>>>
>>>> In any case great initiative. +1
>>>>
>>>>
>>>>
>>>> On Tue, Jan 14, 2020 at 4:22 PM Kamil Wasilewski <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We’d like to implement a set of PTransforms that would allow users to
>>>>> use some of the Google Cloud AI services in Beam pipelines.
>>>>>
>>>>> Here's the full list of services and functionalities we’d like to
>>>>> integrate Beam with:
>>>>>
>>>>> * Video Intelligence [1]
>>>>>
>>>>> * Cloud Natural Language [2]
>>>>>
>>>>> * Cloud AI Platform Prediction [3]
>>>>>
>>>>> * Data Masking/Tokenization [4]
>>>>>
>>>>> * Inspecting image data for sensitive information using Cloud Vision
>>>>> [5]
>>>>>
>>>>> However, we're not sure whether to put those transforms directly into
>>>>> Beam, because they would require some additional GCP dependencies. One of
>>>>> our ideas is a separate library, that depends on Beam and that can be
>>>>> installed optionally, stored somewhere in the beam repository (e.g. in the
>>>>> BEAM_ROOT/extras directory). Do you think it is a reasonable approach? Or
>>>>> maybe it is totally fine to put them into SDKs, just like other IOs?
>>>>>
>>>>> If you have any other thoughts, do not hesitate to let us know.
>>>>>
>>>>> Best,
>>>>>
>>>>> Kamil
>>>>>
>>>>> [1] https://cloud.google.com/video-intelligence/
>>>>>
>>>>> [2] https://cloud.google.com/natural-language/
>>>>>
>>>>> [3] https://cloud.google.com/ml-engine/docs/prediction-overview
>>>>>
>>>>> [4]
>>>>> https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#dlptexttobigquerystreaming
>>>>>
>>>>> [5] https://cloud.google.com/vision/
>>>>>
>>>>
