Hello, we are in sync: I was just getting back to needing that same
functionality. Last time I checked (end of November 2019) there were still
many things missing. First, the External transform is not yet correctly
exposed to SDK users (see the previous discussion [1] and Jira ticket
BEAM-8546 [2]).

I also hit file staging issues. I am not sure yet whether those were a
problem on my side or something that should be fixed too, but I will
probably take a look at this soon. Max, Heejong, or anyone more familiar
with cross-language pipelines: do you have any info on progress in this area?

[1]
https://lists.apache.org/thread.html/28f44041748deff8a587a149b4fcf0a8d13d219b32c5063979072474%40%3Cdev.beam.apache.org%3E
[2] https://issues.apache.org/jira/browse/BEAM-8546


On Tue, Jan 21, 2020 at 10:18 AM Michał Walenia <[email protected]>
wrote:

> Is using Python from Java via ExternalTransform working and tested?
>
> On Tue, Jan 21, 2020 at 6:50 AM Reza Rokni <[email protected]> wrote:
>
>> +1 for using cross language transforms.
>>
>> On Thu, 16 Jan 2020 at 01:23, Ahmet Altay <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, Jan 15, 2020 at 8:12 AM Kamil Wasilewski <
>>> [email protected]> wrote:
>>>
>>>> Based on your feedback, I think it'd be fine to deal with the problem
>>>> as follows:
>>>> * for Python: put the transforms into
>>>> `sdks/python/apache_beam/io/gcp/ai`
>>>> * for Java: create a `google-cloud-platform-ai` module in
>>>> `sdks/java/extensions` folder
>>>>
>>>> As for cross language, we expect those transforms to be quite simple,
>>>> so the cost of implementing them twice is not that high.
>>>>
>>>
>>> One option would be to implement inference in a library like tfx_bsl
>>> [1]. It comes with a generalized Beam transform that can do inference
>>> either from a saved model file or by using a service endpoint. The service
>>> endpoint API option is there and could support cloud AI APIs. If we utilize
>>> tfx_bsl, we will leverage the existing TFX integration and would avoid
>>> creating a parallel set of transforms. Then for Java, we could enable the
>>> same interface with cross language transform and offer a unified inference
>>> API for both languages.
>>>
>>> [1]
>>> https://github.com/tensorflow/tfx-bsl/blob/a9f5b6128309595570cc6212f8076e7a20063ac2/tfx_bsl/beam/run_inference.py#L78
>>>
>>>
>>>
>>>>
>>>> Thanks for your input,
>>>> Kamil
>>>>
>>>> On Wed, Jan 15, 2020 at 7:58 AM Alex Van Boxel <[email protected]>
>>>> wrote:
>>>>
>>>>> If it's in Java, also be careful to align with the current Google Cloud
>>>>> IOs, especially their dependencies. The Google IOs do not depend on the
>>>>> newest client libraries, and that's something we sometimes struggle
>>>>> with when we depend on our own client libraries. So make sure to align
>>>>> them.
>>>>>
>>>>> Also note that although gRPC is vendored, the Google IOs still have
>>>>> their own dependency on gRPC, and this is the biggest source of
>>>>> trouble.
>>>>>
>>>>>  _/
>>>>> _/ Alex Van Boxel
>>>>>
>>>>>
>>>>> On Wed, Jan 15, 2020 at 1:18 AM Luke Cwik <[email protected]> wrote:
>>>>>
>>>>>> It depends on what language the client libraries are exposed in. For
>>>>>> example, if the client libraries are in Java, sdks/java/extensions makes
>>>>>> sense, while if it's Python then integrating it within the gcp extension
>>>>>> within sdks/python/apache_beam makes sense.
>>>>>>
>>>>>> Adding additional dependencies is ok depending on the licensing and
>>>>>> the process is slightly different for each language.
>>>>>>
>>>>>> For transforms that are complicated, there is a cross-language effort
>>>>>> going on so that one can execute one language's transforms within another
>>>>>> language's pipeline, which may remove the need to write the transforms
>>>>>> more than once.
>>>>>>
>>>>>> On Tue, Jan 14, 2020 at 7:43 AM Ismaël Mejía <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Nice idea, IO looks like a good place for them but there is another
>>>>>>> path that could fit this case: `sdks/java/extensions`, some module like
>>>>>>> `google-cloud-platform-ai` in that folder or something like that, no?
>>>>>>>
>>>>>>> In any case great initiative. +1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 14, 2020 at 4:22 PM Kamil Wasilewski <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We’d like to implement a set of PTransforms that would allow users
>>>>>>>> to use some of the Google Cloud AI services in Beam pipelines.
>>>>>>>>
>>>>>>>> Here's the full list of services and functionalities we’d like to
>>>>>>>> integrate Beam with:
>>>>>>>>
>>>>>>>> * Video Intelligence [1]
>>>>>>>>
>>>>>>>> * Cloud Natural Language [2]
>>>>>>>>
>>>>>>>> * Cloud AI Platform Prediction [3]
>>>>>>>>
>>>>>>>> * Data Masking/Tokenization [4]
>>>>>>>>
>>>>>>>> * Inspecting image data for sensitive information using Cloud
>>>>>>>> Vision [5]
>>>>>>>>
>>>>>>>> However, we're not sure whether to put those transforms directly
>>>>>>>> into Beam, because they would require some additional GCP
>>>>>>>> dependencies. One of our ideas is a separate library that depends on
>>>>>>>> Beam and that can be installed optionally, stored somewhere in the
>>>>>>>> Beam repository (e.g. in the BEAM_ROOT/extras directory). Do you think
>>>>>>>> it is a reasonable approach? Or maybe it is totally fine to put them
>>>>>>>> into SDKs, just like other IOs?
>>>>>>>>
>>>>>>>> If you have any other thoughts, do not hesitate to let us know.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Kamil
>>>>>>>>
>>>>>>>> [1] https://cloud.google.com/video-intelligence/
>>>>>>>>
>>>>>>>> [2] https://cloud.google.com/natural-language/
>>>>>>>>
>>>>>>>> [3] https://cloud.google.com/ml-engine/docs/prediction-overview
>>>>>>>>
>>>>>>>> [4]
>>>>>>>> https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#dlptexttobigquerystreaming
>>>>>>>>
>>>>>>>> [5] https://cloud.google.com/vision/
>>>>>>>>
>>>>>>>
>>
>
>
> --
>
> Michał Walenia
> Polidea <https://www.polidea.com/> | Software Engineer
>
> M: +48 791 432 002 <+48791432002>
> E: [email protected]
>
> Unique Tech
> Check out our projects! <https://www.polidea.com/our-work>
>
