Hi Sergio,
By the way, you can also use TensorFrames, which lets you use TensorFlow
directly on Spark DataFrames, with more direct access. I discussed this
with Tim Hunter from Databricks, who is working on TensorFrames.
Back to Beam, here is what you could do:
1. Expose the service in a microservice container (for instance
Apache Karaf ;))
2. In your pipeline, you have two options:
2.a. In a DoFn, create the REST client in @Setup (using CXF, or
whatever), then call the service (hosted by Karaf) in @ProcessElement.
2.b. I also have a RestIO (source and sink) that can request a REST
endpoint. However, for now, this IO acts as a pipeline endpoint
(PTransform<PBegin, PCollection> or PTransform<PCollection, PDone>). In
your case, if the service called is a step of your pipeline, ParDo(your
DoFn) would be easier.
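To make option 2.a more concrete, here is a minimal sketch of the lifecycle in plain Java. It is not actual Beam code: in a pipeline the class would extend DoFn<String, String>, and the three methods would carry @Setup, @ProcessElement and @Teardown. The names ClassifierClient, RemoteClassifyFn and RemoteClassifyDemo are hypothetical, and the client is a stub standing in for a real REST client (CXF or other) pointing at the service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for a REST client (CXF or other) bound to the service. */
interface ClassifierClient extends AutoCloseable {
    String classify(String element);

    @Override
    default void close() {}
}

/**
 * Plain-Java mirror of the DoFn lifecycle; it does not depend on Beam.
 * In a pipeline, this class would extend DoFn<String, String> and the methods
 * below would carry @Setup, @ProcessElement and @Teardown.
 */
class RemoteClassifyFn {
    private final Supplier<ClassifierClient> clientFactory;
    private ClassifierClient client;

    RemoteClassifyFn(Supplier<ClassifierClient> clientFactory) {
        this.clientFactory = clientFactory;
    }

    /** Runs once per instance: build the client here, not per element. */
    void setup() {
        client = clientFactory.get();
    }

    /** Runs per element: call the service and emit its answer. */
    void processElement(String element, Consumer<String> output) {
        output.accept(client.classify(element));
    }

    /** Runs when the instance is discarded: release the client. */
    void teardown() {
        if (client != null) {
            client.close();
            client = null;
        }
    }
}

class RemoteClassifyDemo {
    public static void main(String[] args) {
        // Stub client; a real one would send an HTTP request to the service.
        RemoteClassifyFn fn = new RemoteClassifyFn(() -> element -> "label-for:" + element);
        List<String> results = new ArrayList<>();
        fn.setup();
        for (String element : List.of("a", "b")) {
            fn.processElement(element, results::add);
        }
        fn.teardown();
        System.out.println(results);
    }
}
```

The point of the split is that the client is created once per DoFn instance and reused for every element, so the service (and its scaling) stays fully outside the pipeline.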
Is that what you mean by microservice?
Regards
JB
On 11/25/2016 01:18 PM, Sergio Fernández wrote:
Hi JB,
On Tue, Nov 22, 2016 at 11:14 AM, Jean-Baptiste Onofré <[email protected]>
wrote:
A DoFn executes per element (optionally with hooks on StartBundle,
FinishBundle, and Teardown). That is basically how it works in an IO
WriteFn: we create the connection in StartBundle and send each element
(in a batch) to the external resource.
A PTransform may be more flexible when interacting with "outside"
resources.
A PTransform would probably be a better place. I'm still pretty new to
some of the Beam terms and APIs.
Do you have a use case, so I can be sure I understand?
Yes. Well, it's far more complex, but I can simplify the question:
We have a TensorFlow-based classifier. In our pipeline one step performs
that classification of the data. Currently it's implemented as a Spark
Function, because TensorFlow models can directly be embedded within
pipelines using PySpark.
Therefore I'm looking for the best option to move this classification
process one level up in the abstraction with Beam, so I can make it
portable. The first idea I'm exploring is relying on an external function
(i.e., a microservice) that I'd need to scale up and down independently of
the pipeline. So I'd more than welcome a discussion of ideas ;-)
Thanks.
Cheers,
On 11/22/2016 10:39 AM, Sergio Fernández wrote:
Hi,
I'd like to resume the idea of having TensorFlow-based tasks running in a
Beam pipeline. So far the cleanest approach I can imagine would be to have
them running outside (Functions in GCP, Lambdas in AWS, microservices
generally speaking).
Therefore, does the current Beam model provide the notion of a DoFn that
actually runs externally?
Thanks in advance for the feedback.
Cheers,
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: [email protected]
w: http://redlink.co