[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-19 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r395146160
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google Cloud AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use [Google Cloud AI Platform 
Prediction](https://cloud.google.com/ai-platform/prediction/docs/overview) to 
make predictions about new data from a cloud-hosted machine learning model.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library with a Beam 
PTransform called `RunInference`. `RunInference` is able to perform an 
inference that can use a service endpoint. When using a service endpoint, the 
transform takes a PCollection of type `tf.train.Example` and, for every batch 
of elements, sends a request to AI Platform Prediction. The transform produces 
a PCollection of type `PredictLog`, which contains predictions.
 
 Review comment:
  I believe the doc for BatchElements (the transform responsible for batching here) 
is sufficient. Here's a 
[link](https://github.com/apache/beam/blob/42d79c29949725b3afabfb3754bfb394be594460/sdks/python/apache_beam/transforms/util.py#L540)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-19 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r395118686
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
+
+Before getting started, deploy a machine learning model to the cloud. The 
cloud service manages the infrastructure needed to handle prediction requests 
in an efficient and scalable way. Only TensorFlow models are supported. For 
more information, see [Exporting a SavedModel for 
prediction](https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction).
 
 Review comment:
   In this case it's for receiving data. In other words, AI Platform Prediction 
exposes a service endpoint that can receive data. I'll clarify it.




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-19 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r395113820
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google Cloud AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use [Google Cloud AI Platform 
Prediction](https://cloud.google.com/ai-platform/prediction/docs/overview) to 
make predictions about new data from a cloud-hosted machine learning model.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library with a Beam 
PTransform called `RunInference`. `RunInference` is able to perform an 
inference that can use a service endpoint. When using a service endpoint, the 
transform takes a PCollection of type `tf.train.Example` and, for every batch 
of elements, sends a request to AI Platform Prediction. The transform produces 
a PCollection of type `PredictLog`, which contains predictions.
 
 Review comment:
   >  Is there a way to configure how many elements are in the batch?
   
   No, in this case Beam handles the size of a batch on its own.
   
   > I think the previous version of this paragraph stated that one request is 
sent per element.
   
   That's right. After the discussion on batching, I realize this is not 
entirely correct.




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-19 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r395071873
 
 

 ##
 File path: website/src/documentation/patterns/overview.md
 ##
 @@ -38,6 +38,9 @@ Pipeline patterns demonstrate common Beam use cases. 
Pipeline patterns are based
 **Custom window patterns** - Patterns for windowing functions
 * [Using data to dynamically set session window gaps]({{ site.baseurl 
}}/documentation/patterns/custom-windows/#using-data-to-dynamically-set-session-window-gaps)
 
+**AI Platform integration patterns** - Patterns for Google AI Platform 
transforms
 
 Review comment:
   Looks good. One minor comment: there are more Google Cloud AI Platform 
transforms. This pull request is just about Prediction; more patterns could be 
added later. So I'd stick with `Patterns for Google Cloud AI Platform 
transforms`




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-19 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r395049430
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
+
+Before getting started, deploy a machine learning model to the cloud. The 
cloud service manages the infrastructure needed to handle prediction requests 
in an efficient and scalable way. Only TensorFlow models are supported. For 
more information, see [Exporting a SavedModel for 
prediction](https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction).
 
 Review comment:
   Alright, but I'd rather keep this sentence: "only TensorFlow models are 
supported by the transform" anyway. AI Platform, unlike `RunInference`, doesn't 
only support TensorFlow models. It's important to emphasize that.




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-19 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r395037728
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
 
 Review comment:
   Fair enough, this is further evidence that this mention causes confusion. 
I'll remove it.




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-19 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r395033918
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
 
 Review comment:
   No, AI Platform is a service that manages cloud-hosted machine learning 
models.
   How about something like this?
   
   > This section shows how to use Google Cloud AI Platform Prediction  to make 
predictions about new data from a cloud-hosted machine learning model.
   
   I'll also add a link to an overview of AI Platform Prediction. 




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-17 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r393764367
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
+
+Before getting started, deploy a machine learning model to the cloud. The 
cloud service manages the infrastructure needed to handle prediction requests 
in an efficient and scalable way. Only TensorFlow models are supported. For 
more information, see [Exporting a SavedModel for 
prediction](https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction).
+
+Once a machine learning model is deployed, prepare a list of instances to get 
predictions for. 
+
+Here is an example of a pipeline that reads input instances from a file, 
converts JSON objects to `tf.train.Example` objects and sends data to the 
service. The content of a file can look like this:
+
+```
+{"input": "the quick brown"}
+{"input": "la bruja le"}
+``` 
+
+The example creates `tf.train.BytesList` instances, thus it expects byte-like 
strings as input, but other data types, like `tf.train.FloatList` and 
`tf.train.Int64List`, are also supported by the transform. To send binary data, 
make sure that the name of an input ends in `_bytes`.
 
 Review comment:
   Basically, the input format is quite simple: the transform expects a 
PCollection of `tf.train.Example` objects, which are well known in the 
TensorFlow world. But I agree that the information about sending binary data is 
not visible enough. I'll try to improve it. 
   
   > do you mean that we need to change l74 to something like:
   feature={name+'_bytes', value} for sending binary data to endpoint?
   
   Yes. It's a sign for the transform that the data should be b64-encoded. In a 
request we'd have something like this:
   
   `"instances": [{"input_bytes": {"b64": "xxx"}}]`
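   To illustrate that request shape, here is a minimal plain-Python sketch of 
building such a body by hand (stdlib only; the `make_instance` helper is 
illustrative, not part of tfx_bsl, which assembles the payload internally):

```python
import base64
import json

def make_instance(name, raw):
    # A name ending in `_bytes` signals that the value must be wrapped
    # in a {"b64": ...} object, per the AI Platform Prediction JSON format.
    if name.endswith('_bytes'):
        return {name: {'b64': base64.b64encode(raw).decode('ascii')}}
    return {name: raw.decode('utf-8')}

# One binary input and one plain-text input.
payload = {'instances': [make_instance('input_bytes', b'the quick brown'),
                         make_instance('input', b'la bruja le')]}
print(json.dumps(payload))
```

   Only the `_bytes`-suffixed input ends up b64-encoded; the other is sent as 
a plain string.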




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-16 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r393007213
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
+
+Before getting started, deploy a machine learning model to the cloud. The 
cloud service manages the infrastructure needed to handle prediction requests 
in an efficient and scalable way. Only TensorFlow models are supported. For 
more information, see [Exporting a SavedModel for 
prediction](https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction).
 
 Review comment:
   I'm curious what others think about this. @aaltay @Ardagan 




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-16 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r393005385
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
+
+Before getting started, deploy a machine learning model to the cloud. The 
cloud service manages the infrastructure needed to handle prediction requests 
in an efficient and scalable way. Only TensorFlow models are supported. For 
more information, see [Exporting a SavedModel for 
prediction](https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction).
 
 Review comment:
   Yes, the transform expects that the specified model is up and running. Users 
should also un-deploy redundant models manually.
   
   It is theoretically possible to implement such initial setup and clean-up 
operations; however, I wonder whether this is a good use case for Beam: those 
are long-running operations, and we'd have to wait until they are finished. As a 
result, Dataflow workers (or those of any other runner) would be idle for some 
time.




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-16 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r392991557
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
+
+Before getting started, deploy a machine learning model to the cloud. The 
cloud service manages the infrastructure needed to handle prediction requests 
in an efficient and scalable way. Only TensorFlow models are supported. For 
more information, see [Exporting a SavedModel for 
prediction](https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction).
+
+Once a machine learning model is deployed, prepare a list of instances to get 
predictions for. 
+
+Here is an example of a pipeline that reads input instances from a file, 
converts JSON objects to `tf.train.Example` objects and sends data to the 
service. The content of a file can look like this:
+
+```
+{"input": "the quick brown"}
+{"input": "la bruja le"}
+``` 
+
+The example creates `tf.train.BytesList` instances, thus it expects byte-like 
strings as input, but other data types, like `tf.train.FloatList` and 
`tf.train.Int64List`, are also supported by the transform. To send binary data, 
make sure that the name of an input ends in `_bytes`.
+
+Here is the code:
+
+{:.language-java}
+```java
+// Getting predictions is not yet available for Java. [BEAM-9501]
+```
+
+{:.language-py}
+```py
+import json
+
+import apache_beam as beam
+
+import tensorflow as tf
+from tfx_bsl.beam.run_inference import RunInference
+from tfx_bsl.proto import model_spec_pb2
+
+def convert_json_to_tf_example(json_obj):
+  dict_ = json.loads(json_obj)
+  for name, text in dict_.items():
+    value = tf.train.Feature(bytes_list=tf.train.BytesList(
+        value=[text.encode('utf-8')]))
+    feature = {name: value}
+  return tf.train.Example(features=tf.train.Features(feature=feature))
+
+with beam.Pipeline() as p:
+ _ = (p
+ | beam.io.ReadFromText('gs://my-bucket/samples.json')
 
 Review comment:
   `RunInference` uses Beam's built-in `BatchElements` transform, which batches 
elements (that is, it consumes a PCollection of `tf.train.Example` and produces 
a PCollection of `List[tf.train.Example]`). Thanks to that, multiple input 
items can be sent in a single HTTP request.
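   As a rough illustration, the type change described above can be sketched 
with a toy stand-in (plain Python; not the actual `BatchElements` 
implementation, which also tunes batch sizes dynamically):

```python
def batch_elements(elements, max_batch_size=3):
    """Toy stand-in for Beam's BatchElements: turns a stream of items
    into a stream of lists of items, here with a fixed maximum size."""
    batch = []
    for element in elements:
        batch.append(element)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield batch

# Five "examples" become two batches; each batch would map to one HTTP request.
batches = list(batch_elements(['ex1', 'ex2', 'ex3', 'ex4', 'ex5']))
print(batches)  # [['ex1', 'ex2', 'ex3'], ['ex4', 'ex5']]
```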




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-16 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r392972669
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
+
+Before getting started, deploy a machine learning model to the cloud. The 
cloud service manages the infrastructure needed to handle prediction requests 
in an efficient and scalable way. Only TensorFlow models are supported. For 
more information, see [Exporting a SavedModel for 
prediction](https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction).
+
+Once a machine learning model is deployed, prepare a list of instances to get 
predictions for. 
+
+Here is an example of a pipeline that reads input instances from a file, 
converts JSON objects to `tf.train.Example` objects and sends data to the 
service. The content of a file can look like this:
+
+```
+{"input": "the quick brown"}
+{"input": "la bruja le"}
+``` 
+
+The example creates `tf.train.BytesList` instances, thus it expects byte-like 
strings as input, but other data types, like `tf.train.FloatList` and 
`tf.train.Int64List`, are also supported by the transform. To send binary data, 
make sure that the name of an input ends in `_bytes`.
+
+Here is the code:
+
+{:.language-java}
+```java
+// Getting predictions is not yet available for Java. [BEAM-9501]
+```
+
+{:.language-py}
+```py
+import json
+
+import apache_beam as beam
+
+import tensorflow as tf
+from tfx_bsl.beam.run_inference import RunInference
+from tfx_bsl.proto import model_spec_pb2
+
+def convert_json_to_tf_example(json_obj):
+  dict_ = json.loads(json_obj)
+  for name, text in dict_.items():
+    value = tf.train.Feature(bytes_list=tf.train.BytesList(
+        value=[text.encode('utf-8')]))
+    feature = {name: value}
+  return tf.train.Example(features=tf.train.Features(feature=feature))
+
+with beam.Pipeline() as p:
+ _ = (p
+ | beam.io.ReadFromText('gs://my-bucket/samples.json')
+ | beam.Map(convert_json_to_tf_example)
+ | RunInference(
 
 Review comment:
   Yes. The transform uses this client library: 
https://github.com/googleapis/google-api-python-client. 
   
   > is there a plan to support gRPC in the future?
   
   As far as I know, there is no such plan at the moment.




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-16 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r392966450
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use a cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+[tfx_bsl](https://github.com/tensorflow/tfx-bsl) is a library that provides 
a Beam PTransform called `RunInference`. `RunInference` is able to 
perform two types of inference. One of them can use a service endpoint. When 
using a service endpoint, the transform takes a PCollection of type 
`tf.train.Example` and, for each element, sends a request to Google Cloud AI 
Platform Prediction service. The transform produces a PCollection of type 
`PredictLog` which contains predictions.
 
 Review comment:
   Offline inference from a SavedModel instance. I decided it's not worth 
mentioning here, since the article is focused on AI Platform patterns.




[GitHub] [beam] kamilwu commented on a change in pull request #11075: [BEAM-9421] Website section that describes getting predictions using AI Platform Prediciton

2020-03-13 Thread GitBox
kamilwu commented on a change in pull request #11075: [BEAM-9421] Website 
section that describes getting predictions using AI Platform Prediciton
URL: https://github.com/apache/beam/pull/11075#discussion_r392306737
 
 

 ##
 File path: website/src/documentation/patterns/ai-platform.md
 ##
 @@ -0,0 +1,87 @@
+---
+layout: section
+title: "AI Platform integration patterns"
+section_menu: section-menu/documentation.html
+permalink: /documentation/patterns/ai-platform/
+---
+
+
+# AI Platform integration patterns
+
+This page describes common patterns in pipelines with Google AI Platform 
transforms.
+
+
+  Adapt for:
+  
+Java SDK
+Python SDK
+  
+
+
+## Getting predictions
+
+This section shows how to use your cloud-hosted machine learning model to make 
predictions about new data using Google Cloud AI Platform Prediction within 
a Beam pipeline.
+ 
+We are going to use [tfx_bsl](https://github.com/tensorflow/tfx-bsl) library 
which provides a Beam PTransform called `RunInference`. `RunInference` is a PTransform 
able to perform two types of inference. In this section we are going to 
consider one of them that uses a service endpoint. When using a service 
endpoint, the transform takes a PCollection of type `tf.train.Example` and, for 
each element, sends a request to Google Cloud AI Platform Prediction service. 
The transform produces a PCollection of type `PredictLog` which contains 
predictions.
+
+Before we get started, we have to deploy a machine learning model to the 
cloud. The cloud service manages the infrastructure needed to handle prediction 
requests in an efficient and scalable way. Only TensorFlow models are 
supported. For more information, see [Exporting a SavedModel for 
prediction](https://cloud.google.com/ai-platform/prediction/docs/exporting-savedmodel-for-prediction).
+
+Let's show an example of a pipeline that reads input instances from a file, 
converts JSON objects to `tf.train.Example` objects and sends data to the 
service. The content of a file can look like this:
+
+```
+{"input": "the quick brown"}
+{"input": "la bruja le"}
+``` 
+
+Here is the code:
+
+{:.language-java}
+```java
+// Getting predictions is not available for Java.
 
 Review comment:
   Done.

