rszper commented on code in PR #27709:
URL: https://github.com/apache/beam/pull/27709#discussion_r1300421093
##########
website/www/site/content/en/documentation/transforms/python/elementwise/mltransform.md:
##########
@@ -0,0 +1,120 @@
+---
+title: "MLTransform"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# MLTransform for data processing
+
+{{< localstorage language language-py >}}
+
+
+<table>
+  <tr>
+    <td>
+      <a>
+      {{< button-pydoc path="apache_beam.ml.transforms" class="MLTransform" >}}
+      </a>
+    </td>
+  </tr>
+</table>
+
+
+Use `MLTransform` to apply common machine learning (ML) processing tasks on keyed data. Apache Beam provides ML data processing transformations that you can use with `MLTransform`. For the full list of available data
+processing transformations, see the [tft.py file](https://github.com/apache/beam/blob/ab93fb1988051baac6c3b9dd1031f4d68bd9a149/sdks/python/apache_beam/ml/transforms/tft.py#L52) in GitHub.
+
+
+To define a data processing transformation by using `MLTransform`, create instances of data processing transforms with `columns` as input parameters. The data in the specified `columns` is transformed and outputted to the `beam.Row` object.
+
+The following example demonstrates how to use `MLTransform` to normalize your data between 0 and 1 by using the minimum and maximum values from your entire dataset. `MLTransform` uses the `ScaleTo01` transformation.
+
+
+```
+scale_to_z_score_transform = ScaleToZScore(columns=['x', 'y'])
+with beam.Pipeline() as p:
+  (data | MLTransform(write_artifact_location=artifact_location).with_transform(scale_to_z_score_transform))
+```
+
+In this example, `MLTransform` receives a value for `write_artifact_location`. `MLTransform` then uses this location value to write artifacts generated by the transform. To pass the data processing transform, you can use either the with_transform method of `MLTransform` or a list.
+
+```
+MLTransform(transforms=transforms, write_artifact_location=write_artifact_location)
+```
+
+The transforms passed to `MLTransform` are applied sequentially on the dataset. `MLTransform` expects a dictionary and return a transformed Row objecst with numpy arrays.

Review Comment:
```suggestion
The transforms passed to `MLTransform` are applied sequentially on the dataset. `MLTransform` expects a dictionary and returns a transformed row object with NumPy arrays.
```
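For context on the dictionary-in, row-out behavior the suggestion describes, here is a minimal sketch of the flow shown in the quoted snippet. It is not part of the PR under review; the import paths, element shapes, and artifact location are assumptions.

```python
# Illustrative sketch only; not part of the PR under review. Assumes
# MLTransform is importable from apache_beam.ml.transforms.base, the scaling
# transforms from apache_beam.ml.transforms.tft, and that the
# tensorflow_transform dependency is installed.
import tempfile

import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.tft import ScaleToZScore

artifact_location = tempfile.mkdtemp()  # MLTransform writes its artifacts here.

# MLTransform expects dictionary elements; each key is a column name.
data = [
    {'x': [1.0, 2.0, 3.0], 'y': [10.0]},
    {'x': [4.0, 5.0, 6.0], 'y': [20.0]},
]

with beam.Pipeline() as p:
    (
        p
        | beam.Create(data)
        | MLTransform(write_artifact_location=artifact_location).with_transform(
            ScaleToZScore(columns=['x', 'y']))
        # Each output element is a row-like object whose values are NumPy arrays.
        | beam.Map(print))
```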
##########
website/www/site/content/en/documentation/transforms/python/elementwise/mltransform.md:
##########
+In this example, `MLTransform` receives a value for `write_artifact_location`. `MLTransform` then uses this location value to write artifacts generated by the transform. To pass the data processing transform, you can use either the with_transform method of `MLTransform` or a list.

Review Comment:
```suggestion
In this example, `MLTransform` receives a value for `write_artifact_location`. `MLTransform` then uses this location value to write artifacts generated by the transform. To pass the data processing transform, you can use either the `with_transform` method of `MLTransform` or a list.
```
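As a hedged illustration of the two ways the quoted paragraph mentions for handing transforms to `MLTransform`, a sketch follows; the import paths and the temporary artifact location are assumptions, not the PR's code.

```python
# Sketch only; both forms are described in the quoted text: passing a list via
# the transforms parameter, or chaining with_transform calls.
import tempfile

from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.tft import ScaleTo01, ScaleToZScore

write_artifact_location = tempfile.mkdtemp()

# Option 1: pass a list; the transforms are applied sequentially.
ml_transform = MLTransform(
    transforms=[ScaleTo01(columns=['x']), ScaleToZScore(columns=['y'])],
    write_artifact_location=write_artifact_location)

# Option 2: chain with_transform calls; equivalent to the list form.
ml_transform = (
    MLTransform(write_artifact_location=write_artifact_location)
    .with_transform(ScaleTo01(columns=['x']))
    .with_transform(ScaleToZScore(columns=['y'])))
```

Either form should produce the same sequence of data processing transforms applied to the dataset.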
##########
website/www/site/content/en/documentation/transforms/python/elementwise/mltransform.md:
##########
+## Examples
+
+The following examples demonstrate how to to create pipelines that use `MLTransform` to preprocess data.
+
+MLTransform can do a full pass on the dataset, which is useful when you need to transform a single element only after analyzing the entire dataset.

Review Comment:
```suggestion
`MLTransform` can do a full pass on the dataset, which is useful when you need to transform a single element only after analyzing the entire dataset.
```
##########
website/www/site/content/en/documentation/transforms/python/elementwise/mltransform.md:
##########
+The first two examples require a full pass over the dataset to complete the data transformation.
+
+* For the `ComputeAndApplyVocabulary` transform, the transform needs access to all of the unique words in the dataset.
+* For the `ScaleTo01` transform, the transform needs to know the minimum and maximum values in the dataset.
+
+### Example 1
+
+This example creates a pipeline that uses `MLTransform` to scale data between 0 and 1.
+The example takes a list of ints and converts them into the range of 0 to 1 using the transform `ScaleTo01`.

Review Comment:
```suggestion
The example takes a list of integers and converts them into the range of 0 to 1 using the transform `ScaleTo01`.
```
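The snippet behind the `mltransform_scale_to_0_1` code_sample tag is not reproduced in this thread; as a rough sketch under assumed import paths and illustrative column names and data, the pipeline Example 1 describes might look like this.

```python
# Rough sketch of the kind of pipeline Example 1 describes; the real snippet
# lives behind the mltransform_scale_to_0_1 code_sample tag and may differ.
import tempfile

import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.tft import ScaleTo01

# Hypothetical column name and values, used only for illustration.
data = [
    {'int_feature': [1, 5, 3]},
    {'int_feature': [4, 2, 8]},
]

with beam.Pipeline() as p:
    (
        p
        | beam.Create(data)
        | MLTransform(write_artifact_location=tempfile.mkdtemp()).with_transform(
            ScaleTo01(columns=['int_feature']))
        # After a full pass to find the global min and max, every value is
        # rescaled into the range [0, 1].
        | beam.Map(print))
```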
##########
website/www/site/content/en/documentation/transforms/python/elementwise/mltransform.md:
##########
+{{< highlight language="py" file="sdks/python/apache_beam/examples/snippets/transforms/elementwise/mltransform.py"
+  class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/elementwise/mltransform.py" mltransform_scale_to_0_1 >}}
+{{</ highlight >}}
+
+{{< paragraph class="notebook-skip" >}}
+Output:
+{{< /paragraph >}}
+{{< highlight class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/elementwise/mltransform_test.py" mltransform_scale_to_0_1 >}}
+{{< /highlight >}}
+
+
+### Example 2
+
+This example creates a pipeline that use `MLTransform` to compute vocabulary on the entire dataset and assign indices to each unique vocabulary item.
+It takes a list of strings, computes vocabulary over the entire dataset, and then applies a unique index to each vocabulary item.
+
+
+{{< highlight language="py" file="sdks/python/apache_beam/examples/snippets/transforms/elementwise/mltransform.py"
+  class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/elementwise/mltransform.py" mltransform_compute_and_apply_vocabulary >}}
+{{</ highlight >}}
+
+{{< paragraph class="notebook-skip" >}}
+Output:
+{{< /paragraph >}}
+{{< highlight class="notebook-skip" >}}
+{{< code_sample "sdks/python/apache_beam/examples/snippets/transforms/elementwise/mltransform_test.py" mltransform_compute_and_apply_vocab >}}
+{{< /highlight >}}
+
+
+The above two examples requires a full pass over the dataset to transform the dataset. For `ComputeAndApplyVocabulary`, all the unqiue words in the dataset needs to be known before transforming the data. For `ScaleTo01`, the minimum and maximum of the dataset needs to be known before transforming the dataset. This is acheived by `MLTransform`.

Review Comment:
```suggestion
```
Delete this, because it's been moved to the intro.
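For the reader's context, a sketch of the kind of pipeline Example 2 describes follows; the actual snippet is referenced by the `mltransform_compute_and_apply_vocabulary` code_sample tag, and the import paths, column name, and data here are assumptions.

```python
# Rough sketch of the kind of pipeline Example 2 describes; not the website's
# actual snippet.
import tempfile

import apache_beam as beam
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.ml.transforms.tft import ComputeAndApplyVocabulary

# Hypothetical string data; each element is a dict of column name to words.
data = [
    {'text': ['I', 'like', 'pie']},
    {'text': ['yum', 'yum', 'pie']},
]

with beam.Pipeline() as p:
    (
        p
        | beam.Create(data)
        | MLTransform(write_artifact_location=tempfile.mkdtemp()).with_transform(
            ComputeAndApplyVocabulary(columns=['text']))
        # The vocabulary is computed over the whole dataset first; each word is
        # then replaced by its integer index.
        | beam.Map(print))
```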
