[GitHub] [beam] damccorm commented on a diff in pull request #24437: ML notebook formatting and text updates

GitBox Wed, 30 Nov 2022 13:29:00 -0800


damccorm commented on code in PR #24437:
URL: https://github.com/apache/beam/pull/24437#discussion_r1036458907



##########
examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb:
##########
@@ -38,29 +38,23 @@
         "\n",
         "For rapid execution, Pandas loads all of the data into memory on a 
single machine (one node). This configuration works well when dealing with 
small-scale datasets. However, many projects involve datasets that are too big 
to fit in memory. These use cases generally require parallel data processing 
frameworks, such as Apache Beam.\n",
         "\n",
-        "\n",
-        "## Apache Beam DataFrames\n",
-        "\n",
-        "\n",
-        "Beam DataFrames provide a pandas-like\n",
+        "Beam DataFrames provide a Pandas-like\n",
         "API to declare and define Beam processing pipelines. It provides a 
familiar interface for machine learning practitioners to build complex 
data-processing pipelines by only invoking standard pandas commands.\n",
         "\n",
         "To learn more about Apache Beam DataFrames, see the\n",
         "[Beam DataFrames 
overview](https://beam.apache.org/documentation/dsls/dataframes/overview) 
page.\n",
         "\n",
-        "## Goal\n",
-        "The goal of this notebook is to explore a dataset preprocessed with 
the Beam DataFrame API for machine learning model training.\n",
+        "## Overview\n",
+        "The goal of this example is to explore a dataset preprocessed with 
the Beam DataFrame API for machine learning model training.\n",
         "\n",
-        "\n",
-        "## Tutorial outline\n",
-        "\n",
-        "This notebook demonstrates the use of the Apache Beam DataFrames API 
to perform common data exploration as well as the preprocessing steps that are 
necessary to prepare your dataset for machine learning model training and 
inference. These steps include the following:  \n",
+        "This example demonstrates the use of the Apache Beam DataFrames API 
to perform common data exploration as well as the preprocessing steps that are 
necessary to prepare your dataset for machine learning model training and 
inference. This example includes the following steps:  \n",
         "\n",
         "*   Removing unwanted columns.\n",
         "*   One-hot encoding categorical columns.\n",
         "*   Normalizing numerical columns.\n",
         "\n",
-        "\n"
+        "In this example, the first section demonstrates how to build and 
execute a pipeline locally using the interactive runner.\n",
+        "The second section uses a distributed runner to demonstrate how to 
run the pipeline on the full dataset.\n",

Review Comment:
   That fixed it - 
https://github.com/rszper/beam/blob/rszper-ML-notebooks/examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb
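
For context, the three preprocessing steps listed in the notebook text under review (removing unwanted columns, one-hot encoding categorical columns, normalizing numerical columns) can be sketched roughly as follows. This is plain pandas on a small in-memory frame, not the notebook's actual code; the `preprocess` helper, the column names, and the sample data are all hypothetical. Because the Beam DataFrame API is described as pandas-like, a Beam pipeline would presumably express similar operations on a deferred DataFrame, possibly with adjustments.

```python
import pandas as pd


def preprocess(df, drop_cols, categorical_cols, numerical_cols):
    """Hypothetical helper illustrating the three steps named in the notebook."""
    # Step 1: remove unwanted columns.
    df = df.drop(columns=drop_cols)
    # Step 2: one-hot encode categorical columns.
    df = pd.get_dummies(df, columns=categorical_cols)
    # Step 3: normalize numerical columns to zero mean and unit
    # (sample) standard deviation.
    for col in numerical_cols:
        df[col] = (df[col] - df[col].mean()) / df[col].std()
    return df


# Made-up sample data, for illustration only.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "color": ["red", "blue", "red", "green"],
    "size": [1.0, 2.0, 3.0, 4.0],
})

result = preprocess(
    sample,
    drop_cols=["id"],
    categorical_cols=["color"],
    numerical_cols=["size"],
)
print(result)
```

The sketch standardizes with the sample standard deviation (the pandas default); the notebook may normalize differently.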



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

