damccorm commented on code in PR #24437:
URL: https://github.com/apache/beam/pull/24437#discussion_r1036458907
##########
examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb:
##########
@@ -38,29 +38,23 @@
"\n",
"For rapid execution, Pandas loads all of the data into memory on a
single machine (one node). This configuration works well when dealing with
small-scale datasets. However, many projects involve datasets that are too big
to fit in memory. These use cases generally require parallel data processing
frameworks, such as Apache Beam.\n",
"\n",
- "\n",
- "## Apache Beam DataFrames\n",
- "\n",
- "\n",
- "Beam DataFrames provide a pandas-like\n",
+ "Beam DataFrames provide a Pandas-like\n",
"API to declare and define Beam processing pipelines. It provides a
familiar interface for machine learning practioners to build complex
data-processing pipelines by only invoking standard pandas commands.\n",
"\n",
"To learn more about Apache Beam DataFrames, see the\n",
"[Beam DataFrames
overview](https://beam.apache.org/documentation/dsls/dataframes/overview)
page.\n",
"\n",
- "## Goal\n",
- "The goal of this notebook is to explore a dataset preprocessed with
the Beam DataFrame API for machine learning model training.\n",
+ "## Overview\n",
+ "The goal of this example is to explore a dataset preprocessed with
the Beam DataFrame API for machine learning model training.\n",
"\n",
- "\n",
- "## Tutorial outline\n",
- "\n",
- "This notebook demonstrates the use of the Apache Beam DataFrames API
to perform common data exploration as well as the preprocessing steps that are
necessary to prepare your dataset for machine learning model training and
inference. These steps include the following: \n",
+ "This example demonstrates the use of the Apache Beam DataFrames API
to perform common data exploration as well as the preprocessing steps that are
necessary to prepare your dataset for machine learning model training and
inference. This example includes the following steps: \n",
"\n",
"* Removing unwanted columns.\n",
"* One-hot encoding categorical columns.\n",
"* Normalizing numerical columns.\n",
"\n",
- "\n"
+ "In this example, the first section demonstrates how to build and
execute a pipeline locally using the interactive runner.\n",
+ "The second section uses a distributed runner to demonstrate how to
run the pipeline on the full dataset.\n",
Review Comment:
That fixed it:
https://github.com/rszper/beam/blob/rszper-ML-notebooks/examples/notebooks/beam-ml/dataframe_api_preprocessing.ipynb