TheNeuralBit commented on a change in pull request #15600:
URL: https://github.com/apache/beam/pull/15600#discussion_r729968203
########## File path: website/www/site/content/en/documentation/dsls/dataframes/overview.md ##########

```diff
@@ -55,32 +50,8 @@ To use the DataFrames API in a larger pipeline, you can convert a PCollection to

 Here’s an example that creates a schema-aware PCollection, converts it to a DataFrame using `to_dataframe`, processes the DataFrame, and then converts the DataFrame back to a PCollection using `to_pcollection`:

-<!-- TODO(BEAM-11480): Convert these examples to snippets -->
 {{< highlight py >}}
-from apache_beam.dataframe.convert import to_dataframe
-from apache_beam.dataframe.convert import to_pcollection
-...
-
-  # Read the text file[pattern] into a PCollection.
-  lines = p | 'Read' >> ReadFromText(known_args.input)
-
-  words = (
-      lines
-      | 'Split' >> beam.FlatMap(
-          lambda line: re.findall(r'[\w]+', line)).with_output_types(str)
-      # Map to Row objects to generate a schema suitable for conversion
-      # to a dataframe.
-      | 'ToRows' >> beam.Map(lambda word: beam.Row(word=word)))
-
-  df = to_dataframe(words)
-  df['count'] = 1
-  counted = df.groupby('word').sum()
-  counted.to_csv(known_args.output)
-
-  # Deferred DataFrames can also be converted back to schema'd PCollections
-  counted_pc = to_pcollection(counted, include_indexes=True)
-
-  # Do something with counted_pc
-  ...
+{{< code_sample "sdks/python/apache_beam/examples/dataframe/wordcount.py" DataFrame_wordcount >}}
```

Review comment:
   Yes, that's right. I guess I can just hardcode the imports here to keep them in. Thanks for pointing this out.