TheNeuralBit commented on a change in pull request #15600:
URL: https://github.com/apache/beam/pull/15600#discussion_r729968203
########## File path: website/www/site/content/en/documentation/dsls/dataframes/overview.md ##########

```diff
@@ -55,32 +50,8 @@ To use the DataFrames API in a larger pipeline, you can convert a PCollection to

 Here’s an example that creates a schema-aware PCollection, converts it to a DataFrame using `to_dataframe`, processes the DataFrame, and then converts the DataFrame back to a PCollection using `to_pcollection`:

-<!-- TODO(BEAM-11480): Convert these examples to snippets -->
 {{< highlight py >}}
-from apache_beam.dataframe.convert import to_dataframe
-from apache_beam.dataframe.convert import to_pcollection
-...
-
-  # Read the text file[pattern] into a PCollection.
-  lines = p | 'Read' >> ReadFromText(known_args.input)
-
-  words = (
-      lines
-      | 'Split' >> beam.FlatMap(
-          lambda line: re.findall(r'[\w]+', line)).with_output_types(str)
-      # Map to Row objects to generate a schema suitable for conversion
-      # to a dataframe.
-      | 'ToRows' >> beam.Map(lambda word: beam.Row(word=word)))
-
-  df = to_dataframe(words)
-  df['count'] = 1
-  counted = df.groupby('word').sum()
-  counted.to_csv(known_args.output)
-
-  # Deferred DataFrames can also be converted back to schema'd PCollections
-  counted_pc = to_pcollection(counted, include_indexes=True)
-
-  # Do something with counted_pc
-  ...
+{{< code_sample "sdks/python/apache_beam/examples/dataframe/wordcount.py" DataFrame_wordcount >}}
```

Review comment:
   Yes, that's right. I guess I can just hardcode the imports here to keep them in. Thanks for pointing this out.