[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294231 ]
ASF GitHub Bot logged work on BEAM-7389:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Aug/19 22:17
            Start Date: 13/Aug/19 22:17
    Worklog Time Spent: 10m
      Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313632402

 ##########
 File path: website/src/documentation/transforms/python/element-wise/partition.md
 ##########
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in mid-pipeline
  See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#partition).

  ## Examples
 -See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates.

 -## Related transforms
 -* [Filter]({{ site.baseurl }}/documentation/transforms/python/elementwise/filter) is useful if the function is just
 +In the following examples, we create a pipeline with a `PCollection` of produce with their icon, name, and duration.
 +Then, we apply `Partition` in multiple ways to split the `PCollection` into multiple `PCollections`.
 +
 +`Partition` accepts a function that receives the number of partitions,
 +and returns the index of the desired partition for the element.
 +The number of partitions passed must be a positive integer,
 +and it must return an integer in the range `0` to `num_partitions-1`.
 +
 +### Example 1: Partition with a function
 +
 +In the following example, we have a known list of durations.
 +We partition the `PCollection` into one `PCollection` for every duration type.
 +
 +```py
 +{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py tag:partition_function %}```
 +
 +Output `PCollection`s:
 +
 +```
 +{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py tag:partitions %}```
 +
 +<table>
 +  <td>
 +    <a class="button" target="_blank"
 +        href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py">
 +      <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
 +        width="20px" height="20px" alt="View on GitHub" />
 +      View on GitHub
 +    </a>
 +  </td>
 +</table>
 +<br>
 +
 +### Example 2: Partition with a lambda function
 +
 +We can also use lambda functions to simplify **Example 1**.
 +
 +```py
 +{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py tag:partition_lambda %}```
 +
 +Output `PCollection`s:
 +
 +```
 +{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py tag:partitions %}```
 +
 +<table>
 +  <td>
 +    <a class="button" target="_blank"
 +        href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py">
 +      <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
 +        width="20px" height="20px" alt="View on GitHub" />
 +      View on GitHub
 +    </a>
 +  </td>
 +</table>
 +<br>
 +
 +### Example 3: Partition with multiple arguments
 +
 +You can pass functions with multiple arguments to `Partition`.
 +They are passed as additional positional arguments or keyword arguments to the function.
 +
 +In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` as arguments.
 +`num_partitions` is used by `Partitions` as a positional argument,
 +while any other argument will be passed to `split_dataset`.
 +
 +In machine learning, it is common to split it into
 +[training and a testing datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
 +Typically, 80% of the data is used for training a model, and 20% is used for testing.
 +
 +We will split a `PCollection` dataset into training and testing datasets.
 +We define `split_dataset` which receives the element, the number of partitions, and an additional argument `ratio` that describes the ratio of the split.

Review comment:
   "and an additional argument `ratio` that describes the ratio of the split."->"and an additional argument `ratio`, which is a list of numbers that represents that ratio of items in the partitions."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 294231)

> Colab examples for element-wise transforms (Python)
> ---------------------------------------------------
>
>                 Key: BEAM-7389
>                 URL: https://issues.apache.org/jira/browse/BEAM-7389
>             Project: Beam
>          Issue Type: Improvement
>          Components: website
>            Reporter: Rose Nguyen
>            Assignee: David Cavazos
>            Priority: Minor
>          Time Spent: 45h 40m
>  Remaining Estimate: 0h
>


--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
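
To make the wording in the review comment concrete, here is a minimal sketch of what a `split_dataset` function with a list-valued `ratio` might look like. This is not the snippet from the pull request: the hashing scheme, the helper `partition_locally`, and the sample `plants` data are assumptions chosen for illustration, and the example runs in plain Python rather than inside a Beam pipeline.

```python
import json


def split_dataset(element, num_partitions, ratio):
    """Return the partition index for `element`.

    `ratio` is a list of numbers, one per partition, giving the relative
    share of elements each partition should receive (e.g. [8, 2]).
    """
    assert num_partitions == len(ratio)
    # Hypothetical bucketing scheme: derive a deterministic bucket in
    # [0, sum(ratio)) from the serialized element.
    serialized = json.dumps(element, sort_keys=True)
    bucket = sum(map(ord, serialized)) % sum(ratio)
    # Walk the cumulative ratio until the bucket falls inside a partition.
    total = 0
    for index, part in enumerate(ratio):
        total += part
        if bucket < total:
            return index
    return len(ratio) - 1


def partition_locally(elements, num_partitions, ratio):
    """Hypothetical stand-in for a pipeline: group elements by index."""
    partitions = [[] for _ in range(num_partitions)]
    for element in elements:
        index = split_dataset(element, num_partitions, ratio)
        partitions[index].append(element)
    return partitions


# Hypothetical sample data, not taken from the pull request.
plants = [
    {"name": "Strawberry", "duration": "perennial"},
    {"name": "Carrot", "duration": "biennial"},
    {"name": "Eggplant", "duration": "perennial"},
    {"name": "Tomato", "duration": "annual"},
    {"name": "Potato", "duration": "perennial"},
]

train, test = partition_locally(plants, 2, ratio=[8, 2])
print(len(train), len(test))

# In a Beam pipeline (not executed here), the same function would be
# applied as: beam.Partition(split_dataset, 2, ratio=[8, 2])
```

Because the bucket is computed from the element itself, the same element always lands in the same partition, but the split only approximates the requested ratio, especially on small inputs like this one.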