[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=285211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-285211 ]
ASF GitHub Bot logged work on BEAM-7389:
----------------------------------------

Author: ASF GitHub Bot
Created on: 30/Jul/19 19:53
Start Date: 30/Jul/19 19:53
Worklog Time Spent: 10m
Work Description: rosetn commented on pull request #9184: [BEAM-7389] Add code examples for Filter page
URL: https://github.com/apache/beam/pull/9184#discussion_r308884898

##########
File path: website/src/documentation/transforms/python/element-wise/filter.md
##########
@@ -18,25 +18,168 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Filter
-<table align="left">
- <a target="_blank" class="button"
+# Filter
+
+<script type="text/javascript">
+localStorage.setItem('language', 'language-py')
+</script>
+
+<table>
+ <td>
+ <a class="button" target="_blank"
 href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Filter">
- <img src="https://beam.apache.org/images/logos/sdks/python.png" width="20px" height="20px"
- alt="Pydoc" />
- Pydoc
+ <img src="https://beam.apache.org/images/logos/sdks/python.png"
+ width="20px" height="20px" alt="Pydoc" />
+ Pydoc
 </a>
+ </td>
 </table>
 <br>
+
 Given a predicate, filter out all elements that don't satisfy that predicate.
 May also be used to filter based on an inequality with a given value based on the comparison ordering of the element.
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates.
-## Related transforms
+### Function
+
+`Filter` accepts a function that receives the element as the first argument,
+and returns `True` if the element will be kept, or `False` if the element will be filtered out.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py tag:filter_function %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+ href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+### Lambda
+
+Lambda functions can also be used.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py tag:filter_lambda %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+ href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+### Multiple Arguments
+
+Multiple function arguments can be passed as additional arguments to `Filter`.
+They will be passed as additional positional arguments or keyword arguments to the function.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py tag:filter_multiple_arguments %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+ href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+### Side Inputs - Singleton
+
+If there is a single value in the `PCollection`, such as the average from another computation,
+passing the `PCollection` as a *singleton* will access that value.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py tag:filter_side_inputs_singleton %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+ href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+### Side Inputs - Iterator
+
+If there are multiple values in the `PCollection`, it is recommended way to pass the `PCollection` as an *iterator*.
+This will access elements lazily as they are needed,
+so it is possible to iterate over very large `PCollection`s that don't fit into memory.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py tag:filter_side_inputs_iter %}```
+
+<table>
+ <td>
+ <a class="button" target="_blank"
+ href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/filter.py">
+ <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+ width="20px" height="20px" alt="View on GitHub" />
+ View on GitHub
+ </a>
+ </td>
+</table>
+<br>
+
+> **Note**: It is also possible to pass the `PCollection` as a *list* with `beam.pvalue.AsList(pcollection)`,
+> but this will require all the elements to fit into memory.

Review comment:
"will require all" -> "requires that all"

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 285211)

> Colab examples for element-wise transforms (Python)
> ---------------------------------------------------
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
> Issue Type: Improvement
> Components: website
> Reporter: Rose Nguyen
> Assignee: David Cavazos
> Priority: Minor
> Time Spent: 23.5h
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
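
For readers following this thread without the rendered page: the quoted filter.md text says that `Filter` keeps an element when the supplied function returns `True`, and that extra positional or keyword arguments are forwarded to that function. The sketch below models only that convention in plain Python. It is not Apache Beam code and not the snippet from the pull request; `keep_if`, the predicate names, and the sample data are all hypothetical, chosen for illustration.

```python
# Plain-Python model of the predicate convention described in the quoted diff.
# NOT Apache Beam code: `keep_if` and every name below are hypothetical.

def keep_if(elements, fn, *args, **kwargs):
    """Yield each element for which fn(element, *args, **kwargs) is truthy."""
    for element in elements:
        # The element is always the first argument; extras are forwarded as-is.
        if fn(element, *args, **kwargs):
            yield element


def is_perennial(plant):
    """Named-function predicate: True keeps the element, False drops it."""
    return plant["duration"] == "perennial"


def has_duration(plant, duration):
    """Predicate that takes one extra argument besides the element."""
    return plant["duration"] == duration


plants = [
    {"name": "Strawberry", "duration": "perennial"},
    {"name": "Carrot", "duration": "biennial"},
    {"name": "Tomato", "duration": "annual"},
]

# Function predicate.
perennials = list(keep_if(plants, is_perennial))

# Lambda predicate.
annuals = list(keep_if(plants, lambda plant: plant["duration"] == "annual"))

# Extra argument forwarded positionally, then by keyword.
biennials = list(keep_if(plants, has_duration, "biennial"))
biennials_kw = list(keep_if(plants, has_duration, duration="biennial"))

# Rough analogue of the "singleton" case: a single precomputed value
# (here an average) is handed to the predicate as an extra argument.
weights = [1, 5, 9]
average = sum(weights) / len(weights)
above_average = list(keep_if(weights, lambda w, avg: w > avg, average))
```

In this model the side value is an ordinary Python object. In Beam it would be another `PCollection` wrapped as a side input, which is the memory trade-off the quoted note about `beam.pvalue.AsList(pcollection)` refers to.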