This is an automated email from the ASF dual-hosted git repository. altay pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
     new d4b6f32  [BEAM-7389] Add code examples for WithTimestamps page
     new 899037d  Merge pull request #9267 from davidcavazos/withtimestamps-page
d4b6f32 is described below

commit d4b6f32b919e5aeb56f706faf1b7432d86fa0838
Author: David Cavazos <dcava...@google.com>
AuthorDate: Mon Aug 5 10:36:42 2019 -0700

    [BEAM-7389] Add code examples for WithTimestamps page
---
 .../python/element-wise/withtimestamps.md | 113 ++++++++++++++++++++-
 1 file changed, 110 insertions(+), 3 deletions(-)

diff --git a/website/src/documentation/transforms/python/element-wise/withtimestamps.md b/website/src/documentation/transforms/python/element-wise/withtimestamps.md
index a9dcd56..8495063 100644
--- a/website/src/documentation/transforms/python/element-wise/withtimestamps.md
+++ b/website/src/documentation/transforms/python/element-wise/withtimestamps.md
@@ -19,10 +19,117 @@ limitations under the License.
 -->
 
 # WithTimestamps
+
+<script type="text/javascript">
+localStorage.setItem('language', 'language-py')
+</script>
+
 Assigns timestamps to all the elements of a collection.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates.
 
-## Related transforms
-* [Reify]({{ site.baseurl }}/documentation/transforms/python/elementwise/reify) converts between explicit and implicit forms of Beam values.
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` and attach a timestamp value to each of its elements.
+When windowing and late data play an important role in streaming pipelines, timestamps are especially useful.
+
+### Example 1: Timestamp by event time
+
+The elements themselves often already contain a timestamp field.
+`beam.window.TimestampedValue` takes a value and a
+[Unix timestamp](https://en.wikipedia.org/wiki/Unix_time)
+in the form of seconds.
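As a rough illustration of the event-time idea above, the following standard-library sketch pairs each element with the Unix timestamp (in seconds) stored in one of its fields, then decodes that timestamp to show which instant it denotes. The element data, the `season` field name, and the helper names `with_event_time` and `describe` are hypothetical and are not taken from the commit.

```python
from datetime import datetime, timezone

# Hypothetical elements: the 'season' field holds event time as Unix seconds.
plants = [
    {'name': 'Strawberry', 'season': 1585699200},  # 2020-04-01 00:00:00 UTC
    {'name': 'Carrot', 'season': 1590969600},      # 2020-06-01 00:00:00 UTC
]


def with_event_time(element, field='season'):
    """Pair an element with the Unix timestamp (seconds) stored in `field`."""
    return element, element[field]


def describe(element, unix_seconds):
    """Render the timestamp as a UTC date-time next to the element name."""
    utc = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return '{} - {}'.format(utc.strftime('%Y-%m-%d %H:%M:%S'), element['name'])


timestamped = [with_event_time(plant) for plant in plants]
lines = [describe(element, ts) for element, ts in timestamped]
for line in lines:
    print(line)
```

In a Beam pipeline, the pairing step would presumably be written as `beam.Map(lambda plant: beam.window.TimestampedValue(plant, plant['season']))`, so that the runner, rather than a tuple, carries the timestamp.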
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps.py tag:event_time %}```
+
+Output `PCollection` after getting the timestamps:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps_test.py tag:plant_timestamps %}```
+
+<table>
+  <td>
+    <a class="button" target="_blank"
+        href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps.py">
+      <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+        width="20px" height="20px" alt="View on GitHub" />
+      View on GitHub
+    </a>
+  </td>
+</table>
+<br>
+
+To convert from a
+[`time.struct_time`](https://docs.python.org/3/library/time.html#time.struct_time)
+to `unix_time` you can use
+[`time.mktime`](https://docs.python.org/3/library/time.html#time.mktime).
+For more information on time formatting options, see
+[`time.strftime`](https://docs.python.org/3/library/time.html#time.strftime).
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps.py tag:time_tuple2unix_time %}```
+
+To convert from a
+[`datetime.datetime`](https://docs.python.org/3/library/datetime.html#datetime.datetime)
+to `unix_time` you can convert it to a `time.struct_time` first with
+[`datetime.timetuple`](https://docs.python.org/3/library/datetime.html#datetime.datetime.timetuple).
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps.py tag:datetime2unix_time %}```
+
+### Example 2: Timestamp by logical clock
+
+If each element has a chronological number, these numbers can be used as a
+[logical clock](https://en.wikipedia.org/wiki/Logical_clock).
+These numbers have to be converted to a *"seconds"* equivalent, which can be especially important depending on your windowing and late data rules.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps.py tag:logical_clock %}```
+
+Output `PCollection` after getting the timestamps:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps_test.py tag:plant_events %}```
+
+<table>
+  <td>
+    <a class="button" target="_blank"
+        href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps.py">
+      <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+        width="20px" height="20px" alt="View on GitHub" />
+      View on GitHub
+    </a>
+  </td>
+</table>
+<br>
+
+### Example 3: Timestamp by processing time
+
+If the elements do not have any time data available, you can also use the current processing time for each element.
+Note that this grabs the local time of the *worker* that is processing each element.
+Worker clocks might differ from one another, so this method is not a reliable way to order elements precisely.
+
+By using processing time, there is no way of knowing if data is arriving late because the timestamp is attached when the element *enters* the pipeline.
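To make the processing-time caveat above concrete, here is a small standard-library sketch, not the snippet referenced by the commit. It stamps elements with `time.time()` and then simulates two workers whose clocks are skewed by a few seconds; the helper names `with_processing_time` and `skewed_clock`, the element values, and the 5-second offsets are all assumptions made for illustration.

```python
import time


def with_processing_time(element, clock=time.time):
    """Pair an element with the current wall-clock time in Unix seconds."""
    return element, clock()


def skewed_clock(offset_seconds):
    """Return a clock that runs `offset_seconds` ahead of (or behind) this machine."""
    return lambda: time.time() + offset_seconds


# Every element gets whatever time the processing machine reports.
events = ['Strawberry', 'Carrot', 'Artichoke']
stamped = [with_processing_time(event) for event in events]

# Simulated workers: the first element is handled by a worker whose clock
# runs 5 seconds fast, the second by a worker whose clock runs 5 seconds slow.
first = with_processing_time('first', clock=skewed_clock(+5.0))
second = with_processing_time('second', clock=skewed_clock(-5.0))

# The element handled later ends up with the earlier timestamp.
out_of_order = second[1] < first[1]
print(out_of_order)
```

Because the timestamp reflects only when and where an element was processed, sorting by it can invert the real order, which is why event time or a logical clock is preferable whenever the data carries one.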
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps.py tag:processing_time %}```
+
+Output `PCollection` after getting the timestamps:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps_test.py tag:plant_processing_times %}```
+
+<table>
+  <td>
+    <a class="button" target="_blank"
+        href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/with_timestamps.py">
+      <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+        width="20px" height="20px" alt="View on GitHub" />
+      View on GitHub
+    </a>
+  </td>
+</table>
+<br>
+
+## Related transforms
+
+* [Reify]({{ site.baseurl }}/documentation/transforms/python/elementwise/reify) converts between explicit and implicit forms of Beam values.