pabloem commented on a change in pull request #13326:
URL: https://github.com/apache/beam/pull/13326#discussion_r526305889



##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,63 @@ restriction pairs.
 #### 12.1.1. A basic SDF {#a-basic-sdf}
 
 A basic SDF is composed of three parts: a restriction, a restriction provider, 
and a
-restriction tracker. The restriction is used to represent a subset of work for 
a given element.
-The restriction provider lets SDF authors override default implementations for 
splitting, sizing,
-watermark estimation, and so forth. In 
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark, especially in a 
streaming
+pipeline, two more components are needed: a watermark estimator provider and a 
watermark estimator.
+
+The restriction is a user-defined object that is used to represent a subset of
+work for a given element. For example, we defined `OffsetRange` as a 
restriction to represent offset
+positions in 
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
 
+and 
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations, 
including the ones for
+splitting and sizing. In 
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
 and 
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`. 
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is 
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`. 
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated `RestrictionProvider` type.
+
+The restriction tracker is responsible for tracking which subset of the 
restriction has been
+completed during processing. For APIs details, read the 
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
 
+and 
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
+reference documentation.
+
+There are some built-in `RestrictionTracker` implementations defined in Java:
+1. 
[OffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.html)
+2. 
[GrowableOffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.html)
+3. 
[ByteKeyRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.html)
+
+The SDF also has a built-in `RestrictionTracker` implementation in Python:
+1. 
[OffsetRangeTracker](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRestrictionTracker)
+
+The watermark state is a user-defined object which is used to create a 
`WatermarkEstimator` from a
+`WatermarkEstimatorProvider`. The simplest watermark state could be a 
`timestamp`.
+
+The watermark estimator provider lets SDF authors define how to initialize the 
watermark state and
+create a watermark estimator. In 
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
+this is the `DoFn`. 
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.WatermarkEstimatorProvider)
+has a dedicated `WatermarkEstimatorProvider` type.
+
+The watermark estimator tracks the watermark when an element-restriction pair 
is in progress.
+For APIs details, read the 
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimator.html)
+and 
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.WatermarkEstimator)
+reference documentation.
+
+There are some built-in `WatermarkEstimator` implementations in Java:
+1. 
[Manual](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.Manual.html)
+2. 
[MonotonicallyIncreasing](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.MonotonicallyIncreasing.html)
+3. 
[WallTime](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.WallTime.html)
+
+Along with the default `WatermarkEstimatorProvider`, there are the same set of 
built-in
+`WatermarkEstimator` implementations in Python:
+1. 
[ManualWatermarkEstimator](https://beam.apache.org/releases/pydoc/current/apache_beam.io.watermark_estimators.html#apache_beam.io.watermark_estimators.ManualWatermarkEstimator)
+2. 
[MonotonicWatermarkEstimator](https://beam.apache.org/releases/pydoc/current/apache_beam.io.watermark_estimators.html#apache_beam.io.watermark_estimators.MonotonicWatermarkEstimator)
+3. 
[WalltimeWatermarkEstimator](https://beam.apache.org/releases/pydoc/current/apache_beam.io.watermark_estimators.html#apache_beam.io.watermark_estimators.WalltimeWatermarkEstimator)
 
 To define an SDF, you must choose whether the SDF is bounded (default) or
 unbounded and define a way to initialize an initial restriction for an element.

Review comment:
       Perhaps you can explain how does one know if it's bounded / unbounded? 
   
   I took a stab at it here, but feel free to write your own if you think this 
is not good: 
   ```
   Bounded DoFns are those where the work represented by an element is well 
known a priori, and that 
   it has an end. Unbounded are those where the amount of work does not have a 
specific end, or the 
   amount of work is not known a priori. The boundedness/unboundedness of your 
SDF has implications
    for Bundle Finalization(link).
   ```
   
   Perhaps it's useful to mention examples?
   ```
   An example of an unbounded element would be a Kafka or a PubSub topic; while 
a bounded element 
   may represent a file, or group of files.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to