rosetn commented on a change in pull request #13326:
URL: https://github.com/apache/beam/pull/13326#discussion_r524389224
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
Review comment:
I'd remove "properly" and add a comma after "watermark"
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
Review comment:
Replace "which" with "that"
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
+completed during processing. For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
+documentations.
+There are some built-in RestrictionTracker defined in Java:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.html)
+2.
[GrowableOffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.html)
+3.
[ByteKeyRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.html)
+
+We also have built-in RestrictionTracker in Python:
Review comment:
Maybe "SDFs also have a built-in RestrictionTracker implementation in
Python:"
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
+completed during processing. For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
+documentations.
+There are some built-in RestrictionTracker defined in Java:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.html)
+2.
[GrowableOffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.html)
+3.
[ByteKeyRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.html)
+
+We also have built-in RestrictionTracker in Python:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRestrictionTracker)
+
+The watermark state is a user-defined object which is used to create a
`WatermarkEstimator` from a
+`WatermarkEstimatorProvider`. The simplest watermark state could be a
`timestamp`.
+
+The watermark estimator provider lets SDF authors to define the way of
initializing the watermark
+state and creating a watermark estimator. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.WatermarkEstimatorProvider)
+has a dedicated WatermarkEstimatorProvider type.
+
+The watermark estimator is for tracking watermark when an element-restriction
pair is in progress.
Review comment:
The watermark estimator tracks the watermark when an element-restriction
pair is in progress.
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
Review comment:
Put OffsetRange in code font by adding backticks: `OffsetRange`. Can you also
put every other class name in code font? More information here:
https://developers.google.com/style/code-in-text#some-specific-items-to-put-in-code-font
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
Review comment:
Replace "what" with "which"
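For readers of this thread, here is a minimal sketch of what "tracking which
subset of the restriction has been completed" can look like. It is plain Python
and does not use the Beam API: `ToyOffsetRange` and `ToyOffsetTracker` are
hypothetical names, loosely modeled on the offset-range idea in the hunk above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyOffsetRange:
    """Hypothetical restriction: the half-open offset range [start, stop)."""
    start: int
    stop: int


class ToyOffsetTracker:
    """Hypothetical tracker: remembers the last claimed offset in a range."""

    def __init__(self, restriction: ToyOffsetRange) -> None:
        self.restriction = restriction
        self.last_claimed: Optional[int] = None

    def try_claim(self, offset: int) -> bool:
        """Claim an offset; return False once it falls outside the range."""
        if offset >= self.restriction.stop:
            return False
        self.last_claimed = offset
        return True

    def try_split(self) -> Optional[ToyOffsetRange]:
        """Give away the second half of the work that is still unclaimed."""
        next_offset = (
            self.restriction.start
            if self.last_claimed is None
            else self.last_claimed + 1
        )
        remaining = self.restriction.stop - next_offset
        if remaining < 2:
            return None  # too little left to be worth splitting
        split_at = next_offset + remaining // 2
        residual = ToyOffsetRange(split_at, self.restriction.stop)
        # The tracker keeps only the primary part; its end has now moved.
        self.restriction = ToyOffsetRange(self.restriction.start, split_at)
        return residual
```

Because a split can shrink the range mid-processing, a processing loop built on
this sketch would stop as soon as `try_claim` returns `False` rather than
iterate to the original `stop`.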
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
+completed during processing. For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
Review comment:
I recommend rewording this to be more specific:
For API details, read the
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
reference documentation.
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
+completed during processing. For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
+documentations.
+There are some built-in RestrictionTracker defined in Java:
Review comment:
I think this is missing a noun. What do you think about the following?
There are some built-in RestrictionTracker implementations defined in Java:
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
+completed during processing. For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
+documentations.
+There are some built-in RestrictionTracker defined in Java:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.html)
+2.
[GrowableOffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.html)
+3.
[ByteKeyRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.html)
+
+We also have built-in RestrictionTracker in Python:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRestrictionTracker)
+
+The watermark state is a user-defined object which is used to create a
`WatermarkEstimator` from a
+`WatermarkEstimatorProvider`. The simplest watermark state could be a
`timestamp`.
+
+The watermark estimator provider lets SDF authors to define the way of
initializing the watermark
+state and creating a watermark estimator. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.WatermarkEstimatorProvider)
+has a dedicated WatermarkEstimatorProvider type.
+
+The watermark estimator is for tracking watermark when an element-restriction
pair is in progress.
+For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimator.html)
Review comment:
For API details, read the Java and Python reference documentation.
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
Review comment:
I'd remove "and so forth."
https://developers.google.com/style/word-list#etc
The restriction provider lets SDF authors override default implementations,
including the ones for splitting and sizing.
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
+completed during processing. For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
+documentations.
+There are some built-in RestrictionTracker defined in Java:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.html)
+2.
[GrowableOffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.html)
+3.
[ByteKeyRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.html)
+
+We also have built-in RestrictionTracker in Python:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRestrictionTracker)
+
+The watermark state is a user-defined object which is used to create a
`WatermarkEstimator` from a
+`WatermarkEstimatorProvider`. The simplest watermark state could be a
`timestamp`.
+
+The watermark estimator provider lets SDF authors to define the way of
initializing the watermark
+state and creating a watermark estimator. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.WatermarkEstimatorProvider)
+has a dedicated WatermarkEstimatorProvider type.
+
+The watermark estimator is for tracking watermark when an element-restriction
pair is in progress.
+For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimator.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.WatermarkEstimator)
+documentations.
+There are some built-in `WatermarkEstimator` defined in Java:
Review comment:
Add "implementations" after `WatermarkEstimator`.
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
+completed during processing. For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
+documentations.
+There are some built-in RestrictionTracker defined in Java:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.html)
+2.
[GrowableOffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.html)
+3.
[ByteKeyRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.html)
+
+We also have built-in RestrictionTracker in Python:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRestrictionTracker)
+
+The watermark state is a user-defined object which is used to create a
`WatermarkEstimator` from a
+`WatermarkEstimatorProvider`. The simplest watermark state could be a
`timestamp`.
+
+The watermark estimator provider lets SDF authors to define the way of
initializing the watermark
Review comment:
"The watermark estimator provider lets SDF authors define how to
initialize the watermark
state and create a watermark estimator."
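
To make the two responsibilities in that sentence concrete, here is a minimal sketch in plain Python. It does not import Beam; `ToyManualEstimator`, `ToyEstimatorProvider`, and all method names are hypothetical stand-ins chosen for illustration, and the integer timestamps are an assumption. The provider decides what the initial watermark state is and how an estimator is built from a state.

```python
class ToyManualEstimator:
    """Hypothetical estimator that reports the watermark it was last given."""

    def __init__(self, state):
        # The state here is just a timestamp, the simplest possible choice.
        self._watermark = state

    def set_watermark(self, timestamp):
        # Never let the watermark move backwards.
        self._watermark = max(self._watermark, timestamp)

    def current_watermark(self):
        return self._watermark

    def get_estimator_state(self):
        # State to hand back to the provider when processing resumes.
        return self._watermark


class ToyEstimatorProvider:
    """Hypothetical provider: initializes state and creates estimators."""

    def initial_estimator_state(self, element, restriction):
        # Assumption for this sketch: every element starts at timestamp 0.
        return 0

    def create_watermark_estimator(self, estimator_state):
        return ToyManualEstimator(estimator_state)


provider = ToyEstimatorProvider()
state = provider.initial_estimator_state("element", (0, 10))
estimator = provider.create_watermark_estimator(state)
estimator.set_watermark(5)
estimator.set_watermark(3)  # ignored, since 3 is earlier than 5

# Resuming later rebuilds an estimator from the saved state.
resumed = provider.create_watermark_estimator(estimator.get_estimator_state())
```

The point of separating the two objects is that the state is the only thing that has to survive between runs; the estimator itself can be recreated from it.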
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5324,10 +5368,15 @@ resource utilization.
A runner at any time may attempt to split a restriction while it is being
processed. This allows the
runner to either pause processing of the restriction so that other work may be
done (common for
unbounded restrictions to limit the amount of output and/or improve latency)
or split the restriction
-into two pieces, increasing the available parallelism within the system. It is
important to author a
-SDF with this in mind since the end of the restriction may change. Thus when
writing the
-processing loop, it is important to use the result from trying to claim a
piece of the restriction
-instead of assuming one can process till the end.
+into two pieces, increasing the available parallelism within the system.
Please note that different
+runners(e.g., Dataflow, Flink, Spark) have different strategies to issue
splits under batch and
+streaming execution.
+
+It is important to author an SDF with this in mind since the end of the
restriction may change. Thus
Review comment:
Cleaning up a little. What do you think about this?
Author an SDF with this in mind since the end of the restriction may change.
When writing the processing loop, use the result from trying to claim a piece
of the restriction instead of assuming you can process until the end.
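
A small sketch of why the claim result matters, written in plain Python without Beam. `ToyOffsetTracker`, `shrink_to`, and `process` are hypothetical names for illustration only; `shrink_to` simulates a runner moving the end of the restriction earlier while processing is under way.

```python
class ToyOffsetTracker:
    """Hypothetical stand-in for a restriction tracker over [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def try_claim(self, position):
        # A claim succeeds only while the position is inside the restriction.
        return self.start <= position < self.stop

    def shrink_to(self, new_stop):
        # Simulates a runner-initiated split that moves the end earlier.
        self.stop = min(self.stop, new_stop)


def process(tracker, split_at=None, split_to=None):
    """Processes positions until a claim fails; returns what was processed."""
    processed = []
    position = tracker.start
    # The loop condition is the claim result, not a precomputed end offset.
    while tracker.try_claim(position):
        processed.append(position)
        if position == split_at:
            tracker.shrink_to(split_to)
        position += 1
    return processed


unsplit = process(ToyOffsetTracker(0, 10))
split = process(ToyOffsetTracker(0, 10), split_at=3, split_to=6)
```

A loop written as `for position in range(start, stop)` with `stop` read once up front would keep going to 10 in the second call, processing positions that no longer belong to this restriction.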
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5188,16 +5188,60 @@ restriction pairs.
#### 12.1.1. A basic SDF {#a-basic-sdf}
A basic SDF is composed of three parts: a restriction, a restriction provider,
and a
-restriction tracker. The restriction is used to represent a subset of work for
a given element.
-The restriction provider lets SDF authors override default implementations for
splitting, sizing,
-watermark estimation, and so forth. In
[Java](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L92)
+restriction tracker. If you want to control the watermark properly especially
in a streaming
+pipeline, two more components are needed: a watermark estimator provider and a
watermark estimator.
+
+The restriction is a user-defined object which is used to represent a subset of
+work for a given element. For example, we defined OffsetRange as a restriction
to represent offset
+positions in
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/range/OffsetRange.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRange).
+
+The restriction provider lets SDF authors override default implementations
+for splitting, sizing, and so forth. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
and
[Go](https://github.com/apache/beam/blob/0f466e6bcd4ac8677c2bd9ecc8e6af3836b7f3b8/sdks/go/pkg/beam/pardo.go#L226),
-this is the `DoFn`.
[Python](https://github.com/apache/beam/blob/f4c2734261396858e388ebef2eef50e7d48231a8/sdks/python/apache_beam/transforms/core.py#L213)
-has a dedicated RestrictionProvider type. The restriction tracker is
responsible for tracking
-what subset of the restriction has been completed during processing.
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.RestrictionProvider)
+has a dedicated RestrictionProvider type.
+
+The restriction tracker is responsible for tracking what subset of the
restriction has been
+completed during processing. For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.RestrictionTracker)
+documentations.
+There are some built-in RestrictionTracker defined in Java:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTracker.html)
+2.
[GrowableOffsetRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.html)
+3.
[ByteKeyRangeTracker](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/ByteKeyRangeTracker.html)
+
+We also have built-in RestrictionTracker in Python:
+1.
[OffsetRangeTracker](https://beam.apache.org/releases/pydoc/current/apache_beam.io.restriction_trackers.html#apache_beam.io.restriction_trackers.OffsetRestrictionTracker)
+
+The watermark state is a user-defined object which is used to create a
`WatermarkEstimator` from a
+`WatermarkEstimatorProvider`. The simplest watermark state could be a
`timestamp`.
+
+The watermark estimator provider lets SDF authors to define the way of
initializing the watermark
+state and creating a watermark estimator. In
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/DoFn.ProcessElement.html)
+this is the `DoFn`.
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.WatermarkEstimatorProvider)
+has a dedicated WatermarkEstimatorProvider type.
+
+The watermark estimator is for tracking watermark when an element-restriction
pair is in progress.
+For APIs details, please refer to
[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimator.html)
+and
[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.iobase.html#apache_beam.io.iobase.WatermarkEstimator)
+documentations.
+There are some built-in `WatermarkEstimator` defined in Java:
+1.
[Manual](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.Manual.html)
+2.
[MonotonicallyIncreasing](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.MonotonicallyIncreasing.html)
+3.
[WallTime](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.WallTime.html)
+
+There are the same set of built-in `WatermarkEstimator` in Python along with
default `WatermarkEstimatorProvider` as well:
Review comment:
Along with the default `WatermarkEstimatorProvider`, the same set of built-in
`WatermarkEstimator` implementations is available in Python:
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5324,10 +5368,15 @@ resource utilization.
A runner at any time may attempt to split a restriction while it is being
processed. This allows the
runner to either pause processing of the restriction so that other work may be
done (common for
unbounded restrictions to limit the amount of output and/or improve latency)
or split the restriction
-into two pieces, increasing the available parallelism within the system. It is
important to author a
-SDF with this in mind since the end of the restriction may change. Thus when
writing the
-processing loop, it is important to use the result from trying to claim a
piece of the restriction
-instead of assuming one can process till the end.
+into two pieces, increasing the available parallelism within the system.
Please note that different
Review comment:
Different runners (e.g., Dataflow, Flink, Spark) have different strategies to issue
splits under batch and streaming execution.
##########
File path: website/www/site/content/en/documentation/programming-guide.md
##########
@@ -5324,10 +5368,15 @@ resource utilization.
A runner at any time may attempt to split a restriction while it is being
processed. This allows the
runner to either pause processing of the restriction so that other work may be
done (common for
unbounded restrictions to limit the amount of output and/or improve latency)
or split the restriction
-into two pieces, increasing the available parallelism within the system. It is
important to author a
-SDF with this in mind since the end of the restriction may change. Thus when
writing the
-processing loop, it is important to use the result from trying to claim a
piece of the restriction
-instead of assuming one can process till the end.
+into two pieces, increasing the available parallelism within the system.
Please note that different
+runners(e.g., Dataflow, Flink, Spark) have different strategies to issue
splits under batch and
+streaming execution.
+
+It is important to author an SDF with this in mind since the end of the
restriction may change. Thus
+when writing the processing loop, it is important to use the result from
trying to claim a piece of
+the restriction instead of assuming one can process till the end.
+
+One bad example could be:
Review comment:
Replace "bad" with "incorrect." Does this still have the same meaning?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.