[https://issues.apache.org/jira/browse/BEAM-10940?focusedWorklogId=502721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-502721]
ASF GitHub Bot logged work on BEAM-10940:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Oct/20 14:30
Start Date: 20/Oct/20 14:30
Worklog Time Spent: 10m
Work Description: mxm commented on a change in pull request #13120:
URL: https://github.com/apache/beam/pull/13120#discussion_r508553910
##########
File path:
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/ProcessBundleDescriptors.java
##########
@@ -174,26 +170,18 @@ private static ExecutableProcessBundleDescriptor fromExecutableStageInternal(
}
/**
- * Patches the input coder of a stateful transform to ensure that the byte representation of a key
- * used to partition the input element at the Runner, matches the key byte representation received
- * for state requests and timers from the SDK Harness. Stateful transforms always have a KvCoder
- * as input.
+ * Patches the input coder of the transform to ensure that the byte representation of input used
+ * at the Runner, matches the byte representation received from the SDK Harness.
*/
- private static void lengthPrefixKeyCoder(
- String inputColId, Components.Builder componentsBuilder) {
- RunnerApi.PCollection pcollection = componentsBuilder.getPcollectionsOrThrow(inputColId);
- RunnerApi.Coder kvCoder = componentsBuilder.getCodersOrThrow(pcollection.getCoderId());
- Preconditions.checkState(
- ModelCoders.KV_CODER_URN.equals(kvCoder.getSpec().getUrn()),
- "Stateful executable stages must use a KV coder, but is: %s",
- kvCoder.getSpec().getUrn());
- String keyCoderId = ModelCoders.getKvCoderComponents(kvCoder).keyCoderId();
- // Retain the original coder, but wrap in LengthPrefixCoder
- String newKeyCoderId =
- LengthPrefixUnknownCoders.addLengthPrefixedCoder(keyCoderId, componentsBuilder, false);
- // Replace old key coder with LengthPrefixCoder<old_key_coder>
- kvCoder = kvCoder.toBuilder().setComponentCoderIds(0, newKeyCoderId).build();
- componentsBuilder.putCoders(pcollection.getCoderId(), kvCoder);
+ private static void lengthPrefixAnyInputCoder(
+ String inputPCollectionId, Components.Builder componentsBuilder) {
+ RunnerApi.PCollection pcollection =
+ componentsBuilder.getPcollectionsOrThrow(inputPCollectionId);
+ String newInputCoderId =
+ LengthPrefixUnknownCoders.addLengthPrefixedCoder(
+ pcollection.getCoderId(), componentsBuilder, false);
+ componentsBuilder.putPcollections(
+ inputPCollectionId, pcollection.toBuilder().setCoderId(newInputCoderId).build());
Review comment:
I'm wondering why length-prefixing the key coder is no longer
necessary. Wouldn't the SDK Harness still be able to extract a non-length-prefixed
key coder, even though the input coder has been length-prefixed? That would
cause a regression like the one in https://github.com/apache/beam/pull/9997 if
the SDK Harness didn't use the NESTED context, which it currently does because
we fixed this a while ago:
https://github.com/apache/beam/blob/57d249704da0c7bf3fb4e98b087ced2a28605fb3/sdks/python/apache_beam/runners/worker/bundle_processor.py#L769
The idea was to always ensure that keys are length-prefixed, so we never run
into inconsistent key encodings between the Runner and the SDK Harness.
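To make the concern concrete, here is a minimal Java sketch of how key bytes can diverge between two sides that encode the same key differently. It uses simplified stand-in coders, not Beam's classes: `encodeKey` models a string key coder whose outer-context form is raw UTF-8 and whose nested-context form prepends a varint length, and `encodeLengthPrefixedKey` models wrapping that key coder in a length prefix. The class and method names are hypothetical, and the scenario (one side encoding in the nested context, the other in the outer context) is an assumption chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: simplified stand-ins for key coders, not Beam's implementation. */
final class KeyEncodingSketch {

  /** Appends an unsigned base-128 varint (7 data bits per byte, high bit = "more follows"). */
  static void writeVarInt(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  /**
   * Stand-in string key coder. Outer context: raw UTF-8 bytes.
   * Nested context: varint byte length followed by the UTF-8 bytes.
   */
  static byte[] encodeKey(String key, boolean nested) {
    byte[] utf8 = key.getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (nested) {
      writeVarInt(out, utf8.length);
    }
    out.write(utf8, 0, utf8.length);
    return out.toByteArray();
  }

  /**
   * Stand-in for a length-prefixed key coder: encode the key in the outer context, then
   * prepend its length. The nested flag is ignored on purpose: the result is the same
   * whichever context the caller uses.
   */
  static byte[] encodeLengthPrefixedKey(String key, boolean nested) {
    byte[] inner = encodeKey(key, false);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVarInt(out, inner.length);
    out.write(inner, 0, inner.length);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    String key = "user-42";
    // Assumed mismatch: one side encodes the key nested, the other in the outer context.
    byte[] runnerRaw = encodeKey(key, true);
    byte[] sdkRaw = encodeKey(key, false);
    System.out.println("raw keys match: " + Arrays.equals(runnerRaw, sdkRaw));

    // With the key coder wrapped in a length prefix, both sides agree.
    byte[] runnerPrefixed = encodeLengthPrefixedKey(key, true);
    byte[] sdkPrefixed = encodeLengthPrefixedKey(key, false);
    System.out.println("length-prefixed keys match: " + Arrays.equals(runnerPrefixed, sdkPrefixed));
  }
}
```

Running `main` prints `raw keys match: false` followed by `length-prefixed keys match: true`. Once the prefix is applied, the key bytes no longer depend on which context each side picked, which is the invariant the comment argues should be preserved.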
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 502721)
Time Spent: 1h 20m (was: 1h 10m)
> Portable Flink runner should handle DelayedBundleApplication from ProcessBundleResponse.
> ----------------------------------------------------------------------------------------
>
> Key: BEAM-10940
> URL: https://issues.apache.org/jira/browse/BEAM-10940
> Project: Beam
> Issue Type: New Feature
> Components: runner-flink
> Reporter: Boyuan Zhang
> Priority: P2
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> An SDF (Splittable DoFn) can produce residuals by self-checkpointing. These
> residuals are returned to the runner through
> ProcessBundleResponse.DelayedBundleApplication. The portable runner should be
> able to handle the DelayedBundleApplication and reschedule it based on the
> timestamp.
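One way a runner could hold residuals until they are due is a priority queue ordered by resume time. The following is a hypothetical sketch and not the Flink runner's implementation. `ResidualScheduler`, `schedule`, and `pollDue` are invented names, a residual is reduced to a string payload, and the sketch assumes the SDK reports a relative delay, which the runner converts to an absolute resume time in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical runner-side holder for SDF residuals; not a Beam or Flink API. */
final class ResidualScheduler {

  /** A residual (stand-in for a DelayedBundleApplication) plus the time it may resume. */
  static final class Residual {
    final String payload;
    final long resumeAtMillis;

    Residual(String payload, long resumeAtMillis) {
      this.payload = payload;
      this.resumeAtMillis = resumeAtMillis;
    }
  }

  // Earliest resume time at the head of the queue.
  private final PriorityQueue<Residual> pending =
      new PriorityQueue<>((a, b) -> Long.compare(a.resumeAtMillis, b.resumeAtMillis));

  /** Records a residual, turning its requested relative delay into an absolute resume time. */
  void schedule(String payload, long nowMillis, long requestedDelayMillis) {
    pending.add(new Residual(payload, nowMillis + Math.max(0L, requestedDelayMillis)));
  }

  /** Removes and returns every residual whose resume time has been reached, earliest first. */
  List<String> pollDue(long nowMillis) {
    List<String> due = new ArrayList<>();
    while (!pending.isEmpty() && pending.peek().resumeAtMillis <= nowMillis) {
      due.add(pending.poll().payload);
    }
    return due;
  }

  /** Number of residuals still waiting to be rescheduled. */
  int pendingCount() {
    return pending.size();
  }

  public static void main(String[] args) {
    ResidualScheduler scheduler = new ResidualScheduler();
    scheduler.schedule("restriction[100, 200)", 1_000L, 5_000L); // resumes at t=6000
    scheduler.schedule("restriction[200, 300)", 1_000L, 0L); // resumes at t=1000
    System.out.println(scheduler.pollDue(1_000L)); // [restriction[200, 300)]
    System.out.println(scheduler.pollDue(6_000L)); // [restriction[100, 200)]
  }
}
```

A real runner would likely also need to keep pending residuals in checkpointed state and hold back the watermark until they are processed; this sketch covers only the time-based ordering.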
--
This message was sent by Atlassian Jira
(v8.3.4#803005)