[ https://issues.apache.org/jira/browse/BEAM-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17547822#comment-17547822 ]
Kenneth Knowles commented on BEAM-6819:
---------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/19533

> Remote sources provide insufficient metadata about relative sizes of splits
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-6819
>                 URL: https://issues.apache.org/jira/browse/BEAM-6819
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Sunil Pedapudi
>            Priority: P3
>              Labels: Clarified
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the current split protocol, SourceMetadata is reported for the initial
> parent source. Subsequent splits drop the SourceMetadata. Without this
> additional information, downstream systems make simplifying assumptions that
> result in decorrelation between input fraction and the actual fraction of
> input represented by a task.
> This decorrelation of input fraction has cascading negative effects for any
> system relying on trends in input fraction (e.g., Cloud Dataflow's autotuning).

--
This message was sent by Atlassian Jira
(v8.20.7#820007)