[ 
https://issues.apache.org/jira/browse/BEAM-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17547822#comment-17547822
 ] 

Kenneth Knowles commented on BEAM-6819:
---------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/19533

> Remote sources provide insufficient metadata about relative sizes of splits
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-6819
>                 URL: https://issues.apache.org/jira/browse/BEAM-6819
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Sunil Pedapudi
>            Priority: P3
>              Labels: Clarified
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the current split protocol, SourceMetadata is reported only for the 
> initial parent source; subsequent splits drop it. Without this additional 
> information, downstream systems make simplifying assumptions, so the input 
> fraction they assign to a task no longer correlates with the fraction of 
> input that the task actually represents. 
> This decorrelation has cascading negative effects for any system that relies 
> on trends in input fraction (e.g., Cloud Dataflow's autotuning).
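
To illustrate the effect described in the issue, here is a minimal sketch in plain Java. It does not use the Beam API: the class name SplitFractions, both methods, and the byte sizes are hypothetical and chosen only for illustration. It compares the fractions a downstream system might assume when child splits carry no size metadata (every split treated as equal) with the fractions it could compute if each child kept a size estimate.

```java
import java.util.Arrays;

public class SplitFractions {
    // Simplifying assumption when child splits carry no size metadata:
    // every split is treated as an equal share of the parent.
    static double[] uniformFractions(int splitCount) {
        double[] fractions = new double[splitCount];
        Arrays.fill(fractions, 1.0 / splitCount);
        return fractions;
    }

    // Fractions derived from per-split size estimates (hypothetical metadata
    // that each child split would keep instead of dropping it).
    static double[] weightedFractions(long[] estimatedSizeBytes) {
        long total = Arrays.stream(estimatedSizeBytes).sum();
        if (total <= 0) {
            // No usable estimates: fall back to the uniform assumption.
            return uniformFractions(estimatedSizeBytes.length);
        }
        double[] fractions = new double[estimatedSizeBytes.length];
        for (int i = 0; i < estimatedSizeBytes.length; i++) {
            fractions[i] = (double) estimatedSizeBytes[i] / total;
        }
        return fractions;
    }

    public static void main(String[] args) {
        // Hypothetical sizes of three child splits of one parent source.
        long[] childSizes = {700, 200, 100};
        System.out.println("assumed:  " + Arrays.toString(uniformFractions(childSizes.length)));
        System.out.println("weighted: " + Arrays.toString(weightedFractions(childSizes)));
    }
}
```

With skewed splits like these, the uniform assumption gives every task the same share while the size-based fractions differ widely. That gap is the kind of mismatch the issue describes.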



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
