[ 
https://issues.apache.org/jira/browse/FLINK-24279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yun Gao closed FLINK-24279.
---------------------------
    Fix Version/s: 0.1.0
         Assignee: Zhipeng Zhang
       Resolution: Fixed

>  Support withBroadcast with DataStream API in Flink ML Library
> --------------------------------------------------------------
>
>                 Key: FLINK-24279
>                 URL: https://issues.apache.org/jira/browse/FLINK-24279
>             Project: Flink
>          Issue Type: New Feature
>          Components: Library / Machine Learning
>            Reporter: Zhipeng Zhang
>            Assignee: Zhipeng Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.1.0
>
>
> When doing machine learning with the DataStream API, we found that DataStream
> lacks a withBroadcast() function, which would be useful for machine learning.
>  
> A DataSet-based example looks like this:
> {code:java}
> DataSet<?> d1 = ...;
> DataSet<?> d2 = ...;
> d1.map(new RichMapFunction<?, ?>() {
>        @Override
>        public Object map(Object value) throws Exception {
>             // All elements of d2, registered below under the name "d2"
>             List<?> elements = getRuntimeContext().getBroadcastVariable("d2");
>             ...;
>        }
> }).withBroadcastSet(d2, "d2");
> {code}
>  
> The withBroadcast() function implies priority-based data consumption. For
> example, in the above code snippet, we cannot consume any element from d1
> until we have consumed all of the elements in d2.
>   
> Thus, when supporting withBroadcast() in DataStream, we also need
> priority-based data consumption. This could lead to deadlock, and
> DataStream does not provide a solution for deadlock.
>   
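
The priority-based consumption described in the issue can be illustrated without any Flink dependency. The sketch below is only an assumption about one possible approach, not the implementation that resolved this issue: a hypothetical helper, BroadcastGate, caches elements of the main input (d1) that arrive early and processes them only once the broadcast input (d2) has been fully consumed. All class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper, not part of Flink: delays processing of main-input
// elements until the broadcast input has been consumed completely.
class BroadcastGate<I, B, O> {
    private final BiFunction<I, List<B>, O> fn;          // user function (element, broadcast) -> result
    private final List<B> broadcast = new ArrayList<>(); // elements of d2 seen so far
    private final List<I> pending = new ArrayList<>();   // d1 elements that arrived too early
    private final List<O> output = new ArrayList<>();
    private boolean broadcastDone = false;

    BroadcastGate(BiFunction<I, List<B>, O> fn) {
        this.fn = fn;
    }

    // Called for every element of the broadcast input (d2).
    void onBroadcastElement(B element) {
        if (broadcastDone) {
            throw new IllegalStateException("broadcast input already ended");
        }
        broadcast.add(element);
    }

    // Called once when the broadcast input is exhausted: flush cached d1 elements.
    void onBroadcastEnd() {
        broadcastDone = true;
        for (I element : pending) {
            output.add(fn.apply(element, broadcast));
        }
        pending.clear();
    }

    // Called for every element of the main input (d1).
    void onMainElement(I element) {
        if (broadcastDone) {
            output.add(fn.apply(element, broadcast));
        } else {
            pending.add(element); // cache instead of blocking the input
        }
    }

    List<O> output() {
        return output;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Caching the early d1 elements rather than blocking the d1 input is a design choice in this sketch: it avoids stalling one input while waiting on the other, at the cost of memory that grows with the number of cached elements.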



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
