Hi Felix!

We had a similar use case: I trained multiple models on partitions of
my data with mapPartition, passing the model parameters (weights) as a
broadcast variable. If I understood broadcast variables in Flink
correctly, you should end up with one model on each TaskManager.
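Here is a minimal, self-contained sketch of the one-model-per-ID idea. It is written in plain Scala so it runs without a Flink cluster. It is a swapped-in variant of the approach above: instead of mapPartition with broadcast weights, it groups the examples by model ID and fits each group with ordinary least squares via the normal equations. `Example`, `solve`, `fit`, and `fitPerModel` are hypothetical names and are not part of Flink ML.

```scala
// Plain-Scala sketch (no Flink dependency): fit one least-squares model per model ID.
// All names here are hypothetical and for illustration only; this is not the Flink ML API.
object PerGroupRegression {
  // One training example: model ID, feature vector, label.
  final case class Example(modelId: Int, features: Array[Double], label: Double)

  // Solve the n x n system A x = b by Gaussian elimination with partial pivoting.
  def solve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val m = a.map(_.clone())
    val v = b.clone()
    for (col <- 0 until n) {
      // Swap in the row with the largest absolute value in this column.
      val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
      val tmpRow = m(col); m(col) = m(pivot); m(pivot) = tmpRow
      val tmpVal = v(col); v(col) = v(pivot); v(pivot) = tmpVal
      // Eliminate this column from all rows below the pivot.
      for (r <- col + 1 until n) {
        val f = m(r)(col) / m(col)(col)
        for (c <- col until n) m(r)(c) -= f * m(col)(c)
        v(r) -= f * v(col)
      }
    }
    // Back substitution on the upper-triangular system.
    val x = new Array[Double](n)
    for (r <- n - 1 to 0 by -1) {
      var s = v(r)
      for (c <- r + 1 until n) s -= m(r)(c) * x(c)
      x(r) = s / m(r)(r)
    }
    x
  }

  // Fit weights (intercept first) by solving the normal equations (X^T X) w = X^T y.
  def fit(examples: Seq[Example]): Array[Double] = {
    val rows = examples.map(e => 1.0 +: e.features) // prepend a bias column of 1.0
    val d = rows.head.length
    val xtx = Array.tabulate(d, d)((i, j) => rows.map(r => r(i) * r(j)).sum)
    val xty = Array.tabulate(d)(i => rows.zip(examples).map { case (r, e) => r(i) * e.label }.sum)
    solve(xtx, xty)
  }

  // Group the examples by model ID and fit one independent model per group.
  def fitPerModel(data: Seq[Example]): Map[Int, Array[Double]] =
    data.groupBy(_.modelId).map { case (id, group) => id -> fit(group) }
}
```

In a Flink job, `fit` could presumably be called from something like `data.groupBy(_.modelId).reduceGroup { it => ... }`, but that wiring is an untested assumption. It materializes each group in memory, so it only suits groups that fit on a single machine. The normal equations are also less numerically stable than a QR-based solver when features are nearly collinear.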

Does that work?

Felix

On 07.07.2015 at 17:32, Felix Neutatz wrote:
> Hi,
> 
> at the moment I have a dataset which looks like this:
> 
> DataSet[model_ID, DataVector] data
> 
> What I want to do is group by model_ID and build one regression model
> for each model_ID.
> 
> In pseudocode:
> data.groupBy(model_ID)
>         --> MultipleLinearRegression().fit(data_grouped)
> 
> Is there any way to do this at the moment, other than using an iteration?
> 
> Thanks for your help,
> 
> Felix Neutatz
> 
