Hi Felix,

thanks for the idea. But doesn't this mean that I can only train one model per partition? The thing is, I have far more models than partitions :(
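
To make the goal concrete, here is a minimal sketch of the "one model per model_ID" logic in plain Python rather than Flink. Everything in it is an assumption for illustration: the names fit_line and fit_per_model, the (model_id, x, y) record layout, and the reduction of the regression to a single feature. The point is only that the number of models follows the number of distinct IDs, not the number of partitions.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values per model")
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_per_model(records):
    """Group (model_id, x, y) records by model_id and fit one line per group."""
    groups = defaultdict(list)
    for model_id, x, y in records:
        groups[model_id].append((x, y))
    # One independent fit per distinct model_id.
    return {model_id: fit_line(pts) for model_id, pts in groups.items()}


# Hypothetical toy data: two model IDs, each lying exactly on a line.
data = [
    ("a", 0.0, 1.0), ("a", 1.0, 3.0), ("a", 2.0, 5.0),  # y = 1 + 2x
    ("b", 0.0, 4.0), ("b", 2.0, 3.0), ("b", 4.0, 2.0),  # y = 4 - 0.5x
]
models = fit_per_model(data)
```

In Flink's DataSet API the analogous shape would presumably be a groupBy on the model ID followed by a group-wise function such as reduceGroup that does the fit locally. I have not checked how that combines with MultipleLinearRegression, which expects a DataSet as input.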
Best regards,
Felix

2015-07-07 22:37 GMT+02:00 Felix Schüler <fschue...@posteo.de>:

> Hi Felix!
>
> We had a similar use case, and I trained multiple models on partitions of
> my data with mapPartition, passing the model parameters (weights) as a
> broadcast variable. If I understood broadcast variables in Flink
> correctly, you should end up with one model on each TaskManager.
>
> Does that work?
>
> Felix
>
> On 07.07.2015 at 17:32, Felix Neutatz wrote:
> > Hi,
> >
> > at the moment I have a dataset which looks like this:
> >
> > DataSet[model_ID, DataVector] data
> >
> > What I want to do is group by the model_ID and build one regression
> > model for each model_ID.
> >
> > In pseudo code:
> > data.groupBy(model_ID)
> > --> MultipleLinearRegression().fit(data_grouped)
> >
> > Is there any way to do this at the moment other than with an iteration?
> >
> > Thanks for your help,
> >
> > Felix Neutatz