Re: Implement bunch of transformations applied to same source stream in Apache Flink in parallel and combine result

2017-10-12 Thread Piotr Nowojski
Hi, What is the number of events per second that you wish to process? If it’s high enough (~ number of machines * number of cores), you should be just fine: instead of scaling with the number of features, scale with the number of events. If you have a single data source, you could still randomly shuffle
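
To illustrate the "scale with the number of events" idea, here is a minimal sketch in plain Python rather than the Flink API. It spreads the events of a single source randomly across a fixed number of workers. Each worker applies every transformation to its own events and emits one combined record per event. The names `TRANSFORMS`, `apply_all`, `process_partition` and `process_events` are hypothetical, and the three toy functions stand in for calls to an external library such as XGBoost. Threads are used only to show the partitioning; they are not a statement about how Flink schedules work.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-feature transformations (assumption: stand-ins for real model calls).
TRANSFORMS = {
    "sum": lambda xs: sum(xs),
    "max": lambda xs: max(xs),
    "mean": lambda xs: sum(xs) / len(xs),
}


def apply_all(event):
    """Apply every transformation to one event and combine the results."""
    event_id, vector = event
    return event_id, {name: fn(vector) for name, fn in TRANSFORMS.items()}


def process_partition(events):
    """Work done by a single worker: handle all events assigned to it."""
    return [apply_all(event) for event in events]


def process_events(events, parallelism=4, seed=0):
    """Randomly shuffle events across workers, then merge the per-event results."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(parallelism)]
    for event in events:
        # Random assignment: parallelism follows event volume, not feature count.
        partitions[rng.randrange(parallelism)].append(event)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = pool.map(process_partition, partitions)
    # Combine the outputs of all workers into one mapping keyed by event id.
    return dict(pair for part in results for pair in part)


if __name__ == "__main__":
    sample = [(0, [1.0, 2.0, 3.0]), (1, [4.0, 4.0])]
    print(process_events(sample, parallelism=2))
```

Because each event carries all of its transformation results in one record, no later join across feature-specific streams is needed in this sketch.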

Implement bunch of transformations applied to same source stream in Apache Flink in parallel and combine result

2017-10-11 Thread Andrey Salnikov
Hi! Could you please help me? I'm trying to use Apache Flink for machine learning tasks with external ensemble/tree libraries like XGBoost, so my workflow will be like this: - receive a single stream of data in which each atomic event looks like a simple vector event=(X1, X2, X3...Xn) and it can be