Hello everyone,

First of all, congratulations on the effort you put into creating SAMOA and making it open source. While there is a lot of research around distributed streaming ML, there is a great lack of actual tools, frameworks and programming models that let people use such techniques and compose custom pipelines.

My main experience with SAMOA so far is at the runtime level, since I worked on adding Apache Flink as one of the main adapters together with Faye (CCed). We generally did not face any great challenges, apart from the fact that we had to do manual cycle detection in order to support Flink's iterations. Still, we would like to sum up some observations here for you to consider:
* There was no documentation regarding the integration of new backend systems. We had to extract information from a thesis report and from the source code and structure of the existing adapters. A guide for backend contributors would be very useful in the future.

* If you look across the adapters, a lot of logic is replicated for each system. That means there is room for more abstractions to generalise things such as parametrising, instantiating and deploying tasks.

* Regarding the programming model, I noticed that you sometimes use action triggers (e.g. ‘evaluate every x records’). You could abstract trigger actions so that they can be reused in various components. Another advantage is that you could expose them downwards from tasks, so that systems like Apache Flink, Crunch or Google Dataflow can override (and optimise) some of this logic using their built-in windowing semantics. This is just a simple idea, but I think there is potential in general in abstracting and exposing as much as possible while still keeping implementation complexity to a minimum.

We will keep an eye on the dev list and participate actively with more feedback as we use SAMOA more and discover new needs. Currently, we are working on an experimental ML pipeline prototype that also works on streams, so we will try to keep it as much in sync with SAMOA as possible.

Cheers,
Paris, Faye
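
P.S. To make the trigger idea a bit more concrete, here is a minimal Java sketch of what an abstracted ‘every x records’ trigger could look like. The names Trigger, CountTrigger and TriggerDemo are made up for illustration and are not part of the SAMOA API; how a backend adapter would actually discover and override such a trigger is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface (not part of SAMOA): decides when an action,
// such as an evaluation step, should fire.
interface Trigger<T> {
    // Called once per incoming record; returns true when the action should fire.
    boolean onRecord(T record);
}

// Fires once every `interval` records, i.e. 'evaluate every x records'.
final class CountTrigger<T> implements Trigger<T> {
    private final long interval;
    private long seen = 0;

    CountTrigger(long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    @Override
    public boolean onRecord(T record) {
        seen++;
        if (seen == interval) {
            seen = 0; // start counting towards the next firing
            return true;
        }
        return false;
    }
}

final class TriggerDemo {
    // Feeds the records 1..total through the trigger and returns the
    // 1-based positions at which it fired.
    static List<Integer> firingPositions(Trigger<Integer> trigger, int total) {
        List<Integer> fired = new ArrayList<>();
        for (int i = 1; i <= total; i++) {
            if (trigger.onRecord(i)) {
                fired.add(i);
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // With an interval of 3 and 10 records, the trigger fires at 3, 6 and 9.
        System.out.println(firingPositions(new CountTrigger<>(3), 10));
    }
}
```

A component would depend only on the Trigger interface, so the same evaluation logic could be reused with a different policy, and a backend with native windowing could in principle substitute its own mechanism for the counting done here.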
