Dear Community,

We have been working on integrating the Apache Apex platform as an SPE for Apache Samoa. I understand the integration process and the APIs that are exposed by Apache Samoa for integration with other SPEs. I have a few questions regarding the internals of Samoa which would help me complete the integration and ultimately open a PR.

1. I see that all tasks implemented in Samoa are evaluation tasks: PrequentialEvaluation or ClusteringEvaluation. What if we need to evaluate actual test instances (for example, instances which are not part of the training set)? Do we need to write another Task to evaluate test instances?

2. If yes, then how do we access the trained model? I understand that streaming algorithms would not produce a one-time trained model. Given that, there should be some way of identifying the state of the current model. For example, in the VHT evaluation, we build a decision tree over time using the training instances. Now, if I have some actual test instances to classify using the VHT, what is the way to do that?

3. How do we see the results of a classification task or a clustering task? Ideally I would like to see the class labels given to input instances for a classification task, or the cluster numbers given to input instances for a clustering task. I could not find any option to view such results.

4. Apache Apex uses Kryo for serialization and hence it needs a class to have a default constructor. I noticed that many of the Samoa classes do not have default constructors and throw an exception when running on Apex. Would it be okay if the Apex integration PR adds these default constructors to the Samoa classes for the purpose of serialization?

Thanks in anticipation!
-Bhupesh

--
Regards,
Bhupesh Chawda
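
P.S. To make question 4 concrete, here is a minimal sketch of the kind of change I have in mind. It uses plain Java reflection rather than Kryo itself, on the assumption that Kryo's default instantiation looks up a no-argument constructor in a similar way. The class names `WithoutDefault` and `WithDefault` are hypothetical stand-ins, not actual Samoa classes.

```java
import java.lang.reflect.Constructor;

public class NoArgConstructorSketch {

    // Hypothetical stand-in for a class that only has a parameterized constructor.
    static class WithoutDefault {
        final int id;

        WithoutDefault(int id) {
            this.id = id;
        }
    }

    // The same class after the proposed change: a private no-arg constructor
    // is added purely so that a serializer can create an empty instance.
    static class WithDefault {
        int id;

        private WithDefault() {
            // intentionally empty; fields are filled in during deserialization
        }

        WithDefault(int id) {
            this.id = id;
        }
    }

    // Tries to create an instance through a no-arg constructor, roughly the way
    // a reflection-based serializer would (an assumption, not Kryo's actual code).
    static boolean canInstantiate(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true); // allows private constructors as well
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here when no no-arg constructor exists
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("WithoutDefault: " + canInstantiate(WithoutDefault.class));
        System.out.println("WithDefault: " + canInstantiate(WithDefault.class));
    }
}
```

In this sketch the added constructor is private, so the public API of the class would stay unchanged; whether that is acceptable for the Samoa classes is exactly what I would like to confirm.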
