Hi Andrew,
I think this topic is broader than just defining a few traits. A popular
way of integrating ML algorithms is via the combination of dataframes
and pipelines, similar to what scikit-learn and SparkML are offering at the
moment. Maybe it would make sense to integrate with what they have
instead of starting our own effort?
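
To make the "dataframes plus pipelines" idea concrete, here is a minimal sketch of the pattern in Scala. It is only an illustration: `Frame`, `Transformer`, `AddColumn`, and `Pipeline` are hypothetical names invented for this example and do not reproduce the actual API of either library mentioned above.

```scala
// Hypothetical sketch of the pipeline pattern; not the API of any real library.
object PipelineSketch {
  // Stand-in for a dataframe: a sequence of rows, each mapping column name to value.
  type Frame = Seq[Map[String, Double]]

  // A pipeline stage takes a frame and returns a new frame.
  trait Transformer {
    def transform(frame: Frame): Frame
  }

  // Adds column `to`, computed by applying `f` to the existing column `from`.
  final case class AddColumn(from: String, to: String, f: Double => Double)
      extends Transformer {
    def transform(frame: Frame): Frame =
      frame.map(row => row + (to -> f(row(from))))
  }

  // A pipeline is itself a transformer that applies its stages in order.
  final case class Pipeline(stages: Seq[Transformer]) extends Transformer {
    def transform(frame: Frame): Frame =
      stages.foldLeft(frame)((acc, stage) => stage.transform(acc))
  }
}
```

Because a `Pipeline` is also a `Transformer`, pipelines can be nested inside other pipelines, which is what makes the pattern convenient for composing algorithms.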
Best,
Sebastian
On 21.07.2016 04:35, Andrew Palumbo wrote:
Hi All,
I'd like to draw your attention to MAHOUT-1856:
https://issues.apache.org/jira/browse/MAHOUT-1856
This is a discussion that has popped up several times over the last couple of
years. As we move towards building out our algorithm library, it would be great
to nail this down now.
Most importantly, we want to avoid being criticized as "a loose bag of algorithms",
as we sometimes have been in the past.
The main point is that it would be good to lay out common traits for
Classification, Clustering, and Optimization algorithms.
This is just a start. I created this issue a few months back, and intentionally
left off Recommenders, because I was unsure if there were common traits across
them. By traits, I am referring to both the literal meaning and, more
specifically, actual Scala traits.
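
As a starting point for discussion, here is a minimal sketch of what such common Scala traits could look like. Every name in it (`Model`, `Classifier`, `Clusterer`, `Optimizer`, `MajorityClassifier`) is hypothetical and made up for illustration. None of it comes from the existing Mahout code base or from MAHOUT-1856.

```scala
// Hypothetical sketch only; these names are not taken from Mahout.
object AlgorithmTraits {
  type Vec = Array[Double]

  // A fitted model maps an input to an output.
  trait Model[In, Out] {
    def predict(input: In): Out
  }

  // Classification: fit on (features, label) pairs.
  trait Classifier[M <: Model[Vec, Int]] {
    def fit(data: Seq[(Vec, Int)]): M
  }

  // Clustering: fit on features only; the model assigns a cluster id.
  trait Clusterer[M <: Model[Vec, Int]] {
    def fit(data: Seq[Vec]): M
  }

  // Optimization: search for a minimizer of an objective from a start point.
  trait Optimizer {
    def minimize(objective: Vec => Double, start: Vec): Vec
  }

  // Toy implementation: always predicts the most frequent training label.
  final case class MajorityModel(label: Int) extends Model[Vec, Int] {
    def predict(input: Vec): Int = label
  }

  object MajorityClassifier extends Classifier[MajorityModel] {
    def fit(data: Seq[(Vec, Int)]): MajorityModel = {
      require(data.nonEmpty, "training data must not be empty")
      val mostFrequent = data.groupBy(_._2).maxBy(_._2.size)._1
      MajorityModel(mostFrequent)
    }
  }
}
```

The toy `MajorityClassifier` is only there to show that a concrete algorithm can plug into the shared `fit`/`predict` shape. Real designs would likely work on distributed matrices rather than in-memory sequences.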
@pat, @tdunning, @ssc, could you give your thoughts on this?
It would also be good to add online flavors of the different algorithm classes
into the mix.
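
One possible shape for an online flavor is sketched below, assuming an online learner is one that updates its state one example at a time. The trait `OnlineLearner` and the toy `Perceptron` are hypothetical names used purely for illustration, not existing Mahout code.

```scala
// Hypothetical sketch only; these names are not taken from Mahout.
object OnlineSketch {
  type Vec = Array[Double]

  // An online learner consumes one example at a time and returns the updated
  // state, so it never needs the whole data set in memory.
  trait OnlineLearner[State, Example] {
    def initial: State
    def update(state: State, example: Example): State

    // Training over a stream is a fold of `update` over the examples.
    def fitAll(examples: Iterator[Example]): State =
      examples.foldLeft(initial)(update)
  }

  // Toy example: a perceptron for labels in {-1, +1} with a fixed dimension.
  final class Perceptron(dim: Int) extends OnlineLearner[Vec, (Vec, Int)] {
    def initial: Vec = Array.fill(dim)(0.0)

    // Dot product of the weights and the features.
    def score(w: Vec, x: Vec): Double =
      w.zip(x).map { case (wi, xi) => wi * xi }.sum

    def update(w: Vec, example: (Vec, Int)): Vec = {
      val (x, y) = example
      // Change the weights only when the example is misclassified (or on the boundary).
      if (y * score(w, x) <= 0.0) w.zip(x).map { case (wi, xi) => wi + y * xi }
      else w
    }
  }
}
```

Keeping `update` a pure function of the previous state and one example means the same trait can serve both streaming use and batch training via `fitAll`.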
@tdunning could you share some thoughts here?
Trevor Grant will be heading up this effort, and it would be great if we all, as a team,
could come up with abstract design plans for each class of algorithm (as well as
determine the current "classes of algorithms"), as each of us has our own unique
blend of specializations and could give our thoughts on this.
Currently this is really the opening of the conversation.
It would be best to post thoughts on:
https://issues.apache.org/jira/browse/MAHOUT-1856
Any feedback is welcome.
Thanks,
Andy