Ditto, thanks for reaching out Jim; grateful for your offer. We are cutting a 0.13 release in the next couple of weeks and I know we could use help testing/signing/etc.
Best,
Andrew

On Thu, Feb 9, 2017 at 10:48 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Jim, let me start by stating it's an (unexpected on my side) honor. Are you
> willing to get hands-on at this point in numerical problems (or have
> resources that can get hands-on)?
>
> Short modern Mahout story (as short as it is possible to be short)
>
> Most nagging problem: lack of support by industry and/or academia. We have
> capable committers but less capable backers in terms of willingness
> to sanction contributions.
>
> Current mahout development goes 2 ways: (a) the platform (aka `samsara`);
> and (b) useful, preferably end2end use case scenarios, or just
> methodology implementation. Note that while (b) is intended to use (a) (and
> gain backend portability as a bonus), it is not strictly required as long
> as the backend-specific code could be fairly easily ported to other
> backends. Still, if we come across a need for custom code, we try to
> analyze whether it is something that might be a fairly common
> abstraction, so we could add it to the list of formalisms we have in the
> platform and avoid repetition in the future. A platform primer can be found
> on the site; I won't be getting into that now.
>
> In the platform, problem #1, currently, is performance. Not that it
> is generally bad, but some pieces are limited by back-ends. We did some
> in-memory work to integrate better-performing backends there, but the effort
> is constrained by our immediate capacity to contribute, and the most
> glaring issue (as one of the visitors duly noted in JIRA) is that the
> distributed backends we are trying to run are severely limited in terms of
> interconnected algebraic problems. We have ideas about what to do here, though.
>
> It is the very distributed performance of interconnected numerical problems
> of the current backends (flink, spark) which precludes Mahout from being a
> pragmatic platform for implementing deep learning at scale, for example.
> I suppose in-memory performance should be ok for that purpose once we have
> added GPU and DL-specific GPU primitives. The in-memory improvements are
> not complete for everything that would be ideal, but there has been some
> notable progress there.
>
> With methodologies, well, there's no single most pressing problem; it
> is really just defined by the pragmatic problem one has at hand. Currently,
> Trevor does most of this outstanding work. It simply, and preferably,
> should be more edgy than what most distributed packages offer.
>
> E.g., decent-to-good Bayesian optimization for hyperparameters, or say I
> have been suggesting for a few years to experiment with LRFM recommendation
> techniques, as they significantly expand on the type of predictors the
> method can take, and their treatment, compared to things like COO or
> implicit feedback behavior-based recommenders. Another example is there's
> no good coverage in clustering in terms of _type_ of clustering -- mixtures,
> density, spectral, not just traditional centroid type of methods.
> Visualization techniques, even as simple as 2d density estimators for big
> datasets, are also in demand. Generally speaking, industry has stepped far
> ahead of what is commonly available in open source software in terms of
> visualization approaches. Bottom line, the only guidance here I see is --
> "don't be trivial. Seek unique value proposition". But the main guiding
> principle so far has been people's pragmatism: "I have actual production
> use case and/or very specific requirements for that, I want to use the
> methodology X for that, and I don't seem to be able to find it elsewhere
> under management of a distributed platform Y".
>
> -d
>
>
> On Thu, Feb 9, 2017 at 6:34 AM, Jim Jagielski <j...@jagunet.com> wrote:
>
> >
> > > On Feb 8, 2017, at 11:50 PM, Suneel Marthi <smar...@apache.org> wrote:
> > >
> > > Curious JimJag,
> > > Did some dude from CapitalOne poke u about Mahout
> > >
> >
> > Not really, no...
> >
>