PS Maybe we should say that if you can provide Kryo serialization for a structure, it can be assumed platform agnostic, and provide an API for embedding that further. In practice all backends (except, I guess, H2O, which is going extinct if it hasn't already) currently support Kryo, and potential new ones could easily add it too (after all, it is just a bunch of bytes after serialization; can't get any more basic than that).
On Thu, Jul 6, 2017 at 11:21 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> On Thu, Jul 6, 2017 at 9:45 AM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
>> To Dmitriy's point (2): I think it is acceptable to create an R-Tree
>> structure that will exist only within the algorithm for doing in-core
>> operations (or maybe it lives slightly outside of the algorithm so we
>> don't need to recreate trees for DBSCAN, Random Forests, and other
>> tree-based algorithms, i.e. we can reuse the same trees for various
>> algorithms). BUT trees only exist WITHIN the in-core, i.e. we don't want
>> to modify the allReduceBlock to accept Matrices OR Trees; that will get
>> out of hand fast. Please, anyone, chime in to correct me/argue against.
>
> +1. That's exactly what I meant.
>
>> So really, we've stumbled into a more important philosophical question,
>> and that is: is it acceptable to create objects which make the internals
>> of algorithms easier to read and work with, so long as they may be
>> serialized to in-core matrices/vectors? I am +1, and if it is decided
>> this is not acceptable, I need to go back and alter (or drop) things like
>> the CanopyFn [2] of the Canopy Clustering Algorithm.
>
> +1 too, if it is practical.
> The dilemma here is that if one wants to stay platform agnostic, then the
> algorithm has to use platform-agnostic persistence/serialization, of which
> Samsara provides only that of DRM/Matrix/Vector. So yes, if it maps
> naturally to record-tagged numerical information, that is preferable (and
> that's what I actually did a lot when encoding models).
>
> In practice, however, in a particular application setting it is often the
> case that people couldn't care less about backend compatibility, in which
> case a custom serialization is totally ok. But in the public Mahout
> version it would run against the party line of staying backend agnostic,
> so if it is at all possible with a little overhead, we try to avoid it.
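
To make the "record-tagged numerical information" idea concrete, here is a minimal sketch of flattening a small decision tree into matrix rows and rebuilding it. It is plain Python rather than Mahout/Samsara code, and everything in it is an assumption made for illustration: the `Node` class, the `tree_to_matrix` / `matrix_to_tree` helpers, and the five-column row layout are hypothetical, not anything that exists in Mahout. In a real algorithm the rows would presumably go into an in-core Matrix instead of a list of lists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary decision-tree node (leaf when it has no children)."""
    feature: int = -1
    threshold: float = 0.0
    value: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def tree_to_matrix(root: Node) -> List[List[float]]:
    """Encode the tree as rows of [feature, threshold, left_row, right_row, value].

    Leaves are tagged with left_row == right_row == -1. Internal nodes are
    assumed to have both children. Row 0 is always the root.
    """
    rows: List[List[float]] = []

    def visit(node: Node) -> int:
        idx = len(rows)
        rows.append([])  # reserve this node's row before visiting children
        if node.is_leaf():
            rows[idx] = [-1.0, 0.0, -1.0, -1.0, node.value]
        else:
            left = visit(node.left)
            right = visit(node.right)
            rows[idx] = [float(node.feature), node.threshold,
                         float(left), float(right), 0.0]
        return idx

    visit(root)
    return rows


def matrix_to_tree(rows: List[List[float]], idx: int = 0) -> Node:
    """Rebuild the tree from the row encoding produced by tree_to_matrix."""
    feature, threshold, left, right, value = rows[idx]
    if left < 0:  # leaf tag
        return Node(value=value)
    return Node(feature=int(feature), threshold=threshold,
                left=matrix_to_tree(rows, int(left)),
                right=matrix_to_tree(rows, int(right)))


def predict(node: Node, x: List[float]) -> float:
    """Walk the tree: go left when x[feature] <= threshold, else right."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value
```

The point of the design is that the tree object only ever exists inside the in-core code; whatever crosses a backend boundary is just numbers, so nothing like allReduceBlock has to learn about trees.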