Well... I think it is an issue that has to do with figuring out how to *avoid* import and export as much as possible.

On Tue, Apr 15, 2014 at 6:36 PM, Pat Ferrel <[email protected]> wrote:

> Which is why it’s an import/export issue.
>
> On Apr 15, 2014, at 5:48 PM, Ted Dunning <[email protected]> wrote:
>
> On Tue, Apr 15, 2014 at 10:58 AM, Pat Ferrel <[email protected]> wrote:
>
> > The statement "There is not, nor do I think there will be a way to
> > run this stuff with CLI" seems unduly misleading. Really, does anyone
> > second this?
> >
> > There will be Scala scripts to drive this stuff and yes, even from the
> > CLI. Do you imagine that every Mahout USER will be a Scala + Mahout DSL
> > programmer? That may be fine for committers, but users will be PHP devs,
> > Ruby devs, Python or Java devs, maybe even a few C# devs. I think you are
> > confusing Mahout DEVS with USERS. Few users are R devs moving into
> > production work; they are production engineers moving into ML who want a
> > blackbox. They will need a language-agnostic way to drive Mahout. Making
> > statements like this only confuses potential users and drives them away
> > to no purpose. I’m happy for the nascent Mahout-Scala shell, but it’s not
> > in the typical user’s world view.
>
> Yes, ultimately there may need to be command line programs of various
> sorts, but the fact is, we need to make sure that we avoid files as the API
> for moving large amounts of data. That means that we have to have some way
> of controlling the persistence of in-memory objects, and in many cases that
> means that processing chains will not typically be integrated at the level
> of command line programs.
>
> Dmitriy's comment about R is apropos. You can put scripts together for
> various end-user purposes, but you don't have a CLI for every R command.
> Nor for every Perl, Python, or PHP command either.
>
> To the extent we have in-memory persistence across the lifetime of
> multiple driver programs, a sort of CLI interface will be possible. I
> know that h2o will do that, but I am not entirely clear on the lifetime of
> RDDs in Spark relative to Mahout DSL programs. Regardless of possibility,
> I don't expect a CLI interface to be the primary integration path for these
> new capabilities.
