Hi YiZhi Liu,

The spark.ml classes are part of the higher-level "Pipelines" API, which works with DataFrames. When creating this API, we decided to keep it separate from the older RDD-based spark.mllib API to avoid confusion. You can read more about it here:
http://spark.apache.org/docs/latest/ml-guide.html
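
To make the distinction concrete, here is a rough side-by-side sketch of the two entry points. It is only an illustration, not code from Spark itself: it assumes Spark 2.x or later (SparkSession, and ml.linalg vectors for the DataFrame API), and the object name and toy data are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.{Vectors => MLVectors}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical object name; the data below is a toy set for illustration only.
object LogRegApiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("logreg-api-sketch")
      .master("local[*]")
      .getOrCreate()

    // spark.ml: the estimator is fit on a DataFrame with "label" and "features" columns.
    val trainingDF = spark.createDataFrame(Seq(
      (1.0, MLVectors.dense(0.0, 1.1, 0.1)),
      (0.0, MLVectors.dense(2.0, 1.0, -1.0)),
      (0.0, MLVectors.dense(2.0, 1.3, 1.0)),
      (1.0, MLVectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

    val mlModel = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.01)
      .fit(trainingDF)
    println(s"spark.ml coefficients: ${mlModel.coefficients}")

    // spark.mllib: the algorithm is run on an RDD[LabeledPoint].
    val trainingRDD = spark.sparkContext.parallelize(Seq(
      LabeledPoint(1.0, OldVectors.dense(0.0, 1.1, 0.1)),
      LabeledPoint(0.0, OldVectors.dense(2.0, 1.0, -1.0)),
      LabeledPoint(0.0, OldVectors.dense(2.0, 1.3, 1.0)),
      LabeledPoint(1.0, OldVectors.dense(0.0, 1.2, -0.5))
    ))

    val mllibModel = new LogisticRegressionWithLBFGS()
      .setNumClasses(2)
      .run(trainingRDD)
    println(s"spark.mllib weights: ${mllibModel.weights}")

    spark.stop()
  }
}
```

The spark.ml estimator can also be used as a stage in a Pipeline, which is the main reason it works on DataFrames rather than RDDs.
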
For (3): We use Breeze, but we have to modify it in order to do distributed optimization based on Spark.

Joseph

On Tue, Oct 6, 2015 at 11:47 PM, YiZhi Liu <javeli...@gmail.com> wrote:
> Hi everyone,
>
> I'm curious about the difference between
> ml.classification.LogisticRegression and
> mllib.classification.LogisticRegressionWithLBFGS. Both of them are
> optimized using LBFGS, the only difference I see is LogisticRegression
> takes DataFrame while LogisticRegressionWithLBFGS takes RDD.
>
> So I wonder,
> 1. Why not simply add a DataFrame training interface to
> LogisticRegressionWithLBFGS?
> 2. Whats the difference between ml.classification and
> mllib.classification package?
> 3. Why doesn't ml.classification.LogisticRegression call
> mllib.optimization.LBFGS / mllib.optimization.OWLQN directly? Instead,
> it uses breeze.optimize.LBFGS and re-implements most of the procedures
> in mllib.optimization.{LBFGS,OWLQN}.
>
> Thank you.
>
> Best,
>
> --
> Yizhi Liu
> Senior Software Engineer / Data Mining
> www.mvad.com, Shanghai, China