Hi Joseph,

Thank you for explaining the motivation for setting up a separate API
for ML Pipelines; it sounds great. But I still think we could extract
some parts common to the training and inference procedures of ml and
mllib. In ml.classification.LogisticRegression, you simply transform
the DataFrame into an RDD and follow the same procedures as in
mllib.optimization.{LBFGS,OWLQN}, right?

My suggestion, if I may, is that the ml package should focus on the
public API and leave the underlying implementations, e.g. numerical
optimization, to the mllib package.

Please let me know if I have misunderstood anything. Thank you!

2015-10-08 1:15 GMT+08:00 Joseph Bradley <jos...@databricks.com>:
> Hi YiZhi Liu,
>
> The spark.ml classes are part of the higher-level "Pipelines" API, which
> works with DataFrames. When creating this API, we decided to separate it
> from the old API to avoid confusion. You can read more about it here:
> http://spark.apache.org/docs/latest/ml-guide.html
>
> For (3): We use Breeze, but we have to modify it in order to do distributed
> optimization based on Spark.
>
> Joseph
>
> On Tue, Oct 6, 2015 at 11:47 PM, YiZhi Liu <javeli...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> I'm curious about the difference between
>> ml.classification.LogisticRegression and
>> mllib.classification.LogisticRegressionWithLBFGS. Both of them are
>> optimized using LBFGS; the only difference I see is that LogisticRegression
>> takes a DataFrame while LogisticRegressionWithLBFGS takes an RDD.
>>
>> So I wonder,
>> 1. Why not simply add a DataFrame training interface to
>> LogisticRegressionWithLBFGS?
>> 2. What's the difference between the ml.classification and
>> mllib.classification packages?
>> 3. Why doesn't ml.classification.LogisticRegression call
>> mllib.optimization.LBFGS / mllib.optimization.OWLQN directly? Instead,
>> it uses breeze.optimize.LBFGS and re-implements most of the procedures
>> in mllib.optimization.{LBFGS,OWLQN}.
>>
>> Thank you.
>>
>> Best,
>>
>> --
>> Yizhi Liu
>> Senior Software Engineer / Data Mining
>> www.mvad.com, Shanghai, China
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>



-- 
Yizhi Liu
Senior Software Engineer / Data Mining
www.mvad.com, Shanghai, China