Re: Python vs Scala performance

2014-10-22 Thread Eustache DIEMERT
Wild guess maybe, but do you decode the JSON records in Python? It could be much slower, as the default lib is quite slow. If so, try ujson [1], a C implementation that is at least an order of magnitude faster. HTH [1] https://pypi.python.org/pypi/ujson 2014-10-22 16:51 GMT+02:00 Marius
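
A minimal sketch of the suggestion above, assuming records arrive as one JSON document per line. The names `parse_record` and `fast_json` are hypothetical, introduced only for illustration; `ujson` is a third-party package, so the sketch falls back to the standard `json` module when it is not installed.

```python
import json

try:
    # Third-party C implementation; only used if it is installed.
    import ujson as fast_json
except ImportError:
    # Fall back to the standard library so the code still runs.
    fast_json = json


def parse_record(line):
    """Decode one JSON-encoded record (one JSON document per line)."""
    return fast_json.loads(line)


# Hypothetical sample input, standing in for lines read from a file.
lines = ['{"id": 1, "score": 0.5}', '{"id": 2, "score": 1.5}']
records = [parse_record(line) for line in lines]
```

In a PySpark job the same function could be passed to a map over an RDD of text lines. Whether decoding is actually the bottleneck is worth measuring before switching libraries.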

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
of these regression algorithms, for example how to choose a good step and number of iterations? I wonder if I'm using those right... Thanks, -- *Thomas ROBERT* www.creativedata.fr 2014-07-03 16:16 GMT+02:00 Eustache DIEMERT eusta...@diemert.fr: Printing the model shows the intercept is always 0

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
+02:00 Eustache DIEMERT eusta...@diemert.fr: Printing the model shows the intercept is always 0 :( Should I open a bug for that? 2014-07-02 16:11 GMT+02:00 Eustache DIEMERT eusta...@diemert.fr: Hi list, I'm benchmarking MLlib for a regression task [1] and get strange results. Namely

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-07 Thread Eustache DIEMERT
the ones column. Has anyone here had success with this code on real-world datasets? [1] https://github.com/oddskool/mllib-samples/tree/ridge (in the ridge branch) 2014-07-07 9:08 GMT+02:00 Eustache DIEMERT eusta...@diemert.fr: Well, why not, but IMHO MLlib Logistic Regression is unusable

Re: [mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-03 Thread Eustache DIEMERT
Printing the model shows the intercept is always 0 :( Should I open a bug for that? 2014-07-02 16:11 GMT+02:00 Eustache DIEMERT eusta...@diemert.fr: Hi list, I'm benchmarking MLlib for a regression task [1] and get strange results. Namely, using RidgeRegressionWithSGD it seems

[mllib] strange/buggy results with RidgeRegressionWithSGD

2014-07-02 Thread Eustache DIEMERT
Hi list, I'm benchmarking MLlib for a regression task [1] and get strange results. Namely, using RidgeRegressionWithSGD it seems the predicted points miss the intercept: {code} val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000) ... valuesAndPreds.take(10).map(t => println(t))
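
To illustrate what "missing the intercept" does to predictions, here is a small self-contained sketch in plain Python. It is not MLlib's implementation: it uses ordinary batch gradient descent on a one-feature linear model, and the function `fit_linear`, its parameters, and the synthetic data (y = 2x + 5) are all assumptions made up for the example. The L2 penalty `reg` is left at 0 so that any difference comes from the intercept alone.

```python
def fit_linear(xs, ys, fit_intercept, step=0.5, reg=0.0, iterations=2000):
    """Fit y ~ w * x + b by batch gradient descent on the mean squared error.

    `reg` is an optional L2 (ridge) penalty on w. When `fit_intercept`
    is False, b is never updated and stays at 0.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + reg * w
        w -= step * grad_w
        if fit_intercept:
            b -= step * sum(errors) / n
    return w, b


# Synthetic data (hypothetical): slope 2, intercept 5.
xs = [i / 10 for i in range(10)]
ys = [2 * x + 5 for x in xs]

w_full, b_full = fit_linear(xs, ys, fit_intercept=True)
w_zero, b_zero = fit_linear(xs, ys, fit_intercept=False)

# Prediction at x = 0 equals the fitted intercept.
pred_full_at_zero = w_full * 0.0 + b_full
pred_zero_at_zero = w_zero * 0.0 + b_zero
```

With the intercept fitted, the model recovers the slope and offset of the data. With the intercept held at 0, the slope is inflated to compensate and the prediction at x = 0 is 0 rather than 5, which is one way predictions can look systematically shifted when a model's intercept is always 0.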

Re: How to use K-fold validation in spark-1.0?

2014-06-24 Thread Eustache DIEMERT
I'm interested in this topic too :) Are the MLlib core devs on this list? E/ 2014-06-24 14:19 GMT+02:00 holdingonrobin robinholdin...@gmail.com: Does anyone know anything about it? Or should I actually move this topic to an MLlib-specific mailing list? Any information is appreciated! Thanks!

Re: MLLib inside Storm : silly or not ?

2014-06-20 Thread Eustache DIEMERT
learning. On Thu, Jun 19, 2014 at 12:26 AM, Eustache DIEMERT eusta...@diemert.fr wrote: Hi Sparkers, We have a Storm cluster and are looking for a decent execution engine for machine-learned models. What I've seen from MLlib is extremely positive, but we can't just throw away our Storm-based stack

MLLib inside Storm : silly or not ?

2014-06-19 Thread Eustache DIEMERT
Hi Sparkers, We have a Storm cluster and are looking for a decent execution engine for machine-learned models. What I've seen from MLlib is extremely positive, but we can't just throw away our Storm-based stack. So my question is: is it feasible/recommended to train models in Spark/MLlib and execute

Re: MLLib inside Storm : silly or not ?

2014-06-19 Thread Eustache DIEMERT
, which at least provides an online LDA. C On Thursday, June 19, 2014, Eustache DIEMERT eusta...@diemert.fr wrote: Hi Sparkers, We have a Storm cluster and are looking for a decent execution engine for machine-learned models. What I've seen from MLlib is extremely positive, but we can't just

Re: Random Forest on Spark

2014-04-18 Thread Eustache DIEMERT
Is there a PR or issue where GBT / RF progress in MLlib is tracked? 2014-04-17 21:11 GMT+02:00 Evan R. Sparks evan.spa...@gmail.com: Sorry - I meant to say that Multiclass classification, Gradient Boosting, and Random Forest support based on the recent Decision Tree implementation in MLlib