[mllib] useFeatureScaling likes hardcode in LogisticRegressionWithLBFGS and is not comprehensive for users.

2014-11-26 Thread Yanbo Liang
Hi All, LogisticRegressionWithLBFGS sets useFeatureScaling to true by default, which can improve convergence during optimization. However, other model training methods such as LogisticRegressionWithSGD do not set useFeatureScaling to true by default, and the corresponding set function is private

Re: java.lang.OutOfMemoryError at simple local test

2014-11-26 Thread rzykov
We made some changes to the code (it generates 1000 * 1000 elements) and set the memory limits to 100M: def generate = { for { j <- 1 to 10; i <- 1 to 1000 } yield (j, i) } ~/soft/spark-1.1.0-bin-hadoop2.3/bin/spark-submit --master local --executor-memory 100M --driver-memory 100M --class Spill

Re: How to resolve Spark site issues?

2014-11-26 Thread York, Brennon
A diff you say?! Done and done. If someone with privileges to push to the SVN site repo could check it out I think we'd be good to go. @Sean, thanks for the repo URL! On 11/25/14, 2:51 PM, Sean Owen so...@cloudera.com wrote: For the interested, the SVN repo for the site is viewable at

Re: [mllib] useFeatureScaling likes hardcode in LogisticRegressionWithLBFGS and is not comprehensive for users.

2014-11-26 Thread Xiangrui Meng
Hi Yanbo, We scale the model coefficients back after training, so scaling at prediction time is not necessary. We had some discussion about this. I'd like to treat feature scaling as part of the feature transformation and recommend that users apply feature scaling before training. It is a cleaner

Re: [mllib] useFeatureScaling likes hardcode in LogisticRegressionWithLBFGS and is not comprehensive for users.

2014-11-26 Thread DB Tsai
Hi Yanbo, As Xiangrui said, the feature scaling in the training step is transparent to users and, in theory, the optimization should converge to the same solution with or without feature scaling once it is transformed back to the original space. In short, we do the training in the scaled space, and get the
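
The point made in the two replies above (train in the scaled space, then map the coefficients back) can be illustrated with a minimal sketch. It is not MLlib's code: it assumes scaling divides each feature by its standard deviation with no mean shift, and the object name `ScalingSketch`, the `std` values, and the "trained" weights are all hypothetical. Under that assumption, dividing the scaled-space coefficients by the same standard deviations gives the same linear margin on unscaled features, which is why no scaling is needed at prediction time.

```scala
// Minimal sketch (no Spark dependency): a linear margin computed in the
// scaled feature space equals the margin in the original space once the
// coefficients are divided by the per-feature standard deviations.
object ScalingSketch {
  // Dot product of two equal-length vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Hypothetical per-feature standard deviations (illustrative values).
  val std: Array[Double] = Array(2.0, 0.5, 10.0)

  // Scale features: x'_j = x_j / std_j (assumption: no mean shift).
  def scale(x: Array[Double]): Array[Double] =
    x.zip(std).map { case (v, s) => v / s }

  // Map coefficients from the scaled space back: w_j = w'_j / std_j.
  def unscale(wScaled: Array[Double]): Array[Double] =
    wScaled.zip(std).map { case (w, s) => w / s }

  def main(args: Array[String]): Unit = {
    val x = Array(4.0, 1.0, 30.0)
    val wScaled = Array(0.3, -1.2, 0.7) // stand-in for weights from training
    val marginScaled = dot(wScaled, scale(x))
    val marginOriginal = dot(unscale(wScaled), x)
    // Both margins agree up to floating-point rounding.
    assert(math.abs(marginScaled - marginOriginal) < 1e-12)
    println(f"scaled-space margin = $marginScaled%.6f, original-space margin = $marginOriginal%.6f")
  }
}
```

The sketch only shows that predictions are unchanged by the back-transformation; whether the optimizer reaches the same solution in both spaces is the separate "in theory" claim in the reply above.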

Re: How to resolve Spark site issues?

2014-11-26 Thread Reynold Xin
Thanks, Brennon. I pushed the change and updated the website. On Wed, Nov 26, 2014 at 8:17 AM, York, Brennon brennon.y...@capitalone.com wrote: A diff you say?! Done and done. If someone with privileges to push to the SVN site repo could check it out I think we'd be good to go. @Sean, thanks

Re: [mllib] useFeatureScaling likes hardcode in LogisticRegressionWithLBFGS and is not comprehensive for users.

2014-11-26 Thread Shaocun Tian
Hi all, As I understand it, the optimization algorithm converges faster with feature scaling. I have a question about applying scaling multiple times. I know that applying standard scaling more than once makes no difference. But if I want to try MinMax scaling, would it be weird to use standard

Fwd: How the sequence of blockManagerId's are constructed in spark/*/storage/blockManagerMasterActor.getPeers()?

2014-11-26 Thread rapelly kartheek
-- Forwarded message -- From: rapelly kartheek kartheek.m...@gmail.com Date: Thu, Nov 27, 2014 at 11:47 AM Subject: How the sequence of blockManagerId's are constructed in spark/*/storage/blockManagerMasterActor.getPeers()? To: u...@spark.apache.org Hi, I've been fiddling with