RE: ideal number of executors per machine

2015-12-16 Thread Bui, Tri
The article below gives a good idea: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ Play around with two configurations (a large number of executors with few cores each, and a small number of executors with many cores each). The calculated values have to be conservative or it will make the
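Here is a rough sketch of the sizing arithmetic that kind of article walks through. It assumes you reserve about 1 core and 1 GB per node for the OS and Hadoop daemons, give one executor slot to the YARN ApplicationMaster, and keep roughly 7% of each slot for off-heap overhead. ExecutorSizing, planExecutors and ExecutorPlan are hypothetical names, and every constant is an assumption to adjust for your own cluster.

```scala
object ExecutorSizing {
  // Hypothetical result type: the values you would hand to spark-submit.
  case class ExecutorPlan(numExecutors: Int, coresPerExecutor: Int, memoryGbPerExecutor: Int)

  // Rough executor sizing for a homogeneous YARN cluster.
  // Assumptions: 1 core and 1 GB per node stay with the OS / Hadoop daemons,
  // one executor slot goes to the YARN ApplicationMaster, and overheadFraction
  // of each slot's memory is left for off-heap overhead.
  def planExecutors(
      nodes: Int,
      coresPerNode: Int,
      memGbPerNode: Int,
      coresPerExecutor: Int = 5,
      overheadFraction: Double = 0.07): ExecutorPlan = {
    val usableCores = coresPerNode - 1
    val usableMemGb = memGbPerNode - 1
    val executorsPerNode = usableCores / coresPerExecutor // integer division
    require(executorsPerNode > 0, "coresPerExecutor exceeds the usable cores per node")
    val numExecutors = nodes * executorsPerNode - 1 // one slot for the ApplicationMaster
    val slotMemGb = usableMemGb.toDouble / executorsPerNode
    // Approximation: keep (1 - overheadFraction) of the slot for the heap.
    val heapGb = math.floor(slotMemGb * (1 - overheadFraction)).toInt
    ExecutorPlan(numExecutors, coresPerExecutor, heapGb)
  }
}
```

For six nodes with 16 cores and 64 GB each, planExecutors(6, 16, 64) gives 17 executors with 5 cores and 19 GB each, which would map to spark-submit --num-executors 17 --executor-cores 5 --executor-memory 19G. Passing coresPerExecutor = 1 gives the other extreme for comparison: 89 one-core executors with 3 GB each.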

RE: Do I need to apply feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-18 Thread Bui, Tri
On Fri, Dec 12, 2014 at 12:23 PM, Bui, Tri tri@verizonwireless.com wrote: Thanks for the info. How do I use StandardScaler() to scale the example data (10246.0,[14111.0,1.0])? Thx, Tri

Do I need to apply feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-12 Thread Bui, Tri
Hi, I'm trying to use LBFGS as the optimizer. Do I need to implement feature scaling via StandardScaler, or does LBFGS do it by default? The following code generated the error "Failure again! Giving up and returning, Maybe the objective is just poorly behaved ?". val data =
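On the LBFGS question: as far as I know, MLlib's L-BFGS optimizer does not rescale features for you. Badly scaled columns, such as a feature in the tens of thousands next to a constant 1.0, can make its line search fail with messages like the one quoted above. Below is a minimal sketch of least-squares linear regression with LBFGS.runLBFGS. It assumes the features have already been standardized. The object name and the numCorrections, convergenceTol and maxNumIterations values are illustrative assumptions, not tuned settings.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.optimization.{LBFGS, LeastSquaresGradient, SimpleUpdater}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

object LbfgsLinearRegression {
  // Least-squares linear regression with MLlib's L-BFGS optimizer.
  // Assumes the features are already standardized; L-BFGS does not do that itself.
  // Returns (weights with the intercept as the last entry, loss per iteration).
  def fit(data: RDD[LabeledPoint]): (Vector, Array[Double]) = {
    val numFeatures = data.first().features.size
    // runLBFGS expects (label, features) pairs; appendBias adds a constant 1.0
    // as the last feature so the last weight plays the role of the intercept.
    val training = data.map(lp => (lp.label, MLUtils.appendBias(lp.features))).cache()
    val initialWeights = Vectors.zeros(numFeatures + 1)
    LBFGS.runLBFGS(
      training,
      new LeastSquaresGradient(),
      new SimpleUpdater(), // no regularization term
      10,                  // numCorrections (history size)
      1e-6,                // convergenceTol
      100,                 // maxNumIterations
      0.0,                 // regParam
      initialWeights)
  }
}
```

No step size appears anywhere in this call, which is one practical advantage of L-BFGS over the SGD-based trainers.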

RE: Do I need to apply feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-12 Thread Bui, Tri
Message- From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com] Sent: Friday, December 12, 2014 12:16 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression? You need to do the StandardScaler to help

RE: Do I need to apply feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-12 Thread Bui, Tri
Thanks for the info. How do I use StandardScaler() to scale the example data (10246.0,[14111.0,1.0])? Thx, Tri -Original Message- From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com] Sent: Friday, December 12, 2014 1:26 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Do I need
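Here is a minimal sketch of standardizing LabeledPoint data with MLlib's StandardScaler. It assumes the training data is an RDD[LabeledPoint] in the (label,[features]) text format of the example above, which LabeledPoint.parse reads. The scaler is fit on the features only, and labels are left as they are. One caveat: if the trailing 1.0 is a hand-added bias column, it is constant and has zero standard deviation, so it gives the scaler nothing to work with. It is usually cleaner to drop it and let the algorithm fit the intercept. ScaleLabeledPoints and standardize are hypothetical names.

```scala
import org.apache.spark.mllib.feature.{StandardScaler, StandardScalerModel}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object ScaleLabeledPoints {
  // Fit per-column mean and standard deviation on the features, then rebuild each
  // LabeledPoint with standardized features. Labels are not touched.
  // The fitted model is returned too, so the same transform can be applied to test data.
  def standardize(data: RDD[LabeledPoint]): (StandardScalerModel, RDD[LabeledPoint]) = {
    // withMean = true centres each column; centring requires dense feature vectors.
    val scaler = new StandardScaler(withMean = true, withStd = true).fit(data.map(_.features))
    val scaled = data.map(lp => LabeledPoint(lp.label, scaler.transform(lp.features)))
    (scaler, scaled)
  }
}
```

Reuse the returned StandardScalerModel on the test set rather than fitting a second scaler there. Otherwise training and test features end up on different scales.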

RE: Learning rate or stepsize automation

2014-12-09 Thread Bui, Tri
Thanks! Will try it out. From: Debasish Das [mailto:debasish.da...@gmail.com] Sent: Monday, December 08, 2014 5:13 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Learning rate or stepsize automation Hi Bui, Please use BFGS-based solvers... For BFGS you don't have to specify a step size

Learning rate or stepsize automation

2014-12-08 Thread Bui, Tri
Hi, is there any way to automatically calculate the optimum learning rate (step size) via MLlib for SGD? Thx, Tri
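A common workaround, and not an MLlib feature, is a small grid search. You train LinearRegressionWithSGD with several candidate step sizes and keep the one with the lowest mean squared error on a held-out set. This is a sketch against the Spark 1.x MLlib API used in this thread. StepSizeSearch, mse, bestStepSize and any candidate values are hypothetical. Switching to an L-BFGS-based solver, as suggested in the reply above, avoids choosing a step size at all.

```scala
import org.apache.spark.SparkContext._ // implicit RDD[Double] functions (mean) on Spark < 1.3
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionModel, LinearRegressionWithSGD}
import org.apache.spark.rdd.RDD

object StepSizeSearch {
  // Mean squared error of the model's predictions on a labelled data set.
  def mse(model: LinearRegressionModel, data: RDD[LabeledPoint]): Double =
    data.map { lp =>
      val err = model.predict(lp.features) - lp.label
      err * err
    }.mean()

  // Train one model per candidate step size and keep the one with the lowest
  // validation MSE. Runs that diverge (NaN or infinite error) are dropped.
  // Returns (best step size, its validation MSE).
  def bestStepSize(
      train: RDD[LabeledPoint],
      validation: RDD[LabeledPoint],
      candidates: Seq[Double],
      numIterations: Int): (Double, Double) = {
    val scored = candidates.map { step =>
      val model = LinearRegressionWithSGD.train(train, numIterations, step)
      (step, mse(model, validation))
    }.filter { case (_, err) => !err.isNaN && !err.isInfinite }
    require(scored.nonEmpty, "every candidate step size diverged")
    scored.minBy(_._2)
  }
}
```

For example, bestStepSize(train, validation, Seq(0.0001, 0.001, 0.01, 0.1), 100) returns the chosen step size together with its validation MSE. Cache the training RDD first, because every candidate retrains from scratch over it.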

Cannot predictOnValues or predictOn based on the model built with StreamingLinearRegressionWithSGD

2014-12-05 Thread Bui, Tri
Hi, the following example code builds the correct model.weights, but its prediction values are zero. Am I calling predictOnValues incorrectly? I also coded a batch version based on LinearRegressionWithSGD() with the same training and test data, number of iterations and step size, and it was

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-12-01 Thread Bui, Tri
values, which are the lp.features. Thanks, Tri From: Yanbo Liang [mailto:yanboha...@gmail.com] Sent: Thursday, November 27, 2014 12:22 AM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD Hi Tri, Maybe my latest responds

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
Try (hdfs:///localhost:8020/user/data/*), with three slashes. Thx, Tri -Original Message- From: Benjamin Cuthbert [mailto:cuthbert@gmail.com] Sent: Monday, December 01, 2014 4:41 PM To: user@spark.apache.org Subject: hdfs streaming context All, Is it possible to stream on HDFS directory

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
For the streaming example I am working on, it accepts (hdfs:///user/data) without the localhost info. Let me dig through my HDFS config. -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Monday, December 01, 2014 4:50 PM To: Benjamin Cuthbert Cc:

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
Yep, no localhost. Usually I use hdfs:///user/data to indicate that I want HDFS, or file:///user/data to indicate a local directory. -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Monday, December 01, 2014 5:06 PM To: Bui, Tri Cc: Benjamin Cuthbert; user
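Parsing these forms as URIs shows the difference. With two slashes (hdfs://localhost:8020/user/data), the part before the next slash is the authority, meaning the namenode host and port. With three slashes (hdfs:///user/data), the authority is empty. Hadoop then falls back to the default filesystem configured in core-site.xml (fs.defaultFS, or fs.default.name on older releases), and everything after the third slash is the path. So in hdfs:///localhost:8020/user/data, the localhost:8020 part is read as the start of the path, not as a namenode address. HdfsUriParts and describe are hypothetical names, and the helper uses only java.net.URI.

```scala
import java.net.URI

object HdfsUriParts {
  // Split a path URI into (scheme, host, port, path).
  // A URI with an empty authority, e.g. "hdfs:///user/data", has no host (None) and port -1.
  def describe(path: String): (String, Option[String], Int, String) = {
    val uri = new URI(path)
    (uri.getScheme, Option(uri.getHost), uri.getPort, uri.getPath)
  }
}
```

describe("hdfs://localhost:8020/user/data") yields host localhost, port 8020 and path /user/data. describe("hdfs:///localhost:8020/user/data") yields no host and the path /localhost:8020/user/data.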

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-11-26 Thread Bui, Tri
)).setNumIterations(args(4).toInt).setStepSize(.0001) model.trainOn(trainingData) model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print() ssc.start() ssc.awaitTermination() Thanks Tri From: Bui, Tri [mailto:tri@verizonwireless.com.INVALID] Sent: Tuesday, November 25, 2014 9

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-11-26 Thread Bui, Tri
= (lp.label, lp.features))).print() [error] ^ [error] two errors found [error] (compile:compile) Compilation failed Thanks Tri From: Yanbo Liang [mailto:yanboha...@gmail.com] Sent: Tuesday, November 25, 2014 8:57 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Inaccurate Estimate

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-11-25 Thread Bui, Tri
().setInitialWeights(Vectors.zeros(args(3).toInt)) .setIntercept(true) But I still get a compilation error. Thanks Tri From: Yanbo Liang [mailto:yanboha...@gmail.com] Sent: Tuesday, November 25, 2014 4:08 AM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Inaccurate Estimate of weights model from

RE: Multiple Spark Context

2014-11-14 Thread Bui, Tri
Does this also apply to StreamingContext? What issues would I have if I had 1000s of StreamingContexts? Thanks Tri From: Daniil Osipov [mailto:daniil.osi...@shazam.com] Sent: Friday, November 14, 2014 3:47 PM To: Charles Cc: u...@spark.incubator.apache.org Subject: Re: Mulitple Spark Context
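On the StreamingContext question: the Spark Streaming programming guide states that only one StreamingContext can be active in a JVM at a time, and each StreamingContext sits on top of a single SparkContext. Thousands of contexts is therefore not a workable design. The sketch below shows the usual alternative, assuming the many "jobs" are really many input directories: one StreamingContext with one input DStream per directory, optionally merged with ssc.union. OneContextManyStreams, build and the 10-second batch interval are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object OneContextManyStreams {
  // Register one text-file input stream per directory on the same StreamingContext
  // and merge them. All streams share the context's batch interval.
  def build(ssc: StreamingContext, dirs: Seq[String]): DStream[String] = {
    require(dirs.nonEmpty, "need at least one input directory")
    val streams = dirs.map(dir => ssc.textFileStream(dir))
    ssc.union(streams)
  }

  // Usage: OneContextManyStreams <dir1> <dir2> ... (hypothetical HDFS directories)
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("OneContextManyStreams")
    val ssc = new StreamingContext(conf, Seconds(10))
    val merged = build(ssc, args.toSeq)
    merged.count().print() // records per batch across all directories
    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the streams need different processing, skip the union and attach separate transformations and output operations to each DStream. They still run under the one context.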

streaming linear regression is not building the model

2014-11-10 Thread Bui, Tri
Hi, the model weights are not updating for streaming linear regression. The code and data below are what I am running. import org.apache.spark.mllib.linalg.Vectors import org.apache.spark.mllib.regression.LabeledPoint import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
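Here is a minimal end-to-end sketch of the streaming setup this thread is about. It is assembled from the pieces quoted in the messages above (trainOn, predictOnValues, setStepSize(.0001)) and the imports just listed. It assumes one directory receives training files and another receives test files, both in the (label,[features]) format. When the weights never move or predictions stay at zero, check two things. First, textFileStream only picks up files that appear in the watched directory after ssc.start(), so files already sitting there are ignored. Second, until a training batch has been processed the model still has its all-zero initial weights, so predictions on earlier test batches come out as zero. The object name, argument order and setNumIterations(50) are assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingLinearRegressionSketch {
  // Zero initial weights sized to the feature count; iterations and step size are
  // placeholders (the step size mirrors the .0001 used earlier in the thread).
  def buildModel(numFeatures: Int): StreamingLinearRegressionWithSGD =
    new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(numFeatures))
      .setNumIterations(50)
      .setStepSize(0.0001)

  // Usage: StreamingLinearRegressionSketch <trainingDir> <testDir> <batchSeconds> <numFeatures>
  def main(args: Array[String]): Unit = {
    val Array(trainingDir, testDir, batchSeconds, numFeatures) = args
    val conf = new SparkConf().setAppName("StreamingLinearRegressionSketch")
    val ssc = new StreamingContext(conf, Seconds(batchSeconds.toLong))

    // Each line of a new file is parsed from "(label,[f1,f2,...])" into a LabeledPoint.
    val trainingData = ssc.textFileStream(trainingDir).map(LabeledPoint.parse).cache()
    val testData = ssc.textFileStream(testDir).map(LabeledPoint.parse)

    val model = buildModel(numFeatures.toInt)
    model.trainOn(trainingData) // updates the weights as batches of training data arrive
    // Emits (label, prediction) pairs using the latest weights.
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Once the job is running, move or rename files into the watched directories (for example from a staging directory) rather than writing them in place. A partially written file can otherwise be read as a batch before it is complete.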