You probably need to scale the values in the data set so that they all
fall in comparable ranges, and translate them so that their means are 0.
You can use a pyspark.mllib.feature.StandardScaler(True, True) object
(that is, withMean=True and withStd=True) for that.
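As a rough illustration of what that standardization does to each feature
column, here is a minimal pure-Python sketch. It is not the MLlib
implementation: the helper name standardize and the sample rows are made up
for this example, and it assumes the sample standard deviation is the scale
factor.

```python
from statistics import mean, stdev

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample std dev.

    Hypothetical helper for illustration only; it mimics the effect of
    withMean=True and withStd=True on a small in-memory data set.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    return [
        # Constant columns (std dev 0) are mapped to 0.0 to avoid dividing by zero.
        [(value - m) / s if s != 0 else 0.0
         for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Two features on very different scales (made-up values).
rows = [[100.0, 0.001], [200.0, 0.002], [300.0, 0.003]]
scaled = standardize(rows)

for row in scaled:
    print(row)
```

After this step both columns have mean 0 and unit spread, so a single step
size can suit all of the weights.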
On 28.5.2015. 6:08, Maheshakya Wijewardena wrote:
Hi,
I'm trying
Thanks for the information. I'll try that out with Spark 1.4.
On Thu, May 28, 2015 at 9:54 AM, DB Tsai dbt...@dbtsai.com wrote:
LinearRegressionWithSGD requires tuning the step size and number of
iterations very carefully. Please try the Linear Regression with elastic
net implementation in Spark 1.4
Sorry. I mean the parameter step.
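To see why the step parameter matters so much on unscaled data, here is a
small sketch using plain batch gradient descent on a one-feature problem.
This is not MLlib's optimizer; the function fit_slope_gd and the data are
assumptions made up for the example.

```python
import math

def fit_slope_gd(xs, ys, step, iterations):
    """Fit y ~ w * x by batch gradient descent on the mean squared error.

    Hypothetical helper for illustration; uses a fixed step size.
    """
    w = 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of (1/n) * sum((w*x - y)^2) with respect to w.
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= step * grad
    return w

xs = [100.0, 200.0, 300.0]   # unscaled feature with large values
ys = [200.0, 400.0, 600.0]   # exactly y = 2 * x

# A step that is too large for features of this magnitude: each update
# overshoots by more than the last, until the weight is no longer finite.
w_diverged = fit_slope_gd(xs, ys, step=0.1, iterations=200)

# A much smaller step on the same data settles near the true slope of 2.
w_converged = fit_slope_gd(xs, ys, step=1e-5, iterations=200)

print("step=0.1  ->", w_diverged, "(finite:", math.isfinite(w_diverged), ")")
print("step=1e-5 ->", w_converged)
```

The usable step size here depends on the magnitude of the feature values,
which is why scaling the features or lowering step can both make the nan
weights go away.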
2015-05-28 12:21 GMT+08:00 Maheshakya Wijewardena mahesha...@wso2.com:
What is the parameter for the learning rate alpha? LinearRegressionWithSGD
has only the following parameters.
@param data: The training data.
@param iterations:The
Hi,
I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the
attached dataset. The code is attached. When I check the model weights
vector after training, it contains `nan` values.
[nan,nan,nan,nan,nan,nan,nan,nan]
But for some data sets, this problem does not occur. What might
LinearRegressionWithSGD requires tuning the step size and number of
iterations very carefully. Please try the Linear Regression with elastic
net implementation in Spark 1.4 in the ML framework, which uses a
quasi-Newton method and determines the step size automatically. That
implementation also matches the