Hi Robin,

You can try this PR out. It has built-in feature scaling and ElasticNet
regularization (an L1/L2 mix), and it converges stably to the same model
that R's glmnet package produces.

https://github.com/apache/spark/pull/4259
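
To see why the dataset in this thread gives NaN with the default step size,
here is a self-contained plain-Scala sketch (a hypothetical illustration, not
Spark code) of single-feature batch gradient descent on the same x = y data.
With features up to 10000, mean(x^2) is about 3.3e7, so with step size 1.0 the
weight error gets multiplied by roughly 6.7e7 every iteration and overflows to
NaN; either shrinking the step to ~1e-8 or standardizing the data restores
convergence:

```scala
object SgdScalingDemo {
  // Full-batch gradient descent for a one-feature model y ≈ w * x
  // (no intercept), minimizing mean squared error. Returns the weight.
  def descend(xs: Array[Double], ys: Array[Double],
              step: Double, iters: Int): Double = {
    val n = xs.length
    var w = 0.0
    for (_ <- 0 until iters) {
      // d/dw of (1/n) * sum((w*x - y)^2)  =  (2/n) * sum((w*x - y) * x)
      var g = 0.0
      for (i <- 0 until n) g += (w * xs(i) - ys(i)) * xs(i)
      w -= step * 2.0 * g / n
    }
    w
  }

  // Standardize to zero mean and unit variance, in the spirit of
  // MLlib's StandardScaler(withMean = true, withStd = true).
  def standardize(v: Array[Double]): Array[Double] = {
    val mean = v.sum / v.length
    val std = math.sqrt(v.map(x => (x - mean) * (x - mean)).sum / v.length)
    v.map(x => (x - mean) / std)
  }

  def main(args: Array[String]): Unit = {
    // The x = y dataset from the thread, values 1 to 10000.
    val xs = (1 to 10000).map(_.toDouble).toArray
    val ys = xs

    // Unscaled, step 1.0: each iteration multiplies the weight error by
    // |1 - 2 * mean(x^2)| ≈ 6.7e7, so w overflows and ends up NaN.
    val wDefault = descend(xs, ys, step = 1.0, iters = 100)

    // Unscaled with the tiny step from the thread: converges to w ≈ 1.
    val wTiny = descend(xs, ys, step = 1e-8, iters = 100)

    // Standardized features and labels: an ordinary step size works.
    val wScaled = descend(standardize(xs), standardize(ys),
                          step = 0.1, iters = 100)

    println(s"step 1.0, unscaled  -> $wDefault")
    println(s"step 1e-8, unscaled -> $wTiny")
    println(s"step 0.1, scaled    -> $wScaled")
  }
}
```

The same arithmetic explains Robin's 0.00000001: with unscaled features the
step must cancel the ~1e8 gradient magnitude, whereas standardized data keeps
the gradient well-conditioned and the default step usable.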

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



On Thu, Jan 15, 2015 at 9:42 AM, Robin East <robin.e...@xense.co.uk> wrote:
> -dev, +user
>
> You’ll need to set the gradient descent step size to something small - a bit 
> of trial and error shows that 0.00000001 works.
>
> To do that, create a LinearRegressionWithSGD instance and set the step size
> explicitly:
>
> val lr = new LinearRegressionWithSGD()
> lr.optimizer.setStepSize(0.00000001)
> lr.optimizer.setNumIterations(100)
> val model = lr.run(parsedData)
>
> On 15 Jan 2015, at 16:46, devl.development <devl.developm...@gmail.com> wrote:
>
>> From what I gather, you use LinearRegressionWithSGD to predict y or the
>> response variable given a feature vector x.
>>
>> In a simple example I used a perfectly linear dataset such that x=y
>> y,x
>> 1,1
>> 2,2
>> ...
>>
>> 10000,10000
>>
>> Using the out-of-box example from the website (with and without scaling):
>>
>> val data = sc.textFile(file)
>>
>>    val parsedData = data.map { line =>
>>      val parts = line.split(',')
>>      LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
>>    }
>>    val scaler = new StandardScaler(withMean = true, withStd = true)
>>      .fit(parsedData.map(x => x.features))
>>    val scaledData = parsedData
>>      .map(x =>
>>      LabeledPoint(x.label,
>>        scaler.transform(Vectors.dense(x.features.toArray))))
>>
>>    // Building the model
>>    val numIterations = 100
>>    val model = LinearRegressionWithSGD.train(parsedData, numIterations)
>>
>>    // Evaluate model on training examples and compute training error
>>    // (tried using both scaledData and parsedData)
>>    val valuesAndPreds = scaledData.map { point =>
>>      val prediction = model.predict(point.features)
>>      (point.label, prediction)
>>    }
>>    val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
>>    println("training Mean Squared Error = " + MSE)
>>
>> Both scaled and unscaled attempts give:
>>
>> training Mean Squared Error = NaN
>>
>> I've even tried x, y + sample noise (drawn from a normal with mean 0 and
>> stddev 1); it still gives the same result.
>>
>> Is this not supposed to work for simple two-dimensional (x, y) data? Is
>> there something I'm missing or wrong in the code above? Or is there a
>> limitation in the method?
>>
>> Thanks for any advice.
>>
>>
>>
>> --
>> View this message in context: 
>> http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
>> Sent from the Apache Spark Developers List mailing list archive at 
>> Nabble.com.
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
