Re: LinearRegressionModel - Negative Predicted Value

2017-03-06 Thread Manish Maheshwari
Thanks, Sean. Our training MSE is really large. We definitely need better
predictor variables.

Training Mean Squared Error = 7.72E8
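
For a sense of scale, MSE is in squared units of the label, so the comparable
error figure is the RMSE; a quick back-of-the-envelope check in Scala (the
val names are only illustrative):

// RMSE puts the error back on the label's own scale.
val mse  = 7.72e8
val rmse = math.sqrt(mse)  // ~2.78e4, i.e. a typical prediction error of roughly 27,800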

Thanks,
Manish


On Mon, Mar 6, 2017 at 4:45 PM, Sean Owen <so...@cloudera.com> wrote:

> There's nothing unusual about negative values from a linear regression.
> If, generally, your predicted values are far from your actual values, then
> your model hasn't fit well. You may have a bug somewhere in your pipeline
> or you may have data without much linear relationship. Most of this isn't a
> Spark problem.
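
To make the first point concrete, here is a minimal sketch with a toy fitted
line (the weight and intercept below are made-up, illustrative values):
non-negative training targets do not stop a linear model from predicting
below zero once an input falls past the line's zero crossing.

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LinearRegressionModel

// Toy model y = 2*x - 5; weight and intercept are assumptions for illustration only.
val toyModel = new LinearRegressionModel(Vectors.dense(2.0), -5.0)
toyModel.predict(Vectors.dense(10.0))  // 15.0
toyModel.predict(Vectors.dense(1.0))   // -3.0, negative even if every training label was >= 0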
>
> On Mon, Mar 6, 2017 at 8:05 AM Manish Maheshwari <mylogi...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> We are using a LinearRegressionModel in Scala, with a StandardScaler to
>> normalize the data before modelling. The code snippet looks like this -
>>
>> *Modelling - *
>> val labeledPointsRDD = tableRecords.map { row =>
>>   val filtered = row.toSeq.filter {
>>     case s: String => false
>>     case _ => true
>>   }
>>   val converted = filtered.map {
>>     case i: Int => i.toDouble
>>     case l: Long => l.toDouble
>>     case d: Double => d
>>     case _ => 0.0
>>   }
>>   val features = Vectors.dense(converted.slice(1, converted.length).toArray)
>>   LabeledPoint(converted(0), features)
>> }
>> val scaler1 = new StandardScaler().fit(labeledPointsRDD.map(x => x.features))
>> save(sc, scalarModelOutputPath, scaler1)
>> val normalizedData = labeledPointsRDD.map(lp =>
>>   LabeledPoint(lp.label, scaler1.transform(lp.features)))
>> val splits = normalizedData.randomSplit(Array(0.8, 0.2))
>> val trainingData = splits(0)
>> val testingData = splits(1)
>> trainingData.cache()
>> val regression = new LinearRegressionWithSGD().setIntercept(true)
>> regression.optimizer.setStepSize(0.01)
>> val model = regression.run(trainingData)
>> model.save(sc, modelOutputPath)
>>
>> After that, when we score the model on the same data it was trained on,
>> using the snippet below, we see the following -
>>
>> *Scoring - *
>> val labeledPointsRDD = tableRecords.map { row =>
>>   val filtered = row.toSeq.filter {
>>     case s: String => false
>>     case _ => true
>>   }
>>   val converted = filtered.map {
>>     case i: Int => i.toDouble
>>     case l: Long => l.toDouble
>>     case d: Double => d
>>     case _ => 0.0
>>   }
>>   val features = Vectors.dense(converted.toArray)
>>   (row(0), features)
>> }
>> val scaler1 = read(sc, scalarModelOutputPath)
>> val normalizedData = labeledPointsRDD.map(p => (p._1, scaler1.transform(p._2)))
>> normalizedData.cache()
>> val model = LinearRegressionModel.load(sc, modelOutputPath)
>> val valuesAndPreds = normalizedData.map(p => (p._1.toString(), model.predict(p._2)))
>>
>> However, a lot of the predicted values are negative. The input data has no
>> negative values, so we are unable to understand this behaviour.
>> Furthermore, the order and sequence of the variables is the same in the
>> modelling and testing data frames.
>>
>> Any ideas?
>>
>> Thanks,
>> Manish
>>
>>


LinearRegressionModel - Negative Predicted Value

2017-03-06 Thread Manish Maheshwari
Hi All,

We are using a LinearRegressionModel in Scala, with a StandardScaler to
normalize the data before modelling. The code snippet looks like this -

*Modelling - *
val labeledPointsRDD = tableRecords.map { row =>
  val filtered = row.toSeq.filter {
    case s: String => false
    case _ => true
  }
  val converted = filtered.map {
    case i: Int => i.toDouble
    case l: Long => l.toDouble
    case d: Double => d
    case _ => 0.0
  }
  val features = Vectors.dense(converted.slice(1, converted.length).toArray)
  LabeledPoint(converted(0), features)
}
val scaler1 = new StandardScaler().fit(labeledPointsRDD.map(x => x.features))
save(sc, scalarModelOutputPath, scaler1)
val normalizedData = labeledPointsRDD.map(lp =>
  LabeledPoint(lp.label, scaler1.transform(lp.features)))
val splits = normalizedData.randomSplit(Array(0.8, 0.2))
val trainingData = splits(0)
val testingData = splits(1)
trainingData.cache()
val regression = new LinearRegressionWithSGD().setIntercept(true)
regression.optimizer.setStepSize(0.01)
val model = regression.run(trainingData)
model.save(sc, modelOutputPath)
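
As an aside, testingData is split off above but not used in the snippet; a
minimal evaluation sketch in the style of the MLlib regression examples
(assuming the model and split defined above) would be:

// Evaluation sketch: average squared error of the model over the held-out split.
val testValuesAndPreds = testingData.map(lp => (lp.label, model.predict(lp.features)))
val testMSE = testValuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println(s"Test Mean Squared Error = $testMSE")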

After that, when we score the model on the same data it was trained on, using
the snippet below, we see the following -

*Scoring - *
val labeledPointsRDD = tableRecords.map { row =>
  val filtered = row.toSeq.filter {
    case s: String => false
    case _ => true
  }
  val converted = filtered.map {
    case i: Int => i.toDouble
    case l: Long => l.toDouble
    case d: Double => d
    case _ => 0.0
  }
  val features = Vectors.dense(converted.toArray)
  (row(0), features)
}
val scaler1 = read(sc, scalarModelOutputPath)
val normalizedData = labeledPointsRDD.map(p => (p._1, scaler1.transform(p._2)))
normalizedData.cache()
val model = LinearRegressionModel.load(sc, modelOutputPath)
val valuesAndPreds = normalizedData.map(p => (p._1.toString(), model.predict(p._2)))
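
One quick sanity check worth running here (a sketch, assuming the
normalizedData and model values above): the scoring feature vectors should
have the same dimension as the trained weights, since the modelling snippet
builds its features from converted.slice(1, converted.length) while the
scoring snippet keeps the full converted sequence.

// Sanity-check sketch: scoring feature dimension vs. the model's weight vector.
val scoringDim = normalizedData.first()._2.size
val modelDim   = model.weights.size
require(scoringDim == modelDim,
  s"Scoring feature dimension ($scoringDim) != model dimension ($modelDim)")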

However, a lot of the predicted values are negative. The input data has no
negative values, so we are unable to understand this behaviour. Furthermore,
the order and sequence of the variables is the same in the modelling and
testing data frames.

Any ideas?

Thanks,
Manish


SparkR - Support for Other Models

2015-09-09 Thread Manish MAHESHWARI
Hello,

Is there a timeline for adding support for other model types such as SVD,
clustering, and GBM in subsequent releases? 1.5 added support for linear
models only. If there is, where can we find the tentative timeline?

Thanks,
Manish
