You can do something like the following:

val rddVector = input.map {
  case (response, vec) =>
    // appendBias grows the vector by one slot; overwrite that last
    // slot with the response so it gets scaled along with the features.
    val newVec = MLUtils.appendBias(vec)
    newVec.toBreeze(newVec.size - 1) = response
    newVec
}

val scalerWithResponse = new StandardScaler(true, true).fit(rddVector)

val trainingData = scalerWithResponse.transform(rddVector).map { x =>
  // The last element is the scaled response; the rest are the features.
  (x(x.size - 1), Vectors.dense(x.toArray.slice(0, x.size - 1)))
}
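Conceptually, the code above appends the response as the last element of each vector, standardizes every column to zero mean and unit variance, and then splits the response back out. A minimal plain-Scala sketch of that math, using arrays in place of RDDs and MLlib vectors (the data values are made up, and MLlib's exact variance divisor may differ):

```scala
// Each row: features with the response appended as the last element.
val rows: Array[Array[Double]] = Array(
  Array(14111.0, 1.0, 10246.0),
  Array(14112.0, 1.0, 10350.0),
  Array(14113.0, 1.0, 10498.0)
)
val nCols = rows.head.length

// Column means and standard deviations, as a scaler with
// withMean = true and withStd = true would compute them.
val means = Array.tabulate(nCols)(j => rows.map(_(j)).sum / rows.length)
val stds = Array.tabulate(nCols) { j =>
  math.sqrt(rows.map(r => math.pow(r(j) - means(j), 2)).sum / rows.length)
}

// Standardize every column, including the appended response; a
// zero-variance column (like the constant bias slot) maps to zero.
val scaled = rows.map { r =>
  Array.tabulate(nCols) { j =>
    if (stds(j) == 0.0) 0.0 else (r(j) - means(j)) / stds(j)
  }
}

// Split the response back out into (label, features) pairs.
val trainingData = scaled.map(r => (r.last, r.init))
```

Note the degenerate case above: any constant column has zero variance, so centering sends it to zero and the division has to be guarded.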

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Fri, Dec 12, 2014 at 12:23 PM, Bui, Tri <tri....@verizonwireless.com> wrote:
> Thanks for the info.
>
> How do I use StandardScaler() to scale the example data (10246.0,[14111.0,1.0])?
>
> Thx
> tri
>
> -----Original Message-----
> From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
> Sent: Friday, December 12, 2014 1:26 PM
> To: Bui, Tri
> Cc: user@spark.apache.org
> Subject: Re: Do I need to applied feature scaling via StandardScaler for 
> LBFGS for Linear Regression?
>
> It seems that your response is not scaled, which will cause issues in LBFGS.
> Typically, people train Linear Regression with zero-mean/unit-variance
> features and response, without training the intercept. Since the response is
> zero-mean, the intercept will always be zero. When you convert the
> coefficients from the scaled space back to the original space, the intercept
> can be computed by w0 = <y> - \sum <x_n> w_n, where <y> and <x_n> are the
> averages of the response and of column n.
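As a sanity check on that back-transformation, here is a tiny plain-Scala example (my own construction, not from the thread) that uses a dataset satisfying y = 2x + 5 exactly, so the scaled-space fit is known in closed form, and recovers the original-space coefficient and intercept:

```scala
// Tiny dataset that satisfies y = 2x + 5 exactly.
val xsRaw = Array(1.0, 2.0, 3.0)
val ysRaw = Array(7.0, 9.0, 11.0)
val n = xsRaw.length
val xMean = xsRaw.sum / n
val yMean = ysRaw.sum / n
val xStd = math.sqrt(xsRaw.map(v => math.pow(v - xMean, 2)).sum / n)
val yStd = math.sqrt(ysRaw.map(v => math.pow(v - yMean, 2)).sum / n)

// In the scaled space the points satisfy yScaled = 1.0 * xScaled with zero
// intercept, so the fitted scaled-space coefficient is exactly 1.0 here.
val wScaled = 1.0

// Map the coefficient back to the original space, then recover the
// intercept as w0 = <y> - sum_n <x_n> * w_n.
val w = wScaled * yStd / xStd
val w0 = yMean - xMean * w
```

Here w comes out as 2 and w0 as 5, matching the line the data were drawn from.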
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Fri, Dec 12, 2014 at 10:49 AM, Bui, Tri <tri....@verizonwireless.com> 
> wrote:
>> Thanks for the confirmation.
>>
>> FYI, the code below works for a similar dataset: with the feature
>> magnitudes changed, LBFGS converged to the right weights.
>>
>> For example, the time-sequential feature values 1, 2, 3, 4, 5 generate the
>> error, while the sequential feature values 14111, 14112, 14113, 14115
>> converge to the right weights. Why?
>>
>> Below is the code to implement StandardScaler() for the sample data
>> (10246.0,[14111.0,1.0]):
>>
>> val scaler1 = new StandardScaler().fit(train.map(x => x.features))
>> val train1 = train.map(x => (x.label, scaler1.transform(x.features)))
>>
>> But I keep getting the error: "value features is not a member of (Double,
>> org.apache.spark.mllib.linalg.Vector)"
>>
>> Should my feature vector be .toInt instead of Double?
>>
>> Also, the org.apache.spark.mllib.linalg.Vector in the error should have an
>> "s" to match the imported library org.apache.spark.mllib.linalg.Vectors.
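The compile error arises because train was built as (label, features) tuples rather than LabeledPoints, so there is no .features member; tuple accessors (or a pattern match) are needed. A small plain-Scala sketch of the distinction, using a hypothetical Vec case class as a stand-in for the MLlib Vector type:

```scala
// Hypothetical stand-in for org.apache.spark.mllib.linalg.Vector.
final case class Vec(values: Array[Double])

// After parsedata.map(x => (x.label, MLUtils.appendBias(x.features))),
// each element is a plain (label, features) tuple, e.g.:
val train = Seq((10246.0, Vec(Array(14111.0, 1.0, 1.0))))

// x.features does not compile on a tuple; use ._1 / ._2 instead:
val labels = train.map(_._1)
val features = train.map(_._2) // this is what the scaler should be fit on
```

The same accessors work unchanged on an RDD of tuples, so `scaler1.fit(train.map(_._2))` is the shape of the fix.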
>>
>> Thanks
>> Tri
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
>> Sent: Friday, December 12, 2014 12:16 PM
>> To: Bui, Tri
>> Cc: user@spark.apache.org
>> Subject: Re: Do I need to applied feature scaling via StandardScaler for 
>> LBFGS for Linear Regression?
>>
>> You need to apply the StandardScaler yourself to help the convergence;
>> LBFGS just optimizes whatever objective function you provide, without doing
>> any scaling. I would like to provide a LinearRegressionWithLBFGS that does
>> the scaling internally in the near future.
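One way to see why scaling matters here: for least squares, the conditioning of the Gram matrix X^T X governs how well quasi-Newton methods such as LBFGS behave, and raw feature scales inflate it. A small plain-Scala illustration (my own construction, not from the thread), using the closed-form eigenvalues of a symmetric 2x2 matrix:

```scala
// Condition number of the symmetric 2x2 matrix [[a, b], [b, c]],
// via its closed-form eigenvalues (t + d) and (t - d).
def cond2x2(a: Double, b: Double, c: Double): Double = {
  val t = (a + c) / 2
  val d = math.sqrt(math.pow((a - c) / 2, 2) + b * b)
  (t + d) / (t - d)
}

// One feature plus a bias column of ones:
// X^T X = [[sum x^2, sum x], [sum x, n]].
val x = Array(1.0, 2.0, 3.0, 4.0, 5.0)
val n = x.length
val rawCond = cond2x2(x.map(v => v * v).sum, x.sum, n.toDouble)

// Standardize the feature to zero mean and unit variance.
val mean = x.sum / n
val std = math.sqrt(x.map(v => math.pow(v - mean, 2)).sum / n)
val xs = x.map(v => (v - mean) / std)
val scaledCond = cond2x2(xs.map(v => v * v).sum, xs.sum, n.toDouble)
// rawCond is already around 70 for values 1..5 (and far worse for values
// around 14111), while scaledCond is essentially 1.
```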
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Fri, Dec 12, 2014 at 8:49 AM, Bui, Tri 
>> <tri....@verizonwireless.com.invalid> wrote:
>>> Hi,
>>>
>>>
>>>
>>> I am trying to use LBFGS as the optimizer; do I need to implement feature
>>> scaling via StandardScaler, or does LBFGS do it by default?
>>>
>>>
>>>
>>> The following code generated the error “Failure again!  Giving up and
>>> returning. Maybe the objective is just poorly behaved?”.
>>>
>>>
>>>
>>> val data = sc.textFile("file:///data/Train/final2.train")
>>>
>>> val parsedata = data.map { line =>
>>>   val partsdata = line.split(',')
>>>   LabeledPoint(partsdata(0).toDouble,
>>>     Vectors.dense(partsdata(1).split(' ').map(_.toDouble)))
>>> }
>>>
>>>
>>>
>>> val train = parsedata.map(x =>
>>>   (x.label, MLUtils.appendBias(x.features))).cache()
>>>
>>>
>>>
>>> val numCorrections = 10
>>> val convergenceTol = 1e-4
>>> val maxNumIterations = 50
>>> val regParam = 0.1
>>> val initialWeightsWithIntercept = Vectors.dense(new Array[Double](2))
>>>
>>> val (weightsWithIntercept, loss) = LBFGS.runLBFGS(train,
>>>   new LeastSquaresGradient(),
>>>   new SquaredL2Updater(),
>>>   numCorrections,
>>>   convergenceTol,
>>>   maxNumIterations,
>>>   regParam,
>>>   initialWeightsWithIntercept)
>>>
>>>
>>>
>>> Did I implement LBFGS for Linear Regression via “LeastSquaresGradient()”
>>> correctly?
>>>
>>>
>>>
>>> Thanks
>>>
>>> Tri
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For
>> additional commands, e-mail: user-h...@spark.apache.org
>>
>
