[ https://issues.apache.org/jira/browse/SPARK-25544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782841#comment-16782841 ]

Sean Owen commented on SPARK-25544:
-----------------------------------

I think this is a reasonable change -- you can test it in a PR if you're up for it.

> Slow/failed convergence in Spark ML models due to internal predictor scaling
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-25544
>                 URL: https://issues.apache.org/jira/browse/SPARK-25544
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.3.2
>         Environment: Databricks runtime 4.2: Spark 2.3.1, Scala 2.11
>            Reporter: Andrew Crosby
>            Priority: Major
>
> The LinearRegression and LogisticRegression estimators in Spark ML can take a 
> large number of iterations to converge, or fail to converge altogether, when 
> trained using the l-bfgs method with standardization turned off.
> *Details:*
> LinearRegression and LogisticRegression standardize their input features by 
> default. In SPARK-8522 the option to disable standardization was added. This 
> is implemented internally by changing the effective strength of 
> regularization rather than by disabling the feature scaling. Mathematically, 
> changing the effective regularization strength and disabling feature scaling 
> should give the same solution, but the two approaches can have very different 
> convergence properties.
> The usual justification for scaling features is that it ensures all 
> covariates are O(1), which should improve numerical convergence, but this 
> argument does not account for the regularization term. That causes no issues 
> when standardization is set to true, since every feature then has an O(1) 
> regularization strength. But it does cause issues when standardization is set 
> to false: the effective regularization strength of feature i becomes 
> O(1/sigma_i^2), where sigma_i is the standard deviation of the feature. As a 
> result, predictors with small standard deviations (which can occur 
> legitimately, e.g. via one-hot encoding) receive very large effective 
> regularization strengths, leading to very large gradients and thus poor 
> convergence in the solver.
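> To make this concrete, here is a small sketch (plain Scala, not Spark source 
> code; the standard deviations are hypothetical) of how the effective 
> regularization strength blows up as a feature's standard deviation shrinks:
> {code:java}
> // With standardization disabled, the solver still divides feature i by its
> // standard deviation sigma_i internally, so a penalty of lambda on the scaled
> // coefficient acts like a penalty of lambda / sigma_i^2 on the original one.
> val lambda = 1.0
> val sigmas = Seq(1.0, 0.1, 0.005) // hypothetical per-feature std devs
> val effectiveReg = sigmas.map(s => lambda / (s * s))
> // effective strengths are roughly 1, 100 and 40000: the smallest-sigma
> // feature is penalized about 40000 times harder than the first one.
> {code}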
> *Example code to recreate:*
> To demonstrate just how bad these convergence issues can be, here is a very 
> simple test case which builds a linear regression model with a categorical 
> feature, a numerical feature and their interaction. When fed the specified 
> training data, this model will fail to converge before it hits the maximum 
> iteration limit. In this case, it is the interaction between category "2" and 
> the numeric feature that leads to a feature with a small standard deviation.
> Training data:
> ||category||numericFeature||label||
> |1|1.0|0.5|
> |1|0.5|1.0|
> |2|0.01|2.0|
>  
> {code:java}
> import org.apache.spark.ml.Pipeline
> import org.apache.spark.ml.feature.{Interaction, OneHotEncoder, StringIndexer, VectorAssembler}
> import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
> import spark.implicits._
> 
> val df = Seq(("1", 1.0, 0.5), ("1", 0.5, 1.0), ("2", 1e-2, 2.0))
>   .toDF("category", "numericFeature", "label")
> 
> val indexer = new StringIndexer().setInputCol("category").setOutputCol("categoryIndex")
> val encoder = new OneHotEncoder().setInputCol("categoryIndex")
>   .setOutputCol("categoryEncoded").setDropLast(false)
> val interaction = new Interaction()
>   .setInputCols(Array("categoryEncoded", "numericFeature")).setOutputCol("interaction")
> val assembler = new VectorAssembler()
>   .setInputCols(Array("categoryEncoded", "interaction")).setOutputCol("features")
> val model = new LinearRegression()
>   .setFeaturesCol("features").setLabelCol("label").setPredictionCol("prediction")
>   .setStandardization(false).setSolver("l-bfgs").setRegParam(1.0).setMaxIter(100)
> 
> val pipeline = new Pipeline().setStages(Array(indexer, encoder, interaction, assembler, model))
> val pipelineModel = pipeline.fit(df)
> val numIterations = pipelineModel.stages(4)
>   .asInstanceOf[LinearRegressionModel].summary.totalIterations
> {code}
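> As a sanity check on where the small standard deviation comes from: the 
> interaction column for category "2" takes the values (0.0, 0.0, 0.01) across 
> the three training rows. Its sample standard deviation can be computed by 
> hand (plain Scala, not using Spark itself):
> {code:java}
> // Sample variance (dividing by n - 1, as Spark's summarizer does) of the
> // category-2 interaction column.
> val xs = Seq(0.0, 0.0, 0.01)
> val mean = xs.sum / xs.length
> val variance = xs.map(x => math.pow(x - mean, 2)).sum / (xs.length - 1)
> val sigma = math.sqrt(variance)
> // sigma is roughly 0.0058, so with regParam = 1.0 the effective
> // regularization strength 1/sigma^2 is roughly 30000.
> {code}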
> *Possible fix:*
> These convergence issues can be fixed by turning off the feature scaling when 
> standardization is set to false, rather than by using an effective 
> regularization strength. This can be hacked into LinearRegression.scala by 
> simply replacing line 423
> {code:java}
> val featuresStd = featuresSummarizer.variance.toArray.map(math.sqrt)
> {code}
> with
> {code:java}
> val featuresStd = if ($(standardization)) {
>   featuresSummarizer.variance.toArray.map(math.sqrt)
> } else {
>   featuresSummarizer.variance.toArray.map(_ => 1.0)
> }
> {code}
> Rerunning the above test code with that hack in place leads to convergence 
> after just 4 iterations instead of hitting the maximum iteration limit!
> *Impact:*
> I can't speak for other people, but I've personally encountered these 
> convergence issues several times when building production-scale Spark ML 
> models, and have resorted to writing my own implementation of 
> LinearRegression with the above hack in place. The issue is made worse by the 
> fact that Spark does not raise an error when the maximum number of iterations 
> is hit, so the first time you encounter the issue it can take a while to 
> figure out what is going on.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
