Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20566

I believe this will break persistence for LogisticRegression. The issue is that the `threshold` param on LogisticRegressionModel doesn't get a default directly; it only gets one during the call to `fit` on LogisticRegression. That is currently fine because the Model can only be created by fitting or by being read from disk, and in both cases some value gets set for `threshold`. With this change that's no longer the case. Here's a test to confirm: https://github.com/apache/spark/commit/5db2108224accdf848b41ef0d8d1c312b49f49c6

I believe LinearRegression may have a similar issue. Our current tests don't seem to cover this kind of thing, so I think we should improve test coverage if we want to make this kind of change.
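The failure mode can be illustrated with a minimal sketch. This is plain Python, not actual Spark code: the `Params`, `_setDefault`, and `getOrDefault` names here just mimic the shape of Spark ML's params machinery, and the classes are stand-ins. The point is that a default set only on the estimator, and copied to the model inside `fit`, is silently missing on any model constructed another way:

```python
# Sketch of the pattern described above (hypothetical stand-in classes,
# not real Spark ML code).

class Params:
    def __init__(self):
        self._defaults = {}
        self._values = {}

    def _setDefault(self, name, value):
        self._defaults[name] = value

    def getOrDefault(self, name):
        if name in self._values:
            return self._values[name]
        if name in self._defaults:
            return self._defaults[name]
        raise KeyError(f"param {name!r} has neither a value nor a default")


class LogisticRegression(Params):
    def __init__(self):
        super().__init__()
        # Default lives on the *estimator* only.
        self._setDefault("threshold", 0.5)

    def fit(self):
        model = LogisticRegressionModel()
        # fit() copies the estimator's params (including defaults) onto the
        # model, which is why the missing default is hidden as long as every
        # model comes from fit() or from a read path that sets the param.
        model._defaults.update(self._defaults)
        model._values.update(self._values)
        return model


class LogisticRegressionModel(Params):
    # No default for "threshold" is set here directly.
    pass


fitted = LogisticRegression().fit()
print(fitted.getOrDefault("threshold"))  # OK: default was copied by fit()

bare = LogisticRegressionModel()  # e.g. built by a changed read path
try:
    bare.getOrDefault("threshold")
except KeyError as e:
    print(f"lookup failed: {e}")  # default was never set on the model
```

If a change to persistence means a model can be instantiated without going through `fit` (or without the read path explicitly setting the param), any `getOrDefault` on `threshold` fails exactly as in the second case above.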