Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/14326
  
    @yanboliang I went through the code, and there are several problems that 
need to be solved:
    
    The robust regression has a parameter `sigma` that must be > 0, so this is 
a bound-constrained optimization problem and should use LBFGS-B. But in my 
tests, the current Breeze LBFGS-B has bugs: during iteration it sometimes 
generates NaN values, which corrupt the computation.
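
    To illustrate why the `sigma > 0` requirement makes this a bound-constrained 
problem, here is a minimal sketch in plain Python. It uses projected gradient 
descent, which is a much simpler technique than LBFGS-B and is not what Breeze 
implements. The names `project` and `minimize_with_bound`, the lower bound 
`1e-8`, and the toy objectives are all assumptions made for this example.

```python
def project(sigma, lower=1e-8):
    """Clamp sigma onto the feasible region sigma >= lower (hypothetical bound)."""
    return max(sigma, lower)


def minimize_with_bound(grad, sigma0, step=0.1, iters=200, lower=1e-8):
    """Projected gradient descent on one bounded parameter (not LBFGS-B)."""
    sigma = project(sigma0, lower)
    for _ in range(iters):
        # Take an ordinary gradient step, then pull the iterate back into bounds.
        sigma = project(sigma - step * grad(sigma), lower)
    return sigma


# Toy objective f(sigma) = sigma + 4 / sigma, whose minimum on sigma > 0 is at 2.
interior = minimize_with_bound(lambda s: 1.0 - 4.0 / (s * s), sigma0=1.0)

# Toy objective f(sigma) = (sigma + 1)^2, whose unconstrained minimum (-1) is
# infeasible; the projection keeps the iterate at the lower bound instead.
at_bound = minimize_with_bound(lambda s: 2.0 * (s + 1.0), sigma0=1.0)
```

    Without the projection step, the second objective would drive `sigma` 
negative, which is the situation a bound-aware optimizer is meant to prevent.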
    
    I added some log printing to help debug. Here is a small fragment showing 
how LBFGS-B gets corrupted:
    **(robust regression w/o intercept w/ regularization test)**
    costFun: sigma param: 1.0
    huberAggrLoss + reg: 18262.68068379334
    cost grad- sigma: -630.1789355384457
    costFun: sigma param: 631.1789355384457
    huberAggrLoss + reg: 1.256602668595641E7
    cost grad- sigma: -466.0711286869664
    costFun: sigma param: 64.01789355384457
    huberAggrLoss + reg: 483796.45119015244
    cost grad- sigma: -448.1113667824356
    costFun: sigma param: 9.849995439060637
    huberAggrLoss + reg: 44154.79971484518
    cost grad- sigma: -275.5999029061156
    costFun: sigma param: 3.2447088269560513
    huberAggrLoss + reg: 8513.171279631315
    cost grad- sigma: -5.737776191290681
    **costFun: sigma param: NaN
    huberAggrLoss + reg: NaN**
    cost grad- sigma: -822.4999999999944
    
    As shown above, once the sigma param becomes NaN during iteration, LBFGS-B 
is corrupted and there is no need to continue.
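
    One way to stop as soon as this happens is to wrap the cost function in a 
guard that rejects non-finite values. The sketch below is plain Python and only 
illustrates the idea; `checked_cost`, `NonFiniteIterate`, and the toy cost 
function are hypothetical names, not part of Breeze or Spark.

```python
import math


class NonFiniteIterate(ArithmeticError):
    """Raised when the optimizer hands back a NaN or infinite value."""


def checked_cost(cost_fun):
    """Wrap cost_fun(params) -> (loss, grad) so it fails fast on NaN/inf."""
    def wrapper(params):
        if any(not math.isfinite(p) for p in params):
            raise NonFiniteIterate("non-finite parameter: %r" % (params,))
        loss, grad = cost_fun(params)
        if not math.isfinite(loss):
            raise NonFiniteIterate("non-finite loss: %r" % (loss,))
        return loss, grad
    return wrapper


def toy_cost(params):
    """Simple quadratic: loss = sum(p^2), gradient = 2p."""
    return sum(p * p for p in params), [2.0 * p for p in params]


guarded = checked_cost(toy_cost)
ok_loss, ok_grad = guarded([1.0, 2.0])

try:
    guarded([float("nan"), 2.0])
    stopped_early = False
except NonFiniteIterate:
    stopped_early = True
```

    A guard like this does not fix the underlying optimizer bug; it only makes 
the failure visible at the first bad iterate instead of letting it propagate.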
    
    When I traced LBFGS-B, I found that the 
    `LBFGSB.subspaceMinimization` method can return an output point of 
(NaN, NaN...) even when the input is OK, so I think this is a bug in 
`LBFGSB.subspaceMinimization`.
    I think this problem has no workaround and needs the Breeze community to 
fix it.
    
    The second problem is whether the loss should be divided by N and whether 
the L2 reg should be divided by 2. I think it should stay consistent with the 
other GLM algorithms in mllib.
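
    For reference, the scaling in question can be written out as below. This 
is a sketch in plain Python that assumes the convention under discussion is 
"mean loss plus half-scaled L2 penalty"; the function name and arguments are 
hypothetical, and it does not reproduce the actual mllib code.

```python
def regularized_objective(per_sample_losses, coefficients, reg_param):
    """Mean of the per-sample losses plus (reg_param / 2) * ||w||^2.

    Assumed convention for illustration only.
    """
    n = len(per_sample_losses)
    mean_loss = sum(per_sample_losses) / n                            # divide by N
    l2_penalty = 0.5 * reg_param * sum(w * w for w in coefficients)   # divide by 2
    return mean_loss + l2_penalty


# Two samples with losses 2 and 4, coefficients (3, 4), reg_param 0.1:
# mean loss = 3, penalty = 0.5 * 0.1 * 25.
example = regularized_objective([2.0, 4.0], [3.0, 4.0], 0.1)
```

    Dividing by N keeps the balance between the loss and the penalty 
independent of the dataset size, and the factor of 1/2 makes the penalty's 
gradient simply `reg_param * w`.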


