Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/19020
  
    @yanboliang Yeah, I saw the discussion, and it seems to me the reason was: 
there would be too much code duplication. It's true that there would be code 
duplication, but to me that's a reason to work on the internals so that there 
is less duplication, rather than to keep patching around a design that doesn't 
work very well. We *can* combine them; I just don't think we should. I know I'm 
late to the discussion, so there's already been a lot of work, but these things 
can't really be undone due to backwards compatibility. We could work on 
creating better interfaces for plugging in the loss/prediction/optimizer, which 
I think is the best way to approach it. Linear and logistic regression are 
becoming giant, monolithic pieces of code.
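
    For what it's worth, the kind of pluggable loss interface I mean could look 
roughly like the sketch below (a language-agnostic illustration in Python, not 
Spark's actual API; all names here are made up for the example). The point is 
that the solver only sees the `Loss` interface, so adding Huber does not touch 
the optimizer at all:

```python
# Hypothetical sketch of a pluggable loss interface -- illustrative only,
# not the apache/spark implementation discussed in this PR.
from dataclasses import dataclass

class Loss:
    """Interface: a loss contributes a value and a gradient per residual."""
    def value(self, residual: float) -> float: ...
    def gradient(self, residual: float) -> float: ...

class SquaredLoss(Loss):
    def value(self, r): return 0.5 * r * r
    def gradient(self, r): return r

@dataclass
class HuberLoss(Loss):
    # delta is the transition point between quadratic and linear regions;
    # 1.35 is a common default in robust regression literature.
    delta: float = 1.35
    def value(self, r):
        if abs(r) <= self.delta:
            return 0.5 * r * r
        return self.delta * (abs(r) - 0.5 * self.delta)
    def gradient(self, r):
        if abs(r) <= self.delta:
            return r
        return self.delta * (1.0 if r > 0 else -1.0)

def gradient_step(weights, xs, ys, loss, lr=0.01):
    """One gradient-descent step; the solver never inspects the loss type."""
    grads = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x))
        g = loss.gradient(pred - y)
        for j, xi in enumerate(x):
            grads[j] += g * xi
    n = len(xs)
    return [w - lr * gj / n for w, gj in zip(weights, grads)]
```

    A separate HuberRegression estimator would then be a thin wrapper that 
passes `HuberLoss` to the shared solver, rather than another branch inside 
LinearRegression.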
    
    I guess the argument against it will be lack of developer bandwidth. If 
that's the case, ok, but then I'd argue to leave Huber regression to be 
implemented by an external package. If we don't have the bandwidth to do it in 
a robust, well-designed way, then I don't think doing it the *easy* way is a 
good solution either. My first vote is to implement it as a separate estimator; 
my second vote would be to leave it for a Spark package.

