Github user sethah commented on the issue: https://github.com/apache/spark/pull/19020

@yanboliang Yeah, I saw the discussion, and it seems to me the reason was: there would be too much code duplication. Sure, it's true that there would be code duplication, but to me that's a reason to work on the internals so that there is less code duplication, rather than to keep patching around a design that doesn't work very well. We *can* combine them, I just don't think we should.

I know I'm late to the discussion, so there's already been a lot of work. But these things can't really be undone due to backwards compatibility. We could work on creating better interfaces for plugging in loss/prediction/optimizer, which I think is the best way to approach it. Linear and logistic regression seem like they are just becoming giant, monolithic pieces of code.

I guess the argument against it will be lack of developer bandwidth. If that's the case, ok, but then I'd argue to just leave Huber regression to be implemented by an external package. If we don't have the bandwidth to do it in a robust, well-designed way, then I don't think doing it the *easy* way is a good solution either. My first vote is to implement it as a separate estimator; my second vote would be to leave it for a Spark package.
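To make the "pluggable loss" idea concrete, here is a minimal sketch of what such an interface could look like. This is purely hypothetical (the trait and class names `DifferentiableLoss`, `SquaredLoss`, and `HuberLoss` are not Spark APIs); it only illustrates how Huber loss could slot in beside squared loss without touching the estimator itself:

```scala
// Hypothetical pluggable loss interface -- NOT Spark's actual internals.
// Each loss exposes its value and derivative with respect to the residual,
// so an optimizer can be written once against the trait.
trait DifferentiableLoss {
  def loss(residual: Double): Double
  def gradient(residual: Double): Double
}

// Ordinary least squares: quadratic everywhere.
object SquaredLoss extends DifferentiableLoss {
  def loss(r: Double): Double = 0.5 * r * r
  def gradient(r: Double): Double = r
}

// Huber loss: quadratic near zero, linear in the tails,
// which is what makes it robust to outliers.
class HuberLoss(delta: Double) extends DifferentiableLoss {
  def loss(r: Double): Double =
    if (math.abs(r) <= delta) 0.5 * r * r
    else delta * (math.abs(r) - 0.5 * delta)

  def gradient(r: Double): Double =
    if (math.abs(r) <= delta) r
    else delta * math.signum(r)
}
```

With something like this, a single linear-model trainer could accept any `DifferentiableLoss`, rather than each new loss growing another branch inside LinearRegression.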