Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16699#discussion_r124753530

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala ---
    @@ -961,14 +1008,16 @@ class GeneralizedLinearRegressionModel private[ml] (
       }

       override protected def transformImpl(dataset: Dataset[_]): DataFrame = {
    --- End diff --

I summarized all four cases for making predictions as follows:

Case | Estimator (training data) | Transformer (prediction data) | How R predicts | How Spark predicts
---- | ------------------------- | ----------------------------- | -------------- | ------------------
1 | w/ offset column | w/ offset column | use offset of prediction data | use offset of prediction data
2 | w/ offset column | w/o offset column | use offset of training data | not use offset
3 | w/o offset column | w/ offset column | not use offset | not use offset
4 | w/o offset column | w/o offset column | not use offset | not use offset

Cases 1 and 4 are uncontroversial.

For case 2, we handle it differently because we can't store all the `offset` data in our model as R does, but we should log a warning to let users know that the behavior differs from R.

For case 3, your current implementation does not check whether the model was trained with an offset. I think this is worth discussing. In my view, the correct behavior should take into account whether the model was trained with an offset: if it was trained without one, we should ignore the offset column when making predictions on a new dataset, or at least print a warning to remind users. However, I think we can discuss and resolve this issue in follow-up work.

@actuaryzhang What do you think of my proposal for how Spark should make predictions? Thanks.
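To make the four cases above concrete, here is a minimal sketch of the proposed Spark behavior in plain Scala, with no Spark dependency. It is only an illustration of the decision rule, not the code in this PR: `OffsetPredictionSketch`, `OffsetDecision`, `decide`, `linearPredictor`, and the warning texts are all hypothetical names chosen for this example.

```scala
object OffsetPredictionSketch {

  /** Hypothetical result type: whether to add the offset, plus an optional warning. */
  final case class OffsetDecision(useOffset: Boolean, warning: Option[String])

  /** Proposed rule: use the prediction-time offset only if the model was also trained with one. */
  def decide(trainedWithOffset: Boolean, predictionHasOffset: Boolean): OffsetDecision =
    (trainedWithOffset, predictionHasOffset) match {
      // Case 1: offset on both sides, so use the offset of the prediction data.
      case (true, true) => OffsetDecision(useOffset = true, warning = None)
      // Case 2: training offsets are not stored in the model, so no offset is used (differs from R).
      case (true, false) =>
        OffsetDecision(useOffset = false,
          Some("Model was trained with an offset but none was given for prediction; unlike R, no offset is used."))
      // Case 3: the model was trained without an offset, so the offset column is ignored.
      case (false, true) =>
        OffsetDecision(useOffset = false,
          Some("Model was trained without an offset; the offset column is ignored."))
      // Case 4: no offset anywhere.
      case (false, false) => OffsetDecision(useOffset = false, warning = None)
    }

  /** Linear predictor eta = features . coefficients + intercept, plus the offset when the decision allows it. */
  def linearPredictor(
      features: Array[Double],
      coefficients: Array[Double],
      intercept: Double,
      offset: Option[Double],
      decision: OffsetDecision): Double = {
    require(features.length == coefficients.length, "features and coefficients must have the same length")
    val dot = features.zip(coefficients).map { case (x, b) => x * b }.sum
    val appliedOffset = if (decision.useOffset) offset.getOrElse(0.0) else 0.0
    dot + intercept + appliedOffset
  }
}
```

In this sketch, cases 2 and 3 return a warning message rather than logging it, so the caller can decide how to surface it.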