Github user SlavikBaranov commented on the pull request: https://github.com/apache/spark/pull/8395#issuecomment-134327407

@srowen I think intermediate data is cached because it significantly improves performance when the number of features is high (since the `appendBias` method performs a `System.arraycopy`). On the other hand, the `LogisticRegressionWithLBFGS` class always performs feature scaling, so redundant re-evaluation of the input RDD could take significant time. Thinking about it again, I'd prefer to restore the removed warning and limit the fix to something like this:

    if (data.getStorageLevel != StorageLevel.NONE) {
      data.unpersist(false)
    }

What do you think about it?
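For context, the guard being discussed follows a pattern Spark uses elsewhere: persist an intermediate RDD only when the caller has not already done so, and unpersist it once training finishes. The sketch below illustrates that pattern; the method and variable names are assumptions for illustration, not code from the PR, and it requires a Spark runtime to actually execute:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.mllib.regression.LabeledPoint

    // Sketch (hypothetical helper): cache the intermediate bias-appended,
    // scaled RDD only if the input is not already persisted, and release
    // the temporary copy afterwards so it does not linger in the cache.
    def trainWithScaledData(data: RDD[LabeledPoint]): Unit = {
      val handlePersistence = data.getStorageLevel == StorageLevel.NONE
      if (handlePersistence) data.persist(StorageLevel.MEMORY_AND_DISK)
      try {
        // ... run the L-BFGS optimization over `data` ...
      } finally {
        // blocking = false: request asynchronous removal of cached blocks
        if (handlePersistence) data.unpersist(blocking = false)
      }
    }

Checking `getStorageLevel` first avoids double-caching when the caller has already persisted the input, which is the concern the snippet in the comment addresses from the other direction (unpersisting an intermediate RDD that the library itself cached).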