GitHub user MLnick commented on the issue:

https://github.com/apache/spark/pull/12896

Your suggestion is, to me, the ideal solution. It's probably the more common method of splitting "ratings" datasets for CV purposes. I'm interested in working on it, but I think it would need a whole new, specialized cross-validator class. I'm not quite sure what the best approach is for efficiency (see #14321 for a stratified sampling approach; it is aimed at labels and is not efficient for this case, but the general concept might apply).

In short, it's obviously a lot more effort and will take time. Perhaps it also starts life outside of Spark as a package. Not sure on this yet, but happy to collaborate on ideas!

Originally this PR was intended for `2.0`, to at least make ALS usable with the CV classes.