GitHub user staple commented on the pull request:

https://github.com/apache/spark/pull/2491#issuecomment-57855256

@mengxr Sorry about that; in the future I'll follow the best practice you've outlined.

Here are the take-aways from my perspective:

- Investigate use of sparse storage for the conditional distribution. I believe the existing implementation in master uses dense conditional distribution matrices, but sparse storage is obviously possible.
- Remove grouping of conditional probabilities, as it adds complexity and you mentioned you aren't sure it will help performance.
- Add support for predictValues with consistent partitioning.

I'll look into all these. Thanks for your feedback!