Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49072664

Yeah, API stability is very important. I keep banging on about the flip side -- freezing an API that may still need to change. You get a different, important problem. I'm sure everyone gets that, and it's a judgment call and trade-off.

I will change the PR to preserve the existing methods and add new ones. That's the thing we can consider and merge or not. I'm not offended if nobody else is feeling this one. I can always fork/wrap this aspect to fit what I need it to do. (And I have other API suggestions I'd rather spend time on, if anything.)

I wouldn't want to add the overhead of a separate set of implementations just for 64-bit values. Users would have a hard time understanding the difference and choosing.

3 billion people is a lot! It could happen, yes. Maybe not with people, but with, say, URLs. *If* collisions mattered much, then with many billions of things, you can't use the ALS implementation as it stands, since _most_ IDs would collide no matter how you map or hash. That's the best motivation I can offer for this change.
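To put a rough number on the collision claim: if you hash arbitrary 64-bit IDs uniformly into a 32-bit Int space (about 4.29 billion buckets), the standard Poisson/birthday approximation gives the expected number of distinct hash values, and everything beyond that has collided. This is a back-of-the-envelope sketch, not code from the PR; the function name and the "lower bound" framing are my own.

```python
import math

def collision_fraction(n: float, bits: int = 32) -> float:
    """Approximate fraction of n items that fail to get a unique value
    when hashed uniformly into a 2**bits space.

    Uses the Poisson approximation for expected distinct values:
        E[distinct] = d * (1 - exp(-n/d)),  d = 2**bits
    (n - E[distinct]) / n is a lower bound on the collision rate,
    since a bucket holding k > 1 items contributes k colliding items
    but only k - 1 to the excess.
    """
    d = float(2 ** bits)
    expected_distinct = d * (1.0 - math.exp(-n / d))
    return (n - expected_distinct) / n

# ~3 billion items into a 32-bit space: a sizeable fraction collides.
print(collision_fraction(3e9))   # roughly 0.28
# ~10 billion items: most items collide, so mapping 64-bit IDs down
# to Int cannot work at that scale.
print(collision_fraction(10e9))  # roughly 0.61
```

So even well below the hard 2^32 ceiling, collisions are not a corner case; at "many billions" they are the majority, which is the scenario the comment is pointing at.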