Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1393#issuecomment-49072664
  
    Yeah, API stability is very important. I keep banging on about the flip side 
-- freezing an API that may still need to change creates a different, equally 
important problem. I'm sure everyone gets that, and it's a judgment call and a 
trade-off.
    
    I will change the PR to preserve the existing methods and add new ones. 
That's something we can then consider and merge or not. I'm not offended if nobody 
else is feeling this one; I can always fork/wrap this aspect to fit what I need 
it to do. (And I have other API suggestions I'd rather spend time on, if anything.)
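
    For illustration only (the names here are made up, not the actual diff), 
"preserve the existing methods and add new ones" would look roughly like keeping 
the old Int-based signature untouched and adding a Long-based one beside it:

    ```scala
    // Hypothetical sketch only -- not the real ALS API or the actual diff.
    // The old Int-based signature stays put, so current callers keep
    // compiling, while a new method alongside it accepts 64-bit IDs.
    object IdApiSketch {
      // Existing entry point, left untouched.
      def featuresFor(userId: Int): Array[Double] = Array(userId.toDouble)

      // New entry point added next to it for 64-bit IDs.
      def featuresForLongId(userId: Long): Array[Double] = Array(userId.toDouble)

      def main(args: Array[String]): Unit = {
        println(featuresFor(42).mkString(","))                 // old callers unaffected
        println(featuresForLongId(3000000000L).mkString(","))  // new callers pass Longs
      }
    }
    ```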
    
    I wouldn't want to add the overhead of a separate set of implementations 
just for 64-bit values. Users would have a hard time understanding the 
difference and choosing between them.
    
    3 billion people is a lot! It could happen, yes. Maybe not with people, but 
with, say, URLs. *If* collisions mattered much, then with many billions of 
things you can't use the ALS implementation as it stands: an Int only has about 
4.3 billion distinct values, so _most_ IDs would collide no matter how you map 
or hash them. That's the best motivation I can offer for this change.
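
    To make the collision point concrete, here's a rough sketch (the object name 
and the synthetic IDs are made up for illustration; this is not code from the PR):

    ```scala
    // Rough sketch, not part of the PR: any Long -> Int mapping loses identity
    // once IDs don't fit in 32 bits. Truncation collides trivially, and hashing
    // only delays the problem until the birthday effect bites.
    object IdCollisionSketch {
      def main(args: Array[String]): Unit = {
        // Truncation: two distinct 64-bit IDs land on the same Int.
        val a = 1L
        val b = a + (1L << 32)
        println(s"$a -> ${a.toInt}, $b -> ${b.toInt}")  // both map to 1

        // Hashing: a couple of million well-spread Long IDs already start to
        // collide once squeezed into 32 bits.
        val n = 2000000
        val ids = Array.tabulate(n)(i => i.toLong * 1000000007L)
        val lost = n - ids.map(_.hashCode).distinct.length
        println(s"$lost of $n hashed IDs collide")
        // An Int has only ~4.3 billion distinct values, so with "many billions"
        // of things most IDs must collide, whatever the mapping.
      }
    }
    ```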

