GitHub user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1393#issuecomment-49013972
  
    Yes, you could also tell callers to track their own mapping from user IDs to
    `Int`s and keep it consistent everywhere, but then callers have to share that
    state somehow. Hashing is easier, and with 64 bits, collisions are rare enough
    to ignore for practical purposes.
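
A minimal sketch of the hashing approach, assuming string user IDs and using 64-bit FNV-1a as the hash. FNV-1a is my choice for illustration; nothing here prescribes a particular hash, and `IdHashing`/`fnv1a64` are hypothetical names. The point is that every caller applying the same function derives the same ID independently, with no shared mapping state. On the "64 bits" point: by the birthday bound, a 32-bit hash already has about a 50% chance of a collision at roughly 77,000 distinct IDs, while a 64-bit hash reaches that point only at around 5 billion.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of MLlib): derives a 64-bit ID from an
// arbitrary string ID using FNV-1a. Every caller that applies the same
// function gets the same ID, so no shared mapping table is needed.
final class IdHashing {
    // Standard 64-bit FNV-1a parameters.
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    private IdHashing() {}

    static long fnv1a64(String id) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff); // XOR in the byte as an unsigned value
            hash *= FNV_PRIME;  // long multiplication wraps modulo 2^64
        }
        return hash;
    }

    public static void main(String[] args) {
        // The same string always maps to the same 64-bit ID.
        System.out.println(fnv1a64("user-12345") == fnv1a64("user-12345")); // prints true
    }
}
```

Any deterministic 64-bit hash would serve; the property that matters is that it is a pure function of the string, so separate jobs never need to coordinate.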
    
    A caller has to do something like one of these to deal with real-world
    identifiers, which are often strings or 64-bit values, because an API with `Int`
    IDs doesn't quite work by itself. This is an instance of a broader concern I
    have: an API that (from my perspective) is going to be problematic at scale is
    already unchangeable before it has been battle-tested. (I actually thought all
    of MLlib was de facto `@Experimental`?)
    
    Yeah, but you can also layer other APIs on top to fix it, or, in cases like
    this, mark the existing methods `@deprecated` and add new signatures alongside
    them. I think that would be the simplest solution to this particular concern.
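
A sketch of the "deprecate and add a new signature" pattern, written in Java, where the annotation is `@Deprecated` (Scala's `@deprecated` plays the same role). `RatingsApi` and `countUsers` are hypothetical stand-ins for a real training method, not MLlib's actual API. The old `Int`-ID overload stays source-compatible but is marked deprecated and simply widens its IDs to `long` before delegating to the new overload.

```java
import java.util.Arrays;

// Hypothetical API class (not MLlib): keeps an Int-ID method for
// compatibility while adding a Long-ID overload. "countUsers" stands in
// for a real training method and returns the number of distinct users.
final class RatingsApi {
    private RatingsApi() {}

    /** New signature: 64-bit user and product IDs. */
    static long countUsers(long[] users, long[] products, double[] ratings) {
        if (users.length != products.length || users.length != ratings.length) {
            throw new IllegalArgumentException("input arrays must have equal length");
        }
        return Arrays.stream(users).distinct().count();
    }

    /**
     * Old signature, kept so existing callers still compile.
     * @deprecated use {@link #countUsers(long[], long[], double[])} instead.
     */
    @Deprecated
    static long countUsers(int[] users, int[] products, double[] ratings) {
        // Widening int to long is lossless, so the old path just delegates.
        long[] u = Arrays.stream(users).asLongStream().toArray();
        long[] p = Arrays.stream(products).asLongStream().toArray();
        return countUsers(u, p, ratings);
    }
}
```

Because the old overload delegates, there is a single implementation to maintain, and existing callers only see a compiler deprecation warning.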
    
    The question of serialized size is still open, since 64-bit IDs take twice the
    space of 32-bit ones. That is worth weighing in on.
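
To put rough numbers on the serialized-size question, here is back-of-the-envelope arithmetic, assuming each rating is a (user, product, rating) triple with a `Double` rating, as in MLlib's `Rating`. It counts raw primitive bytes only; real serialized size depends on the serializer and per-object overhead, and a serializer using variable-length integer encoding could narrow the gap. The class name and the one-billion-ratings figure are hypothetical.

```java
// Back-of-the-envelope arithmetic only: counts the raw primitive bytes in one
// (user, product, rating) triple. Actual serialized size also depends on the
// serializer and on per-object overhead.
final class RatingSize {
    private RatingSize() {}

    // Int IDs: 4 + 4 + 8 = 16 bytes per rating.
    static int bytesWithIntIds() {
        return Integer.BYTES + Integer.BYTES + Double.BYTES;
    }

    // Long IDs: 8 + 8 + 8 = 24 bytes per rating.
    static int bytesWithLongIds() {
        return Long.BYTES + Long.BYTES + Double.BYTES;
    }

    public static void main(String[] args) {
        long ratings = 1_000_000_000L; // hypothetical data set size
        System.out.println(ratings * bytesWithIntIds());  // prints 16000000000
        System.out.println(ratings * bytesWithLongIds()); // prints 24000000000
    }
}
```

So the raw payload per rating grows by 50% (16 to 24 bytes), not 100%, because the rating value itself is unchanged.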


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---