GitHub user mateiz commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-49104793

I didn't suggest having a new implementation for long IDs, only a new API. Both can run on the same implementation (e.g. the current Int-based one widens the Ints to Longs and calls the Long-based one). This is a much more sensible way to evolve the API, and it's very common in other software. All our MLlib APIs were designed to support this kind of evolution: you set your parameters using a builder pattern, where we can add new methods, and the top-level API is just functions you can call, which we can easily map to more complex versions of those functions.

The place I'm coming from is that there are *far* more complex APIs than ours that have retained backward compatibility over decades, maintained by similar-sized teams. One great example is Java's class library, which is not only a great library but has also been compatible since 1.0. There are well-known ways to retain compatibility while still improving an API, such as adding a new package (e.g. java.nio vs. java.io). I would be totally fine doing that with MLlib as we gain experience with it, but there's no reason to break the old API in the process.

Again, I feel that people from today's tech company world think way too much about "perfecting" an API by repeatedly tweaking it. While that works within a single engineering team, it doesn't work in software that you expect someone else to use.
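The two evolution patterns the comment describes can be sketched in a few lines of Java. Everything below is illustrative: the class and method names (`trainWithLongIds`, `AlsParams`, etc.) are hypothetical stand-ins, not Spark's actual MLlib API. Part (1) shows the old Int-ID entry point kept as a thin adapter that widens to Longs and delegates to one shared implementation; part (2) shows a builder-style parameter object that can gain new setters later without breaking existing callers.

```java
import java.util.Arrays;

public class ApiEvolutionSketch {

    // (2) Builder-style parameters (hypothetical): a setter added in a
    // later release would not break callers that only use the old ones.
    static class AlsParams {
        int iterations = 10;
        int rank = 8;
        AlsParams setIterations(int n) { this.iterations = n; return this; }
        AlsParams setRank(int r)       { this.rank = r;       return this; }
    }

    // (1) The single implementation, keyed by long IDs.
    // The body is a stand-in; a real trainer would build a model here.
    static long[] trainWithLongIds(long[] userIds, AlsParams params) {
        return Arrays.copyOf(userIds, userIds.length);
    }

    // The old Int-based API survives as a thin adapter:
    // widen each int ID to a long, then delegate.
    static long[] trainWithIntIds(int[] userIds, AlsParams params) {
        long[] widened = Arrays.stream(userIds).asLongStream().toArray();
        return trainWithLongIds(widened, params);
    }

    public static void main(String[] args) {
        AlsParams params = new AlsParams().setIterations(20).setRank(16);
        long[] model = trainWithIntIds(new int[]{1, 2, 3}, params);
        System.out.println(Arrays.toString(model) + " rank=" + params.rank);
        // prints: [1, 2, 3] rank=16
    }
}
```

The point of the adapter is that existing Int-based callers compile and behave exactly as before, while new code can target the Long-based entry point directly; there is one implementation to maintain, not two.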