Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1393#issuecomment-49104793
  
    I didn't suggest having a new implementation for long IDs, only a new API. 
They can run on the same implementation (e.g. the current Int-based one 
transforms the Ints to Longs and calls that one). This is a much more sensible 
way to evolve the API and it's very common in other software. All our MLlib 
APIs were designed to support this kind of evolution (e.g. you set your 
parameters using a builder pattern, where we can add new methods, and the 
top-level API is just functions you can call that we can easily map to more 
complex versions of the functions).
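    A minimal sketch of the delegation idea described above (hypothetical names, not the actual MLlib API): the new, more general Long-based method is the single implementation, and the old Int-based method survives as a thin facade that widens its arguments and delegates.

    ```java
    import java.util.Arrays;
    import java.util.stream.LongStream;

    class Recommender {
        // New, more general API: user IDs are longs.
        static long[] recommend(long userId, int count) {
            // Placeholder logic standing in for the real model.
            return LongStream.range(userId, userId + count).toArray();
        }

        // Old Int-based API kept for compatibility: widen and delegate.
        // Narrowing the results back is safe here because callers of the
        // old API only ever stored IDs that fit in an int.
        static int[] recommend(int userId, int count) {
            long[] longs = recommend((long) userId, count);
            int[] ints = new int[longs.length];
            for (int i = 0; i < longs.length; i++) ints[i] = (int) longs[i];
            return ints;
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(recommend(7, 3)));  // old Int API
        }
    }
    ```

    Both overloads stay source- and binary-compatible for existing callers, while all future work happens in the Long-based method.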
    
    The place I'm coming from is that there are *far* more complex APIs than 
ours that have retained backwards compatibility over decades, and were 
maintained by a similar-sized team. One great example is Java's class library, 
which is not only a great library but has also remained backwards compatible 
since 1.0. There 
are well-known ways to retain compatibility while still improving the API, such 
as adding a new package (e.g. java.nio vs java.io). I would be totally fine 
doing that with MLlib as we gain experience with it, but there's no reason to 
break the old API in the process. Again, I feel that people from today's tech 
company world think way too much about "perfecting" an API by repeatedly 
tweaking it, and while that works within a single engineering team, it doesn't 
work in software that you expect someone else to use.

