cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500536745

   > i think you are referring to having to implement that if we don't go with 
the current PR implementation and instead go with something like what you 
proposed -- since the current PR wrapper implementations don't implement the 
Set/Map interfaces. Is that correct? In other words, I think you are arguing 
against your proposal -- I just want to be sure.
   
   So firstly, as per the earlier discussion in the PR, I think we have to 
implement `Set`, `Map`, and `TreeMap`. Using custom types just hurts programmer 
productivity and ability to understand the code too much. Obviously there are 
going to be some differences with the `java.util` versions, but hopefully those 
can be minimal, so people's intuitions about what is O(1), O(N), O(log N) can 
be correct, as well as people's understanding of what the code does.
   
   So the contest here is between
   1. implementing your own Set, Map, TreeMap wrappers above the library code, 
or
   2. implementing a helper class like I suggested above
   3. just using pcollections directly, and adding a wrapper class only for the 
libraries that need it, like Vavr
    
   My main argument is that 1 is a lot of tricky code that we don't really 
need, and likely worse performance (it's hard to argue about it without 
benchmarks, of course).
   
   3 will make it slightly harder to try different collections libraries. 
Although in theory we could ask people to use our own helper functions rather 
than PCollections-specific methods, to minimize the delta needed. Perhaps this 
is an avenue worth exploring.
   
   > I think it is important for people to be able to easily understand what 
performance they are going to get from a function. If that's not clear, and we 
end up giving O(1) performance if the downcast to a persistent collection can 
occur vs. O(N) performance if it cannot -- I think that would make it more 
likely for O(N) performance to occur inadvertently. Especially since people 
won't necessarily be able to know what type they have just by reading the 
immediate code -- if all they know is that have a java.util.Map then they might 
have to trace back to figure out where they actually got that, what the actual 
type is, and therefore what O() behavior they are going to get if they try to 
add something to it.
   
   There is a tension between what you are trying to do here (abstract over 
multiple libraries) and the idea that "people [need] to be able to easily 
understand what performance they are going to get from a function." If a new 
library we add later is somewhat better at get() and worse at put(), does that 
means that people using your wrappers can't "easily understand what performance 
they are going to get"? Arguably yes. So should we ditch the wrappers and use 
the libraries directly?
   
   Ultimately the test of performance is our JMH benchmarks, as well as the 
profiles we take in production. I believe we should rely on those, rather than 
assuming we can deduce everything about performance ahead of time...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to