cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500536745
> i think you are referring to having to implement that if we don't go with
the current PR implementation and instead go with something like what you
proposed -- since the current PR wrapper implementations don't implement the
Set/Map interfaces. Is that correct? In other words, I think you are arguing
against your proposal -- I just want to be sure.
So firstly, as per the earlier discussion in the PR, I think we have to
implement `Set`, `Map`, and `TreeMap`. Using custom types just hurts programmer
productivity and ability to understand the code too much. Obviously there are
going to be some differences with the `java.util` versions, but hopefully those
can be minimal, so people's intuitions about what is O(1), O(N), O(log N) can
be correct, as well as people's understanding of what the code does.
So the contest here is between
1. implementing your own Set, Map, TreeMap wrappers above the library code,
or
2. implementing a helper class like I suggested above
3. just using pcollections directly, and adding a wrapper class only for the
libraries that need it, like Vavr
My main argument is that 1 is a lot of tricky code that we don't really
need, and likely worse performance (it's hard to argue about it without
benchmarks, of course).
3 will make it slightly harder to try different collections libraries.
Although in theory we could ask people to use our own helper functions rather
than PCollections-specific methods, to minimize the delta needed. Perhaps this
is an avenue worth exploring.
> I think it is important for people to be able to easily understand what
performance they are going to get from a function. If that's not clear, and we
end up giving O(1) performance if the downcast to a persistent collection can
occur vs. O(N) performance if it cannot -- I think that would make it more
likely for O(N) performance to occur inadvertently. Especially since people
won't necessarily be able to know what type they have just by reading the
immediate code -- if all they know is that have a java.util.Map then they might
have to trace back to figure out where they actually got that, what the actual
type is, and therefore what O() behavior they are going to get if they try to
add something to it.
There is a tension between what you are trying to do here (abstract over
multiple libraries) and the idea that "people [need] to be able to easily
understand what performance they are going to get from a function." If a new
library we add later is somewhat better at get() and worse at put(), does that
means that people using your wrappers can't "easily understand what performance
they are going to get"? Arguably yes. So should we ditch the wrappers and use
the libraries directly?
Ultimately the test of performance is our JMH benchmarks, as well as the
profiles we take in production. I believe we should rely on those, rather than
assuming we can deduce everything about performance ahead of time...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]