[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1508967636

@rondagostino BTW I still would like to understand what is wrong with calling `values()` on the map (if anything), but let's take that discussion offline and not block on it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1508964167

OK. It sounds like the consensus is that we should keep the "immutable" naming then. Thanks, @rondagostino, @showuon, @emissionnebula, @ijuma. LGTM
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1502416844

> Yes, updated would only make sense on the map interface. For the Set interface, added seems reasonable.

Sorry to bikeshed this again, but I kinda like `withAdded`, `withRemoval`, etc. Curious how you feel about those.
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1502415755

@ijuma @rondagostino: I feel like "persistent" should appear somewhere in the class names here. Perhaps you're right that it doesn't need to be in the short class name, but can we put the classes in a namespace that includes that word? So something like `org.apache.kafka.server.persistent`? Then we'd have `org.apache.kafka.server.persistent.ImmutableMap`, etc.
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500625439

Type safety is a tricky thing. There's no hard and fast rule about where to stop :)

I do genuinely think encoding "this is persistent and therefore immutable" in the type system is a good thing... maybe it's my history as a C++ programmer (heh)

Anyway, sounds good.
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500597366

It's messy to have to go through a conversion function to use standard interfaces. The fact that we have to do it in Scala is a big negative for Scala. I'm greatly looking forward to no longer having to do this once we get rid of Scala. I would like to never again have to convert between (or ponder the difference between) a `collections.Seq` and a `collections.immutable.Seq`, or a Scala map versus a Java map.

Also, as you yourself argued, converting back to `java.util.Map` loses type safety, since now we don't know if we have a persistent map or not. It is quite useful to know that your map is immutable, even if I doubt that confusion between regular types and persistent ones will be as common as you suggest.

Most importantly, I don't think the delta between a wrapper type that implements `java.util.Map` and one that doesn't is that big. Like I said earlier, we did this in TimelineHashMap and other places. While I'd prefer to not have wrappers, if we must have them they should not require downcasting to be useful. If you want, I can raise a PR to implement the standard interfaces.
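[Editor's note] The wrapper shape being discussed — an immutable map usable anywhere a plain `java.util.Map` is accepted, with no downcasting — might look something like this minimal sketch. All names here are hypothetical, and it uses copy-on-write for brevity where a real persistent structure (such as pcollections' HAMT-based maps) would share structure:

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an immutable map that IS a java.util.Map, so callers
// never need to downcast or convert. Mutators inherited from AbstractMap
// throw UnsupportedOperationException; updates go through persistent-style
// methods that return a new map and leave the original untouched.
class ImmutableMapSketch<K, V> extends AbstractMap<K, V> {
    private final Map<K, V> inner;

    private ImmutableMapSketch(Map<K, V> inner) {
        this.inner = inner;
    }

    public static <K, V> ImmutableMapSketch<K, V> empty() {
        return new ImmutableMapSketch<>(new HashMap<>());
    }

    // Returns a new map containing the extra mapping; O(n) copy here,
    // whereas a structure-sharing implementation would be O(log n).
    public ImmutableMapSketch<K, V> updated(K k, V v) {
        Map<K, V> copy = new HashMap<>(inner);
        copy.put(k, v);
        return new ImmutableMapSketch<>(copy);
    }

    public ImmutableMapSketch<K, V> removed(K k) {
        Map<K, V> copy = new HashMap<>(inner);
        copy.remove(k);
        return new ImmutableMapSketch<>(copy);
    }

    @Override
    public Set<Entry<K, V>> entrySet() {
        // AbstractMap derives get(), size(), containsKey(), etc. from this;
        // wrapping in unmodifiableMap keeps the entry set read-only too.
        return Collections.unmodifiableMap(inner).entrySet();
    }
}
```

The point of the sketch is only that implementing the standard interface (as TimelineHashMap does) costs little code while letting existing call sites keep their `java.util.Map` signatures.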
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500536745

> i think you are referring to having to implement that if we don't go with the current PR implementation and instead go with something like what you proposed -- since the current PR wrapper implementations don't implement the Set/Map interfaces. Is that correct? In other words, I think you are arguing against your proposal -- I just want to be sure.

So firstly, as per the earlier discussion in the PR, I think we have to implement `Set`, `Map`, and `TreeMap`. Using custom types just hurts programmer productivity and the ability to understand the code too much. Obviously there are going to be some differences with the `java.util` versions, but hopefully those can be minimal, so people's intuitions about what is O(1), O(N), O(log N) can be correct, as well as people's understanding of what the code does.

So the contest here is between:

1. implementing your own Set, Map, TreeMap wrappers above the library code, or
2. implementing a helper class like I suggested above, or
3. just using pcollections directly, and adding a wrapper class only for the libraries that need it, like Vavr.

My main argument is that 1 is a lot of tricky code that we don't really need, and likely worse performance (it's hard to argue about it without benchmarks, of course). 3 will make it slightly harder to try different collections libraries. Although in theory we could ask people to use our own helper functions rather than PCollections-specific methods, to minimize the delta needed. Perhaps this is an avenue worth exploring.

> I think it is important for people to be able to easily understand what performance they are going to get from a function. If that's not clear, and we end up giving O(1) performance if the downcast to a persistent collection can occur vs. O(N) performance if it cannot -- I think that would make it more likely for O(N) performance to occur inadvertently. Especially since people won't necessarily be able to know what type they have just by reading the immediate code -- if all they know is that they have a java.util.Map then they might have to trace back to figure out where they actually got that, what the actual type is, and therefore what O() behavior they are going to get if they try to add something to it.

There is a tension between what you are trying to do here (abstract over multiple libraries) and the idea that "people [need] to be able to easily understand what performance they are going to get from a function." If a new library we add later is somewhat better at get() and worse at put(), does that mean that people using your wrappers can't "easily understand what performance they are going to get"? Arguably yes. So should we ditch the wrappers and use the libraries directly?

Ultimately the test of performance is our JMH benchmarks, as well as the profiles we take in production. I believe we should rely on those, rather than assuming we can deduce everything about performance ahead of time...
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500464493

> It is true that the wrapper implementation in the PR as it stands right now always creates a new object when creating a new persistent collection (either by creating an empty/singleton or by adding/removing from an existing one). The overhead is low -- object creation isn't onerous in time or space when object initialization is cheap (which it is here), and the work created by having the additional object will be a very low percentage of the work that the underlying persistent collection has to do anyway. So it doesn't concern me very much in that regard, but I do agree we should avoid it if we decide it is unnecessary and any tradeoff ends up being worth it.

The performance issues are debatable, I'll definitely grant you that. And I feel kind of bad mentioning them considering what a big performance improvement this will be in general. Probably the bigger issue is the extra complexity of implementing our own Set / Map / TreeMap / etc. It's all doable (and perhaps we'll end up doing it for Vavr) but it feels kind of bad when the library has already done that work.

> So I think we have a situation where the current PR introduces a small amount of overhead that is not expected to be significant but that we would prefer to eliminate if possible. But the cost of eliminating it is a lack of type safety -- the compiler would not be able to know if a collection is a persistent one or not.

I agree that the solution I proposed lacks type safety. It would be possible for someone to pass a non-persistent map to one of the `PersistentCollectionFactory` methods. Or, of course, to pass a different kind of persistent collection than what is expected. If type safety is a requirement, it would be possible to add a third generic parameter which is the type of the Map. But this gets back to your original objection to exposing these types, which is that people could then invoke methods directly on the pcollections map type. The users would also need to pull in the pcollections imports and so on. I feel that there is no advantage to going down this path versus just using pcollections directly, since the effort to switch to something else later would not be lessened.

Another option is to actually handle the different types in an appropriate fashion. For example, if someone gives you a non-pcollections `java.util.HashMap`, you can implement `afterAdding` by copying it with the requested addition. Hmm... now that I think about this more, it actually sounds like a pretty good option. I suppose in that case we could just have a bunch of static methods in `org.apache.kafka.server.persistent.PersistentCollection` like `PersistentCollection.afterAdding(K k, V v)` and implement them appropriately for all persistent collection types we support...
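[Editor's note] The "handle each type appropriately" option could be sketched as a static helper that dispatches on the runtime type and falls back to copying. The names are hypothetical, and the pcollections branch is shown only as a comment so the sketch stays self-contained; the live fallback path is the O(N) copy described above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of PersistentCollection.afterAdding: return a map
// equal to the input plus one extra mapping, never mutating the input.
// A real implementation would special-case each supported persistent type
// for an O(log n) update; anything else gets an O(n) defensive copy.
final class PersistentCollections {
    private PersistentCollections() {}

    static <K, V> Map<K, V> afterAdding(Map<K, V> map, K k, V v) {
        // With pcollections on the classpath, the fast path might be:
        // if (map instanceof org.pcollections.PMap) {
        //     return ((org.pcollections.PMap<K, V>) map).plus(k, v);
        // }
        Map<K, V> copy = new HashMap<>(map); // O(n) fallback for plain maps
        copy.put(k, v);
        return copy;
    }
}
```

Note the trade-off this illustrates: the same call is O(log n) or O(n) depending on the runtime type, which is exactly the "performance you can't see in the immediate code" concern quoted earlier in the thread.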
cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1499641310

Thanks for working on this, @rondagostino and @showuon! I think this will be a really cool improvement.

A note about namespace names: rather than having an abbreviated namespace like `org.apache.kafka.pcoll`, how about `org.apache.kafka.server.persistent`? This emphasizes that the code is in `server-common` and is more consistent with our other namespace names, which mostly don't use abbreviations.

I don't have any objection to wrapping pcollections so that you can swap in a different library in the future. However, I don't like the wrappers here for two main reasons:

1. They don't implement standard collection interfaces (java.util.Map, java.util.Set)
2. They involve an additional "wrapper object" for every operation (every addition, etc.)

The first issue is easy to fix. If you look at TimelineHashMap.java and TimelineHashSet.java, you can see that I implemented the standard Map and Set operations without too much fuss. It is a lot better than just implementing `Iterable` and some getters, or having awkward "convert this to a real java collection" methods.

The second issue is harder to fix because of the nature of Java's type system. However, I suspect that you might be able to do it by having something like a `PCollectionFactory` that implemented `PersistentCollectionFactory`. Then you could have methods like:

```
class PersistentCollectionFactory {
    Map emptyMap()
    Map singletonMap(K k, V v)
    Map afterAdding(Map map, K k, V v)
    Map afterRemoving(Map map, K k, V v)
    Map afterRemoving(Map map, K k)
    ...
}
```

This would basically let you not have to cart around a wrapper object for each new map you created. However, you could avoid dealing with pcollections types entirely in the calling code, and just refer to java.util.Map, etc. Since the pcollections types do implement the standard java types, this would work well here.
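[Editor's note] With the generics filled in, the factory idea above could look like the following. The interface and the naive copy-based implementation are hypothetical illustrations; a pcollections-backed factory would return structure-sharing maps instead of copies, and calling code would still hold only `java.util.Map` references:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory interface: callers see only java.util.Map, so the
// backing persistent-collections library can be swapped by swapping the
// factory, without touching call sites.
interface PersistentCollectionFactory {
    <K, V> Map<K, V> emptyMap();
    <K, V> Map<K, V> singletonMap(K k, V v);
    <K, V> Map<K, V> afterAdding(Map<K, V> map, K k, V v);
    <K, V> Map<K, V> afterRemoving(Map<K, V> map, K k);
}

// Naive stand-in implementation that copies on every update, used here only
// to make the sketch runnable. Each returned map is unmodifiable, matching
// the persistent-collection contract that updates never mutate in place.
final class CopyingFactory implements PersistentCollectionFactory {
    @Override public <K, V> Map<K, V> emptyMap() {
        return Collections.emptyMap();
    }
    @Override public <K, V> Map<K, V> singletonMap(K k, V v) {
        return Collections.singletonMap(k, v);
    }
    @Override public <K, V> Map<K, V> afterAdding(Map<K, V> map, K k, V v) {
        Map<K, V> copy = new HashMap<>(map);
        copy.put(k, v);
        return Collections.unmodifiableMap(copy);
    }
    @Override public <K, V> Map<K, V> afterRemoving(Map<K, V> map, K k) {
        Map<K, V> copy = new HashMap<>(map);
        copy.remove(k);
        return Collections.unmodifiableMap(copy);
    }
}
```

This shows the shape of the trade-off in the thread: no per-map wrapper object is carried around, at the cost of the compiler no longer knowing whether a given `Map` is persistent.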