[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-14 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1508967636

   @rondagostino BTW I still would like to understand what is wrong with 
calling `values()` on the map (if anything) but let's take that discussion 
offline and not block on it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-14 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1508964167

   OK. It sounds like the consensus is that we should keep the "immutable" 
naming then.
   
   Thanks, @rondagostino , @showuon , @emissionnebula , @ijuma . LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-10 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1502416844

   > Yes, updated would only make sense on the map interface. For the Set 
interface, added seems reasonable.
   
   Sorry to bikeshed this again but I kinda like `withAdded`, `withRemoval`, 
etc. Curious how you feel about those.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-10 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1502415755

   @ijuma @rondagostino : I feel like "persistent" should appear somewhere in 
the class names here. Perhaps you're right that it doesn't need to be in the 
short class name, but can we put the classes in a namespace that includes that 
word? So something like `org.apache.kafka.server.persistent`? Then we'd have 
`org.apache.kafka.server.persistent.ImmutableMap`, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-07 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500625439

   Type safety is a tricky thing. There's no hard and fast rule about where to 
stop :)
   
   I do genuinely think encoding "this is persistent and therefore immutable" 
in the type system is a good thing... maybe it's my history as a C++ programmer 
(heh)
   
   Anyway sounds good


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-07 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500597366

   It's messy to have to go through a conversion function to use standard 
interfaces. The fact that we have to do it in Scala is a big negative for 
Scala. I'm greatly looking forward to no longer having to do this once we get 
rid of Scala. I would like to never again have to convert between (or ponder 
the difference between) a `collections.Seq` and a `collections.immutable.Seq`, 
or a scala map versus a java map.
   
   Also, as you yourself argued, converting back to java.util.Map loses type 
safety, since now we don't know if we have a persistent map or not. It is quite 
useful to know that your map is immutable, even if I doubt that confusion 
between regular types and persistent ones will be as common as you suggest.

   Most importantly, I don't think the delta between a wrapper type that 
implements java.util.Map and one that doesn't is that big. Like I said earlier, 
we did this in TimelineHashMap and other places. While I'd prefer to not have 
wrappers, if we must have them they should not require downcasting to be 
useful. If you want I can raise a PR to implement the standard interfaces.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-07 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500536745

   > i think you are referring to having to implement that if we don't go with 
the current PR implementation and instead go with something like what you 
proposed -- since the current PR wrapper implementations don't implement the 
Set/Map interfaces. Is that correct? In other words, I think you are arguing 
against your proposal -- I just want to be sure.
   
   So firstly, as per the earlier discussion in the PR, I think we have to 
implement `Set`, `Map`, and `TreeMap`. Using custom types just hurts programmer 
productivity and ability to understand the code too much. Obviously there are 
going to be some differences with the `java.util` versions, but hopefully those 
can be minimal, so people's intuitions about what is O(1), O(N), O(log N) can 
be correct, as well as people's understanding of what the code does.
   
   So the contest here is between
   1. implementing your own Set, Map, TreeMap wrappers above the library code, 
or
   2. implementing a helper class like I suggested above
   3. just using pcollections directly, and adding a wrapper class only for the 
libraries that need it, like Vavr

   My main argument is that 1 is a lot of tricky code that we don't really 
need, and likely worse performance (it's hard to argue about it without 
benchmarks, of course).
   
   3 will make it slightly harder to try different collections libraries. 
Although in theory we could ask people to use our own helper functions rather 
than PCollections-specific methods, to minimize the delta needed. Perhaps this 
is an avenue worth exploring.
   
   > I think it is important for people to be able to easily understand what 
performance they are going to get from a function. If that's not clear, and we 
end up giving O(1) performance if the downcast to a persistent collection can 
occur vs. O(N) performance if it cannot -- I think that would make it more 
likely for O(N) performance to occur inadvertently. Especially since people 
won't necessarily be able to know what type they have just by reading the 
immediate code -- if all they know is that have a java.util.Map then they might 
have to trace back to figure out where they actually got that, what the actual 
type is, and therefore what O() behavior they are going to get if they try to 
add something to it.
   
   There is a tension between what you are trying to do here (abstract over 
multiple libraries) and the idea that "people [need] to be able to easily 
understand what performance they are going to get from a function." If a new 
library we add later is somewhat better at get() and worse at put(), does that 
means that people using your wrappers can't "easily understand what performance 
they are going to get"? Arguably yes. So should we ditch the wrappers and use 
the libraries directly?
   
   Ultimately the test of performance is our JMH benchmarks, as well as the 
profiles we take in production. I believe we should rely on those, rather than 
assuming we can deduce everything about performance ahead of time...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-07 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500464493

   > It is true that the wrapper implementation in the PR as it stands right 
now always creates a new object when creating a new persistent collection 
(either by creating an empty/singleton or by adding/removing from an existing 
one). The overhead is low -- object creation isn't onerous in time or space 
when object initialization is cheap (which it is here), and the work created by 
having the additional object will be a very low percentage of the work that the 
underlying persistent collection has to do anyway. So it doesn't concern me 
very much in that regard, but I do agree we should avoid it if we decide it is 
unnecessary and any tradeoff ends up being worth it.
   
   The performance issues are debatable, I'll definitely grant you that. And I 
feel kind of bad mentioning them considering what a big performance improvement 
this will be in general. Probably the bigger issue is the extra complexity of 
implementing our own Set / Map / TreeMap / etc. It's all doable (and perhaps 
we'll end up doing it for Vavr) but it feels kind of bad when the library has 
already done that work.
   
   > So I think we have a situation where the current PR introduces a small 
amount of overhead that is not expected to be significant but that we would 
prefer to eliminate of possible. But the cost of eliminating it is a lack of 
type safety -- the compiler would not be able to know if a collection is a 
persistent one or not.
   
   I agree that the solution I proprosed lacks type safety. It would be 
possible for someone to pass a non-persistent map to one of the 
`PersistentCollectionFactory` methods. Or, of course, to pass a different kind 
of persistent collection than what is expected.
   
   If type safety is a requirement, it would be possible to add a third generic 
parameter which is the type of the Map. But this gets back to your original 
objection to exposing these types, which is that people could then invoke 
methods directly on the pcollections map type. The users would also need to 
pull in the pcollections imports and so on. I feel that there is no advantage 
to going down this path versus just using pcollections directly, since the 
effort to switch to something else later would not be lessened.
   
   Another option is to actually handle the different types in an appropriate 
fashion. For example, if someone gives you a non-pcollections 
`java.util.HashMap`, you can implement `afterAdding` by copying it with the 
requested addition. Hmm... now that I think about this more, it actually sounds 
like a pretty good option. I suppose in that case we could just have a bunch of 
static methods in `org.apache.kafka.server.persistent.PersistentCollection` 
like `PersistentCollection. afterAdding(K k, V v)` and implement them 
appropriately for all persistent collection types we support...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka] cmccabe commented on pull request #13280: KAFKA-14735: Improve KRaft metadata image change performance at high …

2023-04-06 Thread via GitHub


cmccabe commented on PR #13280:
URL: https://github.com/apache/kafka/pull/13280#issuecomment-1499641310

   Thanks for working on this, @rondagostino and @showuon ! I think this will 
be a really cool improvement.
   
   A note about namespace names:
   
   Rather than having an abbreviated namespace like `org.apache.kafka.pcoll` , 
how about `org.apache.kafka.server.persistent` ? This emphasizes that the code 
is in `server-common` and is more consistent with our other namespace names, 
which mostly don't use abbreviations.
   
   I don't have any objection to wrapping pcollections so that you can swap in 
a different library in the future. However, I don't like the wrappers here for 
two main reasons:
   1. They don't implement standard collection interfaces (java.util.Map, 
java.util.Set)
   2. They involve an additional "wrapper object" for every operation (every 
addition, etc.)
   
   The first issue is easy to fix. If you look at TimelineHashMap.java and 
TimelineHashSet.java, you can see that I implemented the standard Map and Set 
operations without too much fuss. It is a lot better than just implementing 
`Iterable` and some getters or having awkward "convert this to a real java 
collection" methods.
   
   The second issue is harder to fix because of the nature of Java's type 
system. However, I suspect that you might be able to do it by having something 
like a `PCollectionFactory` that implemented `PersistentCollectionFactory`. 
Then you could have methods like:
   
   ```
   class PersistentCollectionFactory {
   Map emptyMap()
   Map singletonMap(K k, V v)
   Map afterAdding(Map map, K k, V v)
   Map afterRemoving(Map map, K k, V v)
   Map afterRemoving(Map map, K k)
   ...
   }
   ```
   
   This would basically let you not have to cart around a wrapper object for 
each new map you created. However you could avoid dealing with pcollections 
types entirely in the calling code, and just refer to java.util.Map, etc. Since 
the pcollections type do implement the standard java types, this would work 
well here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org