belugabehr commented on PR #3194:
URL: https://github.com/apache/avro/pull/3194#issuecomment-2629242178
I am taking another pass at this one. The performance improvements are just
too significant to pass up (11%+ in my local benchmarhks). Here are my
benchmarks:
```
# JMH version: 1.35
# VM version: JDK 17.0.13, OpenJDK 64-Bit Server VM,
17.0.13+11-Ubuntu-2ubuntu124.04
# VM invoker: /usr/lib/jvm/java-17-openjdk-amd64/bin/java
# This is from AVRO-4069 (the worst result I measured for this branch)
Result
"com.szymon_kaluza.protobuf_avro.benchmark.AvroBench.serializeAndDeserializeSmall":
889717.049 ±(99.9%) 9103.527 ops/s [Average]
(min, avg, max) = (860194.914, 889717.049, 909429.461), stdev = 13625.731
CI (99.9%): [880613.522, 898820.576] (assumes normal distribution)
# This is from master (the best result I measured for this branch)
Result
"com.szymon_kaluza.protobuf_avro.benchmark.AvroBench.serializeAndDeserializeSmall":
796084.662 ±(99.9%) 12190.142 ops/s [Average]
(min, avg, max) = (764448.008, 796084.662, 850984.160), stdev = 18245.632
CI (99.9%): [783894.521, 808274.804] (assumes normal distribution)
```
The reason I even investigated this is because Visual VM profiling was
lighting up on `IdentityHashMap` usage. By migrating to HashMap, I was able to
achieve most of the performance gains. Also, another thing that I hadn't
considered before is that `IdentityHashMap` uses object reference-equality for
insertions and lookups. This means that the Map can look different for each
instantiation of Avro because obviously the reference values will be different
every time it runs. This made benchmarking kind of tricky and unreliable as
well. If the keys just happened to have a lot of collisions, then performance
would suffer, alternatively, benchmarking improved if the keys happen to be
laid out more evenly in the bucket space. Yet another reason to simply get off
this implementation.
There is one caveat in the JavaDocs:
> For many JRE implementations and operation mixes, this class will yield
better performance than
[HashMap](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html)
(which uses chaining rather than linear-probing).
* https://docs.oracle.com/javase/8/docs/api/java/util/IdentityHashMap.html
I get around this by exploiting the fact that the number of items in the map
is fixed as instantiation. Therefore, I use a Map size sufficiently large such
that there are no key collisions and therefore no chaining involved.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]