Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-04-14 Thread via GitHub


chaokunyang commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2053969385

   What do you mean use class value? You mean use it to cache a I'd for class? 
This will have a hash look up cost too 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-04-14 Thread via GitHub


tommyettinger commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2053941441

   Well, the `FlipMap` turned out to be significantly slower than all other 
maps I tried, due to an error I made when writing `put()` and related code. 
With `put()` fixed so it actually puts a key and value in place, `FlipMap` is 
quite slow... I've been thinking about alternatives, though.
   
   I've never used `ClassValue`, myself, but could it maybe be possible to 
avoid any Class-based hashtable if a ClassValue was registered 
per-Fury-instance that stores (somehow) a registered ID for a Class, in the 
Class itself? I also looked into hacky ways that might allow `sun.misc.Unsafe` 
to store an id directly in a Class, but even if that could work, it doesn't 
allow multiple `Fury` instances to have different registered classes. 
ClassValue might be able to do it, but I really have no idea what I am doing 
with that code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-04-06 Thread via GitHub


chaokunyang commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2040992654

   Hi @tommyettinger , thanks for this hard work.
   
   Fury has a `src/main/java/org/apache/fury/benchmark/MapSuite.java` which did 
some benchmarks for IdentityMap.
   
   For end-to-end tests, 
`src/main/java/org/apache/fury/benchmark/UserTypeSerializeSuite.java` can be 
used.  You can use following config:
   ```java
 public static void main(String[] args) throws IOException {
   if (args.length == 0) {
 String commandLine =
 "org.apache.fury.*UserTypeSerializeSuite.fury* -f 3 -wi 10 -i 10 
-t 1 -w 2s -r 2s -rf csv";
 System.out.println(commandLine);
 args = commandLine.split(" ");
   }
   Main.main(args);
 }
   ```
   `org.apache.fury.benchmark.state.BenchmarkState#references` should be set to 
true only, otherwise referecne tracking won't be enabled:
   ```java
 @Param({"true"})
 public boolean references;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-04-06 Thread via GitHub


tommyettinger commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2040990074

   I'm making progress on the rest of the map types now; IdentityMap seems done 
and ObjectIntMap might be done or close to it. Hybridizing the two shouldn't be 
hard at all. Is there a particular benchmark that you think would best test the 
performance of these map types? It looks like running all benchmarks would take 
well over a day.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-04-03 Thread via GitHub


tommyettinger commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2036145035

   I've done the steps you described, and it has gone from not inlining (code 
too large) to inlining when hot! I keep finding various small things to change, 
but they're at least getting smaller. The last major change wouldn't have even 
affected a map with Class keys (I had accidentally left in some comparisons as 
`==` from when this was purely an identity-comparing map; they use `equals()` 
now). I'll look into putOrGet() next.
   
   I have a nagging worry that some major part of this might not behave 
correctly. I have some tests from a very old version of Apache Harmony that 
I've been using for some data structures... If you happen to know of a better 
suite of tests for compatibility with Set and Map, that would be great! I know 
implementing the interfaces isn't critical for Fury since what's there now 
doesn't have to implement Map, but being able to run a standard set of tests 
makes me more confident these will work how they should.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-03-31 Thread via GitHub


chaokunyang commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2028580971

   Great, looks pretty promising! The caller is 
`org.apache.fury.resolver.MapRefResolver#writeRefOrNull`, would be nice if it 
can be inlined into this method.
   
   I took a look at 
https://github.com/tommyettinger/assorted-benchmarks/blob/master/jmh/src/main/java/de/heidelberg/pvs/container_bench/FlipMap.java#L207.
 Would it be possible to seperate the branch less hit into a separate method, 
so the method body is smaller for inline. JVM has a max 325 bytecode size for 
inline be default. And in fury, we merged the `put` and `get` together into 
`putOrGet` to reduce two hash look up into one.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-03-31 Thread via GitHub


tommyettinger commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2028576591

   OK, I've been thinking about this for a while, and tinkering a lot with 
different options. I was surprised to find that at least one version of cuckoo 
hashing really can perform quite well (though not as dramatically better on 
containsKey() performance), as long as it never encounters two different keys 
with the same exact hashCode(). I modified an existing Apache v2-licensed 
cuckoo hash map, reducing allocations as much as possible, and the performance 
gains are considerable on some operations:
   
   ```
   POPULATE
   IdentityCuckooMap  : 198676.832 ns/op
   JDK IdentityHashMap: 508930.787 ns/op
   CONTAINS
   IdentityCuckooMap  :   1023.728 ns/op
   JDK IdentityHashMap:   1418.562 ns/op
   COPY
   IdentityCuckooMap  :  28171.380 ns/op
   JDK IdentityHashMap: 302847.095 ns/op
   ITERATE
   IdentityCuckooMap  :  27954.503 ns/op
   JDK IdentityHashMap: 149939.553 ns/op
   ```
   
   The catch here is that if a cuckoo hash "fails," it can enter an infinite 
loop or allocate memory forever, etc. Very bad worst-case properties. However, 
the internals of this particular cuckoo-hashed map and the existing 
FuryObjectMap are quite similar, and I'm working on an approach that will 
"flip" from using cuckoo hashing as long as it is viable, to using linear 
probing as FuryObjectMap currently does, if fully-colliding keys are present 
(or some other situation would force a rehash). I'm not sure what kind of speed 
penalty the flipping will impose; it is possible branch prediction will figure 
out that fully-colliding keys are extremely rare, and that would mean the cost 
would be very low relative to the above numbers. But, the code is larger for 
each method, which might be a problem for inlining. I'll have to benchmark and 
see... I know the JDK HashMap can also change its implementation to use a 
TreeMap-like Red-Black Tree if it encounters excessive collisions on Comparable 
k
 eys, so this isn't an unprecedented model.
   
   I'm testing the FlipMap code in [my jdkgdxds 
repo](https://github.com/tommyettinger/jdkgdxds/blob/master/src/test/java/com/github/tommyettinger/ds/test/FlipMap.java)
 for correctness, and I'll be benchmarking it using what should be similar code 
in [my assorted-benchmarks 
repo](https://github.com/tommyettinger/assorted-benchmarks/blob/master/jmh/src/main/java/de/heidelberg/pvs/container_bench/FlipMap.java).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-03-17 Thread via GitHub


tommyettinger commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2002402887

   I was able to run the benchmark, and I may have an avenue that could improve 
contains() and get() performance.
   
   
![Contains](https://github.com/apache/incubator-fury/assets/160684/5c8beceb-ae32-43ac-b32e-81b01b70c1d5)
   
   This was testing just contains() with Object keys, some of which were 
`Class` instances from `java.**` packages, and some of which were simple 
`Dummy` instances, where Dummy is a class that stores its name and that's it, 
using identity comparison like Class. The one case where anything performed 
noticeably better was when hashing the .toString() result of the Object (but 
.getName() on Class would also work), and I believe that only did well because 
the String and its hashCode are cached. That didn't do better every time, and 
some other operations took longer.
   
   The full benchmark data on various operations is available here as a raw 
.txt:
   
https://github.com/tommyettinger/assorted-benchmarks/blob/529bd6a8e48ceb3fbbb7f0021f990394e1fd3cf6/jmh/Identity_Map_21.txt
   and here as a nicely-formatted spreadsheet (.fods format from LibreOffice 
and I think OpenOffice): (direct link)
   
https://github.com/tommyettinger/assorted-benchmarks/raw/2538fcf164e27b3defc368436f8e2799225f75e9/jmh/Identity_Map_Performance_21.fods
   
   The confusing thing is that there seems to be some deoptimization going on 
in the JVM, since the benchmarks each start about twice as fast as when they 
actually start recording data. It could be my laptop heating up quickly, but 
that seems unlikely.
   
   You can run the benchmarks yourself by loading the `jmh` project in [the 
repo I 
linked](https://github.com/tommyettinger/assorted-benchmarks/tree/master/jmh) 
and running the shadowJar task. Then you can use a command like...
   
   ```
   java -jar benchmarks.jar "JDKIdentityMapBench" -p 
impl=JDK_O2O_HASH,JDK_O2O_IDENTITY,JDKGDXDS_HASH,JDKGDXDS_TOSTR,JDKGDXDS_IDENTITY
 -p size=100,1000,1 -wi 4 -i 5 -f 1 -w 5 -r 5 -p payloadType=OBJECT_UNIFORM
   ```
   
   ... which will run the Class-and-Dummy-Object test with the implementations 
for HashMap, IdentityHashMap, ObjectObjectMap, the new ObjectMapToString, and 
IdentityObjectMap, on sizes 100, 1000, and 1, with 4 warmup iterations, 5 
actual iterations, and each iteration taking 5 seconds. My apologies for how 
messy the benchmarking project is... It's mostly a testbed to see if something 
is worth investigating further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-03-17 Thread via GitHub


chaokunyang commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2002398425

   Looking forward to your benchmark results


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-03-16 Thread via GitHub


tommyettinger commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2001956086

   RE: removed multiplication, good, that was one of the optimizations I wasn't 
sure had made it into Fury.
   
   I've got a JMH benchmark that can test several different Map implementations 
at once, but I can't run it at the moment because all 12 logical cores on this 
laptop are currently devoted to an unrelated statistical test on a random 
number generator, and that test will run for at least several more hours. I've 
got the JMH benchmark set up to test HashMap, ObjectObjectMap from jdkgdxds, 
the corresponding Map type in FastUtil, and the hash-based Map in Koloboke 
1.0.0 . I'll see if I can add FuryObjectMap, ~~it just needs to implement Map~~ 
OK, it doesn't implement Map, so it would need to be run separately, which 
isn't quite a fair comparison. However, ObjectObjectMap from jdkgdxds is very 
close. It's possible Koloboke will do well here, which it appeared to do 8 
years ago when it last updated, but a lot has changed in the current JDK 
versions. I'll see if I can test on a large number of Class instances, to have 
similar conditions to the large object graph scenario.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-03-15 Thread via GitHub


chaokunyang commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2001616502

   
   Just found that fury has already removed the hash multiplication:
   ```java
 protected int place(K item) {
   return System.identityHashCode(item) & mask;
 }
   ``` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org



Re: [I] [Java] optimize map performance for reference tracking and serializer dispatch [incubator-fury]

2024-03-15 Thread via GitHub


chaokunyang commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2001615051

   Hi @tommyettinger , let's continue the discussion in 
https://github.com/apache/incubator-fury/issues/1274#issuecomment-1997259888 . 
   
   We used `0.25` as the default loader actor for class serializer dispatch in 
https://github.com/apache/incubator-fury/blob/main/java/fury-core/src/main/java/org/apache/fury/resolver/ClassResolver.java#L201
 . But for reference tracking, we used `0.51f`, since the object graph may be 
huge, a smaller loader factor will use more memory, and then map is big, there 
may have some L1 cache miss. But I thinks we can change it to `0.5f`.
   
   For class serializer dispatch, some times some class are more hot than 
others, I was also thinking whether is it possible to adjust the map hierarchy 
to make it faster for hot keys. But not sure whether it will bring big 
difference.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@fury.apache.org
For additional commands, e-mail: commits-h...@fury.apache.org