On Tue, Jun 6, 2023, at 13:20, Tomas Vondra wrote:
> it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms
> on your system. There's a bunch of opportunities for more improvements,
> as the hash table implementation is pretty naive/silly, the on-disk
> format is wasteful and so on.
>
> But before spending more time on that, it'd be interesting to know what
> would be a competitive timing. I mean, what would be "good enough"? What
> timings are achievable with graph databases?

Your hashset is now almost exactly as fast as the corresponding
roaringbitmap query, +/- 1 ms on my machine.

I tested Neo4j and the results are surprising; it appears to be
significantly *slower*. However, I've probably misunderstood something,
maybe I need to add some index or something. Even so, it's interesting
it's apparently not fast "by default".

The query I tested:

MATCH (user:User {id: '5867'})-[:FRIENDS_WITH*3..3]->(fof)
RETURN COUNT(DISTINCT fof)

Here is how I loaded the data into it:

% pwd
/Users/joel/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-3837aa22-c830-4dcf-8668-ef8e302263c7

% head import/*
==> import/friendships.csv <==
1,13,FRIENDS_WITH
1,11,FRIENDS_WITH
1,6,FRIENDS_WITH
1,3,FRIENDS_WITH
1,4,FRIENDS_WITH
1,5,FRIENDS_WITH
1,15,FRIENDS_WITH
1,14,FRIENDS_WITH
1,7,FRIENDS_WITH
1,8,FRIENDS_WITH

==> import/friendships_header.csv <==
:START_ID(User),:END_ID(User),:TYPE

==> import/users.csv <==
1,User
2,User
3,User
4,User
5,User
6,User
7,User
8,User
9,User
10,User

==> import/users_header.csv <==
id:ID(User),:LABEL

% ./bin/neo4j-admin database import full --overwrite-destination --nodes=User=import/users_header.csv,import/users.csv --relationships=FRIENDS_WIDTH=import/friendships_header.csv,import/friendships.csv neo4j

/Joel
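
For reference, a minimal Python sketch of roughly what the query above
counts: the distinct nodes reached after exactly three directed
FRIENDS_WITH hops from one user. It assumes plain set-expansion (walk)
semantics, so it can differ from Cypher on graphs where a path would
have to reuse a relationship, since Cypher does not traverse the same
relationship twice within one path. The helper names and the tiny edge
list are made up for illustration; they are not the real data set.

```python
from collections import defaultdict


def load_edges(rows):
    """Build a directed adjacency map from (start_id, end_id, type) rows."""
    adj = defaultdict(set)
    for start, end, _type in rows:
        adj[start].add(end)
    return adj


def nodes_at_hops(adj, source, hops):
    """Return the set of nodes reachable from source in exactly `hops` steps."""
    frontier = {source}
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            # .get() avoids creating empty entries in the defaultdict
            nxt |= adj.get(node, set())
        frontier = nxt
    return frontier


# Hypothetical rows in the same layout as friendships.csv.
rows = [
    (1, 2, "FRIENDS_WITH"),
    (1, 3, "FRIENDS_WITH"),
    (2, 4, "FRIENDS_WITH"),
    (3, 4, "FRIENDS_WITH"),
    (4, 5, "FRIENDS_WITH"),
    (4, 6, "FRIENDS_WITH"),
]

adj = load_edges(rows)
count = len(nodes_at_hops(adj, 1, 3))
print(count)
```

Keeping only the current frontier as a set is what makes the count
DISTINCT: node 4 is reached via both 2 and 3 but is expanded once.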