Re: Do we want a hashset type?

Joel Jacobson Wed, 07 Jun 2023 07:22:35 -0700

On Tue, Jun 6, 2023, at 13:20, Tomas Vondra wrote:
> it cuts the timing to about 50% on my laptop, so maybe it'll be ~300ms
> on your system. There's a bunch of opportunities for more improvements,
> as the hash table implementation is pretty naive/silly, the on-disk
> format is wasteful and so on.
>
> But before spending more time on that, it'd be interesting to know what
> would be a competitive timing. I mean, what would be "good enough"? What
> timings are achievable with graph databases?


Your hashset is now almost exactly as fast as the corresponding
roaringbitmap query, within +/- 1 ms on my machine.

I tested Neo4j and the results are surprising: it appears to be
significantly *slower*. However, I've probably misunderstood something;
maybe I need to add an index or something similar. Even so, it's
interesting that it's apparently not fast "by default".

The query I tested:
MATCH (user:User {id: '5867'})-[:FRIENDS_WITH*3..3]->(fof)
RETURN COUNT(DISTINCT fof)
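
For readers less familiar with Cypher, here is a rough Python sketch of what the query above counts: the distinct nodes reached after exactly three directed FRIENDS_WITH hops from one user. It assumes rows shaped like friendships.csv below; the function names and the sample graph are made up for illustration. It is only an approximation, since Cypher does not traverse the same relationship twice within one matched path, and this sketch ignores that rule.

```python
import csv
import io
from collections import defaultdict

def load_edges(csv_file):
    """Build a directed adjacency map from rows shaped like friendships.csv."""
    adjacency = defaultdict(set)
    for start_id, end_id, _rel_type in csv.reader(csv_file):
        adjacency[start_id].add(end_id)
    return adjacency

def count_at_hops(adjacency, start, hops=3):
    """Count distinct nodes reachable in exactly `hops` directed steps."""
    frontier = {start}
    for _ in range(hops):
        # Replace the frontier with every node one step further out.
        frontier = {nxt for node in frontier for nxt in adjacency.get(node, ())}
    return len(frontier)

# Tiny made-up graph, not data from this thread.
sample = io.StringIO(
    "1,2,FRIENDS_WITH\n"
    "1,3,FRIENDS_WITH\n"
    "2,4,FRIENDS_WITH\n"
    "3,4,FRIENDS_WITH\n"
    "4,5,FRIENDS_WITH\n"
    "4,6,FRIENDS_WITH\n"
)
adjacency = load_edges(sample)
print(count_at_hops(adjacency, "1"))  # nodes 5 and 6 are three hops from node 1
```

The ids are kept as strings here, matching the quoted '5867' in the query.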

Here is how I loaded the data into it:

% pwd
/Users/joel/Library/Application Support/Neo4j Desktop/Application/relate-data/dbmss/dbms-3837aa22-c830-4dcf-8668-ef8e302263c7

% head import/*
==> import/friendships.csv <==
1,13,FRIENDS_WITH
1,11,FRIENDS_WITH
1,6,FRIENDS_WITH
1,3,FRIENDS_WITH
1,4,FRIENDS_WITH
1,5,FRIENDS_WITH
1,15,FRIENDS_WITH
1,14,FRIENDS_WITH
1,7,FRIENDS_WITH
1,8,FRIENDS_WITH

==> import/friendships_header.csv <==
:START_ID(User),:END_ID(User),:TYPE

==> import/users.csv <==
1,User
2,User
3,User
4,User
5,User
6,User
7,User
8,User
9,User
10,User

==> import/users_header.csv <==
id:ID(User),:LABEL

% ./bin/neo4j-admin database import full --overwrite-destination \
    --nodes=User=import/users_header.csv,import/users.csv \
    --relationships=FRIENDS_WITH=import/friendships_header.csv,import/friendships.csv \
    neo4j
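
In case anyone wants to reproduce the import, here is a minimal Python sketch that writes the four files in the layout shown by `head import/*` above, starting from a list of (start, end) id pairs. The helper name `write_import_files` and the sample edges are hypothetical; only the header lines and the row layout are taken from the listing.

```python
import csv
import tempfile
from pathlib import Path

def write_import_files(edges, out_dir):
    """Write header and data CSVs in the layout shown by `head import/*`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Every id that appears on either end of an edge becomes a User node.
    user_ids = sorted({node for edge in edges for node in edge})

    (out_dir / "users_header.csv").write_text("id:ID(User),:LABEL\n")
    (out_dir / "friendships_header.csv").write_text(
        ":START_ID(User),:END_ID(User),:TYPE\n"
    )

    with open(out_dir / "users.csv", "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerows((uid, "User") for uid in user_ids)
    with open(out_dir / "friendships.csv", "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerows((src, dst, "FRIENDS_WITH") for src, dst in edges)
    return out_dir

# Made-up edges for illustration only, written to a temporary directory.
out = write_import_files([(1, 13), (1, 11), (13, 1)], tempfile.mkdtemp())
print((out / "users.csv").read_text())
```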

/Joel
