[Neo4j] Comparing Lucene index lookup performance to lookup by node id

2011-11-03 Thread Tero Paananen

This is probably not news to anyone, but I might as well post about
it in case new users are wondering about performance between
index based lookups and lookups by node ids.

I have a test database of 750,000 nodes of type A.

The db also contains 90,000 nodes of types B and C, and roughly
4M relationships between A-B and A-C (so two different relationship
types). The size on disk is 4.7GB, of which the Lucene index takes
2.3GB or so.

Nodes of type A have three properties, including one fulltext-indexed
property and an id-type property indexed with an exact index (the property
is a string). Let's call that property guid. The relationships and
other types of nodes also have indexed properties, each of which is indexed
in its own index. There are about 14M properties in the db.

To test the performance I generate a list of all node IDs and guid property
values, and perform 400,000 lookups using random entries from those
lists, and record the execution time of the 400,000 lookups.

This is on a box with 8GB of RAM, and the performance runs are nowhere
near using all that memory.

I'm using SDN 2.0.0 M1 to access the data. The node id lookups are
done with the findOne(Long id) method in the CRUDRepository class
and the guid property lookups are done with the
findByPropertyValue(String indexName, String property, Object value)
method in the NamedIndexRepository class.
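
A minimal sketch of the measurement loop described above, not the code actually used for the test. The class name LookupBenchmark, the method timeLookups and the in-memory HashMap stand-ins are hypothetical; in the real test the two functions would wrap the findOne and findByPropertyValue repository calls. The random keys are picked before the clock starts so that only the lookups themselves are timed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

public class LookupBenchmark {

    /** Outcome of one timed run: number of non-null results and wall-clock time. */
    public static final class Result {
        public final int hits;
        public final long elapsedMillis;

        Result(int hits, long elapsedMillis) {
            this.hits = hits;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Performs `lookups` lookups on randomly chosen entries of `keys` and times them. */
    public static <K, V> Result timeLookups(List<K> keys, Function<K, V> lookup,
                                            int lookups, long seed) {
        // Pick the random keys up front so sampling is not part of the timing.
        Random random = new Random(seed);
        List<K> sample = new ArrayList<>(lookups);
        for (int i = 0; i < lookups; i++) {
            sample.add(keys.get(random.nextInt(keys.size())));
        }

        int hits = 0;
        long start = System.nanoTime();
        for (K key : sample) {
            if (lookup.apply(key) != null) {
                hits++;
            }
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(hits, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in data: these maps only imitate the two access paths.
        // Assumption: replace byId::get with a call to repository.findOne(id) and
        // byGuid::get with repository.findByPropertyValue(indexName, "guid", guid).
        Map<Long, String> byId = new HashMap<>();
        Map<String, Long> byGuid = new HashMap<>();
        List<Long> ids = new ArrayList<>();
        List<String> guids = new ArrayList<>();
        for (long id = 0; id < 1_000; id++) {
            String guid = "guid-" + id;
            byId.put(id, guid);
            byGuid.put(guid, id);
            ids.add(id);
            guids.add(guid);
        }

        Result idRun = timeLookups(ids, byId::get, 400_000, 42L);
        Result guidRun = timeLookups(guids, byGuid::get, 400_000, 42L);
        System.out.println("node id lookups: " + idRun.elapsedMillis + " ms, hits=" + idRun.hits);
        System.out.println("guid lookups:    " + guidRun.elapsedMillis + " ms, hits=" + guidRun.hits);
    }
}
```

Running each variant a few times before recording a time gives the JVM and the caches a chance to warm up, which matters for a micro benchmark like this.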

Using default settings for the graph db.

The node id lookups run in about 12,700ms

The index based guid property id lookups run in about 123,000ms.

So roughly a 10x performance difference.
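
Broken down per lookup, the totals above come to about 32 microseconds for a node id lookup and about 308 microseconds for an index lookup, a ratio of about 9.7. The short sketch below only redoes that arithmetic from the reported totals; the class and method names are made up for illustration.

```java
public class LookupArithmetic {

    /** Average cost of one lookup in microseconds. */
    public static double microsPerLookup(double totalMillis, int lookups) {
        return totalMillis * 1000.0 / lookups;
    }

    /** Average throughput in lookups per second. */
    public static double lookupsPerSecond(double totalMillis, int lookups) {
        return lookups / (totalMillis / 1000.0);
    }

    public static void main(String[] args) {
        int lookups = 400_000;           // lookups per run, as reported
        double nodeIdMillis = 12_700.0;  // reported total for node id lookups
        double indexMillis = 123_000.0;  // reported total for index lookups

        System.out.printf("node id: %.2f us/lookup, %.0f lookups/s%n",
                microsPerLookup(nodeIdMillis, lookups), lookupsPerSecond(nodeIdMillis, lookups));
        System.out.printf("index:   %.2f us/lookup, %.0f lookups/s%n",
                microsPerLookup(indexMillis, lookups), lookupsPerSecond(indexMillis, lookups));
        System.out.printf("ratio:   %.2fx%n", indexMillis / nodeIdMillis);
    }
}
```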

-TPP
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo4j] Comparing Lucene index lookup performance to lookup by node id

2011-11-03 Thread Mattias Persson

Indexes, while "fast", are still an indirection and far slower than
direct access. So this is quite expected.

-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

Re: [Neo4j] Comparing Lucene index lookup performance to lookup by node id

2011-11-03 Thread Tero Paananen

> Indexes, while "fast", are still an indirection and far slower than
> direct access. So this is quite expected.

Agreed. I wanted to run the performance tests to find out how much
slower the index lookups are. Had they been 2x to 4x slower,
that would probably still have been acceptable for my particular use case.

At 10x, I need to think about alternatives.

-TPP

Re: [Neo4j] Comparing Lucene index lookup performance to lookup by node id

2011-11-03 Thread Michael Hunger

Hi Tero,

thanks for the valuable feedback.

Please note that SDN is not yet optimized in all places.

So I'd love your input by profiling your use-case. It is probably something
in between.

Can you use VisualVM or YourKit or another profiler to figure out the hotspot
methods where the most time is spent for the index-lookup? I would also love
to pair with you on this.
(Or get your data-generator and use-cases and profile it myself).

Could you please try to run the same test with the raw Neo4j API to see the
difference?

Another note:

SDN was never intended to be a tool for pulling mass data from the graph into
memory, as it adds some overhead for management and object creation. The most
important use case for SDN is that it gives you an easy
way to work with the graph in terms of your different domain models and also
eases the integration of other libraries (e.g. MVC) that require domain POJOs.
For mass-data handling it might be sensible to drop down to the core Neo4j API
(by getting the index explicitly and pulling the nodes with
index.get(key,value)).
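
A minimal sketch of what dropping down to the core API could look like, assuming the embedded Neo4j 1.x API on the classpath. The class name RawLookups, the index name passed in and the property key "guid" are placeholders taken from this thread, not from any actual project code.

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;

public class RawLookups {

    private final GraphDatabaseService graphDb;
    private final Index<Node> guidIndex;

    public RawLookups(GraphDatabaseService graphDb, String indexName) {
        this.graphDb = graphDb;
        // Fetch the index handle once instead of on every lookup.
        // Assumption: indexName is the exact index that holds the guid property.
        this.guidIndex = graphDb.index().forNodes(indexName);
    }

    /** Direct lookup by node id, no index involved. */
    public Node byNodeId(long nodeId) {
        return graphDb.getNodeById(nodeId);
    }

    /** Index lookup by guid; returns null when nothing matches. */
    public Node byGuid(String guid) {
        IndexHits<Node> hits = guidIndex.get("guid", guid);
        // getSingle() also closes the hits; it throws if more than one node matches.
        return hits.getSingle();
    }
}
```

Timing these two methods with the same random keys as the SDN run would show how much of the gap comes from the Lucene lookup itself and how much from the object mapping layer on top of it.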

Cheers

Michael


Re: [Neo4j] Comparing Lucene index lookup performance to lookup by node id

2011-11-03 Thread Tero Paananen

> So I'd love your input by profiling your use-case. It is probably something
> in between.
>
> Can you use VisualVM or YourKit or another profiler to figure out the hotspot
> methods where the most time is spent for the index-lookup? I would also love
> to pair with you on this.
> (Or get your data-generator and use-cases and profile it myself).
>
> Could you please try to run the same test with the raw Neo4j API to see the
> difference?

Sure thing. I'll try and do this tomorrow, but it might have to wait
until next week.

> Another note:
>
> SDN was never intended to be a tool for pulling mass data from the graph into
> memory, as it adds some overhead for management and object creation. The most
> important use case for SDN is that it gives you an easy way to work with the
> graph in terms of your different domain models and also eases the integration
> of other libraries (e.g. MVC) that require domain POJOs.
> For mass-data handling it might be sensible to drop down to the core Neo4j
> API (by getting the index explicitly and pulling the nodes with
> index.get(key,value)).

Absolutely, and this is exactly why I'm using SDN. The performance tests I ran
are by no means supposed to simulate normal uses. The repetitions were there
to make sure I could get meaningful run times for the micro benchmark.

-TPP
