Hi Neo4j devs,
My application constantly performs Lucene index lookups and then loops over
the resulting nodes to collect their IDs:
Set<Long> userIds = new HashSet<>();
try (Transaction tx = graphDb.beginTx();
     ResourceIterator<Node> nodes = graphDb.findNodes(
             label, "name" + attr, search)) {
    while (nodes.hasNext()) {
        userIds.add(nodes.next().getId());
    }
    tx.success();
}
*Environment.* Linux box, 15 GB RAM, 2 GB JVM heap. The Neo4j store files
total 29 GB on disk; the Lucene indexes total 6 GB. Neo4j 2.2 embedded;
cache_type is set to none.
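For concreteness, the relevant settings look roughly like this (a sketch of
conf/neo4j.properties for 2.2; the page cache value is what I vary below):

```
# conf/neo4j.properties (Neo4j 2.2) -- sketch of the settings described above
dbms.pagecache.memory=8500m
cache_type=none
```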
*Symptom 1.* When the Neo4j page cache size (dbms.pagecache.memory) is set
low enough (<= 8.5 GB) to leave room in the OS page cache for the Lucene
indexes, latency is acceptable.
*Symptom 2.* However, when it is set only slightly larger -- to 9.5 GB or
10 GB -- the following starts to happen during the queries: *constant* high
IO wait; the OS constantly reads in tens of MB; a *constant* stream of 3k+
major faults (maj_flt) for the Java process. It behaves *as if the indexes
could not evict the Neo4j pages*, or in other words, as if *the index pages
were being independently LRU-cached*. The CPU sits waiting on IO to bring in
pages (I'd guess most likely Lucene pages) before it can do any work (~1%
usr usage every ~10 seconds).
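For reference, this is roughly how I watch the major-fault counter from
inside the process (a Linux-only sketch; it assumes the standard layout of
/proc/self/stat, where the 12th field -- the 10th after the parenthesized
comm field -- is the cumulative maj_flt count):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class MajFltProbe {
    // Cumulative major page-fault count for this JVM, read from
    // /proc/self/stat (Linux only). We split after the closing ')' of the
    // comm field, since comm itself may contain spaces.
    static long majorFaults() throws Exception {
        String stat = new String(
                Files.readAllBytes(Paths.get("/proc/self/stat")));
        String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
        return Long.parseLong(fields[9]); // majflt (12th field overall)
    }

    public static void main(String[] args) throws Exception {
        System.out.println("maj_flt so far: " + majorFaults());
    }
}
```

Polling this every second during the queries is how I see the 3k+/s stream.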
This is very surprising to me, because even in a memory-constrained case
like this I would expect the following: the Lucene index pages compete
against, and eventually win over, the Neo4j store pages (brought into memory
by a full warmup done at start time) in the OS page cache, so the high IO
would occur initially but taper off to nothing later (5.8 GB of indexes
should fit comfortably in 15 GB of RAM).
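My only back-of-envelope guess so far, sketched below, rests on an
assumption I'm not sure of: that the 2.2 page cache is allocated off-heap by
the JVM process itself, so the OS cannot reclaim it in favour of Lucene's
file pages, and only the leftover RAM serves as OS file cache:

```java
public class CacheBudget {
    public static void main(String[] args) {
        // Numbers from this post.
        double ramGb = 15, heapGb = 2, luceneGb = 6;
        double pageCacheGb = 10; // the Symptom 2 setting

        // Assumption: dbms.pagecache.memory is process-owned anonymous
        // memory, not evictable OS file cache. Then the OS file cache gets
        // only what is left over (ignoring OS/JVM overhead).
        double osCacheGb = ramGb - heapGb - pageCacheGb;

        System.out.println("OS cache left: " + osCacheGb + " GB");
        System.out.println("Lucene indexes fit: " + (osCacheGb >= luceneGb));
    }
}
```

Under that assumption, ~3 GB of OS cache against 6 GB of index files would
explain the constant page-in churn -- but I'd like confirmation.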
Could someone explain why this is happening?
Zongheng
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.