Hi Neo4j devs,
My application constantly performs Lucene index lookups and then loops over
the resulting nodes to collect their IDs:
Set<Long> userIds = new HashSet<>();
try (Transaction tx = graphDb.beginTx();
     ResourceIterator<Node> nodes = graphDb.findNodes(
             label, "name" + attr, search)) {
    while (nodes.hasNext()) {
        userIds.add(nodes.next().getId());
    }
    tx.success();
}
*Environment.* Linux box, 15 GB RAM, 2 GB JVM heap. The Neo4j store files
total 29 GB on disk; the Lucene indexes total 6 GB. Neo4j 2.2 embedded;
cache_type is set to none.
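For concreteness, the relevant settings look roughly like this (a sketch of
conf/neo4j.properties for 2.2; the page cache value is what I vary below):

```
# conf/neo4j.properties (Neo4j 2.2) -- sketch of the settings described above
dbms.pagecache.memory=8500m
cache_type=none
```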
*Symptom 1.* When the Neo4j page cache size (dbms.pagecache.memory) is set
low enough (<= 8.5 GB) to leave room in the OS page cache for the Lucene
indexes, latency is acceptable.
*Symptom 2.* However, when it is set only slightly larger -- to 9.5 GB or
10 GB -- the following starts to happen during the queries: *constant* high
IO wait; the OS constantly reads in tens of MB; a *constant* stream of 3k+
major faults (maj_flt) for the Java process. It behaves *as if the indexes
could not evict the Neo4j pages*, or in other words, as if *the index pages
were being independently LRU-cached*. The CPU sits waiting on IO to bring in
pages (I'd guess most likely Lucene pages) before it can do any work (~1%
usr usage every ~10 seconds).
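For reference, this is roughly how I watch the major-fault counter from
inside the process (a Linux-only sketch; it assumes the standard layout of
/proc/self/stat, where the 12th field -- the 10th after the parenthesized
comm field -- is the cumulative maj_flt count):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class MajFltProbe {
    // Cumulative major page-fault count for this JVM, read from
    // /proc/self/stat (Linux only). We split after the closing ')' of the
    // comm field, since comm itself may contain spaces.
    static long majorFaults() throws Exception {
        String stat = new String(
                Files.readAllBytes(Paths.get("/proc/self/stat")));
        String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
        return Long.parseLong(fields[9]); // majflt (12th field overall)
    }

    public static void main(String[] args) throws Exception {
        System.out.println("maj_flt so far: " + majorFaults());
    }
}
```

Polling this every second during the queries is how I see the 3k+/s stream.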
This is very surprising to me, because even in a memory-constrained case
like this I would expect the following: the Lucene index pages compete
against, and eventually win over, the Neo4j store pages (brought into memory
by a full warmup done at start time) in the OS page cache, so the high IO
would occur initially but taper off to nothing later (5.8 GB of indexes
should fit comfortably in 15 GB of RAM).
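My only back-of-envelope guess so far, sketched below, rests on an
assumption I'm not sure of: that the 2.2 page cache is allocated off-heap by
the JVM process itself, so the OS cannot reclaim it in favour of Lucene's
file pages, and only the leftover RAM serves as OS file cache:

```java
public class CacheBudget {
    public static void main(String[] args) {
        // Numbers from this post.
        double ramGb = 15, heapGb = 2, luceneGb = 6;
        double pageCacheGb = 10; // the Symptom 2 setting

        // Assumption: dbms.pagecache.memory is process-owned anonymous
        // memory, not evictable OS file cache. Then the OS file cache gets
        // only what is left over (ignoring OS/JVM overhead).
        double osCacheGb = ramGb - heapGb - pageCacheGb;

        System.out.println("OS cache left: " + osCacheGb + " GB");
        System.out.println("Lucene indexes fit: " + (osCacheGb >= luceneGb));
    }
}
```

Under that assumption, ~3 GB of OS cache against 6 GB of index files would
explain the constant page-in churn -- but I'd like confirmation.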
Could someone explain why this is happening?
Zongheng
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.