Hi,

I'm evaluating different graph database products and am new to OrientDB. 
One use case I'm testing now is loading data into the graph database. The 
use case is basically building a graph with half a million vertices and a 
few million edges. I'm using OrientDB 1.6.4 on a CentOS 5.10 Linux box 
with 8 GB of memory, running JDK 1.7.0_40. The Blueprints versions are 
blueprints-core-2.5.0-SNAPSHOT and blueprints-orient-graph-2.5.0-SNAPSHOT.

I use OrientGraph to build the graph. During initialization, the code 
creates an OrientGraph instance ("plocal" or "local" storage engine) and 
creates a few key indices on vertices using createKeyIndex. The building 
process does index-based lookups on vertices via OrientGraph.getVertices(), 
and depending on whether a vertex exists, it either creates the vertex and 
sets its properties, or creates edges and sets properties on them. There 
are no global index-based lookups on edges; edges are always reached via 
vertices. I load the data in batches (each batch has perhaps a few hundred 
operations such as looking up a vertex, creating a vertex, getting all 
edges of a vertex, creating an edge, setting a property, etc.) and commit 
the transaction at the end of each batch. After processing around 300 
batches, a "Maximum lock count exceeded" exception was thrown. I tried 
both the "local" and "plocal" storage engines and got the same exception. 
I searched this group and learned that OrientDB had this bug in very old 
versions, but I'm using the latest version (1.6.4).
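For reference, my loading loop looks roughly like the sketch below. This is 
a minimal illustration of the pattern, not my actual code: the database 
path, the "name" property, the "knows" edge label, and the batch/operation 
counts are all placeholders.

```java
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

public class Loader {

    // Index-based lookup via getVertices(); create the vertex if absent.
    private static Vertex lookupOrCreate(OrientGraph graph, String key) {
        for (Vertex v : graph.getVertices("name", key)) {
            return v;  // found via the key index
        }
        Vertex v = graph.addVertex(null);
        v.setProperty("name", key);
        return v;
    }

    public static void main(String[] args) {
        // "plocal" storage engine; "local" behaves the same in my tests.
        OrientGraph graph = new OrientGraph("plocal:/tmp/testdb");

        // Key index on the "name" vertex property, created once at init.
        graph.createKeyIndex("name", Vertex.class);

        for (int batch = 0; batch < 300; batch++) {
            // Each batch: a few hundred lookup/create operations.
            for (int i = 0; i < 200; i++) {
                String key = "v" + (batch * 200 + i);  // placeholder key
                Vertex v = lookupOrCreate(graph, key);
                Vertex w = lookupOrCreate(graph, "peer-" + key);
                // Edges are reached only via vertices, never via a global index.
                Edge e = graph.addEdge(null, v, w, "knows");
                e.setProperty("weight", i);
            }
            // One commit per batch; the "Maximum lock count exceeded"
            // exception is thrown from here after ~300 batches.
            graph.commit();
        }
        graph.shutdown();
    }
}
```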

Since the exception was thrown during transaction commit, I switched to the 
OrientGraphNoTx interface. With transactions disabled, I no longer got the 
"Maximum lock count exceeded" exception, but I noticed that the process was 
very memory-hungry. With 4 GB of maximum JVM heap, the speed was OK, 
although still slower than Neo4j for the same workload. I stopped the 
process once I saw memory usage grow to 3 GB. I then restarted it with only 
1 GB of maximum heap, and after running for two and a half hours an 
OutOfMemoryError was thrown. With Neo4j, the whole loading process finished 
within 1 GB of maximum heap with quite good performance.
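The only change for the non-transactional run was the graph class used at 
initialization; everything else in the loading loop stayed the same (the 
database path below is again a placeholder):

```java
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;

public class NoTxInit {
    public static void main(String[] args) {
        // Non-transactional variant: every operation is applied immediately,
        // so the per-batch commit() calls are simply dropped.
        OrientGraphNoTx graph = new OrientGraphNoTx("plocal:/tmp/testdb");
        // ... same lookup/create loop as before, without graph.commit() ...
        graph.shutdown();
    }
}
```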

Another thing I noticed is that the database size on disk is much bigger 
than with Neo4j. Halfway through the loading process, the OrientDB database 
directory is already at 4 GB, while the Neo4j database directory is only 
1.6 GB after the whole loading process has finished.

I actually really like the way OrientDB is designed: the mix of document 
and graph features, and the binary protocol for remote interfaces. I would 
really appreciate it if you could help me get past the hurdles mentioned 
above. I might have done something wrong, or perhaps there is some tuning 
that can be done. 

Thanks.
Jun

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
