Hi,
I'm evaluating different graph database products and am new to OrientDB.
One use case I'm testing now is loading data into a graph database. The use
case is basically building a graph with half a million vertices and a few
million edges. I'm using OrientDB 1.6.4 on a CentOS 5.10 Linux box with 8GB
of memory, and the JDK is 1.7.0_40. The Blueprints artifacts are
blueprints-core-2.5.0-SNAPSHOT and blueprints-orient-graph-2.5.0-SNAPSHOT.
I use OrientGraph to build the graph. During initialization, it creates an
OrientGraph instance ("plocal" or "local" storage engine) and creates a few
key indices using createKeyIndex on vertices. The building process does
index-based lookups (OrientGraph.getVertices()) on vertices and, based on
whether the vertices exist or not, creates them and sets their properties,
or creates edges and sets properties on the edges. There are no global
index-based lookups on edges. Edges are always reached via vertices. I load the
data in batches (each batch has roughly a few hundred operations, such as
looking up a vertex, creating a vertex, getting all edges of a vertex,
creating an edge, or setting a property) and commit the transaction at the
end of each batch. After processing around 300 batches, a
"Maximum lock count exceeded" exception was thrown. I tried both the "local"
and "plocal" storage engines and got the same exception. I searched this
group and learned that OrientDB used to have this bug in very old versions,
but I'm using the latest version (1.6.4).
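The batch loading pattern described above can be sketched roughly as
follows. This is a simplified, self-contained illustration, not the actual
loader: the class BatchLoadSketch and its methods are hypothetical, an
in-memory map stands in for the vertex key index, and commit() only counts
the points where OrientGraph.commit() would be called.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the loader; it does not use the real
// Blueprints/OrientDB classes, only plain Java collections.
class BatchLoadSketch {
    // Stand-in for a key index on vertices: key value -> vertex properties.
    final Map<String, Map<String, Object>> vertexIndex = new HashMap<>();
    // Each edge is stored as {outKey, inKey, label}.
    final List<String[]> edges = new ArrayList<>();
    // Number of batch commits performed so far.
    int commits = 0;

    // Index-based lookup; the vertex is created only if it does not exist.
    Map<String, Object> getOrCreateVertex(String key) {
        Map<String, Object> vertex = vertexIndex.get(key);
        if (vertex == null) {
            vertex = new HashMap<>();
            vertex.put("key", key);
            vertexIndex.put(key, vertex);
        }
        return vertex;
    }

    // Edges are always created between vertices found via the index.
    void addEdge(String outKey, String inKey, String label) {
        getOrCreateVertex(outKey);
        getOrCreateVertex(inKey);
        edges.add(new String[] {outKey, inKey, label});
    }

    // Placeholder for the transaction commit at the end of a batch.
    void commit() {
        commits++;
    }

    // Loads (out, in) pairs and commits once per batchSize operations,
    // plus one final commit for any leftover partial batch.
    void load(List<String[]> pairs, int batchSize) {
        int pending = 0;
        for (String[] pair : pairs) {
            addEdge(pair[0], pair[1], "linked");
            pending++;
            if (pending == batchSize) {
                commit();
                pending = 0;
            }
        }
        if (pending > 0) {
            commit();
        }
    }

    public static void main(String[] args) {
        BatchLoadSketch loader = new BatchLoadSketch();
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"a", "b"});
        pairs.add(new String[] {"b", "c"});
        pairs.add(new String[] {"a", "c"});
        loader.load(pairs, 2);
        System.out.println("vertices=" + loader.vertexIndex.size()
                + " edges=" + loader.edges.size()
                + " commits=" + loader.commits);
    }
}
```

In the real loader, each batch has a few hundred such operations and the
lookup goes through OrientGraph.getVertices() on an indexed key.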
Since the exception was thrown during transaction commit, I switched to the
OrientGraphNoTx interface. Without transactions, I did not get the
"Maximum lock count exceeded" exception, but I noticed that the process was
very memory-hungry. With the JVM given 4GB of maximum memory, the speed was
OK, although still slower than Neo4j for the same process. I stopped that
run before it finished, once I saw memory usage grow to 3GB. I then
restarted the process with only 1GB of maximum memory, and after it had run
for two and a half hours, an OutOfMemoryError was thrown. With Neo4j, by
contrast, the whole loading process finished with 1GB of maximum memory
and quite good performance.
Another thing I noticed was that the database size on disk is much bigger
than with Neo4j. Halfway through the loading process, the OrientDB DB
directory was already at 4GB, while the Neo4j DB directory was only 1.6GB
after the whole loading process had finished.
I actually really like the way OrientDB is designed, the mix of document
and graph features and the binary protocol on remote interfaces. I would
really appreciate it if you could help me get around the hurdles mentioned
above. I might have done something wrong, or maybe there is some tuning
that can be done.
Thanks.
Jun