Hello everyone,

We are currently evaluating OrientDB to see whether we can use it as our 
database. The problem we are facing right now is memory consumption during 
data generation.

We want to test whether OrientDB supports our queries at a larger scale, so 
we built a small generator in Java to insert data matching our needs. To 
generate and insert the data quickly, we first parse ontologies (which serve 
as semantic references for the data) and keep them in memory. Afterwards we 
generate some random data, bind the records together on the fly, and also 
bind them to some concepts in the ontology graphs. All of this is done 
through the Java Graph API.
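
To give an idea, the generation loop looks roughly like this (a simplified 
sketch: the class and edge names "Concept", "DataItem" and "refersTo" and 
the database path are placeholders, not our real schema):

import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;

public class GeneratorSketch {
    public static void main(String[] args) {
        OrientGraphFactory factory = new OrientGraphFactory("plocal:/tmp/testdb");
        OrientGraph graph = factory.getTx();
        try {
            // An ontology concept (in the real generator these come from the
            // parsed ontology files and are kept in memory).
            Vertex concept = graph.addVertex("class:Concept");
            concept.setProperty("uri", "http://example.org/concept/1");

            // Random data generated on the fly and bound to a concept.
            for (int i = 0; i < 1000; i++) {
                Vertex item = graph.addVertex("class:DataItem");
                item.setProperty("value", Math.random());
                graph.addEdge(null, item, concept, "refersTo");
            }
            graph.commit();
        } finally {
            graph.shutdown();
        }
    }
}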

It works nicely at the beginning, but it always ends up crashing with 
"java.lang.OutOfMemoryError: GC overhead limit exceeded". The Java program 
that handles the data generation has 1.5 GB of RAM to work with, and by the 
time it crashes we have generated almost a million OrientDB elements (about 
a third of them vertices, the rest edges).

We tried a lot of things: the Massive Insert intent, setting 
keepReferencesInMemory to false, limiting the disk cache size, and we 
checked multiple times that we were not doing anything stupid with the 
memory ourselves. We also thought about using a fetch plan to make sure the 
cache only keeps the main document in memory and not all of its edges, but 
this option is not accessible through the Graph API. Yet we can't get the 
generator any further because it always runs out of memory.
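
For reference, this is roughly how we apply those settings (the database 
path and the cache size below are placeholder values, and the 
keepReferencesInMemory toggle is not shown):

import com.orientechnologies.orient.core.config.OGlobalConfiguration;
import com.orientechnologies.orient.core.intent.OIntentMassiveInsert;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;

public class InsertSettingsSketch {
    public static void main(String[] args) {
        // Cap the disk cache (value in MB) before the database is opened.
        OGlobalConfiguration.DISK_CACHE_SIZE.setValue(512);

        OrientGraphFactory factory = new OrientGraphFactory("plocal:/tmp/testdb");
        OrientGraphNoTx graph = factory.getNoTx();
        try {
            // Massive Insert intent, as recommended for bulk loading.
            graph.declareIntent(new OIntentMassiveInsert());

            // ... generation loop as in the previous sketch ...

            graph.declareIntent(null);
        } finally {
            graph.shutdown();
        }
    }
}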

We think it's related to the disk cache usage. We can't measure it 
precisely, but it is visible in htop: memory usage keeps growing during the 
second half of the generation, even though at that point we are only 
inserting data into the graph and are no longer storing anything in the 
Java program itself. Our theory is that we hold references to the ontology 
nodes, which themselves point to the nodes we generate on the fly, and at 
some point this may force the cache to keep the referenced nodes alive in 
memory. That would explain why the memory keeps growing.
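
To make the suspicion concrete, here is the kind of pattern I mean 
(placeholder names again; the getLocalCache() call is my assumption about 
how to inspect the client-side record cache, please correct me if that is 
not the right API):

import java.util.HashMap;
import java.util.Map;

import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;

public class CacheGrowthSketch {
    public static void main(String[] args) {
        OrientGraph graph = new OrientGraphFactory("plocal:/tmp/testdb").getTx();
        try {
            // Handles to the ontology vertices, kept alive for the whole run.
            Map<String, Vertex> ontology = new HashMap<String, Vertex>();
            Vertex concept = graph.addVertex("class:Concept");
            ontology.put("concept-1", concept);

            for (int i = 0; i < 100000; i++) {
                Vertex item = graph.addVertex("class:DataItem");
                // New vertices stay connected to the long-lived ontology handles.
                graph.addEdge(null, item, ontology.get("concept-1"), "refersTo");

                if (i % 10000 == 0) {
                    graph.commit();
                    // Assumption: this reports how many records the client-side
                    // cache currently holds; we would expect it to keep growing.
                    System.out.println("local cache size: "
                            + graph.getRawGraph().getLocalCache().getSize());
                }
            }
            graph.commit();
        } finally {
            graph.shutdown();
        }
    }
}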

I'm sorry if this is a bit fuzzy.

We could just give the JVM more RAM, but we can't help wondering: what are 
we missing? Is there a proper way to generate a set of interconnected data 
and insert it into OrientDB?

Thanks,

Cyprien Gottstein.



