It might be a JVM memory tuning issue. You can check the GC log by adding these parameters:

-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCID -XX:+PrintGCDetails -Xloggc:$ORIENTDB_HOME/log/gc%p_%t.log

Then check the memory usage and give the server the correct amount of memory.
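For example, assuming the crashing generator JVM is launched directly with java (the jar name below is a placeholder; -Xmx1536m matches the 1.5 GB mentioned in the quoted message):

```
java -Xmx1536m \
     -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCID -XX:+PrintGCDetails \
     -Xloggc:$ORIENTDB_HOME/log/gc%p_%t.log \
     -jar generator.jar
```

If the log shows frequent full GCs that reclaim almost nothing right before the crash, the live data set has simply outgrown the heap.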
On Thursday, December 22, 2016 at 6:29:49 PM UTC+1, Cyprien Gottstein wrote:
>
> Hello everyone,
>
> We are currently testing OrientDB to check whether we can use it as our
> database. The problem we are facing right now is memory consumption during
> data generation.
>
> We want to test whether OrientDB supports our queries at a larger scale, so
> we built a small generator in Java to insert data matching our needs. To
> generate and insert the data quickly, we first parse ontologies (which serve
> as semantic data references) and store them in memory. Afterwards we generate
> some random data, bind it together on the fly, and also bind it to some
> concepts in the ontology graphs. All of this is done with the Java Graph API.
>
> It works nicely at the beginning, but in the end it always crashes with
> "java.lang.OutOfMemoryError: GC overhead limit exceeded". The Java program
> that handles the data generation has 1.5 GB of RAM to work with, and when it
> crashes we have generated almost a million OrientDB elements (about a third
> as vertices, the rest as edges).
>
> We tried a lot of things: the Massive Insert intent, setting
> keepReferencesInMemory to false, limiting the disk cache size, and we checked
> multiple times to make sure we were not doing anything stupid with the
> memory. We also thought about using a fetch plan to ensure the cache only
> stores the main document in memory and not all of its edges, but this option
> is not accessible in the Graph API. Yet we can't make the generator go any
> further, because it always runs out of memory.
>
> We think it's related to the disk cache usage. We can't measure it properly,
> but it's visible in htop: the memory usage keeps growing during the last half
> of the generation, even though at that point we are only inserting data into
> the graph and no longer storing anything in the Java program. Our theory is
> that we keep pointers to the ontology nodes, which themselves point to the
> nodes we generate on the fly, and at some point this may cause the cache to
> keep the pointed-to nodes alive in memory. This would explain why the memory
> keeps growing.
>
> I'm sorry if it's a bit fuzzy.
>
> We could just give the JVM more RAM, but we can't help but wonder: what are
> we missing? Is there a way to properly generate a set of data with
> connections between the elements and insert it into OrientDB?
>
> Thanks,
>
> Cyprien Gottstein.
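As an aside: the theory about live references fits how the Graph API behaves, since a held OrientVertex keeps its underlying record reachable from the Java side. One workaround is to keep only the ORIDs of the ontology vertices and re-load them on demand, on top of the massive insert intent you already tried. A minimal sketch, assuming the TinkerPop 2 Blueprints Graph API of OrientDB 2.2.x; the database path, the classes "Concept" and "GeneratedItem", and the edge label "refersTo" are hypothetical placeholders for your real schema:

```
import com.orientechnologies.orient.core.id.ORID;
import com.orientechnologies.orient.core.intent.OIntentMassiveInsert;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;
import com.tinkerpop.blueprints.impls.orient.OrientVertex;

import java.util.ArrayList;
import java.util.List;

public class Generator {
    public static void main(String[] args) {
        OrientGraphFactory factory = new OrientGraphFactory("plocal:/tmp/testdb");
        OrientGraphNoTx graph = factory.getNoTx();
        // Declare the massive insert intent on the underlying document database.
        graph.getRawGraph().declareIntent(new OIntentMassiveInsert());
        try {
            // Pre-create the vertex classes (they would be created on the fly otherwise).
            if (graph.getVertexType("Concept") == null) graph.createVertexType("Concept");
            if (graph.getVertexType("GeneratedItem") == null) graph.createVertexType("GeneratedItem");

            // Keep only the ORIDs of the ontology vertices, not the OrientVertex
            // objects themselves: a held vertex object keeps its record reachable
            // and can prevent the cache from evicting the records it links to.
            List<ORID> conceptRids = new ArrayList<ORID>();
            for (int i = 0; i < 1000; i++) {
                OrientVertex concept = graph.addVertex("class:Concept");
                conceptRids.add(concept.getIdentity());
            }

            for (int i = 0; i < 1000000; i++) {
                OrientVertex item = graph.addVertex("class:GeneratedItem");
                // Re-load the ontology vertex from its ORID only when it is needed.
                OrientVertex concept = graph.getVertex(conceptRids.get(i % conceptRids.size()));
                item.addEdge("refersTo", concept);
            }
        } finally {
            graph.getRawGraph().declareIntent(null);
            graph.shutdown();
            factory.close();
        }
    }
}
```

The point of the sketch is only that nothing except small ORID handles stays referenced from the Java side; the other knobs you mentioned (disk cache size, keepReferencesInMemory) still apply on top of it.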
