Hello everyone,

We are currently testing OrientDB to see whether we can use it as our database. The problem we are facing right now is memory consumption during data generation.

We want to test whether OrientDB supports our queries at a larger scale, so we built a small generator in Java to insert data matching our needs. To generate and insert the data quickly, we first parse ontologies (which serve for semantic data referencing) and store them in memory. Afterwards we generate some random data, bind the items together on the fly, and also bind them to some concepts in the ontology graphs. All of that is done using the Java Graph API.

It works nicely at the beginning, but in the end it always crashes with "java.lang.OutOfMemoryError: GC overhead limit exceeded". The Java program that handles the data generation has 1.5 GB of RAM to work with, and when it crashes we have generated almost a million OrientDB elements (about a third as vertices, the rest as edges).

We tried a lot of things: the Massive Insert intent, setting keepReferencesInMemory to false, and limiting the disk cache size. We also checked multiple times to make sure we were not doing anything stupid with the memory. We thought about using a fetch plan so that the cache only stores the main document in memory and not all of its edges, but this option is not accessible in the Graph API. Yet we can't make the generator go any further, because it always runs out of memory.

We think it's related to the disk cache usage. We can't measure it properly, but it's visible in htop: the memory usage keeps growing during the last half of the generation, even though at that point we are only inserting data into the graph and are no longer storing anything in the Java program. Our theory is that we store pointers to the ontology nodes, which themselves point to the nodes we generate on the fly, and at some point this may cause the cache to keep the pointed-to nodes alive in memory. This would explain why the memory keeps growing. I'm sorry if it's a bit fuzzy.

We could just give the JVM more RAM, but we can't help but wonder: what are we missing?
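
To make the pattern concrete, here is a minimal sketch of the kind of generation loop described above. It is not our actual generator and it deliberately does not use the OrientDB API: `GraphSink`, `CountingSink`, the `Concept`/`Item` labels, and the `refersTo`/`next` edge names are all hypothetical stand-ins. It assumes the Java side keeps only lightweight ids for the ontology concepts, so nothing in the program itself should retain the generated items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class GeneratorSketch {

    /** Hypothetical stand-in for the graph database; NOT the OrientDB API. */
    interface GraphSink {
        long addVertex(String label);                       // returns an opaque id
        void addEdge(long fromId, long toId, String label);
    }

    /** Counts inserts and retains nothing else. */
    static final class CountingSink implements GraphSink {
        long vertices = 0;
        long edges = 0;

        @Override
        public long addVertex(String label) {
            return vertices++;
        }

        @Override
        public void addEdge(long fromId, long toId, String label) {
            edges++;
        }
    }

    /** Inserts the ontology concepts and keeps only their ids in memory. */
    static List<Long> loadOntology(GraphSink sink, int conceptCount) {
        List<Long> conceptIds = new ArrayList<>(conceptCount);
        for (int i = 0; i < conceptCount; i++) {
            conceptIds.add(sink.addVertex("Concept"));
        }
        return conceptIds;
    }

    /** Generates items, links each one to a random concept and to the previous item. */
    static void generate(GraphSink sink, List<Long> conceptIds, int itemCount, long seed) {
        Random random = new Random(seed);
        long previousItem = -1;
        for (int i = 0; i < itemCount; i++) {
            long item = sink.addVertex("Item");
            long concept = conceptIds.get(random.nextInt(conceptIds.size()));
            sink.addEdge(item, concept, "refersTo");
            if (previousItem >= 0) {
                sink.addEdge(previousItem, item, "next");
            }
            previousItem = item; // only the last id is kept, never the item itself
        }
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        List<Long> conceptIds = loadOntology(sink, 10);
        generate(sink, conceptIds, 1000, 42L);
        System.out.println(sink.vertices + " vertices, " + sink.edges + " edges");
    }
}
```

With this shape, roughly a third of the inserted elements are vertices and the rest are edges, which matches the proportions we see. The Java side holds one id per concept and one id for the previous item, so any growth beyond that would have to come from whatever the database layer keeps behind the scenes.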

Is there a proper way to generate a data set whose elements are connected to each other and insert it into OrientDB?

Thanks,
Cyprien Gottstein
