Hi there, I'm trying to load data from huge CSV files using the Java Graph API. The largest CSV files have about 1M rows each. I would like to understand the best approach to create edges between the rows of pairs of CSV files.
I've tried two different approaches:

1) Store all vertices, then iterate over all vertices of one class and, for each source vertex, find the matching target vertices of the second class with a SELECT query on an indexed attribute. This approach is very slow, and every query execution produces the following alert:

Query 'SELECT FROM cluster:clusterName WHERE primaryid = 117538101' fetched more than 50000 records: to speed up the execution, create an index or change the query to use an existent index

I don't understand this alert, because I have defined an index on my primaryid property.

2) Store all vertices in the graph and build two HashMaps (key=primaryid, value=OrientVertex) containing all loaded vertices, then use them to look up the target vertices for every source. This approach is much faster, but it uses a lot of memory. I'm running the process on a server with 12 GB of RAM, but that is insufficient.

Can you suggest a more efficient approach, please? What exactly is the approach used by the ETL tool, for instance?

--
Fabio Rinnone
Skype: fabiorinnone
Web: http://www.fabiorinnone.eu
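For reference, here is approach 2 in simplified form. This is only a sketch: the class and edge names are illustrative, and the map values are String placeholders standing in for the OrientVertex instances I actually store (the real OrientDB calls appear only as comments), so the snippet compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of approach 2 (names are illustrative).
// In the real code the map values are OrientVertex instances loaded from the
// CSV files, which is what exhausts the 12 GB of RAM; here a String placeholder
// keeps the sketch self-contained.
public class ApproachTwoSketch {
    public static void main(String[] args) {
        // key = primaryid parsed from the CSV row, value = the loaded vertex
        // (in my code: Map<Long, OrientVertex>)
        Map<Long, String> sourceVertices = new HashMap<>();
        Map<Long, String> targetVertices = new HashMap<>();

        // During loading, every vertex of both classes goes into a map:
        sourceVertices.put(117538101L, "sourceVertex#117538101");
        targetVertices.put(117538101L, "targetVertex#117538101");

        // During edge creation, targets are resolved from the map
        // instead of running a SELECT query per source vertex:
        for (Map.Entry<Long, String> source : sourceVertices.entrySet()) {
            String target = targetVertices.get(source.getKey());
            if (target != null) {
                // In my code: graph.addEdge(null, sourceVertex, targetVertex, "MyEdge");
                System.out.println(source.getValue() + " -> " + target);
            }
        }
    }
}
```

So the lookup itself is fast; the problem is purely that both maps hold every vertex of both classes in memory at once.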