Hi there,

I'm trying to load data from huge CSV files using the Java Graph API. The
biggest CSV file has about 1M rows. I would like to understand the best
approach to create edges between the rows of pairs of CSV files.

I've tried two different approaches:

1) Store all vertices, then iterate over all vertices of one class and,
for every vertex (source), find all target vertices of the second class
using a SELECT query on an indexed attribute. This approach is very
slow, and for every query execution I get the following alert:

Query 'SELECT FROM cluster:clusterName WHERE primaryid = 117538101'
fetched more than 50000 records: to speed up the execution, create an
index or change the query to use an existent index

I don't understand this alert, because I have defined an index on my
primaryid property.
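
To make it concrete, here is a simplified sketch of approach 1 (the
class name Source, the cluster name targetCluster, the edge label and
the database path are placeholders for my real schema):

import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;

public class LoadEdgesByQuery {
    public static void main(String[] args) {
        OrientGraphFactory factory = new OrientGraphFactory("plocal:/tmp/mydb");
        OrientGraphNoTx graph = factory.getNoTx();
        try {
            // For every source vertex, look up the matching targets with a SELECT
            for (Vertex source : graph.getVerticesOfClass("Source")) {
                Object id = source.getProperty("primaryid");
                Iterable<Vertex> targets = graph.command(
                        new OCommandSQL(
                            "SELECT FROM cluster:targetCluster WHERE primaryid = ?"))
                        .execute(id);
                for (Vertex target : targets) {
                    source.addEdge("HasTarget", target);
                }
            }
        } finally {
            graph.shutdown();
        }
    }
}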

2) Store all vertices in the graph and create two HashMaps
(key=primaryid, value=OrientVertex) that contain all loaded vertices,
and use them to find the target vertices for every source. This
approach is much faster, but it uses a lot of memory: I'm executing the
process on a server with 12GB of RAM, and it's not enough.
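
Again a simplified sketch of approach 2 (same placeholder names as
above; the CSV parsing itself is omitted):

import java.util.HashMap;
import java.util.Map;

import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;
import com.tinkerpop.blueprints.impls.orient.OrientVertex;

public class LoadEdgesByHashMap {
    public static void main(String[] args) {
        OrientGraphFactory factory = new OrientGraphFactory("plocal:/tmp/mydb");
        OrientGraphNoTx graph = factory.getNoTx();
        try {
            // Every loaded vertex is kept in memory, keyed by primaryid
            Map<String, OrientVertex> sources = new HashMap<String, OrientVertex>();
            Map<String, OrientVertex> targets = new HashMap<String, OrientVertex>();

            // While parsing each CSV row:
            // OrientVertex v = graph.addVertex("class:Source", "primaryid", id);
            // sources.put(id, v);
            // ... and the same for the second class into targets.

            // Edge creation is then a pure in-memory lookup, no query needed
            for (Map.Entry<String, OrientVertex> entry : sources.entrySet()) {
                OrientVertex target = targets.get(entry.getKey());
                if (target != null) {
                    entry.getValue().addEdge("HasTarget", target);
                }
            }
        } finally {
            graph.shutdown();
        }
    }
}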

Can you suggest a more efficient approach, please?

What exactly is the approach used by the ETL tool, for instance?

-- 
Fabio Rinnone
Skype: fabiorinnone
Web: http://www.fabiorinnone.eu

