I have an import script here: https://www.dropbox.com/s/6wz3bjee6s4oy4p/import-offshoreleaks-neo4j.sh?dl=0
Then run this in cypher-shell / neo4j-shell: https://www.dropbox.com/s/tglph6hxro78v13/configure.cql?dl=0
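The linked scripts aren't reproduced here, so what follows is not their content. It is a minimal pure-Cypher sketch for the problem in the quoted message below.

It assumes:
- a Neo4j 3.x server (`USING PERIODIC COMMIT` syntax);
- every imported node was also given a shared `:Node` label (hypothetical; the original doesn't describe its labels);
- the edges file is readable as `file:///edges.csv` (hypothetical path) with the `node_1`, `node_2` and `rel_type` columns mentioned in the question.

The main change is matching on a label. A pattern like `(n1 {node_id: ...})` has no label, so none of the per-type `node_id` indexes can be used, and each row likely falls back to scanning all nodes.

```cypher
// Assumption: every node also carries a shared :Node label (hypothetical),
// so a single index serves lookups whatever the node type.
CREATE INDEX ON :Node(node_id);
// Wait until the index is online before loading.
CALL db.awaitIndexes();

// One pass per relationship type: the type is fixed in the query text,
// and WHERE keeps only the rows for that type.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS line
WITH line WHERE line.rel_type = 'registered_address'
MATCH (n1:Node {node_id: line.node_1})
MATCH (n2:Node {node_id: line.node_2})
// CREATE instead of MERGE: no per-row check for an existing relationship.
CREATE (n1)-[:REGISTERED_ADDRESS]->(n2);
```

Run the last statement once per relationship type in the file, changing the `WHERE` value and the relationship type each time.

Two caveats:
- `CREATE` will duplicate relationships if the file contains duplicate rows.
- `LOAD CSV` yields strings, so `node_id` must have been stored as a string for the matches to succeed.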
But there will also be a Neo4j database release really soon.

Cheers,
Michael

On Wed, Nov 22, 2017 at 7:57 PM, <[email protected]> wrote:

> Hi! Has anyone here worked with the Paradise Papers CSV dataset?
> (https://offshoreleaks.icij.org/pages/database) The ICIJ have used Neo4j
> for their graph db, and from that link they offer the CSV files of the
> data. I was able to create the nodes for the graph, but I'm having a tough
> time creating the relationships from the edges CSV. It is currently
> importing (~4 hours), but I'm hoping there is a better way out there than
> how I did it!
>
> The difficulty for me, apart from being new to Neo4j, is that the edges
> CSV contains all the relationships (5 different types), with the node_id
> of the source and target specified. Each node_id is unique to a node that
> is one of 5 node types. So I figured that I could write a statement
> (ignoring properties) that would read the CSV as 'line' and then:
>
> MATCH (n1 {node_id: line.`node_1`}), (n2 {node_id: line.`node_2`})
> CREATE (n1)-[:line.`rel_type`]->(n2);
>
> The problem with this is that you can't programmatically specify the
> relationship type... I don't think. So I came up with the following:
>
> MATCH (n1 {node_id: line.`node_1`}), (n2 {node_id: line.`node_2`})
> FOREACH(ignoreMe IN CASE WHEN line.`rel_type`='registered_address' THEN
>   [1] ELSE [] END |
>   MERGE (n1)-[:REGISTERED_ADDRESS]->(n2)
> )
> <Other FOREACH statements, one for each type of relationship> ...
>
> Now that last idea works, but really slowly, even with indexes on node_id
> for each node type. It was creating about 25 relationships every 10
> seconds, which wasn't going to work for ~400,000 relationships.
>
> What I ended up doing was dumping the CSVs into a MySQL db and, through a
> multi-join query, 'selected' the individual CREATE statements for every
> relationship, saved this to a file, installed APOC, granted permissions and
> then ran the file using runFile. It is faster now (probably going to take
> 4-5 hours) but seems overly complicated. I'm hoping someone has a better
> way of doing it!
>
> Ideas? :)
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
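The setup in the quoted question already has APOC installed, so the relationship type can come straight from the file. `apoc.create.relationship(from, type, properties, to)` takes the type as a string.

This is a minimal sketch under stated assumptions:
- all nodes share a `:Node` label with an existing index on `node_id` (hypothetical);
- the edges file is at `file:///edges.csv` (hypothetical path);
- `rel_type` values are upper-cased to give names like `REGISTERED_ADDRESS`. On older 3.x releases the function may be spelled `upper` instead of `toUpper`.

```cypher
// Assumptions: every node has a shared :Node label (hypothetical) and an
// index on :Node(node_id) already exists, so both MATCHes are index lookups.
LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS line
MATCH (n1:Node {node_id: line.node_1})
MATCH (n2:Node {node_id: line.node_2})
// The type is a plain string argument, so it can be taken from the CSV row.
CALL apoc.create.relationship(n1, toUpper(line.rel_type), {}, n2) YIELD rel
RETURN count(rel) AS created;
```

This runs as a single transaction, so ~400,000 rows need enough heap. If that is a problem, split the file into smaller pieces. `USING PERIODIC COMMIT` is left out because 3.x may reject it as a "non-updating query" when the only write is a procedure call.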
