Hi Aram,

* Do you have any more information on the spec of the machine you're running this on, e.g. how much RAM it has?
* Have you tried increasing the PERIODIC COMMIT value? Perhaps try it out on a smaller subset of the data first to measure the impact, with values of 1,000 and 10,000.
* I think it would be interesting to pull out some other things as nodes as well, since that might lead to more interesting queries. For example, CEO, Location, Registered Agent, DOS Process and Jurisdiction could all be nodes that link back to a company's DOS record.
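Something like the following could serve as the subset test in the second point. It's an untested sketch that reuses the file path and label from your original query. The 100,000-row sample and the batch size of 10,000 are arbitrary choices, and only two of the 30 properties are set, so copy the rest across from your query to keep the timings comparable. Running it against an empty database each time keeps the runs on an equal footing.

```cypher
USING PERIODIC COMMIT 10000
// Rerun with 500, 1000 and 10000 above and compare the reported times.
LOAD CSV
FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
AS company
// Keep only the first 100,000 rows of the file as the sample.
WITH company LIMIT 100000
// Only two properties shown here; copy the other 28 from the original query.
CREATE (:DataActiveCorporations
  {
    DOS_ID:company[0],
    Current_Entity_Name:company[1]
  }
);
```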
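For the third point, here's a rough, untested idea of what pulling the CEO out into its own node could look like, assuming the :DataActiveCorporations nodes from your original load are already in the database. The :CEO label, the name property and the CEO_OF relationship type are made-up names for illustration. Keying a CEO on name alone is also a simplification: two different people with the same name would end up as one node. Your output shows exactly 30 properties set per row (1,964,486 × 30 = 58,934,580), which suggests empty CSV fields came through as empty strings rather than nulls, hence the extra check for "". The same pattern should carry over to Location, Registered Agent, DOS Process and Jurisdiction.

```cypher
// Indexes so the MATCH and MERGE lookups below don't scan every node.
// Run each statement separately and give the indexes time to come online.
CREATE INDEX ON :DataActiveCorporations(DOS_ID);
CREATE INDEX ON :CEO(name);

USING PERIODIC COMMIT 1000
LOAD CSV
FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
AS company
// Skip rows without a CEO name; MERGE fails on a null property value.
WITH company
WHERE company[12] IS NOT NULL AND company[12] <> ""
// Find the company node created by the original load.
MATCH (corp:DataActiveCorporations {DOS_ID: company[0]})
// One :CEO node per distinct name, shared by every company that lists it.
MERGE (ceo:CEO {name: company[12]})
MERGE (ceo)-[:CEO_OF]->(corp);
```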
Let me know if any of that doesn't make sense.

Mark

On 4 March 2014 15:54, Aram Chung <aramol...@gmail.com> wrote:
> Hi,
>
> I was asked to post this here by Mark Needham (@markhneedham) who thought
> my query took longer than it should.
>
> I'm trying to see how graph databases could be used in investigative
> journalism: I was loading in New York State's Active Corporations:
> Beginning 1800 data from
> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
> as a 1964486-row csv (and deleted all U+F8FF characters, because I was
> getting "[null] is not a supported property value"). The Cypher query I
> used was
>
> USING PERIODIC COMMIT 500
> LOAD CSV
> FROM
> "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
> AS company
> CREATE (:DataActiveCorporations
> {
>     DOS_ID:company[0],
>     Current_Entity_Name:company[1],
>     Initial_DOS_Filing_Date:company[2],
>     County:company[3],
>     Jurisdiction:company[4],
>     Entity_Type:company[5],
>
>     DOS_Process_Name:company[6],
>     DOS_Process_Address_1:company[7],
>     DOS_Process_Address_2:company[8],
>     DOS_Process_City:company[9],
>     DOS_Process_State:company[10],
>     DOS_Process_Zip:company[11],
>
>     CEO_Name:company[12],
>     CEO_Address_1:company[13],
>     CEO_Address_2:company[14],
>     CEO_City:company[15],
>     CEO_State:company[16],
>     CEO_Zip:company[17],
>
>     Registered_Agent_Name:company[18],
>     Registered_Agent_Address_1:company[19],
>     Registered_Agent_Address_2:company[20],
>     Registered_Agent_City:company[21],
>     Registered_Agent_State:company[22],
>     Registered_Agent_Zip:company[23],
>
>     Location_Name:company[24],
>     Location_Address_1:company[25],
>     Location_Address_2:company[26],
>     Location_City:company[27],
>     Location_State:company[28],
>     Location_Zip:company[29]
> }
> );
>
> Each row is one node so it's as close to the raw data as possible.
> The idea is loosely that these nodes will be linked with new nodes representing
> people and addresses verified by reporters.
>
> This is what I got:
>
> +-------------------+
> | No data returned. |
> +-------------------+
> Nodes created: 1964486
> Properties set: 58934580
> Labels added: 1964486
> 4550855 ms
>
> Some context information:
> Neo4j Milestone Release 2.1.0-M01
> Windows 7
> java version "1.7.0_03"
>
> Best,
> Aram
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to neo4j+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.