Hello, I am using the LOAD CSV command to import a database, and I am getting the following error: "GC overhead limit exceeded". Do I have to change the JVM properties? I have the value set to -Xmx512; do I have to increase this to avoid the error? Kindly help.
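The usual first response to "GC overhead limit exceeded" is to give the JVM a larger heap. On a Neo4j 2.x installation the heap is set in conf/neo4j-wrapper.conf rather than neo4j.properties; a minimal sketch, where 2048 MB is an illustrative value for an 8 GB machine, not a figure recommended anywhere in this thread:

```
# conf/neo4j-wrapper.conf (Neo4j 2.x) -- values are in MB
# 2048 is an illustrative size for an 8 GB machine, not a thread recommendation
wrapper.java.initmemory=2048
wrapper.java.maxmemory=2048
```

Restart the server after changing these settings so the new heap limits take effect.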
On Thu, Jun 19, 2014 at 11:23 AM, Pavan Kumar <kumar.pavan...@gmail.com> wrote:

Hi,
I have tried creating the constraints and index before attempting the LOAD CSV command. The commands execute for a long time and then show me "Unknown error". Any idea why it gives me this error? I am running it on a Windows machine which has 8 GB of RAM. Do I have to change properties in the neo4j.properties file? Kindly help me.

On Wed, Jun 18, 2014 at 8:20 PM, david fauth <dsfa...@gmail.com> wrote:

Run the CREATE CONSTRAINT commands first, then attempt your LOAD CSV command.

On Wednesday, June 18, 2014 5:13:42 AM UTC-4, Pavan Kumar wrote:

Hi,
So my Cypher will be like:

----------------------------------------------------------
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
AS csvimport
create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is unique;
MERGE (uniprotid:Uniprotid {uniprotid: csvimport.ID}) ON CREATE SET
uniprotid.Name = csvimport.Name, uniprotid.Uniprot_title = csvimport.Uniprot_Title
create constraint on (genename:Gene_Name) assert genename:Gene_Name is unique;
merge (genename:Gene_Name {genename: csvimport.Gene_Name})
and so on...
merge (uniprotid)-[:Genename]->(genename)
merge (uniprotid)-[:GenBank_ProteinID]->(Genbank_prtn)
and so on...
---------------------------------------------------------

Is that right? I tried the same statements in 2.1.2 and I am getting the following errors:

1. Invalid input 'n': expected 'p/P' (line 5, column 20)

   "create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is unique;"

2.
Cannot merge node using null property value for uniprotid

Kindly help.

On Wed, Jun 18, 2014 at 1:44 PM, Michael Hunger <michael...@neotechnology.com> wrote:

I don't understand.

Michael

On 18.06.2014 at 10:11, Pavan Kumar <kumar.p...@gmail.com> wrote:

When I use CREATE statements, it does not consider the empty fields from the CSV file, so I used the MERGE command.

On Wed, Jun 18, 2014 at 1:09 PM, Michael Hunger <michael...@neotechnology.com> wrote:

And create the indexes for all those node + property combinations.

And for operations like this:

MERGE (uniprotid:Uniprotid {uniprotid: csvimport.ID, Name: csvimport.Name, Uniprot_title: csvimport.Uniprot_Title})

please use a constraint:

create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is unique;

and the merge operation like this, so it can actually leverage the index/constraint:

MERGE (uniprotid:Uniprotid {uniprotid: csvimport.ID}) ON CREATE SET
uniprotid.Name = csvimport.Name, uniprotid.Uniprot_title = csvimport.Uniprot_Title

...
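A sketch combining the advice above: run the constraints as standalone statements before the import (mixing CREATE CONSTRAINT into the LOAD CSV statement is what triggers the "Invalid input 'n': expected 'p/P'" parse error), assert on the property genename.genename rather than the label, and skip rows whose merge key is empty to avoid "Cannot merge node using null property value". The WHERE filter is a common workaround assumed here, not something quoted from this thread:

```cypher
// 1) Run each constraint as its own statement, before the import:
CREATE CONSTRAINT ON (u:Uniprotid) ASSERT u.uniprotid IS UNIQUE;
CREATE CONSTRAINT ON (g:Gene_Name) ASSERT g.genename IS UNIQUE;

// 2) Then run the import as a separate statement, skipping rows
//    that have no value for the merge key:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
  "file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv" AS csvimport
WITH csvimport
WHERE csvimport.ID IS NOT NULL
MERGE (uniprotid:Uniprotid {uniprotid: csvimport.ID})
  ON CREATE SET uniprotid.Name = csvimport.Name,
                uniprotid.Uniprot_title = csvimport.Uniprot_Title
MERGE (genename:Gene_Name {genename: csvimport.Gene_Name})
MERGE (uniprotid)-[:Genename]->(genename);
```

The same pattern extends to the other node labels and relationships: one constraint per merge key, then MERGE on that key alone with the remaining columns set via ON CREATE SET.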
On 18.06.2014 at 09:18, Pavan Kumar <kumar.p...@gmail.com> wrote:

My query looks like the following:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
AS csvimport
merge (uniprotid:Uniprotid {uniprotid: csvimport.ID, Name: csvimport.Name, Uniprot_title: csvimport.Uniprot_Title})
merge (genename:Gene_Name {genename: csvimport.Gene_Name})
merge (Genbank_prtn:GenBank_Protein {GenBank_protein_id: csvimport.GenBank_Protein_ID})
merge (Genbank_gene:GenBank_Gene {GenBank_gene_id: csvimport.GenBank_Gene_ID})
merge (pdbid:PDBID {PDBid: csvimport.PDB_ID})
merge (geneatlas:Geneatlasid {Geneatlas: csvimport.GenAtlas_ID})
merge (HGNC:HGNCid {hgnc: csvimport.HGNC_ID})
merge (species:Species {Species: csvimport.Species})
merge (genecard:Genecardid {Genecard: csvimport.GeneCard_ID})
merge (drugid:DrugID {DrugID: csvimport.Drug_IDs})
merge (uniprotid)-[:Genename]->(genename)
merge (uniprotid)-[:GenBank_ProteinID]->(Genbank_prtn)
merge (uniprotid)-[:GenBank_GeneID]->(Genbank_gene)
merge (uniprotid)-[:PDBID]->(pdbid)
merge (uniprotid)-[:GeneatlasID]->(geneatlas)
merge (uniprotid)-[:HGNCID]->(HGNC)
merge (uniprotid)-[:Species]->(species)
merge (uniprotid)-[:GenecardID]->(genecard)
merge (uniprotid)-[:DrugID]->(drugid)

I am attaching a sample CSV file as well; please find it.
As suggested, I will try the new version of Neo4j.

On Wed, Jun 18, 2014 at 12:41 PM, Michael Hunger <michael...@neotechnology.com> wrote:

What does your query look like?
Please switch to Neo4j 2.1.2.

And create indexes / constraints for the nodes you're inserting with MERGE or looking up via MATCH.
Michael

On 18.06.2014 at 08:46, Pavan Kumar <kumar.p...@gmail.com> wrote:

Hi,
I have deployed Neo4j 2.1.0-M01 on Windows, on a machine which has 8 GB of RAM. I am trying to import a CSV file which has 30,000 records. I am using the USING PERIODIC COMMIT 1000 LOAD CSV command for importing, but it gives an unknown error. I have modified the neo4j.properties file as advised in the blogs. My neo4j.properties now looks like:

# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=4G
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=500M

# Enable this to be able to upgrade a store from an older version
allow_store_upgrade=true

# Enable this to specify a parser other than the default one.
#cypher_parser_version=2.0

# Keep logical logs, helps debugging but uses more disk space, enabled for
# legacy reasons. To limit space needed to store historical logs use values
# such as: "7 days" or "100M size" instead of "true"
keep_logical_logs=true

# Autoindexing

# Enable auto-indexing for nodes, default is false
node_auto_indexing=true

# The node property keys to be auto-indexed, if enabled
#node_keys_indexable=name,age

# Enable auto-indexing for relationships, default is false
relationship_auto_indexing=true

# The relationship property keys to be auto-indexed, if enabled
#relationship_keys_indexable=name,age

# Setting for Community Edition:
cache_type=weak

Still I am facing the same problem. Is there any other file in which to change properties? Kindly help me with this issue.
Thanks in advance.

On Tuesday, 4 March 2014 21:24:03 UTC+5:30, Aram Chung wrote:

Hi,

I was asked to post this here by Mark Needham (@markhneedham), who thought my query took longer than it should.

I'm trying to see how graph databases could be used in investigative journalism: I was loading in New York State's "Active Corporations: Beginning 1800" data from https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6 as a 1,964,486-row CSV (and deleted all U+F8FF characters, because I was getting "[null] is not a supported property value"). The Cypher query I used was:

USING PERIODIC COMMIT 500
LOAD CSV
FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
AS company
CREATE (:DataActiveCorporations
{
  DOS_ID:company[0],
  Current_Entity_Name:company[1],
  Initial_DOS_Filing_Date:company[2],
  County:company[3],
  Jurisdiction:company[4],
  Entity_Type:company[5],

  DOS_Process_Name:company[6],
  DOS_Process_Address_1:company[7],
  DOS_Process_Address_2:company[8],
  DOS_Process_City:company[9],
  DOS_Process_State:company[10],
  DOS_Process_Zip:company[11],

  CEO_Name:company[12],
  CEO_Address_1:company[13],
  CEO_Address_2:company[14],
  CEO_City:company[15],
  CEO_State:company[16],
  CEO_Zip:company[17],

  Registered_Agent_Name:company[18],
  Registered_Agent_Address_1:company[19],
  Registered_Agent_Address_2:company[20],
  Registered_Agent_City:company[21],
  Registered_Agent_State:company[22],
  Registered_Agent_Zip:company[23],

  Location_Name:company[24],
  Location_Address_1:company[25],
  Location_Address_2:company[26],
  Location_City:company[27],
  Location_State:company[28],
  Location_Zip:company[29]
}
);

Each row is one node, so it's as close to the raw data as possible. The idea, loosely, is that these nodes will be linked with new nodes representing people and addresses verified by reporters.

This is what I got:

+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1964486
Properties set: 58934580
Labels added: 1964486
4550855 ms

Some context information:
Neo4j Milestone Release 2.1.0-M01
Windows 7
java version "1.7.0_03"

Best,
Aram

--
Thanks & Regards,
Pavan Kumar
Project Engineer
CDAC-KP
Ph +91-7676367646

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
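Aram's positional company[n] accesses can alternatively use WITH HEADERS, so columns are referenced by name instead of index. A sketch assuming a CSV that keeps its header row; the filename and column names here are illustrative (real header names containing spaces would need backtick quoting):

```cypher
// Assumes a hypothetical CSV with a header row; only three of the
// thirty columns are shown, the rest follow the same pattern.
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM "file://path/to/csv/active_corporations_with_header.csv"
AS company
CREATE (:DataActiveCorporations {
  DOS_ID:              company.DOS_ID,
  Current_Entity_Name: company.Current_Entity_Name,
  County:              company.County
  // ... remaining columns
});
```

Named access makes the query self-documenting and is robust against columns being reordered in a re-export of the source data.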