Splitting it up worked well. I still had to give my VM 32 GB of memory and
a 28 GB heap, but the files I was importing were more than 50 MB each, the
largest being 165 MB with about 5 million rows, which took maybe 4 minutes
to import. I just didn't expect to need so much memory. There is a table in
the docs that led me to believe 8 GB should be fine, but that table must be
geared towards Cypher queries that work against the existing graph. It
looks like LOAD CSV can require more memory than it takes to query the same
number of primitives once they are already in the graph.
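Here is a toy model in plain Python of what Michael describes in the quoted reply below. If the first stage (create) has to drain every row before the second stage (match) starts, the whole file is held at once. A split or batched load only ever holds one batch. The functions are hypothetical stand-ins that count rows, not bytes, and this is not how Neo4j actually implements its query plans.

```python
from typing import Iterable, Iterator, List


def load_rows(n: int) -> Iterator[int]:
    """Stand-in for LOAD CSV: yields one row at a time."""
    yield from range(n)


def eager_peak(rows: Iterable[int]) -> int:
    """Create stage drains all input before the match stage sees any row.

    Returns the peak number of rows held in memory at once.
    """
    created: List[int] = list(rows)  # the entire input is materialized here
    for _row in created:  # the match stage only starts after that
        pass
    return len(created)


def batched_peak(rows: Iterable[int], batch_size: int) -> int:
    """Create and match run per batch; each batch is released after commit."""
    peak = 0
    batch: List[int] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            peak = max(peak, len(batch))
            batch = []  # "commit": this batch can now be garbage collected
    return max(peak, len(batch))


print(eager_peak(load_rows(1_000_000)))            # 1000000
print(batched_peak(load_rows(1_000_000), 10_000))  # 10000
```

In this model the batch size caps memory only when nothing forces the eager step. That would be consistent with PERIODIC COMMIT 1 not helping the original combined query.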

Today/this weekend I'm planning some even larger imports, with CSVs of
~300 MB, possibly larger. Exciting! I guess the strategy, when I can't add
more memory to the VM, is to split the files into smaller CSVs?
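The splitting itself is easy to script. This is a minimal sketch. It assumes a tab-separated file with a single header row (like tblMfr.csv), and it repeats that header in every chunk so LOAD CSV WITH HEADERS still works. The `split_csv` function and the `_part000` file naming are made up for illustration.

```python
import csv
from pathlib import Path
from typing import List, Optional, TextIO


def split_csv(src: Path, rows_per_file: int, delimiter: str = "\t") -> List[Path]:
    """Split src into files of at most rows_per_file data rows each.

    Every output repeats the header row; outputs are written next to src
    as <stem>_part000<suffix>, <stem>_part001<suffix>, ...
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    outputs: List[Path] = []
    out: Optional[TextIO] = None
    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:  # empty file: nothing to split
            return outputs
        try:
            count = rows_per_file  # forces a new output file on the first row
            for row in reader:
                if count == rows_per_file:
                    if out is not None:
                        out.close()
                    path = src.with_name(f"{src.stem}_part{len(outputs):03d}{src.suffix}")
                    out = path.open("w", newline="", encoding="utf-8")
                    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
                    writer.writerow(header)
                    outputs.append(path)
                    count = 0
                writer.writerow(row)
                count += 1
        finally:
            if out is not None:
                out.close()
    return outputs
```

For example, `split_csv(Path("/home/deployer/tblMfr.csv"), 1_000_000)` would produce files that can each be imported with their own LOAD CSV statement.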

Chris


On Thu, Aug 28, 2014 at 4:27 PM, Michael Hunger <
michael.hun...@neotechnology.com> wrote:

> You didn't like my suggestions to split it up?
>
> I should probably have explained that Cypher will pull all input data
> through the first stage (create), allocating a lot of memory, and only
> then continue to the next part (match).
>
> This is because of the "match my own creates" issue, which could otherwise
> lead to an infinite loop (matching on data that you just created, creating
> more data to match on, etc.).
>
> So my split-up suggestion would have avoided that.
>
> Feel free to try it out and report if it behaves better.
>
> Cheers,
>
> Michael
>
> On 28.08.2014 at 20:18, Chris Roberts <chris.gogr...@gmail.com> wrote:
>
> It finished, with 32 GB of memory in the VM and a 28 GB JVM heap.
>
> In total the import reported: Created 4595170 relationships, returned 0
> rows in 390028 ms
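For scale, that result works out to roughly 11,800 relationships per second, or about six and a half minutes in total:

```python
relationships = 4_595_170  # "Created 4595170 relationships"
elapsed_ms = 390_028       # "in 390028 ms"

elapsed_s = elapsed_ms / 1000
rate = relationships / elapsed_s
print(f"{rate:,.0f} relationships/s over {elapsed_s / 60:.1f} min")
# 11,782 relationships/s over 6.5 min
```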
>
> Chris
>
>
> On Wed, Aug 27, 2014 at 5:14 PM, Michael Hunger <
> michael.hun...@neotechnology.com> wrote:
>
>> Chris,
>>
>> your Cypher query seems to be wrong:
>>
>> 1. split it up into node creation and relationship creation
>> 2. use bigger transaction sizes
>> 3. you forgot the colon before GraphPart, so it doesn't use an index for
>> that one
>> 4. you don't have to use the path and FOREACH; a simple MATCH is good
>> enough
>>
>> USING PERIODIC COMMIT 10000
>>
>> LOAD CSV WITH HEADERS FROM "file://localhost/home/deployer/tblMfr.csv"
>> AS csvLine
>>     FIELDTERMINATOR '\t'
>> CREATE (vendor:GraphVendor { vendor_code_id: toInt(csvLine.Mfr_Code_ID),
>> vendor_id: toInt(csvLine.Mfr_ID), vendor_name: csvLine.Mfr_Name,
>> vendor_abbreviation: csvLine.Mfr_Abbr, vendor_status: csvLine.Mfr_Status });
>>
>>
>> create index on :GraphVendor(vendor_id);
>>
>> USING PERIODIC COMMIT 10000
>>
>> LOAD CSV WITH HEADERS FROM "file://localhost/home/deployer/tblMfr.csv"
>> AS csvLine
>>     FIELDTERMINATOR '\t'
>>
>> WITH toInt(csvLine.Mfr_ID) as vendor_id
>>
>> MATCH (vendor:GraphVendor { vendor_id: vendor_id})
>> MATCH (part:GraphPart {mfr_id: vendor_id})
>> MERGE (part)-[:MANUFACTURED_BY]->(vendor);
>>
>>
>>
>> On 26.08.2014 at 23:33, Chris G <chris.gogr...@gmail.com> wrote:
>>
>> Group, I'm trying to wrap my head around the memory configuration for
>> Neo4j.
>>
>> I've got ~4 million parts that I have loaded and indexed via Cypher, and I
>> have these indexes:
>>
>> Indexes
>>   ON :GraphPart(mfr_id)  ONLINE
>>   ON :GraphPart(part_id) ONLINE (for uniqueness constraint)
>>
>> Constraints
>>   ON (graphpart:GraphPart) ASSERT graphpart.part_id IS UNIQUE
>>
>>
>>
>> Now I want to import my vendors via this Cypher query:
>>
>> USING PERIODIC COMMIT 1
>> LOAD CSV WITH HEADERS FROM "file://localhost/home/deployer/tblMfr.csv"
>> AS csvLine
>>     FIELDTERMINATOR '\t'
>> CREATE (vendor:GraphVendor { vendor_code_id: toInt(csvLine.Mfr_Code_ID),
>> vendor_id: toInt(csvLine.Mfr_ID), vendor_name: csvLine.Mfr_Name,
>> vendor_abbreviation: csvLine.Mfr_Abbr, vendor_status: csvLine.Mfr_Status })
>> WITH vendor
>> MATCH p = (GraphPart {mfr_id: vendor.vendor_id})
>> FOREACH (n IN nodes(p) | MERGE (n)-[r:MANUFACTURED_BY]->(vendor))
>>
>>
>> I have configured the conf files:
>>
>> neo4j.properties:
>> neostore.nodestore.db.mapped_memory=50M
>> neostore.relationshipstore.db.mapped_memory=500M
>> neostore.propertystore.db.mapped_memory=100M
>> neostore.propertystore.db.strings.mapped_memory=130M
>> neostore.propertystore.db.arrays.mapped_memory=0M
>>
>> neo4j-wrapper.conf:
>>
>> wrapper.java.initmemory=4096
>> wrapper.java.maxmemory=12288
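Here is a rough way to budget these settings. It assumes the mapped store memory is allocated outside the JVM heap, as with memory-mapped buffers. If so, the process needs about the sum of the mapped sizes plus the maximum heap, before JVM overhead and the OS file cache. The `parse_settings` and `to_mb` helpers are hypothetical and only understand plain MB values and the `M` suffix.

```python
from typing import Dict

CONFIG = """\
neostore.nodestore.db.mapped_memory=50M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.propertystore.db.arrays.mapped_memory=0M
wrapper.java.maxmemory=12288
"""


def parse_settings(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks and # comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def to_mb(value: str) -> int:
    """Convert '500M' or a bare '12288' (already in MB) to an int of MB."""
    return int(value[:-1]) if value.upper().endswith("M") else int(value)


settings = parse_settings(CONFIG)
mapped_mb = sum(to_mb(v) for k, v in settings.items() if k.endswith("mapped_memory"))
heap_mb = to_mb(settings["wrapper.java.maxmemory"])
print(mapped_mb, heap_mb, mapped_mb + heap_mb)  # 780 12288 13068
```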
>>
>>
>> Even with a 12 GB heap and PERIODIC COMMIT 1, messages.log looks like this:
>> 2014-08-26 21:14:08.936+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 719ms [total block
>> time: 16.227s]
>> 2014-08-26 21:14:10.874+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 1630ms [total block
>> time: 17.857s]
>> 2014-08-26 21:14:12.377+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 673ms [total block
>> time: 18.53s]
>> 2014-08-26 21:14:13.715+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 719ms [total block
>> time: 19.249s]
>> 2014-08-26 21:14:15.424+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 1400ms [total block
>> time: 20.649s]
>> 2014-08-26 21:14:16.924+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 754ms [total block
>> time: 21.403s]
>> 2014-08-26 21:14:18.146+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 908ms [total block
>> time: 22.311s]
>> 2014-08-26 21:14:19.881+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 1207ms [total block
>> time: 23.518s]
>> 2014-08-26 21:14:21.551+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 1033ms [total block
>> time: 24.551s]
>> 2014-08-26 21:14:22.801+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 827ms [total block
>> time: 25.378s]
>> 2014-08-26 21:14:49.154+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 26040ms [total block
>> time: 51.418s]
>> 2014-08-26 21:14:49.524+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 270ms [total block
>> time: 51.688s]
>> 2014-08-26 21:15:24.662+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 32772ms [total block
>> time: 84.46s]
>> 2014-08-26 21:15:51.122+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 26039ms [total block
>> time: 110.499s]
>> 2014-08-26 21:16:24.233+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 32902ms [total block
>> time: 143.401s]
>> 2014-08-26 21:16:50.232+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 25898ms [total block
>> time: 169.299s]
>> 2014-08-26 21:17:20.085+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 29753ms [total block
>> time: 199.052s]
>> 2014-08-26 21:17:46.225+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 26040ms [total block
>> time: 225.092s]
>> 2014-08-26 21:21:04.960+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
>> Monitor: Application threads blocked for an additional 29433ms [total block
>> time: 254.525s]
>>
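To see how the pauses escalate, you can pull the per-entry blocked times out of messages.log. This is a minimal sketch that assumes log text in the format above. The names `gc_pauses_ms` and `long_pauses`, and the 10-second threshold, are illustrative choices.

```python
import re
from typing import List

# Matches e.g. "blocked for an additional 719ms" in a GC Monitor warning.
PAUSE_RE = re.compile(r"blocked for an additional (\d+)ms")


def gc_pauses_ms(log_text: str) -> List[int]:
    """Return the 'additional' blocked time of each GC Monitor warning, in ms."""
    flat = " ".join(log_text.split())  # undo line wrapping inside an entry
    return [int(ms) for ms in PAUSE_RE.findall(flat)]


def long_pauses(pauses: List[int], threshold_ms: int = 10_000) -> List[int]:
    """Keep only the pauses at or above threshold_ms."""
    return [p for p in pauses if p >= threshold_ms]


sample = """2014-08-26 21:14:08.936+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
Monitor: Application threads blocked for an additional 719ms [total block
time: 16.227s]
2014-08-26 21:14:49.154+0000 WARN  [o.n.k.EmbeddedGraphDatabase]: GC
Monitor: Application threads blocked for an additional 26040ms [total block
time: 51.418s]"""

pauses = gc_pauses_ms(sample)
print(pauses, long_pauses(pauses))  # [719, 26040] [26040]
```

Run over the full excerpt above, it finds 19 warnings, and 8 of them are 10 s or longer.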
>>
>> Could anyone suggest what I can try next, or some alternative memory
>> settings?
>>
>> I'm trying to get a proof of concept up and running so I can present it
>> to my bosses.
>>
>> I hope I am missing something simple. If not, I think it's time for Neo4j
>> to invest in some canonical documentation on how to configure Neo4j memory
>> usage. There are sparse mentions in the user guide, but most of what I find
>> related to performance comes from blog posts, Stack Overflow questions, and
>> mailing list posts (most of which Michael Hunger is answering). I also hope
>> that once I get past these initial memory settings, the rest of Neo4j will
>> just work.
>>
>> Thanks for reading,
>>
>> Chris
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to neo4j+unsubscr...@googlegroups.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "Neo4j" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/neo4j/rOr8tL1r-R8/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> neo4j+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> CR
>