Re: [Neo4j] Can we commit at regular intervals in batch insert mode
2010/6/17 Suruchi Deodhar deodharsuru...@gmail.com:
> Hi all,
>
> I am a new user of the Neo4j graph database and I am trying to port my data from Oracle 10g into Neo4j using batch insert. I am basically reading node data from one table with around 2.5 million rows and relationship data from another table with around 105 million rows, and creating nodes and relationships in Neo4j. While doing this, even with the java -Xmx 4096M option, I run into occasional warnings related to Java heap size, and creation of the database is taking days (I had only expected some hours).

Just to be picky: java expects -Xmx4096M (written together) _not_ -Xmx 4096M (written apart).

> 1. Is there a provision to commit at regular intervals in batch insert mode?

If you're using the BatchInserter (as opposed to GraphDatabaseService) you don't need to commit, because there's no concept of transactions in batch insertion mode.

> 2. Am I doing something in a wrong manner, or do I need to apply some optimizations while creating such a huge graph db? Has anyone encountered similar problems and resolved them?

All I know is that it's usually no problem to insert tens of millions of nodes/relationships or more, and quite fast as well.

> 3. Is this because of the Indexer used in batch insert mode? I had created some 8000 nodes without using the batch inserter earlier and it took me only a few minutes.

Something that can take up time is mixed reads and writes to/from the LuceneIndexBatchInserter. Try to group many writes and many reads together (with an optimize() in between). I'll see if I can do something about that performance problem as well. Read more about it at http://wiki.neo4j.org/content/Batch_Insert

> Please let me know your inputs.
>
> Thanks,
> Sue

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
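The advice about grouping index writes and reads can be sketched roughly as follows. This is a minimal illustration in plain Java: FakeStore is a made-up in-memory stand-in, not the Neo4j BatchInserter or LuceneIndexBatchInserter API, and all of its method names are assumptions chosen for readability. The stand-in refuses a lookup that follows a write without an optimize() in between; the real index would presumably just be slow rather than fail, so the exception is there only to make the ordering visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPhaseLoad {

    /** Hypothetical in-memory stand-in for inserter + index (NOT the Neo4j API). */
    static class FakeStore {
        private final Map<String, Long> index = new HashMap<String, Long>();
        private final List<long[]> relationships = new ArrayList<long[]>();
        private long nextNodeId = 0;
        private boolean readable = false;
        private int optimizeCalls = 0;

        /** Creates a node and indexes it under its external key (an index write). */
        long createNode(String key) {
            long id = nextNodeId++;
            index.put(key, id);
            readable = false; // a write invalidates the "optimized" state
            return id;
        }

        /** Marks the index as ready for reads and counts how often this was needed. */
        void optimize() {
            readable = true;
            optimizeCalls++;
        }

        /** Looks a node up by key (an index read); rejects reads mixed in with writes. */
        long lookup(String key) {
            if (!readable) {
                throw new IllegalStateException("lookup before optimize()");
            }
            Long id = index.get(key);
            if (id == null) {
                throw new IllegalArgumentException("unknown key: " + key);
            }
            return id;
        }

        void createRelationship(long from, long to) {
            relationships.add(new long[] {from, to});
        }

        int relationshipCount() { return relationships.size(); }

        int optimizeCalls() { return optimizeCalls; }
    }

    /** Phase 1: write all nodes. Optimize once. Phase 2: look up keys, write relationships. */
    static FakeStore load(List<String> nodeKeys, List<String[]> edges) {
        FakeStore store = new FakeStore();
        for (String key : nodeKeys) {
            store.createNode(key);
        }
        store.optimize();
        for (String[] edge : edges) {
            store.createRelationship(store.lookup(edge[0]), store.lookup(edge[1]));
        }
        return store;
    }

    public static void main(String[] args) {
        FakeStore store = load(
                Arrays.asList("a", "b", "c"),
                Arrays.asList(new String[] {"a", "b"}, new String[] {"b", "c"}));
        System.out.println(store.relationshipCount() + " relationships, "
                + store.optimizeCalls() + " optimize() call(s)");
    }
}
```

The design point is that relationship creation never triggers an index write, so a single optimize() between the two loops is enough.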
[Neo4j] Self-referencing relationships anyone?
Hi all!

I was playing around with adding support for relationships where the start node and end node are the same. I managed to come up with a nice little patch that adds support for this to the current development version (trunk) of Neo4j.

If anyone is feeling adventurous and wants to try it out, I would love to get feedback on this.

Direct link to the patch: https://trac.neo4j.org/attachment/ticket/239/loopRelationships.patch

Cheers,
Tobias

On Fri, Jun 18, 2010 at 1:17 PM, neo4j.org nore...@neo4j.org wrote:
> #239: Add support for relationships with same start node as end node
>
>   Reporter:  tobias               | Owner:      tobias
>   Type:      enhancement request  | Status:     new
>   Priority:  minor                | Milestone:
>   Component: kernel               | Keywords:
>
> The attached patch applies against r4579. The unit tests in the patch all pass (as do all the previously existing unit tests), but more testing would be nice before committing this to trunk.
>
> --
> Ticket URL: https://trac.neo4j.org/ticket/239
> neo4j.org http://trac.neo4j.org/
> The Neo4J.org Issue Tracker

--
Tobias Ivarsson tobias.ivars...@neotechnology.com
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
Re: [Neo4j] Self-referencing relationships anyone?
I would like to check it out...

...and so would I...

...and so would I...

...and so would I...

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Tobias Ivarsson
Sent: Friday, June 18, 2010 7:24 AM
To: Neo user discussions
Subject: [Neo4j] Self-referencing relationships anyone?

Hi all! I was playing around with adding support for relationships where the start node and end node are the same.
Re: [Neo4j] Self-referencing relationships anyone?
Very nice. This makes it possible to implement singleton classes directly in the meta model.

From: tobias.ivars...@neotechnology.com
Date: Fri, 18 Jun 2010 13:23:40 +0200
To: user@lists.neo4j.org
Subject: [Neo4j] Self-referencing relationships anyone?

> Hi all! I was playing around with adding support for relationships where the start node and end node are the same.
Re: [Neo4j] Can we commit at regular intervals in batch insert mode
Hello!

The batch inserter worked pretty well for around 8 hours, creating all the nodes (2 million) and around 80 million relationships out of a total of 105 million. After that it suddenly slowed down drastically, taking days, without throwing any errors (only some Java heap size warnings). The data was finally loaded into Neo4j after about 2 days.

Are there any optimizations that I can apply to avoid this behavior when using the batch inserter (say, close the db and start it again after around 75 million relationships are created)?

I used the LuceneIndexBatchInserter and, as mentioned before, first created all the nodes and then retrieved 2 nodes at a time to create relationships. The way I used it is as follows:

    // rs  - result set storing node information from my current Oracle 10g database
    // rs1 - result set storing relationship information from my current Oracle 10g database
    Batchinserter
    LuceneIndexService
    ...
    while (rs.next()) {
        // Create nodes using batch inserter
    }
    optimize()
    while (rs1.next()) {
        // Retrieve node1, node2;
        // create relationship using BatchInserter
    }

---

I have another question regarding query times on Neo4j. From your experience, what are typical query times on large graphs for a simple query like "find all the first-level neighbors of all the nodes in the graph"? Running that query on the graph I created, with ~2 million nodes and 105 million relationships, is taking hours (???).

---

Thanks,
Suruchi

Today's Topics:

   1. Re: Can we commit at regular intervals in batch insert mode (Mattias Persson)

Message: 1
Date: Fri, 18 Jun 2010 09:49:00 +0200
From: Mattias Persson matt...@neotechnology.com
Subject: Re: [Neo4j] Can we commit at regular intervals in batch insert mode
To: Neo4j user discussions user@lists.neo4j.org

> Something that can take up time is mixed reads and writes to/from the LuceneIndexBatchInserter. Try to group many writes and many reads together (with an optimize() in between).

End of User Digest, Vol 39, Issue 34
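Given the heap warnings reported in this thread, one cheap sanity check is to ask the JVM how much heap it actually received. The snippet below is a minimal sketch using only the standard java.lang.Runtime class; the class name HeapCheck and its helper methods are made up for illustration and have nothing to do with Neo4j itself.

```java
class HeapCheck {

    /** Maximum heap the JVM will try to use, in MiB (reflects -Xmx when it is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently in use, in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Started as `java -Xmx4096M HeapCheck`, the first line should report a value in the neighborhood of 4096 MiB; the exact figure varies by JVM and garbage collector. Logging usedHeapMiB() every few million relationships during a load would show whether memory use climbs toward that limit before the slowdown sets in.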
Re: [Neo4j] Can we commit at regular intervals in batch insert mode
Suruchi,

this sounds much too slow. Is there any chance of you sending over the insert and query code and a small data sample to me off list? It would be great to see what the problem is.

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Fri, Jun 18, 2010 at 7:08 PM, Suruchi Deodhar deodharsuru...@gmail.com wrote:
> The batch inserter worked pretty well for around 8 hours creating all the nodes (2 million) and around 80 million relationships out of a total of 105 million and after that it suddenly slowed down drastically, taking days, without throwing any errors (only some Java heap size warnings). The data was finally loaded in Neo4j after about 2 days.
>
> I have another question regarding query times on Neo4j. From your experience, how much are the typical query times for large graphs using neo4j for a simple query like: Find all the first level neighbors of all the nodes in the graph. I am running into hours (???) to run the above query on the graph that I created with ~ 2million nodes and 105 million relationships.
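For a sense of scale on the "first-level neighbors of every node" question: such a scan sees each relationship once from each endpoint, so 105 million relationships mean roughly 210 million neighbor visits in total. The sketch below shows that counting on a tiny in-memory graph. It uses plain Java adjacency lists rather than the Neo4j API, the class and method names are made up for illustration, and it says nothing about how fast Neo4j itself performs the scan.

```java
import java.util.ArrayList;
import java.util.List;

class NeighborScan {

    /** Builds undirected adjacency lists for nodes 0..nodeCount-1 from (start, end) pairs. */
    static List<List<Integer>> adjacency(int nodeCount, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<List<Integer>>();
        for (int i = 0; i < nodeCount; i++) {
            adj.add(new ArrayList<Integer>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            if (e[0] != e[1]) {
                // a self-loop is listed only once on its single node
                adj.get(e[1]).add(e[0]);
            }
        }
        return adj;
    }

    /** Visits the first-level neighbors of every node and returns the number of visits. */
    static long scanAllNeighbors(List<List<Integer>> adj) {
        long visits = 0;
        for (List<Integer> neighbors : adj) {
            for (int neighbor : neighbors) {
                visits++; // a real query would collect or process the neighbor here
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1}, {1, 2}, {0, 2} };
        long visits = scanAllNeighbors(adjacency(3, edges));
        System.out.println(visits + " neighbor visits for " + edges.length + " relationships");
    }
}
```

On the three-relationship triangle in main, the scan performs six visits, which is the same factor of two that applies to the larger graph.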