Re: [Neo4j] Can we commit at regular intervals in batch insert mode

2010-06-18 Thread Mattias Persson
2010/6/17 Suruchi Deodhar deodharsuru...@gmail.com:
 Hi all,

 I am a new user of neo4j graph database and I am trying to port my data from
 Oracle 10g into Neo4j using batch insert. I am basically reading node data
 from one table with around 2.5 million rows and relationship data from
 another table with around 105 million rows and creating nodes and
 relationships in Neo4j.

 While doing this and using the java -Xmx 4096M option also, I run into
 occasional warnings related to java heap size and creation of the database
 is taking days (I had only expected some hours).

Just to be picky: java expects -Xmx4096M (written together) _not_ -Xmx
4096M (written apart).
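
A quick way to check which heap limit the JVM actually picked up is to
print it from inside the process. This is only a minimal sketch; the class
name HeapCheck is made up for illustration, and Runtime.maxMemory() is the
standard JDK call it relies on.

```java
public class HeapCheck {

    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Reports the effective limit, whatever flags were passed on the command line.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Started as java -Xmx4096M HeapCheck, this should report a value near 4096;
the exact figure can be somewhat lower depending on the JVM and garbage
collector.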


 1. Is there a provision to commit at regular intervals in batch insert mode?

If you're using the BatchInserter (as opposed to GraphDatabaseService)
you don't need to commit because there's no concept of transactions in
batch insertion mode.


 2. Am I doing something in a wrong manner or do I need to apply some
 optimizations while creating such a huge graph db? Has anyone encountered
 similar problems and resolved it?

All I know is that inserting tens of millions of nodes/relationships
or more is usually no problem, and it is quite fast as well.


 3. Is this because of the Indexer used in Batch insert mode? I had created
 some 8000 nodes without using batch inserter earlier and it took me only a
 few minutes.

One thing that can take time is interleaving reads and writes on the
LuceneIndexBatchInserter. Try to do many writes in one batch and many
reads in another, with an optimize() call in between. I'll see if I
can do something about that performance problem as well.

Read more about it at http://wiki.neo4j.org/content/Batch_Insert
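
The write-then-read grouping described above can be sketched with stand-in
types. KeyIndex and InMemoryIndex below are hypothetical placeholders and
not the LuceneIndexBatchInserter API; the only point is the ordering: all
index writes first, a single optimize(), then all lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseLoad {

    /** Hypothetical stand-in for the index used during batch insertion. */
    public interface KeyIndex {
        void index(long nodeId, String key);
        void optimize();
        long lookup(String key); // returns -1 if the key is unknown
    }

    /** In-memory stand-in that counts optimize() calls so the phases can be checked. */
    public static class InMemoryIndex implements KeyIndex {
        public final Map<String, Long> ids = new HashMap<String, Long>();
        public int optimizeCalls = 0;

        public void index(long nodeId, String key) { ids.put(key, nodeId); }

        public void optimize() { optimizeCalls++; }

        public long lookup(String key) {
            Long id = ids.get(key);
            return id == null ? -1L : id;
        }
    }

    /** Indexes every node key, optimizes once, then resolves relationship endpoints. */
    public static List<long[]> load(List<String> nodeKeys, List<String[]> edges, KeyIndex index) {
        // Phase 1: writes only. Node ids are simply assigned in order here.
        long nextId = 0;
        for (String key : nodeKeys) {
            index.index(nextId++, key);
        }

        // One optimize between the write phase and the read phase.
        index.optimize();

        // Phase 2: reads only. Edges with an unknown endpoint are skipped.
        List<long[]> relationships = new ArrayList<long[]>();
        for (String[] edge : edges) {
            long start = index.lookup(edge[0]);
            long end = index.lookup(edge[1]);
            if (start >= 0 && end >= 0) {
                relationships.add(new long[] {start, end});
            }
        }
        return relationships;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<String>();
        keys.add("a");
        keys.add("b");
        List<String[]> edges = new ArrayList<String[]>();
        edges.add(new String[] {"a", "b"});
        System.out.println("relationships resolved: " + load(keys, edges, new InMemoryIndex()).size());
    }
}
```

With a real index the loops would be driven by the source tables, but the
shape stays the same: no lookups until every write has been done.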


 Please let me know your inputs.

 Thanks,
 Sue
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


[Neo4j] Self-referencing relationships anyone?

2010-06-18 Thread Tobias Ivarsson
Hi all!

I was playing around with adding support for relationships where the start
node and end node are the same. I managed to come up with a nice little patch
that adds support for this to the current development version (trunk) of
Neo4j. If anyone is feeling adventurous and wants to try it out, I would love
to get feedback on this.
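
To make the idea concrete, here is a toy model of a relationship whose
start node and end node coincide. The Rel class and its methods are
invented for illustration; they are not taken from the patch or from the
Neo4j API.

```java
public class LoopRelationshipSketch {

    /** Toy relationship between two node ids; not Neo4j's Relationship type. */
    public static final class Rel {
        public final long start;
        public final long end;

        public Rel(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /** True when the relationship starts and ends at the same node. */
        public boolean isLoop() {
            return start == end;
        }

        /** Returns the node at the opposite end; for a loop that is the node itself. */
        public long otherNode(long node) {
            if (node == start) {
                return end;
            }
            if (node == end) {
                return start;
            }
            throw new IllegalArgumentException("node " + node + " is not an endpoint");
        }
    }

    public static void main(String[] args) {
        Rel loop = new Rel(7, 7);
        System.out.println("loop: " + loop.isLoop() + ", other node: " + loop.otherNode(7));
    }
}
```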

Direct link to the patch:
https://trac.neo4j.org/attachment/ticket/239/loopRelationships.patch

Cheers,
Tobias

On Fri, Jun 18, 2010 at 1:17 PM, neo4j.org nore...@neo4j.org wrote:

 #239: Add support for relationships with same start node as end node

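 ---------------------------------+----------------------------------
  Reporter:  tobias               |      Owner:  tobias
      Type:  enhancement request  |     Status:  new
  Priority:  minor                |  Milestone:
 Component:  kernel               |   Keywords:
 ---------------------------------+----------------------------------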
  The attached patch applies against r4579. The unit tests in the patch all
  pass (as do all the previously existing unit tests), but more testing
  would be nice before committing this to trunk.

 --
 Ticket URL: https://trac.neo4j.org/ticket/239
 neo4j.org http://trac.neo4j.org/
 The Neo4J.org Issue Tracker




-- 
Tobias Ivarsson tobias.ivars...@neotechnology.com
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857


Re: [Neo4j] Self-referencing relationships anyone?

2010-06-18 Thread Rick Bullotta
I would like to check it out...
...and so would I...
...and so would I...
...and so would I...




Re: [Neo4j] Self-referencing relationships anyone?

2010-06-18 Thread Niels Hoogeveen

Very nice. This makes it possible to implement singleton classes directly
in the meta model.



Re: [Neo4j] Can we commit at regular intervals in batch insert mode

2010-06-18 Thread Suruchi Deodhar
Hello!

The batch inserter worked pretty well for around 8 hours, creating all the
nodes (2 million) and around 80 million of the 105 million relationships.
After that it suddenly slowed down drastically, taking days, without
throwing any errors (only some Java heap size warnings).
The data was finally loaded into Neo4j after about 2 days.

- Are there any optimizations that I can apply to avoid this behavior using
batch inserter? (say close the db and start it again after around 75 million
relationships are created?).
I used the LuceneIndexBatchInserter and as mentioned before, first created
all the nodes and then retrieved 2 nodes at a time to create relationships.
The way I used it is as follows:

// rs:  result set with node information from my current Oracle 10g database
// rs1: result set with relationship information from my current Oracle 10g database

BatchInserter ...
LuceneIndexService ...

while (rs.next()) {
    // create nodes using the batch inserter
}

optimize()

while (rs1.next()) {
    // retrieve node1, node2
    // create relationship using the BatchInserter
}

---

I have another question regarding query times on Neo4j. From your
experience, what are typical query times on large graphs in Neo4j
for a simple query like:
Find all the first-level neighbors of all the nodes in the graph.

The above query takes hours (???) on the graph that I
created with ~2 million nodes and 105 million relationships.

---

Thanks,
Suruchi






Re: [Neo4j] Can we commit at regular intervals in batch insert mode

2010-06-18 Thread Peter Neubauer
Suruchi,
this sounds much too slow. Is there any chance of you sending over the
insert and query code and a small data sample off list to me? It would
be great to see what the problem is.

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

