Re: [Neo] Persistance of nodeID

2009-12-26 Thread Laurent Laborde
On Sat, Nov 21, 2009 at 11:35 AM, Laurent Laborde kerdez...@gmail.com wrote:

 My neo4j database won't be critical for production; it can go
 down and/or crash at any time without breaking the production platform.
 But... populating this database will take a lot of time and some
 useful resources. So I'll try not to break it by using a too-unstable
 API :)

 I'm not 100% sure about the content of every node. But I think about a
 few GB at minimum... up to hundreds of GB.
 I'll see... it's an R&D project anyway.
 But if it performs well and efficiently (I mean: faster and cheaper in
 resources than doing the same thing on our PostgreSQL cluster) the
 database could be well over 100GB, not including the FTS index that I'm
 planning to use too :)

 I'm not 100% sure about what I'm doing here, that's all the fun of R&D.
 Be sure that you will have as much feedback as possible as long as it
 doesn't disclose the goals of the project and the content of my
 database (NDA, etc ...)

While I'm on vacation I have some time to play with neo.
After some cleaning and testing, I'm finally populating the database
with real data.
My code is not 100% fail-safe, but the program that populates the
database has been running non-stop for 3 days, without a single error.

The next step will be the exploitation of the content of the database.
Neoclipse gave up a long time ago; I am expecting around 20.000 nodes
and ... mmm...  a million relationships (with different types) ?

The database directory is currently 4.5GB ... I hope I didn't forget a
transaction somewhere that wraps everything and will roll back ~4GB of
data if the program dies :(

I don't really understand the behaviour of the database if I have
nested transactions,
or what the database does if the program crashes... it cleans all
unfinished transactions from the DB, right ?

Oh and ... Merry Christmas :)

-- 
Laurent ker2x Laborde
Sysadmin & DBA at http://www.over-blog.com/
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Persistance of nodeID

2009-12-26 Thread Anders Nawroth
Hi Laurent!

 The next step will be the exploitation of the content of the database.
 Neoclipse gave up a long time ago; I am expecting around 20.000 nodes
 and ... mmm...  a million relationships (with different types) ?
   

What limits Neoclipse is the number of nodes/relationships which can be 
viewed at the same time. After only a few hundred of those, the GUI 
library it uses will experience problems (and should probably get 
replaced by something else). Keep the traversal depth as low as 
possible, and if that isn't enough, use some other tool like the Neo4j 
Shell: http://wiki.neo4j.org/content/Shell


/anders



Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-26 Thread Peter Neubauer
Hi Núria,
the current ID scheme uses Integers as IDs for Nodes,
Relationships and Properties, which limits the possible node space to 4
billion nodes, 4 billion relationships and 4 billion properties. Of
course one could switch to Long IDs, but that would increase the
number of bytes reserved per record and could cause performance penalties.
So this is the current limit; beyond it you have to start
thinking about sharding along a suitable domain-specific criterion.
What size and domain are you imagining?
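
As a rough illustration of that limit, the sketch below works out how much headroom a 32-bit ID space leaves for the 20 million nodes and 180 million edges reported in the quoted message. It assumes the full 32-bit range (2^32, which is where "4 billion" comes from); the name `headroom` is made up for this example and is not part of any Neo4j API.

```python
# Assumption: IDs use the full 32-bit range, i.e. 2**32 distinct values
# per record type (nodes, relationships, properties).
ID_BITS = 32
MAX_IDS = 2 ** ID_BITS


def headroom(current, limit=MAX_IDS):
    """Return how many times `current` could grow before reaching `limit`."""
    return limit / current


nodes = 20_000_000    # node count from the quoted message
edges = 180_000_000   # edge (relationship) count from the quoted message

print(f"IDs available per record type: {MAX_IDS:,}")
print(f"node headroom:         {headroom(nodes):.1f}x")
print(f"relationship headroom: {headroom(edges):.1f}x")
```

With these counts the relationship store is the one that would hit the limit first, so it is probably the number to watch when estimating how far a dataset can grow before sharding becomes necessary.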

Also, when dealing with bigger nodespaces you probably want to
increase the RAM of your server machine and think about SSDs in order to
keep the often-used parts of your graph cached and minimize IO cost.

HTH

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk:  neubauer.peter
Skype   peter.neubauer
Phone   +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter  http://twitter.com/peterneubauer

http://www.neo4j.org           - Relationships count.
http://gremlin.tinkerpop.com   - PageRank in 2 lines of code.
http://www.linkedprocess.org   - Computing at LinkedData scale.



On Sat, Dec 26, 2009 at 4:10 PM, Núria Trench nuriatre...@gmail.com wrote:
 Hi,

 I have just finished parsing and creating the database with the latest
 index-util-0.9-SNAPSHOT available in your repository. It finished
 successfully, so I must thank you for your interest and useful help.
 And, finally, I have one last question. I have created 180 million
 edges and 20 million nodes. Is it possible to create a larger number of
 edges and nodes with Neo4j? Do you have a limit?

 Thank you very much again.

 2009/12/21 Núria Trench nuriatre...@gmail.com

 Hi again Mattias,

 I'm still trying to parse all the data in order to create the graph. I will
 report the results as soon as possible.
 Thank you very much for your interest.

 Núria.

 2009/12/21 Mattias Persson matt...@neotechnology.com

 Hi again,

 any luck with this yet?

 2009/12/11 Núria Trench nuriatre...@gmail.com:
  Thank you very much Mattias. I will test it as soon as possible and
  will let you know.
 
  Núria.
 
  2009/12/11 Mattias Persson matt...@neotechnology.com
 
  I've tried this a couple of times now and first of all I see some
  problems in your code:
 
  1) In the method createRelationsTitleImage you have an inverted head
  != -1 check where it should be head == -1
 
  2) You index relationships in createRelationsBetweenTitles method,
  this isn't ok since the index can only manage nodes.
 
  And I recently committed a fix which removed the caching layer in
  the LuceneIndexBatchInserterImpl (and therefore also
  LuceneFulltextIndexBatchInserter). This probably fixes your problems.
  I'm also working on a performance fix which makes consecutive getNodes
  calls faster.
 
  So I think that with these fixes (1) and (2) and the latest index-util
  0.9-SNAPSHOT your sample will run fine. Also you could try without
  calling optimize. See more information at
  http://wiki.neo4j.org/content/Indexing_with_BatchInserter
 
  2009/12/10 Mattias Persson matt...@neotechnology.com:
   To continue this thread in the user list:
  
   Thanks Núria, I've gotten your sample code/files and I'm running them
   now to try to reproduce your problem.
  
   2009/12/9 Núria Trench nuriatre...@gmail.com:
   I have finished uploading the 4 csv files. You'll see an e-mail with the
   other 3 csv files packed in a rar file.
   Thanks,
  
   Núria.
  
   2009/12/9 Núria Trench nuriatre...@gmail.com
  
   Yes, you are right. But there is one csv file that is too big to be packed
   with other files and I am reducing it.
   I am sending the other files now.
  
   2009/12/9 Mattias Persson matt...@neotechnology.com
  
   By the way, you might consider packing those files (with zip or tar.gz
   or something) cause they will shrink quite well
  
   2009/12/9 Mattias Persson matt...@neotechnology.com:
Great, but I only got the images.csv file... I'm starting to test with
that at least
   
2009/12/9 Núria Trench nuriatre...@gmail.com:
Hi again,
   
The errors show up after parsing 2 csv files to create all the nodes,
just at the moment of calling the method getSingleNode to look up the
tail and head nodes for creating all the edges by reading the other two
csv files.

I am sending with Sprend the four csv files that will help you to
trigger the index behaviour.
   
Thank you,
   
Núria.
   
2009/12/9 Mattias Persson matt...@neotechnology.com
   
Hmm, I've no idea... but do the errors show up early in the process
or do you have to insert a LOT of data to trigger it? In such case you
could send me a part of them... maybe using http://www.sprend.se,
WDYT?
   
2009/12/9 Núria Trench nuriatre...@gmail.com:
 Hi Mattias,

 The data isn't confidential but the files are very big (5.5 GB).
 

Re: [Neo] Persistance of nodeID

2009-12-26 Thread Mattias Persson
2009/12/26 Laurent Laborde kerdez...@gmail.com:
 I don't really understand the behaviour of the database if I have
 nested transactions,
 or what the database does if the program crashes... it cleans all
 unfinished transactions from the DB, right ?
Read more about transactions in neo4j (and about nested transactions)
here: http://wiki.neo4j.org/content/Neo_Transactions

Yep, any uncommitted transactions will be rolled back (not committed).
Transactions which are half-way through the commit process will be
fully committed if possible, else fully rolled back.
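
To make the "forgotten wrapping transaction" worry concrete, here is a toy in-memory model of the behaviour described above. It is only a sketch and does not use the Neo4j API: `ToyGraphDb`, `ToyTx`, `begin_tx` and `load` are hypothetical names invented for the example, and the model assumes "flat" nesting, where an inner transaction simply joins the outer one and nothing is committed until the outermost one finishes successfully. Check the wiki page for the actual rules.

```python
class ToyGraphDb:
    """Toy in-memory store with flat nested transactions (NOT the Neo4j API)."""

    def __init__(self):
        self.committed = []    # items that survived a top-level commit
        self._pending = []     # writes made inside the open transaction
        self._depth = 0        # number of begin_tx() calls not yet finished
        self._failed = False   # True once any level finished without success()

    def begin_tx(self):
        self._depth += 1
        return ToyTx(self)

    def write(self, item):
        if self._depth == 0:
            raise RuntimeError("write outside a transaction")
        self._pending.append(item)


class ToyTx:
    def __init__(self, db):
        self._db = db
        self._success = False

    def success(self):
        """Mark this level as successful; nothing is committed yet."""
        self._success = True

    def finish(self):
        db = self._db
        if not self._success:
            db._failed = True       # one failed level poisons the whole tx
        db._depth -= 1
        if db._depth == 0:          # only the outermost finish() commits
            if not db._failed:
                db.committed.extend(db._pending)
            db._pending = []
            db._failed = False


def load(db, items, batch_size):
    """Write items in batches, one transaction per batch."""
    for start in range(0, len(items), batch_size):
        tx = db.begin_tx()
        try:
            for item in items[start:start + batch_size]:
                db.write(item)
            tx.success()
        finally:
            tx.finish()


# Case 1: each batch is a top-level transaction, so every batch is committed.
db = ToyGraphDb()
load(db, list(range(5)), batch_size=2)
print(db.committed)

# Case 2: a forgotten outer transaction wraps the whole load. The inner
# finish() calls commit nothing, so a crash here would lose every write.
wrapped = ToyGraphDb()
outer = wrapped.begin_tx()
load(wrapped, list(range(5)), batch_size=2)
print(wrapped.committed)
```

In this model, committing in batches bounds the loss from a crash to at most one batch, while a single wrapping transaction puts the whole load at risk.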

 Oh and ... Merry Christmas :)
Merry Christmas :)


-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com