Re: [Neo] Persistence of nodeID
On Sat, Nov 21, 2009 at 11:35 AM, Laurent Laborde kerdez...@gmail.com wrote:

> My neo4j database won't be critical for production; it can go down and/or crash at any time without breaking the production platform. But... populating this database will take a lot of time and some useful resources, so I'll try not to break it by using a too-unstable API :)
>
> I'm not 100% sure about the content of every node, but I'm thinking a few GB at minimum... up to hundreds of GB. I'll see... it's an R&D project anyway. But if it performs well and efficiently (I mean: faster and cheaper in resources than doing the same thing on our PostgreSQL cluster), the database could be well over 100 GB, not including the FTS index that I'm planning to use too :)
>
> I'm not 100% sure about what I'm doing here; that's all the fun of R&D. Rest assured that you will get as much feedback as possible, as long as it doesn't disclose the goals of the project or the content of my database (NDA, etc...).

While I'm on vacation I have some time to play with Neo. After some cleaning and testing, I'm finally populating the database with real data. My code is not 100% fail-safe, but the program that populates the database has been running non-stop for 3 days without a single error.

The next step will be exploiting the content of the database. Neoclipse gave up a long time ago; I'm expecting around 20,000 nodes and... mmm... a million relationships (of different types)? The database directory is currently 4.5 GB... I hope I didn't forget a transaction somewhere that wraps everything and will roll back ~4 GB of data if the program dies :(

I don't really understand the behaviour of the database if I have nested transactions, or what the database does if the program crashes... it cleans all unfinished transactions from the DB, right?

Oh, and... Merry Christmas :)

--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
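One way to avoid a single giant transaction wrapping the whole import is to commit in batches, so a crash loses at most one batch rather than ~4 GB of pending work. A minimal sketch, assuming the 1.x-era embedded Java API (`EmbeddedGraphDatabase`, `beginTx()`/`success()`/`finish()`); the store path and `BATCH_SIZE` value are illustrative:

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class BatchedImport {
    // Illustrative batch size; tune to available heap.
    private static final int BATCH_SIZE = 10000;

    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("var/graphdb");
        Transaction tx = db.beginTx();
        try {
            for (int i = 0; i < 1000000; i++) {
                Node node = db.createNode();
                node.setProperty("name", "node-" + i);
                if ((i + 1) % BATCH_SIZE == 0) {
                    tx.success();      // mark this batch for commit
                    tx.finish();       // commit it
                    tx = db.beginTx(); // start the next batch
                }
            }
            tx.success(); // commit the final partial batch
        } finally {
            tx.finish();
        }
        db.shutdown();
    }
}
```

Anything committed before a crash stays committed; only the in-flight batch is rolled back on recovery.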
Re: [Neo] Persistence of nodeID
Hi Laurent!

> The next step will be exploiting the content of the database. Neoclipse gave up a long time ago; I'm expecting around 20,000 nodes and... mmm... a million relationships (of different types)?

What limits Neoclipse is the number of nodes/relationships that can be viewed at the same time. After only a few hundred of those, the GUI library it uses runs into problems (and should probably be replaced by something else). Keep the traversal depth as low as possible, and if that isn't enough, use some other tool like the Neo4j Shell: http://wiki.neo4j.org/content/Shell

/anders
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Núria,

The current ID scheme uses integers as IDs for nodes, relationships and properties alike, which limits the store to about 4 billion nodes, 4 billion relationships and 4 billion properties. One could of course switch to longs as IDs, but that would increase the number of reserved bytes and could incur performance penalties. So that is the current limit; beyond it you have to start thinking about sharding along a suitable domain-specific criterion. What size and domain are you imagining?

When dealing with bigger node spaces, you will probably also want to add RAM to your server and think about SSDs, in order to keep the often-used parts of your graph cached and minimize IO cost.

HTH

Cheers,

/peter neubauer
COO and Sales, Neo Technology
GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer
http://www.neo4j.org - Relationships count.
http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
http://www.linkedprocess.org - Computing at LinkedData scale.

On Sat, Dec 26, 2009 at 4:10 PM, Núria Trench nuriatre...@gmail.com wrote:

> Hi,
>
> I have just finished parsing and creating the database with the latest index-util-0.9-SNAPSHOT available in your repository. It finished successfully, so I must thank you for your interest and useful help.
>
> And, finally, one last question: I have created 180 million edges and 20 million nodes. Is it possible to create a bigger number of edges and nodes with Neo4j? Do you have a limit?
>
> Thank you very much again.

2009/12/21 Núria Trench nuriatre...@gmail.com

> Hi again Mattias,
>
> I'm still trying to parse all the data in order to create the graph. I will report the results as soon as possible. Thank you very much for your interest.
>
> Núria.

2009/12/21 Mattias Persson matt...@neotechnology.com

> Hi again, any luck with this yet?
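The 4 billion figure follows directly from the 32-bit ID width. A quick back-of-the-envelope check in plain Java (no Neo4j required); the 180-million-edge figure is the import size from the question:

```java
public class IdSpace {
    public static void main(String[] args) {
        // A 32-bit unsigned ID can address 2^32 distinct records.
        long idSpace = 1L << 32;
        System.out.println(idSpace); // 4294967296

        // How many 180-million-edge imports fit before the
        // relationship ID space is exhausted?
        long loads = idSpace / 180000000L;
        System.out.println(loads); // 23
    }
}
```

So a 180M-edge / 20M-node graph uses only a few percent of the available ID space per record type.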
2009/12/11 Núria Trench nuriatre...@gmail.com:

> Thank you very much, Mattias. I will test it as soon as possible and will tell you how it goes.
>
> Núria.

2009/12/11 Mattias Persson matt...@neotechnology.com:

> I've tried this a couple of times now, and first of all I see some problems in your code:
>
> 1) In the method createRelationsTitleImage you have an inverted head != -1 check where it should be head == -1.
> 2) You index relationships in the createRelationsBetweenTitles method; this isn't ok, since the index can only manage nodes.
>
> I also recently committed a fix which removes the caching layer in LuceneIndexBatchInserterImpl (and therefore also in LuceneFulltextIndexBatchInserter). This probably fixes your problems. I'm also working on a performance fix which makes consecutive getNodes calls faster.
>
> So I think that with fixes (1) and (2) and the latest index-util 0.9-SNAPSHOT your sample will run fine. You could also try without calling optimize. See more information at http://wiki.neo4j.org/content/Indexing_with_BatchInserter

2009/12/10 Mattias Persson matt...@neotechnology.com:

> To continue this thread on the user list: thanks Núria, I've got your sample code/files and I'm running them now to try to reproduce your problem.

2009/12/9 Núria Trench nuriatre...@gmail.com:

> I have finished uploading the 4 csv files. You'll see an e-mail with the other 3 csv files packed in a rar file.
>
> Thanks, Núria.

2009/12/9 Núria Trench nuriatre...@gmail.com

> Yes, you are right. But there is one csv file that is too big to be packed with the other files, and I am reducing it. I am sending the other files now.

2009/12/9 Mattias Persson matt...@neotechnology.com

> By the way, you might consider packing those files (with zip or tar.gz or something), because they will shrink quite well.

2009/12/9 Mattias Persson matt...@neotechnology.com:

> Great, but I only got the images.csv file... I'm starting to test with that, at least.

2009/12/9 Núria Trench nuriatre...@gmail.com:

> Hi again,
>
> The errors show up after the first 2 csv files have been parsed to create all the nodes, at the moment getSingleNode is called to look up the tail and head nodes for creating all the edges by reading the other two csv files. I am sending the four csv files that will help you trigger the index behaviour via Sprend.
>
> Thank you, Núria.

2009/12/9 Mattias Persson matt...@neotechnology.com:

> Hmm, I've no idea... but do the errors show up early in the process, or do you have to insert a LOT of data to trigger them? In that case you could send me a part of the files... maybe using http://www.sprend.se, WDYT?

2009/12/9 Núria Trench nuriatre...@gmail.com:

> Hi Mattias,
>
> The data isn't confidential, but the files are very big (5.5 GB).
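The pattern under discussion — indexing nodes during batch insertion, then looking tail/head nodes up by key while reading the edge CSVs — looked roughly like this with the 2009-era BatchInserter and index-util APIs. This is a sketch, not Núria's actual code: the package names are approximate (they moved between releases), and `readCsv`, the `title_id` key, the CSV column layout and `MyRelTypes` are all invented for illustration. The `== -1` guard is the corrected form of bug (1) above, since `getSingleNode` returns -1 when no node matches:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.neo4j.graphdb.RelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;
import org.neo4j.index.lucene.LuceneIndexBatchInserter;
import org.neo4j.index.lucene.LuceneIndexBatchInserterImpl;

public class CsvBatchImport {
    // Hypothetical relationship type for the sketch.
    enum MyRelTypes implements RelationshipType { REFERENCES }

    public static void main(String[] args) {
        BatchInserter inserter = new BatchInserterImpl("target/graphdb");
        LuceneIndexBatchInserter index =
                new LuceneIndexBatchInserterImpl(inserter);

        // Pass 1: nodes. Index each node by its CSV id so that
        // edge creation can find it later.
        for (String[] row : readCsv("titles.csv")) {
            Map<String, Object> props = new HashMap<String, Object>();
            props.put("title_id", row[0]);
            long node = inserter.createNode(props);
            index.index(node, "title_id", row[0]);
        }
        index.optimize(); // optional; the thread suggests trying without it

        // Pass 2: edges. getSingleNode returns -1 for "not found",
        // so skip rows whose endpoints were never created.
        for (String[] row : readCsv("edges.csv")) {
            long tail = index.getSingleNode("title_id", row[0]);
            long head = index.getSingleNode("title_id", row[1]);
            if (tail == -1 || head == -1) {
                continue;
            }
            inserter.createRelationship(tail, head,
                    MyRelTypes.REFERENCES, null);
        }

        index.shutdown();
        inserter.shutdown();
    }

    // Hypothetical helper; a real importer would stream the file.
    static List<String[]> readCsv(String file) {
        throw new UnsupportedOperationException("sketch only");
    }
}
```

Note that only nodes go into the index (fix (2) above); relationships are created but never indexed.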
Re: [Neo] Persistence of nodeID
2009/12/26 Laurent Laborde kerdez...@gmail.com:

> [earlier message snipped]
>
> I don't really understand the behaviour of the database if I have nested transactions, or what the database does if the program crashes... it cleans all unfinished transactions from the DB, right?
Read more about transactions in Neo4j (including nested transactions) here: http://wiki.neo4j.org/content/Neo_Transactions

Yep, any uncommitted transactions will be rolled back (not committed). Transactions that are halfway through the commit process will be fully committed if possible, otherwise fully rolled back.

> Oh, and... Merry Christmas :)

Merry Christmas :)

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
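On the nested-transaction question specifically: the embedded Java API supports only flat nesting — an inner `beginTx()` joins the transaction already running in that thread, and nothing hits disk until the outermost transaction commits. A minimal sketch with the 1.x-era API (store path illustrative):

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class NestedTxDemo {
    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("var/graphdb");
        Transaction outer = db.beginTx();
        try {
            db.createNode(); // pending; nothing is on disk yet
            Transaction inner = db.beginTx(); // joins the outer transaction
            try {
                db.createNode();
                inner.success(); // only a vote; does not commit by itself
            } finally {
                inner.finish();
            }
            outer.success(); // both nodes commit here, atomically
        } finally {
            outer.finish(); // without success(), everything rolls back
        }
        db.shutdown();
    }
}
```

If the process dies before `outer.finish()` completes, recovery on the next startup rolls the whole thing back — which is also why the batched-commit import pattern matters: only the outermost commit makes data durable.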