Re: [Neo] LuceneIndexBatchInserter doubt
Hi Peter,

The limits that you have specified are enough for me. Thank you again.

Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Núria,

The current ID scheme, which uses integers as IDs for nodes, relationships and properties, limits the possible node space to 4 billion nodes, 4 billion relationships and 4 billion properties. One could of course switch to longs as IDs, but that would increase the reserved number of bytes and could cause performance penalties. However, this is the current limit; after that you have to start thinking about sharding along a suitable domain-specific criterion. What size and domain are you imagining?

When dealing with bigger node spaces you probably also want to increase the RAM of your server machine and think about SSDs, in order to keep the often-used parts of your graph cached and minimize IO cost.

HTH

Cheers,

/peter neubauer
COO and Sales, Neo Technology
GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Relationships count.
http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
http://www.linkedprocess.org - Computing at LinkedData scale.
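As a back-of-the-envelope check on the 4 billion figure above (plain Java arithmetic, not Neo4j API code): a 32-bit ID can address 2^32, roughly 4.29 billion, distinct records, and widening IDs to 64 bits trades a vastly larger space for extra reserved bytes per ID reference in every record, which is the penalty Peter mentions.

```java
// Sketch: addressable ID space for 32-bit vs 64-bit IDs.
// This is plain arithmetic, not Neo4j code.
class IdSpace {
    public static void main(String[] args) {
        long intIds = 1L << 32;             // IDs representable in 32 bits
        System.out.println(intIds);         // 4294967296, i.e. ~4.29 billion
        System.out.println(Long.MAX_VALUE); // 9223372036854775807 with signed 64-bit IDs
        // Doubling the ID width also doubles the bytes reserved per ID
        // reference on disk -- the trade-off behind staying with ints.
    }
}
```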
Re: [Neo] LuceneIndexBatchInserter doubt
Hi,

I have just finished parsing and creating the database with the latest index-util-0.9-SNAPSHOT available in your repository. It finished successfully, so I must thank you for your interest and useful help.

And, finally, one last question: I have created 180 million edges and 20 million nodes. Is it possible to create a bigger number of edges and nodes with Neo4j? Is there a limit?

Thank you very much again.

Núria.

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again Mattias,

I'm still trying to parse all the data in order to create the graph. I will report the results as soon as possible.

Thank you very much for your interest.

Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again,

Any luck with this yet?

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
Re: [Neo] LuceneIndexBatchInserter doubt
Thank you very much Mattias. I will test it as soon as possible and I'll tell you something.

Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
I've tried this a couple of times now, and first of all I see some problems in your code:

1) In the method createRelationsTitleImage you have an inverted "head != -1" check where it should be "head == -1".

2) You index relationships in the createRelationsBetweenTitles method; this isn't OK, since the index can only manage nodes.

I also recently committed a "fix" which removed the caching layer in LuceneIndexBatchInserterImpl (and therefore also in LuceneFulltextIndexBatchInserter). This probably fixes your problems. I'm also working on a performance fix which makes consecutive getNodes calls faster.

So I think that with fixes (1) and (2) and the latest index-util 0.9-SNAPSHOT your sample will run fine. You could also try without calling optimize. See more information at http://wiki.neo4j.org/content/Indexing_with_BatchInserter

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
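Fix (1) above amounts to skipping a CSV row whenever the index lookup fails. A minimal sketch of the corrected guard follows; the class, field, and method names are hypothetical, and a plain HashMap stands in for the batch index's getSingleNode, which (as described in this thread) returns -1 when no node matches:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the corrected lookup guard from fix (1).
// The HashMap is a stand-in for LuceneIndexBatchInserterImpl.getSingleNode.
class RelationLoader {
    final Map<String, Long> titleIndex = new HashMap<>();

    // Returns the node id indexed under a title, or -1 when none is found,
    // mirroring the -1 convention of getSingleNode.
    long getSingleNode(String title) {
        return titleIndex.getOrDefault(title, -1L);
    }

    // Corrected guard: skip the row when head (or tail) was NOT found.
    // The buggy version inverted this to "head != -1", skipping valid rows.
    boolean shouldSkipRow(long head, long tail) {
        return head == -1 || tail == -1;
    }

    public static void main(String[] args) {
        RelationLoader loader = new RelationLoader();
        loader.titleIndex.put("SomeTitle", 42L);
        long head = loader.getSingleNode("SomeTitle");    // found: 42
        long tail = loader.getSingleNode("MissingTitle"); // not found: -1
        System.out.println(loader.shouldSkipRow(head, tail)); // true: don't create this edge
    }
}
```

The same guard applies symmetrically to the tail lookup; only when both lookups succeed should the relationship be created.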
Re: [Neo] LuceneIndexBatchInserter doubt
To continue this thread on the user list:

Thanks Núria, I've gotten your sample code/files and I'm running it now to try to reproduce your problem.
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, I have already done it 10 minutes ago. If you need an example to see the format of the 4 csv files, I can send it to you. Thanks again, Núria.

2009/12/9 Mattias Persson
> Oh ok, It could be our attachments filter / security or something...
> could you try to mail them to me directly at matt...@neotechnology.com ?
> [...]
Re: [Neo] LuceneIndexBatchInserter doubt
Oh ok, it could be our attachments filter / security or something... could you try to mail them to me directly at matt...@neotechnology.com ?

2009/12/9 Núria Trench :
> Hi Mattias,
>
> In my last e-mail I have attached the sample code, haven't you received it?
> I will try to attach it again.
>
> Núria.
> [...]
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, In my last e-mail I have attached the sample code, haven't you received it? I will try to attach it again. Núria.

2009/12/9 Mattias Persson
> Hi again, Núria (it was I, Mattias who asked for the sample code).
> Well... the fact that you parse 4 csv files doesn't really help me
> set up a test for this... I mean how can I know that my test will be
> similar to yours? Would it be ok to attach your code/csv files as
> well?
>
> / Mattias
> [...]
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again, Núria (it was I, Mattias, who asked for the sample code). Well... the fact that you parse 4 csv files doesn't really help me set up a test for this... I mean, how can I know that my test will be similar to yours? Would it be ok to attach your code/csv files as well?

/ Mattias

2009/12/9 Núria Trench :
> Hi Todd,
>
> The sample code creates nodes and relationships by parsing 4 csv files.
> Thank you for trying to trigger this behaviour with this sample.
>
> Núria
> [...]
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Todd, The sample code creates nodes and relationships by parsing 4 csv files. Thank you for trying to trigger this behaviour with this sample. Núria

2009/12/9 Mattias Persson
> Could you provide me with some sample code which can trigger this
> behaviour with the latest index-util-0.9-SNAPSHOT, Núria?
> [...]
Re: [Neo] LuceneIndexBatchInserter doubt
Could you provide me with some sample code which can trigger this behaviour with the latest index-util-0.9-SNAPSHOT, Núria?

2009/12/9 Núria Trench :
> Todd,
>
> I don't have the same problem. In my case, after indexing all the
> attributes/properties of each node, the application creates all the edges
> by looking up the tail node and the head node. So, it calls the method
> "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which
> returns -1 (node not found) in many occasions.
>
> Does anyone have an alternative way to get a node by its indexed
> attributes/properties?
>
> Thank you,
>
> Núria.
> [...]
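As the quoted message describes, getSingleNode signals a missing node by returning -1 rather than throwing, so the sentinel is easy to pass straight into relationship creation unnoticed. Below is a minimal sketch of the defensive check; the HashMap-backed class is a hypothetical stand-in for the Lucene index, not the Neo4j API, and only the -1 convention is taken from the thread:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for LuceneIndexBatchInserterImpl's lookup:
// getSingleNode returns the node id for a key, or -1 when nothing is indexed.
class EdgeLoader {
    private final Map<String, Long> index = new HashMap<>();
    int skipped = 0;

    void indexNode(String key, long id) {
        index.put(key, id);
    }

    long getSingleNode(String key) {
        return index.getOrDefault(key, -1L);
    }

    // Guard against the -1 sentinel instead of passing it on to
    // relationship creation, which would silently produce bad edges.
    boolean createEdge(String tailKey, String headKey) {
        long tail = getSingleNode(tailKey);
        long head = getSingleNode(headKey);
        if (tail == -1L || head == -1L) {
            skipped++; // a real loader would log the missing keys here
            return false;
        }
        // batchInserter.createRelationship(tail, head, ...) would go here
        return true;
    }
}
```

In a real loader the skipped keys would be logged, so a lookup bug like the one in this thread surfaces immediately instead of as silently absent edges.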
Re: [Neo] LuceneIndexBatchInserter doubt
Todd,

I don't have the same problem. In my case, after indexing all the attributes/properties of each node, the application creates all the edges by looking up the tail node and the head node. So, it calls the method "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which returns -1 (node not found) in many occasions.

Does anyone have an alternative way to get a node by its indexed attributes/properties?

Thank you,

Núria.

2009/12/7 Mattias Persson
> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
> is a bug that we fixed yesterday... (assuming it's the same bug).
>
> 2009/12/7 Todd Stavish :
> > Hi Mattias, Núria.
> >
> > I am also running into scalability problems with the Lucene batch
> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
> > calling optimize more. Increasing ulimit didn't help.
> >
> > [INFO] Exception in thread "main" java.lang.RuntimeException:
> > java.io.FileNotFoundException:
> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> > (Too many open files)
> > [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> > [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> > [INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> > [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> > [INFO] Caused by: java.io.FileNotFoundException:
> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> > (Too many open files)
> >
> > I tried breaking up into separate batch inserter instances, and it hangs
> > now. Can I create more than one batch inserter per process if they run
> > sequentially and non-threaded?
> >
> > Thanks,
> > Todd
> >
> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <nuriatre...@gmail.com> wrote:
> >> Hi again Mattias,
> >>
> >> I have tried to execute my application with the last version available in
> >> the maven repository and I still have the same problem. After creating and
> >> indexing all the nodes, the application calls the "optimize" method and,
> >> then, it creates all the edges by calling the method "getNodes" in order to
> >> select the tail and head node of the edge, but it doesn't work because many
> >> nodes are not found.
> >>
> >> I have tried to create only 30 nodes and 15 edges and it works properly, but
> >> if I try to create a big graph (180 million edges + 20 million nodes) it
> >> doesn't.
> >>
> >> I have also tried to call the "optimize" method every time the application
> >> has created 1 million nodes, but it doesn't work.
> >>
> >> Have you tried to create as many nodes as I have said with the newer
> >> index-util version?
> >>
> >> Thank you,
> >>
> >> Núria.
> >>
> >> 2009/12/4 Núria Trench
> >>
> >>> Hi Mattias,
> >>>
> >>> Thank you very much for fixing the problem so fast. I will try it as soon
> >>> as the new changes are available in the maven repository.
> >>>
> >>> Núria.
> >>>
> >>> 2009/12/4 Mattias Persson
> >>>
> >>>> I fixed the problem and also added a cache per key for faster
> >>>> getNodes/getSingleNode lookup during the insert process. However, the
> >>>> cache assumes that there's nothing in the index when the process
> >>>> starts (which almost always will be true) to speed things up even
> >>>> further.
> >>>>
> >>>> You can control the cache size, and whether it should be used, by
> >>>> overriding the following methods in your LuceneIndexBatchInserterImpl
> >>>> instance (this is also documented in the Javadoc):
> >>>>
> >>>> boolean useCache()
> >>>> int getMaxCacheSizePerKey()
> >>>>
> >>>> The new changes should be available in the maven repository within an hour.
> >>>>
> >>>> 2009/12/4 Mattias Persson :
> >>>> > I think I found the problem... it's indexing as it should, but it
> >>>> > isn't reflected in getNodes/getSingleNode properly until you
> >>>> > flush/optimize/shutdown the index. I'll try to fix it today!
> >>>> >
> >>>> > 2009/12/3 Núria Trench :
> >>>> >> Thank you very much for your response.
> >>>> >> If you need more information, you only have to send an e-mail and I
> >>>> >> will try to explain it better.
> >>>> >>
> >>>> >> Núria.
> >>>> >>
> >>>> >> 2009/12/3 Mattias Persson
> >>>> >>
> >>>> >>> This is something I'd like to reproduce and I'll do some testing on
> >>>> >>> this tomorrow
> >>>> >>>
> >>>> >>> 2009/12/3 Núria Trench :
> >>>> >>> > Hello,
> >>>> >>> >
> >>>> >>> > Last week, I decided to download your graph database core in order
> >>>> >>> > to use it. First, I created a new project to parse my CSV files and
> >>>> >>> > create a new graph database with Neo4j. These CSV files contain 150
> >>>> >>> > million edges and 20 million nodes.
> >>>> >>> >
> >>>> >>> > When I finished writing the code which creates the graph database,
> >>>> >>> > I executed it and, after six
Re: [Neo] LuceneIndexBatchInserter doubt
Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This is a bug that we fixed yesterday... (assuming it's the same bug).
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, Núria.

I am also running into scalability problems with the Lucene batch inserter at much smaller numbers: 30,000 indexed nodes. I tried calling optimize more; increasing ulimit didn't help.

[INFO] Exception in thread "main" java.lang.RuntimeException:
java.io.FileNotFoundException:
/Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
(Too many open files)
[INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
[INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
[INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
[INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
[INFO] Caused by: java.io.FileNotFoundException:
/Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
(Too many open files)

I tried breaking it up into separate BatchInserter instances, and it hangs now. Can I create more than one batch inserter per process if they run sequentially and non-threaded?

Thanks,
Todd
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again Mattias,

I have tried to execute my application with the latest version available in the Maven repository and I still have the same problem. After creating and indexing all the nodes, the application calls the "optimize" method and then creates all the edges by calling "getNodes" to select the tail and head node of each edge, but it doesn't work because many nodes are not found.

I have tried to create only 30 nodes and 15 edges and it works properly, but if I try to create a big graph (180 million edges + 20 million nodes) it doesn't.

I have also tried to call "optimize" every time the application has created 1 million nodes, but it doesn't work.

Have you tried creating as many nodes as this with the newer index-util version?

Thank you,

Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias,

Thank you very much for fixing the problem so fast. I will try it as soon as the new changes are available in the Maven repository.

Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
I fixed the problem and also added a cache per key for faster getNodes/getSingleNode lookup during the insert process. However, the cache assumes that there's nothing in the index when the process starts (which will almost always be true) to speed things up even further.

You can control the cache size, and whether the cache is used at all, by overriding the following methods (also documented in the Javadoc) in your LuceneIndexBatchInserterImpl instance:

boolean useCache()
int getMaxCacheSizePerKey()

The new changes should be available in the Maven repository within an hour.

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
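A minimal sketch of the overriding described above, assuming the index-util 0.9-SNAPSHOT API named in this thread; the `inserter` variable (an already-created `BatchInserter`) and the cache size are illustrative assumptions, not taken from the thread:

```java
import org.neo4j.util.index.LuceneIndexBatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

// Anonymous subclass tuning the per-key cache (constructor argument
// and values are assumptions).
LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl( inserter )
{
    @Override
    public boolean useCache()
    {
        // Safe only because the index is empty when the process starts.
        return true;
    }

    @Override
    public int getMaxCacheSizePerKey()
    {
        return 1000000; // illustrative value; size to your memory budget
    }
};
```

The trade-off is straightforward: a larger per-key cache keeps more key/value-to-node mappings in memory, so getSingleNode during edge creation avoids hitting Lucene at all.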
Re: [Neo] LuceneIndexBatchInserter doubt
I think I found the problem... it's indexing as it should, but it isn't reflected in getNodes/getSingleNode properly until you flush/optimize/shutdown the index. I'll try to fix it today!

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
Re: [Neo] LuceneIndexBatchInserter doubt
Thank you very much for your response. If you need more information, just send an e-mail and I will try to explain it better.

Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
This is something I'd like to reproduce, and I'll do some testing on this tomorrow.

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
[Neo] LuceneIndexBatchInserter doubt
Hello,

Last week, I decided to download your graph database core in order to use it. First, I created a new project to parse my CSV files and create a new graph database with Neo4j. These CSV files contain 150 million edges and 20 million nodes.

When I finished writing the code that creates the graph database, I executed it and, after six hours of execution, the program crashed because of a Lucene exception. The exception is related to index merging and has the following message:
"mergeFields produced an invalid result: docCount is 385282378 but fdx file size is 3082259028; now aborting this merge to prevent index corruption"

I have searched on the net and found that it is a Lucene bug. The libraries used for executing my project were:
neo-1.0-b10
index-util-0.7
lucene-core-2.4.0

So, I decided to use a newer Lucene version. I found that you have a newer index-util version, so I updated the libraries:
neo-1.0-b10
index-util-0.9
lucene-core-2.9.1

When I had updated those libraries, I tried to execute my project again and I found that, on many occasions, it was not indexing properly. So, I tried to optimize the index every time I indexed something. This worked, because after that it was indexing properly, but the execution time increased a lot.

I am not using transactions; instead, I am using the BatchInserter with the LuceneIndexBatchInserter.

So, my question is: what can I do to solve this problem? If I use index-util-0.7 I cannot finish creating the graph database, and if I use index-util-0.9 I have to optimize the index on every insertion and the execution never ends.

Thank you very much in advance,

Núria.
___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
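For reference, the batch-insertion flow discussed in this thread can be sketched as below. This is only a sketch built from the class and method names that appear in the thread (LuceneIndexBatchInserterImpl, index, optimize, getNodes/getSingleNode); the package of BatchInserterImpl, the store path, and the property key are my assumptions about the 1.0-b10-era API, not verified against it:

```java
import java.util.HashMap;
import java.util.Map;

// Package names below are assumed for the 1.0-b10-era API.
import org.neo4j.impl.batchinsert.BatchInserter;
import org.neo4j.impl.batchinsert.BatchInserterImpl;
import org.neo4j.util.index.LuceneIndexBatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

public class CsvImportSketch
{
    public static void main( String[] args )
    {
        BatchInserter inserter = new BatchInserterImpl( "target/graphdb" );
        LuceneIndexBatchInserter index =
                new LuceneIndexBatchInserterImpl( inserter );

        // Pass 1: create and index all nodes (loop over the CSV rows here).
        Map<String, Object> props = new HashMap<String, Object>();
        props.put( "id", "n1" );                    // illustrative key/value
        long nodeId = inserter.createNode( props );
        index.index( nodeId, "id", "n1" );

        // Optimize ONCE after all nodes are indexed, not per insertion:
        // as noted in this thread, lookups don't reflect new entries
        // until a flush/optimize/shutdown.
        index.optimize();

        // Pass 2: resolve endpoints and create the edges.
        long tail = index.getSingleNode( "id", "n1" ); // -1 means not found
        // if ( tail != -1 ) inserter.createRelationship( tail, head, type, null );

        index.shutdown();
        inserter.shutdown();
    }
}
```

The key point the thread converges on is the placement of optimize(): once between the node pass and the edge pass, rather than after every insertion.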