Re: [Neo] LuceneIndexBatchInserter doubt
Hi Núria,

The current ID scheme, which uses integers as IDs for nodes, relationships and properties alike, limits the possible store size to 4 billion nodes, 4 billion relationships and 4 billion properties. One could of course switch to longs as IDs, but that would increase the number of reserved bytes and could introduce performance penalties. That is the current limit, however; beyond it you have to start thinking about sharding along a suitable domain-specific criterion. What size and domain are you imagining?

When dealing with bigger node spaces you will probably also want to increase the RAM of your server machine and think about SSDs, in order to keep the often-used parts of your graph cached and to minimize IO cost.

HTH

Cheers,
/peter neubauer
COO and Sales, Neo Technology

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Relationships count.
http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
http://www.linkedprocess.org - Computing at LinkedData scale.

On Sat, Dec 26, 2009 at 4:10 PM, Núria Trench nuriatre...@gmail.com wrote:
> Hi, I have just finished parsing and creating the database with the latest
> index-util-0.9-SNAPSHOT available in your repository. It finished
> successfully, so I must thank you for your interest and useful help. And,
> finally, I have one last question: I have created 180 million edges and
> 20 million nodes. Is it possible to create a larger number of edges and
> nodes with Neo4j? Do you have a limit? Thank you very much again.
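As a quick sanity check on those numbers: an unsigned 32-bit ID space holds 2^32 ≈ 4.29 billion entries, so the graph discussed in this thread (20 million nodes, 180 million relationships) uses well under 5% of it. A minimal back-of-the-envelope calculation in plain Java (no Neo4j APIs involved; class and variable names are illustrative only):

```java
public class IdHeadroom {
    public static void main(String[] args) {
        long idSpace = 1L << 32;            // 32-bit IDs: 4,294,967,296 slots per store
        long nodes = 20_000_000L;           // node count from the thread
        long relationships = 180_000_000L;  // relationship count from the thread

        // Nodes and relationships have separate ID spaces, so each count is
        // checked against the full ~4 billion limit on its own.
        System.out.printf("node space used: %.2f%%%n", 100.0 * nodes / idSpace);
        System.out.printf("rel space used:  %.2f%%%n", 100.0 * relationships / idSpace);
    }
}
```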
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again Mattias,

I'm still trying to parse all the data in order to create the graph. I will report the results as soon as possible. Thank you very much for your interest.

Núria.

2009/12/21 Mattias Persson matt...@neotechnology.com:
> Hi again, any luck with this yet?
>
> 2009/12/11 Núria Trench nuriatre...@gmail.com:
>> Thank you very much Mattias. I will test it as soon as possible and I'll
>> tell you something. Núria.

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] LuceneIndexBatchInserter doubt
I've tried this a couple of times now, and first of all I see some problems in your code:

1) In the method createRelationsTitleImage you have an inverted check: head != -1 where it should be head == -1.

2) You index relationships in the createRelationsBetweenTitles method. That isn't OK, since the index can only manage nodes.

Also, I recently committed a fix which removed the caching layer in LuceneIndexBatchInserterImpl (and therefore also LuceneFulltextIndexBatchInserter). This probably fixes your problems. I'm also working on a performance fix which makes consecutive getNodes calls faster. So I think that with fixes (1) and (2) and the latest index-util 0.9-SNAPSHOT your sample will run fine. You could also try without calling optimize. See more information at http://wiki.neo4j.org/content/Indexing_with_BatchInserter
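The first point above is the classic pattern around getSingleNode returning -1 for a missing node. A minimal, self-contained sketch of the corrected guard; note this is not the index-util code: the class name is invented and a plain HashMap stands in for the Lucene index, only the -1 convention mirrors the real getSingleNode:

```java
import java.util.HashMap;
import java.util.Map;

public class LookupGuard {
    // Stand-in for the index: real code would call
    // LuceneIndexBatchInserterImpl.getSingleNode(key, value) instead.
    static final Map<String, Long> titleIndex = new HashMap<>();

    // Mirrors the batch inserter's convention: -1 means "no node found".
    static long getSingleNode(String title) {
        return titleIndex.getOrDefault(title, -1L);
    }

    public static void main(String[] args) {
        titleIndex.put("Cat", 42L);

        long head = getSingleNode("Dog"); // not indexed -> -1
        if (head == -1) {
            // Corrected guard: act on the MISSING case. The inverted check
            // (head != -1) would instead skip every edge whose head node
            // actually existed.
            System.out.println("head not found, skipping this edge");
        }
    }
}
```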
Re: [Neo] LuceneIndexBatchInserter doubt
To continue this thread in the user list: thanks Núria, I've gotten your sample code/files and I'm running it now to try to reproduce your problem.

2009/12/9 Núria Trench nuriatre...@gmail.com:
> I have finished uploading the 4 csv files. You'll see an e-mail with the
> other 3 csv files packed in a rar file. Thanks, Núria.

2009/12/9 Núria Trench nuriatre...@gmail.com:
> Yes, you are right. But there is one csv file that is too big to be packed
> with the other files, and I am reducing it. I am sending the other files
> now.

2009/12/9 Mattias Persson matt...@neotechnology.com:
> By the way, you might consider packing those files (with zip or tar.gz or
> something) because they will shrink quite well.

2009/12/9 Mattias Persson matt...@neotechnology.com:
> Great, but I only got the images.csv file... I'm starting to test with
> that at least.

2009/12/9 Núria Trench nuriatre...@gmail.com:
> Hi again, the errors show up after two csv files have been parsed to
> create all the nodes, at the moment the method getSingleNode is called to
> look up the tail and head nodes while creating all the edges from the
> other two csv files. I am sending the four csv files with Sprend; they
> will help you trigger the index behaviour. Thank you, Núria.

2009/12/9 Mattias Persson matt...@neotechnology.com:
> Hmm, I've no idea... but do the errors show up early in the process, or do
> you have to insert a LOT of data to trigger them? In that case you could
> send me a part of it... maybe using http://www.sprend.se, WDYT?

2009/12/9 Núria Trench nuriatre...@gmail.com:
> Hi Mattias, the data isn't confidential but the files are very big
> (5.5 GB). How can I send you this data?

2009/12/9 Mattias Persson matt...@neotechnology.com:
> Yep, I got the java code, thanks. If the data is confidential or sensitive
> you can just send me the formatting; otherwise consider sending the files
> as well (or a subset if they are big).

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Todd,

The sample code creates nodes and relationships by parsing 4 csv files. Thank you for trying to trigger this behaviour with this sample.

Núria

2009/12/9 Mattias Persson matt...@neotechnology.com:
> Could you provide me with some sample code which can trigger this
> behaviour with the latest index-util-0.9-SNAPSHOT, Núria?

2009/12/9 Núria Trench nuriatre...@gmail.com:
> Todd, I don't have the same problem. In my case, after indexing all the
> attributes/properties of each node, the application creates all the edges
> by looking up the tail node and the head node. It calls the method
> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode, which on
> many occasions returns -1 (node not found). Does anyone have an
> alternative way to get a node by its indexed attributes/properties?
> Thank you, Núria.

2009/12/7 Mattias Persson matt...@neotechnology.com:
> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This is a
> bug that we fixed yesterday... (assuming it's the same bug).

2009/12/7 Todd Stavish toddstav...@gmail.com:
> Hi Mattias, Núria. I am also running into scalability problems with the
> Lucene batch inserter at much smaller numbers: 30,000 indexed nodes. I
> tried calling optimize more. Increasing ulimit didn't help.
>
> [INFO] Exception in thread "main" java.lang.RuntimeException:
> java.io.FileNotFoundException:
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> (Too many open files)
> [INFO]   at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> [INFO]   at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> [INFO]   at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> [INFO]   at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> [INFO] Caused by: java.io.FileNotFoundException:
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> (Too many open files)
>
> I tried breaking it up into separate batch inserter instances, and it
> hangs now. Can I create more than one batch inserter per process if they
> run sequentially and non-threaded?
> Thanks, Todd

On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench nuriatre...@gmail.com wrote:
> Hi again Mattias, I have tried to execute my application with the latest
> version available in the maven repository and I still have the same
> problem. After creating and indexing all the nodes, the application calls
> the optimize method and then creates all the edges, calling the method
> getNodes to select the tail and head node of each edge. But it doesn't
> work, because many nodes are not found. I have tried to create only 30
> nodes and 15 edges and it works properly, but if I try to create a big
> graph (180 million edges + 20 million nodes) it doesn't. I have also tried
> calling the optimize method every time the application has created 1
> million nodes, but it doesn't work. Have you tried to create as many nodes
> as I have said with the newer index-util version? Thank you, Núria.

2009/12/4 Núria Trench nuriatre...@gmail.com:
> Hi Mattias, thank you very much for fixing the problem so fast. I will try
> it as soon as the new changes are available in the maven repository.
> Núria.

2009/12/4 Mattias Persson matt...@neotechnology.com:
> I fixed the problem and also added a cache per key for faster
> getNodes/getSingleNode lookups during the insert process. However, to
> speed things up even further, the cache assumes that there's nothing in
> the index when the process starts (which will almost always be true). You
> can control the cache size, and whether the cache should be used at all,
> by overriding these methods in your LuceneIndexBatchInserterImpl instance
> (this is also documented in the Javadoc):
>
>     boolean useCache()
>     int getMaxCacheSizePerKey()
>
> The new changes should be available in the maven repository within an
> hour.

2009/12/4 Mattias Persson matt...@neotechnology.com:
> I think I found the problem... it's indexing as it should, but it isn't
> reflected in getNodes/getSingleNode properly until you
> flush/optimize/shutdown the index. I'll try to fix it today!

2009/12/3 Núria Trench nuriatre...@gmail.com:
> Thank you very much for your response. If you need more information, you
> only have to send an e-mail and I will try to explain it better. Núria.

2009/12/3 Mattias Persson matt...@neotechnology.com:
> This is something I'd like to reproduce, and I'll do some testing on this
> tomorrow.

2009/12/3 Núria Trench nuriatre...@gmail.com:
> Hello, last week I decided to download your graph database core in order
> to use it. First, I created a new project to
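The per-key cache Mattias describes can be pictured roughly as follows. This sketch is not the index-util implementation; it only models the semantics implied by useCache()/getMaxCacheSizePerKey(), with a stdlib LinkedHashMap acting as an LRU map per indexed key, and -1 as the miss value to match getSingleNode:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough model of a per-key lookup cache: for each indexed key (e.g. "name")
// keep an LRU map from indexed value to node id, capped at maxSizePerKey.
public class PerKeyCache {
    private final int maxSizePerKey;
    private final Map<String, Map<String, Long>> cache = new LinkedHashMap<>();

    public PerKeyCache(int maxSizePerKey) {
        this.maxSizePerKey = maxSizePerKey;
    }

    public void put(String key, String value, long nodeId) {
        cache.computeIfAbsent(key, k ->
            // access-ordered LinkedHashMap + removeEldestEntry = simple LRU
            new LinkedHashMap<String, Long>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<String, Long> e) {
                    return size() > maxSizePerKey;
                }
            }
        ).put(value, nodeId);
    }

    // Returns the cached node id, or -1 (like getSingleNode) on a miss.
    public long get(String key, String value) {
        Map<String, Long> perKey = cache.get(key);
        Long id = perKey == null ? null : perKey.get(value);
        return id == null ? -1L : id;
    }
}
```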
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again, Núria (it was I, Mattias, who asked for the sample code). Well... the fact that you parse 4 csv files doesn't really help me set up a test for this... I mean, how can I know that my test will be similar to yours? Would it be OK to attach your code/csv files as well?

/ Mattias
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias,

In my last e-mail I attached the sample code. Haven't you received it? I will try to attach it again.

Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
Oh, OK. It could be our attachments filter / security or something... could you try to mail them to me directly at matt...@neotechnology.com?
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, I have already done it 10 minutes ago. If you need an example to see the format of the 4 csv files, I can send it to you. Thanks again, Núria. 2009/12/9 Mattias Persson matt...@neotechnology.com Oh ok, it could be our attachments filter / security or something... could you try to mail them to me directly at matt...@neotechnology.com ? 2009/12/9 Núria Trench nuriatre...@gmail.com: Hi Mattias, I attached the sample code to my last e-mail; haven't you received it? I will try to attach it again. Núria. 2009/12/9 Mattias Persson matt...@neotechnology.com Hi again, Núria (it was I, Mattias, who asked for the sample code). Well... the fact that you parse 4 csv files doesn't really help me set up a test for this... I mean, how can I know that my test will be similar to yours? Would it be ok to attach your code/csv files as well? / Mattias 2009/12/9 Núria Trench nuriatre...@gmail.com: Hi Todd, The sample code creates nodes and relationships by parsing 4 csv files. Thank you for trying to trigger this behaviour with this sample. Núria 2009/12/9 Mattias Persson matt...@neotechnology.com Could you provide me with some sample code which can trigger this behaviour with the latest index-util-0.9-SNAPSHOT, Núria? 2009/12/9 Núria Trench nuriatre...@gmail.com: Todd, I don't have the same problem. In my case, after indexing all the attributes/properties of each node, the application creates all the edges by looking up the tail node and the head node. So it calls the method org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode, which returns -1 (node not found) on many occasions. Does anyone have an alternative way to get a node by its indexed attributes/properties? Thank you, Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again Mattias, I have tried to execute my application with the last version available in the maven repository and I still have the same problem. After creating and indexing all the nodes, the application calls the optimize method and then creates all the edges, calling getNodes to select the tail and head node of each edge, but it doesn't work because many nodes are not found. I have tried to create only 30 nodes and 15 edges and it works properly, but if I try to create a big graph (180 million edges + 20 million nodes) it doesn't. I have also tried to call the optimize method every time the application has created 1 million nodes, but it doesn't work. Have you tried to create as many nodes as I have said with the newer index-util version? Thank you, Núria.
-- Mattias Persson, [matt...@neotechnology.com] Neo Technology, www.neotechnology.com ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, Núria. I am also running into scalability problems with the Lucene batch inserter at much smaller numbers: 30,000 indexed nodes. I tried calling optimize more often; increasing ulimit didn't help.

[INFO] Exception in thread main java.lang.RuntimeException: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
[INFO]   at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
[INFO]   at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
[INFO]   at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
[INFO]   at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
[INFO] Caused by: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)

I tried breaking it up into separate BatchInserter instances, and it hangs now. Can I create more than one batch inserter per process if they run sequentially and non-threaded? Thanks, Todd
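For anyone hitting the same "Too many open files" error, a quick sanity check is to compare the per-process file-descriptor limit against actual usage before tuning the index itself. This is a generic Linux check, not anything Neo4j-specific (the /proc path is Linux-only):

```shell
# Per-process limit on open file descriptors for the current shell
ulimit -n

# Number of descriptors this shell (PID $$) currently has open;
# /proc/<pid>/fd is Linux-specific
ls /proc/$$/fd | wc -l
```

If the count climbs toward the limit during a batch insert, raising the limit before starting the JVM or letting Lucene merge segments (optimize) are the usual mitigations, though as Todd notes above, neither helped in his case.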
Re: [Neo] LuceneIndexBatchInserter doubt
Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This is a bug that we fixed yesterday... (assuming it's the same bug).
Re: [Neo] LuceneIndexBatchInserter doubt
I think I found the problem... it's indexing as it should, but it isn't reflected in getNodes/getSingleNode properly until you flush/optimize/shutdown the index. I'll try to fix it today!
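The visibility behaviour Mattias describes (writes not reflected in getNodes/getSingleNode until a flush/optimize/shutdown) can be modelled with a toy write-buffered index. The stub below is purely illustrative and is not the index-util code; only the names getSingleNode/optimize and the -1 "not found" convention are taken from the thread:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a write-buffered index: lookups see only entries that
// have been flushed, mimicking the behaviour described above.
class BufferedIndexStub {
    private final Map<String, Long> visible = new HashMap<>();
    private final Map<String, Long> pending = new HashMap<>();

    void index(String key, long nodeId) {
        pending.put(key, nodeId); // buffered, not yet searchable
    }

    // Returns -1 when no flushed entry matches, like getSingleNode
    long getSingleNode(String key) {
        return visible.getOrDefault(key, -1L);
    }

    // Makes all buffered writes visible to subsequent lookups
    void optimize() {
        visible.putAll(pending);
        pending.clear();
    }
}
```

Before optimize(), getSingleNode returns -1 even for a key that was just indexed; after it, the lookup succeeds — which is the symptom both Núria and Todd ran into.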
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, Thank you very much for fixing the problem so fast. I will try it as soon as the new changes are available in the maven repository. Núria. 2009/12/4 Mattias Persson matt...@neotechnology.com I fixed the problem and also added a cache per key for faster getNodes/getSingleNode lookups during the insert process. However, the cache assumes that there's nothing in the index when the process starts (which will almost always be true) to speed things up even further. You can control whether the cache is used, and its size, by overriding the following methods in your LuceneIndexBatchInserterImpl instance (this is also documented in the Javadoc): boolean useCache(), int getMaxCacheSizePerKey(). The new changes should be available in the maven repository within an hour.
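The two hooks Mattias mentions are meant to be overridden in a subclass (or anonymous subclass) of LuceneIndexBatchInserterImpl. Since the real class lives in index-util and cannot be reproduced here, the sketch below uses a minimal stand-in base class just to show the override pattern; only the method names useCache() and getMaxCacheSizePerKey() come from the message above, while the defaults and class names are hypothetical:

```java
// Stand-in for the cache hooks on LuceneIndexBatchInserterImpl; the real
// class lives in index-util and is constructed from a BatchInserter.
abstract class InserterCacheHooksStub {
    boolean useCache() { return true; }           // hypothetical default
    int getMaxCacheSizePerKey() { return 10000; } // hypothetical default
}

public class CacheTuningExample {
    public static void main(String[] args) {
        // Override the hooks exactly as you would on the real inserter
        InserterCacheHooksStub tuned = new InserterCacheHooksStub() {
            @Override boolean useCache() { return true; }
            @Override int getMaxCacheSizePerKey() { return 1000000; }
        };
        System.out.println("maxPerKey=" + tuned.getMaxCacheSizePerKey());
    }
}
```

On the real class you would subclass it the same way and pass the usual constructor arguments through; the point is only that the two methods are designed as override points rather than setters.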
Re: [Neo] LuceneIndexBatchInserter doubt
Thank you very much for your response. If you need more information, you only have to send an e-mail and I will try to explain it better. Núria. 2009/12/3 Mattias Persson matt...@neotechnology.com This is something I'd like to reproduce, and I'll do some testing on this tomorrow. 2009/12/3 Núria Trench nuriatre...@gmail.com: Hello, Last week I decided to download your graph database core in order to use it. First, I created a new project to parse my CSV files and create a new graph database with Neo4j. These CSV files contain 150 million edges and 20 million nodes. When I finished writing the code that creates the graph database, I executed it and, after six hours of execution, the program crashed because of a Lucene exception. The exception is related to index merging and has the following message: mergeFields produced an invalid result: docCount is 385282378 but fdx file size is 3082259028; now aborting this merge to prevent index corruption. I searched the net and found that it is a Lucene bug. The libraries used for my project were: neo-1.0-b10, index-util-0.7, lucene-core-2.4.0. So I decided to use a newer Lucene version. I found that you have a newer index-util version, so I updated the libraries to: neo-1.0-b10, index-util-0.9, lucene-core-2.9.1. When I had updated those libraries, I tried to execute my project again and found that, on many occasions, it was not indexing properly. So I tried to optimize the index every time I indexed something. That worked, because after that it was indexing properly, but the execution time increased a lot. I am not using transactions; instead, I am using the Batch Inserter with the LuceneIndexBatchInserter. So, my question is: what can I do to solve this problem? If I use index-util-0.7 I cannot finish creating the graph database, and if I use index-util-0.9 I have to optimize the index on every insertion and the execution never ends.
Thank you very much in advance, Núria.