[Neo] I/O load in Neo during traversals
When doing some large traversal testing (no writes/updates), I noticed that the neostore.propertystore.db.strings file was seeing a lot of read I/O (as expected) but also a huge amount of write I/O (almost 5X the read I/O rate). Out of curiosity, what is the write activity that needs to occur when doing traversals?

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] Troubleshooting performance/memory issues
FYI, we experimented with different heap sizes (1GB), along with different "chunk sizes", and were able to eliminate the heap error and get about a 10X improvement in insert speed. It would be helpful to better understand the interactions of the various Neo startup parameters, transaction buffers, and so on, and their impact on performance. I read the performance guidelines, which were some help, but perhaps some additional scenario-based recommendations would help (frequent updates/frequent access, infrequent update/frequent access, burst-mode update vs. steady update rate, etc.). Learning more about Neo every hour!

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Rick Bullotta
Sent: Wednesday, December 09, 2009 2:57 PM
To: 'Neo user discussions'
Subject: [Neo] Troubleshooting performance/memory issues

Hi, all. When trying to load a few hundred thousand nodes & relationships (chunking them in groups of 1000 nodes or so), we are getting an out-of-memory heap error after 15-20 minutes or so. No big deal; we expanded the heap settings for the JVM. But then we also noticed that the nioneo_logical_log.xxx file was continuing to grow, even though we were wrapping each group of 1000 node inserts in its own transaction (there is no other transaction active) and committing with success and finishing each group of 1000. Periodically (seemingly unrelated to our transactions finishing), that file shrinks again and the data is flushed to the other neo propertystore and relationshipstore files. I just wanted to check whether that is normal behavior, or whether there is something wrong with the way we (or Neo) are handling the transactions, and thus the reason we hit the out-of-memory error.

Thanks,
Rick
[Neo] Troubleshooting performance/memory issues
Hi, all. When trying to load a few hundred thousand nodes & relationships (chunking them in groups of 1000 nodes or so), we are getting an out-of-memory heap error after 15-20 minutes or so. No big deal; we expanded the heap settings for the JVM. But then we also noticed that the nioneo_logical_log.xxx file was continuing to grow, even though we were wrapping each group of 1000 node inserts in its own transaction (there is no other transaction active) and committing with success and finishing each group of 1000. Periodically (seemingly unrelated to our transactions finishing), that file shrinks again and the data is flushed to the other neo propertystore and relationshipstore files. I just wanted to check whether that is normal behavior, or whether there is something wrong with the way we (or Neo) are handling the transactions, and thus the reason we hit the out-of-memory error.

Thanks,
Rick
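For reference, the chunked-insert pattern described above looks roughly like this. This is a sketch against the 1.0-b10-era API (org.neo4j.api.core.NeoService, EmbeddedNeo); the store path, property name, and total count are made up for illustration, and the chunk size of 1000 mirrors the one in the post:

```java
import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Transaction;

public class ChunkedInsert {
    private static final int CHUNK_SIZE = 1000; // nodes per transaction, as in the post

    public static void main(String[] args) {
        NeoService neo = new EmbeddedNeo("var/neo"); // hypothetical store directory
        try {
            int total = 300000; // "a few hundred thousand nodes"
            int done = 0;
            while (done < total) {
                // One transaction per chunk bounds the uncommitted state
                // held in memory to roughly CHUNK_SIZE node records.
                Transaction tx = neo.beginTx();
                try {
                    for (int i = 0; i < CHUNK_SIZE && done < total; i++, done++) {
                        neo.createNode().setProperty("seq", done);
                    }
                    tx.success(); // mark the transaction for commit
                } finally {
                    tx.finish(); // commits here (or rolls back if success() was never called)
                }
            }
        } finally {
            neo.shutdown(); // flushes the logical log into the store files
        }
    }
}
```

Note that the logical log growing between commits and then shrinking when it is rotated into the store files, as described in the post, is expected with this pattern; committing per chunk only bounds memory, not log growth.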
[Neo] Noob questions/comments
Hi, all. Here are a few questions and comments that I'd welcome feedback on.

Questions:
- If you delete the reference node (id = 0), how can you recreate it?
- If you have a number of "loose" or disjoint graphs structured as trees with a single root node, is there a best practice for tracking/iterating only the top-level node(s) of these disjoint graphs? Is relating them to the reference node and doing a first-level traversal the best way?
- We would like to treat our properties as slightly more complex than a simple type (they might have a last-modified date, validity flag, and so on). Given the choice between adding properties to track this state or using nodes and relationships for these entities, what are the pros and cons of each approach?
- One aspect of our application will store nodes that can be considered similar to event logs. There may be many thousands of these nodes per "event stream". We would like to be able to traverse the entries in chronological order, very quickly. We were considering the following design possibilities:
  o Simply create a node for each "stream" and a node for each entry, with a relationship between the stream and the entry, then implement our own sort routine
  o Similar to the above, but create a node for each "day", and manage relationships to allow traversal by stream and/or day
  o Create a node for each stream, a node for each entry, and treat the entries as a forward-only linked list using relationships between the entries (and of course a relationship between the stream and the "first" entry)
- Has the fact that the node id is an "int" rather than a "long" been an issue in any implementations? Are node ids reused if deleted (I suspect not, but just wanted to confirm)?
- Any whitepaper/best practices for high-availability/load-balanced scenarios? We were considering using a message queue to send "deltas" around between instances or something similar.
- We'll be hosting Neo inside a servlet engine. The plan is to start up Neo within the init method of an autoloading servlet. Any other recommendations/suggestions? Best practice for ensuring a clean shutdown?
- Has anyone used any kind of intermediate index or other approach to bridge multiple Neo instances?
- Any GUI tools for viewing/navigating the graph structure? We are prototyping one in Adobe Flex, and are curious if there are others.

Comments/observations:
- I love the fact that you can delete nodes and relationships from inside an iterator. I always hated the way I had to separately maintain a list of "things to be deleted" when traversing XML DOMs, for example. Nice capability!
- Neo seems FAST!
- It's a bit of a major mindset change, but once the lightbulb goes on, the potential seems limitless!

Thanks in advance for guidance.

Rick
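On the disjoint-trees question above, the "relate the roots to the reference node" idea can be sketched as follows, assuming the 1.0-b10 API. The TREE_ROOT relationship type is hypothetical (any application-defined type works); a depth-one traversal from the reference node then yields exactly the registered roots:

```java
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.RelationshipType;
import org.neo4j.api.core.ReturnableEvaluator;
import org.neo4j.api.core.StopEvaluator;
import org.neo4j.api.core.Traverser;

public class SubtreeRoots {
    // Hypothetical relationship type linking the reference node
    // to the root of each disjoint tree.
    enum MyRels implements RelationshipType { TREE_ROOT }

    // Register a new tree by hanging its root off the reference node.
    static Node createTreeRoot(NeoService neo) {
        Node root = neo.createNode();
        neo.getReferenceNode().createRelationshipTo(root, MyRels.TREE_ROOT);
        return root;
    }

    // A first-level (depth-one) traversal from the reference node
    // returns only the top-level root nodes, one per disjoint tree.
    static Traverser allTreeRoots(NeoService neo) {
        return neo.getReferenceNode().traverse(
                Traverser.Order.BREADTH_FIRST,
                StopEvaluator.DEPTH_ONE,
                ReturnableEvaluator.ALL_BUT_START_NODE,
                MyRels.TREE_ROOT, Direction.OUTGOING);
    }
}
```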
Re: [Neo] Lucene Index Corruption?
Hi,

Lots of questions :)

1. This iterator will just prevent duplicates from being returned from the iterator? If there's a condition (a bug in my code) that causes shutdown with open transactions, will the Lucene indexes continue to double until they're huge?
2. Would it be possible to detect this situation and rebuild the indexes? I guess this is a losing cause if the app is regularly corrupting the data.
3. Could you allow me to close transactions from different threads? Yesterday, I wrote something that tracks tx opens and closes, and could iterate through all open transactions and call finish() on them. But TransactionImpl.finish() seems to assume the calling thread is the creating thread, which is not the case here.
4. Better yet, expose an API for me to force-finish all open transactions? I'd rather have a botched transaction than a corrupt index.
5. Is the only condition for this open transactions + a Lucene shutdown (via shutdown() OR abrupt process termination)? In further testing, it seems I can't reproduce the problem with either a clean or a dirty shutdown if all transactions are closed.
6. I assume your iterator fix will make b11? What are the chances the root cause will be fixed in b11? Do you have a tentative release date for b11?

Thanks,
Adam

On Wed, Dec 9, 2009 at 9:02 AM, Mattias Persson wrote:
> Hi Adam,
>
> We're aware of such problems and I just now committed a fix which basically is a cover-up until those bugs are fixed... the iterable from getNodes() now runs through a filter (lazily, before each next()) so your problem should go away.
>
> 2009/12/8 Adam Rabung :
>> Hi,
>> I've recently run into problems with indexes becoming corrupt after unclean shutdowns. Basically:
>> 1. Transaction 1 writes some data
>> 2. Transaction 2 reads some data, and is left open
>> 3. The database is shut down, with warnings about an open transaction
>> 4. The database is opened. Recovery executes, but it appears the Lucene indexes are "doubled" - that is, where we used to have key => (value1), we now have key => (value1, value1).
>>
>> I've attached a JUnit test case that hopefully reproduces this for you. I'm on Java 5, Mac OS 10.5, neo-1.0-b10.jar, and index-util-0.8.jar.
>>
>> Obviously, the first step on my end is to make sure any open transactions are closed before attempting a shutdown. However, I'm able to pretty reliably reproduce this problem in a much scarier way - just killing a running Neo process via the Eclipse "Console" view "red square" process stop button. Amazingly, Eclipse doesn't properly shut down processes when this button is used, so I can't count on shutdown hooks:
>> https://bugs.eclipse.org/bugs/show_bug.cgi?id=38016
>>
>> What expectations should I have for corruption when a database + indexes are shutDown() with open transactions?
>> What expectations should I have for corruption when a database + indexes are terminated abruptly (Eclipse Console, power outage, etc.)?
>> Beyond proper transaction management, and ensuring shutDown() is called, is there anything I should be doing to help protect this data?
>
> I don't know if there's anything you could do. The problem is that we can't at the moment make Lucene participate (I mean _really_ participate) in a 2-phase commit together with the NeoService, but we will fix these issues in the near future.
>
> Until then, I think you'll be fine with this new fix.
>
>> Thanks,
>> Adam
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Neo Technology, www.neotechnology.com
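The shutdown-hook safety net mentioned in the thread can be sketched like this (1.0-b10-era API assumed). As the Eclipse bug report cited above implies, a hook runs on normal JVM exit and on SIGINT/SIGTERM, but not on a hard kill or Eclipse's "red square" stop, so it narrows the unclean-shutdown window rather than closing it:

```java
import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;

public class CleanShutdown {
    public static void main(String[] args) {
        final NeoService neo = new EmbeddedNeo("var/neo"); // hypothetical store path

        // Best-effort guard: called on normal exit and on SIGINT/SIGTERM.
        // It is NOT called on SIGKILL, power loss, or Eclipse's process
        // stop button, so proper transaction handling is still required.
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                neo.shutdown();
            }
        });

        // ... application work: open transactions, always finish() them
        // in a finally block so none are left open at shutdown ...
    }
}
```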
Re: [Neo] Type metadata in properties/nodes
Hi, Tobias. Actually, I think we'll use your approach for the "known relationships" and "known types" (there are quite a few in our domain model) in addition to the dynamic approach. Thanks for the help!

Rick

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Tobias Ivarsson
Sent: Wednesday, December 09, 2009 8:54 AM
To: Neo user discussions
Subject: Re: [Neo] Type metadata in properties/nodes

I see. I realized that this was what you were after. What I was proposing was that you would know the types for the properties given the type of the node. The types for the nodes in your case would be more abstract, perhaps just defined by the set of properties. I used concrete types in my explanation because it usually helps people understand what I mean by utilizing the navigation context. I had a suspicion that your particular application might not benefit from this approach, but I wanted to throw it into the mix for the sake of completeness of the discussion, since there are a lot more people reading the list than writing in a particular thread.

Cheers,
Tobias
Re: [Neo] Lucene Index Corruption?
Hi Adam,

We're aware of such problems and I just now committed a fix which basically is a cover-up until those bugs are fixed... the iterable from getNodes() now runs through a filter (lazily, before each next()) so your problem should go away.

2009/12/8 Adam Rabung :
> Hi,
> I've recently run into problems with indexes becoming corrupt after unclean shutdowns. Basically:
> 1. Transaction 1 writes some data
> 2. Transaction 2 reads some data, and is left open
> 3. The database is shut down, with warnings about an open transaction
> 4. The database is opened. Recovery executes, but it appears the Lucene indexes are "doubled" - that is, where we used to have key => (value1), we now have key => (value1, value1).
>
> I've attached a JUnit test case that hopefully reproduces this for you. I'm on Java 5, Mac OS 10.5, neo-1.0-b10.jar, and index-util-0.8.jar.
>
> Obviously, the first step on my end is to make sure any open transactions are closed before attempting a shutdown. However, I'm able to pretty reliably reproduce this problem in a much scarier way - just killing a running Neo process via the Eclipse "Console" view "red square" process stop button. Amazingly, Eclipse doesn't properly shut down processes when this button is used, so I can't count on shutdown hooks:
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=38016
>
> What expectations should I have for corruption when a database + indexes are shutDown() with open transactions?
> What expectations should I have for corruption when a database + indexes are terminated abruptly (Eclipse Console, power outage, etc.)?
> Beyond proper transaction management, and ensuring shutDown() is called, is there anything I should be doing to help protect this data?

I don't know if there's anything you could do. The problem is that we can't at the moment make Lucene participate (I mean _really_ participate) in a 2-phase commit together with the NeoService, but we will fix these issues in the near future.

Until then, I think you'll be fine with this new fix.

> Thanks,
> Adam

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
Re: [Neo] Type metadata in properties/nodes
I see. I realized that this was what you were after. What I was proposing was that you would know the types for the properties given the type of the node. The types for the nodes in your case would be more abstract, perhaps just defined by the set of properties. I used concrete types in my explanation because it usually helps people understand what I mean by utilizing the navigation context. I had a suspicion that your particular application might not benefit from this approach, but I wanted to throw it into the mix for the sake of completeness of the discussion, since there are a lot more people reading the list than writing in a particular thread.

Cheers,
Tobias

On Wed, Dec 9, 2009 at 2:02 PM, wrote:
> Hi, Tobias.
>
> Thanks for your thoughts and ideas.
>
> My requirement is not only to know the "type" of something, but also to store metadata for "types" so that I can catalog the "property type" of each individual property in a node for a given "type". It's a bit complicated, but we are allowing very dynamic declarative "types" that will not have an explicit compiled Java class wrapper for each "type" (we will have a generic wrapper that deals with the "dynamic" type, and some explicit wrappers for pre-defined entities). The main reason is that we need to deal with a few data types beyond the Java primitives and String(s). For example, we want to be able to know contextually that a property is a "timestamp" or a "hyperlink". Thus the need for the extra (but relatively simple) metadata.
>
> It might be useful to identify a commonly used subset of additional property types that correspond to, for example, the most common RDBMS data types and XML schema types. This might include date, time, datetime, link, and so on. Since at the persistence level it appears that a property is saved along with an integer enumeration of its "simple type", perhaps there is an extensibility model that could be implemented to allow these application-specific types to be created and managed. I know that would be problematic, though, given that the current implementation is an enumeration. No worries though, since there are perfectly good workarounds/alternatives using relationships.
>
> Cheers,
>
> Rick
>
> -----Original Message-----
> Subject: Re: [Neo] Type metadata in properties/nodes
> From: Tobias Ivarsson
> Date: Wed, December 09, 2009 5:39 am
> To: Neo user discussions
>
> Associating nodes with a type node is a good approach, especially if you want to be able to do queries like "give me all nodes of type X". But for knowing the semantic type of a node found through a general traversal I prefer to use the navigational context of the node. For example, if I have a Person-node I know that the node at the other end of a FRIEND-relationship will be a Person-node as well. Or if I have a Car-node I know that the node at the other end of an OWNER-relationship will be either a Person or a Company, both of which probably have enough in common for me to be able to get an address (for sending them the parking ticket or whatever). If I need to specifically know whether it's a Person or a Company, I could use some property for that information (or check the relationship to a type node), but most of the semantic information would be known from how I reached the node.
> I have added a note about this to the FAQ in the wiki.
> Cheers,
> Tobias
>
> On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta <rick.bullo...@burningskysoftware.com> wrote:
>> Thanks, Peter. Good info. I think we ended up with a hybrid approach: we modeled a set of "Type" nodes (related to a master "Types" node), each of which includes the type metadata (property/type data) for a specific "type". "Instance" nodes then maintain a two-way relationship with their associated "Type" node so that any node can quickly obtain its Type node and so we can easily traverse all instances of a specific type... and we may end up extending this such that the properties themselves are each a node of their own, in some cases, where we need to be able to relate/search/traverse at a very detailed level. I suppose that depends on the performance implications of having lots more nodes and relationships.
>>
>> In any case, it definitely seems "do-able" with Neo.
>>
>> -----Original Message-----
>> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Peter Neubauer
>> Sent: Tuesday, December 08, 2009 3:25 PM
>> To: Neo user discussions
>> Subject: Re: [Neo] Type metadata in properties/nodes
>>
>> Hi Rick,
>> there are a number of interesting approaches to this, i
Re: [Neo] Type metadata in properties/nodes
Hi, Tobias.

Thanks for your thoughts and ideas.

My requirement is not only to know the "type" of something, but also to store metadata for "types" so that I can catalog the "property type" of each individual property in a node for a given "type". It's a bit complicated, but we are allowing very dynamic declarative "types" that will not have an explicit compiled Java class wrapper for each "type" (we will have a generic wrapper that deals with the "dynamic" type, and some explicit wrappers for pre-defined entities). The main reason is that we need to deal with a few data types beyond the Java primitives and String(s). For example, we want to be able to know contextually that a property is a "timestamp" or a "hyperlink". Thus the need for the extra (but relatively simple) metadata.

It might be useful to identify a commonly used subset of additional property types that correspond to, for example, the most common RDBMS data types and XML schema types. This might include date, time, datetime, link, and so on. Since at the persistence level it appears that a property is saved along with an integer enumeration of its "simple type", perhaps there is an extensibility model that could be implemented to allow these application-specific types to be created and managed. I know that would be problematic, though, given that the current implementation is an enumeration. No worries though, since there are perfectly good workarounds/alternatives using relationships.

Cheers,

Rick

-----Original Message-----
Subject: Re: [Neo] Type metadata in properties/nodes
From: Tobias Ivarsson
Date: Wed, December 09, 2009 5:39 am
To: Neo user discussions

Associating nodes with a type node is a good approach, especially if you want to be able to do queries like "give me all nodes of type X". But for knowing the semantic type of a node found through a general traversal I prefer to use the navigational context of the node. For example, if I have a Person-node I know that the node at the other end of a FRIEND-relationship will be a Person-node as well. Or if I have a Car-node I know that the node at the other end of an OWNER-relationship will be either a Person or a Company, both of which probably have enough in common for me to be able to get an address (for sending them the parking ticket or whatever). If I need to specifically know whether it's a Person or a Company, I could use some property for that information (or check the relationship to a type node), but most of the semantic information would be known from how I reached the node.

I have added a note about this to the FAQ in the wiki.

Cheers,
Tobias

On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta <rick.bullo...@burningskysoftware.com> wrote:
> Thanks, Peter. Good info. I think we ended up with a hybrid approach: we modeled a set of "Type" nodes (related to a master "Types" node), each of which includes the type metadata (property/type data) for a specific "type". "Instance" nodes then maintain a two-way relationship with their associated "Type" node so that any node can quickly obtain its Type node and so we can easily traverse all instances of a specific type... and we may end up extending this such that the properties themselves are each a node of their own, in some cases, where we need to be able to relate/search/traverse at a very detailed level. I suppose that depends on the performance implications of having lots more nodes and relationships.
>
> In any case, it definitely seems "do-able" with Neo.
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Peter Neubauer
> Sent: Tuesday, December 08, 2009 3:25 PM
> To: Neo user discussions
> Subject: Re: [Neo] Type metadata in properties/nodes
>
> Hi Rick,
> there are a number of interesting approaches to this, involving both ways to retain the metadata:
>
> 1. RDF and OWL
> - basically, every node will maintain a relationship to its type node (your shadow node), something like x --RDF:TYPE--> type_node, which contains info on what the type is, what properties, etc.
>
> 2. Neo4j Meta package (http://components.neo4j.org/neo-meta/)
> - this is the concept of describing the type of things in code (Java in this case) and thus in code enforcing the restrictions and type conversions on properties. This does not capture any meta info in the graph but is easy to do.
>
> 3. Annotate the nodes with type info
> - in this approach, there is a "type" or "classname" property on any node that is used to derive the type to deserialize/serialize the object into, th
Re: [Neo] Type metadata in properties/nodes
Associating nodes with a type node is a good approach, especially if you want to be able to do queries like "give me all nodes of type X". But for knowing the semantic type of a node found through a general traversal I prefer to use the navigational context of the node. For example, if I have a Person-node I know that the node at the other end of a FRIEND-relationship will be a Person-node as well. Or if I have a Car-node I know that the node at the other end of an OWNER-relationship will be either a Person or a Company, both of which probably have enough in common for me to be able to get an address (for sending them the parking ticket or whatever). If I need to specifically know whether it's a Person or a Company, I could use some property for that information (or check the relationship to a type node), but most of the semantic information would be known from how I reached the node.

I have added a note about this to the FAQ in the wiki.

Cheers,
Tobias

On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta <rick.bullo...@burningskysoftware.com> wrote:
> Thanks, Peter. Good info. I think we ended up with a hybrid approach: we modeled a set of "Type" nodes (related to a master "Types" node), each of which includes the type metadata (property/type data) for a specific "type". "Instance" nodes then maintain a two-way relationship with their associated "Type" node so that any node can quickly obtain its Type node and so we can easily traverse all instances of a specific type... and we may end up extending this such that the properties themselves are each a node of their own, in some cases, where we need to be able to relate/search/traverse at a very detailed level. I suppose that depends on the performance implications of having lots more nodes and relationships.
>
> In any case, it definitely seems "do-able" with Neo.
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Peter Neubauer
> Sent: Tuesday, December 08, 2009 3:25 PM
> To: Neo user discussions
> Subject: Re: [Neo] Type metadata in properties/nodes
>
> Hi Rick,
> there are a number of interesting approaches to this, involving both ways to retain the metadata:
>
> 1. RDF and OWL
> - basically, every node will maintain a relationship to its type node (your shadow node), something like x --RDF:TYPE--> type_node, which contains info on what the type is, what properties, etc.
>
> 2. Neo4j Meta package (http://components.neo4j.org/neo-meta/)
> - this is the concept of describing the type of things in code (Java in this case) and thus in code enforcing the restrictions and type conversions on properties. This does not capture any meta info in the graph but is easy to do.
>
> 3. Annotate the nodes with type info
> - in this approach, there is a "type" or "classname" property on any node that is used to derive the type to deserialize/serialize the object into; the rest of the meta info is contained in the upper code layers. Andreas Ronge's JRuby bindings use this approach.
>
> 4. Encode everything into a String property
> - this approach means shuffling everything into a string property, basically treating properties as BLOBs. Works in some cases, but certainly locks down your data in these properties.
>
> What is best depends on your domain, and there might be more approaches out there. I sense that you are asking even for an extensible type system, especially on properties. That is not in scope of the core graph engine, but I am not sure if in theory it would be possible to extend the property type system; we would need to discuss that separately.
>
> Cheers,
>
> /peter neubauer
>
> COO and Sales, Neo Technology
>
> GTalk: neubauer.peter
> Skype: peter.neubauer
> Phone: +46 704 106975
> LinkedIn: http://www.linkedin.com/in/neubauer
> Twitter: http://twitter.com/peterneubauer
>
> http://www.neo4j.org - Relationships count.
> http://gremlin.tinkerpop.com - PageRank in 2 lines of code.
>
> On Tue, Dec 8, 2009 at 8:43 PM, Rick Bullotta wrote:
>> I can see how relationships could be used to map "is a" duck typing, but I'm struggling with how to infer type from properties. In particular, while anything could be stuffed into a String, it loses important semantics when you do so. I'm not referring to *storage* as a String, which makes plenty of sense - it's that the type identity of the source property is lost if you do so. I could maintain a "shadow node" of the type metadata that could be related to each instance with a property name/property type array, but that seems like something that would be useful within the node model itself.
>>
>> Types like DateTime, hyperlinks, and so on, while quite easily storable in Neo4J, lose useful semantics on the way in. I'd welcome your thoughts on how others have managed thi
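The "type node" / hybrid pattern discussed in this thread can be sketched as below, assuming the 1.0-b10 API. The relationship type names (TYPE, INSTANCE_OF) and the "properties" metadata format are hypothetical illustrations of the idea, not anything the thread prescribes:

```java
import org.neo4j.api.core.Direction;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Node;
import org.neo4j.api.core.RelationshipType;

public class TypeNodes {
    // Hypothetical relationship types: type nodes hang off a master
    // "Types" node; instance nodes point at their type node.
    enum MyRels implements RelationshipType { TYPE, INSTANCE_OF }

    // Create a type node carrying metadata the engine lacks natively,
    // e.g. extended "property types" such as timestamp or hyperlink.
    static Node createTypeNode(NeoService neo, Node typesMaster, String name) {
        Node type = neo.createNode();
        type.setProperty("name", name);
        type.setProperty("properties",
                new String[] { "created:timestamp", "homepage:hyperlink" });
        typesMaster.createRelationshipTo(type, MyRels.TYPE);
        return type;
    }

    // Register an instance; the reverse direction of the same relationship
    // lets the type node enumerate all its instances.
    static void setType(Node instance, Node type) {
        instance.createRelationshipTo(type, MyRels.INSTANCE_OF);
    }

    // Any node can quickly obtain its type node again.
    static Node typeOf(Node instance) {
        return instance.getSingleRelationship(MyRels.INSTANCE_OF, Direction.OUTGOING)
                .getEndNode();
    }
}
```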
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, I have already done it 10 minutes ago. If you need an example to see the format of the 4 csv files, I can send it to you. Thanks again, Núria. 2009/12/9 Mattias Persson > Oh ok, It could be our attachments filter / security or something... > could you try to mail them to me directly at matt...@neotechnology.com > ? > > 2009/12/9 Núria Trench : > > Hi Mattias, > > > > In my last e-mail I have attached the sample code, haven't you received > it? > > I will try to attach it again. > > > > Núria. > > > > 2009/12/9 Mattias Persson > > > >> Hi again, Núria (it was I, Mattias who asked for the sample code). > >> Well... the fact that you parse 4 csv files doesn't really help me > >> setup a test for this... I mean how can I know that my test will be > >> similar to yours? Would it be ok to attach your code/csv files as > >> well? > >> > >> / Mattias > >> > >> 2009/12/9 Núria Trench : > >> > Hi Todd, > >> > > >> > The sample code creates nodes and relationships by parsing 4 csv > files. > >> > Thank you for trying to trigger this behaviour with this sample. > >> > > >> > Núria > >> > > >> > 2009/12/9 Mattias Persson > >> > > >> >> Could you provide me with some sample code which can trigger this > >> >> behaviour with the latest index-util-0.9-SNAPSHOT Núria? > >> >> > >> >> 2009/12/9 Núria Trench : > >> >> > Todd, > >> >> > > >> >> > I haven't the same problem. In my case, after indexing all the > >> >> > attributes/properties of each node, the application creates all the > >> edges > >> >> by > >> >> > looking up the tail node and the head node. So, it calls the method > >> >> > "org.neo4j.util.index. > >> >> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no > found > >> >> node) > >> >> > in many occasions. > >> >> > > >> >> > Any one has an alternative to get a node with indexex > >> >> attributes/properties? > >> >> > > >> >> > Thank you, > >> >> > > >> >> > Núria. 
> >> >> > > >> >> > > >> >> > 2009/12/7 Mattias Persson > >> >> > > >> >> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? > This > >> >> >> is a bug that we fixed yesterday... (assuming it's the same bug). > >> >> >> > >> >> >> 2009/12/7 Todd Stavish : > >> >> >> > Hi Mattias, Núria. > >> >> >> > > >> >> >> > I am also running into scalability problems with the Lucene > batch > >> >> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried > >> >> >> > calling optimize more. Increasing ulimit didn't help. > >> >> >> > > >> >> >> > INFO] Exception in thread "main" java.lang.RuntimeException: > >> >> >> > java.io.FileNotFoundException: > >> >> >> > > >> >> >> > >> >> > >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx > >> >> >> > (Too many open files) > >> >> >> > [INFO] at > >> >> >> > >> >> > >> > org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186) > >> >> >> > [INFO] at > >> >> >> > >> >> > >> > org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238) > >> >> >> > [INFO] at > >> >> >> > >> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277) > >> >> >> > [INFO] at > >> com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57) > >> >> >> > [INFO] Caused by: java.io.FileNotFoundException: > >> >> >> > > >> >> >> > >> >> > >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx > >> >> >> > (Too many open files) > >> >> >> > > >> >> >> > I tried breaking up to separate batchinserter instances, and it > >> hangs > >> >> >> > now. Can I create more than one batch inserter per process if > they > >> run > >> >> >> > sequentially and non-threaded? 
> >> >> >> > > >> >> >> > Thanks, > >> >> >> > Todd > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench < > >> nuriatre...@gmail.com> > >> >> >> wrote: > >> >> >> >> Hi again Mattias, > >> >> >> >> > >> >> >> >> I have tried to execute my application with the last version > >> >> available > >> >> >> in > >> >> >> >> the maven repository and I still have the same problem. After > >> >> creating > >> >> >> and > >> >> >> >> indexing all the nodes, the application calls the "optimize" > >> method > >> >> and, > >> >> >> >> then, it creates all the edges by calling the method "getNodes" > in > >> >> order > >> >> >> to > >> >> >> >> select the tail and head node of the edge, but it doesn't work > >> >> because > >> >> >> many > >> >> >> >> nodes are not found. > >> >> >> >> > >> >> >> >> I have tried to create only 30 nodes and 15 edges and it works > >> >> properly, > >> >> >> but > >> >> >> >> if I try to create a big graph (180 million edges + 20 million > >> nodes) > >> >> it > >> >> >> >> doesn't. > >> >> >> >> > >> >> >> >> I have also tried to call the "optimize" method every time the > >> >> >> application > >> >> >> >> has been created 1 million nodes but it doesn't work. > >> >> >> >> > >> >> >> >
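[Editor's note] The recurring issue in this thread is that `getSingleNode`/`getNodes` return misses (-1 / empty) for nodes that were indexed but not yet flushed. A minimal sketch of the two-phase load pattern being discussed, with `optimize()` between the phases. The class `LuceneIndexBatchInserterImpl` and methods `getSingleNode`/`optimize`/`shutdown` are named in the thread itself; the `BatchInserter` package names and constructor forms are assumptions based on the 2009-era API and should be checked against the index-util Javadoc:

```java
import java.util.Collections;

// Package names below are assumptions for the 2009-era API;
// only org.neo4j.util.index.LuceneIndexBatchInserterImpl is
// confirmed by the stack traces in this thread.
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

public class BatchLoadSketch {
    public static void main(String[] args) {
        BatchInserter inserter = new BatchInserterImpl("target/neodb");
        LuceneIndexBatchInserterImpl index =
                new LuceneIndexBatchInserterImpl(inserter);

        // Phase 1: create and index all nodes (from the CSV files).
        long node = inserter.createNode(
                Collections.<String, Object>singletonMap("name", "n1"));
        index.index(node, "name", "n1");

        // Per the thread: lookups don't reflect new entries until a
        // flush/optimize/shutdown, so optimize once before phase 2.
        index.optimize();

        // Phase 2: resolve edge endpoints; -1 signals "not found".
        long found = index.getSingleNode("name", "n1");
        if (found == -1) {
            // Handle the miss instead of creating a dangling relationship.
        }

        index.shutdown();
        inserter.shutdown();
    }
}
```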
Re: [Neo] LuceneIndexBatchInserter doubt
Oh ok, It could be our attachments filter / security or something... could you try to mail them to me directly at matt...@neotechnology.com ? 2009/12/9 Núria Trench : > Hi Mattias, > > In my last e-mail I have attached the sample code, haven't you received it? > I will try to attach it again. > > Núria. > > 2009/12/9 Mattias Persson > >> Hi again, Núria (it was I, Mattias who asked for the sample code). >> Well... the fact that you parse 4 csv files doesn't really help me >> setup a test for this... I mean how can I know that my test will be >> similar to yours? Would it be ok to attach your code/csv files as >> well? >> >> / Mattias >> >> 2009/12/9 Núria Trench : >> > Hi Todd, >> > >> > The sample code creates nodes and relationships by parsing 4 csv files. >> > Thank you for trying to trigger this behaviour with this sample. >> > >> > Núria >> > >> > 2009/12/9 Mattias Persson >> > >> >> Could you provide me with some sample code which can trigger this >> >> behaviour with the latest index-util-0.9-SNAPSHOT Núria? >> >> >> >> 2009/12/9 Núria Trench : >> >> > Todd, >> >> > >> >> > I haven't the same problem. In my case, after indexing all the >> >> > attributes/properties of each node, the application creates all the >> edges >> >> by >> >> > looking up the tail node and the head node. So, it calls the method >> >> > "org.neo4j.util.index. >> >> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no found >> >> node) >> >> > in many occasions. >> >> > >> >> > Any one has an alternative to get a node with indexex >> >> attributes/properties? >> >> > >> >> > Thank you, >> >> > >> >> > Núria. >> >> > >> >> > >> >> > 2009/12/7 Mattias Persson >> >> > >> >> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This >> >> >> is a bug that we fixed yesterday... (assuming it's the same bug). >> >> >> >> >> >> 2009/12/7 Todd Stavish : >> >> >> > Hi Mattias, Núria. 
>> >> >> > >> >> >> > I am also running into scalability problems with the Lucene batch >> >> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried >> >> >> > calling optimize more. Increasing ulimit didn't help. >> >> >> > >> >> >> > INFO] Exception in thread "main" java.lang.RuntimeException: >> >> >> > java.io.FileNotFoundException: >> >> >> > >> >> >> >> >> >> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx >> >> >> > (Too many open files) >> >> >> > [INFO] at >> >> >> >> >> >> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186) >> >> >> > [INFO] at >> >> >> >> >> >> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238) >> >> >> > [INFO] at >> >> >> >> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277) >> >> >> > [INFO] at >> com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57) >> >> >> > [INFO] Caused by: java.io.FileNotFoundException: >> >> >> > >> >> >> >> >> >> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx >> >> >> > (Too many open files) >> >> >> > >> >> >> > I tried breaking up to separate batchinserter instances, and it >> hangs >> >> >> > now. Can I create more than one batch inserter per process if they >> run >> >> >> > sequentially and non-threaded? >> >> >> > >> >> >> > Thanks, >> >> >> > Todd >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench < >> nuriatre...@gmail.com> >> >> >> wrote: >> >> >> >> Hi again Mattias, >> >> >> >> >> >> >> >> I have tried to execute my application with the last version >> >> available >> >> >> in >> >> >> >> the maven repository and I still have the same problem. 
After >> >> creating >> >> >> and >> >> >> >> indexing all the nodes, the application calls the "optimize" >> method >> >> and, >> >> >> >> then, it creates all the edges by calling the method "getNodes" in >> >> order >> >> >> to >> >> >> >> select the tail and head node of the edge, but it doesn't work >> >> because >> >> >> many >> >> >> >> nodes are not found. >> >> >> >> >> >> >> >> I have tried to create only 30 nodes and 15 edges and it works >> >> properly, >> >> >> but >> >> >> >> if I try to create a big graph (180 million edges + 20 million >> nodes) >> >> it >> >> >> >> doesn't. >> >> >> >> >> >> >> >> I have also tried to call the "optimize" method every time the >> >> >> application >> >> >> >> has been created 1 million nodes but it doesn't work. >> >> >> >> >> >> >> >> Have you tried to create as many nodes as I have said with the >> newer >> >> >> >> index-util version? >> >> >> >> >> >> >> >> Thank you, >> >> >> >> >> >> >> >> Núria. >> >> >> >> >> >> >> >> 2009/12/4 Núria Trench >> >> >> >> >> >> >> >>> Hi Mattias, >> >> >> >>> >> >> >> >>> Thank you very much for fixing the problem so fast. I will try it >> as >> >> >> soon >> >> >> >>> as the new changes will be available in the maven repository. >> >> >> >>> >> >> >> >>> Núria. >> >> >> >>> >>
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias, In my last e-mail I have attached the sample code, haven't you received it? I will try to attach it again. Núria. 2009/12/9 Mattias Persson > Hi again, Núria (it was I, Mattias who asked for the sample code). > Well... the fact that you parse 4 csv files doesn't really help me > setup a test for this... I mean how can I know that my test will be > similar to yours? Would it be ok to attach your code/csv files as > well? > > / Mattias > > 2009/12/9 Núria Trench : > > Hi Todd, > > > > The sample code creates nodes and relationships by parsing 4 csv files. > > Thank you for trying to trigger this behaviour with this sample. > > > > Núria > > > > 2009/12/9 Mattias Persson > > > >> Could you provide me with some sample code which can trigger this > >> behaviour with the latest index-util-0.9-SNAPSHOT Núria? > >> > >> 2009/12/9 Núria Trench : > >> > Todd, > >> > > >> > I haven't the same problem. In my case, after indexing all the > >> > attributes/properties of each node, the application creates all the > edges > >> by > >> > looking up the tail node and the head node. So, it calls the method > >> > "org.neo4j.util.index. > >> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no found > >> node) > >> > in many occasions. > >> > > >> > Any one has an alternative to get a node with indexex > >> attributes/properties? > >> > > >> > Thank you, > >> > > >> > Núria. > >> > > >> > > >> > 2009/12/7 Mattias Persson > >> > > >> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This > >> >> is a bug that we fixed yesterday... (assuming it's the same bug). > >> >> > >> >> 2009/12/7 Todd Stavish : > >> >> > Hi Mattias, Núria. > >> >> > > >> >> > I am also running into scalability problems with the Lucene batch > >> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried > >> >> > calling optimize more. Increasing ulimit didn't help. 
> >> >> > > >> >> > INFO] Exception in thread "main" java.lang.RuntimeException: > >> >> > java.io.FileNotFoundException: > >> >> > > >> >> > >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx > >> >> > (Too many open files) > >> >> > [INFO] at > >> >> > >> > org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186) > >> >> > [INFO] at > >> >> > >> > org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238) > >> >> > [INFO] at > >> >> > com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277) > >> >> > [INFO] at > com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57) > >> >> > [INFO] Caused by: java.io.FileNotFoundException: > >> >> > > >> >> > >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx > >> >> > (Too many open files) > >> >> > > >> >> > I tried breaking up to separate batchinserter instances, and it > hangs > >> >> > now. Can I create more than one batch inserter per process if they > run > >> >> > sequentially and non-threaded? > >> >> > > >> >> > Thanks, > >> >> > Todd > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench < > nuriatre...@gmail.com> > >> >> wrote: > >> >> >> Hi again Mattias, > >> >> >> > >> >> >> I have tried to execute my application with the last version > >> available > >> >> in > >> >> >> the maven repository and I still have the same problem. After > >> creating > >> >> and > >> >> >> indexing all the nodes, the application calls the "optimize" > method > >> and, > >> >> >> then, it creates all the edges by calling the method "getNodes" in > >> order > >> >> to > >> >> >> select the tail and head node of the edge, but it doesn't work > >> because > >> >> many > >> >> >> nodes are not found. 
> >> >> >> > >> >> >> I have tried to create only 30 nodes and 15 edges and it works > >> properly, > >> >> but > >> >> >> if I try to create a big graph (180 million edges + 20 million > nodes) > >> it > >> >> >> doesn't. > >> >> >> > >> >> >> I have also tried to call the "optimize" method every time the > >> >> application > >> >> >> has been created 1 million nodes but it doesn't work. > >> >> >> > >> >> >> Have you tried to create as many nodes as I have said with the > newer > >> >> >> index-util version? > >> >> >> > >> >> >> Thank you, > >> >> >> > >> >> >> Núria. > >> >> >> > >> >> >> 2009/12/4 Núria Trench > >> >> >> > >> >> >>> Hi Mattias, > >> >> >>> > >> >> >>> Thank you very much for fixing the problem so fast. I will try it > as > >> >> soon > >> >> >>> as the new changes will be available in the maven repository. > >> >> >>> > >> >> >>> Núria. > >> >> >>> > >> >> >>> > >> >> >>> 2009/12/4 Mattias Persson > >> >> >>> > >> >> I fixed the problem and also added a cache per key for faster > >> >> getNodes/getSingleNode lookup during the insert process. However > >> the > >> >> cache assumes that there's nothing in the index when the process > >> >> starts (which al
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again, Núria (it was I, Mattias who asked for the sample code). Well... the fact that you parse 4 csv files doesn't really help me setup a test for this... I mean how can I know that my test will be similar to yours? Would it be ok to attach your code/csv files as well? / Mattias 2009/12/9 Núria Trench : > Hi Todd, > > The sample code creates nodes and relationships by parsing 4 csv files. > Thank you for trying to trigger this behaviour with this sample. > > Núria > > 2009/12/9 Mattias Persson > >> Could you provide me with some sample code which can trigger this >> behaviour with the latest index-util-0.9-SNAPSHOT Núria? >> >> 2009/12/9 Núria Trench : >> > Todd, >> > >> > I haven't the same problem. In my case, after indexing all the >> > attributes/properties of each node, the application creates all the edges >> by >> > looking up the tail node and the head node. So, it calls the method >> > "org.neo4j.util.index. >> > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no found >> node) >> > in many occasions. >> > >> > Any one has an alternative to get a node with indexex >> attributes/properties? >> > >> > Thank you, >> > >> > Núria. >> > >> > >> > 2009/12/7 Mattias Persson >> > >> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This >> >> is a bug that we fixed yesterday... (assuming it's the same bug). >> >> >> >> 2009/12/7 Todd Stavish : >> >> > Hi Mattias, Núria. >> >> > >> >> > I am also running into scalability problems with the Lucene batch >> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried >> >> > calling optimize more. Increasing ulimit didn't help. 
>> >> > >> >> > INFO] Exception in thread "main" java.lang.RuntimeException: >> >> > java.io.FileNotFoundException: >> >> > >> >> >> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx >> >> > (Too many open files) >> >> > [INFO] at >> >> >> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186) >> >> > [INFO] at >> >> >> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238) >> >> > [INFO] at >> >> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277) >> >> > [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57) >> >> > [INFO] Caused by: java.io.FileNotFoundException: >> >> > >> >> >> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx >> >> > (Too many open files) >> >> > >> >> > I tried breaking up to separate batchinserter instances, and it hangs >> >> > now. Can I create more than one batch inserter per process if they run >> >> > sequentially and non-threaded? >> >> > >> >> > Thanks, >> >> > Todd >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench >> >> wrote: >> >> >> Hi again Mattias, >> >> >> >> >> >> I have tried to execute my application with the last version >> available >> >> in >> >> >> the maven repository and I still have the same problem. After >> creating >> >> and >> >> >> indexing all the nodes, the application calls the "optimize" method >> and, >> >> >> then, it creates all the edges by calling the method "getNodes" in >> order >> >> to >> >> >> select the tail and head node of the edge, but it doesn't work >> because >> >> many >> >> >> nodes are not found. >> >> >> >> >> >> I have tried to create only 30 nodes and 15 edges and it works >> properly, >> >> but >> >> >> if I try to create a big graph (180 million edges + 20 million nodes) >> it >> >> >> doesn't. 
>> >> >> >> >> >> I have also tried to call the "optimize" method every time the >> >> application >> >> >> has been created 1 million nodes but it doesn't work. >> >> >> >> >> >> Have you tried to create as many nodes as I have said with the newer >> >> >> index-util version? >> >> >> >> >> >> Thank you, >> >> >> >> >> >> Núria. >> >> >> >> >> >> 2009/12/4 Núria Trench >> >> >> >> >> >>> Hi Mattias, >> >> >>> >> >> >>> Thank you very much for fixing the problem so fast. I will try it as >> >> soon >> >> >>> as the new changes will be available in the maven repository. >> >> >>> >> >> >>> Núria. >> >> >>> >> >> >>> >> >> >>> 2009/12/4 Mattias Persson >> >> >>> >> >> I fixed the problem and also added a cache per key for faster >> >> getNodes/getSingleNode lookup during the insert process. However >> the >> >> cache assumes that there's nothing in the index when the process >> >> starts (which almost always will be true) to speed things up even >> >> further. >> >> >> >> You can control the cache size and if it should be used by >> overriding >> >> the (this is also documented in the Javadoc): >> >> >> >> boolean useCache() >> >> int getMaxCacheSizePerKey() >> >> >> >> methods in your LuceneIndexBatchInserterImpl instance. The new >> changes >> >> should be available in the maven repository within an hour. >> >> >> >> >>>
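[Editor's note] Mattias's message above says the new lookup cache is controlled by overriding `useCache()` and `getMaxCacheSizePerKey()` on the `LuceneIndexBatchInserterImpl` instance. A minimal subclass sketch; the method names come from the thread, but their visibility and the superclass constructor signature are assumptions, so check the index-util Javadoc:

```java
import org.neo4j.kernel.impl.batchinsert.BatchInserter;  // package name assumed
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

// Sketch: tuning the batch inserter's per-key lookup cache by overriding
// the two methods named in the thread.
public class TunedIndexInserter extends LuceneIndexBatchInserterImpl {
    public TunedIndexInserter(BatchInserter inserter) {
        super(inserter);  // constructor form is an assumption
    }

    @Override
    public boolean useCache() {
        return true;       // keep the per-key lookup cache enabled
    }

    @Override
    public int getMaxCacheSizePerKey() {
        return 1_000_000;  // allow up to ~1M cached entries per index key
    }
}
```

Note the caveat quoted above: the cache assumes the index is empty when the process starts, so this only helps fresh batch loads.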
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Todd, The sample code creates nodes and relationships by parsing 4 csv files. Thank you for trying to trigger this behaviour with this sample. Núria 2009/12/9 Mattias Persson > Could you provide me with some sample code which can trigger this > behaviour with the latest index-util-0.9-SNAPSHOT Núria? > > 2009/12/9 Núria Trench : > > Todd, > > > > I haven't the same problem. In my case, after indexing all the > > attributes/properties of each node, the application creates all the edges > by > > looking up the tail node and the head node. So, it calls the method > > "org.neo4j.util.index. > > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no found > node) > > in many occasions. > > > > Any one has an alternative to get a node with indexex > attributes/properties? > > > > Thank you, > > > > Núria. > > > > > > 2009/12/7 Mattias Persson > > > >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This > >> is a bug that we fixed yesterday... (assuming it's the same bug). > >> > >> 2009/12/7 Todd Stavish : > >> > Hi Mattias, Núria. > >> > > >> > I am also running into scalability problems with the Lucene batch > >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried > >> > calling optimize more. Increasing ulimit didn't help. 
> >> > > >> > INFO] Exception in thread "main" java.lang.RuntimeException: > >> > java.io.FileNotFoundException: > >> > > >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx > >> > (Too many open files) > >> > [INFO] at > >> > org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186) > >> > [INFO] at > >> > org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238) > >> > [INFO] at > >> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277) > >> > [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57) > >> > [INFO] Caused by: java.io.FileNotFoundException: > >> > > >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx > >> > (Too many open files) > >> > > >> > I tried breaking up to separate batchinserter instances, and it hangs > >> > now. Can I create more than one batch inserter per process if they run > >> > sequentially and non-threaded? > >> > > >> > Thanks, > >> > Todd > >> > > >> > > >> > > >> > > >> > > >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench > >> wrote: > >> >> Hi again Mattias, > >> >> > >> >> I have tried to execute my application with the last version > available > >> in > >> >> the maven repository and I still have the same problem. After > creating > >> and > >> >> indexing all the nodes, the application calls the "optimize" method > and, > >> >> then, it creates all the edges by calling the method "getNodes" in > order > >> to > >> >> select the tail and head node of the edge, but it doesn't work > because > >> many > >> >> nodes are not found. > >> >> > >> >> I have tried to create only 30 nodes and 15 edges and it works > properly, > >> but > >> >> if I try to create a big graph (180 million edges + 20 million nodes) > it > >> >> doesn't. 
> >> >> > >> >> I have also tried to call the "optimize" method every time the > >> application > >> >> has been created 1 million nodes but it doesn't work. > >> >> > >> >> Have you tried to create as many nodes as I have said with the newer > >> >> index-util version? > >> >> > >> >> Thank you, > >> >> > >> >> Núria. > >> >> > >> >> 2009/12/4 Núria Trench > >> >> > >> >>> Hi Mattias, > >> >>> > >> >>> Thank you very much for fixing the problem so fast. I will try it as > >> soon > >> >>> as the new changes will be available in the maven repository. > >> >>> > >> >>> Núria. > >> >>> > >> >>> > >> >>> 2009/12/4 Mattias Persson > >> >>> > >> I fixed the problem and also added a cache per key for faster > >> getNodes/getSingleNode lookup during the insert process. However > the > >> cache assumes that there's nothing in the index when the process > >> starts (which almost always will be true) to speed things up even > >> further. > >> > >> You can control the cache size and if it should be used by > overriding > >> the (this is also documented in the Javadoc): > >> > >> boolean useCache() > >> int getMaxCacheSizePerKey() > >> > >> methods in your LuceneIndexBatchInserterImpl instance. The new > changes > >> should be available in the maven repository within an hour. > >> > >> 2009/12/4 Mattias Persson : > >> > I think I found the problem... it's indexing as it should, but it > >> > isn't reflected in getNodes/getSingleNode properly until you > >> > flush/optimize/shutdown the index. I'll try to fix it today! > >> > > >> > 2009/12/3 Núria Trench : > >> >> Thank you very much for your response. > >> >> If you need more information, you only have to send an e-mail > and I > >> will try > >>
Re: [Neo] LuceneIndexBatchInserter doubt
Could you provide me with some sample code which can trigger this behaviour with the latest index-util-0.9-SNAPSHOT Núria? 2009/12/9 Núria Trench : > Todd, > > I haven't the same problem. In my case, after indexing all the > attributes/properties of each node, the application creates all the edges by > looking up the tail node and the head node. So, it calls the method > "org.neo4j.util.index. > LuceneIndexBatchInserterImpl.getSingleNode" which returns -1 (no found node) > in many occasions. > > Any one has an alternative to get a node with indexex attributes/properties? > > Thank you, > > Núria. > > > 2009/12/7 Mattias Persson > >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This >> is a bug that we fixed yesterday... (assuming it's the same bug). >> >> 2009/12/7 Todd Stavish : >> > Hi Mattias, Núria. >> > >> > I am also running into scalability problems with the Lucene batch >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried >> > calling optimize more. Increasing ulimit didn't help. >> > >> > INFO] Exception in thread "main" java.lang.RuntimeException: >> > java.io.FileNotFoundException: >> > >> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx >> > (Too many open files) >> > [INFO] at >> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186) >> > [INFO] at >> org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238) >> > [INFO] at >> com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277) >> > [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57) >> > [INFO] Caused by: java.io.FileNotFoundException: >> > >> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx >> > (Too many open files) >> > >> > I tried breaking up to separate batchinserter instances, and it hangs >> > now. 
Can I create more than one batch inserter per process if they run >> > sequentially and non-threaded? >> > >> > Thanks, >> > Todd >> > >> > >> > >> > >> > >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench >> wrote: >> >> Hi again Mattias, >> >> >> >> I have tried to execute my application with the last version available >> in >> >> the maven repository and I still have the same problem. After creating >> and >> >> indexing all the nodes, the application calls the "optimize" method and, >> >> then, it creates all the edges by calling the method "getNodes" in order >> to >> >> select the tail and head node of the edge, but it doesn't work because >> many >> >> nodes are not found. >> >> >> >> I have tried to create only 30 nodes and 15 edges and it works properly, >> but >> >> if I try to create a big graph (180 million edges + 20 million nodes) it >> >> doesn't. >> >> >> >> I have also tried to call the "optimize" method every time the >> application >> >> has been created 1 million nodes but it doesn't work. >> >> >> >> Have you tried to create as many nodes as I have said with the newer >> >> index-util version? >> >> >> >> Thank you, >> >> >> >> Núria. >> >> >> >> 2009/12/4 Núria Trench >> >> >> >>> Hi Mattias, >> >>> >> >>> Thank you very much for fixing the problem so fast. I will try it as >> soon >> >>> as the new changes will be available in the maven repository. >> >>> >> >>> Núria. >> >>> >> >>> >> >>> 2009/12/4 Mattias Persson >> >>> >> I fixed the problem and also added a cache per key for faster >> getNodes/getSingleNode lookup during the insert process. However the >> cache assumes that there's nothing in the index when the process >> starts (which almost always will be true) to speed things up even >> further. >> >> You can control the cache size and if it should be used by overriding >> the (this is also documented in the Javadoc): >> >> boolean useCache() >> int getMaxCacheSizePerKey() >> >> methods in your LuceneIndexBatchInserterImpl instance. 
The new changes >> should be available in the maven repository within an hour. >> >> 2009/12/4 Mattias Persson : >> > I think I found the problem... it's indexing as it should, but it >> > isn't reflected in getNodes/getSingleNode properly until you >> > flush/optimize/shutdown the index. I'll try to fix it today! >> > >> > 2009/12/3 Núria Trench : >> >> Thank you very much for your response. >> >> If you need more information, you only have to send an e-mail and I >> will try >> >> to explain it better. >> >> >> >> Núria. >> >> >> >> 2009/12/3 Mattias Persson >> >> >> >>> This is something I'd like to reproduce and I'll do some testing >> on >> >>> this tomorrow >> >>> >> >>> 2009/12/3 Núria Trench : >> >>> > Hello, >> >>> > >> >>> > Last week, I decided to download your graph database core in >> order >> to use >> >>> > it. First, I created a new project to parse my CSV
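[Editor's note] The "cache per key" Mattias describes can be pictured as one bounded, least-recently-used map per index key (e.g. "name"), mapping property value to node id. The following is a self-contained illustration of that idea only, not Neo4j's actual implementation; the -1 miss value mirrors `getSingleNode`'s convention from the thread:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative per-key LRU cache: each index key gets its own bounded map
// from property value to node id, mirroring the idea behind
// getMaxCacheSizePerKey() in the thread (not Neo4j's actual code).
public class PerKeyCache {
    private final int maxSizePerKey;
    private final Map<String, Map<Object, Long>> caches = new HashMap<>();

    public PerKeyCache(int maxSizePerKey) {
        this.maxSizePerKey = maxSizePerKey;
    }

    private Map<Object, Long> cacheFor(String key) {
        // LinkedHashMap in access order + removeEldestEntry = simple LRU.
        return caches.computeIfAbsent(key, k ->
            new LinkedHashMap<Object, Long>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Object, Long> e) {
                    return size() > maxSizePerKey;  // evict least recently used
                }
            });
    }

    public void put(String key, Object value, long nodeId) {
        cacheFor(key).put(value, nodeId);
    }

    /** Returns the cached node id, or -1 on a miss (getSingleNode's convention). */
    public long get(String key, Object value) {
        Long id = cacheFor(key).get(value);
        return id == null ? -1L : id;
    }

    public static void main(String[] args) {
        PerKeyCache cache = new PerKeyCache(2);
        cache.put("name", "a", 1);
        cache.put("name", "b", 2);
        cache.put("name", "c", 3);                   // evicts "a" (capacity 2)
        System.out.println(cache.get("name", "a"));  // -1: evicted
        System.out.println(cache.get("name", "c"));  // 3
    }
}
```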
Re: [Neo] LuceneIndexBatchInserter doubt
Todd, I don't have the same problem. In my case, after indexing all the attributes/properties of each node, the application creates all the edges by looking up the tail node and the head node. So, it calls the method "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which returns -1 (node not found) on many occasions. Does anyone have an alternative way to get a node by indexed attributes/properties? Thank you, Núria. 2009/12/7 Mattias Persson > Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This > is a bug that we fixed yesterday... (assuming it's the same bug). > > 2009/12/7 Todd Stavish : > > Hi Mattias, Núria. > > > > I am also running into scalability problems with the Lucene batch > > inserter at much smaller numbers, 30,000 indexed nodes. I tried > > calling optimize more. Increasing ulimit didn't help. > > > > INFO] Exception in thread "main" java.lang.RuntimeException: > > java.io.FileNotFoundException: > > > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx > > (Too many open files) > > [INFO] at > org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186) > > [INFO] at > org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238) > > [INFO] at > com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277) > > [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57) > > [INFO] Caused by: java.io.FileNotFoundException: > > > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx > > (Too many open files) > > > > I tried breaking up to separate batchinserter instances, and it hangs > > now. Can I create more than one batch inserter per process if they run
> > > > Thanks, > > Todd > > > > > > > > > > > > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench > wrote: > >> Hi again Mattias, > >> > >> I have tried to execute my application with the last version available > in > >> the maven repository and I still have the same problem. After creating > and > >> indexing all the nodes, the application calls the "optimize" method and, > >> then, it creates all the edges by calling the method "getNodes" in order > to > >> select the tail and head node of the edge, but it doesn't work because > many > >> nodes are not found. > >> > >> I have tried to create only 30 nodes and 15 edges and it works properly, > but > >> if I try to create a big graph (180 million edges + 20 million nodes) it > >> doesn't. > >> > >> I have also tried to call the "optimize" method every time the > application > >> has been created 1 million nodes but it doesn't work. > >> > >> Have you tried to create as many nodes as I have said with the newer > >> index-util version? > >> > >> Thank you, > >> > >> Núria. > >> > >> 2009/12/4 Núria Trench > >> > >>> Hi Mattias, > >>> > >>> Thank you very much for fixing the problem so fast. I will try it as > soon > >>> as the new changes will be available in the maven repository. > >>> > >>> Núria. > >>> > >>> > >>> 2009/12/4 Mattias Persson > >>> > I fixed the problem and also added a cache per key for faster > getNodes/getSingleNode lookup during the insert process. However the > cache assumes that there's nothing in the index when the process > starts (which almost always will be true) to speed things up even > further. > > You can control the cache size and if it should be used by overriding > the (this is also documented in the Javadoc): > > boolean useCache() > int getMaxCacheSizePerKey() > > methods in your LuceneIndexBatchInserterImpl instance. The new changes > should be available in the maven repository within an hour. > > 2009/12/4 Mattias Persson : > > I think I found the problem... 
it's indexing as it should, but it > > isn't reflected in getNodes/getSingleNode properly until you > > flush/optimize/shutdown the index. I'll try to fix it today! > > > > 2009/12/3 Núria Trench : > >> Thank you very much for your response. > >> If you need more information, you only have to send an e-mail and I > will try > >> to explain it better. > >> > >> Núria. > >> > >> 2009/12/3 Mattias Persson > >> > >>> This is something I'd like to reproduce and I'll do some testing > on > >>> this tomorrow > >>> > >>> 2009/12/3 Núria Trench : > >>> > Hello, > >>> > > >>> > Last week, I decided to download your graph database core in > order > to use > >>> > it. First, I created a new project to parse my CSV files and > create > a new > >>> > graph database with Neo4j. This CSV files contain 150 milion > edges > and 20 > >>> > milion nodes. > >>> > > >>> > When I finished to write the code which will create the graph > database, I > >>> > executed it and, after six