Re: [Neo] LuceneIndexBatchInserter doubt
Hi Todd,

The sample code creates nodes and relationships by parsing 4 csv files. Thank you for trying to trigger this behaviour with this sample.

Núria

2009/12/9 Mattias Persson matt...@neotechnology.com
Could you provide me with some sample code which can trigger this behaviour with the latest index-util-0.9-SNAPSHOT, Núria?

2009/12/9 Núria Trench nuriatre...@gmail.com:
Todd, I don't have the same problem. In my case, after indexing all the attributes/properties of each node, the application creates all the edges by looking up the tail node and the head node. So it calls the method org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode, which returns -1 (no node found) on many occasions. Does anyone have an alternative way to get a node by its indexed attributes/properties?
Thank you, Núria.

2009/12/7 Mattias Persson matt...@neotechnology.com
Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This is a bug that we fixed yesterday... (assuming it's the same bug).

2009/12/7 Todd Stavish toddstav...@gmail.com:
Hi Mattias, Núria. I am also running into scalability problems with the Lucene batch inserter at much smaller numbers, 30,000 indexed nodes. I tried calling optimize more. Increasing ulimit didn't help.
[INFO] Exception in thread main java.lang.RuntimeException: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
[INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
[INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
[INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
[INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
[INFO] Caused by: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)

I tried breaking the work up into separate batch inserter instances, and it hangs now. Can I create more than one batch inserter per process if they run sequentially and non-threaded?
Thanks, Todd

On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench nuriatre...@gmail.com wrote:
Hi again Mattias, I have tried to execute my application with the latest version available in the maven repository and I still have the same problem. After creating and indexing all the nodes, the application calls the optimize method and then creates all the edges by calling the method getNodes in order to select the tail and head node of each edge, but it doesn't work because many nodes are not found. I have tried to create only 30 nodes and 15 edges and it works properly, but if I try to create a big graph (180 million edges + 20 million nodes) it doesn't. I have also tried to call the optimize method every time the application has created 1 million nodes, but it doesn't work. Have you tried to create as many nodes as I have said with the newer index-util version?
Thank you, Núria.

2009/12/4 Núria Trench nuriatre...@gmail.com
Hi Mattias, Thank you very much for fixing the problem so fast. I will try it as soon as the new changes are available in the maven repository.
Núria.
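The two-phase import described above (index every node first, then resolve each edge's tail and head through the index) can be made easier to diagnose by treating -1 as an explicit "not found" result and collecting the misses instead of creating edges from them. The following is only a minimal sketch under assumptions: NodeLookupSketch, MapNodeLookupSketch and EdgeResolverSketch are hypothetical stand-ins written for this example and are not part of index-util; only the "-1 means no node found" convention is taken from the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index lookup. This is NOT the index-util API;
// it only mirrors the "-1 means not found" convention described in the thread.
interface NodeLookupSketch {
    long getSingleNode(String key, Object value);
}

// In-memory implementation so the sketch runs without Neo4j or Lucene.
final class MapNodeLookupSketch implements NodeLookupSketch {
    private final Map<String, Long> ids = new HashMap<>();

    void index(long nodeId, String key, Object value) {
        ids.put(key + "=" + value, nodeId);
    }

    @Override
    public long getSingleNode(String key, Object value) {
        Long id = ids.get(key + "=" + value);
        return id == null ? -1L : id;
    }
}

final class EdgeResolverSketch {
    static final long NOT_FOUND = -1L;

    // Resolves {tailName, headName} pairs to node ids. Pairs with an
    // unresolved endpoint are added to 'missing' instead of becoming edges.
    static List<long[]> resolve(NodeLookupSketch lookup,
                                List<String[]> pairs,
                                List<String[]> missing) {
        List<long[]> edges = new ArrayList<>();
        for (String[] pair : pairs) {
            long tail = lookup.getSingleNode("name", pair[0]);
            long head = lookup.getSingleNode("name", pair[1]);
            if (tail == NOT_FOUND || head == NOT_FOUND) {
                missing.add(pair);
                continue;
            }
            edges.add(new long[] {tail, head});
        }
        return edges;
    }

    public static void main(String[] args) {
        MapNodeLookupSketch lookup = new MapNodeLookupSketch();
        lookup.index(1L, "name", "a");
        lookup.index(2L, "name", "b");

        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"a", "b"});
        pairs.add(new String[] {"a", "c"}); // "c" was never indexed

        List<String[]> missing = new ArrayList<>();
        List<long[]> edges = resolve(lookup, pairs, missing);
        System.out.println(edges.size() + " edge(s) resolved, " + missing.size() + " skipped");
    }
}
```

Logging the skipped pairs, rather than silently dropping them, would show whether the misses cluster around nodes indexed late in the run, which is the kind of detail a reproduction needs.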
2009/12/4 Mattias Persson matt...@neotechnology.com
I fixed the problem and also added a cache per key for faster getNodes/getSingleNode lookup during the insert process. However, to speed things up even further, the cache assumes that there's nothing in the index when the process starts (which will almost always be true). You can control the cache size, and whether the cache is used at all, by overriding the following methods in your LuceneIndexBatchInserterImpl instance (this is also documented in the Javadoc):

boolean useCache()
int getMaxCacheSizePerKey()

The new changes should be available in the maven repository within an hour.

2009/12/4 Mattias Persson matt...@neotechnology.com:
I think I found the problem... it's indexing as it should, but it isn't reflected in getNodes/getSingleNode properly until you flush/optimize/shutdown the index. I'll try to fix it today!

2009/12/3 Núria Trench nuriatre...@gmail.com:
Thank you very much for your response. If you need more information, you only have to send an e-mail and I will try to explain it better.
Núria.

2009/12/3 Mattias Persson matt...@neotechnology.com
This is something I'd like to reproduce and I'll do some testing on this tomorrow

2009/12/3 Núria Trench nuriatre...@gmail.com:
Hello, Last week, I decided to download your graph database core in order to use it. First, I created a new project to
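To illustrate the two override points mentioned above, here is a minimal sketch of how such hooks could drive a bounded per-key cache. It assumes nothing about the real implementation: CachedLookupSketch is a hypothetical stand-in class, the default size of 100000 is an invented value, and the least-recently-used eviction is a technique chosen for this example. Only the method names useCache() and getMaxCacheSizePerKey() come from the thread.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, NOT LuceneIndexBatchInserterImpl. Only the two hook
// names (useCache, getMaxCacheSizePerKey) are taken from the thread.
class CachedLookupSketch {
    private final Map<String, Map<Object, Long>> cachePerKey = new HashMap<>();

    // Hook: return false to disable caching entirely.
    protected boolean useCache() {
        return true;
    }

    // Hook: upper bound on cached entries per index key (assumed default).
    protected int getMaxCacheSizePerKey() {
        return 100000;
    }

    public void index(long nodeId, String key, Object value) {
        if (!useCache()) {
            return;
        }
        final int max = getMaxCacheSizePerKey();
        Map<Object, Long> cache = cachePerKey.get(key);
        if (cache == null) {
            // Access-ordered map that evicts the least recently used entry.
            cache = new LinkedHashMap<Object, Long>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Object, Long> eldest) {
                    return size() > max;
                }
            };
            cachePerKey.put(key, cache);
        }
        cache.put(value, nodeId);
    }

    // Returns the cached node id, or -1 when the value is not in the cache.
    public long cachedNode(String key, Object value) {
        Map<Object, Long> cache = cachePerKey.get(key);
        Long id = cache == null ? null : cache.get(value);
        return id == null ? -1L : id;
    }
}

// Overriding the hooks, as the message suggests doing on the real class.
class SmallCacheSketch extends CachedLookupSketch {
    @Override
    protected int getMaxCacheSizePerKey() {
        return 2;
    }
}

class NoCacheSketch extends CachedLookupSketch {
    @Override
    protected boolean useCache() {
        return false;
    }
}

class CacheHookDemo {
    public static void main(String[] args) {
        CachedLookupSketch small = new SmallCacheSketch();
        small.index(1L, "name", "a");
        small.index(2L, "name", "b");
        small.index(3L, "name", "c"); // exceeds the limit of 2, so "a" is evicted
        System.out.println("a -> " + small.cachedNode("name", "a"));
        System.out.println("c -> " + small.cachedNode("name", "c"));
    }
}
```

A cache miss in a sketch like this only means the value is not cached; a real implementation would presumably fall back to the index itself in that case.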
Re: [Neo] LuceneIndexBatchInserter doubt
Hi again, Núria (it was I, Mattias who asked for the sample code).

Well... the fact that you parse 4 csv files doesn't really help me setup a test for this... I mean how can I know that my test will be similar to yours? Would it be ok to attach your code/csv files as well?

/ Mattias
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias,

In my last e-mail I have attached the sample code, haven't you received it? I will try to attach it again.

Núria.
Re: [Neo] LuceneIndexBatchInserter doubt
Oh ok, It could be our attachments filter / security or something... could you try to mail them to me directly at matt...@neotechnology.com ?
Re: [Neo] LuceneIndexBatchInserter doubt
Hi Mattias,

I have already done it 10 minutes ago. If you need an example to see the format of the 4 csv files, I can send it to you.

Thanks again, Núria.
Re: [Neo] Type metadata in properties/nodes
Associating nodes with a type node is a good approach, especially if you want to be able to do queries like "give me all nodes of type X". But for knowing the semantic type of a node when found through a general traversal, I prefer to use the navigational context of the node. For example, if I have a Person-node I know that the node at the other end of a FRIEND-relationship will be a Person-node as well. Or if I have a Car-node I know that the node at the other end of an OWNER-relationship will be either a Person or a Company, both of which probably have enough in common for me to be able to get an address (for sending them the parking ticket or whatever). If I need to know specifically whether it's a Person or a Company, I could use some property for that information (or check the relationship to a type node), but most of the semantic information would be known from how I reached the node.

I have added a note about this to the FAQ in the wiki.

Cheers, Tobias

On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote:
Thanks, Peter. Good info. I think we ended up with a hybrid approach: we modeled a set of Type nodes (related to a master Types node), each of which includes the type metadata (property/type data) for a specific type. Instance nodes then maintain a two-way relationship with their associated Type node, so that any node can quickly obtain its Type node and so that we can easily traverse all instances of a specific type. We may end up extending this such that the properties themselves are each a node of their own, in some cases, where we need to be able to relate/search/traverse at a very detailed level. I suppose that depends on the performance implications of having lots more nodes and relationships. In any case, it definitely seems do-able with Neo.
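The hybrid approach described above (a master Types node, one Type node per type, and a two-way link between each instance and its Type node) can be sketched with a tiny in-memory graph. This is only an illustration under assumptions: GraphNodeSketch and TypeGraphSketch are hypothetical classes, not the Neo4j API, and the relationship names DEFINES, HAS_TYPE and HAS_INSTANCE are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory node with typed outgoing links; NOT the Neo4j Node API.
final class GraphNodeSketch {
    final String name;
    private final Map<String, List<GraphNodeSketch>> out = new HashMap<>();

    GraphNodeSketch(String name) {
        this.name = name;
    }

    void link(String relType, GraphNodeSketch target) {
        List<GraphNodeSketch> targets = out.get(relType);
        if (targets == null) {
            targets = new ArrayList<>();
            out.put(relType, targets);
        }
        targets.add(target);
    }

    List<GraphNodeSketch> follow(String relType) {
        List<GraphNodeSketch> targets = out.get(relType);
        return targets == null ? new ArrayList<GraphNodeSketch>() : targets;
    }
}

final class TypeGraphSketch {
    // Relationship names are assumptions made for this example.
    static final String DEFINES = "DEFINES";
    static final String HAS_TYPE = "HAS_TYPE";
    static final String HAS_INSTANCE = "HAS_INSTANCE";

    final GraphNodeSketch typesRoot = new GraphNodeSketch("Types");

    // Creates a Type node and hangs it off the master Types node.
    GraphNodeSketch defineType(String typeName) {
        GraphNodeSketch type = new GraphNodeSketch(typeName);
        typesRoot.link(DEFINES, type);
        return type;
    }

    // Creates an instance and links it to its Type node in both directions.
    GraphNodeSketch createInstance(String name, GraphNodeSketch type) {
        GraphNodeSketch instance = new GraphNodeSketch(name);
        instance.link(HAS_TYPE, type);
        type.link(HAS_INSTANCE, instance);
        return instance;
    }

    // Returns the instance's Type node, or null if it has none.
    static GraphNodeSketch typeOf(GraphNodeSketch instance) {
        List<GraphNodeSketch> types = instance.follow(HAS_TYPE);
        return types.isEmpty() ? null : types.get(0);
    }

    public static void main(String[] args) {
        TypeGraphSketch graph = new TypeGraphSketch();
        GraphNodeSketch person = graph.defineType("Person");
        GraphNodeSketch alice = graph.createInstance("alice", person);
        graph.createInstance("bob", person);

        System.out.println("type of alice: " + typeOf(alice).name);
        for (GraphNodeSketch n : person.follow(HAS_INSTANCE)) {
            System.out.println("Person instance: " + n.name);
        }
    }
}
```

The two directions serve the two queries named in the message: instance to type is a single hop, and "all nodes of type X" is a traversal from the Type node.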
-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Peter Neubauer
Sent: Tuesday, December 08, 2009 3:25 PM
To: Neo user discussions
Subject: Re: [Neo] Type metadata in properties/nodes

Hi Rick, there are a number of interesting approaches to this, involving both ways to retain the metadata:

1. RDF and OWL - basically, every node will maintain a relationship to its type node (your shadow node), something like x?--RDF:TYPE--type_node, which contains info on what the type is, what properties it has, etc.
2. Neo4j Meta package (http://components.neo4j.org/neo-meta/) - this is the concept of describing the type of things in code (Java in this case) and thus enforcing the restrictions and type conversions on properties in code. This does not capture any meta info in the graph but is easy to do.
3. Annotate the nodes with type info - in this approach, there is a type or classname property on every node that is used to derive the type to deserialize/serialize the object into; the rest of the meta info is contained in the upper code layers. Andreas Ronge's JRuby bindings are using this approach.
4. Encode everything into a String property - this approach means shuffling everything into a string property, basically treating properties as BLOBs. Works in some cases, but certainly locks down your data in these properties.

What is best depends on your domain, and there might be more approaches out there. I sense that you are even asking for an extensible type system, especially on properties. That is not in scope of the core graph engine, and I am not sure whether it would be possible, even in theory, to extend the property type system; we would need to discuss that separately.

Cheers,
/peter neubauer
COO and Sales, Neo Technology
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer
http://www.neo4j.org - Relationships count.
http://gremlin.tinkerpop.com - PageRank in 2 lines of code.

On Tue, Dec 8, 2009 at 8:43 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote:
I can see how relationships could be used to map "is a duck" typing, but I'm struggling with how to infer type from properties. In particular, while anything could be stuffed into a String, it loses important semantics when you do so. I'm not referring to *storage* as a String, which makes plenty of sense - it's that the type identity of the source property is lost if you do so. I could maintain a shadow node of the type metadata that could be related to each instance with a property name/property type array, but that seems like something that would be useful within the node model itself. Types like DateTime, hyperlinks, and so on, while quite easily storable in Neo4J, lose useful semantics on the way in. I'd welcome your thoughts on how others have managed this type of scenario and other techniques for meta tagging nodes and properties with type or other
Re: [Neo] Type metadata in properties/nodes
Hi, Tobias.

Thanks for your thoughts and ideas. My requirement is not only to know the type of something, but also to store metadata for types so that I can catalog the property type of each individual property in a node for a given type. It's a bit complicated, but we are allowing very dynamic declarative types that will not have an explicit compiled Java class wrapper for each type (we will have a generic wrapper that deals with the dynamic types, and some explicit wrappers for pre-defined entities). The main reason is that we need to deal with a few data types beyond the Java primitives and Strings. For example, we want to be able to know contextually that a property is a timestamp or a hyperlink. Thus the need for the extra (but relatively simple) metadata.

It might be useful to identify a commonly used subset of additional property types that correspond to, for example, the most common RDBMS data types and XML Schema types. This might include date, time, datetime, link, and so on. Since at the persistence level it appears that a property is saved along with an integer enumeration of its simple type, perhaps there is an extensibility model that could be implemented to allow these application-specific types to be created and managed. I know that would be problematic, though, given that the current implementation is an enumeration. No worries though, since there are perfectly good workarounds/alternatives using relationships.

Cheers, Rick
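The per-type property catalog discussed in this thread (knowing contextually that a stored long is a timestamp, or that a stored String is a hyperlink) could look roughly like the sketch below. Everything here is an assumption made for illustration: SemanticTypeSketch and PropertyTypeCatalogSketch are hypothetical names, the set of semantic types is invented, and in a graph the catalog contents would presumably live on the Type node rather than in a Java map.

```java
import java.net.URI;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Assumed application-level semantic types layered over primitive storage.
enum SemanticTypeSketch { STRING, DATETIME, LINK }

// Hypothetical per-type catalog: property name -> semantic type.
final class PropertyTypeCatalogSketch {
    private final Map<String, SemanticTypeSketch> propertyTypes = new HashMap<>();

    PropertyTypeCatalogSketch declare(String property, SemanticTypeSketch type) {
        propertyTypes.put(property, type);
        return this;
    }

    // Converts a stored primitive/String value into its semantic form.
    // Undeclared properties are passed through unchanged.
    Object decode(String property, Object stored) {
        SemanticTypeSketch type = propertyTypes.get(property);
        if (type == null) {
            return stored;
        }
        switch (type) {
            case DATETIME:
                // Assumes timestamps are stored as epoch milliseconds in a long.
                return Instant.ofEpochMilli((Long) stored);
            case LINK:
                // Assumes hyperlinks are stored as plain Strings.
                return URI.create((String) stored);
            default:
                return stored;
        }
    }

    public static void main(String[] args) {
        PropertyTypeCatalogSketch articleType = new PropertyTypeCatalogSketch()
                .declare("created", SemanticTypeSketch.DATETIME)
                .declare("homepage", SemanticTypeSketch.LINK);

        System.out.println(articleType.decode("created", 1260316800000L));
        System.out.println(articleType.decode("homepage", "http://www.neo4j.org"));
        System.out.println(articleType.decode("title", "untyped value"));
    }
}
```

The stored values stay within the primitive and String property types, so a catalog like this needs no change to the property type system itself; the semantics are recovered by the generic wrapper at read time.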
Re: [Neo] Type metadata in properties/nodes
I see. I realized that this was what you were after. What I was proposing was that you would know the types of the properties given the type of the node. The types for the nodes in your case would be more abstract, perhaps just defined by the set of properties. I used concrete types in my explanation because it usually helps people understand what I mean by utilizing the navigation context. I had a suspicion that your particular application might not benefit from this approach, but I wanted to throw it into the mix for the sake of completeness of the discussion, since there are a lot more people reading the list than writing in a particular thread.

Cheers, Tobias

On Wed, Dec 9, 2009 at 2:02 PM, rick.bullo...@burningskysoftware.com wrote:

Hi, Tobias. Thanks for your thoughts and ideas. My requirement is not only to know the type of something, but also to store metadata for types so that I can catalog the property type of each individual property in a node for a given type. It's a bit complicated, but we are allowing very dynamic declarative types that will not have an explicit compiled Java class wrapper for each type (we will have a generic wrapper that deals with the dynamic type, and some explicit wrappers for pre-defined entities). The main reason is that we need to deal with a few data types beyond the Java primitives and String(s). For example, we want to be able to know contextually that a property is a timestamp or a hyperlink. Thus the need for the extra (but relatively simple) metadata. It might be useful to identify a commonly used subset of additional property types that correspond to, for example, the most common RDBMS data types and XML schema types. This might include date, time, datetime, link, and so on.
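The per-type property catalog described above could look roughly like the following. This is only a sketch in plain Java under stated assumptions: `TypeCatalogSketch`, `SemanticType`, `TypeDescriptor` and `interpret` are hypothetical names, nothing here is part of the Neo API, and it assumes a timestamp is stored as a long (epoch milliseconds) and a hyperlink as a String.

```java
import java.net.URI;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Neo API.
final class TypeCatalogSketch {

    /** Semantic property types beyond the Java primitives and String. */
    enum SemanticType { STRING, INTEGER, TIMESTAMP, HYPERLINK }

    /** Metadata for one declarative type: property name -> semantic type. */
    static final class TypeDescriptor {
        final String name;
        private final Map<String, SemanticType> properties = new LinkedHashMap<>();

        TypeDescriptor(String name) {
            this.name = name;
        }

        TypeDescriptor declare(String property, SemanticType type) {
            properties.put(property, type);
            return this;
        }

        /** Undeclared properties fall back to STRING (an arbitrary choice for this sketch). */
        SemanticType typeOf(String property) {
            return properties.getOrDefault(property, SemanticType.STRING);
        }
    }

    /** Converts a raw stored value into a richer Java object using the metadata. */
    static Object interpret(TypeDescriptor type, String property, Object raw) {
        switch (type.typeOf(property)) {
            case TIMESTAMP:
                return Instant.ofEpochMilli((Long) raw); // assumed to be stored as a long
            case HYPERLINK:
                return URI.create((String) raw);         // assumed to be stored as a String
            default:
                return raw;                              // usable as stored
        }
    }

    public static void main(String[] args) {
        TypeDescriptor logEntry = new TypeDescriptor("LogEntry")
                .declare("created", SemanticType.TIMESTAMP)
                .declare("source", SemanticType.HYPERLINK);
        System.out.println(interpret(logEntry, "created", 0L));
        System.out.println(interpret(logEntry, "source", "http://neo4j.org/"));
    }
}
```

In a graph, each `TypeDescriptor` would presumably live on a Type node, with the property-to-type map stored as properties of that node or as separate nodes.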
Since at the persistence level it appears that a property is saved along with an integer enumeration of its simple type, perhaps there is an extensibility model that could be implemented to allow these application-specific types to be created and managed. I know that would be problematic, though, given that the current implementation is an enumeration. No worries though, since there are perfectly good workarounds/alternatives using relationships.

Cheers, Rick

Original Message
Subject: Re: [Neo] Type metadata in properties/nodes
From: Tobias Ivarsson tobias.ivars...@neotechnology.com
Date: Wed, December 09, 2009 5:39 am
To: Neo user discussions user@lists.neo4j.org

Associating nodes with a type node is a good approach, especially if you want to be able to do queries like "give me all nodes of type X". But for knowing the semantic type of a node when found through a general traversal, I prefer to use the navigational context of the node. For example, if I have a Person-node, I know that the node at the other end of a FRIEND-relationship will be a Person-node as well. Or if I have a Car-node, I know that the node at the other end of an OWNER-relationship will be either a Person or a Company, both of which probably have enough in common for me to be able to get an address (for sending them the parking ticket or whatever). If I need to know specifically whether it's a Person or a Company, I could use some property for that information (or check the relationship to a type node), but most of the semantic information would be known from how I reached the node. I have added a note about this to the FAQ in the wiki.

Cheers, Tobias

On Tue, Dec 8, 2009 at 10:22 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote:

Thanks, Peter. Good info. I think we ended up with a hybrid approach: we modeled a set of Type nodes (related to a master Types node), each of which includes the type metadata (property/type data) for a specific type.
Instance nodes then maintain a two-way relationship with their associated Type node so that any node can quickly obtain its Type node and so that we can easily traverse all instances of a specific type. We may end up extending this such that the properties themselves are each a node of their own, in some cases, where we need to be able to relate/search/traverse at a very detailed level. I suppose that depends on the performance implications of having lots more nodes and relationships. In any case, it definitely seems doable with Neo.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Peter Neubauer
Sent: Tuesday, December 08, 2009 3:25 PM
To: Neo user discussions
Subject: Re: [Neo] Type metadata in properties/nodes

Hi Rick, there are a number of interesting approaches to this, involving both ways to retain the metadata: 1. RDF and OWL - basically,
[Neo] Noob questions/comments
Hi, all. Here are a few questions and comments that I'd welcome feedback on:

Questions:

- If you delete the reference node (id = 0), how can you recreate it?
- If you have a number of loose or disjoint graphs structured as trees with a single root node, is there a best practice for tracking/iterating only the top-level node(s) of these disjoint graphs? Is relating them to the reference node and doing a first-level traversal the best way?
- We would like to treat our properties as slightly more complex than a simple type (they might have a last-modified date, validity flag, and so on). Given the choice between adding properties to track this state or using nodes and relationships for these entities, what are the pros and cons of each approach?
- One aspect of our application will store nodes that can be considered similar to event logs. There may be many thousands of these nodes per event stream. We would like to be able to traverse the entries in chronological order, very quickly. We were considering the following design possibilities:
  o Simply create a node for each stream and a node for each entry, with a relationship between the stream and the entry, then implement our own sort routine
  o Similar to the above, but create a node for each day, and manage relationships to allow traversal by stream and/or day
  o Create a node for each stream, a node for each entry and treat the entries as a forward-only linked list using relationships between the entries (and of course a relationship between the stream and the first entry)
- Has the fact that the node id is an int rather than a long been an issue in any implementations? Are node ids reused if deleted? (I suspect not, but just wanted to confirm.)
- Any whitepapers/best practices for high-availability/load-balanced scenarios? We were considering using a message queue to send deltas around between nodes or something similar.
- We'll be hosting Neo inside a servlet engine. The plan was to start up Neo within the init method of an autoloading servlet. Any other recommendations/suggestions? Best practice for ensuring a clean shutdown?
- Has anyone used any kind of intermediate index or other approach to bridge multiple Neo instances?
- Any GUI tools for viewing/navigating the graph structure? We are prototyping one in Adobe Flex, curious if there are others.

Comments/observations:

- I love the fact that you can delete nodes and relationships from inside an iterator. I always hated the way I had to separately maintain a list of things to be deleted when traversing XML DOMs, for example. Nice capability!
- Neo seems FAST!
- It's a bit of a major mindset change, but once the lightbulb goes on, the potential seems limitless!

Thanks in advance for guidance.

Rick

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
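The third event-log design option from the message above (a node per stream, a node per entry, entries chained as a forward-only linked list) can be sketched in plain Java as follows. This is a minimal illustration, not Neo code: `EventStreamSketch`, `Stream` and `Entry` are hypothetical names, the object references stand in for relationships, and the extra `last` pointer is an assumption added so that appends do not have to walk the whole chain. It also assumes entries arrive in chronological order.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the linked-list design; in Neo the "first", "last"
// and "next" references would be relationships between nodes.
final class EventStreamSketch {

    static final class Entry {
        final long timestamp;
        final String message;
        Entry next; // plays the role of a relationship to the following entry

        Entry(long timestamp, String message) {
            this.timestamp = timestamp;
            this.message = message;
        }
    }

    static final class Stream {
        final String name;
        Entry first; // stream -> first entry
        Entry last;  // assumed extra pointer so appends avoid walking the chain

        Stream(String name) {
            this.name = name;
        }

        /** Appends in arrival order; assumes entries arrive chronologically. */
        void append(long timestamp, String message) {
            Entry entry = new Entry(timestamp, message);
            if (first == null) {
                first = entry;
            } else {
                last.next = entry;
            }
            last = entry;
        }

        /** Walks the chain from the first entry; no sorting is needed. */
        List<String> messagesInOrder() {
            List<String> result = new ArrayList<>();
            for (Entry e = first; e != null; e = e.next) {
                result.add(e.message);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Stream stream = new Stream("sensor-1");
        stream.append(1L, "started");
        stream.append(2L, "reading taken");
        stream.append(3L, "stopped");
        System.out.println(stream.messagesInOrder());
    }
}
```

The trade-off against the first option is that chronological reads need no sort routine, while random access to a given day still requires walking the chain or adding the per-day nodes from the second option.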
[Neo] Troubleshooting performance/memory issues
Hi, all. When trying to load a few hundred thousand nodes and relationships (chunking it in groups of 1000 nodes or so), we are getting an out-of-memory heap error after 15-20 minutes or so. No big deal, we expanded the heap settings for the JVM. But then we also noticed that the nioneo_logical_log.xxx file was continuing to grow, even though we were wrapping each group of 1000 node inserts in its own transaction (there is no other transaction active) and committing with success and finishing each group of 1000. Periodically (seemingly unrelated to our transactions finishing), that file shrinks again and the data is flushed to the other Neo propertystore and relationshipstore files. I just wanted to check whether that is normal behavior, or whether there is something wrong with the way we (or Neo) are handling the transactions, and thus the reason we hit an out-of-memory error.

Thanks, Rick
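For readers following along, the chunking pattern described above (one transaction per group of 1000 inserts, marked successful and then finished) might be sketched like this. It is a self-contained illustration only: `Tx` is a hypothetical stand-in interface that mirrors the success/finish wording of the message, not the real Neo transaction type, and `insertInChunks` is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ChunkedInsertSketch {

    /** Hypothetical stand-in for a transaction handle (not the real Neo interface). */
    interface Tx {
        void success(); // mark the work as OK to commit
        void finish();  // commit if marked successful, otherwise roll back
    }

    /** Inserts items in chunks, one transaction per chunk; returns the number of transactions used. */
    static <T> int insertInChunks(List<T> items, int chunkSize,
                                  Supplier<Tx> beginTx, Consumer<T> insertOne) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int transactions = 0;
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            Tx tx = beginTx.get();
            try {
                for (T item : items.subList(start, end)) {
                    insertOne.accept(item);
                }
                tx.success();
            } finally {
                tx.finish(); // always runs, so a failed chunk is not left open
            }
            transactions++;
        }
        return transactions;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        List<Integer> stored = new ArrayList<>();
        Tx noOp = new Tx() {
            public void success() { }
            public void finish() { }
        };
        int used = insertInChunks(ids, 1000, () -> noOp, stored::add);
        System.out.println(used + " transactions, " + stored.size() + " items");
    }
}
```

Making the chunk size a parameter keeps it easy to experiment with different values; the sketch says nothing about when the logical log is flushed, which is the open question in the message.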
Re: [Neo] Troubleshooting performance/memory issues
FYI, we experimented with a different heap size (1GB), along with different chunk sizes, and were able to eliminate the heap error and get about a 10X improvement in insert speed. It would be helpful to better understand the interactions of the various Neo startup parameters, transaction buffers, and so on, and their impact on performance. I read the performance guidelines, which were some help, but perhaps some additional scenario-based recommendations might help (frequent updates/frequent access, infrequent updates/frequent access, burst-mode updates vs. a steady update rate, etc.). Learning more about Neo every hour!

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Rick Bullotta
Sent: Wednesday, December 09, 2009 2:57 PM
To: 'Neo user discussions'
Subject: [Neo] Troubleshooting performance/memory issues

Hi, all. When trying to load a few hundred thousand nodes and relationships (chunking it in groups of 1000 nodes or so), we are getting an out-of-memory heap error after 15-20 minutes or so. No big deal, we expanded the heap settings for the JVM. But then we also noticed that the nioneo_logical_log.xxx file was continuing to grow, even though we were wrapping each group of 1000 node inserts in its own transaction (there is no other transaction active) and committing with success and finishing each group of 1000. Periodically (seemingly unrelated to our transactions finishing), that file shrinks again and the data is flushed to the other Neo propertystore and relationshipstore files. I just wanted to check whether that is normal behavior, or whether there is something wrong with the way we (or Neo) are handling the transactions, and thus the reason we hit an out-of-memory error.

Thanks, Rick
[Neo] I/O load in Neo during traversals
When doing some large traversal testing (no writes/updates), I noticed that the neostore.propertystore.db.strings file was seeing a lot of read I/O (as expected) but also a huge amount of write I/O (almost 5X the read I/O rate). Out of curiosity, what is the write activity that needs to occur when doing traversals?