Re: [Neo4j] OutOfMemory while populating large graph
Note that a couple of memory issues are fixed in Lucene 2.9.3: a leak when indexing big documents, and slow reclamation of space from the FieldCache.

Bill

Arijit Mukherjee wrote:
> I have a similar problem. Although I'm not running out of memory yet, I can
> see the heap constantly growing, and JProfiler says most of it is due to
> the Lucene indexing. And even if I commit after every X transactions, once
> the population is finished, the final commit is done, and the graph db
> closed, the heap stays like that - almost full. An explicit gc will clean
> up some of it, but not all.
>
> Arijit
>
> [snip]
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Can you expand on this a bit, as to what the graph internals are doing?

Option 1: You have "colored" relationships (RED, BLUE, GREEN, etc., up to 10k colors). From a random node, you traverse the graph finding all nodes that it is connected to via the PURPLE or FUSIA relationship.

vs.

Option 2: You have a COLOR relationship with a name property that contains the actual color name. From a random node, you traverse the graph finding all nodes that it is connected to via a COLOR relationship whose name property is "PURPLE" or "FUSIA".

For some reason I thought it was more expensive (in terms of traversal time) to look up a property on a relationship than to simply pass a named relationship type.

On Fri, Jul 9, 2010 at 8:45 AM, Johan Svensson wrote:
> Hi,
>
> I would not recommend using large amounts of different (dynamically
> created) relationship types. It is better to use well-defined
> relationship types with an additional property on the relationship
> whenever needed. The limit is actually not 64k but 2^31, but having
> large amounts of relationship types (10k-100k+) will reduce
> performance and consume a lot of memory.
>
> Regards,
> Johan
>
> [snip]
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] OutOfMemory while populating large graph
I confess I had not investigated the batch inserter. From the description it fits my requirements exactly.

With respect to auto-commits, it seems there are two use cases. The first is everyday operations that might run out of memory. In this case it might be nice for neo4j to swap out memory to temporary disk as needed. If this performs acceptably, I think that should be the default behavior. The second case is the initial population of a graph, where there is no need for rollback and so there is no need to commit to a temporary location. In this case, it seems having neo4j decide when to commit would be ideal.

My concern with the first use case is that swapping to temporary storage at ideal intervals may be less efficient than having the user commit to permanent storage at less-than-ideal intervals. If that is the case, then the only real justification for committing to temporary storage would be a requirement to roll back a transaction that was larger than memory could support.

-Paul

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson
Sent: Friday, July 09, 2010 7:30 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] OutOfMemory while populating large graph

[snip]
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Hi,

I would not recommend using large amounts of different (dynamically created) relationship types. It is better to use well-defined relationship types with an additional property on the relationship whenever needed. The limit is actually not 64k but 2^31, but having large amounts of relationship types (10k-100k+) will reduce performance and consume a lot of memory.

Regards,
Johan

On Thu, Jul 8, 2010 at 4:13 PM, Max De Marzi Jr. wrote:
> Can somebody verify the max number of relationship types? If it is 64k, is
> there a way to increase it without significant effort?
>
>> I believe you can have something like 64k
>> relationship types, so using the relationship type for the route name is
>> possible.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
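[Editor's sketch] Johan's advice — one well-defined relationship type plus a property, rather than thousands of dynamically created types — can be illustrated with a tiny, self-contained example. The RelType enum and Edge class below are stand-ins invented for the sketch, not the Neo4j API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColorEdges {
    // One well-defined relationship type instead of 10k dynamically created ones
    enum RelType { COLOR }

    // Stand-in for a relationship carrying properties; NOT the Neo4j API
    static class Edge {
        final RelType type;
        final Map<String, Object> props = new HashMap<String, Object>();
        Edge(RelType type, String colorName) {
            this.type = type;
            props.put("name", colorName); // the color lives in a property
        }
    }

    // Filter by property value on the single COLOR type, instead of by
    // one-relationship-type-per-color
    static List<Edge> withColor(List<Edge> edges, String... names) {
        List<Edge> out = new ArrayList<Edge>();
        for (Edge e : edges) {
            for (String n : names) {
                if (e.type == RelType.COLOR && n.equals(e.props.get("name"))) {
                    out.add(e);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
            new Edge(RelType.COLOR, "PURPLE"),
            new Edge(RelType.COLOR, "RED"),
            new Edge(RelType.COLOR, "FUSIA"));
        System.out.println(withColor(edges, "PURPLE", "FUSIA").size()); // prints 2
    }
}
```

With the real API this corresponds to a single RelationshipType (COLOR) and a relationship.getProperty("name") check inside the traversal, instead of one RelationshipType per color.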
Re: [Neo4j] OutOfMemory while populating large graph
Short answer is "maybe". ;-)

There are some cases where the transaction is an "all or nothing" scenario, others where incremental commits are OK. Having the ability to do incremental auto-commits would be useful, however. In a perfect world, it could be based on a "bucket" (e.g. every XXX operations), a time interval (every 30 seconds), or a memory-usage rule.

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson
Sent: Friday, July 09, 2010 7:30 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] OutOfMemory while populating large graph

[snip]
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Is it possible to count common nodes when traversing?
Sorry, it should be:

for ( Node currentNode : Traversal.description()
        .breadthFirst().uniqueness( Uniqueness.RELATIONSHIP_GLOBAL )
        .relationships( MyRelationships.SIMILAR )
        .relationships( MyRelationships.CATEGORY )
        .prune( Traversal.pruneAfterDepth( 2 ) ).traverse( node ) ) {

2010/7/9 Mattias Persson

> Just to notify you guys on this... as of now (r4717) the
> TraversalFactory class is named Traversal instead, so the code would
> look like:
>
> for ( Node currentNode : TraversalFactory.description()
>         .breadthFirst().uniqueness( Uniqueness.RELATIONSHIP_GLOBAL )
>         .relationships( MyRelationships.SIMILAR )
>         .relationships( MyRelationships.CATEGORY )
>         .prune( TraversalFactory.pruneAfterDepth( 2 ) ).traverse( node ) ) {
>
> [snip]
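[Editor's sketch] The counter-wrapper idea discussed in this thread — a mutable int holder so the map value is incremented in place, instead of allocating a new boxed Integer per hit — shown self-contained and independent of the Neo4j types:

```java
import java.util.HashMap;
import java.util.Map;

public class CounterDemo {
    // Mutable counter wrapper: one object per key, incremented in place
    static class Counter {
        public int count = 1; // starts at 1, as suggested
    }

    static Map<String, Counter> countCategories(String[] categories) {
        Map<String, Counter> result = new HashMap<String, Counter>();
        for (String cat : categories) {
            Counter c = result.get(cat);
            if (c == null) {
                result.put(cat, new Counter()); // first occurrence counts as 1
            } else {
                ++c.count; // no new object, no autoboxing
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Counter> r = countCategories(
            new String[] { "category 2", "category 3", "category 2" });
        System.out.println(r.get("category 2").count); // prints 2
        System.out.println(r.get("category 3").count); // prints 1
    }
}
```

The same pattern drops into the ReturnableEvaluator above, keyed on the catId property value instead of plain strings.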
Re: [Neo4j] OutOfMemory while populating large graph
I have a similar problem. Although I'm not running out of memory yet, I can see the heap constantly growing, and JProfiler says most of it is due to the Lucene indexing. And even if I commit after every X transactions, once the population is finished, the final commit is done, and the graph db closed, the heap stays like that - almost full. An explicit gc will clean up some of it, but not all.

Arijit

On 9 July 2010 17:00, Mattias Persson wrote:
> 2010/7/9 Marko Rodriguez
>
> [snip]
>
> No it doesn't, I just wrote stuff which I thought someone could find
> useful. A cool thing with just telling it to do a batch-insert-mode
> transaction (not the actual commit interval) is that it could look at
> how much memory it had to play around with and commit whenever it would
> be most efficient, even having the ability to change the limit on the
> fly if the memory suddenly ran out.
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technology
> www.neotechnology.com

--
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Is it possible to count common nodes when traversing?
Just to notify you guys on this... as of now (r4717) the TraversalFactory class is named Traversal instead, so the code would look like:

for ( Node currentNode : TraversalFactory.description()
        .breadthFirst().uniqueness( Uniqueness.RELATIONSHIP_GLOBAL )
        .relationships( MyRelationships.SIMILAR )
        .relationships( MyRelationships.CATEGORY )
        .prune( TraversalFactory.pruneAfterDepth( 2 ) ).traverse( node ) ) {

2010/7/8 Mattias Persson

> Your problem is that a node can't be visited more than once in a
> traversal, right? Have you looked at the new traversal framework in
> 1.1-SNAPSHOT? It solves that problem in that you can specify uniqueness
> for the traverser... you can instead say that each Relationship can't be
> visited more than once, but Nodes can. Your example:
>
> Map<Node, Integer> result = new HashMap<Node, Integer>();
> for ( Node currentNode : TraversalFactory.createTraversalDescription()
>         .breadthFirst().uniqueness( Uniqueness.RELATIONSHIP_GLOBAL )
>         .relationships( MyRelationships.SIMILAR )
>         .relationships( MyRelationships.CATEGORY )
>         .prune( TraversalFactory.pruneAfterDepth( 2 ) ).traverse( node ) ) {
>     if ( currentNode.hasProperty( "category" ) ) {
>         if ( result.get( currentNode ) == null ) {
>             result.put( currentNode, 1 );
>         } else {
>             result.put( currentNode, result.get( currentNode ) + 1 );
>         }
>     }
> }
>
> 2010/7/8 Rick Bullotta
>
>> A performance improvement might be achieved by minimizing object
>> creation/hash inserts using a "counter" wrapper.
>>
>> - Create a simple class "Counter" with a single public field "count" of
>> type int (not Integer) with an initial value of 1
>>
>> - Tweak your code to something like:
>>
>> public Map<String, Counter> findCategoriesForWord(String word) {
>>     final Node node = index.getSingleNode("word", word);
>>     final Map<String, Counter> result = new HashMap<String, Counter>();
>>     if (node != null) {
>>         Traverser traverserWords = node.traverse(Traverser.Order.BREADTH_FIRST,
>>             StopEvaluator.DEPTH_ONE, new ReturnableEvaluator() {
>>                 @Override
>>                 public boolean isReturnableNode(TraversalPosition traversalPosition) {
>>                     final Node currentNode = traversalPosition.currentNode();
>>                     final Iterator<Relationship> relationshipIterator =
>>                         currentNode.getRelationships(MyRelationships.CATEGORY).iterator();
>>                     while (relationshipIterator.hasNext()) {
>>                         final Relationship relationship = relationshipIterator.next();
>>                         final String categoryName = (String)
>>                             relationship.getProperty("catId");
>>                         Counter counter = result.get(categoryName);
>>                         if (counter == null) {
>>                             result.put(categoryName, new Counter());
>>                         } else {
>>                             ++counter.count;
>>                         }
>>                     }
>>                     return true;
>>                 }
>>             }, MyRelationships.SIMILAR, Direction.BOTH);
>>         traverserWords.getAllNodes();
>>     }
>>     return result;
>> }
>>
>> -Original Message-
>> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
>> On Behalf Of Java Programmer
>> Sent: Thursday, July 08, 2010 8:12 AM
>> To: Neo4j user discussions
>> Subject: Re: [Neo4j] Is it possible to count common nodes when traversing?
>>
>> Hi,
>> Thanks for your answer, but it's not exactly what I had in mind -
>> a word can belong to several categories, and different words can share
>> the same category, e.g.:
>>
>> "word 1" : "category 1", "category 2", "category 3"
>> "word 2" : "category 2", "category 3"
>> "word 3" : "category 3"
>>
>> There is a SIMILAR relation between "word 1" and "word 2" and between
>> "word 2" and "word 3".
>>
>> As a result, when querying for "word 1" with depth 1, I would like to get:
>> "category 1" -> 1, "category 2" -> 2, "category 3" -> 2 (not 3,
>> because it's out of depth)
>>
>> So far I have changed the previous method to use a relationship with a
>> categoryId property, but I don't know if there won't be performance
>> issues (I iterate over all relationships of the found node (every
>> SIMILAR) and store the categories in a Map). If you could look at it
>> and tell me if the way of thinking is good, I would be very appreciative:
>>
>> public Map<String, Integer> findCategoriesForWord(String word) {
>>     final Node node = index.getSingleNode("word", word);
>>     final Map<String, Integer> result = new HashMap<String, Integer>();
>>     if (node != null) {
>>         Traverser traverserWords = node.traverse(Traverser.Order.BREADTH_FIRST,
>>             StopEvaluator.DEPTH_ONE, new ReturnableEvaluator() {
>>                 @Override
>>                 public boolean isReturnableNod
Re: [Neo4j] New tentative API in trunk: Expander/Expansion
Thanks for all the input, guys!

As of revision 4717 these methods no longer exist in trunk, since this was just a tentative API. I will continue experimenting with this API in a branch, and it will likely make it back into the core API in a later release.

Cheers,
Tobias

--
Tobias Ivarsson
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] OutOfMemory while populating large graph
2010/7/9 Marko Rodriguez

> Hi,
>
> > Would it actually be worth something to be able to begin a transaction
> > which auto-commits stuff every X write operation, like a batch inserter
> > mode which can be used in a normal EmbeddedGraphDatabase? Kind of like:
> >
> >    graphDb.beginTx( Mode.BATCH_INSERT )
> >
> > ...so that you can start such a transaction and then just insert data
> > without having to care about restarting it now and then?
>
> That's cool! Does that already exist? In my code (like others on the list,
> it seems) I have a counter++ that every 20,000 inserts (some made-up
> number that is not going to throw an OutOfMemory) commits and then reopens
> a new transaction. Sorta sux.

No it doesn't, I just wrote stuff which I thought someone could find useful. A cool thing with just telling it to do a batch-insert-mode transaction (not the actual commit interval) is that it could look at how much memory it had to play around with and commit whenever it would be most efficient, even having the ability to change the limit on the fly if the memory suddenly ran out.

> Thanks,
> Marko.

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Can I use neo4j for this?
On Fri, Jul 9, 2010 at 12:33 PM, wrote:
> Dear all,

Dear you,

> I'm completely new to neo4j (and don't even really speak Java),

I'm a sysadmin with poor object-programming skills. It wasn't a problem to use neo4j, as the API is simple and clear. (No "enterprisy" factoryFactory.Proxy.Processor(thingy.stuff()). Just simple objects.)

> but I have been struggling in vain for quite a while to get sensible
> performance on my graph-data in MySQL and PostgreSQL. From your webpage and
> other posts on the lists I got the great feeling that newbies are welcome
> here,

Friendly greetings \o/

> so I hope it is all right if I tell you something about my data and what I
> want to know about it so that maybe someone can tell whether I can actually
> do this with neo4j.

Yes, you can! (c)(r)(tm)

> My data is about 250 million separate graphs with a grand total of about 5
> billion nodes.
> - The graphs are of a tree-like structure (many are actual trees, but not
> all of them).
> - Every graph has an id.
> - Every node has 4 properties:

It should work :)

> 1. name (some names are very common, many occur only once or twice)
> 2. category1 (there are about 40 different categories on this level)
> 3. name-group (John, Jon, Jonathan form one group, many of the names that
> occur only once get their own name group)
> 4. category2 (there are about 10 different categories on this level)
> - Every edge has one or two properties
> 1. type (currently about 50 different ones) [obligatory]
> 2. attribute [only there for 3 types; about 10 per cent of all edges]

Please note that you can have many edges per node and many "types" of edge. Edges can have many properties. So no problems here.

> [snip]
> If no highlighting is done, we just return the ids.
> If highlighting was done, let's say on n4.name, then I want all names that
> occur in this position of any graph.

I don't understand the "highlight" thing.

[snip]

> I hope I managed to make myself understood. If not, I am happy to draw some
> graphs and upload them somewhere.
>
> I know that I will need indices on the name and name-family properties. Not
> sure how well they would perform on the less selective properties, though.

Lucene (the preferred index engine for neo4j) is very powerful :)

[snip]

> Performance: For many queries of the type outlined above, I have to wait
> for more than two minutes on my SMALL dataset (6 million graphs, 100
> million nodes, 87 million edges) via PostgreSQL. For some it is more like
> 10 or 20 minutes... I would prefer not to have to wait for more than 5
> seconds on the small dataset and 20 or 30 on the big dataset.

I can try to help with PostgreSQL (I'm a PostgreSQL DBA), but, imho, you should try with neo4j :) (And I'd love to hear your feedback about graphs in pgsql vs graphs in neo4j.)

I have a question: you say "250 million separate graphs". Are the 250 million graphs totally independent? (Okay, 250 million shards are probably overkill but... well... just wondering :) )

*hugs*
--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] OutOfMemory while populating large graph
Hi,

> Would it actually be worth something to be able to begin a transaction
> which auto-commits stuff every X write operation, like a batch inserter
> mode which can be used in a normal EmbeddedGraphDatabase? Kind of like:
>
>    graphDb.beginTx( Mode.BATCH_INSERT )
>
> ...so that you can start such a transaction and then just insert data
> without having to care about restarting it now and then?

That's cool! Does that already exist? In my code (like others on the list, it seems) I have a counter++ that every 20,000 inserts (some made-up number that is not going to throw an OutOfMemory) commits and then reopens a new transaction. Sorta sux.

Thanks,
Marko.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
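[Editor's sketch] The commit-every-N pattern Marko describes can be shown end-to-end. TxDb below is a stub invented for the example (real code would use org.neo4j.graphdb.Transaction from graphDb.beginTx()); the control flow is the point:

```java
public class BatchCommit {
    interface Tx {
        void success();
        void finish();
    }

    // Stub standing in for EmbeddedGraphDatabase: counts committed transactions
    static class TxDb {
        int commits = 0;
        Tx beginTx() {
            return new Tx() {
                private boolean ok = false;
                public void success() { ok = true; }
                public void finish() { if (ok) commits++; } // commit only if marked successful
            };
        }
    }

    // Perform totalInserts write operations, committing every batchSize of them
    static int insertAll(TxDb db, int totalInserts, int batchSize) {
        Tx tx = db.beginTx();
        try {
            for (int i = 1; i <= totalInserts; i++) {
                // ... create a node / relationship here ...
                if (i % batchSize == 0) {
                    tx.success();
                    tx.finish();       // commit this batch
                    tx = db.beginTx(); // and reopen a fresh transaction
                }
            }
            tx.success(); // commit whatever remains
        } finally {
            tx.finish();
        }
        return db.commits;
    }

    public static void main(String[] args) {
        TxDb db = new TxDb();
        System.out.println(insertAll(db, 50000, 20000)); // prints 3
    }
}
```

Against the real 1.x embedded API the shape is the same: tx.success() and tx.finish() on each batch, then a fresh graphDb.beginTx().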
Re: [Neo4j] OutOfMemory while populating large graph
Modifications in a transaction are kept in memory so that there's the ability to roll back the transaction completely if something goes wrong. There could of course be a solution where (I'm just speculating here) a tx that gets big enough is converted into its own graph database or some other on-disk data structure, which would then be merged into the main database on commit.

Would it actually be worth something to be able to begin a transaction which auto-commits stuff every X write operation, like a batch inserter mode which can be used in a normal EmbeddedGraphDatabase? Kind of like:

    graphDb.beginTx( Mode.BATCH_INSERT )

...so that you can start such a transaction and then just insert data without having to care about restarting it now and then?

Another view of this is that such big transactions (I'm assuming here) are only really used for a first-time insertion of a big data set, where the BatchInserter can be used and does exactly that... it flushes to disk whenever it feels like, and you can just go on feeding it more and more data.

2010/7/8 Rick Bullotta

> Paul, I also would like to see automatic swapping/paging to disk as part
> of Neo4J, minimally when in "bulk insert" mode... and ideally in every
> usage scenario. I don't fully understand why the in-memory logs get so
> large and/or aren't backed by the on-disk log, or if they are, why they
> need to be kept in memory as well. Perhaps it isn't the transaction
> "stuff" that is taking up memory, but the graph itself?
>
> Can any of the Neo team help provide some insight?
>
> Thanks!
>
> -Original Message-
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> On Behalf Of Paul A. Jackson
> Sent: Thursday, July 08, 2010 1:35 PM
> To: (User@lists.neo4j.org)
> Subject: [Neo4j] OutOfMemory while populating large graph
>
> I have seen people discuss committing transactions after some microbatch
> of a few hundred records, but I thought this was optional. I thought Neo4J
> would automatically write out to disk as memory became full. Well, I
> encountered an OOM and want to make sure that I understand the reason. Was
> my understanding incorrect, or is there a parameter that I need to set to
> some limit, or is the problem that I am indexing as I go? The stack trace,
> FWIW, is:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.HashMap.<init>(HashMap.java:209)
>     at java.util.HashSet.<init>(HashSet.java:86)
>     at org.neo4j.index.lucene.LuceneTransaction$TxCache.add(LuceneTransaction.java:334)
>     at org.neo4j.index.lucene.LuceneTransaction.insert(LuceneTransaction.java:93)
>     at org.neo4j.index.lucene.LuceneTransaction.index(LuceneTransaction.java:59)
>     at org.neo4j.index.lucene.LuceneXaConnection.index(LuceneXaConnection.java:94)
>     at org.neo4j.index.lucene.LuceneIndexService.indexThisTx(LuceneIndexService.java:220)
>     at org.neo4j.index.impl.GenericIndexService.index(GenericIndexService.java:54)
>     at org.neo4j.index.lucene.LuceneIndexService.index(LuceneIndexService.java:209)
>     at JiraLoader$JiraExtractor$Item.setNodeProperty(JiraLoader.java:321)
>     at JiraLoader$JiraExtractor$Item.updateGraph(JiraLoader.java:240)
>
> Thanks,
> Paul Jackson

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Can I use neo4j for this?
Dear all, I'm completely new to neo4j (and don't even really speak Java), but I have been struggling in vain for quite a while to get sensible performance on my graph-data in MySQL and PostgreSQL. From your webpage and other posts on the lists I got the great feeling that newbies are welcome here, so I hope it is all right if I tell you something about my data and what I want to know about it so that maybe someone can tell whether I can actually do this with neo4j. My data is about 250 million separate graphs with a grand total of about 5 billion nodes. - The graphs are of a tree-like structure (many are actual trees, but not all of them). - Every graph has an id. - Every node has 4 properties: 1. name (some names are very common, many occur only once or twice) 2. category1 (there are about 40 different categories on this level) 3. name-group (John, Jon, Jonathan form one group, many of the names that occur only once get their own name group) 4. category2 (there are about 10 different categories on this level) - Every edge has one or two properties 1. type (currently about 50 different ones) [obligatory] 2. attribute [only there for 3 types; about 10 per cent of all edges] There are two sorts of questions I want to be able to answer: 1. The user specifies a subgraph (currently even a subtree, but not sure whether it will remain that way) and wants the ids of all matching graphs. 2. The user specifies a subgraph and highlights one position he didn't fill in. As a result, he wants a list of all items that occur in this position ordered by their frequency in this position. 
Examples of queries (sorry for the weird format, but I have no idea how to represent a tree in text):

EXAMPLE 1:
relations:
  n1 > n2 (relation type: t1)
  n1 > n3 (relation type: t2)
  n3 > n4 (relation type: t3)
  n3 > n5 (relation type: t3)
  n3 > n6 (relation type not specified, just has to exist)
node properties:
  n1: name-group: John-like; category2: c2-13
  n2: [no properties specified, just has to exist]
  n3: category1: c1-15
  n4: [no properties specified, just has to exist]
  n5: [no properties specified, just has to exist]
  n6: name: Ben; category2: c2-13
If no highlighting is done, we just return the ids. If highlighting was done, let's say on n4.name, then I want all names that occur in this position of any graph.

EXAMPLE 2:
relations:
  n1 > n2 (relation type: t1)
  n1 > n3 (relation type: t2)
  n3 > n4 (relation type: t3)
no node properties specified.

I hope I managed to make myself understood. If not, I am happy to draw some graphs and upload them somewhere.

I know that I will need indices on the name and name-group properties. Not sure how well they would perform on the less selective properties, though.

Basically, my problem is similar to the one found here: http://lists.neo4j.org/pipermail/user/2009-June/001331.html

But what makes me worry is a quote from here: http://components.neo4j.org/neo4j-graph-matching/ -- "The pattern matching is done by first defining a graph pattern and then searching for matching occurrences of that pattern in the graph around a given anchor node." I do not necessarily have an anchor node. And I have lots of graphs...

Performance: for many queries of the type outlined above, I have to wait for more than two minutes on my SMALL dataset (6 million graphs, 100 million nodes, 87 million edges) via PostgreSQL. For some it is more like 10 or 20 minutes... I would prefer not to have to wait for more than 5 seconds on the small dataset and 20 or 30 on the big dataset.

Sorry for the lengthy email, and I'm looking forward to your replies!
Best regards,
Jonathan

--
GMX DSL: Internet, phone and mobile flat rate from EUR 19.99/month. Up to EUR 150 starting credit included! http://portal.gmx.net/de/go/dsl
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
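[Editor's note: the anchorless subtree matching Jonathan asks about can be prototyped independently of the neo4j-graph-matching component. Below is a minimal in-memory sketch; every class and method name (`Node`, `PNode`, `matches`) is made up for illustration and is not Neo4j API. It checks required properties on each node and maps every pattern edge to a distinct outgoing data edge by backtracking, so a pattern like EXAMPLE 1's two parallel t3 edges is handled correctly.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of anchorless subtree matching over a tiny in-memory graph. */
public class SubtreeMatch {
    static class Node {
        final Map<String,String> props = new HashMap<>();
        final List<Edge> out = new ArrayList<>();
        Node prop(String k, String v) { props.put(k, v); return this; }
        Node edge(String type, Node to) { out.add(new Edge(type, to)); return this; }
    }
    record Edge(String type, Node to) {}

    static class PNode {
        final Map<String,String> props = new HashMap<>();  // required properties
        final List<PEdge> out = new ArrayList<>();
        PNode prop(String k, String v) { props.put(k, v); return this; }
        PNode edge(String type, PNode to) { out.add(new PEdge(type, to)); return this; }
    }
    record PEdge(String type, PNode to) {}  // type == null means "any type"

    /** A node matches a pattern node if all required properties agree and
        every pattern edge maps to a distinct outgoing data edge. */
    static boolean matches(Node n, PNode p) {
        for (Map.Entry<String,String> e : p.props.entrySet())
            if (!e.getValue().equals(n.props.get(e.getKey()))) return false;
        return assign(n.out, p.out, 0, new boolean[n.out.size()]);
    }

    // Backtracking assignment of pattern edges to distinct data edges.
    static boolean assign(List<Edge> edges, List<PEdge> pats, int i, boolean[] used) {
        if (i == pats.size()) return true;
        PEdge p = pats.get(i);
        for (int j = 0; j < edges.size(); j++) {
            if (used[j]) continue;
            Edge e = edges.get(j);
            if (p.type() != null && !p.type().equals(e.type())) continue;
            if (!matches(e.to(), p.to())) continue;
            used[j] = true;
            if (assign(edges, pats, i + 1, used)) return true;
            used[j] = false;
        }
        return false;
    }

    /** EXAMPLE 2 from the mail: n1-t1->n2, n1-t2->n3, n3-t3->n4, no properties. */
    static PNode example2() {
        PNode n1 = new PNode(), n2 = new PNode(), n3 = new PNode(), n4 = new PNode();
        n1.edge("t1", n2).edge("t2", n3);
        n3.edge("t3", n4);
        return n1;
    }

    public static void main(String[] args) {
        Node n4 = new Node().prop("name", "Ben");
        Node n3 = new Node().edge("t3", n4);
        Node n2 = new Node();
        Node n1 = new Node().edge("t1", n2).edge("t2", n3);
        System.out.println(matches(n1, example2()));   // true
        System.out.println(matches(n2, example2()));   // false: n2 has no edges
    }
}
```

Answering question 1 ("ids of all matching graphs") would then mean running `matches` against each graph's candidate root nodes, which is exactly where an index on a selective property like name or name-group would be used to cut down the candidates.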
Re: [Neo4j] How to traverse by the number of relationships between nodes?
Hi Craig,

That's great, thanks a lot. I'll give it a go.

Cheers,
Tim

- Original Message
From: Craig Taverner
To: Neo4j user discussions
Sent: Thu, July 8, 2010 8:49:38 PM
Subject: Re: [Neo4j] How to traverse by the number of relationships between nodes?

Hi Tim,

It is exactly the same approach, but instead of building the route cache while loading the graph, just do it on a second pass which traverses the graph. If the graph is structured like you describe, then write a traverser that visits each Visit node once, and for each Visit node iterate over the Page relationships, creating an array of relationships. Sort the array by your visit-order property and you have the route cache. Step backwards through the route, creating the route relationships as described before.

Cheers,
Craig

On Thu, Jul 8, 2010 at 5:33 PM, Tim Jones wrote:
> Hi Craig, thanks for your answer.
>
> What's your approach that would allow me to specify the destination node at analysis time? I'd like to retain the flexibility to do this too.
>
> Thanks
> Tim
>
> - Original Message
> From: Craig Taverner
> To: Neo4j user discussions
> Sent: Thu, July 8, 2010 12:09:28 PM
> Subject: Re: [Neo4j] How to traverse by the number of relationships between nodes?
>
> Even without the new traversal framework, the returnable evaluator has access to the current node being evaluated and can investigate its relationships (and even run another traverser). I'm not sure if nested traversing is a good idea, but I certainly have used methods like getRelationships inside an evaluator with no problems.
>
> As for the main goal, I think there are many ways to skin a cat. For performance reasons I would always look for the way that embeds the final result in the graph structure itself, so you don't need complex traversals to get your answer.
> So in your case, since you want the 10 most popular routes, I guess what you are looking for are relationships between pages that define a route and a popularity score. So the final answer would be found by simply sorting these relationships to the destination page by popularity. No traversal required :-)
>
> Your current structure is a good match for the incoming data, but requires lots of traversing to determine the main answer you are after. So I would vote for adding a new structure that includes the answer. I think I have an idea that can be done during load if you know in advance the destination node you want to analyse, as well as after load (second pass) if you want to specify the destination node only at analysis time. I'll describe the 'during load' approach.
>
> Load the apache log data, optionally building the structure you do now, but also identifying all start points and routes to the destination. This can be achieved with an in-memory cache, for each user session (visit), of the route from the entry point, appended to as each new page is visited (just an ArrayList of page Nodes, growing page-by-page). When the destination Page is reached, create a unique identifier for that route (e.g. a string of all node-ids in the route, or the hashcode of that). Then step back along all nodes in the route, adding relationships with DynamicRelationshipType.withName("ROUTE-"+routeName) and property count=1; if the relationship already exists for that name, increment the count.
>
> You can even load later apache logs to this and it will continue to increment the route counters nicely. And to reset the counters, just delete all those route relationships.
>
> Now the final answer for your query is only to iterate over all incoming relationships to the destination page, and if the relationship type name starts with 'ROUTE-' add it to an ArrayList of relationships, and then sort that list by the counter property.
> This should give an almost instantaneous result :-)
>
> Of course, this algorithm assumes that the total number of possible routes is not unreasonably high. I believe you can have something like 64k relationship types, so using the relationship type for the route name is possible. If you are uncomfortable with that, just use a static type like 'ROUTE', and put the route name in a relationship property. That slightly increases the complexity of the test for the route during creation and slightly decreases the complexity of the test for the route during the final scoring. For this example, the performance difference is insignificant.
>
> Cheers, Craig
>
> On Thu, Jul 8, 2010 at 10:57 AM, Anders Nawroth wrote:
> > Hi Tim!
> >
> > Maybe you can use the new traversal framework; this interface comes to mind:
> > http://components.neo4j.org/neo4j-kernel/apidocs/org/neo4j/graphdb/traversal/SourceSelector.html
> >
> > Regarding the number of relationships, it could be a good idea to store it as a property on the node.
> >
> > /anders
> >
> > > Is there any way I can write a ReturnableEva
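[Editor's note: Craig's scheme — a unique route key per path to the destination, plus an incremented counter — can be sketched without the Neo4j API. Below, the ROUTE-* relationships with a count property are simulated by a plain map keyed on the route string, and sessions are just visit-ordered lists of page ids; all names (`RouteCounter`, `addSession`, `topRoutes`) are illustrative, not part of any library.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of route counting: each route to the destination gets a unique
    key built from the page ids, and a counter incremented per session. */
public class RouteCounter {
    private final Map<String,Integer> routeCounts = new HashMap<>();
    private final String destination;

    public RouteCounter(String destination) { this.destination = destination; }

    /** One user session: a visit-ordered list of page ids. */
    public void addSession(List<String> pages) {
        int end = pages.indexOf(destination);
        if (end < 0) return;                               // destination never reached
        List<String> route = pages.subList(0, end + 1);
        String key = "ROUTE-" + String.join(">", route);   // unique route identifier
        routeCounts.merge(key, 1, Integer::sum);           // "increment the count"
    }

    /** The final answer: routes sorted by popularity, most popular first. */
    public List<Map.Entry<String,Integer>> topRoutes(int n) {
        List<Map.Entry<String,Integer>> all = new ArrayList<>(routeCounts.entrySet());
        all.sort((a, b) -> b.getValue() - a.getValue());
        return all.subList(0, Math.min(n, all.size()));
    }

    public static void main(String[] args) {
        RouteCounter rc = new RouteCounter("checkout");
        rc.addSession(List.of("home", "cart", "checkout"));
        rc.addSession(List.of("home", "cart", "checkout"));
        rc.addSession(List.of("search", "item", "cart", "checkout"));
        System.out.println(rc.topRoutes(1).get(0)); // ROUTE-home>cart>checkout=2
    }
}
```

In the graph version, `routeCounts` becomes the incoming ROUTE-* relationships on the destination Page node, so "top 10 routes" is just a scan-and-sort of those relationships, exactly as described above.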
Re: [Neo4j] Write Neo4j Books - Packt Publishing
On Fri, Jul 9, 2010 at 9:47 AM, Kshipra Singh wrote:
> Hi All,
>
> I represent Packt Publishing, the publishers of computer-related books.
>
> We are planning to extend our range of Open Source books based on Java technology and are currently inviting authors interested in writing them. This doesn't require any past writing experience. All that we expect from our authors is good subject knowledge, a passion to share it with others and an ability to communicate clearly in English.
>
> So, if you love Neo4j and fancy writing a book, here's an opportunity for you! Send us your book ideas at aut...@packtpub.com and our editorial team will be happy to evaluate them. Even if you don't have a book idea and are simply interested in writing a book, we are still keen to hear from you.

I'll happily buy a Neo4j book. (I won't write it :D )

(Funny enough, I never heard about PacktPub until yesterday, when I bought a book on Solr.)

--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Write Neo4j Books - Packt Publishing
Hi All,

I represent Packt Publishing, the publishers of computer-related books.

We are planning to extend our range of Open Source books based on Java technology and are currently inviting authors interested in writing them. This doesn't require any past writing experience. All that we expect from our authors is good subject knowledge, a passion to share it with others and an ability to communicate clearly in English.

So, if you love Neo4j and fancy writing a book, here's an opportunity for you! Send us your book ideas at aut...@packtpub.com and our editorial team will be happy to evaluate them. Even if you don't have a book idea and are simply interested in writing a book, we are still keen to hear from you.

Packt runs an Open Source royalty scheme, so by writing for Packt you will be giving back to the Open Source community. More details about this opportunity are available at: http://authors.packtpub.com/content/calling-open-source-java-based-technology-enthusiasts-write-packt

Thanks
Kshipra Singh
Author Relationship Manager
Packt Publishing
www.PacktPub.com
Skype: kshiprasingh15
Twitter: http://twitter.com/packtauthors

Interested in becoming an author? Visit http://authors.packtpub.com for all the information you need about writing for Packt.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user