Re: [Neo] How to efficiently query in Neo4J?
Hi there, as both Rick and Michael say - there is different use cases, different ways of organizing your data and different ways to query it. Marko and me tried to summarize what is working especially good with Graph Databases and what not: http://markorodriguez.com/Blarko/Entries/2010/4/7_The_Graph_Traversal_Pattern.html, the paper is even at http://www.scribd.com/doc/29591188/The-Graph-Traversal-Pattern for easy reading. To cite the conclusion: --8 4 Conclusion Graphs are a flexible modeling construct that can be used to model a domain and the indices that partition that domain into an efficient, searchable space. When the relations between the objects of the domain are seen as vertex partitions, then a graph is simply an index that relates vertices to vertices by edges. The way in which these vertices relate to each other determines which graph traversals are most efficient to execute and which problems can be solved by the graph data structure. Graph databases and the graph traversal pattern do not require a global analysis of data. For many problems, only local subsets of the graph need to be traversed to yield a solution. By structuring the graph in such a way as to minimize traversal steps, limit the use of external indices, and reduce the number of set-based operations, modelers gain great efficiency that is difficult to accomplish with other data management solutions. ---8 So, graphs are no silver bullets but can gain great efficiency when applied to the right problems in the right manner. Cheers, /peter neubauer COO and Sales, Neo Technology GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://nosqleu.org- The biggest NOSQL event. Ever. http://www.thoughtmade.com - Scandinavias coolest Bring-a-Thing party. On Fri, Apr 9, 2010 at 12:33 AM, Michael Ludwig mil...@gmx.de wrote: rick.bullotta schrieb am 08.04.2010 um 15:16:11 (-0700) [Re: [Neo] How to efficiently query in Neo4J?]: Factor in a wide range of SLAs needed for performance vs availability vs affordability vs scalability vs adminstration costs, and the equation gets a whole lot more complicated. Granted. I'm sure there's a graphy-model for the tag/post example that could be made smoking fast with Neo also. Sure, but there's also a way of looking at screws that might suggest you should use a hammer ;-) and it would be wrong. Which doesn't mean it couldn't be modeled for the tag/post example - just a general caveat to think about both tools and problems when trying to find a good solution. Throw columnar storage, key-value, and document DB's into the mix, and the good news is that we have a lot of weapons in our arsenal now to tackle very demanding and diverse application challenges! Yes, it's becoming very interesting. Lots of new high-level tools for specialized or relaxed requirements. SQL won't be dethroned, though. -- Michael Ludwig ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] sail and time index
You could perhaps use the LuceneFulltextQueryIndexService (which is a LuceneFulltextIndexService, but which interprets the value argument in getNodes() as lucene query syntaxhttp://lucene.apache.org/java/2_9_1/queryparsersyntax.html). Index your URI|time as a one concatenated value and query it with range querieshttp://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Range%20Searches, f.ex: [http://my-uri|0 TO http://my-uri|1270797634350] (since all values are strings in lucene), and grabbing the last one. If the results of the range query could be reversed you could grab the first one instead, which would be much better. This solution doesn't strike me as being particularly good, but might work (I haven't tried it). 2010/4/8 Lyudmila L. Balakireva lu...@lanl.gov Hi, I have question how to better deal with time range index. I am using Sail layer to build a triple store with context as time value. I need binary search the nearest context value for the given uri and requested time. For example for the mysql it will be table [ uri,time ] where rows: u ,t1 u, t2 u1,t1 u2,t2 etc. having the given uri u and time t t1 t t2 I can fast find the nearest t1 for uri u by query: select max (time) where timet and uri =u; Is any internal trick I can use for neo4j to build such index or optimize the operation? The timeline index is somewhat different since I need first to break recordset by uri and then find time. It will be many millions of timelines attached to the one node and I still need to iterate thru the nodes. What else can be done here then Mysql? Thank you for help, Luda ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] verbosequadstore loading
There are some problems at the moment regarding insertion speeds. o We haven't yet created an rdf store which can use a BatchInserter (which could also be tweaked to ignore checking if statements already exists before it adds each statement and all that). o The other one is that the sail layer on top of the neo4j-rdf component contains functionality which allows a thread to have more than one running transaction at the same time. This was added due to some users requirements, but slows it down by a factor 2 or something (not sure about this). I would like to see both these issues resolved soon, and when they are fixed insertion speeds will be quite nice! 2010/4/9 Lyudmila L. Balakireva lu...@lanl.gov Hi, How to optimize loading to the VerboseQuadStore? I am doing test similar to the test example from neo rdf sail and it is very slow. The size of files 3G - 7G . Thanks, Luda ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] Traversers in the REST API
I definitely agree that limiting or paging a set of results is probably not very useful without some sort of sorting. The (only) benefit of pushing sorting to the client is that the client might be able to filter the result further before sorting it. Since sorting is generally the most expensive operation it should be done as late as possible. However the idea of semi-sorting, to get only one page of sorted results at each request, that was mentioned in some thread yesterday sounds quite compelling. I agree that an equivalent of LIMIT, OFFSET and ORDER BY is a good target. As to indexing: the structure of the graph IS the index to a large extent. This means that a well designed graph would often not need paging if the traversal is done right. There are however some cases where this is hard to accomplish and we need to work on supporting those cases better. Remember that a Graph Database is NOT a Relational Database. A lot of the ideas people have about databases are based on their knowledge of Relational Databases. I understand that it can be hard, but if that baggage could be left at the door it would make things a lot easier. Nobody is saying that Relational Databases are dead (except for some publicity stunts) far from it! What we (and a lot of other people) are saying is that the age of one database to rule them all is over. Different problems are best solved with different kinds of databases, RDBMSes are great at some, K/V stores some, and Graph Databases are great for some. Then there are some problems that are best solved with a combination of two or more (kinds of) databases, where each database brings its own strengths to the table, and is used only for the things it is good at. That's enough deviation from the topic, my conclusions remain the same as they were before this discussion started, I will state them in as few words as possible and in bullet point form to convey them as clearly as I can: * The REST API will probably need result set limiting or pagination. * Limiting and pagination will require (server side) sorting * Sorting can be better implemented if it's implemented in the core of the traversal framework * Limiting / Pagination can be deferred for a while until we know what it needs to look like (from looking at actual uses) * (Server side) Sorting can be deferred until we need it for limiting / pagination Peace, Tobias On Thu, Apr 8, 2010 at 10:17 PM, Michael Ludwig mil...@gmx.de wrote: Tobias Ivarsson schrieb am 08.04.2010 um 18:23:27 (+0200) [Re: [Neo] Traversers in the REST API]: On Wed, Apr 7, 2010 at 3:05 PM, Alastair James al.ja...@gmail.com wrote: when we start talking about returning 1000s of nodes in JSON over HTTP just to get the first 10 this is clearly sub-optimal (as I build websites this is a very common use case). So, as you say, sorting and limiting can wait, but I suspect the HTTP API would benefit from offering it. Limiting need not require changes to the core API, it could be implemented as a second stage in the HTTP API code prior to output encoding. For paging / limiting: yes, you are absolutely right, this would not effect the core API at all, only the REST API. Limiting/paging is something we would probably add to the REST API before sorting. Limiting and paging usually go hand in hand with sorting, in my experience. Why would anyone want to page through an unsorted collection? Sorting might be a similar case, but I still think the client would be better fitted to do sorting well. The server has indexes to support the sorting. (If it doesn't, it has a problem anyway.) What does the client have to support sorting? So how would it be better fitted to do sorting well? But once paging / limiting is added it would be quite natural / useful to add sorting as well. What I want to avoid is keeping state on the server while waiting for the client to request the next page. If you ensure a binary tree index is used to do the sorting, you should be fine. -- Michael Ludwig ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Tobias Ivarsson tobias.ivars...@neotechnology.com Hacker, Neo Technology www.neotechnology.com Cellphone: +46 706 534857 ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] Find Dominating Set in a graph,Dead end using Neo4j
Hi, Sorry for the much delayed response. First a disclaimer: Neo4j makes no claim to magically make P=NP (the problem of finding a minimal dominating set is NP-hard), it would be cool if Neo4j could be used to prove such a thing, but I am not going to spend the rest of my life attempting it. I would suggest that you try to find a small dominating set rather than the minimal dominating set. Even then I cant think of a solution that doesn't require having all Nodes sorted by their degree in memory. If you want to do this with the kind of numbers of Nodes we consider normal (from several millions to billions), you would run into a lot of trouble. The code you have written looks like it might instantiate the collections you use for storing your result multiple times. But I cant tell without the full source code. Cheers, Tobias On Sun, Mar 28, 2010 at 3:01 PM, francis adediran adedi...@gmail.comwrote: Hi, I have been trying to solve the problem of minimal dominating set using Neo4j but to no avail. The algorithm to solve this problem is correct but implementing it using Neo4j is becoming a pain or maybe i'm not yet familiar with Neo as i thought i was. The minimal dominating set in a graph is a set of nodes such that every node in the network graph is a neighbor of at least one element of the set. In social networking terms, they are the set of people who collectively are friends with every one in the network. More info on dominating set here http://en.wikipedia.org/wiki/Dominating_set so for example,given this bi-directional graph network A \ \ B / \ / \ C D \ / E B and C are the minimal dominating set. Also,the sets {B,E} and {B,D} are possible candidates depending on your algorithm. but the most important thing to get a set with minimal number of nodes the best algorithm know so far, is Chvatal algorithm. it basically selects the node which brings the most new nodes into the dominated set. In our graph above, the dominating set S is {B,C) and dominated set V is{A,D,E} B is selected because it brings in A,C and D C is selected because it brings in E The dominating set can be useful to marketers and even spammers as it provide an indirect way to reach an entire network I've tried implementing this in Neo4j,but the problem is since we need to keep track of the dominating and dominated sets neo4j keeps re-instantiating them in the Returnable Evaluator while traversing the network is there a better way to do this Neo4j? Traverser friendsTraverser = startNode.traverse(Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH, new myReturnableEvaluator(){ private ArrayList Node DM = new ArrayListNode(); //Dominating set private ArrayList Node V = new ArrayListNode(); //dominated set @Override public boolean isReturnableNode(TraversalPosition pos) { int numberof_new_users=this.getNodesDegree(pos.currentNode()); if(!(DM.isEmpty())) { //get node's conns IterableRelationship relationships = pos.currentNode().getRelationships(Direction.OUTGOING); //get all there end nodes for(Relationship r:relationships) { Node end = r.getEndNode(); . ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Tobias Ivarsson tobias.ivars...@neotechnology.com Hacker, Neo Technology www.neotechnology.com Cellphone: +46 706 534857 ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] meta meta classes
Now we're getting somewhere... I like this solution a lot, thanks for the great idea. Let me see if I get time to try and implement it soon! Or if you get to if before me and supply a patch. It's nice either way :) 2010/4/9 Niels Hoogeveen pd_aficion...@hotmail.com Having thought about implementation a little bit more, it would be neat to have MetaModelClass implement an interface similar to MetaModelRestrictable, with one difference, the setRange and getRange don't refer to a PropertyRange but to an InstanceRange. Like PropertyRange, InstanceRange should be an abstract class so several different different implementations of InstanceRanges can be provided: * InstanceEnumerationRange (enumerates the n instances the class contains and sets minCardinality = maxCardinality = n ) * IndexedInstanceRange (indexes the n instances it contains and sets minCardinality = maxCardinality = n, making the class into an Instance array) * NullInstanceRange (set minCardinality = maxCardinality = 0, making the class abstract) * InstanceClassesRange (states that the instances of a class have to be instances of one of a given set of classes) The first three range types restrict the MetaModelClass to having no getDirectInstances.add functionality, and also the setting of cardinality should be disabled, which makes me wonder if we need a class FixedMetaModelClass, which simply doesn't provide these methods, or if we should throw exceptions when methods are inappropriately called, or if we allow inconsistent data like the rest of the meta model does. From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Thu, 8 Apr 2010 23:52:02 +0200 Subject: Re: [Neo] meta meta classes I think the best solution here is to have an instance enumeration on MetaModelClass. Singletons are special case of an enumeration. See: http://www.w3.org/TR/2002/WD-owl-ref-20021112/#Enumerated http://owl.cs.manchester.ac.uk/2007/05/api/javadoc/org/semanticweb/owl/model/OWLObjectOneOf.html Date: Thu, 8 Apr 2010 22:33:11 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo] meta meta classes Ok now I get your point! Thank you for clarifying. Your singleton proposal could be a good idea then. Could it potentially be a hindrance in some scenario? I mean should we have a MetaModelClass#setSingleton(boolean) or something so that this behaviour can be controlled? 2010/4/8, Niels Hoogeveen pd_aficion...@hotmail.com: Each country needs to be modeled as classes, because I want to set the restriction that French regions (which can have different properties from Canadian provinces) can only have a relationship with the country France, and Canadian provinces can only have a relationship with the country Canada. The domain and the range of a PropertyType are classes not instances. If countries were simply instances of the country class, it would be possible to say that an instance of a Canadian province is a subdivision of France. I'd like to be able to iterate over the subdivision of France and have guaranteed that each instance has the property region code, a property unknown to Canadian provinces. Without having a restriction stating that a specific sub-division belongs to a specific country, any sub-division can be related to any country, so a user may erroneously say that Alberta is a French region. Not only is this factually incorrect, but structurally too. Alberta, being a Canadian province, doesn't have the region code property, which I want French regions to have. Date: Thu, 8 Apr 2010 20:06:32 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo] meta meta classes 2010/4/8 Niels Hoogeveen pd_aficion...@hotmail.com Your point about the cardinality restriction is a correct observation. In fact it would be better to create a is-subdivision-of PropertyType on sub-division and give that a range country with a cardinality of 1. Then for each subclass of sub-division a restriction should be set, naming the country class this specific sub-division class applies to. Still, it requires each country to be defined as both a class and an instance. Why just countries as classes? why not each subdivision as classes as well? Why countries as classes at all? Date: Thu, 8 Apr 2010 19:15:30 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo] meta meta classes So, you describe the model (with country, sub-division and has-sub-division) which is OK! Then you not just want to add data which would conform to it, you also describe the highest level of that data in the meta model itself with cardinality for how
Re: [Neo] Traversers in the REST API
Since in manycases the results of a query will need to be reformed into their associated domain objects, we've chosen to do our sorting at that point (and on the server). We do our (primary) filtering within the traversal/DB-domain object processes. That seems to work well. Pagination is kinda tricky if the data changes between subsequent requests for pages. Since pagination is generally used for UIs, a common approach is to place the entire dataset (or a cursor, depending on where the data is coming from) in a session object. Regardless of where it is kept, if you want to deal with data changes, you either have to a) invalidate the cached dataset if data changes or b) keep a copy of the whole dataset around in its as queried state so that subsequent paging requests are consistent. Either case involves keeping a fairly big duplicate data structure on the server or middle tier and violates one of the objectives of REST-ful APIs, which is that of statelessness. For that reason, I personally think the REST-ful API shouldn't deal with paging. It should probably be done at some intermediate level as needed by applications. We can certainly build a separate API that we can all leverage if needed, but I don't think it should be in the core REST-ful layer. Just my $0.02, after taxes. Original Message Subject: Re: [Neo] Traversers in the REST API From: Tobias Ivarsson tobias.ivars...@neotechnology.com Date: Fri, April 09, 2010 4:00 am To: Neo user discussions user@lists.neo4j.org I definitely agree that limiting or paging a set of results is probably not very useful without some sort of sorting. The (only) benefit of pushing sorting to the client is that the client might be able to filter the result further before sorting it. Since sorting is generally the most expensive operation it should be done as late as possible. However the idea of semi-sorting, to get only one page of sorted results at each request, that was mentioned in some thread yesterday sounds quite compelling. I agree that an equivalent of LIMIT, OFFSET and ORDER BY is a good target. As to indexing: the structure of the graph IS the index to a large extent. This means that a well designed graph would often not need paging if the traversal is done right. There are however some cases where this is hard to accomplish and we need to work on supporting those cases better. Remember that a Graph Database is NOT a Relational Database. A lot of the ideas people have about databases are based on their knowledge of Relational Databases. I understand that it can be hard, but if that baggage could be left at the door it would make things a lot easier. Nobody is saying that Relational Databases are dead (except for some publicity stunts) far from it! What we (and a lot of other people) are saying is that the age of one database to rule them all is over. Different problems are best solved with different kinds of databases, RDBMSes are great at some, K/V stores some, and Graph Databases are great for some. Then there are some problems that are best solved with a combination of two or more (kinds of) databases, where each database brings its own strengths to the table, and is used only for the things it is good at. That's enough deviation from the topic, my conclusions remain the same as they were before this discussion started, I will state them in as few words as possible and in bullet point form to convey them as clearly as I can: * The REST API will probably need result set limiting or pagination. * Limiting and pagination will require (server side) sorting * Sorting can be better implemented if it's implemented in the core of the traversal framework * Limiting / Pagination can be deferred for a while until we know what it needs to look like (from looking at actual uses) * (Server side) Sorting can be deferred until we need it for limiting / pagination Peace, Tobias On Thu, Apr 8, 2010 at 10:17 PM, Michael Ludwig mil...@gmx.de wrote: Tobias Ivarsson schrieb am 08.04.2010 um 18:23:27 (+0200) [Re: [Neo] Traversers in the REST API]: On Wed, Apr 7, 2010 at 3:05 PM, Alastair James al.ja...@gmail.com wrote: when we start talking about returning 1000s of nodes in JSON over HTTP just to get the first 10 this is clearly sub-optimal (as I build websites this is a very common use case). So, as you say, sorting and limiting can wait, but I suspect the HTTP API would benefit from offering it. Limiting need not require changes to the core API, it could be implemented as a second stage in the HTTP API code prior to output encoding. For paging / limiting: yes, you are absolutely right, this would not
Re: [Neo] Traversers in the REST API
Since in manycases the results of a query will need to be reformed into their associated domain objects Unlikely to be the case over the HTTP API. Its unlikely people will create domain objects in (e.g.) PHP they will just use the data directly. Pagination is kinda tricky if the data changes between subsequent requests for pages. Since pagination is generally used for UIs, a common approach is to place the entire dataset (or a cursor, depending on where the data is coming from) in a session object. Regardless of where it is kept, if you want to deal with data changes, you either have to a) invalidate the cached dataset if data changes or b) keep a copy of the whole dataset around in its as queried state so that subsequent paging requests are consistent. Either case involves keeping a fairly big duplicate data structure on the server or middle tier and violates one of the objectives of REST-ful APIs, which is that of statelessness. For that reason, I personally think the REST-ful API shouldn't deal with paging. It should probably be done at some intermediate level as needed by applications. We can certainly build a separate API that we can all leverage if needed, but I don't think it should be in the core REST-ful layer. Well, I think for my use cases (websites), its likely that users dont flick between pages that often. For example, on may sites, users will view page 1 and select an item, any very view move on to page 2. Its a very different usage pattern compared to a traditional desktop UI, so there is absolutely no need to hold the sorted set on the server in a cursor type way. A typical use case for me would be 1000+ matching rows, with 90%+ of page views for the first 10, 5% for the next 10 etc...! You can clearly see that sending the entire results set of 1000+ rows over HTTP/JSON is inefficient. Of course, caching between the web server and the neo HTTP API can help, but not in all cases, and it seems silly to rely on this. Al -- Dr Alastair James CTO James Publishing Ltd. http://www.linkedin.com/pub/3/914/163 www.worldreviewer.com WINNER Travolution Awards Best Travel Information Website 2009 WINNER IRHAS Awards, Los Angeles, Best Travel Website 2008 WINNER Travolution Awards Best New Online Travel Company 2008 WINNER Travel Weekly Magellan Award 2008 WINNER Yahoo! Finds of the Year 2007 Noli nothis permittere te terere! ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] Traversers in the REST API
Why not just slap memcached in the middle? Would help with scalability as well, plus you could keep cached results keyed by query params in there if needed. Just a thought... Original Message Subject: Re: [Neo] Traversers in the REST API From: Alastair James al.ja...@gmail.com Date: Fri, April 09, 2010 8:32 am To: Neo user discussions user@lists.neo4j.org Since in manycases the results of a query will need to be reformed into their associated domain objects Unlikely to be the case over the HTTP API. Its unlikely people will create domain objects in (e.g.) PHP they will just use the data directly. Pagination is kinda tricky if the data changes between subsequent requests for pages. Since pagination is generally used for UIs, a common approach is to place the entire dataset (or a cursor, depending on where the data is coming from) in a session object. Regardless of where it is kept, if you want to deal with data changes, you either have to a) invalidate the cached dataset if data changes or b) keep a copy of the whole dataset around in its as queried state so that subsequent paging requests are consistent. Either case involves keeping a fairly big duplicate data structure on the server or middle tier and violates one of the objectives of REST-ful APIs, which is that of statelessness. For that reason, I personally think the REST-ful API shouldn't deal with paging. It should probably be done at some intermediate level as needed by applications. We can certainly build a separate API that we can all leverage if needed, but I don't think it should be in the core REST-ful layer. Well, I think for my use cases (websites), its likely that users dont flick between pages that often. For example, on may sites, users will view page 1 and select an item, any very view move on to page 2. Its a very different usage pattern compared to a traditional desktop UI, so there is absolutely no need to hold the sorted set on the server in a cursor type way. A typical use case for me would be 1000+ matching rows, with 90%+ of page views for the first 10, 5% for the next 10 etc...! You can clearly see that sending the entire results set of 1000+ rows over HTTP/JSON is inefficient. Of course, caching between the web server and the neo HTTP API can help, but not in all cases, and it seems silly to rely on this. Al -- Dr Alastair James CTO James Publishing Ltd. [1]http://www.linkedin.com/pub/3/914/163 [2]www.worldreviewer.com WINNER Travolution Awards Best Travel Information Website 2009 WINNER IRHAS Awards, Los Angeles, Best Travel Website 2008 WINNER Travolution Awards Best New Online Travel Company 2008 WINNER Travel Weekly Magellan Award 2008 WINNER Yahoo! Finds of the Year 2007 Noli nothis permittere te terere! ___ Neo mailing list User@lists.neo4j.org [3]https://lists.neo4j.org/mailman/listinfo/user References 1. http://www.linkedin.com/pub/3/914/163 2. http://www.worldreviewer.com/ 3. https://lists.neo4j.org/mailman/listinfo/user ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] How to efficiently query in Neo4J?
On 9 April 2010 14:21, Max De Marzi Jr. maxdema...@gmail.com wrote: On first traversal, add a relationship to a found node to each node that would return, and check for this relationship on the second traversal? Maybe create a unique id, set a property or add a node property with the unique id on the first traversal, and check for this property on the second traversal? I suspect that would lead to pretty poor performance given that I am altering the data on each node visited. Also, I would have to remember to remove these properties / relationships later on otherwise the database will balloon in size. That said, it would be suitable for queries that I could generate (say) every hour. So adding relations to the graph would be acceptable for a index that is built using a 'slow query', but not suitable for realtime results. ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] How to efficiently query in Neo4J?
Alastair James schrieb am 09.04.2010 um 14:04:37 (+0100) [Re: [Neo] How to efficiently query in Neo4J?]: So, I suppose this question boils down to, is there an efficient way to calculate the union of two traversals without retrieving all result sets and performing the union in user code? No need for two traversals if you annotate your category tree in Neo4j the same way Celko has popularized for SQL, i.e. marking each category with *left* and *right*. It's really not a question of graph or sets, as in both cases what you deal with is a tree. http://intelligent-enterprise.informationweek.com/001020/celko.jhtml Note that this needs some custom logic for category tree updates. But it's not difficult in SQL, and I think it's not much more difficult in Neo4j either. -- Michael Ludwig ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] How to efficiently query in Neo4J?
In a previous post, I suggested a two-way traversal (I guess it's actually a traversal done once in one direction and n-1 times in the other direction, where n = number of tags you're matching posts on). I'm willing to bet it could be pretty fast... Do you have any code that can create a dummy data set in Neo4J? If you do, I'd be willing to give it a try... Original Message Subject: Re: [Neo] How to efficiently query in Neo4J? From: Michael Ludwig mil...@gmx.de Date: Fri, April 09, 2010 2:44 pm To: Neo user discussions user@lists.neo4j.org Alastair James schrieb am 09.04.2010 um 14:04:37 (+0100) [Re: [Neo] How to efficiently query in Neo4J?]: So, I suppose this question boils down to, is there an efficient way to calculate the union of two traversals without retrieving all result sets and performing the union in user code? No need for two traversals if you annotate your category tree in Neo4j the same way Celko has popularized for SQL, i.e. marking each category with *left* and *right*. It's really not a question of graph or sets, as in both cases what you deal with is a tree. [1]http://intelligent-enterprise.informationweek.com/001020/celko.jhtml Note that this needs some custom logic for category tree updates. But it's not difficult in SQL, and I think it's not much more difficult in Neo4j either. -- Michael Ludwig ___ Neo mailing list User@lists.neo4j.org [2]https://lists.neo4j.org/mailman/listinfo/user References 1. http://intelligent-enterprise.informationweek.com/001020/celko.jhtml 2. https://lists.neo4j.org/mailman/listinfo/user ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user