Re: [Neo4j] Concurrent graph usage, design guidelines
A typical traversal while teaching looks like this:
- grab one text node and all its token nodes (these can be cached in memory, but they still need an initial DB lookup)
- do the calculations
- update the token nodes

There are usually 50-500 token nodes per text node, so one teaching iteration needs to access that many nodes. I had planned to cache all required properties of each node in memory because REST communication is slow. However, if I go for the embedded version, I can use on-the-fly lookup, since Neo4j will do the caching for me.

Using the embedded version solves nearly all my troubles. I just need to find a good way to provide concurrent access to the graph. I like the idea of stateful usage; where can I find more information about it?

Thanks,
Miklós

(In reply to Michael Hunger's message of 2011.02.28, 11:47.)
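The teaching iteration described in this message (grab a text node and its tokens, calculate, update) could be sketched roughly as follows. This is only an illustration under stated assumptions: `TextNode`, `TokenNode`, the `weight` property and the averaging step are hypothetical in-memory stand-ins and not part of the Neo4j API; with the embedded database the reads and writes would go against real nodes inside a transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-ins; with embedded Neo4j these would be
// real nodes read and updated inside one transaction.
class TeachingIteration {
    static class TokenNode {
        final String phrase;
        double weight; // illustrative property, not taken from the thread

        TokenNode(String phrase, double weight) {
            this.phrase = phrase;
            this.weight = weight;
        }
    }

    static class TextNode {
        final List<TokenNode> tokens = new ArrayList<>();
    }

    // One teaching iteration: read every token of the text, compute a
    // value from them, then write an updated property back to each token.
    // Returns the mean weight computed before the update.
    static double iterate(TextNode text, double rate) {
        if (text.tokens.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (TokenNode token : text.tokens) {   // step 1: grab the tokens
            sum += token.weight;
        }
        double mean = sum / text.tokens.size(); // step 2: calculate
        for (TokenNode token : text.tokens) {   // step 3: update
            token.weight += rate * (mean - token.weight);
        }
        return mean;
    }

    public static void main(String[] args) {
        TextNode text = new TextNode();
        text.tokens.add(new TokenNode("graph", 1.0));
        text.tokens.add(new TokenNode("graph database", 3.0));
        double mean = iterate(text, 0.5);
        System.out.println("mean before update: " + mean);
        for (TokenNode token : text.tokens) {
            System.out.println(token.phrase + " -> " + token.weight);
        }
    }
}
```

Because each iteration touches only one text node and its 50-500 tokens, the whole iteration is a natural unit of work to wrap in a single transaction when run against the embedded database.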
Re: [Neo4j] Concurrent graph usage, design guidelines
Miklós,

perhaps there is some categorization you can apply to the text nodes, so that you can introduce intermediary nodes between the root node and the text nodes (or at least shard them by some number). Then you could also run traversals from those intermediary nodes in parallel, if that helps you to speed things up.

What are you querying for in your traversals? You can put information on the relationships and evaluate that (and also all other nodes and relationships so far on the current path) using the evaluators (new framework) or pruning/filtering (stable framework).

Can you explain a typical traversal, so that it becomes clearer what kind of queries you run against the graph?

Yes, I'd really recommend developing against embedded and then exposing this custom, higher-level API to the clients (as you obviously have multiple clients accessing the data). Your custom REST API can also do paging or streaming (or, if you go for a stateful, e.g. session-based, one, you can also keep the traversal state in the session and continuously pull from that).

HTH

Michael

(In reply to Kiss Miklós's message of 28.02.2011, 11:36.)
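The stateful, session-based idea mentioned above (keep the traversal state in the session and keep pulling from it) might look something like the sketch below. It assumes the traversal result can be consumed as a plain `Iterable`; the class name `TraversalSession` and the method `nextPage` are made up for this illustration and are not Neo4j API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical session object that a custom REST endpoint could keep per
// client. It holds on to a lazily evaluated result and hands it out in pages.
class TraversalSession {
    private final Iterator<String> cursor;

    TraversalSession(Iterable<String> results) {
        this.cursor = results.iterator();
    }

    // Pulls up to pageSize further results. Synchronized so that two
    // requests on the same session cannot interleave on the cursor.
    synchronized List<String> nextPage(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> page = new ArrayList<>(pageSize);
        while (page.size() < pageSize && cursor.hasNext()) {
            page.add(cursor.next());
        }
        return page;
    }

    synchronized boolean isExhausted() {
        return !cursor.hasNext();
    }

    public static void main(String[] args) {
        TraversalSession session =
                new TraversalSession(Arrays.asList("t1", "t2", "t3", "t4", "t5"));
        while (!session.isExhausted()) {
            System.out.println(session.nextPage(2));
        }
    }
}
```

A server would additionally need to map a session id to such an object and expire idle sessions; both are left out here to keep the sketch short.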
Re: [Neo4j] Concurrent graph usage, design guidelines
Thanks for the ideas!

OK, I'll tell a bit more about my current scenario. I use the graph to store medium-sized texts (1-30 kB) with some properties, their tokens (1-, 2- and 3-word phrases) with many properties, and some other data that is not relevant here.

Since I use the graph for teaching AI algorithms, I need to query (and modify) large parts of the graph from time to time, and I cannot do any pruning here (at least I haven't found a way to do that). The relationships I use are mostly of the same type, and I cannot really distinguish them in the domain. The only thing I can think of is to introduce different relationship types based on word count for tokens (e.g. TOKEN_1, TOKEN_2, TOKEN_3 for phrases of 1, 2 and 3 words). Otherwise they are considered the same. And all the text nodes are connected to the reference node (I have *many* text nodes). The most problematic part is querying (a part of) the text nodes.

Currently I access the DB through the REST API, mainly because I need concurrent access and it seemed a good solution when I started the project. I use traversals to query the nodes/relationships I need, and I also use the indexing service for various tasks (duplicate check, node lookup). Since the relationships I use have the same type, I cannot write efficient prune evaluators. It would be good if I could fetch the results in several groups, each returning only ~100 results, but I have no way to tell the traverser where to start and where to stop (the graph itself doesn't contain such information).

If I understand your advice correctly, I should implement complex server plugins and call them using REST? So I would actually put parts of the program logic into the DB server? This will perform better, but won't it make development harder (the program logic is distributed across different levels)?

Yes, unfortunately I need concurrent write access. However, sharing the GraphDatabaseService object might work for me. I'll investigate this approach too.

Thanks again,
Miklós Kiss

(In reply to Michael Hunger's message of 2011.02.28, 10:38.)

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
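One way to give a traverser a place to start and stop, in line with the sharding suggestion made elsewhere in this thread, is to hang text nodes under a fixed number of intermediary bucket nodes instead of attaching them all to the reference node. The sketch below only models the bucketing logic in memory; `TextBuckets`, the modulo rule and the use of plain ids are assumptions made for illustration, not Neo4j API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "reference node -> bucket node -> text nodes".
// Each bucket can then be queried, or traversed in parallel, on its own.
class TextBuckets {
    private final int bucketCount;
    private final Map<Integer, List<Long>> buckets = new HashMap<>();

    TextBuckets(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        this.bucketCount = bucketCount;
    }

    // Shard by id modulo the bucket count; floorMod keeps the result
    // non-negative even for negative ids.
    int bucketOf(long textNodeId) {
        return (int) Math.floorMod(textNodeId, (long) bucketCount);
    }

    void add(long textNodeId) {
        buckets.computeIfAbsent(bucketOf(textNodeId), k -> new ArrayList<>())
               .add(textNodeId);
    }

    // All text node ids attached to one bucket, in insertion order.
    List<Long> textsIn(int bucket) {
        List<Long> ids = buckets.get(bucket);
        return ids == null ? Collections.<Long>emptyList()
                           : Collections.unmodifiableList(ids);
    }

    public static void main(String[] args) {
        TextBuckets buckets = new TextBuckets(4);
        for (long id = 0; id < 10; id++) {
            buckets.add(id);
        }
        for (int b = 0; b < 4; b++) {
            System.out.println("bucket " + b + ": " + buckets.textsIn(b));
        }
    }
}
```

Choosing the bucket count so that each bucket holds roughly 100 text nodes would match the group size mentioned in the message; a domain-based categorization, if one exists, would serve the same purpose.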
Re: [Neo4j] Concurrent graph usage, design guidelines
Miklós,

you should actually do both :)

So go for the embedded version to create domain-specific service calls for interacting with your database. And then expose those as your own REST endpoints using either a server plugin or an unmanaged extension.

The current REST server API is too noisy to transfer that many relationships per node.

How do you currently use the REST server API? Which calls? If you don't already do traversals, then you should definitely look into them (rather than traversing node relationships one by one). With a traversal you can already specify JavaScript code that is executed in the server context for pruning and filtering.

Do you need concurrent write access? Otherwise the other stores can work in read-only mode against the database. And concurrent access from inside the same VM is no problem; you can easily share the GraphDatabaseService object.

Perhaps you can also tell us a bit about your domain and how it is modelled to support you there.

Cheers

Michael

(In reply to Kiss Miklós's message of 28.02.2011, 10:19.)
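The pattern of sharing one service object between threads inside the same VM can be illustrated with the sketch below. It deliberately does not use Neo4j: `SharedGraphService` is a hypothetical domain-specific service backed by an in-memory map, standing in for a class that would wrap the single `GraphDatabaseService` instance and run each write in a transaction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical domain service: one instance, shared by all worker threads.
class SharedGraphService {
    // Stand-in for the shared database; safe for concurrent writers.
    private final ConcurrentMap<String, LongAdder> tokenCounts =
            new ConcurrentHashMap<>();

    // Domain-specific write call, safe to invoke from many threads.
    void recordToken(String phrase) {
        tokenCounts.computeIfAbsent(phrase, k -> new LongAdder()).increment();
    }

    long countOf(String phrase) {
        LongAdder counter = tokenCounts.get(phrase);
        return counter == null ? 0L : counter.sum();
    }

    // Runs the given number of writer threads against the same service
    // instance and returns the final count for one phrase.
    static long runConcurrently(SharedGraphService service,
                                int threads, int writesPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < writesPerThread; i++) {
                    service.recordToken("graph database");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return service.countOf("graph database");
    }

    public static void main(String[] args) {
        long total = runConcurrently(new SharedGraphService(), 4, 1000);
        System.out.println("recorded writes: " + total);
    }
}
```

The same service class is what a server plugin or unmanaged extension would call, so the REST layer stays thin and the domain logic lives in one place.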
[Neo4j] Concurrent graph usage, design guidelines
Hi all,

I'm wondering if I'm using the Neo4j graph database right. My current graph structure contains many relationships for every single node. Some of the nodes have >1 relations, which is hard to traverse using the REST server (collecting nodes is heavy on memory and transmitting is heavy on bandwidth).

My first question is: is this structure a usable one, or should I restructure my graph so that the number of direct relationships becomes much lower (I don't really know how I could do that, and it would obfuscate the domain model)?

My second question is: if I leave the structure as it is, I could solve the performance issues by using the embedded version of Neo4j. However, I need concurrent access to the graph. Is this possible?

Thanks,
Miklós Kiss