Re: [Neo4j] Concurrent graph usage, design guidelines

2011-02-28 Thread Kiss Miklós
A typical traversal while teaching looks like this:
- grab 1 text node and all its token nodes (these can be cached in memory but 
still need an initial DB lookup)
- do calculations
- update the token nodes.
There are usually 50-500 token nodes for a text node, so 1 iteration in 
teaching needs to access that many nodes. I thought of caching all 
required properties of each node in memory because REST communication is 
slow. However, if I go for the embedded version, I can use on-the-fly 
lookup since Neo4j will do the caching for me.
Using the embedded version solves nearly all my troubles. I just need to 
find a good way to provide concurrent access to the graph. I like the 
idea of stateful usage; where can I find more information about it?
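
The three steps of one teaching iteration listed above could be sketched 
roughly as follows. This is only an in-memory illustration, not the Neo4j 
API: the Node and Graph classes, the "weight" and "count" properties and the 
update rule are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a graph node with properties."""
    node_id: int
    props: dict = field(default_factory=dict)


class Graph:
    """Toy store: text node id -> list of token nodes (not the Neo4j API)."""

    def __init__(self):
        self.tokens_of = {}

    def add_token(self, text_id, token):
        self.tokens_of.setdefault(text_id, []).append(token)

    def tokens(self, text_id):
        # On-the-fly lookup; an embedded database would serve this from its cache.
        return self.tokens_of.get(text_id, [])


def teach_iteration(graph, text_id, learning_rate=0.1):
    """Grab the token nodes of one text, compute, and write the result back."""
    tokens = graph.tokens(text_id)
    for token in tokens:
        # Placeholder calculation: nudge a hypothetical 'weight' property.
        weight = token.props.get("weight", 0.0)
        token.props["weight"] = weight + learning_rate * token.props.get("count", 0)
    return len(tokens)
```

With an embedded database the lookup in tokens() would go straight to the 
store, so no separate client-side property cache should be needed.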

Thanks
Miklós


Re: [Neo4j] Concurrent graph usage, design guidelines

2011-02-28 Thread Michael Hunger
Miklós,

perhaps there is some categorization you can do to the text nodes and introduce 
intermediary nodes from the root node. (Or at least shard them by some number).

Then you could also do traversals from those intermediary nodes in parallel, if 
that helps you to speed things up.
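
A rough sketch of the sharding idea, using plain Python instead of the graph 
API: text node ids are grouped under intermediary buckets by id modulo a 
shard count, and one traversal per bucket runs in parallel. The function 
names and the per-node "result" are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_texts(text_ids, shard_count):
    """Group text node ids under intermediary 'shard' buckets by id modulo."""
    shards = {i: [] for i in range(shard_count)}
    for text_id in text_ids:
        shards[text_id % shard_count].append(text_id)
    return shards


def traverse_shard(text_ids):
    # Placeholder for a traversal starting at one intermediary node.
    return [tid * 2 for tid in text_ids]  # hypothetical per-node result


def traverse_all(shards, workers=4):
    """Run one traversal per shard in parallel and merge in shard order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(traverse_shard, [shards[k] for k in sorted(shards)])
        return [item for part in parts for item in part]
```

In the graph itself each bucket would be an intermediary node between the 
root node and its share of the text nodes.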

What are you querying for in your traversals? You can put information on the 
relationships and evaluate that (and also all other nodes and relationships so 
far on the current path) using the evaluators (new framework) or 
pruning/filtering (stable framework).
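
To illustrate what evaluating relationship information during a traversal 
means, here is a library-free sketch. It is not the Neo4j traversal 
framework: the graph is a plain dict, and the "words" relationship property 
and the should_prune callback are hypothetical.

```python
def traverse(graph, start, should_prune):
    """Depth-first traversal that skips any relationship the evaluator rejects.

    graph maps node -> list of (relationship_properties, target_node).
    should_prune(path, rel_props) sees the path so far and the candidate edge.
    """
    visited, stack = [], [(start, [start])]
    seen = {start}
    while stack:
        node, path = stack.pop()
        visited.append(node)
        # Reverse so that relationships are expanded in their listed order.
        for rel_props, target in reversed(graph.get(node, [])):
            if target in seen or should_prune(path, rel_props):
                continue
            seen.add(target)
            stack.append((target, path + [target]))
    return visited


graph = {
    "root": [({"words": 1}, "a"), ({"words": 3}, "b")],
    "a": [({"words": 1}, "c")],
}
# Prune every relationship whose hypothetical 'words' property exceeds 2.
short_only = traverse(graph, "root", lambda path, rel: rel["words"] > 2)
```

Because the decision is made before a relationship is followed, pruned 
branches are never loaded at all.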

Can you explain a typical traversal so that it becomes clearer what kind of 
queries you run against the graph?

Yes, I'd really recommend developing against embedded and then exposing this 
custom, higher-level API to the clients (as you obviously have multiple clients 
accessing the data).
Your custom REST API can also do paging or streaming. If you go for a 
stateful (e.g. session-based) one, you can also keep the traversal state in 
the session and continuously pull from that.
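
A minimal sketch of the stateful variant, assuming the traversal can be 
consumed lazily: the session keeps the iterator and each request pulls the 
next page from it. TraversalSession, fetch_page and the sessions dict are 
hypothetical names, not part of any Neo4j or REST library.

```python
from itertools import islice


class TraversalSession:
    """Keeps a lazy traversal alive between requests (hypothetical helper)."""

    def __init__(self, results):
        self._results = iter(results)

    def next_page(self, size=100):
        # Pull at most `size` further items; an empty list means exhausted.
        return list(islice(self._results, size))


sessions = {}  # session id -> TraversalSession, e.g. held by the web layer


def fetch_page(session_id, start_traversal, size=100):
    """Create the traversal on first use, then keep pulling from it."""
    if session_id not in sessions:
        sessions[session_id] = TraversalSession(start_traversal())
    return sessions[session_id].next_page(size)
```

A client would call fetch_page repeatedly with the same session id until it 
gets an empty page back.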

HTH

Michael


Re: [Neo4j] Concurrent graph usage, design guidelines

2011-02-28 Thread Kiss Miklós
Thanks for the ideas!

OK, I'll tell you a bit more about my current scenario. I use the graph to 
store medium-sized texts (various sizes, 1-30 kB) with some properties, 
their tokens (1-2-3 word phrases) with many properties, and some other 
stuff too which is not relevant now.
Since I use the graph for teaching AI algorithms, I need to query (and 
modify) large parts of the graph from time to time and I cannot do any 
pruning here (at least I didn't find a way to do that). The 
relationships I use are mainly of the same type and I cannot really 
distinguish them in the domain. The only thing I can think of is to 
introduce different types of relationships based on word count for 
tokens (e.g.: TOKEN_1, TOKEN_2, TOKEN_3 for 1-2-3 length phrases). 
Otherwise they are considered the same. And all the text nodes are 
connected to the reference node (I have *many* text nodes). The most 
problematic part is to query (a part of) the text nodes.
Currently I access the DB through the REST API, mainly because I need to 
have concurrent access and it seemed a good solution when I 
started the project. I use traversing to query the nodes/relationships I 
need and I also use the indexing service for various tasks (duplicate 
check, node lookup). Since the relationships I use have the same type, I 
cannot write efficient prune evaluators. It would be good if I could 
query the results in several groups, each returning only ~100 results, but I 
have no way to tell the traverser where to start and where to stop (the 
graph itself doesn't have such information).
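
The TOKEN_1 / TOKEN_2 / TOKEN_3 idea mentioned above could be sketched like 
this. It is a plain-Python illustration, not graph code: relationships are 
modelled as (type, target) pairs and both helper functions are hypothetical.

```python
def token_relationship_type(phrase):
    """Map a 1-3 word phrase to a hypothetical TOKEN_<n> relationship type."""
    word_count = len(phrase.split())
    if not 1 <= word_count <= 3:
        raise ValueError("expected a phrase of 1 to 3 words")
    return f"TOKEN_{word_count}"


def tokens_of_type(relationships, rel_type):
    """Follow only relationships of one type; the rest are never expanded."""
    return [target for kind, target in relationships if kind == rel_type]


phrases = ["graph", "graph database", "node"]
relationships = [(token_relationship_type(p), p) for p in phrases]
single_words = tokens_of_type(relationships, "TOKEN_1")
```

Splitting by type this way would let a traversal fetch the tokens of a text 
in up to three smaller groups instead of all at once.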

If I understand your advice correctly, I should implement complex server 
plugins and call them using REST? So I would actually put parts of the program 
logic into the DB server? This will give better performance, but won't it 
make development harder (program logic is distributed across different levels)?

Yes, unfortunately I need concurrent write access. However, sharing the 
GraphDatabaseService object might be usable for me. I'll investigate 
this approach too.

Thanks again,
Miklós Kiss


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Concurrent graph usage, design guidelines

2011-02-28 Thread Michael Hunger
Miklós,

you should actually do both :)

So go for the embedded version to create domain specific service calls for 
interacting with your database.

And then expose those as your own REST-endpoints using either a Server-Plugin 
or an unmanaged extension.

The current REST server API is too noisy to transfer that many relationships 
per node.
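
As a hedged illustration of a domain-specific service call, the sketch below 
returns one text and all its tokens as a single compact JSON document, 
keeping only the properties the client asked for. The dict-based graph, the 
function name and the "phrase" / "weight" properties are assumptions; a real 
version would read from the embedded database and be exposed through a 
plugin or unmanaged extension.

```python
import json


def text_with_tokens(graph, text_id, wanted=("phrase", "weight")):
    """Return one text and its tokens as a single compact JSON document."""
    text = graph["texts"][text_id]
    tokens = [
        # Keep only the requested properties of each token.
        {key: token[key] for key in wanted if key in token}
        for token in text["tokens"]
    ]
    return json.dumps({"id": text_id, "tokens": tokens}, sort_keys=True)
```

One such call replaces a separate round trip per relationship, which is where 
the generic REST representation becomes expensive.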

How do you currently use the REST server API? Which calls?
If you don't already do traversals then you should definitely look into them 
(rather than traversing node relationships one by one).
With the traversal you can already specify JavaScript code that is executed in 
the server context for pruning and filtering.

Do you need concurrent write access? Otherwise the other stores can work in 
read-only mode against the database.
And concurrent access from inside the same VM is no problem; you can easily 
share the GraphDatabaseService object.
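
The shape of "one shared object, many threads" can be sketched as below. 
This is a toy stand-in and not the GraphDatabaseService API: SharedGraph, its 
transaction() helper and the single coarse lock are assumptions made for the 
example, whereas the database does its own locking inside transactions.

```python
import threading
from contextlib import contextmanager


class SharedGraph:
    """Toy stand-in for one database object shared by all threads."""

    def __init__(self):
        self._props = {}
        self._lock = threading.Lock()

    @contextmanager
    def transaction(self):
        # One coarse lock serialises writers in this sketch.
        with self._lock:
            yield self._props

    def get(self, node_id):
        with self._lock:
            return self._props.get(node_id, 0)


def worker(graph, node_id, times):
    for _ in range(times):
        with graph.transaction() as props:
            props[node_id] = props.get(node_id, 0) + 1


def run(graph, thread_count=4, times=1000):
    """Start several writers on the same object and return the final value."""
    threads = [threading.Thread(target=worker, args=(graph, 1, times))
               for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return graph.get(1)
```

The point is only that every thread receives the same instance and wraps 
each write in its own short transaction.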

Perhaps you can also tell us a bit about your domain and how it is modelled to 
support you there.

Cheers

Michael




[Neo4j] Concurrent graph usage, design guidelines

2011-02-28 Thread Kiss Miklós
Hi all,

I'm wondering if I'm using the Neo4j graph database right. My current 
graph structure contains many relations for every single node. Some of 
the nodes have >1 relations, which is hard to traverse using the REST 
server (collecting nodes is heavy on memory and transmitting is heavy on 
bandwidth).

My first question is: is this structure a usable one or should I 
restructure my graph so that the number of direct relationships becomes 
much lower (I don't really know how I could do that and it would obfuscate 
the domain model)?

My second question is: if I leave the structure that way, I could solve 
the performance issues if I used the embedded version of neo. However, I 
need to have concurrent access to the graph. Is this possible?

Thanks,
Miklós Kiss