Re: [Neo4j] REST results pagination

2011-04-19 Thread Jim Webber
Hi Javier,

I've just checked and that's in our "list of stuff we really should do because 
it annoys us that it's not there."

No promises, but we do intend to work through at least some of that list for 
the 1.4 releases.

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-19 Thread Saikat Kanjilal

I'd like to propose that we put this functionality into the plugin 
(https://github.com/skanjila/gremlin-translation-plugin) that Peter and I are 
currently working on, thoughts?
> From: j...@neotechnology.com
> Date: Tue, 19 Apr 2011 15:25:20 +0100
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] REST results pagination
> 
> Hi Javier,
> 
> I've just checked and that's in our "list of stuff we really should do 
> because it annoys us that it's not there."
> 
> No promises, but we do intend to work through at least some of that list for 
> the 1.4 releases.
> 
> Jim


Re: [Neo4j] REST results pagination

2011-04-19 Thread Javier de la Rosa
On Tue, Apr 19, 2011 at 10:32, Saikat Kanjilal  wrote:
> I'd like to propose that we put this functionality into the plugin 
> (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I are 
> currently working on, thoughts?

+1

>> From: j...@neotechnology.com
>> I've just checked and that's in our "list of stuff we really should do 
>> because it annoys us that it's not there."
>> No promises, but we do intend to work through at least some of that list for 
>> the 1.4 releases.

It would be great to see this feature in 1.4 :-)



-- 
Javier de la Rosa
http://versae.es


Re: [Neo4j] REST results pagination

2011-04-19 Thread Jim Webber
>> I'd like to propose that we put this functionality into the plugin 
>> (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I 
>> are currently working on, thoughts?

I'm thinking that, if we do it, it should be handled through content 
negotiation. That is if you ask for application/atom then you get paged lists 
of results. I don't necessarily think that's a plugin, it's more likely part of 
the representation logic in server itself.
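Jim's content-negotiation idea could be sketched roughly like this; a minimal Python illustration where the media types and return values are assumptions, not part of any released Neo4j server API:

```python
def choose_representation(accept_header):
    """Sketch of content negotiation for paged results: a client that
    asks for an Atom feed gets a paged representation, anything else
    gets the plain JSON list. Media types and return values are
    illustrative only."""
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept_header.split(",")]
    if "application/atom+xml" in media_types:
        return "paged-atom-feed"
    return "full-json-list"

print(choose_representation("application/atom+xml;q=0.9, */*;q=0.1"))
# prints paged-atom-feed
```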

Jim


Re: [Neo4j] REST results pagination

2011-04-19 Thread Michael DeHaan
On Tue, Apr 19, 2011 at 10:58 AM, Jim Webber  wrote:
>>> I'd like to propose that we put this functionality into the plugin 
>>> (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I 
>>> are currently working on, thoughts?
>
> I'm thinking that, if we do it, it should be handled through content 
> negotiation. That is if you ask for application/atom then you get paged lists 
> of results. I don't necessarily think that's a plugin, it's more likely part 
> of the representation logic in server itself.

This is something I've been wondering about as I may have the need to
feed very large graphs into the system and am wondering how the REST
API will hold up compared to the native interface.

What happens if the result of an index query (or traversal, whatever)
legitimately needs to return 100k results?

Wouldn't that be a bit large for one request?   If anything, it's a
lot of JSON to decode at once.

Feeds make sense for things that are feed-like, but do atom feeds
really make sense for results of very dynamic queries that don't get
subscribed to?
Or, related question, is there a point where the result sets of
operations get so large that things start to break down?   What do
people find this to generally be?

Maybe it's not an issue, but pointers to any problems REST API usage
has with large data sets (and solutions?) would be welcome.

--Michael


Re: [Neo4j] REST results pagination

2011-04-19 Thread Javier de la Rosa
On Tue, Apr 19, 2011 at 10:25, Jim Webber  wrote:
> I've just checked and that's in our "list of stuff we really should do 
> because it annoys us that it's not there."
> No promises, but we do intend to work through at least some of that list for 
> the 1.4 releases.

If this is finally developed, will it be possible to request all
nodes and all relationships at some URL?

>
> Jim



--
Javier de la Rosa
http://versae.es


Re: [Neo4j] REST results pagination

2011-04-19 Thread Michael Hunger
Hi Javier,

what would you need that for? I'm interested in the usecase.

Cheers

Michael

On 20.04.2011 at 06:17, Javier de la Rosa wrote:

> On Tue, Apr 19, 2011 at 10:25, Jim Webber  wrote:
>> I've just checked and that's in our "list of stuff we really should do 
>> because it annoys us that it's not there."
>> No promises, but we do intend to work through at least some of that list for 
>> the 1.4 releases.
> 
> If this finally is developed, it will possible to request for all
> nodes and all relationships in some URL?
> 
>> 
>> Jim
> 
> 
> 
> --
> Javier de la Rosa
> http://versae.es


Re: [Neo4j] REST results pagination

2011-04-19 Thread Tim McNamara
Data export, e.g. dumping everything as CSV, DOT or RDF?

On 20 April 2011 18:33, Michael Hunger wrote:

> Hi Javier,
>
> what would you need that for? I'm interested in the usecase.
>
> Cheers
>
> Michael
>
> On 20.04.2011 at 06:17, Javier de la Rosa wrote:
>
> > On Tue, Apr 19, 2011 at 10:25, Jim Webber  wrote:
> >> I've just checked and that's in our "list of stuff we really should do
> because it annoys us that it's not there."
> >> No promises, but we do intend to work through at least some of that list
> for the 1.4 releases.
> >
> > If this finally is developed, it will possible to request for all
> > nodes and all relationships in some URL?
> >
> >>
> >> Jim
> >
> >
> >
> > --
> > Javier de la Rosa
> > http://versae.es


Re: [Neo4j] REST results pagination

2011-04-20 Thread Michael Hunger
But wouldn't such a custom operation be done more easily and much faster as a
server plugin?

Otherwise all the data would have to be serialized to JSON and deserialized
again, with no streaming possible.

From a server extension you could even stream and gzip that data with ease.
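The streaming-and-gzip idea can be sketched in a few lines of Python (rather than an actual Java server extension); `stream_gzipped_json` is a hypothetical helper that shows the point: compressed chunks go out incrementally, so the full payload is never built in memory:

```python
import gzip
import io
import json

def stream_gzipped_json(records, chunk_size=1024):
    """Serialize records as JSON lines and gzip them incrementally,
    yielding compressed chunks instead of building the whole payload
    in memory. Illustrative sketch, not a Neo4j API."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for record in records:
            gz.write((json.dumps(record) + "\n").encode("utf-8"))
            if buf.tell() >= chunk_size:
                # ship what has been compressed so far and reset the buffer
                yield buf.getvalue()
                buf.seek(0)
                buf.truncate()
    # the gzip trailer is written when the 'with' block closes
    yield buf.getvalue()

# Fake node records standing in for a graph dump
nodes = ({"id": i, "name": "node-%d" % i} for i in range(1000))
payload = b"".join(stream_gzipped_json(nodes))
```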

Cheers

Michael

On 20.04.2011 at 08:41, Tim McNamara wrote:

> Data export, e.g. dumping everything as CSV, DOT or RDF?
> 
> On 20 April 2011 18:33, Michael Hunger 
> wrote:
> 
>> Hi Javier,
>> 
>> what would you need that for? I'm interested in the usecase.
>> 
>> Cheers
>> 
>> Michael
>> 
>> On 20.04.2011 at 06:17, Javier de la Rosa wrote:
>> 
>>> On Tue, Apr 19, 2011 at 10:25, Jim Webber  wrote:
 I've just checked and that's in our "list of stuff we really should do
>> because it annoys us that it's not there."
 No promises, but we do intend to work through at least some of that list
>> for the 1.4 releases.
>>> 
>>> If this finally is developed, it will possible to request for all
>>> nodes and all relationships in some URL?
>>> 
 
 Jim
>>> 
>>> 
>>> 
>>> --
>>> Javier de la Rosa
>>> http://versae.es


Re: [Neo4j] REST results pagination

2011-04-20 Thread Akhil
Won't dumping a graph database in a tabular format create a huge file
even if the number of nodes is small (for a highly interconnected
graph)?
On 4/20/2011 2:41 AM, Tim McNamara wrote:
> Data export, e.g. dumping everything as CSV, DOT or RDF?
>
> On 20 April 2011 18:33, Michael Hunger wrote:
>
>> Hi Javier,
>>
>> what would you need that for? I'm interested in the usecase.
>>
>> Cheers
>>
>> Michael
>>
>> On 20.04.2011 at 06:17, Javier de la Rosa wrote:
>>
>>> On Tue, Apr 19, 2011 at 10:25, Jim Webber  wrote:
 I've just checked and that's in our "list of stuff we really should do
>> because it annoys us that it's not there."
 No promises, but we do intend to work through at least some of that list
>> for the 1.4 releases.
>>> If this finally is developed, it will possible to request for all
>>> nodes and all relationships in some URL?
>>>
 Jim
>>>
>>>
>>> --
>>> Javier de la Rosa
>>> http://versae.es


Re: [Neo4j] REST results pagination

2011-04-20 Thread Jacob Hansson
On Tue, Apr 19, 2011 at 10:17 PM, Michael DeHaan
wrote:

> On Tue, Apr 19, 2011 at 10:58 AM, Jim Webber 
> wrote:
> >>> I'd like to propose that we put this functionality into the plugin (
> https://github.com/skanjila/gremlin-translation-plugin) that Peter and I
> are currently working on, thoughts?
> >
> > I'm thinking that, if we do it, it should be handled through content
> negotiation. That is if you ask for application/atom then you get paged
> lists of results. I don't necessarily think that's a plugin, it's more
> likely part of the representation logic in server itself.
>
> This is something I've been wondering about as I may have the need to
> feed very large graphs into the system and am wondering how the REST
> API will hold up compared to the native interface.
>
> What happens if the result of an index query (or traversal, whatever)
> legitimately needs to return 100k results?
>
> Wouldn't that be a bit large for one request?   If anything, it's a
> lot of JSON to decode at once.
>
>
Yeah, we can't do this right now, and implementing it is harder than it
seems at first glance, since we first need to implement sorting of results,
otherwise the paged result will be useless. Like Jim said though, this is
another one of those *must be done* features.


> Feeds make sense for things that are feed-like, but do atom feeds
> really make sense for results of very dynamic queries that don't get
> subscribed to?
> Or, related question, is there a point where the result sets of
> operations get so large that things start to break down?   What do
> people find this to generally be?
>

I'm sure there are some awesome content types out there that we can look at
that will fit our uses, I don't feel confident to say if Atom is a good
choice, I've never worked with it..

The point where this breaks down I'm gonna guess is in server-side
serialization, because we currently don't stream the serialized data, but
build it up in memory and ship it off when it's done. I'd say you'll run out
of memory after 1 nodes or so on a small server, which I think
underlines how important this is to fix.


>
> Maybe it's not an issue, but pointers to any problems REST API usage
> has with large data sets (and solutions?) would be welcome.
>

Not aware of anyone bumping into these limits yet, but I'm sure we'll start
hearing about it.. The only current solution I can think of is a server
plugin that emulates this, but it would have to sort the result, and I'm
afraid that it will be hard (probably not impossible, but hard) to implement
that in a memory-efficient way that far away from the kernel. You may just
end up moving the OutOfMemoryExceptions to the plugin instead of the
serialization system.


>
> --Michael
>



-- 
Jacob Hansson
Phone: +46 (0) 763503395
Twitter: @jakewins


Re: [Neo4j] REST results pagination

2011-04-20 Thread Craig Taverner
I think sorting would need to be optional, since it is likely to be a
performance and memory hog on large traversals. I think one of the key
benefits of the traversal framework in the Embedded API is being able to
traverse and 'stream' a very large graph without occupying much memory. If
this can be achieved in the REST API (through pagination), that is a very
good thing. I assume the main challenge is being able to freeze a traverser
and keep it on hold between client requests for the next page. Perhaps you
have already solved that bit?

In my opinion, I would code the sorting as a characteristic of the graph
itself, in order to avoid having to sort in the server (and incur the
memory/performance hit). So that means I would use a domain-specific
solution to sorting. Of course, generic sorting is nice also, but make it
optional.

On Wed, Apr 20, 2011 at 11:19 AM, Jacob Hansson wrote:

> On Tue, Apr 19, 2011 at 10:17 PM, Michael DeHaan
> wrote:
>
> > On Tue, Apr 19, 2011 at 10:58 AM, Jim Webber 
> > wrote:
> > >>> I'd like to propose that we put this functionality into the plugin (
> > https://github.com/skanjila/gremlin-translation-plugin) that Peter and I
> > are currently working on, thoughts?
> > >
> > > I'm thinking that, if we do it, it should be handled through content
> > negotiation. That is if you ask for application/atom then you get paged
> > lists of results. I don't necessarily think that's a plugin, it's more
> > likely part of the representation logic in server itself.
> >
> > This is something I've been wondering about as I may have the need to
> > feed very large graphs into the system and am wondering how the REST
> > API will hold up compared to the native interface.
> >
> > What happens if the result of an index query (or traversal, whatever)
> > legitimately needs to return 100k results?
> >
> > Wouldn't that be a bit large for one request?   If anything, it's a
> > lot of JSON to decode at once.
> >
> >
> Yeah, we can't do this right now, and implementing it is harder than it
> seems at first glance, since we first need to implement sorting of results,
> otherwise the paged result will be useless. Like Jim said though, this is
> another one of those *must be done* features.
>
>
> > Feeds make sense for things that are feed-like, but do atom feeds
> > really make sense for results of very dynamic queries that don't get
> > subscribed to?
> > Or, related question, is there a point where the result sets of
> > operations get so large that things start to break down?   What do
> > people find this to generally be?
> >
>
> I'm sure there are some awesome content types out there that we can look at
> that will fit our uses, I don't feel confident to say if Atom is a good
> choice, I've never worked with it..
>
> The point where this breaks down I'm gonna guess is in server-side
> serialization, because we currently don't stream the serialized data, but
> build it up in memory and ship it off when it's done. I'd say you'll run
> out
> of memory after 1 nodes or so on a small server, which I think
> underlines how important this is to fix.
>
>
> >
> > Maybe it's not an issue, but pointers to any problems REST API usage
> > has with large data sets (and solutions?) would be welcome.
> >
>
> Not aware of anyone bumping into these limits yet, but I'm sure we'll start
> hearing about it.. The only current solution I can think of is a server
> plugin that emulates this, but it would have to sort the result, and I'm
> afraid that it will be hard (probably not impossible, but hard) to
> implement
> that in a memory-efficient way that far away from the kernel. You may just
> end up moving the OutOfMemeoryExceptions' to the plugin instead of the
> serialization system.
>
>
> >
> > --Michael
> >
>
>
>
> --
> Jacob Hansson
> Phone: +46 (0) 763503395
> Twitter: @jakewins


Re: [Neo4j] REST results pagination

2011-04-20 Thread Jacob Hansson
On Wed, Apr 20, 2011 at 11:25 AM, Craig Taverner  wrote:

> I think sorting would need to be optional, since it is likely to be a
> performance and memory hug on large traversals. I think one of the key
> benefits of the traversal framework in the Embedded API is being able to
> traverse and 'stream' a very large graph without occupying much memory. If
> this can be achieved in the REST API (through pagination), that is a very
> good thing. I assume the main challenge is being able to freeze a traverser
> and keep it on hold between client requests for the next page. Perhaps you
> have already solved that bit?
>

While I agree with you that the ability to effectively stream the results of
a traversal is a very useful thing, I don't like the persisted traverser
approach, for several reasons. I'm sorry if my tone below is a bit harsh, I
don't mean it that way, I simply want to make a strong case for why I think
the hard way is the right way in this case.

First, the only good restful approach I can think of for doing persisted
traversals would be to "create" a traversal resource (since it is an object
that keeps persistent state), and get back an id to refer to it. Subsequent
calls to paged results would then be to that traversal resource, updating
its state and getting results back. Assuming this is the correct way to
implement this, it comes with a lot of questions. Should there be a timeout
for these resources, or is the user responsible for removing them from
memory? What happens when the server crashes and the client can't find the
traversal resources it has ids for?

If we somehow solve that or find some better approach, we end up with an API
where a client can get paged results, but two clients performing the same
traversal on the same data may get back the same result in different order
(see my comments on sorting based on expected traversal behaviour below).
This means that the API is really only useful if you actually want to get
the entire result back. If that was the problem we wanted to solve, a
streaming solution is a much easier and faster approach than a paging
solution.

Second, being able to iterate over the entire result set is only half of the
use cases we are looking to solve. The other half are the ones I mentioned
examples of (the blog case, presenting lists of things to users and so on),
and those are not solved by this. Forcing users of our database to pull out
all their data over the wire and sort the whole thing, only to keep the
first 10 items, for each user that lands on their frontpage, is not ok.

Third, and most importantly to me, using this case to put more pressure on
ourselves to implement real sorting is a really good thing. Sorting is
something that *really* should be provided by us, anyone who has used a
modern database expects this to be our problem to solve. We have a really
good starting point for optimizing sorting algorithms, sitting as we are
inside the kernel with our caches and indexes :)
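The point about indexes helping sorted pagination can be illustrated with a toy sketch: if a property index is kept sorted, a page can be read straight out of it without materializing or sorting the full result set. Everything here (the index layout, `page_by_property`) is a hypothetical illustration, not a Neo4j API:

```python
import bisect

# A sorted property index modeled as a list of (value, node_id) pairs
index = sorted(("name-%03d" % i, i) for i in range(1000))

def page_by_property(index, after_value, size):
    """Return the next 'size' entries after 'after_value' directly from
    a sorted index; no full sort, no full scan. Illustrative sketch."""
    start = bisect.bisect_right(index, (after_value, float("inf")))
    return index[start:start + size]

print(page_by_property(index, "name-004", 3))
# prints [('name-005', 5), ('name-006', 6), ('name-007', 7)]
```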


>
> In my opinion, I would code the sorting as a characteristic of the graph
> itself, in order to avoid having to sort in the server (and incur the
> memory/performance hit). So that means I would use a domain-specific
> solution to sorting. Of course, generic sorting is nice also, but make it
> optional.
>

I agree sorting should be an opt-in feature. Putting meta-data like sorting
order and similar things inside the graph I think is a matter of personal
preference, and for sure has its place as a useful optimization. I do,
however, think that the "official" approach to sorting needs to be based on
concepts familiar from other databases - define your query, and define how
you want the result sorted. If indexes are available the database can use
them to optimize the sorting, otherwise it will suck, but at least we're
doing what the user wants us to do. All lessons learned in YesSQL databases
(see what I did there?) should not be unlearned :)

Also, the approach of sorting via the traversal itself assumes knowledge of
which order the traverser will move through the graph, and that is not
necessarily something that will be the same in later releases. Tobias was
talking about cache-first traversals as an addition or even a replacement to
depth/breadth first ones, a major optimization we cannot do if we encourage
people to sort "inside" the graph.

/Jake


>
> > On Wed, Apr 20, 2011 at 11:19 AM, Jacob Hansson wrote:
>
> > On Tue, Apr 19, 2011 at 10:17 PM, Michael DeHaan
> > wrote:
> >
> > > On Tue, Apr 19, 2011 at 10:58 AM, Jim Webber 
> > > wrote:
> > > >>> I'd like to propose that we put this functionality into the plugin
> (
> > > https://github.com/skanjila/gremlin-translation-plugin) that Peter and
> I
> > > are currently working on, thoughts?
> > > >
> > > > I'm thinking that, if we do it, it should be handled through content
> > > negotiation. That is if you ask for application/atom then you get paged
> > > lists of results. I don't necessarily think that's a plugin, it's 

Re: [Neo4j] REST results pagination

2011-04-20 Thread Javier de la Rosa
Wow, had I known the number of replies, I would have sent the e-mail much earlier ;)

Sorting is a very cool feature. I didn't know the hard-core
implications of pagination. The only thing I want is to avoid the
overhead of sending thousands of nodes over HTTP as JSON. Actually, a
workaround that splits results (using offset and limit) on the server,
without a real cut of them, would be enough for me: sending only a
concrete number of results as JSON. If this feature could be in the
core of Neo4j, perfect :-)
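The offset/limit workaround described above can be sketched in a few lines; `paged_json` is a hypothetical helper name, and the point is only that the server serializes the requested window rather than the full result set:

```python
from itertools import islice
import json

def paged_json(results, offset, limit):
    """Serialize only the requested window of an iterable result set,
    so the server never builds the full JSON payload. A sketch of the
    offset/limit workaround, not an actual Neo4j server API."""
    window = islice(results, offset, offset + limit)
    return json.dumps(list(window))

# 100k fake hits, but only three of them are ever serialized
hits = ({"node": i} for i in range(100000))
print(paged_json(hits, 2, 3))  # [{"node": 2}, {"node": 3}, {"node": 4}]
```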


On Wed, Apr 20, 2011 at 08:01, Jacob Hansson  wrote:
> On Wed, Apr 20, 2011 at 11:25 AM, Craig Taverner  wrote:
>
>> I think sorting would need to be optional, since it is likely to be a
>> performance and memory hug on large traversals. I think one of the key
>> benefits of the traversal framework in the Embedded API is being able to
>> traverse and 'stream' a very large graph without occupying much memory. If
>> this can be achieved in the REST API (through pagination), that is a very
>> good thing. I assume the main challenge is being able to freeze a traverser
>> and keep it on hold between client requests for the next page. Perhaps you
>> have already solved that bit?
>>
>
> While I agree with you that the ability to effectively stream the results of
> a traversal is a very useful thing, I don't like the persisted traverser
> approach, for several reasons. I'm sorry if my tone below is a bit harsh, I
> don't mean it that way, I simply want to make a strong case for why I think
> the hard way is the right way in this case.
>
> First, the only good restful approach I can think of for doing persisted
> traversals would be to "create" a traversal resource (since it is an object
> that keeps persistent state), and get back an id to refer to it. Subsequent
> calls to paged results would then be to that traversal resource, updating
> its state and getting results back. Assuming this is the correct way to
> implement this, it comes with a lot of questions. Should there be a timeout
> for these resources, or is the user responsible for removing them from
> memory? What happens when the server crashes and the client can't find the
> traversal resources it has ids for?
>
> If we somehow solve that or find some better approach, we end up with an API
> where a client can get paged results, but two clients performing the same
> traversal on the same data may get back the same result in different order
> (see my comments on sorting based on expected traversal behaviour below).
> This means that the API is really only useful if you actually want to get
> the entire result back. If that was the problem we wanted to solve, a
> streaming solution is a much easier and faster approach than a paging
> solution.
>
> Second, being able to iterate over the entire result set is only half of the
> use cases we are looking to solve. The other half are the ones I mentioned
> examples of (the blog case, presenting lists of things to users and so on),
> and those are not solved by this. Forcing users of our database to pull out
> all their data over the wire and sort the whole thing, only to keep the
> first 10 items, for each user that lands on their frontpage, is not ok.
>
> Third, and most importantly to me, using this case to put more pressure on
> ourselves to implement real sorting is a really good thing. Sorting is
> something that *really* should be provided by us, anyone who has used a
> modern database expects this to be our problem to solve. We have a really
> good starting point for optimizing sorting algorithms, sitting as we are
> inside the kernel with our caches and indexes :)
>
>
>>
>> In my opinion, I would code the sorting as a characteristic of the graph
>> itself, in order to avoid having to sort in the server (and incur the
>> memory/performance hit). So that means I would use a domain-specific
>> solution to sorting. Of course, generic sorting is nice also, but make it
>> optional.
>>
>
> I agree sorting should be an opt-in feature. Putting meta-data like sorting
> order and similar things inside the graph I think is a matter of personal
> preference, and for sure has its place as a useful optimization. I do,
> however, think that the "official" approach to sorting needs to be based on
> concepts familiar from other databases - define your query, and define how
> you want the result sorted. If indexes are available the database can use
> them to optimize the sorting, otherwise it will suck, but at least we're
> doing what the user wants us to do. All lessons learned in YesSQL databases
> (see what I did there?) should not be unlearned :)
>
> Also, the approach of sorting via the traversal itself assumes knowledge of
> which order the traverser will move through the graph, and that is not
> necessarily something that will be the same in later releases. Tobias was
> talking about cache-first traversals as an addition or even a replacement to
> depth/breadth first ones, a major optimization we cannot do i

Re: [Neo4j] REST results pagination

2011-04-20 Thread Javier de la Rosa
Here is my motivation for this request. In my ideal world, everything
in the Neo4j REST server that returns a list should be something like
a RequestList or QuerySet, and support pagination and even filtering
with lookups, so I could do things like the following (sorry for the
Python syntax, I'm always thinking of the Python REST client):

>>> gdb.nodes.all()[2:5]
# Perform the query to server to get the nodes between the 2nd and
5th position, we assume Neo always returns ordered results in the same
way.

>>> gdb.nodes.filter(name__contains="neo")[:10]
# Returns only nodes with a property called name which contains
"neo", and returns only the first 10.

This is important for the integration of the Neo4j Python REST Client
in Django, because I'm currently developing an application with lazy
and user-defined schemas on top of Django and Neo4j. The listing of
nodes and relationships is a requirement for me, so pagination is a
must in my application. Performing this in the application layer
instead of on the Neo4j server side wastes a lot of time sending
information via REST.
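The slice syntax from the earlier examples could map onto offset/limit query parameters roughly like this; the endpoint path and parameter names are assumptions for illustration, not part of the actual REST API:

```python
from urllib.parse import urlencode

def paged_url(base, sl):
    """Translate a Python slice into offset/limit query parameters for
    a hypothetical paged REST endpoint (parameter names are assumed)."""
    offset = sl.start or 0
    params = {"offset": offset}
    if sl.stop is not None:
        params["limit"] = sl.stop - offset
    return base + "?" + urlencode(params)

# gdb.nodes.all()[2:5] could then become a request to:
print(paged_url("http://localhost:7474/db/data/node", slice(2, 5)))
# prints http://localhost:7474/db/data/node?offset=2&limit=3
```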


On Wed, Apr 20, 2011 at 03:43, Michael Hunger
 wrote:
> But wouldn't that really custom operation not more easily and much faster 
> done as a server plugin?
>
> Otherwise all the data would have to be serialized to json and deserialized 
> again and no streaming possible.
>
> From a server extension you could even stream and gzip that data with ease.
>
> Cheers
>
> Michael
>
>> On 20.04.2011 at 08:41, Tim McNamara wrote:
>
>> Data export, e.g. dumping everything as CSV, DOT or RDF?
>>
>> On 20 April 2011 18:33, Michael Hunger 
>> wrote:
>>
>>> Hi Javier,
>>>
>>> what would you need that for? I'm interested in the usecase.
>>>
>>> Cheers
>>>
>>> Michael
>>>
>>> On 20.04.2011 at 06:17, Javier de la Rosa wrote:
>>>
 On Tue, Apr 19, 2011 at 10:25, Jim Webber  wrote:
> I've just checked and that's in our "list of stuff we really should do
>>> because it annoys us that it's not there."
> No promises, but we do intend to work through at least some of that list
>>> for the 1.4 releases.

 If this finally is developed, it will possible to request for all
 nodes and all relationships in some URL?

>
> Jim



 --
 Javier de la Rosa
 http://versae.es
>>>
>



-- 
Javier de la Rosa
http://versae.es


Re: [Neo4j] REST results pagination

2011-04-20 Thread Michael DeHaan
>
> This is important for the integration of the Neo4j Python Rest Client
> in Django, because I'm currently developing an application with lazy
> and user-defined schemas on top of Django and Neo4j. The listing of
> nodes and relationships is a requirement for me, so the pagination is
> a must in my aplication. Performing this in the application layer
> instead of Neo4j server side, wastes a lot of time sending information
> via REST.

Well put about the listing of nodes and relationships.   That's the
use case where this comes up.

If I can't trust that my app's code indexed something correctly, or I
need to index old data later, I may need to walk the whole
graph to update the indexes, so large result sets become scary.   I
don't think I can rely on a traverse as part of the graphs might be
disjoint.

New use cases on old data mean we'll have to do that, just like adding
a new index to a SQL db.   Or if I have an index that says "all nodes
of type", that result set could get very large.

In fact, I probably need to access all nodes in order to apply any new
indexes, if I can't just send a reindexing command that says
"for all nodes add to index like so, etc".

If I'm understanding the "server plugin" thing correctly, I've got to
go write some Java classes to do that... which, while I *can* do, it
would be better if it could be accessed in a language-agnostic way,
with something more or less resembling a database cursor (see
MongoDB's API).

--Michael


Re: [Neo4j] REST results pagination

2011-04-20 Thread Craig Taverner
To respond to your arguments it would be worth noting a comment by Michael
DeHaan later on in this thread. He asked for 'something more or less
resembling a database cursor (see MongoDB's API).' The trick is to achieve
this without having to store a lot of state on the server, so it is robust
against server restarts or crashes.

If we compare to the SQL situation, there are two numbers passed by the
client, the page size and the offset. The state can be re-created by the
database server entirely from this information. How this is implemented in a
relational database I do not know, but whether the database is relational or
a graph, certain behaviors would be expected, like robustness against
database content changes between the requests, and coping with very long
gaps between requests. In my opinion the database cursor could be achieved
by both of the following approaches:

   - Starting the traversal from the beginning, and only returning results
   after passing the cursor offset position
   - Keeping a live traverser in the server, and continuing it from the
   previous position

Personally I think the second approach is simply a performance optimization
of the first. So robustness is achieved by having both, with the second one
working when possible (no server restarts, timeout not expiring, etc.), and
falling back to the first in other cases. This achieves performance and
robustness. What we do not need to do with either case is keep an entire
result set in memory between client requests.
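The first approach — restart the traversal and only return results after passing the cursor offset — can be sketched as a plain helper over a lazy iterator (hypothetical names, not the Neo4j API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class OffsetPaging {
    // Approach 1: consume the lazy traversal from the beginning, discard the
    // first 'offset' items, and collect at most 'pageSize' items. Memory use
    // is bounded by the page size, not the full result set, because earlier
    // items are consumed and dropped rather than stored.
    static <T> List<T> page(Iterator<T> traversal, int offset, int pageSize) {
        List<T> result = new ArrayList<>(pageSize);
        int skipped = 0;
        while (traversal.hasNext() && result.size() < pageSize) {
            T item = traversal.next();
            if (skipped < offset) {
                skipped++;          // still before the cursor position
            } else {
                result.add(item);   // inside the requested page
            }
        }
        return result;
    }
}
```

The trade-off is O(offset + pageSize) work per request: pages deep into a large traversal redo all of the skipped work, which is exactly why keeping a live traverser (the second approach) is attractive as an optimization.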

Now when you add sorting into the picture, then you need to generate the
complete result-set in memory, sort, paginate and return only the requested
page. If the entire process has to be repeated for every page requested,
this could perform very badly for large result sets. I must believe that
relational databases do not do this (but I do not know how they paginate
sorted results, unless the sort order is maintained in an index).

To avoid keeping everything in memory, or repeatedly reloading everything to
memory on every page request, we need sorted results to be produced on the
stream. This can be done by keeping the sort order in an index. This is very
hard to do in a generic way, which is why I thought it best done in a domain
specific way.
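A minimal illustration of "keeping the sort order in an index", with an ordered map standing in for a real index such as Lucene or an in-graph index (the names and structure here are assumptions for the sketch):

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.stream.Collectors;

public class SortedIndexPaging {
    // The index (here an ordered map keyed by the sort property, mapping to
    // node ids) is maintained as data is written, so serving a sorted page is
    // a lazy skip/limit over an already-ordered stream -- no full re-sort and
    // no full materialization per request.
    static List<Long> sortedPage(NavigableMap<String, Long> index,
                                 int offset, int pageSize) {
        return index.values().stream()
                .skip(offset)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```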

Finally, I think we are really looking at two, different but valid use
cases. The need for generic sorting combined with pagination, and the need
for pagination on very large result sets. The former use case can work with
re-traversing and sorting on each client request, is fully generic, but will
perform badly on large result sets. The latter can perform adequately on
large result sets, as long as you do not need to sort (and use the database
cursor approach to avoid loading the result set into memory).

On Wed, Apr 20, 2011 at 2:01 PM, Jacob Hansson  wrote:

> On Wed, Apr 20, 2011 at 11:25 AM, Craig Taverner  wrote:
>
> > I think sorting would need to be optional, since it is likely to be a
> > performance and memory hug on large traversals. I think one of the key
> > benefits of the traversal framework in the Embedded API is being able to
> > traverse and 'stream' a very large graph without occupying much memory.
> If
> > this can be achieved in the REST API (through pagination), that is a very
> > good thing. I assume the main challenge is being able to freeze a
> traverser
> > and keep it on hold between client requests for the next page. Perhaps
> you
> > have already solved that bit?
> >
>
> While I agree with you that the ability to effectively stream the results
> of
> a traversal is a very useful thing, I don't like the persisted traverser
> approach, for several reasons. I'm sorry if my tone below is a bit harsh, I
> don't mean it that way, I simply want to make a strong case for why I think
> the hard way is the right way in this case.
>
> First, the only good restful approach I can think of for doing persisted
> traversals would be to "create" a traversal resource (since it is an object
> that keeps persistent state), and get back an id to refer to it. Subsequent
> calls to paged results would then be to that traversal resource, updating
> its state and getting results back. Assuming this is the correct way to
> implement this, it comes with a lot of questions. Should there be a timeout
> for these resources, or is the user responsible for removing them from
> memory? What happens when the server crashes and the client can't find the
> traversal resources it has ids for?
>
> If we somehow solve that or find some better approach, we end up with an
> API
> where a client can get paged results, but two clients performing the same
> traversal on the same data may get back the same result in different order
> (see my comments on sorting based on expected traversal behaviour below).
> This means that the API is really only useful if you actually want to get
> the entire result back. If that was the problem we wanted to solve, a
> streaming solution is a much easier and

Re: [Neo4j] REST results pagination

2011-04-21 Thread Jacob Hansson
On Wed, Apr 20, 2011 at 7:42 PM, Craig Taverner  wrote:

> To respond to your arguments it would be worth noting a comment by Michael
> DeHaan later on in this thread. He asked for 'something more or less
> resembling a database cursor (see MongoDB's API).' The trick is to achieve
> this without having to store a lot of state on the server, so it is robust
> against server restarts or crashes.
>
> If we compare to the SQL situation, there are two numbers passed by the
> client, the page size and the offset. The state can be re-created by the
> database server entirely from this information. How this is implemented in
> a
> relational database I do not know, but whether the database is relational
> or
> a graph, certain behaviors would be expected, like robustness against
> database content changes between the requests, and coping with very long
> gaps between requests. In my opinion the database cursor could be achieved
> by both of the following approaches:
>
>   - Starting the traversal from the beginning, and only returning results
>   after passing the cursor offset position
>

I assume this:

Traverser x = Traversal.description().traverse( someNode );
x.nodes();
x.nodes(); // Not necessarily in the same order as previous call.

If that assumption is false or there is some workaround, then I agree that
this is a valid approach, and a good efficient alternative when sorting is
not relevant. Glancing at the code in TraverserImpl though, it really looks
like the call to .nodes() will re-run the traversal, and I thought that would
mean the two calls can yield results in different order?

  - Keeping a live traverser in the server, and continuing it from the
>   previous position
>
> Personally I think the second approach is simply a performance optimization
> of the first. So robustness is achieved by having both, with the second one
> working when possible (no server restarts, timeout not expiring, etc.), and
> falling back to the first in other cases. This achieves performance and
> robustness. What we do not need to do with either case is keep an entire
> result set in memory between client requests.
>

I understand, and completely agree. My problem with the approach is that I
think its harder than it looks at first glance.


>
> Now when you add sorting into the picture, then you need to generate the
> complete result-set in memory, sort, paginate and return only the requested
> page. If the entire process has to be repeated for every page requested,
> this could perform very badly for large result sets. I must believe that
> relational databases do not do this (but I do not know how they paginate
> sorted results, unless the sort order is maintained in an index).
>

This is what makes me push for the sorted approach - relational databases
are doing this. I don't know how they do it, but they are, and we should be
at least as good.


>
> To avoid keeping everything in memory, or repeatedly reloading everything
> to
> memory on every page request, we need sorted results to be produced on the
> stream. This can be done by keeping the sort order in an index. This is
> very
> hard to do in a generic way, which is why I thought it best done in a
> domain
> specific way.
>

I agree the issue of what should be indexed to optimize sorting is a
domain-specific problem, but I think that is how relational databases treat
it as well. If you want sorting to be fast, you have to tell them to index
the field you will be sorting on. The only difference contra having the user
put the sorting index in the graph is that relational databases will handle
the indexing for you, saving you a *ton* of work, and I think we should too.

There are cases where you need to add this sort of meta data to your domain
model, where the sorting logic is too complex, and you see that in
relational dbs as well, where people create lookup tables for various
things. There are for sure valid uses for that too, but the generic approach
I believe covers the *vast* majority of the common use cases.


> Finally, I think we are really looking at two, different but valid use
> cases. The need for generic sorting combined with pagination, and the need
> for pagination on very large result sets. The former use case can work with
> re-traversing and sorting on each client request, is fully generic, but
> will
> perform badly on large result sets. The latter can perform adequately on
> large result sets, as long as you do not need to sort (and use the database
> cursor approach to avoid loading the result set into memory).
>

I agree, this is important. I'd like to change "the need for pagination on
very large result sets" to "the ability to return very large result sets
over the wire". That opens up the debate to solutions like http streaming,
which do not have the problems that come with keeping state on the server
between calls.


>
> On Wed, Apr 20, 2011 at 2:01 PM, Jacob Hansson 
> wrote:
>
> > On Wed, Apr 20, 2011 at 11:25 AM, Craig Ta

Re: [Neo4j] REST results pagination

2011-04-21 Thread Craig Taverner
>
> I assume this:
>Traverser x = Traversal.description().traverse( someNode );
>x.nodes();
>x.nodes(); // Not necessarily in the same order as previous call.
>
> If that assumption is false or there is some workaround, then I agree that
> this is a valid approach, and a good efficient alternative when sorting is
> not relevant. Glancing at the code in TraverserImpl though, it really looks
> like the call to .nodes  will re-run the traversal, and I thought that
> would
> mean the two calls can yield results in different order?
>

OK. My assumptions were different. I assume that while the order is not
easily predictable, it is reproducible as long as the underlying graph has
not changed. If the graph changes, then the order can change also. But I
think this is true of a relational database also, is it not?

So, obviously pagination is expected (by me at least) to give page X as it
is at the time of the request for page X, not at the time of the request for
page 1.

But my assumptions could be incorrect too...

I understand, and completely agree. My problem with the approach is that I
> think its harder than it looks at first glance.
>

I guess I cannot argue that point. My original email said I did not know if
this idea had been solved yet. Since some of the key people involved in this
have not chipped into this discussion, either we are reasonably correct in
our ideas, or so wrong that they don't know where to begin correcting us ;-)

This is what makes me push for the sorted approach - relational databases
> are doing this. I don't know how they do it, but they are, and we should be
> at least as good.
>

Absolutely. We should be as good. Relational databases manage to serve a page
deep down the list quite fast. I must believe if they had to complete the
traversal, sort the results and extract the page on every single page
request, they could not be so fast. I think my ideas for the traversal are
'supposed' to be performance enhancements, and that is why I like them ;-)

I agree the issue of what should be indexed to optimize sorting is a
> domain-specific problem, but I think that is how relational databases treat
> it as well. If you want sorting to be fast, you have to tell them to index
> the field you will be sorting on. The only difference contra having the
> user
> put the sorting index in the graph is that relational databases will handle
> the indexing for you, saving you a *ton* of work, and I think we should
> too.
>

Yes. I was discussing automatic indexing with Mattias recently. I think (and
hope I am right), that once we move to automatic indexes, then it will be
possible to put external indexes (à la Lucene) and graph indexes (like the
ones I favour) behind the same API. In this case perhaps the database will
more easily be able to make the right optimized decisions, and use the index
for providing sorted results fast and with low memory footprint where
possible, based on the existence or non-existence of the necessary indices.
Then all the developer needs to do to make things really fast is put in the
right index. For some data, that would be lucene and for others it would be
a graph index. If we get to this point, I think we will have closed a key
usability gap with relational databases.

There are cases where you need to add this sort of meta data to your domain
> model, where the sorting logic is too complex, and you see that in
> relational dbs as well, where people create lookup tables for various
> things. There are for sure valid uses for that too, but the generic
> approach
> I believe covers the *vast* majority of the common use cases.
>

Perhaps. But I'm not sure the two extremes are as lop-sided as you think. I
think large data users are very interested in Neo4j.

I agree, this is important. I'd like to change "the need for pagination on
> very large result sets" to "the ability to return very large result sets
> over the wire". That opens up the debate to solutions like http streaming,
> which do not have the problems that come with keeping state on the server
> between calls.
>

I think there are two separate, but related, problems to solve. One is the
transfer of large result-sets over the wire for people that need that. The
other is efficiently providing the small page of results from a large
dataset. Most of our discussion has so far focused on the latter.

For the former, I did a bit of experimenting last year and was able to
compact my JSON by several times by moving all meta-data into a header
section. This works very well for data that has a repeating structure, for
example a large number of records with similar schema. I know schema is a
nasty word in the nosql world, but it is certainly common for data to have a
repeating pattern, especially when dealing with very large numbers. Then you
find that something like CSV is actually an efficient format, since the bulk
of the text is only the data. We did this in JSON by simply specifying a
meta-data element (wit
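The header/rows compaction reads roughly like this (a hedged sketch with hypothetical names; the exact format used in that experiment is not shown here, and proper JSON string escaping is omitted for brevity): the repeated keys are declared once in a header, and each record becomes a bare positional array, CSV-style.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CompactRows {
    // Instead of repeating {"name": ..., "age": ...} per record, emit the
    // column names once and each record as a positional row of values.
    static String compact(List<String> columns, List<List<Object>> rows) {
        String header = columns.stream()
                .map(c -> "\"" + c + "\"")
                .collect(Collectors.joining(","));
        String body = rows.stream()
                .map(row -> row.stream()
                        .map(String::valueOf)
                        .collect(Collectors.joining(",", "[", "]")))
                .collect(Collectors.joining(","));
        return "{\"columns\":[" + header + "],\"rows\":[" + body + "]}";
    }
}
```

For a large, schema-regular result set, the per-record key overhead disappears and the payload approaches the size of the raw data.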

Re: [Neo4j] REST results pagination

2011-04-21 Thread Rick Bullotta
Fwiw, I think paging is an outdated "crutch", for a few reasons:

1) bandwidth and browser processing/parsing are largely non-issues, although
they used to be

2) human users rarely have the patience (and usability sucks) to go beyond 2-4 
pages of information.  It is far better to allow incrementally refined filters 
and searches to get to a workable subset of data.

3) machine users could care less about paging

4) when doing visualization of a large dataset, you generally want the whole 
dataset, not a page of it, so that's another "non use case"

Discuss and debate please!

Rick



- Reply message -
From: "Craig Taverner" 
Date: Thu, Apr 21, 2011 8:52 am
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

>
> I assume this:
>Traverser x = Traversal.description().traverse( someNode );
>x.nodes();
>x.nodes(); // Not necessarily in the same order as previous call.
>
> If that assumption is false or there is some workaround, then I agree that
> this is a valid approach, and a good efficient alternative when sorting is
> not relevant. Glancing at the code in TraverserImpl though, it really looks
> like the call to .nodes  will re-run the traversal, and I thought that
> would
> mean the two calls can yield results in different order?
>

OK. My assumptions were different. I assume that while the order is not
easily predictable, it is reproducable as long as the underlying graph has
not changed. If the graph changes, then the order can change also. But I
think this is true of a relational database also, is it not?

So, obviously pagination is expected (by me at least) to give page X as it
is at the time of the request for page X, not at the time of the request for
page 1.

But my assumptions could be incorrect too...

I understand, and completely agree. My problem with the approach is that I
> think its harder than it looks at first glance.
>

I guess I cannot argue that point. My original email said I did not know if
this idea had been solved yet. Since some of the key people involved in this
have not chipped into this discussion, either we are reasonably correct in
our ideas, or so wrong that they don't know where to begin correcting us ;-)

This is what makes me push for the sorted approach - relational databases
> are doing this. I don't know how they do it, but they are, and we should be
> at least as good.
>

Absolutely. We should be as good. Relational database manage to serve a page
deep down the list quite fast. I must believe if they had to complete the
traversal, sort the results and extract the page on every single page
request, they could not be so fast. I think my ideas for the traversal are
'supposed' to be performance enhancements, and that is why I like them ;-)

I agree the issue of what should be indexed to optimize sorting is a
> domain-specific problem, but I think that is how relational databases treat
> it as well. If you want sorting to be fast, you have to tell them to index
> the field you will be sorting on. The only difference contra having the
> user
> put the sorting index in the graph is that relational databases will handle
> the indexing for you, saving you a *ton* of work, and I think we should
> too.
>

Yes. I was discussing automatic indexing with Mattias recently. I think (and
hope I am right), that once we move to automatic indexes, then it will be
possible to put external indexes (a'la lucene) and graph indexes (like the
ones I favour) behind the same API. In this case perhaps the database will
more easily be able to make the right optimized decisions, and use the index
for providing sorted results fast and with low memory footprint where
possible, based on the existance or non-existance of the necessary indices.
Then all the developer needs to do to make things really fast is put in the
right index. For some data, that would be lucene and for others it would be
a graph index. If we get to this point, I think we will have closed a key
usability gap with relational databases.

There are cases where you need to add this sort of meta data to your domain
> model, where the sorting logic is too complex, and you see that in
> relational dbs as well, where people create lookup tables for various
> things. There are for sure valid uses for that too, but the generic
> approach
> I believe covers the *vast* majority of the common use cases.
>

Perhaps. But I'm not sure the two extremes are as lop-sided as you think. I
think large data users are very interested in Neo4j.

I agree, this is important. I'd like to change "the need for pagination on
> very large result sets" to "the ability to return very large result sets
> over the wire". That opens up the debate to solutions like http streaming,
> which do not have the problems that come with keeping state on the server
> between calls.
>

I think there are two separate, but related, problems to solve. One is the
transfer of large result-sets over the wire for people that need that. The
other 

Re: [Neo4j] REST results pagination

2011-04-21 Thread Georg Summer
A legacy application that just uses a new data source. It can be quite hard to
get users away from their trusty old-chap UI. In the case of pagination,
legacy might only mean a few years, but still legacy :-).

@1-2) In the wake of mobile applications and mobile sites, a pagination
system might be more relevant than bulk loading everything and displaying
it. Defining smart filters might be problematic in such a use case as well.

Parallelism of an application could also be an interesting aspect. Each
worker retrieves a different page of the graph, and the user does not have
to care at all about separating the graph after downloading it. This would
only be interesting, though, if the graph relations are not important.

Georg

On 21 April 2011 14:59, Rick Bullotta  wrote:

> Fwiw, I think paging is an outdated "crutch", for a few reasons:
>
> 1) bandwidth and browser processing/parsing are largely non issues,
> although they used to be
>
> 2) human users rarely have the patience (and usability sucks) to go beyond
> 2-4 pages of information.  It is far better to allow incrementally refined
> filters and searches to get to a workable subset of data.
>
> 3) machine users could care less about paging
>
> 4) when doing visualization of a large dataset, you generally want the
> whole dataset, not a page of it, so that's another "non use case"
>
> Discuss and debate please!
>
> Rick
>
>
>
> - Reply message -
> From: "Craig Taverner" 
> Date: Thu, Apr 21, 2011 8:52 am
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" 
>
> >
> > I assume this:
> >Traverser x = Traversal.description().traverse( someNode );
> >x.nodes();
> >x.nodes(); // Not necessarily in the same order as previous call.
> >
> > If that assumption is false or there is some workaround, then I agree
> that
> > this is a valid approach, and a good efficient alternative when sorting
> is
> > not relevant. Glancing at the code in TraverserImpl though, it really
> looks
> > like the call to .nodes  will re-run the traversal, and I thought that
> > would
> > mean the two calls can yield results in different order?
> >
>
> OK. My assumptions were different. I assume that while the order is not
> easily predictable, it is reproducable as long as the underlying graph has
> not changed. If the graph changes, then the order can change also. But I
> think this is true of a relational database also, is it not?
>
> So, obviously pagination is expected (by me at least) to give page X as it
> is at the time of the request for page X, not at the time of the request
> for
> page 1.
>
> But my assumptions could be incorrect too...
>
> I understand, and completely agree. My problem with the approach is that I
> > think its harder than it looks at first glance.
> >
>
> I guess I cannot argue that point. My original email said I did not know if
> this idea had been solved yet. Since some of the key people involved in
> this
> have not chipped into this discussion, either we are reasonably correct in
> our ideas, or so wrong that they don't know where to begin correcting us
> ;-)
>
> This is what makes me push for the sorted approach - relational databases
> > are doing this. I don't know how they do it, but they are, and we should
> be
> > at least as good.
> >
>
> Absolutely. We should be as good. Relational database manage to serve a
> page
> deep down the list quite fast. I must believe if they had to complete the
> traversal, sort the results and extract the page on every single page
> request, they could not be so fast. I think my ideas for the traversal are
> 'supposed' to be performance enhancements, and that is why I like them ;-)
>
> I agree the issue of what should be indexed to optimize sorting is a
> > domain-specific problem, but I think that is how relational databases
> treat
> > it as well. If you want sorting to be fast, you have to tell them to
> index
> > the field you will be sorting on. The only difference contra having the
> > user
> > put the sorting index in the graph is that relational databases will
> handle
> > the indexing for you, saving you a *ton* of work, and I think we should
> > too.
> >
>
> Yes. I was discussing automatic indexing with Mattias recently. I think
> (and
> hope I am right), that once we move to automatic indexes, then it will be
> possible to put external indexes (a'la lucene) and graph indexes (like the
> ones I favour) behind the same API. In this case perhaps the database will
> more easily be able to make the right optimized decisions, and use the
> index
> for providing sorted results fast and with low memory footprint where
> possible, based on the existance or non-existance of the necessary indices.
> Then all the developer needs to do to make things really fast is put in the
> right index. For some data, that would be lucene and for others it would be
> a graph index. If we get to this point, I think we will have closed a key
> usability gap with relational databases.
>
> There a

Re: [Neo4j] REST results pagination

2011-04-21 Thread Michael DeHaan
>
> 3) machine users could care less about paging

My thoughts are that parsing very large documents can perform poorly
and requires the entire document be slurped into (available) RAM.
This puts a cap on the size of a usable result set and slows
processing, or at least makes you pay an up-front cost, and decreases
potential for parallelism in other parts of your app.
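The slurp-vs-stream distinction can be made concrete with newline-delimited records (a generic sketch, not a Neo4j or MongoDB API): the consumer handles one record at a time, so memory stays bounded by the largest single record rather than the whole document.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class StreamingConsumer {
    // Process one record per line as it arrives. Downstream work can start
    // before the last record is received, instead of paying the full parse
    // cost (and RAM footprint) up front.
    static long forEachRecord(BufferedReader in, Consumer<String> handle) {
        long count = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                handle.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```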


Re: [Neo4j] REST results pagination

2011-04-21 Thread Rick Bullotta
That can be dealt with via more "streamable" content structures.

- Reply message -
From: "Michael DeHaan" 
Date: Thu, Apr 21, 2011 9:44 am
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

>
> 3) machine users could care less about paging

My thoughts are that parsing very large documents can perform poorly
and requires the entire document be slurped into (available) RAM.
This puts a cap on the size of a usable result set and slows
processing, or at least makes you pay an up-front cost, and decreases
potential for parallelism in other parts of your app.


Re: [Neo4j] REST results pagination

2011-04-21 Thread Jacob Hansson
On Thu, Apr 21, 2011 at 2:52 PM, Craig Taverner  wrote:

> >
> > I assume this:
> >Traverser x = Traversal.description().traverse( someNode );
> >x.nodes();
> >x.nodes(); // Not necessarily in the same order as previous call.
> >
> > If that assumption is false or there is some workaround, then I agree
> that
> > this is a valid approach, and a good efficient alternative when sorting
> is
> > not relevant. Glancing at the code in TraverserImpl though, it really
> looks
> > like the call to .nodes  will re-run the traversal, and I thought that
> > would
> > mean the two calls can yield results in different order?
> >
>
> OK. My assumptions were different. I assume that while the order is not
> easily predictable, it is reproducable as long as the underlying graph has
> not changed. If the graph changes, then the order can change also. But I
> think this is true of a relational database also, is it not?
>
> So, obviously pagination is expected (by me at least) to give page X as it
> is at the time of the request for page X, not at the time of the request
> for
> page 1.
>
> But my assumptions could be incorrect too...
>

I think you are probably right about that, and if you don't provide a sort
order, then I think a SQL database will exhibit the same sort of unpredictable
behaviour, like you say.

Leaving out single-user, single-threaded applications then, it must be
assumed that the database will be accessed by other parties while we page
through our result. If the cache-first traversal optimization gets
implemented, merely reading the results of the traversal might be enough
for the sort order to come out differently the next time around. Point
being, there is a reasonable chance that parts of the traversal result will
never get returned due to the shifting sort order.

I can only think of a few use cases where losing some of the expected
result is ok, for instance if you want to "peek" at the result.


>
> I understand, and completely agree. My problem with the approach is that I
> > think its harder than it looks at first glance.
> >
>
> I guess I cannot argue that point. My original email said I did not know if
> this idea had been solved yet. Since some of the key people involved in
> this
> have not chipped into this discussion, either we are reasonably correct in
> our ideas, or so wrong that they don't know where to begin correcting us
> ;-)
>

I'm waiting for one of those SlapOnTheFingersExceptions that Tobias has
been handing out :)


>
> This is what makes me push for the sorted approach - relational databases
> > are doing this. I don't know how they do it, but they are, and we should
> be
> > at least as good.
> >
>
> Absolutely. We should be as good. Relational database manage to serve a
> page
> deep down the list quite fast. I must believe if they had to complete the
> traversal, sort the results and extract the page on every single page
> request, they could not be so fast. I think my ideas for the traversal are
> 'supposed' to be performance enhancements, and that is why I like them ;-)
>

I think they are performance enhancements, huge ones. But I still think
there are hard problems involved in putting them into practice.


>
I agree the issue of what should be indexed to optimize sorting is a
> > domain-specific problem, but I think that is how relational databases
> treat
> > it as well. If you want sorting to be fast, you have to tell them to
> index
> > the field you will be sorting on. The only difference contra having the
> > user
> > put the sorting index in the graph is that relational databases will
> handle
> > the indexing for you, saving you a *ton* of work, and I think we should
> > too.
> >
>
> Yes. I was discussing automatic indexing with Mattias recently. I think
> (and
> hope I am right), that once we move to automatic indexes, then it will be
> possible to put external indexes (a'la lucene) and graph indexes (like the
> ones I favour) behind the same API. In this case perhaps the database will
> more easily be able to make the right optimized decisions, and use the
> index
> for providing sorted results fast and with low memory footprint where
> possible, based on the existance or non-existance of the necessary indices.
> Then all the developer needs to do to make things really fast is put in the
> right index. For some data, that would be lucene and for others it would be
> a graph index. If we get to this point, I think we will have closed a key
> usability gap with relational databases.
>

Couldn't agree more :)


>
> There are cases where you need to add this sort of meta data to your domain
> > model, where the sorting logic is too complex, and you see that in
> > relational dbs as well, where people create lookup tables for various
> > things. There are for sure valid uses for that too, but the generic
> > approach
> > I believe covers the *vast* majority of the common use cases.
> >
>
> Perhaps. But I'm not sure the two extremes are as lop-sided as you think.

Re: [Neo4j] REST results pagination

2011-04-21 Thread Jacob Hansson
On Thu, Apr 21, 2011 at 2:59 PM, Rick Bullotta
wrote:

> Fwiw, I think paging is an outdated "crutch", for a few reasons:
>
> 1) bandwidth and browser processing/parsing are largely non issues,
> although they used to be
>

I disagree. They have improved significantly, for sure, but that is no
reason to download massive amounts of data that will never be used.


>
> 2) human users rarely have the patience (and usability sucks) to go beyond
> 2-4 pages of information.  It is far better to allow incrementally refined
> filters and searches to get to a workable subset of data.
>

I agree with the suckiness of paging and the awesomeness of filtering - but
what do you do when the user's filter returns 40 million results? You somehow
have to tell the user that "damn, that filter, it returned forty freaking
million results, you need to refine your search buddy".

The way the user expects that to happen is through presenting a paged,
infinitely scrolled or similar interface, where she can see how many results
were returned and act on that feedback.


> 3) machine users could care less about paging
>
>
Agreed, streaming is a much better way for machines to talk about data that
doesn't fit in memory.


> 4) when doing visualization of a large dataset, you generally want the
> whole dataset, not a page of it, so that's another "non use case"
>

Not necessarily true. You need all the data that you want to visualize, but
that is not necessarily all the data the user has asked for. You can be
clever about the visualization to keep it uncluttered, and "paging"-like
behaviours may be a way to do that.


>
> Discuss and debate please!
>
> Rick
>
>
>
> - Reply message -
> From: "Craig Taverner" 
> Date: Thu, Apr 21, 2011 8:52 am
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" 
>
> >
> > I assume this:
> >Traverser x = Traversal.description().traverse( someNode );
> >x.nodes();
> >x.nodes(); // Not necessarily in the same order as previous call.
> >
> > If that assumption is false or there is some workaround, then I agree
> that
> > this is a valid approach, and a good efficient alternative when sorting
> is
> > not relevant. Glancing at the code in TraverserImpl though, it really
> looks
> > like the call to .nodes  will re-run the traversal, and I thought that
> > would
> > mean the two calls can yield results in different order?
> >
>
> OK. My assumptions were different. I assume that while the order is not
> easily predictable, it is reproducible as long as the underlying graph has
> not changed. If the graph changes, then the order can change also. But I
> think this is true of a relational database also, is it not?
>
> So, obviously pagination is expected (by me at least) to give page X as it
> is at the time of the request for page X, not at the time of the request
> for
> page 1.
>
> But my assumptions could be incorrect too...
>
> I understand, and completely agree. My problem with the approach is that I
> > think its harder than it looks at first glance.
> >
>
> I guess I cannot argue that point. My original email said I did not know if
> this idea had been solved yet. Since some of the key people involved in
> this
> have not chipped into this discussion, either we are reasonably correct in
> our ideas, or so wrong that they don't know where to begin correcting us
> ;-)
>
> This is what makes me push for the sorted approach - relational databases
> > are doing this. I don't know how they do it, but they are, and we should
> be
> > at least as good.
> >
>
> Absolutely. We should be as good. Relational databases manage to serve a
> page
> deep down the list quite fast. I must believe if they had to complete the
> traversal, sort the results and extract the page on every single page
> request, they could not be so fast. I think my ideas for the traversal are
> 'supposed' to be performance enhancements, and that is why I like them ;-)
>
> I agree the issue of what should be indexed to optimize sorting is a
> > domain-specific problem, but I think that is how relational databases
> treat
> > it as well. If you want sorting to be fast, you have to tell them to
> index
> > the field you will be sorting on. The only difference contra having the
> > user
> > put the sorting index in the graph is that relational databases will
> handle
> > the indexing for you, saving you a *ton* of work, and I think we should
> > too.
> >
>
> Yes. I was discussing automatic indexing with Mattias recently. I think
> (and
> hope I am right), that once we move to automatic indexes, then it will be
> possible to put external indexes (à la Lucene) and graph indexes (like the
> ones I favour) behind the same API. In this case perhaps the database will
> more easily be able to make the right optimized decisions, and use the
> index
> for providing sorted results fast and with low memory footprint where
> possible, based on the existence or non-existence of the necessary indices.
> Then all the developer 

Re: [Neo4j] REST results pagination

2011-04-21 Thread Rick Bullotta
Fwiw, we use an "idiot resistant" (no such thing as "idiot proof") approach 
that clamps the number of returned items on the server side by default. We 
allow the user to explicitly request to do something foolish and ask for more 
data, but it requires a conscious effort.


- Reply message -
From: "Jacob Hansson" 
Date: Thu, Apr 21, 2011 10:06 am
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

On Thu, Apr 21, 2011 at 2:59 PM, Rick Bullotta
wrote:

> Fwiw, I think paging is an outdated "crutch", for a few reasons:
>
> 1) bandwidth and browser processing/parsing are largely non issues,
> although they used to be
>

I disagree. They have improved significantly, for sure, but that is no
reason to download massive amounts of data that will never be used.


>
> 2) human users rarely have the patience (and usability sucks) to go beyond
> 2-4 pages of information.  It is far better to allow incrementally refined
> filters and searches to get to a workable subset of data.
>

I agree with the suckiness of paging and the awesomeness of filtering - but
what do you do when the user's filter returns 40 million results? You somehow
have to tell the user that "damn, that filter, it returned forty freaking
million results, you need to refine your search buddy".

The way the user expects that to happen is through presenting a paged,
infinitely scrolled or similar interface, where she can see how many results
were returned and act on that feedback.


> 3) machine users could care less about paging
>
>
Agreed, streaming is a much better way for machines to talk about data that
doesn't fit in memory.


> 4) when doing visualization of a large dataset, you generally want the
> whole dataset, not a page of it, so that's another "non use case"
>

Not necessarily true. You need all the data that you want to visualize, but
that is not necessarily all the data the user has asked for. You can be
clever about the visualization to keep it uncluttered, and "paging"-like
behaviours may be a way to do that.


>
> Discuss and debate please!
>
> Rick
>
>
>
> - Reply message -
> From: "Craig Taverner" 
> Date: Thu, Apr 21, 2011 8:52 am
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" 
>
> >
> > I assume this:
> >Traverser x = Traversal.description().traverse( someNode );
> >x.nodes();
> >x.nodes(); // Not necessarily in the same order as previous call.
> >
> > If that assumption is false or there is some workaround, then I agree
> that
> > this is a valid approach, and a good efficient alternative when sorting
> is
> > not relevant. Glancing at the code in TraverserImpl though, it really
> looks
> > like the call to .nodes  will re-run the traversal, and I thought that
> > would
> > mean the two calls can yield results in different order?
> >
>
> OK. My assumptions were different. I assume that while the order is not
> easily predictable, it is reproducible as long as the underlying graph has
> not changed. If the graph changes, then the order can change also. But I
> think this is true of a relational database also, is it not?
>
> So, obviously pagination is expected (by me at least) to give page X as it
> is at the time of the request for page X, not at the time of the request
> for
> page 1.
>
> But my assumptions could be incorrect too...
>
> I understand, and completely agree. My problem with the approach is that I
> > think its harder than it looks at first glance.
> >
>
> I guess I cannot argue that point. My original email said I did not know if
> this idea had been solved yet. Since some of the key people involved in
> this
> have not chipped into this discussion, either we are reasonably correct in
> our ideas, or so wrong that they don't know where to begin correcting us
> ;-)
>
> This is what makes me push for the sorted approach - relational databases
> > are doing this. I don't know how they do it, but they are, and we should
> be
> > at least as good.
> >
>
> Absolutely. We should be as good. Relational databases manage to serve a
> page
> deep down the list quite fast. I must believe if they had to complete the
> traversal, sort the results and extract the page on every single page
> request, they could not be so fast. I think my ideas for the traversal are
> 'supposed' to be performance enhancements, and that is why I like them ;-)
>
> I agree the issue of what should be indexed to optimize sorting is a
> > domain-specific problem, but I think that is how relational databases
> treat
> > it as well. If you want sorting to be fast, you have to tell them to
> index
> > the field you will be sorting on. The only difference contra having the
> > user
> > put the sorting index in the graph is that relational databases will
> handle
> > the indexing for you, saving you a *ton* of work, and I think we should
> > too.
> >
>
> Yes. I was discussing automatic indexing with Mattias recently. I think
> (and
> hope I am right), that once we move to automatic index

Re: [Neo4j] REST results pagination

2011-04-21 Thread Rick Bullotta
Good dialog, btw!

- Reply message -
From: "Jacob Hansson" 
Date: Thu, Apr 21, 2011 10:06 am
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

On Thu, Apr 21, 2011 at 2:59 PM, Rick Bullotta
wrote:

> Fwiw, I think paging is an outdated "crutch", for a few reasons:
>
> 1) bandwidth and browser processing/parsing are largely non issues,
> although they used to be
>

I disagree. They have improved significantly, for sure, but that is no
reason to download massive amounts of data that will never be used.


>
> 2) human users rarely have the patience (and usability sucks) to go beyond
> 2-4 pages of information.  It is far better to allow incrementally refined
> filters and searches to get to a workable subset of data.
>

I agree with the suckiness of paging and the awesomeness of filtering - but
what do you do when the user's filter returns 40 million results? You somehow
have to tell the user that "damn, that filter, it returned forty freaking
million results, you need to refine your search buddy".

The way the user expects that to happen is through presenting a paged,
infinitely scrolled or similar interface, where she can see how many results
were returned and act on that feedback.


> 3) machine users could care less about paging
>
>
Agreed, streaming is a much better way for machines to talk about data that
doesn't fit in memory.


> 4) when doing visualization of a large dataset, you generally want the
> whole dataset, not a page of it, so that's another "non use case"
>

Not necessarily true. You need all the data that you want to visualize, but
that is not necessarily all the data the user has asked for. You can be
clever about the visualization to keep it uncluttered, and "paging"-like
behaviours may be a way to do that.


>
> Discuss and debate please!
>
> Rick
>
>
>
> - Reply message -
> From: "Craig Taverner" 
> Date: Thu, Apr 21, 2011 8:52 am
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" 
>
> >
> > I assume this:
> >Traverser x = Traversal.description().traverse( someNode );
> >x.nodes();
> >x.nodes(); // Not necessarily in the same order as previous call.
> >
> > If that assumption is false or there is some workaround, then I agree
> that
> > this is a valid approach, and a good efficient alternative when sorting
> is
> > not relevant. Glancing at the code in TraverserImpl though, it really
> looks
> > like the call to .nodes  will re-run the traversal, and I thought that
> > would
> > mean the two calls can yield results in different order?
> >
>
> OK. My assumptions were different. I assume that while the order is not
> easily predictable, it is reproducible as long as the underlying graph has
> not changed. If the graph changes, then the order can change also. But I
> think this is true of a relational database also, is it not?
>
> So, obviously pagination is expected (by me at least) to give page X as it
> is at the time of the request for page X, not at the time of the request
> for
> page 1.
>
> But my assumptions could be incorrect too...
>
> I understand, and completely agree. My problem with the approach is that I
> > think its harder than it looks at first glance.
> >
>
> I guess I cannot argue that point. My original email said I did not know if
> this idea had been solved yet. Since some of the key people involved in
> this
> have not chipped into this discussion, either we are reasonably correct in
> our ideas, or so wrong that they don't know where to begin correcting us
> ;-)
>
> This is what makes me push for the sorted approach - relational databases
> > are doing this. I don't know how they do it, but they are, and we should
> be
> > at least as good.
> >
>
> Absolutely. We should be as good. Relational databases manage to serve a
> page
> deep down the list quite fast. I must believe if they had to complete the
> traversal, sort the results and extract the page on every single page
> request, they could not be so fast. I think my ideas for the traversal are
> 'supposed' to be performance enhancements, and that is why I like them ;-)
>
> I agree the issue of what should be indexed to optimize sorting is a
> > domain-specific problem, but I think that is how relational databases
> treat
> > it as well. If you want sorting to be fast, you have to tell them to
> index
> > the field you will be sorting on. The only difference contra having the
> > user
> > put the sorting index in the graph is that relational databases will
> handle
> > the indexing for you, saving you a *ton* of work, and I think we should
> > too.
> >
>
> Yes. I was discussing automatic indexing with Mattias recently. I think
> (and
> hope I am right), that once we move to automatic indexes, then it will be
> possible to put external indexes (à la Lucene) and graph indexes (like the
> ones I favour) behind the same API. In this case perhaps the database will
> more easily be able to make the right optimized decisions, and use the
> index
> 

Re: [Neo4j] REST results pagination

2011-04-21 Thread Jim Webber
This is indeed a good dialogue. The pagination versus streaming was something 
I'd previously had in my mind as orthogonal issues, but I like the direction 
this is going. Let's break it down to fundamentals:

As a remote client, I want to be just as rich and performant as a local client. 
Unfortunately,  Deutsch, Amdahl and Einstein are against me on that, and I 
don't think I am tough enough to defeat those guys.

So what are my choices? I know I have to be more "granular" to try to alleviate 
some of the network penalty so doing operations bulkily sounds great. 

Now what I need to decide is whether I control the rate at which those bulk 
operations occur or whether the server does. If I want to control those 
operations, then paging seems sensible. Otherwise a streamed (chunked) encoding 
scheme would make sense if I'm happy for the server to throw results back at me 
at its own pace. Or indeed you can mix both so that pages are streamed.

In either case if I get bored of those results, I'll stop paging or I'll 
terminate the connection.

So what does this mean for implementation on the server? I guess this is 
important since it affects the likelihood of the Neo Tech team implementing it.

If the server supports pagination, it means we need a paging controller in 
memory per paginated result set being created. If we assume that we'll only go 
forward in pages, that's effectively just a wrapper around the traversal that's 
been uploaded. The overhead should be modest, and apart from the paging 
controller and the traverser, it doesn't need much state. We would need to add 
some logic to the representation code to support "next" links, but that seems a 
modest task.
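The paging controller described here — "effectively just a wrapper around the traversal" — could look roughly like the following in plain Java. This is a hypothetical sketch with made-up names (`PagingController`, `nextPage`); a real server-side implementation would wrap Neo4j's `Traverser` and render the "next" links from the result:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a forward-only paging controller: a small stateful
// wrapper around the (lazy) traversal iterator that hands out one page per
// request. None of these names exist in Neo4j's API.
class PagingController<T> {
    private final Iterator<T> results;
    private final int pageSize;

    PagingController(Iterator<T> results, int pageSize) {
        this.results = results;
        this.pageSize = pageSize;
    }

    // Drains up to pageSize items from the traversal; an empty page means
    // the traversal is exhausted, i.e. no "next" link should be rendered.
    List<T> nextPage() {
        List<T> page = new ArrayList<>(pageSize);
        while (page.size() < pageSize && results.hasNext()) {
            page.add(results.next());
        }
        return page;
    }
}

public class PagingDemo {
    public static void main(String[] args) {
        Iterator<Integer> traversal = Arrays.asList(1, 2, 3, 4, 5).iterator();
        PagingController<Integer> pages = new PagingController<>(traversal, 2);
        System.out.println(pages.nextPage()); // [1, 2]
        System.out.println(pages.nextPage()); // [3, 4]
        System.out.println(pages.nextPage()); // [5]
    }
}
```

As noted, the only state kept per result set is the controller and the underlying traverser position.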

If the server streams, we will need to decouple the representation generation 
from the existing representation logic since that builds an in-memory 
representation which is then flushed. Instead we'll need a streaming 
representation implementation which seems to be a reasonable amount of 
engineering. We'll also need a new streaming binding to the REST server in 
JAX-RS land.
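To make the buffered-versus-streamed distinction concrete, here is a hypothetical side-by-side sketch, with a plain `OutputStream` standing in for the HTTP response body (these are not Neo4j classes):

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical contrast of the two server-side representation strategies.
public class RepresentationDemo {

    // Current style: build the whole representation in memory, then flush.
    // Memory use grows with the size of the result set.
    static String bufferedRepresentation(Iterator<String> rows) {
        StringBuilder sb = new StringBuilder("[");
        while (rows.hasNext()) {
            sb.append('"').append(rows.next()).append('"');
            if (rows.hasNext()) sb.append(',');
        }
        return sb.append(']').toString();
    }

    // Streaming style: each row goes on the wire as soon as it is produced,
    // so memory use stays constant and the client can hang up mid-stream.
    static void streamRepresentation(Iterator<String> rows, OutputStream out) {
        PrintWriter w = new PrintWriter(out);
        w.print('[');
        while (rows.hasNext()) {
            w.print('"' + rows.next() + '"');
            if (rows.hasNext()) w.print(',');
            w.flush(); // flush per chunk instead of once at the end
        }
        w.print(']');
        w.flush();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        streamRepresentation(Arrays.asList("a", "b").iterator(), out);
        System.out.println(out); // ["a","b"]
    }
}
```

In JAX-RS terms the streaming variant would sit behind a `StreamingOutput`, but the decoupling work is the same either way.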

I'm still a bit concerned about how "rude" it is for a client to just drop a 
streaming connection. I've asked Mark Nottingham for his authoritative opinion 
on that. But still, this does seem popular and feasible.

Jim





___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-21 Thread Craig Taverner
>
> I can only think of a few use cases where losing some of the expected
> result is ok, for instance if you want to "peek" at the result.
>

IMHO, paging is, by definition, a "peek". Since the client controls when the
next page will be requested, it is not possible, or reasonable, to enforce
that the complete set of pages (if ever requested) will represent a
consistent result set. This is not supported by relational databases either.
The result set, and the meaning of a page, can change between requests. So it
can, and does, happen that some of the expected result is lost.

This is completely different to the streaming result, which I see Jim
commented on, and so I might just reply to his mail too :-)

I'm waiting for one of those SlapOnTheFingersExceptions that Tobias has
> been handing out :)
>

My fingers are, as yet, unscathed. The slap can come at any moment! :-)

This sounds really cool, would be a great thing to look into!
>

Should you want examples, I have a wiki page on this topic at
http://redmine.amanzi.org/wiki/geoptima/Geoptima_Event_Log


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-21 Thread Craig Taverner
I think Jim makes a great point about the differences between paging and
streaming, being client or server controlled. I think there is a related
point to be made, and that is that paging does not, and cannot, guarantee a
consistent total result set. Since the database can change between pages
requests, they can be inconsistent. It is possible for the same record to
appear in two pages, or for a record to be missed. This is certainly how
relational databases work in this regard.

But in the streaming case, we expect a complete and consistent result set.
Unless, of course, the client cuts off the stream. The use case is very
different, while paging is about getting a peek at the data, and rarely
about paging all the way to the end, streaming is about getting the entire
result, but streamed for efficiency.

On Thu, Apr 21, 2011 at 5:00 PM, Jim Webber  wrote:

> This is indeed a good dialogue. The pagination versus streaming was
> something I'd previously had in my mind as orthogonal issues, but I like the
> direction this is going. Let's break it down to fundamentals:
>
> As a remote client, I want to be just as rich and performant as a local
> client. Unfortunately,  Deutsch, Amdahl and Einstein are against me on that,
> and I don't think I am tough enough to defeat those guys.
>
> So what are my choices? I know I have to be more "granular" to try to
> alleviate some of the network penalty so doing operations bulkily sounds
> great.
>
> Now what I need to decide is whether I control the rate at which those bulk
> operations occur or whether the server does. If I want to control those
> operations, then paging seems sensible. Otherwise a streamed (chunked)
> encoding scheme would make sense if I'm happy for the server to throw
> results back at me at its own pace. Or indeed you can mix both so that pages
> are streamed.
>
> In either case if I get bored of those results, I'll stop paging or I'll
> terminate the connection.
>
> So what does this mean for implementation on the server? I guess this is
> important since it affects the likelihood of the Neo Tech team implementing
> it.
>
> If the server supports pagination, it means we need a paging controller in
> memory per paginated result set being created. If we assume that we'll only
> go forward in pages, that's effectively just a wrapper around the traversal
> that's been uploaded. The overhead should be modest, and apart from the
> paging controller and the traverser, it doesn't need much state. We would
> need to add some logic to the representation code to support "next" links,
> but that seems a modest task.
>
> If the server streams, we will need to decouple the representation
> generation from the existing representation logic since that builds an
> in-memory representation which is then flushed. Instead we'll need a
> streaming representation implementation which seems to be a reasonable
> amount of engineering. We'll also need a new streaming binding to the REST
> server in JAX-RS land.
>
> I'm still a bit concerned about how "rude" it is for a client to just drop
> a streaming connection. I've asked Mark Nottingham for his authoritative
> opinion on that. But still, this does seem popular and feasible.
>
> Jim
>
>
>
>
>
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-21 Thread Rick Bullotta
Jim, we should schedule a group chat on this topic.



- Reply message -
From: "Jim Webber" 
Date: Thu, Apr 21, 2011 11:01 am
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

This is indeed a good dialogue. The pagination versus streaming was something 
I'd previously had in my mind as orthogonal issues, but I like the direction 
this is going. Let's break it down to fundamentals:

As a remote client, I want to be just as rich and performant as a local client. 
Unfortunately,  Deutsch, Amdahl and Einstein are against me on that, and I 
don't think I am tough enough to defeat those guys.

So what are my choices? I know I have to be more "granular" to try to alleviate 
some of the network penalty so doing operations bulkily sounds great.

Now what I need to decide is whether I control the rate at which those bulk 
operations occur or whether the server does. If I want to control those 
operations, then paging seems sensible. Otherwise a streamed (chunked) encoding 
scheme would make sense if I'm happy for the server to throw results back at me 
at its own pace. Or indeed you can mix both so that pages are streamed.

In either case if I get bored of those results, I'll stop paging or I'll 
terminate the connection.

So what does this mean for implementation on the server? I guess this is 
important since it affects the likelihood of the Neo Tech team implementing it.

If the server supports pagination, it means we need a paging controller in 
memory per paginated result set being created. If we assume that we'll only go 
forward in pages, that's effectively just a wrapper around the traversal that's 
been uploaded. The overhead should be modest, and apart from the paging 
controller and the traverser, it doesn't need much state. We would need to add 
some logic to the representation code to support "next" links, but that seems a 
modest task.

If the server streams, we will need to decouple the representation generation 
from the existing representation logic since that builds an in-memory 
representation which is then flushed. Instead we'll need a streaming 
representation implementation which seems to be a reasonable amount of 
engineering. We'll also need a new streaming binding to the REST server in 
JAX-RS land.

I'm still a bit concerned about how "rude" it is for a client to just drop a 
streaming connection. I've asked Mark Nottingham for his authoritative opinion 
on that. But still, this does seem popular and feasible.

Jim





___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-21 Thread Michael Hunger
Really cool discussion so far,

I would also prefer streaming over paging as with that approach we can give 
both ends more of the control they need.

The server doesn't have to keep state over a long time (and also implement 
timeouts and clearing of that state, and keeping that state for lots of clients 
also adds up).
The client can decide how much of the result he's interested in, if it is just 
1 entry or 100k and then just drop the connection.
Streaming calls can also have a request-timeout, so keeping those open for too 
long (with no activity) will close them automatically.
The server doesn't use up lots of memory for streaming; one could even
leverage the laziness of traversers (and indexes) to avoid executing/fetching
results that are not going to be sent over the wire.

This should accommodate every kind of client from the mobile phone which only 
lists a few entries, to the big machine that can eat a firehose of result data 
in milliseconds.

For this kind of "look-ahead" support we could (and should) add a possible 
offset, so that a client can request data (whose order _he_ is sure hasn't 
changed) by having the server skip the first n entries (so they don't have 
to be serialized/put on the wire).

I also think that this streaming API could already address many of the 
pain-points of the current REST API. Perhaps we even want to provide a 
streaming interface in both directions, having the client being able to for 
instance stream the creation of nodes and relationships and their indexing 
without restarting a connection for each operation. Whatever comes in this 
stream could also be processed in one TX (or with TX tokens embedded in the 
stream the client could even control that).

The only question that is posing here for me is if we want to put it on top of 
the existing REST API or rather create a more concise API/formats for that 
(with the later option of the format even degrading to binary for high bandwidth 
interaction). I'd prefer the latter.

Cheers

Michael

Am 21.04.2011 um 21:09 schrieb Rick Bullotta:

> Jim, we should schedule a group chat on this topic.
> 
> 
> 
> - Reply message -
> From: "Jim Webber" 
> Date: Thu, Apr 21, 2011 11:01 am
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" 
> 
> This is indeed a good dialogue. The pagination versus streaming was something 
> I'd previously had in my mind as orthogonal issues, but I like the direction 
> this is going. Let's break it down to fundamentals:
> 
> As a remote client, I want to be just as rich and performant as a local 
> client. Unfortunately,  Deutsch, Amdahl and Einstein are against me on that, 
> and I don't think I am tough enough to defeat those guys.
> 
> So what are my choices? I know I have to be more "granular" to try to 
> alleviate some of the network penalty so doing operations bulkily sounds 
> great.
> 
> Now what I need to decide is whether I control the rate at which those bulk 
> operations occur or whether the server does. If I want to control those 
> operations, then paging seems sensible. Otherwise a streamed (chunked) 
> encoding scheme would make sense if I'm happy for the server to throw results 
> back at me at its own pace. Or indeed you can mix both so that pages are 
> streamed.
> 
> In either case if I get bored of those results, I'll stop paging or I'll 
> terminate the connection.
> 
> So what does this mean for implementation on the server? I guess this is 
> important since it affects the likelihood of the Neo Tech team implementing 
> it.
> 
> If the server supports pagination, it means we need a paging controller in 
> memory per paginated result set being created. If we assume that we'll only 
> go forward in pages, that's effectively just a wrapper around the traversal 
> that's been uploaded. The overhead should be modest, and apart from the 
> paging controller and the traverser, it doesn't need much state. We would 
> need to add some logic to the representation code to support "next" links, 
> but that seems a modest task.
> 
> If the server streams, we will need to decouple the representation generation 
> from the existing representation logic since that builds an in-memory 
> representation which is then flushed. Instead we'll need a streaming 
> representation implementation which seems to be a reasonable amount of 
> engineering. We'll also need a new streaming binding to the REST server in 
> JAX-RS land.
> 
> I'm still a bit concerned about how "rude" it is for a client to just drop a 
> streaming connection. I've asked Mark Nottingham for his authoritative 
> opinion on that. But still, this does seem popular and feasible.
> 
> Jim
> 
> 
> 
> 
> 
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-21 Thread Rick Otten
Half-baked thoughts from a neo4j newbie hacker type on this topic:

1)  I think it is very important, even with modern infrastructures, for
the client to be able to optionally throttle the result set it generates
with a query as it sees fit, and not just because of client memory and
bandwidth limitations.

With regular old SQL databases, if you send a carelessly large query, you
can chew up significant system resources, for significant amounts of
time while it is being processed.  At a minimum, a rowcount/pagination
option allows you to build something into your client which can
minimize accidental denial of service queries.   I'm not sure if it is
possible to construct a query against a large Neo4j database that
would temporarily cripple it, but it wouldn't surprise me if you
could.


2) Sometimes with regular old SQL databases I'll run a sanity check
"count()" function with the query to just return the size of the expected
result set before I try to pull it back into my data structure.  Many
times "count()" is all I needed anyhow.   Does Neo4j have a result set
size function?  Perhaps a client that really could only handle small
result sets could run a count(), and then filter the search somehow, if
necessary, until the count() was smaller?  (I guess it would depend on the
problem domain...)

   In other words it may be possible, when it is really important, to
implement pagination logic on the client side, if you don't mind
running multiple queries for each set of data you get back.
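A hypothetical client-side version of that count-first check, in plain Java (Neo4j's REST API has no built-in count() here, so the `count` helper below is purely illustrative):

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch: exhaust the result iterator once just to size it,
// and only fetch the data (in a second query) if the count looks workable.
public class CountDemo {
    static long count(Iterator<?> results) {
        long n = 0;
        while (results.hasNext()) {
            results.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long n = count(Arrays.asList(1, 2, 3).iterator());
        if (n <= 1000) {
            System.out.println("workable: fetch all " + n + " results");
        } else {
            System.out.println("too many results: refine the filter first");
        }
    }
}
```

The cost, as noted, is running the query twice; the count pass is only cheap if the server can size the result without materializing it.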


3)  If the result set was broken into pages, you could organize the pages
in the server with a set of [temporary] graph nodes with relationships to
the results in the database -- one node for each page, and a parent node
for the result set.   If order of the pages is important, you could add
directed relationships between the page nodes.  If the order within the
pages is important you could either apply a sequence numbering to the
page-result relationship, or add temporary directed result-set
relationships too.

Subsequent page retrievals would be new traversals based on the search
result set graph.  In a sense you would be building a temporary
graph-index I suppose.

An advantage of organizing search result sets this way is that you
could then union and intersect result sets (and do other set
operations) without a huge memory overhead.  (Which means you could
probably store millions of search results at one time, and you could
persist them through restarts.)
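The page-node structure from point 3 can be mocked up outside the database. This toy uses Python dicts in place of Neo4j nodes and relationships, just to show the shape: a result-set root, one node per page, directed NEXT links between pages, and order preserved inside each page.

```python
# Toy in-memory version of the "result set as a sub-graph" idea:
# root -> page nodes, with directed NEXT relations between pages.
def build_result_graph(results, page_size):
    root = {"type": "resultset", "pages": []}
    prev = None
    for i in range(0, len(results), page_size):
        page = {"type": "page",
                "items": results[i:i + page_size],  # order kept per page
                "next": None}
        if prev is not None:
            prev["next"] = page                     # directed NEXT relation
        root["pages"].append(page)
        prev = page
    return root

g = build_result_graph(list("abcdefg"), 3)

# a later page retrieval is just a walk along the NEXT chain
page, walked = g["pages"][0], []
while page is not None:
    walked.append(page["items"])
    page = page["next"]
```

Set operations (union, intersection) would then work over the page nodes' item references rather than over materialized copies of the results.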



4) In some HA architectures you may have multiple database copies behind a
load balancer.  Would the search result pages be stored equally on all of
them?  Would the client require a "sticky" flag, to always go back to the
same specific server instance for more pages?

   Depending on how fast writes get propagated across the cluster
(compared to requests for the next page), if you were creating nodes as
described in (3) would that work?



5) As for sorting:

   In my experience, if I need a result set sorted from a regular SQL
database, I will usually sort it myself.  Most databases I've ever
worked with routinely have performance problems.  You can minimize
finger pointing and the risk of complicating those other performance
problems by just directing the database to get me what I need; I'll do
the rest of it back in the client.

   On the other hand, sometimes it is quicker and easier to let the
database do the work. (Usually when I can only handle the data in small
chunks on the client.)

   What I'm trying to say, is that I think sorting is going to be more
important to clients who want paginated results (ie, using resource
limited clients), than to clients who are grabbing large chunks of data
at a time (and will want to "own" any post-query processing steps
anyhow).


-- 
Rick Otten
rot...@windfish.net
O=='=+


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-21 Thread Michael Hunger
Rick,

great thoughts.

Good catch, forgot to add the in-graph representation of the results to my 
mail, thanks for adding that part. Temporary (transient) nodes and 
relationships would really rock here, with the advantage that with HA you have 
them distributed to all cluster nodes.
Certainly Craig has to add some interesting things to this, as those resemble 
probably his in graph indexes / R-Trees.

As traversers are lazy, a count operation is not so easily possible; you could 
run the traversal and discard the results. But then the client could also just 
pull those results until it reaches its
internal thresholds and then decide to use more filtering, or stop the pulling 
and ask the user for more filtering (you can always retrieve n+1 and show the 
user that there are more than "n" results available).
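The "retrieve n+1" trick works directly on a lazy iterator. A minimal sketch, with a plain generator standing in for a lazy Neo4j traverser:

```python
# Pull at most n+1 results from a lazy source: if n+1 arrive, show n
# and flag that more are available -- no full count() traversal needed.
from itertools import islice

def lazy_traversal():            # stand-in for a lazy traverser
    for i in range(1000):
        yield i

def first_page(it, n):
    batch = list(islice(it, n + 1))
    has_more = len(batch) > n
    return batch[:n], has_more

page, more = first_page(lazy_traversal(), 10)
```

Only 11 elements are ever materialized, regardless of how large the underlying result is.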

The index result size() method only returns an estimate of the result size 
(which might not contain currently changed index entries).

Please don't forget that a count() query in an RDBMS can be as ridiculously 
expensive as the original query (especially if just the column selection was 
replaced with count, and sorting, grouping etc. were still left in place together 
with lots of joins). 

Sorting on your own instead of letting the db do it mostly harms 
performance, as it requires you to build up all the data in memory, sort it and 
then use it. Instead, have the db do that more efficiently, stream the data, 
and use it directly from the stream.

Cheers

Michael

Am 21.04.2011 um 23:04 schrieb Rick Otten:

> Half-baked thoughts from a neo4j newbie hacker type on this topic:
> 
> 1)  I think it is very important, even with modern infrastructures, for
> the client to be able to optionally throttle the result set it generates
> with a query as it sees fit, and not just because of client memory and
> bandwidth limitations.
> 
>With regular old SQL databases if you send a careless large query, you
> can chew up significant system resources, for significant amounts of
> time while it is being processed.  At a minimum, a rowcount/pagination
> option allows you to build something into your client which can
> minimize accidental denial of service queries.   I'm not sure if it is
> possible to construct a query against a large Neo4j database that
> would temporarily cripple it, but it wouldn't surprise me if you
> could.
> 
> 
> 2) Sometimes with regular old SQL databases I'll run a sanity check
> "count()" function with the query to just return the size of the expected
> result set before I try to pull it back into my data structure.  Many
> times "count()" is all I needed anyhow.   Does Neo4j have a result set
> size function?  Perhaps a client that really could only handle small
> result sets could run a count(), and then filter the search somehow, if
> necessary, until the count() was smaller?  (I guess it would depend on the
> problem domain...)
> 
>   In other words it may be possible, when it is really important, to
> implement pagination logic on the client side, if you don't mind
> running multiple queries for each set of data you get back.
> 
> 
> 3)  If the result set was broken into pages, you could organize the pages
> in the server with a set of [temporary] graph nodes with relationships to
> the results in the database -- one node for each page, and a parent node
> for the result set.   If order of the pages is important, you could add
> directed relationships between the page nodes.  If the order within the
> pages is important you could either apply a sequence numbering to the
> page-result relationship, or add directed temporary result set directed
> relationships too.
> 
>Subsequent page retrievals would be new traversals based on the search
> result set graph.  In a sense you would be building a temporary
> graph-index I suppose.
> 
>And advantage to organizing search result sets this way is that you
> could then union and intersect result sets (and do other set
> operations) without a huge memory overhead.  (Which means you could
> probably store millions of search results at one time, and you could
> persist them through restarts.)
> 
> 
> 
> 4) In some HA architectures you may have multiple database copies behind a
> load balancer.  Would the search result pages be stored equally on all of
> them?  Would the client require a "sticky" flag, to always go back to the
> same specific server instance for more pages?
> 
>   Depending on how fast writes get propagated across the cluster
> (compared to requests for the next page), if you were creating nodes as
> described in (3) would that work?
> 
> 
> 
> 5) As for sorting:
> 
>   In my experience, if I need a result set sorted from a regular SQL
> database, I will usually sort it myself.  Most databases I've ever
> worked with routinely have performance problems.  You can minimize
> finger pointing and the risk of complicating those other performance
> problems by just directing the database to get me what I need, I'll 

Re: [Neo4j] REST results pagination

2011-04-22 Thread Craig Taverner
>
> Good catch, forgot to add the in-graph representation of the results to my
> mail, thanks for adding that part. Temporary (transient) nodes and
> relationships would really rock here, with the advantage that with HA you
> have them distributed to all cluster nodes.
> Certainly Craig has to add some interesting things to this, as those
> resemble probably his in graph indexes / R-Trees.
>

I certainly make use of this model, much more so for my statistical analysis
than for graph indexes (but I'm planning to merge indexes and statistics).

However, in my case the structures are currently very domain specific. But I
think the idea is sound and should be generalizable. What I do is have a
concept of a 'dataset' on which queries can be performed. The dataset is
usually the root of a large sub-graph. The query parser (domain specific)
creates a hashcode of the query and checks whether the dataset node already has a
resultset (as a connected sub-graph with its own root node containing the
previous query hashcode); if so it returns that (traverses it), otherwise it
performs the complete dataset traversal, creating the resultset as a new
subgraph, and then returns it. This works well specifically for statistical
queries, where the resultset is much smaller than the dataset, so adding new
subgraphs has small impact on the database size, and the resultset is much
faster to return, so this is a performance enhancement for multiple requests
from the client. Also, I keep the resultset permanently, not temporarily.
Very few operations modify the dataset, and if they do, we delete all
resultsets, and they get re-created the next time. My work on merging the
indexes with the statistics is also planned to only recreate 'dirty' subsets
of the result-set, so modifying the dataset has minimal impact on the query
performance.
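Craig's query-hash caching scheme can be reduced to a small sketch. Everything here is invented for illustration (the `Dataset` class, the eval-based "query parser"); the point is the shape: hash the query, reuse the cached result sub-graph on a hit, and drop all cached resultsets when the dataset is modified.

```python
# Hedged sketch of the hash-keyed resultset cache with write invalidation.
import hashlib

class Dataset:
    def __init__(self, items):
        self.items = list(items)
        self._resultsets = {}                 # query-hash -> cached result

    def query(self, expr):
        key = hashlib.sha1(expr.encode()).hexdigest()
        if key not in self._resultsets:       # full traversal only on a miss
            pred = eval("lambda x: " + expr)  # toy stand-in for a query parser
            self._resultsets[key] = [x for x in self.items if pred(x)]
        return self._resultsets[key]

    def modify(self, item):                   # writes invalidate every resultset
        self.items.append(item)
        self._resultsets.clear()

ds = Dataset(range(10))
evens = ds.query("x % 2 == 0")    # computed, then cached
again = ds.query("x % 2 == 0")    # served from the cached "subgraph"
ds.modify(10)                     # resultsets dropped, rebuilt next time
refreshed = ds.query("x % 2 == 0")
```

The planned refinement in the text -- recreating only 'dirty' subsets -- would replace the blanket `clear()` with selective invalidation.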

After reading Rick's previous email I started thinking of approaches to
generalizing this, but I think your 'transient' nodes perhaps encompass
everything I thought about. Here is an idea:

   - Have new nodes/relations/properties tables on disk, like a second graph
   database, but different in the sense that it has one-way relations into the
   main database, which cannot be seen by the main graph and so are by
   definition not part of the graph. These can have transience and expiry
   characteristics. Then we can build the resultset graphs as transient graphs
   in the transient database, with 'drill-down' capabilities to the original
   graph (something I find I always need for statistical queries, and something
   a graph is simply much better at than a relational database).
   - Use some kind of hashcode in the traversal definition or query to
   identify existing, cached, transient graphs in the second database, so you
   can rely on those for repeated queries, or pagination or streaming, etc.

As traversers are lazy a count operation is not so easily possible, you
> could run the traversal and discard the results. But then the client could
> also just pull those results until it reaches its
> internal tresholds and then decide to use more filtering or stop the
> pulling and ask the user for more filtering (you can always retrieve n+1 and
> show the user that there are more that "n" results available).
>

Yes. Count needs to perform the traversal. So the only way to not have to
traverse twice is to keep a cache. If we make the cache a transient
sub-graph (possibly in the second database I described above), then we have
the interesting behaviour that count() takes a while, but subsequent
queries, pagination or streaming, are fast.

Please don't forget that a count() query in a RDBMS can be as ridicully
> expensive as the original query (especially if just the column selection was
> replaced with count, and sorting, grouping etc was still left in place
> together with lots of joins).
>

Good to hear they have the same problem as us :-)
(or even more problems)

Sorting on your own instead of letting the db do that mostly harms the
> performance as it requires you to build up all the data in memory, sort it
> and then use it. Instead of having the db do that more efficiently, stream
> the data and you can use it directly from the stream.
>

Client side sorting makes sense if you know the domain well enough to know,
for example, you will receive a small enough result set to 'fit' in the
client, and want to give the user multiple interactive sort options without
hitting the database again. But I agree that in general it makes sense to
get the database to do the sort.

Cheers, Craig


Re: [Neo4j] REST results pagination

2011-04-22 Thread Rick Otten
> Client side sorting makes sense if you know the domain well enough to
> know, for example, you will receive a small enough result set to 'fit'
> in the client, and want to give the user multiple interactive sort
> options without hitting the database again. But I agree that in general
>it makes sense to get the database to do the sort.
>

I'll concede this point.  In general it should be better to do the sorts
on the database server, which is typically by design a hefty backend
system that is optimized for that sort of processing.

In my experience with regular SQL databases, unfortunately they typically
only scale vertically, and are usually running on expensive
enterprise-grade hardware.  Most of the ones I've worked with either run on
minimally sized hardware or have quickly outgrown their hardware.

So they are always either:
  1) Currently suffering from a capacity problem.
  2) Just recovering from a capacity problem.
  3) Heading rapidly towards a new capacity problem.

The next problem I run into is a political, rather than technical, one. 
The database administration team is often a different group of people from
the appserver/front end development team.   The guys writing the queries
are usually closer to the appserver than the database.  In other words, it
is easier for them to manage a problem in the appserver, than it is to
manage a problem in the database.

So, instead of having a deep well of data processing power to draw on, and
then using a wide layer of thin commodity hardware presentation layer
servers, we end up transferring data processing power out of the data
server and into the presentation layer.

As we evolve into building data processing systems which can scale
horizontally on commodity hardware, the perpetual capacity problems the
legacy vertical databases suffer from may wane, finally freeing the other
layers from having to "pick up some of the slack".


-- 
Rick Otten
rot...@windfish.net
O=='=+




Re: [Neo4j] REST results pagination

2011-04-22 Thread Tobias Ivarsson
On Thu, Apr 21, 2011 at 11:18 PM, Michael Hunger <
michael.hun...@neotechnology.com> wrote:

> Rick,
>
> great thoughts.
>
> Good catch, forgot to add the in-graph representation of the results to my
> mail, thanks for adding that part. Temporary (transient) nodes and
> relationships would really rock here, with the advantage that with HA you
> have them distributed to all cluster nodes.
> Certainly Craig has to add some interesting things to this, as those
> resemble probably his in graph indexes / R-Trees.
>
> As traversers are lazy a count operation is not so easily possible, you
> could run the traversal and discard the results. But then the client could
> also just pull those results until it reaches its
> internal tresholds and then decide to use more filtering or stop the
> pulling and ask the user for more filtering (you can always retrieve n+1 and
> show the user that there are more that "n" results available).
>
> The index result size() method only returns an estimate of the result size
> (which might not contain currently changed index entries).
>
> Please don't forget that a count() query in a RDBMS can be as ridicully
> expensive as the original query (especially if just the column selection was
> replaced with count, and sorting, grouping etc was still left in place
> together with lots of joins).
>
> Sorting on your own instead of letting the db do that mostly harms the
> performance as it requires you to build up all the data in memory, sort it
> and then use it. Instead of having the db do that more efficiently, stream
> the data and you can use it directly from the stream.
>

throw new SlapOnTheFingersException("sometimes the application developer can
do a better job since she has better knowledge of the data, the database
only has generic knowledge");

Since Jake had already mentioned (in this very thread) that he expected one
of those, I thought I might as well throw one in there.

I agree with the analysis of count(): as the name implies, it will
have to run the entire query in order to count the number of resulting
items.

About sorting I'm torn. The perception of sorting in the database being slow
that Rick points to is one that I've seen a lot. When you hand the
responsibility of sorting to the database you hide the fact that sorting is
an expensive operation, it does require reading in all data in order to sort
it. People often expect databases to be "smarter" than that, since they
sometimes are, but that is pretty much only when reading straight from an
index and not doing much more. A generic sort of data can never be better
than O(log(n!)) [O(log(n!)) is almost equal to, and commonly rounded to, the
easier-to-compute O(n log n)]. If you put the responsibility of
sorting in the hands of the application you can sometimes utilize knowledge
about the data to do a more efficient sorting than the database could have
done. Most often by simply doing an application level filtering of the data
before sorting it, based on some filtering that could not be transferred to
the database query. This does however make the work of the application
developer slightly more tedious, which is why I think it would be sensible
to have support for sorting on the database level, and hope that users will
be sensible about using it, and not assume magic from it.
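Tobias's bound can be made precise. A comparison sort must distinguish all n! input orderings, so its decision tree needs depth at least log2(n!), and Stirling's approximation ties that to the familiar n log n:

```latex
% Lower bound for comparison sorting via the decision-tree argument:
\log(n!) \;=\; \sum_{k=1}^{n} \log k \;=\; n\log n \;-\; n \;+\; O(\log n)
% hence O(\log(n!)) and O(n \log n) coincide, which is why
% \Omega(n \log n) is the usual statement of the bound.
```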

Something I find very interesting is the concept of semi-sorted data.
Semi-sorted data is often good enough, easier to achieve, and quite easy to
then sort completely if that is required. Examples of semi-sorted data could
be data in an order that satisfies the heap property. Or for spatial queries
returning the closest hits first, but not necessarily in perfect order, say
returning the hits within a one-mile radius first, before the ones in a radius
between 1-10 miles, and so on, without requiring the hits in each 'segment'
to be perfectly ordered by distance. Breadth first order is another example
of semi-sorted data, that could be used when traversing data as you've
outlined with "paging nodes", or similarly "grouped by parent node"-order.

I must say that I really enjoy following this discussion. I really like the
idea of streaming, since I think that can be implemented more easily than
paging, while satisfying many of the desired use cases. But I still want to
hear more arguments for and against both alternatives. And as has already
been pointed out, they aren't mutually exclusive.

I'll keep listening in on the conversation, but I don't have much more to
add at this point. I have one desire for the structure of the conversation
though. When you quote what someone else has said before you, could you
please include who that person was, it makes going back and reading the full
context easier.

Cheers,
-- 
Tobias Ivarsson 
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857


Re: [Neo4j] REST results pagination

2011-04-22 Thread Michael DeHaan
On Thu, Apr 21, 2011 at 5:00 PM, Michael Hunger
 wrote:
> Really cool discussion so far,
>
> I would also prefer streaming over paging as with that approach we can give 
> both ends more of the control they need.

Just in case we're not talking about the same kind of streaming --
when I think streaming, I think "streaming uploads", "streaming
downloads", etc.

If the REST format is JSON (or XML, whatever), that's a /document/ so
you can't just say "read the next (up to) 512 bytes" and work on it.
It becomes a more low-level endeavor because if you're in the middle
of reading a record, or don't even have the "end of list" terminator,
what you have isn't parseable yet.  I'm sure a lot of hacking could be
done to make the client figure out if he had enough other than the
closing array element, but it's a lot to ask of a JSON client.

So I'm interested in how, in that proposal, the REST API might stream
results to a client, because for the streaming to be meaningful, you
need to be able to parse what you get back and know where the
boundaries are (or build a buffer until you fill in a datastructure
enough to operate on it).

I don't see that working with JSON/REST so much.   It seems to imply a
message bus.

--Michael


Re: [Neo4j] REST results pagination

2011-04-22 Thread Jim Webber
Hi Michael,

> Just in case we're not talking about the same kind of streaming --
> when I think streaming, I think "streaming uploads", "streaming
> downloads", etc.

I'm thinking "chunked" transfers. That is the server starts sending a response 
and then eventually terminates it when the whole response has been sent to the 
client.

Although it seems a bit rude, the client could simply opt to close the 
connection when it's "read enough" providing what it has read makes sense. 
Sometimes document fragments can make sense:

[The inline XML example was stripped by the mailing-list archive: a truncated
fragment containing Earth and Mars node elements, cut off before its closing
tags.]

In this case we certainly don't have well-formed XML, but some streaming API 
(e.g. stax) might already have been able to create some local objects on the 
client side as the Earth and Mars nodes came in.
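That behaviour can be demonstrated with Python's pull parser (stax-like). The `<nodes>`/`<node>` element names below are only a guess at Jim's mangled example; the point is that completed elements are usable even though the stream is cut off mid-document:

```python
# A pull parser surfaces completed elements from a truncated,
# not-well-formed stream; the torn trailing element is simply ignored
# until (unless) more bytes arrive.
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("end",))
# feed a chunk that is cut off before </nodes> and mid-way through a node
parser.feed('<nodes><node name="Earth"/><node name="Mars"/><node name="Ve')

seen = [elem.get("name") for event, elem in parser.read_events()]
# Earth and Mars are complete and usable; the third node is not yet.
```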

I don't think this is elegant at all, but it might be practical. I've asked 
Mark Nottingham for his view on this since he's pretty sensible about Web 
things.

Jim






Re: [Neo4j] REST results pagination

2011-04-22 Thread Georg Summer
I might be a little newbish here, but then why not an Iterator?
The iterator lives on the server and is accessible through the REST
interface, providing an advance and a value method. It either operates on a
stored, once-created-stable result set, or holds the query and evaluates
it on demand (issues of a changing underlying graph included).

The client can have paginator functionality by advancing and derefing the
iterator n times, or streaming-like behaviour by constantly pushing the
obtained data into a queue and keeping going.

If the client does not need the iterator anymore he simply stops using it
and a timeout eventually kills it on the server. A client-callable delete
method for the iterator would work as well.
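Georg's server-held iterator with expiry can be sketched as a registry. All names here are invented; a real REST layer would map `create`/`next_page`/`delete` onto URIs and verbs, and `reap` would run on a server timer.

```python
# Server-side iterator registry: paged advancement, client-callable
# delete, and timeout-based reaping of abandoned iterators.
import itertools, time, uuid

class IteratorRegistry:
    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._live = {}                      # id -> [iterator, last_used]

    def create(self, iterable):
        it_id = uuid.uuid4().hex             # would become part of the URI
        self._live[it_id] = [iter(iterable), time.monotonic()]
        return it_id

    def next_page(self, it_id, n):
        entry = self._live[it_id]
        entry[1] = time.monotonic()          # any use resets the timeout
        return list(itertools.islice(entry[0], n))

    def delete(self, it_id):                 # client-callable cleanup
        self._live.pop(it_id, None)

    def reap(self):                          # server-side expiry sweep
        now = time.monotonic()
        for it_id, (_, last_used) in list(self._live.items()):
            if now - last_used > self.ttl:
                del self._live[it_id]

reg = IteratorRegistry(ttl_seconds=-1.0)  # negative ttl: expire at once (demo)
handle = reg.create(range(7))
first = reg.next_page(handle, 3)          # first page
second = reg.next_page(handle, 3)         # next page
reg.reap()                                # idle iterator is reclaimed
```

As Rick Bullotta notes later in the thread, the catch is statefulness: each live iterator pins resources on one server, which complicates load-balanced deployments.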


Georg

On 22 April 2011 18:43, Jim Webber  wrote:

> Hi Michael,
>
> > Just in case we're not talking about the same kind of streaming --
> > when I think streaming, I think "streaming uploads", "streaming
> > downloads", etc.
>
> I'm thinking "chunked" transfers. That is the server starts sending a
> response and then eventually terminates it when the whole response has been
> sent to the client.
>
> Although it seems a bit rude, the client could simply opt to close the
> connection when it's "read enough" providing what it has read makes sense.
> Sometimes document fragments can make sense:
>
> [The inline XML example was stripped by the mailing-list archive: a
> truncated fragment containing Earth and Mars node elements, cut off
> before its closing tags.]
>
> In this case we certainly don't have well-formed XML, but some streaming
> API (e.g. stax) might already have been able to create some local objects on
> the client side as the Earth and Mars nodes came in.
>
> I don't think this is elegant at all, but it might be practical. I've asked
> Mark Nottingham for his view on this since he's pretty sensible about Web
> things.
>
> Jim
>
>
>
>


Re: [Neo4j] REST results pagination

2011-04-22 Thread Jim Webber
Hi Georg,

It would at least have to be an iterator over pages - otherwise the results 
tend to be fine-grained and so horribly inefficient for sending over a network.

Jim

On 22 Apr 2011, at 18:24, Georg Summer wrote:

> I might be a little newbish here, but then why not an Iterator?
> The iterator lives on the server and is accessible through the REST
> interface, providing a advance and value method. It either operates on a
> stored and once-created-stable result set or holds the query and evaluates
> it on demand (issues of changing underlying graph included).
> 
> The client can have paginator functionality by advancing and derefing the
> iterator n times or streaming-like behaviour by constantly pushing the
> obtained data into a queue and keep on going.
> 
> If the client does not need the iterator anymore he simple stops using it
> and a timeout kills it eventually on the server. a client-callable delete
> method for the iterator would work as well.
> 
> 
> Georg


Re: [Neo4j] REST results pagination

2011-04-22 Thread Rick Bullotta
SAX (or StAX) is an example of streaming with a higher-level format, but there 
are plenty of other ways as well.  The *critical* performance element is to 
*never* have to accumulate an entire intermediate document on either side (e.g. 
a JSON object or XML DOM) if you can avoid it.  You end up requiring 4x the 
resources (or more), extra latency, more parsing, and more garbage collection.
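The "never accumulate the whole document" rule applies on the producing side too. A minimal sketch: serialize one record at a time and yield pieces, so neither the full object graph nor the full response string exists in memory at once (a real server would write each piece straight to the socket).

```python
# Streaming serialization: the response is produced as a sequence of
# small pieces; only one record is ever serialized at a time.
import json

def stream_json_array(items):
    yield "["
    for i, item in enumerate(items):
        if i:
            yield ","
        yield json.dumps(item)       # serialize a single record
    yield "]"

def node_source(n):                  # lazy producer, e.g. a traversal
    for i in range(n):
        yield {"id": i}

# concatenated, the pieces form valid JSON -- here joined only to check
body = "".join(stream_json_array(node_source(3)))
```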

I'll get with Jim Webber and propose a prototype of alternatives.

Note also that the lack of binary I/O in the browser without 
Flash/Java/Silverlight is a challenge, but we can work around it.




- Reply message -
From: "Michael DeHaan" 
Date: Fri, Apr 22, 2011 12:18 pm
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

On Thu, Apr 21, 2011 at 5:00 PM, Michael Hunger
 wrote:
> Really cool discussion so far,
>
> I would also prefer streaming over paging as with that approach we can give 
> both ends more of the control they need.

Just in case we're not talking about the same kind of streaming --
when I think streaming, I think "streaming uploads", "streaming
downloads", etc.

If the REST format is JSON (or XML, whatever), that's a /document/ so
you can't just say "read the next (up to) 512 bytes" and work on it.
It becomes a more low-level endeavor because if you're in the middle
of reading a record, or don't even have the "end of list" terminator,
what you have isn't parseable yet.  I'm sure a lot of hacking could be
done to make the client figure out if he had enough other than the
closing array element, but it's a lot to ask of a JSON client.

So I'm interested in how, in that proposal, the REST API might stream
results to a client, because for the streaming to be meaningful, you
need to be able to parse what you get back and know where the
boundaries are (or build a buffer until you fill in a datastructure
enough to operate on it).

I don't see that working with JSON/REST so much.   It seems to imply a
message bus.

--Michael


Re: [Neo4j] REST results pagination

2011-04-22 Thread Rick Bullotta
That would need to hold resources on the server (potentially for an 
indeterminate amount of time) since it must be stateful.  In general, stateful 
APIs do not scale well in cases of dynamic queries.






- Reply message -
From: "Georg Summer" 
Date: Fri, Apr 22, 2011 1:25 pm
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

I might be a little newbish here, but then why not an Iterator?
The iterator lives on the server and is accessible through the REST
interface, providing a advance and value method. It either operates on a
stored and once-created-stable result set or holds the query and evaluates
it on demand (issues of changing underlying graph included).

The client can have paginator functionality by advancing and derefing the
iterator n times or streaming-like behaviour by constantly pushing the
obtained data into a queue and keep on going.

If the client does not need the iterator anymore he simple stops using it
and a timeout kills it eventually on the server. a client-callable delete
method for the iterator would work as well.


Georg

On 22 April 2011 18:43, Jim Webber  wrote:

> Hi Michael,
>
> > Just in case we're not talking about the same kind of streaming --
> > when I think streaming, I think "streaming uploads", "streaming
> > downloads", etc.
>
> I'm thinking "chunked" transfers. That is the server starts sending a
> response and then eventually terminates it when the whole response has been
> sent to the client.
>
> Although it seems a bit rude, the client could simply opt to close the
> connection when it's "read enough" providing what it has read makes sense.
> Sometimes document fragments can make sense:
>
> [The inline XML example was stripped by the mailing-list archive: a
> truncated fragment containing Earth and Mars node elements, cut off
> before its closing tags.]
>
> In this case we certainly don't have well-formed XML, but some streaming
> API (e.g. stax) might already have been able to create some local objects on
> the client side as the Earth and Mars nodes came in.
>
> I don't think this is elegant at all, but it might be practical. I've asked
> Mark Nottingham for his view on this since he's pretty sensible about Web
> things.
>
> Jim
>
>
>
>


Re: [Neo4j] REST results pagination

2011-04-22 Thread Rick Bullotta
I'll be happy to host the streaming rest api "summit".  Ample amounts of beer 
will be provided.;-)


- Reply message -
From: "Jim Webber" 
Date: Fri, Apr 22, 2011 1:46 pm
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

Hi Georg,

It would at least have to be an iterator over pages - otherwise the results 
tend to be fine-grained and so horribly inefficient for sending over a network.

Jim

On 22 Apr 2011, at 18:24, Georg Summer wrote:

> I might be a little newbish here, but then why not an Iterator?
> The iterator lives on the server and is accessible through the REST
> interface, providing a advance and value method. It either operates on a
> stored and once-created-stable result set or holds the query and evaluates
> it on demand (issues of changing underlying graph included).
>
> The client can have paginator functionality by advancing and derefing the
> iterator n times or streaming-like behaviour by constantly pushing the
> obtained data into a queue and keep on going.
>
> If the client does not need the iterator anymore he simple stops using it
> and a timeout kills it eventually on the server. a client-callable delete
> method for the iterator would work as well.
>
>
> Georg


Re: [Neo4j] REST results pagination

2011-04-22 Thread Michael Hunger
And you would want to reuse your connection so you don't have to pay this 
penalty per request.

Just asking: what would such a REST resource iterator look like (URI, verbs, 
request/response formats)?

I assume then every query (index, traversal) would just return the iterator URI 
for later consumption. If we store the query and/or result information (as 
discussed by Craig and others) at the "node" returned as the iterator, this 
would be a nice fit.

M
Sent from my iBrick4


Am 22.04.2011 um 19:46 schrieb Jim Webber :

> Hi Georg,
> 
> It would at least have to be an iterator over pages - otherwise the results 
> tend to be fine-grained and so horribly inefficient for sending over a 
> network.
> 
> Jim
> 
> On 22 Apr 2011, at 18:24, Georg Summer wrote:
> 
>> I might be a little newbish here, but then why not an Iterator?
>> The iterator lives on the server and is accessible through the REST
>> interface, providing a advance and value method. It either operates on a
>> stored and once-created-stable result set or holds the query and evaluates
>> it on demand (issues of changing underlying graph included).
>> 
>> The client can have paginator functionality by advancing and derefing the
>> iterator n times or streaming-like behaviour by constantly pushing the
>> obtained data into a queue and keep on going.
>> 
>> If the client does not need the iterator anymore he simple stops using it
>> and a timeout kills it eventually on the server. a client-callable delete
>> method for the iterator would work as well.
>> 
>> 
>> Georg
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Michael Hunger
I spent some time looking at what others are doing for inspiration.

I kind of like the Riak/Basho approach with multipart chunks, and the approach 
of explicitly creating a resource for the query that can be navigated (either 
via pages or first, next, [prev, last] links) and expires (and could be 
reconstructed).

Cheers

Michael

Good discussion: 
http://stackoverflow.com/questions/924472/paging-in-a-rest-collection

CouchDB: 
http://wiki.apache.org/couchdb/HTTP_Document_API
startKey + limit, endKey + limit, sorting, insert/update order
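A sketch of how that startKey + limit scheme pages without offsets (illustrative Python, not CouchDB client code; for simplicity the startkey here is treated as exclusive, whereas CouchDB's own parameter is inclusive): each page's last key becomes the startkey of the next request.

```python
def page_by_startkey(rows, startkey=None, limit=2):
    """rows: list of (key, value) pairs sorted by key.

    Returns one page plus the key to pass as startkey for the next
    page, mimicking key-based (startKey + limit) view pagination.
    """
    if startkey is not None:
        # Exclusive start for simplicity (CouchDB's startkey is inclusive).
        rows = [r for r in rows if r[0] > startkey]
    page = rows[:limit]
    next_key = page[-1][0] if page else None
    return page, next_key

data = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
page1, k = page_by_startkey(data)     # [("a", 1), ("b", 2)], next key "b"
page2, k = page_by_startkey(data, k)  # [("c", 3), ("d", 4)], next key "d"
```

The appeal over offset paging is that the position is a key, so inserts and deletes before the cursor don't shift subsequent pages.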

MongoDB: [cursor-id] + batch_size


OrientDB: .../[limit]

Sones: no real REST API, but a SQL on top of the graph: 
http://developers.sones.de/documentation/graph-query-language/select/
with limit, offset, but also depth (for graphs)

HBase explicitly creates scanners, which can then be accessed with next 
operations, and which expire after no activity for a certain timeout


riak:
http://wiki.basho.com/REST-API.html
 client-id header for client identification -> sticky?
optional query parameters for including properties, and if to stream the data 
keys=[true,false,stream]

If “keys=stream”, the response will be transferred using chunked-encoding, 
where each chunk is a JSON object. The first chunk will contain the “props” 
entry (if props was not set to false). Subsequent chunks will contain 
individual JSON objects with the “keys” entry containing a sublist of the total 
keyset (some sublists may be empty).
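To make that chunk layout concrete, here is a sketch of a client folding such a chunk sequence back together (the chunk contents are assumed from the description above; this is not Riak client code):

```python
import json

def collect_keys(chunks):
    """Fold keys=stream style chunks back into one result.

    The first chunk may carry a "props" entry; subsequent chunks each
    carry a (possibly empty) "keys" sublist of the total keyset.
    """
    props, keys = None, []
    for raw in chunks:
        obj = json.loads(raw)
        if "props" in obj:
            props = obj["props"]
        keys.extend(obj.get("keys", []))
    return props, keys

chunks = [
    '{"props":{"n_val":3}}',   # first chunk: bucket properties
    '{"keys":["a","b"]}',      # sublists of the keyset...
    '{"keys":[]}',             # ...some of which may be empty
    '{"keys":["c"]}',
]
props, keys = collect_keys(chunks)
# props == {"n_val": 3}, keys == ["a", "b", "c"]
```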
Riak seems to accept partial JSON, with non-closed elements: -d 
'{"props":{"n_val":5'

returns multiple responses in one go, Content-Type: multipart/mixed; 
boundary=YinLMzyUR9feB17okMytgKsylvh

--YinLMzyUR9feB17okMytgKsylvh
Content-Type: application/x-www-form-urlencoded
Link: ; rel="up"
Etag: 16vic4eU9ny46o4KPiDz1f
Last-Modified: Wed, 10 Mar 2010 18:01:06 GMT

{"bar":"baz"}
(this block can be repeated n times)
--YinLMzyUR9feB17okMytgKsylvh--
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
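A rough sketch of splitting such a multipart/mixed body on its boundary (hand-rolled purely for illustration; a real client would use a proper MIME parser):

```python
def split_multipart(body, boundary):
    """Split a multipart/mixed body into (headers, payload) parts."""
    delim = "--" + boundary
    parts = []
    for chunk in body.split(delim):
        chunk = chunk.strip()
        if not chunk or chunk == "--":  # skip preamble / closing "--" marker
            continue
        head, _, payload = chunk.partition("\n\n")
        headers = dict(h.split(": ", 1) for h in head.splitlines() if ": " in h)
        parts.append((headers, payload))
    return parts

# A stripped-down body in the shape of the Riak response above.
body = ("--B\nContent-Type: application/json\n\n"
        '{"bar":"baz"}\n'
        "--B--")
parts = split_multipart(body, "B")
# one part: Content-Type application/json, payload '{"bar":"baz"}'
```

The nested link-walking responses described below use the same mechanism twice: each outer part is itself a multipart/mixed document with its own boundary.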

Query results:
Content-Type – always multipart/mixed, with a boundary specified
Understanding the response body

The response body will always be multipart/mixed, with each chunk representing 
a single phase of the link-walking query. Each phase will also be encoded in 
multipart/mixed, with each chunk representing a single object that was found. 
If no objects were found or “keep” was not set on the phase, no chunks will be 
present in that phase. Objects inside phase results will include Location 
headers that can be used to determine bucket and key. In fact, you can treat 
each object-chunk similarly to a complete response from read object, without 
the status code.
< HTTP/1.1 200 OK
< Server: MochiWeb/1.1 WebMachine/1.6 (eat around the stinger)
< Expires: Wed, 10 Mar 2010 20:24:49 GMT
< Date: Wed, 10 Mar 2010 20:14:49 GMT
< Content-Type: multipart/mixed; boundary=JZi8W8pB0Z3nO3odw11GUB4LQCN
< Content-Length: 970
<

--JZi8W8pB0Z3nO3odw11GUB4LQCN
Content-Type: multipart/mixed; boundary=OjZ8Km9J5vbsmxtcn1p48J91cJP

--OjZ8Km9J5vbsmxtcn1p48J91cJP
Content-Type: application/json
Etag: 3pvmY35coyWPxh8mh4uBQC
Last-Modified: Wed, 10 Mar 2010 20:14:13 GMT

{"riak":"CAP"}
--OjZ8Km9J5vbsmxtcn1p48J91cJP--

--JZi8W8pB0Z3nO3odw11GUB4LQCN
Content-Type: multipart/mixed; boundary=RJKFlAs9PrdBNfd74HANycvbA8C

--RJKFlAs9PrdBNfd74HANycvbA8C
Location: /riak/test/doc2
Content-Type: application/json
Etag: 6dQBm9oYA1mxRSH0e96l5W
Last-Modified: Wed, 10 Mar 2010 18:11:41 GMT

{"foo":"bar"}
--RJKFlAs9PrdBNfd74HANycvbA8C--

--JZi8W8pB0Z3nO3odw11GUB4LQCN--
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

Riak - MapReduce:
Optional query parameters:

* chunked – when set to true, results will be returned one at a time in 
multipart/mixed format using chunked-encoding.
Important headers:

* Content-Type – application/json when chunked is not true, 
otherwise multipart/mixed with application/json parts

Other interesting endpoints: /ping, /stats
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-23 Thread Rick Bullotta
Let's discuss sometime soon.  Creating resources that need to be cached or 
saved in session state brings with it a whole bunch of negative aspects...



- Reply message -
From: "Michael Hunger" 
Date: Fri, Apr 22, 2011 10:57 pm
Subject: [Neo4j] REST results pagination
To: "Neo4j user discussions" 

[quoted message trimmed]
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-26 Thread Jacob Hansson
Here at 49 responses, I'd like to reiterate Craig's point from earlier that we
are really talking about several separate issues, and I'm wondering if we
should split this discussion up, because it is getting very hard to follow.

As I see it, we are looking at three things:

Paging
Use case: UI code that presents a paged, infinite-scrolled or similar
interface to the user. Peeking at results for debugging or other purposes.

Streaming
Use case: Returning huge data sets without killing anyone.

Sorting
Use case: Presenting lists of things to users; applications that care
about the order of results for some other reason.

I think we've come to agree that streaming and paging serve similar but
different purposes and are not quite able to replace each others
functionality.

I also took the liberty of elevating sorting to its own topic, because I
believe it should be a generic thing you can do on a result set, whereas
paging and streaming are different means of returning a result set.
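To put the distinction in one place (purely illustrative Python, nothing Neo4j-specific): paging gives random access to a slice of a result set, streaming yields it lazily, and sorting is an orthogonal operation you can apply before either.

```python
def paged(results, page, size):
    """Paging: random access to one slice, good for UIs."""
    return results[page * size:(page + 1) * size]

def streamed(results):
    """Streaming: lazy iteration, good for huge result sets."""
    for item in results:
        yield item

results = sorted([3, 1, 2])      # sorting: orthogonal to both
print(paged(results, 1, 2))      # second page: [3]
print(next(streamed(results)))   # first streamed item: 1
```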



If we want to continue this discussion, would anyone object to splitting it
into these three parts?

/Jake

On Sun, Apr 24, 2011 at 2:18 AM, Rick Bullotta
wrote:

> Let's discuss sometime soon.  Creating resources that need to be cached or
> saved in session state bring with them a whole bunch of negative aspects...
>
>
>
> - Reply message -
> From: "Michael Hunger" 
> Date: Fri, Apr 22, 2011 10:57 pm
> Subject: [Neo4j] REST results pagination
> To: "Neo4j user discussions" 
>
> [rest of quoted message trimmed]

Re: [Neo4j] REST results pagination

2011-04-26 Thread Jim Webber
In addition to what Jake just said about splitting this thread apart, I'd like 
to bring up what Rick suggested about getting together to thrash this out.

Can you guys think about when we might want a skype call for this? We have to 
take into account timezones to cover from CET through to PST (unless anyone is 
even further east?). 

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-26 Thread Justin Cormack
On Fri, 2011-04-22 at 17:43 +0100, Jim Webber wrote:
> Hi Michael,
> 
> > Just in case we're not talking about the same kind of streaming --
> > when I think streaming, I think "streaming uploads", "streaming
> > downloads", etc.
> 
> I'm thinking "chunked" transfers. That is the server starts sending a 
> response and then eventually terminates it when the whole response has been 
> sent to the client.
> 
> Although it seems a bit rude, the client could simply opt to close the 
> connection when it's "read enough" providing what it has read makes sense. 
> Sometimes document fragments can make sense:

> In this case we certainly don't have well-formed XML, but some streaming API 
> (e.g. stax) might already have been able to create some local objects on the 
> client side as the Earth and Mars nodes came in.
> 
> I don't think this is elegant at all, but it might be practical. I've asked 
> Mark Nottingham for his view on this since he's pretty sensible about Web 
> things.

Any intermediate proxies would have to cache the whole thing; many
proxies are not designed for streaming responses, so they might read the
whole thing before relaying it (although they seem to be getting a bit
better at this with video over HTTP). So the server would probably end up
generating the whole thing if there was a proxy in the path.

I think it's workable, but I'm not sure it is ideal...

Justin


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-06-12 Thread James Thornton
Gremlin makes sorting results easy...

g.v(1).out.sort{it.lang}.reverse().toList()

For more details, see this thread in the Gremlin Users group
(https://groups.google.com/forum/#!topic/gremlin-users/A8bZHSOxgyA)

- James

http://jamesthornton.com/
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-06-12 Thread Peter Neubauer
Good call James, Pierre and Marko.

Added the example to the plugin tests and documentation.

https://github.com/neo4j/neo4j-gremlin-plugin/commit/e9d02a14c091f8a5880613af55631041806bc6a7

/peter

Sent from my phone.
On Jun 12, 2011 7:56 PM, "James Thornton"  wrote:
> Gremlin makes sorting results easy...
>
> g.v(1).out.sort{it.lang}.reverse().toList()
>
> For more details, see this thread in the Gremlin Users group
> (https://groups.google.com/forum/#!topic/gremlin-users/A8bZHSOxgyA)
>
> - James
>
> http://jamesthornton.com/
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user