Re: [Neo4j] REST results pagination

2011-04-22 Thread Michael DeHaan
On Thu, Apr 21, 2011 at 5:00 PM, Michael Hunger
 wrote:
> Really cool discussion so far,
>
> I would also prefer streaming over paging as with that approach we can give 
> both ends more of the control they need.

Just in case we're not talking about the same kind of streaming --
when I think streaming, I think "streaming uploads", "streaming
downloads", etc.

If the REST format is JSON (or XML, whatever), that's a /document/ so
you can't just say "read the next (up to) 512 bytes" and work on it.
It becomes a more low-level endeavor because if you're in the middle
of reading a record, or don't even have the "end of list" terminator,
what you have isn't parseable yet.  I'm sure a lot of hacking could be
done to make the client figure out whether it has enough without the
closing array element, but it's a lot to ask of a JSON client.

So I'm interested in how, in that proposal, the REST API might stream
results to a client, because for the streaming to be meaningful, you
need to be able to parse what you get back and know where the
boundaries are (or build a buffer until you fill in a datastructure
enough to operate on it).
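One way the boundaries can work without waiting for the closing bracket is to stream concatenated JSON objects and decode them incrementally. A rough stdlib-only sketch -- the chunk source and the function name are invented for illustration, not part of any actual API:

```python
import json

def iter_json_objects(chunks):
    """Incrementally decode a stream of concatenated JSON objects.

    'chunks' yields string fragments split at arbitrary byte
    boundaries, e.g. from a chunked HTTP response.  Each object is
    emitted as soon as a complete one has arrived, without waiting
    for the whole document.
    """
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while buf:
            buf = buf.lstrip()
            try:
                obj, end = decoder.raw_decode(buf)
            except ValueError:
                break  # incomplete object -- wait for more data
            yield obj
            buf = buf[end:]

# Two records split across an awkward chunk boundary.
chunks = ['{"id": 1}{"i', 'd": 2}']
print(list(iter_json_objects(chunks)))  # -> [{'id': 1}, {'id': 2}]
```

This only works if the server emits a sequence of self-contained objects rather than one big array, which is exactly the framing question raised above.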

I don't see that working with JSON/REST so much.   It seems to imply a
message bus.

--Michael
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-21 Thread Michael DeHaan
>
> 3) machine users could care less about paging

My thoughts are that parsing very large documents can perform poorly
and requires the entire document to be slurped into (available) RAM.
This puts a cap on the size of a usable result set and slows
processing, or at least makes you pay an up-front cost, and decreases
the potential for parallelism in other parts of your app.
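Paging addresses exactly that: the client pulls fixed-size pages and processes rows as they arrive, so peak memory tracks the page size rather than the result set. A sketch where `fetch(offset, limit)` is an invented stand-in for a hypothetical paged REST call, not an actual Neo4j endpoint:

```python
def paged(fetch, page_size=100):
    """Iterate a result set one page at a time.

    'fetch(offset, limit)' stands in for a hypothetical paged REST
    call (e.g. GET ...?offset=..&limit=..); the parameter names are
    illustrative only.
    """
    offset = 0
    while True:
        batch = fetch(offset, page_size)
        if not batch:
            return  # an empty page means we've drained the result set
        yield from batch
        offset += len(batch)

# Fake backend: 250 rows served in pages of 100.
rows = list(range(250))
total = sum(1 for _ in paged(lambda off, lim: rows[off:off + lim]))
print(total)  # -> 250
```

Only one page is ever held in memory at a time, at the cost of one round trip per page.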


Re: [Neo4j] REST results pagination

2011-04-20 Thread Michael DeHaan
>
> This is important for the integration of the Neo4j Python Rest Client
> in Django, because I'm currently developing an application with lazy
> and user-defined schemas on top of Django and Neo4j. The listing of
> nodes and relationships is a requirement for me, so the pagination is
> a must in my application. Performing this in the application layer
> instead of Neo4j server side, wastes a lot of time sending information
> via REST.

Well put about the listing of nodes and relationships.   That's the
use case where this comes up.

If I can't trust that my app's code indexed something correctly, or I
need to index old data later, I may need to walk the whole
graph to update the indexes, so large result sets become scary.   I
don't think I can rely on a traverse as parts of the graph might be
disjoint.

New use cases on old data mean we'll have to do that, just like adding
a new index to a SQL db.   Or if I have an index that says "all nodes
of type", that result set could get very large.

In fact, I probably need to access all nodes in order to apply any new
indexes, if I can't just send a reindexing command that says
"for all nodes add to index like so, etc".

If I'm understanding the "server plugin" thing correctly, I've got to
go write some Java classes to do that... which, while I *can* do it,
it would be better if it could be accessed in a language-agnostic way,
with something more or less resembling a database cursor (see
MongoDB's API).

--Michael


Re: [Neo4j] REST results pagination

2011-04-19 Thread Michael DeHaan
On Tue, Apr 19, 2011 at 10:58 AM, Jim Webber  wrote:
>>> I'd like to propose that we put this functionality into the plugin 
>>> (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I 
>>> are currently working on, thoughts?
>
> I'm thinking that, if we do it, it should be handled through content 
> negotiation. That is if you ask for application/atom then you get paged lists 
> of results. I don't necessarily think that's a plugin, it's more likely part 
> of the representation logic in server itself.

This is something I've been wondering about as I may have the need to
feed very large graphs into the system and am wondering how the REST
API will hold up compared to the native interface.

What happens if the result of an index query (or traversal, whatever)
legitimately needs to return 100k results?

Wouldn't that be a bit large for one request?   If anything, it's a
lot of JSON to decode at once.

Feeds make sense for things that are feed-like, but do atom feeds
really make sense for results of very dynamic queries that don't get
subscribed to?
Or, related question, is there a point where the result sets of
operations get so large that things start to break down?   What do
people find this to generally be?

Maybe it's not an issue, but pointers to any problems REST API usage
has with large data sets (and solutions?) would be welcome.

--Michael


Re: [Neo4j] REST API thoughts/questions/feedback

2011-04-19 Thread Michael DeHaan
On Tue, Apr 19, 2011 at 10:48 AM, Jacob Hansson  wrote:
> Hey Michael, big thanks again for taking the time to write down your
> experiences working with the REST API.
>
> See inline response.

Thanks for the follow-up.  That's quite helpful and lets me know I'm
not doing the unique-key implementation in too much of
a non-idiomatic way.

I'll get back with you about doc fixes and should the bindings
materialize further, I'll share some examples.

--Michael


[Neo4j] REST API thoughts/questions/feedback

2011-04-18 Thread Michael DeHaan
Hi all.

I've been working recently on writing a Perl binding for the Neo4j
REST API and thought I'd share some observations, and hopefully can
get a few suggestions on some things.  You can see some of the Perl
work in progress here -- https://github.com/mpdehaan/Elevator (search
for *Neo*.pm).  Basically it's a data layer that allows objects to be
plugged between Sql, NoSql (Riak and Mongo so far) and Neo4j.

The idea is we can build data classes and just call "commit()" and the
like on them, though if the class is backed by Neo4j obviously we'll
be able to add links between them, query the links, and so forth.  I'm
still working on that.
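In Python terms, the shape of that data layer might look roughly like this (all names are invented for illustration; the real code is the Perl linked above):

```python
class MemoryBackend:
    """Stand-in for one storage driver (SQL, Riak, Mongo, Neo4j...)."""
    def __init__(self):
        self.rows = {}

    def save(self, key, doc):
        self.rows[key] = doc

class DataObject:
    """A data class that just calls commit(); the backend is pluggable.

    Purely illustrative -- not the actual Elevator API.
    """
    backend = MemoryBackend()

    def __init__(self, key, **fields):
        self.key, self.fields = key, fields

    def commit(self):
        # the backend decides how/where the record is persisted
        self.backend.save(self.key, self.fields)

user = DataObject("user:1", name="Michael")
user.commit()
```

A graph-backed variant of `DataObject` would additionally expose link/query methods, which is the part still in progress.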

Basically the REST API *is* working for me, but here are my
observations about it:

(1)  I'd like to be able to specify the node ID of a node
before I create it.  I like having a primary key, as I can do with
things like Mongo and Riak.  If I do not have a primary key, I have to
search before I add, "upsert" becomes difficult, as do deletions, and
I have to worry about which copy of a given object is authoritative.
I understand this can't work for everyone but seems like it would be
useful. If that can be done now, I'd love info on how to!

(2)  I'd like a delete_all kind of API to be able to delete all nodes
matching particular criteria versus having to do a search.   For
instance, I may need to rebuild the database, or part of it, and it's
not clear on how to drop it.   Also, is there a way to drop the entire
database via REST?

(3)  I'd like to be able to have the "key" of the node automatically
added to the index without having to make a second call.  Ideally I'd
like to be able to configure the server to auto-index certain fields,
which is something some of the NoSQL/search tools offer. Similarly,
when updating the node, the index should auto update without an
explicit call to the indexer.

(4)  The capability to do an "upsert" would be very useful: update a
node if one exists for the given key, and create it if not.
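Without server support, the library ends up doing the read-then-write itself, which is why it can't be atomic. Roughly, with an in-memory dict standing in for the REST index lookup:

```python
def upsert(index, key, props):
    """Get-or-create done client-side: two steps, hence not atomic.

    'index' is an in-memory dict standing in for an index lookup over
    REST; illustrative only, not an actual client API.
    """
    node = index.get(key)          # step 1: look up by key
    if node is None:
        index[key] = dict(props)   # step 2a: create
    else:
        node.update(props)         # step 2b: update
    return index[key]

idx = {}
upsert(idx, "user:1", {"name": "Michael"})          # creates
upsert(idx, "user:1", {"name": "Michael DeHaan"})   # updates in place
print(idx["user:1"])  # -> {'name': 'Michael DeHaan'}
```

Between steps 1 and 2 another client can create the same key, which is exactly the race a server-side upsert would close.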

(5)   It seems the indexes are the only means of search?   If I need
to search on a field that isn't indexed (say in production, I need to
add a new index), how do I go about adding all the nodes that belong
in that index *to* that index?   It seems I'd need to have been
keeping at least an index of all nodes of a given "type" all along,
so I could at least iterate over those?

I think most of the underlying questions/problems I have are that I'm
trying to make sure graph elements are unique for some criteria, and
this requires that I make more API calls than normal, and have to
implement this in my library and not in the server -- which could be
fragile and certainly isn't atomic.

I've also noticed some minor things, which were slight stumbling blocks:

 * It seems that while the application says it takes JSON, it will
actually accept things as a key/value pair form submission, and may
prefer it that way.  This could be my code though and I need to debug
this further.
 * At one point in the API docs it suggests POSTing a hash as { key,
value }.  In JSON, this should be { key : value }.
 * Some API documentation online refers to the default port being
 and doesn't mention the "/db/..." prefix to the URLs.
 * While I understand "proper" REST is politically correct, I'd be
really happy with simple endpoints that could always take POSTs, or
the ability to always do a POST.   Makes calling code simpler.
 * In the documentation, it was unclear whether "my_nodes" really
needed to be "my_nodes" or was some sort of namespace that I could or
should use.   Is there a way to keep graphs in different namespaces?

In all, it's actually looking pretty good, though knowing what this
object key is in advance, and having a way to avoid duplicate objects
would help tremendously.   I like the idea that the URLs come back
when adding objects, in particular, as it makes the REST API calls
about a particular node more self-documenting.

I'd be happy to try to explain further if any of that didn't make
sense -- particularly I'd be very interested in how to specify a "key"
for an element in advance, so I didn't have to rely on lookups each
time I need the node ID.   Since the lookup can return a list, it
doesn't guarantee I can get back a specific node.

Thanks!

--Michael DeHaan