Re: [Neo4j] REST results pagination
On Thu, Apr 21, 2011 at 5:00 PM, Michael Hunger wrote:
> Really cool discussion so far,
>
> I would also prefer streaming over paging as with that approach we can give
> both ends more of the control they need.

Just in case we're not talking about the same kind of streaming -- when I think streaming, I think "streaming uploads", "streaming downloads", etc.

If the REST format is JSON (or XML, whatever), the response is a single /document/, so you can't just say "read the next (up to) 512 bytes" and work on it. It becomes a more low-level endeavor, because if you're in the middle of reading a record, or haven't yet seen the "end of list" terminator, what you have isn't parseable yet. I'm sure a lot of hacking could be done to let the client figure out whether it has a complete record short of seeing the closing array element, but that's a lot to ask of a JSON client.

So I'm interested in how, under that proposal, the REST API might stream results to a client. For the streaming to be meaningful, you need to be able to parse what you get back and know where the boundaries are (or buffer until you've filled in enough of a data structure to operate on it). I don't see that working with JSON/REST so much; it seems to imply a message bus.

--Michael

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
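For what it's worth, record boundaries are exactly what formats like newline-delimited JSON address: each line is a complete document, so a client can parse records as the bytes arrive. A sketch of the client side, assuming the server emitted such a format (Neo4j doesn't today; `iter_ndjson` is purely illustrative):

```python
import json

def iter_ndjson(chunks):
    """Incrementally parse newline-delimited JSON from an iterable of
    byte chunks, yielding each record as soon as its line is complete,
    so the whole response never has to sit in memory."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    if buf.strip():  # final record may lack a trailing newline
        yield json.loads(buf)

# Simulated chunked download: records split across arbitrary byte boundaries.
chunks = [b'{"id": 1}\n{"id"', b': 2}\n{"id": 3}']
print(list(iter_ndjson(chunks)))  # -> [{'id': 1}, {'id': 2}, {'id': 3}]
```

The point is that framing lives in the transport format, not in client-side hacks: the parser never has to guess whether it has "enough" of the document yet.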
Re: [Neo4j] REST results pagination
> > 3) machine users could care less about paging

My thoughts are that parsing very large documents can perform poorly and requires the entire document to be slurped into (available) RAM. This puts a cap on the size of a usable result set, slows processing (or at least makes you pay an up-front cost), and decreases the potential for parallelism in other parts of your app.
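To make the slurping point concrete: a JSON array response is one document, so a half-downloaded body is entirely unparseable, not just missing its tail. A minimal Python demonstration:

```python
import json

# A JSON array is a single document: a client cannot act on any element
# until the closing bracket arrives.
doc = json.dumps([{"id": i} for i in range(1000)])

records = json.loads(doc)  # fine, but only once fully buffered
assert len(records) == 1000

try:
    json.loads(doc[: len(doc) // 2])  # a half-downloaded response
except ValueError as e:
    print("partial document is useless:", type(e).__name__)
```

So with the one-big-array format, memory and latency both scale with the full result set before the client can touch record number one.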
Re: [Neo4j] REST results pagination
> This is important for the integration of the Neo4j Python Rest Client
> in Django, because I'm currently developing an application with lazy
> and user-defined schemas on top of Django and Neo4j. The listing of
> nodes and relationships is a requirement for me, so the pagination is
> a must in my application. Performing this in the application layer
> instead of Neo4j server side wastes a lot of time sending information
> via REST.

Well put about the listing of nodes and relationships; that's the use case where this comes up. If I can't trust that my app's code indexed something correctly, or I need to index old data later, I may need to walk the whole graph to update the indexes, so large result sets become scary. I don't think I can rely on a traversal, as parts of the graph might be disjoint. New use cases on old data mean we'll have to do that, just like adding a new index to a SQL db.

Or if I have an index that says "all nodes of type", that result set could get very large. In fact, I probably need to access all nodes in order to apply any new indexes, if I can't just send a reindexing command that says "for all nodes, add to the index like so, etc".

If I'm understanding the "server plugin" thing correctly, I've got to go write some Java classes to do that... which, while I *can* do, it would be better if it could be accessed in a language-agnostic way, with something more or less resembling a database cursor (see MongoDB's API).

--Michael
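On the cursor idea: if the server exposed skip/limit style parameters on index queries, a binding could wrap them in a lazy generator so callers never hold the full result set at once. A sketch under that assumption (`fetch_page` and its parameters are hypothetical, stubbed here rather than doing real HTTP):

```python
def iter_all(fetch_page, page_size=100):
    """Lazily walk an arbitrarily large result set one page at a time,
    so the client holds at most `page_size` records in memory."""
    skip = 0
    while True:
        page = fetch_page(skip=skip, limit=page_size)
        if not page:
            return
        for record in page:
            yield record
        skip += len(page)

# Stand-in for an HTTP call to a paged index-query endpoint.
DATA = [{"id": i} for i in range(7)]
def fake_fetch(skip, limit):
    return DATA[skip:skip + limit]

print(sum(1 for _ in iter_all(fake_fetch, page_size=3)))  # -> 7
```

That's roughly the shape of MongoDB's cursor API from the client's point of view: the caller just iterates, and the paging is an implementation detail.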
Re: [Neo4j] REST results pagination
On Tue, Apr 19, 2011 at 10:58 AM, Jim Webber wrote:
>>> I'd like to propose that we put this functionality into the plugin
>>> (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I
>>> are currently working on, thoughts?
>
> I'm thinking that, if we do it, it should be handled through content
> negotiation. That is, if you ask for application/atom then you get paged
> lists of results. I don't necessarily think that's a plugin, it's more
> likely part of the representation logic in the server itself.

This is something I've been wondering about, as I may have the need to feed very large graphs into the system, and I'm wondering how the REST API will hold up compared to the native interface.

What happens if the result of an index query (or traversal, whatever) legitimately needs to return 100k results? Wouldn't that be a bit large for one request? If anything, it's a lot of JSON to decode at once. Feeds make sense for things that are feed-like, but do Atom feeds really make sense for the results of very dynamic queries that don't get subscribed to?

Or, a related question: is there a point where the result sets of operations get so large that things start to break down? What do people find this to generally be? Maybe it's not an issue, but pointers to any problems REST API usage has with large data sets (and solutions?) would be welcome.

--Michael
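If content negotiation did hand back paged lists, the client-side loop is essentially "follow the next link until there isn't one". A sketch with a stubbed fetcher (the media type and the page/link shape are assumptions, not actual server behavior):

```python
def collect_paged(fetch, first_url):
    """Follow a chain of paged responses, where each page carries its
    entries plus an optional link to the next page."""
    results, url = [], first_url
    while url:
        page = fetch(url, headers={"Accept": "application/atom+xml"})
        results.extend(page["entries"])
        url = page.get("next")  # absent on the last page
    return results

# Stubbed server: three pages linked together.
PAGES = {
    "/q?page=1": {"entries": [1, 2], "next": "/q?page=2"},
    "/q?page=2": {"entries": [3, 4], "next": "/q?page=3"},
    "/q?page=3": {"entries": [5]},
}
def fake_fetch(url, headers):
    return PAGES[url]

print(collect_paged(fake_fetch, "/q?page=1"))  # -> [1, 2, 3, 4, 5]
```

Collecting everything defeats the memory benefit, of course; a real client would process each page as it arrives, but the link-following skeleton is the same.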
Re: [Neo4j] REST API thoughts/questions/feedback
On Tue, Apr 19, 2011 at 10:48 AM, Jacob Hansson wrote:
> Hey Michael, big thanks again for taking the time to write down your
> experiences working with the REST API.
>
> See inline response.

Thanks for the follow-up. That's quite helpful and lets me know I'm not doing the unique-key implementation in too non-idiomatic a way. I'll get back with you about doc fixes, and should the bindings materialize further, I'll share some examples.

--Michael
[Neo4j] REST API thoughts/questions/feedback
Hi all. I've been working recently on writing a Perl binding for the Neo4j REST API and thought I'd share some observations, and hopefully I can get a few suggestions on some things.

You can see some of the Perl work in progress here -- https://github.com/mpdehaan/Elevator (search for *Neo*.pm). Basically it's a data layer that allows objects to be plugged between Sql, NoSql (Riak and Mongo so far) and Neo4j. The idea is that we can build data classes and just call "commit()" and the like on them, though if the class is backed by Neo4j we'll obviously also be able to add links between them, query the links, and so forth. I'm still working on that.

Basically the REST API *is* working for me, but here are my observations about it:

(1) I'd like to be able to specify the node ID of a node before I create it. I like having a primary key, as I do with things like Mongo and Riak. If I do not have a primary key, I have to search before I add, "upsert" becomes difficult, as do deletions, and I have to worry about which copy of a given object is authoritative. I understand this can't work for everyone, but it seems like it would be useful. If that can be done now, I'd love info on how!

(2) I'd like a delete_all kind of API, to be able to delete all nodes matching a particular criterion rather than having to do a search first. For instance, I may need to rebuild the database, or part of it, and it's not clear how to drop it. Also, is there a way to drop the entire database via REST?

(3) I'd like the "key" of the node to be automatically added to the index without having to make a second call. Ideally I'd like to be able to configure the server to auto-index certain fields, which is something some of the NoSQL/search tools offer. Similarly, when updating the node, the index should auto-update without an explicit call to the indexer.

(4) The capability to do an "upsert" would be very useful: update the node if one exists for the given key; if not, create it.
(5) It seems the indexes are the only means of search? If I need to search on a field that isn't indexed (say in production, I need to add a new index), how do I go about adding all the existing nodes *to* that index? It seems I'd need to have been keeping at least an index of all nodes of a given "type" all along, so I could at least iterate over those?

I think most of the underlying questions/problems I have come from trying to make sure graph elements are unique for some criteria, which requires that I make more API calls than normal and implement the logic in my library rather than in the server -- which could be fragile and certainly isn't atomic.

I've also noticed some minor things, which were slight stumbling blocks:

* It seems that while the application says it takes JSON, it will actually accept things as a key/value pair form submission, and may prefer it that way. This could be my code, though, and I need to debug it further.

* At one point the API docs suggest POSTing a hash as { key, value }. In JSON, this should be { key : value }.

* Some API documentation online refers to the default port being and didn't mention the "/db/..." prefix to the URLs.

* While I understand "proper" REST is politically correct, I'd be really happy with simple endpoints that can always take POSTs, or the ability to always do a POST. It makes calling code simpler.

* In the documentation, it was unclear whether "my_nodes" really needed to be "my_nodes" or was some sort of namespace that I could or should use. Is there a way to keep graphs in different namespaces?

In all, it's actually looking pretty good, though knowing the object key in advance, and having a way to avoid duplicate objects, would help tremendously. I like that the URLs come back when adding objects, in particular, as it helps make the REST calls about a particular node more self-documenting.
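On the uniqueness point: lacking a server-side upsert, the best a binding can do today is check-then-act against an index, which is exactly the fragile, non-atomic dance described above. A sketch of that client-side logic (plain dicts stand in for the REST calls to the index and node endpoints; everything here is illustrative):

```python
import itertools

_ids = itertools.count(1)  # stands in for server-assigned node IDs

def upsert(index, key, value, properties):
    """Client-side get-or-create: look (key, value) up in the index and
    update the hit, else create and index a new node. The check and the
    create are separate calls over the wire, so this is NOT atomic --
    two concurrent writers can still both create "the" node."""
    hits = index.get((key, value))
    if hits:
        node = hits[0]
        node["properties"].update(properties)  # update the existing node
        return node, False
    node = {"id": next(_ids), "properties": dict(properties)}
    index.setdefault((key, value), []).append(node)  # index the new node
    return node, True

index = {}
node, created = upsert(index, "name", "alice", {"age": 30})
again, created2 = upsert(index, "name", "alice", {"age": 31})
print(created, created2, node is again)  # -> True False True
```

Done server-side, the lookup-and-create could be a single atomic operation; done client-side, it's three round trips and a race window, which is the heart of the complaint.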
I'd be happy to explain further if any of that didn't make sense. In particular, I'd be very interested in how to specify a "key" for an element in advance, so I don't have to rely on lookups each time I need the node ID. Since a lookup can return a list, it doesn't guarantee I get back a specific node.

Thanks!

--Michael DeHaan