Re: [Neo4j] Question from Webinar - traversing a path with nodes of different types
Hi Vipul,

Out of curiosity, what does "process" in this context mean? As Rick alludes to, you'd have some component performing the simulation using the domain objects and possibly a graph traversal. An example of an algorithm for this would be to walk the graph from 1, and whenever you find a branch, you split the walk. When you finish walking a branch (a point where more than one branch joins) you use some kind of synchronization to join the walks. Does this make sense?

David

On Wed, Apr 20, 2011 at 11:16 PM, Vipul Gupta vipulgupta...@gmail.com wrote:

Hi David,

Inputs are 1 and 6 + the graph is acyclic.

domain.Client@1 - domain.Router@2 - domain.Router@3 - domain.Router@5 - domain.Server@6
domain.Client@1 - domain.Router@7 - domain.Router@8 - domain.Router@5

I want a way to start from 1, process the 2 path till it reaches 5 (say in a thread), process the 7 path till it reaches 5 (in another thread), then process 5 and eventually 6. The above step of processing an intermediate path and waiting on the blocking point can happen over and over again in a more complex graph (that is, there could be a number of loops in between even), and the traversal stops only when we reach 6. I hope this makes it a bit clearer. I was working out something for this, but it is turning out to be too complex a solution for this sort of traversal of a graph, so I am hoping you can suggest something.

Best Regards,
Vipul

On Thu, Apr 21, 2011 at 11:36 AM, David Montag david.mon...@neotechnology.com wrote:

Hi Vipul,

Zooming out a little bit, what are the inputs to your algorithm, and what do you want it to do? For example, given 1 and 6, do you want to find any points in the chain between them that are join points of two (or more) subchains (5 in this case)?
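David's split-the-walk / join-at-the-merge idea can be sketched outside Neo4j. The adjacency map, node numbers, and the ForkJoinWalk class below are hypothetical stand-ins for Vipul's 1-2-3-5 / 1-7-8-5 / 5-6 graph, not actual Neo4j code; each node is "processed" as soon as all of its incoming branches have finished, so 5 blocks on both 3 and 8:

```java
import java.util.*;
import java.util.concurrent.*;

public class ForkJoinWalk {
    // Hypothetical in-memory stand-in for the graph: 1-2-3-5, 1-7-8-5, 5-6.
    static final Map<Integer, List<Integer>> PREDECESSORS = Map.of(
            1, List.of(),
            2, List.of(1), 3, List.of(2),
            7, List.of(1), 8, List.of(7),
            5, List.of(3, 8),   // join point: blocks until both branches finish
            6, List.of(5));

    public static List<Integer> walk() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        Map<Integer, CompletableFuture<Void>> done = new ConcurrentHashMap<>();
        // Build one future per node; a node runs only after all its predecessors.
        // Nodes are listed so that predecessors are created before dependents.
        for (int node : List.of(1, 2, 3, 7, 8, 5, 6)) {
            CompletableFuture<?>[] deps = PREDECESSORS.get(node).stream()
                    .map(done::get).toArray(CompletableFuture[]::new);
            int n = node;
            done.put(node, CompletableFuture.allOf(deps)
                    .thenRunAsync(() -> order.add(n), pool));  // "process" the node
        }
        done.get(6).join();  // the traversal stops only when we reach 6
        pool.shutdown();
        return order;
    }
}
```

The same shape works for arbitrarily deep fork/join patterns: each join point is just a node with more than one predecessor future.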
David

On Wed, Apr 20, 2011 at 10:56 PM, Vipul Gupta vipulgupta...@gmail.com wrote:

my mistake - I meant 5 depends on both 3 and 8 and acts as a blocking point till 3 and 8 finish

On Thu, Apr 21, 2011 at 11:19 AM, Vipul Gupta vipulgupta...@gmail.com wrote:

David/Michael,

Let me modify the example a bit. What if my graph structure is like this:

domain.Client@1 - domain.Router@2 - domain.Router@3 - domain.Router@5 - domain.Server@6
domain.Client@1 - domain.Router@7 - domain.Router@8 - domain.Router@5

Imagine a manufacturing line. 6 depends on both 3 and 8 and acts as a blocking point till 3 and 8 finish. Is there a way to get a cleaner traversal for this kind of relationship? I want to get a complete intermediate traversal from Client to Server. Thanks a lot for helping out on this.

Best Regards,
Vipul

On Thu, Apr 21, 2011 at 12:09 AM, David Montag david.mon...@neotechnology.com wrote:

Hi Vipul,

Thanks for listening! It's a very good question, and the short answer is: yes! I'm cc'ing our mailing list so that everyone can take part in the answer. Here's the long answer, illustrated by an example:

Let's assume you're modeling a network. You'll have some domain classes that are all networked entities with peers:

    @NodeEntity
    public class NetworkEntity {
        @RelatedTo(type = "PEER", direction = Direction.BOTH, elementClass = NetworkEntity.class)
        private Set<NetworkEntity> peers;

        public void addPeer(NetworkEntity peer) {
            peers.add(peer);
        }
    }

    public class Server extends NetworkEntity {}
    public class Router extends NetworkEntity {}
    public class Client extends NetworkEntity {}

Then we can build a small network:

    Client c = new Client().persist();
    Router r1 = new Router().persist();
    Router r21 = new Router().persist();
    Router r22 = new Router().persist();
    Router r3 = new Router().persist();
    Server s = new Server().persist();

    c.addPeer(r1);
    r1.addPeer(r21);
    r1.addPeer(r22);
    r21.addPeer(r3);
    r22.addPeer(r3);
    r3.addPeer(s);

    c.persist();

Note that after linking the entities, I only call persist() on the client.
You can read more about this in the reference documentation, but essentially it will cascade in the direction of the relationships created, and will in this case cascade all the way to the server entity. You can now query this:

    Iterable<EntityPath<Client, Server>> paths = c.findAllPathsByTraversal(Traversal.description());

The above code will get you an EntityPath per node visited during the traversal from c. The example does not, however, use a very interesting traversal description, but you can still print the results:

    for (EntityPath<Client, Server> path : paths) {
        StringBuilder sb = new StringBuilder();
        Iterator<NetworkEntity> iter = path.<NetworkEntity>nodeEntities().iterator();
        while (iter.hasNext()) {
            sb.append(iter.next());
            if (iter.hasNext()) sb.append(" - ");
        }
        System.out.println(sb);
    }

This will print each path, with all entities in the path. This is what it looks like:

    domain.Client@1
    domain.Client@1 - domain.Router@2
    domain.Client@1 - domain.Router@2 - domain.Router@3
    domain.Client@1 -
Re: [Neo4j] Basic Node storage/retrieval related question?
Why are you using Object a and not int a or Integer a?

SDG uses the field type, and not the current value type, to provide conversions for non-primitive types. As Object is such a type, it is converted to a String. We will look into accommodating Object values in the future. Until then, please use the concrete type or a conversion service.

M

Sent from my iBrick4

On 21.04.2011 at 19:21, G vlin...@gmail.com wrote:

I have a POJO with a field a, which I initialize like this: Object a = 10; I store the POJO containing this field using Neo4j. When I load this POJO, I have a getter method to get the object: Object getA() { return a; } *What should be the class type of a?* I am of the opinion it should be java.lang.Integer, but it is coming out to be java.lang.String. I am assuming this is because of node.getProperty(...). Is there a way I can get an Integer object only? Also, what types can be stored?

thanks,
Karan

___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
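Michael's point, that the conversion is keyed off the declared field type rather than the value's runtime type, can be mimicked in plain Java with reflection. The Pojo class, the store method, and the String fallback below are illustrative stand-ins, not SDG internals:

```java
import java.lang.reflect.Field;

public class FieldTypeConversion {
    // Toy POJO mirroring Karan's situation: same runtime value, different
    // declared field types.
    static class Pojo {
        Object a = 10;   // declared Object  -> falls back to String
        Integer b = 10;  // declared Integer -> kept as Integer
    }

    // What a field-type-driven converter would store for a given field:
    // it inspects Field.getType() (the declared type), never value.getClass().
    static Object store(Pojo p, String fieldName) throws Exception {
        Field f = Pojo.class.getDeclaredField(fieldName);
        Object value = f.get(p);
        if (f.getType() == Object.class) {
            return String.valueOf(value);  // no concrete type to convert to
        }
        return value;
    }
}
```

This is why the loaded value comes back as java.lang.String when the field is declared Object, even though an Integer was assigned.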
[Neo4j] Look up a node by the value of one of its properties!!!
Hello, I am new to this list and I have a question about Neo4j (I am just starting with Neo4j). I installed the Neo4j server and a Jersey client (REST API) with which I communicate with the Neo4j database. My problem is: how do I look up a node (get the location of this node: URI) knowing the value of one of its properties, when I do not know the node ID? (For example, I have a node {name:thomas,age:20...} and I want to get the location (URI) of this node by sending the name thomas to the database server.) Please, could someone give me their feedback? I will be glad to get any support on graph traversals using the REST API.

Best regards,

--
Kobla GBENYO, S/C M. Jean MATHE, 28 Rue de la Normandie, 79 000 Niort. (+33) 6 26 07 93 41 / 6 62 26 64 47 http://www.gbenyo-expo.fr
Re: [Neo4j] Look up a node by the value of one of its properties!!!
You can use the indexing part of the REST API. That means that after creation you have to add the fields you are interested in to an index. Then you can retrieve the node(s) later. See http://components.neo4j.org/neo4j-server/snapshot/rest.html#Add_to_index

HTH
Michael

Sent from my iBrick4

On 22.04.2011 at 09:39, Kobla Gbenyo ko...@riastudio.fr wrote: [quoted message above]
Re: [Neo4j] Basic Node storage/retrieval related question?
I was storing this as an Object because this field was acting as a parameter to different functions I was calling, and the functions had different parameter types. Would generics help here? So that for my POJO I can have the following instead:

    T a;
    T getA() { return a; }

I will just give that a quick try. Do you think that would solve this issue for me, or do you have an alternate idea?

-Karan

On Fri, Apr 22, 2011 at 1:08 PM, Michael Hunger michael.hun...@neotechnology.com wrote: [quoted message above]
Re: [Neo4j] Question about REST interface concurrency
Hi Stephen,

I think the network IO you've measured is consistent with the rest of the behaviour you've described. What I'm thinking is that you're simply reaching the limits of create transaction - create a node - complete transaction - flush to filesystem (that is, you're basically testing disk write speed/seek time/etc). Can you check how busy your IO to disk is? I expect it'll be relatively high.

Jim
Re: [Neo4j] about two database
Hi Jose,

1- i have 2 database (graph), I need to get information from one database to another without having to take the target database instance of another database.

One reasonable way of doing this is to use the HA configuration. The HA protocol will keep two (or many) instances of the database in sync.

2- i need know how to open a database(graph) if it already exists. thanks beforehand

You could try to open an EmbeddedReadOnlyGraphDatabase. If the database store exists (and is a valid database) then no exception will be thrown. Otherwise you'll get a TransactionFailureException.

Jim
Re: [Neo4j] REST results pagination
Good catch, forgot to add the in-graph representation of the results to my mail, thanks for adding that part. Temporary (transient) nodes and relationships would really rock here, with the advantage that with HA you have them distributed to all cluster nodes. Certainly Craig has to add some interesting things to this, as those probably resemble his in-graph indexes / R-Trees.

I certainly make use of this model, much more so for my statistical analysis than for graph indexes (but I'm planning to merge indexes and statistics). However, in my case the structures are currently very domain specific. But I think the idea is sound and should be generalizable. What I do is have a concept of a 'dataset' on which queries can be performed. The dataset is usually the root of a large sub-graph. The query parser (domain specific) creates a hashcode of the query, checks if the dataset node already has a resultset (as a connected sub-graph with its own root node containing the previous query hashcode), and if so returns that (traverses it); otherwise it performs the complete dataset traversal, creating the resultset as a new subgraph, and then returns it. This works well specifically for statistical queries, where the resultset is much smaller than the dataset, so adding new subgraphs has a small impact on the database size, and the resultset is much faster to return, so this is a performance enhancement for multiple requests from the client. Also, I keep the resultset permanently, not temporarily. Very few operations modify the dataset, and if they do, we delete all resultsets, and they get re-created the next time. My work on merging the indexes with the statistics is also planned to only recreate 'dirty' subsets of the result-set, so modifying the dataset has minimal impact on the query performance. After reading Rick's previous email I started thinking of approaches to generalizing this, but I think your 'transient' nodes perhaps encompass everything I thought about.
Here is an idea:

- Have new nodes/relations/properties tables on disk, like a second graph database, but different in the sense that it has one-way relations into the main database, which cannot be seen by the main graph and so are by definition not part of the graph. These can have transience and expiry characteristics. Then we can build the resultset graphs as transient graphs in the transient database, with 'drill-down' capabilities to the original graph (something I find I always need for statistical queries, and something a graph is simply much better at than a relational database).
- Use some kind of hashcode in the traversal definition or query to identify existing, cached, transient graphs in the second database, so you can rely on those for repeated queries, or pagination or streaming, etc.

As traversers are lazy, a count operation is not so easily possible; you could run the traversal and discard the results. But then the client could also just pull those results until it reaches its internal thresholds and then decide to use more filtering, or stop the pulling and ask the user for more filtering (you can always retrieve n+1 and show the user that there are more than n results available).

Yes. Count needs to perform the traversal. So the only way to not have to traverse twice is to keep a cache. If we make the cache a transient sub-graph (possibly in the second database I described above), then we have the interesting behaviour that count() takes a while, but subsequent queries, pagination or streaming, are fast.

Please don't forget that a count() query in an RDBMS can be as ridiculously expensive as the original query (especially if just the column selection was replaced with count, and sorting, grouping etc. was still left in place together with lots of joins).
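Craig's hash-the-query-and-cache-the-resultset flow described above could be sketched like this, purely in memory. QueryResultCache, the string results, and the counter are illustrative stand-ins for his domain-specific subgraph cache, not his actual code:

```java
import java.util.*;
import java.util.function.Supplier;

public class QueryResultCache {
    // Stand-in for result "subgraphs", keyed by a hash of the query text.
    private final Map<Integer, List<String>> cachedResultSets = new HashMap<>();
    int traversalsRun = 0;  // counts how often the expensive traversal executed

    // Return the cached result set for this query if one exists; otherwise run
    // the (expensive) traversal once and keep its result for later requests.
    public List<String> query(String queryText, Supplier<List<String>> traversal) {
        int key = queryText.hashCode();
        return cachedResultSets.computeIfAbsent(key, k -> {
            traversalsRun++;
            return traversal.get();
        });
    }

    // Very few operations modify the dataset; when one does, drop all result
    // sets and let them be re-created on the next request.
    public void datasetModified() {
        cachedResultSets.clear();
    }
}
```

The first query pays the full traversal cost; repeated queries, pagination, or streaming over the same result set are then cheap, which is exactly the count()-then-page behaviour discussed above.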
Good to hear they have the same problem as us :-) (or even more problems)

Sorting on your own instead of letting the db do that mostly harms the performance, as it requires you to build up all the data in memory, sort it, and then use it. Instead, have the db do that more efficiently, stream the data, and use it directly from the stream.

Client side sorting makes sense if you know the domain well enough to know, for example, that you will receive a small enough result set to 'fit' in the client, and want to give the user multiple interactive sort options without hitting the database again. But I agree that in general it makes sense to get the database to do the sort.

Cheers, Craig
Re: [Neo4j] REST results pagination
Client side sorting makes sense if you know the domain well enough to know, for example, that you will receive a small enough result set to 'fit' in the client, and want to give the user multiple interactive sort options without hitting the database again. But I agree that in general it makes sense to get the database to do the sort.

I'll concede this point. In general it should be better to do the sorts on the database server, which is typically by design a hefty backend system that is optimized for that sort of processing.

In my experience with regular SQL databases, unfortunately, they typically only scale vertically, and are usually running on expensive enterprise-grade hardware. Most of the ones I've worked with either run on minimally sized hardware or have quickly outgrown their hardware. So they are always either: 1) currently suffering from a capacity problem, 2) just recovering from a capacity problem, or 3) heading rapidly towards a new capacity problem.

The next problem I run into is a political, rather than a technical, one. The database administration team is often a different group of people from the appserver/front end development team. The guys writing the queries are usually closer to the appserver than the database. In other words, it is easier for them to manage a problem in the appserver than it is to manage a problem in the database. So, instead of having a deep well of data processing power to draw on, and then using a wide layer of thin commodity hardware presentation layer servers, we end up transferring data processing power out of the data server and into the presentation layer.

As we evolve into building data processing systems which can scale horizontally on commodity hardware, the perpetual capacity problems the legacy vertical databases suffer from may wane, finally freeing the other layers from having to pick up some of the slack.
--
Rick Otten rot...@windfish.net O=='=+
Re: [Neo4j] REST results pagination
On Thu, Apr 21, 2011 at 11:18 PM, Michael Hunger michael.hun...@neotechnology.com wrote:

Rick, great thoughts.

Good catch, forgot to add the in-graph representation of the results to my mail, thanks for adding that part. Temporary (transient) nodes and relationships would really rock here, with the advantage that with HA you have them distributed to all cluster nodes. Certainly Craig has to add some interesting things to this, as those probably resemble his in-graph indexes / R-Trees.

As traversers are lazy, a count operation is not so easily possible; you could run the traversal and discard the results. But then the client could also just pull those results until it reaches its internal thresholds and then decide to use more filtering, or stop the pulling and ask the user for more filtering (you can always retrieve n+1 and show the user that there are more than n results available). The index result size() method only returns an estimate of the result size (which might not contain currently changed index entries).

Please don't forget that a count() query in an RDBMS can be as ridiculously expensive as the original query (especially if just the column selection was replaced with count, and sorting, grouping etc. was still left in place together with lots of joins).

Sorting on your own instead of letting the db do that mostly harms the performance, as it requires you to build up all the data in memory, sort it, and then use it. Instead, have the db do that more efficiently, stream the data, and use it directly from the stream.

throw new SlapOnTheFingersException("sometimes the application developer can do a better job since she has better knowledge of the data; the database only has generic knowledge");

Since Jake had already mentioned (in this very thread) that he expected one of those, I thought I might as well throw one in there.
I agree with the analysis of count(); as the name (count) implies, it will have to run the entire query in order to count the number of resulting items.

About sorting I'm torn. The perception of sorting in the database being slow that Rick points to is one that I've seen a lot. When you hand the responsibility of sorting to the database, you hide the fact that sorting is an expensive operation; it does require reading in all data in order to sort it. People often expect databases to be smarter than that, since they sometimes are, but that is pretty much only when reading straight from an index and not doing much more. A generic sort of data can never be better than O(log(n!)) [O(log(n!)) is almost equal to, and commonly rounded to, the easier to compute function O(n log(n))].

If you put the responsibility of sorting in the hands of the application, you can sometimes utilize knowledge about the data to do a more efficient sorting than the database could have done. Most often by simply doing an application-level filtering of the data before sorting it, based on some filtering that could not be transferred to the database query. This does however make the work of the application developer slightly more tedious, which is why I think it would be sensible to have support for sorting on the database level, and hope that users will be sensible about using it, and not assume magic from it.

Something I find very interesting is the concept of semi-sorted data. Semi-sorted data is often good enough, easier to achieve, and quite easy to then sort completely if that is required. Examples of semi-sorted data could be data in an order that satisfies the heap property. Or, for spatial queries, returning the closest hits first, but not necessarily in perfect order, say returning the hits within a mile's radius first, before the ones in a radius between 1-10 miles, and so on, without requiring the hits in each 'segment' to be perfectly ordered by distance.
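Tobias's radius-segment example of semi-sorted results might look like the sketch below. The band boundaries and the SemiSorted class are made up for illustration; the point is that grouping into bands is a single O(n) pass with no comparison sort, yet every hit in a nearer band precedes every hit in a farther one:

```java
import java.util.*;

public class SemiSorted {
    // Hypothetical segment boundaries in miles: <1, 1-10, 10-100, >=100.
    static final double[] BANDS = {1.0, 10.0, 100.0};

    static int band(double distance) {
        for (int i = 0; i < BANDS.length; i++) {
            if (distance < BANDS[i]) return i;
        }
        return BANDS.length;
    }

    // Return hits grouped by distance band: every hit in band i precedes every
    // hit in band i+1, but hits inside one band keep their arbitrary order.
    static List<Double> semiSort(List<Double> distances) {
        List<List<Double>> buckets = new ArrayList<>();
        for (int i = 0; i <= BANDS.length; i++) buckets.add(new ArrayList<>());
        for (double d : distances) buckets.get(band(d)).add(d);  // O(n) pass
        List<Double> out = new ArrayList<>();
        for (List<Double> b : buckets) out.addAll(b);
        return out;
    }
}
```

If the client really needs total order, it can then sort each (much smaller) band locally, which is the "quite easy to then sort completely" property mentioned above.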
Breadth-first order is another example of semi-sorted data that could be used when traversing data as you've outlined with paging nodes, or similarly grouped by parent node-order.

I must say that I really enjoy following this discussion. I really like the idea of streaming, since I think that can be implemented more easily than paging, while satisfying many of the desired use cases. But I still want to hear more arguments for and against both alternatives. And as has already been pointed out, they aren't mutually exclusive. I'll keep listening in on the conversation, but I don't have much more to add at this point.

I have one desire for the structure of the conversation though. When you quote what someone else has said before you, could you please include who that person was? It makes going back and reading the full context easier.

Cheers,
--
Tobias Ivarsson tobias.ivars...@neotechnology.com Hacker, Neo Technology www.neotechnology.com Cellphone: +46 706 534857
Re: [Neo4j] Strange performance difference on different machines
Hi Michael,

On 2011-04-21, at 4:38 PM, Michael Hunger wrote:

Bob, I don't know if you have already answered these questions. Which JDK (also version) are you using for that, and what are the JVM memory settings?

Sun's Java 1.6, patch level 24 (I think; I'll have to confirm on Monday). Memory settings... from the top of my head... there's a GC setting that I can't recall, and the min and max heap is set to 12GB (which it never comes close to).

Do you have a profiler handy that you could throw at your benchmark? (E.g. YourKit has a 30-day trial; other profilers surely do too.)

Monday for that too.

Do you have the source code of your tests at hand? So we could run exactly the same code on our own Linux systems for cross checking?

If it's useful I can probably extract something, starting Monday :-)

What Linux distribution is it, and 64 or 32 bit? Do you also have a disk formatted with ext3 to cross check? (Perhaps just a loopback device.)

Ubuntu 10.10, 64-bit. The machine has been set up as ext4; I'll see what I can scrounge up for ext3.

How much memory does the linux box have available?

16GB

Thanks so much.

Thank you.

Cheers,
Bob

Michael

On 21.04.2011 at 21:53, Bob Hutchison wrote:

On 2011-04-20, at 7:30 AM, Tobias Ivarsson wrote:

Sorry, I got a bit distracted when writing this. I should have added that I then want you to send the results of running that benchmark to me so that I can further analyze what the cause of these slow writes might be. Thank you, Tobias

That's what I figured you meant.
Sorry for the delay, here they are:

On a HP z400, quad Xeon W3550 @ 3.07GHz, ext4 filesystem -

dd if=/dev/urandom of=store bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 111.175 s, 9.4 MB/s

dd if=store of=/dev/null bs=100M
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 0.281153 s, 3.7 GB/s

dd if=store of=/dev/null bs=100M
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 0.244339 s, 4.3 GB/s

dd if=store of=/dev/null bs=100M
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 0.242583 s, 4.3 GB/s

./run ../store logfile 33 100 500 100
tx_count[100] records[31397] fdatasyncs[100] read[0.9881029 MB] wrote[1.9762058 MB]
Time was: 5.012
19.952114 tx/s, 6264.365 records/s, 19.952114 fdatasyncs/s, 201.87897 kB/s on reads, 403.75793 kB/s on writes

./run ../store logfile 33 1000 5000 10
tx_count[10] records[30997] fdatasyncs[10] read[0.9755144 MB] wrote[1.9510288 MB]
Time was: 0.604
16.556292 tx/s, 51319.54 records/s, 16.556292 fdatasyncs/s, 1653.8523 kB/s on reads, 3307.7046 kB/s on writes

./run ../store logfile 33 1000 5000 100
tx_count[100] records[298245] fdatasyncs[100] read[9.386144 MB] wrote[18.772287 MB]
Time was: 199.116
0.5022198 tx/s, 1497.8455 records/s, 0.5022198 fdatasyncs/s, 48.270412 kB/s on reads, 96.540825 kB/s on writes

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free    buff   cache    si   so    bi    bo    in     cs us sy id wa
 1  2      0 8541712 336716 3670940     0    0     1     7    12     20  4  1 95  0
 0  2      0 8525712 336716 3670948     0    0     0   979  1653   3186  4  1 60 35
 1  2      0 8525220 336716 3671204     0    0     0  1244  1671   3150  4  1 71 24
 0  2      0 8524724 336716 3671332     0    0     0   709  1517   3302  4  1 65 30
 0  2      0 8524476 336716 3671460     0    0     0  1033  1680  69342  5  7 59 29
 0  2      0 8539168 336716 3671588     0    0     0  1375  1599   3272  3  1 70 25
 1  2      0 8538860 336716 3671716     0    0     0  1157  1594   3097  3  1 72 24
 0  1      0 8541340 336716 3671844     0    0     0  1151  1512   3182  3  2 70 25
 0  1      0 8524812 336716 3671972     0    0     0  1597  1641   3391  4  2 72 22

Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so
Re: [Neo4j] about two database
Hi Jose,

thank you very much for your answer, but I do not know where I can find some examples of the HA setup.

The main wiki page is here: http://wiki.neo4j.org/content/High_Availability_Cluster

And the (milestone) docs are here: http://docs.neo4j.org/chunked/milestone/server-ha.html

Jim
Re: [Neo4j] REST results pagination
Hi Michael,

Just in case we're not talking about the same kind of streaming -- when I think streaming, I think streaming uploads, streaming downloads, etc.

I'm thinking chunked transfers. That is, the server starts sending a response and then eventually terminates it when the whole response has been sent to the client. Although it seems a bit rude, the client could simply opt to close the connection when it's read enough, providing what it has read makes sense. Sometimes document fragments can make sense:

<results>
  <node id="1234"><property name="planet" value="Earth"/></node>
  <node id="1235"><property name="planet" value="Mars"/></node>
  <!-- client gets bored here and kills the connection, missing out on what would have followed -->
  <node id="1236"><property name="planet" value="Jupiter"/></node>
  <node id="1237"><property name="planet" value="Saturn"/></node>
</results>

In this case we certainly don't have well-formed XML, but some streaming API (e.g. StAX) might already have been able to create some local objects on the client side as the Earth and Mars nodes came in. I don't think this is elegant at all, but it might be practical. I've asked Mark Nottingham for his view on this since he's pretty sensible about Web things.

Jim
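A client-side sketch of Jim's "read until you've had enough, then hang up" behaviour could look like this. EarlyStopClient and the one-record-per-line framing are illustrative assumptions, not the actual REST wire format; the point is that the client consumes the chunked stream incrementally and never buffers the whole document:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class EarlyStopClient {
    // Read one record per line from the (chunked) response stream, stopping as
    // soon as `limit` records have been collected; the rest is never read.
    static List<String> readRecords(Reader response, int limit) throws IOException {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(response)) {
            String line;
            while (records.size() < limit && (line = in.readLine()) != null) {
                records.add(line);  // a real client would parse the fragment here
            }
        }  // closing the reader here is the "rude" connection close
        return records;
    }
}
```

In real use the Reader would wrap the HTTP response body; the server keeps streaming until the client closes the connection.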
Re: [Neo4j] about two database
Hi Jim,

I am creating a package of algorithms to work with Neo4j databases. For that I have two graphs (DBs), and what I want to know is how to obtain a node in graph G1 from G2, but without having an instance of G1. Can this be done?
Re: [Neo4j] REST results pagination
Hi Georg,

It would at least have to be an iterator over pages - otherwise the results tend to be fine-grained and so horribly inefficient for sending over a network.

Jim

On 22 Apr 2011, at 18:24, Georg Summer wrote:

I might be a little newbish here, but then why not an iterator? The iterator lives on the server and is accessible through the REST interface, providing an advance and a value method. It either operates on a stored, once-created, stable result set, or holds the query and evaluates it on demand (issues of a changing underlying graph included). The client can have paginator functionality by advancing and dereferencing the iterator n times, or streaming-like behaviour by constantly pushing the obtained data into a queue and keeping going. If the client does not need the iterator anymore, he simply stops using it and a timeout kills it eventually on the server. A client-callable delete method for the iterator would work as well.

Georg
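Jim's "iterator over pages" could be sketched like this. PageIterator is a hypothetical name, and the source iterator stands in for a lazy traversal result; each next() drains one page's worth of items, which is what you would serialize per network round-trip:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PageIterator<T> implements Iterator<List<T>> {
    private final Iterator<T> source;  // e.g. a lazy traversal result
    private final int pageSize;

    public PageIterator(Iterator<T> source, int pageSize) {
        this.source = source;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    // Each call drains up to pageSize items: one network round-trip's worth.
    @Override
    public List<T> next() {
        if (!hasNext()) throw new NoSuchElementException();
        List<T> page = new ArrayList<>(pageSize);
        while (page.size() < pageSize && source.hasNext()) {
            page.add(source.next());
        }
        return page;
    }
}
```

The last page may be short, and the source is consumed lazily, so nothing beyond the requested pages is ever materialized.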
Re: [Neo4j] REST results pagination
SAX (or StAX) is an example of streaming with a higher-level format, but there are plenty of other ways as well. The *critical* performance element is to *never* have to accumulate an entire intermediate document on either side (e.g. a JSON object or XML DOM) if you can avoid it. You end up requiring 4x the resources (or more), extra latency, more parsing, and more garbage collection. I'll get with Jim Webber and propose a prototype of alternatives. Note also that the lack of binary I/O in the browser without Flash/Java/Silverlight is a challenge, but we can work around it.

- Reply message -
From: Michael DeHaan michael.deh...@gmail.com
Date: Fri, Apr 22, 2011 12:18 pm
Subject: [Neo4j] REST results pagination
To: Neo4j user discussions user@lists.neo4j.org

On Thu, Apr 21, 2011 at 5:00 PM, Michael Hunger michael.hun...@neotechnology.com wrote:

Really cool discussion so far, I would also prefer streaming over paging, as with that approach we can give both ends more of the control they need.

Just in case we're not talking about the same kind of streaming -- when I think streaming, I think streaming uploads, streaming downloads, etc. If the REST format is JSON (or XML, whatever), that's a /document/, so you can't just say "read the next (up to) 512 bytes and work on it". It becomes a more low-level endeavor, because if you're in the middle of reading a record, or don't even have the end-of-list terminator, what you have isn't parseable yet. I'm sure a lot of hacking could be done to make the client figure out if it had enough other than the closing array element, but it's a lot to ask of a JSON client. So I'm interested in how, in that proposal, the REST API might stream results to a client, because for the streaming to be meaningful, you need to be able to parse what you get back and know where the boundaries are (or build a buffer until you fill in a datastructure enough to operate on it). I don't see that working with JSON/REST so much. It seems to imply a message bus.
--Michael
Re: [Neo4j] about two database
It is in principle possible, but what is the issue with having an instance (ro or rw) of the second db that does the parsing of the store files for you?

Sent from my iBrick4

Am 22.04.2011 um 19:22 schrieb Jose Angel Inda Herrera jai...@estudiantes.uci.cu: Hi Jim, I am creating a package of algorithms to work on Neo4j databases. For that I have two graphs (DBs), and what I want to know is how to obtain a node in graph G1 from G2, but without having an instance of G1. Can this be done?
Re: [Neo4j] REST results pagination
That would need to hold resources on the server (potentially for an indeterminate amount of time), since it must be stateful. In general, stateful APIs do not scale well in cases of dynamic queries.

- Reply message -
From: Georg Summer georg.sum...@gmail.com
Date: Fri, Apr 22, 2011 1:25 pm
Subject: [Neo4j] REST results pagination
To: Neo4j user discussions user@lists.neo4j.org

I might be a little newbish here, but then why not an iterator? The iterator lives on the server and is accessible through the REST interface, providing an advance and a value method. It either operates on a stored, once-created, stable result set, or holds the query and evaluates it on demand (issues of the underlying graph changing included). The client can get paginator functionality by advancing and dereferencing the iterator n times, or streaming-like behaviour by constantly pushing the obtained data into a queue and carrying on. If the client does not need the iterator anymore, it simply stops using it and a timeout eventually kills it on the server. A client-callable delete method for the iterator would work as well.

Georg

On 22 April 2011 18:43, Jim Webber j...@neotechnology.com wrote: Hi Michael, "Just in case we're not talking about the same kind of streaming -- when I think streaming, I think streaming uploads, streaming downloads, etc." I'm thinking chunked transfers. That is, the server starts sending a response and then eventually terminates it when the whole response has been sent to the client. Although it seems a bit rude, the client could simply opt to close the connection when it has read enough, provided what it has read makes sense.
Sometimes document fragments can make sense:

<results>
  <node id="1234"><property name="planet" value="Earth"/></node>
  <node id="1235"><property name="planet" value="Mars"/></node>
  <!-- client gets bored here and kills the connection, missing out on what would have followed -->
  <node id="1236"><property name="planet" value="Jupiter"/></node>
  <node id="1237"><property name="planet" value="Saturn"/></node>
</results>

In this case we certainly don't have well-formed XML, but some streaming API (e.g. StAX) might already have been able to create some local objects on the client side as the Earth and Mars nodes came in. I don't think this is elegant at all, but it might be practical. I've asked Mark Nottingham for his view on this, since he's pretty sensible about Web things.

Jim
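As a concrete illustration of Jim's point (with Python's incremental `XMLPullParser` standing in for Java StAX, and the chunk boundaries invented): a pull parser surfaces the Earth and Mars nodes as usable objects even though the connection dies before the document ever becomes well formed:

```python
from xml.etree.ElementTree import XMLPullParser

# The truncated stream: the connection is killed before Jupiter, Saturn
# and the closing </results> tag ever arrive.
chunks = [
    '<results>',
    '<node id="1234"><property name="planet" value="Earth"/></node>',
    '<node id="1235"><property name="planet" value="Mars"/></node>',
    # ...client closes the connection here...
]

parser = XMLPullParser(events=('end',))
planets = []
for chunk in chunks:
    parser.feed(chunk)
    # 'end' events fire as soon as an element is complete, so each <node>
    # is available immediately, without waiting for the document to close.
    for event, elem in parser.read_events():
        if elem.tag == 'node':
            planets.append(elem.find('property').get('value'))

# planets: ['Earth', 'Mars'] despite the stream never being well formed
```

The trick is simply to never call the parser's final close/validation step, so the missing end tag is never reported as an error.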
Re: [Neo4j] REST results pagination
I'll be happy to host the streaming REST API summit. Ample amounts of beer will be provided. ;-)

- Reply message -
From: Jim Webber j...@neotechnology.com
Date: Fri, Apr 22, 2011 1:46 pm
Subject: [Neo4j] REST results pagination
To: Neo4j user discussions user@lists.neo4j.org

Hi Georg,

It would at least have to be an iterator over pages - otherwise the results tend to be fine-grained, and so horribly inefficient to send over a network.

Jim

On 22 Apr 2011, at 18:24, Georg Summer wrote: I might be a little newbish here, but then why not an iterator? The iterator lives on the server and is accessible through the REST interface, providing an advance and a value method. It either operates on a stored, once-created, stable result set, or holds the query and evaluates it on demand (issues of the underlying graph changing included). The client can get paginator functionality by advancing and dereferencing the iterator n times, or streaming-like behaviour by constantly pushing the obtained data into a queue and carrying on. If the client does not need the iterator anymore, it simply stops using it and a timeout eventually kills it on the server. A client-callable delete method for the iterator would work as well. Georg
Re: [Neo4j] REST results pagination
And you would want to reuse your connection so you don't have to pay this penalty per request.

Just asking: how would such a REST resource iterator look - URI, verbs, request/response formats? I assume every query (index, traversal) would then just return the iterator URI for later consumption. If we store the query and/or result information (as discussed by Craig and others) at the node returned as the iterator, this would be a nice fit.

M

Sent from my iBrick4

Am 22.04.2011 um 19:46 schrieb Jim Webber j...@neotechnology.com: Hi Georg, It would at least have to be an iterator over pages - otherwise the results tend to be fine-grained, and so horribly inefficient to send over a network. Jim

On 22 Apr 2011, at 18:24, Georg Summer wrote: I might be a little newbish here, but then why not an iterator? The iterator lives on the server and is accessible through the REST interface, providing an advance and a value method. It either operates on a stored, once-created, stable result set, or holds the query and evaluates it on demand (issues of the underlying graph changing included). The client can get paginator functionality by advancing and dereferencing the iterator n times, or streaming-like behaviour by constantly pushing the obtained data into a queue and carrying on. If the client does not need the iterator anymore, it simply stops using it and a timeout eventually kills it on the server. A client-callable delete method for the iterator would work as well. Georg
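A hedged sketch of what might sit behind such URIs - every name below (the registry class, the TTL, the page size) is invented for illustration and is not Neo4j API. One plausible mapping: POST a query to create an iterator id, GET /iterator/{id}/next?count=n for the next page (an iterator over pages, per Jim's point), DELETE /iterator/{id} for explicit disposal, with a reaper enforcing Georg's inactivity timeout:

```python
import itertools
import time
import uuid

class IteratorRegistry:
    """Server-side store of result iterators keyed by an opaque id.
    Iterators that have not been advanced within `ttl` seconds are reaped,
    implementing the 'timeout kills it eventually' behaviour."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._iters = {}  # id -> (iterator, last_access_time)

    def create(self, results):
        """Register a result iterable; returns the id to embed in the URI."""
        it_id = uuid.uuid4().hex
        self._iters[it_id] = (iter(results), time.monotonic())
        return it_id

    def advance(self, it_id, count):
        """Return the next page of up to `count` results and refresh the TTL.
        Pages, not single items, keep the per-request overhead sane."""
        it, _ = self._iters[it_id]
        self._iters[it_id] = (it, time.monotonic())
        return list(itertools.islice(it, count))

    def delete(self, it_id):
        """Client-callable disposal, as Georg suggests."""
        self._iters.pop(it_id, None)

    def reap(self):
        """Drop iterators idle longer than the TTL (run periodically)."""
        now = time.monotonic()
        for it_id, (_, last) in list(self._iters.items()):
            if now - last > self.ttl:
                del self._iters[it_id]

registry = IteratorRegistry()
it_id = registry.create(range(10))      # e.g. lazy traversal results
first_page = registry.advance(it_id, 4)   # [0, 1, 2, 3]
second_page = registry.advance(it_id, 4)  # [4, 5, 6, 7]
```

Michael Hunger's scalability objection still applies: the iterator is sticky server state, so it either pins the client to one machine or the stored query has to be cheap to reconstruct elsewhere.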
Re: [Neo4j] Error building Neo4j
Maven 2 solved the problem. I submitted a pull request from https://github.com/kevmoo/community to explain as much in the README.

On Thu, Apr 21, 2011 at 02:39, Jim Webber j...@neotechnology.com wrote: Hi Kevin, I can replicate your problem. The way I worked around this was to use Maven 2.2.1 rather than Maven 3.0.x. Then I get a green build for the community edition. I'll poke the dev team and see what Maven versions they're running on. Jim
Re: [Neo4j] about two database
On 22/04/11 13:51, Michael Hunger wrote: It is in principle possible, but what is the issue with having an instance (ro or rw) of the second db that does the parsing of the store files for you?

Hi Michael, if I have an instance of the second DB, does that mean that what I have is a reference to the second database?
[Neo4j] about delete node
Hi list,

Is there some property that marks a node as "to be removed" when I delete it, so that the actual removal is performed when the transaction is completed?
Re: [Neo4j] about delete node
Sorry, I'm not sure I follow. There is just the node.delete() operation, which is committed at the end of the tx. http://wiki.neo4j.org/content/Delete_Semantics

Do you mean you want to mark a node as "to be removed"? There is nothing like that. Or do you want a property that tells you that a node has been removed? (You can set that yourself prior to deletion.)

Hope that helps,
Michael

Am 23.04.2011 um 04:00 schrieb Jose Angel Inda Herrera: Hi list, Is there some property that marks a node as "to be removed" when I delete it, so that the actual removal is performed when the transaction is completed?
Re: [Neo4j] REST results pagination
I spent some time looking at what others are doing, for inspiration. I kind of like the Riak/Basho approach with multipart chunks, and the approach of explicitly creating a resource for the query that can be navigated (either via pages or first, next, [prev, last] links) and expires (and could be reconstructed).

Cheers,
Michael

Good discussion: http://stackoverflow.com/questions/924472/paging-in-a-rest-collection

CouchDB: http://wiki.apache.org/couchdb/HTTP_Document_API - startKey + limit, endKey + limit, sorting, insert/update order

MongoDB: [cursor-id] + batch_size

OrientDB: .../[limit]

Sones: no real REST API, but a SQL on top of the graph: http://developers.sones.de/documentation/graph-query-language/select/ with limit and offset, but also depth (for the graph)

HBase: explicitly creates scanners, which can then be accessed with next operations, and which expire after no activity for a certain timeout

Riak: http://wiki.basho.com/REST-API.html - client-id header for client identification - sticky? Optional query parameters for including properties, and for whether to stream the data: keys=[true,false,stream]. If "keys=stream", the response will be transferred using chunked encoding, where each chunk is a JSON object. The first chunk will contain the "props" entry (if props was not set to false). Subsequent chunks will contain individual JSON objects with the "keys" entry containing a sublist of the total keyset (some sublists may be empty).
Riak seems to support partial JSON, non-closed elements: -d '{props:{n_val:5'

It returns multiple responses in one go, Content-Type: multipart/mixed; boundary=YinLMzyUR9feB17okMytgKsylvh

--YinLMzyUR9feB17okMytgKsylvh
Content-Type: application/x-www-form-urlencoded
Link: </riak/test>; rel="up"
Etag: 16vic4eU9ny46o4KPiDz1f
Last-Modified: Wed, 10 Mar 2010 18:01:06 GMT

{"bar":"baz"}

(this block can be repeated n times)

--YinLMzyUR9feB17okMytgKsylvh--
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

Query results: Content-Type - always multipart/mixed, with a boundary specified.

Understanding the response body: the response body will always be multipart/mixed, with each chunk representing a single phase of the link-walking query. Each phase will also be encoded in multipart/mixed, with each chunk representing a single object that was found. If no objects were found or "keep" was not set on the phase, no chunks will be present in that phase. Objects inside phase results will include Location headers that can be used to determine bucket and key. In fact, you can treat each object chunk similarly to a complete response from reading an object, without the status code.
HTTP/1.1 200 OK
Server: MochiWeb/1.1 WebMachine/1.6 (eat around the stinger)
Expires: Wed, 10 Mar 2010 20:24:49 GMT
Date: Wed, 10 Mar 2010 20:14:49 GMT
Content-Type: multipart/mixed; boundary=JZi8W8pB0Z3nO3odw11GUB4LQCN
Content-Length: 970

--JZi8W8pB0Z3nO3odw11GUB4LQCN
Content-Type: multipart/mixed; boundary=OjZ8Km9J5vbsmxtcn1p48J91cJP

--OjZ8Km9J5vbsmxtcn1p48J91cJP
Content-Type: application/json
Etag: 3pvmY35coyWPxh8mh4uBQC
Last-Modified: Wed, 10 Mar 2010 20:14:13 GMT

{"riak":"CAP"}

--OjZ8Km9J5vbsmxtcn1p48J91cJP--
--JZi8W8pB0Z3nO3odw11GUB4LQCN
Content-Type: multipart/mixed; boundary=RJKFlAs9PrdBNfd74HANycvbA8C

--RJKFlAs9PrdBNfd74HANycvbA8C
Location: /riak/test/doc2
Content-Type: application/json
Etag: 6dQBm9oYA1mxRSH0e96l5W
Last-Modified: Wed, 10 Mar 2010 18:11:41 GMT

{"foo":"bar"}

--RJKFlAs9PrdBNfd74HANycvbA8C--
--JZi8W8pB0Z3nO3odw11GUB4LQCN--
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

Riak - MapReduce:

Optional query parameters:
* chunked - when set to true, results will be returned one at a time in multipart/mixed format using chunked encoding.

Important headers:
* Content-Type - application/json when chunked is not true, otherwise multipart/mixed with application/json parts

Other interesting endpoints: /ping, /stats
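For what it's worth, a client can pick such a multipart/mixed body apart with any standard MIME parser rather than hand-rolling boundary handling. A hedged sketch using Python's email module on a simplified, made-up body in the Riak style (real responses use CRLF line endings and carry the extra Etag/Location headers shown above):

```python
import json
from email import message_from_string

# A simplified multipart/mixed response in the Riak style.
# The boundary and JSON payloads are invented for illustration.
raw = (
    "Content-Type: multipart/mixed; boundary=YinLMzyUR9feB17okMytgKsylvh\n"
    "\n"
    "--YinLMzyUR9feB17okMytgKsylvh\n"
    "Content-Type: application/json\n"
    "\n"
    '{"bar": "baz"}\n'
    "--YinLMzyUR9feB17okMytgKsylvh\n"
    "Content-Type: application/json\n"
    "\n"
    '{"foo": "bar"}\n'
    "--YinLMzyUR9feB17okMytgKsylvh--\n"
)

msg = message_from_string(raw)
assert msg.is_multipart()

# Each MIME part is one result object; decode its JSON payload.
objects = [json.loads(part.get_payload()) for part in msg.get_payload()]
# objects: [{'bar': 'baz'}, {'foo': 'bar'}]
```

This is roughly why the multipart framing is attractive for streaming: each chunk is independently parseable the moment its closing boundary arrives, sidestepping the "partial JSON document" problem discussed earlier in the thread.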