Re: [Neo4j] Starting neo4j Server doesn't return to prompt
Stefan, Could you please send the console output and the content of the data/log dir for more info? On Apr 19, 2011 1:02 AM, Stephan Hagemann stephan.hagem...@googlemail.com wrote: Hello group, I just realized that since upgrading to Neo4j 1.3 my deployment is broken. It seems to be due to the fact that when starting up, the server does not return to a prompt (I noticed this locally also - I need to press enter to get the prompt). Vlad (the deployment script) thus probably assumes that the startup is not yet finished. I have played with the startup options in the neo4j executable, but to no avail. Is anyone else experiencing this or has some ideas? Thanks! Stephan ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] How to combine both traversing and index queries?
Another approach to this problem is to consider that an index is actually structured as a graph (a tree), and so if you write the tree into the graph together with your data model, you can combine the index and the traversal into a pure graph traversal. Of course, it is insufficient to simply build both the index tree and the domain model as two graphs that only connect at the result nodes. You need to build a combined graph that achieves the purpose of both indexing and domain structure. This is a very domain-specific thing, and so there are no general-purpose solutions. You have to build the graph to suit your domain. One approach is to build the domain graph first, then decide why you want indexing, and, without adding Lucene (or any external index) to the mix, think about how to modify the graph to achieve the same effect. On Mon, Apr 18, 2011 at 8:54 PM, Willem-Paul Stuurman w.p.stuur...@knollenstein.com wrote: Hi Ville, We ran into a similar problem, basically wanting to search only part of the graph using Lucene. We used traversal to determine the nodes to search from, and from there on used Lucene to search the nodes connected to the nodes in the traversal result. We solved it as follows: - defined a TransactionEventHandler to auto-update the indexes with node properties, and also to add relationships to the same index. We use the relationship.name() as the property name for Lucene, with the 'other node' id as the value. - traverse to get the set of nodes from which to start the search. We apply the ACL here to only return nodes the user is allowed to see. - create a BooleanQuery for Lucene with the relationship.name() field names and ids. 
So if the relationship is 'IS_FRIEND_OF' and we want to do a full-text search for 'trinity' on friends of people with ids 1, 2 and 3, we create a query that contains: +(name:trinity) +(isfriendof:1 isfriendof:2 isfriendof:3) To make sure we only get back 'person' nodes we also indexed the node type (in our case 'emtype'), so the complete query is: +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3) This way you can easily traverse to define the 'edges' of where to search and let Lucene handle the search within that region. Optionally we add the ACL to the Lucene query as well using the same technique, basically adding the ids of all groups the current user is a member of that have a 'CAN_ACCESS' relationship with the node: +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3) +(canaccess:233 canaccess:254 canaccess:324) It works for us because in our case we know the traversal will return a reasonable set of nodes (not thousands+). Lucene can return thousands of nodes, but that's not a problem of course. And we can still use the fun stuff like sorting, paging and scoring of results. Hope this helps. Cheers Paul PS: we always use lower-case field names without underscores because somehow it makes Lucene happier On 18 apr 2011, at 11:19, Mattias Persson wrote: 2011/4/18 Michael Hunger michael.hun...@neotechnology.com: Would it also be possible to go the other way round? E.g. have the index results (name:Vil*) as the starting point and traverse backwards the two steps to your start node? (Either using a traversal or the shortest path graph algo with a maximum path length)? That's what I suggested, but it doesn't exist yet :) Doing it that way today (a traversal from each and every index result) would probably be slower than doing one traversal with filtering. 
Cheers Michael On 18.04.2011 at 11:03, Mattias Persson wrote: Hi Ville, 2011/4/14 Ville Mattila vi...@mattila.fi: Hi there, I am somehow stuck on the problem of combining traversals and index queries efficiently, for example finding all people with a name starting with Vil* two steps away from a reference node. Traversing all friends within two steps from the reference node is trivial, but I find it a bit inefficient to apply a return evaluator to each of the nodes visited during the traversal. Or is it? How about more complex criteria which may involve more than one property or even more complex (Lucene) queries? The best solution IMHO (one that isn't available yet) would be to let a traversal have multiple starting points, that is, to use the index result as the starting points. I think that doing a traversal and filtering with an evaluator is the way to go. Have you tried doing this and seen bad performance? I was thinking of spicing up my Neo4j setup with Elasticsearch (www.elasticsearch.org), dedicating Neo4j to keeping track of the relationships and ES to indexing all the data in them; however, I feel very uncomfortable about keeping the two consistent when data gets updated. Then again, right now I need to keep the Neo4j indices updated as well. And needless to say, combining traversal and an external index is even more complicated. However I like
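Paul's query construction lends itself to a small helper. This is a minimal sketch, not his actual code. `RegionQuery` and its methods are hypothetical names. Field names follow his convention (relationship name lower-cased, underscores dropped). The helper produces the query as a string in Lucene's query syntax rather than building a `BooleanQuery` object directly. It assumes the traversal returned at least one node; an empty id list would produce an empty `+()` clause.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class RegionQuery {
    // Field naming convention from the thread: lower-case, no underscores,
    // e.g. "IS_FRIEND_OF" -> "isfriendof".
    static String fieldName(String relationshipName) {
        return relationshipName.toLowerCase(Locale.ROOT).replace("_", "");
    }

    // Required clause in which at least one id must match:
    // "+(isfriendof:1 isfriendof:2 isfriendof:3)". Assumes ids is non-empty.
    static String anyOf(String relationshipName, List<Long> ids) {
        String field = fieldName(relationshipName);
        return ids.stream()
                .map(id -> field + ":" + id)
                .collect(Collectors.joining(" ", "+(", ")"));
    }

    // Node type + full-text term + traversal region + optional ACL groups.
    static String build(String nodeType, String nameTerm, String relationshipName,
                        List<Long> regionIds, List<Long> accessGroupIds) {
        StringBuilder q = new StringBuilder()
                .append("+emtype:").append(nodeType)
                .append(" +name:").append(nameTerm)
                .append(' ').append(anyOf(relationshipName, regionIds));
        if (!accessGroupIds.isEmpty()) {
            q.append(' ').append(anyOf("CAN_ACCESS", accessGroupIds));
        }
        return q.toString();
    }
}
```

With the values from Paul's example, `build("person", "trinity", "IS_FRIEND_OF", List.of(1L, 2L, 3L), List.of(233L, 254L, 324L))` reproduces his final query string. That string could then be handed to a Lucene query parser configured with the same analyzer that was used at indexing time.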
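Mattias's suggestion (traverse out two steps and test each visited node) is essentially a depth-limited breadth-first search with a filter. The sketch below illustrates this with two stand-ins:

- a hypothetical in-memory adjacency map in place of the Neo4j traversal framework;
- a plain prefix test in place of the `Vil*` query.

The point it shows is that the per-node check is a single string comparison on nodes the traversal visits anyway. The cost is therefore dominated by how many nodes lie within two steps, not by the filter itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FriendsWithinDepth {
    // Depth-limited breadth-first traversal from `start`. Every newly visited node
    // is tested with the name filter, the way a return evaluator would be.
    // Nodes are identified by their name here purely for brevity.
    static List<String> find(Map<String, List<String>> friendsOf, String start,
                             int maxDepth, String namePrefix) {
        List<String> hits = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String friend : friendsOf.getOrDefault(node, List.of())) {
                    if (visited.add(friend)) {               // first time we reach this node
                        next.add(friend);
                        if (friend.startsWith(namePrefix)) { // the "name:Vil*" test
                            hits.add(friend);
                        }
                    }
                }
            }
            frontier = next;
        }
        return hits;
    }
}
```

Nodes three or more steps away are never visited, so a matching name further out (say a friend of a friend of a friend) is simply not returned.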
Re: [Neo4j] Wiki documentation neo4j+restfulie.
Hi José, Please feel free to add to the wiki. We've had a problem with spammers recently, so if you run into permissions problems please shout. Jim On 22 Mar 2011, at 20:19, jdbjun...@gmail.com wrote: Hi, going through the neo4j documentation I found some examples of how to access the neo4j API using two REST libraries (rest-client, neography). After reading them, I've decided to do the same tests using the library restfulie, of which I'm a committer. Am I allowed to change the wiki by adding the restfulie example? If it is OK, is anyone willing to review it before I change the wiki? Thanks, José Donizetti.
Re: [Neo4j] REST results pagination
Hi Javier, I've just checked and that's in our list of stuff we really should do because it annoys us that it's not there. No promises, but we do intend to work through at least some of that list for the 1.4 releases. Jim
Re: [Neo4j] REST results pagination
I'd like to propose that we put this functionality into the plugin (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I are currently working on, thoughts? From: j...@neotechnology.com Date: Tue, 19 Apr 2011 15:25:20 +0100 To: user@lists.neo4j.org Subject: Re: [Neo4j] REST results pagination
Re: [Neo4j] REST results pagination
On Tue, Apr 19, 2011 at 10:32, Saikat Kanjilal sxk1...@hotmail.com wrote: I'd like to propose that we put this functionality into the plugin (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I are currently working on, thoughts? +1 From: j...@neotechnology.com I've just checked and that's in our list of stuff we really should do because it annoys us that it's not there. No promises, but we do intend to work through at least some of that list for the 1.4 releases. It will be great to see the feature in 1.4 :-) -- Javier de la Rosa http://versae.es
[Neo4j] Neo4j/Spatial and Scala
Hi all, I am evaluating the advantages of using Neo4j and its spatial extension. For testing I have extended (forked) neo4j-scala with some spatial convenience methods, so that something written in Java like:

    SpatialDatabaseService db = new SpatialDatabaseService( graphDb() );
    EditableLayer layer = (EditableLayer) db.getOrCreateEditableLayer( "test" );
    SpatialDatabaseRecord record = layer.add( layer.getGeometryFactory().createPoint( new Coordinate( 15.3, 56.2 ) ) );

will be converted to:

    class Neo4jTest extends Neo4jSpatialWrapper with EmbeddedGraphDatabaseServiceProvider with SpatialDatabaseServiceProvider {
      def neo4jStoreDir = NEO4J_STORE_DIR
      withLayer(getOrCreateEditableLayer("test")) {
        implicit layer =>
          val myRecord = add newPoint ((15.3, 56.2))
      }
    }

Please refer to the GitHub (https://github.com/FaKod/neo4j-scala) README or the test cases for more examples. Since we will try (if we have enough time) to extend Neo4j Scala, I would love to get some comments from the Scala enthusiasts on this list (hope there are some ;-). Now or later, here or to my email address. What do you think? Does it help? Is it OK how we use the traits or the implicits? How should we do POPO to Node serialization? With annotations like those in jo4neo? Regards -- Christopher twitter: @fakod blog: http://blog.fakod.eu
Re: [Neo4j] REST API thoughts/questions/feedback
Hey Michael, big thanks again for taking the time to write down your experiences working with the REST API. See inline response. On Mon, Apr 18, 2011 at 4:10 PM, Michael DeHaan michael.deh...@gmail.com wrote: Hi all. I've been working recently on writing a Perl binding for the Neo4j REST API and thought I'd share some observations, and hopefully get a few suggestions on some things. You can see some of the Perl work in progress here -- https://github.com/mpdehaan/Elevator (search for *Neo*.pm). Basically it's a data layer that allows objects to be plugged between SQL, NoSQL (Riak and Mongo so far) and Neo4j. The idea is we can build data classes and just call commit() and the like on them, though if the class is backed by Neo4j obviously we'll be able to add links between them, query the links, and so forth. I'm still working on that. Basically the REST API *is* working for me, but here are my observations about it: (1) I'd like to be able to specify the node ID of a node before I create it. I like having a primary key, as I can with things like Mongo and Riak. If I do not have a primary key, I have to search before I add, upsert becomes difficult, as do deletions, and I have to worry about which copy of a given object is authoritative. I understand this can't work for everyone, but it seems like it would be useful. If that can be done now, I'd love info on how to! I think the current standard approach to key/value storage is, like you mention, to store unique keys in an index. This does mean you have to build upsert abstractions yourself, always doing an index lookup before inserts or updates. As far as allowing neo4j clients to set ids for nodes, I think the problems that would create (for instance in High Availability setups where each slave gets a set of ids it can assign) would outweigh the benefits. (2) I'd like a delete_all kind of API to be able to delete all nodes matching particular criteria versus having to do a search. 
For instance, I may need to rebuild the database, or part of it, and it's not clear how to drop it. Also, is there a way to drop the entire database via REST? This feels like a two-part idea, both of which I like :) First, the ability to do manipulating operations like deleting and/or editing data on a large scale without having to pull down each node over HTTP would be awesome. There is talk about putting together a query language, and that could potentially be outfitted to do mutating operations, similar to how SQL was extended to do that. Will definitely keep this in mind! Second, I think the ability to nuke the database is a great thing to have in a development environment. A feature we're discussing is the ability to have multiple databases running in each neo4j server, allowing you to nuke and create databases as appropriate. For a faster fix, take a look at Michael Hunger's db-nuker plugin: https://github.com/jexp/neo4j-clean-remote-db-addon (3) I'd like to be able to have the key of the node automatically added to the index without having to make a second call. Ideally I'd like to be able to configure the server to auto-index certain fields, which is something some of the NoSQL/search tools offer. Similarly, when updating the node, the index should auto-update without an explicit call to the indexer. Agreed, auto-indexing would be *awesome*. There are some hard problems related to doing auto-indexing *well* that need to be solved first, but this is something that I really hope we will end up implementing. (4) The capability to do an upsert would be very useful: update the node if one exists for the given key, otherwise create it. Like I said above, the current approach I think is to put this logic on the client side, which is slower, but the logic for doing this without user-defined key-value style ids would potentially be very complex. I might be wrong, but my gut feeling is that we can't do this well if we don't have user-defined ids.
(5) It seems the indexes are the only means of search? If I need to search on a field that isn't indexed (say in production I need to add a new index), how do I go about adding all the existing nodes that belong in that index *to* that index? It seems I'd need to keep at least an index of all nodes of a given type all along, so I could at least iterate over those? The main means of searching the graph structure inside a neo4j database is by traversing it. Basically, you write a description of how to travel the graph and what data to return, and then you get a list of nodes, a list of relationships or a list of paths back, depending on what you asked for. The indexes are currently mainly used for simple lookups and for finding starting points for traversals. See http://components.neo4j.org/neo4j-server/milestone/rest.html#Traverse I think most of the underlying questions/problems I have are that I'm
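The idiom Jacob describes for points (1) and (4) is to look the unique key up in an index before every insert or update. It can be sketched as below. This is a toy in-memory model, not the REST API: `UniqueKeyStore`, `Node` and `upsert` are hypothetical, and a `HashMap` stands in for the index. The sketch also indexes the key when a node is created, which is the step Michael would like done automatically in point (3). Over REST, the lookup, create and index steps are separate requests. Two clients upserting the same key at the same time could therefore still both create a node.

```java
import java.util.HashMap;
import java.util.Map;

final class UniqueKeyStore {
    // Hypothetical node: a store-assigned id plus a property map.
    static final class Node {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        Node(long id) { this.id = id; }
    }

    private final String keyProperty;                        // e.g. "key"
    private final Map<Object, Node> index = new HashMap<>(); // stand-in for a Neo4j index
    private long nextId = 0;                                 // ids come from the store, not the client

    UniqueKeyStore(String keyProperty) { this.keyProperty = keyProperty; }

    // Index lookup first; update the node if found, otherwise create and index it.
    Node upsert(Object key, Map<String, ?> properties) {
        Node node = index.get(key);
        if (node == null) {
            node = new Node(nextId++);
            node.properties.put(keyProperty, key);
            index.put(key, node);                  // the separate "add to index" step
        }
        for (Map.Entry<String, ?> e : properties.entrySet()) {
            if (!e.getKey().equals(keyProperty)) { // never let an update change the key
                node.properties.put(e.getKey(), e.getValue());
            }
        }
        return node;
    }

    Node lookup(Object key) { return index.get(key); }
}
```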
Re: [Neo4j] REST results pagination
I'd like to propose that we put this functionality into the plugin (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I are currently working on, thoughts? I'm thinking that, if we do it, it should be handled through content negotiation. That is, if you ask for application/atom then you get paged lists of results. I don't necessarily think that's a plugin; it's more likely part of the representation logic in the server itself. Jim
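Jim's content-negotiation idea could look roughly like the sketch below. The same result list is returned whole for ordinary requests and split into pages only when the client asks for a feed. Everything here is an assumption for illustration: the class names, the zero-based page numbering, and the use of the registered Atom media type `application/atom+xml` (Jim writes `application/atom`). It says nothing about how the server's representation logic is actually structured.

```java
import java.util.List;

final class Pagination {
    // One page of results, plus whether a "next" link should be offered.
    static final class Page<T> {
        final List<T> items;
        final int pageNumber;   // zero-based
        final boolean hasNext;
        Page(List<T> items, int pageNumber, boolean hasNext) {
            this.items = items;
            this.pageNumber = pageNumber;
            this.hasNext = hasNext;
        }
    }

    // Page only when the client asked for an Atom feed; otherwise return everything.
    static <T> Page<T> represent(List<T> results, String acceptHeader,
                                 int pageNumber, int pageSize) {
        boolean wantsFeed = acceptHeader != null
                && acceptHeader.contains("application/atom+xml");
        if (!wantsFeed) {
            return new Page<>(results, 0, false);
        }
        int from = Math.min(pageNumber * pageSize, results.size());
        int to = Math.min(from + pageSize, results.size());
        return new Page<>(results.subList(from, to), pageNumber, to < results.size());
    }
}
```

Keeping the decision in the representation step means the query code does not need to know about paging at all, which matches Jim's view that this belongs in the server's representation logic rather than in a plugin.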
[Neo4j] WebCrawler-Data in Neo4j
Hey, I'm currently thinking about how my current data (in MySQL + Solr) would fit into Neo4j. In one of my documents, there are 3 types of data: 1. Properties that have high cardinality: e.g. the domain name (www.example.org, unique), the subdomain name (www.), the host name (example) 2. A bunch of numbers (the website latency (1244ms), the number of incoming links (e.g. 2321)) 3. A number of 'tags' that have a relatively low cardinality (100), things like the webserver (apache) or the country (germany). As for the model, I think it would be something like this: - Every domain gets a node - #1 would be modeled as a property on the domain node - #2 would probably be put into a Lucene index so I can sort on it later on - #3 could be modeled using relationships, e.g. a node that has two properties, type:webserver and name:apache, and each domain node can have a relationship called "runs on" to its webserver node. Does this make sense? I am used to document DBs, relational DBs and column stores, but graph DBs are still pretty new to me and I don't think I got the model 100% :) Using this model, would I be able to filter subsets of the data, such as all domains that run on apache, are in Germany and have more than 200 incoming links, sorted by the number of links? I played around a bit with the neography gem in Ruby and could do stuff like: germany_nginx = germany_nodel.shortest_path_to(websrv_nginx).depth(2).nodes But I couldn't figure out how to expand this query. Looking forward to the feedback! Marc -- Pessimists, we're told, look at a glass containing 50% air and 50% water and see it as half empty. Optimists, in contrast, see it as half full. Engineers, of course, understand the glass is twice as big as it needs to be. (Bob Lewis)
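Under Marc's proposed model, the example query (domains that run on apache, are in Germany, and have more than 200 incoming links, sorted by link count) breaks down into three steps:

1. Start at the two tag nodes.
2. Intersect the sets of domain nodes related to them.
3. Filter and sort on the numeric property.

The sketch below uses hypothetical in-memory maps in place of nodes, relationships and a Lucene index. Tag keys such as `webserver:apache` are made up for illustration. It shows the shape of the query, not a Neo4j or neography API.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class DomainQuery {
    // Hypothetical domain node: name plus the numeric "incoming links" property.
    static final class Domain {
        final String name;
        final int incomingLinks;
        Domain(String name, int incomingLinks) {
            this.name = name;
            this.incomingLinks = incomingLinks;
        }
    }

    // tagged: tag node (e.g. "webserver:apache") -> domain nodes related to it.
    static List<String> query(Map<String, Set<Domain>> tagged, List<String> requiredTags,
                              int minIncomingLinks) {
        Set<Domain> candidates = null;
        for (String tag : requiredTags) {      // intersect the neighbours of each tag node
            Set<Domain> related = tagged.getOrDefault(tag, Set.of());
            if (candidates == null) {
                candidates = new HashSet<>(related);
            } else {
                candidates.retainAll(related);
            }
        }
        if (candidates == null) {
            return List.of();                  // no tags given: nothing to anchor on
        }
        return candidates.stream()
                .filter(d -> d.incomingLinks > minIncomingLinks)
                .sorted(Comparator.comparingInt((Domain d) -> d.incomingLinks).reversed())
                .map(d -> d.name)
                .collect(Collectors.toList());
    }
}
```

Because tags have low cardinality, there are only a handful of tag nodes to start from. A popular tag can still be related to a large share of all domains, though, so starting from the most selective tag keeps the candidate set small.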
Re: [Neo4j] REST API thoughts/questions/feedback
On Tue, Apr 19, 2011 at 10:48 AM, Jacob Hansson ja...@voltvoodoo.com wrote: Hey Michael, big thanks again for taking the time to write down your experiences working with the REST API. See inline response. Thanks for the follow-up. That's quite helpful and lets me know I'm not doing the unique-key implementation in too much of a non-idiomatic way. I'll get back to you about doc fixes, and should the bindings materialize further, I'll share some examples. --Michael
Re: [Neo4j] Strange performance difference on different machines
Hi Tobias, On 2011-04-19, at 1:48 AM, Tobias Ivarsson wrote: Hi Bob, What happens here is that you perform a tiny operation in each transaction, so what you are really testing here is how fast your file system can flush, because with such tiny transactions all of the time is going to be spent in transactional overhead (i.e. flushing transaction logs to the disk). The reason you see such large differences between Mac OS X and Linux is that Mac OS X cheats. Flushing a file (fdatasync) on Mac does pretty much nothing. The only thing Mac OS X guarantees is that it will write the data that you just flushed before it writes the next data block you flush, so-called ordered writes. This means that you could potentially get data loss on hard failure, but never in a way that makes your data internally inconsistent. Okay, that makes some sense. Thanks for the information. So to give a short answer to your questions: 1) The Linux number is reasonable, Mac OS X cheats. 2) What you are testing is the write speed of your disk for writing small chunks of data. So you're thinking that 16 or 17 writes is what should be expected? Cheers, Bob Cheers, Tobias On Mon, Apr 18, 2011 at 10:57 PM, Bob Hutchison hutch-li...@recursive.ca wrote: Hi, Using Neo4j 1.3 and the Borneo (Clojure) wrapper I'm getting radically different performance numbers with identical test code. The test is simple-minded: create two nodes and a relation between them. No properties, no indexes, all nodes and relations are different. On OS X, it takes about 50s to perform that operation 50,000 times, 0.8s to do it 500 times. It uses roughly 30-40% of one core to do this. On Linux it takes about 30s to perform that operation 500 times. The CPU usage is negligible (really negligible... almost none). I cannot explain the difference in behaviour. I have two questions: 1) is either of these a reasonable number? I'm hoping the OS X numbers are not too fast. 2) any ideas as to what might be the cause of this? 
The computers are comparable. The OS X machine is a 2.8 GHz i7, the Linux box is a 3.something GHz Xeon (I don't remember the details). Thanks in advance for any help, Bob -- Tobias Ivarsson tobias.ivars...@neotechnology.com Hacker, Neo Technology www.neotechnology.com Cellphone: +46 706 534857 Bob Hutchison Recursive Design Inc. http://www.recursive.ca/ weblog: http://xampl.com/so
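Tobias's explanation can be checked without Neo4j at all. For scale, Bob's Linux figure of 500 operations in about 30 seconds is roughly 17 commits per second. The OS X figure of 50,000 in about 50 seconds is about 1,000 per second.

The sketch below times small appends to a temporary file and forces them to disk in one of two ways:

- after every write, like one tiny transaction per operation;
- once per batch, like grouping many operations into one transaction.

It assumes `FileChannel.force(false)` is a reasonable stand-in for the `fdatasync` Tobias mentions. Absolute numbers will depend on the disk, the file system and the OS.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FlushBench {
    // Appends `writes` small records to a temp file and forces them to the device
    // after every `batchSize` records (batchSize = 1 mimics one tiny transaction
    // per operation). Returns elapsed wall-clock time in milliseconds.
    static long run(int writes, int batchSize) throws IOException {
        Path file = Files.createTempFile("flushbench", ".log");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer record = ByteBuffer.allocate(64); // a tiny "commit record"
            long start = System.nanoTime();
            for (int i = 1; i <= writes; i++) {
                record.clear();                          // reuse the same 64 zero bytes
                while (record.hasRemaining()) {
                    ch.write(record);
                }
                if (i % batchSize == 0 || i == writes) {
                    ch.force(false); // flush file content, not metadata (roughly fdatasync)
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        int writes = 200;
        System.out.println("force after every write: " + run(writes, 1) + " ms");
        System.out.println("force every 100 writes:  " + run(writes, 100) + " ms");
    }
}
```

On a system where each flush really reaches the disk, the batched run should be much faster. On a system that does not honor the flush, as Tobias describes for OS X, the two runs may look similar.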
Re: [Neo4j] Strange performance difference on different machines
I sure hope not! That's crazy slow, even with one transaction per operation... -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Bob Hutchison Sent: Tuesday, April 19, 2011 4:11 PM To: Neo4j user discussions Subject: Re: [Neo4j] Strange performance difference on different machines So you're thinking that 16 or 17 writes is what should be expected?
Re: [Neo4j] REST results pagination
On Tue, Apr 19, 2011 at 10:25, Jim Webber j...@neotechnology.com wrote: I've just checked and that's in our list of stuff we really should do because it annoys us that it's not there. No promises, but we do intend to work through at least some of that list for the 1.4 releases. If this is finally developed, will it be possible to request all nodes and all relationships from some URL? Jim -- Javier de la Rosa http://versae.es