Re: [Neo4j] path finding using OSM ways
Hi Bryce, Nice to see you back. The OSM data model in Neo4j-Spatial, created by the OSMImporter, is designed to mimic the complete contents of the XML files provided by OSM. As it is, this is not ideal for routing, because it traces the complete set of nodes for each way, while for routing you really want a graph that connects each way-point to the next by a single relationship. So, if I were to perform routing on top of the OSM model, I would actually build an overlay graph that just connects the way-points. The current model has a vertex called a 'way', but that is not a way-point, because it represents the entire way (e.g. a street). We would need to do the following:
- Identify ways that are streets (as opposed to non-routing types like regions, buildings, lakes, etc.)
- Identify the points that are intersections (way-points)
- Create a way-point node for each of these
- Add relationships between way-points if they are connected by streets in the OSM model
- Weight the relationships by the length of the streets
- Then apply the A* algorithm (which I have no experience with myself, but others in neo4j certainly do)
I think everything but the last part would be very easy to add to the OSMImporter itself, so that the routing graph exists in any OSM model. Today it does not exist, and routing would be more difficult and expensive (since you would have to traverse a much more complex graph, unnecessarily). Regards, Craig On Tue, May 31, 2011 at 4:31 AM, bryce hendrix brycehend...@gmail.com wrote: I am finally getting back to experimenting with Neo4j. Because it has been a while since I last looked at it, I've forgotten just about everything. I want to start with something simple: is there any sample code which does A* path finding over OSM ways? Thanks, Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
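A minimal in-memory sketch of the steps listed above, using plain Java collections rather than the OSMImporter or any Neo4j API (WaypointRouting, buildOverlay and aStar are hypothetical names). It assumes each street way is already available as an ordered list of OSM node ids, treats a node as a way-point if it ends a street or is shared by two or more streets, joins consecutive way-points with one edge weighted by the street length between them, and then runs A*. Coordinates are assumed planar, so straight-line distance serves as both edge weight and heuristic; real lat/lon data would need a geodesic distance instead.

```java
import java.util.*;

/** Hypothetical sketch: collapse OSM street ways into a way-point graph, then route with A*. */
final class WaypointRouting {
    record Pt(double x, double y) {}
    record Edge(long to, double length) {}

    static double dist(Pt a, Pt b) {
        return Math.hypot(a.x() - b.x(), a.y() - b.y());
    }

    /** streets: each street way as its ordered list of OSM node ids. */
    static Map<Long, List<Edge>> buildOverlay(List<List<Long>> streets, Map<Long, Pt> coords) {
        // Count how many distinct streets use each node; shared nodes are intersections.
        Map<Long, Integer> usage = new HashMap<>();
        for (List<Long> way : streets)
            for (long n : new HashSet<>(way)) usage.merge(n, 1, Integer::sum);

        Map<Long, List<Edge>> graph = new HashMap<>();
        for (List<Long> way : streets) {
            long last = way.get(0); // most recent way-point on this street
            double acc = 0;         // street length walked since 'last'
            for (int i = 1; i < way.size(); i++) {
                long n = way.get(i);
                acc += dist(coords.get(way.get(i - 1)), coords.get(n));
                if (i == way.size() - 1 || usage.get(n) > 1) {
                    // One edge per street section, weighted by its length.
                    // Streets are treated as two-way; one-way tags are ignored here.
                    graph.computeIfAbsent(last, k -> new ArrayList<>()).add(new Edge(n, acc));
                    graph.computeIfAbsent(n, k -> new ArrayList<>()).add(new Edge(last, acc));
                    last = n;
                    acc = 0;
                }
            }
        }
        return graph;
    }

    /** A* from start to goal; returns the way-point ids along the route, or an empty list. */
    static List<Long> aStar(Map<Long, List<Edge>> graph, Map<Long, Pt> coords, long start, long goal) {
        record Open(long node, double f) {}
        Pt target = coords.get(goal);
        Map<Long, Double> g = new HashMap<>(Map.of(start, 0.0)); // best known cost from start
        Map<Long, Long> cameFrom = new HashMap<>();
        Set<Long> closed = new HashSet<>();
        PriorityQueue<Open> open = new PriorityQueue<>(Comparator.comparingDouble(Open::f));
        open.add(new Open(start, dist(coords.get(start), target)));
        while (!open.isEmpty()) {
            long node = open.poll().node();
            if (!closed.add(node)) continue; // stale queue entry, already expanded
            if (node == goal) {
                LinkedList<Long> path = new LinkedList<>();
                for (Long n = goal; n != null; n = cameFrom.get(n)) path.addFirst(n);
                return path;
            }
            for (Edge e : graph.getOrDefault(node, List.of())) {
                if (closed.contains(e.to())) continue;
                double cost = g.get(node) + e.length();
                if (cost < g.getOrDefault(e.to(), Double.POSITIVE_INFINITY)) {
                    g.put(e.to(), cost);
                    cameFrom.put(e.to(), node);
                    // Straight-line distance never exceeds street length, so it is admissible.
                    open.add(new Open(e.to(), cost + dist(coords.get(e.to()), target)));
                }
            }
        }
        return List.of();
    }
}
```

Street filtering (the first step) is assumed to have happened already. Inside Neo4j itself, the graph-algo component's GraphAlgoFactory.aStar would likely replace the hand-written search; check its signature for the version you use.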
[Neo4j] Sample Linear Referencing Functions in Neo4j Spatial and GSoC
Hi, Recently someone asked on StackOverflow whether Neo4j Spatial was capable of one of the Oracle geoprocessing functions, specifically SDO_LRS.LOCATE_PT. Since this is related to the ongoing GSoC projects for Neo4j Spatial, I thought I would do a quick investigation. What I found was that the requested capabilities are available in JTS (which we include in Neo4j Spatial), but under very different names. The code to achieve this in JTS is 'new LengthIndexedLine(geometry).extractPoint(measure,offset)'. I have wrapped this in SpatialTopologyUtils.locatePoint(geometry,measure,offset), so that it is accessible together with some other spatial topology functions, and also looks more like the Oracle function. I pushed this to GitHub, and think it can be included as a prototype for the discussions for the GSoC on Geoprocessing. Regards, Craig
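To make the measure/offset semantics concrete without pulling in JTS, here is a plain-Java sketch of the same linear-referencing idea on an array of (x, y) vertices. It is neither the JTS LengthIndexedLine code nor SpatialTopologyUtils.locatePoint; LinearRef is a hypothetical name. The conventions are assumptions of this sketch: measures are clamped to the line, and a positive offset moves the point to the left of the direction of travel. Check the JTS javadoc before relying on identical edge-case behaviour.

```java
/** Hypothetical plain-Java sketch of linear referencing: locate a point by measure and offset. */
final class LinearRef {
    /**
     * @param coords  polyline vertices as {x, y} pairs, at least two
     * @param measure distance along the line from its start (clamped to [0, length])
     * @param offset  perpendicular distance; positive = left of the direction of travel
     * @return the located {x, y}
     */
    static double[] locatePoint(double[][] coords, double measure, double offset) {
        double remaining = Math.max(0, measure);
        for (int i = 1; i < coords.length; i++) {
            double dx = coords[i][0] - coords[i - 1][0];
            double dy = coords[i][1] - coords[i - 1][1];
            double segLen = Math.hypot(dx, dy);
            boolean lastSegment = i == coords.length - 1;
            if (segLen > 0 && (remaining <= segLen || lastSegment)) {
                double t = Math.min(remaining / segLen, 1.0); // clamp past the end of the line
                double ux = dx / segLen, uy = dy / segLen;    // unit direction of this segment
                // The left-hand normal of (ux, uy) is (-uy, ux).
                return new double[] {
                    coords[i - 1][0] + t * dx - offset * uy,
                    coords[i - 1][1] + t * dy + offset * ux
                };
            }
            remaining -= segLen;
        }
        // Fallback (e.g. zero-length final segment): return the last vertex unshifted.
        return coords[coords.length - 1].clone();
    }
}
```

For example, on the line (0,0) to (10,0) to (10,10), a measure of 15 lands half-way up the second segment at (10,5), and an offset of 2 shifts it left (west) to (8,5).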
Re: [Neo4j] [SoC] Re: GSoC 2011 Weekly report - OSM data mining and editing capabilities in uDig and Geotools
Hi Mirco, Sounds like progress. Some suggestions:
- I do not think you need to change the code for neo4j and udig, but only for neo4j-spatial and udig-community/neo4j. It is OK to make clones of those so you have the code for review, but they are quite core, and you should not need to actually change them.
- Focus on neo4j-spatial and udig-community/neo4j, which are the two projects you will certainly make changes to. All uDig GUI changes can be made in udig-community/neo4j.
- You might even want to make a new uDig plugin in a new git project, perhaps udig-community/osm, for the OSM editor work. The neo4j plugin would provide the communication layer for neo4j and any neo4j data sources, while the OSM plugin would provide OSM-specific features, including the additional views and editors required to support a complete 'OSM Editor' capability.
Regards, Craig On Sun, Jun 5, 2011 at 1:51 AM, Mirco Franzago mircofranz...@gmail.com wrote: Weekly report #2
==What I did==
- The main work was to set up the whole development environment: eclipse + udig + neo4j.
- I forked the repositories on GitHub for my code: [0], [1] and [2] are respectively the repositories for udig, neo4j and neo4j-spatial.
- The goal was to have Eclipse with the uDig SDK taken from GitHub, just like neo4j, so that I can commit the udig code and the neo4j code from the same environment.
- I set up the Apache Maven tool and the EGit plugin so I can use them directly from Eclipse.
- After these steps, and some fighting with the jars to import, it was possible to run uDig with the neo4j plugins and to test the main functionality.
- I started the code analysis to understand where to put my hands next week :-)
==Next week plan==
- Fix some last problems, as a new git user, with the commit command.
- Finally start the real coding after the initial head-cracking problems.
[0] https://github.com/mircofranzago/udig-platform
[1] https://github.com/mircofranzago/neo4j
[2] https://github.com/mircofranzago/neo4j-spatial
2011/5/31 Mirco Franzago mircofranz...@gmail.com Hi all, I am Mirco Franzago and I have started work on my Google Summer of Code 2011 project. I will update this thread weekly to let the community know about the work done and the work still to do. Last week I could not do much because I was very busy with my last exam before summer. Now I'm ready to start on this new job. ___ SoC mailing list s...@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/soc
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
Hi, The bounding boxes are used by the RTree index, which is a typical way to index spatial data. For Point data, the lat/long and the bounding box are the same thing, but for other shapes (streets/LineStrings and Polygons), the bounding box is quite different from the actual geometry (which is not just a single lat/long, but a set of connected points forming a complex shape). The RTree does not differentiate between points and other geometries, because it cares only about the bounding box, and therefore we provide one even for something as simple as a Point. Does that answer the question? Regards, Craig On Tue, Jun 7, 2011 at 4:57 PM, Boris Kizelshteyn bo...@popcha.com wrote: Greetings! Perhaps someone using neo4j-spatial can answer this seemingly simple question. Nodes classified into layers have both lat/lon properties and bounding boxes. The bounding box seems to be required to establish the relationship between node and layer; however, the node is not found if the lat/lon does not match the query. Can someone explain the relationship between these two properties on a node? Many thanks!
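A small plain-Java illustration of this point (Envelope here is a hypothetical record, not the JTS or Neo4j Spatial class): the RTree only sees each geometry's envelope. For a Point the envelope collapses to the coordinate itself, while for a LineString it is the box around all its vertices, which can cover plenty of area the line never touches.

```java
/** Hypothetical sketch of the envelope (bounding box) that an RTree indexes. */
record Envelope(double minX, double minY, double maxX, double maxY) {
    /** Envelope of any geometry given by its {x, y} vertices. */
    static Envelope of(double[]... coords) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] c : coords) {
            minX = Math.min(minX, c[0]);
            maxX = Math.max(maxX, c[0]);
            minY = Math.min(minY, c[1]);
            maxY = Math.max(maxY, c[1]);
        }
        return new Envelope(minX, minY, maxX, maxY);
    }

    /** Box overlap test: the only question the RTree itself can answer. */
    boolean intersects(Envelope o) {
        return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
    }

    /** True when the box has collapsed to a single coordinate, as for a Point. */
    boolean isPoint() {
        return minX == maxX && minY == maxY;
    }
}
```

The diagonal street from (0,0) to (10,10) has the box (0,0)-(10,10), so the point (9,1) "hits" it even though it lies well away from the street. That is why an envelope match is only a candidate, and an exact geometry test has to follow.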
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
I think you need to differentiate between the bounding boxes of the data in the layer (stored in the database) and the bounding box of the search query. The search query is not stored in the database, and will not be seen as a node or nodes in the database. So if you want to search for data within some bounding box or polygon, express that in the search query, and you do not need to care about how your nodes are stored in the database. So when you say you want to make a larger bounding box, I assume you are talking about the query itself. The REST API has the method findGeometriesInLayer, which takes minx, maxx, miny, maxy parameters, and you can set those to whatever you want for your query. The REST API also exposes the CQL query language supported by GeoTools. This allows you to perform SQL-like queries on geometries and feature attributes. For example, you can search for all objects within a specific polygon (not just a rectangular bounding box) that also match certain attribute conditions. See http://docs.geoserver.org/latest/en/user/tutorials/cql/cql_tutorial.html for some examples of CQL. However, our current CQL support is not fully integrated with the RTree index. This means that the CQL itself will not benefit from the index, but will be a raw search. You can, however, still get the benefit of the index by passing in the bounding box separately. So, for example, say you want to search for data in a polygon. Make the polygon object, and get its bounding box and also the CQL query string. Then make a 'dynamic layer' using the CQL (which is a bit like making a prepared statement). Then call the same 'findGeometriesInLayer' method mentioned above, using the bounding box and the dynamic layer (containing the CQL). This has the effect of using the RTree index for a first approximate search, followed by pure CQL for the final mile. See examples of this in action in the unit tests in the source code.
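The two-stage search described above is the classic filter-and-refine pattern. A minimal plain-Java sketch on point data, assuming a simple polygon given as {x, y} vertices: the bounding-box check stands in for the RTree lookup (done here as a linear scan for brevity), and an even-odd ray-casting test stands in for the CQL "final mile". FilterRefine and its methods are hypothetical and not the Neo4j Spatial API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter-and-refine sketch: bounding-box filter first, exact polygon test second. */
final class FilterRefine {
    /** Even-odd ray casting; points exactly on the boundary may go either way. */
    static boolean inside(double[][] poly, double x, double y) {
        boolean in = false;
        for (int i = 0, j = poly.length - 1; i < poly.length; j = i++) {
            double xi = poly[i][0], yi = poly[i][1], xj = poly[j][0], yj = poly[j][1];
            // Edge j-i straddles the horizontal line through y and is crossed to the right of x.
            if ((yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi) {
                in = !in;
            }
        }
        return in;
    }

    static List<double[]> query(List<double[]> points, double[][] poly) {
        // Filter: the polygon's bounding box, the part an RTree search can answer.
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] v : poly) {
            minX = Math.min(minX, v[0]);
            maxX = Math.max(maxX, v[0]);
            minY = Math.min(minY, v[1]);
            maxY = Math.max(maxY, v[1]);
        }
        List<double[]> result = new ArrayList<>();
        for (double[] p : points) {
            if (p[0] < minX || p[0] > maxX || p[1] < minY || p[1] > maxY) continue; // filtered out
            if (inside(poly, p[0], p[1])) result.add(p);                            // refine step
        }
        return result;
    }
}
```

With the triangle (0,0), (10,0), (0,10), the point (9,9) passes the bounding-box filter but is rejected by the exact test, which is exactly the work the CQL stage does after the index.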
https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/ServerPluginTest.java#L109 has examples of CQL queries on the REST API. On Tue, Jun 7, 2011 at 5:48 PM, Boris Kizelshteyn bo...@popcha.com wrote: Thanks! So it seems you are saying that the bounding box represents a single point and is the same as the lat/lon? What if I make the bounding box bigger? What I am trying to do is geo queries against a bounding box made of a set of points, rather than individual points. So the query is: find the nodes where the given point falls inside their bounding boxes. Can I do this with REST? Thanks!
Re: [Neo4j] Sample Linear Referencing Functions in Neo4j Spatial and GSoC
Done. Although now we have 20 lines of comments for 1 line of method code. Previously we had 4 lines of comments for one line of code. Whew! On Tue, Jun 7, 2011 at 11:02 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Very cool. Maybe you could document the parameters more, rather than just pointing to the Oracle reference, so one can see it directly in the JavaDoc? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
Re: [Neo4j] GSoC 2011 Neo4j Geoprocessing | Weekly Report #2
I suggest you code-review them first, especially since there are API changes. On Tue, Jun 7, 2011 at 10:11 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Very nice Andreas! Do you consider it safe to pull these changes into the main repo? Cheers, /peter neubauer On Sun, Jun 5, 2011 at 1:39 PM, Andreas Wilhelm a...@kabelbw.de wrote: Hi, This week I implemented update and search capability for spatial functions, and the following spatial functions with JUnit tests: ST_AsText, ST_AsKML, ST_AsGeoJSON, ST_AsBinary and ST_Reverse. Best Regards Andreas
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
OK. I understand much better what you want now. Your person nodes are not geographic objects; they are persons that can be at many positions and indeed move around. However, the 'path' that they take is a geographic object and can be placed on the map and analysed geographically. So the question I have is: how do you store the path the person takes? Is this a bunch of position nodes connected back to that person? Or perhaps a chain of position-(next)-position-(next)-position, etc.? However you have stored this in the graph, you can express it as a geographic object by implementing the GeometryEncoder interface. See, for example, the 6 lines of code it takes to traverse a chain of NEXT locations and produce a LineString geometry in the SimpleGraphEncoder at https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82
If you do this, you can create a layer that uses your own geometry encoder (or the SimpleGraphEncoder I referenced above, if you use the same graph structure), and your own domain model will be expressed as LineString geometries on which you can perform spatial operations. Alternatively, if your data is more static in nature, and you are analysing only what the person did in the past, and the graph will therefore not change, perhaps you do not need to store the locations in the graph, and you can just import them as a LineString directly into a standard layer. Whatever route you take, the final action you want to perform is to find points near the LineString (the path the person took). I do not think the bounding box is the right approach for that either.
You need to try, for example, the method findClosestEdges in the utilities class at https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java#L115
This method can find the part of the person's path that is closest to the point of interest. There are also many other geographic operations you might be interested in trying, once you have a better feel for the types of queries you want to ask. Regards, Craig On Wed, Jun 8, 2011 at 2:17 AM, Boris Kizelshteyn bo...@popcha.com wrote: Thanks for the detailed response! Here is what I'm trying to do, and I'm still not sure how to accomplish it:
1. I have a node which is a person
2. I have geo data as that person moves around the world
3. I use the geodata to create a bounding box of where that person has been today
4. I want to ask: was this person A near location X today?
5. I do this by seeing if location X is in A's bounding box.
From looking at what you suggest doing, it's not clear how I assign the node person A to a layer. Is it that the bounding box is now in the layer and not in the node? The issue then becomes how do I associate the two, as the RTree relationship seems to establish itself on the bounding box between the node and the layer. Many thanks for your patience as I learn this challenging material.
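Putting Craig's two suggestions together for Boris's question ("was person A near location X today?"): follow the NEXT chain to get the path as an ordered list of coordinates, then find the path segment closest to X and compare its distance with a threshold. This is a plain-Java sketch, not the GeometryEncoder interface or findClosestEdges; Position is a hypothetical stand-in for a position node, and planar coordinates are assumed (lat/lon would need projecting first).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: build a person's path from a NEXT chain and test proximity to a point. */
final class PathProximity {
    /** Stand-in for a position node; 'next' plays the role of the NEXT relationship. */
    static final class Position {
        final double x, y;
        Position next;
        Position(double x, double y) { this.x = x; this.y = y; }
    }

    /** Follows NEXT links from the first position, producing the LineString vertices. */
    static List<double[]> toLineString(Position first) {
        List<double[]> coords = new ArrayList<>();
        for (Position p = first; p != null; p = p.next) coords.add(new double[] {p.x, p.y});
        return coords;
    }

    /** Distance from (px, py) to the segment a-b. */
    static double segmentDistance(double[] a, double[] b, double px, double py) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len2 = dx * dx + dy * dy;
        double t = len2 == 0 ? 0 : ((px - a[0]) * dx + (py - a[1]) * dy) / len2;
        t = Math.max(0, Math.min(1, t)); // clamp the projection onto the segment
        return Math.hypot(px - (a[0] + t * dx), py - (a[1] + t * dy));
    }

    /** Index i of the closest segment (vertices i and i+1), or -1 if there are no segments. */
    static int closestSegment(List<double[]> line, double px, double py) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < line.size(); i++) {
            double d = segmentDistance(line.get(i), line.get(i + 1), px, py);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    static boolean wasNear(Position first, double px, double py, double maxDistance) {
        List<double[]> line = toLineString(first);
        int i = closestSegment(line, px, py);
        return i >= 0 && segmentDistance(line.get(i), line.get(i + 1), px, py) <= maxDistance;
    }
}
```

A path with a single recorded position has no segments, so wasNear returns false for it in this sketch; a real implementation would probably fall back to point-to-point distance.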
Re: [Neo4j] neo4j-spatial
Hi Saikat, Yes, your explanation was clear, but I was busy with other work and failed to respond, my bad ;-) Anyway, your idea is nice. I can think of a few ways to model this in the graph, but at the end of the day the most important thing to decide first is: what queries are you going to perform? Do you want a creative map that, while not drawn to scale, can still be asked questions like 'how far is it from the roller-coaster to the closest lunch venue?'. That kind of question could make use of the graph and the spatial extensions to provide an answer and show the route on the creative map, even if it is not a real to-scale map. Is that what you want to see? You can also try contacting me on Skype. Regards, Craig On Thu, Jun 9, 2011 at 5:35 AM, Saikat Kanjilal sxk1...@hotmail.com wrote: Hi Craig, Following up on this thread, was this explanation clear? If so I'd like to talk in more detail. Regards From: sxk1...@hotmail.com To: user@lists.neo4j.org Subject: RE: [Neo4j] neo4j-spatial Date: Sun, 5 Jun 2011 20:15:27 -0700 Hey Craig, Thanks for responding. So, to be clear: a theme park can have its own map created by the graphic artists who work at the theme park company. This map is sometimes 2D and sometimes 3D, and really has no notion of lat/long coordinates or GPS. What I am proposing is that we have the ability to inject GPS coordinates into this creative map through some mechanism that knows the GPS coordinates of each point in the creative map. So that's where the Google map comes in: the Google or Bing map would potentially have lat/long coordinates of every point in a theme park, so now the challenge is how we transfer that knowledge into the 2D or 3D creative map, so that we can run neo4j traversal algorithms inside a map that has been injected with GPS data.
A theme park is just the beginning; imagine having the power to inject this information into any 2D or 3D map, that would be pretty amazing. In essence, I am doing this so that the creative map itself can use neo4j and be highly interactive and meaningful. Let me know if that's still unclear, and if so let's talk on Skype. Regards Date: Mon, 6 Jun 2011 01:13:08 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] neo4j-spatial Hi Saikat, This sounds worth discussing further. I think I need to hear more about your use case. I do not know what the term 'creative map' means, or what traversals you are planning to do. When you talk about 'plotting points', do you mean you have a GPS and are moving inside a real theme park and want to see this inside Google Maps? Or are you just drawing a path on an interactive GIS? Once I have some more understanding of your use case and the problem you are trying to solve, I am sure I will be able to give advice on how best to approach it, whether it relates to anything else we are doing, or whether this is something you would need to put some coding time into :-) Regards, Craig On Sun, Jun 5, 2011 at 8:26 PM, Saikat Kanjilal sxk1...@hotmail.com wrote: Craig et al, I have an interesting use case that I've been thinking about, and I was wondering if it would make a good candidate for inclusion in neo4j-spatial. I've read through the wiki (http://wiki.neo4j.org/content/Collaboration_on_Spatial_Projects) and was interested in using neo4j-spatial to take any creative 2D map and geo-enable it. To explain in more detail, let's say you are at a certain latitude and longitude in a theme park inside a Google map (or a Bing map). Now you want to have the ability to reference that same latitude and longitude inside a 2D or 3D creative map of that theme park, and then be able to plot these points and enable traversal algorithms inside the creative map.
I was wondering if you guys are thinking about this use case; if not, I'd love to work on it and discuss it in more detail to see whether it fits into the neo4j-spatial roadmap. Thoughts?
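One possible way to "inject" GPS positions into a creative map, offered as an assumption rather than anything proposed in the thread: pick three landmarks whose lat/lon and map-pixel positions are both known, solve for the affine transform between them, and apply it to any other GPS fix. This only holds if the artwork is roughly a scaled, rotated or sheared version of the real layout; a hand-drawn map that is not to scale would need a piecewise transform (for example one per triangle of control points). AffineGeoref is a hypothetical name, and lat/lon is treated as planar, which is reasonable over an area the size of a theme park.

```java
/** Hypothetical sketch: fit an affine transform (lon, lat) -> (px, py) from 3 control points. */
final class AffineGeoref {
    // px = a*lon + b*lat + c ; py = d*lon + e*lat + f
    final double a, b, c, d, e, f;

    private AffineGeoref(double a, double b, double c, double d, double e, double f) {
        this.a = a; this.b = b; this.c = c; this.d = d; this.e = e; this.f = f;
    }

    /** geo[i] = {lon, lat} and map[i] = {px, py} for three non-collinear control points. */
    static AffineGeoref fit(double[][] geo, double[][] map) {
        double[] abc = solve(geo, new double[] {map[0][0], map[1][0], map[2][0]});
        double[] def = solve(geo, new double[] {map[0][1], map[1][1], map[2][1]});
        return new AffineGeoref(abc[0], abc[1], abc[2], def[0], def[1], def[2]);
    }

    /** Cramer's rule for u*lon_i + v*lat_i + w = r_i, i = 0..2. */
    private static double[] solve(double[][] g, double[] r) {
        double[][] m = {{g[0][0], g[0][1], 1}, {g[1][0], g[1][1], 1}, {g[2][0], g[2][1], 1}};
        double det = det3(m[0], m[1], m[2]);
        if (det == 0) throw new IllegalArgumentException("control points must not be collinear");
        double[] x = new double[3];
        for (int col = 0; col < 3; col++) {
            double[][] mc = {m[0].clone(), m[1].clone(), m[2].clone()};
            for (int row = 0; row < 3; row++) mc[row][col] = r[row]; // swap in the right-hand side
            x[col] = det3(mc[0], mc[1], mc[2]) / det;
        }
        return x;
    }

    private static double det3(double[] r0, double[] r1, double[] r2) {
        return r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
             - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
             + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]);
    }

    /** Maps a GPS fix onto creative-map pixel coordinates. */
    double[] toMap(double lon, double lat) {
        return new double[] {a * lon + b * lat + c, d * lon + e * lat + f};
    }
}
```

Once every point of interest has map coordinates, the graph questions Craig mentions (such as the walk from the roller-coaster to the nearest lunch venue) can be answered on the graph and drawn on the creative map.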
Re: [Neo4j] Traversals versus Indexing
Think of your domain model graph as a kind of index. Traversing it should generally be faster than using a generic index like Lucene. Of course, some things do not graph well, and you should use Lucene for those. But if you can find something with a graph traversal, that is likely the way to go. You should also think about structuring the graph to suit the queries you plan to perform; that is how you optimize the traversals. On Jun 13, 2011 11:33 AM, espeed ja...@jamesthornton.com wrote: It depends on the traversal you are running. -- View this message in context: http://neo4j-user-list.438527.n3.nabble.com/Neo4j-Traversals-versus-Indexing-tp3057515p3057538.html Sent from the Neo4J User List mailing list archive at Nabble.com.
Re: [Neo4j] Auto Indexing for Neo4j
This is great news. Now I'm really curious about the next step, and that is allowing indexes other than Lucene. For example, the RTree index in neo4j-spatial was never possible to wrap behind the normal index API, because that was designed only for properties of nodes (and relationships), whereas the RTree is based on something completely different (complete spatial geometries). However, the new auto-indexing feature implies that any node can be added to an index without the developer needing to know anything about the index API. Instead, the index needs to know whether the node is appropriate for indexing. This is suitable for both Lucene and the RTree. So what I'd like to see is that when configuring auto-indexing in the first place, instead of just specifying properties to index, you could specify some indexer implementation that can be created and run internally. For example, perhaps you pass the classname of some class that implements some necessary interface, and then that class is instantiated, passed config properties, and used to index new or modified nodes. One method I could imagine this interface having would be a listener for change events, which decides whether or not the index should be activated for a node change. For the Lucene property index, this method would return true if the property exists on that node. For the RTree, this method would return true if the node contained the meta-data required for neo4j-spatial to recognize it as a spatial type. Alternatively, perhaps just an index method that does nothing when the node is not to be indexed, and indexes it when necessary? So, are we now closer to having this kind of support? On Tue, Jun 14, 2011 at 11:30 PM, Chris Gioran chris.gio...@neotechnology.com wrote: Good news everyone, A request that's often come up on the mailing list is a mechanism for automatically indexing properties of nodes and relationships.
As of today's SNAPSHOT, auto-indexing is part of Neo4j, which means nodes and relationships can now be indexed based on convention, requiring far less effort and code from the developer's point of view. Getting hold of an automatic index is straightforward:
AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
AutoIndex<Node> nodeAutoIndex = nodeAutoIndexer.getAutoIndex();
Once you've got an instance of AutoIndex, you can use it as a read-only Index<Node>. The AutoIndexer interface also supports runtime changes and enabling/disabling the auto-indexing functionality. To support the new features, there are new Config options you can pass to the startup configuration map in EmbeddedGraphDatabase, the most important of which are:
Config.NODE_AUTO_INDEXING (defaults to false)
Config.RELATIONSHIP_AUTO_INDEXING (defaults to false)
If set to true (independently of each other), these properties will enable auto-indexing functionality, and at the successful finish() of each transaction, all newly added properties on the primitives for which auto-indexing is enabled will be added to a special AutoIndex (and deleted or changed properties will be updated accordingly too). There are options for fine-grained control to determine which properties are indexed, default behaviors and so forth. For example, by default all properties are indexed. If you want only the properties name and age for Nodes, and since and until for Relationships, to be auto-indexed, simply set the initial configuration as follows:
Config.NODE_KEYS_INDEXABLE = name, age;
Config.RELATIONSHIP_KEYS_INDEXABLE=since, until;
For the semantics of the auto-indexing operations, constraints and more detailed examples, see the documentation available at http://docs.neo4j.org/chunked/1.4-SNAPSHOT/auto-indexing.html We're pretty excited about this feature since we think it'll make your lives as developers much more productive in a range of use-cases.
If you're comfortable with using SNAPSHOT versions of Neo4j, please try it out and let us know what you think - we'd really value your feedback. If you're happier with using packaged milestones then this feature will be available from 1.4 M05 in a couple of weeks from now.
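Craig's proposal, sketched as hypothetical Java types to show how one "should this change be indexed?" hook could serve both a property index and an RTree. None of AutoIndexerPlugin, PropertyIndexer, SpatialIndexer or IndexDispatcher exist in Neo4j; nodes are modelled as property maps, and the spatial indexer's "bbox" property is an assumed marker for neo4j-spatial meta-data.

```java
import java.util.*;

/** Hypothetical: the pluggable indexer proposed above (not a Neo4j API). */
interface AutoIndexerPlugin {
    /** Called for each changed node; decides whether this indexer cares about it. */
    boolean accepts(Map<String, Object> nodeProperties);

    /** Adds or updates the node in the underlying index. */
    void index(long nodeId, Map<String, Object> nodeProperties);
}

/** Lucene-style behaviour: index a node if it has the configured property. */
final class PropertyIndexer implements AutoIndexerPlugin {
    final String key;
    final Map<Object, Set<Long>> entries = new HashMap<>(); // property value -> node ids
    PropertyIndexer(String key) { this.key = key; }
    public boolean accepts(Map<String, Object> props) { return props.containsKey(key); }
    public void index(long id, Map<String, Object> props) {
        entries.computeIfAbsent(props.get(key), v -> new HashSet<>()).add(id);
    }
}

/** RTree-style behaviour: index a node only if it carries (assumed) spatial meta-data. */
final class SpatialIndexer implements AutoIndexerPlugin {
    final Map<Long, double[]> envelopes = new HashMap<>(); // stand-in for the RTree
    public boolean accepts(Map<String, Object> props) { return props.get("bbox") instanceof double[]; }
    public void index(long id, Map<String, Object> props) { envelopes.put(id, (double[]) props.get("bbox")); }
}

/** Dispatches each committed node change to every configured indexer that accepts it. */
final class IndexDispatcher {
    final List<AutoIndexerPlugin> indexers;
    IndexDispatcher(List<AutoIndexerPlugin> indexers) { this.indexers = indexers; }
    void onNodeChanged(long id, Map<String, Object> props) {
        for (AutoIndexerPlugin ix : indexers) if (ix.accepts(props)) ix.index(id, props);
    }
}
```

Configuring by class name, as suggested, could then be a matter of Class.forName(name).getDeclaredConstructor().newInstance() followed by passing in the config properties.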
Re: [Neo4j] Slow Traversals on Nodes with too many Relationships
Could this also be related to the possibility that, in order to determine relationship type and direction, the relationships need to be loaded from disk? If so, then having a large number of relationships on the same node would decrease performance, if the number was large enough to affect the disk IO caching. If this is the case, perhaps adding a proxy node for the incoming relationships would work around the problem? Of course, then you have doubled the number of part nodes (two for each part: the part itself and a proxy for its containers). On Wed, Jun 15, 2011 at 10:27 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: I would respectfully disagree that it doesn't represent production usage, since in some cases each query/traversal will be unique and isolated to a part of a subgraph, so a cold query may be the norm. -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Michael Hunger Sent: Wednesday, June 15, 2011 10:25 AM To: Neo4j user discussions Subject: Re: [Neo4j] Slow Traversals on Nodes with too many Relationships That is rather a case of warming up your caches. Determining the traversal speed from the first run is not a good benchmark, as it doesn't represent production usage :) The same (warming up) is true for all kinds of benchmarks (except for startup performance benchmarks). Cheers Michael On 15.06.2011 at 14:48, Agelos Pikoulas wrote: I have a few Part nodes related to each other via HASPART relationships/edges (e.g. Part1---HASPART---Part2---HASPART---Part3, etc.). The TraversalDescription works fine, following each Part's outgoing HASPART relationships. Then I add a large number (say 100.000) of Container nodes, where each Container has a CONTAINS relationship to almost *every* Part node. Hence each Part node now has 100.000 incoming CONTAINS relationships from Container nodes, but only a few outgoing HASPART relationships to other Part nodes.
Now my previous TraversalDescription runs extremely slowly (several seconds inside each Iterator<Path>.next() call). Note that I do define relationships(RT.HASPART, Direction.OUTGOING) on the TraversalDescription, but it seems it's not used by neo4j as a hint. Note also that on a subsequent run of the same traversal, it's very quick indeed. Is there any way to use indexing on relationships for such a scenario, to speed things up? Ideally, the traversal framework could use automatic/declarative indexing on relationship types and/or direction to perform such traversals more quickly. Regards
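Why separating the relationship types should help, shown as a hypothetical in-memory model in plain Java: each node keeps its relationships in buckets keyed by (type, direction), so expanding HASPART OUTGOING reads only that bucket, no matter how many CONTAINS relationships arrive. This models the idea behind the proxy-node workaround and the type/direction request above; BucketedNode is a hypothetical name, and this is not a description of Neo4j's store format.

```java
import java.util.*;

/** Hypothetical sketch: a node whose relationships are bucketed by type and direction. */
final class BucketedNode {
    enum Direction { OUTGOING, INCOMING }
    record Key(String type, Direction dir) {}

    final String name;
    private final Map<Key, List<BucketedNode>> rels = new HashMap<>();

    BucketedNode(String name) { this.name = name; }

    static void relate(BucketedNode from, String type, BucketedNode to) {
        from.rels.computeIfAbsent(new Key(type, Direction.OUTGOING), k -> new ArrayList<>()).add(to);
        to.rels.computeIfAbsent(new Key(type, Direction.INCOMING), k -> new ArrayList<>()).add(from);
    }

    /** Only the matching bucket is read; other relationship types are never touched. */
    List<BucketedNode> expand(String type, Direction dir) {
        return rels.getOrDefault(new Key(type, dir), List.of());
    }

    /** Depth-first walk along one outgoing relationship type, returning node names in visit order. */
    static List<String> traverse(BucketedNode start, String type) {
        List<String> seen = new ArrayList<>();
        Deque<BucketedNode> stack = new ArrayDeque<>(List.of(start));
        Set<BucketedNode> visited = new HashSet<>();
        while (!stack.isEmpty()) {
            BucketedNode n = stack.pop();
            if (!visited.add(n)) continue;
            seen.add(n.name);
            for (BucketedNode m : n.expand(type, Direction.OUTGOING)) stack.push(m);
        }
        return seen;
    }
}
```

The proxy-node workaround gets a similar separation inside the graph itself, by moving the CONTAINS relationships onto a second node per part.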
Re: [Neo4j] Most Efficient way to query in my use cases
Another common thing to do in this case is to create a node for the purchase action. This node would be related to the purchaser (user), the item (pen) and the shop, and would contain data appropriate to the purchase (date/time, price, etc.). Then traverse from the shop or the pen to all purchase actions that reference the other one (shop or pen). On Thu, Jun 16, 2011 at 4:48 AM, Jim Webber j...@neotechnology.com wrote: Hi Manav, I think there's a relationship missing here: Pen--SOLD_BY--shop. That way it's easy to find all the pens that a shop sold, and who it sold them to. In general, modelling your domain expressively does not come at an increased cost with Neo4j (caveat: you can still create write hotspots). Jim
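The "purchase node" model, sketched in plain Java: each Purchase record stands in for a node related to the buyer, the item and the shop, carrying the purchase data. The query walks from the shop to its purchases and keeps those that reference the item. All names are hypothetical, and an in-memory map plays the part of the shop's relationships to its purchase nodes.

```java
import java.time.LocalDate;
import java.util.*;

/** Hypothetical sketch: a purchase modelled as its own node linking buyer, item and shop. */
final class Purchases {
    record Purchase(String buyer, String item, String shop, LocalDate date, double price) {}

    // Stand-in for the shop node's relationships to its purchase nodes.
    private final Map<String, List<Purchase>> byShop = new HashMap<>();

    void add(Purchase p) {
        byShop.computeIfAbsent(p.shop(), k -> new ArrayList<>()).add(p);
    }

    /** From the shop, walk to each purchase and keep the buyers of those that reference the item. */
    List<String> buyersOf(String shop, String item) {
        List<String> buyers = new ArrayList<>();
        for (Purchase p : byShop.getOrDefault(shop, List.of()))
            if (p.item().equals(item)) buyers.add(p.buyer());
        return buyers;
    }
}
```

Because date and price live on the purchase itself, questions like "what did this shop charge for pens last week" need no extra relationships.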
Re: [Neo4j] Slow Traversals on Nodes with too many Relationships
I understood that on Windows the memory-mapped sizes needed to be included in the heap, since they are not allocated outside the heap as they are on Linux/Mac. So in this case he needs a larger heap (and should make sure the memory-mapped files are much smaller than the heap). The relevant part of the configuration settings doc says: When running Neo4j on Windows the size of the memory-mapped nioneo configurations need to be added to the heap size parameter. On Linux and Unix-systems memory mapped IO is not included in the heap size. I still think that the solution in this case is to group the different relationship types into separate sub-graphs, so that the performance of traversing HASPART is not affected by the number of CONTAINS relationships. Of course, traversing CONTAINS will still be slow without increasing the cache, as you suggest. On Thu, Jun 16, 2011 at 12:07 AM, Michael Hunger michael.hun...@neotechnology.com wrote: Agelos, sorry, I didn't mean to sound that way. 512M RAM is not very much for larger graphs. Neo4j has to cache nodes and relationships in the heap, as well as your own data structures. The memory-mapped files for the datastores are kept outside the heap. Normally, with your 4G, I'd suggest using about 1.5G for the heap and 1.5G for the memory-mapped files. http://wiki.neo4j.org/content/Configuration_Settings Do you have a small test case available that creates your graph and runs your traversal? Then I could have a look at it and also do some profiling to determine the cause of this slowdown. Indexing doesn't help, as it also has to hit caches or disk. Graph traversal is normally a very efficient operation that shouldn't experience this bad performance. Cheers Michael P.S. I just use my mail client for handling the mailing list and it works fine for me. Imho Gmail groups threads automatically.
On 15.06.2011 at 17:40, Agelos Pikoulas wrote: Re: [Neo4j] Slow Traversals on Nodes with too many Relationships I have to respectfully agree with Rick Bullotta. I suspected the big-O is not linear in this case. To verify, I added 4x as many Container nodes (400.000) and their corresponding relationships, and it is now *unbelievably* slow: it does not take 4x longer, it takes more than 30-40 seconds for each next(). Remember, with 100K nodes it was ~2 secs for each next()! And to make matters worse, the subsequent runs weren't fast either; they actually took more time than the first (1st TotalTraversalTime = 389936ms, 2nd TotalTraversalTime = 443948ms). The whole setup is running in Eclipse 3.6, with -Xmx512m on the Java VM, on a Windows 2003 VMware machine with 4GB, running on a fast 2nd-gen SSD (OCZ Vertex 2). The Neo4j data resides on this SSD. The 100.000-node data files were ~250MB; the 400.000 one is ~1GB. I wonder what would happen if the Container nodes were a few million (which will be my case): it would run forever. Could you please look into my suggestion, i.e. using 'smart' behind-the-scenes indexing on both *RelationshipType* and *Direction* that traversals actually use to speed things up? On another topic, how does one use this mailing list? I use it through Gmail and I am utterly lost. Is there a better client/UI to actually post/reply to threads?
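The heap rule Craig quotes can be turned into a small back-of-the-envelope helper, using Michael's suggested split for a 4G machine (1.5G heap for caches, 1.5G for memory-mapped files). It only encodes the rule as quoted, not how Neo4j actually allocates memory, and `HeapBudget` and its method names are hypothetical.

```java
/**
 * Encodes the rule quoted from the configuration docs: on Windows the
 * memory-mapped store sizes must be added to the heap (-Xmx); on Linux/Unix
 * they are allocated outside it. Hypothetical helper, sizes in MB.
 */
final class HeapBudget {
    /** -Xmx in MB for a given object-cache heap and total memory-mapped size. */
    static long xmxMb(long cacheHeapMb, long mappedMb, boolean windows) {
        return windows ? cacheHeapMb + mappedMb : cacheHeapMb;
    }

    /** Heap plus mapped files: the total that must fit in RAM on either platform. */
    static long footprintMb(long cacheHeapMb, long mappedMb) {
        return cacheHeapMb + mappedMb;
    }
}
```

Under this rule, Michael's split means roughly -Xmx3072m on Windows versus -Xmx1536m on Linux, with the same ~3G total either way; the -Xmx512m used in Agelos's setup is well below both.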
Re: [Neo4j] Auto Indexing for Neo4j
I am using only one relationship type in my index tree, and make traversal decisions based on properties of the tree nodes, but have considered an 'optimization' based on embedding the index keys into the relationship types, which I think is what you did. However, I am not convinced it will work well, because I suspect there will be losses if the total number of relationship types gets very high. I think this is a separate issue from the total number of relationships, but it might affect all traversers, since there must exist a hashmap of all relationship types. Still, what Peter says below is very cool, because if all these 'experiments' with in-graph indexes can be put behind the standard index API, then we can get much more testing of this approach, and hopefully learn what we need to make this a viable solution for wide use. On Wed, Jun 15, 2011 at 4:56 AM, Michael Hunger michael.hun...@neotechnology.com wrote: A problem with a probably dumb index in a graph that I created for an experiment was the performance of getAllRelationships on that machine (it was a very large graph with all nodes being indexed). It was a mapping from long values to nodes; my simplistic approach just chopped the long values into chunks of 3 digits and used those 3 digits as relationship types (i.e. 1000 additional relationship types) to form a tree which pointed to the node in question at the end. Will have to investigate that further. On 14.06.2011 at 23:43, Peter Neubauer wrote: Craig, the autoindexing is one step in this direction. The other is to enable the Spatial and other in-graph indexes like the graph-collections (timeline etc.) to be treated like normal index providers. When that is done (I will talk to Mattias, who is coming back from vacation tomorrow, about that), we are in a position to think about more complex autoindex providers.
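Michael's experiment (long keys chopped into 3-digit chunks, each chunk used as a relationship type, forming a tree down to the indexed node) could look roughly like this for the key-to-path step. His code is not in the thread, so `DigitChunkPath`, the fixed tree depth and the zero padding are assumptions, and the graph writes are omitted.

```java
import java.util.ArrayList;
import java.util.List;

final class DigitChunkPath {
    /**
     * Splits a non-negative id into zero-padded 3-digit chunks, most significant first.
     * Each chunk ("000".."999") would name one of at most 1000 relationship types,
     * and the chunk sequence is the path from the tree root to the indexed node.
     */
    static List<String> chunks(long id, int levels) {
        if (id < 0) throw new IllegalArgumentException("negative id");
        List<String> path = new ArrayList<>();
        long remaining = id;
        for (int i = 0; i < levels; i++) {
            path.add(0, String.format("%03d", remaining % 1000));
            remaining /= 1000;
        }
        if (remaining != 0) {
            throw new IllegalArgumentException("id needs more than " + levels + " levels");
        }
        return path;
    }
}
```

For example, `chunks(1234567L, 3)` gives the path `001 / 234 / 567`. The 1000 possible chunk values per level are where the 1000 extra relationship types come from, which is exactly the global type count Craig is wary of.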
Also, treating Neo4j Spatial and other graph structures as index providers would hook them into the index framework and expose them to higher-level queries like Cypher and Gremlin, e.g. combining a spatial bounding box geometry search with a graph traversal for suitable properties that are less than 2 kilometers from the nearest school, sorting the results, returning only price and lat as columns, the 3 topmost hits:
START geom = (index:spatial:'BBOX(the_geom, -90, 40, -60, 45)')
MATCH (geom)--(fast), (fast)-[r, :NEAR]-(school)
WHERE fast.rooms > 4 AND school.classes > 4 AND r.length < 2
RETURN fast.pic?, fast.lon?, fast.lat?
SORT BY fast.price, fast.lat^
SLICE 3
So, I think the next step is to make in-graph indexing structures plug into the index framework, and then into autoindexing :) Cheers, /peter neubauer On Tue, Jun 14, 2011 at 5:49 PM, Craig Taverner cr...@amanzi.com wrote: This is great news. Now I'm really curious about the next step, and that is allowing indexes other than Lucene. For example, the RTree index in neo4j-spatial was never possible to wrap behind the normal index API, because that was designed only for properties of nodes (and relationships), but the RTree is based on something completely different (complete spatial geometries). However, the new auto-indexing feature implies that any node can be added to an index without the developer needing to know anything about the index API. Instead, the index needs to know if the node is appropriate for indexing. This is suitable for both Lucene and the RTree.
So what I'd like to see is that when configuring auto-indexing in the first place, instead of just specifying properties to index, specify some indexer implementation that can be created and run internally. For example, perhaps you pass the classname of some class that implements some necessary interface, and then that is instantiated, passed config properties, and used to index new or modified nodes. One method I could imagine this interface having would be a listener for change events to be evaluated for whether or not the index should be activated for a node change. For the lucene property index, this method would return true if the property exists on that node. For the RTree this method would return true if the node contained the meta-data required for neo4j-spatial to recognize it as a spatial type? Alternatively just an index method that does nothing when the nodes are not to be indexed, and indexes when necessary? So, are we now closer to having this kind of support? On Tue, Jun 14, 2011 at 11:30 PM, Chris
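Craig's proposal (configure auto-indexing with an indexer class name, instantiate it, pass it config properties, and let it decide per node change) might be sketched as follows. None of these types exist in Neo4j: `NodeIndexer`, `PropertyIndexer`, `GeometryIndexer`, `IndexerFactory`, the `keys`/`geometryKey` config entries and the `wkt` property are all hypothetical, and the actual index writes are omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical plug-in contract: decide, per changed node, whether to index it. */
interface NodeIndexer {
    void configure(Map<String, String> config);
    boolean shouldIndex(Map<String, Object> nodeProperties);
}

/** Lucene-like behaviour: index when any configured property key is present. */
final class PropertyIndexer implements NodeIndexer {
    private final Set<String> keys = new HashSet<>();

    @Override public void configure(Map<String, String> config) {
        for (String k : config.getOrDefault("keys", "").split(",")) {
            if (!k.isBlank()) keys.add(k.trim());
        }
    }

    @Override public boolean shouldIndex(Map<String, Object> props) {
        for (String k : keys) {
            if (props.containsKey(k)) return true;
        }
        return false;
    }
}

/** RTree-like behaviour: index only nodes carrying an (assumed) geometry property. */
final class GeometryIndexer implements NodeIndexer {
    private String geometryKey = "wkt"; // hypothetical default key

    @Override public void configure(Map<String, String> config) {
        geometryKey = config.getOrDefault("geometryKey", geometryKey);
    }

    @Override public boolean shouldIndex(Map<String, Object> props) {
        return props.get(geometryKey) instanceof String;
    }
}

final class IndexerFactory {
    /** Instantiates an indexer from its class name and passes it its config. */
    static NodeIndexer create(String className, Map<String, String> config) {
        try {
            NodeIndexer indexer = (NodeIndexer) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
            indexer.configure(config);
            return indexer;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot create indexer " + className, e);
        }
    }
}
```

This follows the "listener that evaluates each change" variant; the alternative Craig mentions (an index method that silently ignores irrelevant nodes) would fold `shouldIndex` into the write call.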
Re: [Neo4j] More spatial questions
Hi Nolan, I think I can answer a few of your questions. Firstly, some background. The graph model of the OSM data is based largely on the XML-formatted OSM documents, and there you will find 'nodes', 'ways', 'relations' and 'tags', each as its own XML tag, and as a consequence each will also have its own neo4j-node in the graph. Another point is that the geometry can be based on one or more nodes or ways, and so we always create another node for the geometry, and link it to the osm-node, way or relation that represents that geometry. What all this boils down to is that you cannot find the tags on the geometry node itself. You cannot even find the location on that node. If you want to use the graph model in a direct way, as you have been trying, you really do need to know how the OSM data is modeled. For example, for a LineString geometry, you would need to traverse from the geometry node to the way node and finally to the tags node (to get the tags). Getting to the locations is even more complex. Rather than do that, I would suggest that you work with the OSM API we provide with the OSMLayer, OSMDataset and OSMGeometryEncoder classes. Then you do not need to know the graph model at all. For example, OSMDataset has a method for getting a Way object from a node, and the returned object can be queried for its nodes, geometry, etc. Currently we provide methods for returning neo4j-nodes as well as objects that make spatial sense. One minor issue here is the ambiguity inherent in the fact that both neo4j and OSM use the term 'node', but for different things. We have various solutions to this, sometimes replacing 'node' with 'point' and sometimes prefixing with 'osm'. The unit tests in TestsForDocs include some tests for the OSM API. My first goal is to find the nearest OSM node to a given lat, lon. My attempts seem to be made of fail thus far, however.
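The geometry → way → tags path Craig describes can be mimicked with a tiny toy graph, to show why one hop from the geometry node is not enough. This is not the neo4j-spatial API: `OsmToyGraph` is hypothetical, the relationship directions are assumptions (GEOM is taken as pointing from the way to its geometry, matching the `Direction.INCOMING` GEOM lookup Nolan uses, and TAGS from the way to its tags), and each node holds at most one relationship per type.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Toy graph (NOT neo4j-spatial) with the OSM layout described above. */
final class OsmToyGraph {
    enum Rel { GEOM, TAGS }

    static final class N {
        final Map<String, Object> props = new HashMap<>();
        final Map<Rel, N> out = new EnumMap<>(Rel.class); // one relationship per type, for brevity
        final Map<Rel, N> in = new EnumMap<>(Rel.class);
    }

    static void link(N from, Rel type, N to) {
        from.out.put(type, to);
        to.in.put(type, from);
    }

    /** Two steps: geometry <-GEOM- way, then way -TAGS-> tags. */
    static Map<String, Object> tagsForGeometry(N geometry) {
        N way = geometry.in.get(Rel.GEOM);
        if (way == null) return Map.of();
        N tags = way.out.get(Rel.TAGS);
        return tags == null ? Map.of() : tags.props;
    }
}
```

Stopping after the first step lands on the way node, whose properties (such as `way_osm_id` or `oneway`) look tag-like but are not the tags. As Craig says, the OSMDataset/OSMLayer API hides this layout altogether.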
Here's my code: Most of the OSM dataset is converted into LineStrings, and what you really want to do is find the closest vertex of the closest LineString. We have a utility function 'findClosestEdges' in the SpatialTopologyUtils class for that. The unit tests in TestSpatialUtils, and the testSnapping() method in particular, show its use. My thinking is that nodes should be represented as points, so I can't see why this fails. When I run this in a REPL, I do get a node back. So far so good. Next, I want to get the node's tags. So I run: The spatial search will return 'geometries', which are spatial objects. In neo4j-spatial every geometry is represented by a unique node, but that node is not required to contain coordinates or tags. That is up to the GeometryEncoder. In the case of the OSM model, this information is elsewhere, because of the nature of the OSM graph, which is a highly interconnected network of points, most of which do not represent Point geometries but are part of much more complex geometries (streets, regions, buildings, etc.). n.getSingleRelationship(OSMRelation.TAGS, Direction.INCOMING) The geometry node is not connected directly to the tags node. You need two steps to get there. But again, rather than figure out the graph yourself, use the API. In this case, instead of getting the geometry node from the SpatialDatabaseRecord, just get the properties using getPropertyNames and getProperty(String). This API works the same on all kinds of spatial data, and in the case of OSM data will return the TAGS, since those are interpreted as attributes of the geometries. n.getSingleRelationship(OSMRelation.GEOM, Direction.INCOMING).getOtherNode(n).getPropertyKeys() I see what appears to be a series of tags (oneway, name, etc.). Why are these being returned for OSMRelation.GEOM rather than OSMRelation.TAGS? These are not the tags. Now you have found the node representing an OSM 'Way'.
This has a few properties on it that are relevant to the way: the name, whether the street is oneway or not, etc. Sometimes these are based on values in the tags, but they are not the tags themselves. This node is connected to the geometry node and the tags node, so you were halfway there (to the tags, that is). You started at the geometry node and stepped over to the way node, and one more step (this time with the TAGS relationship) would have got you to the tags. But again, I advise against trying to explore the OSM graph on your own. As you have already found, it is not completely trivial. What you should have done is access the attributes directly from the search results. Additionally, I see the property way_osm_id, which clearly isn't a tag. It would also seem to indicate that this query returned a way rather than a node like I'd hoped. This conclusion is further borne out by the tag names. So clearly I'm not getting the search correct. But beyond that, the way being returned by this search isn't close to the lat, lon I provided. What am I missing? The lat/long values are quite a bit deeper in the graph. In the case
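The "closest vertex of the closest LineString" approach can be sketched in plain Java, to show why a search that returns the nearest geometry (a whole way) is only half of the job. This is not `SpatialTopologyUtils.findClosestEdges`, whose signature isn't shown in the thread; `ClosestVertex` is hypothetical and treats lon/lat as planar x/y, which is only a rough approximation over small areas.

```java
import java.util.List;

/** Planar "closest vertex of the closest line" search; lon/lat treated as flat x/y. */
final class ClosestVertex {
    record Pt(double x, double y) {}

    static double dist2(Pt a, Pt b) {
        double dx = a.x() - b.x(), dy = a.y() - b.y();
        return dx * dx + dy * dy;
    }

    /** Squared distance from p to segment a-b (projection clamped to the segment). */
    static double segDist2(Pt p, Pt a, Pt b) {
        double dx = b.x() - a.x(), dy = b.y() - a.y();
        double len2 = dx * dx + dy * dy;
        double t = len2 == 0 ? 0 : ((p.x() - a.x()) * dx + (p.y() - a.y()) * dy) / len2;
        t = Math.max(0, Math.min(1, t));
        return dist2(p, new Pt(a.x() + t * dx, a.y() + t * dy));
    }

    /** Step 1: pick the line with the nearest edge. Step 2: nearest vertex on that line. */
    static Pt closestVertex(List<List<Pt>> lines, Pt q) {
        List<Pt> bestLine = null;
        double bestEdge = Double.POSITIVE_INFINITY;
        for (List<Pt> line : lines) {
            for (int i = 0; i + 1 < line.size(); i++) {
                double d = segDist2(q, line.get(i), line.get(i + 1));
                if (d < bestEdge) {
                    bestEdge = d;
                    bestLine = line;
                }
            }
        }
        if (bestLine == null) throw new IllegalArgumentException("no line has an edge");
        Pt best = bestLine.get(0);
        for (Pt v : bestLine) {
            if (dist2(q, v) < dist2(q, best)) best = v;
        }
        return best;
    }
}
```

The vertex returned is the nearest one on the nearest line, which need not be the globally nearest vertex; that is the trade-off of the two-step approach Craig describes.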
Re: [Neo4j] neo4j-spatial roadmap/stability
Hi Christopher, Thanks for your interest in neo4j and neo4j-spatial. I will answer your questions and comments inline. I am working for the largest German-speaking travel and holiday portal. Currently we are using a relatively simple MySQL-based spatial distance functionality. We plan to enhance this with something capable of a flexible set of spatial queries. We will evaluate Neo4j-Spatial for that and benchmark it against PostGIS/PostgreSQL. This would be a very interesting application for neo4j-spatial. I'm sure we could support you in that. Obviously it is not as mature as PostGIS, but I think it is very suitable for flexible queries, especially if you plan to combine a complex domain model with spatial data, or expose a spatial element to existing domains. I found some roadmap descriptions in the Neo4j Wiki (http://wiki.neo4j.org/content/Neo4j_Spatial_Project_Plan), but I am not sure that these are still valid. Craig said (somewhere) that Neo4j Spatial is still alpha (I hope that this means that only the interfaces are still unstable). And I know that neo4j-spatial is an open source project where there is no Neo Technology responsibility. The project plan you found was unfortunately the original plan put down before neo4j-spatial really started, and represents the expectations for 2010. Most of these were met, and several other capabilities were achieved in addition. I will edit the wiki to more accurately reflect the current status of the project. However, it is still true that it is in an alpha state. The APIs are likely to change. Since last September we have viewed it as an alpha release, available for people to try out and provide feedback on. We believe it is capable of many useful tasks, and can be used for real applications. But it has not been in the 'wild' for long, and so there are probably remaining bugs and performance issues.
In addition, as mentioned before, we will almost certainly change the APIs a little as we receive more feedback and move the system forward. Already in 2011 there have been three new additions influencing the API: the SimplePointLayer for LBS and related capabilities, the beginnings of the REST API for inclusion in Neo4j-Server, and the Geoprocessing features. Can you drop a few words about the Spatial roadmap, its stability and planned licensing (all based on using it on a high-volume web site)? I think we need Peter's opinion on the licensing. I believe it is currently the same as neo4j itself. The code comments state AGPL, and I am not sure if the recent decision to move the core to GPL applies to the spatial code. For the roadmap we will also update the wiki pages. Currently the efforts are to:
- Improve the OSM model API (some basic API for exploring the OSM ways and nodes is already in place, but needs some refinement)
- Improve the REST API for spatial (we have some customers trying this out, and will make enhancements based on their feedback)
- Integrate the spatial index into the new automatic indexing feature of Neo4j (an initial prototype of this is in place, and will be refined for the 1.5 release of Neo4j)
- Improve Geoprocessing support, particularly on the OSM model. This involves a GSoC project and will be presented at FOSS4G in Denver this year. See http://2011.foss4g.org/sessions/geoprocessing-neo4j-spatial-and-osm
Regards, Craig
Re: [Neo4j] Neo4j -- Can it be embedded in Android?
I heard that Peter Neubauer made a port of neo4j to android a few years ago, but that nothing has been done since and no version since then would work. So my understanding is that it does not work on android, but that it is possible to make it work (with some work ;-). Peter is away, but I expect he would have a better answer than me. On Fri, Jun 24, 2011 at 1:33 PM, Sidharth Kshatriya sid.kshatr...@gmail.com wrote: Dear All, I have googled for this on the web and did not arrive at a satisfactory answer. *Question: Is it possible to run Neo4j on Android? * Thanks, Sidharth -- Sidharth Kshatriya www.sidk.info
Re: [Neo4j] Neo4j -- Can it be embedded in Android?
Personally, what I would like to see is a sub-graph approach, with the Android device storing a sub-graph of the main database and updating it asynchronously with the server. It seems like something that can be done in a domain-specific way, but is much harder to do generically. I wanted this for OSM, with the local OSM graph on the Android device representing a local map supporting fast LBS services, and automatically updating from the main OSM graph on a big central server as the user travels. On Fri, Jun 24, 2011 at 2:56 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: I think the limited capabilities of the Android device(s) (RAM, primarily) limit the usefulness of Neo4J versus alternatives, since the datasets are usually small and simple in mobile apps. If we need any heavy-duty graph work for a mobile app, we'd do it on the server. -----Original Message----- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Sidharth Kshatriya Sent: Friday, June 24, 2011 8:53 AM To: Neo4j user discussions Subject: Re: [Neo4j] Neo4j -- Can it be embedded in Android? Yes, I saw that in the mailing list archives too. I would have thought there would be some interest in using this on Android, but there seems to be no news about it since... On Fri, Jun 24, 2011 at 6:13 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: I remember something like that, too. The main issue is probably the non-traditional file system that Android exposes. -----Original Message----- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Craig Taverner Sent: Friday, June 24, 2011 8:37 AM To: Neo4j user discussions Subject: Re: [Neo4j] Neo4j -- Can it be embedded in Android? I heard that Peter Neubauer made a port of neo4j to android a few years ago, but that nothing has been done since and no version since then would work. So my understanding is that it does not work on android, but that it is possible to make it work (with some work ;-).
[Neo4j] Recent slowdown in imports with lucene
Hi, Has anyone noticed a slowdown of imports into neo4j with recent snapshots? Neo4j-spatial importing OSM data (which uses Lucene to find matching nodes for ways) is suddenly running much slower than usual on non-batch imports. For most of my medium-sized test cases, I normally have surprisingly similar import times for the batch inserter and non-batch inserter (EmbeddedGraphDatabase) versions of the OSM import, but in recent runs the normal API is now more than 10 times slower: down to 70 nodes per second, which is insanely slow. Any idea if there is something in the recent snapshots for me to look into? Reproducing the problem simply requires running the TestOSMImport test cases in neo4j-spatial. I have only tried this on my laptop, so I have not ruled out that there is something local going on. Regards, Craig
Re: [Neo4j] Recent slowdown in imports with lucene
Sorry for the lack of details. I wrote the email late at night, as I am doing again. Anyway, the relevant code on GitHub is OSMImporter.java (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java). When adding nodes to the graph, it also adds the osm-id to a Lucene index. There is no Index#remove call, only multiple Index#add calls within the same transaction. In fact we call index.add and index.get for one index (osm changesets), while calling index.add on another (osm-nodes). The relevant lines of code are 812 for adding new OSM nodes to the graph, and 914 for finding changesets in a different index. I have not investigated in which version of neo4j the slowdown started, or whether there is some other cause. I will try to find time to do that later this week. But I thought I should ask on the list anyway, in case anyone else has a similar problem, or if there are some obvious answers. On Sun, Jun 26, 2011 at 1:45 PM, Mattias Persson matt...@neotechnology.com wrote: Please elaborate on how you are using your index. Are you using Index#remove(entity,key) or Index#remove(entity) followed by get/query in the same tx? There was a recent change in the transactional state implementation, where a full representation (in-memory Lucene index) was needed for it to be able to return accurate results in some corner cases. That change could slow things down, but not that much though. I'll give some different scenarios a go and see if I can find some culprit for this. But again, a little more information would be useful, as always. 2011/6/26 Craig Taverner cr...@amanzi.com Hi, Has anyone noticed a slowdown of imports into neo4j with recent snapshots? Neo4j-spatial importing OSM data (which uses Lucene to find matching nodes for ways) is suddenly running much slower than usual on non-batch imports.
For most of my medium-sized test cases, I normally have surprisingly similar import times for the batch inserter and non-batch inserter (EmbeddedGraphDatabase) versions of the OSM import, but in recent runs the normal API is now more than 10 times slower: down to 70 nodes per second, which is insanely slow. Any idea if there is something in the recent snapshots for me to look into? Reproducing the problem simply requires running the TestOSMImport test cases in neo4j-spatial. I have only tried this on my laptop, so I have not ruled out that there is something local going on. Regards, Craig -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
Re: [Neo4j] Recent slowdown in imports with lucene
Hi again, My apologies, but I have found the problem, and it is in the OSMImporter itself, nothing to do with Lucene or Neo4j. Peter made a commit (https://github.com/neo4j/neo4j-spatial/commit/b5e0f1d1a11ed9c8b2b8074f529362a1607a7643#src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java) in May that, while at first glance it appears to be a cleanup of my code (removal of string literals), did have two meaningful changes I only saw on deeper inspection:
- Addition of the map 'type: exact' to the index creation (when I removed this, node creation improved from 70/s to 140/s)
- User control over the commit size (previously I had hard-coded this to 5000 nodes per tx). There was a small but significant bug in the commit size: the new user parameter was not used to initialize anything, with the consequence that every node was committed individually. Setting the block size back to 5000 increased the node creation rate to nearly 1 (over 100 times faster). That is a serious improvement.
Sorry again for wasting space on the list. I'm glad this was a user error, though, not a neo4j issue :-) Regards, Craig On Mon, Jun 27, 2011 at 12:54 AM, Craig Taverner cr...@amanzi.com wrote: Sorry for the lack of details. I wrote the email late at night, as I am doing again. Anyway, the relevant code on GitHub is OSMImporter.java (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java). When adding nodes to the graph, it also adds the osm-id to a Lucene index. There is no Index#remove call, only multiple Index#add calls within the same transaction. In fact we call index.add and index.get for one index (osm changesets), while calling index.add on another (osm-nodes). The relevant lines of code are 812 for adding new OSM nodes to the graph, and 914 for finding changesets in a different index. I have not investigated in which version of neo4j the slowdown started, or whether there is some other cause.
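A minimal sketch of commit batching, assuming a simple counter-based design. OSMImporter's actual fields are not shown in the thread, so `BatchCommitter` is a hypothetical name and the transaction is simulated by a commit counter. One plausible way the bug arises: a commit size that is never set (0 for an `int` field) makes a `pending >= blockSize` check true after every node; rejecting sizes below 1 in the constructor makes that case fail loudly instead.

```java
/**
 * Hypothetical batching helper: counts created nodes and "commits" every
 * blockSize nodes. The transaction is simulated by a counter.
 */
final class BatchCommitter {
    private final int blockSize;
    private int pending = 0;
    private int commits = 0;

    BatchCommitter(int blockSize) {
        if (blockSize < 1) throw new IllegalArgumentException("blockSize must be >= 1");
        this.blockSize = blockSize;
    }

    /** Call once per created node; commits when the block is full. */
    void onNodeCreated() {
        if (++pending >= blockSize) commit();
    }

    /** Commits whatever is left; call once at the end of the import. */
    void finish() {
        if (pending > 0) commit();
    }

    private void commit() {
        // In a Neo4j 1.x-era importer this is roughly where
        // tx.success(); tx.finish(); tx = db.beginTx(); would go.
        commits++;
        pending = 0;
    }

    int commits() { return commits; }
}
```

Importing 12,000 nodes with a block size of 5000 gives 3 commits; with a block size of 1 it gives 12,000, the per-node behaviour Craig found.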
Re: [Neo4j] cassandra + neo4j graph
Hi, I can comment on the spatial side. The neo4j-spatial (https://github.com/neo4j/neo4j-spatial) library provides some tools for doing spatial analysis on your data. I do not know exactly what you plan to do, but since you mention user and place locations, I guess you are likely to be asking the database for proximity searches (users near me, or places of interest near me), in which case the SimplePointLayer (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SimplePointLayer.java) class should provide what you need. Read the code (linked above); it is simple. Or read the related blog post, Neo4j Spatial, Part1: Finding things close to other things (http://blog.neo4j.org/2011/03/neo4j-spatial-part1-finding-things.html). You also do not need to include neo4j-spatial from the beginning. Just model your graph in a way suiting your domain, and when you want to enable spatial searches, include the neo4j-spatial dependencies in your pom and start using it. If you happen to conform to one of the expected spatial structures, you can add your nodes to the spatial index directly; otherwise implement a GeometryEncoder and things should work from there. What I think you might find interesting is that you can edit the search mechanism to filter on both spatial and domain-specific characteristics in the same pass. There are various options for this, so we can discuss that later, should you wish. Regards, Craig On Mon, Jun 27, 2011 at 3:49 PM, Aliabbas Petiwala aliabba...@gmail.com wrote: thanks for the informative reply. To add more, the social networking website will be geo-aware, and some spatial info also needs to be stored, like the coordinates of the user node or the coordinates of the location/place. How can we add more? Also, will Neo4j alone + Spatial suffice? Can there be multiple masters for load balancing? And how about splitting the graph in the design itself, like designing in terms of multiple graphs which are mapped to a glue graph?
Hats off for building such a pioneering technology! Regards, Aliabbas On 6/26/11, Jim Webber j...@neotechnology.com wrote: Hi Aliabbas, It's difficult to make pronouncements about your solution design without knowing about it, but here are some heuristics that can help you to plan whether you go with a native Neo4j solution or mix it up with other stores. All of these are only ideas and you should test first to ensure they make sense in your domain.
1. Document/record size. If each node is likely to contain a lot of data (e.g. many megabytes) then you may choose to hold that outside of Neo4j (e.g. file system, KV store). Otherwise Neo4j.
2. Length of individual fields. If they're small enough to fit within our short-string parameters (optimised around post codes, telephone numbers, etc.) then you get a performance boost compared to longer strings (which live in a separate store file in Neo4j). If your individual fields are really, really long (see above, many megabytes), then consider moving them outside Neo4j. If you can slice up your fields into shorter strings then you'll get a good performance and footprint boost.
3. Many machines. Neo4j has master/slave replication, so write performance is asymptotically limited by the IO performance of the master (while reads scale horizontally, pretty much). The number of nodes you have is not a problem for Neo4j, so what is critical is whether a single master can handle the write load you want to throw at it. Since modern buses are fast, and since graph data structures are often less write-heavy than equivalents in other data stores*, I'd suggest that you might be well served by Neo4j here.
But my overriding advice is to spike something with Neo4j and then, only if you find something that doesn't work in your context, to think about adding another data store.
Jim * I'll be blogging about this shortly since it's a common enough misconception that 1000 writes in a relational/other NOSQL database implies 1000 writes in a graph, whereas often it's a single write, meaning graphs can be 1000 times better for the same workload. -- Aliabbas Petiwala M.Tech CSE
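To illustrate Craig's point about filtering on spatial and domain criteria in the same pass, here is a minimal plain-Java sketch. It is not the neo4j-spatial API: the Place record, the nearbyMatching method and the haversine distance are assumptions for illustration, and a linear scan stands in for the spatial index that SimplePointLayer would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ProximitySearch {
    // Hypothetical domain node: a location plus one domain property.
    record Place(String name, double lat, double lon, String category) {}

    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in kilometres using the haversine formula.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // One pass over the candidates: keep a place only if it is within range
    // AND satisfies the domain filter, instead of running two separate queries.
    static List<Place> nearbyMatching(List<Place> places, double lat, double lon,
                                      double maxKm, Predicate<Place> domainFilter) {
        List<Place> result = new ArrayList<>();
        for (Place p : places) {
            if (distanceKm(lat, lon, p.lat(), p.lon()) <= maxKm && domainFilter.test(p)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Place> places = List.of(
                new Place("cafe", 55.605, 13.000, "food"),
                new Place("museum", 55.606, 13.002, "culture"),
                new Place("far-cafe", 59.330, 18.070, "food"));
        // Food places within 1 km: the museum fails the domain test, far-cafe the distance test.
        for (Place p : nearbyMatching(places, 55.604, 13.001, 1.0,
                candidate -> candidate.category().equals("food"))) {
            System.out.println(p.name());
        }
    }
}
```

With a real index the distance test would be answered by the tree rather than by scanning, but the combined predicate stays the same.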
Re: [Neo4j] neo4j-graph-collections
The RTree in principle should be generalizable, but the current implementation in neo4j-spatial does make a few assumptions specific to spatial data, and makes use of spatial envelopes for the tree node bounding boxes. It is also specific to 2D. We could make a few improvements first, like generalizing to n dimensions, replacing the recursive search with a traverser and generalizing the bounding boxes to be simple double arrays. Then the only thing left would be to decide whether it is OK for it to be based on n-dim doubles or whether it should be generalized to more types. On Tue, Jun 28, 2011 at 11:14 PM, Saikat Kanjilal sxk1...@hotmail.com wrote: I would be interested in helping out with this, let me know next steps. Sent from my iPhone On Jun 28, 2011, at 8:49 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: A couple of weeks ago Peter Neubauer set up a repository for in-graph datastructures: https://github.com/peterneubauer/graph-collections. At the time of writing only the Btree/Timeline index is part of this component. In my opinion it would be interesting to move the Rtree parts of neo-spatial to neo4j-graph-collections too. I looked at the code but don't feel competent to separate out those classes that support generic Rtrees from those classes that are clearly spatial related. Is there any enthusiasm for such a project and if so, who is willing and able to do this? Niels
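The suggestion to replace spatial envelopes with plain double arrays might look roughly like the hypothetical class below (NEnvelope and its methods are illustrative names, not neo4j-spatial code). Each tree node would store a min and a max double[] of length n, which Neo4j can hold as ordinary array properties, and the usual R-tree box operations become per-dimension loops.

```java
import java.util.Arrays;

public final class NEnvelope {
    // Lower and upper corner of the box, one entry per dimension.
    final double[] min;
    final double[] max;

    NEnvelope(double[] min, double[] max) {
        if (min.length != max.length) {
            throw new IllegalArgumentException("min and max must have the same dimension");
        }
        this.min = min.clone();
        this.max = max.clone();
    }

    int dimensions() {
        return min.length;
    }

    // Two boxes of equal dimension intersect iff their intervals overlap in every dimension.
    boolean intersects(NEnvelope other) {
        for (int i = 0; i < dimensions(); i++) {
            if (other.min[i] > max[i] || other.max[i] < min[i]) {
                return false;
            }
        }
        return true;
    }

    // Smallest box covering both (used when a tree node's box must grow on insert).
    NEnvelope expandToInclude(NEnvelope other) {
        double[] lo = new double[dimensions()];
        double[] hi = new double[dimensions()];
        for (int i = 0; i < dimensions(); i++) {
            lo[i] = Math.min(min[i], other.min[i]);
            hi[i] = Math.max(max[i], other.max[i]);
        }
        return new NEnvelope(lo, hi);
    }

    // Hyper-volume (area in 2D); insertion heuristics usually pick the child whose volume grows least.
    double volume() {
        double v = 1.0;
        for (int i = 0; i < dimensions(); i++) {
            v *= max[i] - min[i];
        }
        return v;
    }

    @Override
    public String toString() {
        return Arrays.toString(min) + " .. " + Arrays.toString(max);
    }

    public static void main(String[] args) {
        NEnvelope a = new NEnvelope(new double[] {0, 0, 0}, new double[] {1, 1, 1});
        NEnvelope b = new NEnvelope(new double[] {0.5, 0.5, 0.5}, new double[] {2, 2, 2});
        System.out.println(a.intersects(b));               // true
        System.out.println(a.expandToInclude(b).volume()); // 8.0
    }
}
```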
Re: [Neo4j] neo4j-graph-collections
I have previously used two solutions to deal with multiple types in btrees: - My first index in 2009 was a btree-like n-dim index using generics to support int[], long[], float[] and double[] (no strings). I used this for TimeLine (long[1]) and Location (double[2]). The knowledge about what type was used was in the code for constructing the index (whether a new index or accessing an existing index in the graph). - In December I started my amanzi-index (on github: https://github.com/craigtaverner/amanzi-index) that is also btree-like, n-dimensional. But this time it can index multiple types in the same tree (so a float, int and string in the same tree, instead of being forced to have all properties of the same type). It is a re-write of the previous index to support Strings, and mixed types. This time it does save the type information in meta-data at the tree root. The idea of using a 'comparator' class for the types is similar to, but simpler than, the idea I implemented for amanzi-index, where I have mapper classes that describe not only how to compare types, but also how to map from values to index keys and back. This includes (to some extent) the concept of the Lucene analyser, since the mapper can decide on custom distribution of, for example, strings and category indexes. For both of these indexes, you configure the index up front, and then only call index.add(node) to index a node. This will fit in well with the new auto-indexing ideas in neo4j. On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: At this moment Btree only supports the primitive datatype long, while Rtree only supports the datatype double. For Btree it makes sense to at least support strings, floats, doubles and ints too.
Use cases for these data types are pretty obvious and are Btree-backed in (almost) every RDBMS product around. I think the best solution would be to create Comparator objects wrapping these primitive data types and store the class name of the comparator in the root of the index tree. This allows users to create their own comparators for datatypes not covered yet. It would make sense that people would want to store BigInt and BigDecimal objects in a Btree too; others may want to store dates (instead of datetime), fractions, complex numbers or even more exotic data types. Niels From: sxk1...@hotmail.com To: user@lists.neo4j.org Date: Tue, 28 Jun 2011 22:43:24 -0700 Subject: Re: [Neo4j] neo4j-graph-collections I've read through this thread in more detail and have a few thoughts. When you talk about type I am assuming that you are referring to an interface that both (Btree, Rtree) can implement. For the data types I'd like to understand the use cases first before implementing the different data types; maybe we could store types of Object instead of Long or Double and implement comparators in a more meaningful fashion. Also I was wondering if unit tests would need to be extracted out of the spatial component and embedded inside the graph-collections component as well, or whether we'd potentially need to write brand new unit tests. Craig, as I mentioned I'd love to help, let me know if it would be possible to fork a repo or to talk in more detail this week. Regards From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Wed, 29 Jun 2011 01:35:43 +0200 Subject: Re: [Neo4j] neo4j-graph-collections As to the issue of n-dim doubles, it would be interesting to consider creating a set of classes of type Orderable (supporting <, <=, >, >= operations), which we can use in both Rtree and Btree. Right now Btree only supports datatype Long. This should also become more generic. A first step we can take is to at least wrap the common datatypes in Orderable classes.
Niels Date: Wed, 29 Jun 2011 00:32:15 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] neo4j-graph-collections
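Niels's proposal to store the comparator's class name in the root of the index tree could be sketched as follows. A Map stands in for the root node's properties, and ComparatorRegistry, COMPARATOR_KEY and StringLengthComparator are hypothetical names; the only assumption is that user comparators have a public no-argument constructor so they can be re-created by reflection when the tree is reopened.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class ComparatorRegistry {
    // Hypothetical property key on the index root node.
    static final String COMPARATOR_KEY = "comparator_class";

    // Example of a user-supplied comparator for an ordering the index does not know about.
    public static class StringLengthComparator implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            int byLength = Integer.compare(a.length(), b.length());
            return byLength != 0 ? byLength : a.compareTo(b);
        }
    }

    // On index creation: record which comparator orders the keys of this tree.
    static void configureRoot(Map<String, Object> rootProperties,
                              Class<? extends Comparator<?>> comparatorType) {
        rootProperties.put(COMPARATOR_KEY, comparatorType.getName());
    }

    // On reopening an existing index: re-create the comparator from the stored class name.
    @SuppressWarnings("unchecked")
    static <T> Comparator<T> loadComparator(Map<String, Object> rootProperties) {
        String className = (String) rootProperties.get(COMPARATOR_KEY);
        try {
            return (Comparator<T>) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate comparator " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> root = new HashMap<>(); // stands in for the root node's properties
        configureRoot(root, StringLengthComparator.class);
        Comparator<String> order = loadComparator(root);
        System.out.println(order.compare("bb", "a") > 0); // true: shorter strings sort first
    }
}
```

Storing only a class name keeps the root node small, at the cost of requiring the same comparator class on the classpath whenever the index is opened.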
Re: [Neo4j] neo4j-graph-collections
It is technically possible, but it is a somewhat specialized index, not a normal BTree, so I think you would want both (mine and a classic btree). My index performs better for certain data patterns, is best with semi-ordered data and moderately even distributions (since it has no rebalancing), and requires the developer to pick a good starting 'resolution', which means they should know something about their data. Perhaps we just port some of the typing support into a btree in the collections project? On Wed, Jun 29, 2011 at 4:19 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Craig, Would it be possible to merge your work on Amanzi with the work the Neo team has done on the Btree component that is now in neo4j-graph-collections, so we can eventually have one implementation that meets a broad variety of needs? Niels Date: Wed, 29 Jun 2011 15:34:47 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] neo4j-graph-collections
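Craig's description of mapper classes (value to index key and back) and of choosing a starting 'resolution' can be illustrated with a small hypothetical sketch. ValueMapper, DoubleMapper and FirstCharMapper are invented for this example and are not the amanzi-index API; the point is only that fixed-width bucketing works well when the chosen resolution matches the spread of the data.

```java
public class ResolutionMapper {
    // Hypothetical mapper: turns a value into an index key and a key back into the start of its range.
    interface ValueMapper<T> {
        long toKey(T value);

        T rangeStart(long key);
    }

    // Fixed-width buckets for doubles: too coarse a resolution puts many values in one bucket,
    // too fine a resolution fills the tree with sparsely used buckets.
    static final class DoubleMapper implements ValueMapper<Double> {
        private final double origin;
        private final double resolution;

        DoubleMapper(double origin, double resolution) {
            if (resolution <= 0) {
                throw new IllegalArgumentException("resolution must be positive");
            }
            this.origin = origin;
            this.resolution = resolution;
        }

        @Override
        public long toKey(Double value) {
            return (long) Math.floor((value - origin) / resolution);
        }

        // Lower bound of the values this key covers (up to floating-point rounding).
        @Override
        public Double rangeStart(long key) {
            return origin + key * resolution;
        }
    }

    // Buckets strings by their first character, a crude custom distribution for text keys.
    static final class FirstCharMapper implements ValueMapper<String> {
        @Override
        public long toKey(String value) {
            return value.isEmpty() ? 0 : value.charAt(0);
        }

        // Smallest string that maps to this key.
        @Override
        public String rangeStart(long key) {
            return key == 0 ? "" : String.valueOf((char) key);
        }
    }

    public static void main(String[] args) {
        DoubleMapper numbers = new DoubleMapper(0.0, 10.0);
        System.out.println(numbers.toKey(42.5));   // 4
        System.out.println(numbers.rangeStart(4)); // 40.0
        System.out.println(numbers.toKey(-0.1));   // -1
        FirstCharMapper text = new FirstCharMapper();
        System.out.println(text.toKey("node") == text.toKey("neo4j")); // true: same bucket
    }
}
```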
Re: [Neo4j] neo4j-graph-collections
I think moving the RTree to the generic collections would not be too hard. I saw Saikat showed interest in doing this himself. Saikat, contact me off-list for further details on what I think could be done to make this port. On Wed, Jun 29, 2011 at 9:52 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Peter, I totally agree. Having the Rtree index stripped of spatial dependencies in graph-collections should be our first priority. Once that is done we can focus on the other issues. Which doesn't mean we should stop discussing future improvements like setting up comparators (or something to that effect) that can be reused, but we shouldn't try to get that up before Rtree is in graph-collections. Niels From: peter.neuba...@neotechnology.com Date: Wed, 29 Jun 2011 21:10:15 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] neo4j-graph-collections Craig, just gave you push access to the graph collections in case you want to do anything there. Also, IMHO it would be more important to isolate and split out the RTree component from Spatial than to optimize it - that could be done in the new place with targeted performance tests later? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Wed, Jun 29, 2011 at 4:19 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote:
Re: [Neo4j] traversing densely populated nodes
This topic has come up before, and the domain-level solutions are usually very similar, like Norbert's category/proxy nodes (to group by type/direction) and Niels' TimeLineIndex (BTree). I wonder whether we can build a generic user-level solution that can also be wrapped to appear as an internal database solution? For example, consider Niels's solution of the TimeLine index. In this case we group all the nodes based on a consistent hash. Usually the timeline would use a timestamp, but really any reasonably variable property can do, even the node-id itself. Then we have a BTree between the dense nodes and the root node (the node with too many relationships). How about this crazy idea: create an API that mimics the normal node.getRelationship*() API, but internally traverses the entire tree? And also for creating the relationships? So for most code we just do the usual node.createRelationshipTo(otherNode, type) and node.traverse(...), but internally we actually traverse the b-tree. This would solve the observed performance bottleneck while keeping the 'illusion' of directly connected relationships. The solution would be implemented mostly in the application space, so it would not need any changes to the core database. I see this as the same kind of solution as auto-indexing. We set up some initial configuration that results in certain structures being created on demand. With auto-indexing we are talking about mostly automatically adding Lucene indexes. With this idea we are talking about automatically replacing direct relationships with b-trees to resolve a specific performance issue. And when the relationship density is very low, if the b-tree is auto-balancing, it could just be a direct relationship anyway.
On Wed, Jun 29, 2011 at 6:56 PM, Agelos Pikoulas agelos.pikou...@gmail.com wrote: My problem pattern is exactly the same as Niels's: a dense node has millions of relations of a certain direction/type, and only a few (sparse) relations of a different direction and type. The traversal usually follows only those sparse relationships on those dense nodes. Now, even when traversing these sparse relations, neo4j becomes extremely slow, with clearly non-linear complexity (in big-O terms). Some tests I ran (email me if u want the code) reveal that even the number of those dense nodes in the database greatly influences the results. I just reported to Michael the runs with the latest M05 snapshot, which are not very positive... I have suggested an (auto) indexing of relationship types / direction that is used by traversing frameworks, but I ain't no graphdb-engine expert :-( A' Date: Wed, 29 Jun 2011 18:19:10 +0200 From: Niels Hoogeveen pd_aficion...@hotmail.com Subject: Re: [Neo4j] traversing densely populated nodes To: user@lists.neo4j.org Michael, The issue I am referring to does not pertain to traversing many relations at once but to the impact many relationships of one type have on relationships of another type on the same node. Example: A topic class has 2 million outgoing relationships of type HAS_INSTANCE and has 3 outgoing relationships of type SUB_CLASS_OF. Fetching the 3 relations of type SUB_CLASS_OF takes very long, I presume due to the presence of the 2 million other relationships. I have no need to ever fetch the HAS_INSTANCE relationships from the topic node. That relation is always traversed from the other direction. I do want to know the class of a topic instance, leading to the topic class, but have no real interest ever to traverse all topic instances from the topic class (at least not directly...
I do want to know the most recent addition, and that's what I use the timeline index for). Niels From: michael.hun...@neotechnology.com Date: Wed, 29 Jun 2011 17:50:08 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] traversing densely populated nodes I think this is the same problem that Angelos is facing; we are currently evaluating options to improve the performance on those highly connected supernodes. A traditional option is really to split them into groups or even kind of shard their relationships to a second layer. We're looking into storage improvement options as well as modifications to retrieval of that many relationships at once. Cheers Michael
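The grouping idea mentioned in this thread (Norbert's category/proxy nodes, Michael's splitting of relationships into groups) can be shown with an in-memory sketch. DenseNode and its methods are hypothetical and do not reflect how the Neo4j store lays out relationships; they only show why bucketing by type and direction lets a lookup of the 3 SUB_CLASS_OF relationships ignore the HAS_INSTANCE ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupedRelationships {
    enum Direction { OUTGOING, INCOMING }

    // Hypothetical dense node: relationships are bucketed by type, then by direction,
    // rather than kept in one long chain that every lookup has to scan.
    static final class DenseNode {
        private final Map<String, Map<Direction, List<Long>>> groups = new HashMap<>();

        void addRelationship(String type, Direction direction, long otherNodeId) {
            groups.computeIfAbsent(type, t -> new EnumMap<Direction, List<Long>>(Direction.class))
                  .computeIfAbsent(direction, d -> new ArrayList<>())
                  .add(otherNodeId);
        }

        // Cost depends on the size of the requested group, not on the node's total degree.
        List<Long> getRelationships(String type, Direction direction) {
            Map<Direction, List<Long>> byDirection = groups.get(type);
            if (byDirection == null) {
                return List.of();
            }
            return Collections.unmodifiableList(byDirection.getOrDefault(direction, List.of()));
        }
    }

    public static void main(String[] args) {
        DenseNode topicClass = new DenseNode();
        // Scaled down from the 2 million HAS_INSTANCE relationships in Niels's example.
        for (long i = 0; i < 200_000; i++) {
            topicClass.addRelationship("HAS_INSTANCE", Direction.OUTGOING, i);
        }
        for (long superClass = 900_001; superClass <= 900_003; superClass++) {
            topicClass.addRelationship("SUB_CLASS_OF", Direction.OUTGOING, superClass);
        }
        // Prints [900001, 900002, 900003] without touching the HAS_INSTANCE bucket.
        System.out.println(topicClass.getRelationships("SUB_CLASS_OF", Direction.OUTGOING));
    }
}
```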
Re: [Neo4j] Database engine using Neo4j
Hi Kriti, I can comment on a few things, especially neo4j-spatial: - Neo4j is certainly good for social networks, and people have used it for that, but I personally do not have experience with that so I will not comment further (others can chip in where necessary). - Neo4j-Spatial is good for performing some spatial queries on your domain data. So you start by modeling your domain however you want, and then when you want to start using neo4j-spatial, just add all nodes that have spatial components (e.g. location) to the spatial index and they will be available for querying. The SimplePointLayer class has support for querying by proximity, which sounds like what you want. You can also query with a filter on properties (so only nearby objects matching some other criteria). - I do my neo4j-spatial development in Eclipse, so there should be no issues for you using Eclipse. Just use m2eclipse, and add the dependency to your pom.xml. The current version of neo4j-spatial requires Neo4j 1.4, so if you are using an older neo4j, you might need to make minor changes. - Neo4j is not optimized for storing BLOBs, so while it can store images as byte[], it is advisable to rather store a reference to the image (e.g. URI), and store the image in another way (filesystem, other database, etc.) Regards, Craig On Wed, Jun 29, 2011 at 2:06 PM, kriti sharma kriti.0...@gmail.com wrote: Dear Users, I am developing a time capsule DB engine using Neo4j as a database. I intend to develop three scales (temporal, geo/spatial and egocentric/personal relationships) in the db structure. For the geolocation part, I would like to be able to query upon a location keyword and also some nearby places/photos/people that I have in my DB. Do you think neo4j spatial will be a good choice for such a spatial scheme? I have developed a timeline in the usual neo4j using the timeline feature. Can I simply integrate neo4j spatial in my existing code for neo4j in Eclipse?
I am retrieving data from Twitter, Flickr, Facebook etc., so the format of data may not be uniform. Therefore I found Neo4j to be an excellent option. Has some work been done in modelling a user's Facebook data (friends and networks) relationships in Neo4j? How should I go about storing images in the DB? Thanks Kriti
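Craig's advice to keep images outside the graph and store only a reference might look like this minimal sketch. A Map stands in for the node's properties, the image_uri key is a hypothetical name, and the local filesystem stands in for whatever external store is actually used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class ImageByReference {
    // Hypothetical property key holding the image location on the node.
    static final String IMAGE_URI = "image_uri";

    // Writes the bytes to an external store (here a directory) and keeps only the URI on the node.
    static void attachImage(Map<String, Object> nodeProperties, byte[] imageBytes,
                            Path storeDir, String fileName) {
        try {
            Files.createDirectories(storeDir);
            Path file = storeDir.resolve(fileName);
            Files.write(file, imageBytes);
            nodeProperties.put(IMAGE_URI, file.toUri().toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Follows the stored reference only when the image itself is needed.
    static byte[] loadImage(Map<String, Object> nodeProperties) {
        URI uri = URI.create((String) nodeProperties.get(IMAGE_URI));
        try {
            return Files.readAllBytes(Path.of(uri));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> photoNode = new HashMap<>(); // stands in for the node's properties
        Path store = Files.createTempDirectory("images");
        attachImage(photoNode, new byte[] {1, 2, 3}, store, "photo-1.jpg");
        System.out.println(photoNode.get(IMAGE_URI));    // a file: URI, not the bytes
        System.out.println(loadImage(photoNode).length); // 3
    }
}
```

The node then carries a short string, which keeps traversals cheap; the trade-off is that deleting a node no longer deletes its image automatically.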
Re: [Neo4j] traversing densely populated nodes
In the amanzi-index I link all indexed nodes into the index tree, so traversals are straight up the tree. Of course this also means that there are at least as many relationships as indexed nodes. I was reviewing Michael's code for the relationship expander, and think that is a great idea, transparently using an index instead of the normal relationships API, and can imagine using the relationship expander to instead traverse the BTree to the final relationship to the leaf nodes. So if we imagine a BTree with perhaps 10 or 20 hops from the root to the leaf node, the relationship expander Michael described would complete all hops and return only the last relationship, giving the illusion of direct connections from root to leaf. This would certainly perform well, especially for cases where there are factors limiting the number of relationships we want returned. I think the request for type and direction is the first obvious case, but we could be even more explicit than that, if we pass constraints based on the BTree's consistent hash. On Thu, Jun 30, 2011 at 11:36 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: In theory the approach I described earlier could work, though there are some pitfalls in the current implementation that need ironing out before this can become a recommended approach. The choice of Timeline instead of Btree may actually be the wrong choice after all. I chose Timeline because of my familiarity with this particular class, but its implementation may actually not be all that suitable for this particular use case. This has to do with the fact that Timeline is not just a tree, but a list where entries with an interval of max. 1000 are stored in a Btree index. This works reasonably well for a Timeline, but makes the approach less ideal for storing dense relationships. The problem with the Timeline implementation is looking up the tree root from a particular leaf.
In an ordinary Btree it would simply be a traversal from the leaf through the layers of block nodes to the tree root. In Timeline the traversal will be different. It first has to move through the Timeline list until it finds an entry that is stored in the Btree (which in the worst case takes 1000 hops), and then it has to traverse the Btree up to the tree root. To avoid this complicated traversal I ended up doing a lookup through Lucene of the timeline URI (which is stored in all timeline list entries). In fact I might as well have added the URI of the dense node as a property and done the lookup through Lucene without the Timeline; it just happens that I like the sort order of Timeline, making it a useful approach anyway. I will experiment using Btree directly (without Timeline) and see if that leads to a simpler and faster traversal from leaf to root node. There is one more issue before this can become production ready. Btree as it is implemented now is not thread safe (per the implementation's Javadocs), so it needs some love and attention to make it work properly. Niels Date: Thu, 30 Jun 2011 13:57:20 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] traversing densely populated nodes
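The expander idea (walk all the BTree hops, return only the last one) and Niels's point about leaf-to-root lookups can both be sketched on a plain in-memory hierarchy. TreeExpander and TreeNode are hypothetical; the tree has no splitting or rebalancing, so it stands in for a BTree only in its shape.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeExpander {
    // Hypothetical tree node: internal blocks have children, leaves carry an indexed entity id.
    static final class TreeNode {
        final TreeNode parent;
        final Long entityId; // null for internal blocks
        final List<TreeNode> children = new ArrayList<>();

        private TreeNode(TreeNode parent, Long entityId) {
            this.parent = parent;
            this.entityId = entityId;
        }

        static TreeNode root() {
            return new TreeNode(null, null);
        }

        TreeNode addBlock() {
            TreeNode child = new TreeNode(this, null);
            children.add(child);
            return child;
        }

        TreeNode addLeaf(long entityId) {
            TreeNode child = new TreeNode(this, entityId);
            children.add(child);
            return child;
        }
    }

    // Root to leaves: every intermediate hop is walked, but only the leaf entities are reported,
    // so the caller sees the root as if it were directly connected to them.
    static List<Long> expand(TreeNode root) {
        List<Long> leaves = new ArrayList<>();
        collect(root, leaves);
        return leaves;
    }

    private static void collect(TreeNode node, List<Long> leaves) {
        if (node.entityId != null) {
            leaves.add(node.entityId);
            return;
        }
        for (TreeNode child : node.children) {
            collect(child, leaves);
        }
    }

    // Leaf to root: with parent links this costs one hop per tree level.
    static TreeNode findRoot(TreeNode leaf) {
        TreeNode current = leaf;
        while (current.parent != null) {
            current = current.parent;
        }
        return current;
    }

    public static void main(String[] args) {
        TreeNode root = TreeNode.root();
        TreeNode blockA = root.addBlock();
        TreeNode leaf = blockA.addBlock().addLeaf(101L);
        blockA.addLeaf(102L);
        root.addBlock().addLeaf(103L);
        System.out.println(expand(root));           // [101, 102, 103]
        System.out.println(findRoot(leaf) == root); // true
    }
}
```

In a graph the parent links would be relationships followed in the incoming direction, which is what makes the leaf-to-root walk proportional to the tree depth rather than to a list length.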
Re: [Neo4j] GSoC 2011 Neo4j Geoprocessing | Weekly Report #6
Hi Andreas, Sounds like good progress overall. It is only a week to the mid-terms, so it would be good to do a general code overview and see if this can be integrated with trunk. Shall we plan for a review and test integration in the middle of next week? Regards, Craig On Sat, Jul 2, 2011 at 10:25 AM, Andreas Wilhelm a...@kabelbw.de wrote: Hi, This week I had a little blocker with deleting some subgraph nodes and relations. For that I made a separate test to identify the problem and try to find a solution. Apart from that I integrated an additional spatial type function to get the distance between geometry nodes and updated the already existing spatial type functions to the new API. Best Regards Andreas Wilhelm
Re: [Neo4j] reify links with other neo4j databases located on different distributed servers
As far as I know there is no internal support for transparent traversals across shards. Generally people are doing that in the application layer. However, I think there might be a middle ground of sorts. If we modify the relationship expander, I could imagine that relationships that are between shards could be modified to return nodes on the other shard. This would make the traversal return nodes across shards, but since I've not tried this myself, I am uncertain if there are other consequences. On Sat, Jul 2, 2011 at 4:03 AM, Aliabbas Petiwala aliabba...@gmail.com wrote: Hi, I cannot figure out how my application logic can reify links with other neo4j databases located on different distributed servers. Hence, how can I make the traversals and graph algorithms transparent to the location of the different databases? -- Aliabbas Petiwala M.Tech CSE
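Craig's middle-ground idea, an expander that hands back nodes on another shard, might be sketched like this. NodeRef, Shard and ShardResolver are hypothetical; each Shard is just an in-memory map where a real deployment would call a separate Neo4j instance, and the sketch deliberately ignores transactions and consistency, which are among the "other consequences" Craig is unsure about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ShardResolver {
    // Hypothetical global node address: the shard it lives on plus its id there.
    record NodeRef(String shard, long localId) {}

    // Stand-in for one database instance: adjacency lists keyed by local node id.
    static final class Shard {
        final Map<Long, List<NodeRef>> adjacency = new HashMap<>();

        void connect(long fromLocalId, NodeRef to) {
            adjacency.computeIfAbsent(fromLocalId, k -> new ArrayList<>()).add(to);
        }
    }

    private final Map<String, Shard> shards = new HashMap<>();

    Shard shard(String name) {
        return shards.computeIfAbsent(name, n -> new Shard());
    }

    // Expander-style lookup: neighbours come back as NodeRefs whichever shard holds them.
    List<NodeRef> neighbours(NodeRef node) {
        Shard owner = shards.get(node.shard());
        if (owner == null) {
            throw new IllegalArgumentException("unknown shard: " + node.shard());
        }
        return owner.adjacency.getOrDefault(node.localId(), List.of());
    }

    // Breadth-first traversal that crosses shard boundaries transparently, up to maxHops.
    Set<NodeRef> reachable(NodeRef start, int maxHops) {
        Set<NodeRef> seen = new HashSet<>(List.of(start));
        List<NodeRef> frontier = List.of(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            List<NodeRef> next = new ArrayList<>();
            for (NodeRef node : frontier) {
                for (NodeRef neighbour : neighbours(node)) {
                    if (seen.add(neighbour)) {
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        return seen;
    }

    public static void main(String[] args) {
        ShardResolver cluster = new ShardResolver();
        NodeRef alice = new NodeRef("east", 1);
        NodeRef bob = new NodeRef("west", 7);
        NodeRef carol = new NodeRef("west", 8);
        cluster.shard("east").connect(1, bob);   // cross-shard relationship
        cluster.shard("west").connect(7, carol); // local relationship
        System.out.println(cluster.reachable(alice, 2).size()); // 3
    }
}
```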
Re: [Neo4j] wkb value in node created by addGeometryWKTToLayer
Hi Boris, You do not need to read the property yourself from the node; rather, use the GeometryEncoder for this. It converts from the internal spatial storage to the Geometry class, which you can work with. If you call geom.toString() you will get a nice printable version (in WKT). Using the GeometryEncoder is a particularly good idea because we support many internal storage formats, not just the WKB you found. If you have point data only, you should consider using the SimplePointLayer (created with SpatialDatabaseService.createSimplePointLayer()), which will store the Point as two properties, for latitude and longitude. Back to your main question: WKB and WKT are two different formats for representing spatial data. We support both with the WKBGeometryEncoder and WKTGeometryEncoder classes, but in both cases we convert from that format to the JTS Geometry class for performing spatial operations on. Internally these classes use the WKBReader/WKBWriter (and the WKT versions of these) for performing the conversions. If you want to convert between WKB and WKT yourself, you should just use the JTS code directly. But as I said before, I do not think you need to do this. If you are getting your nodes from a search using the index, something like search.getResults().get(0).getGeometry().toString() will return the WKT version. Regards, Craig On Sat, Jul 2, 2011 at 1:04 AM, Boris Kizelshteyn bo...@popcha.com wrote: Craig or anyone who can answer this: what does the wkb value represent here? I know it's the well-known binary (WKB) format, but how do I get back to WKT? I thought it was a byte array, but I can't seem to get my original values back.
From the values in the test case I have: POINT(15.2 60.1) wkb: [0,0,0,0,2,0,0,0,2,64,46,51,51,51,51,51,51,64,78,25,-103,-103,-103,-103,-102,64,46,-103,-103,-103,-103,-103,-102,64,78,12,-52,-52,-52,-52,-51]
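As Craig says, the GeometryEncoder (or JTS's `WKBReader` plus `WKTWriter`) is the right tool. Purely to show what the quoted array contains, here is a small dependency-free decoder written for illustration; it handles only 2D Point and LineString records. Read this way, the first byte (0) means big-endian, the next four bytes give geometry type 2 (LineString), the next four give a count of 2 points, and the remaining 32 bytes are four IEEE-754 doubles. The negative numbers are simply Java's signed bytes. So this particular value decodes to `LINESTRING (15.1 60.2, 15.3 60.1)`, which suggests the node holds a different geometry from the `POINT(15.2 60.1)` mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative WKB-to-WKT decoder for 2D Point and LineString only; use JTS in real code. */
final class TinyWkb {

    static String toWkt(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // Byte 0 is the byte order flag: 0 = big-endian (XDR), 1 = little-endian (NDR).
        buf.order(buf.get() == 0 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        int type = buf.getInt();
        switch (type) {
            case 1: // Point: one x, y pair
                return "POINT (" + coord(buf) + ")";
            case 2: { // LineString: point count, then that many x, y pairs
                int n = buf.getInt();
                StringBuilder sb = new StringBuilder("LINESTRING (");
                for (int i = 0; i < n; i++) {
                    if (i > 0) sb.append(", ");
                    sb.append(coord(buf));
                }
                return sb.append(')').toString();
            }
            default:
                throw new IllegalArgumentException("unsupported WKB geometry type " + type);
        }
    }

    /** Reads one coordinate as two 8-byte doubles, x first. */
    private static String coord(ByteBuffer buf) {
        double x = buf.getDouble();
        double y = buf.getDouble();
        return x + " " + y;
    }
}
```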
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
Hi Boris, Ah! You are using the REST API. That changes a lot, since Neo4j Spatial has only recently been exposed in REST and we do not expose most of the capabilities I have discussed in this thread, or indeed in my other answer today. I did recently add some REST methods that might work for you, specifically addEditableLayer, which makes a WKB layer, and addGeometryWKTToLayer, for adding any kind of Geometry (including LineString) to the layer. However, these were only added recently, and I have no experience using them myself, so consider this very much prototype code. From your other question today, can I assume you are having trouble making sense of the data coming back? So we need a better way to return the results in WKT instead of WKB? One option would be to enhance the addEditableLayer method to allow the creation of WKT layers instead of WKB layers, so the internal representation is more internet-friendly. I've just added support for setting the format to WKT for the internal representation of the editable layer in the REST interface. This is untested (outside of my usual unit tests, that is), and is only in the trunk of neo4j-spatial, but you are welcome to try it out and see what happens. Regards, Craig On Fri, Jul 1, 2011 at 5:29 PM, Boris Kizelshteyn bo...@popcha.com wrote: Hi Craig, Thanks so much for this reply. It is very insightful. Is it possible for me to implement the LineString geometries and lookups using REST? Many thanks! On Wed, Jun 8, 2011 at 4:58 PM, Craig Taverner cr...@amanzi.com wrote: OK. I understand much better what you want now. Your person nodes are not geographic objects, they are persons that can be at many positions and indeed move around. However, the 'path' that they take is a geographic object and can be placed on the map and analysed geographically. So the question I have is how do you store the path the person takes? Is this a bunch of position nodes connected back to that person?
Or perhaps a chain of position-(next)-position-(next)-position, etc? However you have stored this in the graph, you can express it as a geographic object by implementing the GeometryEncoder interface. See, for example, the 6 lines of code it takes to traverse a chain of NEXT locations and produce a LineString geometry in the SimpleGraphEncoder at https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82 If you do this, you can create a layer that uses your own geometry encoder (or the SimpleGraphEncoder I referenced above, if you use the same graph structure) and your own domain model will be expressed as LineString geometries and you can perform spatial operations on them. Alternatively, if your data is more static in nature, and you are analysing only what the person did in the past, and the graph will therefore not change, perhaps you do not care to store the locations in the graph, and you can just import them as a LineString directly into a standard layer. Whatever route you take, the final action you want to perform is to find points near the LineString (the path the person took). I do not think the bounding box is the right approach for that either. You need to try, for example, the method findClosestEdges in the utilities class at https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java#L115 This method can find the part of the person's path that is closest to the point of interest. There are also many other geographic operations you might be interested in trying, once you have a better feel for the types of queries you want to ask.
Regards, Craig On Wed, Jun 8, 2011 at 2:17 AM, Boris Kizelshteyn bo...@popcha.com wrote: Thanks for the detailed response! Here is what I'm trying to do and I'm still not sure how to accomplish it: 1. I have a node which is a person. 2. I have geo data as that person moves around the world. 3. I use the geodata to create a bounding box of where that person has been today. 4. I want to ask: was this person A near location X today? 5. I do this by seeing if location X is in A's bounding box. From looking at what you suggest doing, it's not clear to me how I assign the person node A to a layer. Is it that the bounding box is now in the layer and not in the node? The issue then becomes how do I associate the two, as the RTree relationship seems to establish itself on the bounding box between the node and the layer. Many thanks for your patience as I learn this challenging material
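To illustrate why a bounding box is a weak test here: a track that runs along two sides of a square has a bounding box covering the whole square, so a location in the opposite corner counts as "inside" even though the person was never close to it. A tighter test, in the spirit of `findClosestEdges`, is the distance from the location to the nearest segment of the track. The sketch below is plain Java, not Neo4j Spatial code. `PathProximity` and its methods are hypothetical names, and it assumes planar (x, y) coordinates, which is only a rough approximation for longitude/latitude over short distances.

```java
/** Illustrative planar point-to-polyline distance; not part of Neo4j Spatial. */
final class PathProximity {

    /** Distance from P(px, py) to the segment A(ax, ay)-B(bx, by). */
    static double distanceToSegment(double px, double py,
                                    double ax, double ay, double bx, double by) {
        double dx = bx - ax, dy = by - ay;
        double lenSq = dx * dx + dy * dy;
        // Position t of P's projection onto AB, clamped so the closest point stays on the segment.
        double t = lenSq == 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / lenSq;
        t = Math.max(0, Math.min(1, t));
        double cx = ax + t * dx, cy = ay + t * dy;
        return Math.hypot(px - cx, py - cy);
    }

    /** Smallest distance from the point to any segment of a path given as {x0, y0, x1, y1, ...}. */
    static double distanceToPath(double px, double py, double[] path) {
        if (path.length < 2) throw new IllegalArgumentException("path needs at least one point");
        if (path.length < 4) return Math.hypot(px - path[0], py - path[1]);
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 3 < path.length; i += 2) {
            best = Math.min(best, distanceToSegment(px, py,
                    path[i], path[i + 1], path[i + 2], path[i + 3]));
        }
        return best;
    }

    /** "Was the person within maxDistance of this location?" */
    static boolean isNear(double px, double py, double[] path, double maxDistance) {
        return distanceToPath(px, py, path) <= maxDistance;
    }
}
```

For the path `{0, 0, 10, 0, 10, 10}`, the point (3, 7) lies inside the track's bounding box but is 7 units from the track itself, so `isNear(3, 7, path, 1.5)` is false while `isNear(5, 1, path, 1.5)` is true.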
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
As I understand it, Andreas is working on the much more complex problem of updating OSM geometries. That is more complex because it involves restructuring the connected graph. The case Boris has is much simpler, just modifying the WKT or WKB in the editable layer. In the Java API you simply call the GeometryEncoder.encodeGeometry() method, which will modify the geometry in place (i.e. replace the old geometry with a new one). However, I do not think it is that simple on the REST interface. I can check, but I think we will need a new method for updating geometries. Internally it is trivial to code. So I just added a quick method, called updateGeometryFromWKT, which requires the geometry (in WKT), the existing geometry node-id, and the layer. Give it a try. On Sat, Jul 2, 2011 at 5:10 PM, Peter Neubauer neubauer.pe...@gmail.com wrote: Actually, Andreas Wilhelm is working right now on updating geometries. Sent from my phone. On Jul 2, 2011 5:00 PM, Boris Kizelshteyn bo...@popcha.com wrote: Wow, that's great! I'll try it out ASAP. This leads to my next question: how do I update the geometry in a layer, rather than add a new one? What I am thinking of doing is having a multipoint geometry associated with each of my user nodes, which will represent their location history. My plan is to add the geometry to a world layer and then associate the returned node with the user. How do I then add new points to that connector node? Can I just edit the WKT and assume the index will update? Or do you have a better suggestion for doing this? I would rather avoid having each point be a separate node, as I am tracking GPS data and getting lots of coordinates; it would be many thousands of nodes per user. Many thanks!
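For Boris's location-history plan, the client side of `updateGeometryFromWKT` could look roughly like this: keep the points, re-render the whole MULTIPOINT as WKT, and send it together with the geometry node id and the layer name. This is a hypothetical sketch. `LocationHistory` is an invented helper, the URL follows the usual `/db/data/ext/{Plugin}/graphdb/{method}` pattern for server plugins, and the JSON parameter names `geometry`, `geometryNodeId` and `layer` are assumptions to check against SpatialPlugin before use. The code only builds the strings, so any HTTP client can POST the body. The rendering uses the OGC form with each point in its own parentheses; it is worth checking that the server's WKT reader accepts that form. Note also that each update resends the full history, so the request grows with the number of points.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical client-side helper for the updateGeometryFromWKT REST method.
 * The URL path and JSON parameter names are assumptions, not a documented API.
 */
final class LocationHistory {
    private final List<double[]> points = new ArrayList<>();

    /** Records one GPS fix as (x = longitude, y = latitude). */
    void add(double lon, double lat) {
        points.add(new double[] {lon, lat});
    }

    /** Renders the whole history as OGC-style WKT, e.g. MULTIPOINT ((15.1 60.2), (15.3 60.1)). */
    String toWkt() {
        if (points.isEmpty()) return "MULTIPOINT EMPTY";
        StringBuilder sb = new StringBuilder("MULTIPOINT (");
        for (int i = 0; i < points.size(); i++) {
            if (i > 0) sb.append(", ");
            double[] p = points.get(i);
            sb.append('(').append(p[0]).append(' ').append(p[1]).append(')');
        }
        return sb.append(')').toString();
    }

    /** Assumed plugin endpoint, following the /db/data/ext/{Plugin}/graphdb/{method} pattern. */
    static String updateUrl(String serverBase) {
        return serverBase + "/db/data/ext/SpatialPlugin/graphdb/updateGeometryFromWKT";
    }

    /** JSON body; "geometry", "geometryNodeId" and "layer" are assumed parameter names. */
    String updateBody(long geometryNodeId, String layer) {
        return "{\"geometry\":\"" + toWkt() + "\",\"geometryNodeId\":" + geometryNodeId
                + ",\"layer\":\"" + escape(layer) + "\"}";
    }

    /** Minimal JSON string escaping for backslashes and double quotes. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```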
[Neo4j] Cypher error in neo4j-spatial
Hi, Recent builds of Neo4j-Spatial no longer like Peter's new bounding box query. Peter is on vacation, and I am not familiar with the code (nor Cypher), so I thought I would just dump the error message here for now in case someone can give me a quick pointer. The line of code is: Query query = parser.parse( "start n=(layer1,'bbox:[15.0, 16.0, 56.0, 57.0]') match (n) -[r] - (x) return n.bbox, r:TYPE, x.layer?, x.bbox?" ); The error is: org.neo4j.cypher.SyntaxError: string matching regex `\z' expected but `:' found at org.neo4j.cypher.parser.CypherParser.parse(CypherParser.scala:75) at org.neo4j.cypher.javacompat.CypherParser.parse(CypherParser.java:39) at org.neo4j.gis.spatial.IndexProviderTest.testNodeIndex(IndexProviderTest.java:91) Regards, Craig
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
Hi Boris, I can see the new update method here: https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/server/plugin/SpatialPlugin.java#L138 And the commit for it is here: https://github.com/neo4j/neo4j-spatial/commit/22eaf91957a6265ef1e6923b5da572b75383b83e Hope that helps. Let me know if this works. The REST method is entirely untested, but does wrap code that is tested, so I'm relatively optimistic :-) Regards, Craig On Wed, Jul 6, 2011 at 1:51 AM, Boris Kizelshteyn bo...@popcha.com wrote: Hi Craig, This is awesome! Where is the update method? I can't find the code on github. Thanks!
Re: [Neo4j] Neo4j Spatial - Keep OSM imports
Another option is to run the main method of the OSMImport class, which expects command-line arguments for the database location and the OSM file, and will simply import a file once. This is not tested often, so there is a risk things have changed, but it is worth a try. Another, even easier, option in my opinion is the JRuby gem, neo4j-spatial.rb. See http://rubygems.org/gems/neo4j-spatial To get this running, just install JRuby from http://jruby.org, and then install the gem with jruby -S gem install neo4j-spatial and then you will have new console commands like 'import_layer'. If you run 'import_layer mydata.osm', it will import it to a new database, which you can use. See the github page for more information: https://github.com/craigtaverner/neo4j-spatial.rb On Thu, Jul 7, 2011 at 10:47 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Robin, the database is deleted after each run in Neo4jTestCase.java, @Override @After protected void tearDown() throws Exception { shutdownDatabase(true); super.tearDown(); } If you change it to shutdownDatabase(false), the database will not be deleted. In this case, make sure to run just that test, so that several tests do not write to the same DB: mvn test -Dtest=TestDynamicLayers Does that work for you? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Tue, Jul 5, 2011 at 6:07 PM, Robin Cura robin.c...@gmail.com wrote: Hello, First of all, I don't know any Java, and I'm trying to figure out if Neo4j could be useful for my projects. If it is, I will of course learn a bit of Java so that I can use Neo4j in a decent way for my needs. I'd like to use a Neo4j Spatial database together with GeoServer.
For this, I'm following the tutorial here: http://wiki.neo4j.org/content/Neo4j_Spatial_in_GeoServer But this paragraph is blocking me: - One option for the database location is a database created using the unit tests in Neo4j Spatial. The rest of this wiki assumes that you ran the TestDynamicLayers unit test which loads an OSM dataset for the city of Malmö in Sweden, and then creates a number of Dynamic Layers (or views) on this data, which we can publish in GeoServer. - If you do use the unit test for the sample database, then the location of the database will be in the target/var/neo4j-db directory of the Neo4j Source code. My problem is that I cannot keep the Neo4j Spatial databases created by the tests: when I run TestDynamicLayers, it builds databases (in target/var/neo4j-db), but as soon as a database is successfully loaded, it deletes it and starts importing another database, and so on. My poor understanding of Java doesn't help a lot. I tried to edit the .java in NetBeans + Maven, but so far it doesn't work: all the directories created during the tests are deleted when the tests end. Any idea how I could keep those databases? Thanks, Robin
Re: [Neo4j] Neo4j Spatial - Keep OSM imports - Use in GeoServer
I am travelling at the moment, so I cannot give a long answer, but I suggest you look at the wiki page for Neo4j in uDig, because there we have made some updates concerning which jars to use, and that will probably help you get this working. On Jul 12, 2011 10:59 AM, Robin Cura robin.c...@gmail.com wrote: Hi, First of all, thanks a lot to both of you for your answers. I was only able to try this yesterday, and it saved me from lots of trouble. I succeeded in editing the Neo4jTestCase.java file in NetBeans, as you told me. I've had trouble installing the latest JRuby release (needed for neo4j-spatial) on my Ubuntu, so I'll do this later, but it's really good to know given how simple it is to use. Creating those databases made me realize another problem. In fact, I followed the tutorial about using a Neo4j db in GeoServer, and it appears that my Neo4j plugin for GeoServer doesn't work, as I always get this error when trying to create a new store linking to my Neo4j database. My database is a folder named db1 (and db2 for the other one), located in my ~/ folder. In GeoServer, I create a new store and make it link to file:/home/administrateur/db1/neostore.id But each time, I get this error: Error connecting to Store. There was an error trying to connecto to store neo4jstore. Do you want to save it anyway? Original exception error: Could not acquire data access 'neo4jstore' I tried with my 2 databases, and got the same problem. It seems those 2 dbs aren't the problem, as I've been able to open/visualise them in Gephi (using the Neo4j import plugin). My guess is that my neo4j-spatial plugin for GeoServer isn't working properly. The main problem is that, since the tutorial was written, Neo4j has changed.
In the tutorial, we have to place some files in the geoserver/WEB-INF/lib/ folder: - json-simple-1.1.jar -- No problem, this file is still used - geronimo-jta_1.1_spec-1.1.1.jar -- Same, this is still the version used in Neo4j - neo4j-kernel-1.2-1.2.M04.jar -- Replaced this one with my current Neo4j kernel jar, neo4j-kernel-1.4.jar - neo4j-index-1.2-1.2.M04.jar - neo4j-spatial.jar -- Replaced this one with the latest build produced by sudo mvn clean package: neo4j-spatial-0.6-SNAPSHOT.jar My problem is that there is no longer a neo4j-index file in the latest Neo4j releases. There are some neo4j-lucene-index files, but 1.4 doesn't seem to use neo4j-index anymore. When I only put neo4j-lucene-index.jar, GeoServer doesn't offer any option to create a Store from Neo4j databases. So what I did is use the neo4j-index-1.3-1.3.M01.jar file from a previous release of Neo4j: GeoServer then offers to create a Store from a Neo4j db, but I get the error message quoted above. Any idea how I could make this work? What is the file that replaces neo4j-index in Neo4j 1.4? I have attached one of my databases, archived, so that one of you with a working Neo4j plugin in GeoServer could test it and confirm the problem isn't with the DB. Thanks, Robin Cura
Re: [Neo4j] How to create a graph database out of a huge dataset?
I'm not sure it's such a good idea to call tx.success() on every iteration of the loop. I suggest calling it only in the commit block, and after the loop (i.e. move it two lines down). Also I think a commit size of 50k is a little large. You're probably not going to see much improvement past 10k. In fact I generally only use 1k myself (but I hear 10k is popular too :-) On Sun, Jul 17, 2011 at 8:53 PM, st3ven st3...@web.de wrote: Hi, thanks for your fast answer. Right now I'm using Lucene for 6M authors, but my whole dataset consists of nearly 25M authors. Can I use Lucene there also? I think it is getting really slow to check if a user already exists. How can I change my heap memory settings and my memory-map settings, since I'm using the transactional mode? I think with 25M authors I will get an OutOfMemoryError. Here is the code I have written so far:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.graphdb.index.IndexManager;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class WikiGraphRegUser {

    /**
     * @param args
     */
    public static void main(String[] args) throws IOException {
        BufferedReader bf = new BufferedReader(new FileReader("E:/wiki0.csv"));
        WikiGraphRegUser wgru = new WikiGraphRegUser();
        wgru.createGraphDatabase(bf);
    }

    private String articleName = "";
    private GraphDatabaseService db;
    private IndexManager index;
    private Index<Node> authorList;
    private int transactionCounter = 0;
    private Node article;
    private boolean isFirstAuthor = false;
    private Node author;
    private Relationship relationship;
    private int node;

    private void createGraphDatabase(BufferedReader bf) {
        db = new EmbeddedGraphDatabase("target/db");
        index = db.index();
        authorList = index.forNodes("Author");
        String zeile;
        Transaction tx = db.beginTx();
        try {
            // reads lines of CSV-file
            while ((zeile = bf.readLine()) != null) {
                if (transactionCounter++ % 5 == 0) {
                    tx.success();
                    tx.finish();
                    tx = db.beginTx();
                }
                // String[] looks like this: Article%;% Timestamp%;% Author
                String[] artikelinfo = zeile.split("%;% ");
                if (artikelinfo.length != 3) {
                    System.out.println("ERROR: check CSV");
                    for (int i = 0; i < artikelinfo.length; i++) {
                        System.out.println(artikelinfo[i]);
                    }
                    return;
                }
                if (articleName == "") {
                    // create Article and connect with ReferenceNode
                    article = createArticle(artikelinfo[0], db.getReferenceNode(), MyRelationshipTypes.ARTICLE);
                    articleName = artikelinfo[0];
                    isFirstAuthor = true;
                } else if (!articleName.equals(artikelinfo[0])) {
                    // create Article and connect with ReferenceNode
                    article = createArticle(artikelinfo[0], db.getReferenceNode(), MyRelationshipTypes.ARTICLE);
                    articleName = artikelinfo[0];
                    isFirstAuthor = true;
                }
                // checks if author already exists
                IndexHits<Node> hits = authorList.get("Author", artikelinfo[2]);
                // if new author
                if (hits.size() == 0) {
                    if (isFirstAuthor) {
                        // creates author and connects him with an article
                        author = createAndConnectNode(artikelinfo[2], article, MyRelationshipTypes.WROTE, artikelinfo[1]);
                        isFirstAuthor = false;
                    } else {
                        author
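Craig's batching advice (mark the transaction successful once per batch, just before finishing it, and once more after the loop) can be checked without a database. In the sketch below, `Tx` and `TxFactory` are hypothetical stand-ins for Neo4j's `Transaction` and `db.beginTx()`, modelled on the `success()`/`finish()` pair used in the code above. `BatchedImport.run` is an illustrative name, and the batch size is left to the caller (Craig suggests somewhere between 1k and 10k).

```java
import java.util.List;
import java.util.function.Consumer;

/** Batching logic only; Tx and TxFactory are hypothetical stand-ins for Neo4j's API. */
final class BatchedImport {

    interface Tx {
        void success(); // mark the work done so far as committable
        void finish();  // commit if success() was called, otherwise roll back
    }

    interface TxFactory {
        Tx begin();
    }

    /** Applies work to every item, committing every batchSize items and once more at the end. */
    static <T> void run(List<T> items, int batchSize, TxFactory db, Consumer<T> work) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        Tx tx = db.begin();
        try {
            int count = 0;
            for (T item : items) {
                work.accept(item);
                if (++count % batchSize == 0) {
                    tx.success(); // only at the batch boundary, not on every iteration
                    tx.finish();
                    tx = db.begin();
                }
            }
            tx.success(); // commit the final, possibly partial, batch
        } finally {
            // If work throws, success() was not called for this batch, so it rolls back.
            tx.finish();
        }
    }
}
```

With 25 items and a batch size of 10, this opens three transactions and commits three times (10, 10 and the final 5), and a failure part-way through only rolls back the current batch.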
Re: [Neo4j] How often are Spatial snapshots published?
Interestingly, if you look at the GitHub 'blame' for that file (see https://github.com/neo4j/neo4j-spatial/blame/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java), you find that all the findClosestEdges methods were added in October 2010. So if Nolan has a version older than that, then something weird is going on. He must have the very first version from September 2010, which is not compatible with any recent Neo4j, GeoTools or uDig. When I look at m2.neo4j.org I can see that the latest 0.6-SNAPSHOT is from May. So we do have a problem, but not one that takes us back to last September. Nolan, perhaps your pom.xml refers to an older neo4j-spatial? You should use 0.6-SNAPSHOT. And we will change that again soon (to 0.7), since we are making changes to the geoprocessing and indexing. On Fri, Jul 22, 2011 at 10:04 AM, Anders Nawroth and...@neotechnology.com wrote: Hi! The deployment seems to be broken at the moment; I'll look into that ASAP. /anders 2011-07-22 09:28, Peter Neubauer wrote: Nolan, the safest option is to build it yourself from GitHub; I will check the deployment. Is that OK for now? /peter On Fri, Jul 22, 2011 at 3:57 AM, Nolan Darilek no...@thewordnerd.info wrote: I'm looking at the Spatial sources from Git, and am seeing lots of versions of SpatialTopologyUtils.findClosestEdges that don't appear to be in the snapshot I'm downloading. For instance, public static ArrayList<PointResult> findClosestEdges(Point point, Layer layer) { doesn't appear to be in the snapshot build I have--that or my local cache is broken. Are these snapshots rebuilt regularly? Thanks.
Re: [Neo4j] Neo4j Spatial and gtype property
Actually we do allow multiple geometry types in the same layer, but some actions, like export to shapefile, will fail. We even test for this in TestDynamicLayers. You can use the gtype if you want, but it is specific to some GeometryEncoders, and might change in future releases. It would be better to get the layer's geometry encoder and use that. On Jul 27, 2011 6:04 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Christopher, What do you mean by 'allowed to use'? Yes, these properties are used to store the Geometry Type for a Layer and for geometry nodes. Sadly, you cannot have more than one Geometry type in a Layer due to the limitations of e.g. the GeoTools stack. Cheers, /peter neubauer On Wed, Jul 27, 2011 at 4:07 AM, Christopher Schmidt fakod...@googlemail.com wrote: Hi all, is it allowed to use the gtype property to get the geometry type numbers? (Which are defined in org.neo4j.gis.spatial.Constants) -- Christopher twitter: @fakod blog: http://blog.fakod.eu
Re: [Neo4j] Neo4j Spatial and gtype property
Yes. If you have performed a search and now have SpatialDatabaseRecord results, then that is the best method to use. On Thu, Jul 28, 2011 at 6:03 AM, Christopher Schmidt fakod...@googlemail.com wrote: So best is to use SpatialDatabaseRecord.getGeometry()? Christopher
Re: [Neo4j] neo4j spatial and postgis
Or if you want a command line import, try the Ruby gem 'neo4j-spatial.rb'. Once installed you can type: osm_import file.shp On Aug 13, 2011 10:33 AM, Andreas Wilhelm a...@kabelbw.de wrote: Hi, with the pgsql2shp tool you can dump your PostGIS database into a shapefile, and you should then be able to import it into Neo4j Spatial in the following way: String shpPath = SHP_DIR + File.separator + layerName; ShapefileImporter importer = new ShapefileImporter(graphDb(), new NullListener(), commitInterval); importer.importFile(shpPath, layerName); Best Regards Andreas On 12.08.2011 11:10, chen zhao wrote: Hi, I am very interested in Neo4j Spatial, but I do not know how to import spatial data. My data are stored in PostGIS. I read the documents http://wiki.neo4j.org/content/Spatial_Data_Storage and http://wiki.neo4j.org/content/Importing_and_Exporting_Spatial_Data, but I still do not know how to import data from PostGIS or import shapefiles. Could you provide some detailed information? Please advise. zhao
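The shpPath in Andreas's snippet is a base path without an extension, and a shapefile written by pgsql2shp is really a set of sibling files. A small pre-flight check like the one below can catch a missing component before the import starts. It is a hypothetical helper, not part of Neo4j Spatial; it checks only the three mandatory components of a shapefile (.shp, .shx, .dbf) and assumes lower-case extensions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class ShapefileCheck {
    // Mandatory members of an ESRI shapefile set; optional ones such as .prj are not checked.
    static final String[] REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"};

    // Returns the paths of mandatory component files that do not exist next to basePath,
    // where basePath has no extension (e.g. SHP_DIR + File.separator + layerName).
    static List<String> missingComponents(String basePath) {
        List<String> missing = new ArrayList<>();
        for (String ext : REQUIRED_EXTENSIONS) {
            File f = new File(basePath + ext);
            if (!f.isFile()) {
                missing.add(f.getPath());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String base = args.length > 0 ? args[0] : "shp" + File.separator + "roads";
        List<String> missing = missingComponents(base);
        if (missing.isEmpty()) {
            System.out.println("OK to import " + base);
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```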
Re: [Neo4j] Spatial query with property filter
I can elaborate a little on what Peter says. The DynamicLayer support is indeed the only way to do what you want right now, but I think it is actually quite a good fit for your use case. When defining a dynamic layer you are actually just defining a 'returnable evaluator', which will be applied to the nodes during the RTree spatial search. This means that the primary search is spatial, but for each leaf node (geometry) the dynamic layer query is applied as a filter. If you use CQL for the query, then all geometries are converted into JTS geometry classes for the filter (which adds a little overhead, so if the spatial query is not your limiting factor, this can affect performance). If you use JSON for the query, it is applied directly to the graph as a pattern match. So JSON should be faster, but it also requires that you know the structure of the graph, which the CQL approach does not. Peter's pointer to the TestDynamicLayers class is the best place to start for seeing how to use both the CQL and JSON filter syntaxes. On Mon, Aug 29, 2011 at 11:59 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Hi there, well, spatial querying is not something that can be easily stuck into an iterator. If you want more than casual querying, I think you need to use the GeoTools APIs; we provide support for CQL as a query language there, see https://github.com/neo4j/spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestDynamicLayers.java#L60 for some examples. Basically, you define a dynamic layer with a CQL query, which will return the subset of the full layer (e.g. a SimplePointLayer) that matches that query. Would that help? Cheers, /peter neubauer On Mon, Aug 29, 2011 at 1:37 AM, faffi obscurredbyclo...@gmail.com wrote: Hey guys, I'm seeing some kind of disconnect between spatial queries and regular graph traversal queries. I can't find a way of executing a spatial query like in SimplePointLayer while also providing something like a ReturnableEvaluator. My use case is essentially: for all nodes within a 10 km radius, return all with name foo. Do I actually have to iterate through all the nodes returned by the query in a list and check them individually? Thanks, faffi -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Spatial-query-with-property-filter-tp3291410p3291410.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
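faffi's use case (everything within 10 km that is named foo) maps onto the pattern Craig describes: a spatial search whose surviving leaf geometries are passed through a filter. The sketch below is plain Java, not the DynamicLayer API. A linear scan with a bounding-box test stands in for the RTree, a haversine check makes the radius exact on a spherical Earth, and a Predicate plays the role of the CQL/JSON filter. The Place class and method names are hypothetical, and longitude wrap-around at ±180° is not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class Place {
    final String name;
    final double lat, lon; // degrees
    Place(String name, double lat, double lon) { this.name = name; this.lat = lat; this.lon = lon; }
}

class FilteredSpatialSearch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, spherical model

    // Great-circle distance via the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    static List<Place> search(List<Place> all, double lat, double lon, double radiusKm,
                              Predicate<Place> filter) {
        // Bounding box of the search circle (stands in for the RTree envelope test).
        double angular = radiusKm / EARTH_RADIUS_KM;
        double dLat = Math.toDegrees(angular);
        double ratio = Math.sin(angular) / Math.cos(Math.toRadians(lat));
        double dLon = ratio >= 1 ? 180.0 : Math.toDegrees(Math.asin(ratio));
        List<Place> result = new ArrayList<>();
        for (Place p : all) {
            if (Math.abs(p.lat - lat) > dLat || Math.abs(p.lon - lon) > dLon) continue; // cheap reject
            if (haversineKm(lat, lon, p.lat, p.lon) > radiusKm) continue;           // exact radius
            if (filter.test(p)) result.add(p);                                      // attribute filter last
        }
        return result;
    }

    public static void main(String[] args) {
        List<Place> places = Arrays.asList(
                new Place("foo", 59.3300, 18.0700),  // central Stockholm
                new Place("bar", 59.3310, 18.0710),  // nearby, wrong name
                new Place("foo", 59.8586, 17.6389)); // Uppsala, well outside 10 km
        List<Place> hits = search(places, 59.3293, 18.0686, 10.0, p -> "foo".equals(p.name));
        System.out.println(hits.size()); // 1
    }
}
```

Ordering the checks from cheapest to most specific mirrors the idea that the attribute filter only runs on geometries that already passed the spatial test.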
Re: [Neo4j] Neo4j low-level data storage
I think Daniel's questions are very relevant, but not just to OSM. Any large graph (of which OSM is simply a good example) will be affected by fragmentation, and that can affect performance. I was recently hit by poor performance of GIS queries (not OSM) caused by fragmentation of the index tree. I will describe that problem below, but first let me describe my view on Daniel's question. It is true that if parts of the graph that are geographically close are also close on disk, bounding box queries will load faster. However, this is not a problem that is easy to solve in a generic way, because it requires knowledge of the domain. I can see two ways to create a less fragmented graph: - Have a de-fragmenting algorithm that reorganizes an existing graph according to some rules. This does not exist in Neo4j (yet?), but is probably easier to generalize, since it should be possible to first analyse the connectedness of the graph, and then defragment based on that. This means a reasonable solution might be possible without knowing the domain. - Be sure to load domain-specific data in the order you wish to query it. In other words, create a graph that is already de-fragmented. This second approach is the route I have started following (at least I've taken one or two tiny baby-steps in that direction, but plan for more). In the case of the OSM model produced by the OSMImporter in Neo4j-Spatial, we do not do much here. We are importing the data in the order it was created in the original Postgres database (i.e. in the order it was originally added to OpenStreetMap). However, since the XML format puts ways after all nodes, we actually also store all ways after all nodes, which means that loading any particular way completely from the database requires hitting disk in at least two very different locations: the location of the way node and the interconnects between the nodes, and the location(s) of the original location nodes.
This multiple hit will occur on the nodes, relationships and properties tables in a similar way. So I can also answer a question Daniel asked about the ids. Neo4j nodes, relationships and properties each have their own id space, so you can have node 1, relationship 1 and property 1. Let's consider a real example: a street made of 5 points, added early to OSM (so low ids in both Postgres and Neo4j). The OSM file will have these nodes near the top, but the way that connects them together will be near the bottom of the file. In Postgres the nodes and ways are in different tables, and will both be near the top. In Neo4j both osm-ways and osm-nodes are neo4j-nodes (in the same 'table'). The osm-nodes will have low ids, but the way will have a high id. Also we use proxy nodes to connect osm-ways to osm-nodes, and these will be created together with the way. So we will have 5 nodes with low ids, and 8 nodes with high ids (5 proxy nodes, 1 way node, 1 geometry node and 1 tags node). If the way was big and/or edited multiple times, we could get even higher fragmentation. Personally I think that fragmenting one geometry into a few specific locations is not a big problem for the Neo4j caches. However, when we are talking about a result set or traversal of thousands or hundreds of thousands of geometries, then doubling or tripling the number of disk hits due to fragmentation can definitely have a big impact. How can this fragmentation situation be improved? One idea is to load the data in two passes. The current loader is trying to optimize OSM import speed, which is difficult already (and slower than in an RDBMS due to the increased explicit structure), and so we have a single-pass loader, with a Lucene index for reconnecting ways to nodes.
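The id arithmetic in Craig's 5-point street example can be sketched as below. This is a simplified model, not how OSMImporter actually allocates ids: it assumes ids are handed out sequentially with no reuse, that ways do not share points, and that each way adds one proxy node per point plus a way node, a geometry node and a tags node (Craig's 5 + 1 + 1 + 1). It compares the id spread of each way when all points are loaded first (file order) with a domain-ordered load where each way's points and extra nodes are written together.

```java
class IdLayoutSketch {
    // Way node + geometry node + tags node, per Craig's example.
    static final int FIXED_NODES_PER_WAY = 3;

    // File order: every OSM point gets a low id first, then each way's
    // proxy/way/geometry/tags nodes get high ids. Returns max id - min id per way.
    static long[] fileOrderSpread(int ways, int pointsPerWay) {
        long nextId = 0;
        long[] firstPointId = new long[ways];
        for (int w = 0; w < ways; w++) {
            firstPointId[w] = nextId;
            nextId += pointsPerWay;
        }
        long[] spread = new long[ways];
        for (int w = 0; w < ways; w++) {
            long lastId = nextId + pointsPerWay + FIXED_NODES_PER_WAY - 1;
            spread[w] = lastId - firstPointId[w];
            nextId = lastId + 1;
        }
        return spread;
    }

    // Domain order: a way's points and its extra nodes are allocated back to back.
    static long[] groupedSpread(int ways, int pointsPerWay) {
        long nextId = 0;
        long[] spread = new long[ways];
        for (int w = 0; w < ways; w++) {
            long first = nextId;
            nextId += 2L * pointsPerWay + FIXED_NODES_PER_WAY; // points + proxies + 3 fixed nodes
            spread[w] = (nextId - 1) - first;
        }
        return spread;
    }

    public static void main(String[] args) {
        // 200 five-point streets: 1000 point nodes, then 8 extra nodes per street.
        long[] fileOrder = fileOrderSpread(200, 5);
        long[] grouped = groupedSpread(200, 5);
        System.out.println(fileOrder[0] + " vs " + grouped[0]);     // 1007 vs 12
        System.out.println(fileOrder[199] + " vs " + grouped[199]); // 1604 vs 12
    }
}
```

In file order the spread grows with the size of the file, while the grouped layout keeps every way within 13 consecutive ids.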
However, I think we could change this to a two-pass loader, with the first pass reading and indexing the point nodes into a unique id-index (for fast lookup by Postgres id), and the second pass connecting the ways and storing both the nodes and ways to the database at the same time, in contiguous disk space. This would improve query performance, and if we make a good unique-id index faster than Lucene, we will actually also improve import speed .. :-) Now, all of the above does not answer the original question regarding bounding box queries. All we will have done with this is improve the load time for complete OSM geometries (by reducing geometry fragmentation). But what about the index itself? We are storing the index as part of the graph. Today, Neo4j-spatial uses an RTree index that is created at the end of the load in OSMImporter. This means we load the complete OSM file, and then we index it. This is a good idea because it will store the entire RTree in contiguous disk space. Sort of: there is one issue with the RTree node splitting that will cause slight fragmentation, but I think it is not too serious. Now when performing bounding box queries, the main work done by the RTree will hit the minimum amount of disk space, until
Re: [Neo4j] Neo4j in GIS Applications
Hi all, I am certainly behind on my emails, but I did just answer a related question about OSM and fragmentation, and I think that might have answered some of Daniel's questions. But I can say a little more about OSM and Neo4j here, specifically about the issue of joins in Postgres. Let me start by describing where I think Postgres might be faster than Neo4j, and then move on to where Neo4j is faster than Postgres. Importing OSM data into Postgres will be faster than into Neo4j because the foreign keys are simple integer references between tables and are indexed using Postgres's high-performance indexes. In Neo4j the relationships are much more detailed, explicit, bi-directional references taking more disk space (but no index space). The disk write time is longer (more data written), but the advantage of not having an index makes it worthwhile. So that leads naturally to where Neo4j is faster. The reason there is no index on the foreign key is that there is no need for one. Each relationship contains the id of the node it points to (and points from), and that id is directly mapped to the location on disk of the node itself. So this is more like an array lookup, because all nodes are the same size on disk. So the 'join' you perform when traversing from one osm-node to another is extremely fast, but more importantly it is not affected by database size. It is O(1) in performance! Fantastic! In an RDBMS, the need for an index on the foreign key means you are building a tree structure to get the join down from O(N) to O(log N) or something better, but never as good as O(1). In neo4j-spatial, if you perform a bounding box query, you are traversing an RTree, which does not exist in Postgres, but does exist in PostGIS. In both Neo4j-Spatial and PostGIS you are working with a tree index that will slow things down if there is a lot of data, and currently the PostGIS RTree is better optimized than the neo4j-spatial RTree.
But if you are performing more graph-like processing, for example proximity searches or routing analysis, then you will get the full O(1) benefits of the graph database, and there is no way Postgres can match that :-) OK. Lots of hype, but I get enthusiastic sometimes. Take anything I say with a pinch of salt. Believe the parts that make sense to you, and try some tests otherwise. It would be great to hear about your experiences modeling OSM in Neo4j versus Postgres. Regards, Craig On Tue, Oct 4, 2011 at 7:18 PM, Andreas Kollegger andreas.kolleg...@neotechnology.com wrote: Hi Daniel, If you haven't yet, you should check out the work done in the Neo4j Spatial project - https://github.com/neo4j/spatial - which has fairly comprehensive support for GIS. Data locality, as you mention, is exactly a big advantage of using a graph for geospatial data. Take a look at the Neo4j Spatial project and let us know what you think. Best, Andreas On Tue, Oct 4, 2011 at 9:58 AM, danielb danielbercht...@gmail.com wrote: Hello everyone, I am going to write my master's thesis about the suitability of graph databases in GIS applications (at least I hope so^^). The database has to provide topological queries, network analysis and the ability to store large amounts of map data for viewing - all based on OSM data of Germany ( 100M nodes). Most likely I will compare Neo4j to PostGIS. As a starting point I would like to know why you would recommend Neo4j for the job. What are the main advantages of a graph database compared to an (object-)relational database in the GIS environment? The main focus and the goal of this work should be to show a performance improvement over relational databases. In a student project (OSM navigation system) we worked with relational (SQLite) and object-oriented (Perst) databases on netbook hardware and embedded systems.
The relational database approach showed us two problems: if you transfer the OSM model directly into tables, then you have a lot of joins, which slows everything down (and lots of redundancy when using different tables for each zoom level). The other way is to store as much as possible in one big (sparse) table. But I guess this would also have some performance issues, and from a design perspective it is not a nice solution. The object-oriented database also suffered from many random reads when loading a bounding box. In addition, we could not say in detail how the data was stored. The performance did increase once caching kicked in, or with SSD hardware. You can also store everything in RAM (money does the job), but for now you have to assume that all of the data has to be read from a slow disk the first time. Can Neo4j be configured to read, for example, a bounding box of OSM data from disk in an efficient way (data locality)? Maybe you also have some suggestions on where I should look in this work and what can be improved in Neo4j to get better results. I would also appreciate related papers.
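Craig's O(1) argument depends on fixed-size records: if every node record has the same size, a node id is effectively a file offset. The sketch below illustrates that idea with an in-memory ByteBuffer and a hypothetical 16-byte record (Neo4j's real record layouts differ). Next to it, a TreeMap stands in for the tree-structured foreign-key index of an RDBMS, where each lookup costs O(log N).

```java
import java.nio.ByteBuffer;
import java.util.TreeMap;

class FixedRecordStore {
    // Hypothetical layout: 8 bytes "first relationship id" + 8 bytes payload.
    static final int RECORD_SIZE = 16;
    private final ByteBuffer store;

    FixedRecordStore(int capacity) {
        store = ByteBuffer.allocate(capacity * RECORD_SIZE);
    }

    // The id *is* the address: offset = id * RECORD_SIZE, no index consulted.
    static int offsetOf(long id) {
        return Math.toIntExact(id * RECORD_SIZE);
    }

    void write(long id, long firstRelId, long payload) {
        int offset = offsetOf(id);
        store.putLong(offset, firstRelId);
        store.putLong(offset + 8, payload);
    }

    // O(1) regardless of how many records exist.
    long payload(long id) {
        return store.getLong(offsetOf(id) + 8);
    }
}

class IndexedTable {
    // Stand-in for a foreign-key index: key -> row payload, O(log N) per lookup.
    private final TreeMap<Long, Long> index = new TreeMap<>();

    void put(long key, long payload) { index.put(key, payload); }

    long payload(long key) { return index.get(key); }
}

class LookupDemo {
    public static void main(String[] args) {
        FixedRecordStore nodes = new FixedRecordStore(100);
        IndexedTable table = new IndexedTable();
        nodes.write(42, 7, 99);
        table.put(42, 99);
        System.out.println(FixedRecordStore.offsetOf(42));                // 672
        System.out.println(nodes.payload(42) + " " + table.payload(42)); // 99 99
    }
}
```

Both lookups return the same value; the difference is that the record store computes the location directly, while the indexed table has to walk a tree whose depth grows with the number of rows.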
Re: [Neo4j] Problem Installing Spatial (Beginner)
Sorry for such a late response, I missed this mail. I must first point out that it seems you are trying to use Neo4j-Spatial in the standalone server version of Neo4j. That is possible, but not well supported. We have only exposed a few of the functions in the server, and do not test it regularly. The main way we are using neo4j-spatial at the moment is in the embedded version of Neo4j. This is where the Maven instructions come in, because they assume you are writing a Java application that will embed the database. If you are writing a Java application, and you can start using Maven, then everything should be easy to get working. However, since I am relatively sure you are using neo4j-server, I think you are getting into deep water. We need to improve our support for neo4j-server before I can recommend you try it. The next release, 0.7, is focusing on geoprocessing features, and then we hope to expose this in neo4j-server in 0.8. Hopefully then things will be much easier for you. On Tue, Sep 27, 2011 at 5:24 PM, handloomweaver a...@atomised.coop wrote: Hi, I wonder if someone would be so kind as to help. I'm new to Neo4j and was trying to install Neo4j Spatial to try out its GIS features. I need to be clear that I have no experience of Java Maven, so I'm struggling a bit. I want to install Neo4j Spatial once somewhere on my 4GB MacBook Pro. I have no problem downloading the Neo4j Java binary and starting it. But I'm confused about the Spatial library. The GitHub page says to either use Maven or copy a zip file into a folder in Neo4j. Is the zip file the GitHub repository contents or something else? I've tried the Maven way (mvn install) described on GitHub, but firstly I'm confused about if/where Neo4j is being installed (does it install it first, and where?), and anyway the install fails. It seems to be a JVM heap memory problem? Why is it failing? How can I make it not fail? Is there a config file somewhere that needs tweaking?
http://handloomweaver.s3.amazonaws.com/Terminal_Output.txt http://handloomweaver.s3.amazonaws.com/surefire-reports.zip I'm really keen to use Neo4j Spatial, but the barrier to entry for the less technical GIS developer is proving too high for me! I'd SO appreciate some help/pointers. I apologise for posting such a NOOB question on your forum, but I've exhausted Google searches. Thanks -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Problem-Installing-Spatial-Beginner-tp3372924p3372924.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] osm_import.rb
Hi, Sorry for a late contribution to this discussion. I will try to make a few comments to cover the various mails above. Firstly, the neo4j-spatial.rb gem at version 0.0.8 on RubyGems works with Neo4j-Spatial 0.6, which does include the non-batch inserter code, so in principle it should work for you. However, one line of the Ruby code needs to change to make it use the normal graph API instead of the batch inserter. I will commit this change later, but for now you would change line 118 of osm.rb (see https://github.com/craigtaverner/neo4j-spatial.rb/blob/master/lib/neo4j/spatial/osm.rb#L118) to instead look like:
    #@importer.import_file batch_inserter, @osm_path
    @importer.import_file normal_database, @osm_path, false, 5000
(basically you replace 'batch_inserter' with 'normal_database' and add the two extra parameters 'false, 5000'). Looking at the errors you are getting, I see they are, as you suspected, related to out-of-date instructions. I will try to get round to updating the instructions soon, but in the meantime: - For using the Ruby gem, you should use the osm_import command (added automatically to your path when you install the gem). So you can replace the command 'jruby -S examples/osm_import.rb' with just 'osm_import'. - When using the code directly from GitHub, there is a jar missing in the lib/neo4j/spatial/jars directory. This is the neo4j-spatial-0.6-SNAPSHOT.jar, which can be downloaded and copied into that directory manually. The direct link to this file on the m2.neo4j.org site is http://m2.neo4j.org/org/neo4j/neo4j-spatial/0.6-SNAPSHOT/neo4j-spatial-0.6-SNAPSHOT.jar Your last comment about 'includePoints': this is just a setting for whether or not to use all OSM points as individual geometries. The default is false because you normally do not want to be able to search for all points on a long road, but for the road itself. I recommend leaving this as false, unless you have a specific need.
Regards, Craig On Thu, Nov 10, 2011 at 2:51 PM, grimace macegh...@gmail.com wrote: I ended up trying again with just Java (but still running with batchInserter), adjusting my memory settings and max heap; it's currently working on the americas.osm file from CloudMade - http://downloads.cloudmade.com/americas#downloads_breadcrumbs. The file is about 99 GB when assembled. I'm running on Ubuntu 11.10, Core 2 Duo 2.Ghz with 4G RAM (not very fast, but what I have available right now). Java heap: -Xmx3072M. Config settings:
    neostore.nodestore.db.mapped_memory=1000M
    neostore.relationshipstore.db.mapped_memory=300M
    neostore.propertystore.db.mapped_memory=400M
    neostore.propertystore.db.strings.mapped_memory=800M
    neostore.propertystore.db.arrays.mapped_memory=100M
My code is essentially from the test suite that you suggested, but I am using the batchImporter instead. I'm about 1/3 of the way through and don't want to interrupt the process, but when it's done I'll try it without the batch importer. It runs at about 4500 nodes/second. Is that reasonable? I haven't looked at performance numbers from anyone else. Would the non-batch performance be better? Is it better to 'includePoints' or not? One question I had was: once I get this imported via this method (neo4j embedded), is it possible to move the imported db to a neo4j server? I'm hoping it is. If so, what would that process be? -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/osm-import-rb-tp3493463p3496760.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
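To put grimace's 4500 nodes/second in perspective, here is a back-of-the-envelope estimate. It assumes a constant rate, which is a simplification, and it counts OSM nodes only, so ways, relations and the index build come on top. The node counts are taken from this archive: 100M for Germany from Daniel's thesis question, and 651M from grimace's later planet attempt.

```java
class ImportEstimate {
    // Hours needed to import `nodes` OSM nodes at a constant `nodesPerSecond`.
    static double hours(long nodes, double nodesPerSecond) {
        return nodes / nodesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        System.out.printf("Germany (100M nodes): %.1f h%n", hours(100_000_000L, 4500));
        System.out.printf("651M nodes:           %.1f h%n", hours(651_000_000L, 4500));
    }
}
```

That works out to roughly 6.2 hours and 40.2 hours respectively, for the node pass alone.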
Re: [Neo4j] OSMImporter: Is there a way to do incremental imports?
I did some initial work on incremental imports back in 2010, but stopped due to some complications: - We needed to mix Lucene reads and writes during the import (a read to check whether the node already exists, so we don't import it twice), and this performs very badly in the batch inserter. We decided to first code a non-batch insert mode before restarting the incremental import work. Peter and I did code a non-batch importer in early 2011, but never went back to complete the incremental import. - We wanted to support both the case of importing multiple OSM files that could be stitched together by resolving overlaps, and the case of applying changesets to the existing OSM model. This increased the complexity of the work just enough to ensure it got dropped. In early 2011 we also added support for changesets in the model (but only as a data structure, not in terms of importing changesets). So we are one step closer to this also. Since we now have non-batch importing and changeset data structures, there is an opportunity to restart the work on incremental imports and changeset imports. It should not be too hard. For incremental imports, stitching OSM files together, we would re-activate the old code that checks the Lucene index before adding nodes and relations. There might be some subtle edge cases to consider, but a set of tests with overlapping and non-overlapping OSM files should flush them out. For applying changesets, more thinking is still required. Do we want to support history in the model, or only the latest version? Should we verify that only newer changesets are applied, and in the right order, or rely on the user to get it right? I can say that we did some thinking this summer on the data structures required to support a complete change history. This relies on the fact that we already support multiple possible ways on the same nodes, so we can also, in principle, support multiple possible 'versions' of ways on the same nodes.
More thinking is required, but I have a suspicion that we should actually go ahead and do this properly with full history, because that might be the only way to make sure the user never messes things up by importing in the wrong order. On Tue, Nov 22, 2011 at 9:58 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Gregory, incremental loads (and thus, restarts of OSM imports) are a feature we want to add later on, but it's not in there yet. This would also mean we could stitch in other areas on demand, and support submitting changesets back to OSM or at least capturing them, so that you, as an OSM-based app, can contribute to OSM automagically. I know it's much to ask, but help here would be greatly appreciated. I hope to lab with Michael Hunger on import of data into OSM (and others) this Friday and hope to get somewhere :) Cheers, /peter neubauer On Tue, Nov 22, 2011 at 7:15 AM, grimace macegh...@gmail.com wrote: I've been playing with OSMImporter; I've tried both batch and native Java. I've had mixed success trying to import the planet, but since it's of considerable size, the job usually blows up or grinds to a halt about halfway. I think the most I've made it to is 651M nodes, and that's not even the ways or relations. I just don't know enough about it and thought I would ask before I try to dive into it: what would I have to do so that I could restart the job (where it left off) when it blows up? -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/OSMImporter-Is-there-a-way-to-do-incremental-imports-tp3526941p3526941.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
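Craig's reply contains two ideas: check for an existing node before creating it when stitching overlapping files, and keep the full way history so that the import order cannot corrupt the result. Both can be sketched in plain Java. A HashMap stands in for the Lucene id index, and every class and method name here is hypothetical rather than OSMImporter code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class IncrementalOsmStore {
    // OSM node id -> internal node id (plays the role of the Lucene lookup).
    private final Map<Long, Long> osmToInternal = new HashMap<>();
    // OSM way id -> (version -> member OSM node ids): full history, nothing overwritten.
    private final Map<Long, TreeMap<Integer, long[]>> wayHistory = new HashMap<>();
    private long nextInternalId = 0;

    // "Check before create": an OSM node present in two overlapping files is created once.
    long getOrCreateNode(long osmId) {
        Long existing = osmToInternal.get(osmId);
        if (existing != null) {
            return existing;
        }
        long id = nextInternalId++;
        osmToInternal.put(osmId, id);
        return id;
    }

    // Every version is kept, so applying changesets out of order loses nothing.
    void applyWayVersion(long wayId, int version, long[] osmNodeIds) {
        for (long n : osmNodeIds) {
            getOrCreateNode(n);
        }
        wayHistory.computeIfAbsent(wayId, k -> new TreeMap<>()).put(version, osmNodeIds.clone());
    }

    // "Latest" is simply the highest version seen, whatever the arrival order.
    long[] latestWay(long wayId) {
        TreeMap<Integer, long[]> versions = wayHistory.get(wayId);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    int nodeCount() {
        return osmToInternal.size();
    }

    public static void main(String[] args) {
        IncrementalOsmStore store = new IncrementalOsmStore();
        store.applyWayVersion(10, 2, new long[]{1, 2, 3}); // file A (newer version)
        store.applyWayVersion(10, 1, new long[]{1, 2});    // file B, older version arrives late
        store.applyWayVersion(11, 1, new long[]{3, 4});    // overlaps file A at node 3
        System.out.println(store.nodeCount());             // 4
        System.out.println(store.latestWay(10).length);    // 3
    }
}
```

Because nothing is overwritten, the result is the same whichever order the files or changesets arrive in, which is the property Craig is after.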
Re: [Neo4j] possibility to merge some neo4j databases?
There are two approaches I can think of: - Use a better index for mapping ids. Lucene is too slow. In-memory hashtables are memory-bound. Peter has been investigating alternative DBs like BDB. I tried, but did not finish, a hashmap of cached arrays, and Chris wrote his big data import project on GitHub, which is a hashmap of cached hashmaps. There are many promising solutions, but none is complete yet. All target the general case of id mapping. - For this specific case, merging small databases, I had an idea a couple of years ago which I still think will work: bulk appending entire databases, by offsetting all internal ids by the current max id. I remember the reason Johan did not like this idea was that it suffered from the same flaws as the batch inserter: locking the entire db, no rollback, and the risk of corrupting the entire db. For people happy with the batch inserter, perhaps this is still an option, but it is unlikely to get prioritized by the Neo team because of the corruption risks. It would, however, perform spectacularly well, since the id map is a trivial function. Personally I hope someone completes Chris's persistent hashmap or a similar solution. Id maps are a recurring theme and would be very valuable. On Nov 29, 2011 12:07 PM, osallou olivier.sal...@gmail.com wrote: Hi, I need to batch insert millions of data items into Neo4j. It is quite difficult to keep everything in a Map to get node ids, so it needs frequent index lookups to get the node ids for relationships, and performance is quite low. Is there any way to build several Neo4j databases (independently) and then merge them? (I could build many small dbs in parallel.) Thanks Olivier -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/possibility-to-merge-some-neo4j-databases-tp3544694p3544694.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
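Craig's bulk-append idea reduces the id map to a single addition. The sketch below shows it on a toy in-memory store (a hypothetical SmallDb class, not the Neo4j store format). Every node id and relationship endpoint id in the appended database is shifted by the target's current node count, which equals its max id + 1 when ids are dense and start at 0. It only works when the databases being merged have no relationships between them, and it shares the batch-inserter caveats Craig mentions.

```java
import java.util.ArrayList;
import java.util.List;

class SmallDb {
    final List<String> nodes = new ArrayList<>(); // node id == list index
    final List<long[]> rels = new ArrayList<>();  // {startNodeId, endNodeId}

    long addNode(String name) {
        nodes.add(name);
        return nodes.size() - 1;
    }

    void addRel(long start, long end) {
        rels.add(new long[]{start, end});
    }
}

class OffsetMerge {
    // Appends src to dst. With dense 0-based ids, dst's max id + 1 == dst.nodes.size(),
    // so that is the offset applied to every id coming from src.
    static void append(SmallDb dst, SmallDb src) {
        long offset = dst.nodes.size();
        dst.nodes.addAll(src.nodes);
        for (long[] r : src.rels) {
            dst.rels.add(new long[]{r[0] + offset, r[1] + offset});
        }
    }

    public static void main(String[] args) {
        SmallDb a = new SmallDb();
        a.addRel(a.addNode("x"), a.addNode("y"));
        SmallDb b = new SmallDb();
        long p = b.addNode("p");
        long q = b.addNode("q");
        b.addRel(q, p);
        append(a, b);
        long[] moved = a.rels.get(1);
        System.out.println(a.nodes.get((int) moved[0]) + " -> " + a.nodes.get((int) moved[1])); // q -> p
    }
}
```

No lookup structure is needed at all: the mapping from old id to new id is just `id + offset`, which is why Craig expects it to perform so well.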
Re: [Neo4j] Contributors section in the manual
What is the sort order? Date of first commit, number of lines, commits, packages? On Nov 21, 2011 2:35 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Everyone, I have started to put some random people in, see http://docs.neo4j.org/chunked/snapshot/contributors.html . Any ideas on what more info to provide here, or how to make this nicer? Cheers, /peter neubauer On Sun, Nov 13, 2011 at 10:42 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: To start with, the manual is for the direct codebase that is part of the distribution. The next step is to include sections and pointers to other stable related projects and drivers. Does that sound reasonable? On Nov 13, 2011 1:36 AM, Nigel Small ni...@nigelsmall.name wrote: Are you looking for info on associated projects like py2neo, or direct contributions to the main code base? On a side note, I've been getting quite a few hits on my blog post on pagination in Neo4j. The bits I wrote for that are all Python/py2neo again, but that or something similar might be worth including somewhere on the Neo site, as it appears to be a reasonably sought-after topic. Cheers Nigel Small Phone: +44 7814 638 246 Blog: http://nigelsmall.name/ GTalk: ni...@nigelsmall.name MSN: nasm...@live.co.uk Skype: technige Twitter: @technige https://twitter.com/#!/technige LinkedIn: http://uk.linkedin.com/in/nigelsmall On 12 November 2011 20:40, Peter Neubauer peter.neuba...@neotechnology.com wrote: Hi guys, I would love to add a section on contributors to the Neo4j Manual, in http://docs.neo4j.org/chunked/snapshot/community.html, so that all of you who participate in the process can be found in there.
Do you have any suggestions on how to present this, that is - what info, links and maybe short presentation snippets and pictures? A graph to components, or simply a table? Thoughts? Cheers, /peter neubauer
Re: [Neo4j] OSMImporter: Is there a way to do incremental imports?
There was only a method ending in 'WithCheck', or something like that, lying unused in the code from last year. Nothing more than that, apart from the thinking about it, which is why I wrote the previous mail. On Dec 2, 2011 12:50 PM, Peter Neubauer pe...@neubauer.se wrote: Not sure, Craig, do you have the code somewhere? /peter On Tue, Nov 22, 2011 at 4:17 PM, grimace macegh...@gmail.com wrote: Thanks for the response(s)! The hardware I'm testing on is not the best and has only 4G of RAM, so I'm limited, but this seems the best opportunity for me to learn this... that being said... For incremental imports, stitching osm files together, we re-activate the old code that tests the lucene index before adding nodes and relations. There might be some subtle edge cases to consider, but a set of tests with overlapping and non-overlapping osm files should flush them out. I'd love to play with this. Is the old code there for me to re-enable in testing? Or can you point me to where this might be put in? Thx, Greg -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/OSMImporter-Is-there-a-way-to-do-incremental-imports-tp3526941p3527995.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Are graphs ok for lots of Event data
Of course the graph can be used for processing event data; whether it works well for your case depends on the details. But we have used it for this, and I can discuss a few points. The event stream is obviously just a linear chain and can be modeled as such in the graph (e.g. with NEXT relationships between event nodes). However, this does not bring much advantage over the original flat file, which already has an implicit 'next' (the next line, assuming the file is time ordered). You could instead use a TimeLineIndex to manage the order, and then you would have an advantage over unordered original data. Durations between events can be new nodes with START and END relationships to the individual events, with the time difference optionally added as a property of the duration node. One nice thing about the graph is that you can keep adding data and structure as you go, sometimes much later. So the extra information you asked about, such as the server and the number of items processed, can be added later, at your convenience. When grouping events together and computing statistics, some values can be maintained incrementally, like max/min/count/total. But percentiles are not so trivial. Consider the case where you want statistics for each hour of events. If you have an hour node connected to all event nodes in that hour, you can update the max/min/count/total values as new event data enters the database. But a percentile can only be calculated once all events for the hour have arrived. This can be handled at the application level. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
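A minimal sketch of the hour-node statistics, assuming each event contributes one numeric value (for example a duration in a fixed unit). `HourStats` is a hypothetical stand-in for the properties of an hour node: count, total, min and max are updated as each event arrives, while the percentile is refused until the hour is explicitly closed, mirroring the point that it needs all events for that hour. The in-memory list plays the role of the events reachable from the hour node, and the nearest-rank percentile is just one of several common definitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the statistics stored on an "hour" node.
final class HourStats {
    private long count = 0;
    private double total = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    // In the graph these would be the event nodes connected to the hour node.
    private final List<Double> values = new ArrayList<>();
    private boolean closed = false;

    /** Incremental update: cheap to apply as each new event enters the database. */
    void add(double value) {
        if (closed) throw new IllegalStateException("hour already closed");
        count++;
        total += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
        values.add(value);
    }

    long count() { return count; }
    double total() { return total; }
    double min() { return min; }
    double max() { return max; }
    double mean() { return count == 0 ? Double.NaN : total / count; }

    /** Marks the hour as complete, which makes percentiles meaningful. */
    void close() {
        closed = true;
        Collections.sort(values);
    }

    /** Nearest-rank percentile for p in (0, 100]; only valid after close(). */
    double percentile(double p) {
        if (!closed) throw new IllegalStateException("percentile needs all events; call close() first");
        if (values.isEmpty()) return Double.NaN;
        int rank = (int) Math.ceil(p * values.size() / 100.0);
        return values.get(Math.max(rank, 1) - 1);
    }
}
```

Here the application decides when an hour is complete and calls `close()`; in the graph that step could be a batch job that walks the hour node's events and writes the percentile back as a property.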
Re: [Neo4j] path finding using OSM ways
We do indeed have twice the node count (and twice the relationship count). This is a necessary side effect of the fact that an OSM node can participate in more than one way (at intersections, as well as on shared edges of polygons, etc.). In addition, with shared edges the direction can be reversed from one way to the other, so we need a completely separate set of nodes and relationships to model one way versus the other. We have considered a compacted version of the graph where we only use the extra nodes and relationships when they are needed, but the code to decide when they are needed, or to convert the subgraph to the expanded version when needed (i.e. when a new joined way is loaded), would be much more complex, and therefore susceptible to bugs. We chose a cleaner, simpler code base over a more complex but more compact graph. Now we also want to model historical changes. It appears that the use of multiple nodes/relationships will also allow us to model this, so it is a good thing (tm) :-) For routing, I would create a set of relationships directly connecting all nodes that are intersection points, ignoring all the nodes along the way. We can add edge weights to these new relationships for the distance traveled, or other appropriate weighting factors (type of road, possible speed, hindrances, etc.). This graph would be ideal for routing calculations. The main OSM graph is not ideal for routing, but is designed to be a true and accurate reflection of the original OSM data and topology stored in the OpenStreetMap database. With Neo4j we can do both :-) These routing relationships have not been added to the current OSM model in neo4j-spatial, but would be relatively trivial to add (if we ignore advanced concepts like turn restrictions). They could be added by the OSMImporter code that identifies intersections, with only a few lines of extra code (I think ;-) On 12/6/11, danielb danielbercht...@gmail.com wrote: craig.taverner wrote ...
- Create a way-point node for these ... Hi all, I wonder why we should add extra nodes to the graph (if I understand Craig correctly)? Wouldn't you then end up expanding twice as many nodes (the way-point nodes and the OSM nodes themselves), because in every expansion you have to query the OSM id (or any other identifying value) of the end node, and its lat/lon if you don't have precompiled edge weights? I would just connect the OSM nodes directly with new edges to form a routing subgraph. Best Regards, Daniel -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-path-finding-using-OSM-ways-tp3004328p3564688.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
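A sketch of the routing overlay discussed in this thread, written as plain Java rather than against the Neo4j API. Ways are given as ordered lists of OSM node ids; a node counts as an intersection if more than one way references it, and consecutive intersections (or way end points) along each way are joined by a single edge weighted by the summed haversine length of the segments in between. Following Daniel's suggestion, the edges connect the OSM node ids directly instead of introducing separate way-point nodes. A* then runs over this much smaller graph, with straight-line distance as the heuristic. `RoutingOverlay`, the in-memory maps and the 6,371 km mean Earth radius are assumptions for illustration; one-way streets, road types and turn restrictions are ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical routing overlay over OSM node ids (not a neo4j-spatial class).
final class RoutingOverlay {
    record Edge(long to, double meters) {}

    private static final double EARTH_RADIUS_M = 6_371_000; // mean Earth radius (assumption)

    final Map<Long, double[]> coords;                  // OSM node id -> {lat, lon} in degrees
    final Map<Long, List<Edge>> adj = new HashMap<>(); // overlay adjacency list

    RoutingOverlay(Map<Long, double[]> coords, List<List<Long>> ways) {
        this.coords = coords;
        // Count way references per OSM node; more than one means a shared (intersection) node.
        Map<Long, Integer> uses = new HashMap<>();
        for (List<Long> way : ways) {
            for (long id : way) uses.merge(id, 1, Integer::sum);
        }
        for (List<Long> way : ways) {
            long from = way.get(0);
            double acc = 0; // length accumulated since the last overlay node on this way
            for (int i = 1; i < way.size(); i++) {
                long id = way.get(i);
                acc += haversine(way.get(i - 1), id);
                boolean endOfWay = i == way.size() - 1;
                if (endOfWay || uses.get(id) > 1) {
                    addBoth(from, id, acc); // one edge replaces all the nodes in between
                    from = id;
                    acc = 0;
                }
            }
        }
    }

    private void addBoth(long a, long b, double meters) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(new Edge(b, meters));
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(new Edge(a, meters));
    }

    /** Great-circle distance in meters between two OSM nodes. */
    double haversine(long a, long b) {
        double[] p = coords.get(a), q = coords.get(b);
        double dLat = Math.toRadians(q[0] - p[0]);
        double dLon = Math.toRadians(q[1] - p[1]);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(p[0])) * Math.cos(Math.toRadians(q[0]))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
    }

    /** A* shortest distance between two overlay nodes; POSITIVE_INFINITY if unreachable. */
    double shortest(long start, long goal) {
        record Entry(long id, double g, double f) {}
        Map<Long, Double> best = new HashMap<>();
        PriorityQueue<Entry> open = new PriorityQueue<>(Comparator.comparingDouble(Entry::f));
        best.put(start, 0.0);
        open.add(new Entry(start, 0.0, haversine(start, goal)));
        while (!open.isEmpty()) {
            Entry cur = open.poll();
            if (cur.id() == goal) return cur.g();
            if (cur.g() > best.getOrDefault(cur.id(), Double.POSITIVE_INFINITY)) continue; // stale entry
            for (Edge e : adj.getOrDefault(cur.id(), List.of())) {
                double ng = cur.g() + e.meters();
                if (ng < best.getOrDefault(e.to(), Double.POSITIVE_INFINITY)) {
                    best.put(e.to(), ng);
                    open.add(new Entry(e.to(), ng, ng + haversine(e.to(), goal)));
                }
            }
        }
        return Double.POSITIVE_INFINITY;
    }
}
```

Because each edge weight is at least the straight-line distance between its end points, the haversine heuristic never overestimates, so A* returns the same distance Dijkstra would. Start and goal must be overlay nodes (intersections or way end points); other weighting factors such as road type or speed would simply change `meters` into a cost, at which point the heuristic has to be scaled so it still never overestimates.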
Re: [Neo4j] Feedback requested: Major wish list item for Neo4J
I definitely second this suggestion. We have recently been working on a binary store for dense data that we would like to access as if it were properties of nodes. Right now we have properties that are references to files on disk, and we handle the binary data ourselves, but this does not benefit from any of the transactional guarantees. Rick's suggestion of a pluggable store would suit us very well, because I presume Neo4j would specify the interface/API for implementing such a store in a way that could be handled atomically within transactions, and then we could satisfy that with our own store. On Wed, Dec 7, 2011 at 3:43 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: One area where I would love to see the Neo4J team focus some energy is in the efficient storage and retrieval of blob/large text properties. Similar to the indexing strategy in Neo4J, it would be nice if this was pluggable (and it could depend on some other data store more optimized for blob/clob properties). The keys for this to be successful are: - Transacted - Does not store these properties in memory except when accessed (and then, perhaps offer a getPropertyAsStream method and a setPropertyFromStream method for optimal performance) - Transparent - should just work. Nice to haves, but not at all required in the first iteration: - Pluggable (store in Neo4J native, filesystem, EC2 simple storage, etc.) Addition of these capabilities would move Neo4J into a dramatically expanded realm of potential applications, some of which are quite mind blowing, both in the social realm and in the enterprise realm. Feedback welcomed! ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
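To make the requested contract concrete, here is a hypothetical Java shape for such a pluggable store. None of these types exist in Neo4j: `BlobStore` and `InMemoryBlobStore` are illustrative names, and only `getPropertyAsStream`/`setPropertyFromStream` come from the wish list above. Writes are staged and become visible only on `commit()`, while `rollback()` discards them. The in-memory implementation exists purely to show those semantics; a real store would stream to disk or an external service instead of buffering whole values.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical plug-in contract for transactional blob properties (not a Neo4j API).
interface BlobStore {
    /** Opens a stream over a committed blob property, or empty if none exists. */
    Optional<InputStream> getPropertyAsStream(long nodeId, String key) throws IOException;

    /** Stages a write; it must not become visible to readers until commit(). */
    void setPropertyFromStream(long nodeId, String key, InputStream data) throws IOException;

    void commit();

    void rollback();
}

// Toy implementation showing only the commit/rollback visibility rules.
final class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> committed = new HashMap<>();
    private final Map<String, byte[]> staged = new HashMap<>();

    private static String id(long nodeId, String key) {
        return nodeId + "/" + key;
    }

    @Override
    public Optional<InputStream> getPropertyAsStream(long nodeId, String key) {
        byte[] bytes = committed.get(id(nodeId, key)); // readers only see committed data
        return bytes == null ? Optional.empty() : Optional.of(new ByteArrayInputStream(bytes));
    }

    @Override
    public void setPropertyFromStream(long nodeId, String key, InputStream data) throws IOException {
        staged.put(id(nodeId, key), data.readAllBytes()); // buffered here; a real store would spill to disk
    }

    @Override
    public void commit() {
        committed.putAll(staged);
        staged.clear();
    }

    @Override
    public void rollback() {
        staged.clear();
    }
}
```

In a real integration, `commit()` and `rollback()` would be driven by the Neo4j transaction rather than called by the application, which is exactly the part that would need an interface defined by Neo4j.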