Re: [Neo4j] path finding using OSM ways

2011-05-31 Thread Craig Taverner
Hi Bryce,

Nice to see you back.

The OSM data model in Neo4j-Spatial, created by the OSMImporter, is designed
to mirror the complete contents of the XML files provided for OSM. As it
stands, this is not ideal for routing because it traces the complete set of
nodes for each way, while for routing you really want a graph that connects
each pair of adjacent way-points by a single relationship. So, if I were to
perform routing on top of the OSM model, I would build an overlay graph that
just connects the way-points. The current model has a vertex called a 'way',
but that is not a way-point, because it represents the entire way (e.g. a
street). We would need to do the following:

   - Identify ways that are streets (as opposed to non-routing types like
   regions, buildings, lakes, etc.)
   - Identify the points that are intersections (way-points)
   - Create a way-point node for these
   - Add relationships between way points if they are connected by streets
   in the OSM model
   - Weight the relationships by the length of the streets
   - Then apply the A* algorithm (which I have no experience with myself,
   but others in neo4j certainly do)

I think everything but the last part would be very easy to add to the
OSMImporter itself, so that the routing graph exists in any OSM model. Today
it does not exist, and routing would be more difficult and expensive (since
you would have to traverse a much more complex graph, unnecessarily).
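
The steps listed above can be sketched in plain Java, without Neo4j. This is
only an illustration under stated assumptions, not the OSMImporter's code:
the Pt class, the planar coordinates, the rule that a way-point is a street
endpoint or a node used more than once, and the two-way treatment of every
street are all hypothetical simplifications.

```java
import java.util.*;

/** Plain-Java sketch: derive a routing overlay from OSM-like ways, then run A* on it. */
class RoutingOverlaySketch {
    /** Hypothetical stand-in for an OSM node: an id plus planar coordinates. */
    static final class Pt {
        final long id; final double x, y;
        Pt(long id, double x, double y) { this.id = id; this.x = x; this.y = y; }
    }

    static double dist(Pt a, Pt b) { return Math.hypot(a.x - b.x, a.y - b.y); }

    /** Way-points: the endpoints of each street plus every node used more than once. */
    static Set<Long> wayPoints(List<List<Pt>> streets) {
        Map<Long, Integer> uses = new HashMap<>();
        for (List<Pt> way : streets)
            for (Pt p : way) uses.merge(p.id, 1, Integer::sum);
        Set<Long> result = new TreeSet<>();
        for (List<Pt> way : streets) {
            result.add(way.get(0).id);
            result.add(way.get(way.size() - 1).id);
            for (Pt p : way) if (uses.get(p.id) > 1) result.add(p.id);
        }
        return result;
    }

    /** Overlay graph: way-point id -> (neighbouring way-point id -> street length). */
    static Map<Long, Map<Long, Double>> overlay(List<List<Pt>> streets) {
        Set<Long> wps = wayPoints(streets);
        Map<Long, Map<Long, Double>> graph = new HashMap<>();
        for (List<Pt> way : streets) {
            Pt last = way.get(0);   // most recent way-point on this street
            double length = 0;      // distance travelled since that way-point
            for (int i = 1; i < way.size(); i++) {
                Pt p = way.get(i);
                length += dist(way.get(i - 1), p);
                if (wps.contains(p.id)) {
                    link(graph, last.id, p.id, length);
                    link(graph, p.id, last.id, length);   // assumption: streets are two-way
                    last = p;
                    length = 0;
                }
            }
        }
        return graph;
    }

    private static void link(Map<Long, Map<Long, Double>> g, long a, long b, double w) {
        g.computeIfAbsent(a, k -> new HashMap<>()).merge(b, w, Math::min);
    }

    /** Coordinate lookup by node id, needed by the A* heuristic. */
    static Map<Long, Pt> index(List<List<Pt>> streets) {
        Map<Long, Pt> pts = new HashMap<>();
        for (List<Pt> way : streets) for (Pt p : way) pts.put(p.id, p);
        return pts;
    }

    /** Queue entry for A*: cost so far (g) and estimated total cost (f = g + heuristic). */
    static final class Open {
        final long node; final double g, f;
        Open(long node, double g, double f) { this.node = node; this.g = g; this.f = f; }
    }

    /** A* with straight-line distance as heuristic; returns the route length, or -1 if unreachable. */
    static double aStar(Map<Long, Map<Long, Double>> graph, Map<Long, Pt> pts, long start, long goal) {
        Map<Long, Double> best = new HashMap<>();
        PriorityQueue<Open> open = new PriorityQueue<>(Comparator.comparingDouble((Open o) -> o.f));
        best.put(start, 0.0);
        open.add(new Open(start, 0.0, dist(pts.get(start), pts.get(goal))));
        while (!open.isEmpty()) {
            Open cur = open.poll();
            if (cur.g > best.get(cur.node)) continue;   // stale queue entry
            if (cur.node == goal) return cur.g;
            Map<Long, Double> edges = graph.getOrDefault(cur.node, Collections.emptyMap());
            for (Map.Entry<Long, Double> e : edges.entrySet()) {
                double g = cur.g + e.getValue();
                if (g < best.getOrDefault(e.getKey(), Double.POSITIVE_INFINITY)) {
                    best.put(e.getKey(), g);
                    open.add(new Open(e.getKey(), g, g + dist(pts.get(e.getKey()), pts.get(goal))));
                }
            }
        }
        return -1;
    }
}
```

The straight-line heuristic never exceeds the street length between two
way-points, so the first time the goal leaves the queue its cost is the
shortest route length. In a real database the overlay would be stored as
relationships rather than nested maps.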

Regards, Craig

On Tue, May 31, 2011 at 4:31 AM, bryce hendrix brycehend...@gmail.com wrote:

 I am finally getting back to experimenting with Neo4j. Because it has been
 a while since I last looked at it, I've forgotten just about everything. I
 want to start with something simple: is there any sample code which does A*
 path finding over OSM ways?

 Thanks,
 Bryce
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user



[Neo4j] Sample Linear Referencing Functions in Neo4j Spatial and GSoC

2011-06-02 Thread Craig Taverner
Hi,

Recently someone asked on StackOverflow whether Neo4j Spatial supports one
of the Oracle geoprocessing functions, specifically SDO_LRS.LOCATE_PT. Since
this is related to the ongoing GSoC projects for Neo4j Spatial, I thought I
would do a quick investigation. What I found was that the requested
capabilities are available in JTS (which we include in Neo4j Spatial), but
with very different names. The code to achieve this in JTS is
'new LengthIndexedLine(geometry).extractPoint(measure,offset)'. I have
wrapped this in
SpatialTopologyUtils.locatePoint(geometry,measure,offset), so that it is
accessible together with some other spatial topology functions, and also
looks more like the Oracle function.
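
To make the idea of locating a point by measure and offset concrete, here is
a dependency-free Java sketch. It is not the JTS or Neo4j Spatial
implementation; it assumes the measure is simply the distance along the line
in the same planar units as the coordinates, that a positive offset shifts
the point to the left of the direction of travel, and that measures outside
the line are clamped to its ends.

```java
/** Plain-Java sketch of linear referencing on a polyline given as {x, y} vertices. */
class LinearReferencingSketch {
    /**
     * Returns the {x, y} point at distance `measure` along the line, shifted
     * sideways by `offset` (positive = left of the direction of travel).
     * The line must contain at least one vertex.
     */
    static double[] locatePoint(double[][] line, double measure, double offset) {
        double remaining = Math.max(0, measure);   // clamp negative measures to the start
        for (int i = 1; i < line.length; i++) {
            double dx = line[i][0] - line[i - 1][0];
            double dy = line[i][1] - line[i - 1][1];
            double len = Math.hypot(dx, dy);
            if (len == 0) continue;                // skip degenerate segments
            boolean lastSegment = i == line.length - 1;
            if (remaining <= len || lastSegment) {
                double t = Math.min(remaining, len) / len;   // clamp to the end of the line
                double ux = dx / len, uy = dy / len;         // unit direction of this segment
                // The left-hand normal of direction (ux, uy) is (-uy, ux).
                return new double[] {
                    line[i - 1][0] + t * dx - offset * uy,
                    line[i - 1][1] + t * dy + offset * ux
                };
            }
            remaining -= len;
        }
        // Single vertex, or the line ends in zero-length segments: return the last vertex.
        double[] end = line[line.length - 1];
        return new double[] { end[0], end[1] };
    }
}
```

For the line (0,0), (10,0), (10,10), a measure of 15 with no offset lands
halfway up the second segment, and a measure of 5 with an offset of 2 lands
two units to the left of the first segment.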

I pushed this to github, and think it can be included as a prototype for the
discussions for the GSoC on Geoprocessing.

Regards, Craig


Re: [Neo4j] [SoC] Re: GSoC 2011 Weekly report - OSM data mining and editing capabilities in uDig and Geotools

2011-06-05 Thread Craig Taverner
Hi Mirco,

Sounds like progress. Some suggestions:

   - I do not think you need to change the code for neo4j and udig, but only
   for neo4j-spatial and udig-community/neo4j. It is OK to make clones of those
   so you have the code for review, but they are quite core, and you should not
   need to actually change them.
   - Focus on neo4j-spatial and udig-community/neo4j, which are the two
   projects you will certainly make changes to. All uDig GUI changes can be
   made in udig-community/neo4j.
   - You might even want to make a new udig plugin in a new git project,
   perhaps udig-community/osm, for the OSM editor work. The neo4j plugin would
   provide the communication layer for neo4j and any neo4j data sources, while
   the OSM plugin would provide OSM specific features, including the additional
   views and editors required to support a complete 'OSM Editor' capability.

Regards, Craig

On Sun, Jun 5, 2011 at 1:51 AM, Mirco Franzago mircofranz...@gmail.com wrote:

 Weekly report #2
 ==What I did==
 - The main work was to set up the whole development environment: Eclipse +
 uDig + Neo4j.
 - I forked the repositories on GitHub for my code: [0], [1] and [2] are
 respectively the repositories for udig, neo4j and neo4j-spatial.
 - The goal was to have Eclipse with the uDig SDK taken from GitHub, just
 as for Neo4j, to be able to commit the uDig code and the Neo4j code from
 the same environment.
 - I set up Apache Maven and the EGit plugin to be able to use them
 directly from Eclipse.
 - After these steps and some fighting with the jars to import, it was
 possible to run uDig with the Neo4j plugins and to test the main
 functionality.
 - I started the code analysis to understand where to put my hands next
 week :-)

 ==Next week plan==
 - Fix some last problems for a new git user with the commit command.
 - Finally start the real coding after the initial head-cracking
 problems.

 [0] https://github.com/mircofranzago/udig-platform
 [1] https://github.com/mircofranzago/neo4j
 [2] https://github.com/mircofranzago/neo4j-spatial




 2011/5/31 Mirco Franzago mircofranz...@gmail.com

 Hi all,
 I am Mirco Franzago and I have started work on my Google Summer of Code
 2011 project. I will update this thread weekly to let the community know
 about the work done and the work still to do.
 Last week I could not do much because I was very busy with my last exam
 before summer. Now I'm ready to start on this new job.



 ___
 SoC mailing list
 s...@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/soc




Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-06-07 Thread Craig Taverner
Hi,

The bounding boxes are used by the RTree index, which is a typical way to
index spatial data. For Point data, the lat/long and the bounding box are
the same thing, but for other shapes (streets/LineString and Polygons), the
bounding box is quite different to the actual geometry (which is not just a
single lat/long, but a set of connected points forming a complex shape).

The RTree does not differentiate between points and other geometries,
because it cares only about the bounding box, and therefore we provide one
even for something as simple as a Point.
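
A small Java sketch may help show why a Point and its bounding box coincide
while a LineString and its bounding box do not. The array layout
{minX, minY, maxX, maxY} and the method names are assumptions made for this
example, not the layout Neo4j Spatial uses internally.

```java
/** Plain-Java sketch of bounding boxes (envelopes) for geometries given as {x, y} coordinates. */
class EnvelopeSketch {
    /** Returns {minX, minY, maxX, maxY} for a non-empty array of {x, y} coordinates. */
    static double[] envelope(double[][] coords) {
        double minX = coords[0][0], minY = coords[0][1], maxX = minX, maxY = minY;
        for (double[] c : coords) {
            minX = Math.min(minX, c[0]);
            maxX = Math.max(maxX, c[0]);
            minY = Math.min(minY, c[1]);
            maxY = Math.max(maxY, c[1]);
        }
        return new double[] { minX, minY, maxX, maxY };
    }

    /** True when two envelopes overlap or touch; an index only ever compares boxes like this. */
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
    }
}
```

For a single coordinate the envelope collapses to that coordinate repeated,
which is why the lat/long and the bounding box of a Point are the same
thing. For the street (0,0), (4,1), (2,5) the envelope is {0, 0, 4, 5}; a
query box such as {3.5, 3.5, 6, 6} overlaps that envelope even though the
street itself never enters it, so a box match is only a first filter.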

Does that answer the question?

Regards, Craig

On Tue, Jun 7, 2011 at 4:57 PM, Boris Kizelshteyn bo...@popcha.com wrote:

 Greetings!

 Perhaps someone using neo4j-spatial can answer this seemingly simple
 question. Nodes classified into layers have both lat/lon properties and
 bounding boxes, the bounding box seems to be required to establish the
 relationship between node and layer, however the node is not found if the
 lat/lon does not match the query. Can someone explain the relationship
 between these two properties on a node?

 Many thanks!


Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-06-07 Thread Craig Taverner
I think you need to differentiate the bounding boxes of the data in the
layer (stored in the database), and the bounding box of the search query.
The search query is not stored in the database, and will not be seen as a
node or nodes in the database. So if you want to search for data within some
bounding box or polygon, then express that in the search query, and you do
not need to care about how your nodes are stored in the database.

So when you say you want to make a larger bounding box, I assume you are
talking about the query itself. The REST API has the method
findGeometriesInLayer, which takes minx, maxx, miny, maxy parameters and you
can set those to whatever you want for your query.

The REST API also exposes the CQL query language supported by GeoTools. This
allows you to perform SQL-like queries on geometries and feature attributes.
For example, you can search for all objects within a specific polygon (not
just a rectangular bounding box), as well as conforming to certain
attributes. See
http://docs.geoserver.org/latest/en/user/tutorials/cql/cql_tutorial.html for
some examples of CQL.

However, our current CQL support is not fully integrated with the RTree
index. This means that the CQL itself will not benefit from the index, but
will run as a raw search. You can, however, still get the benefit of the
index by passing in the bounding box separately. Suppose, for example, that
you want to search for data in a polygon. Make the polygon object, then get
its bounding box and the CQL query string. Then make a 'dynamic layer' using
the CQL (which is a bit like making a prepared statement). Then call the
same 'findGeometriesInLayer' method mentioned above, using the bounding box
and the dynamic layer (containing the CQL). This has the effect of using the
RTree index for a first approximate search, followed by pure CQL for the
final mile.
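
The two-phase idea (cheap bounding-box filter first, exact predicate second)
can be sketched in plain Java. This is not the REST API or the CQL engine:
the linear scan stands in for the RTree, the ray-casting test stands in for
the CQL predicate, and all names are hypothetical.

```java
import java.util.*;

/** Plain-Java sketch: bounding-box prefilter followed by an exact point-in-polygon test. */
class TwoPhaseSearchSketch {
    /** Bounding box {minX, minY, maxX, maxY} of a polygon ring given as {x, y} vertices. */
    static double[] boundingBox(double[][] ring) {
        double[] box = { ring[0][0], ring[0][1], ring[0][0], ring[0][1] };
        for (double[] v : ring) {
            box[0] = Math.min(box[0], v[0]);
            box[1] = Math.min(box[1], v[1]);
            box[2] = Math.max(box[2], v[0]);
            box[3] = Math.max(box[3], v[1]);
        }
        return box;
    }

    static boolean inBox(double[] p, double[] box) {
        return p[0] >= box[0] && p[0] <= box[2] && p[1] >= box[1] && p[1] <= box[3];
    }

    /** Exact test by ray casting: toggle on each edge crossed by a ray going right from p. */
    static boolean inPolygon(double[] p, double[][] ring) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            boolean straddles = (ring[i][1] > p[1]) != (ring[j][1] > p[1]);
            if (straddles) {
                double xAtY = (ring[j][0] - ring[i][0]) * (p[1] - ring[i][1])
                        / (ring[j][1] - ring[i][1]) + ring[i][0];
                if (p[0] < xAtY) inside = !inside;
            }
        }
        return inside;
    }

    /** Phase 1 discards by bounding box (the index's job); phase 2 runs the exact predicate. */
    static List<double[]> search(List<double[]> points, double[][] polygon) {
        double[] box = boundingBox(polygon);
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points)
            if (inBox(p, box) && inPolygon(p, polygon)) hits.add(p);
        return hits;
    }
}
```

With the triangle (0,0), (10,0), (0,10), the point (9,9) passes the
bounding-box phase but is rejected by the exact phase, which is the case the
second pass exists for.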

See examples of this in action in the unit tests in the source code.
https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/ServerPluginTest.java#L109
has examples of CQL queries on the REST API.

On Tue, Jun 7, 2011 at 5:48 PM, Boris Kizelshteyn bo...@popcha.com wrote:

 Thanks! So it seems you are saying that the bounding box represents a
 single point and is the same as the lat/lon? What if I make the bounding
 box bigger? What I am trying to do is run geo queries against a bounding
 box made of a set of points, rather than against individual points. So the
 query is: find the nodes where the given point falls inside their bounding
 boxes. Can I do this with REST?

 Thanks!



Re: [Neo4j] Sample Linear Referencing Functions in Neo4j Spatial and GSoC

2011-06-07 Thread Craig Taverner
Done.

Although now we have 20 lines of comments for 1 line of method code.
Previously we had 4 lines of comments for one line of code. Whew!

On Tue, Jun 7, 2011 at 11:02 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Very cool.
 Maybe you could just doc the parameters more than pointing to the Oracle
 reference, so one can see it directly in the JavaDoc?

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.




Re: [Neo4j] GSoC 2011 Neo4j Geoprocessing | Weekly Report #2

2011-06-07 Thread Craig Taverner
I suggest you code review them first. Especially since there are API
changes.

On Tue, Jun 7, 2011 at 10:11 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Very nice Andreas!

 You consider it safe to pull these changes into the main repo?

 Cheers,

 /peter neubauer


 On Sun, Jun 5, 2011 at 1:39 PM, Andreas Wilhelm a...@kabelbw.de wrote:

  Hi,
 
  This week I implemented update and search capability for spatial
  functions, and the following spatial functions with JUnit tests:
 
  ST_AsText, ST_AsKML, ST_AsGeoJSON, ST_AsBinary and ST_Reverse.
 
 
  Best Regards
 
  Andreas


Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-06-08 Thread Craig Taverner
OK. I understand much better what you want now.

Your person nodes are not geographic objects, they are persons that can be
at many positions and indeed move around. However, the 'path' that they take
is a geographic object and can be placed on the map and analysed
geographically.

So the question I have is how do you store the path the person takes? Is
this a bunch of position nodes connected back to that person? Or perhaps a
chain of position-(next)-position-(next)-position, etc? However you have
stored this in the graph, you can express it as a geographic object by
implementing the GeometryEncoder interface. See, for example, the 6 lines of
code it takes to traverse a chain of NEXT locations and produce a LineString
geometry in the SimpleGraphEncoder at
https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82

If you do this, you can create a layer that uses your own geometry encoder
(or the SimpleGraphEncoder I referenced above, if you use the same graph
structure). Your own domain model will then be expressed as LineString
geometries and you can perform spatial operations on them.

Alternatively, if your data is more static in nature, and you are analysing
only what the person did in the past, and the graph will therefore not
change, perhaps you do not care to store the locations in the graph, and you
can just import them as a LineString directly into a standard layer.

Whatever route you take, the final action you want to perform is to find
points near the LineString (the path the person took). I do not think the
bounding box is the right approach for that either. You need to try, for
example, the method findClosestEdges in the utilities class at
https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java#L115

This method can find the part of the person's path that is closest to the
point of interest. There are also many other geographic operations you might
be interested in trying, once you have a better feel for the types of
queries you want to ask.
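
As a rough illustration of both halves of this advice, the Java sketch below
walks a chain of positions into a line string and then measures how close a
point of interest comes to that path. It does not use the GeometryEncoder
interface or findClosestEdges; the Position class with its next pointer is a
hypothetical stand-in for the graph structure, and distances are planar, so
real lat/long data would need a proper distance function.

```java
/** Plain-Java sketch: turn a NEXT-linked chain of positions into a path and test proximity. */
class PathProximitySketch {
    /** Hypothetical position node: a coordinate plus a NEXT pointer to the following position. */
    static final class Position {
        final double x, y;
        Position next;
        Position(double x, double y) { this.x = x; this.y = y; }
    }

    /** Follows NEXT links from the first position and collects the coordinates in order. */
    static double[][] toLineString(Position first) {
        java.util.List<double[]> coords = new java.util.ArrayList<>();
        for (Position p = first; p != null; p = p.next) coords.add(new double[] { p.x, p.y });
        return coords.toArray(new double[0][]);
    }

    /** Distance from the point (px, py) to the segment a-b. */
    static double distanceToSegment(double px, double py, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len2 = dx * dx + dy * dy;
        double t = len2 == 0 ? 0 : ((px - a[0]) * dx + (py - a[1]) * dy) / len2;
        t = Math.max(0, Math.min(1, t));   // clamp the projection onto the segment
        return Math.hypot(px - (a[0] + t * dx), py - (a[1] + t * dy));
    }

    /** Smallest distance from the point to any segment of the path (needs two or more vertices). */
    static double distanceToPath(double px, double py, double[][] path) {
        double best = Double.POSITIVE_INFINITY;
        for (int i = 1; i < path.length; i++)
            best = Math.min(best, distanceToSegment(px, py, path[i - 1], path[i]));
        return best;
    }
}
```

The question "was person A near location X today?" then becomes a
comparison of distanceToPath against a chosen threshold, which is stricter
than asking whether X lies in the bounding box of the whole path.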

Regards, Craig

On Wed, Jun 8, 2011 at 2:17 AM, Boris Kizelshteyn bo...@popcha.com wrote:

 Thanks for the detailed response! Here is what I'm trying to do and I'm
 still not sure how to accomplish it:

 1. I have a node which is a person

 2. I have geo data as that person moves around the world

 3. I use the geodata to create a bounding box of where that person has been
 today

 4. I want to say, was this person A near location X today?

 5. I do this by seeing if location X is in A's bounding box.

 From looking at what you suggest doing, it's not clear how I assign the
 node person A to a layer. Is it that the bounding box is now in the layer
 and not in the node? The issue then becomes: how do I associate the two,
 as the RTree relationship seems to establish itself on the bounding box
 between the node and the layer?

 Many thanks for your patience as I learn this challenging material.


Re: [Neo4j] neo4j-spatial

2011-06-09 Thread Craig Taverner
Hi Saikat,

Yes, your explanation was clear, but I was busy with other work and failed
to respond - my bad ;-)

Anyway, your idea is nice. And I can think of a few ways to model this in
the graph, but at the end of the day the most important thing to decide
first is what queries are you going to perform? Do you want a creative map,
that while not drawn to scale, can still be asked questions like 'how far
from the roller-coaster to the closest lunch venue?'. That kind of question
could make use of the graph and the spatial extensions to provide an answer
and show the route on the creative map, even if it is not a real to-scale
map. Is that what you want to see?

You can also try contacting me on Skype.

Regards, Craig

On Thu, Jun 9, 2011 at 5:35 AM, Saikat Kanjilal sxk1...@hotmail.com wrote:


 Hi Craig,
 Following up on this thread, was this explanation clear? If so
 I'd like to talk more details.
 Regards

 From: sxk1...@hotmail.com
 To: user@lists.neo4j.org
 Subject: RE: [Neo4j] neo4j-spatial
 Date: Sun, 5 Jun 2011 20:15:27 -0700








 Hey Craig,
 Thanks for responding. To be clear, a theme park can have its own map
 created by the graphic artists that work at the theme park company. This
 map is sometimes a 2D and sometimes a 3D map that really has no notion of
 lat/long coordinates or GPS. What I am proposing is that we have the
 ability to inject GPS coordinates into this creative map through some
 mechanism that understands what the GPS coordinates of each point in the
 creative map are. That's where the Google map comes in: the Google or Bing
 map would potentially have lat/long coordinates of every point in a theme
 park, so now the challenge is how to transfer that knowledge into the 2D
 or 3D creative map so that we can run neo4j traversal algorithms inside a
 map that has been injected with GPS data. A theme park is just the
 beginning; imagine having the power to inject this information into any 2D
 or 3D map. That would be pretty amazing. In essence I am doing this so
 that the creative map itself can use neo4j and be highly interactive and
 meaningful.
 Let me know if that's still unclear and if so let's talk on Skype.
 Regards

  Date: Mon, 6 Jun 2011 01:13:08 +0200
  From: cr...@amanzi.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] neo4j-spatial
 
  Hi Saikat,
 
  This sounds worth discussing further. I think I need to hear more about
  your use case. I do not know what the term 'creative map' means, and what
  traversals you are planning to do? When you talk about 'plotting points',
  do you mean you have a GPS and are moving inside a real theme park and
  want to see this inside google maps? Or are you just drawing a path on an
  interactive GIS?
 
  I think once I have some more understanding of what your use case is, what
  problem you are trying to solve, I am sure I will be able to give advice
  on how best to approach it, if it relates to anything else we are doing,
  or whether this is something you would need to put some coding time into
  :-)
 
  Regards, Craig
 
  On Sun, Jun 5, 2011 at 8:26 PM, Saikat Kanjilal sxk1...@hotmail.com
 wrote:
 
  
   Craig et al,
   I have an interesting use case that I've been thinking about and I was
   wondering if it would make a good candidate for inclusion inside
   neo4j-spatial. I've read through the wiki (
   http://wiki.neo4j.org/content/Collaboration_on_Spatial_Projects) and was
   interested in using neo4j-spatial to take any creative 2D map and
   geo-enable it. To explain in more detail, let's say you are at a certain
   latitude and longitude in a theme park inside a Google map (or a Bing
   map). Now you want to have the ability to reference that same latitude
   and longitude inside a 2D or a 3D creative map of that theme park, and
   then be able to plot these points and enable traversal algorithms inside
   the creative map.
   I was wondering if you guys are thinking about this use case. If not, I'd
   love to work on and discuss this in more detail to see whether this fits
   into the neo4j-spatial roadmap.
   Thoughts?


Re: [Neo4j] Traversals versus Indexing

2011-06-13 Thread Craig Taverner
Think of your domain model graph as a kind of index. Traversing that should
generally be faster than a generic index like lucene. Of course some things
do not graph well, and you should use lucene for those. But if you can find
something with a graph traversal, that is likely the way to go.

Also you should think of structuring the graph to suit the queries you plan
to perform. Then you will optimize the traversals.

On Jun 13, 2011 11:33 AM, espeed ja...@jamesthornton.com wrote:
 It depends on the traversal you are running.



Re: [Neo4j] Auto Indexing for Neo4j

2011-06-14 Thread Craig Taverner
This is great news.

Now I'm really curious about the next step, and that is allowing indexes
other than lucene. For example, the RTree index in neo4j-spatial was never
possible to wrap behind the normal index API, because that was designed only
for properties of nodes (and relationships), but the RTree is based on
something completely different (complete spatial geometries). However, the
new auto-indexing feature implies that any node can be added to an index
without the developer needing to know anything about the index API. Instead
the index needs to know if the node is appropriate for indexing. This is
suitable for both lucene and the RTree.

So what I'd like to see is that when configuring auto-indexing in the first
place, instead of just specifying properties to index, you could specify
some indexer implementation that can be created and run internally. For
example, perhaps you pass the classname of some class that implements some
necessary interface, and then that is instantiated, passed config
properties, and used to index new or modified nodes. One method I could
imagine this interface having would be a listener for change events, to be
evaluated for whether or not the index should be activated for a node
change. For the lucene property index, this method would return true if the
property exists on that node. For the RTree this method would return true if
the node contained the meta-data required for neo4j-spatial to recognize it
as a spatial type. Alternatively, there could be just an index method that
does nothing when the nodes are not to be indexed, and indexes when
necessary.
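
To make the proposal easier to discuss, here is a self-contained Java sketch
of what such a pluggable contract might look like. Everything in it is
hypothetical: NodeIndexer, PropertyIndexer, SpatialIndexer and
AutoIndexDispatcher are invented names, nodes are reduced to an id plus a
property map, and the "bbox" property key is only an assumed marker for
spatial nodes. None of this is an existing Neo4j API.

```java
import java.util.*;

/** Hypothetical plug-in contract: decide per node whether to index it, then index it. */
interface NodeIndexer {
    boolean accepts(Map<String, Object> properties);
    void index(long nodeId, Map<String, Object> properties);
}

/** Stand-in for a property index: accepts nodes that carry one configured key. */
class PropertyIndexer implements NodeIndexer {
    final String key;
    final Map<Object, Set<Long>> entries = new HashMap<>();
    PropertyIndexer(String key) { this.key = key; }
    public boolean accepts(Map<String, Object> properties) {
        return properties.containsKey(key);
    }
    public void index(long nodeId, Map<String, Object> properties) {
        entries.computeIfAbsent(properties.get(key), v -> new HashSet<>()).add(nodeId);
    }
}

/** Stand-in for a spatial index: accepts nodes that carry a four-value "bbox" property. */
class SpatialIndexer implements NodeIndexer {
    final Map<Long, double[]> boxes = new HashMap<>();
    public boolean accepts(Map<String, Object> properties) {
        Object bbox = properties.get("bbox");
        return bbox instanceof double[] && ((double[]) bbox).length == 4;
    }
    public void index(long nodeId, Map<String, Object> properties) {
        boxes.put(nodeId, (double[]) properties.get("bbox"));
    }
}

/** Plays the role of the auto-indexing hook: offers each changed node to every indexer. */
class AutoIndexDispatcher {
    private final List<NodeIndexer> indexers = new ArrayList<>();
    void register(NodeIndexer indexer) { indexers.add(indexer); }
    void nodeChanged(long nodeId, Map<String, Object> properties) {
        for (NodeIndexer indexer : indexers)
            if (indexer.accepts(properties)) indexer.index(nodeId, properties);
    }
}
```

The point of splitting accepts from index is that the dispatcher needs no
knowledge of what makes a node indexable; each implementation answers that
for itself, whether by property key or by spatial meta-data.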

So, are we now closer to having this kind of support?

On Tue, Jun 14, 2011 at 11:30 PM, Chris Gioran 
chris.gio...@neotechnology.com wrote:

 Good news everyone,

 A request that's often come up on the mailing list is a mechanism for
 automatically indexing properties of nodes and relationships.

 As of today's SNAPSHOT, auto-indexing is part of Neo4j which means nodes
 and relationships can now be indexed based on convention, requiring
 far less effort and code from the developer's point of view.

 Getting hold of an automatic index is straightforward:

 AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
 AutoIndex<Node> nodeAutoIndex = nodeAutoIndexer.getAutoIndex();

 Once you've got an instance of AutoIndex, you can use it as a read-only
 Index<Node>.

 The AutoIndexer interface also supports runtime changes and
 enabling/disabling the auto indexing functionality.

 To support the new features, there are new Config
 options you can pass to the startup configuration map in
 EmbeddedGraphDatabase, the most important of which are:

 Config.NODE_AUTO_INDEXING (defaults to false)
 Config.RELATIONSHIP_AUTO_INDEXING (defaults to false)

 If set to true (independently of each other) these properties will
 enable auto indexing functionality and at the successful finish() of
 each transaction, all newly added properties on the primitives for which
 auto indexing is enabled will be added to a special AutoIndex (and
 deleted or changed properties will be updated accordingly too).

 There are options for fine-grained control to determine which
 properties are indexed, default behaviors and so forth. For example, by
 default all properties are indexed. If you want only properties "name" and
 "age" for Nodes and "since" and "until" for Relationships
 to be auto indexed, simply set the initial configuration as follows:

 Config.NODE_KEYS_INDEXABLE = "name, age";
 Config.RELATIONSHIP_KEYS_INDEXABLE = "since, until";

 For the semantics of the auto-indexing operations, constraints and more
 detailed examples, see the documentation available  at

 http://docs.neo4j.org/chunked/1.4-SNAPSHOT/auto-indexing.html

 We're pretty excited about this feature since we think it'll make your
 lives as developers much more productive in a range of use-cases. If
 you're comfortable with using SNAPSHOT versions of Neo4j, please try it
 out and let us know what you think - we'd really value your feedback.

 If you're happier with using packaged milestones then this feature
 will be available from 1.4 M05 in a couple of weeks from now.


Re: [Neo4j] Slow Traversals on Nodes with too many Relationships

2011-06-15 Thread Craig Taverner
Could this also be related to the possibility that in order to determine
relationship type and direction, the relationships need to be loaded from
disk? If so, then having a large number of relationships on the same node
would decrease performance, if the number was large enough to affect the
disk io caching.

If this is the case, perhaps adding a proxy node for the incoming
relationships would work around the problem? Of course then you have doubled
the number of part nodes (two for each part: the part itself and its
containers proxy).
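
The suspected effect, and what the proxy would change, can be illustrated
with a toy Java model. It rests on the assumption made above, namely that
every relationship record on a node has to be read before its type and
direction are known; that is a hypothesis about the store, not a confirmed
description of it. The Rel class and the CONTAINERS_PROXY relationship type
are invented for the example.

```java
import java.util.*;

/** Toy model: cost of filtering a node's relationships with and without a proxy node. */
class DenseNodeSketch {
    static final class Rel {
        final String type;
        final boolean outgoing;
        Rel(String type, boolean outgoing) { this.type = type; this.outgoing = outgoing; }
    }

    /** Filters a node's relationships; examined[0] counts every record that had to be read. */
    static List<Rel> select(List<Rel> chain, String type, boolean outgoing, int[] examined) {
        List<Rel> hits = new ArrayList<>();
        for (Rel r : chain) {
            examined[0]++;   // assumed: type and direction are only known once the record is read
            if (r.outgoing == outgoing && r.type.equals(type)) hits.add(r);
        }
        return hits;
    }

    /** A Part node holding `containers` incoming CONTAINS plus two outgoing HASPART. */
    static List<Rel> directPart(int containers) {
        List<Rel> chain = new ArrayList<>();
        for (int i = 0; i < containers; i++) chain.add(new Rel("CONTAINS", false));
        chain.add(new Rel("HASPART", true));
        chain.add(new Rel("HASPART", true));
        return chain;
    }

    /** The same Part with all CONTAINS moved to a proxy node reached by one relationship. */
    static List<Rel> proxiedPart() {
        return Arrays.asList(new Rel("CONTAINERS_PROXY", true),
                new Rel("HASPART", true), new Rel("HASPART", true));
    }
}
```

In this model, finding the two outgoing HASPART relationships reads 100002
records on the direct Part node but only 3 on the proxied one; the incoming
CONTAINS relationships are only touched by traversals that actually go
through the proxy.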

On Wed, Jun 15, 2011 at 10:27 PM, Rick Bullotta rick.bullo...@thingworx.com
 wrote:

 I would respectfully disagree that it doesn't necessarily represent
 production usage, since in some cases, each query/traversal will be unique
 and isolated to a part of a subgraph, so in some cases, a cold query may
 be the norm

 -Original Message-
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
 On Behalf Of Michael Hunger
 Sent: Wednesday, June 15, 2011 10:25 AM
 To: Neo4j user discussions
 Subject: Re: [Neo4j] Slow Traversals on Nodes with too many Relationships

 That is rather a case of warming up your caches.

 Determining the traversal speed from the first run is not a good benchmark
 as it doesn't represent production usage :)
 The same (warming up) is true for all kinds of benchmarks (except for
 startup performance benchmarks).

 Cheers

 Michael

 Am 15.06.2011 um 14:48 schrieb Agelos Pikoulas:

  I have a few Part nodes related with each via HASPART
  relationship/edges.
  (eg Part1---HASPART---Part2---HASPART---Part3 etc) .
  TraversalDescription works fine, following each Part's outgoing HASPART
  relationship.
 
  Then I add a large number (say 100.000) of Container nodes, where each
  Container has a CONTAINS relation to almost *every* Part node.
  Hence each Part node now has 100.000 incoming CONTAINS relationships
  from Container nodes, but only a few outgoing HASPART relationships to
  other Part nodes.
 
  Now my previous TraversalDescription runs extremely slowly (several
  seconds inside each Iterator<Path>.next() call).
  Note that I do define relationships(RT.HASPART, Direction.OUTGOING) on
  the TraversalDescription, but it seems it's not used by neo4j as a hint.
  Note that on a subsequent run of the same traversal, it's very quick
  indeed.
 
  Is there any way to use indexing on relationships for such a scenario, to
  speed things up?
 
  Ideally, the Traversal framework could use automatic/declarative indexing
  on Node Relationship types and/or direction to perform such traversals
  quicker.
 
  Regards

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Most Efficient way to query in my use cases

2011-06-15 Thread Craig Taverner
Another common thing to do in this case is create a node for the purchase
action. This node would be related to the purchaser (user), item (pen) and
shop, and would contain data appropriate to the purchase (date/time, price,
etc).

Then traverse from the shop or the pen to all purchase actions that
reference the other one (shop or pen).
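
As a rough illustration of this modelling choice, here is a plain-Java sketch (no Neo4j API involved; the class, field and method names are all hypothetical). Each purchase is its own object that references the buyer, the item and the shop and carries the purchase data, and the lookup walks from the shop to its purchases and keeps those that reference the pen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a graph with an explicit "purchase" node.
class PurchaseNodeSketch {
    // One purchase action, linked to buyer, item and shop, with its own data.
    static final class Purchase {
        final String buyer;
        final String item;
        final String shop;
        final String date;
        final double price;
        Purchase(String buyer, String item, String shop, String date, double price) {
            this.buyer = buyer;
            this.item = item;
            this.shop = shop;
            this.date = date;
            this.price = price;
        }
    }

    // Plays the role of the relationships from a shop to its purchase nodes.
    private final Map<String, List<Purchase>> purchasesByShop = new HashMap<>();

    void record(String buyer, String item, String shop, String date, double price) {
        purchasesByShop.computeIfAbsent(shop, k -> new ArrayList<>())
                .add(new Purchase(buyer, item, shop, date, price));
    }

    // Start at the shop, visit its purchases, keep the ones that reference the item.
    Set<String> buyersOf(String item, String shop) {
        Set<String> buyers = new TreeSet<>();
        for (Purchase p : purchasesByShop.getOrDefault(shop, new ArrayList<Purchase>())) {
            if (p.item.equals(item)) {
                buyers.add(p.buyer);
            }
        }
        return buyers;
    }

    public static void main(String[] args) {
        PurchaseNodeSketch graph = new PurchaseNodeSketch();
        graph.record("alice", "pen", "ShopA", "2011-06-01", 1.50);
        graph.record("bob", "pen", "ShopB", "2011-06-02", 1.40);
        graph.record("carol", "book", "ShopA", "2011-06-03", 9.00);
        graph.record("dave", "pen", "ShopA", "2011-06-04", 1.50);
        System.out.println(graph.buyersOf("pen", "ShopA"));
    }
}
```

The same walk works starting from the pen instead of the shop; which end to start from would depend on which side has fewer purchases attached.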

On Thu, Jun 16, 2011 at 4:48 AM, Jim Webber j...@neotechnology.com wrote:

 Hi Manav,

 I think there's a relationship missing here.

 Pen--SOLD_BY--shop

 That way it's easy to find all the pens that a shop sold, and who they sold
 them to.

 In general modelling your domain expressively does not come at an increased
 cost with Neo4j (caveat: you can still create write hotspots).

 Jim

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Slow Traversals on Nodes with too many Relationships

2011-06-15 Thread Craig Taverner
I understood that on Windows the memory mapped sizes needed to be included
in the heap, since they are not allocated outside the heap as they are on
Linux/Mac. So in this case he needs a larger heap (and to make sure the memory
mapped files are much smaller than the heap). The relevant part of the
configuration settings doc says:

When running Neo4j on Windows the size of the memory-mapped nioneo
configurations need to be added to the heap size parameter. On Linux and
Unix-systems memory mapped IO is not included in the heap size.
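
The arithmetic behind that rule can be written down as a tiny sketch. The method name and the idea of a single "object heap" figure are my own simplification; the 1.5G values are the ones Michael suggests in the quoted message below.

```java
// Hypothetical helper: computes the -Xmx value implied by the doc excerpt above.
class HeapSizingSketch {
    // objectHeapMb: heap wanted for cached nodes, relationships and own data.
    // mappedMemoryMb: sum of the memory-mapped store settings.
    // On Windows the mapped memory has to fit inside the heap; elsewhere it does not.
    static long requiredHeapMb(long objectHeapMb, long mappedMemoryMb, boolean windows) {
        return windows ? objectHeapMb + mappedMemoryMb : objectHeapMb;
    }

    public static void main(String[] args) {
        System.out.println("Windows: -Xmx" + requiredHeapMb(1536, 1536, true) + "m");
        System.out.println("Linux:   -Xmx" + requiredHeapMb(1536, 1536, false) + "m");
    }
}
```

With the -Xmx512m mentioned later in this thread, almost any useful amount of mapped memory would not fit on Windows under this rule.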


I still think that the solution to this case is to group the different
relationship types into separate sub-graphs, so that the performance of
traversing HASPART is not affected by the number of CONTAINS
relationships. Of course traversing the CONTAINS relationships will still be
slow without increasing the cache, as you suggest.

On Thu, Jun 16, 2011 at 12:07 AM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 Agelos,

 sorry, didn't want to sound that way.

 512M RAM is not very much for larger graphs. Neo4j has to cache nodes and
 relationships in the heap, as well as your own data structures.

 The memory mapped files for the datastores are kept outside the heap.

 Normally with your 4G I'd suggest using about 1.5G for heap and 1.5G for
 the memory mapped files.
 http://wiki.neo4j.org/content/Configuration_Settings

 Do you have a small test-case available that creates your graph and runs
 your traversal? Then I could have a look at that and also do some
 profiling to determine the issues for this slowdown.

 The indexing doesn't help as it also has to hit caches or disk. The graph
 traversal is normally a very efficient operation that shouldn't experience
 this bad performance.

 Cheers

 Michael


 P.S. I just use my mail client for handling the mailing list and it works
 fine for me. Imho Gmail groups threads automatically.


 On 15.06.2011 at 17:40, Agelos Pikoulas wrote:

 
  I have to respectfully agree with Rick Bullotta.
 
  I was suspecting the big-O is not linear for this case.
 
  To verify, I added 4x as many Container nodes (400,000) and their
  appropriate relationships, and it is now *unbelievably* slow:
  it does not take 4x longer, but more than 30-40 seconds for each
  next(). Remember, 100K nodes took ~2 seconds for each next()!
 
  And to make matters worse, the subsequent runs weren't fast either;
  they actually took more time than the first
  (1st TotalTraversalTime = 389936 ms, 2nd TotalTraversalTime = 443948 ms).
 
  The whole setup is running on
  Eclipse 3.6, with -Xmx512m on the Java VM,
  a Windows 2003 VMware machine with 4GB, running on a fast 2nd gen SSD (OCZ
  Vertex 2). The Neo4j data resides on this SSD.
  The data files for 100,000 nodes were ~250MB; for 400,000 they are ~1GB.
 
  I wonder what would happen if the Container nodes numbered a few million
  (which will be my case); it would run forever.
 
  Could you please look into my suggestion, i.e. using a 'smart'
  behind-the-scenes index on both *RelationshipType* and *Direction* that
  traversals actually use to boost things up?
 
  On another topic, how does one use this mailing list? I use it through
  Gmail and I am utterly lost. Is there a better client/UI to actually
  post/reply into threads?

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Auto Indexing for Neo4j

2011-06-18 Thread Craig Taverner
I am using only one relationship type in my index tree, and make traversal
decisions based on properties of the tree nodes, but have considered an
'optimization' based on embedding the index keys into the relationship
types, which I think is what you did. However, I am not convinced it will
work well because I suspect there will be losses if the total number of
relationship types gets very high. I think this is a separate issue to the
total number of relationships, but might affect all traversers, since there
must exist a hashmap of all relationship types.
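
For readers who have not seen the "keys in relationship types" idea, here is a minimal plain-Java sketch of the structure Michael describes below: a long key is chopped into 3-digit chunks, and each chunk selects the branch to follow. A map of children stands in for the typed relationships; the names are hypothetical and negative keys are simply rejected in this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of an in-graph index tree keyed by 3-digit chunks of a long value.
class ChunkedKeyIndexSketch {
    static final class TreeNode {
        // chunk ("000".."999") -> child; models up to 1000 relationship types
        final Map<String, TreeNode> children = new HashMap<>();
        String target; // id of the indexed node, if a key ends here
    }

    private final TreeNode root = new TreeNode();

    // Left-pad the decimal digits to a multiple of three and split into chunks.
    static List<String> chunks(long key) {
        if (key < 0) {
            throw new IllegalArgumentException("non-negative keys only in this sketch");
        }
        String digits = Long.toString(key);
        int pad = (3 - digits.length() % 3) % 3;
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < pad; i++) {
            padded.append('0');
        }
        padded.append(digits);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < padded.length(); i += 3) {
            result.add(padded.substring(i, i + 3));
        }
        return result;
    }

    void put(long key, String nodeId) {
        TreeNode current = root;
        for (String chunk : chunks(key)) {
            current = current.children.computeIfAbsent(chunk, c -> new TreeNode());
        }
        current.target = nodeId;
    }

    String get(long key) {
        TreeNode current = root;
        for (String chunk : chunks(key)) {
            current = current.children.get(chunk);
            if (current == null) {
                return null;
            }
        }
        return current.target;
    }

    public static void main(String[] args) {
        ChunkedKeyIndexSketch index = new ChunkedKeyIndexSketch();
        index.put(1234567L, "node-A");
        System.out.println(chunks(1234567L) + " -> " + index.get(1234567L));
    }
}
```

A lookup touches at most seven tree levels for any non-negative long, but the scheme needs up to 1000 distinct branch labels, which is where my concern about the total number of relationship types comes from.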

Still it is very cool what Peter says below, because if all these
'experiments' with in-graph indexes can get put behind the standard index
API, then we can get much more testing of this approach, and hopefully learn
what we need to make this a viable solution for wide use.
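
The kind of pluggable indexer I proposed earlier in this thread (quoted at the bottom of this message) could be sketched roughly as follows. This is purely hypothetical: no such interface exists in Neo4j, nodes are reduced to property maps, and the "bbox" key is an invented stand-in for whatever meta-data neo4j-spatial would look for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in contract for auto-indexing; not part of any Neo4j release.
class AutoIndexSketch {
    interface AutoIndexer {
        // Decide from a change event whether this index cares about the node.
        boolean accepts(Map<String, Object> nodeProperties);
        // Add the node to the index (removal of stale entries is omitted here).
        void index(long nodeId, Map<String, Object> nodeProperties);
    }

    // Property-style indexer: accepts a node if the configured property exists.
    static final class PropertyIndexer implements AutoIndexer {
        private final String key;
        final Map<Object, List<Long>> entries = new HashMap<>();
        PropertyIndexer(String key) { this.key = key; }
        public boolean accepts(Map<String, Object> nodeProperties) {
            return nodeProperties.containsKey(key);
        }
        public void index(long nodeId, Map<String, Object> nodeProperties) {
            entries.computeIfAbsent(nodeProperties.get(key), k -> new ArrayList<>()).add(nodeId);
        }
    }

    // Spatial-style indexer: accepts a node if it carries spatial meta-data.
    static final class SpatialIndexer implements AutoIndexer {
        final List<Long> indexed = new ArrayList<>();
        public boolean accepts(Map<String, Object> nodeProperties) {
            return nodeProperties.containsKey("bbox"); // assumed marker property
        }
        public void index(long nodeId, Map<String, Object> nodeProperties) {
            indexed.add(nodeId);
        }
    }

    private final List<AutoIndexer> indexers = new ArrayList<>();

    void register(AutoIndexer indexer) {
        indexers.add(indexer);
    }

    // Called for every created or modified node.
    void nodeChanged(long nodeId, Map<String, Object> nodeProperties) {
        for (AutoIndexer indexer : indexers) {
            if (indexer.accepts(nodeProperties)) {
                indexer.index(nodeId, nodeProperties);
            }
        }
    }

    public static void main(String[] args) {
        AutoIndexSketch graph = new AutoIndexSketch();
        PropertyIndexer names = new PropertyIndexer("name");
        SpatialIndexer spatial = new SpatialIndexer();
        graph.register(names);
        graph.register(spatial);
        Map<String, Object> pen = new HashMap<>();
        pen.put("name", "pen");
        graph.nodeChanged(1L, pen);
        System.out.println(names.entries + " " + spatial.indexed);
    }
}
```

The sketch registers indexer instances directly; instantiating them from a configured class name, as suggested in the original message, would be a thin layer on top.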

On Wed, Jun 15, 2011 at 4:56 AM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 A problem with a probably dumb index in a graph that I created for an
 experiment was the
 performance of getAllRelationships on that machine (it was a very large
 graph with all nodes being indexed).

 It was a mapping from long values to nodes. My simplistic approach just
 chopped the long values into chunks of 3 digits and used those 3 digits as
 relationship-types (i.e. 1000 additional rel-types) to form a tree which
 pointed to the node in question at the end.

 Will have to investigate that further.


 On 14.06.2011 at 23:43, Peter Neubauer wrote:

  Craig,
  the autoindexing is one step in this direction. The other is to enable
  the Spatial and other in-graph indexes like the graph-collections
  (timeline etc) at all to be treated like normal index providers. When
  that is done (will talk to Mattias who is coming back from vacation
  tomorrow on that), we are in a position to think about more complex
  autoindex providers.
 
  Also, the possibility to treat Neo4j Spatial and other graph
  structures as index providers, would hook into the index framework and
  expose things to higher level queries like Cypher and Gremlin, e.g.
  combining a spatial bounding box geometry search with a graph
  traversal for suitable properties that are less than 2 kilometers from
  the nearest school, sorting the results, returning only price and lat
  as columns, the 3 topmost hits.
 
  START geom = (index:spatial:'BBOX(the_geom, -90, 40, -60, 45)')
  MATCH (geom)--(fast), (fast)-[r, :NEAR]-(school)
  WHERE fast.roooms4 AND school.classes4 AND r.length2
  RETURN fast.pic?, fast.lon?, fast.lat?
  SORT BY fast.price, fast.lat^
  SLICE 3
 
  So, I think the next step is to make in-graph indexing structures plug
  into the index framework, and then into autoindexing :)
 
 
  Cheers,
 
  /peter neubauer
 
 
  On Tue, Jun 14, 2011 at 5:49 PM, Craig Taverner cr...@amanzi.com wrote:
  This is great news.
 
  Now I'm really curious about the next step, and that is allowing indexes
  other than lucene. For example, the RTree index in neo4j-spatial was never
  possible to wrap behind the normal index API, because that was designed
  only for properties of nodes (and relationships), but the RTree is based on
  something completely different (complete spatial geometries). However, the
  new auto-indexing feature implies that any node can be added to an index
  without the developer needing to know anything about the index API.
  Instead the index needs to know if the node is appropriate for indexing.
  This is suitable for both lucene and the RTree.
 
  So what I'd like to see is that when configuring auto-indexing in the
  first place, instead of just specifying properties to index, specify some
  indexer implementation that can be created and run internally. For
  example, perhaps you pass the classname of some class that implements some
  necessary interface, and then that is instantiated, passed config
  properties, and used to index new or modified nodes. One method I could
  imagine this interface having would be a listener for change events to be
  evaluated for whether or not the index should be activated for a node
  change. For the lucene property index, this method would return true if
  the property exists on that node. For the RTree this method would return
  true if the node contained the meta-data required for neo4j-spatial to
  recognize it as a spatial type. Alternatively just an index method that
  does nothing when the nodes are not to be indexed, and indexes when
  necessary?
 
  So, are we now closer to having this kind of support?
 
  On Tue, Jun 14, 2011 at 11:30 PM, Chris

Re: [Neo4j] More spatial questions

2011-06-19 Thread Craig Taverner
Hi Nolan,

I think I can answer a few of your questions. Firstly, some background. The
graph model of the OSM data is based largely on the XML-formatted OSM
documents, and there you will find 'nodes', 'ways', 'relations' and 'tags'
each as their own xml-tag, and as a consequence each will also have their
own neo4j-node in the graph. Another point is that the geometry can be based
on one or more nodes or ways, and so we always create another node for the
geometry, and link it to the osm-node, way or relation that represents that
geometry.

What all this boils down to is that you cannot find the tags on the geometry
node itself. You cannot even find the location on that node. If you want to
use the graph model in a direct way, as you have been trying, you really do
need to know how the OSM data is modeled. For example, for a LineString
geometry, you would need to traverse from the geometry node to the way node
and finally to the tags node (to get the tags). To get to the locations is
even more complex. Rather than do that, I would suggest that you work with
the OSM API we provided with the OSMLayer, OSMDataset and OSMGeometryEncoder
classes. Then you do not need to know the graph model at all.
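
To show what "knowing the graph model" means here, the two hops from a geometry node to its tags can be sketched on a toy in-memory graph. This is not the neo4j-spatial API: the MiniNode and MiniRel classes are invented, and the relationship directions (GEOM pointing from the way to the geometry node, TAGS pointing from the way to the tags node) are an assumption based on the REPL snippets quoted below.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy graph used only to illustrate the geometry -> way -> tags walk.
class OsmTagLookupSketch {
    static final class MiniNode {
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<MiniRel> relationships = new ArrayList<>();
    }

    static final class MiniRel {
        final MiniNode start;
        final String type;
        final MiniNode end;
        MiniRel(MiniNode start, String type, MiniNode end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    static void relate(MiniNode start, String type, MiniNode end) {
        MiniRel rel = new MiniRel(start, type, end);
        start.relationships.add(rel);
        end.relationships.add(rel);
    }

    // Node at the other end of the first matching relationship, or null.
    static MiniNode neighbour(MiniNode node, String type, boolean incoming) {
        for (MiniRel rel : node.relationships) {
            if (!rel.type.equals(type)) {
                continue;
            }
            if (incoming && rel.end == node) {
                return rel.start;
            }
            if (!incoming && rel.start == node) {
                return rel.end;
            }
        }
        return null;
    }

    // Step 1: geometry <-GEOM- way. Step 2: way -TAGS-> tags.
    static Map<String, Object> tagsOfGeometry(MiniNode geometry) {
        MiniNode way = neighbour(geometry, "GEOM", true);
        if (way == null) {
            return Collections.emptyMap();
        }
        MiniNode tags = neighbour(way, "TAGS", false);
        return tags == null ? Collections.<String, Object>emptyMap() : tags.properties;
    }

    public static void main(String[] args) {
        MiniNode way = new MiniNode();
        MiniNode geometry = new MiniNode();
        MiniNode tags = new MiniNode();
        tags.properties.put("highway", "residential");
        relate(way, "GEOM", geometry);
        relate(way, "TAGS", tags);
        System.out.println(tagsOfGeometry(geometry));
    }
}
```

The point of the OSM API classes is that this walk, and the considerably longer one to the coordinates, stays hidden behind them.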

For example, OSMDataset has a method for getting a Way object from a node,
and the returned object can be queried for its nodes, geometry, etc.
Currently we provide methods for returning neo4j-nodes as well as objects
that make spatial sense. One minor issue here is the ambiguity inherent in
the fact that both neo4j and OSM make use of the term 'node', but for
different things. We have various solutions to this, sometimes replacing
'node' with 'point' and sometimes prefixing with 'osm'. The unit tests in
TestsForDocs include some tests for the OSM API.

My first goal is to find the nearest OSM node to a given lat, lon. My
 attempts seem to be made of fail thus far, however. Here's my code:


Most of the OSM dataset is converted into LineStrings, and what you really
want to do is find the closest vertex of the closest LineString. We have a
utility function 'findClosestEdges' in the SpatialTopologyUtils class for
that. The unit tests in TestSpatialUtils, and the testSnapping() method in
particular, show use of this.
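
The underlying idea, "closest vertex of the closest LineString", can be sketched without neo4j-spatial at all. The code below is not the findClosestEdges implementation; it is a brute-force version that treats coordinates as planar x/y (a reasonable approximation only over small areas), and every name in it is made up.

```java
import java.util.Arrays;
import java.util.List;

// Brute-force sketch: pick the nearest line, then the nearest vertex on that line.
class ClosestVertexSketch {
    // Distance from point P to the segment AB, clamping the projection to the segment.
    static double distanceToSegment(double px, double py,
                                    double ax, double ay, double bx, double by) {
        double dx = bx - ax;
        double dy = by - ay;
        double lengthSquared = dx * dx + dy * dy;
        double t = lengthSquared == 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / lengthSquared;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
    }

    // Distance from P to a line given as an array of {x, y} vertices (at least one).
    static double distanceToLine(double[][] line, double px, double py) {
        if (line.length == 1) {
            return Math.hypot(px - line[0][0], py - line[0][1]);
        }
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < line.length; i++) {
            best = Math.min(best, distanceToSegment(px, py,
                    line[i][0], line[i][1], line[i + 1][0], line[i + 1][1]));
        }
        return best;
    }

    // Returns the vertex of the closest line that is nearest to P, or null if no lines.
    static double[] closestVertex(List<double[][]> lines, double px, double py) {
        double[][] closestLine = null;
        double closestLineDistance = Double.POSITIVE_INFINITY;
        for (double[][] line : lines) {
            if (line.length == 0) {
                continue;
            }
            double d = distanceToLine(line, px, py);
            if (d < closestLineDistance) {
                closestLineDistance = d;
                closestLine = line;
            }
        }
        if (closestLine == null) {
            return null;
        }
        double[] bestVertex = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (double[] vertex : closestLine) {
            double d = Math.hypot(px - vertex[0], py - vertex[1]);
            if (d < bestDistance) {
                bestDistance = d;
                bestVertex = vertex;
            }
        }
        return bestVertex;
    }

    public static void main(String[] args) {
        List<double[][]> streets = Arrays.asList(
                new double[][] {{0, 0}, {10, 0}},
                new double[][] {{0, 5}, {10, 5}});
        System.out.println(Arrays.toString(closestVertex(streets, 6, 4.5)));
    }
}
```

A real implementation would use the spatial index to narrow down candidate lines first instead of scanning all of them.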

My thinking is that nodes should be represented as points, so I can't
 see why this fails. When I run this in a REPL, I do get a node back. So
 far so good. Next, I want to get the node's tags. So I run:


The spatial search will return 'geometries', which are spatial objects. In
neo4j-spatial every geometry is represented by a unique node, but it is not
required that that node contain coordinates or tags. That is up to the
GeometryEncoder. In the case of the OSM model, this information is
elsewhere, because of the nature of the OSM graph, which is a highly
interconnected network of points, most of which do not represent Point
geometries, but are part of much more complex geometries (streets, regions,
buildings, etc.).

n.getSingleRelationship(OSMRelation.TAGS, Direction.INCOMING)


The geometry node is not connected directly to the tags node. You need two
steps to get there. But again, rather than figure out the graph yourself,
use the API. In this case, instead of getting the geometry node from the
SpatialDatabaseRecord, rather just get the properties using getPropertyNames
and getProperty(String). This API works the same on all kinds of spatial
data, and in the case of OSM data will return the TAGS, since those are
interpreted as attributes of the geometries.

n.getSingleRelationship(OSMRelation.GEOM,
 Direction.INCOMING).getOtherNode(n).getPropertyKeys
 I see what appears to be a series of tags (oneway, name, etc.) Why are
 these being returned for OSMRelation.GEOM rather than OSMRelation.TAGS?


These are not the tags. Now you have found the node representing an OSM
'Way'. This has a few properties on it that are relevant to the way, the
name, whether the street is oneway or not, etc. Sometimes these are based on
values in the tags, but they are not the tags themselves. This node is
connected to the geometry node and the tags node, so you were half-way there
(to the tags that is). You started at the geometry node, and stepped over to
the way node, and one more step (this time with the TAGS relationship) would
have got you to the tags.

But again, I advise against trying to explore the OSM graph by itself. As
you have already found, it is not completely trivial. What you should have
done is access the attributes directly from the search results.

Additionally, I see the property way_osm_id, which clearly isn't a tag.
 It would also seem to indicate that this query returned a way rather
 than a node like I'd hoped. This conclusion is further borne out by the
 tag names. So clearly I'm not getting the search correct. But beyond
 that, the way being returned by this search isn't close to the lat,lon I
 provided. What am I missing?


The lat/long values are quite a bit deeper in the graph. In the case 

Re: [Neo4j] neo4j-spatial roadmap/stability

2011-06-23 Thread Craig Taverner
Hi Christopher,

Thanks for your interest in neo4j and neo4j-spatial. I will answer your
questions and comments inline.

I am working for the largest German speaking travel and holiday portal.
 Currently we are using a relatively simple MySQL based spatial distance
 functionality. We plan to enhance this by something which is capable of a
 flexible set of spatial queries. We will evaluate Neo4j-Spatial for that
 and benchmark it against PostGIS/PostgreSQL.


This would be a very interesting application for neo4j-spatial. I'm sure we
could support you in that. Obviously it is not as mature as PostGIS, but I
think it is very suitable for flexible queries, especially if you plan to
combine a complex domain model with spatial data, or expose a spatial
element to existing domains.

I found some Roadmap descriptions in the Neo4j Wiki
 (http://wiki.neo4j.org/content/Neo4j_Spatial_Project_Plan), but I am not
 sure that these are still valid. Craig said (somewhere) that Neo4j Spatial
 is still alpha (I hope that this means that only the interfaces are still
 unstable). And I know that neo4j-spatial is an open source project where
 there is no Neo Technology responsibility.


The project plan you found was unfortunately the original plan put down
before neo4j-spatial really started, and represents the expectations for
2010. Most of these were met, and several other capabilities achieved in
addition. I will edit the wiki to more accurately reflect the current status
of the project.

However, it is still true that it is in an alpha state. The APIs are likely
to change. Since last September we have viewed it as an alpha release,
available for people to try out and provide feedback on. We believe it is
capable of many useful tasks, and can be used for real applications. But it
has not been in the 'wild' for long, and so there are probably remaining
bugs and performance issues. In addition, as mentioned before, we will
almost certainly change the APIs a little as we receive more feedback and
move the system forward. Already in 2011 there have been three new additions
influencing the API: the SimplePointLayer for LBS and related capabilities,
the beginnings of the REST API for inclusion in Neo4j-Server, and the
Geoprocessing features.

Can you drop a few words about the Spatial roadmap, its stability and
 planned licensing (all based on using it on a high volume web site)?


I think we need Peter's opinion on the licensing. I believe it is currently
the same as neo4j itself. The code comments state AGPL, and I am not sure if
the recent decision to move the core to GPL is applicable to the spatial
code.

For the roadmap we will also update the wiki pages. Currently the efforts
are to:

   - Improve the OSM model API (some basic API for exploring the OSM ways
   and nodes, already in place but needing some refinement)
   - Improve the REST API for spatial (we have some customers trying this
   out, and will make enhancements based on their feedback)
   - Integrate the spatial index into the new automatic indexing feature of
   Neo4j (some initial prototype of this is in place, and will be refined for
   the 1.5 release of Neo4j)
   - Improve Geoprocessing support, particularly on the OSM model. This
   involves a GSoC project and will be presented at FOSS4G in Denver this
   year. See
   http://2011.foss4g.org/sessions/geoprocessing-neo4j-spatial-and-osm

Regards, Craig
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Neo4j -- Can it be embedded in Android?

2011-06-24 Thread Craig Taverner
I heard that Peter Neubauer made a port of neo4j to android a few years ago,
but that nothing has been done since and no version since then would work.
So my understanding is that it does not work on android, but that it is
possible to make it work (with some work ;-).

Peter is away, but I expect he would have a better answer than me.

On Fri, Jun 24, 2011 at 1:33 PM, Sidharth Kshatriya sid.kshatr...@gmail.com
 wrote:

 Dear All,

 I have googled for this on the web and did not arrive at a satisfactory
 answer.

 *Question: Is it possible to run Neo4j on Android? *

 Thanks,

 Sidharth

 --
 Sidharth Kshatriya
 www.sidk.info

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Neo4j -- Can it be embedded in Android?

2011-06-24 Thread Craig Taverner
Personally what I would like to see would be a sub-graph approach, with the
android device storing a sub-graph of the main database, and updating that
asynchronously with the server. Seems like something that can be done in a
domain specific way, but much harder to do generically. I wanted this for
OSM, with the local OSM graph on the android device representing a local map
supporting fast LBS services, and automatically updating from the main OSM
graph on a big central server as the user travels.

On Fri, Jun 24, 2011 at 2:56 PM, Rick Bullotta
rick.bullo...@thingworx.com wrote:

 I think the limited capabilities of the Android device(s) (RAM, primarily)
 limit the usefulness of Neo4J versus alternatives since the datasets are
 usually small and simple in mobile apps.  If we need any heavy-duty graph
 work for a mobile app, we'd do it on the server.

 -Original Message-
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
 On Behalf Of Sidharth Kshatriya
 Sent: Friday, June 24, 2011 8:53 AM
 To: Neo4j user discussions
 Subject: Re: [Neo4j] Neo4j -- Can it be embedded in Android?

 Yes, I saw that on the mailing list archives too. I would have thought there
 would be some interest in using this on android -- but there seems to be no
 news about it since...

 On Fri, Jun 24, 2011 at 6:13 PM, Rick Bullotta
 rick.bullo...@thingworx.com wrote:

  I remember something like that, too.  The main issue is probably the
  non-traditional file system that Android exposes.
 



 --
 Sidharth Kshatriya
 www.sidk.info

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] Recent slowdown in imports with lucene

2011-06-25 Thread Craig Taverner
Hi,

Has anyone noticed a slowdown of imports into neo4j with recent snapshots?
Neo4j-spatial importing OSM data (which uses lucene to find matching nodes
for ways) is suddenly running much slower than usual on non-batch imports.
For most of my medium sized test cases, I normally have surprisingly similar
import times for batch inserter and non-batch inserter
(EmbeddedGraphDatabase) versions of the OSM import, but in recent runs the
normal API is now more than 10 times slower. Down to 70 nodes per second,
which is insanely slow.

Any idea if there is something in the recent snapshots for me to look into?
Reproducing the problem requires simply running the TestOSMImport test cases
in neo4j-spatial. I have only tried this on my laptop, so I have not ruled
out that there is something local going on.

Regards, Craig
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Recent slowdown in imports with lucene

2011-06-26 Thread Craig Taverner
Sorry for the lack of details. I wrote the email late at night, as I am
again.

Anyway, the relevant code in github is OSMImporter.java
(https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java).
When adding nodes to the graph, it also adds the osm-id to a lucene index.
There is no Index#remove call, only multiple Index#add calls within the
same transaction. In fact we call index.add and index.get for one index (osm
changesets), while calling index.add on another (osm-nodes). The relevant
lines of code are 812 for adding new OSM nodes to the graph, and 914 for
finding changesets in a different index.

I have not investigated for which version of neo4j the slowdown started, or
if there is somehow some other cause. I will try to find time to do that later
this week. But I thought I should ask on the list anyway in case anyone else
has a similar problem, or if there are some obvious answers.

On Sun, Jun 26, 2011 at 1:45 PM, Mattias Persson
matt...@neotechnology.com wrote:

 Please elaborate on how you are using your index. Are you using
 Index#remove(entity,key) or Index#remove(entity) followed by get/query in
 the same tx? There was a recent change in transactional state
 implementation, where a full representation (in-memory lucene index) was
 needed for it to be able to return accurate results in some corner cases.
 That change could slow things down, but not that much though. I'll give
 some
 different scenarios a go and see if I can find some culprit for this.

 But again, a little more information would be useful, as always.




 --
 Mattias Persson, [matt...@neotechnology.com]
 Hacker, Neo Technology
 www.neotechnology.com

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Recent slowdown in imports with lucene

2011-06-26 Thread Craig Taverner
Hi again,

My apologies, but I have found the problem, and it is in the OSMImporter
itself, nothing to do with Lucene or Neo4j. Peter made a commit
(https://github.com/neo4j/neo4j-spatial/commit/b5e0f1d1a11ed9c8b2b8074f529362a1607a7643#src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java)
in May that at first glance appears to be a cleanup of my code (removal of
string literals), but it also made two meaningful changes that I only saw
on deeper inspection:

   - Addition of the config map 'type: exact' to the index creation (when I
   removed this, node creation improved from 70/s to 140/s)
   - User control over the commit size (previously I had hard-coded this to
   5000 nodes per tx).

There was a small, but significant bug in the commit size, with the new user
parameter not being used to initialize anything, with the consequence that
every node was committed individually. Setting the block size back to 5000
increased the node creation rate to nearly 1 (over 100 times faster).
That is a serious improvement.
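
For anyone curious how such a bug plays out, here is a stripped-down simulation in plain Java. It is not the OSMImporter code: the Importer class is invented, and I am assuming the uninitialized setting behaved like a commit size of 0, which makes the "time to commit?" check true after every node.

```java
// Simulation of batched commits; counts commits instead of touching a database.
class CommitSizeSketch {
    static final class Importer {
        private final int txSize; // nodes per transaction
        private int pending = 0;  // nodes added since the last commit
        int commits = 0;

        Importer(int txSize) {
            this.txSize = txSize;
        }

        void addNode() {
            pending++;
            if (pending >= txSize) { // with txSize == 0 this is always true
                commit();
            }
        }

        void finish() {
            if (pending > 0) {
                commit();
            }
        }

        private void commit() {
            commits++;
            pending = 0;
        }
    }

    static int commitsFor(int nodes, int txSize) {
        Importer importer = new Importer(txSize);
        for (int i = 0; i < nodes; i++) {
            importer.addNode();
        }
        importer.finish();
        return importer.commits;
    }

    public static void main(String[] args) {
        System.out.println("txSize 5000:          " + commitsFor(20_000, 5000) + " commits");
        System.out.println("txSize unset (as 0):  " + commitsFor(20_000, 0) + " commits");
    }
}
```

Since each commit carries a fixed cost, going from one commit per node to one per 5000 nodes is consistent with the size of the speedup I saw.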

Sorry again for wasting space on the list. I'm glad this was a user error,
though, not a neo4j issue :-)

Regards, Craig


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] cassandra + neo4j graph

2011-06-27 Thread Craig Taverner
Hi,

I can comment on the spatial side. The neo4j-spatial library
(https://github.com/neo4j/neo4j-spatial) provides some tools for doing
spatial analysis on your data. I do not know exactly what you plan to do,
but since you mention user and place locations, I guess you are likely to be
asking the database for proximity searches (users near me, or places of
interest near me), in which case the SimplePointLayer class
(https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SimplePointLayer.java)
should provide what you need. Read the code (linked above), it is simple. Or
read the related blog post "Neo4j Spatial, Part1: Finding things close to
other things"
(http://blog.neo4j.org/2011/03/neo4j-spatial-part1-finding-things.html).
You also do not need to include neo4j-spatial from the beginning. Just model
your graph in a way suiting your domain, and when you want to enable spatial
searches, include neo4j-spatial dependencies in your pom and start using it.
If you happen to conform to one of the expected spatial structures, you can
add your nodes to the spatial index directly, otherwise implement a
GeometryEncoder and things should work from there. What I think you might
find interesting is that you can edit the search mechanism to filter on both
spatial and domain specific characteristics in the same pass. There are
various options for this, so we can discuss that later, should you wish.
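
A minimal sketch of the "spatial and domain filter in the same pass" idea.
It is plain Java and deliberately does not use the neo4j-spatial API: the
Place class, the near() method and the haversine distance are assumptions
made for illustration, and the linear scan stands in for a real spatial
index lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class ProximitySearch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Hypothetical stand-in for a graph node with a location and one domain property.
    static final class Place {
        final String name;
        final String category;
        final double lat;
        final double lon;

        Place(String name, String category, double lat, double lon) {
            this.name = name;
            this.category = category;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance between two lat/lon points (haversine formula), in km.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Single pass: every candidate is checked for distance and domain criteria together.
    static List<Place> near(List<Place> places, double lat, double lon,
                            double maxKm, Predicate<Place> domainFilter) {
        List<Place> hits = new ArrayList<>();
        for (Place p : places) {
            if (haversineKm(lat, lon, p.lat, p.lon) <= maxKm && domainFilter.test(p)) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Place> places = Arrays.asList(
                new Place("Cafe A", "cafe", 55.605, 13.000),
                new Place("Museum B", "museum", 55.606, 13.001),
                new Place("Cafe C", "cafe", 59.330, 18.060));
        // Cafes within 5 km of the first point.
        for (Place p : near(places, 55.605, 13.000, 5.0, p -> p.category.equals("cafe"))) {
            System.out.println(p.name);
        }
    }
}
```

In a real setup the distance test would be answered by the spatial index, so
only the domain predicate would run per candidate.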

Regards, Craig


On Mon, Jun 27, 2011 at 3:49 PM, Aliabbas Petiwala aliabba...@gmail.com wrote:

 thanks for the informative reply , to add more , the social networking
 website will be geo aware and some spatial info also needs to be
 stored  like the coordinates of the user node or the coordinates of
 the location\place how can we add more also will neo4j alone + spatial
 suffice ? can there be multiple masters for load balancing and how
 about splitting the graph in the design itself like designing in terms
 of multiple graphs which are mapped to a glue graph?
 hats off for building such a pioneering technology!

 regards,
 Aliabbas

 On 6/26/11, Jim Webber j...@neotechnology.com wrote:
  Hi Aliabbas,
 
  It's difficult to make pronouncements about your solution design without
  knowing about it, but here are some heuristics that can help you to plan
  whether you go with a native Neo4j solution or mix it up with other
 stores.
  All of these are only ideas and you should test first to ensure they make
  sense in your domain.
 
  1. Document/record size. If each node is likely to contain a lot of data
  (e.g. many megabytes) then you may choose to hold that outside of Neo4j
  (e.g. file system, KV store). Otherwise Neo4j.
 
  2. Length of individual fields. If they're small enough to fit within our
  short-string parameters (optimised around post codes, telephone numbers
 etc)
  then you get a performance boost compared to longer strings (which live
 in a
  separate store file in Neo4j). If your individual fields are really
 really
  long (See above, many megabytes), then consider moving them outside
 Neo4j.
  If you can slice up your fields into shorter strings then you'll get a
 good
  performance and footprint boost.
 
  3. Many machines. Neo4j has master/slave replication so write performance
 is
  asymptotically limited by the IO performance of the master (while reads
  scale horizontally, pretty much). The number of nodes you have is not a
  problem for Neo4j, so what is critical is whether a single master can
 handle
  the write load you want to throw at it. Since modern buses are fast, and
  since graph data structures are often less write-heavy than equivalents
 in
  other data stores*, I'd suggest that you might be well served by Neo4j
 here.
 
  But my overriding advice is to spike something with Neo4j and then, only
 if
  you find something that doesn't work in your context, to think about
 adding
  another data store.
 
  Jim
 
  * I'll be blogging about this shortly since it's a common enough
  misconception that 1000 writes in a relational/other NOSQL database
 implies
  1000 writes in a graph, whereas often it's a single write meaning graphs
 can
  be 1000 times better for the same workload.
 


 --
 Aliabbas Petiwala
 M.Tech CSE



Re: [Neo4j] neo4j-graph-collections

2011-06-28 Thread Craig Taverner
The RTree in principle should be generalizable, but the current
implementation in neo4j-spatial does make a few assumptions specific to
spatial data, and makes use of spatial envelopes for the tree node bounding
boxes. It is also specific to 2D. We could make a few improvements first,
like generalizing to n-dimensions, replacing the recursive search with a
traverser and generalizing the bounding boxes to be simple double-arrays.
Then the only thing left would be to decide if it is ok for it to be based
on n-dim doubles or should be generalized to more types.
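
To illustrate the "bounding boxes as simple double-arrays" step, here is a
small sketch of an n-dimensional envelope. The Envelope class and its method
names are hypothetical and not taken from neo4j-spatial; it only shows that
the intersect and union operations an RTree relies on do not need to be tied
to 2D.

```java
import java.util.Arrays;

// Hypothetical n-dimensional bounding box held as two plain double arrays.
final class Envelope {
    final double[] min;
    final double[] max;

    Envelope(double[] min, double[] max) {
        if (min.length != max.length) {
            throw new IllegalArgumentException("min and max must have the same dimension");
        }
        this.min = min.clone();
        this.max = max.clone();
    }

    // True if the two boxes overlap or touch in every dimension.
    boolean intersects(Envelope other) {
        checkDimension(other);
        for (int i = 0; i < min.length; i++) {
            if (other.max[i] < min[i] || other.min[i] > max[i]) {
                return false;
            }
        }
        return true;
    }

    // Smallest box covering both inputs, as a tree node would keep for its children.
    Envelope union(Envelope other) {
        checkDimension(other);
        double[] lo = new double[min.length];
        double[] hi = new double[min.length];
        for (int i = 0; i < min.length; i++) {
            lo[i] = Math.min(min[i], other.min[i]);
            hi[i] = Math.max(max[i], other.max[i]);
        }
        return new Envelope(lo, hi);
    }

    private void checkDimension(Envelope other) {
        if (other.min.length != min.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(min) + " .. " + Arrays.toString(max);
    }

    public static void main(String[] args) {
        // The same code handles 3D boxes; nothing here is specific to 2D.
        Envelope a = new Envelope(new double[] {0, 0, 0}, new double[] {2, 2, 2});
        Envelope b = new Envelope(new double[] {1, 1, 1}, new double[] {3, 3, 3});
        Envelope c = new Envelope(new double[] {5, 5, 5}, new double[] {6, 6, 6});
        System.out.println(a.intersects(b));
        System.out.println(a.intersects(c));
        System.out.println(a.union(c));
    }
}
```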

On Tue, Jun 28, 2011 at 11:14 PM, Saikat Kanjilal sxk1...@hotmail.com wrote:

 I would be interested in helping out with this, let me know next steps.

 Sent from my iPhone

 On Jun 28, 2011, at 8:49 AM, Niels Hoogeveen pd_aficion...@hotmail.com
 wrote:

 
  A couple of weeks ago Peter Neubauer set up a repository for in-graph
 datastructures: https://github.com/peterneubauer/graph-collections.
  At this time of writing only the Btree/Timeline index is part of this
 component.
  In my opinion it would be interesting to move the Rtree parts of
 neo-spatial to neo4j-graph-collections too.
  I looked at the code but don't feel competent to separate out those
 classes that support generic Rtrees from those classes that are clearly
 spatial related.
  Is there any enthusiasm for such a project and if so, who is willing and
 able to do this?
  Niels
 
 
 
 



Re: [Neo4j] neo4j-graph-collections

2011-06-29 Thread Craig Taverner
I have previously used two solutions to deal with multiple types in btrees:

   - My first index in 2009 was a btree-like n-dim index using generics to
   support int[], long[], float[] and double[] (no strings). I used this for
   TimeLine (long[1]) and Location (double[2]). The knowledge about what type
   was used was in the code for constructing the index (whether a new index or
   accessing an existing index in the graph).
   - In December I started my amanzi-index (on GitHub:
   https://github.com/craigtaverner/amanzi-index)
   that is also btree-like, n-dimensional. But this time it can index multiple
   types in the same tree (so a float, int and string in the same tree, instead
   of being forced to have all properties of the same type). It is a re-write
   of the previous index to support Strings, and mixed types. This time it does
   save the type information in meta-data at the tree root.

The idea of using a 'comparator' class for the types is similar, but simpler
than the idea I implemented for amanzi-index, where I have mapper classes
that describe not only how to compare types, but also how to map from values
to index keys and back. This includes (to some extent) the concept of the
lucene analyser, since the mapper can decide on custom distribution of, for
example, strings and category indexes.

For both of these indexes, you configure the index up front, and then only
call index.add(node) to index a node. This will fit in well with the new
auto-indexing ideas in neo4j.
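
A rough sketch of what such a mapper could look like, as opposed to a plain
comparator. This is not the amanzi-index code: the KeyMapper interface, the
two implementations and the origin/step parameters are assumptions chosen to
show the "compare, map value to key, map key back to value" contract.

```java
import java.util.Comparator;

final class KeyMappers {
    // Hypothetical mapper contract: ordering plus value <-> index key mapping.
    interface KeyMapper<T> extends Comparator<T> {
        int toKey(T value);   // which index bucket a value falls into
        T fromKey(int key);   // smallest value that maps to that bucket
    }

    // Numeric mapper: buckets of width 'step' counted from 'origin'.
    static final class DoubleMapper implements KeyMapper<Double> {
        private final double origin;
        private final double step;

        DoubleMapper(double origin, double step) {
            if (step <= 0) {
                throw new IllegalArgumentException("step must be positive");
            }
            this.origin = origin;
            this.step = step;
        }

        @Override public int toKey(Double value) {
            return (int) Math.floor((value - origin) / step);
        }

        @Override public Double fromKey(int key) {
            return origin + key * step;
        }

        @Override public int compare(Double a, Double b) {
            return Double.compare(a, b);
        }
    }

    // String mapper with a custom distribution: one bucket per lower-cased first letter.
    static final class FirstLetterMapper implements KeyMapper<String> {
        @Override public int toKey(String value) {
            return value.isEmpty() ? 0 : Character.toLowerCase(value.charAt(0));
        }

        @Override public String fromKey(int key) {
            return key == 0 ? "" : String.valueOf((char) key);
        }

        @Override public int compare(String a, String b) {
            return a.compareToIgnoreCase(b);
        }
    }

    public static void main(String[] args) {
        DoubleMapper numbers = new DoubleMapper(0.0, 10.0);
        System.out.println(numbers.toKey(25.0));
        System.out.println(numbers.fromKey(2));
        System.out.println(new FirstLetterMapper().toKey("Neo4j"));
    }
}
```

A comparator alone would only cover the compare() part; the key mapping is
what lets each type decide how its values are spread over the tree.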

On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen
pd_aficion...@hotmail.com wrote:






 At this moment Btree only supports the primitive datatype long, while Rtree
 only supports the datatype double. For Btree it makes sense to at least
 support strings, floats, doubles and ints too. Use cases for these data
 types are pretty obvious and are Btree backed in (almost) every RDBMS
 product around. I think the best solution would be to create Comparator
 objects wrapping these primitive data types and store the class name of the
 comparator in root of the index tree. This allows users to create their own
 comparators for datatypes not covered yet. It would make sense people would
 want to store BigInt and BigDecimal objects in a Btree too, others may want
 to store dates (instead of datetime), fractions, complex numbers or even
 more exotic data types.
 Niels
  From: sxk1...@hotmail.com
  To: user@lists.neo4j.org
  Date: Tue, 28 Jun 2011 22:43:24 -0700
  Subject: Re: [Neo4j] neo4j-graph-collections
 
 
  I've read through this thread in more detail and have a few thoughts,
 when you talk about type I am assuming that you are referring to an
 interface that both (Btree,Rtree) can implement, for the data types I'd like
 to understand the use cases first before implementing the different data
 types, maybe we could store types of Object instead of Long or Double and
 implement comparators in a more meaningful fashion.   Also I was wondering
 if unit tests would need to be extracted out of the spatial component and
 embedded inside the graph-collections component as well or whether we'd
 potentially need to write brand new unit tests as well.
  Craig as I mentioned I'd love to help, let me know if it would be
 possible to fork a repo or to talk in more detail this week.
  Regards
 
   From: pd_aficion...@hotmail.com
   To: user@lists.neo4j.org
   Date: Wed, 29 Jun 2011 01:35:43 +0200
   Subject: Re: [Neo4j] neo4j-graph-collections
  
  
   As to the issue of n-dim doubles, it would be interesting to consider
 creating a set of classes of type Orderable (supporting <, <=, >, >=
 operations), this we can use in both Rtree and Btree. Right now Btree only
 supports datatype Long. This should also become more generic. A first step
 we can take is at least wrap the common datatypes in Orderable classes.
   Niels
  
Date: Wed, 29 Jun 2011 00:32:15 +0200
From: cr...@amanzi.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] neo4j-graph-collections
   
The RTree in principle should be generalizable, but the current
implementation in neo4j-spatial does make a few assumptions specific
 to
spatial data, and makes use of spatial envelopes for the tree node
 bounding
boxes. It is also specific to 2D. We could make a few improvements
 first,
like generalizing to n-dimensions, replacing the recursive search
 with a
traverser and generalizing the bounding boxes to be simple
 double-arrays.
Then the only thing left would be to decide if it is ok for it to be
 based
on n-dim doubles or should be generalized to more types.
   
On Tue, Jun 28, 2011 at 11:14 PM, Saikat Kanjilal 
 sxk1...@hotmail.comwrote:
   
 I would be interested in helping out with this, let me know next
 steps.

 Sent from my iPhone

 On Jun 28, 2011, at 8:49 AM, Niels Hoogeveen 
 pd_aficion...@hotmail.com
 wrote:

 
  A couple of weeks ago Peter Neubauer set up a repository for
 in-graph
 

Re: [Neo4j] neo4j-graph-collections

2011-06-29 Thread Craig Taverner
It is technically possible, but it is a somewhat specialized index, not a
normal BTree, so I think you would want both (mine and a classic btree). My
index performs better for certain data patterns, is best with semi-ordered
data and moderately even distributions (since it has no rebalancing), and
requires the developer to pick a good starting 'resolution' which means they
should know something about their data. Perhaps we just port some of the
typing support into a btree in the collections project?

On Wed, Jun 29, 2011 at 4:19 PM, Niels Hoogeveen
pd_aficion...@hotmail.com wrote:


 Craig,
 Would it be possible to merge your work on Amanzi with the work the Neo
 team has done on the Btree component that is now in neo4j-graph-collections,
 so we can eventually have one implementation that meets a broad variety of
 needs?
 Niels

  Date: Wed, 29 Jun 2011 15:34:47 +0200
  From: cr...@amanzi.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] neo4j-graph-collections
 
  I have previously used two solutions to deal with multiple types in
 btrees:
 
 - My first index in 2009 was a btree-like n-dim index using generics
 to
 support int[], long[], float[] and double[] (no strings). I used this
 for
 TimeLine (long[1]) and Location (double[2]). The knowledge about what
 type
 was used was in the code for constructing the index (whether a new
 index or
 accessing an existing index in the graph).
 - In December I started my amanzi-index (on GitHub:
  https://github.com/craigtaverner/amanzi-index)
 that is also btree-like, n-dimensional. But this time it can index
 multiple
 types in the same tree (so a float, int and string in the same tree,
 instead
 of being forced to have all properties of the same type). It is a
 re-write
 of the previous index to support Strings, and mixed types. This time
 it does
 save the type information in meta-data at the tree root.
 
  The idea of using a 'comparator' class for the types is similar, but
 simpler
  than the idea I implemented for amanzi-index, where I have mapper classes
  that describe not only how to compare types, but also how to map from
 values
  to index keys and back. This includes (to some extent) the concept of the
  lucene analyser, since the mapper can decide on custom distribution of,
 for
  example, strings and category indexes.
 
  For both of these indexes, you configure the index up front, and then
 only
  call index.add(node) to index a node. This will fit in well with the new
  auto-indexing ideas in neo4j.
 
  On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen
  pd_aficion...@hotmail.comwrote:
 
  
  
  
  
  
   At this moment Btree only supports the primitive datatype long, while
 Rtree
   only supports the datatype double. For Btree it makes sense to at least
   support strings, floats, doubles and ints too. Use cases for these data
   types are pretty obvious and are Btree backed in (almost) every RDBMS
   product around. I think the best solution would be to create Comparator
   objects wrapping these primitive data types and store the class name of
 the
   comparator in root of the index tree. This allows users to create their
 own
   comparators for datatypes not covered yet. It would make sense people
 would
   want to store BigInt and BigDecimal objects in a Btree too, others may
 want
   to store dates (instead of datetime), fractions, complex numbers or
 even
   more exotic data types.
   Niels
From: sxk1...@hotmail.com
To: user@lists.neo4j.org
Date: Tue, 28 Jun 2011 22:43:24 -0700
Subject: Re: [Neo4j] neo4j-graph-collections
   
   
I've read through this thread in more detail and have a few thoughts,
   when you talk about type I am assuming that you are referring to an
   interface that both (Btree,Rtree) can implement, for the data types I'd
 like
   to understand the use cases first before implementing the different
 data
   types, maybe we could store types of Object instead of Long or Double
 and
   implement comparators in a more meaningful fashion.   Also I was
 wondering
   if unit tests would need to be extracted out of the spatial component
 and
   embedded inside the graph-collections component as well or whether we'd
   potentially need to write brand new unit tests as well.
Craig as I mentioned I'd love to help, let me know if it would be
   possible to fork a repo or to talk in more detail this week.
Regards
   
 From: pd_aficion...@hotmail.com
 To: user@lists.neo4j.org
 Date: Wed, 29 Jun 2011 01:35:43 +0200
 Subject: Re: [Neo4j] neo4j-graph-collections


 As to the issue of n-dim doubles, it would be interesting to
 consider
   creating a set of classes of type Orderable (supporting <, <=, >, >=
   operations), this we can use in both Rtree and Btree. Right now Btree
 only
   supports datatype Long. This should also become more generic. A first
 step
   we can take is at least wrap the common datatypes in Orderable classes.
 Niels

Re: [Neo4j] neo4j-graph-collections

2011-06-29 Thread Craig Taverner
I think moving the RTree to the generic collections would not be too hard. I
saw Saikat showed interest in doing this himself.

Saikat, contact me off-list for further details on what I think could be
done to make this port.

On Wed, Jun 29, 2011 at 9:52 PM, Niels Hoogeveen
pd_aficion...@hotmail.com wrote:


 Peter, I totally agree. Having the Rtree index removed of spatial
 dependencies in graph-collections should be our first priority. Once that is
 done we can focus on the other issues.
 Which doesn't mean we should stop discussing future improvements like
 setting up comparators (or something to that extent) that can be reusable,
 but we shouldn't try to get that up before Rtree is in graph-collections.
 Niels

  From: peter.neuba...@neotechnology.com
  Date: Wed, 29 Jun 2011 21:10:15 +0200
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] neo4j-graph-collections
 
  Craig,
  just gave you push access to the graph collections in case you want to
  do anything there.
 
  Also, IMHO it would be more important to isolate and split out the
  RTree component from Spatial than to optimize it - that could be done
  in the new place with targeted performance tests later?
 
  Cheers,
 
  /peter neubauer
 
 
 
 
  On Wed, Jun 29, 2011 at 4:19 PM, Niels Hoogeveen
  pd_aficion...@hotmail.com wrote:
  
   Craig,
   Would it be possible to merge your work on Amanzi with the work the Neo
 team has done on the Btree component that is now in neo4j-graph-collections,
 so we can eventually have one implementation that meets a broad variety of
 needs?
   Niels
  
   Date: Wed, 29 Jun 2011 15:34:47 +0200
   From: cr...@amanzi.com
   To: user@lists.neo4j.org
   Subject: Re: [Neo4j] neo4j-graph-collections
  
   I have previously used two solutions to deal with multiple types in
 btrees:
  
  - My first index in 2009 was a btree-like n-dim index using
 generics to
  support int[], long[], float[] and double[] (no strings). I used
 this for
  TimeLine (long[1]) and Location (double[2]). The knowledge about
 what type
  was used was in the code for constructing the index (whether a new
 index or
  accessing an existing index in the graph).
  - In December I started my amanzi-index (on GitHub:
   https://github.com/craigtaverner/amanzi-index)
  that is also btree-like, n-dimensional. But this time it can index
 multiple
  types in the same tree (so a float, int and string in the same
 tree, instead
  of being forced to have all properties of the same type). It is a
 re-write
  of the previous index to support Strings, and mixed types. This
 time it does
  save the type information in meta-data at the tree root.
  
   The idea of using a 'comparator' class for the types is similar, but
 simpler
   than the idea I implemented for amanzi-index, where I have mapper
 classes
   that describe not only how to compare types, but also how to map from
 values
   to index keys and back. This includes (to some extent) the concept of
 the
   lucene analyser, since the mapper can decide on custom distribution
 of, for
   example, strings and category indexes.
  
   For both of these indexes, you configure the index up front, and then
 only
   call index.add(node) to index a node. This will fit in well with the
 new
   auto-indexing ideas in neo4j.
  
   On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen
   pd_aficion...@hotmail.comwrote:
  
   
   
   
   
   
At this moment Btree only supports the primitive datatype long,
 while Rtree
only supports the datatype double. For Btree it makes sense to at
 least
support strings, floats, doubles and ints too. Use cases for these
 data
types are pretty obvious and are Btree backed in (almost) every
 RDBMS
product around. I think the best solution would be to create
 Comparator
objects wrapping these primitive data types and store the class name
 of the
comparator in root of the index tree. This allows users to create
 their own
comparators for datatypes not covered yet. It would make sense
 people would
want to store BigInt and BigDecimal objects in a Btree too, others
 may want
to store dates (instead of datetime), fractions, complex numbers or
 even
more exotic data types.
Niels
 From: sxk1...@hotmail.com
 To: user@lists.neo4j.org
 Date: Tue, 28 Jun 2011 22:43:24 -0700
 Subject: Re: [Neo4j] neo4j-graph-collections


 I've read through this thread in more detail and have a few
 thoughts,
when you talk about type I am assuming that you are referring to an
interface that both 

Re: [Neo4j] traversing densely populated nodes

2011-06-30 Thread Craig Taverner
This topic has come up before, and the domain level solutions are usually
very similar, like Norbert's category/proxy nodes (to group by
type/direction) and Niels' TimeLineIndex (BTree). I wonder whether we can
build a generic user-level solution that can also be wrapped to appear as an
internal database solution?

For example, consider Niels's solution of the TimeLine index. In this case
we group all the nodes based on a consistent hash. Usually the timeline
would use a timestamp, but really any reasonably variable property can do,
even the node-id itself. Then we have a BTree between the dense nodes and
the root node (node with too many relationships). How about this crazy idea,
create an API that mimics the normal node.getRelationship*() API, but
internally traverses the entire tree? And also for creating the
relationships? So for most code we just do the usual
node.createRelationshipTo(node,type,direction) and node.traverse(...), but
internally we actually traverse the b-tree.

This would solve the performance bottleneck being observed while keeping the
'illusion' of directly connected relationships. The solution would be
implemented mostly in the application space, so will not need any changes to
the core database. I see this as being of the same kind of solution as the
auto-indexing. We setup some initial configuration that results in certain
structures being created on demand. With auto-indexing we are talking about
mostly automatically adding lucene indexes. With this idea we are talking
about automatically replacing direct relationships with b-trees to resolve a
specific performance issue.

And when the relationship density is very low, if the b-tree is
auto-balancing, it could just be a direct relationship anyway.
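
To make the "illusion of direct relationships" concrete, here is a toy
sketch in plain Java. It does not touch the Neo4j API: FacadeNode is a
hypothetical class, relationship types are plain strings, and
java.util.TreeMap (a red-black tree) stands in for the in-graph b-tree. The
extra 'key' argument is an assumption, playing the role of the property used
to spread relationships over the tree (timestamp, node id, ...).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical facade: callers see direct relationships, storage is one ordered tree per type.
final class FacadeNode {
    final String name;
    // TreeMap stands in for the in-graph b-tree between the root and the dense nodes.
    private final Map<String, TreeMap<Long, List<FacadeNode>>> trees = new HashMap<>();

    FacadeNode(String name) {
        this.name = name;
    }

    // Looks like a direct relationship to the caller; the target is filed under type and key.
    void createRelationshipTo(FacadeNode target, String type, long key) {
        trees.computeIfAbsent(type, t -> new TreeMap<>())
             .computeIfAbsent(key, k -> new ArrayList<>())
             .add(target);
    }

    // Walks only the tree of the requested type; other types are never touched.
    List<FacadeNode> getRelationships(String type) {
        TreeMap<Long, List<FacadeNode>> tree = trees.get(type);
        if (tree == null) {
            return Collections.emptyList();
        }
        List<FacadeNode> result = new ArrayList<>();
        for (List<FacadeNode> bucket : tree.values()) {
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        FacadeNode topicClass = new FacadeNode("TopicClass");
        for (long i = 0; i < 100_000; i++) {
            topicClass.createRelationshipTo(new FacadeNode("instance-" + i), "HAS_INSTANCE", i);
        }
        topicClass.createRelationshipTo(new FacadeNode("Thing"), "SUB_CLASS_OF", 0L);
        // Only the small SUB_CLASS_OF tree is walked here.
        for (FacadeNode parent : topicClass.getRelationships("SUB_CLASS_OF")) {
            System.out.println(parent.name);
        }
    }
}
```

The cost of fetching the sparse type then depends on the size of its own
tree, not on how many relationships of other types hang off the same node.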

On Wed, Jun 29, 2011 at 6:56 PM, Agelos Pikoulas
agelos.pikou...@gmail.com wrote:

 My problem pattern is exactly the same as Niels's :

 A dense-node has millions of relations of a certain direction & type,
 and only a few (sparse) relations of a different direction and type.
 The traversing is usually following only those sparse relationships on
 those dense-nodes.

 Now, even when traversing on these sparse relations, neo4j becomes
 extremely slow, with a clearly non-linear order of growth (in the big-O
 sense).

 Some tests I run (email me if u want the code) reveal that even the number
 of those dense-nodes in the database greatly influences the results.

 I just reported to Michael the runs with the latest M05 snapshot, which are
 not very positive...
 I have suggested an (auto) indexing of relationship types / direction that
 is used by traversing frameworks,
 but I ain't no graphdb-engine expert :-(

 A'


 Message: 5
  Date: Wed, 29 Jun 2011 18:19:10 +0200
  From: Niels Hoogeveen pd_aficion...@hotmail.com
  Subject: Re: [Neo4j] traversing densely populated nodes
  To: user@lists.neo4j.org
  Message-ID: col110-w326b152552b8f7fbe1312d8b...@phx.gbl
  Content-Type: text/plain; charset=iso-8859-1
 
 
  Michael,

  The issue I am referring to does not pertain to traversing many relations
  at once, but the impact many relationships of one type have on
  relationships of another type on the same node.

  Example:

  A topic class has 2 million outgoing relationships of type HAS_INSTANCE
  and has 3 outgoing relationships of type SUB_CLASS_OF.

  Fetching the 3 relations of type SUB_CLASS_OF takes very long,
  I presume due to the presence of the 2 million other relationships.

  I have no need to ever fetch the HAS_INSTANCE relationships from
  the topic node. That relation is always traversed from the other
  direction.

  I do want to know the class of a topic instance, leading to the topic
  class, but have no real interest ever to traverse all topic instances from
  the topic class (at least not directly; I do want to know the most recent
  addition, and that's what I use the timeline index for).

  Niels
 
 
   From: michael.hun...@neotechnology.com
   Date: Wed, 29 Jun 2011 17:50:08 +0200
   To: user@lists.neo4j.org
   Subject: Re: [Neo4j] traversing densely populated nodes
  
   I think this is the same problem that Angelos is facing, we are
 currently
  evaluating options to improve the performance on those highly connected
  supernodes.
  
   A traditional option is really to split them into group or even kind of
  shard their relationships to a second layer.
  
   We're looking into storage improvement options as well as modifications
  to retrieval of that many relationships at once.
  
   Cheers
  
   Michael
 



Re: [Neo4j] Database engine using Neo4j

2011-06-30 Thread Craig Taverner
Hi Kriti,

I can comment on a few things, especially neo4j-spatial:

   - Neo4j is certainly good for social networks, and people have used it
   for that, but I personally do not have experience with that so I will not
   comment further (others can chip in where necessary).
   - Neo4j-Spatial is good for performing some spatial queries on your
   domain data. So you start by modeling your domain however you want, and then
   when you want to start using neo4j-spatial, just add all nodes that have
   spatial components (eg. location) to the spatial index and they will be
   available for querying. The SimplePointLayer class has support for querying
   by proximity, which sounds like what you want. You can also query with a
   filter on properties (so only nearby objects matching some other criteria).
   - I do my neo4j-spatial development in eclipse, so there should be no
   issues for you using eclipse. Just use m2eclipse, and add the dependency to
   your pom.xml. The current version of neo4j-spatial requires Neo4j 1.4, so if
   you are using older neo4j, you might need to make minor changes.
   - Neo4j is not optimized for storing BLOBs, so while it can store images
   as byte[], it is advisable to rather store a reference to the image (eg.
   URI), and store the image in another way (filesystem, other database, etc.)

Regards, Craig

On Wed, Jun 29, 2011 at 2:06 PM, kriti sharma kriti.0...@gmail.com wrote:

 Dear Users,

 I am developing a time capsule DB engine using Neo4j as a database.
 I intend to develop three scales (temporal , geo/spatial and
 egocentric/personal relationships) in the db structure.
 for the geolocation part, i would like to be able to query upon a location
 keyword and also some nearby places/photos/people that i have in my DB.

 Do you think neo4j spatial will be a good choice for such a spatial scheme?
 I have developed a timeline in the usual neo4j using timeline feature. Can
 I
 simply integrate neo4j spatial in my existing code for neo4j in eclipse?

 i am retrieving data from twitter, flickr, facebook etc. so the format of
 data may not be uniform. Therefore i found Neo4j to be an excellent option.
 Has some work been done in modelling a user's Facebook data(friends and
 networks) relationships in Neo4j?

 How should I go about storing images in the DB?

 Thanks
 Kriti



Re: [Neo4j] traversing densely populated nodes

2011-06-30 Thread Craig Taverner
In the amanzi-index I link all indexed nodes into the index tree, so
traversals are straight up the tree. Of course this also means that there
are at least as many relationships as indexed nodes.

I was reviewing Michael's code for the relationship expander, and think that
is a great idea: transparently using an index instead of the normal
relationships API. I can imagine using the relationship expander to
instead traverse the BTree to the final relationship to the leaf nodes.

So if we imagine a BTree with perhaps 10 or 20 hops from the root to the
leaf node, the relationship expander Michael described would complete all
hops and return only the last relationship, giving the illusion of direct
connections from root to leaf. This would certainly perform well, especially
for cases where there are factors limiting the number of relationships we
want returned. I think the request for type and direction is the first
obvious case, but we could be even more explicit than that, if we pass
constraints based on the BTree's consistent hash.
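
As a sketch of that expander idea, assuming a simple in-memory tree rather
than Michael's actual code or the Neo4j expander API: TreeExpander, TreeNode
and the key-range arguments below are all hypothetical. The expander walks
every hop internally, prunes subtrees whose key range cannot match, and hands
back only the leaves, so the caller never sees the intermediate index nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TreeExpander {
    // Hypothetical index tree node: inner nodes cover a key range, leaves carry a value.
    static final class TreeNode {
        final long minKey;
        final long maxKey;
        final String leafValue;        // null for inner nodes
        final List<TreeNode> children;

        private TreeNode(long minKey, long maxKey, String leafValue, List<TreeNode> children) {
            this.minKey = minKey;
            this.maxKey = maxKey;
            this.leafValue = leafValue;
            this.children = children;
        }

        static TreeNode leaf(long key, String value) {
            return new TreeNode(key, key, value, new ArrayList<>());
        }

        static TreeNode inner(TreeNode... kids) {
            if (kids.length == 0) {
                throw new IllegalArgumentException("inner node needs children");
            }
            long lo = Long.MAX_VALUE;
            long hi = Long.MIN_VALUE;
            for (TreeNode k : kids) {
                lo = Math.min(lo, k.minKey);
                hi = Math.max(hi, k.maxKey);
            }
            return new TreeNode(lo, hi, null, Arrays.asList(kids));
        }
    }

    // Returns only the leaves with keys in [from, to]; intermediate hops stay hidden.
    static List<String> expand(TreeNode root, long from, long to) {
        List<String> leaves = new ArrayList<>();
        collect(root, from, to, leaves);
        return leaves;
    }

    private static void collect(TreeNode node, long from, long to, List<String> out) {
        if (node.maxKey < from || node.minKey > to) {
            return; // the whole subtree is outside the requested range
        }
        if (node.leafValue != null) {
            out.add(node.leafValue);
            return;
        }
        for (TreeNode child : node.children) {
            collect(child, from, to, out);
        }
    }

    public static void main(String[] args) {
        TreeNode root = TreeNode.inner(
                TreeNode.inner(TreeNode.leaf(1, "a"), TreeNode.leaf(2, "b")),
                TreeNode.inner(TreeNode.leaf(3, "c"), TreeNode.leaf(4, "d")));
        System.out.println(expand(root, 2, 3)); // prints [b, c]
    }
}
```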

On Thu, Jun 30, 2011 at 11:36 PM, Niels Hoogeveen pd_aficion...@hotmail.com
 wrote:


 In theory the approach I described earlier could work, though there are
 some pitfalls to the current implementation that need ironing out before
 this can become a recommended approach.
 The choice of Timeline instead of Btree may actually be the wrong choice
 after all. I chose Timeline because of my familiarity with this particular
 class, but its implementation may actually not be all that suitable for this
 particular use case. This has to do with the fact that Timeline is not just
  a tree, but a list where entries with an interval of max. 1000 are stored
 in a Btree index. This works reasonably well for a Timeline, but makes the
 approach less ideal for storing dense relationships.
 The problem with the Timeline implementation is the ability to look up the
 tree root from a particular leaf. In an ordinary Btree it would simply be a
 traversal from the leaf through the layers of block nodes to the tree root.
 In Timeline the traversal will be different. It first has to move through
 the Timeline list until it finds an entry that is stored in the Btree (which
 worst case takes 1000 hops), and then it has to traverse the Btree up to the
 tree root. To avoid this complicated traversal I ended up doing a lookup
 through Lucene of the timeline URI (which is stored in all timeline list
 entries). In fact I might as well have added the URI of the dense node as a
 property and do the lookup through Lucene without the Timeline, it just
 happens that I like the sort order of Timeline, making it a useful approach
 anyway.
 I will experiment using Btree directly (without Timeline) and see if that
 leads to a simpler and faster traversal from leaf to root node.
 There is one more issue before this can become production ready. Btree as
 it is implemented now is not thread safe (per the implementation's Javadocs),
 so it needs some love and attention to make it work properly.
 Niels

  Date: Thu, 30 Jun 2011 13:57:20 +0200
  From: cr...@amanzi.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] traversing densely populated nodes
 
  This topic has come up before, and the domain level solutions are
 usually
  very similar, like Norbert's category/proxy nodes (to group by
  type/direction) and Niels' TimeLineIndex (BTree). I wonder whether we can
  build a generic user-level solution that can also be wrapped to appear as
 an
  internal database solution?
 
  For example, consider Niels's solution of the TimeLine index. In this
 case
  we group all the nodes based on a consistent hash. Usually the timeline
  would use a timestamp, but really any reasonably variable property can
 do,
  even the node-id itself. Then we have a BTree between the dense nodes and
  the root node (node with too many relationships). How about this crazy
 idea,
  create an API that mimics the normal node.getRelationship*() API, but
  internally traverses the entire tree? And also for creating the
  relationships? So for most cod we just do the usual
  node.createRelationshipTo(node,type,direction) and node.traverse(...),
 but
  internally we actually traverse the b-tree.
 
  This would solve the performance bottleneck being observed while keeping
 the
  'illusion' of directly connected relationships. The solution would be
  implemented mostly in the application space, so will not need any changes
 to
  the core database. I see this as being of the same kind of solution as
 the
  auto-indexing. We setup some initial configuration that results in
 certain
  structures being created on demand. With auto-indexing we are talking
 about
  mostly automatically adding lucene indexes. With this idea we are talking
  about automatically replacing direct relationships with b-trees to
 resolve a
  specific performance issue.
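
To make the idea a little more concrete, here is a minimal in-memory sketch in plain Java. It is only an illustration under stated assumptions: DenseNodeFacade and its methods are hypothetical names, a TreeMap stands in for the b-tree, and nothing here touches the real Neo4j API. The caller sees what looks like direct relationships, while internally they are spread over hashed buckets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in: none of these names come from Neo4j.
final class DenseNodeFacade {
    private final int bucketCount;
    // The TreeMap stands in for the b-tree; each entry plays the role of a leaf node.
    private final TreeMap<Integer, List<Long>> buckets = new TreeMap<>();

    DenseNodeFacade(int bucketCount) {
        this.bucketCount = bucketCount;
    }

    // Looks like a direct "create relationship" call, but files the target
    // under a bucket chosen by hashing the target node id.
    void createRelationshipTo(long targetNodeId) {
        int bucket = Math.floorMod(Long.hashCode(targetNodeId), bucketCount);
        buckets.computeIfAbsent(bucket, k -> new ArrayList<>()).add(targetNodeId);
    }

    // Looks like a direct "get relationships" call, but walks every bucket.
    List<Long> getRelationships() {
        List<Long> all = new ArrayList<>();
        for (List<Long> leaf : buckets.values()) {
            all.addAll(leaf);
        }
        return all;
    }

    // Number of intermediate buckets currently in use.
    int bucketsInUse() {
        return buckets.size();
    }

    public static void main(String[] args) {
        DenseNodeFacade root = new DenseNodeFacade(16);
        for (long id = 1; id <= 1000; id++) {
            root.createRelationshipTo(id);
        }
        System.out.println(root.getRelationships().size()
                + " relationships via " + root.bucketsInUse() + " buckets");
    }
}
```

In a real graph each bucket would be an intermediate node, so the dense node itself would carry only as many relationships as there are buckets.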
 
  And when the relationship density is very low, if the b-tree is
  auto-balancing, it could just be a direct 

Re: [Neo4j] GSoC 2011 Neo4j Geoprocessing | Weekly Report #6

2011-07-02 Thread Craig Taverner
Hi Andreas,

Sounds like good progress overall. It is only a week to the mid-terms, so
it would be good to do a general code overview and see if this can be
integrated with trunk. Shall we plan for a review and test integration in
the middle of next week?

Regards, Craig

On Sat, Jul 2, 2011 at 10:25 AM, Andreas Wilhelm a...@kabelbw.de wrote:

 Hi,

 This week I had a little blocker with deleting some subgraph nodes and
 relations. For that I made a separate test to identify the problem and
 try to find a solution.

 Apart from that I integrated an additional spatial type function to get
 the distance between geometry nodes and
 updated the already existing spatial type functions to the new API.


 Best Regards

 Andreas Wilhelm


Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

2011-07-02 Thread Craig Taverner
As far as I know there is no internal support for transparent traversals
across shards. Generally people are doing that in the application layer.
However, I think there might be a middle ground of sorts. If we modify the
relationship expander, I could imagine that relationships that are between
shards could be modified to return nodes on the other shard. This would make
the traversal return nodes across shards, but since I've not tried this
myself, I am uncertain if there are other consequences.

On Sat, Jul 2, 2011 at 4:03 AM, Aliabbas Petiwala aliabba...@gmail.com wrote:

 Hi,

 I cannot figure out how my application logic can reify links with
 other neo4j databases located on different distributed servers.
 Hence, how can I make the traversals and graph algorithms transparent
 to the location of the different databases?
 --
 Aliabbas Petiwala
 M.Tech CSE


Re: [Neo4j] wkb value in node created by addGeometryWKTToLayer

2011-07-02 Thread Craig Taverner
Hi Boris,

You do not need to read the property yourself from the node, rather use the
GeometryEncoder for this, it converts from the internal spatial storage to
the Geometry class, which you can work with. If you call geom.toString() you
will get a nice printable version (in WKT). Using the GeometryEncoder is a
particularly good idea because we support many internal storage formats, not
just the WKB you found. If you have point data only, you should consider
using the SimplePointLayer (created with
SpatialDatabaseService.createSimplePointLayer()), which will store the Point
as two properties, for latitude and longitude.

Back to your main question: WKB and WKT are two different formats for
representing spatial data. We support both with the WKBGeometryEncoder and
WKTGeometryEncoder classes, but in both cases we convert from that format to
the JTS Geometry class, on which the spatial operations are performed.
Internally these classes use the WKBReader/WKBWriter (and the WKT versions of
these) for performing the conversions. If you want to convert between WKB and
WKT yourself, you should just use the JTS code directly.
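
As a rough illustration of what such a conversion involves, here is a minimal hand-rolled sketch in plain Java that decodes a WKB Point or LineString into WKT. It is not how Neo4j Spatial or JTS do it (in practice the JTS WKBReader and WKTWriter classes handle every geometry type); the class name WkbSketch is made up for this example, and the sample bytes are the array quoted further down in this message.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hand-rolled illustration only; WkbSketch is a hypothetical name,
// not part of JTS or Neo4j Spatial.
final class WkbSketch {

    // Decodes a WKB Point (type 1) or LineString (type 2) into WKT text.
    static String toWkt(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // The first byte is the byte-order flag: 0 = big-endian, 1 = little-endian.
        buf.order(buf.get() == 0 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        int type = buf.getInt();
        switch (type) {
            case 1:
                return "POINT (" + coordinate(buf) + ")";
            case 2: {
                int count = buf.getInt();
                StringBuilder sb = new StringBuilder("LINESTRING (");
                for (int i = 0; i < count; i++) {
                    if (i > 0) {
                        sb.append(", ");
                    }
                    sb.append(coordinate(buf));
                }
                return sb.append(")").toString();
            }
            default:
                throw new IllegalArgumentException("Unsupported WKB geometry type: " + type);
        }
    }

    // Reads one x/y pair of 8-byte IEEE 754 doubles.
    private static String coordinate(ByteBuffer buf) {
        return buf.getDouble() + " " + buf.getDouble();
    }

    public static void main(String[] args) {
        byte[] wkb = {0, 0, 0, 0, 2, 0, 0, 0, 2,
                64, 46, 51, 51, 51, 51, 51, 51,
                64, 78, 25, -103, -103, -103, -103, -102,
                64, 46, -103, -103, -103, -103, -103, -102,
                64, 78, 12, -52, -52, -52, -52, -51};
        System.out.println(toWkt(wkb));
    }
}
```

Run on those bytes, the sketch prints LINESTRING (15.1 60.2, 15.3 60.1), which suggests the stored value is a two-point LineString rather than the POINT(15.2 60.1) that was expected.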

But as I said before, I do not think you need to do this. If you are getting
your nodes from a search using the index, something like
search.getResults().get(0).getGeometry().toString() will return the WKT
version.

Regards, Craig

On Sat, Jul 2, 2011 at 1:04 AM, Boris Kizelshteyn bo...@popcha.com wrote:

 Craig or anyone who can answer this: what does the wkb value represent
 here?
 I know it's the well-known binary, but how do I get back to wkt? I thought it
 was a byte array, but I can't seem to get my original values back. From the
 values in the test case I have:

 POINT(15.2 60.1)


 wkb:

 [0,0,0,0,2,0,0,0,2,64,46,51,51,51,51,51,51,64,78,25,-103,-103,-103,-103,-102,64,46,-103,-103,-103,-103,-103,-102,64,78,12,-52,-52,-52,-52,-51]


Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-07-02 Thread Craig Taverner
Hi Boris,

Ah! You are using the REST API. That changes a lot, since Neo4j Spatial is
only recently exposed in REST and we do not expose most of the capabilities
I have discussed in this thread, or indeed in my other answer today.

I did recently add some REST methods that might work for you, specifically
the addEditableLayer, which makes a WKB layer, and the
addGeometryWKTToLayer, for adding any kind of Geometry (including
LineString) to the layer. However, these were only added recently, and I
have no experience using them myself, so consider this very much prototype
code. From your other question today, can I assume you are having trouble
making sense of the data coming back? So we need a better way to return the
results in WKT instead of WKB? One option would be to enhance the
addEditableLayer method to allow the creation of WKT layers instead of WKB
layers, so the internal representation is more internet friendly.

I've just added untested support for setting the format to WKT for the
internal representation of the editable layer in the REST interface. This is
untested (outside of my usual unit tests, that is), and is only in the trunk
of neo4j-spatial, but you are welcome to try it out and see what happens.

Regards, Craig

On Fri, Jul 1, 2011 at 5:29 PM, Boris Kizelshteyn bo...@popcha.com wrote:

 Hi Craig,

 Thanks so much for this reply. It is very insightful. Is it possible for me
 to implement the LineString geometries and lookups using REST?

 Many thanks!

 On Wed, Jun 8, 2011 at 4:58 PM, Craig Taverner cr...@amanzi.com wrote:

  OK. I understand much better what you want now.
 
  Your person nodes are not geographic objects, they are persons that can be
  at many positions and indeed move around. However, the 'path' that they take
  is a geographic object and can be placed on the map and analysed
  geographically.
 
  So the question I have is how do you store the path the person takes? Is
  this a bunch of position nodes connected back to that person? Or perhaps a
  chain of position-(next)-position-(next)-position, etc? However you have
  stored this in the graph, you can express this as a geographic object by
  implementing the GeometryEncoder interface. See, for example, the 6 lines of
  code it takes to traverse a chain of NEXT locations and produce a LineString
  geometry in the SimpleGraphEncoder at
  https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82
 
  If you do this, you can create a layer that uses your own geometry encoder (or
  the SimpleGraphEncoder I referenced above, if you use the same graph
  structure) and your own domain model will be expressed as LineString
  geometries and you can perform spatial operations on them.
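
The chain-walking idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions: Position is a hypothetical stand-in for a position node, its next field plays the role of a NEXT relationship, and the output is WKT text rather than a JTS geometry. A real encoder would implement the GeometryEncoder interface and read from graph nodes instead.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for a position node; "next" plays the role of a
// NEXT relationship in the graph.
final class Position {
    final double lon;
    final double lat;
    Position next;

    Position(double lon, double lat) {
        this.lon = lon;
        this.lat = lat;
    }

    // Links this position to a new one and returns the new tail of the chain.
    Position then(double lon, double lat) {
        next = new Position(lon, lat);
        return next;
    }

    // Follows the chain from the given start and renders it as LineString WKT.
    static String toLineStringWkt(Position start) {
        StringJoiner coords = new StringJoiner(", ", "LINESTRING (", ")");
        for (Position p = start; p != null; p = p.next) {
            coords.add(p.lon + " " + p.lat);
        }
        return coords.toString();
    }

    public static void main(String[] args) {
        Position start = new Position(15.1, 60.2);
        start.then(15.3, 60.1).then(15.4, 60.0);
        System.out.println(toLineStringWkt(start));
    }
}
```

Note that a valid LineString needs at least two positions; the sketch does not check for that.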
 
  Alternatively, if your data is more static in nature, and you are analysing
  only what the person did in the past, and the graph will therefore not
  change, perhaps you do not care to store the locations in the graph, and you
  can just import them as a LineString directly into a standard layer.
 
  Whatever route you take, the final action you want to perform is to find
  points near the LineString (path the person took). I do not think the
  bounding box is the right approach for that either. You need to try, for
  example, the method findClosestEdges in the utilities class at
  https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java#L115
 
  This method can find the part of the person's path that is closest to the
  point of interest. There are also many other geographic operations you might
  be interested in trying, once you have a better feel for the types of queries
  you want to ask.
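
To show what "the part of the path closest to a point" means geometrically, here is a small plain-Java sketch. It is not the code behind findClosestEdges; ClosestSegment is a hypothetical helper, and it uses planar distance on raw coordinates, which is only a rough approximation for latitude/longitude data.

```java
// Hypothetical helper for illustration; this is not the code behind
// findClosestEdges in Neo4j Spatial.
final class ClosestSegment {

    // Returns {segmentIndex, closestX, closestY, distance} for a path given as
    // [x0, y0, x1, y1, ...], using planar (not geodesic) distance.
    static double[] find(double[] path, double px, double py) {
        double[] best = {-1, Double.NaN, Double.NaN, Double.POSITIVE_INFINITY};
        for (int i = 0; i + 3 < path.length; i += 2) {
            double ax = path[i], ay = path[i + 1];
            double bx = path[i + 2], by = path[i + 3];
            double dx = bx - ax, dy = by - ay;
            double lengthSquared = dx * dx + dy * dy;
            // Project the point onto the segment and clamp to its end points.
            double t = lengthSquared == 0
                    ? 0
                    : ((px - ax) * dx + (py - ay) * dy) / lengthSquared;
            t = Math.max(0, Math.min(1, t));
            double cx = ax + t * dx, cy = ay + t * dy;
            double distance = Math.hypot(px - cx, py - cy);
            if (distance < best[3]) {
                best = new double[] {i / 2, cx, cy, distance};
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] path = {0, 0, 10, 0, 10, 10};
        double[] hit = find(path, 4, 3);
        System.out.printf("segment %d, closest point (%.1f, %.1f), distance %.1f%n",
                (int) hit[0], hit[1], hit[2], hit[3]);
    }
}
```

For the sample path, the point (4, 3) is nearest to the first segment, at (4, 0), which is the kind of answer a bounding box test cannot give.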
 
  Regards, Craig
 
  On Wed, Jun 8, 2011 at 2:17 AM, Boris Kizelshteyn bo...@popcha.com
  wrote:
 
   Thanks for the detailed response! Here is what I'm trying to do and I'm
   still not sure how to accomplish it:
  
   1. I have a node which is a person
  
   2. I have geo data as that person moves around the world
  
   3. I use the geodata to create a bounding box of where that person has been
   today
  
   4. I want to say, was this person A near location X today?
  
   5. I do this by seeing if location X is in A's bounding box.
  
   From looking at what you suggest doing, it's not clear how I assign the node
   person A to a layer? Is it that the bounding box is now in the layer and not
   in the node? The issue then becomes, how do I associate the two, as the RTree
   relationship seems to establish itself on the bounding box between the node
   and the layer.
  
   Many thanks for your patience as I learn this challenging material

Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-07-02 Thread Craig Taverner
As I understand it, Andreas is working on the much more complex problem of
updating OSM geometries. That is more complex because it involves
restructuring the connected graph.

The case Boris has is much simpler, just modifying the WKT or WKB in the
editable layer. In the Java API you simply call the
GeometryEncoder.encodeGeometry() method, which will modify the geometry in
place (ie. replace the old geometry with a new one). However, I do not think
it is that simple on the REST interface. I can check, but I think we will need
a new method for updating geometries. Internally it is trivial to code.

So I just added a quick method, called updateGeometryFromWKT, which requires
the geometry (in WKT), the existing geometry node-id, and the layer. Give it
a try.

On Sat, Jul 2, 2011 at 5:10 PM, Peter Neubauer neubauer.pe...@gmail.com wrote:

 Actually,
 Andreas Wilhelm is working right now on updating geometries.

 Sent from my phone.
 On Jul 2, 2011 5:00 PM, Boris Kizelshteyn bo...@popcha.com wrote:
  Wow that's great! I'll try it out asap. This leads to my next question: how
  do I update the geometry in a layer, rather than add new? What I am thinking
  of doing is having a multipoint geometry associated with each of my user
  nodes which will represent their location history. My plan is to add the
  geometry to a world layer and then associate the returned node with the
  user. How do I then add new points to that connector node? Can I just edit
  the wkt and assume the index will update? Or do you have a better suggestion
  for doing this? I would rather avoid having each point be a separate node as
  I am tracking gps data and getting lots of coordinates, it would be many
  thousands of nodes per user.
 
  Many thanks!
 
 
 

[Neo4j] Cypher error in neo4j-spatial

2011-07-02 Thread Craig Taverner
Hi,

Recent builds of Neo4j-Spatial no longer like Peter's new bounding box query.
Peter is on vacation, and I am not familiar with the code (nor with Cypher), so I
thought I would just dump the error message here for now in case someone can
give me a quick pointer.

The line of code is:
Query query = parser.parse( "start n=(layer1,'bbox:[15.0, 16.0, 56.0, 57.0]') match (n) -[r] - (x) return n.bbox, r:TYPE, x.layer?, x.bbox?" );

The error is:
org.neo4j.cypher.SyntaxError: string matching regex `\z' expected but `:' found
at org.neo4j.cypher.parser.CypherParser.parse(CypherParser.scala:75)
at org.neo4j.cypher.javacompat.CypherParser.parse(CypherParser.java:39)
at org.neo4j.gis.spatial.IndexProviderTest.testNodeIndex(IndexProviderTest.java:91)

Regards, Craig


Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-07-06 Thread Craig Taverner
Hi Boris,

I can see the new update method here:
https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/server/plugin/SpatialPlugin.java#L138

And the commit for it is here:
https://github.com/neo4j/neo4j-spatial/commit/22eaf91957a6265ef1e6923b5da572b75383b83e

Hope that helps.

Let me know if this works. The REST method is entirely untested, but does
wrap code that is tested, so I'm relatively optimistic :-)

Regards, Craig

On Wed, Jul 6, 2011 at 1:51 AM, Boris Kizelshteyn bo...@popcha.com wrote:

 Hi Craig,

 This is awesome!

 Where is the update method? I can't find the code on github.

 Thanks!


Re: [Neo4j] Neo4j Spatial - Keep OSM imports

2011-07-08 Thread Craig Taverner
Another option is to run the main method of OSMImport class, which expects
command line arguments for database location and OSM file, and will simply
import a file once. This is not tested often, so there is a risk things have
changed, but it is worth a try.

Another, even easier, option in my opinion is the JRuby gem,
neo4j-spatial.rb. See http://rubygems.org/gems/neo4j-spatial

To get this running, just install JRuby from http://jruby.org, and then
install the gem with jruby -S gem install neo4j-spatial and then you will
have new console commands like 'import_layer'. If you run 'import_layer
mydata.osm', it will import it to a new database, which you can use. See the
github page for more information:
https://github.com/craigtaverner/neo4j-spatial.rb

On Thu, Jul 7, 2011 at 10:47 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Robin,

 the database is deleted after each run in Neo4jTestCase.java,

    @Override
    @After
    protected void tearDown() throws Exception {
        shutdownDatabase(true);
        super.tearDown();
    }

 if you change to shutdownDatabase(false), the database will not be
 deleted. In this case, make sure to run just that one test, so that
 several tests do not write to the same DB.

 mvn test -Dtest=TestDynamicLayers

 Does that work for you?


 Cheers,

 /peter neubauer




 On Tue, Jul 5, 2011 at 6:07 PM, Robin Cura robin.c...@gmail.com wrote:
  Hello,
 
  First of all, I don't know anything about Java, and I'm trying to figure out
  if neo4j could be useful for my projects. If it is, I will of course learn a
  bit of Java so that I can use neo4j in a decent way for my needs.
 
  I'd like to use a neo4j spatial database together with GeoServer.
  For this, I'm following the tutorial here:
  http://wiki.neo4j.org/content/Neo4j_Spatial_in_GeoServer
  But this paragraph is blocking me:
 
    - One option for the database location is a database created using the
      unit tests in Neo4j Spatial. The rest of this wiki assumes that you ran
      the TestDynamicLayers unit test which loads an OSM dataset for the city
      of Malmö in Sweden, and then creates a number of Dynamic Layers (or
      views) on this data, which we can publish in GeoServer.
    - If you do use the unit test for the sample database, then the location
      of the database will be in the target/var/neo4j-db directory of the
      Neo4j Source code.
 
  My problem is that I do not succeed in keeping those neo4j spatial databases
  created with the tests: when I run TestDynamicLayers, it builds databases (in
  target/var/neo4j-db), but as soon as a database is successfully loaded, it
  deletes it and starts importing another database, and so on.
 
  My poor understanding of Java doesn't help a lot. I tried to edit the .java
  in Netbeans + Maven, but so far it doesn't work: all the directories
  created during the tests are deleted when the test ends.
 
  Any idea how I could keep those databases?
 
  Thanks,
 
  Robin


Re: [Neo4j] Neo4j Spatial - Keep OSM imports - Use in GeoServer

2011-07-12 Thread Craig Taverner
I am travelling at the moment, so cannot give a long answer, but can suggest
you look at the wiki page for neo4j in uDig, because there we have made some
updates concerning which jars to use, and that will probably help you get
this working.
On Jul 12, 2011 10:59 AM, Robin Cura robin.c...@gmail.com wrote:
 Hi,

 First of all, thanks a lot to both of you for your answers. I was only
 able to try this yesterday, and it saved me a lot of trouble.

 I succeeded in editing the Neo4jTestCase.java file in Netbeans, as you told me.
 I'm having trouble installing the latest JRuby release (needed for
 neo4j-spatial) on my Ubuntu, so I'll do this later, but it's really a good
 thing to know considering how simple it is to use.

 Creating those databases made me realize another problem. In fact, I
 followed the tutorial about using a neo4j db in Geoserver, and it appears that
 my neo4j plugin for Geoserver doesn't work, as I always get this error when
 trying to create a new store linking to my neo4j database.
 My database is a folder named db1 (and db2 for the other one), located
 in my ~/ folder.

 In Geoserver, I create a new store and make it link to
 file:/home/administrateur/db1/neostore.id
 But each time, I get this error:

 Error connecting to Store.

 There was an error trying to connecto to store neo4jstore. Do you want to
 save it anyway?

 Original exception error:

 Could not acquire data access 'neo4jstore'

 I tried with my 2 databases, and got the same problem.
 Those 2 dbs don't seem to be the problem, as I've been able to open/visualise
 them in Gephi (using the neo4j import plugin).

 My guess is that my neo4j-spatial plugin for Geoserver isn't working
 properly.

 The main problem is that neo4j has changed since the tutorial was written.

 In the tutorial, we have to place some files in the geoserver/WEB-INF/lib/
 folder:

 - json-simple-1.1.jar -- No problem, this file is still used
 - geronimo-jta_1.1_spec-1.1.1.jar -- Same, this is still the version
   used in neo4j
 - neo4j-kernel-1.2-1.2.M04.jar -- Replaced this one with my current
   neo4j kernel jar, neo4j-kernel-1.4.jar
 - neo4j-index-1.2-1.2.M04.jar
 - neo4j-spatial.jar -- Replaced this one with the latest build returned
   by using sudo mvn clean package: neo4j-spatial-0.6-SNAPSHOT.jar

 My problem is that there is no neo4j-index file any more in the latest neo4j
 releases. There are some neo4j-lucene-index files, but 1.4 doesn't seem to
 use neo4j-index anymore.
 When I only put neo4j-lucene-index.jar, Geoserver doesn't offer any option
 to create a Store from Neo4j databases.

 So, what I did is I used the neo4j-index-1.3-1.3.M01.jar file from a previous
 release of Neo4j: Geoserver offers to create a Store from a Neo4j db, but
 I get the error message quoted above.

 Any idea how I could make this work? What is the file that replaces
 neo4j-index in Neo4j 1.4?

 I attach one of my databases, archived, so that one of you with a working neo4j
 plugin in Geoserver could test it and confirm the problem isn't with the DB.

 Robin Cura


Re: [Neo4j] How to create a graph database out of a huge dataset?

2011-07-19 Thread Craig Taverner
I'm not sure it's such a good idea to call tx.success() on every iteration
of the loop. I suggest calling it only in the commit, and after the loop (ie.
move it two lines down).

Also I think a commit size of 50k is a little large. You're probably not
going to see much improvement past 10k. In fact I generally only use 1k
myself (but I hear 10k is popular too :-)
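
The batching pattern I mean can be sketched roughly as below. To keep the example self-contained it does not use the Neo4j classes: the nested Tx interface is a hypothetical stand-in that only mirrors the success() and finish() calls from the quoted code, and BatchedImport and importAll are made-up names. The point is simply that success() and finish() are called once per full batch, plus once more after the loop for the remainder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Illustrative stand-ins only; these are not the Neo4j classes.
final class BatchedImport {

    // Mirrors the two Transaction calls used in the quoted code.
    interface Tx {
        void success();
        void finish();
    }

    // Processes every line, committing after each full batch and once more
    // at the end. Returns the number of commits that contained work.
    static int importAll(Iterable<String> lines, int batchSize,
                         Supplier<Tx> beginTx, Consumer<String> handleLine) {
        int commits = 0;
        int pending = 0;
        Tx tx = beginTx.get();
        try {
            for (String line : lines) {
                handleLine.accept(line);
                if (++pending == batchSize) {
                    tx.success();        // mark the batch as successful
                    tx.finish();         // commit it
                    commits++;
                    tx = beginTx.get();  // start the next batch
                    pending = 0;
                }
            }
            tx.success();                // mark the remainder after the loop
            if (pending > 0) {
                commits++;
            }
        } finally {
            tx.finish();                 // commits, or rolls back if success() was not reached
        }
        return commits;
    }

    public static void main(String[] args) {
        Tx noop = new Tx() {
            public void success() { }
            public void finish() { }
        };
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            lines.add("line " + i);
        }
        int commits = importAll(lines, 1000, () -> noop, line -> { });
        System.out.println(commits + " commits");
    }
}
```

With 2500 lines and a batch size of 1000 this commits two full batches in the loop and the remaining 500 lines after it.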

On Sun, Jul 17, 2011 at 8:53 PM, st3ven st3...@web.de wrote:

 Hi,

 thanks for your fast answer.
 Right now I'm using lucene for 6M authors, but my whole dataset consists of
 nearly 25M authors.
 Can I use lucene there as well? I think it is getting really slow to
 check if a user already exists.
 How can I change my heap memory settings and my memory-map settings, given
 that I'm using the transactional mode?
 I think with 25M authors I will get an OutOfMemory exception.

 Here is my code that I have already written so far:

 import java.io.BufferedReader;
 import java.io.FileReader;
 import java.io.IOException;

 import org.neo4j.graphdb.GraphDatabaseService;
 import org.neo4j.graphdb.Node;
 import org.neo4j.graphdb.Relationship;
 import org.neo4j.graphdb.Transaction;
 import org.neo4j.graphdb.index.Index;
 import org.neo4j.graphdb.index.IndexHits;
 import org.neo4j.graphdb.index.IndexManager;
 import org.neo4j.kernel.EmbeddedGraphDatabase;

 public class WikiGraphRegUser {

     /**
      * @param args
      */
     public static void main(String[] args) throws IOException {

         BufferedReader bf = new BufferedReader(new FileReader(
                 "E:/wiki0.csv"));
         WikiGraphRegUser wgru = new WikiGraphRegUser();
         wgru.createGraphDatabase(bf);
     }

     private String articleName = "";
     private GraphDatabaseService db;
     private IndexManager index;
     private Index<Node> authorList;
     private int transactionCounter = 0;
     private Node article;
     private boolean isFirstAuthor = false;
     private Node author;
     private Relationship relationship;
     private int node;

     private void createGraphDatabase(BufferedReader bf) {
         db = new EmbeddedGraphDatabase("target/db");
         index = db.index();
         authorList = index.forNodes("Author");

         String zeile;
         Transaction tx = db.beginTx();

         try {
             // reads lines of CSV-file
             while ((zeile = bf.readLine()) != null) {
                 if (transactionCounter++ % 5 == 0) {

                     tx.success();
                     tx.finish();
                     tx = db.beginTx();
                 }
                 // String[] looks like this: Article%;% Timestamp%;% Author
                 String[] artikelinfo = zeile.split("%;% ");
                 if (artikelinfo.length != 3) {
                     System.out.println("ERROR: check CSV");
                     for (int i = 0; i < artikelinfo.length; i++) {
                         System.out.println(artikelinfo[i]);
                     }
                     return;
                 }

                 if (articleName == "") {
                     // create Article and connect with ReferenceNode
                     article = createArticle(artikelinfo[0],
                             db.getReferenceNode(), MyRelationshipTypes.ARTICLE);
                     articleName = artikelinfo[0];

                     isFirstAuthor = true;

                 } else if (!articleName.equals(artikelinfo[0])) {
                     // create Article and connect with ReferenceNode
                     article = createArticle(artikelinfo[0],
                             db.getReferenceNode(), MyRelationshipTypes.ARTICLE);
                     articleName = artikelinfo[0];
                     isFirstAuthor = true;
                 }
                 // checks if author already exists
                 IndexHits<Node> hits = authorList.get("Author", artikelinfo[2]);
                 // if new author
                 if (hits.size() == 0) {
                     if (isFirstAuthor) {
                         // creates author and connects him with an article
                         author = createAndConnectNode(artikelinfo[2], article,
                                 MyRelationshipTypes.WROTE, artikelinfo[1]);
                         isFirstAuthor = false;
                     } else {

author 

Re: [Neo4j] How often are Spatial snapshots published?

2011-07-22 Thread Craig Taverner
Interesting that if you look at the github 'blame' for that file (see
https://github.com/neo4j/neo4j-spatial/blame/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java),
you find that all the findClosestEdges methods were added in October 2010.
So if Nolan has a version older than that, then something weird is going on.
He must have the very first version from September 2010, which is not
compatible with any recent Neo4j, Geotools or uDig.

When I look at m2.neo4j.org I can see that the latest 0.6-SNAPSHOT is from
May. So we do have a problem, but not one that takes us back to last
September.

Nolan, perhaps your pom.xml refers to an older neo4j-spatial? You should use
0.6-SNAPSHOT. And we will change that again soon (to 0.7) since we are
making changes to the geoprocessing and indexing.

On Fri, Jul 22, 2011 at 10:04 AM, Anders Nawroth
and...@neotechnology.com wrote:

 Hi!

 The deployment seems to be broken at the moment, I'll look into that ASAP.

 /anders

 2011-07-22 09:28, Peter Neubauer wrote:
  Nolan,
  safest is to build it yourself from GitHub, I will check the
  deployment. Is that ok for now?
 
  /peter
 
  On Fri, Jul 22, 2011 at 3:57 AM, Nolan Darilek no...@thewordnerd.info
  wrote:
  I'm looking at the Spatial sources from Git, and am seeing lots of
  versions of SpatialTopologyUtils.findClosestEdges that don't appear to
  be in the snapshot I'm downloading. For instance,
 
   public static ArrayList<PointResult> findClosestEdges(Point point,
   Layer layer) {
 
 
  doesn't appear to be in the snapshot build I have--that or my local
  cache is borken.
 
  Are these snapshots rebuilt regularly?
 
  Thanks.
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 


Re: [Neo4j] Neo4j Spatial and gtype property

2011-07-27 Thread Craig Taverner
Actually we do allow multiple geometry types in the same layer, but some
actions, like export to shapefile, will fail. We even test for this in
TestDynamicLayers.

You can use the gtype if you want, but it is specific to some
GeometryEncoders, and might change in future releases. It would be better to
get the layer's geometry encoder and use that.
On Jul 27, 2011 6:04 PM, Peter Neubauer peter.neuba...@neotechnology.com
wrote:
 Christopher,
 What do you mean by allowing to use? Yes, these properties are used to
 store the Geometry Type for a Layer and for geometry nodes. Sadly, you
 cannot have more than one Geometry in Layers due to the limitations of
 e.g. the GeoTools stack.

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 On Wed, Jul 27, 2011 at 4:07 AM, Christopher Schmidt
 fakod...@googlemail.com wrote:
 Hi all,

 is it allowed to use the gtype-property to get the geometry type numbers?

 (Which are defined in org.neo4j.gis.spatial.Constants)

 --
 Christopher
 twitter: @fakod
 blog: http://blog.fakod.eu


Re: [Neo4j] Neo4j Spatial and gtype property

2011-07-29 Thread Craig Taverner
Yes. If you have performed a search and now have SpatialDatabaseRecord
results, then that is the best method to use.

On Thu, Jul 28, 2011 at 6:03 AM, Christopher Schmidt 
fakod...@googlemail.com wrote:

 So best is to use SpatialDatabaseRecord.getGeometry()?

 Christopher

 --
 Christopher
 twitter: @fakod
 blog: http://blog.fakod.eu


Re: [Neo4j] neo4j spatial and postgis

2011-08-13 Thread Craig Taverner
Or if you want a command line import, try the ruby gem 'neo4j-spatial.rb'.
Once installed you can type:
osm_import file.shp
On Aug 13, 2011 10:33 AM, Andreas Wilhelm a...@kabelbw.de wrote:
 Hi,

 with the pgsql2shp tool you can dump your postgis db in a shapefile and
 you should be able to import it in Neo4j Spatial in the following way:

 String shpPath = SHP_DIR + File.separator + layerName;
 ShapefileImporter importer = new ShapefileImporter(graphDb(), new
 NullListener(), commitInterval);
 importer.importFile(shpPath, layerName);


 Best Regards

 Andreas



 Am 12.08.2011 11:10, schrieb chen zhao:
 Hi,

 I am very interested in neo4j spatial, but I do not know how to import the
 spatial data.

 My data are stored in postgis. I read the documents
 "http://wiki.neo4j.org/content/Spatial_Data_Storage" and
 "http://wiki.neo4j.org/content/Importing_and_Exporting_Spatial_Data", but I
 still do not know how to import data from postgis or import shapefiles.

 Could you provide some detailed information?

 Please advise.

 zhao


Re: [Neo4j] Spatial query with property filter

2011-08-29 Thread Craig Taverner
I can elaborate a little on what Peter says. The DynamicLayer support is
indeed the only way to do what you want right now, but I think it is
actually quite a good fit for your use case. When defining a dynamic layer
you are actually just defining a 'returnable evaluator', which will be
applied to the nodes during the RTree spatial search. This means that the
primary search is spatial, but for each leaf node (geometry) the dynamic
layer query is applied as a filter.
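
As a rough illustration of "spatial search first, property filter per geometry", here is a small sketch in plain Java. It is not the Neo4j-Spatial API: PointRecord is a hypothetical stand-in for a geometry node, a linear scan over a bounding box stands in for the RTree search, and the Predicate plays the role of the dynamic layer's filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class FilteredSpatialSearchSketch {

    /** Hypothetical stand-in for a geometry node: a point with a "name" property. */
    static final class PointRecord {
        final double x;
        final double y;
        final String name;

        PointRecord(double x, double y, String name) {
            this.x = x;
            this.y = y;
            this.name = name;
        }
    }

    /**
     * The spatial test comes first (a plain bounding-box check here, where the
     * real index would walk an RTree); the property filter is applied only to
     * points that pass the spatial test.
     */
    static List<PointRecord> search(List<PointRecord> all,
                                    double minX, double minY, double maxX, double maxY,
                                    Predicate<PointRecord> filter) {
        List<PointRecord> result = new ArrayList<>();
        for (PointRecord p : all) {
            boolean inBox = p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY;
            if (inBox && filter.test(p)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PointRecord> points = new ArrayList<>();
        points.add(new PointRecord(1.0, 1.0, "foo"));
        points.add(new PointRecord(2.0, 2.0, "bar"));
        points.add(new PointRecord(50.0, 50.0, "foo"));
        List<PointRecord> hits = search(points, 0, 0, 10, 10, p -> "foo".equals(p.name));
        System.out.println("matches: " + hits.size());
    }
}
```

The caller never iterates over the unfiltered spatial result; the filter runs inside the search itself.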

If you use CQL for the query, then all geometries are converted into JTS
geometry classes for the filter (which adds a little overhead, so if the
spatial query is not your limiting factor, this can affect performance). If
you use JSON for the query, it is applied directly to the graph as a pattern
match. So JSON should be faster, but does also require that you know the
structure of the graph, which the CQL approach does not.

Peter's pointer to the TestDynamicLayers class is the best place to start for
seeing how to use both CQL and JSON filter syntaxes.

On Mon, Aug 29, 2011 at 11:59 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Hi there,
 well, spatial querying is not something that can be easily stuck into an
 iterator. If you want more than casual querying, I think you need to use
 the GeoTools APIs, we provide support for CQL as a query lang there, see

 https://github.com/neo4j/spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestDynamicLayers.java#L60
 for some examples. Basically, you define a dynamic layer with a CQL query,
 which will return the subset of the full layer (e.g. a SimplePointLayer)
 that matches that query.

 Would that help?

 Cheers,

 /peter neubauer



 On Mon, Aug 29, 2011 at 1:37 AM, faffi obscurredbyclo...@gmail.com
 wrote:

  Hey guys,
 
  I'm seeing some kind of disconnect between the spatial and the regular
  graph
  traversing query. I can't find a way of executing a spatial query like in
  SimplePointLayer but also providing something like a ReturnEvaluator.
 
  My use case is essentially for all nodes within a 10km radius, return all
  with name foo. Do I actually have to iterate through all the nodes
  returned by the query in a list and individually check them?
 
  Thanks,
  faffi
 
  --
  View this message in context:
 
 http://neo4j-community-discussions.438527.n3.nabble.com/Spatial-query-with-property-filter-tp3291410p3291410.html
  Sent from the Neo4j Community Discussions mailing list archive at
  Nabble.com.


Re: [Neo4j] Neo4j low-level data storage

2011-10-07 Thread Craig Taverner
I think Daniel's questions are very relevant, but not just to OSM. Any large
graph (of which OSM is simply a good example) will be affected by
fragmentation, and that can affect performance. I was recently hit by poor
performance of GIS queries (not OSM) caused by fragmentation of the index
tree. I will describe that problem below, but first let me describe my view
on Daniel's question.

It is true that if parts of the graph that are geographically close are also
close on disk the load time for bounding box queries will be faster.
However, this is not a problem that is easy to solve in a generic way,
because it requires knowledge of the domain. I can see two ways to create a
less fragmented graph:

   - Have a de-fragmenting algorithm that re-organizes an existing graph
   according to some rules. This does not exist in neo4j (yet?), but is
   probably easier to generalize, since it should be possible to first analyse
   the connectedness of the graph, and then defragment based on that. This
   means a reasonable solution might be possible without knowing the domain.
   - Be sure to load domain specific data in the order you wish to query it.
   In other words, create a graph that is already de-fragmented.

This second approach is the route I have started following (at least I've
taken one or two tiny baby-steps in that direction, but plan for more). In
the case of the OSM model produced by the OSMImporter in Neo4j-Spatial, we
do not do much here. We are importing the data in the order it was created
in the original postgres database (ie. in the order it was originally added
to open street map). However, since the XML format puts ways after all
nodes, we actually also store all ways after all nodes, which means that to
load any particular way completely from the database requires hitting disk
in at least two very different locations: the location of the way node and
the interconnects between the nodes, and the location(s) of the original
location nodes. This multiple hit will occur on the nodes, relationships and
properties tables in a similar way.

So I can also answer a question Daniel asked about the ids. The Neo4j nodes,
relationships and properties each have their own id space. So you can have
node 1, relationship 1 and property 1.

Let's consider a real example, a street made of 5 points, added early to OSM
(so low ids in both postgres and in neo4j). The OSM file will have these
nodes near the top, but the way that connects them together will be near the
bottom of the file. In Postgres the nodes and ways are in different tables,
and will both be near the top. In neo4j both osm-ways and osm-nodes are
neo4j-nodes (in the same 'table'). The osm-nodes will have low ids, but the
ways will have a high id. Also we use proxy nodes to connect osm-ways to
osm-nodes, and these will be created together with the way. So we will have
5 nodes with low ids, and 8 nodes with high ids (5 proxy nodes, 1 way node,
1 geometry node and 1 tags node). If the way was big and/or edited multiple
times, we could get even higher fragmentation.

Personally I think that fragmenting one geometry into a few specific
locations is not a big problem for the neo4j caches. However, when we are
talking about a result-set or traversal of thousands or hundreds of
thousands of geometries, then doubling or tripling the number of disk hits
due to fragmentation can definitely have a big impact.

How can this fragmentation situation be improved? One idea is to load the
data with two passes. The current loader is trying to optimize OSM import
speed, which is difficult already (and slower than in rdbms due to increased
explicit structure), and so we have a single pass loader, with a lucene
index for reconnecting ways to nodes. However, I think we could change this
to a two pass loader, with the first pass reading and indexing the point
nodes into a unique id-index (for fast postgres id lookup), and the second
pass would connect the ways, and store both the nodes and ways to the
database at the same time, in continuous disk space. This would improve
query performance, and if we make a good unique-id index faster than lucene,
we will actually also improve import speed .. :-)
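
The two-pass idea can be sketched roughly as follows. This is plain Java and not the OSMImporter: the in-memory HashMap stands in for the unique id-index, the returned list of strings stands in for the order in which records would be written to disk, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TwoPassLoaderSketch {

    /** Pass 1: index the point nodes by OSM id without storing them yet. */
    static Map<Long, double[]> indexPoints(long[] osmIds, double[] lats, double[] lons) {
        Map<Long, double[]> index = new HashMap<>();
        for (int i = 0; i < osmIds.length; i++) {
            index.put(osmIds[i], new double[] {lats[i], lons[i]});
        }
        return index;
    }

    /**
     * Pass 2: for each way, write the way and then its points, so that a way
     * and its points end up next to each other in storage order. A point that
     * is shared by several ways is written only once, with the first way.
     */
    static List<String> storageOrder(Map<Long, double[]> pointIndex, Map<String, long[]> ways) {
        List<String> order = new ArrayList<>();
        Set<Long> written = new HashSet<>();
        for (Map.Entry<String, long[]> way : ways.entrySet()) {
            order.add("way:" + way.getKey());
            for (long osmId : way.getValue()) {
                if (!pointIndex.containsKey(osmId)) {
                    continue; // point not present in this file, skip it
                }
                if (written.add(osmId)) {
                    order.add("node:" + osmId);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Long, double[]> index = indexPoints(
                new long[] {1, 2, 3, 50},
                new double[] {55.60, 55.61, 55.62, 55.70},
                new double[] {13.00, 13.01, 13.02, 13.10});
        Map<String, long[]> ways = new LinkedHashMap<>();
        ways.put("A", new long[] {1, 2, 3});
        ways.put("B", new long[] {3, 50});
        System.out.println(storageOrder(index, ways));
    }
}
```

Points that belong to no way are left out of this sketch; a real loader would still have to store them, for example in a final sweep.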

Now all of the above does not answer the original question regarding
bounding box queries. All we will have done with this is improve the load
time for complete OSM geometries (by reducing geometry fragmentation). But
what about the index itself? We are storing the index as part of the graph.
Today, Neo4j-spatial uses an RTree index that is created at the end of the
load in OSMImporter. This means we load the complete OSM file, and then we
index it. This is a good idea because it will store the entire RTree in
contiguous disk space. Sort of: there is one issue with the RTree node
splitting that will cause slight fragmentation, but I think it is not too
serious. Now when performing bounding box queries, the main work done by the
RTree will hit the minimum amount of disk space, until 

Re: [Neo4j] Neo4j in GIS Applications

2011-10-07 Thread Craig Taverner
Hi all,

I am certainly behind on my emails, but I did just answer a related question
about OSM and fragmentation, and I think that might have answered some of
Daniel's questions.

But I can say a little more about OSM and Neo4j here, specifically about the
issue of joins in postgres. Let me start by describing where I think
postgres might be faster than neo4j, and then move onto where neo4j is
faster than postgres.

Importing OSM data into postgres will be faster than neo4j because the
foreign keys are simple integer references between tables and are indexed
using postgres' high-performance indexes. In Neo4j the relationships are much
more detailed, explicit bi-directional references taking more disk space (but
no index space). The disk write time is longer (more data written), but the
advantages of not having an index make it worthwhile.

So that leads naturally to where neo4j is faster. The reason there is no
index on the foreign key is because there is no need for one. Each
relationship contains the id of the node it points to (and points from), and
that id is directly mapped to the location on disk of the node itself. So
this is more like an array lookup, because all nodes are the same size on
disk. So the 'join' you perform when traversing from one osm-node to another
is extremely fast, but more importantly it is not affected by database size.
It is O(1) in performance! Fantastic! In rdbms, the need for an index on the
foreign key means you are building a tree structure to get the join down
from O(N) to O(ln(N)) or something better, but never as good as O(1).
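
The "array lookup" can be shown with a few lines of arithmetic. In this sketch the record size of 9 bytes is only an illustrative assumption, and a byte array stands in for the store file; the point is that the position of a record is computed from its id, with no index involved.

```java
import java.util.Arrays;

final class FixedRecordLookupSketch {

    /** Illustrative record size in bytes; an assumption, not taken from the thread. */
    static final int RECORD_SIZE = 9;

    /** Position of a record in the store: a single multiplication, independent of store size. */
    static long offsetOf(long nodeId) {
        return nodeId * RECORD_SIZE;
    }

    /** Reads the record for nodeId from an in-memory stand-in for the store file. */
    static byte[] readRecord(byte[] store, long nodeId) {
        int from = Math.toIntExact(offsetOf(nodeId));
        return Arrays.copyOfRange(store, from, from + RECORD_SIZE);
    }

    public static void main(String[] args) {
        // Build a tiny store of 3 records, each filled with its own id.
        byte[] store = new byte[3 * RECORD_SIZE];
        for (int id = 0; id < 3; id++) {
            Arrays.fill(store, id * RECORD_SIZE, (id + 1) * RECORD_SIZE, (byte) id);
        }
        System.out.println("offset of node 2: " + offsetOf(2));
        System.out.println("first byte of node 2: " + readRecord(store, 2)[0]);
    }
}
```

A foreign-key join in an rdbms has to search an index to find the target row; here the id itself is the address.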

In neo4j-spatial, if you perform a bounding box query, you are traversing an
RTree, which does not exist in postgres, but does exist in PostGIS. In both
Neo4j-Spatial and PostGIS you are working with a tree index that will slow
things down if there is a lot of data, and currently the postgis rtree is
better optimized than the neo4j-spatial rtree. But if you are performing
more graph-like processing, for example proximity searches, or routing
analysis, then you will get the full O(1) benefits of the graph database,
and no way can postgres match that :-)

OK. Lots of hype, but I get enthusiastic sometimes. Take anything I say with
a pinch of salt. Believe the parts that make sense to you, and try some tests
otherwise. It would be great to hear your experiences with modeling OSM in
neo4j versus postgres.

Regards, Craig

On Tue, Oct 4, 2011 at 7:18 PM, Andreas Kollegger 
andreas.kolleg...@neotechnology.com wrote:

 Hi Daniel,

 If you haven't yet, you should check out the work done in the Neo4j Spatial
 project - https://github.com/neo4j/spatial - which has fairly
 comprehensive
 support for GIS.

 Data locality, as you mention, is exactly a big advantage of using a graph
 for geospatial data. Take a look at the Neo4j Spatial project and let us
 know what you think.

 Best,
 Andreas

 On Tue, Oct 4, 2011 at 9:58 AM, danielb danielbercht...@gmail.com wrote:

  Hello everyone,
 
  I am going to write my master thesis about the suitability of graph
  databases in GIS applications (at least I hope so^^). The database has to
  provide topological queries, network analysis and the ability to store
  large amounts of map data for viewing - all based on OSM-data of Germany
  ( 100M nodes). Most likely I will compare Neo4j to PostGIS.
  As a starting point I want to know why you would recommend Neo4j to do the
  job? What are the main advantages of a graph database compared to a
  (object-)relational database in the GIS environment? The main focus and the
  goal of this work should be to show a performance improvement over
  relational databases.
  In a student project (OSM navigation system) we worked with relational
  (SQLite) and object-oriented (Perst) databases on netbook hardware and
  embedded systems. The relational database approach showed us two problems:
  If you transfer the OSM model directly into tables then you have a lot of
  joins which slow everything down (and lots of redundancy when using
  different tables for each zoom level). The other way is to store as much as
  possible in one big (sparse) table. But this would also have some
  performance issues I guess and from a design perspective it is not a nice
  solution. The object-oriented database also suffered from many random reads
  when loading a bounding box. In addition we could not say how data was
  stored in detail.
  The performance indeed increased after caching occurred or by the use of SSD
  hardware. You can also store everything in RAM (money does the job), but
  for now you have to assume that all of the data has to be read from a slow
  disk the first time. Can Neo4j be configured to read for example a bounding
  box of OSM data from disk in an efficient way (data locality)?
  Maybe you also have some suggestions where I should have a look at in this
  work and what can be improved in Neo4j to get better results. I also would
  appreciate related papers.
 
  

Re: [Neo4j] Problem Installing Spatial (Beginner)

2011-10-07 Thread Craig Taverner
Sorry for such a late response, I missed this mail.

I must first point out that it seems you are trying to use Neo4j-Spatial in
the standalone server version of Neo4j. That is possible, but not well
supported. We have only exposed a few of the functions in the server, and do
not test it regularly.

The main way we are using neo4j-spatial at the moment is in the embedded
version of neo4j. This is where the maven instructions come in because they
assume you are writing a Java application that will embed the database. If
you are using a java application, and you can start using maven, then
everything should be easy to get working.

However, since I am relatively sure you are using neo4j-server, I think you
are getting into deep water. We need to improve our support for neo4j server
more before I can recommend you try it. The next release, 0.7, is focusing
on geoprocessing features, and then we hope to expose this in neo4j-server
in 0.8. Hopefully then things will be much easier for you.

On Tue, Sep 27, 2011 at 5:24 PM, handloomweaver a...@atomised.coop wrote:

 Hi

 I wonder if someone would be so kind to help. I'm new to Neo4j and was
 trying to install Neo4jSpatial to try its GIS features out. I need to be
 clear that I have no experience of Java & Maven so I'm struggling a bit.

 I want to install Neo4j & Spatial once somewhere on my 4GB MacBook Pro. I
 have no problem downloading the Neo4j Java Binary and starting it. But I'm
 confused about the Spatial library. Looking at the Github page it says
 either use Maven or copy a zip file into a folder in Neo4j. Is the zip file
 the Github repository contents or something else?

 I've tried the Maven way (mvn install) described on GitHub but I'm firstly
 confused about if/where Neo4j is being installed (does it install it first,
 where?) and anyway the install fails. It seems to be a JVM Heap memory
 problem? Why is it failing. How can I make it not fail. Is it a config file
 somewhere needing tweaked?

 http://handloomweaver.s3.amazonaws.com/Terminal_Output.txt
 http://handloomweaver.s3.amazonaws.com/surefire-reports.zip

 I'm really keen to use Neo4J spatial but the barrier to entry for the less
 technical GIS developer is proving too high for me!

 I'd SO appreciate some help/pointers. I apologise that I am posting such a
 NOOB question on your forum but I've exhausted Google searches.

 Thanks





 --
 View this message in context:
 http://neo4j-community-discussions.438527.n3.nabble.com/Problem-Installing-Spatial-Beginner-tp3372924p3372924.html
 Sent from the Neo4j Community Discussions mailing list archive at
 Nabble.com.


Re: [Neo4j] osm_import.rb

2011-11-11 Thread Craig Taverner
Hi,

Sorry for a late contribution to this discussion. I will try to make a few
comments to cover the various mails above.

Firstly, the neo4j-spatial.rb GEM at version 0.0.8 on RubyGems works with
Neo4j-Spatial 0.6, which does include the non-batch inserter code, so in
principle should work for you. However, there is a need to change one line
of code in the Ruby to make it use the normal graph API instead of the
batch inserter. I will commit this change later, but for now you would
change line 118 of osm.rb (see
https://github.com/craigtaverner/neo4j-spatial.rb/blob/master/lib/neo4j/spatial/osm.rb#L118),
to instead look like:
#@importer.import_file batch_inserter, @osm_path
@importer.import_file normal_database, @osm_path, false, 5000
(basically you replace 'batch_inserter' with 'normal_database' and add the
two extra parameters 'false, 5000').

Looking at the errors you are getting, I see they are, as you suspected,
related to out of date instructions. I will try get round to updating the
instructions soon, but in the meantime:

   - For using the Ruby Gem, you should use the osm_import command (added
   automatically to your path when you install the gem). So you can replace
   the command 'jruby -S examples/osm_import.rb' with just 'osm_import'.
   - When using the code directly from github, there is a jar missing in
   the lib/neo4j/spatial/jars directory. This is the
   neo4j-spatial-0.6-SNAPSHOT.jar, which can be downloaded and copied into
   that directory manually. The direct link to this file on the m2.neo4j.org
   site is
   http://m2.neo4j.org/org/neo4j/neo4j-spatial/0.6-SNAPSHOT/neo4j-spatial-0.6-SNAPSHOT.jar

Your last comment about 'includePoints' is just a setting for whether or
not to index all OSM points as individual geometries. The default is
false because you normally do not want to be able to search for all points
on a long road, but for the road itself. I recommend leaving this as false,
unless you have a specific need.

Regards, Craig

On Thu, Nov 10, 2011 at 2:51 PM, grimace macegh...@gmail.com wrote:

 I ended up trying again with just java (but still running with
 batchInserter), adjusting my memory settings and max heap, it's currently
 working on the americas.osm file from cloudmade -
 http://downloads.cloudmade.com/americas#downloads_breadcrumbs. The file is
 about 99 GB when assembled.

 I'm running on ubuntu 11.10 Core 2 Duo 2.Ghz with 4G ram (not very fast,
 but what I have available right now),

 Java Heap -- -Xmx=3072M
 config settings:
 neostore.nodestore.db.mapped_memory=1000M
 neostore.relationshipstore.db.mapped_memory=300M
 neostore.propertystore.db.mapped_memory=400M
 neostore.propertystore.db.strings.mapped_memory=800M
 neostore.propertystore.db.arrays.mapped_memory=100M

 My code is essentially from the test suite that you suggested but I am
 using the batchImporter instead.  I'm about 1/3 of the way through and don't
 want to interrupt the process, but when it's done I'll try it without the
 batch importer.  It runs at about 4500 nodes/second.  Is that reasonable? I
 haven't looked at performance numbers from anyone else. Would the non batch
 performance be better?

 Is it better to 'includePoints' or not?

 One question I had was, once I get this imported via this method ( neo4j
 embedded ), is it possible to move the imported db to a neo4j server?  I'm
 hoping it is. If so, what would that process be?



 --
 View this message in context:
 http://neo4j-community-discussions.438527.n3.nabble.com/osm-import-rb-tp3493463p3496760.html
 Sent from the Neo4j Community Discussions mailing list archive at
 Nabble.com.


Re: [Neo4j] OSMImporter: Is there a way to do incremental imports?

2011-11-22 Thread Craig Taverner
I did some initial work on incremental imports back in 2010, but stopped
due to some complications:

   - We needed to mix lucene reads and writes during the import (read to
   check if the node already exists, so we don't import twice) and this
   performs very badly in the batch inserter. We decided to first code a
   non-batch insert mode before re-starting the incremental import work. Now
   Peter and I did code a non-batch importer in early 2011, but never went
   back to complete the incremental import.
   - We wanted to support both the case of importing multiple OSM files
   that could be stitched together by resolving overlaps, as well as the case
   of applying changesets to the existing OSM model. This increased the
   complexity of the work just enough to ensure it got dropped. In early 2011
   we also added support for changesets in the model (but only as a data
   structure, not in terms of importing changesets). So we are one step closer
   to this also.

Since we now have non-batch importing, and changeset data structures, the
opportunity to re-start the incremental import and importing changesets is
there. It should not be too hard.

For incremental imports, stitching osm files together, we re-activate the
old code that tests the lucene index before adding nodes and relations.
There might be some subtle edge cases to consider, but a set of tests with
overlapping and non-overlapping osm files should flush them out.
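
The check-before-add step for stitching files together could look roughly like this. It is a sketch in plain Java, not the old OSMImporter code: a HashSet stands in for the lucene index keyed by OSM id, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IncrementalImportSketch {

    /** Stand-in for the lucene index keyed by OSM id (here a plain in-memory set). */
    private final Set<Long> seenOsmIds = new HashSet<>();

    /** Stand-in for the database: the OSM ids that were actually stored, in order. */
    private final List<Long> stored = new ArrayList<>();

    /** Imports the node ids of one file and returns how many of them were new. */
    int importFile(long[] osmNodeIds) {
        int added = 0;
        for (long osmId : osmNodeIds) {
            // Read from the index first; only write when the node is not there yet.
            if (seenOsmIds.add(osmId)) {
                stored.add(osmId);
                added++;
            }
        }
        return added;
    }

    List<Long> stored() {
        return stored;
    }

    public static void main(String[] args) {
        IncrementalImportSketch importer = new IncrementalImportSketch();
        int first = importer.importFile(new long[] {1, 2, 3});
        int second = importer.importFile(new long[] {3, 4}); // overlaps the first file on node 3
        System.out.println("new nodes per file: " + first + ", " + second);
        System.out.println("stored: " + importer.stored());
    }
}
```

The mixed read/write access to the index in the inner loop is exactly what performed badly in the batch inserter, which is why this only becomes practical with the non-batch importer.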

For applying changesets, more thinking is still required. Do we want to
support history in the model, or only the latest version? Should we verify
that only newer changesets are applied and in the right order, or rely on
the user to get it right?

I can say that we did some thinking this summer on the data structures
required to support a complete change history. This relies on the fact that
we already support multiple possible ways on the same nodes, so we can
also, in principle, support multiple possible 'versions' of ways on the
same nodes. More thinking is required, but I have a suspicion that we
should actually go ahead and do this properly with full history, because
that might be the only way to make sure the user never messes things up by
importing in the wrong order.

On Tue, Nov 22, 2011 at 9:58 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Gregory,
 incremental loads (and thus, restarts of OSM imports) are a feature we
 want to add later on, but it's not in there yet. This would also mean
 we could stitch in other areas on demand, and support submitting
 changesets back to OSM or at least capture them, so you as an OSM
 based app can contribute to OSM automagically.

 I know it's much to ask, but help here would be greatly appreciated. I
 hope to lab with Michael Hunger on import of data into OSM (and
 others) this Friday and hope to get somewhere :)

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org  - NOSQL for the Enterprise.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.



 On Tue, Nov 22, 2011 at 7:15 AM, grimace macegh...@gmail.com wrote:
  I've been playing with OSMImporter; tried batch and native java.  I've
 had
  mixed success trying to import the planet, but since it's of considerable
  size, the job usually blows up or grinds to a halt about half way.  I
 think
  the most I've made it to is 651M nodes and that's not even the ways or
  relations.   I just don't know enough about it and thought I would ask
  before I try to dive in to it, but what would I have to do to so that I
  could restart the job ( where it left off ) when it blows?
 
  --
  View this message in context:
 http://neo4j-community-discussions.438527.n3.nabble.com/OSMImporter-Is-there-a-way-to-do-incremental-imports-tp3526941p3526941.html
  Sent from the Neo4j Community Discussions mailing list archive at
 Nabble.com.
 



Re: [Neo4j] possibility to merge some neo4j databases?

2011-11-29 Thread Craig Taverner
There are two approaches I can think of:
- use a better index for mapping ids. Lucene is too slow. Memory hashtables
are memory bound. Peter has been investigating alternative dbs like bdb. I
tried, but did not finish, a hashmap of cached arrays, and Chris wrote his
big data import project on github, which is a hashmap of cached hashmaps.
Many promising solutions, but none yet complete. All target the general
case of id mapping.
- for this specific case, merging small databases, I had an idea a couple
of years ago which I still think will work. Bulk appending entire
databases, by offsetting all internal ids by the current max id. I remember
the reason Johan did not like this idea was that it suffered from the same
flaws as the batch inserter, locking the entire db, no rollback and risk of
entire db corruption. For people happy with the batch inserter, perhaps
this is still an option, but unlikely to get prioritized by the neo team
because of the corruption risks. It would, however, perform spectacularly
well since the id map is a trivial function.
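
The id-offset idea in the second point can be sketched as follows. This is
plain Python over dicts, purely to show the arithmetic; it does not touch
Neo4j store files, and `append_database`, `db_a` and `db_b` are made-up
names for the example.

```python
def append_database(target, source):
    """Append `source` to `target` in place, shifting every source id.

    Each database is a dict with "nodes" (id -> properties) and
    "rels" (list of (start_id, end_id, type)).
    """
    # The id map is just "old id + offset", so no lookup table is needed.
    offset = max(target["nodes"], default=-1) + 1
    for node_id, props in source["nodes"].items():
        target["nodes"][node_id + offset] = props
    for start, end, rel_type in source["rels"]:
        target["rels"].append((start + offset, end + offset, rel_type))
    return offset


db_a = {"nodes": {0: {"name": "a0"}, 1: {"name": "a1"}}, "rels": [(0, 1, "KNOWS")]}
db_b = {"nodes": {0: {"name": "b0"}, 1: {"name": "b1"}}, "rels": [(1, 0, "KNOWS")]}

offset = append_database(db_a, db_b)
```

Note that this only works when the small databases have no relationships
between each other, since each one is appended as a closed unit.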

Personally I hope someone completes Chris's persistent hashmap or a similar
solution. Id maps are a recurring theme and would be very valuable.
On Nov 29, 2011 12:07 PM, osallou olivier.sal...@gmail.com wrote:

 Hi,
 I need to batch insert millions of data in neo4j.
 It is quite difficult to keep all in a Map to get node ids, so it needs
 frequent lookups in index to get some node ids for relationships, and
 result is quite low.

 Is there any way to build several neo4j databases (independently) then
 to merge them? (I could build many small db in parallel)

 Thanks

 Olivier

 --
 View this message in context:
 http://neo4j-community-discussions.438527.n3.nabble.com/possibility-to-merge-some-neo4j-databases-tp3544694p3544694.html
 Sent from the Neo4j Community Discussions mailing list archive at
 Nabble.com.



Re: [Neo4j] Contributors section in the manual

2011-11-29 Thread Craig Taverner
What is the sort order? Date of first commit, number of lines, commits,
packages?
On Nov 21, 2011 2:35 PM, Peter Neubauer peter.neuba...@neotechnology.com
wrote:

 Everyone,
 have started to put in some random people in, see
 http://docs.neo4j.org/chunked/snapshot/contributors.html .

 Any ideas what more info to provide here, or how to make this nicer?

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org  - NOSQL for the Enterprise.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.



 On Sun, Nov 13, 2011 at 10:42 AM, Peter Neubauer
 peter.neuba...@neotechnology.com wrote:
  To start with,
  The manual is for the direct codebase that is part of the distribution.
 The
  next step is to include sections and pointers to other stable related
  projects and drivers.
 
  Does that sound reasonable?
 
  On Nov 13, 2011 1:36 AM, Nigel Small ni...@nigelsmall.name wrote:
 
  Are you looking for info on associated projects like py2neo or direct
  contributions to the main code base?
 
  On a side note, I've been getting quite a few hits to my blog post on
  pagination in Neo4j. The bits I wrote for that are all Python/py2neo
 again
  but that or something similar might be worth including somewhere on the
  Neo
  site as it appears to be a reasonably sought-after topic.
 
  Cheers
 
  *Nigel Small*
  Phone: +44 7814 638 246
  Blog: http://nigelsmall.name/
  GTalk: ni...@nigelsmall.name
  MSN: nasm...@live.co.uk
  Skype: technige
  Twitter: @technige https://twitter.com/#!/technige
  LinkedIn: http://uk.linkedin.com/in/nigelsmall
 
 
 
  On 12 November 2011 20:40, Peter Neubauer
  peter.neuba...@neotechnology.com wrote:
 
   Hi guys,
   I would love to add a section on contributors to the Neo4j Manual, in
   http://docs.neo4j.org/chunked/snapshot/community.html so that all of
   you that participate in the process can be found in there.
  
   Do you have any suggestions on how to present this, that is - what
   info, links and maybe a short presentation snippets and pictures?
   Graph to components or simply a table?
  
   Thoughts?
  
   Cheers,
  
   /peter neubauer
  
   GTalk:  neubauer.peter
   Skype   peter.neubauer
   Phone   +46 704 106975
   LinkedIn   http://www.linkedin.com/in/neubauer
   Twitter  http://twitter.com/peterneubauer
  
   http://www.neo4j.org  - NOSQL for the Enterprise.
   http://startupbootcamp.org/- Öresund - Innovation happens HERE.
  
 



Re: [Neo4j] OSMImporter: Is there a way to do incremental imports?

2011-12-06 Thread Craig Taverner
There was only a method ending in 'WithCheck', or something like that,
lying unused in the code from last year. Nothing more than that. Except for
thinking about it, which is why I wrote the previous mail.
On Dec 2, 2011 12:50 PM, Peter Neubauer pe...@neubauer.se wrote:

 Not sure,
 Craig, do you have the code somewhere?

 /peter

 On Tue, Nov 22, 2011 at 4:17 PM, grimace macegh...@gmail.com wrote:
  thanks for the response(s)!  The hardware I'm testing on is not the best
 and
  only 4G of ram so I'm limited, but this seems the best opportunity for
 me to
  learn this...that being said...
 
  For incremental imports, stitching osm files together, we re-activate
 the
  old code that tests the lucene index before adding nodes and relations.
  There might be some subtle edge cases to consider, but a set of tests
  with
  overlapping and non-overlapping osm files should flush them out.
 
  I'd love to play with this. Is the old code there for me to re-enable in
  testing? Or can you point me to where this might be put in?
 
  Thx,
  Greg
 
  --
  View this message in context:
 http://neo4j-community-discussions.438527.n3.nabble.com/OSMImporter-Is-there-a-way-to-do-incremental-imports-tp3526941p3527995.html
  Sent from the Neo4j Community Discussions mailing list archive at
 Nabble.com.



Re: [Neo4j] Are graphs ok for lots of Event data

2011-12-07 Thread Craig Taverner
Of course the graph can be used for processing event data, but whether
it works for your case depends on the details. We have used it for this,
and I can discuss a few points.

The event stream is obviously just a linear chain and can be modeled
as such in the graph (eg. with NEXT relationships between event
nodes). However this does not bring much advantage over the original
flat file which already has implicit next (next line, assuming time
ordered). You could instead use a TimeLineIndex to manage the order,
and then you would have an advantage over disordered original data.
Durations between events can be new nodes with START and END
relationships to the individual events, and the time difference
optionally added as a property to the duration node.
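
As a rough sketch of that structure, in plain Python rather than the Neo4j
API: the sort below stands in for the timeline index, and the tuples in
`relationships` and the dicts in `durations` are just an assumed way of
writing down the nodes and relationships.

```python
events = [
    {"id": "e2", "time": 130.0},
    {"id": "e1", "time": 100.0},
    {"id": "e3", "time": 175.0},
]

# Sorting by timestamp plays the role of the timeline index for unordered input.
ordered = sorted(events, key=lambda e: e["time"])

relationships = []   # (from_id, type, to_id)
durations = []       # one duration node per consecutive pair of events

for prev, curr in zip(ordered, ordered[1:]):
    relationships.append((prev["id"], "NEXT", curr["id"]))
    duration_id = f"d_{prev['id']}_{curr['id']}"
    # The time difference is stored as a property on the duration node.
    durations.append({"id": duration_id, "seconds": curr["time"] - prev["time"]})
    relationships.append((duration_id, "START", prev["id"]))
    relationships.append((duration_id, "END", curr["id"]))
```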

One nice thing about the graph is that you can keep adding data and
structure as you go, sometimes much later. So the extra data you asked
about (server, number of items processed, etc.) can be added later,
at your convenience.

When grouping events together and getting statistics, some things can
be added incrementally, like max/min/count/total. But percentile is
not so trivial. Consider the case where you want to know the
statistics for each hour of events. If you have an hour node connected
to all event nodes in that hour, you can update the
max/min/count/total values as new event data enters the database. But
percentile needs to be calculated once all events in the hour have
arrived. This can be handled at the application level.
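
A minimal sketch of the difference, in plain Python. `HourStats` is a
hypothetical stand-in for the properties on an hour node, and the
nearest-rank percentile is just one possible definition, chosen here for
simplicity.

```python
import math

class HourStats:
    """Running aggregates for one hour node, plus raw values for percentiles."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = math.inf
        self.maximum = -math.inf
        self.values = []          # kept only so a percentile can be computed later

    def add(self, value):
        # These four aggregates can be updated as each event arrives.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.values.append(value)

    def percentile(self, p):
        # Nearest-rank percentile; needs the full set of values for the hour.
        ranked = sorted(self.values)
        rank = max(1, math.ceil(p / 100 * len(ranked)))
        return ranked[rank - 1]


hour = HourStats()
for latency in [12.0, 7.0, 30.0, 18.0, 9.0]:
    hour.add(latency)
```

The first four values need constant storage per hour, while the percentile
needs either all the values or a traversal of the events connected to the
hour node.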


Re: [Neo4j] path finding using OSM ways

2011-12-07 Thread Craig Taverner
We do indeed have twice the node count (and twice the relationship
count). This is a necessary side effect of the fact that an OSM node
can participate in more than one way (at intersections as well as
shared edges of polygons, etc.). In addition, with shared edges the
direction can be reversed from one way to the other, so we need a
completely separate set of nodes and relationships to model one way
versus the other. We have considered a compacted version of the graph
where we only use the extra nodes and relationships when they are
needed, but the code to decide when they are needed or to convert the
subgraph to the expanded version when needed (ie. when a new joined
way is loaded) would be much more complex, and therefore susceptible to
bugs. We choose a cleaner, simpler code base over a more complex, but
more compact graph.

Now we also want to model historical changes. It appears that the use
of multiple nodes/relationships will also allow us to model this, so
it is a good thing (tm) :-)

For routing, I would create a set of relationships connecting directly
all nodes that are intersection points, and ignoring all the nodes
along the way. We can add edge weights to these new relationships for
the distance traveled, or other appropriate weighting factors (type of
road, possible speed, hindrances, etc.). This graph would be ideal
for routing calculations. The main OSM graph is not ideal for routing,
but is designed to be a true and accurate reflection of the original
OSM data and topology stored in the open street map database. With
Neo4j we can do both :-)
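
Building such a routing graph could look roughly like this. It is a sketch
in plain Python, not OSMImporter code, and it makes some assumptions:
ways are given as lists of node ids, a node counts as an intersection if it
is used by more than one way or is the end of a way, weights are haversine
lengths in metres, and one-way streets are ignored. `haversine_m` and
`build_routing_graph` are made-up names.

```python
import math
from collections import Counter, defaultdict

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def build_routing_graph(ways, coords):
    """Connect intersection nodes directly, weighting edges by street length."""
    usage = Counter(n for way in ways for n in set(way))

    def is_waypoint(way, i):
        # Shared by several ways, or the first/last node of this way.
        return usage[way[i]] > 1 or i in (0, len(way) - 1)

    graph = defaultdict(dict)
    for way in ways:
        start, length = way[0], 0.0
        for i in range(1, len(way)):
            length += haversine_m(coords[way[i - 1]], coords[way[i]])
            if is_waypoint(way, i):
                # Keep the shortest edge if two streets join the same pair.
                best = min(length, graph[start].get(way[i], math.inf))
                graph[start][way[i]] = graph[way[i]][start] = best
                start, length = way[i], 0.0
    return graph


coords = {1: (0.0, 0.0), 6: (0.0, 0.0005), 2: (0.0, 0.001),
          3: (0.0, 0.002), 4: (0.001, 0.001), 5: (-0.001, 0.001)}
ways = [[1, 6, 2, 3],    # street A, node 6 is only a shape point
        [4, 2, 5]]       # street B, crosses street A at node 2
graph = build_routing_graph(ways, coords)
```

In this example node 6 does not appear in the routing graph, and the edge
between nodes 1 and 2 carries the full length of the street between them.
A shortest path algorithm would then only need to expand the intersection
nodes.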

These routing relationships have not been added to the current OSM
model in neo4j-spatial, but would be relatively trivial to add (if we
ignore advanced concepts like turning restrictions). They could be
added by the OSMImporter code that identifies intersections, with only
a few lines of extra code (I think ;-)

On 12/6/11, danielb danielbercht...@gmail.com wrote:

 craig.taverner wrote

 ...
 - Create a way-point node for these
 ...


 Hi together,

 I wonder why to add extra nodes to the graph (if I understand Craig
 correctly)? Wouldn't you then end up in expanding twice the node count
 (way-point nodes and OSM nodes themself, because you have to query the OSM
 id (or any other identification value of the end node) in every expand and
 lat / lon if you don't have precompiled edge weights)? I would just connect
 the OSM nodes directly with new edges to form a routing subgraph.

 Best Regards,
 Daniel


 --
 View this message in context:
 http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-path-finding-using-OSM-ways-tp3004328p3564688.html
 Sent from the Neo4j Community Discussions mailing list archive at
 Nabble.com.



Re: [Neo4j] Feedback requested: Major wish list item for Neo4J

2011-12-07 Thread Craig Taverner
I definitely second this suggestion. We have recently been working on a
binary store for dense data we would like to access as if they were
properties of nodes. Right now we have properties that are references to
files on disk, and then handle the binary ourselves, but this does not
benefit from any transactional advantages. Rick's suggestion of a pluggable
store would suit us very well, because I presume Neo4j would specify the
interface/api to use to implement such a store in a way that could be
handled atomically within transactions, and then we could satisfy that with
our own store.

On Wed, Dec 7, 2011 at 3:43 PM, Rick Bullotta
rick.bullo...@thingworx.com wrote:

 One area where I would love to see the Neo4J team focus some energy is in
 the efficient storage and retrieval of blob/large text properties.  Similar
 to the indexing strategy in Neo4J, it would be nice if this was pluggable
 (and it could depend on some other data store more optimized for blob/clob
 properties).

 The keys for this to be successful are:

 - Transacted
 - Does not store these properties in memory except when accessed (and
 then, perhaps offer a getPropertyAsStream method and a
 setPropertyFromStream method for optimal performance)
 - Transparent - should just work

 Nice to haves, but not at all required in the first iteration:

 - Pluggable (store in Neo4J native, filesystem, EC2 simple storage, etc.)

 Addition of these capabilities would move Neo4J into a dramatically
 expanded realm of potential applications, some of which are quite mind
 blowing, both in the social realm and in the enterprise realm.

 Feedback welcomed!


