Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-07-06 Thread Craig Taverner
Hi Boris,

I can see the new update method here:
https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/server/plugin/SpatialPlugin.java#L138

And the commit for it is here:
https://github.com/neo4j/neo4j-spatial/commit/22eaf91957a6265ef1e6923b5da572b75383b83e

Hope that helps.

Let me know if this works. The REST method is entirely untested, but does
wrap code that is tested, so I'm relatively optimistic :-)
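
For anyone trying this from a client, a call to the new method might be put together roughly as below. This is only a sketch under assumptions: the host and port, the /db/data/ext/SpatialPlugin/graphdb path and the parameter names "geometry", "geometryNodeId" and "layer" are guesses based on the description in this thread (geometry in WKT, existing geometry node-id, layer) and should be checked against the plugin source. Note also that a later message in this thread reports the method name was misspelled in the source at the time. The code only builds the request; it does not send anything.

```python
import json

# Assumed base URL for the spatial server plugin; not confirmed by the thread.
BASE = "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb"


def build_update_request(wkt, geometry_node_id, layer):
    """Return (url, json_body) for a hypothetical updateGeometryFromWKT call."""
    url = BASE + "/updateGeometryFromWKT"
    # Parameter names are illustrative assumptions.
    body = json.dumps(
        {"geometry": wkt, "geometryNodeId": geometry_node_id, "layer": layer},
        sort_keys=True,
    )
    return url, body


url, body = build_update_request("MULTIPOINT ((10 40), (40 30))", 42, "world")
print(url)
print(body)
```

The returned URL and body could then be POSTed with any HTTP client, using a JSON content type.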

Regards, Craig

On Wed, Jul 6, 2011 at 1:51 AM, Boris Kizelshteyn bo...@popcha.com wrote:

 Hi Craig,

 This is awesome!

 Where is the update method? I can't find the code on github.

 Thanks!

 On Sat, Jul 2, 2011 at 6:00 PM, Craig Taverner cr...@amanzi.com wrote:

  As I understand it, Andreas is working on the much more complex problem of
  updating OSM geometries. That is more complex because it involves
  restructuring the connected graph.

  The case Boris has is much simpler: just modifying the WKT or WKB in the
  editable layer. In the Java API this simply means calling the
  GeometryEncoder.encodeGeometry() method, which will modify the geometry in
  place (i.e. replace the old geometry with a new one). However, I do not
  think it is that simple on the REST interface. I can check, but I think we
  will need a new method for updating geometries. Internally it is trivial to
  code.

  So I just added a quick method, called updateGeometryFromWKT, which requires
  the geometry (in WKT), the existing geometry node-id, and the layer. Give it
  a try.
 
  On Sat, Jul 2, 2011 at 5:10 PM, Peter Neubauer neubauer.pe...@gmail.com
  wrote:
 
   Actually,
   Andreas Wilhelm is working right now on updating geometries.
  
   Sent from my phone.
   On Jul 2, 2011 5:00 PM, Boris Kizelshteyn bo...@popcha.com wrote:
    Wow that's great! I'll try it out asap. This leads to my next question:
    how do I update the geometry in a layer, rather than add a new one? What
    I am thinking of doing is having a multipoint geometry associated with
    each of my user nodes, which will represent their location history. My
    plan is to add the geometry to a world layer and then associate the
    returned node with the user. How do I then add new points to that
    connector node? Can I just edit the WKT and assume the index will update?
    Or do you have a better suggestion for doing this? I would rather avoid
    having each point be a separate node, as I am tracking GPS data and
    getting lots of coordinates; it would be many thousands of nodes per user.
   
Many thanks!
   
   
   
    On Sat, Jul 2, 2011 at 6:48 AM, Craig Taverner cr...@amanzi.com wrote:

     Hi Boris,

     Ah! You are using the REST API. That changes a lot, since Neo4j Spatial
     was only recently exposed in REST and we do not expose most of the
     capabilities I have discussed in this thread, or indeed in my other
     answer today.

     I did recently add some REST methods that might work for you,
     specifically addEditableLayer, which makes a WKB layer, and
     addGeometryWKTToLayer, for adding any kind of Geometry (including
     LineString) to the layer. However, these were only added recently, and I
     have no experience using them myself, so consider this very much
     prototype code. From your other question today, can I assume you are
     having trouble making sense of the data coming back? So we need a better
     way to return the results in WKT instead of WKB? One option would be to
     enhance the addEditableLayer method to allow the creation of WKT layers
     instead of WKB layers, so the internal representation is more internet
     friendly.

     I've just added support for setting the format to WKT for the internal
     representation of the editable layer in the REST interface. This is
     untested (outside of my usual unit tests, that is), and is only in the
     trunk of neo4j-spatial, but you are welcome to try it out and see what
     happens.

     Regards, Craig
   
     On Fri, Jul 1, 2011 at 5:29 PM, Boris Kizelshteyn bo...@popcha.com wrote:

      Hi Craig,

      Thanks so much for this reply. It is very insightful. Is it possible for
      me to implement the LineString geometries and lookups using REST?

      Many thanks!

      On Wed, Jun 8, 2011 at 4:58 PM, Craig Taverner cr...@amanzi.com wrote:

       OK. I understand much better what you want now.

       Your person nodes are not geographic objects; they are persons that
       can be at many positions and indeed move around. However, the 'path'
       that they take is a geographic object and can be placed on the map and
       analysed geographically.

       So the question I have is how do you store the path the person takes?
       Is this a bunch of position nodes connected back to that person? Or
       perhaps a chain of

Re: [Neo4j] neo4j + RDF + SPARQL or non-RDF?

2011-07-06 Thread Mattias Persson
I'd say that you can often use nodes and relationships without URIs,
although you may want some concept of IDs other than the internal IDs of
nodes and relationships. Data stored in neo4j can often be seen as
triple-like statements:

   (personA)--[KNOWS]--(personB)

but that's just the simplest form. For example:

   (personA)--[KNOWS]--(personB)--[MARRIED_TO]--(personC)
       |
   [LIVES_IN]
       |
       v
    (Sweden)

You can traverse that as one graph, while each node-relationship-node
combination can be viewed as one triple in RDF terms. I think RDF needlessly
limits graph capabilities to triples, whereas neo4j (like other property
graph databases) does not. Also check out the Cypher query language:
http://docs.neo4j.org/chunked/1.4.M06/cypher-query-lang.html
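
To make the "triples vs. one graph" point concrete, here is a small sketch in plain Python (not the Neo4j API). It stores the example above as triple-like statements and then walks them as a single graph. Attaching LIVES_IN to personA and treating every relationship as directed left-to-right are assumptions made for the illustration.

```python
from collections import defaultdict

# Each statement is a (start node, relationship type, end node) triple.
triples = [
    ("personA", "KNOWS", "personB"),
    ("personB", "MARRIED_TO", "personC"),
    ("personA", "LIVES_IN", "Sweden"),  # assumed owner of LIVES_IN
]


def build_graph(statements):
    """Index the triples by start node so they can be traversed as one graph."""
    graph = defaultdict(list)
    for start, rel_type, end in statements:
        graph[start].append((rel_type, end))
    return graph


def reachable(graph, start):
    """Return every node reachable from start by following relationships."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(end for _, end in graph.get(node, []))
    return seen


graph = build_graph(triples)
print(sorted(reachable(graph, "personA")))
```

Each tuple on its own reads like an RDF-style statement, while the traversal treats all of them as one connected structure.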

2011/7/3 noppanit noppani...@gmail.com

 Hi Folks,

 I'm very interested in RDF and SPARQL, but I'm a total newbie. Since neo4j
 can traverse the graph without RDF or SPARQL, would it be better to use RDF
 and store the graph with URIs, or can I ignore that and use plain nodes and
 relationships to store the data in a triple-like structure? And what do the
 neo4j guys think about RDF and the direction of neo4j with regard to the
 semantic web? I think neo4j is the perfect tool for the semantic web.

 Cheers,
 Toy.

 --
 View this message in context:
 http://neo4j-user-list.438527.n3.nabble.com/neo4j-RDF-SPARQL-or-non-RDF-tp3135352p3135352.html
 Sent from the Neo4J User List mailing list archive at Nabble.com.
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo4j] Possibility of extending Zookeeper features to App Server

2011-07-06 Thread Mattias Persson
The role of Zookeeper in HA in neo4j is just master election and instance
discovery. It's not involved in the actual communication between instances.
Knowing that, are your questions still valid?

2011/7/1 Brendan Cheng ccp...@gmail.com

 Hi,

 I'm very interested in your HA architecture and wonder whether it is
 possible to extend the Zookeeper features to cover the jobs of an app
 server, so that we can have a much simpler architecture. The jobs of an app
 server include user authentication, encryption services for communication,
 etc. From your experience, is this going to work, or what is the preferred
 architecture?

 Regards,

 Brendan




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


[Neo4j] Start script of 1.4.M05/1.4.M06 fails with older versions of bash + FIX

2011-07-06 Thread Stephan Hagemann
Hi everyone,

a rather old linux installation on our build server led us to find out that
the new start script introduced in M05 (?) does not work with all versions
of bash.

We got:
cruise:/virtual/hudson/hudson_home/jobs/graphdb/workspace#
/opt/neo4j/bin/neo4j start
/opt/neo4j/bin/neo4j: line 37: syntax error in conditional expression:
unexpected token `('
/opt/neo4j/bin/neo4j: line 37: syntax error near `^(['
/opt/neo4j/bin/neo4j: line 37: `  if [[ ${line} =~ ^([^#\s][^=]+)=(.+)$
]]; then'

On a system with this info:
ext-xecruise52-1:/opt/neo4j/bin# cat /proc/version
Linux version 2.6.28.7-ibm-x3650 (root@obc-fai42-1) (gcc version 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Thu Feb 26 13:50:31 CET 2009

Here is the quick fix I just found (no patch, since I don't want to suggest
I know it works on other system versions...).
Enclose the regexps on line 37ff in quotes, like so:

  if [[ ${line} =~ "^([^#\s][^=]+)=(.+)$" ]]; then
    key=`echo ${BASH_REMATCH[1]} | sed 's/\./_/g'`
    value=${BASH_REMATCH[2]}
    if [[ ${key} =~ "^(.*)_([0-9]+)$" ]]; then
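
For readers who want to see what those two patterns extract, here is a rough Python equivalent. It is a sketch, not the start script itself: the sample lines are made up, and Python's handling of `\s` inside a character class may differ from the bash version in use, so treat the first pattern as an approximation.

```python
import re

# Same patterns as in the script: "key=value" lines, and keys ending in _<number>.
LINE_RE = re.compile(r"^([^#\s][^=]+)=(.+)$")
INDEXED_KEY_RE = re.compile(r"^(.*)_([0-9]+)$")


def parse_line(line):
    """Return (key, index, value) for a config line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    key = match.group(1).replace(".", "_")  # mirrors sed 's/\./_/g'
    value = match.group(2)
    indexed = INDEXED_KEY_RE.match(key)
    if indexed:
        return indexed.group(1), int(indexed.group(2)), value
    return key, None, value


# Hypothetical sample lines, for illustration only.
for sample in ["# a comment=1", "some.setting=data/graph.db", "some.list.1=-Dfoo=bar"]:
    print(repr(sample), "->", parse_line(sample))
```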


Cheers,
Stephan


[Neo4j] REST Batch: failed to insert empty array as a node property

2011-07-06 Thread Igor Dovgiy
When POSTing something like the following to the batch path...

[{"id":1,
  "method":"POST",
  "to":"/node",
  "body":{"user_properties":[]}
}]

...the server fails with...
  "exception" : "java.lang.RuntimeException",
  "stacktrace" : [
org.neo4j.server.rest.web.BatchOperationService.performJob(BatchOperationService.java:137),
org.neo4j.server.rest.web.BatchOperationService.performBatchOperations(BatchOperationService.java:83),
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method),
sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source),
sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source),
java.lang.reflect.Method.invoke(Unknown Source) etc.

A bug, or have I missed something in Neo4j docs?
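
Until that is answered, one possible client-side workaround is to leave empty arrays out of the batch body. The sketch below assumes (this thread does not confirm it) that the empty array is what triggers the failure, for instance because the server cannot infer an element type for it. The helper names are made up for the example.

```python
import json


def drop_empty_arrays(properties):
    """Return a copy of the properties without empty-list values."""
    return {key: value for key, value in properties.items() if value != []}


def batch_create_node(job_id, properties):
    """Build one batch job that creates a node with the given properties."""
    return {
        "id": job_id,
        "method": "POST",
        "to": "/node",
        "body": drop_empty_arrays(properties),
    }


payload = json.dumps([batch_create_node(1, {"user_properties": [], "name": "igor"})])
print(payload)
```

The property can then be set later, once there is at least one value to store.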


[Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
I have a graph with roughly 10M nodes. Some of these nodes are highly
connected to other nodes. For example, I may have a single node with 1M+
relationships. A good analogy is a population that has a "lives-in"
relationship to a state. Now the problem...

Both Neoclipse and neo4j-shell are terribly slow when working with these
nodes. In the shell I would expect a `cd <node-id>` to be very fast,
much like selecting via a rowid in a standard DB. Instead, I usually see
a delay of several seconds. Doing an `ls` takes so long that I usually have
to just kill the process. In fact, `ls` never outputs anything, which is odd
since I would expect it to stream the output as it finds it. I have
very similar performance issues with Neoclipse.

I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. 
Disclaimer, I am new to Neo4j.

Thanks,
Andrew


[Neo4j] Lifecycle of a Graph ?

2011-07-06 Thread V
Can anyone explain to me the life cycle of a graph with Neo4j Spring Data
Graph?

I want to load multiple copies of a persisted graph in memory, but somehow
the second instance returns null. Not sure if I am doing something wrong or
if this is something related to the graph lifecycle.
Any comments or suggestions, please?

Karan


Re: [Neo4j] Lifecycle of a Graph ?

2011-07-06 Thread Michael Hunger
Could you explain how you load it into memory?

And what do you do that returns null?

The graph itself has no lifecycle.

Graph entities are attached to the nodes and relationships when you load them
(via repositories, Cypher or directly, or when you navigate along
relationships). They allow write-through in transactions and read-through all
the time.

They become detached when you leave the tx and start changing properties.

Newly created entities are also detached.

Cheers

Michael




Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Michael Hunger
Andrew,

could you please also try to access the graph via the latest milestone,
1.4.M06, to see if things have improved?

Does this behaviour only affect the supernodes or every node in your graph
(e.g. when you access, cd, ls a person-node)?

We've been discussing some changes to the initial loading/caching that might
improve performance on heavily connected (super-)nodes.

If our changes and tests are successful, these changes will be integrated in
early 1.5 milestones.

Cheers

Michael



Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Rick Bullotta
Hi, Michael.

Are you thinking maybe of lazily loading relationships in 1.5?  That might be a 
huge boost.

Rick



Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Peter Neubauer
Andrew,
if you upgrade to 1.4.M06, your shell should be able to run Cypher to
count the relationships of a node without returning them:

start n=(1) match (n)-[r]-(x) return count(r)

or something along these lines. Try it several times to see whether cold
caches are slowing things down initially.

With `ls` and in Neoclipse, the output and visualization will be slow for
that amount of data.
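
The "try it several times" advice can be sketched as a small timing harness. Nothing here talks to Neo4j: count_relationships is a hypothetical stand-in whose first call pays a simulated loading cost, and the relationship count is made up. In practice the stand-in would be replaced by whatever runs the query above.

```python
import time


def time_runs(query_fn, runs=3):
    """Call query_fn repeatedly and return the elapsed seconds of each call."""
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        query_fn()
        timings.append(time.perf_counter() - started)
    return timings


# Hypothetical stand-in for running the count query against the database.
_cache = {}


def count_relationships(node_id=1):
    if node_id not in _cache:
        time.sleep(0.05)            # simulated cold-cache penalty
        _cache[node_id] = 1000000   # made-up relationship count
    return _cache[node_id]


timings = time_runs(count_relationships)
print(["%.3f s" % t for t in timings])
```

If only the first timing is large, cold caches are the likely cause; if every run is slow, the cost lies elsewhere.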

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.





Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
This is consistently slow. I made a graph which just goes off of the 
root reference node (0) and I am seeing the following...

    (0)$ cd 1    (about 1 minute)
    (1)$ cd 0    (instant)
    (0)$ cd 1    (about 1 minute)


It's almost like it is scanning the entire relationship list before 
actually looking up the next node. Of note I have found the following 
when running neoclipse...

WARNING: [/path/to/neo4j-db/neostore.relationshipstore.db] Unable
to memory map


And I see this in the logs...

neostore.nodestore.db.mapped_memory=20M
neostore.propertystore.db.arrays.mapped_memory=130M
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.relationshipstore.db.mapped_memory=100M

Am I missing something obvious? Even without memory maps, I would expect 
this to be somewhat faster, since reading 156MB (the size of my 
neostore.relationshipstore.db file) of relationship data should be very 
fast. Also, is there any way to pre-warm the cache so that the first hit 
isn't so slow? I would hate for my first user in PROD to get hammered 
because a cache wasn't warmed up.
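
As a quick sanity check on the numbers above, the settings can be parsed and totalled. This sketch only does arithmetic on the values quoted from the log. That the portion of a store file beyond its mapped_memory setting is not memory mapped is an assumption about how the setting behaves, not something stated in this thread.

```python
SETTINGS = """
neostore.nodestore.db.mapped_memory=20M
neostore.propertystore.db.arrays.mapped_memory=130M
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.relationshipstore.db.mapped_memory=100M
"""

SUFFIX = ".mapped_memory"


def parse_mapped_memory(text):
    """Return {store file name: mapped memory in MB} for '<store>.mapped_memory=<n>M' lines."""
    sizes = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.endswith(SUFFIX) and value.endswith("M"):
            sizes[key[: -len(SUFFIX)]] = int(value[:-1])
    return sizes


sizes = parse_mapped_memory(SETTINGS)
total_mb = sum(sizes.values())
relationship_store_file_mb = 156  # file size reported above
shortfall_mb = relationship_store_file_mb - sizes["neostore.relationshipstore.db"]
print("total mapped memory: %d MB" % total_mb)
print("relationship store not covered by mapping: %d MB" % shortfall_mb)
```

On these figures the configured mapping totals 472 MB, and the relationship store file is 56 MB larger than the 100M configured for it.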

Thanks,
Andrew


On 07/06/2011 09:24 AM, Rick Bullotta wrote:
 Hi, Andrew.

 In general, this scenario (1 million+ relationships on a node) can be slow, 
 but usually only the first time you access the node.  If you're only 
 accessing the node once in a session, then yes, it will seem sluggish.  The 
 Neoclipse issue is probably a combination of two issues: the first is lazily 
 loading the node information the first time, and the second is the visual 
 rendering of all those relationships.

 Rick



Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-07-06 Thread Boris Kizelshteyn
Ah ha, the reason I couldn't find it is that there is a typo:
udpateGeometryFromWKT (the p and d are switched) :)

However, I rebuilt it but still do not see it in the REST extensions after
moving everything from /target/dependency to plugins. Any thoughts?

Thanks!


Re: [Neo4j] Data Federation

2011-07-06 Thread Jim Webber
Hi John,

 But if I try to do a distributed join, aren't I hit with having to transfer
 more data over the wire?  

Yes you're right - that's one penalty of having a graph distributed. Each time 
you hit a relationship that crosses a machine, the latency is way higher than 
if you were traversing locally within an instance.

 I am not sure if we need auto sharding.  My data is already in place in
 legacy systems.  

OK, then that's better - you've already sharded. But if your data is already 
housed in legacy systems, you'd have to export it into Neo4j since Neo4j is a 
database and not a triple store API layer.

 I am no expert by any means, but my understanding is that Map-reduce is for
 data that is not interconnected.  That is, you run each map completely
 independently on each shard.

Map reduce can take its data from anywhere (in theory). But map-reduce is a 
batch oriented programming pattern (with Hadoop being a popular implementation 
of that pattern), whereas neo4j is a database - a box of tricks that allows you 
to store and retrieve highly connected data efficiently.
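
The batch pattern described here can be sketched in a few lines of plain Python (no Hadoop involved). The shard contents are invented for the illustration: each shard is mapped independently, and the partial results are then reduced into one answer.

```python
from collections import Counter
from functools import reduce

# Made-up shards: each holds (person, country) records from one legacy system.
shards = [
    [("alice", "Sweden"), ("bob", "Norway")],
    [("carol", "Sweden")],
]


def map_shard(records):
    """Map step: count people per country within a single shard."""
    return Counter(country for _, country in records)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


total = reduce(reduce_counts, map(map_shard, shards), Counter())
print(dict(total))
```

No map call needs to see another shard's data, which is why this pattern suits independent records better than traversals that follow relationships across machines.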

But now I've puzzled myself - I get the sense that you might well do your 
processing in Hadoop rather than export data into Neo4j and then process it as 
separate graphs.

Jim


Re: [Neo4j] Start script of 1.4.M05/1.4.M06 fails with older versions of bash + FIX

2011-07-06 Thread Jim Webber
Thanks for that Stephan,

I've dropped it into our QA backlog for the 1.4 GA release.

Jim



Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Jim Webber
Hi Rick,

 Are you thinking maybe of lazily loading relationships in 1.5?  That might be 
 a huge boost.

Added to the backlog to be discussed for inclusion in 1.5.

Jim


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Michael Hunger
Andrew,

can you by chance share your graph-db or perhaps your generator script? Then we 
could evaluate that and see where the performance hit occurs.

Neo4j-shell checks the connectedness of the graph so that you can't get lost 
while navigating.

Could you try to use cd -a 1 (this does absolute jumps w/o checking 
connectedness).
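
A toy model shows why such a connectedness check could make `cd` from a supernode slow while the way back is instant. This is not the shell's actual implementation: the class, the relationship ordering and the cost measure (relationships inspected) are all assumptions made for illustration.

```python
class ToyGraph:
    """Minimal undirected graph: node id -> list of neighbour ids."""

    def __init__(self):
        self.relationships = {}

    def connect(self, a, b):
        self.relationships.setdefault(a, []).append(b)
        self.relationships.setdefault(b, []).append(a)


def cd(graph, current, target, absolute=False):
    """Return how many relationships are inspected before moving to target."""
    if absolute:
        return 0  # like `cd -a`: jump without checking connectedness
    inspected = 0
    for neighbour in graph.relationships.get(current, []):
        inspected += 1
        if neighbour == target:
            return inspected
    raise ValueError("node %r is not connected to %r" % (target, current))


graph = ToyGraph()
for node_id in range(1000, 0, -1):  # root 0 is a supernode; node 1 is added last
    graph.connect(0, node_id)

print(cd(graph, 0, 1))                 # scans the supernode's relationships
print(cd(graph, 1, 0))                 # node 1 has a single relationship
print(cd(graph, 0, 1, absolute=True))  # no check at all
```

In this model the cost of a checked `cd` grows with the number of relationships on the node being left, which would match the slow 0 to 1 and instant 1 to 0 pattern reported in this thread.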

Are those logs you showed from Neoclipse as well, or from messages.log in the 
graph-db directory? 

The "Unable to memory map" warning doesn't sound good; that shouldn't be a 
problem on Ubuntu.

Cheers,

Michael

Am 06.07.2011 um 16:59 schrieb Andrew White:

 This is consistently slow. I made a graph which just goes off of the 
 root reference node (0) and I am seeing the following...
 
(0)$ cd 1 about 1 minute
(1)$ cd 0 instant
(0)$ cd 1 about 1 minute
 
 
 It's almost like it is scanning the entire relationship list before 
 actually looking up the next node. Of note I have found the following 
 when running neoclipse...
 
WARNING: [/path/to/neo4j-db/neostore.relationshipstore.db] Unable
to memory map
 
 
 And I see this in the logs...
 
neostore.nodestore.db.mapped_memory=20M
neostore.propertystore.db.arrays.mapped_memory=130M
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.relationshipstore.db.mapped_memory=100M
 
 Am I missing something obvious? Even without memory maps, I would expect 
 this to be somewhat faster since reading 156MB (the size of my 
 neostore.relationshipstore.db file) of relation data should be very 
 fast. Also, is there anyway to do a pre-warm up so that the first hit 
 isn't so slow? I would hate for my first user in PROD to get hammered 
 because a cache wasn't warmed up.
 
 Thanks,
 Andrew
 
 
 On 07/06/2011 09:24 AM, Rick Bullotta wrote:
 Hi, Andrew.
 
 In general, this scenario (1 million+ relationships on a node) can be slow, 
 but usually only the first time you access the node.  If you're only 
 accessing the node once in a session, then yes, it will seem sluggish.  The 
 Neoclipse issue is probably a combination of two issues: the first is lazily 
 loading the node information the first time, and the second is the visual 
 rendering of all those relationships.
 
 Rick
 
 -----Original Message-----
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
 Behalf Of Andrew White
 Sent: Wednesday, July 06, 2011 10:15 AM
 To: user@lists.neo4j.org
 Subject: [Neo4j] Performance issue on nodes with lots of relationships
 
 I have a graph with roughly 10M nodes. Some of these nodes are highly
 connected to other nodes. For example I may have a single node with 1M+
 relationships. A good analogy is a population that has a lives-in
 relationship to a state. Now the problem...
 
 Both Neoclipse and neo4j-shell are terribly slow when working with these
 nodes. In the shell I would expect a `cd <node-id>` to be very fast,
 much like selecting via a rowid in a standard DB. Instead, I usually see
 a delay of several seconds. Doing an `ls` takes so long that I usually have to
 just kill the process. In fact `ls` never outputs anything, which is odd
 since I would expect it to stream the output as it found it. I have
 very similar performance issues with Neoclipse.
 
 I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
 Disclaimer, I am new to Neo4j.
 
 Thanks,
 Andrew
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
Logs are attached. I am using the Sun 64bit HotSpot JVM (see logs). For 
this particular graph I simply have a single root reference node (0) and 
millions of nodes with a 1:1 relationship with the root. For all 
intents, this version of the graph is like a flat table with all 
elements sharing the same parent. This is the simplest graph I could 
construct that will eventually represent a sub graph in a more complex 
system.


Some file sizes for the db store are...

  43M  neostore.nodestore.db
 424M  neostore.propertystore.db
 193M  neostore.propertystore.db.arrays
 1.1K  neostore.propertystore.db.index
 1.1K  neostore.propertystore.db.index.keys
 238M  neostore.propertystore.db.strings
 156M  neostore.relationshipstore.db
   10  neostore.relationshiptypestore.db
  129  neostore.relationshiptypestore.db.names
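
For reference, these file sizes can be set against the mapped_memory settings quoted from messages.log in this thread. The following is only a minimal arithmetic sketch, assuming both sets of numbers describe the same graph store; the helper name `mapped_fraction` is made up for illustration and the result is not a diagnosis of the slowdown.

```python
# Store file sizes (MB) from the listing above, paired with the
# mapped_memory settings (MB) quoted from messages.log in this thread.
# Assumption: both describe the same graph store.
stores = {
    "nodestore.db": (43, 20),
    "propertystore.db": (424, 90),
    "propertystore.db.arrays": (193, 130),
    "propertystore.db.strings": (238, 130),
    "relationshipstore.db": (156, 100),
}


def mapped_fraction(size_mb, mapped_mb):
    """Fraction of a store file that fits inside its memory-mapped window."""
    return min(1.0, mapped_mb / size_mb)


coverage = {
    name: mapped_fraction(size, mapped) for name, (size, mapped) in stores.items()
}

for name, fraction in coverage.items():
    print(f"neostore.{name:<26} {fraction:6.1%} mapped")
```

With these numbers none of the large store files fits entirely inside its configured window; the relationship store is roughly two-thirds covered.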

Andrew

On 07/06/2011 12:03 PM, Michael Hunger wrote:

OK, then it is the connectedness check, which actually traverses all 
the relationships between the current and the target node.
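
To illustrate why a checked `cd` can be slow in one direction and instant in the other, here is a toy model. It is a sketch under stated assumptions, not the shell's actual code: `ToyGraph`, `cd_checked` and `cd_absolute` are hypothetical names, and the scan order (target reached last) is chosen to show the worst case.

```python
from collections import defaultdict


class ToyGraph:
    """Toy adjacency-list graph; this is not Neo4j's storage format."""

    def __init__(self):
        self.nodes = set()
        self.rels = defaultdict(list)  # node id -> list of neighbour ids

    def connect(self, a, b):
        self.nodes.update((a, b))
        self.rels[a].append(b)
        self.rels[b].append(a)


def cd_checked(graph, current, target):
    """Model of plain `cd`: scan the current node's relationships for the target.

    Returns (found, relationships_inspected).
    """
    inspected = 0
    for other in graph.rels[current]:
        inspected += 1
        if other == target:
            return True, inspected
    return False, inspected


def cd_absolute(graph, target):
    """Model of `cd -a`: look the node up by id only, no relationship scan."""
    return target in graph.nodes


g = ToyGraph()
HUB = 0
for child in range(2, 100_002):
    g.connect(HUB, child)
g.connect(HUB, 1)  # node 1 is attached last, so a scan from the hub reaches it last

print(cd_checked(g, HUB, 1))  # scans every relationship of the hub
print(cd_checked(g, 1, HUB))  # node 1 has a single relationship
print(cd_absolute(g, 1))      # independent of relationship count
```

In this model the cost of a checked jump depends on how many relationships the current node has, which matches the asymmetry reported above (slow from node 0, instant from node 1).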

Could you share the whole messages.log file from that graph store?

Which JVM are you running?

If you can't share the db, could you please describe the structure of the 
graph, i.e. which category of nodes has what number (and types) of relationships 
to which others?

Also, is it your node 0 that has the many rels, or the node with id 1?

Cheers

Michael

On 06.07.2011 at 18:48, Andrew White wrote:


When using `cd -a` it is indeed very fast. As to the logs, those were
from messages.log.

Sharing the graph-db would be tough considering I am generating this
graph off of several GB of data and my local upload is very limited. Any
hints on the memory map issue are welcomed too.

Thanks for all of your help so far. I am going to try/reply to the other
recommendations in other e-mails soonish.

Andrew

On 07/06/2011 11:32 AM, Michael Hunger wrote:

Andrew,

can you by chance share your graph-db or perhaps your generator script? Then we 
could evaluate that and see where the performance hit occurs.

Neo4j-shell checks the connectedness of the graph so that you can't get lost 
while navigating.

Could you try `cd -a 1`? (This does an absolute jump without checking 
connectedness.)

Are those logs you showed from Neoclipse as well, or from messages.log in the 
graph-db directory?

The "unable to memory map" warning doesn't sound good; that shouldn't be a 
problem on Ubuntu.

Cheers,

Michael

On 06.07.2011 at 16:59, Andrew White wrote:


This is consistently slow. I made a graph which just goes off of the
root reference node (0) and I am seeing the following...

(0)$ cd 1    (about 1 minute)
(1)$ cd 0    (instant)
(0)$ cd 1    (about 1 minute)


It's almost like it is scanning the entire relationship list before
actually looking up the next node. Of note I have found the following
when running neoclipse...

WARNING: [/path/to/neo4j-db/neostore.relationshipstore.db] Unable
to memory map


And I see this in the logs...

neostore.nodestore.db.mapped_memory=20M
neostore.propertystore.db.arrays.mapped_memory=130M
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M
neostore.relationshipstore.db.mapped_memory=100M

Am I missing something obvious? Even without memory maps, I would expect
this to be somewhat faster since reading 156MB (the size of my
neostore.relationshipstore.db file) of relation data should be very
fast. Also, is there any way to do a pre-warm-up so that the first hit
isn't so slow? I would hate for my first user in PROD to get hammered
because a cache wasn't warmed up.

Thanks,
Andrew


On 07/06/2011 09:24 AM, Rick Bullotta wrote:

Hi, Andrew.

In general, this scenario (1 million+ relationships on a node) can be slow, but 
usually only the first time you access the node.  If you're only accessing the 
node once in a session, then yes, it will seem sluggish.  The Neoclipse issue 
is probably a combination of two issues: the first is lazily loading the node 
information the first time, and the second is the visual rendering of all those 
relationships.

Rick

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Andrew White
Sent: Wednesday, July 06, 2011 10:15 AM
To: user@lists.neo4j.org
Subject: [Neo4j] Performance issue on nodes with lots of relationships

I have a graph with roughly 10M nodes. Some of these nodes are highly
connected to other nodes. For example I may have a single node with 1M+
relationships. A good analogy is a population that has a lives-in
relationship to a state. Now the problem...

Both Neoclipse and neo4j-shell are terribly slow when working with these
nodes. In the shell I would expect a `cd <node-id>` to be very fast,
much like selecting via a rowid in a standard DB. Instead, I usually see
several seconds delay. Doing 

[Neo4j] Setting up a Cluster and querying

2011-07-06 Thread Christian Godde
Hi there,

I am quite a newbie with neo4j and I hope somebody can help me.

I want to set up a Cluster with 6 Servers and a few Coordinators
(can a Server at the same time be a Coordinator?).
Theoretically the setting up of this cluster is more or less clear to me.
But the big question for me is:
How do I query this cluster?
I don't want to communicate with a single server all the time, but rather
with the server that has the lowest load at that moment.

I hope you know what I mean.

Regards,
Christian


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Mattias Persson
2011/7/6 Jim Webber j...@neotechnology.com

 Hi Rick,

  Are you thinking maybe of lazily loading relationships in 1.5?  That
 might be a huge boost.

 Added to the backlog to be discussed for inclusion in 1.5.


Neo4j _is_ lazily loading relationships... and has done since before 1.0.
Maybe there's some issue with the shell only.


 Jim




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Mattias Persson
Just noticed that the shell's `ls` reads all relationships before displaying
them... I'll fix this tomorrow.

2011/7/6 Mattias Persson matt...@neotechnology.com



 2011/7/6 Jim Webber j...@neotechnology.com

 Hi Rick,

  Are you thinking maybe of lazily loading relationships in 1.5?  That
 might be a huge boost.

 Added to the backlog to be discussed for inclusion in 1.5.


 Neo4j _is_ lazily loading relationships... and has done since before 1.0.
 Maybe there's some issue with the shell only.


 Jim




 --
 Mattias Persson, [matt...@neotechnology.com]
 Hacker, Neo Technology
 www.neotechnology.com




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo4j] Unable to upgrade neostore

2011-07-06 Thread Paul A. Jackson
I did not. If this is what is required then you have answered my question.  
Thanks.

-Paul

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Adriano Henrique de Almeida
Sent: Tuesday, July 05, 2011 10:59 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Unable to upgrade neostore

Paul,

Did you try to upgrade to 1.2, then to 1.3 and then to 1.4, instead of going
straight from 1.1 to 1.4?

Regards

2011/7/5 Paul A. Jackson paul.jack...@pb.com

 I have a neo4j 1.1 graph that I tried opening with 1.4M5. I had a
 configuration that contained allow_store_upgrade=true:
 [15] = {java.util.HashMap$Entry@12374} allow_store_upgrade - true
   key: java.lang.String = {java.lang.String@12376}allow_store_upgrade
   value: java.lang.String = {java.lang.String@12380}true

 And I get this exception:
 jvm 1| Caused by: org.neo4j.graphdb.TransactionFailureException: Could
 not create data source [nioneodb], see nested
 exception for cause of error
 jvm 1|  at
 org.neo4j.kernel.impl.transaction.TxModule.registerDataSource(TxModule.java:153)
 jvm 1|  at
 org.neo4j.kernel.GraphDbInstance.start(GraphDbInstance.java:111)
 jvm 1|  at
 org.neo4j.kernel.EmbeddedGraphDbImpl.init(EmbeddedGraphDbImpl.java:189)
 jvm 1|  at
 org.neo4j.kernel.EmbeddedGraphDatabase.init(EmbeddedGraphDatabase.java:79)
 jvm 1|  at
 com.g1.dcg.graph.neo4j.NeoGraph.init(NeoGraph.java:118)
 jvm 1|  ... 12 more
 jvm 1| Caused by:
 org.neo4j.kernel.impl.nioneo.store.IllegalStoreVersionException: Store
 version [NeoStore v0.9.5].
 Please make sure you are not running old Neo4j kernel on a store that has
 been created by newer version of Neo4j.
 jvm 1|  at
 org.neo4j.kernel.impl.nioneo.store.NeoStore.versionFound(NeoStore.java:431)
 jvm 1|  at
 org.neo4j.kernel.impl.nioneo.store.AbstractStore.loadStorage(AbstractStore.java:147)
 jvm 1|  at
 org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.init(CommonAbstractStore.java:170)
 jvm 1|  at
 org.neo4j.kernel.impl.nioneo.store.AbstractStore.init(AbstractStore.java:120)
 jvm 1|  at
 org.neo4j.kernel.impl.nioneo.store.NeoStore.init(NeoStore.java:65)
 jvm 1|  at
 org.neo4j.kernel.impl.nioneo.xa.NeoStoreXaDataSource.init(NeoStoreXaDataSource.java:132)
 jvm 1|  at
 sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 jvm 1|  at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 jvm 1|  at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 jvm 1|  at
 java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 jvm 1|  at
 org.neo4j.kernel.impl.transaction.XaDataSourceManager.create(XaDataSourceManager.java:75)
 jvm 1|  at
 org.neo4j.kernel.impl.transaction.TxModule.registerDataSource(TxModule.java:147)
 jvm 1|  ... 16 more

 My main question is whether this is supported or I am doing something
 wrong. I don't really need to support the upgrade of version 1.1 databases,
 but I want to make sure my code is correct so that I will be able to support
 upgrades in the future.

 Thanks.

 Paul Jackson





-- 
Adriano Almeida
Caelum | Ensino e Inovação
www.caelum.com.br


[Neo4j] Neo4jPHP

2011-07-06 Thread Josh Adell
Hey all,

I've been working on another PHP client for Neo4j.  I think it's ready
for some real-life testing, and I'm interested to see what you all
think.
GitHub: https://github.com/jadell/Neo4jPHP
Download: https://github.com/jadell/Neo4jPHP/tarball/0.0.1-beta

Features:
- Developed against the Neo4j 1.4 milestone releases
- Simple, object-oriented API
- Almost complete REST API coverage
- Indexing of nodes and relationships, including exact match and query support
- Cypher queries (thanks to Jacob Hansson)
- Traversal support, including paged traversals
- Lazy-loading of node and relationship data

Hopefully coming soon:
- Client-side caching
- Batch operations

There are some usage examples included.

It's a beta release, so please be gentle (on me, that is; be as rough
as you want with the code.)  If anyone finds any bugs or has feature
requests, please use the GitHub issues page at
https://github.com/jadell/Neo4jPHP/issues

Thanks and enjoy!

-- Josh Adell


Re: [Neo4j] Setting up a Cluster and querying

2011-07-06 Thread David Montag
Hi Christian,

Please see http://docs.neo4j.org/chunked/1.4.M06/ha.html for info on Neo4j
HA. You can run a coordinator and a Neo4j server on the same machines.
That's a common setup. As for how to query it, answering that requires some
more explanation about how Neo4j can be run.

Neo4j can be used in two deployment modes: embedded in a Java process, or
as a stand-alone server. The server, however, internally runs an embedded instance.
See http://docs.neo4j.org/chunked/1.4.M06/deployment-scenarios.html for more
information on this.

In an HA environment, a stand-alone server would be accessed over HTTP via
the REST API[1]. You can also write custom extensions[2] in order to deploy
Java code on the server so that you can build your own domain-specific query
API. If you're not using the stand-alone server, but instead using embedded
Neo4j in e.g. a web application deployed on Tomcat, then the API you expose
from your webapp is completely up to you. Internally it then uses an
embedded Neo4j instance, where you have full access to the Java API. In
addition to these options, you can also use our new query language,
Cypher[3]. You can try it out from the web administration interface of the
stand-alone server.

When setting up a Neo4j HA cluster, you typically also configure a load
balancer in front of the cluster. The load balancer can use any method it
desires to distribute the requests to the machines in the cluster. The load
balancer is however not included in the Neo4j distribution -- it is
something the user needs to provide. You could look into the Apache HTTP
Server or HAProxy.
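
As an illustration of the "lowest load" idea from the question, here is a minimal sketch of least-connections selection, the kind of policy a load balancer might apply. It assumes load is measured as in-flight requests; the class names and server URLs are hypothetical, and a real deployment would configure this in the load balancer itself rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    url: str         # hypothetical address of one Neo4j server in the cluster
    active: int = 0  # requests currently in flight to this server


class LeastConnectionsBalancer:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, urls):
        self.backends = [Backend(url) for url in urls]

    def acquire(self):
        # Ties are broken by list order, because min() returns the first minimum.
        backend = min(self.backends, key=lambda b: b.active)
        backend.active += 1
        return backend

    def release(self, backend):
        backend.active -= 1


balancer = LeastConnectionsBalancer(
    [
        "http://neo4j-1.example:7474",
        "http://neo4j-2.example:7474",
        "http://neo4j-3.example:7474",
    ]
)

first = balancer.acquire()
second = balancer.acquire()
third = balancer.acquire()
balancer.release(second)       # the second server finishes its request
print(balancer.acquire().url)  # the next request goes to the idle server
```

Round-robin is the simpler alternative; least-connections is closer to what the question asks for because it reacts to the current load of each server.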

Hope that answers some of your questions.

David

[1] http://docs.neo4j.org/chunked/1.4.M06/rest-api.html
[2] http://docs.neo4j.org/chunked/1.4.M06/server-plugins.html,
http://docs.neo4j.org/chunked/1.4.M06/server-unmanaged-extensions.html
[3] http://docs.neo4j.org/chunked/1.4.M06/cypher-query-lang.html

On Wed, Jul 6, 2011 at 11:22 AM, Christian Godde 
christian.go...@googlemail.com wrote:

 Hi there,

 I am quite a newbie with neo4j and I hope somebody can help me.

 I want to set up a Cluster with 6 Servers and a few Coordinators
 (can a Server at the same time be a Coordinator?).
 Theoretically the setting up of this cluster is more or less clear to me.
 But the big question for me is:
 How do I query this cluster?
 So that I don't communicate with a single server all the time, but the
 server with the lowest load at this time.

 I hope you know what I mean.

 Regards,
 Christian




-- 
David Montag david.mon...@neotechnology.com
Neo Technology, www.neotechnology.com
Cell: 650.556.4411
Skype: ddmontag


Re: [Neo4j] Indexed relationships

2011-07-06 Thread Niels Hoogeveen

Pushed SortedTree to Git after adding a unit test and doing some debugging.

TODO:
- Add an API for indexed relationships using SortedTree as the implementation.
- Make SortedTree thread safe.

With regard to the latter issue, I am considering the following solution: 
acquire a lock (by deleting a non-existent property) on the node that points to 
the root of the tree at the start of AddNode, RemoveNode and Delete. No other 
node in the SortedTree is really stable: even the root node may be moved down, 
turning another node into the new root node, and after a couple of remove 
actions the original root node may even be deleted.

Locking the node pointing to the root node prevents all other 
threads/transactions from updating the tree. This may seem restrictive, but a 
single new entry or a single removal may in fact affect much of the 
tree, due to balancing. More selective locking would require a pre-balancing 
tree walk that determines the affected subtrees, locks them and, once every 
affected subtree is locked, performs the actual balancing.

Please let me know if there are any objections to locking the node pointing to 
the tree as a solution to make SortedTree thread safe.
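
The coarse-grained locking idea can be sketched outside the graph as follows. This is an illustration only, assuming a plain in-memory sorted list in place of the in-graph tree and a `threading.Lock` in place of the lock on the node that points to the root; `CoarseLockedSortedSet` is a hypothetical name, not part of neo4j-graph-collections.

```python
import bisect
import threading


class CoarseLockedSortedSet:
    """Sorted collection where every mutation takes one global lock.

    The single lock plays the role of locking the node that points to the
    tree root: writers are fully serialised, so rebalancing (modelled here by
    list insertion and deletion) never runs concurrently.
    """

    def __init__(self):
        self._root_lock = threading.Lock()
        self._keys = []

    def add(self, key):
        with self._root_lock:
            index = bisect.bisect_left(self._keys, key)
            if index < len(self._keys) and self._keys[index] == key:
                return False  # key already present
            self._keys.insert(index, key)
            return True

    def remove(self, key):
        with self._root_lock:
            index = bisect.bisect_left(self._keys, key)
            if index < len(self._keys) and self._keys[index] == key:
                del self._keys[index]
                return True
            return False

    def snapshot(self):
        with self._root_lock:
            return list(self._keys)


def add_range(collection, start, stop):
    for key in range(start, stop):
        collection.add(key)


shared = CoarseLockedSortedSet()
workers = [
    threading.Thread(target=add_range, args=(shared, start, start + 500))
    for start in (0, 250, 500, 750)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(len(shared.snapshot()))
```

The trade-off is the one described above: writers never overlap, which keeps the structure consistent at the cost of write concurrency.
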
Niels

 Date: Tue, 5 Jul 2011 08:27:57 +0200
 From: neubauer.pe...@gmail.com
 To: user@lists.neo4j.org
 Subject: Re: [Neo4j] Indexed relationships
 
 Great work Niels!
 
 /peter
 
 Sent from my phone.
 On Jul 4, 2011 11:39 PM, Niels Hoogeveen pd_aficion...@hotmail.com
 wrote:
 
  Made some more changes to the SortedTree implementation. Previously
 SortedTree would throw an exception if a duplicate entry was being added.
  I changed SortedTree to allow a key to point to more than one node, unless
 the SortedTree is created as a unique index, in which case an exception is
 raised when an attempt is made to add a node to an existing key entry.
  A SortedTree once defined as unique can not be changed to a non-unique
 index or vice-versa.
  SortedTrees now have a name, which is stored in a property of the
 TREE_ROOT relationship and in the KEY_VALUE relationship (a new relationship
 that points from the SortedTree to the Node inserted in the SortedTree). The
 name of a SortedTree can not be changed.
  SortedTrees now store the class of the Comparator, so a SortedTree, once
 created, can not be used with a different Comparator.
  SortedTree is now an Iterable, making it possible to use it in a
 foreach-loop.
  Since there are as of yet, no unit tests for SortedTree, I will create
 those first before pushing my changes to Git. Preliminary results so far are
 good. I integrated the changes in my own application and it seems to work
 fine.
  Todo:
  - Decide on an API for indexed relationships (community input still welcome).
  - Write unit tests.
  - Make SortedTree thread safe (community help still welcome).
  Niels
 
  From: pd_aficion...@hotmail.com
  To: user@lists.neo4j.org
  Date: Mon, 4 Jul 2011 15:49:45 +0200
  Subject: Re: [Neo4j] Indexed relationships
 
 
  I forgot to add another recurrent issue that can be solved with indexed
 relationships: guaranteed uniqueness constraints.
   From: pd_aficion...@hotmail.com
   To: user@lists.neo4j.org
   Date: Mon, 4 Jul 2011 01:55:08 +0200
   Subject: [Neo4j] Indexed relationships
  
  
   In the thread [Neo4j] traversing densely populated nodes we discussed
 the problems arising when large numbers of relationships are added to the
 same node.
   Over the weekend, I have worked on a solution for the
 dense-relationship-nodes using SortedTree in the neo-graph-collections
 component. After some minor tweaks to the implementation of SortedTree, I
 have managed to get a workable solution, where two nodes are not directly
 linked by a relationship, but by means of a BTree (entirely stored in the
 graph).
   Before continuing this work, I'd like to have a discussion about
 features, since what we have now is not just a solution for the densely
 populated node issue, but is actually a fully fledged indexed relationship,
 which makes it suitable for other purposes too.
   An indexed relationship can for example be used to maintain a sorted
 set of relationships in the graph, that is not necessarily huge, but large
 enough to make sorting on internal memory too expensive an operation, or
 situations where only one out of a large number of relationships is actually
 traversed in most cases.
   There are probably more use cases for in-graph indexed relationships,
 so I'd like to know what features are desirable and what API would Neo4J
 users appreciate.
   P.S. I still think it would be good to consider, if technically
 possible, partitioning the relationship store per relationship type and per
 direction. The indexed relationship solution works, but is of course slower
 than a direct relationship, both with respect to insert time and traversal
 time. If dense relationships are never traversed going out of the dense
 node, the extra structure maintained by the BTree is only extra burden.
   P.P.S. If there are people 

Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
I am on a standard filesystem (ext4). I haven't seen the issue again 
today so I wonder if it was a fluke.

Andrew

On 07/06/2011 12:29 PM, Paul Bandler wrote:
 Any hints on the memory map issue are welcomed too.
 I experienced that on Solaris when I'd placed the db on a filesystem that 
 didn't support memory-mapped I/O, such as NFS.
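
One way to check whether a given directory's filesystem allows memory mapping is to try it on a scratch file. The sketch below does that with Python's standard `mmap` module; it is an assumption that this reflects what the JVM would see, since both rely on the operating system's mmap facility but are not the same code path. The function name `can_memory_map` is made up for this example.

```python
import mmap
import os
import tempfile


def can_memory_map(directory):
    """Return True if a scratch file in `directory` can be memory-mapped and written."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # The file must be non-empty before it can be mapped.
        os.write(fd, b"\0" * mmap.PAGESIZE)
        try:
            with mmap.mmap(fd, mmap.PAGESIZE) as mapped:
                mapped[0:1] = b"x"
                mapped.flush()
            return True
        except (OSError, ValueError):
            return False
    finally:
        os.close(fd)
        os.remove(path)


# Example: check the system temp directory; pass the graph-db directory instead.
print(can_memory_map(tempfile.gettempdir()))
```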

 Sent from my iPhone

 On 6 Jul 2011, at 17:48, Andrew White li...@andrewewhite.net wrote:

 Any
 hints on the memory map issue are welcomed too.


[Neo4j] t= new Table() in Gremlin

2011-07-06 Thread Boris Kizelshteyn
Greetings! I am using stable 1.3 and when I issue t = new Table() in the
gremlin shell I get:


   - gremlin> t = new Table();
   - ==> startup failed:
   - ==> groovysh_evaluate: 26: unable to resolve class Table
   - ==>  @ line 26, column 5.
   - ==> t = new Table();

What am I doing wrong?

Thanks!


Re: [Neo4j] t= new Table() in Gremlin

2011-07-06 Thread Michael Hunger
Boris,

I think 1.3 uses a much older version of Gremlin, which didn't have the
Table result type yet.

Please pull 1.4.M06 and try to use it there to see if it works in the current 
version.

Cheers

Michael

On 07.07.2011 at 04:34, Boris Kizelshteyn wrote:

 Greetings! I am using stable 1.3 and when I issue t = new Table() in the
 gremlin shell I get:
 
 
   - gremlin> t = new Table();
   - ==> startup failed:
   - ==> groovysh_evaluate: 26: unable to resolve class Table
   - ==>  @ line 26, column 5.
   - ==> t = new Table();
 
 What am I doing wrong?
 
 Thanks!


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
I just tested with 1.4.M06 and performance seems about the same. Also, 
only the supernodes are affected; the child nodes are very fast.

On 07/06/2011 09:31 AM, Michael Hunger wrote:
 Andrew,

 could you please also try to access the graph via the latest Milestone 
 1.4.M06 to see if things have improved.

 Does this behaviour only affect the supernodes or every node in your graph 
 (e.g. when you access, cd, ls a person-node)?

 We've been discussing some changes to the initial loading/caching that might 
 improve performance on heavily connected (super-)nodes.

 If our changes and tests are successful, these changes will be integrated in 
 the early 1.5 milestones.

 Cheers

 Michael

 On 06.07.2011 at 16:15, Andrew White wrote:

 I have a graph with roughly 10M nodes. Some of these nodes are highly
 connected to other nodes. For example I may have a single node with 1M+
 relationships. A good analogy is a population that has a lives-in
 relationship to a state. Now the problem...

 Both Neoclipse and neo4j-shell are terribly slow when working with these
 nodes. In the shell I would expect a `cd <node-id>` to be very fast,
 much like selecting via a rowid in a standard DB. Instead, I usually see
 several seconds delay. Doing a `ls` takes so long that I usually have to
 just kill the process. In fact `ls` never outputs anything which is odd
 since I would expect it to stream the output as it found it. I have
 very similar performance issues with neoclipse.

 I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
 Disclaimer, I am new to Neo4j.

 Thanks,
 Andrew


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread Andrew White
Here are some interesting stats to consider. First, I split my nodes into 
two groups, one node with 1.4M children and the other with 3.4M 
children. While I do see some cache warm-up improvements, the 
traversal time doesn't seem to scale linearly; i.e. the larger super-node has 
2.4x more children but takes 17x longer to traverse.

neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 25724 ms
neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 1468486  |
+----------+
1 rows, 19763 ms

neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 565448 ms
neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
+----------+
| count(r) |
+----------+
| 3472174  |
+----------+
1 rows, 337975 ms
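
The ratios quoted above can be re-derived from the shell output. This is just the arithmetic, using the four timings and two counts shown; the variable names are made up for the example.

```python
# Counts and timings copied from the shell output above.
small = {"children": 1468486, "cold_ms": 25724, "warm_ms": 19763}
large = {"children": 3472174, "cold_ms": 565448, "warm_ms": 337975}

size_ratio = large["children"] / small["children"]
cold_ratio = large["cold_ms"] / small["cold_ms"]
warm_ratio = large["warm_ms"] / small["warm_ms"]

print(f"children:  {size_ratio:.2f}x")
print(f"first run: {cold_ratio:.1f}x")
print(f"second run: {warm_ratio:.1f}x")

# Average time spent per relationship on the second (warmer) run.
for label, run in (("small", small), ("large", large)):
    micros = run["warm_ms"] * 1000 / run["children"]
    print(f"{label}: {micros:.1f} microseconds per relationship")
```

The second-run ratio is the roughly 17x figure mentioned above; on the first run the gap is closer to 22x, against a size ratio of about 2.4x.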

Any ideas on this?
Andrew

On 07/06/2011 09:55 AM, Peter Neubauer wrote:
 Andrew,
 if you upgrade to 1.4.M06, your shell should be able to do Cypher in
 order to count the relationships of a node, not returning them:

 start n=(1) match (n)-[r]-(x) return count(r)

 and try that several times to see if cold caches are initially slowing
 down things.

 (or something along these lines). In `ls` and Neoclipse the output and
 visualization will be slow for that amount of data.

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 On Wed, Jul 6, 2011 at 4:15 PM, Andrew White li...@andrewewhite.net wrote:
 I have a graph with roughly 10M nodes. Some of these nodes are highly
 connected to other nodes. For example I may have a single node with 1M+
 relationships. A good analogy is a population that has a lives-in
 relationship to a state. Now the problem...

 Both Neoclipse and neo4j-shell are terribly slow when working with these
 nodes. In the shell I would expect a `cd <node-id>` to be very fast,
 much like selecting via a rowid in a standard DB. Instead, I usually see
 several seconds delay. Doing a `ls` takes so long that I usually have to
 just kill the process. In fact `ls` never outputs anything which is odd
 since I would expect it to stream the output as it found it. I have
 very similar performance issues with neoclipse.

 I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
 Disclaimer, I am new to Neo4j.

 Thanks,
 Andrew


Re: [Neo4j] Performance issue on nodes with lots of relationships

2011-07-06 Thread David Montag
Hi Andrew,

How big is your configured Java heap? It could be that all the nodes and
relationships don't fit into the cache.

David

On Wed, Jul 6, 2011 at 8:03 PM, Andrew White li...@andrewewhite.net wrote:

 Here are some interesting stats to consider. First, I split my nodes into
 two groups, one node with 1.4M children and the other with 3.4M
 children. While I do see some cache warm-up improvements, the
 traversal time doesn't seem to scale linearly; i.e. the larger super-node has
 2.4x more children but takes 17x longer to traverse.

 neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
 +----------+
 | count(r) |
 +----------+
 | 1468486  |
 +----------+
 1 rows, 25724 ms
 neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
 +----------+
 | count(r) |
 +----------+
 | 1468486  |
 +----------+
 1 rows, 19763 ms

 neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
 +----------+
 | count(r) |
 +----------+
 | 3472174  |
 +----------+
 1 rows, 565448 ms
 neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
 +----------+
 | count(r) |
 +----------+
 | 3472174  |
 +----------+
 1 rows, 337975 ms

 Any ideas on this?
 Andrew

 On 07/06/2011 09:55 AM, Peter Neubauer wrote:
  Andrew,
  if you upgrade to 1.4.M06, your shell should be able to do Cypher in
  order to count the relationships of a node, not returning them:
 
  start n=(1) match (n)-[r]-(x) return count(r)
 
  and try that several times to see if cold caches are initially slowing
  down things.
 
  (or something along these lines). In `ls` and Neoclipse the output and
  visualization will be slow for that amount of data.
 
  Cheers,
 
  /peter neubauer
 
  GTalk:  neubauer.peter
  Skype   peter.neubauer
  Phone   +46 704 106975
  LinkedIn   http://www.linkedin.com/in/neubauer
  Twitter  http://twitter.com/peterneubauer
 
  http://www.neo4j.org   - Your high performance graph
 database.
  http://startupbootcamp.org/- Öresund - Innovation happens HERE.
  http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 
 
 
  On Wed, Jul 6, 2011 at 4:15 PM, Andrew White li...@andrewewhite.net
  wrote:
  I have a graph with roughly 10M nodes. Some of these nodes are highly
  connected to other nodes. For example I may have a single node with 1M+
  relationships. A good analogy is a population that has a lives-in
  relationship to a state. Now the problem...
 
  Both Neoclipse and neo4j-shell are terribly slow when working with these
  nodes. In the shell I would expect a `cd <node-id>` to be very fast,
  much like selecting via a rowid in a standard DB. Instead, I usually see
  several seconds delay. Doing a `ls` takes so long that I usually have to
  just kill the process. In fact `ls` never outputs anything which is odd
  since I would expect it to stream the output as it found it. I have
  very similar performance issues with neoclipse.
 
  I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
  Disclaimer, I am new to Neo4j.
 
  Thanks,
  Andrew




-- 
David Montag david.mon...@neotechnology.com
Neo Technology, www.neotechnology.com
Cell: 650.556.4411
Skype: ddmontag


Re: [Neo4j] t= new Table() in Gremlin

2011-07-06 Thread Marko Rodriguez
Hi,

To further clarify, Table is provided in Gremlin 1.1+.

To check the version of Gremlin, do GremlinTokens.VERSION.

Good luck,
Marko.

http://markorodriguez.com

On Jul 6, 2011, at 8:40 PM, Michael Hunger wrote:

 Boris,
 
 I think 1.3 uses a much older version of Gremlin, which didn't have the
 Table result type yet.
 
 Please pull 1.4.M06 and try to use it there to see if it works in the current 
 version.
 
 Cheers
 
 Michael
 
 On 07.07.2011 at 04:34, Boris Kizelshteyn wrote:
 
 Greetings! I am using stable 1.3 and when I issue t = new Table() in the
 gremlin shell I get:
 
 
  - gremlin> t = new Table();
  - ==> startup failed:
  - ==> groovysh_evaluate: 26: unable to resolve class Table
  - ==>  @ line 26, column 5.
  - ==> t = new Table();
 
 What am I doing wrong?
 
 Thanks!