[Neo4j] On twice?

2010-08-01 Thread Tom Smith
Small point, somehow I seem to be on neo4j list twice, getting two of 
everything... twice...

Tom







___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] On twice?

2010-08-01 Thread Peter Neubauer
Tom,
let me sort that out tomorrow ...

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Sun, Aug 1, 2010 at 10:21 AM, Tom Smith tas...@york.ac.uk wrote:
 Small point, somehow I seem to be on neo4j list twice, getting two of 
 everything... twice...

 Tom









Re: [Neo4j] Stumped by performance issue in traversal - would take a month to run!

2010-08-01 Thread Martin Neumann
Hi,
there are some environmental optimizations you can do to speed things up.
Neo4j stores the graph on disk, so a traversal translates to moving the
disk head on the hard drive whenever the data is not in RAM. For good
performance you need a fast drive (a flash drive would do best).
Deleting lots of nodes can create holes in the db, so read operations have
to move a longer physical distance on the hard drive than necessary. The only
way I am aware of to get rid of holes reliably is to copy the DB into a
fresh, clean Neo4j DB.

cheers Martin



On Fri, Jul 30, 2010 at 8:10 PM, Jeff Klann jkl...@iupui.edu wrote:

 Hi, so I got 2GB more RAM and noticed that after adding some more memory
 map and increasing the heap space, my small query went from 6hrs to 3min.
 Quite reasonable!

 But the larger one that would take a month would still take a month. So
 I've been performance testing parts of it:

 The algorithm as in my first post showed *no* performance improvement on
 more RAM. But individual parts:
   - Traversing only (first three lines) was much speedier, but still seems
     slow. 1.5 million traversals (15 out of 7000 items) took 23sec. It
     shaves off a few seconds if I run this twice and time it the second
     time, or if I don't print any node properties as I traverse. (Does
     Neo4J load ALL the properties for a node if one is accessed?) Even with
     a double run and not reading node properties, it still takes 16sec,
     which would make traversal take two hours. I thought Neo4J was supposed
     to do ~1m traversals/sec, this is doing about 100k. Why? (And in fact
     on the other query it was getting about 800,000 traversals/sec.) Is one
     of Traversers vs. getRelationship iterators faster when getting all
     relationships of a type at depth 1?
   - Searching for relationships between A and B (but not writing to them)
     takes it from 20s to 91s. Yuck. Maybe edge indexing is the way to avoid
     that?
   - Incrementing a property on the root node for every A and B pair takes
     it from 20s to 61s (57s if it's all in one transaction). THAT seems
     weird. I imagine it has something to do with logging changes? Any way
     that can be turned off for a particular property (like it could be
     marked 'volatile' during a transaction or something)?

 I'm much more hopeful with the extra RAM but it's still kind of slow.
 Suggestions?

 Thanks,
 Jeff Klann
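
The figures in this message can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 15-item sample is representative of all 7000 items, and the variable names are made up for illustration.

```python
# Figures quoted in the message above.
traversals = 1_500_000   # relationships followed in the sample run
sample_items = 15        # items covered by the sample run
total_items = 7000       # items in the whole catalog (as quoted)
warm_seconds = 16.0      # second run, no property reads

# Observed throughput on the warm run.
rate_per_second = traversals / warm_seconds

# Extrapolate the warm-run time linearly from 15 items to all 7000.
full_run_hours = (total_items / sample_items) * warm_seconds / 3600

print("traversals/sec: %.0f" % rate_per_second)
print("estimated full traversal: %.2f hours" % full_run_hours)
```

Under those assumptions the rate comes out a little under 100k traversals/sec and the full traversal at roughly two hours, which matches the estimates in the message.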

 On Wed, Jul 28, 2010 at 11:20 AM, Jeff Klann jkl...@iupui.edu wrote:

  Hi, I have an algorithm running on my little server that is very very
  slow. It's a recommendation traversal (for all A and B in the catalog of
  items: for each item A, how many customers also purchased another item B
  in the catalog). It's processed 90 items in about 8 hours so far! Before I
  dive deeper into trying to figure out the performance problem, I thought
  I'd email the list to see if more experienced people have ideas.

  Some characteristics of my datastore: its size is pretty moderate for a
  database application. 7500 items, not sure how many customers and
  purchases (how can I find the size of an index?) but probably ~1 million
  customers. The relationshipstore + nodestore  500mb. (Propertystore is
  huge but I don't access it much in traversals.)

  The possibilities I see are:

  1) *Neo4J is just slow.* Probably not slower than Postgres which I was
  using previously, but maybe I need to switch to a distributed map-reduce
  db in the cloud and give up the very nice graph modeling approach? I
  didn't think this would be a problem, because my data size is pretty
  moderate and Neo4J is supposed to be fast.

  2) *I just need more RAM.* I definitely need more RAM - I have a measly
  1GB currently. But would this get my 20-day traversal down to a few hours?
  Doesn't seem like it'd have THAT much impact. I'm running Linux and
  nothing much else besides Neo4j, so I've got 650m physical RAM. Using 300m
  heap, about 300m memory-map.

  3) *There's some secret about Neo4J performance I don't know.* Is there
  something I'm unaware of that Neo4J is doing? When I access a property,
  does it load a chunk of properties I don't care about? For the current
  node/edge or others? I turned off log rotation and I commit after each
  item A. Are there other performance tips I might have missed?

  4) *My algorithm is inefficient.* It's a fairly naive algorithm and maybe
  there are some optimizations I can do. It looks like:

  For each item A in the catalog:
    For each customer C that has purchased that item:
      For each item B that customer purchased:
        Update the co-occurrence edge between A and B.
        (If the edge exists, add one to its weight. If it doesn't exist,
        create it with weight one.)

  This is O(n^2) worst case, but practically it'll be much better due to
  the sparseness of purchases. The large number of customers slows it down,
  though. The slowest part, I suspect, is the last line. It's a lot of
  finding and re-finding edges between As and Bs and updating the edge
  properties.
 I
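
One way to avoid finding and re-finding the same A-B edge in the innermost loop is to count co-occurrences in memory for one item A at a time and write each edge only once, when its weight is final. The sketch below is plain Python over a hypothetical `purchases` dict (customer id to set of item ids). It is not Neo4j API code, and nothing in the thread confirms that this is what resolved the problem.

```python
from collections import Counter, defaultdict

def co_occurrence(purchases):
    """purchases: dict of customer id -> set of item ids (hypothetical layout)."""
    # Invert the mapping once: item -> customers who bought it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    weights = {}
    for item_a, customers in buyers.items():
        counts = Counter()
        for customer in customers:
            for item_b in purchases[customer]:
                if item_b != item_a:
                    counts[item_b] += 1
        # Every (item_a, item_b) weight is final here, so a database-backed
        # version could create or update each edge exactly once at this point.
        weights[item_a] = dict(counts)
    return weights

purchases = {
    "c1": {"book", "lamp"},
    "c2": {"book", "lamp", "desk"},
    "c3": {"desk"},
}
result = co_occurrence(purchases)
print(result["book"])
```

The loop structure is the same as the pseudocode in the message. The only change is that the per-pair increment happens on an in-memory counter instead of on a stored edge.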
  

Re: [Neo4j] On twice?

2010-08-01 Thread Mattias Persson
I have the same issue

2010/8/1, Peter Neubauer peter.neuba...@neotechnology.com:
 Tom,
 let me sort that out tomorrow ...

 Cheers,

 /peter neubauer

 COO and Sales, Neo Technology

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 On Sun, Aug 1, 2010 at 10:21 AM, Tom Smith tas...@york.ac.uk wrote:
 Small point, somehow I seem to be on neo4j list twice, getting two of
 everything... twice...

 Tom










-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


[Neo4j] different behavior of python bindings and Java API

2010-08-01 Thread Thomas Fenzl
Hi all,

I'm writing some learning tests to understand the python bindings to
neo4j. The code can be found at bitbucket (
https://bitbucket.org/another_thomas/learnneo4jpy) if anyone is interested.
Still working on basic stuff, I found a difference between the behavior
of neo4j-python and neo4j as described in the Java API.

I did not expect the following to work:

with graph.transaction as tx:
    id = graph.node(name="foo").id
with graph.transaction as tx:
    print graph.node[id]

since, according to the Java API, transactions should roll back unless
explicitly marked successful.
The neo4j-python I used was head from svn, using jpype.

Is that different behavior on purpose?

Thanks,
Thomas
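
The behaviour described above is what you would see if the binding's `with` block marks the transaction successful on a clean exit and rolls back only when an exception escapes. Whether neo4j-python really works that way is not confirmed in this message. The sketch below is a toy stand-in written purely for illustration (the `ToyTransaction` class and its `log` list are hypothetical, not the binding's code), contrasting that convention with the explicit-success convention of the Java API.

```python
class ToyTransaction:
    """Hypothetical transaction that records what happens in self.log."""

    def __init__(self, auto_success):
        self.auto_success = auto_success   # commit on clean exit?
        self.marked_success = False
        self.log = []

    def success(self):
        # Mirrors the idea of explicitly marking a transaction successful.
        self.marked_success = True

    def __enter__(self):
        self.log.append("begin")
        return self

    def __exit__(self, exc_type, exc, tb):
        clean = exc_type is None
        if clean and (self.auto_success or self.marked_success):
            self.log.append("commit")
        else:
            self.log.append("rollback")
        return False  # never swallow exceptions

# Assumed Python-binding style: a clean exit commits.
with ToyTransaction(auto_success=True) as tx_auto:
    pass

# Java-API style: without an explicit success() the work is rolled back.
with ToyTransaction(auto_success=False) as tx_explicit:
    pass

# Java-API style with the explicit mark.
with ToyTransaction(auto_success=False) as tx_marked:
    tx_marked.success()

print(tx_auto.log, tx_explicit.log, tx_marked.log)
```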


[Neo4j] Attributes or Relationship Check During Traversal

2010-08-01 Thread Alex D'Amour
Hello all,

I have a question regarding traversals over a large graph when that
traversal depends on a discretely valued attribute of the nodes being
traversed.

As a small example, the nodes in my graph can have 2 states -- "on" and "off".
I'd like to traverse over paths that only consist of active ("on") nodes.
Since this state attribute can only take 2 values, I see two possible
approaches to implementing this:

1) Use node properties, and have the PruneEvaluator and filter Predicate
check to see whether the current endNode has a property called "on".

2) Create a state node which represents the "on" state. Have all nodes that
are in the "on" state have a relationship of type STATE_ON incoming from the
"on" node. Have the PruneEvaluator and filter Predicate check whether the
node has a single relationship of type STATE_ON, INCOMING.

Which is closer to what we might consider best practices for Neo4j? The
problem I see in implementation 1 is that the traversal has to hit the
property store, which could slow things down. The problem with 2 is that
there can be up to #nodes relationships coming from the "on" state node, and
making this more efficient by setting up a tree of "on" state nodes seems to
be manually replicating something that the indexing service has already
accomplished.

Also, how efficiently would each of these two implementations exploit
caching (or is this irrelevant?)?

Finally, would your answer change if we generalized this to a larger number
of categories?

Thanks,
Alex
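
To make the two options concrete, here is a toy sketch in plain Python (not the Neo4j traversal API). The graph, the `properties` dict and the `state_on_targets` set are all hypothetical stand-ins. It shows only that both checks plug into the same pruning step and visit the same nodes. It says nothing about how either one performs against Neo4j's stores or caches.

```python
from collections import deque

# Toy graph: adjacency lists plus per-node properties (all names hypothetical).
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
properties = {
    "a": {"on": True},
    "b": {"on": True},
    "c": {"on": False},
    "d": {"on": True},
}

# Approach 2 modelled as the set of nodes that the "on" state node points at.
state_on_targets = {node for node, props in properties.items() if props["on"]}

def traverse(start, is_active):
    """Breadth-first traversal that prunes nodes for which is_active is false."""
    if not is_active(start):
        return []
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in edges[node]:
            if neighbour not in seen and is_active(neighbour):
                seen.add(neighbour)
                queue.append(neighbour)
    return order

# Approach 1: look at a property on the node itself.
by_property = traverse("a", lambda n: properties[n].get("on", False))

# Approach 2: check for the relationship from the state node.
by_relationship = traverse("a", lambda n: n in state_on_targets)

print(by_property, by_relationship)
```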


Re: [Neo4j] neo4j-spatial and jython

2010-08-01 Thread sima

hi Tobias,
right now I can use different querying functions of geoNeo4j, like for example 
for bbox:
...
sq = searchIntersectWindow(Envelope(...))
spatialIndex.executeSearch(sq)
...
I was wondering if you know about querying on other features and tags (other
than coordinates) of the geodata. Is that possible, and if yes, how?

thanks,
sima




From: SIMA lotfi lotfis...@yahoo.com
To: Tobias Ivarsson [via Neo4J User List] 
ml-node+983357-111945109-341...@n3.nabble.com
Sent: Thursday, July 22, 2010 9:08:55
Subject: Re: [Neo4j] neo4j-spatial and jython


thanks Tobias, it is working now, have been learning a lot about jython and
geo-neo4j lately :)

best,
sima

--- On Wed, 21/7/10, Tobias Ivarsson [via Neo4J User List] 
ml-node+983357-111945109-341...@n3.nabble.com wrote:


From: Tobias Ivarsson [via Neo4J User List] 
ml-node+983357-111945109-341...@n3.nabble.com
Subject: Re: [Neo4j] neo4j-spatial and jython
To: sima lotfis...@yahoo.com
Date: Wednesday, 21 July, 2010, 9:29 AM


Your code has at least two (these are the obvious ones) issues: 

1. You are trying to write Java in Python. This is Java: 
ShapefileImporter importer = new ShapefileImporter(graphdb); 
In Python you don't declare types, and you don't use 'new' to create new 
instances. 

2. You are mixing the python bindings with pure Java libraries. 
Since the python bindings and neo4j-spatial don't have any knowledge about 
each other you cannot use them together, you will have to use the pure Java 
API for Neo4j instead of the python wrappers. Something like this: 

from org.neo4j.kernel import EmbeddedGraphDatabase 
from org.neo4j.gis.spatial import ShapefileImporter
graphdb = EmbeddedGraphDatabase( GRAPH_STORE_DIR ) 
importer = ShapefileImporter(graphdb) 
... 

Cheers, 
Tobias 

On Wed, Jul 21, 2010 at 5:21 AM, sima [hidden email] wrote: 


 
 so i have this jython code to import a shapefile using the 
 ShapefileImporter 
 in neo4j-spatial 
 -- 
 import sys 
 import neo4j 
 sys.path.append('home/sima/Downloads/neo4j-spatial/src/main/java/') 
 from org.neo4j.gis.spatial import ShaefileImporter 
 
sys.path.append('home/sima/Downloads/neo4j-spatial/target/neo4j-spatial-0.1-SNAPSHOT.jar')
) 

 graphdb = 
 neo4j.GraphDatabase(sys.path.append('home/sima/Downloads/scripts/neo4j); 
 ShapefileImporter importer = new ShapefileImporter(graphdb); 
 #importer.importShapefile(roads.shp, layer_roads); 
 graphdb.shutdown(); 
 - 
 but i got this error : 
 
  File OSM2Neo4j.py, line 58 
ShapefileImporter importer = new ShapefileImporter(graphdb); 
 ^ 
 SyntaxError: mismatched input 'importer' expecting NEWLINE 
 - 
 im guessing it still can't recognize the ShapefileImporter  but i m not 
 sure 
 why? any one knows why? 
 
 
 thanks, 
 sima 
 -- 
 View this message in context: 
http://neo4j-user-list.438527.n3.nabble.com/Neo4j-neo4j-spatial-and-jython-tp970428p983205.html
 Sent from the Neo4J User List mailing list archive at Nabble.com. 
 


-- 
Tobias Ivarsson [hidden email] 
Hacker, Neo Technology 
www.neotechnology.com 
Cellphone: +46 706 534857 



 
 



-- 
View this message in context: 
http://neo4j-user-list.438527.n3.nabble.com/Neo4j-neo4j-spatial-and-jython-tp970428p1014753.html
Sent from the Neo4J User List mailing list archive at Nabble.com.