[Neo4j] Tell Neo4j not to reuse IDs

2010-06-02 Thread Martin Neumann
Hi,

Is it somehow possible to tell Neo4j not to reuse IDs at all?

I'm running some experiments on Neo4j in which I add and delete nodes and
relationships. To make sure that I can repeat the same experiment, I create
a log containing the IDs of the nodes I want to delete. For a rerun to be
exact, each node I add has to get the same ID in every experiment.
If IDs can be reused, that is not always the case; that's why I need to
turn reuse off or work around it.

Hoping for your help,
cheers Martin


Re: [Neo4j] memory consumption

2010-07-04 Thread Martin Neumann
Do you use the BatchInserter or a normal transaction?
When using a normal transaction to insert huge amounts of data, I always
commit and open a new transaction every X items. This keeps each
transaction small and reduces the memory used.
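
For illustration, a minimal sketch of that pattern with the embedded Java
API (BATCH_SIZE and the item iterable are made-up placeholders, not part of
any real schema):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

public class ChunkedInsert {
    private static final int BATCH_SIZE = 10000; // placeholder; tune to taste

    public static void insert(GraphDatabaseService db, Iterable<String> names) {
        Transaction tx = db.beginTx();
        int count = 0;
        try {
            for (String name : names) {
                Node node = db.createNode();
                node.setProperty("name", name);
                if (++count % BATCH_SIZE == 0) {
                    tx.success();
                    tx.finish();       // commit this chunk and release its state
                    tx = db.beginTx(); // start a fresh, small transaction
                }
            }
            tx.success();              // commit the final partial chunk
        } finally {
            tx.finish();
        }
    }
}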

cheers Martin

On Sun, Jul 4, 2010 at 4:13 PM,  wrote:

> Hello,
>
> I'm currently working with the Neo4j database and want to insert a bunch
> of data into it.
>
> At the very beginning the program works quite well. But as more data is
> inserted into the database, the insertion runs more and more slowly, and
> I noticed that the program consumes a lot of memory. The problem occurs
> even though I split the input file into small pieces so that each run
> only tries to insert a small part of the data. That means that once there
> is already a lot of data in the database, the program consumes a lot of
> memory as soon as it begins, and the insertion is so slow that it seems
> it won't be able to finish.
>
> I wonder if there is some way to reduce the memory usage. Thanks in advance.
>
> Cheers,
> Qiuyan
>


Re: [Neo4j] Read-only transactions?

2010-07-28 Thread Martin Neumann
If you use the latest development version of Neo4j, you can do read
operations without a transaction. Especially for huge numbers of reads,
this speeds things up a lot.
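
For example, something like this sketch (the store path is a placeholder,
and on older releases even reads needed a transaction):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Relationship;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class ReadWithoutTx {
    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("var/graphdb");
        // Plain reads, with no beginTx()/finish() around them:
        for (Relationship rel : db.getReferenceNode().getRelationships()) {
            System.out.println(rel.getType().name());
        }
        db.shutdown();
    }
}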

cheers
Martin

On Wed, Jul 28, 2010 at 4:53 PM, Tim Jones  wrote:

> Hi,
>
> Is it possible to mark a transaction as being read-only? It's taking a
> while for
> my transaction to shut down, even though there are no writes to commit.
>
> Thanks,
> Tim
>


Re: [Neo4j] Stumped by performance issue in traversal - would take a month to run!

2010-08-01 Thread Martin Neumann
Hi,
there are some environmental optimizations you can do to speed things up.
Neo4j stores the data as a graph on disk, so a traversal translates into
moving the read head on the hard drive whenever the data is not in RAM. For
good performance you need a fast disk (a flash drive would do best).
Deleting lots of nodes can create holes in the DB, so read operations have
to cover a longer physical distance on the hard drive than necessary. The
only way I am aware of to reliably get rid of the holes is to copy the data
into a fresh, clean Neo4j DB.

cheers Martin



On Fri, Jul 30, 2010 at 8:10 PM, Jeff Klann  wrote:

> Hi, so I got 2GB more RAM and noticed that after adding some more memory
> map
> and increasing the heap space, my small query went from 6hrs to 3min. Quite
> reasonable!
>
> But the larger one that would take a month would still take a month. So
> I've
> been performance testing parts of it:
>
> The algorithm as in my first post showed *no* performance improvement on
> more RAM.
> But individual parts
>   - Traversing only (first three lines) was much speedier, but still seems
> slow. 1.5 million traversals (15 out of 7000 items) took 23sec. It shaves
> off a few seconds if I run this twice and time it the second time, or if I
> don't print any node properties as I traverse. (Does Neo4J load ALL the
> properties for a node if one is accessed?) Even with a double run and not
> reading node properties, it still takes 16sec, which would make traversal
> take two hours. I thought Neo4J was supposed to do ~1m traversals/sec;
> this is doing about 100k. Why? (And in fact on the other query it was
> getting about 800,000 traversals/sec.) Which is faster when getting all
> relationships of a type at depth 1: Traversers or getRelationship iterators?
>   - Searching for relationships between A & B (but not writing to them)
> takes it from 20s to 91s. Yuck. Maybe edge indexing is the way to avoid
> that?
>   - Incrementing a property on the root node for every A & B takes it from
> 20s to 61s (57s if it's all in one transaction). THAT seems weird. I
> imagine
> it has something to do with logging changes? Any way that can be turned off
> for a particular property (like it could be marked 'volatile' during a
> transaction or something)?
>
> I'm much more hopeful with the extra RAM but it's still kind of slow.
> Suggestions?
>
> Thanks,
> Jeff Klann
>
> On Wed, Jul 28, 2010 at 11:20 AM, Jeff Klann  wrote:
>
> > Hi, I have an algorithm running on my little server that is very very
> slow.
> > It's a recommendation traversal (for all A and B in the catalog of items:
> > for each item A, how many customers also purchased another item in the
> > catalog B). It's processed 90 items in about 8 hours so far! Before I
> dive
> > deeper into trying to figure out the performance problem, I thought I'd
> > email the list to see if more experienced people have ideas.
> >
> > Some characteristics of my datastore: its size is pretty moderate for a
> > database application. 7500 items, not sure how many customers and
> purchases
> > (how can I find the size of an index?) but probably ~1 million customers.
> > The relationshipstore + nodestore < 500mb. (Propertystore is huge but I
> > don't access it much in traversals.)
> >
> > The possibilities I see are:
> >
> > 1) *Neo4J is just slow.* Probably not slower than Postgres which I was
> > using previously, but maybe I need to switch to a distributed map-reduce
> db
> > in the cloud and give up the very nice graph modeling approach? I didn't
> > think this would be a problem, because my data size is pretty moderate
> and
> > Neo4J is supposed to be fast.
> >
> > 2) *I just need more RAM.* I definitely need more RAM - I have a measly
> > 1GB currently. But would this get my 20day traversal down to a few hours?
> > Doesn't seem like it'd have THAT much impact. I'm running Linux and
> nothing
> > much else besides Neo4j, so I've got 650m physical RAM. Using 300m heap,
> > about 300m memory-map.
> >
> > 3) *There's some secret about Neo4J performance I don't know.* Is there
> > something I'm unaware that Neo4J is doing? When I access a property, does
> it
> > load a chunk of properties I don't care about? For the current node/edge
> or
> > others? I turned off log rotation and I commit after each item A. Are
> there
> > other performance tips I might have missed?
> >
> > 4) *My algorithm is inefficient.* It's a fairly naive algorithm and maybe
> > there's some optimizations I can do. It looks like:
> >
> >> For each item A in the catalog:
> >>   For each customer C that has purchased that item:
> >>For each item B that customer purchased:
> >>   Update the co-occurrence edge between A&B.
> >>
> >   (If the edge exists, add one to its weight. If it doesn't exist,
> >> create it with weight one.)
> >>
> > This is O(n^2) worst case, but practically it'll be much better due to
> the
> > sparseness of purchases. The large number of customers slows it down,
> > though. The slowest part, I suspect, i

Re: [Neo4j] Stumped by performance issue in traversal - would take a month to run!

2010-08-05 Thread Martin Neumann
> >
> > - Martin, I'm confused a bit about SSDs. I read up on them after I read
> > your post. You said flash drives are best, but I read that even the
> > highest performing flash drives are about 30MB/s read, whereas modern
> > hard drives are at least 50MB/s. True SSDs claim to be 50MB/s too but
> > they're quite expensive. So why is a flash drive best? I could
> > definitely spring for one big enough to hold my db if it'd help a lot,
> > but it has that slower read speed. Does the faster seek time really
> > make that much of a difference? Any brands you'd recommend?
> >

Neo4j stores the data as a graph on the hard drive.

An example: e = (n1,n2)
e at location 1000
n1 at location 1
n2 at location 5

A traversal, assuming nothing is cached, would result in moving the head to
1, then to 1000, then back to 5.
A normal HD takes a while to move to each location before it can start to
read data. An SSD does not have these delays. If you read small amounts of
data spread widely over the storage, as in a traversal, SSDs are much
faster than HDs even if they are slower at streaming the data.
I don't have performance data on that myself, but I have heard rumors of a
speedup of around 20-40 times.
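
As a back-of-envelope illustration (the latencies below are typical
datasheet figures I am assuming, not measurements): with ~8 ms per random
seek on a spinning disk versus ~0.1 ms on flash, a workload of scattered
small reads is dominated by seek time, which is where a factor of that
order becomes plausible:

public class SeekCost {
    public static void main(String[] args) {
        double hddSeekMs = 8.0;      // assumed: head movement + rotation
        double ssdSeekMs = 0.1;      // assumed: no mechanical delay
        long randomReads = 1000000L; // e.g. one record per traversal step

        System.out.printf("HDD: %.0f s, SSD: %.0f s, ratio: %.0fx%n",
                randomReads * hddSeekMs / 1000.0,
                randomReads * ssdSeekMs / 1000.0,
                hddSeekMs / ssdSeekMs);
        // Prints: HDD: 8000 s, SSD: 100 s, ratio: 80x
        // Caches and sequential runs shrink this in practice, hence 20-40x.
    }
}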

cheers Martin


On Thu, Aug 5, 2010 at 9:02 PM, Jeff Klann  wrote:

> Thanks for the answers.
>
> Yes, I can do online updates of the datastore, but while this is in R&D I
> will need to rerun the main loop when I change the algorithm and just for
> personal benefit I don't want to wait hours to see the changes. Seems to be
> running acceptably now, though. However, I haven't benchmarked it against
> doing JOINS in Postgres. Are there any good performance stats out there?
> The
> speed is about the same as I'd expect from SQL.
>
> The graph will probably be nearly a complete graph in the end. The edges
> between orders will eventually store various stats on the relationships
> between pairs of items. It'd be nice if I could query an index for
> outgoing edges from nodes with certain properties. Is this possible? I'll
> have a look at the edge indexer component.
>
> Thanks,
> Jeff Klann
>
> On Mon, Aug 2, 2010 at 2:40 PM, David Montag
> <david.mon...@neotechnology.com> wrote:
>
> > Hi Jeff,
> >
> > Please see answers below.
> >
> > On Mon, Aug 2, 2010 at 5:47 PM, Jeff Klann  wrote:
> >
> > > Thank you all for your continued interest in helping me. I tweaked the
> > code
> > > more to minimize writes to the database and it now looks like:
> > > For each item A
> > >   For each customer that purchased A
> > >  For each item B (with id>A) that the customer purchased
> > > Increment (in memory) the weight of (A-B)
> > >   Write out the edges [(A-B):weight] to disk and clear the in-memory map
> > >
> > > This actually (if I'm not mistaken) covers all relationships and does
> > 7500
> > > items in about 45 minutes! Not too bad, especially due to (I think)
> > > avoiding
> > > edge-checking, and I think it's usable for my application, though it's
> > > still
> > > ~200k traversals/sec on average, which is a few times slower than I
> > hoped.
> > > I
> > > doubt that's much faster than a two-table join in SQL, though deeper
> > > traversals should show benefits.
> > >
> >
> > Do you need to do this computation on the full graph all the time? Maybe
> it
> > would be enough to do it once, and then update it when a customer buys
> > something? Usually, high one-time costs can be tolerated, and with Neo4j
> > you
> > can actually do the updating for a customer performing a purchase at
> > runtime
> > without performance problems.
> >
> >
> > >
> > > - David, thank you for your answers on traversers vs. getRelationships
> > and
> > > on property-loading. I imported some properties I don't really need,
> > > perhaps
> > > if I delete them it'll speed things up? Also I'm using the old
> > > Node.traverse(). How is the new framework better? I expect it has a
> nicer
> > > syntax, which I would like to try, but does it improve performance too?
> > >
> >
> > Well, depending on your setup you should be able to theoretically improve
> > performance compared to the old traversal framework. The old framework
> > keeps
> > track of visited nodes, so that you don't traverse to the same node
> twice.
> > This behavior is customizable in the new framework. Please see
> > http://wiki.neo4j.org/content/Traversal_Framework and check the
> Uniqueness
> > constraints. If you know exactly when to stop, then you should be able to
> > use Uniqueness.NONE, meaning that the framework does not keep track of
> > visited nodes, meaning that you could end up traversing in a cycle. In
> your
> > network however, you might know that you always traverse (item)
> <--BOUGHT--
> > (customer) --BOUGHT--> (item) --CORRELATION--> (item)*  and no further
> than
> > that, so then you know that you won't end up in a cycle. But yeah, then
> you
> > need to programmatically make sure you don't go too far. And I don't know
> > if this gives you any performance benefits whatsoever.
> >
> 
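
For reference, a rough sketch of the traversal David describes, using the
new framework with Uniqueness.NONE; exact package names and helper methods
moved around between snapshots of this era, so treat the imports as
assumptions:

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.traversal.TraversalDescription;
import org.neo4j.kernel.Traversal;
import org.neo4j.kernel.Uniqueness;

public class BoughtTogether {
    public static Iterable<Node> candidates(Node item) {
        TraversalDescription td = Traversal.description()
                .breadthFirst()
                .relationships(DynamicRelationshipType.withName("BOUGHT"))
                .uniqueness(Uniqueness.NONE)          // no visited-set bookkeeping
                .prune(Traversal.pruneAfterDepth(2)); // item <-- customer --> item
        // Note: this also yields the start item and the customers at depth 1;
        // the caller would still filter the results down to depth-2 nodes.
        return td.traverse(item).nodes();
    }
}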

Re: [Neo4j] Are Relationships Singleton?

2010-08-18 Thread Martin Neumann
Hi,

Neo4j is a multigraph: there can be multiple relationships between two
nodes, even with the same type and direction. Each relationship has a
unique ID, so you can tell them apart.
What is your error message in detail?
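
A tiny sketch showing this (the KNOWS type is made up):

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;

public class MultigraphDemo {
    public static void demo(GraphDatabaseService db) {
        Transaction tx = db.beginTx();
        try {
            Node a = db.createNode();
            Node b = db.createNode();
            // Two relationships with the same type and direction -- both are kept:
            Relationship r1 = a.createRelationshipTo(b,
                    DynamicRelationshipType.withName("KNOWS"));
            Relationship r2 = a.createRelationshipTo(b,
                    DynamicRelationshipType.withName("KNOWS"));
            System.out.println(r1.getId() != r2.getId()); // true: distinct IDs
            tx.success();
        } finally {
            tx.finish();
        }
    }
}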

cheers Martin


On Wed, Aug 18, 2010 at 9:54 PM, Marius Kubatz wrote:

> Hi!
>
> I have a very stupid question...
> Is it ensured that a relationship between two nodes stays a singleton,
> even if another relationship of the same type and direction is added
> between those nodes?
>
> Sometimes I get strange results when I delete nodes, that's why I ask.
>
> Thanks in advance and best regards,
> Marius


Re: [Neo4j] strange performance

2010-11-16 Thread Martin Neumann
Hey,

Is it the exact same query?
If not, it would be interesting to know how many relationships and nodes
have been traversed.
Depending on the graph topology, this can differ a lot.

cheers Martin Neumann


On Tue, Nov 16, 2010 at 10:46 AM, Martin Grimmer
wrote:

> Hello,
>
> we are currently evaluating Neo4j for one of our projects.
> Our tests showed some behaviour that seems strange to us:
>
>...
>at query 941/1000 (0,009410) with 1809 mb in 0,30 seconds -
>avg: 0,308696 - max: 54,533000
>at query 942/1000 (0,009420) with 1809 mb in 0,069000 seconds -
>avg: 0,308441 - max: 54,533000
>at query 943/1000 (0,009430) with 1809 mb in 0,057000 seconds -
>avg: 0,308174 - max: 54,533000
>at query 944/1000 (0,009440) with 1809 mb in 0,038000 seconds -
>avg: 0,307888 - max: 54,533000
>at query 945/1000 (0,009450) with 1809 mb in 0,63 seconds -
>avg: 0,308229 - max: 54,533000
>at query 946/1000 (0,009460) with 1809 mb in *60,997000 seconds*
>- avg: 0,372450 - max: 60,997000
>... fast again
>
> Our Neo4j database is about 9 GB in size, with about 130M arcs and 15M
> nodes.
> A query gets two strings as input; these are keys for the Lucene index
> service, used to look up two nodes. We then run an algorithm that
> determines the nodes adjacent to both of them (by a 2-step BFS),
> restricted to only one specific arc type.
> I don't know why some queries are so much (> x100) slower. Maybe you are
> able to help me?
>
> best regards
> --
>
> * Martin Grimmer *
> Developer, Semantic Web Project, IT
>
> Unister GmbH
> Barfußgässchen 11 | 04109 Leipzig
>
> Telefon: +49 (0)341 49288 5064
> martin.grim...@unister.de
> www.unister.de
>
> Authorized managing director: Thomas Wagner
> Leipzig District Court (Amtsgericht Leipzig), HRB: 19056
>