Re: [Neo] Representing graph using raphaelJS

2009-12-10 Thread Matt Johnston
I haven't tried raphael, but I am working on creating a way to display the
graph in SVG, which raphael uses. SVG could easily be used online (through
raphael or svgweb) and also converted to PDF for offline usage.

My approach is to write a simplified SVG library instead of using Batik and
display the graph in a radial plot like http://thejit.org/ and
http://people.ischool.berkeley.edu/~ping/gtv/

I have the plotting of a basic graph working correctly. Now I am focusing on
more complex graphs, ones where nodes have more than one relationship.

Matt

On Thu, Dec 10, 2009 at 7:33 AM, Laurent Laborde wrote:

> Friendly greetings !
>
> Anyone tried to display neo4j graph using http://raphaeljs.com/ ?
>
> Raphaël is a small JavaScript library that should simplify your work
> with vector graphics on the web. If you want to create your own
> specific chart or image crop and rotate widget, for example, you can
> achieve it simply and easily with this library.
>
> I'm very bad at JavaScript, so I'm not well placed to test the
> usefulness/efficiency of this library with a Neo4j graph.
> If anyone has tried it and can tell us how good or bad it is, that
> would be really awesome!
>
> BTW... I'm planning to try neo4j+jetty (an embeddable Java web server,
> used by Eclipse, Google App Engine, ...) and display graph results in a
> webpage instead of using a desktop GUI.
> RaphaelJS looks good... if it's not as good as it looks, I'll try a
> Java applet.
>
>
> --
> Laurent "ker2x" Laborde
> Sysadmin & DBA at http://www.over-blog.com/
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Ordering of getRelationships

2009-12-10 Thread Rick Bullotta
AFAIK, no guarantees on ordering, thus the reason for the indexing
utilities.
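Since iteration order is unspecified, one defensive pattern is to store an explicit creation timestamp on each relationship and sort on it when order matters. A minimal sketch, with plain Java collections standing in for the Neo4j API (the `Rel` record and its fields are illustrative; in Neo4j the timestamp would be a relationship property set at creation time):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class OrderedRelationships {
    // Stand-in for a relationship; in Neo4j the timestamp would be a
    // property set when the relationship is created.
    public record Rel(String child, long created) {}

    // Return relationships in explicit chronological order instead of
    // relying on the store's unspecified iteration order.
    public static List<Rel> inCreationOrder(Collection<Rel> rels) {
        List<Rel> sorted = new ArrayList<>(rels);
        sorted.sort(Comparator.comparingLong(Rel::created));
        return sorted;
    }
}
```

This way the order survives a close/reopen of the database, whatever order the store happens to return.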

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
Behalf Of Adam Rabung
Sent: Thursday, December 10, 2009 5:15 PM
To: Neo user discussions
Subject: [Neo] Ordering of getRelationships

Hi,
I was wondering if there are any guarantees about the order of relationships
that come out of Node.getRelationships?  In my test case (attached) it seems
they come out in "creation date, ascending" order, until you close and
re-open the database.  After the database is reopened, it seems like the
order is reversed. For example:
1. Create a parent
2. Add child c1
3. Add child c2
4. getRelationships on parent returns c1, c2
5. commit and finish
6. getRelationships on parent returns c1, c2
7. Shutdown and reopen
8. getRelationships on parent returns c2, c1

It's been a long week, and I'm convinced I'm messing something up here.

Thanks,
Adam

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Lucene Index Corruption?

2009-12-10 Thread Adam Rabung
Hi,
Thanks for all of the information.

Of course the #1 solution to this problem is to ensure all transactions are
closed before shutdown is called.  I am trying to implement some kind of
failsafe in the case that some unforeseen problem/bug causes transactions to
remain open.  What do you think of an optional parameter to shutdown() to
signify "Do your best to rollback+finish any open transactions"?  Or, could
you provide a code example of how to close an Iterable of TransactionImpls
using TransactionManager suspend and resume?
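For illustration, the failsafe being asked about might look roughly like the following. This is a stand-in sketch only: `Tx` here is a plain object, whereas with the real JTA `TransactionManager` each transaction would first be handed over to a single thread via `suspend()`/`resume()` before calling rollback, as discussed below.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a shutdown failsafe: track the open transaction per thread
// and roll back anything still open when shutdown is requested.
public class TxRegistry {
    public static final class Tx {
        public boolean open = true;
        public void rollback() { open = false; }
    }

    // One tracked transaction per thread (nested transactions not modeled).
    private final Map<Thread, Tx> openTxs = new ConcurrentHashMap<>();

    public Tx begin() {
        Tx tx = new Tx();
        openTxs.put(Thread.currentThread(), tx);
        return tx;
    }

    public void finish(Tx tx) {
        openTxs.remove(Thread.currentThread());
    }

    // Best-effort cleanup: roll back transactions a thread forgot to close.
    // Returns how many were rolled back.
    public int rollbackAllOpen() {
        int n = 0;
        for (Tx tx : openTxs.values()) {
            if (tx.open) { tx.rollback(); n++; }
        }
        openTxs.clear();
        return n;
    }
}
```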

Thank you,
Adam


On Thu, Dec 10, 2009 at 4:46 AM, Johan Svensson wrote:

> Hi Adam,
>
> On Wed, Dec 9, 2009 at 4:37 PM, Adam Rabung  wrote:
> > 1. This iterator will just prevent duplicates from being returned from
> > the iterator?  If there's a condition (bug in my code) that causes
> > shutdown w/ open transactions, will the Lucene indexes continue to
> > double until they're huge?
> >
>
> Yes, currently the index may get duplicate entries for each non clean
> shutdown.
>
> > 2. Would it be possible to detect this situation, and rebuild the
> > indexes?  I guess this is a losing cause if the app is regularly
> > corrupting the data.
> >
>
> Yes, it is possible to rebuild the index if everything that needs
> re-index is reachable and stored in the graph.
>
> > 3. Could you allow me to close transactions from different threads?
> > Yesterday, I wrote something that tracks tx opens and closes, and
> > could iterate through all open transactions and call finish() on them.
> >  But TransactionImpl.finish seems to assume the calling thread is the
> > creating thread, which is not the case here.
> >
>
> I would recommend not to do this. It is better to make sure each
> thread opening up a transaction manages that transaction and closes it
> properly. You can however change thread for a transaction using the
> TransactionManager's suspend() and resume() methods.
>
> > 4. Better yet, expose API for me to force-finish all open
> > transactions?  I'd rather have a botched transaction than a corrupt
> > index.
> >
>
> I do not think that is a good solution. What if some thread detected
> that the running transaction has to be rolled back? If you
> concurrently force commit from another thread then... Something that
> may be possible to do is to force rollback any running transactions in
> shutdown and wait for any transaction that has reached the
> prepared/committing state to complete.
>
> > 5. Is the only condition for this open transactions + a Lucene
> > shutdown (via shutdown() OR abrupt process termination)?  In further
> > testing, it seems I can't reproduce the problem w/ a clean or dirty
> > shutdown if all transactions are closed.
> >
>
> Correct, this will only happen after a crash/non clean shutdown
> marking the logical log as dirty (meaning recovery will be run on next
> startup replaying the logical log).
>
> > 6. I assume your iterator fix will make b11?  What are the chances the
> > root cause will be fixed in b11?  Do you have a tentative release date
> > for b11?
> >
>
> We are planning to release in mid-December, before Christmas. A real fix
> for this problem will not make it into that release so for now I would
> suggest to properly close the transactions in the thread that opens
> them.
>
> Regards,
> -Johan
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo] Ordering of getRelationships

2009-12-10 Thread Adam Rabung
Hi,
I was wondering if there are any guarantees about the order of relationships
that come out of Node.getRelationships?  In my test case (attached) it seems
they come out in "creation date, ascending" order, until you close and
re-open the database.  After the database is reopened, it seems like the
order is reversed. For example:
1. Create a parent
2. Add child c1
3. Add child c2
4. getRelationships on parent returns c1, c2
5. commit and finish
6. getRelationships on parent returns c1, c2
7. Shutdown and reopen
8. getRelationships on parent returns c2, c1

It's been a long week, and I'm convinced I'm messing something up here.

Thanks,
Adam
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo] SingleValueIndex & org.neo4j.api.core.NotFoundException: More than one relationship

2009-12-10 Thread Arin Sarkissian
Hey guys,

I'm pretty new to Neo4j, especially the indexing stuff.

Here's my situation. I want to be able to get nodes via one of their
properties; in this case let's say the Nodes represent a user & I want
to be able to grab a User's node via their username.

So, my initial attempt may be naive (no batching etc.), but I've been
reading through a large text file (CSV format: username, userid),
creating a node for each of these lines & indexing the username
component.

Now this needs to be unique (given that usernames are unique), so I've
been using the SingleValueIndex and have run into a problem: it looks
like the SingleValueIndex actually does allow multiple values for a
given lookup (ex: username = phatduckk). However, when trying to fetch a
Node from the index that has multiples (ie: username=phatduckk was
indexed twice), I get a NotFoundException with "More than one
relationship" as the message.

I have posted a skeleton piece of code that surfaces this problem over
at: http://gist.github.com/253553
This code does not make sure that the Node being indexed for a key is
always the same - it actually does the opposite: it tries to index
username=phatduckk w/ a different node each time.

The bit of code at http://gist.github.com/253569 does the opposite: it
tries to index the same node for the index entry username=phatduckk each
time.
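One way to keep a key single-valued regardless of input is to look the key up before indexing and reuse any node already bound to it. A stand-in sketch with a `HashMap` playing the role of the index and a `String` playing the role of the node (all names illustrative, not the Neo4j index API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the check-then-index pattern: a "single value" index stays
// single-valued only if the writer looks up the key before indexing.
public class UniqueUsernameIndex {
    private final Map<String, String> index = new HashMap<>();

    // Returns the node already bound to the username; indexes the new
    // node only when the key has not been seen before.
    public String getOrIndex(String username, String node) {
        return index.computeIfAbsent(username, k -> node);
    }
}
```

With this pattern, indexing "phatduckk" a second time with a different node is a no-op and the first binding wins.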

I also have a few random questions sprinkled in the code as comments
which illustrate my noobiness =)

Honestly, I'm not claiming this is a bug - I may be completely
misusing and misunderstanding the indexing functionality, but with my
limited experience it doesn't seem that SingleValueIndex is "single"
at all (hopefully I'm just wrong and made a dumb mistake).

Thanks for the help guys,
Arin
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Noob questions/comments

2009-12-10 Thread Craig Taverner
Hi Nick,

Your system sounds super-cool. It must be fun and challenging to deal with the
clustering issues as well. We have thought a little about this for future
projects but have no immediate needs.

I have some further comments regarding the balance you need to strike
between index search speed and the need to update the index as the data
streams in. We also have that need, which is probably the main reason I
still have my own home-grown index. Unlike most 'official' indexing
algorithms, I build my index from the ground up. This means the index comes
into existence as data is loaded, and only areas containing data actually
include index nodes. So, for example, if we have data coming in every few
milliseconds for a while, then there is a big gap of days or weeks, then the
data flows again, the gap will not contain any index information. The tree
also only grows as high as needed to cover the full range, so if only a
small range of data is indexed, the complete tree is small, and the top node
only covers that range. This bottom-up approach is very useful for streaming
data, since the index is driven by the stream. It also performs well if the
data coming in is already to some extent in the right order, because finding
the right index nodes to attach to is fast (often still in memory). Our
data streams are produced by people driving along roads, so the time and
location indexes are fast to keep up to date (each event is close to the
previous event in time and space).
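The bottom-up growth described above can be sketched as a sparse, fixed-multiple bucket tree: buckets exist only where events actually landed, and each level covers a wider range than the one below. This is an illustrative stand-in (the 10x step and event counts are assumptions; a real index would link graph nodes, not count events):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sparse bottom-up time index: only populated buckets exist, so a gap
// of days or weeks in the stream costs no index storage at all.
public class SparseTimeIndex {
    // level -> bucket key -> event count (a real index links nodes instead)
    private final List<SortedMap<Long, Integer>> levels = new ArrayList<>();
    private final long baseResolutionMs;

    public SparseTimeIndex(long baseResolutionMs, int height) {
        this.baseResolutionMs = baseResolutionMs;
        for (int i = 0; i < height; i++) levels.add(new TreeMap<>());
    }

    public void add(long timestampMs) {
        long bucket = timestampMs / baseResolutionMs;
        for (SortedMap<Long, Integer> level : levels) {
            level.merge(bucket, 1, Integer::sum);
            bucket /= 10; // each level covers 10x the range of the one below
        }
    }

    // Gaps with no data cost nothing: only populated buckets exist.
    public int bucketsAtLevel(int level) { return levels.get(level).size(); }
}
```

Because consecutive events usually fall in the same or an adjacent bucket, appending streamed data touches only the most recently used (likely cached) index nodes.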

I've investigated a few other indexing algorithms, but usually they are
designed to cope with bulk indexing data that is entirely un-ordered, and
that 're-index' step is usually quite costly.

I'm not using neo4j's own timeline index, but my quick glance at it makes it
look like it probably also works for streaming data.

We will continue to look for improvements to the balancing act, and we
should keep in touch on that question.

Regards, Craig

On Thu, Dec 10, 2009 at 5:02 PM, Rick Bullotta <
rick.bullo...@burningskysoftware.com> wrote:

> Awesome inputs, Craig.
>
> That's very similar to how we were considering it - the indexing part is
> the
> piece that I'm having a tough time getting my head around.  If I had a
> timeline-oriented index, I'd need to:
>
> a) Find the start point (either start time for a forward-oriented traversal
> or end time for a backward-oriented traversal)
>
> b) Traverse in time order
>
> I'll take a look at the timeline example to see if that can do what we
> need.
>
> Because we have multiple types of log entries, we were planning to have two
> nodes per entry - one for the "common" properties (timestamp, source,
> category, description, etc.) and one for the "content" (event specific
> properties) with a simple relationship between them.  In some cases, we
> might even have multiple event entries for one "common" node (e.g. a
> multi-valued sensor).  Neo deals with this nicely for us.
>
> We were then planning to maintain relationships from the common properties
> (which are what will generally be most effective for quickly reducing the
> size of the set of nodes from many millions to a few thousand) to the
> source/category nodes they refer to.  Sort of like a map/reduce scenario
> updated as nodes are created.
>
> We will probably then have a Hadoop or other mechanism to "crawl" and do a
> map/reduce on the event-specific content as needed to maintain
> indexes/relationships for specific queries.
>
> Lastly, we built an engine in Java to deal with SQL-like filtering,
> sorting,
> and aggregation of the event node data once we have the set of possible
> candidates based on time range, source, and category.
>
> I am confident we'll find a way to strike a balance between inbound
> performance (cost of setting up the relationships and indexes as
> nodes/events are added) versus query performance.
>
> Thanks again for sharing your experiences!
>
> Rick
>
> -Original Message-
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> On
> Behalf Of Craig Taverner
> Sent: Thursday, December 10, 2009 10:51 AM
> To: Neo user discussions
> Subject: Re: [Neo] Noob questions/comments
>
> I'd like to comment on the question of indexing event logs. We are also
> dealing with event logs and we query them based on time windows as well as
> other properties. In our case they are also located on a map, and
> categorized using event types. So we have three types of indexes in use:
>
>   - time index (using long values, similar to the TimelineIndex Johan
>   mentioned, but we coded one ourselves)
>   - spatial index (using two double values, based on the same mechanism for
>   the time index, but in 2D)
>   - category index (we just create a list of category nodes and link the
>   events to the relevant category)
>
> All of these indexes are simply nodes that the event stream nodes link to.
> For numerical indexes like the time and spatial indexes we use tree
> structures (not B-tree, but usually multiple 

Re: [Neo] Noob questions/comments

2009-12-10 Thread Rick Bullotta
Awesome inputs, Craig.

That's very similar to how we were considering it - the indexing part is the
piece that I'm having a tough time getting my head around.  If I had a
timeline-oriented index, I'd need to:

a) Find the start point (either start time for a forward-oriented traversal
or end time for a backward-oriented traversal)

b) Traverse in time order

I'll take a look at the timeline example to see if that can do what we need.

Because we have multiple types of log entries, we were planning to have two
nodes per entry - one for the "common" properties (timestamp, source,
category, description, etc.) and one for the "content" (event specific
properties) with a simple relationship between them.  In some cases, we
might even have multiple event entries for one "common" node (e.g. a
multi-valued sensor).  Neo deals with this nicely for us.

We were then planning to maintain relationships from the common properties
(which are what will generally be most effective for quickly reducing the
size of the set of nodes from many millions to a few thousand) to the
source/category nodes they refer to.  Sort of like a map/reduce scenario
updated as nodes are created.

We will probably then have a Hadoop or other mechanism to "crawl" and do a
map/reduce on the event-specific content as needed to maintain
indexes/relationships for specific queries.

Lastly, we built an engine in Java to deal with SQL-like filtering, sorting,
and aggregation of the event node data once we have the set of possible
candidates based on time range, source, and category.

I am confident we'll find a way to strike a balance between inbound
performance (cost of setting up the relationships and indexes as
nodes/events are added) versus query performance.

Thanks again for sharing your experiences!

Rick

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
Behalf Of Craig Taverner
Sent: Thursday, December 10, 2009 10:51 AM
To: Neo user discussions
Subject: Re: [Neo] Noob questions/comments

I'd like to comment on the question of indexing event logs. We are also
dealing with event logs and we query them based on time windows as well as
other properties. In our case they are also located on a map, and
categorized using event types. So we have three types of indexes in use:

   - time index (using long values, similar to the TimelineIndex Johan
   mentioned, but we coded one ourselves)
   - spatial index (using two double values, based on the same mechanism for
   the time index, but in 2D)
   - category index (we just create a list of category nodes and link the
   events to the relevant category)

All of these indexes are simply nodes that the event stream nodes link to.
For numerical indexes like the time and spatial indexes we use tree
structures (not B-tree, but usually multiple children per-parent, suitable
for uneven density data). The first level of the tree (closest to the data)
is chosen with a resolution close to common queries (in our case the events
occur many times per second, but queries are usually at second resolution,
so the first index is of second resolution).

You had a very important question about combined indexes, for example
querying on timestamp and category in the same query with high performance.
Currently we do not have need for that in our system, but we have
brainstormed a nice solution to this, so I thought I'd mention it here in
case it is useful. There are two options:

   - If one of the criteria is very limiting all the time, for example
   querying on time-window always returns a small set, then query that first
   and do a slow search for the linked categories. This adds no additional
   complexity to the database, but makes assumptions about the queries, and
   only performs well if these assumptions are true.
>    - Otherwise you can build a combined index by connecting the tree nodes
>    of one index to the tree nodes of another. In the case of time and category
>    indices, each of the nodes in the B-tree or multi-tree time index would be
>    connected to all the categories to which its underlying data nodes belong.
>    Then when traversing the time tree, you can test for both the time-window
>    constraints and the category constraints, and exit the search if either
>    fails. We have considered the possibility of building these structures on
>    demand, based on actual queries, so the first query that works with any two
>    constraints would search on one, and then build the combined index for both.
>    This allows subsequent searches to run very fast, without needing to build
>    all possible combinations of combined index (assuming many single property
>    indices exist).

> -  One aspect of our application will store nodes that can be
> considered similar to event logs.  There may be many thousands of these
> nodes per "event stream".  We would like to be able to traverse the entries
> in chronological order, very quickly.  We were considering the follo

Re: [Neo] Noob questions/comments

2009-12-10 Thread Craig Taverner
I'd like to comment on the question of indexing event logs. We are also
dealing with event logs and we query them based on time windows as well as
other properties. In our case they are also located on a map, and
categorized using event types. So we have three types of indexes in use:

   - time index (using long values, similar to the TimelineIndex Johan
   mentioned, but we coded one ourselves)
   - spatial index (using two double values, based on the same mechanism for
   the time index, but in 2D)
   - category index (we just create a list of category nodes and link the
   events to the relevant category)

All of these indexes are simply nodes that the event stream nodes link to.
For numerical indexes like the time and spatial indexes we use tree
structures (not B-tree, but usually multiple children per-parent, suitable
for uneven density data). The first level of the tree (closest to the data)
is chosen with a resolution close to common queries (in our case the events
occur many times per second, but queries are usually at second resolution,
so the first index is of second resolution).

You had a very important question about combined indexes, for example
querying on timestamp and category in the same query with high performance.
Currently we do not have need for that in our system, but we have
brainstormed a nice solution to this, so I thought I'd mention it here in
case it is useful. There are two options:

   - If one of the criteria is very limiting all the time, for example
   querying on time-window always returns a small set, then query that first
   and do a slow search for the linked categories. This adds no additional
   complexity to the database, but makes assumptions about the queries, and
   only performs well if these assumptions are true.
   - Otherwise you can build a combined index by connecting the tree nodes
   of one index to the tree nodes of another. In the case of time and category
   indices, each of the nodes in the B-tree or multi-tree time index would be
   connected to all the categories to which its underlying data nodes belong.
   Then when traversing the time tree, you can test for both the time-window
   constraints and the category constraints, and exit the search if either
   fails. We have considered the possibility of building these structures on
   demand, based on actual queries, so the first query that works with any two
   constraints would search on one, and then build the combined index for both.
   This allows subsequent searches to run very fast, without needing to build
   all possible combinations of combined index (assuming many single property
   indices exist).
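The second option above can be sketched as a time tree whose nodes are annotated with the categories occurring beneath them, so a traversal prunes a branch as soon as either the time window or the category test fails. All names here are illustrative, and a real traversal would still filter individual events at the leaves:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of a combined time+category index: each time-tree node records
// which categories occur underneath it, enabling pruning on both
// constraints during a single traversal.
public class CombinedIndex {
    public static final class TreeNode {
        public final long start, end;         // time range covered by branch
        public final Set<String> categories;  // categories present underneath
        public final List<TreeNode> children = new ArrayList<>();
        public final List<Long> events = new ArrayList<>(); // leaf payload
        public TreeNode(long start, long end, Set<String> cats) {
            this.start = start; this.end = end; this.categories = cats;
        }
    }

    public static void query(TreeNode node, long from, long to,
                             String category, List<Long> out) {
        // Prune on either constraint before descending.
        if (node.end < from || node.start > to) return;
        if (!node.categories.contains(category)) return;
        out.addAll(node.events);
        for (TreeNode child : node.children) {
            query(child, from, to, category, out);
        }
    }
}
```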

> -  One aspect of our application will store nodes that can be
> considered similar to event logs.  There may be many thousands of these
> nodes per "event stream".  We would like to be able to traverse the entries
> in chronological order, very quickly.  We were considering the following
> design possibilities:
>
> o   Simply create a node for each "stream" and a node for each entry, with
> a
> relationship between the stream and the entry, then implement our own sort
> routine
>

Our approach is to create a node for each entry, and index using time and
spatial indices. The first level of index is another stream of data, ordered
by the relevant property, and traversable in that order (eg. time order).

> o   Similar to the above, but create a node for each "day", and manage
> relationships to allow traversal by stream and/or day
>

In our approach, each level in the index tree represents a higher level of
granularity. We go up in fixed steps (multiples). A B-tree steps 2X. We tend
to step 10X, because that gives isosceles pyramid trees. But you might
prefer to step in known temporal quantities, seconds, minutes, hours, days,
weeks, months, etc. That will improve search performance if your common
queries are exact multiples of the different index levels.
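Stepping in known temporal quantities might look like the following (the granularity table is illustrative):

```java
public class TimeBuckets {
    // Calendar-aligned granularities in milliseconds, finest first:
    // second, minute, hour, day, week.
    static final long[] LEVELS = {
        1_000L, 60_000L, 3_600_000L, 86_400_000L, 604_800_000L
    };

    // Key of the bucket containing the timestamp at a given level.
    // Queries whose window aligns with a level (e.g. whole hours) touch
    // few index nodes at that level.
    public static long bucketKey(long tsMillis, int level) {
        return tsMillis / LEVELS[level];
    }
}
```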


> o   Create a node for each stream, a node for each entry and treat the
> entries as a forward-only linked list using relationships between the
> entries (and of course a relationship between the stream and the "first"
> entry)
>

We tend to create relationships for all common query or traversal paths,
with different relationship types in all cases. So traversing the original
data would use 'next'. Traversing down the index would be 'child', or
perhaps 'index-child' if there is ambiguity. etc.

> -  Anyone used any kind of intermediate index or other approach to
> bridge multiple Neo instances?
>

Hmm... I think it was this question that got me started on the combined
index discussion above, but now that I re-read it, I see it has nothing to
do with combined indices. I've thought a bit about bridging indices, but
have nothing really useful to offer here. Sorry. Hope the long discussion
above still has some value :-(
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailm

Re: [Neo] Noob questions/comments

2009-12-10 Thread Rick Bullotta
Hi, Jakub.

Yes, it would be our intent to open source any viewer that we end up
creating.

Rick

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
Behalf Of Jakub Kotowski
Sent: Thursday, December 10, 2009 6:46 AM
To: Neo user discussions
Subject: Re: [Neo] Noob questions/comments

Hi Rick,


Rick Bullotta wrote:
> -  Any GUI tools for viewing/navigating the graph structure?  We are
> prototyping one in Adobe Flex, curious if there are others.
> 

do you plan to make the source code available? I was considering using
Flex and one of the Flash information visualization libraries too. At
the moment I'm using the JavaScript InfoVis Toolkit (http://thejit.org/)
to display parts of a graph stored in Neo.


Regards,
Jakub





___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] I/O load in Neo during traversals

2009-12-10 Thread Rick Bullotta
Thanks for the great info, Johan.  To answer one of your questions, yes, we
are running on Windows (currently a requirement).

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
Behalf Of Johan Svensson
Sent: Thursday, December 10, 2009 6:56 AM
To: Neo user discussions
Subject: Re: [Neo] I/O load in Neo during traversals

This probably (again) has to do with OS and file system in use. If the
string store is larger than assigned memory for it (see
http://wiki.neo4j.org/content/Neo_Performance_Guide#Tuning_memory_mapped_I.2FO)
depending on access patterns the memory will be spread out over the
store (so a read/write can be read/written to memory instead of the
actual file).

If you only do read operations and you start to do reads on an area of
the file that has not been cached you will first start hitting the
file. If requests continue to hit that area some other part of the
file (that has been cached) with lower request rate will be swapped
out and the free memory used for the new "hot spot".

During this swap a request will be made to the system to "sync all
changes in this buffer to disk" then unmap and use the free memory to
memory-map the new region. If no changes have been made to the buffer,
the OS should just release it, but I have seen this is not always the
case (sometimes resulting in full write out of the buffer). If you are
running on Windows or have turned memory mapped buffers off the full
buffer will always be written out (this can be fixed and is on my todo
list).

-Johan

On Wed, Dec 9, 2009 at 10:56 PM, Rick Bullotta
 wrote:
> When doing some large traversal testing (no writes/updates), I noticed that
> the neostore.propertystore.db.strings file was seeing a lot of read I/O (as
> expected) but also a huge amount of write I/O (almost 5X the read I/O rate).
> Out of curiosity, what is the write activity that needs to occur when doing
> traversals?
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] LuceneIndexBatchInserter doubt

2009-12-10 Thread Mattias Persson
To continue this thread in the user list:

Thanks Núria, I've gotten your sample code/files and I'm running it
now to try to reproduce your problem.

2009/12/9 Núria Trench :
> I have finished uploading the 4 csv files. You'll see an e-mail with the
> other 3 csv files packed in a rar file.
> Thanks,
>
> Núria.
>
> 2009/12/9 Núria Trench 
>>
>> Yes, you are right. But there is one csv file that is too big to be packed
>> with other files and I am reducing it.
>> I am sending the other files now.
>>
>> 2009/12/9 Mattias Persson 
>>>
>>> By the way, you might consider packing those files (with zip or tar.gz
>>> or something) cause they will shrink quite well
>>>
>>> 2009/12/9 Mattias Persson :
>>> > Great, but I only got the images.csv file... I'm starting to test with
>>> > that at least
>>> >
>>> > 2009/12/9 Núria Trench :
>>> >> Hi again,
>>> >>
>>> >> The errors show up after two csv files have been parsed to create all
>>> >> the nodes, at the moment of calling the method "getSingleNode" to look
>>> >> up the tail and head nodes when creating all the edges by reading the
>>> >> other two csv files.
>>> >>
>>> >> I am sending with Sprend the four csv files that will help you to
>>> >> trigger the index behaviour.
>>> >>
>>> >> Thank you,
>>> >>
>>> >> Núria.
>>> >>
>>> >> 2009/12/9 Mattias Persson 
>>> >>>
>>> >>> Hmm, I've no idea... but does the errors show up early in the process
>>> >>> or do you have to insert a LOT of data to trigger it? In such case
>>> >>> you
>>> >>> could send me a part of them... maybe using http://www.sprend.se ,
>>> >>> WDYT?
>>> >>>
>>> >>> 2009/12/9 Núria Trench :
>>> >>> > Hi Mattias,
>>> >>> >
>>> >>> > The data isn't confidential but the files are very big (5,5 GB).
>>> >>> > How can I send you this data?
>>> >>> >
>>> >>> > 2009/12/9 Mattias Persson 
>>> >>> >>
>>> >>> >> Yep I got the java code, thanks. Yeah if the data is confidential or
>>> >>> >> sensitive you can just send me the formatting, else consider
>>> >>> >> sending
>>> >>> >> the files as well (or a subset if they are big).
>>> >>> >>
>>> >>> >> 2009/12/9 Núria Trench :
>>> >>> >> >
>>> >>> >> >
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> Mattias Persson, [matt...@neotechnology.com]
>>> >>> >> Neo Technology, www.neotechnology.com
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Mattias Persson, [matt...@neotechnology.com]
>>> >>> Neo Technology, www.neotechnology.com
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Mattias Persson, [matt...@neotechnology.com]
>>> > Neo Technology, www.neotechnology.com
>>> >
>>>
>>>
>>>
>>> --
>>> Mattias Persson, [matt...@neotechnology.com]
>>> Neo Technology, www.neotechnology.com
>>
>
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo] Representing graph using raphaelJS

2009-12-10 Thread Laurent Laborde
Friendly greetings !

Anyone tried to display neo4j graph using http://raphaeljs.com/ ?

Raphaël is a small JavaScript library that should simplify your work
with vector graphics on the web. If you want to create your own
specific chart or image crop and rotate widget, for example, you can
achieve it simply and easily with this library.

I'm very bad at JavaScript, so I'm not well placed to test the
usefulness/efficiency of this library with a Neo4j graph.
If anyone has tried it and can tell us how good or bad it is, that
would be really awesome!

BTW... I'm planning to try neo4j+jetty (an embeddable Java web server,
used by Eclipse, Google App Engine, ...) and display graph results in a
webpage instead of using a desktop GUI.
RaphaelJS looks good... if it's not as good as it looks, I'll try a
Java applet.


-- 
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] I/O load in Neo during traversals

2009-12-10 Thread Johan Svensson
This probably (again) has to do with the OS and file system in use. If the
string store is larger than the memory assigned to it (see
http://wiki.neo4j.org/content/Neo_Performance_Guide#Tuning_memory_mapped_I.2FO),
then depending on access patterns the mapped memory will be spread out over
the store (so reads/writes can go to memory instead of the actual file).

If you only do read operations and you start to do reads on an area of
the file that has not been cached, you will first start hitting the
file. If requests continue to hit that area, some other part of the
file (that has been cached) with a lower request rate will be swapped
out and the freed memory used for the new "hot spot".

During this swap a request is made to the system to "sync all changes in
this buffer to disk", then the buffer is unmapped and the freed memory is
used to memory-map the new region. If no changes have been made to the
buffer the OS should just release it, but I have seen that this is not
always the case (sometimes resulting in a full write-out of the buffer). If
you are running on Windows or have turned memory-mapped buffers off, the
full buffer will always be written out (this can be fixed and is on my todo
list).
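For reference, the mapped-memory sizes mentioned above are set per store
file via the Neo configuration parameters described in the performance
guide linked above. A sketch with illustrative values (not recommendations;
check the guide for sizing advice):

```properties
# How much memory to map for each store file; raise the strings value if
# neostore.propertystore.db.strings is the file being hit hardest.
neostore.nodestore.db.mapped_memory=25M
neostore.relationshipstore.db.mapped_memory=50M
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.strings.mapped_memory=130M

# Set to false to turn memory-mapped buffers off entirely (as noted above,
# that forces full buffer write-outs).
use_memory_mapped_buffers=true
```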

-Johan

On Wed, Dec 9, 2009 at 10:56 PM, Rick Bullotta
 wrote:
> When doing some large traversal testing (no writes/updates), I noticed that
> the neostore.propertystore.db.strings file was seeing a lot of read I/O (as
> expected) but also a huge amount of write I/O (almost 5X the read I/O rate).
> Out of curiosity, what is the write activity that needs to occur when doing
> traversals?
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Noob questions/comments

2009-12-10 Thread Todd Stavish
For Flex, I've used Flare to display graphs. Flex Builder will
auto-generate the client-side code from a web service. Click on the layouts
tab to see what the graphs look like.

http://flare.prefuse.org/demo






On Thu, Dec 10, 2009 at 6:45 AM, Jakub Kotowski  wrote:
> Hi Rick,
>
>
> Rick Bullotta schrieb:
>> -          Any GUI tools for viewing/navigating the graph structure?  We are
>> prototyping one in Adobe Flex, curious if there are others.
>>
>
> do you plan to make the source code available? I was considering using
> Flex and one of the Flash information visualization libraries too. At
> the moment I'm using the JavaScript InfoVis Toolkit (http://thejit.org/)
> to display parts of a graph stored in Neo.
>
>
> Regards,
> Jakub
>
>
>
>
>
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Noob questions/comments

2009-12-10 Thread Jakub Kotowski
Hi Rick,


Rick Bullotta schrieb:
> -  Any GUI tools for viewing/navigating the graph structure?  We are
> prototyping one in Adobe Flex, curious if there are others.
> 

do you plan to make the source code available? I was considering using
Flex and one of the Flash information visualization libraries too. At
the moment I'm using the JavaScript InfoVis Toolkit (http://thejit.org/)
to display parts of a graph stored in Neo.


Regards,
Jakub





___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Troubleshooting performance/memory issues

2009-12-10 Thread Johan Svensson
Hi,

The logical log stores all transactions and is rotated at a
configurable size (default 10MB). Once the log is rotated, data will
be flushed to the store files, so we only have to perform recovery on
the "latest" log file in case of a crash. Any running transaction will
be copied from the old log to the new log during rotation.

A transaction commit will result in a flush to the logical log file
(and global transaction log) if the transaction is a "write
transaction" (read only transactions do not touch the log files).

You mentioned a 10 times performance increase going from a chunk size of 1k
to 10k nodes, and that is quite a lot. I could understand it if you went
from 100 to 10k (usually we max out write performance somewhere around
20k-100k write operations/transaction, but it depends on hardware).
What kind of system is this running on (OS, file system (ext4?),
hardware)?
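To make the chunking concrete, here is a minimal sketch of the
commit-every-N-operations loop. The Neo-specific calls
(NeoService.beginTx(), Transaction.success()/finish()) are shown only as
comments; the method itself just counts commits, so the control flow can be
checked without a database. Class and method names are made up for
illustration.

```java
// Sketch of the chunked-insert pattern: wrap every chunkSize write
// operations in one transaction instead of one transaction per operation.
public class ChunkedInsert {

    /** Returns the number of transaction commits needed to insert
     *  totalItems in chunks of chunkSize items each. */
    public static int insertInChunks(int totalItems, int chunkSize) {
        int commits = 0;
        int inCurrentChunk = 0;
        // Transaction tx = neo.beginTx();            // real Neo code
        for (int i = 0; i < totalItems; i++) {
            // ... create node/relationship here ...  // real Neo code
            inCurrentChunk++;
            if (inCurrentChunk == chunkSize) {
                // tx.success(); tx.finish();         // commit this chunk
                // tx = neo.beginTx();                // start the next one
                commits++;
                inCurrentChunk = 0;
            }
        }
        if (inCurrentChunk > 0) {
            // tx.success(); tx.finish();             // commit the remainder
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        // 250,000 items in 10k chunks -> 25 commits instead of 250.
        System.out.println(insertInChunks(250_000, 10_000));
    }
}
```

The same loop with chunkSize 1_000 would commit 250 times, which is where
the per-commit log flush overhead discussed above comes from.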

Regards,
-Johan

On Wed, Dec 9, 2009 at 9:59 PM, Rick Bullotta
 wrote:
> FYI, we experimented with different heap size (1GB), along with different
> "chunk sizes", and were able to eliminate the heap error and get about a 10X
> improvement in insert speed.  It would be helpful to better understand the
> interactions of the various Neo startup parameters, transaction buffers, and
> so on, and their impact on performance.  I read the performance guidelines,
> which was some help, but perhaps some additional scenario-based
> recommendations might help (frequent updates/frequent access, infrequent
> update/frequent access, burst mode update vs steady update rate, etc...).
>
> Learning more about Neo every hour!
>
> -Original Message-
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
> Behalf Of Rick Bullotta
> Sent: Wednesday, December 09, 2009 2:57 PM
> To: 'Neo user discussions'
> Subject: [Neo] Troubleshooting performance/memory issues
>
> Hi, all.
>
>
>
> When trying to load a few hundred thousand nodes & relationships (chunking
> it in groups of 1000 nodes or so), we are getting an out of memory heap
> error after 15-20 minutes or so.  No big deal, we expanded the heap settings
> for the JVM.  But then we also noticed that the nioneo_logical_log.xxx file
> was continuing to grow, even though we were wrapping each 1000 node inserts
> in their own transaction (there is no other transaction active) and
> committing w/success and finishing each group of 1000.    Periodically
> (seemingly unrelated to our transaction finishing), that file shrinks again
> and the data is flushed to the other neo propertystore and relationshipstore
> files.  I just wanted to check if that was normal behavior, or if there is
> something wrong with the way we (or Neo) are handling the transactions, and thus
> the reason we hit an out-of-memory error.
>
>
>
> Thanks,
>
>
>
> Rick
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Noob questions/comments

2009-12-10 Thread Johan Svensson
Hi Rick,

On Wed, Dec 9, 2009 at 4:54 PM, Rick Bullotta
 wrote:
> Questions:
>
> -          If you delete the reference node (id = 0), how can you recreate
> it?

Being able to delete the reference node can sometimes be a problem, but
I have not come up with a good solution for it yet. You can set
a different reference node using a non-standard API
(NeoModule.setReferenceNode). You can also use the batch inserter to
recreate it by invoking BatchInserter.createNode( 0, null ), or start
with a new db.

>
> -          If you have a number of "loose" or disjoint graphs structured as
> trees with a single root node, is there a best practice for
> tracking/iterating only the top level node(s) of these disjoint graphs?  Is
> relating them to the reference node and doing a first level traversal the
> best way?

I would say relating them back to the reference node (or a
sub-reference node, see
http://wiki.neo4j.org/content/Design_Guide#Subreferences) is the best
way.

>
> -          We would like to treat our properties as slightly more complex
> than a simple type (they might have a last modified date, validity flag, and
> so on) - given the choice between adding properties to track this state or
> using nodes and relationships for these entities, what are the pros and cons
> of each approach?

It depends on the use-cases that involve the specific property. If
you just need to get the last modified date, such as
"entity.getLastModified()", a plain property is a good solution. If you
have a use-case like "service.giveMeAllEntitiesOrderedByDate()",
then using relationships and nodes may be a better idea.

>
> -          One aspect of our application will store nodes that can be
> considered similar to event logs.  There may be many thousands of these
> nodes per "event stream".  We would like to be able to traverse the entries
> in chronological order, very quickly.  We were considering the following
> design possibilities:
>
> o   Simply create a node for each "stream" and a node for each entry, with a
> relationship between the stream and the entry, then implement our own sort
> routine
>
> o   Similar to the above, but create a node for each "day", and manage
> relationships to allow traversal by stream and/or day
>
> o   Create a node for each stream, a node for each entry and treat the
> entries as a forward-only linked list using relationships between the
> entries (and of course a relationship between the stream and the "first"
> entry)

Again it depends on the use-case and what your data looks like. I can see
that all of the above suggested designs would work, so pick the one
that solves the problem best. For example, if the application always
accesses these event streams by day, having a "day"
node would be a good idea. Also have a look at the timeline utility
(http://components.neo4j.org/index-util/apidocs/org/neo4j/util/timeline/TimelineIndex.html).
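For the third option above (the forward-only linked list), the layout can
be modelled with plain objects; in this sketch `next` stands in for a NEXT
relationship between entry nodes and `first` for the stream-to-first-entry
relationship (the relationship names are illustrative, not Neo API):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-object model of the linked-list event-stream design: the stream
// holds a reference to the first entry and each entry links forward to the
// next one. Traversal in chronological order is then a simple walk along
// the chain, with no sorting needed.
public class EventStream {
    static final class Entry {
        final long timestamp;
        Entry next;                    // stands in for a NEXT relationship
        Entry(long timestamp) { this.timestamp = timestamp; }
    }

    private Entry first;               // stands in for FIRST_ENTRY
    private Entry last;

    /** Appends an event; entries arrive in chronological order. */
    public void append(long timestamp) {
        Entry e = new Entry(timestamp);
        if (first == null) first = e; else last.next = e;
        last = e;
    }

    /** Walks the chain, i.e. a traversal along NEXT relationships. */
    public List<Long> timestampsInOrder() {
        List<Long> out = new ArrayList<>();
        for (Entry e = first; e != null; e = e.next) out.add(e.timestamp);
        return out;
    }
}
```

The trade-off versus the "day node" design is that random access by date
requires walking the chain, while appending and in-order reads are cheap.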

> -          Has the fact that the node id is an "int" rather than a "long"
> been an issue in any implementations?  Are node id's reused if deleted (I
> suspect not, but just wanted to confirm).

Node and relationship ids are reused. Currently they will not be
reused within the same "session", so you have to do a clean shutdown and
then start a new NeoService for them to be reused (this may change to
reuse within the same session in a later release). Node (and relationship)
ids are exposed through the API as long, so think of them as a long. You
are however correct that they are currently stored in a 32-bit format,
but this will change as machines get more powerful (right now we are
discussing adding 1-3 bits, going from ~4B to 8B, 16B or 32B depending
on store files).

>
> -          Any whitepaper/best practices for high availability/load-balanced
> scenarios?  We were considering using a message queue to send "deltas"
> around between nodes or something similar.

We are currently working on our "Neo4j level" HA solution (master-slave
replication, but you can write to slaves). Using a message queue to
send "domain layer deltas" around is how we have done this
successfully in the past, so while you wait for the real HA solution I
would say that's the way to go.

> -          We'll be hosting Neo inside a servlet engine.  Plan was to start
> up Neo within the init method of an autoloading servlet.  Any other
> recommendations/suggestions?  Best practice for ensuring a clean shutdown?
>

Make sure you can redirect new requests (so they will not start a
transaction) once you intend to invoke NeoService.shutdown. Then wait
for all current requests to complete to ensure a clean shutdown. We
are hoping to fix this problem by making shutdown block new
transactions, roll back any running transactions, and wait for
transactions that are in the committing state.
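A minimal sketch of that drain-then-shutdown pattern using only
java.util.concurrent; the class and method names here are made up for
illustration. A servlet filter would call tryEnter()/exit() around each
request, and the servlet's destroy hook would call shutdown() before
invoking NeoService.shutdown():

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Gate that rejects new requests once shutdown begins and waits for
// in-flight requests to drain before the database may be closed.
public class ShutdownGate {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    /** Returns false if the service is shutting down; the caller should
     *  redirect the request instead of starting a transaction. */
    public boolean tryEnter() {
        if (closed.get()) return false;
        inFlight.incrementAndGet();
        // Re-check to close the race with a concurrent shutdown():
        if (closed.get()) { inFlight.decrementAndGet(); return false; }
        return true;
    }

    /** Must be called exactly once for every successful tryEnter(). */
    public void exit() { inFlight.decrementAndGet(); }

    /** Flips the gate, then blocks until all in-flight requests are done. */
    public void shutdown() {
        closed.set(true);
        while (inFlight.get() > 0) {
            try { Thread.sleep(10); }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        // ... now it is safe to call neoService.shutdown() ...
    }
}
```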

> -          Anyone used any kind of intermediate index or other approach to
> bridge multiple Neo instances?

Not sure I understand what you mean, could you explain some more?

>
> -          Any GUI tools for viewi

Re: [Neo] Lucene Index Corruption?

2009-12-10 Thread Johan Svensson
Hi Adam,

On Wed, Dec 9, 2009 at 4:37 PM, Adam Rabung  wrote:
> 1. This iterator will just prevent duplicates from being returned from
> the iterator?  If there's a condition (bug in my code) that causes
> shutdown w/ open transactions, will the Lucene indexes continue to
> double until they're huge?
>

Yes, currently the index may get duplicate entries for each non-clean shutdown.

> 2. Would it be possible to detect this situation, and rebuild the
> indexes?  I guess this is a losing cause if the app is regularly
> corrupting the data.
>

Yes, it is possible to rebuild the index if everything that needs
re-index is reachable and stored in the graph.

> 3. Could you allow me to close transacations from different threads?
> Yesterday, I wrote something that tracks tx opens and closes, and
> could iterate through all open transactions and call finish() on them.
>  But TransactionImpl.finish seems to assume the calling thread is the
> creating thread, which is not the case here.
>

I would recommend not doing this. It is better to make sure that each
thread opening a transaction manages that transaction and closes it
properly. You can however move a transaction between threads using the
TransactionManager's suspend() and resume() methods.

> 4. Better yet, expose API for me to force-finish all open
> transactions?  I'd rather have a botched transaction than a corrupt
> index.
>

I do not think that is a good solution. What if some thread detected
that the running transaction has to be rolled back? If you
concurrently force commit from another thread then... Something that
may be possible to do is to force rollback any running transactions in
shutdown and wait for any transaction that has reached the
prepared/committing state to complete.

> 5. Is the only condition for this open transactions + a Lucene
> shutdown (via shutdown() OR abrubt process termination)?  In further
> testing, it seems I can't reproduce the problem w/ a clean or dirty
> shutdown if all transactions are closed.
>

Correct, this will only happen after a crash/non-clean shutdown that
marks the logical log as dirty (meaning recovery will be run on the next
startup, replaying the logical log).

> 6. I assume your iterator fix will make b11?  What are the chances the
> root cause will be fixed in b11?  Do you have a tentative release date
> for b11?
>

We are planning to release in mid-December/before Christmas. A real fix
for this problem will not make it into that release, so for now I would
suggest properly closing transactions in the thread that opens
them.

Regards,
-Johan
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user