Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread rick . bullotta
   In a previous post, I suggested a two-way traversal (I guess it's
   actually a traversal done once in one direction and "n-1" times in the
   other direction, where "n" = number of tags you're matching posts on).



   I'm willing to bet it could be pretty fast...



   Do you have any code that can create a dummy data set in Neo4J?  If you
   do, I'd be willing to give it a try...





    Original Message 
   Subject: Re: [Neo] How to efficiently query in Neo4J?
   From: Michael Ludwig 
   Date: Fri, April 09, 2010 2:44 pm
   To: Neo user discussions 
   Alastair James schrieb am 09.04.2010 um 14:04:37 (+0100)
   [Re: [Neo] How to efficiently query in Neo4J?]:
   > So, I suppose this question boils down to, is there an efficient way
   > to calculate the union of two traversals without retrieving all
   result
   > sets and performing the union in user code?
   No need for two traversals if you annotate your category tree in Neo4j
   the same way Celko has popularized for SQL, i.e. marking each category
   with *left* and *right*. It's really not a question of graph or sets,
   as in both cases what you deal with is a tree.
   [1]http://intelligent-enterprise.informationweek.com/001020/celko.jhtml
   Note that this needs some custom logic for category tree updates. But
   it's not difficult in SQL, and I think it's not much more difficult in
   Neo4j either.
   --
   Michael Ludwig
   ___
   Neo mailing list
   User@lists.neo4j.org
   [2]https://lists.neo4j.org/mailman/listinfo/user

References

   1. http://intelligent-enterprise.informationweek.com/001020/celko.jhtml
   2. https://lists.neo4j.org/mailman/listinfo/user
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread Michael Ludwig
Alastair James schrieb am 09.04.2010 um 14:04:37 (+0100)
[Re: [Neo] How to efficiently query in Neo4J?]:

> So, I suppose this question boils down to, is there an efficient way
> to calculate the union of two traversals without retrieving all result
> sets and performing the union in user code?

No need for two traversals if you annotate your category tree in Neo4j
the same way Celko has popularized for SQL, i.e. marking each category
with *left* and *right*. It's really not a question of graph or sets,
as in both cases what you deal with is a tree.

http://intelligent-enterprise.informationweek.com/001020/celko.jhtml

Note that this needs some custom logic for category tree updates. But
it's not difficult in SQL, and I think it's not much more difficult in
Neo4j either.

-- 
Michael Ludwig
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread Alastair James
On 9 April 2010 14:21, Max De Marzi Jr.  wrote:

> On first traversal, add a relationship to a "found node" to each node that
> would return, and check for this relationship on the second traversal?
> Maybe create a unique id, set a property or add a node property with the
> unique id on the first traversal, and check for this property on the second
> traversal?
>

I suspect that would lead to pretty poor performance given that I am
altering the data on each node visited. Also, I would have to remember to
remove these properties / relationships later on otherwise the database will
balloon in size.

That said, it would be suitable for queries that I could generate (say)
every hour. So adding relations to the graph would be acceptable for a index
that is built using a 'slow query', but not suitable for realtime results.
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread Max De Marzi Jr.
On first traversal, add a relationship to a "found node" to each node that
would return, and check for this relationship on the second traversal?
Maybe create a unique id, set a property or add a node property with the
unique id on the first traversal, and check for this property on the second
traversal?


On Fri, Apr 9, 2010 at 8:04 AM, Alastair James  wrote:

> On 9 April 2010 07:50, Peter Neubauer  >wrote:
>
> > Marko and me tried to summarize what is working especially good with
> > Graph Databases and what not:
> >
>
> Yes, but in my mind, my use case is a *perfect* example of what should work
> well in a graph DB. It is an exact example of inforamtion organised by
> related categories and other relations. After all, in effect, every website
> IS a graph already. Using a graph data model allows us to build site
> engines
> that are can harness the flexibility of the web without requiring loads of
> joining tables as typically found in SQL databases that try model this.
>
> So therefore, I do not accept that this is not a good use case for graphs.
> I
> probably presented a too simplistic example of just posts and tags, but I
> envisage a system where posts relate to posts, to tags, tags relate to tags
> whatever. It would be a *nightmare* to implement in SQL.
>
> So, I suppose this question boils down to, is there an efficient way to
> calculate the union of two traversals without retrieving all result sets
> and
> performing the union in user code?
>
> Al
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread Alastair James
On 9 April 2010 07:50, Peter Neubauer wrote:

> Marko and me tried to summarize what is working especially good with
> Graph Databases and what not:
>

Yes, but in my mind, my use case is a *perfect* example of what should work
well in a graph DB. It is an exact example of inforamtion organised by
related categories and other relations. After all, in effect, every website
IS a graph already. Using a graph data model allows us to build site engines
that are can harness the flexibility of the web without requiring loads of
joining tables as typically found in SQL databases that try model this.

So therefore, I do not accept that this is not a good use case for graphs. I
probably presented a too simplistic example of just posts and tags, but I
envisage a system where posts relate to posts, to tags, tags relate to tags
whatever. It would be a *nightmare* to implement in SQL.

So, I suppose this question boils down to, is there an efficient way to
calculate the union of two traversals without retrieving all result sets and
performing the union in user code?

Al
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Traversers in the REST API

2010-04-09 Thread Alastair James
>
> Why not just slap memcached in the middle?  Would help with scalability
>   as well, plus you could keep cached results keyed by query params in
>   there if needed.  Just a thought...
>

Yes, but in my mind that says "you cant use neo without a 3rd party caching
layer" which seems a little crazy as it adds complexity and leads to
inevitable 'eventual consistency' as the many many different cached views
expire. Plus, there would still be the memory overhead of inflating 1000+
nodes from memcache.

Anyway, if you are caching, better to cache the output of the generated HTML
pages (or page fragments), something that any website of scale will do
already, so caching in the middleware wont help.

However, caching does not help situations where everybody sees a slightly
different view of the data or that the search state space is large enough to
make the chances of a cache hit quite small.

Al
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Traversers in the REST API

2010-04-09 Thread rick . bullotta
   Why not just slap memcached in the middle?  Would help with scalability
   as well, plus you could keep cached results keyed by query params in
   there if needed.  Just a thought...



    Original Message 
   Subject: Re: [Neo] Traversers in the REST API
   From: Alastair James 
   Date: Fri, April 09, 2010 8:32 am
   To: Neo user discussions 
   >Since in manycases the results of a query will need to be reformed
   into
   > their associated domain objects
   Unlikely to be the case over the HTTP API. Its unlikely people will
   create
   domain objects in (e.g.) PHP they will just use the data directly.
   Pagination is kinda tricky if the data changes between subsequent
   > requests for "pages". Since pagination is generally used for UIs, a
   > common approach is to place the entire dataset (or a cursor,
   depending
   > on where the data is coming from) in a session object. Regardless of
   > where it is kept, if you want to deal with data changes, you either
   > have to a) invalidate the "cached" dataset if data changes or b) keep
   a
   > copy of the whole dataset around in its "as queried" state so that
   > subsequent paging requests are consistent. Either case involves
   > keeping a fairly big duplicate data structure on the server or middle
   > tier and violates one of the objectives of REST-ful APIs, which is
   that
   > of statelessness. For that reason, I personally think the REST-ful
   API
   > shouldn't deal with paging. It should probably be done at some
   > intermediate level as needed by applications. We can certainly build
   a
   > separate API that we can all leverage if needed, but I don't think it
   > should be in the core REST-ful layer.
   >
   Well, I think for my use cases (websites), its likely that users dont
   flick
   between pages that often. For example, on may sites, users will view
   page 1
   and select an item, any very view move on to page 2. Its a very
   different
   usage pattern compared to a traditional desktop UI, so there
   is absolutely no need to hold the sorted set on the server in a cursor
   type
   way.
   A typical use case for me would be 1000+ matching rows, with 90%+ of
   page
   views for the first 10, 5% for the next 10 etc...! You can clearly see
   that
   sending the entire results set of 1000+ rows over HTTP/JSON is
   inefficient.
   Of course, caching between the web server and the neo HTTP API can
   help, but
   not in all cases, and it seems silly to rely on this.
   Al
   --
   Dr Alastair James
   CTO James Publishing Ltd.
   [1]http://www.linkedin.com/pub/3/914/163
   [2]www.worldreviewer.com
   WINNER Travolution Awards Best Travel Information Website 2009
   WINNER IRHAS Awards, Los Angeles, Best Travel Website 2008
   WINNER Travolution Awards Best New Online Travel Company 2008
   WINNER Travel Weekly Magellan Award 2008
   WINNER Yahoo! Finds of the Year 2007
   "Noli nothis permittere te terere!"
   ___
   Neo mailing list
   User@lists.neo4j.org
   [3]https://lists.neo4j.org/mailman/listinfo/user

References

   1. http://www.linkedin.com/pub/3/914/163
   2. http://www.worldreviewer.com/
   3. https://lists.neo4j.org/mailman/listinfo/user
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Traversers in the REST API

2010-04-09 Thread Alastair James
>Since in manycases the results of a query will need to be reformed into
 > their associated domain objects

Unlikely to be the case over the HTTP API.  Its unlikely people will create
domain objects in (e.g.) PHP they will just use the data directly.

Pagination is kinda tricky if the data changes between subsequent
>   requests for "pages".  Since pagination is generally used for UIs, a
>   common approach is to place the entire dataset (or a cursor, depending
>   on where the data is coming from) in a session object.  Regardless of
>   where it is kept, if you want to deal with data changes, you either
>   have to a) invalidate the "cached" dataset if data changes or b) keep a
>   copy of the whole dataset around in its "as queried" state so that
>   subsequent paging requests are consistent.  Either case involves
>   keeping a fairly big duplicate data structure on the server or middle
>   tier and violates one of the objectives of REST-ful APIs, which is that
>   of statelessness.  For that reason, I personally think the REST-ful API
>   shouldn't deal with paging.  It should probably be done at some
>   intermediate level as needed by applications.  We can certainly build a
>   separate API that we can all leverage if needed, but I don't think it
>   should be in the core REST-ful layer.
>

Well, I think for my use cases (websites), its likely that users dont flick
between pages that often. For example, on may sites, users will view page 1
and select an item, any very view move on to page 2. Its a very different
usage pattern compared to a traditional desktop UI, so there
is absolutely no need to hold the sorted set on the server in a cursor type
way.

A typical use case for me would be 1000+ matching rows, with 90%+ of page
views for the first 10, 5% for the next 10 etc...! You can clearly see that
sending the entire results set of 1000+ rows over HTTP/JSON is inefficient.

Of course, caching between the web server and the neo HTTP API can help, but
not in all cases, and it seems silly to rely on this.

Al




-- 
Dr Alastair James
CTO James Publishing Ltd.
http://www.linkedin.com/pub/3/914/163

www.worldreviewer.com

WINNER Travolution Awards Best Travel Information Website 2009
WINNER IRHAS Awards, Los Angeles, Best Travel Website 2008
WINNER Travolution Awards Best New Online Travel Company 2008
WINNER Travel Weekly Magellan Award 2008
WINNER Yahoo! Finds of the Year 2007

"Noli nothis permittere te terere!"
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Traversers in the REST API

2010-04-09 Thread rick . bullotta
   Since in manycases the results of a query will need to be reformed into
   their associated domain objects, we've chosen to do our sorting at that
   point (and on the server).  We do our (primary) filtering within the
   traversal/DB->domain object processes.  That seems to work well.



   Pagination is kinda tricky if the data changes between subsequent
   requests for "pages".  Since pagination is generally used for UIs, a
   common approach is to place the entire dataset (or a cursor, depending
   on where the data is coming from) in a session object.  Regardless of
   where it is kept, if you want to deal with data changes, you either
   have to a) invalidate the "cached" dataset if data changes or b) keep a
   copy of the whole dataset around in its "as queried" state so that
   subsequent paging requests are consistent.  Either case involves
   keeping a fairly big duplicate data structure on the server or middle
   tier and violates one of the objectives of REST-ful APIs, which is that
   of statelessness.  For that reason, I personally think the REST-ful API
   shouldn't deal with paging.  It should probably be done at some
   intermediate level as needed by applications.  We can certainly build a
   separate API that we can all leverage if needed, but I don't think it
   should be in the core REST-ful layer.



   Just my $0.02, after taxes.







    Original Message 
   Subject: Re: [Neo] Traversers in the REST API
   From: Tobias Ivarsson 
   Date: Fri, April 09, 2010 4:00 am
   To: Neo user discussions 
   I definitely agree that limiting or paging a set of results is probably
   not
   very useful without some sort of sorting. The (only) benefit of pushing
   sorting to the client is that the client might be able to filter the
   result
   further before sorting it. Since sorting is generally the most
   expensive
   operation it should be done as late as possible. However the idea of
   semi-sorting, to get only one page of sorted results at each request,
   that
   was mentioned in some thread yesterday sounds quite compelling.
   I agree that an equivalent of LIMIT, OFFSET and ORDER BY is a good
   target.
   As to indexing: the structure of the graph IS the index to a large
   extent.
   This means that a well designed graph would often not need paging if
   the
   traversal is done right. There are however some cases where this is
   hard to
   accomplish and we need to work on supporting those cases better.
   Remember that a Graph Database is NOT a Relational Database. A lot of
   the
   ideas people have about databases are based on their knowledge of
   Relational
   Databases. I understand that it can be hard, but if that baggage could
   be
   left at the door it would make things a lot easier. Nobody is saying
   that
   Relational Databases are dead (except for some publicity stunts) far
   from
   it! What we (and a lot of other people) are saying is that the age of
   "one
   database to rule them all" is over. Different problems are best solved
   with
   different kinds of databases, RDBMSes are great at some, K/V stores
   some,
   and Graph Databases are great for some. Then there are some problems
   that
   are best solved with a combination of two or more (kinds of) databases,
   where each database brings its own strengths to the table, and is used
   only
   for the things it is good at.
   That's enough deviation from the topic, my conclusions remain the same
   as
   they were before this discussion started, I will state them in as few
   words
   as possible and in bullet point form to convey them as clearly as I
   can:
   * The REST API will probably need result set limiting or pagination.
   * Limiting and pagination will require (server side) sorting
   * Sorting can be better implemented if it's implemented in the core of
   the
   traversal framework
   * Limiting / Pagination can be deferred for a while until we know what
   it
   needs to look like (from looking at actual uses)
   * (Server side) Sorting can be deferred until we need it for limiting /
   pagination
   Peace,
   Tobias
   On Thu, Apr 8, 2010 at 10:17 PM, Michael Ludwig  wrote:
   > Tobias Ivarsson schrieb am 08.04.2010 um 18:23:27 (+0200)
   > [Re: [Neo] Traversers in the REST API]:
   >
   > > On Wed, Apr 7, 2010 at 3:05 PM, Alastair James 
   > > wrote:
   >
   > > > when we start talking about returning 1000s of nodes in JSON over
   > > > HTTP just to get the first 10 this is clearly sub-optimal (as I
   > > > build websites this is a very common use case). So, as you say,
   > > > sorting and limiting can wait, but I suspect the HTTP API would
   > > > benefit from offering it. Limiting need not require changes to
   the
   > > > core API, it could be implemented as a second stage in the HTTP
   API
   > > > code prior to output encoding.
   > >
   > > For paging / limiting: yes, you are absolutely right, this would
   not
   > > effect the core API at all, only the

Re: [Neo] meta meta classes

2010-04-09 Thread Mattias Persson
Now we're getting somewhere... I like this solution a lot, thanks for the
great idea. Let me see if I get time to try and implement it soon! Or if you
get to if before me and supply a patch. It's nice either way :)

2010/4/9 Niels Hoogeveen 

>
> Having thought about implementation a little bit more, it would be neat to
> have MetaModelClass implement an interface similar to MetaModelRestrictable,
> with one difference, the setRange and getRange don't refer to a
> PropertyRange but to an InstanceRange.
>
> Like PropertyRange, InstanceRange should be an abstract class so several
> different different implementations of InstanceRanges can be provided:
>
> * InstanceEnumerationRange (enumerates the n instances the class contains
> and sets minCardinality = maxCardinality = n )
> * IndexedInstanceRange (indexes the n instances it contains and sets
> minCardinality = maxCardinality = n, making the class into an Instance
> array)
> * NullInstanceRange (set minCardinality = maxCardinality = 0, making the
> class abstract)
> * InstanceClassesRange (states that the instances of a class have to be
> instances of one of a given set of classes)
>
> The first three range types restrict the MetaModelClass to having no
> getDirectInstances.add functionality, and also the setting of cardinality
> should be disabled, which makes me wonder if we need a class
> FixedMetaModelClass, which simply doesn't provide these methods, or if we
> should throw exceptions when methods are inappropriately called, or if we
> allow inconsistent data like the rest of the meta model does.
>
>
> > From: pd_aficion...@hotmail.com
> > To: user@lists.neo4j.org
> > Date: Thu, 8 Apr 2010 23:52:02 +0200
> > Subject: Re: [Neo] meta meta classes
> >
> >
> > I think the best solution here is to have an instance enumeration on
> MetaModelClass. Singletons are special case of an enumeration.
> >
> > See:
> > http://www.w3.org/TR/2002/WD-owl-ref-20021112/#Enumerated
> >
> http://owl.cs.manchester.ac.uk/2007/05/api/javadoc/org/semanticweb/owl/model/OWLObjectOneOf.html
> >
> >
> >
> > > Date: Thu, 8 Apr 2010 22:33:11 +0200
> > > From: matt...@neotechnology.com
> > > To: user@lists.neo4j.org
> > > Subject: Re: [Neo] meta meta classes
> > >
> > > Ok now I get your point! Thank you for clarifying.
> > >
> > > Your singleton proposal could be a good idea then. Could it
> > > potentially be a hindrance in some scenario? I mean should we have a
> > > MetaModelClass#setSingleton(boolean) or something so that this
> > > behaviour can be controlled?
> > >
> > > 2010/4/8, Niels Hoogeveen :
> > > >
> > > > Each country needs to be modeled as classes, because I want to set
> the
> > > > restriction that French regions (which can have different properties
> from
> > > > Canadian provinces) can only have a relationship with the country
> France,
> > > > and Canadian provinces can only have a relationship with the country
> Canada.
> > > > The domain and the range of a PropertyType are classes not instances.
> If
> > > > countries were simply instances of the  "country" class, it would be
> > > > possible to say that an instance of a Canadian province is a
> subdivision of
> > > > France.
> > > >
> > > > I'd like to be able to iterate over the subdivision of France and
> have
> > > > guaranteed that each instance has the property "region code", a
> property
> > > > unknown to Canadian provinces. Without having a restriction stating
> that a
> > > > specific sub-division belongs to a specific country, any sub-division
> can be
> > > > related to any country, so a user may erroneously say that Alberta is
> a
> > > > French region. Not only is this factually incorrect, but structurally
> too.
> > > > Alberta, being a Canadian province, doesn't have the "region code"
> property,
> > > > which I want French regions to have.
> > > >
> > > >
> > > >> Date: Thu, 8 Apr 2010 20:06:32 +0200
> > > >> From: matt...@neotechnology.com
> > > >> To: user@lists.neo4j.org
> > > >> Subject: Re: [Neo] meta meta classes
> > > >>
> > > >> 2010/4/8 Niels Hoogeveen 
> > > >>
> > > >> >
> > > >> > Your point about the cardinality restriction is a correct
> observation.
> > > >> >
> > > >> > In fact it would be better to create a "is-subdivision-of"
> PropertyType
> > > >> > on
> > > >> > "sub-division" and give that a range "country" with a cardinality
> of 1.
> > > >> > Then
> > > >> > for each subclass of "sub-division" a restriction should be set,
> naming
> > > >> > the
> > > >> > "country" class this specific "sub-division" class applies to.
> > > >> >
> > > >> > Still, it requires  each country to be defined as both a class and
> an
> > > >> > instance.
> > > >> >
> > > >> Why just countries as classes? why not each subdivision as classes
> as
> > > >> well?
> > > >> Why countries as classes at all?
> > > >>
> > > >> >
> > > >> > > Date: Thu, 8 Apr 2010 19:15:30 +0200
> > > >> > > From: matt...@neotechnology.com
> > > >> > > To: user@lists.neo4j.org
> > > >> > > Subject: Re: [Neo] meta meta 

Re: [Neo] meta meta classes

2010-04-09 Thread Niels Hoogeveen

Having thought about implementation a little bit more, it would be neat to have 
MetaModelClass implement an interface similar to MetaModelRestrictable, with 
one difference, the setRange and getRange don't refer to a PropertyRange but to 
an InstanceRange. 

Like PropertyRange, InstanceRange should be an abstract class so several 
different different implementations of InstanceRanges can be provided:

* InstanceEnumerationRange (enumerates the n instances the class contains and 
sets minCardinality = maxCardinality = n )
* IndexedInstanceRange (indexes the n instances it contains and sets 
minCardinality = maxCardinality = n, making the class into an Instance array)
* NullInstanceRange (set minCardinality = maxCardinality = 0, making the class 
abstract)
* InstanceClassesRange (states that the instances of a class have to be 
instances of one of a given set of classes)

The first three range types restrict the MetaModelClass to having no 
getDirectInstances.add functionality, and also the setting of cardinality 
should be disabled, which makes me wonder if we need a class 
FixedMetaModelClass, which simply doesn't provide these methods, or if we 
should throw exceptions when methods are inappropriately called, or if we allow 
inconsistent data like the rest of the meta model does.


> From: pd_aficion...@hotmail.com
> To: user@lists.neo4j.org
> Date: Thu, 8 Apr 2010 23:52:02 +0200
> Subject: Re: [Neo] meta meta classes
> 
> 
> I think the best solution here is to have an instance enumeration on 
> MetaModelClass. Singletons are special case of an enumeration. 
> 
> See: 
> http://www.w3.org/TR/2002/WD-owl-ref-20021112/#Enumerated 
> http://owl.cs.manchester.ac.uk/2007/05/api/javadoc/org/semanticweb/owl/model/OWLObjectOneOf.html
> 
> 
> 
> > Date: Thu, 8 Apr 2010 22:33:11 +0200
> > From: matt...@neotechnology.com
> > To: user@lists.neo4j.org
> > Subject: Re: [Neo] meta meta classes
> > 
> > Ok now I get your point! Thank you for clarifying.
> > 
> > Your singleton proposal could be a good idea then. Could it
> > potentially be a hindrance in some scenario? I mean should we have a
> > MetaModelClass#setSingleton(boolean) or something so that this
> > behaviour can be controlled?
> > 
> > 2010/4/8, Niels Hoogeveen :
> > >
> > > Each country needs to be modeled as classes, because I want to set the
> > > restriction that French regions (which can have different properties from
> > > Canadian provinces) can only have a relationship with the country France,
> > > and Canadian provinces can only have a relationship with the country 
> > > Canada.
> > > The domain and the range of a PropertyType are classes not instances. If
> > > countries were simply instances of the  "country" class, it would be
> > > possible to say that an instance of a Canadian province is a subdivision 
> > > of
> > > France.
> > >
> > > I'd like to be able to iterate over the subdivision of France and have
> > > guaranteed that each instance has the property "region code", a property
> > > unknown to Canadian provinces. Without having a restriction stating that a
> > > specific sub-division belongs to a specific country, any sub-division can 
> > > be
> > > related to any country, so a user may erroneously say that Alberta is a
> > > French region. Not only is this factually incorrect, but structurally too.
> > > Alberta, being a Canadian province, doesn't have the "region code" 
> > > property,
> > > which I want French regions to have.
> > >
> > >
> > >> Date: Thu, 8 Apr 2010 20:06:32 +0200
> > >> From: matt...@neotechnology.com
> > >> To: user@lists.neo4j.org
> > >> Subject: Re: [Neo] meta meta classes
> > >>
> > >> 2010/4/8 Niels Hoogeveen 
> > >>
> > >> >
> > >> > Your point about the cardinality restriction is a correct observation.
> > >> >
> > >> > In fact it would be better to create a "is-subdivision-of" PropertyType
> > >> > on
> > >> > "sub-division" and give that a range "country" with a cardinality of 1.
> > >> > Then
> > >> > for each subclass of "sub-division" a restriction should be set, naming
> > >> > the
> > >> > "country" class this specific "sub-division" class applies to.
> > >> >
> > >> > Still, it requires  each country to be defined as both a class and an
> > >> > instance.
> > >> >
> > >> Why just countries as classes? why not each subdivision as classes as
> > >> well?
> > >> Why countries as classes at all?
> > >>
> > >> >
> > >> > > Date: Thu, 8 Apr 2010 19:15:30 +0200
> > >> > > From: matt...@neotechnology.com
> > >> > > To: user@lists.neo4j.org
> > >> > > Subject: Re: [Neo] meta meta classes
> > >> > >
> > >> > > So, you describe the model (with "country", "sub-division" and
> > >> > > "has-sub-division") which is OK! Then you not just want to add data
> > >> > > which
> > >> > > would conform to it, you also describe the highest level of that data
> > >> > > in
> > >> > the
> > >> > > meta model itself with cardinality for how many sub-divisions each
> > >> > > such
> > >> > > level must contain.
> > >> >

Re: [Neo] Find Dominating Set in a graph,Dead end using Neo4j

2010-04-09 Thread Tobias Ivarsson
Hi,

Sorry for the much delayed response.

First a disclaimer:
Neo4j makes no claim to magically make P=NP (the problem of finding a
minimal dominating set is NP-hard), it would be cool if Neo4j could be used
to prove such a thing, but I am not going to spend the rest of my life
attempting it.

I would suggest that you try to find a small dominating set rather than the
minimal dominating set. Even then I cant think of a solution that doesn't
require having all Nodes sorted by their degree in memory. If you want to do
this with the kind of numbers of Nodes we consider normal (from several
millions to billions), you would run into a lot of trouble.

The code you have written looks like it might instantiate the collections
you use for storing your result multiple times. But I cant tell without the
full source code.

Cheers,
Tobias

On Sun, Mar 28, 2010 at 3:01 PM, francis adediran wrote:

> Hi,
> I have been trying to solve the problem of minimal dominating set
> using Neo4j but to no avail. The algorithm
> to solve this problem is correct but implementing it using Neo4j is
> becoming a pain or maybe i'm not yet familiar
> with Neo as i thought i was.
>
> The minimal dominating set in a graph is a set of nodes such that
> every node in the network graph
> is a neighbor of at least one element of the set. In social networking
> terms, they are the set of people
> who collectively are friends with every one in the network.
> More info on dominating set here
> http://en.wikipedia.org/wiki/Dominating_set
> so for example,given this bi-directional graph network
> A
>  \
>   \
>   B
>  /   \
> / \
>C  D
> \ /
>   E
> B and C are the minimal dominating set. Also,the sets {B,E} and {B,D}
> are possible candidates depending on
> your algorithm. but the most important thing to get a set with minimal
> number of nodes
>
> the best algorithm know so far, is Chvatal algorithm. it basically
> selects the node
> which brings the most new nodes into the dominated set.
> In our graph above, the dominating set S is {B,C) and dominated set V
> is{A,D,E}
> B is selected because it brings in A,C and D
> C is selected because it brings in E
>
> The dominating set can be useful to marketers and even spammers as it
> provide an indirect way to reach an entire network
>
> I've tried implementing this in Neo4j,but the problem is since we need
> to keep track of the dominating and dominated sets
> neo4j keeps re-instantiating them in the Returnable Evaluator while
> traversing the network
> is there a better way to do this Neo4j?
>
> Traverser friendsTraverser = startNode.traverse(Order.BREADTH_FIRST,
>
> StopEvaluator.END_OF_GRAPH,
> new
> myReturnableEvaluator(){
>private ArrayList  DM = new ArrayList();
> //Dominating set
>private ArrayList  V = new ArrayList(); //dominated set
>@Override
>public boolean isReturnableNode(TraversalPosition pos) {
>int
> numberof_new_users=this.getNodesDegree(pos.currentNode());
>if(!(DM.isEmpty()))
>{
>//get node's conns
>Iterable relationships =
> pos.currentNode().getRelationships(Direction.OUTGOING);
>//get all there end nodes
>for(Relationship r:relationships)
>{
>Node end = r.getEndNode();
> .
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Tobias Ivarsson 
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Traversers in the REST API

2010-04-09 Thread Tobias Ivarsson
I definitely agree that limiting or paging a set of results is probably not
very useful without some sort of sorting. The (only) benefit of pushing
sorting to the client is that the client might be able to filter the result
further before sorting it. Since sorting is generally the most expensive
operation it should be done as late as possible. However the idea of
semi-sorting, to get only one page of sorted results at each request, that
was mentioned in some thread yesterday sounds quite compelling.

I agree that an equivalent of LIMIT, OFFSET and ORDER BY is a good target.

As to indexing: the structure of the graph IS the index to a large extent.
This means that a well designed graph would often not need paging if the
traversal is done right. There are however some cases where this is hard to
accomplish and we need to work on supporting those cases better.

Remember that a Graph Database is NOT a Relational Database. A lot of the
ideas people have about databases are based on their knowledge of Relational
Databases. I understand that it can be hard, but if that baggage could be
left at the door it would make things a lot easier. Nobody is saying that
Relational Databases are dead (except for some publicity stunts) far from
it! What we (and a lot of other people) are saying is that the age of "one
database to rule them all" is over. Different problems are best solved with
different kinds of databases, RDBMSes are great at some, K/V stores some,
and Graph Databases are great for some. Then there are some problems that
are best solved with a combination of two or more (kinds of) databases,
where each database brings its own strengths to the table, and is used only
for the things it is good at.


That's enough deviation from the topic, my conclusions remain the same as
they were before this discussion started, I will state them in as few words
as possible and in bullet point form to convey them as clearly as I can:
* The REST API will probably need result set limiting or pagination.
* Limiting and pagination will require (server side) sorting
* Sorting can be better implemented if it's implemented in the core of the
traversal framework
* Limiting / Pagination can be deferred for a while until we know what it
needs to look like (from looking at actual uses)
* (Server side) Sorting can be deferred until we need it for limiting /
pagination

Peace,
Tobias

On Thu, Apr 8, 2010 at 10:17 PM, Michael Ludwig  wrote:

> Tobias Ivarsson schrieb am 08.04.2010 um 18:23:27 (+0200)
> [Re: [Neo] Traversers in the REST API]:
>
> > On Wed, Apr 7, 2010 at 3:05 PM, Alastair James 
> > wrote:
>
> > > when we start talking about returning 1000s of nodes in JSON over
> > > HTTP just to get the first 10 this is clearly sub-optimal (as I
> > > build websites this is a very common use case). So, as you say,
> > > sorting and limiting can wait, but I suspect the HTTP API would
> > > benefit from offering it. Limiting need not require changes to the
> > > core API, it could be implemented as a second stage in the HTTP API
> > > code prior to output encoding.
> >
> > For paging / limiting: yes, you are absolutely right, this would not
> > effect the core API at all, only the REST API. Limiting/paging is
> > something we would probably add to the REST API before sorting.
>
> Limiting and paging usually go hand in hand with sorting, in my
> experience. Why would anyone want to page through an unsorted
> collection?
>
> > Sorting might be a similar case, but I still think the client would be
> > better fitted to do sorting well.
>
> The server has indexes to support the sorting. (If it doesn't, it has a
> problem anyway.) What does the client have to support sorting? So how
> would it be better fitted to do sorting well?
>
> > But once paging / limiting is added it would be quite natural / useful
> > to add sorting as well. What I want to avoid is keeping state on the
> > server while waiting for the client to request the next page.
>
> If you ensure a binary tree index is used to do the sorting, you should
> be fine.
>
> --
> Michael Ludwig
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Tobias Ivarsson 
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] verbosequadstore loading

2010-04-09 Thread Mattias Persson
There are some problems at the moment regarding insertion speeds.

o We haven't yet created an rdf store which can use a BatchInserter (which
could also be tweaked to ignore checking if statements already exists before
it adds each statement and all that).
o The other one is that the sail layer on top of the neo4j-rdf component
contains functionality which allows a thread to have more than one running
transaction at the same time. This was added due to some users requirements,
but slows it down by a factor 2 or something (not sure about this).

I would like to see both these issues resolved soon, and when they are fixed
insertion speeds will be quite nice!

2010/4/9 Lyudmila L. Balakireva 

> Hi,
> How to optimize loading to the VerboseQuadStore?
>  I am doing test similar to the  test example from neo rdf sail   and it
> is  very slow.  The size of  files 3G - 7G .
> Thanks,
> Luda
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] sail and time index

2010-04-09 Thread Mattias Persson
You could perhaps use the LuceneFulltextQueryIndexService (which is a
LuceneFulltextIndexService, but which interprets the "value" argument in
getNodes() as lucene query
syntax).
Index your | as a one concatenated value and query it with range
queries,
f.ex: [http://my-uri|0 TO http://my-uri|1270797634350] (since
all values are strings in lucene), and grabbing the last one. If the results
of the range query could be reversed you could grab the first one instead,
which would be much better.

This solution doesn't strike me as being particularly good, but might work
(I haven't tried it).

2010/4/8 Lyudmila L. Balakireva 

> Hi,
>
> I have  question how to better deal with time range index. I  am using
> Sail layer to build a triple store with context as time value.  I need
> binary search the nearest context value for the given uri and requested
> time.
> For example for the mysql it will be  table [ uri,time ] where rows:
> u ,t1
> u, t2
> u1,t1
> u2,t2
> etc.
>  having  the given uri  u and  time t   t1 < t  nearest t1 for uri u  by query: select max (time) where time =u;
>
> Is any  internal trick  I can use  for neo4j to build such index or
> optimize the operation?
>  The timeline index  is somewhat  different since I need first to break
> recordset by uri and then find time. It will be many millions of
> timelines attached to the one node and I still need to iterate thru the
> nodes.  What else can be done here then Mysql?
> Thank you for help,
> Luda
> ___
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user