Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread Peter Neubauer
Hi there,
as both Rick and Michael say - there is different use cases, different
ways of organizing your data and different ways to query it.

Marko and me tried to summarize what is working especially good with
Graph Databases and what not:
http://markorodriguez.com/Blarko/Entries/2010/4/7_The_Graph_Traversal_Pattern.html,
the paper is even at
http://www.scribd.com/doc/29591188/The-Graph-Traversal-Pattern for
easy reading.

To cite the conclusion:
--8
4 Conclusion
Graphs are a flexible modeling construct that can be used to model a domain
and the indices that partition that domain into an efficient, searchable space.
When the relations between the objects of the domain are seen as vertex
partitions, then a graph is simply an index that relates vertices to vertices by
edges. The way in which these vertices relate to each other determines which
graph traversals are most efficient to execute and which problems can be solved
by the graph data structure. Graph databases and the graph traversal pattern
do not require a global analysis of data. For many problems, only local subsets
of the graph need to be traversed to yield a solution. By structuring the graph
in such a way as to minimize traversal steps, limit the use of external indices,
and reduce the number of set-based operations, modelers gain great efficiency
that is difficult to accomplish with other data management solutions.
---8

So, graphs are no silver bullets but can gain great efficiency when
applied to the right problems in the right manner.

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk:  neubauer.peter
Skype   peter.neubauer
Phone   +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter  http://twitter.com/peterneubauer

http://www.neo4j.org   - Your high performance graph database.
http://nosqleu.org- The biggest NOSQL event. Ever.
http://www.thoughtmade.com - Scandinavias coolest Bring-a-Thing party.



On Fri, Apr 9, 2010 at 12:33 AM, Michael Ludwig mil...@gmx.de wrote:
 rick.bullotta schrieb am 08.04.2010 um 15:16:11 (-0700)
 [Re: [Neo] How to efficiently query in Neo4J?]:

 Factor in a wide range of SLAs needed for performance vs availability
 vs affordability vs scalability vs adminstration costs, and the
 equation gets a whole lot more complicated.

 Granted.

 I'm sure there's a graphy-model for the tag/post example that could be
 made smoking fast with Neo also.

 Sure, but there's also a way of looking at screws that might suggest you
 should use a hammer ;-) and it would be wrong. Which doesn't mean it
 couldn't be modeled for the tag/post example - just a general caveat to
 think about both tools and problems when trying to find a good solution.

 Throw columnar storage, key-value, and document DB's into the mix, and
 the good news is that we have a lot of weapons in our arsenal now to
 tackle very demanding and diverse application challenges!

 Yes, it's becoming very interesting. Lots of new high-level tools for
 specialized or relaxed requirements.

 SQL won't be dethroned, though.
 --
 Michael Ludwig
 ___
 Neo mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] sail and time index

2010-04-09 Thread Mattias Persson
You could perhaps use the LuceneFulltextQueryIndexService (which is a
LuceneFulltextIndexService, but which interprets the value argument in
getNodes() as lucene query
syntaxhttp://lucene.apache.org/java/2_9_1/queryparsersyntax.html).
Index your URI|time as a one concatenated value and query it with range
querieshttp://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Range%20Searches,
f.ex: [http://my-uri|0 TO http://my-uri|1270797634350] (since
all values are strings in lucene), and grabbing the last one. If the results
of the range query could be reversed you could grab the first one instead,
which would be much better.

This solution doesn't strike me as being particularly good, but might work
(I haven't tried it).

2010/4/8 Lyudmila L. Balakireva lu...@lanl.gov

 Hi,

 I have  question how to better deal with time range index. I  am using
 Sail layer to build a triple store with context as time value.  I need
 binary search the nearest context value for the given uri and requested
 time.
 For example for the mysql it will be  table [ uri,time ] where rows:
 u ,t1
 u, t2
 u1,t1
 u2,t2
 etc.
  having  the given uri  u and  time t   t1  t t2  I can fast find  the
 nearest t1 for uri u  by query: select max (time) where timet  and uri
 =u;

 Is any  internal trick  I can use  for neo4j to build such index or
 optimize the operation?
  The timeline index  is somewhat  different since I need first to break
 recordset by uri and then find time. It will be many millions of
 timelines attached to the one node and I still need to iterate thru the
 nodes.  What else can be done here then Mysql?
 Thank you for help,
 Luda
 ___
 Neo mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] verbosequadstore loading

2010-04-09 Thread Mattias Persson
There are some problems at the moment regarding insertion speeds.

o We haven't yet created an rdf store which can use a BatchInserter (which
could also be tweaked to ignore checking if statements already exists before
it adds each statement and all that).
o The other one is that the sail layer on top of the neo4j-rdf component
contains functionality which allows a thread to have more than one running
transaction at the same time. This was added due to some users requirements,
but slows it down by a factor 2 or something (not sure about this).

I would like to see both these issues resolved soon, and when they are fixed
insertion speeds will be quite nice!

2010/4/9 Lyudmila L. Balakireva lu...@lanl.gov

 Hi,
 How to optimize loading to the VerboseQuadStore?
  I am doing test similar to the  test example from neo rdf sail   and it
 is  very slow.  The size of  files 3G - 7G .
 Thanks,
 Luda
 ___
 Neo mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Traversers in the REST API

2010-04-09 Thread Tobias Ivarsson
I definitely agree that limiting or paging a set of results is probably not
very useful without some sort of sorting. The (only) benefit of pushing
sorting to the client is that the client might be able to filter the result
further before sorting it. Since sorting is generally the most expensive
operation it should be done as late as possible. However the idea of
semi-sorting, to get only one page of sorted results at each request, that
was mentioned in some thread yesterday sounds quite compelling.

I agree that an equivalent of LIMIT, OFFSET and ORDER BY is a good target.

As to indexing: the structure of the graph IS the index to a large extent.
This means that a well designed graph would often not need paging if the
traversal is done right. There are however some cases where this is hard to
accomplish and we need to work on supporting those cases better.

Remember that a Graph Database is NOT a Relational Database. A lot of the
ideas people have about databases are based on their knowledge of Relational
Databases. I understand that it can be hard, but if that baggage could be
left at the door it would make things a lot easier. Nobody is saying that
Relational Databases are dead (except for some publicity stunts) far from
it! What we (and a lot of other people) are saying is that the age of one
database to rule them all is over. Different problems are best solved with
different kinds of databases, RDBMSes are great at some, K/V stores some,
and Graph Databases are great for some. Then there are some problems that
are best solved with a combination of two or more (kinds of) databases,
where each database brings its own strengths to the table, and is used only
for the things it is good at.


That's enough deviation from the topic, my conclusions remain the same as
they were before this discussion started, I will state them in as few words
as possible and in bullet point form to convey them as clearly as I can:
* The REST API will probably need result set limiting or pagination.
* Limiting and pagination will require (server side) sorting
* Sorting can be better implemented if it's implemented in the core of the
traversal framework
* Limiting / Pagination can be deferred for a while until we know what it
needs to look like (from looking at actual uses)
* (Server side) Sorting can be deferred until we need it for limiting /
pagination

Peace,
Tobias

On Thu, Apr 8, 2010 at 10:17 PM, Michael Ludwig mil...@gmx.de wrote:

 Tobias Ivarsson schrieb am 08.04.2010 um 18:23:27 (+0200)
 [Re: [Neo] Traversers in the REST API]:

  On Wed, Apr 7, 2010 at 3:05 PM, Alastair James al.ja...@gmail.com
  wrote:

   when we start talking about returning 1000s of nodes in JSON over
   HTTP just to get the first 10 this is clearly sub-optimal (as I
   build websites this is a very common use case). So, as you say,
   sorting and limiting can wait, but I suspect the HTTP API would
   benefit from offering it. Limiting need not require changes to the
   core API, it could be implemented as a second stage in the HTTP API
   code prior to output encoding.
 
  For paging / limiting: yes, you are absolutely right, this would not
  effect the core API at all, only the REST API. Limiting/paging is
  something we would probably add to the REST API before sorting.

 Limiting and paging usually go hand in hand with sorting, in my
 experience. Why would anyone want to page through an unsorted
 collection?

  Sorting might be a similar case, but I still think the client would be
  better fitted to do sorting well.

 The server has indexes to support the sorting. (If it doesn't, it has a
 problem anyway.) What does the client have to support sorting? So how
 would it be better fitted to do sorting well?

  But once paging / limiting is added it would be quite natural / useful
  to add sorting as well. What I want to avoid is keeping state on the
  server while waiting for the client to request the next page.

 If you ensure a binary tree index is used to do the sorting, you should
 be fine.

 --
 Michael Ludwig
 ___
 Neo mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Tobias Ivarsson tobias.ivars...@neotechnology.com
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Find Dominating Set in a graph,Dead end using Neo4j

2010-04-09 Thread Tobias Ivarsson
Hi,

Sorry for the much delayed response.

First a disclaimer:
Neo4j makes no claim to magically make P=NP (the problem of finding a
minimal dominating set is NP-hard), it would be cool if Neo4j could be used
to prove such a thing, but I am not going to spend the rest of my life
attempting it.

I would suggest that you try to find a small dominating set rather than the
minimal dominating set. Even then I cant think of a solution that doesn't
require having all Nodes sorted by their degree in memory. If you want to do
this with the kind of numbers of Nodes we consider normal (from several
millions to billions), you would run into a lot of trouble.

The code you have written looks like it might instantiate the collections
you use for storing your result multiple times. But I cant tell without the
full source code.

Cheers,
Tobias

On Sun, Mar 28, 2010 at 3:01 PM, francis adediran adedi...@gmail.comwrote:

 Hi,
 I have been trying to solve the problem of minimal dominating set
 using Neo4j but to no avail. The algorithm
 to solve this problem is correct but implementing it using Neo4j is
 becoming a pain or maybe i'm not yet familiar
 with Neo as i thought i was.

 The minimal dominating set in a graph is a set of nodes such that
 every node in the network graph
 is a neighbor of at least one element of the set. In social networking
 terms, they are the set of people
 who collectively are friends with every one in the network.
 More info on dominating set here
 http://en.wikipedia.org/wiki/Dominating_set
 so for example,given this bi-directional graph network
 A
  \
   \
   B
  /   \
 / \
C  D
 \ /
   E
 B and C are the minimal dominating set. Also,the sets {B,E} and {B,D}
 are possible candidates depending on
 your algorithm. but the most important thing to get a set with minimal
 number of nodes

 the best algorithm know so far, is Chvatal algorithm. it basically
 selects the node
 which brings the most new nodes into the dominated set.
 In our graph above, the dominating set S is {B,C) and dominated set V
 is{A,D,E}
 B is selected because it brings in A,C and D
 C is selected because it brings in E

 The dominating set can be useful to marketers and even spammers as it
 provide an indirect way to reach an entire network

 I've tried implementing this in Neo4j,but the problem is since we need
 to keep track of the dominating and dominated sets
 neo4j keeps re-instantiating them in the Returnable Evaluator while
 traversing the network
 is there a better way to do this Neo4j?

 Traverser friendsTraverser = startNode.traverse(Order.BREADTH_FIRST,

 StopEvaluator.END_OF_GRAPH,
 new
 myReturnableEvaluator(){
private ArrayList Node DM = new ArrayListNode();
 //Dominating set
private ArrayList Node V = new ArrayListNode(); //dominated set
@Override
public boolean isReturnableNode(TraversalPosition pos) {
int
 numberof_new_users=this.getNodesDegree(pos.currentNode());
if(!(DM.isEmpty()))
{
//get node's conns
IterableRelationship relationships =
 pos.currentNode().getRelationships(Direction.OUTGOING);
//get all there end nodes
for(Relationship r:relationships)
{
Node end = r.getEndNode();
 .
 ___
 Neo mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Tobias Ivarsson tobias.ivars...@neotechnology.com
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] meta meta classes

2010-04-09 Thread Mattias Persson
Now we're getting somewhere... I like this solution a lot, thanks for the
great idea. Let me see if I get time to try and implement it soon! Or if you
get to if before me and supply a patch. It's nice either way :)

2010/4/9 Niels Hoogeveen pd_aficion...@hotmail.com


 Having thought about implementation a little bit more, it would be neat to
 have MetaModelClass implement an interface similar to MetaModelRestrictable,
 with one difference, the setRange and getRange don't refer to a
 PropertyRange but to an InstanceRange.

 Like PropertyRange, InstanceRange should be an abstract class so several
 different different implementations of InstanceRanges can be provided:

 * InstanceEnumerationRange (enumerates the n instances the class contains
 and sets minCardinality = maxCardinality = n )
 * IndexedInstanceRange (indexes the n instances it contains and sets
 minCardinality = maxCardinality = n, making the class into an Instance
 array)
 * NullInstanceRange (set minCardinality = maxCardinality = 0, making the
 class abstract)
 * InstanceClassesRange (states that the instances of a class have to be
 instances of one of a given set of classes)

 The first three range types restrict the MetaModelClass to having no
 getDirectInstances.add functionality, and also the setting of cardinality
 should be disabled, which makes me wonder if we need a class
 FixedMetaModelClass, which simply doesn't provide these methods, or if we
 should throw exceptions when methods are inappropriately called, or if we
 allow inconsistent data like the rest of the meta model does.


  From: pd_aficion...@hotmail.com
  To: user@lists.neo4j.org
  Date: Thu, 8 Apr 2010 23:52:02 +0200
  Subject: Re: [Neo] meta meta classes
 
 
  I think the best solution here is to have an instance enumeration on
 MetaModelClass. Singletons are special case of an enumeration.
 
  See:
  http://www.w3.org/TR/2002/WD-owl-ref-20021112/#Enumerated
 
 http://owl.cs.manchester.ac.uk/2007/05/api/javadoc/org/semanticweb/owl/model/OWLObjectOneOf.html
 
 
 
   Date: Thu, 8 Apr 2010 22:33:11 +0200
   From: matt...@neotechnology.com
   To: user@lists.neo4j.org
   Subject: Re: [Neo] meta meta classes
  
   Ok now I get your point! Thank you for clarifying.
  
   Your singleton proposal could be a good idea then. Could it
   potentially be a hindrance in some scenario? I mean should we have a
   MetaModelClass#setSingleton(boolean) or something so that this
   behaviour can be controlled?
  
   2010/4/8, Niels Hoogeveen pd_aficion...@hotmail.com:
   
Each country needs to be modeled as classes, because I want to set
 the
restriction that French regions (which can have different properties
 from
Canadian provinces) can only have a relationship with the country
 France,
and Canadian provinces can only have a relationship with the country
 Canada.
The domain and the range of a PropertyType are classes not instances.
 If
countries were simply instances of the  country class, it would be
possible to say that an instance of a Canadian province is a
 subdivision of
France.
   
I'd like to be able to iterate over the subdivision of France and
 have
guaranteed that each instance has the property region code, a
 property
unknown to Canadian provinces. Without having a restriction stating
 that a
specific sub-division belongs to a specific country, any sub-division
 can be
related to any country, so a user may erroneously say that Alberta is
 a
French region. Not only is this factually incorrect, but structurally
 too.
Alberta, being a Canadian province, doesn't have the region code
 property,
which I want French regions to have.
   
   
Date: Thu, 8 Apr 2010 20:06:32 +0200
From: matt...@neotechnology.com
To: user@lists.neo4j.org
Subject: Re: [Neo] meta meta classes
   
2010/4/8 Niels Hoogeveen pd_aficion...@hotmail.com
   

 Your point about the cardinality restriction is a correct
 observation.

 In fact it would be better to create a is-subdivision-of
 PropertyType
 on
 sub-division and give that a range country with a cardinality
 of 1.
 Then
 for each subclass of sub-division a restriction should be set,
 naming
 the
 country class this specific sub-division class applies to.

 Still, it requires  each country to be defined as both a class and
 an
 instance.

Why just countries as classes? why not each subdivision as classes
 as
well?
Why countries as classes at all?
   

  Date: Thu, 8 Apr 2010 19:15:30 +0200
  From: matt...@neotechnology.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo] meta meta classes
 
  So, you describe the model (with country, sub-division and
  has-sub-division) which is OK! Then you not just want to add
 data
  which
  would conform to it, you also describe the highest level of that
 data
  in
 the
  meta model itself with cardinality for how 

Re: [Neo] Traversers in the REST API

2010-04-09 Thread rick . bullotta
   Since in manycases the results of a query will need to be reformed into
   their associated domain objects, we've chosen to do our sorting at that
   point (and on the server).  We do our (primary) filtering within the
   traversal/DB-domain object processes.  That seems to work well.



   Pagination is kinda tricky if the data changes between subsequent
   requests for pages.  Since pagination is generally used for UIs, a
   common approach is to place the entire dataset (or a cursor, depending
   on where the data is coming from) in a session object.  Regardless of
   where it is kept, if you want to deal with data changes, you either
   have to a) invalidate the cached dataset if data changes or b) keep a
   copy of the whole dataset around in its as queried state so that
   subsequent paging requests are consistent.  Either case involves
   keeping a fairly big duplicate data structure on the server or middle
   tier and violates one of the objectives of REST-ful APIs, which is that
   of statelessness.  For that reason, I personally think the REST-ful API
   shouldn't deal with paging.  It should probably be done at some
   intermediate level as needed by applications.  We can certainly build a
   separate API that we can all leverage if needed, but I don't think it
   should be in the core REST-ful layer.



   Just my $0.02, after taxes.







    Original Message 
   Subject: Re: [Neo] Traversers in the REST API
   From: Tobias Ivarsson tobias.ivars...@neotechnology.com
   Date: Fri, April 09, 2010 4:00 am
   To: Neo user discussions user@lists.neo4j.org
   I definitely agree that limiting or paging a set of results is probably
   not
   very useful without some sort of sorting. The (only) benefit of pushing
   sorting to the client is that the client might be able to filter the
   result
   further before sorting it. Since sorting is generally the most
   expensive
   operation it should be done as late as possible. However the idea of
   semi-sorting, to get only one page of sorted results at each request,
   that
   was mentioned in some thread yesterday sounds quite compelling.
   I agree that an equivalent of LIMIT, OFFSET and ORDER BY is a good
   target.
   As to indexing: the structure of the graph IS the index to a large
   extent.
   This means that a well designed graph would often not need paging if
   the
   traversal is done right. There are however some cases where this is
   hard to
   accomplish and we need to work on supporting those cases better.
   Remember that a Graph Database is NOT a Relational Database. A lot of
   the
   ideas people have about databases are based on their knowledge of
   Relational
   Databases. I understand that it can be hard, but if that baggage could
   be
   left at the door it would make things a lot easier. Nobody is saying
   that
   Relational Databases are dead (except for some publicity stunts) far
   from
   it! What we (and a lot of other people) are saying is that the age of
   one
   database to rule them all is over. Different problems are best solved
   with
   different kinds of databases, RDBMSes are great at some, K/V stores
   some,
   and Graph Databases are great for some. Then there are some problems
   that
   are best solved with a combination of two or more (kinds of) databases,
   where each database brings its own strengths to the table, and is used
   only
   for the things it is good at.
   That's enough deviation from the topic, my conclusions remain the same
   as
   they were before this discussion started, I will state them in as few
   words
   as possible and in bullet point form to convey them as clearly as I
   can:
   * The REST API will probably need result set limiting or pagination.
   * Limiting and pagination will require (server side) sorting
   * Sorting can be better implemented if it's implemented in the core of
   the
   traversal framework
   * Limiting / Pagination can be deferred for a while until we know what
   it
   needs to look like (from looking at actual uses)
   * (Server side) Sorting can be deferred until we need it for limiting /
   pagination
   Peace,
   Tobias
   On Thu, Apr 8, 2010 at 10:17 PM, Michael Ludwig mil...@gmx.de wrote:
Tobias Ivarsson schrieb am 08.04.2010 um 18:23:27 (+0200)
[Re: [Neo] Traversers in the REST API]:
   
 On Wed, Apr 7, 2010 at 3:05 PM, Alastair James al.ja...@gmail.com
 wrote:
   
  when we start talking about returning 1000s of nodes in JSON over
  HTTP just to get the first 10 this is clearly sub-optimal (as I
  build websites this is a very common use case). So, as you say,
  sorting and limiting can wait, but I suspect the HTTP API would
  benefit from offering it. Limiting need not require changes to
   the
  core API, it could be implemented as a second stage in the HTTP
   API
  code prior to output encoding.

 For paging / limiting: yes, you are absolutely right, this would
   not
 

Re: [Neo] Traversers in the REST API

2010-04-09 Thread Alastair James
Since in manycases the results of a query will need to be reformed into
  their associated domain objects

Unlikely to be the case over the HTTP API.  Its unlikely people will create
domain objects in (e.g.) PHP they will just use the data directly.

Pagination is kinda tricky if the data changes between subsequent
   requests for pages.  Since pagination is generally used for UIs, a
   common approach is to place the entire dataset (or a cursor, depending
   on where the data is coming from) in a session object.  Regardless of
   where it is kept, if you want to deal with data changes, you either
   have to a) invalidate the cached dataset if data changes or b) keep a
   copy of the whole dataset around in its as queried state so that
   subsequent paging requests are consistent.  Either case involves
   keeping a fairly big duplicate data structure on the server or middle
   tier and violates one of the objectives of REST-ful APIs, which is that
   of statelessness.  For that reason, I personally think the REST-ful API
   shouldn't deal with paging.  It should probably be done at some
   intermediate level as needed by applications.  We can certainly build a
   separate API that we can all leverage if needed, but I don't think it
   should be in the core REST-ful layer.


Well, I think for my use cases (websites), its likely that users dont flick
between pages that often. For example, on may sites, users will view page 1
and select an item, any very view move on to page 2. Its a very different
usage pattern compared to a traditional desktop UI, so there
is absolutely no need to hold the sorted set on the server in a cursor type
way.

A typical use case for me would be 1000+ matching rows, with 90%+ of page
views for the first 10, 5% for the next 10 etc...! You can clearly see that
sending the entire results set of 1000+ rows over HTTP/JSON is inefficient.

Of course, caching between the web server and the neo HTTP API can help, but
not in all cases, and it seems silly to rely on this.

Al




-- 
Dr Alastair James
CTO James Publishing Ltd.
http://www.linkedin.com/pub/3/914/163

www.worldreviewer.com

WINNER Travolution Awards Best Travel Information Website 2009
WINNER IRHAS Awards, Los Angeles, Best Travel Website 2008
WINNER Travolution Awards Best New Online Travel Company 2008
WINNER Travel Weekly Magellan Award 2008
WINNER Yahoo! Finds of the Year 2007

Noli nothis permittere te terere!
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Traversers in the REST API

2010-04-09 Thread rick . bullotta
   Why not just slap memcached in the middle?  Would help with scalability
   as well, plus you could keep cached results keyed by query params in
   there if needed.  Just a thought...



    Original Message 
   Subject: Re: [Neo] Traversers in the REST API
   From: Alastair James al.ja...@gmail.com
   Date: Fri, April 09, 2010 8:32 am
   To: Neo user discussions user@lists.neo4j.org
   Since in manycases the results of a query will need to be reformed
   into
their associated domain objects
   Unlikely to be the case over the HTTP API. Its unlikely people will
   create
   domain objects in (e.g.) PHP they will just use the data directly.
   Pagination is kinda tricky if the data changes between subsequent
requests for pages. Since pagination is generally used for UIs, a
common approach is to place the entire dataset (or a cursor,
   depending
on where the data is coming from) in a session object. Regardless of
where it is kept, if you want to deal with data changes, you either
have to a) invalidate the cached dataset if data changes or b) keep
   a
copy of the whole dataset around in its as queried state so that
subsequent paging requests are consistent. Either case involves
keeping a fairly big duplicate data structure on the server or middle
tier and violates one of the objectives of REST-ful APIs, which is
   that
of statelessness. For that reason, I personally think the REST-ful
   API
shouldn't deal with paging. It should probably be done at some
intermediate level as needed by applications. We can certainly build
   a
separate API that we can all leverage if needed, but I don't think it
should be in the core REST-ful layer.
   
   Well, I think for my use cases (websites), its likely that users dont
   flick
   between pages that often. For example, on may sites, users will view
   page 1
   and select an item, any very view move on to page 2. Its a very
   different
   usage pattern compared to a traditional desktop UI, so there
   is absolutely no need to hold the sorted set on the server in a cursor
   type
   way.
   A typical use case for me would be 1000+ matching rows, with 90%+ of
   page
   views for the first 10, 5% for the next 10 etc...! You can clearly see
   that
   sending the entire results set of 1000+ rows over HTTP/JSON is
   inefficient.
   Of course, caching between the web server and the neo HTTP API can
   help, but
   not in all cases, and it seems silly to rely on this.
   Al
   --
   Dr Alastair James
   CTO James Publishing Ltd.
   [1]http://www.linkedin.com/pub/3/914/163
   [2]www.worldreviewer.com
   WINNER Travolution Awards Best Travel Information Website 2009
   WINNER IRHAS Awards, Los Angeles, Best Travel Website 2008
   WINNER Travolution Awards Best New Online Travel Company 2008
   WINNER Travel Weekly Magellan Award 2008
   WINNER Yahoo! Finds of the Year 2007
   Noli nothis permittere te terere!
   ___
   Neo mailing list
   User@lists.neo4j.org
   [3]https://lists.neo4j.org/mailman/listinfo/user

References

   1. http://www.linkedin.com/pub/3/914/163
   2. http://www.worldreviewer.com/
   3. https://lists.neo4j.org/mailman/listinfo/user
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread Alastair James
On 9 April 2010 14:21, Max De Marzi Jr. maxdema...@gmail.com wrote:

 On first traversal, add a relationship to a found node to each node that
 would return, and check for this relationship on the second traversal?
 Maybe create a unique id, set a property or add a node property with the
 unique id on the first traversal, and check for this property on the second
 traversal?


I suspect that would lead to pretty poor performance given that I am
altering the data on each node visited. Also, I would have to remember to
remove these properties / relationships later on otherwise the database will
balloon in size.

That said, it would be suitable for queries that I could generate (say)
every hour. So adding relations to the graph would be acceptable for a index
that is built using a 'slow query', but not suitable for realtime results.
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread Michael Ludwig
Alastair James schrieb am 09.04.2010 um 14:04:37 (+0100)
[Re: [Neo] How to efficiently query in Neo4J?]:

 So, I suppose this question boils down to, is there an efficient way
 to calculate the union of two traversals without retrieving all result
 sets and performing the union in user code?

No need for two traversals if you annotate your category tree in Neo4j
the same way Celko has popularized for SQL, i.e. marking each category
with *left* and *right*. It's really not a question of graph or sets,
as in both cases what you deal with is a tree.

http://intelligent-enterprise.informationweek.com/001020/celko.jhtml

Note that this needs some custom logic for category tree updates. But
it's not difficult in SQL, and I think it's not much more difficult in
Neo4j either.

-- 
Michael Ludwig
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] How to efficiently query in Neo4J?

2010-04-09 Thread rick . bullotta
   In a previous post, I suggested a two-way traversal (I guess it's
   actually a traversal done once in one direction and n-1 times in the
   other direction, where n = number of tags you're matching posts on).



   I'm willing to bet it could be pretty fast...



   Do you have any code that can create a dummy data set in Neo4J?  If you
   do, I'd be willing to give it a try...





    Original Message 
   Subject: Re: [Neo] How to efficiently query in Neo4J?
   From: Michael Ludwig mil...@gmx.de
   Date: Fri, April 09, 2010 2:44 pm
   To: Neo user discussions user@lists.neo4j.org
   Alastair James schrieb am 09.04.2010 um 14:04:37 (+0100)
   [Re: [Neo] How to efficiently query in Neo4J?]:
So, I suppose this question boils down to, is there an efficient way
to calculate the union of two traversals without retrieving all
   result
sets and performing the union in user code?
   No need for two traversals if you annotate your category tree in Neo4j
   the same way Celko has popularized for SQL, i.e. marking each category
   with *left* and *right*. It's really not a question of graph or sets,
   as in both cases what you deal with is a tree.
   [1]http://intelligent-enterprise.informationweek.com/001020/celko.jhtml
   Note that this needs some custom logic for category tree updates. But
   it's not difficult in SQL, and I think it's not much more difficult in
   Neo4j either.
   --
   Michael Ludwig
   ___
   Neo mailing list
   User@lists.neo4j.org
   [2]https://lists.neo4j.org/mailman/listinfo/user

References

   1. http://intelligent-enterprise.informationweek.com/001020/celko.jhtml
   2. https://lists.neo4j.org/mailman/listinfo/user
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user