Re: [Neo4j] Question from Webinar - traversing a path with nodes of different types

2011-04-22 Thread David Montag
Hi Vipul,

Out of curiosity, what does "process" mean in this context?

As Rick alludes to, you'd have some component performing the simulation
using the domain objects and possibly a graph traversal.

An example of an algorithm for this would be to walk the graph from 1 and,
whenever you find a branch, split the walk. When you finish walking a
branch (at a point where more than one branch joins), you use some kind of
synchronization to join the walks.
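
Here's a rough sketch of what that could look like against the core Java API.
Everything in it is illustrative rather than prescriptive: the PEER
relationship type, the assumption that relationships all point from client
toward server, and the process() placeholder. The idea is that only the last
walker to arrive at a join node continues past it:

import org.neo4j.graphdb.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class SplitJoinWalker {
    private static final RelationshipType PEER = DynamicRelationshipType.withName("PEER");
    private final ExecutorService pool = Executors.newCachedThreadPool();
    // remaining unconsumed incoming branches per join node
    private final ConcurrentMap<Long, AtomicInteger> pending =
            new ConcurrentHashMap<Long, AtomicInteger>();

    public void walk(final Node node) {
        int inDegree = count(node.getRelationships(PEER, Direction.INCOMING));
        if (inDegree > 1) {
            pending.putIfAbsent(node.getId(), new AtomicInteger(inDegree));
            // join: every walker but the last arriving one stops here
            if (pending.get(node.getId()).decrementAndGet() > 0) return;
        }
        process(node);
        for (final Relationship rel : node.getRelationships(PEER, Direction.OUTGOING)) {
            pool.submit(new Runnable() { // split: one walker per outgoing branch
                public void run() { walk(rel.getEndNode()); }
            });
        }
    }

    private static int count(Iterable<Relationship> rels) {
        int n = 0;
        for (Relationship rel : rels) n++;
        return n;
    }

    private void process(Node node) {
        System.out.println("processing " + node); // one simulation step
    }
}

A real implementation would also need to signal overall completion, e.g. with
a latch that is released when the walk reaches the server node.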

Does this make sense?

David

On Wed, Apr 20, 2011 at 11:16 PM, Vipul Gupta vipulgupta...@gmail.com wrote:

 Hi David,

 Inputs are 1 and 6 + Graph is acyclic.

 domain.Client@1 - domain.Router@2 - domain.Router@3 - domain.Router@5- 
 domain.Server@6
   - domain.Router@7 - domain.Router@8 -

 I want a way to start from 1,

 process the 2 path till it reaches 5 (say in a thread)
 process the 7 path till it reaches 5 (in another thread)

  then process 5 and eventually 6.
 the above step of processing an intermediate path and waiting on the blocking
 point can happen over and over again in a more complex graph (that is, there
 could even be a number of such loops in between) and the traversal stops only
 when we reach 6.

 I hope this makes it a bit clearer. I was working out something for this, but
 it is turning out to be too complex a solution for this sort of graph
 traversal, so I am hoping you can suggest something.

 Best Regards,
 Vipul


 On Thu, Apr 21, 2011 at 11:36 AM, David Montag 
 david.mon...@neotechnology.com wrote:

 Hi Vipul,

 Zooming out a little bit, what are the inputs to your algorithm, and what
 do you want it to do?

 For example, given 1 and 6, do you want to find any points in the chain
 between them that are join points of two (or more) subchains (5 in this
 case)?

 David


 On Wed, Apr 20, 2011 at 10:56 PM, Vipul Gupta vipulgupta...@gmail.com wrote:

 my mistake - I meant 5 depends on both 3 and 8 and acts as a blocking
 point till 3 and 8 finishes


 On Thu, Apr 21, 2011 at 11:19 AM, Vipul Gupta 
 vipulgupta...@gmail.comwrote:

 David/Michael,

 Let me modify the example a bit.
 What if my graph structure is like this

 domain.Client@1 - domain.Router@2 - domain.Router@3 -
 domain.Router@5 - domain.Server@6
   - domain.Router@7 - domain.Router@8 -


 Imagine a manufacturing line.
 6 depends on both 3 and 8 and acts as a blocking point till 3 and 8
 finishes.

 Is there a way to get a cleaner traversal for such kind of
 relationship. I want to get a complete intermediate traversal from
 Client to Server.

 Thanks a lot for helping out on this.

 Best Regards,
 Vipul




 On Thu, Apr 21, 2011 at 12:09 AM, David Montag 
 david.mon...@neotechnology.com wrote:

 Hi Vipul,

 Thanks for listening!

 It's a very good question, and the short answer is: yes! I'm cc'ing our
 mailing list so that everyone can take part in the answer.

 Here's the long answer, illustrated by an example:

 Let's assume you're modeling a network. You'll have some domain classes
 that are all networked entities with peers:

 @NodeEntity
 public class NetworkEntity {
     @RelatedTo(type = "PEER", direction = Direction.BOTH,
                elementClass = NetworkEntity.class)
     private Set<NetworkEntity> peers;

     public void addPeer(NetworkEntity peer) {
         peers.add(peer);
     }
 }

 public class Server extends NetworkEntity {}
 public class Router extends NetworkEntity {}
 public class Client extends NetworkEntity {}

 Then we can build a small network:

 Client c = new Client().persist();
 Router r1 = new Router().persist();
 Router r21 = new Router().persist();
 Router r22 = new Router().persist();
 Router r3 = new Router().persist();
 Server s = new Server().persist();

 c.addPeer(r1);
 r1.addPeer(r21);
 r1.addPeer(r22);
 r21.addPeer(r3);
 r22.addPeer(r3);
 r3.addPeer(s);

 c.persist();

 Note that after linking the entities, I only call persist() on the
 client. You can read more about this in the reference documentation, but
 essentially it will cascade in the direction of the relationships created,
 and will in this case cascade all the way to the server entity.

 You can now query this:

 Iterable<EntityPath<Client, Server>> paths =
 c.findAllPathsByTraversal(Traversal.description());

 The above code will get you an EntityPath per node visited during the
 traversal from c. The example doesn't use a very interesting traversal
 description, however, but you can still print the results:

 for (EntityPath<Client, Server> path : paths) {
     StringBuilder sb = new StringBuilder();
     Iterator<NetworkEntity> iter = path.<NetworkEntity>nodeEntities().iterator();
     while (iter.hasNext()) {
         sb.append(iter.next());
         if (iter.hasNext()) sb.append(" - ");
     }
     System.out.println(sb);
 }

 This will print each path, with all entities in the path. This is what
 it looks like:

 domain.Client@1
 domain.Client@1 - domain.Router@2
 domain.Client@1 - domain.Router@2 - domain.Router@3
 domain.Client@1 - 

Re: [Neo4j] Basic Node storage/retrieval related question?

2011-04-22 Thread Michael Hunger
Why are you using
Object a
and not int a or Integer a?

SDG uses the field type and not the current value type to provide conversions 
for non primitive types.

As Object is such a non-primitive type, the value is converted to a String. We
will look into accommodating Object values in the future.

Until then, please use the concrete type or a conversion service.
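
For example (class and field names invented), declaring the concrete field
type makes the value round-trip as an Integer:

@NodeEntity
public class Pojo {
    private Integer a;   // loads back as java.lang.Integer
    // private Object a; // would be stored and loaded back as a String

    public Integer getA() {
        return a;
    }
}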

M

Sent from my iBrick4


On 21.04.2011 at 19:21, G vlin...@gmail.com wrote:

 I have a pojo with a field a.
 
 which i initialize like this
 Object a  = 10;
 I store the POJO containing this field using Neo4j.
 
 When I load this POJO, I have a getter method to get the object
 
 Object getA() {
return a;
 }
 
 *What should be the class type of a?*
 I am of the opinion it should be java.lang.Integer but it is coming out to
 be java.lang.String.
 
 I am assuming this is because of node.getProperty(...).
 Is there a way I can get an Integer object only?
 
 
 Also, what types can be stored?
 
 thanks,
 Karan
 .
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] Look up a node by the value of one of its properties!!!

2011-04-22 Thread Kobla Gbenyo
Hello,

I am new to this list and I have a question about Neo4j (I am just using 
Neo4j).

I installed the Neo4j server and use the Jersey client (REST API) to
communicate with the Neo4j database. My problem is how to look up a
node (get the location of this node: its URI) knowing the value of one of
its properties; I do not know the node ID. (For example, I have a node
{"name":"thomas","age":20,...} and I want to get the location (URI) of
this node by sending the name "thomas" to the database server.)

Please, could someone give me some feedback?

I will be glad to get any support on graph traversals using the REST API.

Best regards,

-- 
Kobla GBENYO,
S/C M. Jean MATHE,
28 Rue de la Normandie,
79 000 Niort.

(+33) 6 26 07 93 41 / 6 62 26 64 47
http://www.gbenyo-expo.fr


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Look up a node by the value of one of its properties!!!

2011-04-22 Thread Michael Hunger
You can use the indexing part of the REST API.
That means after creation you have to add the fields you are interested in to 
an index. Then you can retrieve the node(s) later.

See http://components.neo4j.org/neo4j-server/snapshot/rest.html#Add_to_index
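
For example, once the node is indexed you can look it up like this with the
Jersey client. This is only a sketch: the index name "people" and the local
server URL are placeholders, and the exact add-to-index payload depends on
your server version (see the docs above):

import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import javax.ws.rs.core.MediaType;

public class IndexLookup {
    public static void main(String[] args) {
        Client client = Client.create();
        // GET all nodes indexed under name=thomas; each entry in the JSON
        // response carries a "self" field, which is the node URI you want.
        ClientResponse response = client
                .resource("http://localhost:7474/db/data/index/node/people/name/thomas")
                .accept(MediaType.APPLICATION_JSON)
                .get(ClientResponse.class);
        System.out.println(response.getStatus());
        System.out.println(response.getEntity(String.class));
    }
}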

HTH

Michael

Sent from my iBrick4


Am 22.04.2011 um 09:39 schrieb Kobla Gbenyo ko...@riastudio.fr:

 Hello,
 
 I am new to this list and I have a question about Neo4j (I am just using 
 Neo4j).
 
 I installed the Neo4j server and use the Jersey client (REST API) to
 communicate with the Neo4j database. My problem is how to look up a
 node (get the location of this node: its URI) knowing the value of one of
 its properties; I do not know the node ID. (For example, I have a node
 {"name":"thomas","age":20,...} and I want to get the location (URI) of
 this node by sending the name "thomas" to the database server.)
 
 Please, could someone give me some feedback?
 
 I will be glad to get any support on graph traversals using the REST API.
 
 Best regards,
 
 -- 
 Kobla GBENYO,
 S/C M. Jean MATHE,
 28 Rue de la Normandie,
 79 000 Niort.
 
 (+33) 6 26 07 93 41 / 6 62 26 64 47
 http://www.gbenyo-expo.fr
 
 
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Basic Node storage/retrieval related question?

2011-04-22 Thread G
I was storing this as an object because this field was acting as parameters
to different functions I was calling and functions had different parameter
types.

Would generics help here?

so that for my pojo I can have the following instead

class Pojo<T> {
    private T a;

    T getA() {
        return a;
    }
}

I would just give that a quick try. Do you think that would solve this issue
for me, or do you have an alternate idea?

-Karan




On Fri, Apr 22, 2011 at 1:08 PM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 Why are you using
 Object a
 and not int a or Integer a?

 SDG uses the field type and not the current value type to provide
 conversions for non primitive types.

 As Object is such a non-primitive type, the value is converted to a String.
 We will look into accommodating Object values in the future.

 Until then, please use the concrete type or a conversion service.

 M

 Sent from my iBrick4


 On 21.04.2011 at 19:21, G vlin...@gmail.com wrote:

  I have a pojo with a field a.
 
  which i initialize like this
  Object a  = 10;
  I store the POJO containing this field using neo4j..
 
  When I load this POJO, I have a getter method to get the object
 
  Object getA() {
 return a;
  }
 
  *What should be the class type of a?*
  I am of the opinion it should be java.lang.Integer but it is coming out
  to be java.lang.String.

  I am assuming this is because of node.getProperty(...).
  Is there a way I can get an Integer object only?


  Also, what types can be stored?
 
  thanks,
  Karan
  .
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Question about REST interface concurrency

2011-04-22 Thread Jim Webber
Hi Stephen,

I think the network IO you've measured is consistent with the rest of the
behaviour you've described.

What I'm thinking is that you're simply reaching the limits of create
transaction, create a node, complete transaction, flush to filesystem (that is,
you're basically testing disk write speed/seek time/etc.).

Can you check how busy your IO to disk is? I expect it'll be relatively high.

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] about two database

2011-04-22 Thread Jim Webber
Hi Jose,

 1 - I have 2 databases (graphs); I need to get information from one database
 to another without having to take an instance of the target database from
 the other database.

One reasonable way of doing this is to use the HA configuration. The HA 
protocol will keep two (or many) instances of the database in sync.

 2 - I need to know how to open a database (graph) if it already exists.
 Thanks beforehand.

You could try to open an EmbeddedReadOnlyGraphDatabase. If the database store
exists (and is a valid database) then no exception will be thrown. Otherwise
you'll get a TransactionFailureException.
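
Something like this minimal sketch:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.TransactionFailureException;
import org.neo4j.kernel.EmbeddedReadOnlyGraphDatabase;

public class OpenIfExists {
    public static GraphDatabaseService openExisting(String storeDir) {
        try {
            return new EmbeddedReadOnlyGraphDatabase(storeDir);
        } catch (TransactionFailureException e) {
            return null; // no valid database store at that location
        }
    }
}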

Jim

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Craig Taverner

 Good catch, forgot to add the in-graph representation of the results to my
 mail, thanks for adding that part. Temporary (transient) nodes and
 relationships would really rock here, with the advantage that with HA you
 have them distributed to all cluster nodes.
 Certainly Craig has to add some interesting things to this, as those
 probably resemble his in-graph indexes / R-Trees.


I certainly make use of this model, much more so for my statistical analysis
than for graph indexes (but I'm planning to merge indexes and statistics).

However, in my case the structures are currently very domain specific. But I
think the idea is sound and should be generalizable. What I do is have a
concept of a 'dataset' on which queries can be performed. The dataset is
usually the root of a large sub-graph. The query parser (domain specific)
creates a hashcode of the query, checks if the dataset node already has a
resultset (as a connected sub-graph with its own root node containing the
previous query hashcode), and if so returns that (traverses it); otherwise it
performs the complete dataset traversal, creating the resultset as a new
subgraph and then returns it. This works well specifically for statistical
queries, where the resultset is much smaller than the dataset, so adding new
subgraphs has small impact on the database size, and the resultset is much
faster to return, so this is a performance enhancement for multiple requests
from the client. Also, I keep the resultset permanently, not temporarily.
Very few operations modify the dataset, and if they do, we delete all
resultsets, and they get re-created the next time. My work on merging the
indexes with the statistics is also planned to only recreate 'dirty' subsets
of the result-set, so modifying the dataset has minimal impact on the query
performance.
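
In rough code (relationship type and property names invented, the actual
traversal elided), the pattern looks like this:

import org.neo4j.graphdb.*;

public class ResultSetCache {
    private static final RelationshipType HAS_RESULTSET =
            DynamicRelationshipType.withName("HAS_RESULTSET");

    public Node findOrCreateResultSet(GraphDatabaseService db, Node dataset, String query) {
        Integer hash = Integer.valueOf(query.hashCode());
        for (Relationship rel : dataset.getRelationships(HAS_RESULTSET, Direction.OUTGOING)) {
            Node root = rel.getEndNode();
            if (hash.equals(root.getProperty("queryHash", null))) {
                return root; // cached: traverse this subgraph instead of the dataset
            }
        }
        Transaction tx = db.beginTx();
        try {
            Node root = db.createNode();
            root.setProperty("queryHash", hash);
            dataset.createRelationshipTo(root, HAS_RESULTSET);
            // ... full dataset traversal goes here, attaching result nodes to
            // root, with 'drill-down' relationships back to the originals
            tx.success();
            return root;
        } finally {
            tx.finish();
        }
    }
}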

After reading Rick's previous email I started thinking of approaches to
generalizing this, but I think your 'transient' nodes perhaps encompass
everything I thought about. Here is an idea:

   - Have new nodes/relations/properties tables on disk, like a second graph
   database, but different in the sense that it has one-way relations into the
   main database, which cannot be seen by the main graph and so are by
   definition not part of the graph. These can have transience and expiry
   characteristics. Then we can build the resultset graphs as transient graphs
   in the transient database, with 'drill-down' capabilities to the original
   graph (something I find I always need for statistical queries, and something
   a graph is simply much better at than a relational database).
   - Use some kind of hashcode in the traversal definition or query to
   identify existing, cached, transient graphs in the second database, so you
   can rely on those for repeated queries, or pagination or streaming, etc.

As traversers are lazy a count operation is not so easily possible; you
 could run the traversal and discard the results. But then the client could
 also just pull those results until it reaches its
 internal thresholds and then decide to use more filtering, or stop the
 pulling and ask the user for more filtering (you can always retrieve n+1 and
 show the user that there are more than n results available).


Yes. Count needs to perform the traversal. So the only way to not have to
traverse twice is to keep a cache. If we make the cache a transient
sub-graph (possibly in the second database I described above), then we have
the interesting behaviour that count() takes a while, but subsequent
queries, pagination or streaming, are fast.

Please don't forget that a count() query in an RDBMS can be as ridiculously
 expensive as the original query (especially if just the column selection was
 replaced with count, and sorting, grouping etc. was still left in place
 together with lots of joins).


Good to hear they have the same problem as us :-)
(or even more problems)

Sorting on your own instead of letting the db do that mostly harms the
 performance as it requires you to build up all the data in memory, sort it
 and then use it. Instead of having the db do that more efficiently, stream
 the data and you can use it directly from the stream.


Client side sorting makes sense if you know the domain well enough to know,
for example, you will receive a small enough result set to 'fit' in the
client, and want to give the user multiple interactive sort options without
hitting the database again. But I agree that in general it makes sense to
get the database to do the sort.

Cheers, Craig
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Rick Otten
 Client side sorting makes sense if you know the domain well enough to
 know, for example, you will receive a small enough result set to 'fit'
 in the client, and want to give the user multiple interactive sort
 options without hitting the database again. But I agree that in general
it makes sense to get the database to do the sort.


I'll concede this point.  In general it should be better to do the sorts
on the database server, which is typically by design a hefty backend
system that is optimized for that sort of processing.

In my experience with regular SQL databases, unfortunately they typically
only scale vertically, and are usually running on expensive
enterprise-grade hardware.  Most of the ones I've worked with either run on
minimally sized hardware or have quickly outgrown their hardware.

So they are always either:
  1) Currently suffering from a capacity problem.
  2) Just recovering from a capacity problem.
  3) Heading rapidly towards a new capacity problem.

The next problem I run into is a political, rather than technical one. 
The database administration team is often a different group of people from
the appserver/front end development team.   The guys writing the queries
are usually closer to the appserver than the database.  In other words, it
is easier for them to manage a problem in the appserver, than it is to
manage a problem in the database.

So, instead of having a deep well of data processing power to draw on, and
then using a wide layer of thin commodity hardware presentation layer
servers, we end up transferring data processing power out of the data
server and into the presentation layer.

As we evolve into building data processing systems which can scale
horizontally on commodity hardware, the perpetual capacity problems the
legacy vertical databases suffer from may wane, finally freeing the other
layers from having to pick up some of the slack.


-- 
Rick Otten
rot...@windfish.net
O=='=+


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Tobias Ivarsson
On Thu, Apr 21, 2011 at 11:18 PM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 Rick,

 great thoughts.

 Good catch, forgot to add the in-graph representation of the results to my
 mail, thanks for adding that part. Temporary (transient) nodes and
 relationships would really rock here, with the advantage that with HA you
 have them distributed to all cluster nodes.
 Certainly Craig has to add some interesting things to this, as those
 probably resemble his in-graph indexes / R-Trees.

 As traversers are lazy a count operation is not so easily possible; you
 could run the traversal and discard the results. But then the client could
 also just pull those results until it reaches its
 internal thresholds and then decide to use more filtering, or stop the
 pulling and ask the user for more filtering (you can always retrieve n+1 and
 show the user that there are more than n results available).

 The index result size() method only returns an estimate of the result size
 (which might not contain currently changed index entries).

 Please don't forget that a count() query in an RDBMS can be as ridiculously
 expensive as the original query (especially if just the column selection was
 replaced with count, and sorting, grouping etc. was still left in place
 together with lots of joins).

 Sorting on your own instead of letting the db do that mostly harms the
 performance as it requires you to build up all the data in memory, sort it
 and then use it. Instead of having the db do that more efficiently, stream
 the data and you can use it directly from the stream.


throw new SlapOnTheFingersException("sometimes the application developer can
do a better job since she has better knowledge of the data, the database
only has generic knowledge");

Since Jake had already mentioned (in this very thread) that he expected one
of those, I thought I might as well throw one in there.

I agree with the analysis of count(): as the name implies, it will
have to run the entire query in order to count the number of resulting
items.

About sorting I'm torn. The perception of sorting in the database being slow
that Rick points to is one that I've seen a lot. When you hand the
responsibility of sorting to the database you hide the fact that sorting is
an expensive operation, it does require reading in all data in order to sort
it. People often expect databases to be smarter than that, since they
sometimes are, but that is pretty much only when reading straight from an
index and not doing much more. A generic sort of data can never be better
than O(log(n!)) [O(log(n!)) is almost equal to, and commonly rounded to the
easier to compute function O(n log(n))]. If you put the responsibility of
sorting in the hands of the application you can sometimes utilize knowledge
about the data to do a more efficient sorting than the database could have
done. Most often by simply doing an application level filtering of the data
before sorting it, based on some filtering that could not be transferred to
the database query. This does however make the work of the application
developer slightly more tedious, which is why I think it would be sensible
to have support for sorting on the database level, and hope that users will
be sensible about using it, and not assume magic from it.
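
For reference, Stirling's approximation is why those two bounds coincide:

\log_2(n!) = \sum_{k=1}^{n} \log_2 k = n \log_2 n - n \log_2 e + O(\log n) = \Theta(n \log n)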

Something I find very interesting is the concept of semi-sorted data.
Semi-sorted data is often good enough, easier to achieve, and quite easy to
then sort completely if that is required. Examples of semi-sorted data could
be data in an order that satisfies the heap property. Or for spatial queries
returning the closest hits first, but not necessarily in perfect order, say
returning the hits within a mile's radius first, before the ones in a radius
between 1-10 miles, and so on, without requiring the hits in each 'segment'
to be perfectly ordered by distance. Breadth first order is another example
of semi-sorted data, that could be used when traversing data as you've
outlined with paging nodes, or similarly grouped by parent node-order.
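
Purely as an illustration (all names invented), the spatial case could bucket
hits into distance bands, giving "closest first, roughly ordered" without a
full sort; an individual band is only sorted if a consumer actually asks:

import java.util.*;

public class SemiSorted {
    // returns one unordered list of hits per distance band, closest band first
    public static <T> List<List<T>> bands(Iterable<T> hits, double[] upperBounds,
                                          Map<T, Double> distance) {
        List<List<T>> result = new ArrayList<List<T>>();
        for (int i = 0; i < upperBounds.length; i++) result.add(new ArrayList<T>());
        for (T hit : hits) {
            double d = distance.get(hit);
            for (int i = 0; i < upperBounds.length; i++) {
                if (d <= upperBounds[i]) { result.get(i).add(hit); break; }
            }
        }
        return result; // sort an individual band lazily, only when needed
    }
}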

I must say that I really enjoy following this discussion. I really like the
idea of streaming, since I think that can be implemented more easily than
paging, while satisfying many of the desired use cases. But I still want to
hear more arguments for and against both alternatives. And as has already
been pointed out, they aren't mutually exclusive.

I'll keep listening in on the conversation, but I don't have much more to
add at this point. I have one desire for the structure of the conversation
though. When you quote what someone else has said before you, could you
please include who that person was? It makes going back and reading the full
context easier.

Cheers,
-- 
Tobias Ivarsson tobias.ivars...@neotechnology.com
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Strange performance difference on different machines

2011-04-22 Thread Bob Hutchison
Hi Michael,

On 2011-04-21, at 4:38 PM, Michael Hunger wrote:

 Bob,
 
 I don't know if you have already answered these questions. 
 
 Which JDK (also version) are you using for that, what are the JVM memory 
 settings?

Sun's java 1.6 patch level 24 (I think I'll have to confirm on Monday)

Memory settings... from the top of my head... there's a GC setting that I can't 
recall, and the min and max heap is set to 12GB (which it never comes close to)

 
 Do you have a profiler handy that you could throw at your benchmark? (E.g. 
 yourkit has a 30 day trial, other profilers surely too).

Monday for that too.

 
 Do you have the source code of your tests at hand? So we could run exactly 
 the same code on our own Linux systems for cross checking?

If it's useful I can probably extract something, starting Monday :-)

 
 What Linux distribution is it, and 64 or 32 bit? Do you also have a disk 
 formatted with ext3 to cross check? (Perhaps just a loopback device).

Ubuntu 10.10 64bit. The machine has been set up as ext4, I'll see what I can 
scrounge up for ext3.

 
 How much memory does the linux box have available?

16GB

 
 Thanks so much.
 

Thank you.

Cheers,
Bob

 Michael
 
 On 21.04.2011 at 21:53, Bob Hutchison wrote:
 
 
 On 2011-04-20, at 7:30 AM, Tobias Ivarsson wrote:
 
 Sorry I got a bit distracted when writing this. I should have added that I
 then want you to send the results of running that benchmark to me so that I
 can further analyze what the cause of these slow writes might be.
 
 Thank you,
 Tobias
 
 That's what I figured you meant. Sorry for the delay, here they are:
 
 On a HP z400, quad Xeon W3550 @ 3.07GHz
 ext4 filesystem
 -
 
 dd if=/dev/urandom of=store bs=1M count=1000
 1000+0 records in
 1000+0 records out
 1048576000 bytes (1.0 GB) copied, 111.175 s, 9.4 MB/s
 dd if=store of=/dev/null bs=100M
 10+0 records in
 10+0 records out
 1048576000 bytes (1.0 GB) copied, 0.281153 s, 3.7 GB/s
 dd if=store of=/dev/null bs=100M
 10+0 records in
 10+0 records out
 1048576000 bytes (1.0 GB) copied, 0.244339 s, 4.3 GB/s
 dd if=store of=/dev/null bs=100M
 10+0 records in
 10+0 records out
 1048576000 bytes (1.0 GB) copied, 0.242583 s, 4.3 GB/s
 
 
 ./run ../store logfile 33 100 500 100
 tx_count[100] records[31397] fdatasyncs[100] read[0.9881029 MB] 
 wrote[1.9762058 MB]
 Time was: 5.012
 19.952114 tx/s, 6264.365 records/s, 19.952114 fdatasyncs/s, 201.87897 kB/s 
 on reads, 403.75793 kB/s on writes
 
 ./run ../store logfile 33 1000 5000 10 
 tx_count[10] records[30997] fdatasyncs[10] read[0.9755144 MB] 
 wrote[1.9510288 MB]
 Time was: 0.604
 16.556292 tx/s, 51319.54 records/s, 16.556292 fdatasyncs/s, 1653.8523 kB/s 
 on reads, 3307.7046 kB/s on writes
 
 ./run ../store logfile 33 1000 5000 100 
 tx_count[100] records[298245] fdatasyncs[100] read[9.386144 MB] 
 wrote[18.772287 MB]
 Time was: 199.116
 0.5022198 tx/s, 1497.8455 records/s, 0.5022198 fdatasyncs/s, 48.270412 kB/s 
 on reads, 96.540825 kB/s on writes
 
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
  r  b   swpd    free   buff   cache   si   so    bi    bo    in     cs us sy id wa
  1  2      0 8541712 336716 3670940    0    0     1     7    12     20  4  1 95  0
  0  2      0 8525712 336716 3670948    0    0     0   979  1653   3186  4  1 60 35
  1  2      0 8525220 336716 3671204    0    0     0  1244  1671   3150  4  1 71 24
  0  2      0 8524724 336716 3671332    0    0     0   709  1517   3302  4  1 65 30
  0  2      0 8524476 336716 3671460    0    0     0  1033  1680  69342  5  7 59 29
  0  2      0 8539168 336716 3671588    0    0     0  1375  1599   3272  3  1 70 25
  1  2      0 8538860 336716 3671716    0    0     0  1157  1594   3097  3  1 72 24
  0  1      0 8541340 336716 3671844    0    0     0  1151  1512   3182  3  2 70 25
  0  1      0 8524812 336716 3671972    0    0     0  1597  1641   3391  4  2 72 22
 
 
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
 
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user


Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so




___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] about two database

2011-04-22 Thread Jim Webber
Hi Jose,

 thank you very much for your answer, but I do not know where I can find
 some examples about the HA setup.

The main wiki page is here:

http://wiki.neo4j.org/content/High_Availability_Cluster

And the (milestone) docs are here:

http://docs.neo4j.org/chunked/milestone/server-ha.html

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Jim Webber
Hi Michael,

 Just in case we're not talking about the same kind of streaming --
 when I think streaming, I think streaming uploads, streaming
 downloads, etc.

I'm thinking chunked transfers. That is, the server starts sending a response
and then eventually terminates it when the whole response has been sent to the
client.

Although it seems a bit rude, the client could simply opt to close the
connection when it's read enough, provided what it has read makes sense.
Sometimes document fragments can make sense:

<results>
   <node id="1234">
      <property name="planet" value="Earth"/>
   </node>
   <node id="1235">
      <property name="planet" value="Mars"/>
   </node>
<!-- client gets bored here and kills the connection missing out on what would
have followed -->
   <node id="1236">
      <property name="planet" value="Jupiter"/>
   </node>
   <node id="1237">
      <property name="planet" value="Saturn"/>
   </node>
</results>

In this case we certainly don't have well-formed XML, but some streaming API
(e.g. StAX) might already have been able to create some local objects on the
client side as the Earth and Mars nodes came in.

I don't think this is elegant at all, but it might be practical. I've asked 
Mark Nottingham for his view on this since he's pretty sensible about Web 
things.

Jim




___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] about two database

2011-04-22 Thread Jose Angel Inda Herrera
Hi Jim,
I am creating a package of algorithms to work on Neo4j databases. For
that I have two graphs (DBs), and what I want to know is how to obtain a
node in graph G1 from G2, but without having an instance of G1.
Can this be done?
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Jim Webber
Hi Georg,

It would at least have to be an iterator over pages - otherwise the results 
tend to be fine-grained and so horribly inefficient for sending over a network.

Jim

On 22 Apr 2011, at 18:24, Georg Summer wrote:

 I might be a little newbish here, but then why not an Iterator?
 The iterator lives on the server and is accessible through the REST
 interface, providing an advance and a value method. It either operates on a
 stored and once-created-stable result set or holds the query and evaluates
 it on demand (issues of changing underlying graph included).
 
 The client can have paginator functionality by advancing and derefing the
 iterator n times, or streaming-like behaviour by constantly pushing the
 obtained data into a queue and keeping going.
 
 If the client does not need the iterator anymore, he simply stops using it
 and a timeout kills it eventually on the server. A client-callable delete
 method for the iterator would work as well.
 
 
 Georg
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Rick Bullotta
SAX (or StAX) is an example of streaming with a higher level format, but there
are plenty of other ways as well.  The *critical* performance element is to
*never* have to accumulate an entire intermediate document on either side (e.g.
a JSON object or XML DOM) if you can avoid it.  You end up requiring 4x the
resources (or more), extra latency, more parsing, and more garbage collection.
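
For example (a sketch assuming Jackson's streaming parser on the client side),
a streamed JSON array can be consumed record by record without ever
materializing the whole document:

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;
import java.io.InputStream;

public class StreamingConsumer {
    public void consume(InputStream body) throws Exception {
        JsonParser parser = new JsonFactory().createJsonParser(body);
        if (parser.nextToken() != JsonToken.START_ARRAY) return;
        while (parser.nextToken() == JsonToken.START_OBJECT) {
            handle(parser.readValueAsTree()); // one record, as soon as it is complete
        }
    }

    private void handle(JsonNode record) {
        System.out.println(record);
    }
}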

I'll get with Jim Webber and propose a prototype of alternatives.

Note also that the lack of binary I/O in the browser without 
flash/java/silverlight is a challenge, but we can work around it.




- Reply message -
From: Michael DeHaan michael.deh...@gmail.com
Date: Fri, Apr 22, 2011 12:18 pm
Subject: [Neo4j] REST results pagination
To: Neo4j user discussions user@lists.neo4j.org

On Thu, Apr 21, 2011 at 5:00 PM, Michael Hunger
michael.hun...@neotechnology.com wrote:
 Really cool discussion so far,

 I would also prefer streaming over paging as with that approach we can give 
 both ends more of the control they need.

Just in case we're not talking about the same kind of streaming --
when I think streaming, I think streaming uploads, streaming
downloads, etc.

If the REST format is JSON (or XML, whatever), that's a /document/, so
you can't just say "read the next (up to) 512 bytes" and work on it.
It becomes a more low-level endeavor because if you're in the middle
of reading a record, or don't even have the end of list terminator,
what you have isn't parseable yet.  I'm sure a lot of hacking could be
done to make the client figure out if he had enough other than the
closing array element, but it's a lot to ask of a JSON client.

So I'm interested in how, in that proposal, the REST API might stream
results to a client, because for the streaming to be meaningful, you
need to be able to parse what you get back and know where the
boundaries are (or build a buffer until you fill in a datastructure
enough to operate on it).

I don't see that working with JSON/REST so much.   It seems to imply a
message bus.

--Michael
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] about two database

2011-04-22 Thread Michael Hunger
It is in principle possible, but what is the issue with having an instance (ro
or rw) of the second db that does the parsing of the store files for you?

Sent from my iBrick4


On 22.04.2011 at 19:22, Jose Angel Inda Herrera
jai...@estudiantes.uci.cu wrote:

 Hi Jim,
 I am creating a package of algorithms to work on Neo4j databases. For
 that I have two graphs (DBs), and what I want to know is how to obtain a
 node in graph G1 from G2, but without having an instance of G1.
 Can this be done?
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Rick Bullotta
That would need to hold resources on the server (potentially for an 
indeterminate amount of time) since it must be stateful.  In general, stateful 
APIs do not scale well in cases of dynamic queries.






- Reply message -
From: Georg Summer georg.sum...@gmail.com
Date: Fri, Apr 22, 2011 1:25 pm
Subject: [Neo4j] REST results pagination
To: Neo4j user discussions user@lists.neo4j.org

I might be a little newbish here, but then why not an Iterator?
The iterator lives on the server and is accessible through the REST
interface, providing an advance and a value method. It either operates on a
stored and once-created-stable result set or holds the query and evaluates
it on demand (issues of changing underlying graph included).

The client can have paginator functionality by advancing and derefing the
iterator n times, or streaming-like behaviour by constantly pushing the
obtained data into a queue and keeping going.

If the client does not need the iterator anymore, he simply stops using it
and a timeout kills it eventually on the server. A client-callable delete
method for the iterator would work as well.


Georg

On 22 April 2011 18:43, Jim Webber j...@neotechnology.com wrote:

 Hi Michael,

  Just in case we're not talking about the same kind of streaming --
  when I think streaming, I think streaming uploads, streaming
  downloads, etc.

 I'm thinking chunked transfers. That is, the server starts sending a
 response and then eventually terminates it when the whole response has been
 sent to the client.

 Although it seems a bit rude, the client could simply opt to close the
 connection when it's read enough, provided what it has read makes sense.
 Sometimes document fragments can make sense:

 <results>
    <node id="1234">
       <property name="planet" value="Earth"/>
    </node>
    <node id="1235">
       <property name="planet" value="Mars"/>
    </node>
 <!-- client gets bored here and kills the connection missing out on what
 would have followed -->
    <node id="1236">
       <property name="planet" value="Jupiter"/>
    </node>
    <node id="1237">
       <property name="planet" value="Saturn"/>
    </node>
 </results>

 In this case we certainly don't have well-formed XML, but some streaming
 API (e.g. StAX) might already have been able to create some local objects on
 the client side as the Earth and Mars nodes came in.

 I don't think this is elegant at all, but it might be practical. I've asked
 Mark Nottingham for his view on this since he's pretty sensible about Web
 things.

 Jim




 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Rick Bullotta
I'll be happy to host the streaming REST API summit.  Ample amounts of beer
will be provided. ;-)


- Reply message -
From: Jim Webber j...@neotechnology.com
Date: Fri, Apr 22, 2011 1:46 pm
Subject: [Neo4j] REST results pagination
To: Neo4j user discussions user@lists.neo4j.org

Hi Georg,

It would at least have to be an iterator over pages - otherwise the results 
tend to be fine-grained and so horribly inefficient for sending over a network.

Jim

On 22 Apr 2011, at 18:24, Georg Summer wrote:

 I might be a little newbish here, but then why not an Iterator?
 The iterator lives on the server and is accessible through the REST
 interface, providing an advance and a value method. It either operates on a
 stored and once-created-stable result set or holds the query and evaluates
 it on demand (issues of changing underlying graph included).

 The client can have paginator functionality by advancing and derefing the
 iterator n times, or streaming-like behaviour by constantly pushing the
 obtained data into a queue and keeping going.

 If the client does not need the iterator anymore, he simply stops using it
 and a timeout kills it eventually on the server. A client-callable delete
 method for the iterator would work as well.


 Georg
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Michael Hunger
And you would want to reuse your connection so you don't have to pay this
penalty per request.

Just asking: what would such a REST resource iterator look like - URI, verbs,
request/response formats?

I assume then every query (index, traversal) would just return the iterator URI
for later consumption. If we store the query and/or result information (as
discussed by Craig and others) at the node returned as iterator, this would be
a nice fit.
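
Purely as a strawman (none of these URIs exist today, everything here is
hypothetical), it could look something like:

POST   /db/data/ext/paged-traversal        (body: traversal description/query)
       -> 201 Created, Location: /db/data/paged-traversal/42
GET    /db/data/paged-traversal/42?page=0&pageSize=100
       -> one page of results plus a "next" link
DELETE /db/data/paged-traversal/42
       -> discard the server-side iterator before its timeout fires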

M
Sent from my iBrick4


On 22.04.2011 at 19:46, Jim Webber j...@neotechnology.com wrote:

 Hi Georg,
 
 It would at least have to be an iterator over pages - otherwise the results 
 tend to be fine-grained and so horribly inefficient for sending over a 
 network.
 
 Jim
 
 On 22 Apr 2011, at 18:24, Georg Summer wrote:
 
 I might be a little newbish here, but then why not an Iterator?
 The iterator lives on the server and is accessible through the REST
 interface, providing an advance and a value method. It either operates on a
 stored and once-created-stable result set or holds the query and evaluates
 it on demand (issues of changing underlying graph included).
 
 The client can have paginator functionality by advancing and derefing the
 iterator n times, or streaming-like behaviour by constantly pushing the
 obtained data into a queue and keeping going.
 
 If the client does not need the iterator anymore, he simply stops using it
 and a timeout kills it eventually on the server. A client-callable delete
 method for the iterator would work as well.
 
 
 Georg
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Error building Neo4j

2011-04-22 Thread Kevin Moore
Maven 2 solved the problem.

I submitted a pull request from https://github.com/kevmoo/community to
explain such in the readme.

On Thu, Apr 21, 2011 at 02:39, Jim Webber j...@neotechnology.com wrote:

 Hi Kevin,

 I can replicate your problem. The way I worked around this was to use Maven
 2.2.1 rather than Maven 3.0.x. Then I get a green build for community
 edition.

 I'll poke the devteam and see what Maven versions they're running on.

 Jim


 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] about two database

2011-04-22 Thread Jose Angel Inda Herrera
On 22/04/11 13:51, Michael Hunger wrote:
 It is in principle possible but what is the issue of having an instance (ro 
 or rw) of the second db that does the parsing of the store files for you?
hi Michael,
if I have an instance of the second db, that means I have a reference
to the 2nd database
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] about delete node

2011-04-22 Thread Jose Angel Inda Herrera
Hi list,
Is there some property that, when I delete a node, marks it as "to be
removed", so that the actual removal is performed when the transaction is
completed?
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] about delete node

2011-04-22 Thread Michael Hunger
Sorry, I'm not sure I follow.

There is just the node.delete() operation, which is committed at the end of the tx.
http://wiki.neo4j.org/content/Delete_Semantics

Do you mean you want to mark a node as to be removed? There is nothing like 
that.
Or do you want a property that tells you that a node has been removed? (You can 
set that yourself prior to deletion).
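
For completeness, a minimal sketch of the semantics (the id and setup are
placeholders):

import org.neo4j.graphdb.*;

public class DeleteExample {
    public static void deleteNode(GraphDatabaseService graphDb, long id) {
        Transaction tx = graphDb.beginTx();
        try {
            Node node = graphDb.getNodeById(id);
            // relationships must go first, see the delete semantics page above
            for (Relationship rel : node.getRelationships()) {
                rel.delete();
            }
            node.delete(); // actually removed when the tx commits
            tx.success();
        } finally {
            tx.finish();
        }
    }
}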

Hope that helps

Michael

On 23.04.2011 at 04:00, Jose Angel Inda Herrera wrote:

 Hi list,
 Is there some property that, when I delete a node, marks it as "to be
 removed", so that the actual removal is performed when the transaction is
 completed?
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-22 Thread Michael Hunger
I spent some time looking at what others are doing for inspiration.

I kind of like the Riak/Basho approach with multipart chunks, and the approach
of explicitly creating a resource for the query that can be navigated (either
via pages or first,next,[prev,last] links) and expires (and could be
reconstructed).

Cheers

Michael

Good discussion: 
http://stackoverflow.com/questions/924472/paging-in-a-rest-collection

CouchDB: 
http://wiki.apache.org/couchdb/HTTP_Document_API
startKey + limit, endKey + limit, sorting, insert/update order

MongoDB: [cursor-id] + batch_size


OrientDB: .../[limit]

Sones: no real REST API, but an SQL-like language on top of the graph:
http://developers.sones.de/documentation/graph-query-language/select/
with limit, offset, but also depth (for the graph)

HBase explicitly creates scanners, which can then be accessed with next
operations, and which expire after no activity for a certain timeout.


riak:
http://wiki.basho.com/REST-API.html
 client-id header for client identification - sticky?
optional query parameters for including properties, and for whether to stream
the data: keys=[true,false,stream]

If “keys=stream”, the response will be transferred using chunked-encoding, 
where each chunk is a JSON object. The first chunk will contain the “props” 
entry (if props was not set to false). Subsequent chunks will contain 
individual JSON objects with the “keys” entry containing a sublist of the total 
keyset (some sublists may be empty).
riak seems to support partial JSON, non-closed elements: -d
'{"props":{"n_val":5'

returns multiple responses in one go, Content-Type: multipart/mixed; 
boundary=YinLMzyUR9feB17okMytgKsylvh

--YinLMzyUR9feB17okMytgKsylvh
Content-Type: application/x-www-form-urlencoded
Link: </riak/test>; rel="up"
Etag: 16vic4eU9ny46o4KPiDz1f
Last-Modified: Wed, 10 Mar 2010 18:01:06 GMT

{bar:baz}
(this block can be repeated n times)
--YinLMzyUR9feB17okMytgKsylvh--
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

Query results:
Content-Type – always multipart/mixed, with a boundary specified
Understanding the response body

The response body will always be multipart/mixed, with each chunk representing 
a single phase of the link-walking query. Each phase will also be encoded in 
multipart/mixed, with each chunk representing a single object that was found. 
If no objects were found or “keep” was not set on the phase, no chunks will be 
present in that phase. Objects inside phase results will include Location 
headers that can be used to determine bucket and key. In fact, you can treat 
each object-chunk similarly to a complete response from read object, without 
the status code.
 HTTP/1.1 200 OK
 Server: MochiWeb/1.1 WebMachine/1.6 (eat around the stinger)
 Expires: Wed, 10 Mar 2010 20:24:49 GMT
 Date: Wed, 10 Mar 2010 20:14:49 GMT
 Content-Type: multipart/mixed; boundary=JZi8W8pB0Z3nO3odw11GUB4LQCN
 Content-Length: 970


--JZi8W8pB0Z3nO3odw11GUB4LQCN
Content-Type: multipart/mixed; boundary=OjZ8Km9J5vbsmxtcn1p48J91cJP

--OjZ8Km9J5vbsmxtcn1p48J91cJP
Content-Type: application/json
Etag: 3pvmY35coyWPxh8mh4uBQC
Last-Modified: Wed, 10 Mar 2010 20:14:13 GMT

{riak:CAP}
--OjZ8Km9J5vbsmxtcn1p48J91cJP--

--JZi8W8pB0Z3nO3odw11GUB4LQCN
Content-Type: multipart/mixed; boundary=RJKFlAs9PrdBNfd74HANycvbA8C

--RJKFlAs9PrdBNfd74HANycvbA8C
Location: /riak/test/doc2
Content-Type: application/json
Etag: 6dQBm9oYA1mxRSH0e96l5W
Last-Modified: Wed, 10 Mar 2010 18:11:41 GMT

{foo:bar}
--RJKFlAs9PrdBNfd74HANycvbA8C--

--JZi8W8pB0Z3nO3odw11GUB4LQCN--
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

Riak - MapReduce:
Optional query parameters:

* chunked – when set to true, results will be returned one at a time in 
multipart/mixed format using chunked-encoding.
Important headers:

* Content-Type – application/json when chunked is not true, 
otherwise multipart/mixed with application/json parts

Other interesting endpoints: /ping, /stats
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user