Re: [Neo4j] Starting neo4j Server doesn't return to prompt

2011-04-19 Thread Peter Neubauer
Stephan,
Could you please send the console output and the content of the data/log dir
for more info?
On Apr 19, 2011 1:02 AM, Stephan Hagemann stephan.hagem...@googlemail.com
wrote:
 Hello group,

 I just realized that since upgrading to Neo4j 1.3 my deployment is broken.
 It seems to be due to the fact that when starting up, the server does not
 return to a prompt (I noticed this locally also - I need to press enter to
 get the prompt). Vlad (the deployment script) thus probably assumes that
the
 startup is not yet finished. I have played with the startup options in the
 neo4j executable, but to no avail. Is anyone else experiencing this, or does
 anyone have any ideas?

 Thanks!
 Stephan
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] How to combine both traversing and index queries?

2011-04-19 Thread Craig Taverner
Another approach to this problem is to consider that an index is actually
structured as a graph (a tree), so if you write the tree into the graph
together with your data model, you can combine the index and the traversal
into a pure graph traversal. Of course, it is insufficient to simply build
both the index tree and the domain model as two graphs that only connect at
the result nodes. You need to build a combined graph that achieves the
purpose of both indexing and domain structure. This is a very domain-specific
thing, so there are no general-purpose solutions. You have to build the graph
to suit your domain.

One approach is to build the domain graph first, then decide why you want
indexing, and without adding lucene (or any external index) to the mix,
think about how to modify the graph to also achieve the same effect.
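
As a rough illustration only (the relationship types, property names and the
letter-bucket scheme below are made up, not a general recipe), a tiny
"first letter of the name" index tree could be written into the graph and
maintained like this:

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;

public class InGraphNameIndex
{
    // hypothetical relationship types for the index tree and its leaves
    static final RelationshipType BUCKET = DynamicRelationshipType.withName( "NAME_BUCKET" );
    static final RelationshipType INDEXES = DynamicRelationshipType.withName( "INDEXES" );

    // attach a person node to the bucket for the first letter of its name
    // (assumes an open transaction)
    public static void index( GraphDatabaseService db, Node indexRoot, Node person )
    {
        String letter = ( (String) person.getProperty( "name" ) ).substring( 0, 1 );
        Node bucket = null;
        for ( Relationship rel : indexRoot.getRelationships( BUCKET, Direction.OUTGOING ) )
        {
            if ( letter.equals( rel.getEndNode().getProperty( "letter" ) ) )
            {
                bucket = rel.getEndNode();
                break;
            }
        }
        if ( bucket == null )
        {
            bucket = db.createNode();
            bucket.setProperty( "letter", letter );
            indexRoot.createRelationshipTo( bucket, BUCKET );
        }
        bucket.createRelationshipTo( person, INDEXES );
    }
}

Finding everyone whose name starts with "V" is then just a short traversal
from the index root through the "V" bucket, which composes naturally with
whatever domain traversal follows from there.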

On Mon, Apr 18, 2011 at 8:54 PM, Willem-Paul Stuurman 
w.p.stuur...@knollenstein.com wrote:

 Hi Ville,

 We ran into a similar problem: basically we wanted to search only part of the
 graph using Lucene. We use a traversal to determine the nodes to search from,
 and from there on we use Lucene to search the nodes connected to the nodes
 from the traversal result.

 We solved it as follows:
 - defined a TransactionEventHandler to auto-update the indexes with node
 properties, but also to add relationships to the same index. We use
 relationship.name() as the property name for Lucene, with the 'other node'
 id as the value (see the sketch right after this list).
 - traverse to get the set of nodes to search from. We apply the ACL
 here to only return nodes the user is allowed to see.
 - create a BooleanQuery for Lucene with the relationship.name() field
 names and ids. So if the relationship is 'IS_FRIEND_OF' and we want
 to do a full text search for 'trinity' on friends of people with ids 1, 2 and
 3, we create a query that contains: +(name:trinity) +(isfriendof:1
 isfriendof:2 isfriendof:3)
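
 Roughly, the handler looks something like this (a sketch against the 1.3
 embedded API; the index name 'em' is made up here, the lower-casing of the
 relationship name is just our convention):

 import org.neo4j.graphdb.GraphDatabaseService;
 import org.neo4j.graphdb.Node;
 import org.neo4j.graphdb.Relationship;
 import org.neo4j.graphdb.event.PropertyEntry;
 import org.neo4j.graphdb.event.TransactionData;
 import org.neo4j.graphdb.event.TransactionEventHandler;
 import org.neo4j.graphdb.index.Index;

 public class AutoIndexHandler implements TransactionEventHandler<Void>
 {
     private final GraphDatabaseService graphDb;

     public AutoIndexHandler( GraphDatabaseService graphDb )
     {
         this.graphDb = graphDb;
     }

     public Void beforeCommit( TransactionData data ) throws Exception
     {
         Index<Node> index = graphDb.index().forNodes( "em" );
         // keep node properties searchable
         for ( PropertyEntry<Node> entry : data.assignedNodeProperties() )
         {
             index.add( entry.entity(), entry.key(), entry.value() );
         }
         // index each created relationship under its (lower-cased) name,
         // with the id of the node on the other end as the value
         for ( Relationship rel : data.createdRelationships() )
         {
             String field = rel.getType().name().toLowerCase().replace( "_", "" );
             index.add( rel.getStartNode(), field, rel.getEndNode().getId() );
         }
         return null;
     }

     public void afterCommit( TransactionData data, Void state ) { }
     public void afterRollback( TransactionData data, Void state ) { }
 }

 // registered once at startup:
 // graphDb.registerTransactionEventHandler( new AutoIndexHandler( graphDb ) );

 (A real handler also has to remove stale index entries when properties or
 relationships change; that part is left out here.)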

 To make sure we only get back 'person' nodes we also indexed the node type
 (in our case 'emtype'), so the complete query is:
 +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3)

 This way you can easily traverse to define the 'edges' of where to search
 and let Lucene handle the search within that region.

 Optionally we add the ACL to the Lucene query as well, using the same
 technique: basically adding all group ids that the current user is a member of
 and that have a 'CAN_ACCESS' relationship with the node:
 +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3)
 +(canaccess:233 canaccess:254 canaccess:324)
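
 In code the query can be assembled roughly like this (a sketch using Lucene
 3.x classes; the index name 'em' is made up):

 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.BooleanClause.Occur;
 import org.apache.lucene.search.BooleanQuery;
 import org.apache.lucene.search.TermQuery;

 BooleanQuery friends = new BooleanQuery();
 friends.add( new TermQuery( new Term( "isfriendof", "1" ) ), Occur.SHOULD );
 friends.add( new TermQuery( new Term( "isfriendof", "2" ) ), Occur.SHOULD );
 friends.add( new TermQuery( new Term( "isfriendof", "3" ) ), Occur.SHOULD );

 BooleanQuery query = new BooleanQuery();
 query.add( new TermQuery( new Term( "emtype", "person" ) ), Occur.MUST );
 query.add( new TermQuery( new Term( "name", "trinity" ) ), Occur.MUST );
 query.add( friends, Occur.MUST );

 for ( Node hit : graphDb.index().forNodes( "em" ).query( query ) )
 {
     // hit is a person named trinity who is a friend of node 1, 2 or 3
 }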

 It works for us because in our case we know the traversal will return a
 reasonable set of nodes (not thousands+). Lucene can return thousands of
 nodes, but that's not a problem of course. And we can still use the fun
 stuff like sorting, paging and scored results.

 Hope this helps.

 Cheers

 Paul


 PS: we always use lower case field names without underscores because
 somehow it makes Lucene happier


 On 18 apr 2011, at 11:19, Mattias Persson wrote:

  2011/4/18 Michael Hunger michael.hun...@neotechnology.com:
  Would it also be possible to go the other way round?
 
  E.g. have the index results (name:Vil*) as the starting point and traverse
 backwards the two steps to your start node? (Either using a traversal or the
 shortest-path graph algo with a maximum path length)?
 
  That's what I suggested, but it doesn't exist yet :)
 
  To do it that way today (do a traversal from each and every index
  result) would probably be slower than doing one traversal with
  filtering.
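 
  Roughly, the per-hit variant would look like this today (just a sketch;
  the index name, property name and reference node are assumptions):
 
  PathFinder<Path> twoSteps =
          GraphAlgoFactory.shortestPath( Traversal.expanderForAllTypes(), 2 );
  for ( Node hit : graphDb.index().forNodes( "people" ).query( "name", "Vil*" ) )
  {
      if ( twoSteps.findSinglePath( referenceNode, hit ) != null )
      {
          // hit is named Vil* and within two steps of the reference node
      }
  }
 
  i.e. one index query plus one bounded path search per hit.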
 
 
  Cheers
 
  Michael
 
  Am 18.04.2011 um 11:03 schrieb Mattias Persson:
 
  Hi Ville,
 
  2011/4/14 Ville Mattila vi...@mattila.fi:
  Hi there,
 
  I am somewhat stuck with the problem of combining traversal and index
 queries efficiently - something like finding all people with a name
 starting with Vil* two steps away from a reference node.
 
  Traversing all friends within two steps from the reference node is
 trivial, but I find it a bit inefficient to apply a return evaluator
 at each of the nodes visited during the traversal. Or is it? What about
 more complex criteria that may involve more than one property, or even
 more complex (Lucene) queries?
 
 
  The best solution IMHO (one that isn't available yet) would be to let
  a traversal have multiple starting points, that is, to use the index
  result as the starting points.
 
  I think that doing a traversal and filtering with an evaluator is the
  way to go. Have you tried doing this and seen bad performance?
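 
  In code, something along these lines (a sketch against the 1.3 traversal
  framework; the property name and the reference node are assumptions):
 
  for ( Node person : Traversal.description()
          .breadthFirst()
          .evaluator( new Evaluator()
          {
              public Evaluation evaluate( Path path )
              {
                  String name = (String) path.endNode().getProperty( "name", "" );
                  boolean match = path.length() > 0 && name.startsWith( "Vil" );
                  if ( path.length() >= 2 )
                  {
                      // at maximum distance: stop expanding further
                      return match ? Evaluation.INCLUDE_AND_PRUNE
                                   : Evaluation.EXCLUDE_AND_PRUNE;
                  }
                  return match ? Evaluation.INCLUDE_AND_CONTINUE
                               : Evaluation.EXCLUDE_AND_CONTINUE;
              }
          } )
          .traverse( referenceNode )
          .nodes() )
  {
      // person is named Vil* and at most two steps from the reference node
  }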
 
  I was thinking of spicing up my Neo4j setup with Elasticsearch
  (www.elasticsearch.org), dedicating Neo4j to keeping track of the
  relationships and ES to indexing all the data in them; however, it makes
  me feel very uncomfortable to keep up the consistency when data gets
  updated. Then again, now I would also need to keep the Neo4j indices
  updated. And needless to say, combining traversal and an external index is
  yet more complicated. However I like 

Re: [Neo4j] Wiki documentation neo4j+restfulie.

2011-04-19 Thread Jim Webber
Hi José,

Please feel free to add to the wiki. We've had a problem with spammers 
recently, so if you run into permissions problems please shout.

Jim

On 22 Mar 2011, at 20:19, jdbjun...@gmail.com wrote:

 Hi, going through the neo4j documentation I found some examples of how to
 access the neo4j API using two REST libraries (rest-client, neography).
 After reading them, I've decided to do the same tests using the restfulie
 library, of which I'm a committer.
 Am I allowed to change the wiki to add the restfulie example? If it is ok,
 is anyone willing to review it before I change the wiki?
 
 Thanks,
 José Donizetti.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-19 Thread Jim Webber
Hi Javier,

I've just checked and that's in our list of stuff we really should do because 
it annoys us that it's not there.

No promises, but we do intend to work through at least some of that list for 
the 1.4 releases.

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-19 Thread Saikat Kanjilal

I'd like to propose that we put this functionality into the plugin 
(https://github.com/skanjila/gremlin-translation-plugin) that Peter and I are 
currently working on, thoughts?
 From: j...@neotechnology.com
 Date: Tue, 19 Apr 2011 15:25:20 +0100
 To: user@lists.neo4j.org
 Subject: Re: [Neo4j] REST results pagination
 
 Hi Javier,
 
 I've just checked and that's in our list of stuff we really should do 
 because it annoys us that it's not there.
 
 No promises, but we do intend to work through at least some of that list for 
 the 1.4 releases.
 
 Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-19 Thread Javier de la Rosa
On Tue, Apr 19, 2011 at 10:32, Saikat Kanjilal sxk1...@hotmail.com wrote:
 I'd like to propose that we put this functionality into the plugin 
 (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I are 
 currently working on, thoughts?

+1

 From: j...@neotechnology.com
 I've just checked and that's in our list of stuff we really should do 
 because it annoys us that it's not there.
 No promises, but we do intend to work through at least some of that list for 
 the 1.4 releases.

It would be great to see this feature in 1.4 :-)



-- 
Javier de la Rosa
http://versae.es
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] Neo4j/Spatial and Scala

2011-04-19 Thread Christopher Schmidt
Hi all,

I am evaluating the advantages of using Neo4j and its spatial extension. For
testing I have extended (forked) neo4j-scala with some spatial
convenience methods, so that something written in Java like:

SpatialDatabaseService db = new SpatialDatabaseService( graphDb() );
EditableLayer layer = (EditableLayer) db.getOrCreateEditableLayer( "test" );
SpatialDatabaseRecord record = layer.add(
    layer.getGeometryFactory().createPoint( new Coordinate( 15.3, 56.2 ) ) );

Will be converted to:

class Neo4jTest extends Neo4jSpatialWrapper with
EmbeddedGraphDatabaseServiceProvider with SpatialDatabaseServiceProvider {
  def neo4jStoreDir = NEO4J_STORE_DIR

  withLayer(getOrCreateEditableLayer("test")) {
    implicit layer =>
      val myRecord = add newPoint ((15.3, 56.2))
  }
}

Please refer to the github (https://github.com/FaKod/neo4j-scala) Readme or the
test cases for more examples.

Since we will try (if we have enough time) to extend Neo4j Scala, I would
love to get some comments from the Scala enthusiasts on this list
(I hope there are some ;-). Now or later, here or to my email address.
What do you think? Does it help? Is the way we use the traits and
the implicits OK? How should we do POPO to Node serialization? With annotations
like those in jo4neo?

Regards
-- 
Christopher
twitter: @fakod
blog: http://blog.fakod.eu
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST API thoughts/questions/feedback

2011-04-19 Thread Jacob Hansson
Hey Michael, big thanks again for taking the time to write down your
experiences working with the REST API.

See inline response.

On Mon, Apr 18, 2011 at 4:10 PM, Michael DeHaan michael.deh...@gmail.com wrote:

 Hi all.

 I've been working recently on writing a Perl binding for the Neo4j
 REST API and thought I'd share some observations, and hopefully can
 get a few suggestions on some things.  You can see some of the Perl
 work in progress here -- https://github.com/mpdehaan/Elevator (search
 for *Neo*.pm).  Basically it's a data layer that allows objects to be
 plugged between SQL, NoSQL (Riak and Mongo so far) and Neo4j.

 The idea is we can build data classes and just call commit() and the
 like on them, though if the class is backed by Neo4j obviously we'll
 be able to add links between them, query the links, and so forth.  I'm
 still working on that.

 Basically the REST API *is* working for me, but here are my
 observations about it:

 (1)  I'd like to be able to specify the node ID of a node
 before I create it.  I like having a primary key, as I can do with
 things like Mongo and Riak.  If I do not have a primary key, I have to
 search before I add, upserts become difficult, as do deletions, and
 I have to worry about which copy of a given object is authoritative.
 I understand this can't work for everyone but seems like it would be
 useful. If that can be done now, I'd love info on how to!


I think the current standard approach to key/value storage is, like you
mention, to store unique keys in an index. This does mean you have to build
upsert abstractions yourself, always doing an index lookup before inserts
or updates.
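
In embedded Java terms the pattern looks roughly like this (a sketch; the
index name and key are made up, and without additional locking two concurrent
writers could still both create the node):

Index<Node> byKey = graphDb.index().forNodes( "byKey" );
Transaction tx = graphDb.beginTx();
try
{
    Node node = byKey.get( "key", "user-42" ).getSingle();
    if ( node == null )
    {
        // not there yet: create and index it
        node = graphDb.createNode();
        node.setProperty( "key", "user-42" );
        byKey.add( node, "key", "user-42" );
    }
    node.setProperty( "name", "Michael" );   // the actual upsert payload
    tx.success();
}
finally
{
    tx.finish();
}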

As far as allowing neo4j clients to set ids for nodes, I think the problems
that would create (for instance in High Availability setups, where each slave
gets a set of ids it can assign) seem like they would outweigh the
benefits.



 (2)  I'd like a delete_all kind of API to be able to delete all nodes
 matching particular criteria, versus having to do a search.   For
 instance, I may need to rebuild the database, or part of it, and it's
 not clear how to drop it.   Also, is there a way to drop the entire
 database via REST?


This feels like a two-part idea, both of which I like :)

First, the ability to do manipulating operations like deleting and/or
editing data on a large scale, without having to pull down each node over
http, would be awesome. There is talk about putting together a query
language, and that could potentially be outfitted to do mutating operations,
similar to how SQL was extended to do that. Will definitely keep this in
mind!

Second, the ability to nuke the database I think is a great thing to have
in a development environment. A feature we're discussing is the ability to
have multiple databases running in each neo4j server, allowing you to nuke
and create databases as appropriate.

For a faster fix, take a look at Michael Hunger's db-nuker plugin:
https://github.com/jexp/neo4j-clean-remote-db-addon


 (3)  I'd like to be able to have the key of the node automatically
 added to the index without having to make a second call.  Ideally I'd
 like to be able to configure the server to auto-index certain fields,
 which is something some of the NoSQL/search tools offer. Similarly,
 when updating the node, the index should auto update without an
 explicit call to the indexer.


Agreed, auto-indexing would be *awesome*. There are some hard problems
related to doing auto indexing *well* that need to be solved first, but this
is something that I really hope we will end up implementing.



 (4)  The capability to do an upsert would be very useful: create a
 node if none exists for the given key; if one does, update it.


Like I said above, the current approach I think is to put this logic on the
client side, which is slower, but the logic for doing this without
user-defined key-value style ids would potentially be very complex. I might
be wrong, but my gut feeling is that we can't do this well if we don't
have user-defined ids.



 (5)   It seems the indexes are the only means of search?   If I need
 to search on a field that isn't indexed (say in production, I need to
 add a new index), how do I go about adding all the nodes that need to
 be in that index *to* that index?   It seems I'd need to
 be keeping at least an index of all nodes of a given type all along,
 so I could at least iterate over those?


The main means of searching the graph structure inside a neo4j database is
by traversing it. Basically, you write a description for how to travel the
graph and what data to return, and then you get a list of nodes, a list of
relationships or a list of paths back, depending on what you asked for.

The indexes are currently mainly used for simple lookups and for finding
starting points for traversals.

See http://components.neo4j.org/neo4j-server/milestone/rest.html#Traverse



 I think most of the underlying questions/problems I have are that I'm
 

Re: [Neo4j] REST results pagination

2011-04-19 Thread Jim Webber
 I'd like to propose that we put this functionality into the plugin 
 (https://github.com/skanjila/gremlin-translation-plugin) that Peter and I 
 are currently working on, thoughts?

I'm thinking that, if we do it, it should be handled through content 
negotiation. That is, if you ask for application/atom then you get paged lists 
of results. I don't necessarily think that's a plugin; it's more likely part of 
the representation logic in the server itself.

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] WebCrawler-Data in Neo4j

2011-04-19 Thread Marc Seeger
Hey,
I'm currently thinking about how my current data (in mysql + solr)
would fit into Neo4j.

In one of my documents, there are the 3 types of data I have:

1. Properties that have high cardinality: e.g. the domain name
(www.example.org, unique), the subdomain name (www.), the
host-name (example)
2. A bunch of numbers (the website latency (1244ms), the amount of
incoming links (e.g. 2321))
3. A number of 'tags' that have a relatively low cardinality (100).
Things like the webserver (apache), the country (germany)

As for the model, I think it would be something like this:
- Every domain gets a node
- #1 would be modeled as a property on the domain node
- #2 would probably be put into a lucene index so I can sort on it later on
- #3 could be modeled using relationships. E.g. a node that has two
properties: type:webserver and name:apache. All of the domain nodes
can then have a 'runs on' relationship to that webserver node

Does this make sense?
I am used to Document DBs, relational DBs and Column Stores, but Graph
DBs are still pretty new to me and I don't think I got the model 100%
:)

Using this model, would I be able to filter subsets of the data, such
as all domains that run on apache, are in Germany, and have more
than 200 incoming links, sorted by the number of links?
I played a bit around with the neography gem in Ruby and I could do stuff like:

germany_nginx = germany_node.shortest_path_to(websrv_nginx).depth(2).nodes

But I couldn't figure out how to expand this query
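
For what it's worth, in embedded Java terms the filter could be expressed
roughly like this (a sketch only; the relationship types and property names
are assumptions based on the model above):

RelationshipType RUNS_ON = DynamicRelationshipType.withName( "RUNS_ON" );
RelationshipType LOCATED_IN = DynamicRelationshipType.withName( "LOCATED_IN" );

List<Node> result = new ArrayList<Node>();
for ( Relationship rel : apacheNode.getRelationships( RUNS_ON, Direction.INCOMING ) )
{
    Node domain = rel.getStartNode();
    boolean inGermany = false;
    for ( Relationship loc : domain.getRelationships( LOCATED_IN, Direction.OUTGOING ) )
    {
        if ( loc.getEndNode().equals( germanyNode ) )
        {
            inGermany = true;
            break;
        }
    }
    int links = (Integer) domain.getProperty( "incomingLinks", 0 );
    if ( inGermany && links > 200 )
    {
        result.add( domain );
    }
}
// sort by number of incoming links, descending
Collections.sort( result, new Comparator<Node>()
{
    public int compare( Node a, Node b )
    {
        return ( (Integer) b.getProperty( "incomingLinks", 0 ) )
             - ( (Integer) a.getProperty( "incomingLinks", 0 ) );
    }
} );

The numeric part (more than 200 links, sorted) would also fit a Lucene
numeric range query with sorting, in line with what you planned for #2.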

Looking forward to the feedback!
Marc



-- 
Pessimists, we're told, look at a glass containing 50% air and 50%
water and see it as half empty. Optimists, in contrast, see it as half
full. Engineers, of course, understand the glass is twice as big as it
needs to be. (Bob Lewis)
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST API thoughts/questions/feedback

2011-04-19 Thread Michael DeHaan
On Tue, Apr 19, 2011 at 10:48 AM, Jacob Hansson ja...@voltvoodoo.com wrote:
 Hey Michael, big thanks again for taking the time to write down your
 experiences working with the REST API.

 See inline response.

Thanks for the follow-up.  That's quite helpful and lets me know I'm
not doing the unique-key implementation in too much of
a non-idiomatic way.

I'll get back to you about doc fixes, and should the bindings
materialize further, I'll share some examples.

--Michael
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Strange performance difference on different machines

2011-04-19 Thread Bob Hutchison
Hi Tobias,

On 2011-04-19, at 1:48 AM, Tobias Ivarsson wrote:

 Hi Bob,
 
 What happens here is that you perform a tiny operation in each transaction,
 so what you are really testing here is how fast your file system can flush,
 because with such tiny transactions all of the time is going to be spent in
 transactional overhead (i.e. flushing transaction logs to the disk).
 
 The reason you see such large differences between Mac OS X and Linux is
 because Mac OS X cheats. Flushing a file (fdatasync) on Mac does pretty much
 nothing. The only thing Mac OS X guarantees is that it will write the data
 that you just flushed before it writes the next data block you flush, so
 called ordered writes. This means that you could potentially get data-loss
 on hard failure, but never in a way that makes your data internally
 inconsistent.

Okay, that makes some sense. Thanks for the information.

 
 So to give a short answer to your questions:
 1) The linux number is reasonable, Mac OS X cheats.
 2) What you are testing is the write speed of your disk for writing small
 chunks of data.

So you're thinking that 16 or 17 writes per second is what should be expected?
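
For comparison, batching many of those tiny operations into one transaction
should amortise the flush cost; in embedded Java terms, a sketch (sizes and
names made up):

Transaction tx = graphDb.beginTx();
try
{
    // all 50,000 pairs in one transaction
    // (in practice you'd commit in chunks of a few thousand)
    for ( int i = 0; i < 50000; i++ )
    {
        Node a = graphDb.createNode();
        Node b = graphDb.createNode();
        a.createRelationshipTo( b, DynamicRelationshipType.withName( "RELATED" ) );
    }
    tx.success();
}
finally
{
    tx.finish();
}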

Cheers,
Bob

 
 Cheers,
 Tobias
 
  On Mon, Apr 18, 2011 at 10:57 PM, Bob Hutchison
  hutch-li...@recursive.ca wrote:
 
 Hi,
 
 Using Neo4j 1.3 and the Borneo (Clojure) wrapper I'm getting radically
 different performance numbers with identical test code.
 
  The test is simple-minded: create two nodes and a relationship between them.
 No properties, no indexes, all nodes and relations are different.
 
  On OS X, it takes about 50s to perform that operation 50,000 times, and 0.8s
  to do it 500 times. It uses roughly 30-40% of one core to do this.
 
 On linux it takes about 30s to perform that operation 500 times. The CPU
 usage is negligible (really negligible... almost none).
 
 I cannot explain the difference in behaviour.
 
 I have two questions:
 
  1) is either of these a reasonable number? I'm hoping the OS X numbers are
  not too fast.
 
 2) any ideas as to what might be the cause of this?
 
 The Computers are comparable. The OS X is a 2.8 GHz i7, the linux box is a
 3.something GHz Xeon (I don't remember the details).
 
 Thanks in advance for any help,
 Bob
 
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
 
 
 
 
 -- 
 Tobias Ivarsson tobias.ivars...@neotechnology.com
 Hacker, Neo Technology
 www.neotechnology.com
 Cellphone: +46 706 534857
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user


Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so




___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Strange performance difference on different machines

2011-04-19 Thread Rick Bullotta
I sure hope not! That's crazy slow, even with one transaction per operation...

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Bob Hutchison
Sent: Tuesday, April 19, 2011 4:11 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Strange performance difference on different machines

Hi Tobias,

On 2011-04-19, at 1:48 AM, Tobias Ivarsson wrote:

 Hi Bob,
 
 What happens here is that you perform a tiny operation in each transaction,
 so what you are really testing here is how fast your file system can flush,
 because with such tiny transactions all of the time is going to be spent in
 transactional overhead (i.e. flushing transaction logs to the disk).
 
 The reason you see such large differences between Mac OS X and Linux is
 because Mac OS X cheats. Flushing a file (fdatasync) on Mac does pretty much
 nothing. The only thing Mac OS X guarantees is that it will write the data
 that you just flushed before it writes the next data block you flush, so
 called ordered writes. This means that you could potentially get data-loss
 on hard failure, but never in a way that makes your data internally
 inconsistent.

Okay, that makes some sense. Thanks for the information.

 
 So to give a short answer to your questions:
 1) The linux number is reasonable, Mac OS X cheats.
 2) What you are testing is the write speed of your disk for writing small
 chunks of data.

 So you're thinking that 16 or 17 writes per second is what should be expected?

Cheers,
Bob

 
 Cheers,
 Tobias
 
  On Mon, Apr 18, 2011 at 10:57 PM, Bob Hutchison
  hutch-li...@recursive.ca wrote:
 
 Hi,
 
 Using Neo4j 1.3 and the Borneo (Clojure) wrapper I'm getting radically
 different performance numbers with identical test code.
 
  The test is simple-minded: create two nodes and a relationship between them.
 No properties, no indexes, all nodes and relations are different.
 
  On OS X, it takes about 50s to perform that operation 50,000 times, and 0.8s
  to do it 500 times. It uses roughly 30-40% of one core to do this.
 
 On linux it takes about 30s to perform that operation 500 times. The CPU
 usage is negligible (really negligible... almost none).
 
 I cannot explain the difference in behaviour.
 
 I have two questions:
 
  1) is either of these a reasonable number? I'm hoping the OS X numbers are
  not too fast.
 
 2) any ideas as to what might be the cause of this?
 
 The Computers are comparable. The OS X is a 2.8 GHz i7, the linux box is a
 3.something GHz Xeon (I don't remember the details).
 
 Thanks in advance for any help,
 Bob
 
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
 
 
 
 
 -- 
 Tobias Ivarsson tobias.ivars...@neotechnology.com
 Hacker, Neo Technology
 www.neotechnology.com
 Cellphone: +46 706 534857
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user


Bob Hutchison
Recursive Design Inc.
http://www.recursive.ca/
weblog: http://xampl.com/so




___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] REST results pagination

2011-04-19 Thread Javier de la Rosa
On Tue, Apr 19, 2011 at 10:25, Jim Webber j...@neotechnology.com wrote:
 I've just checked and that's in our list of stuff we really should do 
 because it annoys us that it's not there.
 No promises, but we do intend to work through at least some of that list for 
 the 1.4 releases.

If this finally gets developed, will it be possible to request all
nodes and all relationships at some URL?


 Jim
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user



--
Javier de la Rosa
http://versae.es
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user