Re: [Neo4j] Fwd: Sync databases

2011-09-14 Thread Xavier Shay
> From: *Eddy Respondek* 
> Date: Wed, Sep 14, 2011 at 11:30 AM
> To: gremlin-us...@googlegroups.com
>
>
> This may be a little off topic but maybe someone has done something similar
> before.
>
> Basically I have a separate WordPress site (PHP/MySQL) which I've been
> extending significantly, and I've set up another server on the same network
> for graph db testing (neo4j/tinkerpop/python-bulbs). I'm confident with my
> graph setup now and would like to attempt to get something small into
> development so I can monitor the results. I want to do a simple "like"
> relationship between users and articles.
>
> That means I need to keep an identical index of user IDs and article IDs in
> the graph db. I know how to update the IDs when a user or article is
> created, deleted, etc. What I don't know is the correct way to ensure data
> integrity in case something goes wrong, such as the graph db server
> crashing.
>
> Does anyone have any thoughts on the best way to do this?
>
I store the last loaded IDs as properties on the root node, then every time
my sync script runs it loads everything from those IDs forward, checking
as it goes that it doesn't create duplicates. It's not the fastest way, but
it's robust. If the graph DB becomes corrupt, you roll back to the latest
backup and rerun the sync.

(We have the advantage that our data is immutable. You'll need some extra
changes if that isn't the case for you, but you can use the same general
technique.)
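
To make the idea concrete, here is a minimal sketch of that loop in Python, using in-memory stand-ins for both databases. Everything named here (`Graph`, `sync`, `last_loaded_id`, the row layout) is hypothetical and only for illustration; a real script would read rows from MySQL and create nodes through whichever Neo4j client you use.

```python
# Incremental, re-runnable sync keyed on the last loaded ID.
# All names are illustrative stand-ins, not a real Neo4j or MySQL API.

class Graph:
    """Toy graph store: a root-node property map plus nodes keyed by source ID."""

    def __init__(self):
        self.root = {"last_loaded_id": 0}   # high-water mark kept on the "root node"
        self.nodes = {}                     # source ID -> node properties


def sync(source_rows, graph):
    """Load every row with an ID above the stored mark, skipping duplicates."""
    created = 0
    last = graph.root["last_loaded_id"]
    for row in sorted(source_rows, key=lambda r: r["id"]):
        if row["id"] <= last:
            continue                        # loaded by an earlier run
        if row["id"] not in graph.nodes:    # duplicate check makes reruns safe
            graph.nodes[row["id"]] = {"name": row["name"]}
            created += 1
        # Advance the mark only after the node exists.
        graph.root["last_loaded_id"] = row["id"]
    return created


rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
graph = Graph()
first_run = sync(rows, graph)               # loads both rows
rows.append({"id": 3, "name": "carol"})
second_run = sync(rows, graph)              # loads only the new row
```

Because the mark lives in the graph itself, restoring a backup restores the mark along with the nodes, so a rerun picks up from wherever the backup left off.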

Xavier
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Returning arbitrary data from gremlin

2011-09-11 Thread Xavier Shay
On Sun, Sep 11, 2011 at 9:26 AM, espeed  wrote:

> Hi Xavier -
>
> If you would, provide a more detailed description of what the query is
> trying to do.
>
We have a primary node type, which has outbound connections to secondary
node types.
A primary node will never have a direct connection to another primary node,
but it can be "related" if they both have an outbound connection to the same
secondary node.

This query is pulling out all "related" edges to a given depth, and
returning those so that the graph can be visualized. As well as the edges,
for the visualization I also need properties stored on the nodes (they have
different types, for instance).


>
> You might look at the Gremlin aggregate, scatter, and gather methods (I
> thought there was also a collect() method, but I don't see it in the docs;
> Marko, isn't/wasn't that a method?).
>
I couldn't figure out from the docs how to apply these methods to my
problem. I tried a few variations, but none behaved as I needed.


> Here are some docs that might be useful:
>
> Gremlin: Depth First vs. Breadth First
>   https://github.com/tinkerpop/gremlin/wiki/Depth-First-vs.-Breadth-First
>
> [Gremlin] The Generalize AsPipe model - back(), loop(), and table()
>   https://groups.google.com/d/topic/gremlin-users/5ujhy2bMKMI/discussion

loop() makes sense to me; the above code avoids it for performance reasons.
I played around with table quite a bit (reading both the wiki and the Neo4j
REST docs, including the one Peter linked) but couldn't make it fit what I
want to do.

If it was possible to create a table outside of a pipe (where I am doing
t.collect in my code), I think that would be helpful, but I couldn't figure
out the API (I couldn't even find Table in the Gremlin source code ... where
is it?).

> And to make things cleaner, you can also create user-defined steps:
>   https://github.com/tinkerpop/gremlin/wiki/User-Defined-Steps.
>
> For example, here's a user-defined step that builds a tree:
>   https://gist.github.com/1197179

I didn't know about these, will investigate.

Thanks everyone,
Xavier


[Neo4j] Returning arbitrary data from gremlin

2011-09-11 Thread Xavier Shay
Hello,
I've been playing with Gremlin over the weekend - thanks to all the help
provided already on other threads, it has been really useful.

I'm trying to optimize a query that currently takes three HTTP requests
(index fetch, traverse, batch get for node data) by writing a Gremlin script
and executing that over the REST API. I have it working, but it's somewhat
clunky in the way it gets the data out:

  t = [];
  start = g.v(3);
  for (i in 1..5) {
    start = start.
      outE.except(t).sideEffect { t.add(it) }.inV.
      inE.except(t).sideEffect { t.add(it) }.outV;
  };
  start >> -1;
  t = t.collect {
    outN = it.outV.toList().get(0);
    inN  = it.inV.toList().get(0);
    [outN.type, outN.key, inN.type, inN.key].join(",");
  };
  t.join(":")

(The sideEffect bit is to ensure breadth first ordering - trying to copy
the behavior of traverse/relationships)
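
For readers who don't know Gremlin, the traversal can be approximated level by level in plain Python. This is only a sketch of the idea, not a line-for-line port: the edge list, the node names, and the `related_edges` helper are all made up, and it assumes every edge points from a primary node to a secondary node and appears once in the list.

```python
# Edges are (primary, secondary) pairs, always pointing primary -> secondary.
# The data and names are made up for illustration.

def related_edges(edges, start, depth):
    """Collect edges reachable from `start`, one out/in round per depth level."""
    seen = []                               # edges in the order they were first visited
    seen_set = set()
    frontier = {start}                      # primary nodes at the current level
    for _ in range(depth):
        # Step out: unseen edges leaving the frontier's primary nodes.
        out_edges = [e for e in edges if e[0] in frontier and e not in seen_set]
        seen.extend(out_edges)
        seen_set.update(out_edges)
        secondaries = {e[1] for e in out_edges}
        # Step back in: unseen edges arriving at those secondary nodes.
        in_edges = [e for e in edges if e[1] in secondaries and e not in seen_set]
        seen.extend(in_edges)
        seen_set.update(in_edges)
        frontier = {e[0] for e in in_edges}
    return seen


edges = [("a", "x"), ("b", "x"), ("b", "y"), ("c", "y"), ("d", "z")]
found = related_edges(edges, "a", 2)
```

Tracking already-visited edges plays the same role as `except(t)` in the script: each edge is reported once, at the level where it is first reached.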

I feel I should be able to use the Table object, but couldn't figure it out
(is there explicit API documentation for it anywhere?)
Ideally, I'd like to return a custom JSON object, something like:
[{"type": outN.type, "key": outN.key}, {"type": inN.type, "key": inN.key}]

...but I couldn't see an easy way of constructing that.
What I have already is trivial to parse, so it's not a big deal, but it's a
bit messy.
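
In the meantime, the client-side parse might look like the following Python sketch. It assumes the script returns exactly the string built above (fields joined with "," and edges joined with ":") and that no type or key contains either separator. The `parse_edges` name and the sample string are made up.

```python
import json


def parse_edges(result):
    """Turn 'type,key,type,key:type,key,type,key' into a list of node pairs."""
    edges = []
    if not result:
        return edges                        # an empty string means no edges
    for chunk in result.split(":"):
        out_type, out_key, in_type, in_key = chunk.split(",")
        edges.append([
            {"type": out_type, "key": out_key},
            {"type": in_type, "key": in_key},
        ])
    return edges


sample = "user,u1,tag,t1:user,u2,tag,t1"    # made-up response body
as_json = json.dumps(parse_edges(sample))
```

Each edge becomes a two-element list of `{"type", "key"}` objects, which is the shape of the JSON I was hoping to get straight from the server.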

Cheers,
Xavier


Re: [Neo4j] Aggregate queries

2011-09-10 Thread Xavier Shay
Reporting back, I came up with the following query that does what I need:
m = [:];
g.idx("user").get("key", "%query%*")._()[0..5].sideEffect { x = it.key }.
  out.in.uniqueObject.loop(3) { it.loops < 2 }.sideEffect {
    if (!m[x]) { m[x] = 0; };
    m[x]++
  } >> -1;
m;

* I didn't use mapWithDefault because I'm sending via the REST API and want
to be able to see the values of the map in the result: "{1=1, 2=1, 3=8,
4=1}"
* The out.in is a feature of our data (each user node is always connected to
other user nodes via an intermediary node)
* The [0..5] allows me to page through the index to avoid one massive query.
* I'm thinking perhaps the "aggregate" step would be useful, but I haven't
figured out how to use it yet.
* I will unroll the loop as Marko suggested when I get to optimizing it

Thanks for all the help,
Xavier



Re: [Neo4j] Gremlin pipes broken in 1.5

2011-09-10 Thread Xavier Shay
On Fri, Sep 9, 2011 at 1:16 PM, Peter Neubauer <
peter.neuba...@neotechnology.com> wrote:

> Xavier,
> this can well be a packaging problem. Could you try with 1.5.M01? The
> dependency problem is fixed, but we are fighting for the build
> pipeline to deliver a new snapshot of the final artifact.
>
ah ok, I thought I was on 1.5.M01 but was on a snapshot. 1.5.M01 works fine.
Sorry for the confusion.

Xavier


>
> Cheers,
>
> /peter neubauer
>
> GTalk:  neubauer.peter
> Skype   peter.neubauer
> Phone   +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter  http://twitter.com/peterneubauer
>
> http://www.neo4j.org   - Your high performance graph database.
> http://startupbootcamp.org/- Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.


Re: [Neo4j] Gremlin pipes broken in 1.5

2011-09-09 Thread Xavier Shay
On Fri, Sep 9, 2011 at 12:39 PM, Marko Rodriguez wrote:

> Hi,
>
> This error seems to be an issue with dependencies. Are you using any
> TinkerPop dependencies in your project? I believe Neo4j 1.5.M01 is
> depending on Blueprints 0.9 which does have the getInEdges(String...) method
> (similarly for getOutEdges()). Perhaps you are depending on Blueprints 0.8
> somewhere in your project and the class loader is getting confused?
>
I don't really understand this suggestion. I downloaded neo4j from
http://dist.neo4j.org/neo4j-community-1.5-SNAPSHOT-unix.tar.gz, unpacked it,
then started up a server using `bin/neo4j server`. After loading in a node
via the REST interface, I ran the above command. I don't have a project that
could be relevant.


>
> Also, you don't need an identity pipe. You can do:
>    g.v(1).both
>
ah nice. (This still causes the same exception though.)

Cheers,
Xavier


> Marko.
>
> http://markorodriguez.com


[Neo4j] Gremlin pipes broken in 1.5

2011-09-09 Thread Xavier Shay
Hello,
I have just upgraded to neo4j 1.5 (brew install neo4j --HEAD) and am getting
the following exception whenever I try to use a pipe:

> curl -H Accept:application/json -X POST -d '{"script":"g.v(1)._().both;"}' \
    -H Content-Type:application/json \
    "http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script"
{
  "message" : "com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex.getInEdges([Ljava/lang/String;)Ljava/lang/Iterable;",
  "exception" : "java.lang.AbstractMethodError: com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex.getInEdges([Ljava/lang/String;)Ljava/lang/Iterable;",
  "stacktrace" : [... trim, see https://gist.github.com/1d6df2b03fa4d402ade3... ]
}

The same command without ".both" works as expected. Other pipe methods such
as "outE" all cause the same exception. This was working with 1.4, and I
couldn't find any mention of backwards incompatibility in the Gremlin
changelog.
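
In case it helps anyone reproduce this without curl, here is a rough Python equivalent of the request, using only the standard library. The URL, headers, and script are copied from the curl command above; the `build_request` helper is just an illustrative name, and the sketch builds the request without sending it so it can be inspected offline.

```python
import json
import urllib.request

# Endpoint copied from the curl command above.
URL = "http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script"


def build_request(script):
    """Build (but do not send) the POST request that the curl command issues."""
    body = json.dumps({"script": script}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_request("g.v(1)._().both;")
# To actually send it against a running server:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Note that `urlopen` raises `urllib.error.HTTPError` when the server answers with an error status, so the JSON error body may have to be read from the exception rather than from a normal response.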

I'm not sure how to debug this further. Any suggestions?

Cheers,
Xavier


Re: [Neo4j] Aggregate queries

2011-09-08 Thread Xavier Shay
Thanks everyone, this gives me plenty to work with. Will report back.

On Thu, Sep 8, 2011 at 7:03 AM, Marko Rodriguez wrote:

> Hey,
>
> > Won't this count dupes more than once?
> >
> > Xavier's requirements of "how many other nodes are they connected" sounds
> > like you should only count uniques, and that's why I am checking the size of
> > groupCount map instead of using count(). Instead of a map you could use a
> > Set with aggregate(), but I wasn't sure if they'd have the aggregate-loop
> > fix yet.
>
> Then add a uniqueObject to the pipeline.
>
>    g.idx(index_name)[[key:value]].both.loop(1){it.loops < depth}.uniqueObject.count()
>
> > Also Xavier said, "For all nodes in a particular index". I took that to mean
> > all nodes in an index, not all nodes for a particular value in an index,
> > hence the wildcard query:
> >
> > index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + "*")
> >
> > However, I am not sure/can't remember if you can do a wildcard query without
> > at least one leading character.
>
>
> Oh then yea, the %query% header can be used.
>
> Marko.
>
> http://markorodriguez.com


[Neo4j] Aggregate queries

2011-09-07 Thread Xavier Shay
Hello,
Is there an effective way to run aggregate queries over a Neo4j database? I'm
currently running a stock install of the REST server, and want to answer
questions like:

"For all nodes in a particular index, how many other nodes are they
connected to at depth X?"
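
To pin down what I mean, here is the count written as a small Python sketch over an in-memory graph. It reads "at depth X" as "within X hops" and counts each node once; the adjacency data, the `indexed` list, and the `count_within_depth` helper are all made up for illustration and have nothing to do with the Neo4j API.

```python
def count_within_depth(adjacency, start, depth):
    """Count distinct nodes, other than `start`, within `depth` hops of it."""
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        # Expand one hop, keeping only nodes not reached before.
        frontier = {n for node in frontier for n in adjacency.get(node, ())} - seen
        seen |= frontier
    return len(seen) - 1


# Undirected toy graph stored as symmetric adjacency sets (made-up data).
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}
indexed = ["a", "e"]                        # stand-in for the nodes in one index
counts = {node: count_within_depth(adjacency, node, 2) for node in indexed}
```

The question is how to get the server to do this work in one request instead of one request per index entry.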

Currently I have a script that fires up a number of threads and just hammers
the server with an HTTP request per index entry to fetch the relationships,
then does some post-processing on the result to calculate the count. This
doesn't seem the most efficient way.

What other options do I have?

Cheers,
Xavier