Re: [Neo4j] Large scale network analysis - best strategy?

2014-06-19 Thread Gareth Simons
Nigel, thanks for pointing out the server-side vs. client-side considerations.

On 18 Jun 2014, at 21:22, Nigel Small ni...@nigelsmall.com wrote:

 Hi Gareth
 
 All of py2neo's features use the Neo4j REST interface, both for Cypher and 
 for its other functions. The main difference concerns where the application 
 logic exists: on the server or on the client.
 
 With Cypher, all the logic is encapsulated within the language itself and 
 will be processed server-side; this is generally pretty fast. While some 
 optimisations exist for the other functions, the logic will generally occur 
 on the client and one or more calls will need to be made to the server on 
 each step; you will have slightly more control with this method but it will 
 generally be slower.
 
 Assuming performance is your priority, I would therefore recommend using 
 Cypher wherever possible and only resorting to other functions if you really 
 need to.
 
 Hope this helps
 Nigel
 
 PS: glad you had some success with py2neo :-)
 

Re: [Neo4j] Large scale network analysis - best strategy?

2014-06-18 Thread Shongololo
Hi Nigel,

Out of curiosity - it appears that your py2neo works quite seamlessly with
Cypher via the append / execute / commit steps. (I actually ended up
loading my data using py2neo's Cypher module.) I would appreciate your
take on py2neo's Cypher implementation vs. its non-Cypher implementation
in terms of speed and flexibility. (It also appears that the Cypher module
provides a Record type for capturing query results?)
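
For concreteness, the pattern I'm referring to looks roughly like this,
assuming the py2neo 1.6-era py2neo.cypher Session/transaction API; the
server URI, labels, relationship type and sample data are stand-ins:

    from py2neo import cypher

    # Sketch only: the URI, labels, relationship type and data are stand-ins.
    session = cypher.Session("http://localhost:7474")
    tx = session.create_transaction()

    statement = ("MERGE (p:Parent {pid: {pid}}) "
                 "MERGE (c:Child {cid: {cid}}) "
                 "MERGE (c)-[:ATTACHED_TO]->(p)")
    for pid, cid in [(1, 11), (1, 12), (2, 21)]:
        tx.append(statement, {"pid": pid, "cid": cid})
    tx.execute()    # send this batch to the server, keep the transaction open

    tx.append("MATCH (p:Parent) RETURN count(p) AS parents")
    results = tx.commit()    # send the remaining statement and commit
    print(results)           # per-statement lists of Record rows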

Thanks,
Gareth

On Tuesday, June 17, 2014 11:04:53 PM UTC+1, Nigel Small wrote:

 Hi Gareth

 As you identify, there are certainly some differences in terms of 
 performance and feature set that you get when working with Neo4j under 
 different programming languages. Depending on your background, constraints 
 and integration needs, you could consider a hybrid approach whereby you 
 continue working with Python for your main application and build anything 
 that requires serious performance as a server extension in Java. Neo4j 
 plugin support is pretty comprehensive: for example, my server extension 
 load2neo (http://nigelsmall.com/load2neo) provides a facility to bulk
 load data but also has direct support from my Python driver, py2neo
 (http://py2neo.org/). This approach is somewhat analogous to compiling a
 C extension in Python and could be done as an optimisation step once you 
 have built your end-to-end application logic.

 Bear in mind also that Cypher is very powerful these days. It would 
 certainly be worth exploring some of its more recent capabilities before 
 choosing an architectural path as you may find there is little that cannot 
 already be achieved purely with Cypher. If this is the case, your choice of 
 application language could then become far less critical.

 I'd suggest beginning with a prototype in a language you are comfortable 
 with. Then, build a suite of queries you need to run and ascertain the 
 bottlenecks or missing features. Once you have a list of these, you can 
 then make an informed decision on which pieces to optimise. 

 Kind regards
 Nigel




Re: [Neo4j] Large scale network analysis - best strategy?

2014-06-18 Thread Nigel Small
Hi Gareth

All of py2neo's features use the Neo4j REST interface, both for Cypher and
for its other functions. The main difference concerns where the application
logic exists: on the server or on the client.

With Cypher, all the logic is encapsulated within the language itself and
will be processed server-side; this is generally pretty fast. While some
optimisations exist for the other functions, the logic will generally occur
on the client and one or more calls will need to be made to the server on
each step; you will have slightly more control with this method but it will
generally be slower.

Assuming performance is your priority, I would therefore recommend using
Cypher wherever possible and only resorting to other functions if you
really need to.
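
To illustrate the difference, here is a rough sketch with a py2neo 1.6-era
py2neo.cypher session; the :Parent label, :LINKS_TO relationship type, pid
property and server URI are stand-ins. The first query ships a whole
two-hop traversal to the server in a single round trip; the second drives
the same walk from the client, paying one call per node and keeping the
bookkeeping in Python:

    from py2neo import cypher

    session = cypher.Session("http://localhost:7474")

    # Server-side: one statement, one round trip; the traversal logic lives
    # entirely in the Cypher.
    tx = session.create_transaction()
    tx.append("MATCH (:Parent {pid: {pid}})-[:LINKS_TO*1..2]-(b:Parent) "
              "RETURN count(DISTINCT b) AS reachable", {"pid": 1})
    print(tx.commit())

    # Client-side: drive the walk from Python, one query per node visited.
    def neighbours(pid):
        tx = session.create_transaction()
        tx.append("MATCH (:Parent {pid: {pid}})-[:LINKS_TO]-(b:Parent) "
                  "RETURN collect(DISTINCT b.pid)", {"pid": pid})
        record = tx.commit()[0][0]    # first statement, first Record row
        return record.values[0]       # the collected list of neighbour ids

    start = 1
    reachable = set(neighbours(start))
    for pid in list(reachable):
        reachable.update(neighbours(pid))
    reachable.discard(start)          # exclude the starting node itself
    print(len(reachable))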

Hope this helps
Nigel

PS: glad you had some success with py2neo :-)
On 18 Jun 2014 13:44, Shongololo garethsim...@gmail.com wrote:

 Hi Nigel,

 Out of curiosity - it appears that your py2neo works quite seamlessly with
 Cypher via the append / execute / commit steps. (I actually ended up
 loading my data using py2neo's Cypher module.) I would appreciate your
 take on py2neo's Cypher implementation vs. its non-Cypher implementation
 in terms of speed and flexibility. (It also appears that the Cypher module
 provides a Record type for capturing query results?)

 Thanks,
 Gareth




Re: [Neo4j] Large scale network analysis - best strategy?

2014-06-17 Thread Nigel Small
Hi Gareth

As you identify, there are certainly some differences in terms of
performance and feature set that you get when working with Neo4j under
different programming languages. Depending on your background, constraints
and integration needs, you could consider a hybrid approach whereby you
continue working with Python for your main application and build anything
that requires serious performance as a server extension in Java. Neo4j
plugin support is pretty comprehensive: for example, my server extension
load2neo (http://nigelsmall.com/load2neo) provides a facility to bulk load
data but also has direct support from my Python driver, py2neo
(http://py2neo.org/). This approach is somewhat analogous to compiling a C
extension in Python and could be done as an optimisation step once you have
built your end-to-end application logic.
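
Purely as an illustration of the hybrid shape, calling such an extension
from a Python application can be a plain HTTP request; the endpoint path
and payload below are invented for this sketch and are not part of Neo4j
or load2neo:

    import json
    import requests

    # Hypothetical endpoint exposed by a custom Java unmanaged extension;
    # the path and payload shape are invented for this sketch.
    url = "http://localhost:7474/examples/unmanaged/bulkload"
    payload = {"edges": [[1, 2], [2, 3], [3, 1]]}
    response = requests.post(url, data=json.dumps(payload),
                             headers={"Content-Type": "application/json"})
    print(response.status_code, response.text)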

Bear in mind also that Cypher is very powerful these days. It would
certainly be worth exploring some of its more recent capabilities before
choosing an architectural path as you may find there is little that cannot
already be achieved purely with Cypher. If this is the case, your choice of
application language could then become far less critical.
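
For instance, the kind of distance-limited, weighted neighbourhood you
describe can be expressed in a single Cypher statement; a rough sketch via
a py2neo 1.6-era py2neo.cypher session, with :Parent, :LINKS_TO and the
weight property as placeholder names:

    from py2neo import cypher

    session = cypher.Session("http://localhost:7474")
    tx = session.create_transaction()
    # Everything within two steps of a given parent node, with the step
    # count and an accumulated edge weight computed server-side.
    tx.append(
        "MATCH p = (:Parent {pid: {pid}})-[:LINKS_TO*1..2]-(b:Parent) "
        "RETURN b.pid AS neighbour, min(length(p)) AS steps, "
        "min(reduce(total = 0, r IN relationships(p) | total + r.weight)) "
        "AS weighted", {"pid": 1})
    for record in tx.commit()[0]:
        print(record.values)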

I'd suggest beginning with a prototype in a language you are comfortable
with. Then, build a suite of queries you need to run and ascertain the
bottlenecks or missing features. Once you have a list of these, you can
then make an informed decision on which pieces to optimise.

Kind regards
Nigel


On 17 June 2014 15:42, Shongololo garethsim...@gmail.com wrote:

 I am preparing a Neo4j database on which I would like to do some network
 analysis. It is a representation of a weakly connected, static physical
 system and will have in the region of 50 million nodes, where, let's say,
 about 50 nodes connect to a parent node, which in turn is linked
 (think streets and intersections) to a network of other parent nodes.

 For most of the analysis I will be using a weighted distance decay, so
 measures like betweenness or centrality will be computed for the
 parent-node network, but only to a limited extent. So, for example, if
 (a)--(b)--(c)--(d)--(e), then the computation will only extend up to,
 say, two steps away: (a) will consider (b) and (c), whereas (c) will
 consider two steps in either direction.

 My question is a conceptual and strategic one: what is the best approach
 for doing this kind of analysis with Neo4j?

 I currently work with Python, but it appears that for speed, flexibility,
 and use of complex graph algorithms I would be better off working with the
 embedded Java API for direct and powerful access to the graph. Or is an
 approach using something like Bulbflow with Gremlin also feasible? How
 does the power and flexibility of the different embedded tools compare -
 e.g. Python embedded vs. Java vs. Node.js?

 Thanks.
