Re: [Neo4j] Large scale network analysis - best strategy?
Nigel, thanks for pointing out the server-side vs. client-side considerations.

On 18 Jun 2014, at 21:22, Nigel Small ni...@nigelsmall.com wrote:

Hi Gareth

All of py2neo's features use the Neo4j REST interface, both for Cypher and for its other functions. The main difference concerns where the application logic lives: on the server or on the client. With Cypher, all of the logic is encapsulated within the language itself and is processed server-side; this is generally pretty fast. While some optimisations exist for the other functions, their logic generally runs on the client, and one or more calls to the server are needed at each step; this gives you slightly more control, but it will generally be slower. Assuming performance is your priority, I would therefore recommend using Cypher wherever possible and resorting to the other functions only when you really need to.

Hope this helps

Nigel

PS: glad you had some success with py2neo :-)

On 18 Jun 2014 13:44, Shongololo garethsim...@gmail.com wrote:

Hi Nigel,

Out of curiosity: py2neo appears to work quite seamlessly with Cypher via the append / execute / commit steps. (I actually ended up loading my data using py2neo's Cypher module.) What is your take on py2neo's Cypher implementation vs. its non-Cypher implementation in terms of speed and flexibility? (It appears that the Cypher module has a Record class for capturing query results?)

Thanks,
Gareth

On Tuesday, June 17, 2014 11:04:53 PM UTC+1, Nigel Small wrote:

Hi Gareth

As you identify, there are certainly differences in performance and feature set when working with Neo4j from different programming languages. Depending on your background, constraints and integration needs, you could consider a hybrid approach whereby you keep Python for your main application and build anything that requires serious performance as a server extension in Java.
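[Editor's illustration] The append / execute / commit pattern mentioned above batches several Cypher statements into one server-side transaction rather than making one HTTP call per operation. A minimal sketch, with the statement-building kept as a pure function; the `:Node` label, parameter names and server URL are placeholders, and the commented-out driver calls assume the py2neo 1.6-era API, so treat them as a sketch rather than a definitive recipe:

```python
def batch_merge_statements(names):
    """Build one parameterised Cypher statement per node so that all of
    them can be appended to a single transaction (one round trip on
    commit) instead of one client-side call per node."""
    return [("MERGE (n:Node {name: {N}}) RETURN n", {"N": name})
            for name in names]

stmts = batch_merge_statements(["a", "b", "c"])
print(len(stmts))  # 3 statements, to be sent in one transaction

# With a live Neo4j server (py2neo 1.6-era API assumed, not runnable here):
#   from py2neo import cypher
#   session = cypher.Session("http://localhost:7474")
#   tx = session.create_transaction()
#   for text, params in stmts:
#       tx.append(text, params)
#   results = tx.commit()   # lists of Record objects, one list per statement
```

The point of the batching is exactly the server-side/client-side trade-off Nigel describes: the logic still originates on the client, but the per-statement network latency is paid once per transaction instead of once per operation.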
Neo4j plugin support is pretty comprehensive: for example, my server extension load2neo (http://nigelsmall.com/load2neo) provides a bulk-loading facility and also has direct support in my Python driver, py2neo (http://py2neo.org/). This approach is somewhat analogous to compiling a C extension in Python and could be done as an optimisation step once you have built your end-to-end application logic.

Bear in mind also that Cypher is very powerful these days. It would certainly be worth exploring some of its more recent capabilities before choosing an architectural path, as you may find there is little that cannot already be achieved purely with Cypher. If that is the case, your choice of application language becomes far less critical.

I'd suggest beginning with a prototype in a language you are comfortable with. Then build the suite of queries you need to run and identify the bottlenecks or missing features. Once you have a list of these, you can make an informed decision on which pieces to optimise.

Kind regards

Nigel

On 17 June 2014 15:42, Shongololo gareth...@gmail.com wrote:

I am preparing a Neo4j database on which I would like to do some network analysis. It represents a weakly connected, static physical system and will have in the region of 50 million nodes where, let's say, about 50 nodes connect to a parent node, which in turn is linked (think streets and intersections) to a network of other parent nodes.

For most of the analysis I will be using a weighted distance decay, so measures such as betweenness or centrality will be computed for the parent-node network, but only to a limited extent. For example, if (a)--(b)--(c)--(d)--(e), then the computation will only consider nodes up to, say, two steps away: (a) will consider (b) and (c), whereas (c) will consider two steps in either direction.

My question is a conceptual and strategic one: what is the best approach for doing this kind of analysis with Neo4j?
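[Editor's illustration] The limited-depth analysis described in the question maps naturally onto Cypher's variable-length patterns, which keep the traversal server-side as Nigel recommends. A sketch only, assuming 2014-era Cypher 2.x syntax; the `:Parent` label and `:LINKS` relationship type are hypothetical names for the parent-node network:

```
MATCH (a:Parent {name: "c"})-[:LINKS*1..2]-(b:Parent)
RETURN a.name AS node, count(DISTINCT b) AS reach_within_2_steps
```

The `*1..2` bound is what enforces the "only up to two steps away" restriction, so the query touches a small neighbourhood rather than the whole 50-million-node graph.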
I currently work with Python, but it appears that for speed, flexibility and use of complex graph algorithms I would be better off working with the embedded Java API, for direct and powerful access to the graph. Is that right? Or is an approach using something like Bulbflow with Gremlin also feasible? How do the power and flexibility of the different tools compare, e.g. embedded Python vs. Java vs. Node.js?

Thanks.

--
You received this message because you are subscribed to the Google Groups "Neo4j" group. To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
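[Editor's illustration] Whichever language is chosen, the limited-radius, distance-decayed measure in the question is cheap to prototype client-side on a toy graph before committing to an architecture. A minimal sketch, using the thread's own chain (a)--(b)--(c)--(d)--(e); the decay factor `beta = 0.5` and the radius of 2 are illustrative choices, not values from the thread, and at 50-million-node scale this loop would live server-side or in Java, per Nigel's advice:

```python
from collections import deque

# Toy chain matching the thread's example: (a)--(b)--(c)--(d)--(e)
GRAPH = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

def hop_distances(graph, source, radius):
    """Breadth-first search out to `radius` hops; returns {node: distance}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def decayed_reach(graph, source, radius=2, beta=0.5):
    """Distance-decayed 'reach' measure: sum of beta**d over all nodes
    within `radius` hops of `source` (excluding the source itself)."""
    return sum(beta ** d
               for node, d in hop_distances(graph, source, radius).items()
               if node != source)

print(sorted(hop_distances(GRAPH, "a", 2)))  # ['a', 'b', 'c']: (a) sees (b) and (c)
print(decayed_reach(GRAPH, "c"))             # 1.5: (b),(d) at 0.5 each; (a),(e) at 0.25 each
```

This reproduces the behaviour described in the question: (a) considers only (b) and (c), while (c) considers two steps in either direction.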