I think I've resolved this by not creating a Profile node to step through. My idea was to connect each user to a Profile node and then traverse to similar Profile nodes to find other users, which I thought might be easier than creating relationships on a per-user basis. In hindsight, the per-user connection is sufficient. The next step is to create a different test dataset: randomly select 1000 profile responses and assign each one to a name.
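A minimal sketch of that test dataset in Python, not tied to Neo4j. The answer ranges are an assumption inferred from the first and last profiles quoted below (quA to quF run 1-5, quG and quH 1-3, quI and quJ 1-2, which multiplies out to the 562,500 profiles mentioned). The `user0000`-style names and the function names are hypothetical placeholders.

```python
import random

# Assumed answer ranges, inferred from the first and last profiles in the
# original post: quA..quF take 1-5, quG and quH take 1-3, quI and quJ take 1-2.
QUESTION_RANGES = {
    "quA": 5, "quB": 5, "quC": 5, "quD": 5, "quE": 5, "quF": 5,
    "quG": 3, "quH": 3, "quI": 2, "quJ": 2,
}


def random_profile(rng):
    """Draw one answer per question, uniformly within that question's range."""
    return {q: rng.randint(1, top) for q, top in QUESTION_RANGES.items()}


def make_test_users(count, seed=0):
    """Return `count` rows, each a placeholder name plus ten answers."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    users = []
    for i in range(count):
        row = {"name": f"user{i:04d}"}  # hypothetical placeholder names
        row.update(random_profile(rng))
        users.append(row)
    return users


users = make_test_users(1000)
```

Each row is a plain dict, so it can be written out with `csv.DictWriter` and imported the same way as the original profile CSV.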
On Thursday, 23 March 2017 14:35:02 UTC, Dave Clissold wrote:
>
> I am fairly new to programming and this is my first time using graph
> databases, Cypher and Neo4j. I am learning as I go, testing whether each
> stage is a viable route to final development and trying to gain enough of a
> basic understanding of each element needed for the application that I can
> hire and communicate with a full-time team, and do grunt work when needed,
> rather than be the entrepreneur who has no clue about what is happening and
> just expects things to happen. Any assistance would be greatly appreciated.
>
> I am trying to create a database that will allow users with similar
> profiles to match. They have answered questions, and I have been able to
> create the nodes that represent each possible profile by assigning a
> numerical value to each answer, so I have:
>
> :Profile
> quA: 1, quB: 1, quC: 1, quD: 1, quE: 1, quF: 1, quG: 1, quH: 1, quI: 1,
> quJ: 1
> ....
> all the way to
> ....
> quA: 5, quB: 5, quC: 5, quD: 5, quE: 5, quF: 5, quG: 3, quH: 3, quI: 2,
> quJ: 2
>
> where each numerical value is stored as an integer. This resulted in
> 562,500 nodes imported by CSV, which created a 515 MB database. I have also
> concatenated the answers to create a unique ID for each node so that I can
> run the following query:
>
> MATCH (a1:Profile), (b1:Profile)
> WHERE a1.profileID < b1.profileID
>   AND a1.quA = b1.quA AND a1.quB = b1.quB AND a1.quC = b1.quC
>   AND a1.quD = b1.quD AND a1.quE = b1.quE AND a1.quF = b1.quF
>   AND a1.quG = b1.quG
> CREATE UNIQUE (a1)-[:SIMILAR {strength: 7}]->(b1)
>
> and so on, so that I have every combination from 7 matching parameters up
> to 9 matching parameters. I know that will eventually create 175
> relationships per node, so a massive total of 98,437,500 relationships.
>
> I have set this up in a Docker container on a Google Compute instance with
> 8 cores and 52 GB of RAM (the maximum on the free trial option), with a
> 65500MB heap size (based on the calculator).
>
> I am trying to find out if there is a more efficient way to create these
> relationships. On this setup I have tried running the first query (above);
> it has so far taken over 5 hours and has not finished. Can anyone suggest a
> better query or workflow to create such a large number of relationships?
> The last thing I want to do is create individual relationships and input
> them, unless someone can suggest a way of doing this via a script and
> sending the queries via JSON.
>
> Regards
>
> Dave
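For the per-user approach, the matching can be done in a script before anything reaches the database. The sketch below is a plain-Python illustration under stated assumptions, not the query from the thread: it counts identical answers for every unordered pair of users and keeps pairs that share at least 7. The names `strength`, `similar_pairs` and the three example users are hypothetical. Unlike the 7-to-9 range in the original post, two users here can also match on all 10 answers, since different people may give identical responses.

```python
from itertools import combinations

QUESTIONS = ["quA", "quB", "quC", "quD", "quE",
             "quF", "quG", "quH", "quI", "quJ"]


def make_user(name, answers):
    """Build a user row from a name and ten answers in QUESTIONS order."""
    return {"name": name, **dict(zip(QUESTIONS, answers))}


def strength(a, b):
    """Count how many of the ten questions two users answered identically."""
    return sum(1 for q in QUESTIONS if a[q] == b[q])


def similar_pairs(users, minimum=7):
    """Yield (name_a, name_b, strength) for pairs sharing >= `minimum` answers."""
    for a, b in combinations(users, 2):  # each unordered pair exactly once
        s = strength(a, b)
        if s >= minimum:
            yield a["name"], b["name"], s


# Three hand-written users, purely illustrative.
example = [
    make_user("ann", [1, 2, 3, 4, 5, 1, 2, 3, 1, 2]),
    make_user("bob", [1, 2, 3, 4, 5, 1, 2, 1, 2, 1]),
    make_user("cat", [5, 4, 3, 2, 1, 5, 1, 1, 1, 1]),
]

pairs = list(similar_pairs(example))
print(pairs)  # [('ann', 'bob', 7)]
```

With 1000 users this is 1000 × 999 / 2 = 499,500 comparisons, which is small enough to run in memory. The resulting rows could then be written to a CSV and loaded as `SIMILAR` relationships in one pass.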