Thanks for that, Michael. It is a little over my head at the moment, but I will definitely put it on the study list.

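As a starting point for that study list, two of the measures Michael names in the quoted message below can be sketched in a few lines of plain Python. This is not APOC code and says nothing about how APOC implements them; the function names and the two sample answer vectors (the profiles with IDs 1111111111 and 1111111122 from Kamal's example) are chosen only for illustration. Euclidean similarity is left out because the thread does not say which formula APOC uses for it.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    # Answers range from 1 to 5, so neither norm can be zero here.
    return dot / (norm1 * norm2)


def euclidean_distance(v1, v2):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


# Hypothetical answer vectors (quA..quJ) for two profiles.
profile_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
profile_b = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]

similarity = cosine_similarity(profile_a, profile_b)
distance = euclidean_distance(profile_a, profile_b)
print(similarity, distance)
```

A cosine similarity close to 1 and a small distance both indicate that two profiles gave nearly the same answers, which is a graded alternative to counting how many answers match exactly.
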
On Thursday, 20 April 2017 12:54:48 UTC+1, Michael Hunger wrote:
>
> Dave, Kamal,
>
> the apoc library recently got some similarity functions, which might be
> helpful for your use-case?
>
> Please have a look:
> https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_graph_algorithms_work_in_progress
>
> apoc.algo.cosineSimilarity([vector1], [vector2])
>     Compute cosine similarity
>
> apoc.algo.euclideanDistance([vector1], [vector2])
>     Compute Euclidean distance
>
> apoc.algo.euclideanSimilarity([vector1], [vector2])
>     Compute Euclidean similarity
>
> Cheers, Michael
>
> On Wed, Apr 19, 2017 at 10:00 PM, Kamal Murthy <amey...@gmail.com> wrote:
>
>> Hi Dave,
>>
>> MATCH (a1:Profile)
>> MATCH (b1:Profile)
>> WHERE a1.profileID = 1111111111 AND b1.profileID = 1111111122
>> MERGE (a1)-[rel:SIMILAR]-(b1) ON CREATE SET rel.strength = 8
>>
>> Q: profileID = 1111111111 will have total marks of 10, while profileID =
>> 1111111122 will have total marks of 12. I am not sure if that is what you
>> want.
>>
>> In my opinion it is best to group by total marks, with 10 as the minimum
>> and 50 as the maximum, assuming that marks for each question range from
>> 1 to 5.
>>
>> Q: Generating the .csv file.
>>
>> With ten questions and marks ranging from 1 to 5 for each question, there
>> will be 9,765,625 records (profiles). One can create a table in a SQL
>> Server database with ten columns (like Q1 to Q10) and 5 rows. Each column
>> will have the values 1, 2, 3, 4, 5. Using a cross join, one can generate
>> all the combinations (1,1,1,1,1,1,1,1,1,1 to 5,5,5,5,5,5,5,5,5,5). Then
>> you can export the data (as a .csv file), concatenating the column values
>> to get the ids and summing them to get the total marks (across all
>> columns).
>>
>> You can use this .csv file to create nodes and relationships in Neo4j.
>>
>> -Kamal
>>
>> On Monday, April 10, 2017 at 4:39:00 AM UTC-7, Dave Clissold wrote:
>>>
>>> Sorry, I got a little confused about what you were asking;
>>> here is the png output of the PROFILE. Is this what you were asking for?
>>>
>>> <https://lh3.googleusercontent.com/-7xZp8YMOzkw/WOtt9QtFQQI/AAAAAAAAASQ/qdexik_7Oo0l0RV3npKsXEqaSQr9RjlqQCLcB/s1600/plan%2B%25281%2529.png>
>>>
>>> I put the check into the smaller id. When I ran an original test it
>>> created 4 different relationships per match, but I think that was
>>> because I was using MATCH not MERGE and did not have anything to stop
>>> the node from being matched with itself, such as a1 <> b1. Would this be
>>> better and only create a single relationship?
>>>
>>> On Thursday, 23 March 2017 14:35:02 UTC, Dave Clissold wrote:
>>>>
>>>> I am fairly new to programming and this is my first time using graph
>>>> databases, Cypher and Neo4j. I am learning as I go, testing to see if
>>>> each stage is a viable route to final development and trying to gain
>>>> enough of a basic understanding of each element needed for the
>>>> application, so I can hire and communicate with a full-time team, as
>>>> well as be able to do grunt work when needed, rather than be the
>>>> entrepreneur who has no clue about what is happening and just expects
>>>> things to happen. Any assistance would be greatly appreciated.
>>>>
>>>> I am trying to create a database which will allow users with similar
>>>> profiles to match. They have answered questions, and I have been able
>>>> to create the nodes that represent each possible profile by assigning
>>>> a numerical value to each answer, so I have:
>>>>
>>>> :Profile
>>>> quA: 1, quB: 1, quC: 1, quD: 1, quE: 1, quF: 1, quG: 1, quH: 1, quI: 1,
>>>> quJ: 1
>>>> ....
>>>> all the way to
>>>> ....
>>>> quA: 5, quB: 5, quC: 5, quD: 5, quE: 5, quF: 5, quG: 3, quH: 3, quI: 2,
>>>> quJ: 2
>>>>
>>>> where each numerical value is stored as an integer. This has resulted
>>>> in 562500 nodes imported by CSV, which created a 515 MB database.
>>>> I have also concatenated the answers to create a unique ID for each
>>>> node so that I can run the following query.
>>>>
>>>> MATCH (a1:Profile), (b1:Profile)
>>>> WHERE a1.profileID < b1.profileID AND a1.quA = b1.quA AND a1.quB = b1.quB
>>>>   AND a1.quC = b1.quC AND a1.quD = b1.quD AND a1.quE = b1.quE
>>>>   AND a1.quF = b1.quF AND a1.quG = b1.quG
>>>> CREATE UNIQUE (a1)-[:SIMILAR {strength: 7} ]->(b1)
>>>>
>>>> and so on, so that I have every combination from 7 parameters matching
>>>> up to 9 parameters matching. I know that will eventually create 175
>>>> relationships per node, so a massive total of 98,437,500 relationships.
>>>>
>>>> I have set this up in a Docker container on a Google Compute instance
>>>> with 8 cores and 52 GB (the max on the free trial option), with a
>>>> 65500 MB heap size (based on the calculator).
>>>>
>>>> I am trying to find out if there is a more efficient way to create
>>>> these relationships. On this setup I have tried running the first query
>>>> (above); it has currently taken over 5 hours and has not finished. Can
>>>> anyone suggest a better query or workflow to create such a large number
>>>> of relationships? The last thing I want to do is try to create
>>>> individual relationships and input them, unless someone can suggest a
>>>> way of doing this via a script and sending the queries via JSON.
>>>>
>>>> Regards
>>>>
>>>> Dave
>>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to neo4j+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
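
The figures quoted in this thread can be checked with a short calculation. The sketch below is plain Python and rests on one assumption taken from the last example profile in the original post: quA to quF each take 5 values, quG and quH take 3, and quI and quJ take 2. Under that assumption the node count comes out at the 562500 mentioned. The variable names are made up for illustration.

```python
from math import comb, prod

# Assumed number of possible answers for quA..quJ, read off the last
# example profile in the original post (not stated explicitly there).
values_per_question = [5, 5, 5, 5, 5, 5, 3, 3, 2, 2]

# One node per possible combination of answers.
node_count = prod(values_per_question)

# Ways to pick which 7, 8, or 9 of the 10 questions must match.
match_patterns = sum(comb(10, k) for k in (7, 8, 9))

# The total quoted in the original post: nodes times match patterns.
relationship_estimate = node_count * match_patterns

# Kamal's case: ten questions with five possible marks each.
full_grid = 5 ** 10

print(node_count, match_patterns, relationship_estimate, full_grid)
```

The 175 here is the number of distinct "which questions match" patterns, so it is also the number of queries of the kind shown in the original post. Whether each pattern produces exactly one relationship per node is not something the thread establishes, so the 98,437,500 total is best read as an estimate.
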