Thanks for that, Michael. It's a little over my head at the moment, but I will
definitely put it on my study list.

On Thursday, 20 April 2017 12:54:48 UTC+1, Michael Hunger wrote:
>
> Dave, Kamal,
>
> The APOC library recently gained some similarity functions, which might be
> helpful for your use case.
>
> Please have a look: 
> https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_graph_algorithms_work_in_progress
>
> apoc.algo.cosineSimilarity([vector1], [vector2])
>
> Compute cosine similarity
>
> apoc.algo.euclideanDistance([vector1], [vector2])
>
> Compute Euclidean distance
>
> apoc.algo.euclideanSimilarity([vector1], [vector2])
>
> Compute Euclidean similarity
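For readers new to these measures, here is a minimal Python sketch of what the three functions compute on two equal-length answer vectors. The cosine and Euclidean formulas are the standard ones. Defining Euclidean similarity as 1 / (1 + distance) is an assumption about APOC's implementation, so check the APOC documentation before relying on exact values.

```python
import math
from typing import Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def euclidean_distance(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Square root of the sum of squared component differences."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def euclidean_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """ASSUMED definition: 1 / (1 + distance), so identical vectors score 1.0."""
    return 1.0 / (1.0 + euclidean_distance(v1, v2))


if __name__ == "__main__":
    a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    b = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
    print(euclidean_distance(a, b))  # sqrt(2), about 1.414
```

Treating each profile's ten answers as a vector lets these functions score any two profiles directly, instead of precomputing a relationship for every matching pattern.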
>
> Cheers, Michael
>
> On Wed, Apr 19, 2017 at 10:00 PM, Kamal Murthy <amey...@gmail.com> wrote:
>
>> Hi Dave,
>>
>> MATCH (a1:Profile {profileID: 1111111111})
>> MATCH (b1:Profile {profileID: 1111111122})
>> // Property keys are case-sensitive, so both nodes must use the same key (profileID).
>> MERGE (a1)-[rel:SIMILAR]-(b1)
>> ON CREATE SET rel.strength = 8
>>
>> Q: profileID = 1111111111 will have total marks of 10, while profileID =
>> 1111111122 will have total marks of 12. I am not sure if that is what you
>> want.
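A small Python sketch shows where Kamal's numbers come from. It assumes, as Dave describes further down the thread, that a profile ID is the single-digit answers concatenated in question order. The two IDs agree on 8 of the 10 answers (hence strength 8), and their digits sum to 10 and 12.

```python
from typing import List


def answers_from_id(profile_id: int) -> List[int]:
    """Split a concatenated profile ID back into per-question answers.

    Assumes every answer is a single digit, so each digit is one answer.
    """
    return [int(d) for d in str(profile_id)]


def strength(id1: int, id2: int) -> int:
    """Number of questions on which the two profiles gave the same answer."""
    a, b = answers_from_id(id1), answers_from_id(id2)
    if len(a) != len(b):
        raise ValueError("profile IDs must encode the same number of answers")
    return sum(x == y for x, y in zip(a, b))


def total_marks(profile_id: int) -> int:
    """Sum of all answers, i.e. the 'total marks' of a profile."""
    return sum(answers_from_id(profile_id))


print(strength(1111111111, 1111111122))  # 8
print(total_marks(1111111111), total_marks(1111111122))  # 10 12
```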
>>
>> In my opinion it is best to group by total marks, with 10 as the minimum
>> and 50 as the maximum, assuming that marks for each question range from 1 to 5.
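Grouping by total marks amounts to bucketing profiles by the sum of their answers. With ten questions marked 1 to 5, there are 41 possible buckets (10 through 50). The Python sketch below uses three questions to keep the demo small, and the function name is illustrative only. Two profiles can share most of their answers yet land in different buckets (1111111111 and 1111111122 fall into buckets 10 and 12), so this is a different notion of similarity from counting matching answers.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple


def group_by_total(
    profiles: Iterable[Tuple[int, ...]]
) -> Dict[int, List[Tuple[int, ...]]]:
    """Bucket answer tuples by their total marks."""
    buckets: Dict[int, List[Tuple[int, ...]]] = defaultdict(list)
    for answers in profiles:
        buckets[sum(answers)].append(answers)
    return dict(buckets)


# Three questions marked 1-5 give 125 profiles, totals from 3 to 15.
groups = group_by_total(product(range(1, 6), repeat=3))
print(min(groups), max(groups), len(groups))  # 3 15 13
```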
>>
>> Q: Generating the .csv file.
>>
>> With ten questions and marks ranging from 1 to 5 for each question, there
>> will be 5^10 = 9,765,625 records (profiles). One can create a table in a SQL
>> Server database with ten columns (Q1 to Q10) and 5 rows, each column holding
>> the values 1, 2, 3, 4 and 5. Cross joining the columns generates every
>> combination, from (1,1,1,1,1,1,1,1,1,1) to (5,5,5,5,5,5,5,5,5,5). You can
>> then export the data as a .csv file, concatenating the column values to get
>> the IDs and summing all ten columns to get the total marks.
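The cross join can also be done without SQL Server. This Python sketch uses `itertools.product`, which yields the same Cartesian product, and writes one row per profile containing the concatenated ID, the answers and the total marks. The column names (profileID, Q1 to Q10, total) are assumptions, so adjust them to whatever your import query expects. Passing a per-question maximum mark also lets it reproduce data where some questions have fewer than five answers.

```python
import csv
import io
from itertools import product
from typing import Sequence, TextIO


def write_profiles_csv(out: TextIO, max_marks: Sequence[int]) -> int:
    """Write every answer combination as a CSV row and return the row count.

    max_marks[i] is the highest mark for question i (marks start at 1).
    Column names are illustrative assumptions.
    """
    if any(not 1 <= m <= 9 for m in max_marks):
        raise ValueError("concatenated IDs are only unique for single-digit marks")
    writer = csv.writer(out)
    writer.writerow(
        ["profileID"] + [f"Q{i + 1}" for i in range(len(max_marks))] + ["total"]
    )
    rows = 0
    ranges = [range(1, m + 1) for m in max_marks]
    for answers in product(*ranges):  # same rows as the SQL cross join
        profile_id = "".join(str(a) for a in answers)
        writer.writerow([profile_id, *answers, sum(answers)])
        rows += 1
    return rows


# In-memory demo: two questions marked 1-5 give 25 profiles.
buffer = io.StringIO()
print(write_profiles_csv(buffer, [5, 5]))  # 25
```

For the full file, something like `with open("profiles.csv", "w", newline="") as f: write_profiles_csv(f, [5] * 10)` would write the 9,765,625 rows (the file name is hypothetical).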
>>
>> You can use this .csv file to create nodes and relationships in Neo4j.
>>
>> -Kamal
>>
>>
>>
>> On Monday, April 10, 2017 at 4:39:00 AM UTC-7, Dave Clissold wrote:
>>>
>>> Sorry, I got a little confused about what you were asking. Here is the
>>> PNG output of the PROFILE. Is this what you were asking for?
>>>
>>>
>>> <https://lh3.googleusercontent.com/-7xZp8YMOzkw/WOtt9QtFQQI/AAAAAAAAASQ/qdexik_7Oo0l0RV3npKsXEqaSQr9RjlqQCLcB/s1600/plan%2B%25281%2529.png>
>>>
>>>
>>> I put in the check for the smaller ID. When I ran an original test it
>>> created 4 different relationships per match, but I think that was because I
>>> was using MATCH rather than MERGE and had nothing to stop a node from
>>> matching itself, such as a1 <> b1. Would this be better and create only a
>>> single relationship?
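On the direction question: `a1 <> b1` only excludes a node pairing with itself, so the MATCH still produces every pair twice, once as (a, b) and once as (b, a). A directed CREATE or MERGE then makes two relationships per pair. Ordering by ID (`a1.profileID < b1.profileID`) keeps exactly one of the two orderings. The Python sketch below counts the pairs each filter admits for four hypothetical profile IDs.

```python
# Hypothetical profile IDs, for illustration only.
ids = [1111111111, 1111111112, 1111111121, 1111111122]

# Like WHERE a1 <> b1: every ordered pair of distinct nodes.
not_equal = [(a, b) for a in ids for b in ids if a != b]

# Like WHERE a1.profileID < b1.profileID: one ordering per unordered pair.
less_than = [(a, b) for a in ids for b in ids if a < b]

print(len(not_equal), len(less_than))  # 12 6
```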
>>>
>>> On Thursday, 23 March 2017 14:35:02 UTC, Dave Clissold wrote:
>>>>
>>>> I am fairly new to programming and this is my first time using graph
>>>> databases, Cypher and Neo4j. I am learning as I go, testing whether each
>>>> stage is a viable route to final development and trying to gain enough of
>>>> a basic understanding of each element the application needs, so that I
>>>> can hire and communicate with a full-time team and do grunt work when
>>>> needed, rather than be the entrepreneur who has no clue about what is
>>>> happening and just expects things to happen. Any assistance would be
>>>> greatly appreciated.
>>>>
>>>> I am trying to create a database that will allow users with similar
>>>> profiles to be matched. Users have answered a set of questions, and I have
>>>> been able to create the nodes that represent each possible profile by
>>>> assigning a numerical value to each answer, so I have:
>>>>
>>>> :Profile
>>>> quA: 1, quB: 1, quC: 1, quD: 1, quE: 1, quF: 1, quG: 1, quH: 1, quI: 1, quJ: 1
>>>> ....
>>>> all the way to
>>>> ....
>>>> quA: 5, quB: 5, quC: 5, quD: 5, quE: 5, quF: 5, quG: 3, quH: 3, quI: 2, quJ: 2
>>>>
>>>> where each numerical value is stored as an integer. This resulted in
>>>> 562,500 nodes imported from CSV, creating a 515 MB database. I have also
>>>> concatenated the answers to create a unique ID for each node so that I can
>>>> run the following query:
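The node count is consistent with the answer ranges in the last example row: six questions with five possible answers, two with three and two with two, and 5^6 × 3^2 × 2^2 = 562,500. Below is a Python sketch of that count, together with the concatenated-ID scheme. The helper names are illustrative, and the scheme assumes every answer is a single digit.

```python
import math
from typing import Sequence

# Highest answer value per question, read off the last example row (quA..quJ).
MAX_ANSWERS = [5, 5, 5, 5, 5, 5, 3, 3, 2, 2]


def profile_count(max_answers: Sequence[int]) -> int:
    """Number of distinct profiles when question i has max_answers[i] options."""
    return math.prod(max_answers)


def make_profile_id(answers: Sequence[int]) -> int:
    """Concatenate single-digit answers into one integer ID."""
    if any(not 1 <= a <= 9 for a in answers):
        raise ValueError("concatenation is only unique for single-digit answers")
    return int("".join(str(a) for a in answers))


print(profile_count(MAX_ANSWERS))  # 562500
print(make_profile_id([5, 5, 5, 5, 5, 5, 3, 3, 2, 2]))  # 5555553322
```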
>>>>
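>>>> MATCH (a1:Profile), (b1:Profile)
>>>> // Same (case-sensitive) key on both sides; < keeps one ordering per pair.
>>>> WHERE a1.profileID < b1.profileID
>>>>   AND a1.quA = b1.quA AND a1.quB = b1.quB AND a1.quC = b1.quC
>>>>   AND a1.quD = b1.quD AND a1.quE = b1.quE AND a1.quF = b1.quF
>>>>   AND a1.quG = b1.quG
>>>> // MERGE rather than CREATE UNIQUE, which Neo4j 4.0 and later no longer support.
>>>> MERGE (a1)-[:SIMILAR {strength: 7}]->(b1)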
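What this query computes can be sketched in Python without comparing every node with every other node. Profiles that agree on quA to quG share the same seven-answer key, so grouping by that key and pairing within each group (smaller ID first) yields the same pairs as the WHERE clause. This hash-grouping technique is offered as an illustration, not as something proposed in the thread. The input format (a dict from ID to answer tuple) is an assumption.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[int, ...]  # answers to quA..quJ, in order


def similar_pairs(profiles: Dict[int, Profile],
                  positions: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (smaller_id, larger_id) pairs that agree on every given position.

    Same result as WHERE a1.profileID < b1.profileID AND a1.q = b1.q for each
    q in positions, but found by grouping on a key instead of testing n * n pairs.
    """
    buckets: Dict[Profile, List[int]] = defaultdict(list)
    for pid, answers in profiles.items():
        buckets[tuple(answers[i] for i in positions)].append(pid)
    pairs: List[Tuple[int, int]] = []
    for ids in buckets.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


profiles = {
    1111111111: (1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
    1111111122: (1, 1, 1, 1, 1, 1, 1, 1, 2, 2),
    2111111111: (2, 1, 1, 1, 1, 1, 1, 1, 1, 1),
}
# quA..quG are positions 0..6.
print(similar_pairs(profiles, range(7)))  # [(1111111111, 1111111122)]
```

The work grows with the number of profiles plus the number of pairs produced, rather than with the square of the node count.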
>>>>
>>>>
>>>> and so on, so that I have every combination from 7 matching parameters up
>>>> to 9 matching parameters. I know that will eventually create 175
>>>> relationships per node, so a massive total of 98,437,500 relationships.
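The figure of 175 equals the number of ways to choose which 7, 8 or 9 of the ten questions must match, that is, the number of variants of the query above that the "and so on" approach implies. Below is a quick Python check of that figure and of the quoted total for 562,500 nodes. Note that this counts subsets of questions. How many relationships each node actually ends up with depends on the data.

```python
from math import comb

# Ways to choose the matching questions: 7, 8 or 9 out of 10.
variants = sum(comb(10, k) for k in (7, 8, 9))
print(variants)  # 175 (120 + 45 + 10)
print(variants * 562_500)  # 98437500
```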
>>>>
>>>>
>>>> I have set this up in a Docker container on an 8-core, 52 GB Google
>>>> Compute Engine instance (the maximum on the free-trial option), with a
>>>> 65500MB heap size (based on the calculator).
>>>>
>>>> I am trying to find out whether there is a more efficient way to create
>>>> these relationships: on this setup, the first query above has been running
>>>> for over 5 hours and has not finished. Can anyone suggest a better query or
>>>> workflow for creating such a large number of relationships? The last thing
>>>> I want to do is create individual relationships and input them one by one,
>>>> unless someone can suggest a way of doing this via a script that sends the
>>>> queries as JSON.
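On the scripting side, one common pattern is to send relationships in batches as a list parameter and let a single parameterized Cypher statement UNWIND each batch, rather than issuing one query per relationship. The Python sketch below only builds the JSON bodies. The statement text, the parameter name `rows`, the field names and the batch size are assumptions. The `{"statements": [...]}` wrapper is modelled on Neo4j's transactional HTTP endpoint, so check the documentation for your version. Posting the bodies is left out.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical parameterized statement; $rows receives one batch of pairs.
STATEMENT = (
    "UNWIND $rows AS row "
    "MATCH (a:Profile {profileID: row.src}), (b:Profile {profileID: row.dst}) "
    "MERGE (a)-[r:SIMILAR]->(b) "
    "ON CREATE SET r.strength = row.strength"
)


def _payload(rows: List[Dict[str, int]]) -> str:
    """Wrap one batch in a {"statements": [...]} request body."""
    return json.dumps({"statements": [{"statement": STATEMENT,
                                       "parameters": {"rows": rows}}]})


def json_batches(pairs: Iterable[Tuple[int, int, int]],
                 batch_size: int = 10_000) -> Iterator[str]:
    """Yield JSON bodies holding at most batch_size (src, dst, strength) rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Dict[str, int]] = []
    for src, dst, strength in pairs:
        batch.append({"src": src, "dst": dst, "strength": strength})
        if len(batch) == batch_size:
            yield _payload(batch)
            batch = []
    if batch:  # last, partially filled batch
        yield _payload(batch)


for body in json_batches([(1111111111, 1111111122, 8)]):
    print(body)
```

Batching keeps the number of round trips and transactions small, which matters far more at this scale than the per-statement overhead.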
>>>>
>>>> Regards
>>>>
>>>>
>>>> Dave
>>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to neo4j+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
