Hi all,

I have a website where users can refer other users, who in turn can refer
other users. This can be a fairly large tree/graph structure (as deep as
1000 levels or more). We keep track of the revenue each user has earned us
(from ads). I'm trying to calculate a user's worth by finding all the
revenue they brought us via referrals, referrals of referrals, etc., since
without that user, all the other users in his graph potentially would not
have existed.
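The "worth" described above is the total revenue of everyone below a user in the referral tree, not counting the user's own revenue. As a minimal sketch of a different technique from the nested sets used below (a single bottom-up pass over an in-memory adjacency list), assuming the referral pairs form a tree with no cycles, every user's total can be computed at once. The function name `worth_of_all` and the toy data are hypothetical.

```python
from collections import defaultdict


def worth_of_all(referrals, revenue):
    """Return {user: total revenue of everyone below user in the tree}.

    referrals: list of (referrer, referred) pairs, assumed to form a
               tree/forest (a cycle, e.g. a self-referral, would loop forever).
    revenue:   {user: revenue earned from that user}.
    The user's own revenue is not included in his worth.
    """
    children = defaultdict(list)
    referred = set()
    for parent, child in referrals:
        children[parent].append(child)
        referred.add(child)
    # Roots are users who referred someone but were never referred themselves.
    roots = [u for u in children if u not in referred]

    worth = {}
    # Iterative post-order traversal: a child's total is always computed
    # before its parent's, and a chain 1000+ levels deep cannot hit
    # Python's recursion limit.
    for root in roots:
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                # Each child contributes its own revenue plus its subtree total.
                worth[node] = sum(revenue.get(c, 0.0) + worth[c]
                                  for c in children[node])
            else:
                stack.append((node, True))
                for c in children[node]:
                    stack.append((c, False))
    return worth


if __name__ == "__main__":
    # Hypothetical toy tree: 1 referred 2 and 3, and 2 referred 4.
    w = worth_of_all([(1, 2), (1, 3), (2, 4)],
                     {1: 0.0, 2: 5.0, 3: 1.0, 4: 2.0})
    print(w[1], w[2])  # -> 8.0 2.0
```

Because each user's total is reused by its parent, the whole tree is processed in time linear in the number of referrals, and adding new users needs no renumbering of existing rows, only a rerun of the pass.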

I've done this using Nested Sets in MySQL, but it has proven challenging.
The issue I had was that it was (relatively) faster to create the tree from
scratch each time, which took around 2 hours. But adding new users (from,
say, yesterday) to the existing tree takes much longer, more than the 2
hours needed to rebuild it from scratch. The reason is that each insertion
does a rgt=rgt+2 where rgt>referrer's rgt (same with lft), which means
changing the 1000+ records already in the database. It was painfully slow.
When building from scratch, we simply iterate over each user's friends,
friends of friends, etc., so only the top end of the records ever changes,
which is much faster than inserting at an earlier point in the data.

Anyway, it works; however, the run time keeps going up and up (2 hours has
now become 3 hours), and it's a lot of data.

So I've been playing around with switching this to Neo4j, but I'm having
a lot of trouble and need some guidance. I've assigned userId 1 as my system
user, so if we wanted to add up the entire system's revenue, we could start
the graph from 1.

I'm using batch-import. Here's an example of my nodes.csv (note that I've
changed some of it; these are not the actual values):
userId:int:userIds referUserId:int:referUserIds eventDate:string revenue:float lastTransactionTime:string
1 2014-01-25 0.00
115 1 2014-01-25 8.31
122 122 2014-01-25 2.45
123 1 2014-01-25 1.25
132 115 2014-01-25 7.53
133 115 2014-01-25 3.39
134 133 2014-01-25 10.69
135 134 2014-01-25 1.00
136 134 2014-01-25 0.69
137 134 2014-01-25 0.39
138 137 2014-01-25 1.29
139 137 2014-01-25 1.19
140 137 2014-01-25 1.09

Here's an example of my rels.csv:
userId:int:userIds userId:int:userIds type
1 115 referred
115 122 referred
1 123 referred
122 132 referred
122 133 referred
133 134 referred
134 135 referred
134 136 referred
134 137 referred
137 138 referred
137 139 referred
137 140 referred

Anyway, I have two questions:
1) Which way should the relationship go? From 115 to 1 (115 was
referred_by 1), from 1 to 115 (1 referred 115), or should I have both
relationships?
2) Should I start from userId 1 as the system user? Is that what's causing
it to loop?

Here's a query I've tried without success:
START user=node:userIds(userId='115')
MATCH (user)-[:referred*]-(friend)
WITH sum(friend.revenue) as revenues
RETURN revenues

It returns 30.96. However, if you add up the referrals and referrals of
referrals of 115, it should come to 29.71. It's as if it's adding all of
them, even 123, who was referred by 1, which seems to account for the 1.25
difference. I've tried :referred*..2, but that only returns 2 levels of
data (14.62).
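The three totals above can be reproduced outside Neo4j. This minimal Python sketch uses only the sample rels.csv edges and nodes.csv revenues; `reachable_revenue` is a hypothetical helper, not a Neo4j API. It walks the edges either only from referrer to referred, or in both directions. Since the sample forms a tree, each node is reached by exactly one path, so summing once per node gives the same result as adding up one value per matched path.

```python
from collections import defaultdict

# Edges from the rels.csv sample: (referrer, referred).
RELS = [(1, 115), (115, 122), (1, 123), (122, 132), (122, 133), (133, 134),
        (134, 135), (134, 136), (134, 137), (137, 138), (137, 139), (137, 140)]
# Revenue column from the nodes.csv sample.
REVENUE = {1: 0.00, 115: 8.31, 122: 2.45, 123: 1.25, 132: 7.53, 133: 3.39,
           134: 10.69, 135: 1.00, 136: 0.69, 137: 0.39, 138: 1.29,
           139: 1.19, 140: 1.09}


def reachable_revenue(start, directed, max_depth=None):
    """Breadth-first sum of revenue for nodes reachable from `start`.

    `start` itself is excluded. With directed=False the walk may also go
    from a referred user back up to the referrer.
    """
    adj = defaultdict(list)
    for a, b in RELS:
        adj[a].append(b)        # referrer -> referred
        if not directed:
            adj[b].append(a)    # also allow referred -> referrer
    seen = {start}
    frontier = [start]
    depth = 0
    total = 0.0
    while frontier and (max_depth is None or depth < max_depth):
        depth += 1
        nxt = []
        for node in frontier:
            for n in adj[node]:
                if n not in seen:
                    seen.add(n)
                    nxt.append(n)
                    total += REVENUE[n]
        frontier = nxt
    return round(total, 2)


if __name__ == "__main__":
    print(reachable_revenue(115, directed=True))                # -> 29.71
    print(reachable_revenue(115, directed=False))               # -> 30.96
    print(reachable_revenue(115, directed=False, max_depth=2))  # -> 14.62
```

In this sketch, only the directed walk gives the expected 29.71. The undirected one climbs from 115 up to 1 and back down to 123, which adds exactly the 1.25.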

Anyway, I need some help here.

Thanks,
Steve

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.