Hi all, I have a website where users can refer other users, who in turn can refer other users. This can be a fairly large tree/graph structure (as deep as 1000 levels or more). We keep track of the revenue each user has earned us (from ads). I'm trying to calculate a user's worth by adding up all the revenue they brought us via referrals, referrals of referrals, etc., since without that user the rest of the users in their graph potentially would not exist.
I've done this using Nested Sets in MySQL, but it has proved challenging. The issue is that building the tree from scratch each time is (relatively) faster, at around 2 hours, whereas adding just the new users from, say, yesterday takes much longer (more than the 2 hours needed to rebuild it from scratch). The reason is that each insertion does a rgt=rgt+2 where rgt>referrer's rgt (and the same for lft), which means changing 1000+ records already in the database. It was painfully slow. When building from scratch, we simply iterate over each user's friends, friends of friends, etc., so each insert really only changes the top end of the database records and is therefore much faster than a new insert at an earlier data point. It works, but the run time keeps going up (2 hours has now become 3), and it's a lot of data.
I've been playing around with switching this to Neo4j, but I'm having a lot of trouble and need some guidance. I've assigned userId 1 as my system user, so if we wanted to add up the entire system's revenue we could start the graph from 1.
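To make the nested-set insert cost described above concrete, here is a minimal Python sketch. It is an illustration under assumptions, not my actual MySQL code: the tree, the IDs and the names `demo_tree` and `insert_child` are all hypothetical, and it assumes a new user is appended as the referrer's last child (so the comparison is `rgt >= pivot`, a slight variant of the update I described).

```python
def demo_tree():
    # Hypothetical tree: user 1 referred users 2 and 4; user 2 referred user 3.
    return {
        1: {"lft": 1, "rgt": 8},
        2: {"lft": 2, "rgt": 5},
        3: {"lft": 3, "rgt": 4},
        4: {"lft": 6, "rgt": 7},
    }


def insert_child(rows, parent_id, new_id):
    """Append new_id as the last child of parent_id.

    Returns how many existing rows had lft and/or rgt rewritten,
    which is the work the UPDATE statements do on every insert.
    """
    pivot = rows[parent_id]["rgt"]
    touched = 0
    for row in rows.values():
        changed = False
        if row["rgt"] >= pivot:   # parent, its ancestors, everything to the right
            row["rgt"] += 2
            changed = True
        if row["lft"] > pivot:    # only nodes entirely to the right of the parent
            row["lft"] += 2
            changed = True
        touched += changed
    rows[new_id] = {"lft": pivot, "rgt": pivot + 1}
    return touched


tree = demo_tree()
print(insert_child(tree, 3, 5))  # 4: inserting deep on the left rewrites every existing row
print(insert_child(tree, 4, 6))  # 2: inserting at the right edge only touches the ancestors
```

Even in this tiny tree, an insert near the left side rewrites every existing row, while an insert at the right edge only touches the ancestors, which is consistent with incremental inserts being slower than a full rebuild.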
I'm using batch-import. Here's an example of my nodes.csv (note that I've changed some of it; these are not actual values):

    userId:int:userIds  referUserId:int:referUserIds  eventDate:string  revenue:float  lastTransactionTime:string
    1                                                 2014-01-25        0.00
    115                 1                             2014-01-25        8.31
    122                 122                           2014-01-25        2.45
    123                 1                             2014-01-25        1.25
    132                 115                           2014-01-25        7.53
    133                 115                           2014-01-25        3.39
    134                 133                           2014-01-25        10.69
    135                 134                           2014-01-25        1.00
    136                 134                           2014-01-25        0.69
    137                 134                           2014-01-25        0.39
    138                 137                           2014-01-25        1.29
    139                 137                           2014-01-25        1.19
    140                 137                           2014-01-25        1.09

Here's an example of my rels.csv:

    userId:int:userIds  userId:int:userIds  type
    1                   115                 referred
    115                 122                 referred
    1                   123                 referred
    122                 132                 referred
    122                 133                 referred
    133                 134                 referred
    134                 135                 referred
    134                 136                 referred
    134                 137                 referred
    137                 138                 referred
    137                 139                 referred
    137                 140                 referred

I have two questions:

1) Which way should the relationship go? From 115 to 1 (115 was referred_by 1), or from 1 to 115 (1 referred 115), or should I have both relationships?
2) Should I start from userId 1 as the system user? Is that what's causing it to loop?

Here's a query I've tried without success:

    START user=node:userIds(userId='115')
    MATCH (user)-[:referred*]-(friend)
    WITH sum(friend.revenue) as revenues
    RETURN revenues

It returns 30.96. However, if you add up the referrals and referrals of referrals of 115, it should come to 29.71. It's as if it's adding all of them, even 123, which was referred by 1; that seems to account for the 1.25 difference. I've tried :referred*..2, but that only returns 2 levels of data (14.62).

Anyway, I need some help here.

Thanks,
Steve
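To double-check the figures quoted above (30.96, 29.71 and 14.62), here is a rough Python sketch that walks the sample data without Neo4j. It uses the revenue values from nodes.csv and the pairs from rels.csv; the names `REVENUE`, `REFERRED` and `reachable_revenue` are hypothetical, and modelling the undirected pattern `-[:referred*]-` by adding reverse edges is an assumption about how the query behaves, not something confirmed against the database.

```python
from collections import defaultdict, deque

# Revenue per userId, copied from the nodes.csv sample.
REVENUE = {
    1: 0.00, 115: 8.31, 122: 2.45, 123: 1.25, 132: 7.53, 133: 3.39,
    134: 10.69, 135: 1.00, 136: 0.69, 137: 0.39, 138: 1.29, 139: 1.19,
    140: 1.09,
}

# (referrer, referred) pairs, copied from the rels.csv sample.
REFERRED = [
    (1, 115), (115, 122), (1, 123), (122, 132), (122, 133), (133, 134),
    (134, 135), (134, 136), (134, 137), (137, 138), (137, 139), (137, 140),
]


def reachable_revenue(start, directed, max_depth=None):
    """Sum the revenue of every user reachable from start, excluding start."""
    neighbours = defaultdict(set)
    for referrer, referred in REFERRED:
        neighbours[referrer].add(referred)
        if not directed:
            # Assumption: an undirected match may also walk up to the referrer.
            neighbours[referred].add(referrer)

    seen = {start}
    queue = deque([(start, 0)])
    total = 0.0
    while queue:
        user, depth = queue.popleft()
        if max_depth is not None and depth == max_depth:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours[user]:
            if nxt not in seen:
                seen.add(nxt)
                total += REVENUE[nxt]
                queue.append((nxt, depth + 1))
    return round(total, 2)


print(reachable_revenue(115, directed=True))                 # 29.71
print(reachable_revenue(115, directed=False))                # 30.96
print(reachable_revenue(115, directed=False, max_depth=2))   # 14.62
```

Following only outgoing `referred` edges from 115 gives the expected 29.71, while ignoring direction also reaches user 1 and user 123 and reproduces both 30.96 and the two-level 14.62, which suggests the relationship direction in the pattern is what makes the difference.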