Tim forwarded me the email, since I was removed from the thread.

On 27/02/14 17:35, "Timothee Maret" <[email protected]> wrote:
>________________________________________
>From: Thomas Mueller
>Sent: Thursday, February 27, 2014 5:12 PM
>To: [email protected]
>Cc: Marcel Reutegger; Ian Boston; Timothee Maret
>Subject: Re: Oak Scalability: Load Distribution
>
>Hi,
>
>>The path depth is prepended to the path to ensure that the nodes are
>>distributed more equally.
>
>Actually, the reason for the prefix is not that the nodes are distributed
>more equally, but so that queries for child nodes are efficient, and so
>that siblings are stored next to each other. Queries for child nodes are
>range queries of the form "id between '2:/content/' and '2:/content0'".
>This is efficient because MongoDB keeps documents sorted by id. For more
>details about range queries, see
>http://docs.mongodb.org/manual/core/index-single/

Ok, I understand; however, this doesn't change my statement.

>>Cards: /content/<tenant>/<board>/<card>
>>Comments: /content/<tenant>/<board>/<card>/comments/<comment>
>>
>>As you can see, all cards and all comments are saved on the same level
>>and hence end up on the same cluster node.
>
>In this case, cards are stored next to each other, and comments are
>stored next to each other. But not necessarily on the same cluster node.

I agree that this need not always be the case (e.g. if every tenant has a
lot of other data below /content on the same level), but the probability
is way too high.

>>If we assume that every card gets 10 comments, this will cause 10 times
>>more write load on the "comments" cluster node than on the "cards"
>>cluster node.
>
>MongoDB will distribute the nodes evenly across shards. In the extreme
>case, if there are 10 shards, and if 10% of the data is cards and 90% is
>comments, then one cluster node will have all the cards, while the
>comments are distributed across the remaining cluster nodes.

Which means that one cluster node is much less busy with writes.
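To make the range-query argument concrete, here is a minimal Python sketch of the depth-prefix scheme Thomas describes. The helper names are mine, not Oak's; the point is only that all children of a parent share the prefix "<depth+1>:<parent>/", so a lookup of child nodes becomes one contiguous id range:

```python
def depth_prefixed_id(path):
    # Depth = number of path elements, e.g. "/content/a/1" -> 3,
    # giving the id "3:/content/a/1" as in the examples below.
    depth = 0 if path == "/" else path.count("/")
    return "%d:%s" % (depth, path)

def child_range(parent):
    # All direct children of `parent` have ids starting with
    # "<depth+1>:<parent>/". Since '0' is the character right after
    # '/' in ASCII, the half-open range [lower, upper) covers exactly
    # those ids -- e.g. ["2:/content/", "2:/content0") for /content.
    depth = (0 if parent == "/" else parent.count("/")) + 1
    lower = "%d:%s/" % (depth, parent.rstrip("/"))
    upper = lower[:-1] + "0"
    return lower, upper
```

Because MongoDB keeps documents sorted by id, scanning such a range touches only sibling documents, which is why the prefix exists in the first place.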
In reality, there is a lot of other (often read-only) data which is saved
on - let's say - another 10 nodes, which then see much fewer write
operations too.

>>A much better distribution could be achieved if the hash/checksum of
>>the parent node path were used instead of the path depth.
>
>Sure, we can do some experiments and try it out. My fear is that using
>an index on randomly distributed data will perform poorly, and we might
>end up with similar problems to those we had with Jackrabbit 2.x. But I
>might be wrong.

I don't understand the problem. My suggestion is just to use a hash/short
checksum of the parent node path instead of the node level/path depth:

cc24:/content/a/1
cc24:/content/a/2
0ef7:/content/b/1
0ef7:/content/b/2
09d7:/content/c/1
09d7:/content/c/2

(where the first 4 characters are the first 4 characters of the SHA1 of
the parent node path)

instead of:

3:/content/a/1
3:/content/a/2
3:/content/b/1
3:/content/b/2
3:/content/c/1
3:/content/c/2

The advantage is a much better distribution for all nodes which are in a
bucket (like a tenant, board, artificial bucket and so on).

Regards,
Joel

>Regards,
>Thomas
>
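The suggested hash-prefix scheme can be sketched in a few lines of Python. This is my own illustrative helper, not Oak code, and the cc24/0ef7/09d7 prefixes in the mail above are illustrative values; real SHA-1 output will differ. Siblings still share a prefix (so they stay adjacent), but different parents scatter across the key space:

```python
import hashlib

def hash_prefixed_id(path):
    # Prefix the id with the first 4 hex characters of the SHA-1 of the
    # parent node path. Siblings share the same parent, hence the same
    # prefix; different parents hash to (almost always) different
    # prefixes and so land in different shard ranges.
    parent = path.rsplit("/", 1)[0] or "/"
    prefix = hashlib.sha1(parent.encode("utf-8")).hexdigest()[:4]
    return "%s:%s" % (prefix, path)
```

The trade-off Thomas hints at remains: with a hashed prefix, child-node lookups are still a single range scan per parent, but traversals across many parents touch ranges spread over the whole index rather than one contiguous region.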
