Hi,

During the TechFair I talked with Marcel about the MongoDB
Microkernel and asked some questions about clustering. He explained that
the nodes are distributed across the cluster based on a key which
consists of the path depth and the node path. The path depth is prepended
to the path to ensure that the nodes are distributed more evenly. However,
I fear that this will cause trouble, since often only certain kinds of data
are written heavily. In our current project, for example, we will have a
lot of card and comment writes, and all cards share one path depth while
all comments share another (the paths are simplified):

Cards: /content/<tenant>/<board>/<card>
Comments: /content/<tenant>/<board>/<card>/comments/<comment>

As you can see, all cards are saved at one depth and all comments at
another, so each group ends up on a single cluster node. If we assume that
every card gets 10 comments, this will cause 10 times more write load on
the “comments” cluster node than on the “cards” cluster node. The same
imbalance applies to cards compared to other content.
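
To make the imbalance concrete, here is a minimal sketch. It assumes the
key has the form "<depth>:<path>" and that documents are placed by key
range, so that everything sharing a depth prefix lands together. The
function names and the sample tenant/board/card names are made up for
illustration and are not taken from the Microkernel code.

```python
from collections import Counter


def depth_key(path: str) -> str:
    """Build a key of the assumed form '<depth>:<path>'."""
    depth = len([segment for segment in path.split("/") if segment])
    return f"{depth}:{path}"


def sample_paths(boards: int = 3, cards: int = 4, comments: int = 10) -> list:
    """Generate hypothetical card and comment paths for one tenant."""
    paths = []
    for b in range(boards):
        for c in range(cards):
            card = f"/content/tenant/board{b}/card{c}"
            paths.append(card)
            for m in range(comments):
                paths.append(f"{card}/comments/comment{m}")
    return paths


def writes_per_prefix(paths) -> Counter:
    """Count writes per key prefix (here: per path depth)."""
    return Counter(depth_key(p).split(":", 1)[0] for p in paths)


if __name__ == "__main__":
    # Every card falls under prefix "4" and every comment under prefix "6".
    print(writes_per_prefix(sample_paths()))
```

With 3 boards, 4 cards per board and 10 comments per card, only two
prefixes ever receive writes: "4" gets 12 and "6" gets 120.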


A much better distribution could be achieved if a hash/checksum of the
parent node path were used instead of the path depth. Children of the same
parent would still be saved on the same cluster node, but the
cards/comments of different boards/cards would be spread over different
cluster nodes. I strongly recommend reconsidering the choice of the path
depth as the prefix of the MongoDB key, since it will lead to really bad
load distribution.
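
A rough sketch of what I have in mind, assuming a key of the form
"<hash of parent path>:<path>". The choice of MD5, the 8-character
prefix length and the function names are only illustrative assumptions,
not a proposal for the exact format.

```python
import hashlib
import posixpath


def parent_hash_key(path: str, prefix_len: int = 8) -> str:
    """Prefix the path with a short hash of its parent path (hypothetical scheme)."""
    parent = posixpath.dirname(path)
    digest = hashlib.md5(parent.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}:{path}"


def prefix_of(key: str) -> str:
    """Return the part of the key before the first ':'."""
    return key.split(":", 1)[0]


if __name__ == "__main__":
    # Siblings share a parent, so they share a prefix and stay together.
    a = parent_hash_key("/content/tenant/board0/card0/comments/comment0")
    b = parent_hash_key("/content/tenant/board0/card0/comments/comment1")
    print(prefix_of(a) == prefix_of(b))

    # Comments of different cards have different parents, so their
    # prefixes (and hence their placement) differ.
    prefixes = {
        prefix_of(parent_hash_key(f"/content/tenant/board0/card{c}/comments/comment0"))
        for c in range(12)
    }
    print(len(prefixes))
```

Comments on one card keep a common prefix, while comments on different
cards are scattered over the key space rather than all falling into a
single depth bucket.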

Regards, Joel
