hi,
      I think this is the RCA for the issue:
Basically, with distributed-ec as the cold tier and distributed-replicate as the hot tier: tier sends a lookup which fails on ec (by this time the dict already contains the ec xattrs). After this, the lookup_everywhere code path is hit in tier, which triggers a lookup on each of distribute's hashed subvolumes; these fail as well, leading to lookup_everywhere in both the cold and hot dhts in two parallel epoll threads. When ec's thread
tries to set trusted.ec.version/dirty/size in the dictionary, the older
values against the same keys get erased. While this erasing is going on, if the thread doing the lookup on afr's subvolume accesses these members, either in dict_copy_with_ref or in the client xlator trying to serialize the dict, that can lead to either a crash or a hang, depending on when the spin/mutex lock is taken on the now-invalid memory.

For the moment I have sent http://review.gluster.org/13680 (I am pressed for time because I need to provide a build with a fix for our customer), which avoids the parallel accesses of elements that step on each other.

Raghavendra G and I discussed this problem, and the right way to fix it is for dict_foreach to take a copy of the dictionary inside a lock (without using dict_foreach for the copy) and then loop over that local copy. I am worried about the performance implications of this, so I am wondering if anyone has a better idea.

Also including Xavi, who earlier said we need to change dict.c but that it is a bigger change. Maybe the time has come? I would love to gather all your inputs and implement a better version of dict if we need one.

Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel