[ https://issues.apache.org/jira/browse/TS-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
B Wyatt updated TS-949:
-----------------------

    Attachment: TS949-BW-p1.patch

Attaching a modified version of JP's patch that removes the unnecessary(?) multiplication and division. Additionally, it uses the volume index mapping.

> key->volume hash table is not consistent when a disk is marked as bad or
> removed due to failure
> -----------------------------------------------------------------------------------------------
>
>                 Key: TS-949
>                 URL: https://issues.apache.org/jira/browse/TS-949
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cache
>    Affects Versions: 3.1.0
>         Environment: Multi-volume cache with apparently faulty drives
>            Reporter: B Wyatt
>            Assignee: John Plevyak
>             Fix For: 3.1.2
>
>         Attachments: TS-949-jp-1.patch, TS949-BW-p1.patch
>
>
> The method for resolving collisions when distributing hash-table space to
> volumes for the object_key->volume hash table creates inconsistency when a
> disk is determined to be bad, or when a failed disk is removed from
> volume.config.
>
> Background:
> The hash space is distributed by a round-robin draft in which each volume
> "drafts" a random index in the hash table until the hash space is exhausted.
> The random order in which a given volume drafts hash table slots is
> consistent across reboot/crash/disk-failure. However, when a volume attempts
> to draft a slot that is already occupied, it skips to its next random pick
> and keeps trying until it finds an open slot. This ensures that the hash is
> partitioned evenly between volumes.
>
> The issue:
> Resolving slot contention breaks the consistency because it depends on the
> order in which the volumes draft. When rebuilding the hash after disk failure
> or reboot with fewer drives, a volume may secure an index that was previously
> occupied by the dead disk. In the old hash, the surviving volume would have
> selected another random index due to contention.
> If that other index is then claimed by a different volume before the next
> draft round, it becomes an inconsistent key->volume result.
> The effects of one inconsistency then cascade, because whichever volume
> occupies that index after the dead disk is removed is now behind on its
> draft sequence as well.
>
> An Example:
> ||Disk||Draft Sequence||
> |A|1,4,7,5|
> |B|4,2,8,1|
> |C|3,7,5,2|
>
> Pre-failure Hash Table after 2 rounds of draft:
> |A|B|C|B|C|?|A|?|
>
> Post-failure of drive B Hash Table after 3 rounds of draft:
> |A|C|C|A|{color:red}A{color}|?|{color:red}C{color}|?|
>
> Two slots have become inconsistent and more will probably follow. These
> inconsistencies become objects stored in a volume but lost to the top level
> cache for open/lookup.
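
The draft described in the issue can be replayed with a small simulation. This is only a sketch of the procedure as worded in the description, not the Traffic Server implementation: the function name `draft`, the fixed draft sequences, and the 8-slot table are taken from or invented for the example above, and slots are numbered from 1 as in the tables.

```python
def draft(sequences, rounds, table_size):
    """Round-robin draft: once per round, each volume claims its next free pick."""
    table = [None] * table_size
    cursor = {vol: 0 for vol in sequences}  # position in each draft sequence
    for _ in range(rounds):
        for vol, picks in sequences.items():
            # Skip picks that are already occupied (the contention rule).
            while cursor[vol] < len(picks) and table[picks[cursor[vol]] - 1] is not None:
                cursor[vol] += 1
            if cursor[vol] < len(picks):
                table[picks[cursor[vol]] - 1] = vol
                cursor[vol] += 1
    return table


# Draft sequences from the example table (1-based slot numbers).
sequences = {"A": [1, 4, 7, 5], "B": [4, 2, 8, 1], "C": [3, 7, 5, 2]}

before = draft(sequences, rounds=2, table_size=8)

# Rebuild the table without the failed disk B.
survivors = {vol: picks for vol, picks in sequences.items() if vol != "B"}
after = draft(survivors, rounds=3, table_size=8)

# A slot is inconsistent if a surviving volume owned it before the failure
# and a different volume owns it afterwards.
inconsistent = [
    slot + 1
    for slot, (old, new) in enumerate(zip(before, after))
    if old not in (None, "B") and new != old
]

print(before)
print(after)
print(inconsistent)
```

Run as written, this reproduces both hash tables from the example and reports slots 5 and 7 as inconsistent: slot 5 moves from C to A and slot 7 moves from A to C, even though neither slot belonged to the failed disk.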