yknoya commented on PR #12395: URL: https://github.com/apache/trafficserver/pull/12395#issuecomment-3142000024
# Problem A bug was discovered where the cache loaded from host.db is not being used. Upon debugging, it was found that the following logic in the `probe` function is executed, resulting in the cache being marked as stale: https://github.com/apache/trafficserver/blob/9f6ababfed6c9d0266ee1d7e1703a09413ca48cb/iocore/hostdb/HostDB.cc#L567-L570 # Cause The cause of the stale determination was identified in `HostDB::unmarshall`, specifically in the following section: https://github.com/apache/trafficserver/blob/9f6ababfed6c9d0266ee1d7e1703a09413ca48cb/iocore/hostdb/I_HostDBProcessor.h#L180-L183 This code first populates `HostDBInfo` using memcpy, then calls the constructor via placement new. As described in the comment, the constructor is invoked to reset the reference count of the base class `RefCountObj` to zero. However, calling the constructor also reinitializes members that use default member initializers: https://github.com/apache/trafficserver/blob/9f6ababfed6c9d0266ee1d7e1703a09413ca48cb/iocore/hostdb/I_HostDBProcessor.h#L322-L339 Debug logs added before and after the constructor call showed that fields such as `ip_timestamp` were unintentionally reset, causing the cache to be incorrectly judged as stale: ``` [Aug 1 10:29:01.022] traffic_server DEBUG: <I_HostDBProcessor.h:182 (unmarshall)> (hostdb) HostDBInfo before new: key=17900179173150759707 hostname_offset=80, ip_timestamp=1754011708, ip_timeout_interval=900, _iobuffer_index=0, m_refcount=2 [Aug 1 10:29:01.022] traffic_server DEBUG: <I_HostDBProcessor.h:192 (unmarshall)> (hostdb) HostDBInfo after new: key=0 hostname_offset=0, ip_timestamp=0, ip_timeout_interval=0, _iobuffer_index=0, m_refcount=0 ``` The patch used to add the debug logs is shown below: <details> <summary>patch</summary> ```diff diff --git a/iocore/hostdb/I_HostDBProcessor.h b/iocore/hostdb/I_HostDBProcessor.h index 0be0a81b4..2356337d7 100644 --- a/iocore/hostdb/I_HostDBProcessor.h +++ b/iocore/hostdb/I_HostDBProcessor.h @@ -178,10 +178,22 @@ struct HostDBInfo : public RefCountObj { HostDBInfo *ret = HostDBInfo::alloc(size - sizeof(HostDBInfo)); int buf_index = ret->_iobuffer_index; memcpy((void *)ret, buf, size); + + Debug("hostdb", "HostDBInfo before new: key=%lu hostname_offset=%u, " + "ip_timestamp=%u, ip_timeout_interval=%u, _iobuffer_index=%d, m_refcount=%d", + ret->key, ret->hostname_offset, ret->ip_timestamp, ret->ip_timeout_interval, + ret->_iobuffer_index, ret->refcount()); + // Reset the refcount back to 0, this is a bit ugly-- but I'm not sure we want to expose a method // to mess with the refcount, since this is a fairly unique use case ret = new (ret) HostDBInfo(); ret->_iobuffer_index = buf_index; + + Debug("hostdb", "HostDBInfo after new: key=%lu hostname_offset=%u, " + "ip_timestamp=%u, ip_timeout_interval=%u, _iobuffer_index=%d, m_refcount=%d", + ret->key, ret->hostname_offset, ret->ip_timestamp, ret->ip_timeout_interval, + ret->_iobuffer_index, ret->refcount()); + return ret; } diff --git a/iocore/hostdb/RefCountCache.cc b/iocore/hostdb/RefCountCache.cc index af9cf0fd2..8f58c3767 100644 --- a/iocore/hostdb/RefCountCache.cc +++ b/iocore/hostdb/RefCountCache.cc @@ -49,5 +49,5 @@ RefCountCacheHeader::operator==(RefCountCacheHeader const &that) const bool RefCountCacheHeader::compatible(RefCountCacheHeader *that) const { - return this->magic == that->magic && this->version == that->version && this->object_version == that->version; + return this->magic == that->magic && this->version == that->version && this->object_version == that->object_version; }; ``` </details> # Fix To fix this issue, the construction of `HostDBInfo` in `HostDB::unmarshall` must satisfy the following conditions: - The reference counter (`m_refcount`) should be initialized. - All other member variables should not be reinitialized. I considered ways to initialize the reference counter without calling the constructor. However, I concluded that invoking the constructor is also necessary to reassign the virtual function table pointer (vptr): https://github.com/apache/trafficserver/blob/9f6ababfed6c9d0266ee1d7e1703a09413ca48cb/iocore/hostdb/I_HostDBProcessor.h#L180-L183 As a result, I chose the following workaround: Back up the member variable values before calling the constructor, then restore them after the constructor has been executed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
