Your message dated Sun, 5 Apr 2026 16:16:34 +0200
with message-id <[email protected]>
and subject line Closing
has caused the Debian Bug report #1093304,
regarding swift-container: swift corrupts container database
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
1093304: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093304
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: swift-container
Version: 2.26.0-10+deb11u1+wmf1
Severity: normal

Hi,

We had an outage due to swift simultaneously quarantining all three copies of a container database (saying they were corrupt) during a listing operation. Since all three databases were corrupt, I think this is not a case of a disk/fs fault causing corruption, but rather that swift processed an operation that wrote to the container DB in a way that corrupted the sqlite file.

Each container-server had a backtrace like this:

Jan 5 07:20:28 ms-be2058 container-server: ERROR __call__ error with GET /sdb3/16503/AUTH_mw/wikipedia-commons-local-thumb.f8 :
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swift/common/db.py", line 475, in get
    yield conn
  File "/usr/lib/python3/dist-packages/swift/container/backend.py", line 1173, in list_objects_iter
    return [transform_func(r) for r in curs]
  File "/usr/lib/python3/dist-packages/swift/container/backend.py", line 1173, in <listcomp>
    return [transform_func(r) for r in curs]
sqlite3.DatabaseError: database disk image is malformed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swift/container/server.py", line 867, in __call__
    res = getattr(self, req.method)(req)
  File "/usr/lib/python3/dist-packages/swift/common/utils.py", line 2007, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/swift/container/server.py", line 752, in GET
    container_list = src_broker.list_objects_iter(
  File "/usr/lib/python3/dist-packages/swift/container/backend.py", line 1223, in list_objects_iter
    return results
  File "/usr/lib/python3.9/contextlib.py", line 135, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3/dist-packages/swift/common/db.py", line 483, in get
    self.possibly_quarantine(*sys.exc_info())
  File "/usr/lib/python3/dist-packages/swift/common/db.py", line 436, in possibly_quarantine
    self.quarantine(exc_hint)
  File "/usr/lib/python3/dist-packages/swift/common/db.py", line 414, in quarantine
    raise sqlite3.DatabaseError(detail)
sqlite3.DatabaseError: Quarantined /srv/swift-storage/sdb3/containers/16503/280/4077d9164732d6587761ef101bcbc280 to /srv/swift-storage/sdb3/quarantined/containers/4077d9164732d6587761ef101bcbc280 due to malformed database (txn: tx4d7ef4ae3a434f458e950-00677a32bc)

And indeed, when I ran an integrity check on the quarantined files, each one showed the same errors:

mvernon@ms-be2073:~$ sqlite3 4077d9164732d6587761ef101bcbc280.db "PRAGMA integrity_check"
row 423322 missing from index ix_object_deleted_name
row 2701219 missing from index ix_object_deleted_name

This is quite surprising, given that rowids are generally not the same across the 3 databases. One of the reported rows still exists in the table (and dates from 2016); the other does not.
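
For anyone wanting to repeat the comparison across all three quarantined copies, here is a minimal sketch of how the check above could be scripted with Python's standard sqlite3 module. The helper names (integrity_errors, compare_copies) are hypothetical and not part of swift; the sketch assumes the files can be opened read-only and that PRAGMA integrity_check returns rows rather than failing outright.

```python
import sqlite3
from pathlib import Path


def integrity_errors(db_path):
    """Return the rows reported by PRAGMA integrity_check for one database.

    A healthy database yields the single row "ok".
    """
    # Open read-only so the check cannot modify the quarantined file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()


def compare_copies(db_paths):
    """Group the copies by the exact integrity_check output they produce."""
    groups = {}
    for path in db_paths:
        groups.setdefault(tuple(integrity_errors(path)), []).append(str(path))
    return groups
```

If compare_copies returns a single group for the three replicas, every copy reports the same messages, which is the situation described above.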

In all 3 cases, the latest object (based on rowid) is an object that was deleted very shortly before the outage started - 07:19:50 UTC, with the outage starting at 07:20:28.
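
A minimal sketch of how the newest row could be pulled from each copy. It assumes the container database keeps its rows in a table named object (inferred from the index name ix_object_deleted_name above); the function name newest_object_row is hypothetical.

```python
import sqlite3


def newest_object_row(db_path):
    """Return the row with the highest rowid from the `object` table, or None."""
    conn = sqlite3.connect(db_path)
    try:
        # Ordering by ROWID should read the table itself rather than the
        # ix_object_deleted_name index that integrity_check complains about.
        return conn.execute(
            "SELECT * FROM object ORDER BY ROWID DESC LIMIT 1"
        ).fetchone()
    finally:
        conn.close()
```

Running this against each of the three quarantined files and comparing the returned names would show whether the copies agree on their most recent write.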

It is perhaps significant that the last listing that succeeded was a list request whose prefix matched that object, made at the same time as the object was being deleted:

object with highest rowid:
19933856|f/f8/Gascones,_molino_(1988)_02.jpg/300px-Gascones,_molino_(1988)_02.jpg|1736061590.04401|0|application/deleted|noetag|1|0

final successful listing:
Jan 5 07:19:50 ms-fe2010 proxy-server: 10.194.179.98 10.192.16.76 05/Jan/2025/07/19/50 GET /v1/AUTH_mw/wikipedia-commons-local-thumb.f8%3Flimit%3D9000%26prefix%3Df%252Ff8%252FGascones%252C_molino_%25281988%2529_02.jpg%252F%26format%3Djson%26states%3Dlisting HTTP/1.0 200 - wikimedia/multi-http-client%20v1.1 AUTH_tk22395377a... - 511 - txc6028b8aef0d4705aef82-00677a3296 - 0.0301 - - 1736061590.006474018 1736061590.036608696 0

final delete:
ms-fe2011.proxylog.gz:Jan 5 07:19:50 ms-fe2011 proxy-server: 10.194.179.98 10.192.32.36 05/Jan/2025/07/19/50 DELETE /v1/AUTH_mw/wikipedia-commons-local-thumb.f8/f/f8/Gascones%252C_molino_%25281988%2529_02.jpg/300px-Gascones%252C_molino_%25281988%2529_02.jpg HTTP/1.0 204 - wikimedia/multi-http-client%20v1.1 AUTH_tk22395377a... - - - tx8d7b6c325ca54e89a4a08-00677a3296 - 0.1511 - - 1736061590.041912317 1736061590.193058491 0
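
The request paths in these proxy log lines appear to be percent-encoded twice, which makes the prefix hard to compare by eye. The sketch below, using only the Python standard library and values copied from this report, decodes the listing prefix, checks it against the name of the object with the highest rowid, and converts that row's timestamp to UTC. The function names are hypothetical, and the assumption that the log adds exactly one extra level of quoting is inferred from the log lines, not taken from swift's documentation.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, unquote

# Values copied from the report above.
LISTING_PATH = (
    "/v1/AUTH_mw/wikipedia-commons-local-thumb.f8"
    "%3Flimit%3D9000%26prefix%3Df%252Ff8%252FGascones%252C_molino_"
    "%25281988%2529_02.jpg%252F%26format%3Djson%26states%3Dlisting"
)
LAST_OBJECT = (
    "f/f8/Gascones,_molino_(1988)_02.jpg/"
    "300px-Gascones,_molino_(1988)_02.jpg"
)
LAST_OBJECT_TIMESTAMP = "1736061590.04401"


def listing_prefix(logged_path):
    """Extract the decoded `prefix` query parameter from a logged path."""
    # Assumption: the log quoted the request once more, so undo that first.
    _, _, query = unquote(logged_path).partition("?")
    # parse_qs undoes the remaining level of quoting on each value.
    return parse_qs(query)["prefix"][0]


def as_utc(timestamp):
    """Convert an epoch timestamp string to an aware UTC datetime."""
    return datetime.fromtimestamp(float(timestamp), tz=timezone.utc)


prefix = listing_prefix(LISTING_PATH)
print(prefix)
print(LAST_OBJECT.startswith(prefix))
print(as_utc(LAST_OBJECT_TIMESTAMP).strftime("%Y-%m-%d %H:%M:%S"))
```

The decoded prefix is the thumbnail directory of the deleted object, and the row's timestamp falls at 07:19:50 UTC on 5 January 2025, matching the log lines.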

There is further investigation of the incident at https://phabricator.wikimedia.org/T383053. I have lots of logs, of which these seem the most pertinent parts; if you'd like other log extracts, I can provide them.

Finally, I should note that this container contains image thumbnails, generated by a separate service via a 404 handler in swift middleware - see https://github.com/wikimedia/operations-puppet/blob/45d5772c846e42269c2f1a19c8784fd9d2deb240/modules/swift/files/python3.9/SwiftMedia/wmf/rewrite.py#L48

Thanks,

Matthew

-- System Information:
Debian Release: 11.11
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable-security'), (500, 'oldstable-debug'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-30-amd64 (SMP w/48 CPU threads)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages swift-container depends on:
ii  init-system-helpers   1.60
ii  lsb-base              11.1.0
ii  openstack-pkg-tools   117
ii  python3               3.9.2-3
ii  python3-pastescript   2.0.2-4
ii  python3-swift         2.26.0-10+deb11u1+wmf1
ii  rsync                 3.2.3-4+deb11u1
ii  swift                 2.26.0-10+deb11u1+wmf1
ii  uwsgi-plugin-python3  2.0.19.1-7.1

Versions of packages swift-container recommends:
ii  swift-drive-audit  2.26.0-10+deb11u1+wmf1

swift-container suggests no packages.

--- End Message ---
--- Begin Message ---
Hi,

As this was forwarded upstream, there's nothing more I can do, and I do not wish to keep this bug open forever with no action possible on my side.

Please follow-up at:
https://bugs.launchpad.net/swift/+bug/2141924

Cheers,

Thomas Goirand (zigo)

--- End Message ---
