Package: s3ql
Version: 2.11.1+dfsg-1
Severity: critical
Justification: causes serious data loss
Dear Maintainer,

While running rsync to back up data to an s3ql file system mounted from Amazon's S3 service, the internet connection failed, resulting in the following errors from rsync:

rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: write failed on "<file name removed>": Software caused connection abort (103)
rsync error: error in file IO (code 11) at receiver.c(322) [receiver=3.0.9]
rsync: connection unexpectedly closed (17298 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(605) [sender=3.0.9]

I attempted to unmount the file system (twice), with the following result:

# umount.s3ql /media/server-external
File system appears to have crashed.

I then forced it to unmount as follows:

# fusermount -u -z /media/server-external

I then attempted to fsck the file system (twice; both runs gave the same result):

fsck.s3ql s3://<bucket name>/<file system prefix>
Enter file system encryption passphrase:
Starting fsck of s3://<bucket name>/<file system prefix>
Using cached metadata. Remote metadata is outdated.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Committing block 14 of inode 442809 to backend
Committing block 16 of inode 442809 to backend
Committing block 17 of inode 442809 to backend
Committing block 15 of inode 442809 to backend
Committing block 19 of inode 442809 to backend
Committing block 18 of inode 442809 to backend
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 100000 objects so far..
Dropping temporary indices...
Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/fsck.s3ql", line 9, in <module>
    load_entry_point('s3ql==2.11.1', 'console_scripts', 'fsck.s3ql')()
  File "/usr/lib/s3ql/s3ql/fsck.py", line 1189, in main
    fsck.check()
  File "/usr/lib/s3ql/s3ql/fsck.py", line 85, in check
    self.check_objects_id()
  File "/usr/lib/s3ql/s3ql/fsck.py", line 848, in check_objects_id
    self.conn.execute('INSERT INTO obj_ids VALUES(?)', (obj_id,))
  File "/usr/lib/s3ql/s3ql/database.py", line 98, in execute
    self.conn.cursor().execute(*a, **kw)
  File "src/cursor.c", line 231, in resetcursor
apsw.ConstraintError: ConstraintError: PRIMARY KEY must be unique

Next I copied the entire Amazon bucket to a new bucket and attempted an fsck on the copy, minus the locally cached file system data:

# fsck.s3ql s3://<new bucket name>/<file system prefix>
Enter file system encryption passphrase:
Starting fsck of s3://<new bucket name>/<file system prefix>
Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 381, in _convert_legacy_metadata
    meta_new['data'] = meta['data']
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/fsck.s3ql", line 9, in <module>
    load_entry_point('s3ql==2.11.1', 'console_scripts', 'fsck.s3ql')()
  File "/usr/lib/s3ql/s3ql/fsck.py", line 1111, in main
    param = backend.lookup('s3ql_metadata')
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 72, in lookup
    meta_raw = self._convert_legacy_metadata(meta_raw)
  File "/usr/lib/s3ql/s3ql/backends/comprenc.py", line 383, in _convert_legacy_metadata
    raise CorruptedObjectError('meta key data is missing')
s3ql.backends.common.CorruptedObjectError: meta key data is missing
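Regarding the first fsck traceback (the apsw.ConstraintError): my reading is that check_objects_id() builds a table of object ids with a PRIMARY KEY column and inserts every id it obtains from the backend, so the error means the same object id turned up twice. The following is only a minimal sketch of that failure mode using apsw; it is my own illustration under that assumption, not the actual fsck.s3ql code, and the table layout and duplicate-id listing are hypothetical:

    import apsw

    conn = apsw.Connection(':memory:')
    cursor = conn.cursor()
    # Single-column table with a PRIMARY KEY, analogous to the obj_ids
    # table that check_objects_id() inserts into.
    cursor.execute('CREATE TABLE obj_ids (id INTEGER PRIMARY KEY)')

    # Hypothetical backend listing that yields one object id twice, e.g.
    # an inconsistent S3 listing after the interrupted upload.
    listing = [1, 2, 3, 3]

    try:
        for obj_id in listing:
            cursor.execute('INSERT INTO obj_ids VALUES(?)', (obj_id,))
    except apsw.ConstraintError as exc:
        # Raised on the second "3"; with nothing catching it in fsck.py,
        # the exception propagates and aborts the entire fsck run.
        print(exc)

If the duplicate really does come from the backend listing rather than from genuine corruption, it seems as though fsck ought to be able to tolerate it (for example with INSERT OR IGNORE) instead of aborting outright, but I may well be misreading the check.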
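Regarding the second traceback (on the copied bucket): judging from the two comprenc.py lines shown, _convert_legacy_metadata() expects a 'data' entry in the object's metadata and converts its absence into a CorruptedObjectError before fsck can even read s3ql_metadata. A stripped-down sketch of that control flow, again my own reconstruction from the traceback rather than the s3ql source, with a hypothetical metadata dict:

    class CorruptedObjectError(Exception):
        """Stand-in for s3ql.backends.common.CorruptedObjectError."""

    def convert_legacy_metadata(meta):
        # Mirrors the two lines visible in the traceback: a missing
        # 'data' key is re-raised as CorruptedObjectError.
        meta_new = {}
        try:
            meta_new['data'] = meta['data']
        except KeyError:
            raise CorruptedObjectError('meta key data is missing')
        return meta_new

    # The s3ql_metadata object in the copied bucket apparently arrives
    # here without a 'data' key (hypothetical dict shown below).
    try:
        convert_legacy_metadata({'format': 'unexpected'})
    except CorruptedObjectError as exc:
        print(exc)   # meta key data is missing

If the bucket-to-bucket copy did not preserve all of the user-defined S3 metadata headers, that might at least explain why the copy fails differently from the original, but that is speculation on my part.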
NOTE: I'm not sure about the exact implication of "_convert_legacy_metadata" in the traceback above, but this was NOT a legacy file system; it had just been created with s3ql 2.11.1, as it was cheaper to rebuild it than to pull the 700 GB in the old copy down from Amazon to do the "verify" step specified as part of the upgrade procedure from older versions. At the time of this failure I had uploaded between 200 and 300 GB of deduplicated/compressed data to the new file system.

As things currently stand, unless I have overlooked or misunderstood something (which I consider entirely possible), this network connection failure has resulted in 100% data loss, unless fsck can be fixed in a manner that will allow it to complete correctly and recover the file system data. As I maintain other backups, no actual data has been lost (so far), but this makes s3ql unsafe to use and further attempts to back up my data to S3 pointless.

Regards,

Shannon Dealy

-- System Information:
Debian Release: 7.7
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'testing'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.13-0.bpo.1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages s3ql depends on:
ii  fuse                   2.9.3-9
ii  libc6                  2.18-4
ii  libjs-sphinxdoc        1.1.3+dfsg-4
ii  libsqlite3-0           3.7.13-1+deb7u1
ii  psmisc                 22.19-1+deb7u1
ii  python3                3.4.2-1
ii  python3-apsw           3.8.6-r1-1
ii  python3-crypto         2.6.1-5+b2
ii  python3-defusedxml     0.4.1-2
ii  python3-dugong         3.3+dfsg-2
ii  python3-llfuse         0.40-2+b2
ii  python3-pkg-resources  5.5.1-1
ii  python3-requests       2.4.3-4

s3ql recommends no packages.

s3ql suggests no packages.

-- debconf-show failed