Bug#704957: git-annex: git annex fsck reports "bad file content" for intact files

2013-04-08 Thread Joey Hess
Henrik Ahlgren wrote:
> Version: 3.20120629

>   Bad file content; moved to /home/pablo/Documents/private/.git/annex/bad/SHA256E-s10392194--82ffa5dcee1460a0aaf2e6ee0d6361dc0efcd67881d786feae805f7a55c7d228.psd.gz
>   Bad file content; moved to /home/pablo/Documents/private/.git/annex/bad/SHA256E-s175727--a71185f5778152394dced031a66094bd1553b3b8d2ea23cd539cd3b93bfe304e.1999.pdf

Note the multiple extensions on the filenames: ".psd.gz" and ".1999.pdf".

You apparently are using a newer version of git-annex than 3.20120629
somewhere, because the first version that used more than the final
filename extension was 3.20120721.

(And the first version that used the SHA256E backend by default when
adding files was 3.20120924.)

It does turn out that backwards compatibility has been broken;
3.20120629 fails to fsck files added using the SHA256E backend by
current versions of git-annex. It would probably be worth backporting a
fix to wheezy for this.

The fix is quite simple:

diff --git a/Backend/SHA.hs b/Backend/SHA.hs
index 838a97a..054922d 100644
--- a/Backend/SHA.hs
+++ b/Backend/SHA.hs
@@ -107,5 +107,5 @@ checkKeyChecksum size key file = do
 		else check <$> shaN size file
 	where
 		check s
-			| s == dropExtension (keyName key) = True
+			| s == dropExtensions (keyName key) = True
 			| otherwise = False
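
To illustrate why the one-word change matters, here is a minimal standalone
sketch, not the actual git-annex code. It assumes the key name is the hex
digest followed by the preserved extensions, as in the second path quoted
above. The names keyName1, digest1, checkOld and checkNew are made up for
this example; dropExtension and dropExtensions come from System.FilePath
(the filepath package that ships with GHC).

```haskell
import System.FilePath (dropExtension, dropExtensions)

-- Key name taken from the report: a SHA256 hex digest followed by the
-- extensions preserved from the original filename.
keyName1 :: String
keyName1 = "a71185f5778152394dced031a66094bd1553b3b8d2ea23cd539cd3b93bfe304e.1999.pdf"

-- The digest that a checksum of the intact file would produce.
digest1 :: String
digest1 = "a71185f5778152394dced031a66094bd1553b3b8d2ea23cd539cd3b93bfe304e"

-- Old comparison: strips only the final extension, so ".1999" is left
-- attached to the digest and the comparison fails.
checkOld :: String -> String -> Bool
checkOld digest name = digest == dropExtension name

-- Fixed comparison: strips every extension, leaving the bare digest.
checkNew :: String -> String -> Bool
checkNew digest name = digest == dropExtensions name

main :: IO ()
main = do
  putStrLn (dropExtension keyName1)   -- digest with ".1999" still attached
  putStrLn (dropExtensions keyName1)  -- bare digest
  print (checkOld digest1 keyName1)   -- False
  print (checkNew digest1 keyName1)   -- True
```

With a single-extension key such as "....pdf" both functions give the same
result, which is why only files with multiple extensions were flagged.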

-- 
see shy jo




Bug#704957: git-annex: git annex fsck reports "bad file content" for intact files

2013-04-08 Thread Henrik Ahlgren
Package: git-annex
Version: 3.20120629
Severity: normal

Dear Maintainer,

The annex-fsck command appears to think some files are bad, even though
there appears to be nothing wrong with them.

What I did: I ran "git annex fsck" on two separate annex repositories (on the
same machine), both containing a few hundred annexed files of various sizes,
plus thousands of smaller non-annexed objects. "git fsck" for the normal
objects succeeded without any errors.

Outcome: Both repositories reported "Bad file content" errors on seemingly
random files; the number of bad files was 46 in one and 47 in the other.
Example (filenames censored; they don't contain any special characters except
spaces):

fsck XXX/YYY.psd.gz (checksum...) 
  Bad file content; moved to /home/pablo/Documents/private/.git/annex/bad/SHA256E-s10392194--82ffa5dcee1460a0aaf2e6ee0d6361dc0efcd67881d786feae805f7a55c7d228.psd.gz
failed

(...)
fsck AAA/BBB/CCC.pdf (checksum...) 
  Bad file content; moved to /home/pablo/Documents/private/.git/annex/bad/SHA256E-s175727--a71185f5778152394dced031a66094bd1553b3b8d2ea23cd539cd3b93bfe304e.1999.pdf
failed
(Recording state in git...)
git-annex: fsck: 47 failed

However, when I check the integrity of the objects moved to .git/annex/bad,
the SHA256 sums match:

$ sha256sum SHA256E-s10392194--82ffa5dcee1460a0aaf2e6ee0d6361dc0efcd67881d786feae805f7a55c7d228.psd.gz
82ffa5dcee1460a0aaf2e6ee0d6361dc0efcd67881d786feae805f7a55c7d228  SHA256E-s10392194--82ffa5dcee1460a0aaf2e6ee0d6361dc0efcd67881d786feae805f7a55c7d228.psd.gz

I also ran "sha256sum .git/annex/bad/*" and all the bad files seem to be just
fine.
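
The manual check above relies on the expected digest being embedded in the
object's filename. As a rough sketch only, assuming keys always look like the
two examples in this report (BACKEND-sSIZE--DIGEST.EXTENSIONS), the expected
size and digest can be pulled out of a key name as follows. KeyParts and
parseKey are hypothetical names for this illustration, not git-annex's own
parser, and real keys may carry fields this sketch does not handle.

```haskell
import Data.Char (isDigit)

-- Hypothetical record holding the pieces of a key name.
data KeyParts = KeyParts
  { kpBackend    :: String   -- e.g. "SHA256E"
  , kpSize       :: Integer  -- size in bytes, from the "s" field
  , kpDigest     :: String   -- hex digest, without extensions
  , kpExtensions :: String   -- e.g. ".1999.pdf" (may be empty)
  } deriving (Show, Eq)

-- Parse "BACKEND-sSIZE--DIGEST.EXT..." into its parts.
-- Returns Nothing if the name does not match that shape.
parseKey :: String -> Maybe KeyParts
parseKey s =
  case break (== '-') s of
    (backend, '-' : 's' : rest) | not (null backend) ->
      case span isDigit rest of
        (digits, '-' : '-' : name) | not (null digits) ->
          -- Everything from the first dot onwards is treated as extensions.
          let (digest, exts) = break (== '.') name
          in Just (KeyParts backend (read digits) digest exts)
        _ -> Nothing
    _ -> Nothing

exampleKey :: String
exampleKey = "SHA256E-s175727--a71185f5778152394dced031a66094bd1553b3b8d2ea23cd539cd3b93bfe304e.1999.pdf"

main :: IO ()
main = print (parseKey exampleKey)
```

The kpDigest field is what the first column of the sha256sum output should
equal for an intact object, and kpSize should equal the file's size in bytes.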

This problem seems to be reproducible: I copied the objects back from another
repository (residing on an SD card) with "git annex copy --from=8gbmicrosd",
ran the same fsck command again, and got exactly the same results (I diffed
the logged stdout/stderr output).

After that I ran annex-fsck on the repository on the SD card, and once again
the same thing happened, so we can rule out a hardware problem on the main
SSD drive of the machine. I also ran a test script that creates 1000 files of
100 MB each from /dev/urandom, takes an SHA256 hash of each, saves them using
the hash as the filename, then checks the SHA256 sum of all files, and
repeats this cycle 100 times. No integrity errors were observed in this test,
so both the hardware and the filesystem/kernel appear to work reliably (note:
I'm using a self-compiled 3.8.5 kernel and haven't yet tried reproducing this
with the stock Wheezy kernel).

Best regards and thanks for the great tool.

Henrik

-- System Information:
Debian Release: 7.0
  APT prefers testing
  APT policy: (990, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.8.5 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages git-annex depends on:
ii  curl            7.26.0-1+wheezy1
ii  git             1:1.7.10.4-1+wheezy1
ii  libc6           2.13-38
ii  libffi5         3.0.10-3
ii  libgmp10        2:5.0.5+dfsg-2
ii  libpcre3        1:8.30-5
ii  openssh-client  1:6.0p1-4
ii  rsync           3.0.9-4
ii  uuid            1.6.2-1.3
ii  wget            1.13.4-3

Versions of packages git-annex recommends:
ii  lsof  4.86+dfsg-1

Versions of packages git-annex suggests:
pn  bup   
ii  gnupg 1.4.12-7
pn  graphviz  

-- no debconf information

