The following code reliably throws a SIGBUS in the memset, and cat
testfile > /dev/null returns an IO error.

I've sometimes gotten as high as iteration 900 before a SIGBUS, so
don't assume a single clear is OK.

linux 3.17.0, SATA -> MD(raid5) -> bcache (ssd) -> btrfs

Working on eliminating more variables.

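#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MB      (1024ull * 1024)
#define GB      (1024ull * MB)
#define TEST_SIZE       (4096)  /* file size in MB */

int main(void) {
        int fd, err, i;
        uint8_t *map = NULL;

        srandom(1024);  /* fixed seed: every run visits the same offsets */
        fd = open("testfile", O_RDWR|O_CREAT, 0600);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* posix_fallocate returns the error number, it does not set errno */
        err = posix_fallocate(fd, 0, TEST_SIZE * MB);
        if (err) {
                fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
                return 1;
        }

        for (i = 0; i < 1000; i++) {
                /* random MB-aligned byte offset inside the file */
                size_t location = (random() % (TEST_SIZE-1)) * MB;
                /* the previous (unmapped) address is passed only as a hint */
                map = mmap(map, MB, PROT_READ|PROT_WRITE, MAP_SHARED,
                                fd, location);
                if (map == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }

                /* location is in bytes, so convert before printing as MB */
                printf("%d: writing at %04zu mb\n", i, (size_t)(location / MB));

                memset(map, 0x5a, MB);  /* the SIGBUS is raised here */
                msync(map, MB, MS_ASYNC);

                munmap(map, MB);
        }
        close(fd);
        return 0;
}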
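
To narrow down where the IO error from cat shows up, something like the
helper below might be useful. It is only a sketch and not part of the
original testcase: scan_for_read_errors is a hypothetical name, and it
assumes the failure is visible through plain pread() calls the same way it
is through cat. It reads the file in fixed-size chunks and reports the
offset of the first read that fails.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original report): read fd from offset 0
 * in chunk-sized pread() calls until EOF. chunk must be non-zero.
 * Returns 0 if the whole file was readable, otherwise the errno value of
 * the first failing read, with *bad_off set to the offset of that call.
 * *total is always set to the number of bytes read successfully.
 */
int scan_for_read_errors(int fd, size_t chunk, off_t *bad_off, off_t *total)
{
        uint8_t *buf = malloc(chunk);
        off_t off = 0;
        int err = 0;

        if (buf == NULL)
                return ENOMEM;

        for (;;) {
                ssize_t n = pread(fd, buf, chunk, off);
                if (n < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted, retry */
                        err = errno;            /* e.g. EIO */
                        *bad_off = off;
                        break;
                }
                if (n == 0)
                        break;                  /* end of file */
                off += n;
        }

        *total = off;
        free(buf);
        return err;
}
```

Calling it on an fd opened with open("testfile", O_RDONLY) and a chunk of
4096 right after the SIGBUS would show whether the failing offset lines up
with the last "writing at" line printed by the testcase. The read stops at
the first error, so it says nothing about later parts of the file.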

On Wed, Oct 29, 2014 at 5:50 PM, Dan Merillat <dan.meril...@gmail.com> wrote:
> I'm in the middle of debugging the exact same thing.  3.17.0 -
> rtorrent dies with SIGBUS.
>
> I've done some debugging, the sequence is something like this:
> open a new file
> fallocate() to the final size
> mmap() all (or a portion) of the file
> write to the region
> run SHA1 on that mmap'd region to validate the chunk
> crash, eventually.  Generally not at the same point.
>
> Reading that file (cat > /dev/null) returns -EIO.
>
> Looking up the process maps, the SIGBUS appears to be happening in the
> middle of a mapped region of a pre-allocated file - I.E. it shouldn't
> be.  I'm not completely ruling out a rtorrent bug but it appears sane
> to me.
>
> Weirder: "old" files that have been around a while work just fine for
> seeding.  I've re-hashed my entire collection without an error.
>
> Seeing this on both inherit-COW and no-inherit-COW files, and the
> filesystem is not using compression.
>
> The interesting part is that when I go back and attempt to read the
> files later, they sometimes don't throw an IO error.
>
> Absolutely nothing in dmesg.
>
> Working on a testcase that triggers it reliably but no luck so far.  I
> thought I had bad RAM but two people upgrading to 3.17 and seeing the
> same bug at around the same time can't be a coincidence.  I rebooted
> to 3.17 on the 25th, the first new download was on the 28th and that
> failed.
>
> Working on a testcase for it that's more reproducible than "go grab
> torrent files with rtorrent".
>
> On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne <a...@tevsa.net> wrote:
>> Hi, it seems that using rtorrent to download onto a btrfs filesystem
>> leads to the creation of files that fail to read properly.
>> For instance, rtorrent crashes, and if I then try to rsync the file it
>> was writing to somewhere else, rsync also fails with the message
>> "can't map file "$file": Input/Output error (5)".
>> If I give it time, eventually the file gets into a good state and I can
>> rsync it somewhere else (as long as rtorrent doesn't keep writing to
>> it). This doesn't happen using ext4 on the same system.
>>
>> No btrfs errors, or any other errors, show up in any log. Scrubbing or
>> balancing don't turn up any issues. I've tried using a subvolume mounted
>> with nodatacow and/or flushoncommit, which didn't help. I'm not using
>> quotas and at some point had a single snapshot that I deleted. The
>> filesystem was originally created recently (on a 3.16.4+ kernel).
>>
>> Here's what the array looks like:
>>
>> Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
>>         Total devices 4 FS bytes used 3.14TiB
>>         devid    4 size 2.73TiB used 2.36TiB path /dev/sdd1
>>         devid    5 size 1.82TiB used 1.45TiB path /dev/sdc1
>>         devid    6 size 1.82TiB used 1.45TiB path /dev/sdb1
>>         devid    7 size 1.82TiB used 1.45TiB path /dev/sda1
>>
>> Btrfs v3.17
>>
>> Data, RAID1: total=3.34TiB, used=3.13TiB
>> System, RAID1: total=32.00MiB, used=512.00KiB
>> Metadata, RAID1: total=10.00GiB, used=7.31GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>>
>> On linux 3.17.1: Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28
>> 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3
>> AuthenticAMD GNU/Linux
>>
>> I'm utterly puzzled and clueless at how to dig into this issue.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html