Re: [PATCH v6 2/2] btrfs: Add zstd support to grub btrfs
On Mon, Nov 19, 2018 at 03:22:51PM +0100, Daniel Kiper wrote: > On Thu, Nov 15, 2018 at 02:36:03PM -0800, Nick Terrell wrote: > > - Adds zstd support to the btrfs module. > > - Adds a test case for btrfs zstd support. > > - Changes top_srcdir to srcdir in the btrfs module's lzo include > > following comments from Daniel Kiper about the zstd include. > > > > Tested on Ubuntu-18.04 with a btrfs /boot partition with and without zstd > > compression. A test case was also added to the test suite that fails before > > the patch, and passes after. > > > > Signed-off-by: Nick Terrell > > Reviewed-by: Daniel Kiper > > If there are no objections I will apply this patch series in a week or so. Errr... I have just realized that there are two many spaces just at the beginning of most lines. There should be 2 instead of 4. Please take a look at currently existing functions. Anyway, I can fix this before committing. However, if you could repost whole patch series that would be much easier for me. Daniel
Re: [PATCH v6 2/2] btrfs: Add zstd support to grub btrfs
On Thu, Nov 15, 2018 at 02:36:03PM -0800, Nick Terrell wrote: > - Adds zstd support to the btrfs module. > - Adds a test case for btrfs zstd support. > - Changes top_srcdir to srcdir in the btrfs module's lzo include > following comments from Daniel Kiper about the zstd include. > > Tested on Ubuntu-18.04 with a btrfs /boot partition with and without zstd > compression. A test case was also added to the test suite that fails before > the patch, and passes after. > > Signed-off-by: Nick Terrell Reviewed-by: Daniel Kiper If there are no objections I will apply this patch series in a week or so. Daniel
Price Inquiry
Hi,friend, This is Daniel Murray and i am from Sinara Group Co.Ltd Group Co.,LTD in Russia. We are glad to know about your company from the web and we are interested in your products. Could you kindly send us your Latest catalog and price list for our trial order. Best Regards, Daniel Murray Purchasing Manager
Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.
On Tue, Oct 09, 2018 at 07:51:01PM +0200, Daniel Kiper wrote: > On Thu, Sep 27, 2018 at 08:34:56PM +0200, Goffredo Baroncelli wrote: > > From: Goffredo Baroncelli > > > > Signed-off-by: Goffredo Baroncelli > > Code LGTM. Though comment begs improvement. I will send you updated > comment for approval shortly. Below you can find updated patch. Please check I have not messed up something. Daniel >From ecefb12a10d39bdd09e1d2b8fbbcbdb1b35274f8 Mon Sep 17 00:00:00 2001 From: Goffredo Baroncelli Date: Thu, 27 Sep 2018 20:34:56 +0200 Subject: [PATCH 1/1] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile. Signed-off-by: Goffredo Baroncelli Signed-off-by: Daniel Kiper --- grub-core/fs/btrfs.c | 73 ++ 1 file changed, 73 insertions(+) diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c index be19544..933a57d 100644 --- a/grub-core/fs/btrfs.c +++ b/grub-core/fs/btrfs.c @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20 #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40 +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100 grub_uint8_t dummy2[0xc]; grub_uint16_t nstripes; grub_uint16_t nsubstripes; @@ -766,6 +768,77 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr, csize = chunk_stripe_length - low; break; } + case GRUB_BTRFS_CHUNK_TYPE_RAID5: + case GRUB_BTRFS_CHUNK_TYPE_RAID6: + { + grub_uint64_t nparities, stripe_nr, high, low; + + redundancy = 1; /* no redundancy for now */ + + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) + { + grub_dprintf ("btrfs", "RAID5\n"); + nparities = 1; + } + else + { + grub_dprintf ("btrfs", "RAID6\n"); + nparities = 2; + } + + /* + * RAID 6 layout consists of several stripes spread over + * the disks, e.g.: + * + * Disk_0 Disk_1 Disk_2 Disk_3 + * A0 B0 P0 Q0 + * Q1 A1 B1 P1 + * P2 Q2 A2 B2 + * + * Note: placement of the parities depend on row number. + * + * Pay attention that the btrfs terminology may differ from + * terminology used in other RAID implementations, e.g. LVM, + * dm or md. The main difference is that btrfs calls contiguous + * block of data on a given disk, e.g. A0, stripe instead of chunk. + * + * The variables listed below have following meaning: + * - stripe_nr is the stripe number excluding the parities + * (A0 = 0, B0 = 1, A1 = 2, B1 = 3, etc.), + * - high is the row number (0 for A0...Q0, 1 for Q1...P1, etc.), + * - stripen is the disk number in a row (0 for A0, Q1, P2, + * 1 for B0, A1, Q2, etc.), + * - off is the logical address to read, + * - chunk_stripe_length is the size of a stripe (typically 64 KiB), + * - nstripes is the number of disks in a row, + * - low is the offset of the data inside a stripe, + * - stripe_offset is the data offset in an array, + * - csize is the "potential" data to read; it will be reduced + * to size if the latter is smaller, + * - nparities is the number of parities (1 for RAID 5, 2 for + * RAID 6); used only in RAID 5/6 code. + */ + stripe_nr = grub_divmod64 (off, chunk_stripe_length, ); + + /* + * stripen is computed without the parities + * (0 for A0, A1, A2, 1 for B0, B1, B2, etc.). + */ + high = grub_divmod64 (stripe_nr, nstripes - nparities, ); + + /* + * The stripes are spread over the disks. Every each row their + * positions are shifted by 1 place. So, the real disks number + * change. Hence, we have to take current row number modulo + * nstripes into account (0 for A0, 1 for A1, 2 for A2, etc.). + */ + grub_divmod64 (high + stripen, nstripes, ); + + stripe_offset = low + chunk_stripe_length * high; + csize = chunk_stripe_length - low; + + break; + } default: grub_dprintf ("btrfs", "unsupported RAID\n"); return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET, -- 1.7.10.4
Re: [PATCH 9/9] btrfs: Add RAID 6 recovery for a btrfs filesystem.
On Wed, Sep 26, 2018 at 09:56:07PM +0200, Goffredo Baroncelli wrote: > On 25/09/2018 21.20, Daniel Kiper wrote: > > On Wed, Sep 19, 2018 at 08:40:40PM +0200, Goffredo Baroncelli wrote: > >> From: Goffredo Baroncelli > >> > [] > >> * - stripe_offset is the disk offset, > >> * - csize is the "potential" data to read. It will be reduced to > >> *size if the latter is smaller. > >> + * - parities_pos is the position of the parity inside a row ( > > > > s/inside/in/> > >> + *2 for P1, 3 for P2...) > > + * - nparities is the number of parities (1 for RAID5, 2 for > RAID6); > + *used only in RAID5/6 code. > > >> */ > >> block_nr = grub_divmod64 (off, chunk_stripe_length, ); > >> > >> @@ -1030,6 +1069,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data > >> *data, grub_disk_addr_t addr, > >> */ > >> grub_divmod64 (high + stripen, nstripes, ); > >> > >> +grub_divmod64 (high + nstripes - nparities, nstripes, > >> + _pos); > > > > I think that this math requires a bit of explanation in the comment > > before grub_divmod64(). Especially I am interested in why high + > > nstripes - nparities works as expected. > > > What about > > /* > * parities_pos is equal to "(high - nparities) % nstripes" (see the diagram > above). > * However "high - nparities" might be negative (eg when high == 0) leading > to an > * incorrect computation. > * Instead "high + nstripes - nparities" is always positive and in modulo > nstripes is > * equal to "(high - nparities) % nstripes > */ LGTM. Daniel
Re: [PATCH 7/9] btrfs: Add support for recovery for a RAID 5 btrfs profiles.
On Wed, Sep 26, 2018 at 09:55:57PM +0200, Goffredo Baroncelli wrote: > On 25/09/2018 21.10, Daniel Kiper wrote: > > On Wed, Sep 19, 2018 at 08:40:38PM +0200, Goffredo Baroncelli wrote: > >> From: Goffredo Baroncelli > >> > >> Add support for recovery for a RAID 5 btrfs profile. In addition > >> it is added some code as preparatory work for RAID 6 recovery code. > >> > >> Signed-off-by: Goffredo Baroncelli > >> --- > >> grub-core/fs/btrfs.c | 169 +-- > >> 1 file changed, 164 insertions(+), 5 deletions(-) > >> > >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c > >> index 5c1ebae77..55a7eeffc 100644 > >> --- a/grub-core/fs/btrfs.c > >> +++ b/grub-core/fs/btrfs.c > >> @@ -29,6 +29,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> > >> GRUB_MOD_LICENSE ("GPLv3+"); > >> > >> @@ -665,6 +666,148 @@ btrfs_read_from_chunk (struct grub_btrfs_data *data, > >> return err; > >> } > >> > >> +struct raid56_buffer { > >> + void *buf; > >> + int data_is_valid; > >> +}; > >> + > >> +static void > >> +rebuild_raid5 (char *dest, struct raid56_buffer *buffers, > >> + grub_uint64_t nstripes, grub_uint64_t csize) > >> +{ > >> + grub_uint64_t i; > >> + int first; > >> + > >> + i = 0; > >> + while (buffers[i].data_is_valid && i < nstripes) > >> +++i; > > > > for (i = 0; buffers[i].data_is_valid && i < nstripes; i++); > > > >> + if (i == nstripes) > >> +{ > >> + grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are > >> OK\n"); > >> + return; > >> +} > >> + > >> + grub_dprintf ("btrfs", "rebuilding RAID 5 stripe #%" PRIuGRUB_UINT64_T > >> "\n", > >> + i); > > > > One line here please. > > > >> + for (i = 0, first = 1; i < nstripes; i++) > >> +{ > >> + if (!buffers[i].data_is_valid) > >> + continue; > >> + > >> + if (first) { > >> + grub_memcpy(dest, buffers[i].buf, csize); > >> + first = 0; > >> + } else > >> + grub_crypto_xor (dest, dest, buffers[i].buf, csize); > >> + > >> +} > > > > Hmmm... I think that this function can be simpler. You can drop first > > while/for and "if (i == nstripes)". Then here: > > > > if (first) { > > grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n"); > > > > Am I right? > > Ehm.. no. The "if" is an internal check to avoid BUG. rebuild_raid5() should > be called only if some disk is missed. > To perform this control, the code checks if all buffers are valid. Otherwise > there is an internal BUG. Something is wrong here. I think that the code checks if it is an invalid buffer. If there is not then GRUB complains. Right? However, it looks that I misread the code and made a mistake here. So, please ignore this change. Though please change while() with for() at the beginning. Daniel
Re: [PATCH 4/9] btrfs: Avoid a rescan for a device which was already not found.
On Wed, Sep 26, 2018 at 09:55:54PM +0200, Goffredo Baroncelli wrote: > On 25/09/2018 19.29, Daniel Kiper wrote: > > On Wed, Sep 19, 2018 at 08:40:35PM +0200, Goffredo Baroncelli wrote: > >> From: Goffredo Baroncelli > >> > >> If a device is not found, do not return immediately but > >> record this failure by storing NULL in data->devices_attached[]. > > > > Still the same question: Where the store happens in the code? > > I cannot find it in the patch below. This have to be clarified. > > > > Daniel > > > What about the following commit description > - > Change the behavior of find_device(): before the patch, a read of a > missed device might trigger a rescan. However, it is never recorded s/might/may/ > that a device is missed, so each single read of a missed device might > triggers a rescan. It is the caller who decides if a rescan is > performed in case of a missed device. And it does quite often, without > considering if in the past a devices was already found as "missed" > This behavior causes a lot of unneeded rescan, causing a huge slowdown > in case of a missed device. > > After the patch, the "missed device" information is stored in the > cache (as a NULL value). A rescan is triggered only if no information What do you mean by "cache"? ctx.dev_found? If yes please use latter instead of former. Or both together if it makes sense. > at all is found in the cache. This means that only the first time a > read of a missed device triggers a rescan. > > The change in the code is done removing "return NULL" when the disk is > not found. So it is always executed the code which stores in the cache cache? > the value returned by grub_device_iterate(): NULL if the device is > missed, or a valid data otherwise. > - Otherwise it is much better than earlier one. Daniel
Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.
On Wed, Sep 26, 2018 at 10:40:32PM +0200, Goffredo Baroncelli wrote: > On 25/09/2018 17.31, Daniel Kiper wrote: > > On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote: > >> From: Goffredo Baroncelli > >> > >> Signed-off-by: Goffredo Baroncelli > >> --- > >> grub-core/fs/btrfs.c | 66 > >> 1 file changed, 66 insertions(+) > >> > >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c > >> index be195448d..56c42746d 100644 > >> --- a/grub-core/fs/btrfs.c > >> +++ b/grub-core/fs/btrfs.c > >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item > >> #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 > >> #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20 > >> #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40 > >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 > >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100 > >>grub_uint8_t dummy2[0xc]; > >>grub_uint16_t nstripes; > >>grub_uint16_t nsubstripes; > >> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data > >> *data, grub_disk_addr_t addr, > >> stripe_offset = low + chunk_stripe_length > >>* high; > >> csize = chunk_stripe_length - low; > >> +break; > >> + } > >> +case GRUB_BTRFS_CHUNK_TYPE_RAID5: > >> +case GRUB_BTRFS_CHUNK_TYPE_RAID6: > >> + { > >> +grub_uint64_t nparities, block_nr, high, low; > >> + > >> +redundancy = 1; /* no redundancy for now */ > >> + > >> +if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) > >> + { > >> +grub_dprintf ("btrfs", "RAID5\n"); > >> +nparities = 1; > >> + } > >> +else > >> + { > >> +grub_dprintf ("btrfs", "RAID6\n"); > >> +nparities = 2; > >> + } > >> + > >> +/* > >> + * A RAID 6 layout consists of several blocks spread on the disks. > >> + * The raid terminology is used to call all the blocks of a row > >> + * "stripe". Unfortunately the BTRFS terminology confuses block > > > > Stripe is data set or parity (parity stripe) on one disk. Block has > > different meaning. Please stick to btrfs terminology and say it clearly > > in the comment. And even add a link to btrfs wiki page to ease reading. > > > > I think about this one: > > > > https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID > > > >> + * and stripe. > > > > I do not think so. Or at least not so much... > > Trust me, generally speaking stripe is the "row" in the disks (without the > parity); looking at the ext3 man page: > > >stride=stride-size > Configure the filesystem for a RAID array with > stride-size filesystem blocks. This is the number of > blocks read or written to disk before moving to the > next disk, which is sometimes referred to as the > chunk size. This mostly affects placement of > filesystem metadata like bitmaps at mke2fs time to > avoid placing them on a single disk, which can hurt > performance. It may also be used by the block > allo??? > cator. > >stripe_width=stripe-width > Configure the filesystem for a RAID array with > stripe-width filesystem blocks per stripe. This is > typically stride-size * N, where N is the number of > data-bearing disks in the RAID (e.g. for RAID 5 > there is one parity disk, so N will be the number of > disks in the array minus 1). This allows the block > allocator to prevent read-modify-write of the parity > in a RAID stripe if possible when the data is > writ??? > ten. > > > Looking at the RAID5 wikipedia page, it seems that the term "stripe" > is coherent with the ext3 man page. Ugh... It looks that I have messe
Re: [PATCH 3/3] btrfs: Add zstd support to btrfs
ze_t ret = -1; > + > + /* Zstd will fail if it can't fit the entire output in the destination > + * buffer, so if osize isn't large enough, allocate a temporary buffer. > + */ > + if (otmpsize < ZSTD_BTRFS_MAX_INPUT) { > + allocated = grub_malloc (ZSTD_BTRFS_MAX_INPUT); > + if (!allocated) { > + grub_dprintf ("zstd", "outtmpbuf allocation failed\n"); > + goto out; > + } > + otmpbuf = (char*)allocated; > + otmpsize = ZSTD_BTRFS_MAX_INPUT; > + } > + > + /* Allocate space for, and initialize, the ZSTD_DCtx. */ > + wmem = grub_malloc (wmem_size); > + if (!wmem) { > + grub_dprintf ("zstd", "wmem allocation failed\n"); > + goto out; > + } > + dctx = ZSTD_initDCtx (wmem, wmem_size); > + > + /* Get the real input size, there may be junk at the > + * end of the frame. > + */ > + isize = ZSTD_findFrameCompressedSize (ibuf, isize); > + if (ZSTD_isError (isize)) { > + grub_dprintf ("zstd", "first frame is invalid %d\n", > + (int)ZSTD_getErrorCode (isize)); > + goto out; > + } > + > + /* Decompress and check for errors */ > + zstd_ret = ZSTD_decompressDCtx (dctx, otmpbuf, otmpsize, ibuf, isize); > + if (ZSTD_isError (zstd_ret)) { > + grub_dprintf ("zstd", "zstd failed with code %d\n", > + (int)ZSTD_getErrorCode (zstd_ret)); > + goto out; > + } > + > + /* Move the requested data into the obuf. > + * obuf may be equal to otmpbuf, which is why grub_memmove() is > required. > + */ > + grub_memmove (obuf, otmpbuf + off, osize); > + ret = osize; > + > +out: s/out/err/ > + grub_free (allocated); > + grub_free (wmem); > + return ret; > +} > + > static grub_ssize_t > grub_btrfs_lzo_decompress(char *ibuf, grub_size_t isize, grub_off_t off, > char *obuf, grub_size_t osize) > @@ -1087,7 +1156,8 @@ grub_btrfs_extent_read (struct grub_btrfs_data *data, > >if (data->extent->compression != GRUB_BTRFS_COMPRESSION_NONE > && data->extent->compression != GRUB_BTRFS_COMPRESSION_ZLIB > - && data->extent->compression != GRUB_BTRFS_COMPRESSION_LZO) > + && data->extent->compression != GRUB_BTRFS_COMPRESSION_LZO > + && data->extent->compression != GRUB_BTRFS_COMPRESSION_ZSTD) > { > grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET, > "compression type 0x%x not supported", > @@ -1127,6 +1197,15 @@ grub_btrfs_extent_read (struct grub_btrfs_data *data, > != (grub_ssize_t) csize) > return -1; > } > + else if (data->extent->compression == GRUB_BTRFS_COMPRESSION_ZSTD) > + { > + if (grub_btrfs_zstd_decompress(data->extent->inl, data->extsize - > +((grub_uint8_t *) data->extent->inl > + - (grub_uint8_t *) data->extent), > +extoff, buf, csize) > + != (grub_ssize_t) csize) > + return -1; > + } > else > grub_memcpy (buf, data->extent->inl + extoff, csize); > break; > @@ -1164,6 +1243,10 @@ grub_btrfs_extent_read (struct grub_btrfs_data *data, > ret = grub_btrfs_lzo_decompress (tmp, zsize, extoff > + grub_le_to_cpu64 (data->extent->offset), > buf, csize); > + else if (data->extent->compression == GRUB_BTRFS_COMPRESSION_ZSTD) > + ret = grub_btrfs_zstd_decompress (tmp, zsize, extoff > + + grub_le_to_cpu64 (data->extent->offset), > + buf, csize); > else > ret = -1; > > diff --git a/tests/btrfs_test.in b/tests/btrfs_test.in > index 2b37ddd33..0c9bf3a68 100644 > --- a/tests/btrfs_test.in > +++ b/tests/btrfs_test.in > @@ -18,6 +18,7 @@ fi > "@builddir@/grub-fs-tester" btrfs > "@builddir@/grub-fs-tester" btrfs_zlib > "@builddir@/grub-fs-tester" btrfs_lzo > +"@builddir@/grub-fs-tester" btrfs_zstd > "@builddir@/grub-fs-tester" btrfs_raid0 > "@builddir@/grub-fs-tester" btrfs_raid1 > "@builddir@/grub-fs-tester" btrfs_single > diff --git a/tests/util/grub-fs-tester.in b/tests/util/grub-fs-tester.in > index ef65fbc93..147d946d2 100644 > --- a/tests/util/grub-fs-tester.in > +++ b/tests/util/grub-fs-tester.in > @@ -600,7 +600,7 @@ for LOGSECSIZE in $(range "$MINLOGSECSIZE" > "$MAXLOGSECSIZE" 1); do > GENERATED=n > LODEVICES= > MOUNTDEVICE= > - > + Ditto. Daniel
Re: Bad superblock when mounting rw, ro mount works
Thanks for the help, I started `check --repair --init-extent-tree` right around a week ago as a last effort before restoring from backup. Unfortunately, that command is still running. It does seem to be using about half of the system's RAM (8 of 16GB) and 100% load on a single core. Is this type of run time expected for an 8TB drive? The byte numbers it's referencing seem to be a bit odd to me as they're larger than the number of bytes on the drive. Here's the head and tail of the current run (separated by -) if that's indicative of progress: btrfs unable to find ref byte nr 49673574858752 parent 0 root 1 owner 2 offset 0 btrfs unable to find ref byte nr 49673719529472 parent 0 root 1 owner 1 offset 1 btrfs unable to find ref byte nr 62243448012800 parent 0 root 1 owner 0 offset 1 btrfs unable to find ref byte nr 49673575120896 parent 0 root 1 owner 1 offset 1 btrfs unable to find ref byte nr 49673575251968 parent 0 root 1 owner 0 offset 1 checking extents ref mismatch on [49218307751936 67108864] extent item 0, found 1 data backref 49218307751936 root 5 owner 1359193 offset 536870912 num_refs 0 not found in extent tree incorrect local backref count on 49218307751936 root 5 owner 1359193 offset 536870912 found 1 wanted 0 back 0x5583559bd790 backpointer mismatch on [49218307751936 67108864] - data backref 49230998138880 root 5 owner 1409678 offset 7408779264 num_refs 0 not found in extent tree incorrect local backref count on 49230998138880 root 5 owner 1409678 offset 7408779264 found 1 wanted 0 back 0x5583b95f0e20 backpointer mismatch on [49230998138880 16384] adding new data backref on 49230998138880 root 5 owner 1409678 offset 7408779264 found 1 Repaired extent references for 49230998138880 ref mismatch on [49230998155264 16384] extent item 0, found 1 data backref 49230998155264 root 5 owner 669291 offset 3905650688 num_refs 0 not found in extent tree incorrect local backref count on 49230998155264 root 5 owner 669291 offset 3905650688 found 1 wanted 0 back 0x5582efb12930 backpointer mismatch on [49230998155264 16384] adding new data backref on 49230998155264 root 5 owner 669291 offset 3905650688 found 1 Thanks, Daniel On Thu, Jun 14, 2018 at 10:43 AM, Qu Wenruo wrote: > From the output, specially the lowmem mode output since original mode > handles extent tree corruption poorly and aborted, it's your extent tree > corrupted and causing the bug. > > Thus, you should be able to mount the fs RO and copy all the data back > without much hassle. > Just need to pay attention for csum error. > > And considering how many extent tree corruption, I don't think it's a > good idea to manually fix the fs. > > The last chance is, to try --repair --init-extent-tree as your last > chance, if you still want to salvage the filesystem. > The lowmem mode shows no extra bug, thus it's possible for > --init-extent-tree to re-init extent tree and save the day. > > But personally speaking I'm not fully confident of the operation, thus > it may fail and you may need to use the backup. > > BTW, even --init-extent-tree succeeded, you may still need to run btrfs > check again to check if all the bugs are fixed. > But at least from the lowmem output, the remaining errors are all fixable. > > Thanks, > Qu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bad superblock when mounting rw, ro mount works
>> Your very first task right now is to mount ro, and update your >> backups. Don't do anything else until you've done that. It's a >> testimony to Btrfs that this file system mounts at all, even ro, so >> take advantage of this fact before you can't mount it anymore. Backups are in place for the important parts, though I'd prefer not to use them if possible. For btrfs-progs, I am using 4.16.1 installed from https://github.com/kdave/btrfs-progs. Regarding em5, it had errors when I initially added it to the array that were due to a faulty SAS card. The card was replaced and I haven't seen errors since until this popped up. Regarding what you said about my dmesg, the numbers are actually still the same (bdev /dev/mapper/em5 errs: wr 164286, rd 3444262, flush 2110, corrupt 3, gen 181) after doing a backup and reboot. I would think they would have changed unless btrfs is just ignoring that disk now. Just to make sure, I've checked to make sure there weren't any loose connections. The logs for check (with and without lowmem mode), `btrfs fi us`, and `smartctl -x` are attached. The checks are only for em5, but I can do other disks if necessary (it's a 6-disk raid10-style setup). Note that there are a lot of errors on the SMART info, but they seem to be back from when I was having hardware issues. --- On Thu, Jun 7, 2018 at 4:50 PM, Chris Murphy wrote: > On Thu, Jun 7, 2018 at 2:38 PM, Chris Murphy wrote: > > >> Your very first task right now is to mount ro, and update your >> backups. Don't do anything else until you've done that. It's a >> testimony to Btrfs that this file system mounts at all, even ro, so >> take advantage of this fact before you can't mount it anymore. > > After you've done the backup, you need to find out why one of these > devices is being so unreliable. That has to be fixed first. You can > recreate a new Btrfs or some other file system, and you'll just run > into the exact same problem down the road. Next, it might be useful to > see the output from btrfs-progs 4.16.1 'btrfs check' and 'btrfs check > --mode=lowmem' both of which are slow, the second one is really slow > but is a different implementation so it's helpful to see both outputs. > That's safe as long as you do not use --repair. > > Also we need to see the output from 'btrfs fi us ' with > the volume mounted (ro). Off hand I think the most likely outcome is > that you get a backup from the ro mounted file system, and you'll have > to recreate it from scratch and restore from backups. In other words, > no matter what you need a current backup. > > -- > Chris Murphy -- Daniel Underwood NCSU Physics 2016 Undergraduate Researcher, Triangle Universities Nuclear Laboratory (704) 244-0244 daniel.underwoo...@gmail.com djund...@ncsu.edu media-usage.log Description: Binary data sdh-smart.log Description: Binary data em5-check.log Description: Binary data em5-check-lowmem.log Description: Binary data
Bad superblock when mounting rw, ro mount works
Hi, I have a raid10-like setup that is failing to mount in rw mode with the error mount: /mnt/media: wrong fs type, bad option, bad superblock on /dev/mapper/em1, missing codepage or helper program, or other error read-only mounts seem to work and the files seem to be there. I started having issues after a system crash during the process of deleting a number of large files. After this (Ubuntu 16.04/Kernel 4.4), any attempt to mount the array in rw mode would cause a similar crash. I did an upgrade to Ubuntu 18.04/Kernel 4.15 and now get the error above. I have looked through a variety of posts on the mailing list, but couldn't find anything with the same issue. I have done a scrub on the array that resulted in 6 verify errors with dmesg showing something about extent trees. It didn't list them as uncorrectable errors, but couldn't correct them either as I can't mount in rw. I also tried `btrfs rescue zero-log /dev/mapper/em1`, which changes the above to (say) em5, but then zero-log on em5 causes it to go back to em1. Any direction would be appreciated. From what I could tell, my next steps would be a check with --repair or --init-extent-tree, though I'm reluctant to try those without being explicitly told to do so. I have attached a dmesg.log file and hope I haven't greped out anything important. Thanks, Daniel -- Daniel Underwood dmesg.log Description: Binary data
Re: [PATCH 10/14] vgem: separate errno from VM_FAULT_* values
On Wed, May 16, 2018 at 07:43:44AM +0200, Christoph Hellwig wrote: > And streamline the code in vgem_fault with early returns so that it is > a little bit more readable. > > Signed-off-by: Christoph Hellwig <h...@lst.de> > --- > drivers/gpu/drm/vgem/vgem_drv.c | 51 +++-- > 1 file changed, 23 insertions(+), 28 deletions(-) > > diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c > index 2524ff116f00..a261e0aab83a 100644 > --- a/drivers/gpu/drm/vgem/vgem_drv.c > +++ b/drivers/gpu/drm/vgem/vgem_drv.c > @@ -61,12 +61,13 @@ static void vgem_gem_free_object(struct drm_gem_object > *obj) > kfree(vgem_obj); > } > > -static int vgem_gem_fault(struct vm_fault *vmf) > +static vm_fault_t vgem_gem_fault(struct vm_fault *vmf) > { > struct vm_area_struct *vma = vmf->vma; > struct drm_vgem_gem_object *obj = vma->vm_private_data; > /* We don't use vmf->pgoff since that has the fake offset */ > unsigned long vaddr = vmf->address; > + struct page *page; > int ret; > loff_t num_pages; > pgoff_t page_offset; > @@ -85,35 +86,29 @@ static int vgem_gem_fault(struct vm_fault *vmf) > ret = 0; > } > mutex_unlock(>pages_lock); > - if (ret) { > - struct page *page; > - > - page = shmem_read_mapping_page( > - file_inode(obj->base.filp)->i_mapping, > - page_offset); > - if (!IS_ERR(page)) { > - vmf->page = page; > - ret = 0; > - } else switch (PTR_ERR(page)) { > - case -ENOSPC: > - case -ENOMEM: > - ret = VM_FAULT_OOM; > - break; > - case -EBUSY: > - ret = VM_FAULT_RETRY; > - break; > - case -EFAULT: > - case -EINVAL: > - ret = VM_FAULT_SIGBUS; > - break; > - default: > - WARN_ON(PTR_ERR(page)); > - ret = VM_FAULT_SIGBUS; > - break; > - } > + if (!ret) > + return 0; > + > + page = shmem_read_mapping_page(file_inode(obj->base.filp)->i_mapping, > + page_offset); > + if (!IS_ERR(page)) { > + vmf->page = page; > + return 0; > + } > > + switch (PTR_ERR(page)) { > + case -ENOSPC: > + case -ENOMEM: > + return VM_FAULT_OOM; > + case -EBUSY: > + return VM_FAULT_RETRY; > + case -EFAULT: > + case -EINVAL: > + return VM_FAULT_SIGBUS; > + default: > + WARN_ON(PTR_ERR(page)); > + return VM_FAULT_SIGBUS; > } > - return ret; Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch> Want me to merge this through drm-misc or plan to pick it up yourself? -Daniel > } > > static const struct vm_operations_struct vgem_gem_vm_ops = { > -- > 2.17.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Add support for BTRFS raid5/6 to GRUB
On Tue, Apr 17, 2018 at 09:57:40PM +0200, Goffredo Baroncelli wrote: > Hi All, > > Below you can find a patch to add support for accessing files from > grub in a RAID5/6 btrfs filesystem. This is a RFC because it is > missing the support for recovery (i.e. if some devices are missed). In > the next days (weeks ?) I will extend this patch to support also this > case. > > Comments are welcome. More or less LGTM. Just a nitpick below... I am happy to take full blown patch into GRUB if it is ready. > BR > G.Baroncelli > > > --- > > commit 8c80a1b7c913faf50f95c5c76b4666ed17685666 > Author: Goffredo Baroncelli <kreij...@inwind.it> > Date: Tue Apr 17 21:40:31 2018 +0200 > > Add initial support for btrfs raid5/6 chunk > > diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c > index be195448d..4c5632acb 100644 > --- a/grub-core/fs/btrfs.c > +++ b/grub-core/fs/btrfs.c > @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item > #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 > #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20 > #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40 > +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 > +#define GRUB_BTRFS_CHUNK_TYPE_RAID60x100 >grub_uint8_t dummy2[0xc]; >grub_uint16_t nstripes; >grub_uint16_t nsubstripes; > @@ -764,6 +766,39 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, > grub_disk_addr_t addr, > stripe_offset = low + chunk_stripe_length > * high; > csize = chunk_stripe_length - low; > + break; > + } > + case GRUB_BTRFS_CHUNK_TYPE_RAID5: > + case GRUB_BTRFS_CHUNK_TYPE_RAID6: > + { > + grub_uint64_t nparities; > + grub_uint64_t parity_pos; > + grub_uint64_t stripe_nr, high; > + grub_uint64_t low; > + > + redundancy = 1; /* no redundancy for now */ > + > + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) > + { > + grub_dprintf ("btrfs", "RAID5\n"); > + nparities = 1; > + } > + else > + { > + grub_dprintf ("btrfs", "RAID6\n"); > + nparities = 2; > + } > + > + stripe_nr = grub_divmod64 (off, chunk_stripe_length, ); > + > + high = grub_divmod64 (stripe_nr, nstripes - nparities, ); > + grub_divmod64 (high+nstripes-nparities, nstripes, _pos); > + grub_divmod64 (parity_pos+nparities+stripen, nstripes, ); Missing spaces around "+" and "-". Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization
On 03/12/2018 03:06 PM, Masami Hiramatsu wrote: > On Mon, 12 Mar 2018 11:44:21 +0100 > Daniel Borkmann <dan...@iogearbox.net> wrote: >> On 03/12/2018 11:27 AM, Masami Hiramatsu wrote: >>> On Mon, 12 Mar 2018 19:00:49 +0900 >>> Masami Hiramatsu <mhira...@kernel.org> wrote: >>> >>>> Since the kprobe which was optimized by jump can not change >>>> the execution path, the kprobe for error-injection must not >>>> be optimized. To prohibit it, set a dummy post-handler as >>>> officially stated in Documentation/kprobes.txt. >>> >>> Note that trace-probe based BPF is not affected, because it >>> ensures the trace-probe is based on ftrace, which is not >>> jump optimized. >> >> Thanks for the fix! I presume this should go via bpf instead of bpf-next >> tree since 4b1a29a7f542 ("error-injection: Support fault injection >> framework") >> is in Linus' tree as well. Unless there are objection I would rather route >> it that way so it would be for 4.16. > > Ah, right! It should go into 4.16. It should be applicable cleanly either tree > since there is only the above commit on kernel/fail_function.c :) Applied to bpf tree, thanks Masami! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization
Hi Masami, On 03/12/2018 11:27 AM, Masami Hiramatsu wrote: > On Mon, 12 Mar 2018 19:00:49 +0900 > Masami Hiramatsu <mhira...@kernel.org> wrote: > >> Since the kprobe which was optimized by jump can not change >> the execution path, the kprobe for error-injection must not >> be optimized. To prohibit it, set a dummy post-handler as >> officially stated in Documentation/kprobes.txt. > > Note that trace-probe based BPF is not affected, because it > ensures the trace-probe is based on ftrace, which is not > jump optimized. Thanks for the fix! I presume this should go via bpf instead of bpf-next tree since 4b1a29a7f542 ("error-injection: Support fault injection framework") is in Linus' tree as well. Unless there are objection I would rather route it that way so it would be for 4.16. Thanks, Daniel > Thanks, > >> >> Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework") >> Signed-off-by: Masami Hiramatsu <mhira...@kernel.org> >> --- >> kernel/fail_function.c | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/kernel/fail_function.c b/kernel/fail_function.c >> index 21b0122cb39c..1d5632d8bbcc 100644 >> --- a/kernel/fail_function.c >> +++ b/kernel/fail_function.c >> @@ -14,6 +14,15 @@ >> >> static int fei_kprobe_handler(struct kprobe *kp, struct pt_regs *regs); >> >> +static void fei_post_handler(struct kprobe *kp, struct pt_regs *regs, >> + unsigned long flags) >> +{ >> +/* >> + * A dummy post handler is required to prohibit optimizing, because >> + * jump optimization does not support execution path overriding. >> + */ >> +} >> + >> struct fei_attr { >> struct list_head list; >> struct kprobe kp; >> @@ -56,6 +65,7 @@ static struct fei_attr *fei_attr_new(const char *sym, >> unsigned long addr) >> return NULL; >> } >> attr->kp.pre_handler = fei_kprobe_handler; >> +attr->kp.post_handler = fei_post_handler; >> attr->retval = adjust_error_retval(addr, 0); >> INIT_LIST_HEAD(>list); >> } >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Limit on the number of btrfs snapshots?
A couple of years ago I asked a question on the Unix and Linux Stack Exchange about the limit on the number of BTRFS snapshots: https://unix.stackexchange.com/q/140360/22724 Basically, I want to use something like snapper to take time based snapshots so that I can browse old versions of my data. This would be in addition to my current off site backup since a drive failure would wipe out the data and the snapshots. Is there a limit to the number of snapshots I can take and store? If I have a million snapshots (e.g., a snapshot every minute for two years) would that cause havoc, assuming I have enough disk space for the data, the changed data, and the meta data? The answers there provided a link to the wiki: https://btrfs.wiki.kernel.org/index.php/Btrfs_design#Snapshots_and_Subvolumes that says: "snapshots are writable, and they can be snapshotted again any number of times." While I don't doubt that that is technically true, another user suggested that the practical limit is around 100 snapshots. While I am not convinced that having minute-by-minute versions of my data for two years is helpful (how the hell is anyone going to find the exact minute they are looking for), if there is no cost then I figure why not. I guess I am asking is what is the story and where is it documented. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 1/5] add infrastructure for tagging functions as error injectable
On 12/20/2017 08:13 AM, Masami Hiramatsu wrote: > On Tue, 19 Dec 2017 18:14:17 -0800 > Alexei Starovoitovwrote: [...] >> Please make your suggestion as patches based on top of bpf-next. > > bpf-next seems already pick this series. Would you mean I revert it and > write new patch? No, please submit as follow-ups instead, thanks Masami! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 3/5] bpf: add a bpf_override_function helper
On 12/18/2017 10:51 AM, Masami Hiramatsu wrote: > On Fri, 15 Dec 2017 14:12:54 -0500 > Josef Bacikwrote: >> From: Josef Bacik >> >> Error injection is sloppy and very ad-hoc. BPF could fill this niche >> perfectly with it's kprobe functionality. We could make sure errors are >> only triggered in specific call chains that we care about with very >> specific situations. Accomplish this with the bpf_override_funciton >> helper. This will modify the probe'd callers return value to the >> specified value and set the PC to an override function that simply >> returns, bypassing the originally probed function. This gives us a nice >> clean way to implement systematic error injection for all of our code >> paths. > > OK, got it. I think the error_injectable function list should be defined > in kernel/trace/bpf_trace.c because only bpf calls it and needs to care > the "safeness". > > [...] >> diff --git a/arch/x86/kernel/kprobes/ftrace.c >> b/arch/x86/kernel/kprobes/ftrace.c >> index 8dc0161cec8f..1ea748d682fd 100644 >> --- a/arch/x86/kernel/kprobes/ftrace.c >> +++ b/arch/x86/kernel/kprobes/ftrace.c >> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p) >> p->ainsn.boostable = false; >> return 0; >> } >> + >> +asmlinkage void override_func(void); >> +asm( >> +".type override_func, @function\n" >> +"override_func:\n" >> +" ret\n" >> +".size override_func, .-override_func\n" >> +); >> + >> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs) >> +{ >> +regs->ip = (unsigned long)_func; >> +} >> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function); > > Calling this as "override_function" is meaningless. This is a function > which just return. So I think combination of just_return_func() and > arch_bpf_override_func_just_return() will be better. > > Moreover, this arch/x86/kernel/kprobes/ftrace.c is an archtecture > dependent implementation of kprobes, not bpf. Josef, please work out any necessary cleanups that would still need to be addressed based on Masami's feedback and send them as follow-up patches, thanks. > Hmm, arch/x86/net/bpf_jit_comp.c will be better place? (No, it's JIT only and I'd really prefer to keep it that way, mixing this would result in a huge mess.) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v10 3/5] bpf: add a bpf_override_function helper
On 12/15/2017 09:34 PM, Alexei Starovoitov wrote: [...] > Also how big is the v9-v10 change ? > May be do it as separate patch, since previous set already sitting > in bpf-next and there are patches on top? +1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection
On 12/08/2017 09:24 PM, Josef Bacik wrote: > On Fri, Dec 08, 2017 at 04:35:44PM +0100, Daniel Borkmann wrote: >> On 12/06/2017 05:12 PM, Josef Bacik wrote: >>> Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro. I went >>> to >>> figure out why the compiler didn't catch it and it's because it was not used >>> anywhere. I had copied it from the trace blacklist code without >>> understanding >>> where it was used as cscope didn't find the original macro I was looking >>> for, so >>> I assumed it was some voodoo and left it in place. Turns out cscope failed >>> me >>> and I didn't need the macro at all, the trace blacklist thing I was looking >>> at >>> was for marking assembly functions as blacklisted and I have no intention of >>> marking assembly functions as error injectable at the moment. >>> >>> v7->v8: >>> - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed. >> >> The series doesn't apply cleanly to the bpf-next tree, so one last respin >> with >> a rebase would unfortunately still be required, thanks! > > I've rebased and let it sit in my git tree to make sure kbuild test bot didn't > blow up, can you pull from > > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git > bpf-override-return > > or do you want me to repost the whole series? Thanks, Yeah, the patches would need to end up on netdev, so once kbuild bot went through fine after your rebase, please send the series. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection
On 12/06/2017 05:12 PM, Josef Bacik wrote: > Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro. I went to > figure out why the compiler didn't catch it and it's because it was not used > anywhere. I had copied it from the trace blacklist code without understanding > where it was used as cscope didn't find the original macro I was looking for, > so > I assumed it was some voodoo and left it in place. Turns out cscope failed me > and I didn't need the macro at all, the trace blacklist thing I was looking at > was for marking assembly functions as blacklisted and I have no intention of > marking assembly functions as error injectable at the moment. > > v7->v8: > - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed. The series doesn't apply cleanly to the bpf-next tree, so one last respin with a rebase would unfortunately still be required, thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 1/5] add infrastructure for tagging functions as error injectable
On 11/28/2017 09:02 PM, Josef Bacik wrote: > On Tue, Nov 28, 2017 at 11:58:41AM -0700, Jonathan Corbet wrote: >> On Wed, 22 Nov 2017 16:23:30 -0500 >> Josef Bacik <jo...@toxicpanda.com> wrote: >>> From: Josef Bacik <jba...@fb.com> >>> >>> Using BPF we can override kprob'ed functions and return arbitrary >>> values. Obviously this can be a bit unsafe, so make this feature opt-in >>> for functions. Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in >>> order to give BPF access to that function for error injection purposes. >>> >>> Signed-off-by: Josef Bacik <jba...@fb.com> >>> Acked-by: Ingo Molnar <mi...@kernel.org> >>> --- >>> arch/x86/include/asm/asm.h| 6 ++ >>> include/asm-generic/vmlinux.lds.h | 10 +++ >>> include/linux/bpf.h | 11 +++ >>> include/linux/kprobes.h | 1 + >>> include/linux/module.h| 5 ++ >>> kernel/kprobes.c | 163 >>> ++ >>> kernel/module.c | 6 +- >>> 7 files changed, 201 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h >>> index b0dc91f4bedc..340f4cc43255 100644 >>> --- a/arch/x86/include/asm/asm.h >>> +++ b/arch/x86/include/asm/asm.h >>> @@ -85,6 +85,12 @@ >>> _ASM_PTR (entry); \ >>> .popsection >>> >>> +# define _ASM_KPROBE_ERROR_INJECT(entry) \ >>> + .pushsection "_kprobe_error_inject_list","aw" ; \ >>> + _ASM_ALIGN ;\ >>> + _ASM_PTR (entry); \ >>> + .popseciton >> >> So this stuff is not my area of greatest expertise, but I do have to wonder >> how ".popseciton" can work ... ? > > Well fuck, do you want me to send a increment Daniel/Alexei or resend this > patch > fixed? Thanks, Sorry for late reply, please rebase + respin the whole series with this fixed. There were also few typos in the cover letter / commit messages that would be good to get fixed along the way. Also, could you debug why this wasn't caught at compile/runtime during testing? Thanks a lot, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 0/4] Add the ability to do BPF directed error injection
On 11/22/2017 10:23 PM, Josef Bacik wrote: > This is hopefully the final version, I've addressed the comment by Igno and > added his Acks. > > v6->v7: > - moved the opt-in macro to bpf.h out of kprobes.h. > > v5->v6: > - add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this > feature. This way only functions that opt-in will be allowed to be > overridden. > - added a btrfs patch to allow error injection for open_ctree() so that the > bpf > sample actually works. > > v4->v5: > - disallow kprobe_override programs from being put in the prog map array so we > don't tail call into something we didn't check. This allows us to make the > normal path still fast without a bunch of percpu operations. > > v3->v4: > - fix a build error found by kbuild test bot (I didn't wait long enough > apparently.) > - Added a warning message as per Daniels suggestion. > > v2->v3: > - added a ->kprobe_override flag to bpf_prog. > - added some sanity checks to disallow attaching bpf progs that have > ->kprobe_override set that aren't for ftrace kprobes. > - added the trace_kprobe_ftrace helper to check if the trace_event_call is a > ftrace kprobe. > - renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read > this > value in the kprobe path, and thus only write to it if we're overriding or > clearing the override. > > v1->v2: > - moved things around to make sure that bpf_override_return could really only > be > used for an ftrace kprobe. > - killed the special return values from trace_call_bpf. > - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if > it was being called from an ftrace kprobe context. > - reworked the logic in kprobe_perf_func to take advantage of > bpf_kprobe_state. > - updated the test as per Alexei's review. > > - Original message - > > A lot of our error paths are not well tested because we have no good way of > injecting errors generically. Some subystems (block, memory) have ways to > inject errors, but they are random so it's hard to get reproduceable results. > > With BPF we can add determinism to our error injection. We can use kprobes > and > other things to verify we are injecting errors at the exact case we are trying > to test. This patch gives us the tool to actual do the error injection part. > It is very simple, we just set the return value of the pt_regs we're given to > whatever we provide, and then override the PC with a dummy function that > simply > returns. > > Right now this only works on x86, but it would be simple enough to expand to > other architectures. Thanks, Ok, given the remaining feedback from Ingo was addressed and therefore the series acked, I've applied it to bpf-next tree, thanks Josef. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
FAQ / encryption / error handling?
Hi all, The FAQ has a couple of sections on encryption (general and dm-crypt) One thing that isn't explained there: if you create multiple encrypted volumes (e.g. using dm-crypt) and use Btrfs to combine them into RAID1, how does error recovery work when a read operation returns corrupted data? Without encryption, reading from one disk would give a checksum mismatch and Btrfs would read from the other disk to (hopefully) get a good copy of the data. With this encryption scenario, the failure would potentially be detected in the decryption layer code and instead of returning bad data to Btrfs, it would return some error code. In that case, will Btrfs attempt to read from the other volume and allow the application to proceed as if nothing was wrong? Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v7 3/5] bpf: add a bpf_override_function helper
On 11/22/2017 10:23 PM, Josef Bacik wrote: > From: Josef Bacik <jba...@fb.com> > > Error injection is sloppy and very ad-hoc. BPF could fill this niche > perfectly with it's kprobe functionality. We could make sure errors are > only triggered in specific call chains that we care about with very > specific situations. Accomplish this with the bpf_override_funciton > helper. This will modify the probe'd callers return value to the > specified value and set the PC to an override function that simply > returns, bypassing the originally probed function. This gives us a nice > clean way to implement systematic error injection for all of our code > paths. > > Acked-by: Alexei Starovoitov <a...@kernel.org> > Acked-by: Ingo Molnar <mi...@kernel.org> > Signed-off-by: Josef Bacik <jba...@fb.com> Series looks good to me as well; BPF bits: Acked-by: Daniel Borkmann <dan...@iogearbox.net> -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Nouveau] [PATCH 03/10] driver:gpu: return -ENOMEM on allocation failure.
On Wed, Sep 13, 2017 at 01:02:12PM +0530, Allen Pais wrote: > Signed-off-by: Allen Pais <allen.l...@gmail.com> Applied to drm-misc-next, thanks. -Daniel > --- > drivers/gpu/drm/gma500/mid_bios.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/gma500/mid_bios.c > b/drivers/gpu/drm/gma500/mid_bios.c > index d75ecb3..1fa1633 100644 > --- a/drivers/gpu/drm/gma500/mid_bios.c > +++ b/drivers/gpu/drm/gma500/mid_bios.c > @@ -237,7 +237,7 @@ static int mid_get_vbt_data_r10(struct drm_psb_private > *dev_priv, u32 addr) > > gct = kmalloc(sizeof(*gct) * vbt.panel_count, GFP_KERNEL); > if (!gct) > - return -1; > + return -ENOMEM; > > gct_virtual = ioremap(addr + sizeof(vbt), > sizeof(*gct) * vbt.panel_count); > -- > 2.7.4 > > ___ > Nouveau mailing list > nouv...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/nouveau -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Chunk root problem
On 7/7/2017 1:06 AM, Daniel Brady wrote: > On 7/6/2017 11:48 PM, Roman Mamedov wrote: >> On Wed, 5 Jul 2017 22:10:35 -0600 >> Daniel Brady <drbr...@gmail.com> wrote: >> >>> parent transid verify failed >> >> Typically in Btrfs terms this means "you're screwed", fsck will not fix it, >> and >> nobody will know how to fix or what is the cause either. Time to restore from >> backups! Or look into "btrfs restore" if you don't have any. >> >> In your case it's especially puzzling as the difference in transid numbers is >> really significant (about 100K), almost like the FS was operating for months >> without updating some parts of itself -- and no checksum errors either, so >> all looks correct, except that everything is horribly wrong. >> >> This kind of error seems to occur more often in RAID setups, either Btrfs >> native RAID, or with Btrfs on top of other RAID setups -- i.e. where it >> becomes a complex issue that all writes to multi devices DO complete IN >> order, >> in case of an unclean shutdown. (which is much simpler on a single device >> FS). >> >> Also one of your disks or cables is failing (was /dev/sde on that boot, but >> may >> get a different index next boot), check SMART data for it and replace. >> >>> [ 21.230919] BTRFS info (device sdf): bdev /dev/sde errs: wr 402545, rd >>> 234683174, flush 194501, corrupt 0, gen 0 >> > > Well that's not good news. Unfortunately I made a fatal error in not > having a backup. Restore looks like I could recover a good chunk of it > from the dry runs, however it has a lot of trouble reading many files. > I'm sure that is related to the one disk (sde). Drives were setup as raid56. > > After updating the kernel as suggested in the email from Duncan it > reduced the "parent transid verify" errors down to just one and the errs > on sde still exist. > > [ 21.400190] BTRFS info (device sdb): use no compression > [ 21.400191] BTRFS info (device sdb): disk space caching is enabled > [ 21.400192] BTRFS info (device sdb): has skinny extents > [ 21.584923] BTRFS info (device sdb): bdev /dev/sde errs: wr 402545, > rd 234683174, flush 194501, corrupt 0, gen 0 > [ 23.394788] BTRFS error (device sdb): parent transid verify failed on > 5257838690304 wanted 591492 found 489231 > [ 23.416489] BTRFS error (device sdb): parent transid verify failed on > 5257838690304 wanted 591492 found 489231 > [ 23.416524] BTRFS error (device sdb): failed to read block groups: -5 > [ 23.448478] BTRFS error (device sdb): open_ctree failed > > I ran a SMART test as you suggested with a passing result. I also > swapped SATA cables & power with another drive and the error followed > the drive confirmed by the serial via SMART. It seems like it just can't > read from that one drive for whatever reason. I also tried disconnecting > the drive and trying to mount it degraded with no luck. Still had the > transid error just with null as the bdev. > > smartctl -a /dev/sde > smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.0-1.el7.elrepo.x86_64] > (local build) > Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org > > === START OF INFORMATION SECTION === > Model Family: Western Digital Red (AF) > Device Model: WDC WD30EFRX-68EUZN0 > Serial Number:WD-WCC4N0PEYTEV > LU WWN Device Id: 5 0014ee 2b7dbfe54 > Firmware Version: 82.00A82 > User Capacity:3,000,592,982,016 bytes [3.00 TB] > Sector Sizes: 512 bytes logical, 4096 bytes physical > Rotation Rate:5400 rpm > Device is:In smartctl database [for details use: -P show] > ATA Version is: ACS-2 (minor revision not indicated) > SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s) > Local Time is:Fri Jul 7 00:30:10 2017 MDT > SMART support is: Available - device has SMART capability. > SMART support is: Enabled > > === START OF READ SMART DATA SECTION === > SMART overall-health self-assessment test result: PASSED > > General SMART Values: > Offline data collection status: (0x00) Offline data collection activity > was never started. > Auto Offline Data Collection: > Disabled. > Self-test execution status: ( 0) The previous self-test routine > completed > without error or no self-test > has ever > been run. > Total time to complete Offline > data collection:(40500) seconds. > Offline data collection > capabilities:(0x7b) SMART execute Offline i
Re: Chunk root problem
On 7/6/2017 11:48 PM, Roman Mamedov wrote: > On Wed, 5 Jul 2017 22:10:35 -0600 > Daniel Brady <drbr...@gmail.com> wrote: > >> parent transid verify failed > > Typically in Btrfs terms this means "you're screwed", fsck will not fix it, > and > nobody will know how to fix or what is the cause either. Time to restore from > backups! Or look into "btrfs restore" if you don't have any. > > In your case it's especially puzzling as the difference in transid numbers is > really significant (about 100K), almost like the FS was operating for months > without updating some parts of itself -- and no checksum errors either, so > all looks correct, except that everything is horribly wrong. > > This kind of error seems to occur more often in RAID setups, either Btrfs > native RAID, or with Btrfs on top of other RAID setups -- i.e. where it > becomes a complex issue that all writes to multi devices DO complete IN order, > in case of an unclean shutdown. (which is much simpler on a single device FS). > > Also one of your disks or cables is failing (was /dev/sde on that boot, but > may > get a different index next boot), check SMART data for it and replace. > >> [ 21.230919] BTRFS info (device sdf): bdev /dev/sde errs: wr 402545, rd >> 234683174, flush 194501, corrupt 0, gen 0 > Well that's not good news. Unfortunately I made a fatal error in not having a backup. Restore looks like I could recover a good chunk of it from the dry runs, however it has a lot of trouble reading many files. I'm sure that is related to the one disk (sde). Drives were setup as raid56. After updating the kernel as suggested in the email from Duncan it reduced the "parent transid verify" errors down to just one and the errs on sde still exist. [ 21.400190] BTRFS info (device sdb): use no compression [ 21.400191] BTRFS info (device sdb): disk space caching is enabled [ 21.400192] BTRFS info (device sdb): has skinny extents [ 21.584923] BTRFS info (device sdb): bdev /dev/sde errs: wr 402545, rd 234683174, flush 194501, corrupt 0, gen 0 [ 23.394788] BTRFS error (device sdb): parent transid verify failed on 5257838690304 wanted 591492 found 489231 [ 23.416489] BTRFS error (device sdb): parent transid verify failed on 5257838690304 wanted 591492 found 489231 [ 23.416524] BTRFS error (device sdb): failed to read block groups: -5 [ 23.448478] BTRFS error (device sdb): open_ctree failed I ran a SMART test as you suggested with a passing result. I also swapped SATA cables & power with another drive and the error followed the drive confirmed by the serial via SMART. It seems like it just can't read from that one drive for whatever reason. I also tried disconnecting the drive and trying to mount it degraded with no luck. Still had the transid error just with null as the bdev. smartctl -a /dev/sde smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.0-1.el7.elrepo.x86_64] (local build) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Western Digital Red (AF) Device Model: WDC WD30EFRX-68EUZN0 Serial Number:WD-WCC4N0PEYTEV LU WWN Device Id: 5 0014ee 2b7dbfe54 Firmware Version: 82.00A82 User Capacity:3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate:5400 rpm Device is:In smartctl database [for details use: -P show] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is:Fri Jul 7 00:30:10 2017 MDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection:(40500) seconds. Offline data collection capabilities:(0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
Re: Chunk root problem
On 7/6/2017 2:26 AM, Duncan wrote: > Daniel Brady posted on Wed, 05 Jul 2017 22:10:35 -0600 as excerpted: > >> My system suddenly decided it did not want to mount my BTRFS setup. I >> recently rebooted the computer. When it came back, the file system was >> in read only mode. I gave it another boot, but now it does not want to >> mount at all. Anything I can do to recover? This is a Rockstor setup >> that I have had running for about a year. >> >> uname -a >> Linux hobonas 4.10.6-1.el7.elrepo.x86_64 #1 SMP Sun Mar 26 >> 12:19:32 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >> >> btrfs --version >> btrfs-progs v4.10.1 > > FWIW, open ctree failed is the btrfs-generic error, but the transid > faileds may provide some help. > > Addressing the easy answer first... > > What btrfs raid mode was it configured for? If raid56, you want the > brand new 4.12 kernel at least, as there were serious bugs in previous > kernels' raid56 mode. DO NOT ATTEMPT A FIX OF RAID56 MODE WITH AN > EARLIER KERNEL, IT'S VERY LIKELY TO ONLY CAUSE FURTHER DAMAGE! But if > you're lucky, kernel 4.12 can auto-repair it. > > With those fixes the known bugs are fixed, but we'll need to wait a > few > cycles to see what the reports are. Even then, however, due to the > infamous parity-raid write hole and the fact that the parity isn't > checksummed, it's not going to be as stable as raid1 or raid10 mode. > Parity-checksumming will take a new implementation and I'm not sure if > anyone's actually working on that or not. But at least until we see > how > stable the newer raid56 code is, 2-4 kernel cycles, it's not > recommended > except for testing only, with even more backups than normal. > > If you were raid1 or raid10 mode, the raid mode is stable so it's a > different issue. I'll let the experts take it from here. Single or > raid0 mode would of course be similar, but without the protection of > the > second copy, making it less resilient. The raid mode was configured for raid56... unfortunately. I learned of the potential instability after it died. I have not attempted to repair it yet because of the possible corruption. I've only tried various ways of mounting it and dry runs of the restore function. I did as you mentioned and upgraded to kernel 4.12. The auto-repair seemed to fix quite a few things, but it is not quite there. Even with a few reboots. uname -r 4.12.0-1.el7.elrepo.x86_64 rpm -qa | grep btrfs btrfs-progs-4.10.1-0.rockstor.x86_64 dmesg [ 21.400190] BTRFS info (device sdb): use no compression [ 21.400191] BTRFS info (device sdb): disk space caching is enabled [ 21.400192] BTRFS info (device sdb): has skinny extents [ 21.584923] BTRFS info (device sdb): bdev /dev/sde errs: wr 402545, rd 234683174, flush 194501, corrupt 0, gen 0 [ 23.394788] BTRFS error (device sdb): parent transid verify failed on 5257838690304 wanted 591492 found 489231 [ 23.416489] BTRFS error (device sdb): parent transid verify failed on 5257838690304 wanted 591492 found 489231 [ 23.416524] BTRFS error (device sdb): failed to read block groups: -5 [ 23.448478] BTRFS error (device sdb): open_ctree failed -Dan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chunk root problem
Hello, My system suddenly decided it did not want to mount my BTRFS setup. I recently rebooted the computer. When it came back, the file system was in read only mode. I gave it another boot, but now it does not want to mount at all. Anything I can do to recover? This is a Rockstor setup that I have had running for about a year. uname -a Linux hobonas 4.10.6-1.el7.elrepo.x86_64 #1 SMP Sun Mar 26 12:19:32 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux btrfs --version btrfs-progs v4.10.1 btrfs fi show Label: 'rockstor_rockstor' uuid: 33e2af57-c30a-468a-9ed5-22994780f6b4 Total devices 1 FS bytes used 5.50GiB devid1 size 215.39GiB used 80.02GiB path /dev/sda3 Label: 'Nexus' uuid: 1c3595a9-3faa-4973-affc-ee8d14d922bf Total devices 5 FS bytes used 3.93TiB devid1 size 2.73TiB used 1.12TiB path /dev/sdd devid2 size 2.73TiB used 1.12TiB path /dev/sdb devid3 size 2.73TiB used 1.12TiB path /dev/sdc devid4 size 2.73TiB used 1.12TiB path /dev/sdf devid5 size 2.73TiB used 1.12TiB path /dev/sde dmesg [ 18.572846] BTRFS: device label Nexus devid 2 transid 595679 /dev/sdb [ 18.572933] BTRFS: device label Nexus devid 3 transid 595679 /dev/sdc [ 18.573027] BTRFS: device label Nexus devid 1 transid 595679 /dev/sdd [ 18.573119] BTRFS: device label Nexus devid 5 transid 595679 /dev/sde [ 18.573200] BTRFS: device label Nexus devid 4 transid 595679 /dev/sdf [ 20.846060] device-mapper: uevent: version 1.0.3 [ 20.846114] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23) initialised: dm-de...@redhat.com [ 21.073884] BTRFS info (device sdf): use no compression [ 21.073886] BTRFS info (device sdf): disk space caching is enabled [ 21.073887] BTRFS info (device sdf): has skinny extents [ 21.084353] BTRFS error (device sdf): parent transid verify failed on 8419247390720 wanted 542466 found 485869 [ 21.230919] BTRFS info (device sdf): bdev /dev/sde errs: wr 402545, rd 234683174, flush 194501, corrupt 0, gen 0 [ 21.794749] BTRFS error (device sdf): parent transid verify failed on 893915128 wanted 594920 found 490791 [ 21.841317] BTRFS error (device sdf): parent transid verify failed on 8939187814400 wanted 594923 found 490824 [ 21.870392] BTRFS error (device sdf): parent transid verify failed on 8418984427520 wanted 594877 found 490575 [ 21.951901] BTRFS error (device sdf): parent transid verify failed on 8939107860480 wanted 594915 found 465207 [ 22.015789] BTRFS error (device sdf): parent transid verify failed on 8939284430848 wanted 594958 found 465274 [ 22.034840] BTRFS error (device sdf): parent transid verify failed on 8418907701248 wanted 594869 found 351596 [ 22.070516] BTRFS error (device sdf): parent transid verify failed on 8939032035328 wanted 594899 found 465175 [ 22.091734] BTRFS error (device sdf): parent transid verify failed on 8939123818496 wanted 594917 found 490777 [ 22.110531] BTRFS error (device sdf): parent transid verify failed on 8939121917952 wanted 594917 found 490775 [ 23.393973] BTRFS error (device sdf): failed to read block groups: -5 [ 23.419807] BTRFS error (device sdf): open_ctree failed mount -t btrfs -o recovery,ro /dev/sdb /mnt2/Nexus mount: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so. Thanks, Dan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
could not do orphan cleanup -22, btrfsck using 100% CPU, no activity
I had a system that experienced a kernel panic and after rebooting, one of the btrfs filesystems doesn't mount on the first attempt The filesystem does mount if I run the mount command manually in the emergency shell The following messages appear in the kernel log: BTRFS critical (device sdc1): corrupt leaf, bad key order: block=790251626496,root=1, slot=46 BTRFS error (device sdc1): Error removing orphan entry, stopping orphan cleanup BTRFS error (device sdc1): could not do orphan cleanup -22 There is a particular file that is now inaccessible. It is not important, but any attempt to access it gives an IO error. When I look at it with 'ls', it shows lots of question marks: $ ls broken-file.txt ? ??filename I was able to copy all files off the filesystem, except that one, using rsync, so I just created a new filesystem and started using that instead. However, I kept a copy of the broken filesystem for troubleshooting I tried running # btrfsck /dev/sdc1 and a lot of output appears: checking extents bad key ordering 46 47 bad block 790251626496 Errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots bad key ordering 46 47 root 367 inode 474635 errors 2000, link count wrong unresolved ref dir 20136875 index 5 namelen 6 name foobar filetype 0 errors 3, no dir item, no dir index .. many errors like that root 367 inode 19964842 errors 400, nbytes wrong root 367 inode 19964855 errors 2001, no inode item, link count wrong unresolved ref dir 11208629 index 100627 namelen 6 name Tb8Qlf filetype 1 errors 4, no inode ref ... and many like that Checking filesystem on /dev/sdc1 UUID: found 92684783616 bytes used err is 1 total csum bytes: 88527744 total tree bytes: 439730176 total fs tree bytes: 174620672 total extent tree bytes: 127827968 btree space waste bytes: 131661981 file data blocks allocated: 6149394432 referenced 2896289792 Then I tried btrfsck --repair while monitoring with top and iostat I notice that there is read activity for about a minute, btrfsck sits there for a long time (over 30 minutes) using 100% CPU and a constant 4.4% of RAM and no more disk activity If I enable the btrfsck progress indicator, the animation appears, but still no disk activity I saw previous discussions about the "could not do orphan cleanup -22" messages and it is not clear if this is something that needs to be fixed. The kernel is 4.8.0-2-amd64 and I tried both btrfs-progs v4.7.3 (Debian) and v4.9 (compiled myself) Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Huge load on btrfs subvolume delete
Le 15/08/16 à 10:16, "Austin S. Hemmelgarn" <ahferro...@gmail.com> a écrit : ASH> With respect to databases, you might consider backing them up separately ASH> too. In many cases for something like an SQL database, it's a lot more ASH> flexible to have a dump of the database as a backup than it is to have ASH> the database files themselves, because it decouples it from the ASH> filesystem level layout. With mysql|mariadb, having a consistent dump needs to lock tables during dump, not acceptable on production servers. Even with specialised tools for hotdump, doing the dump on prod servers is too heavy about I/O (I have huge db, writing the dump is expensive and long). I used to have a slave juste for the dump (easy to stop slave, dump, and start slave), but after a while it wasn't able to follow the writings all the day long (prod was on ssd and it wasn't, dump hd was 100% busy all the day long), so it's for me really easier to rsync the raw files once a day on a cheap host before dump. (of course, I need to flush & lock table during the snapshot, before rsync, but it's just one or two seconds, still acceptable) -- Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Huge load on btrfs subvolume delete
Le 15/08/16 à 08:32, "Austin S. Hemmelgarn" <ahferro...@gmail.com> a écrit : ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote: ASH> > I'm newbie with btrfs, and I have pb with high load after each btrfs subvolume delete […] ASH> Before I start explaining possible solutions, it helps to explain what's ASH> actually happening here. […] Thanks a lot for these clear and detailed explanations. ASH> > Is there a better way to do so ? ASH> While there isn't any way I know of to do so, there are ways you can ASH> reduce the impact by reducing how much your backing up: Thanks for these clues too ! I'll use --commit-after, in order to wait for complete deletion before starting rsync the next snapshot, and I keep in mind the benefit of putting /var/log outside the main subvolume of the vm (but I guess my main pb is about databases, because their datadir are the ones with most writes). -- Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Huge load on btrfs subvolume delete
Hi, I'm newbie with btrfs, and I have pb with high load after each btrfs subvolume delete I use snapshots on lxc hosts under debian jessie with - kernel 4.6.0-0.bpo.1-amd64 - btrfs-progs 4.6.1-1~bpo8 For backup, I have each day, for each subvolume btrfs subvolume snapshot -r $subvol $snap # then later ionice -c3 btrfs subvolume delete $snap but ionice doesn't seems to have any effect here and after a few minutes the load grows up quite high (30~40), and I don't know how to make this deletion nicer with I/O Is there a better way to do so ? Is it a bad idea to set ionice -c3 on the btrfs-transacti process which seems the one doing a lot of I/O ? Actually my io priority on btrfs process are ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}' [btrfs-worker] none: prio 4 [btrfs-worker-hi] none: prio 4 [btrfs-delalloc] none: prio 4 [btrfs-flush_del] none: prio 4 [btrfs-cache] none: prio 4 [btrfs-submit] none: prio 4 [btrfs-fixup] none: prio 4 [btrfs-endio] none: prio 4 [btrfs-endio-met] none: prio 4 [btrfs-endio-met] none: prio 4 [btrfs-endio-rai] none: prio 4 [btrfs-endio-rep] none: prio 4 [btrfs-rmw] none: prio 4 [btrfs-endio-wri] none: prio 4 [btrfs-freespace] none: prio 4 [btrfs-delayed-m] none: prio 4 [btrfs-readahead] none: prio 4 [btrfs-qgroup-re] none: prio 4 [btrfs-extent-re] none: prio 4 [btrfs-cleaner] none: prio 0 [btrfs-transacti] none: prio 0 Thanks -- Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: attempt to mount after crash during rebalance hard crashes server
Sorry, I had about 3.5MB if xterm buffer, including my test to see if I would get a panic with the old kernel i had left in grub - I grabbed the wrong panic. running 4.4.6 ( which deb packages as 4.4.0 for some reason - I was confused) I am able to capture this on a mount attempt before my ssh connection fails: Mar 30 09:51:38 ds4-ls0 kernel: [67178.590745] BTRFS info (device dm-45): disk space caching is enabled Mar 30 09:51:38 ds4-ls0 systemd[1]: systemd-udevd.service: Got notification message from PID 338 (WATCHDOG=1) Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: seq 3514 queued, 'add' 'bdi' Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: Validate module index Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: Check if link configuration needs reloading. Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: seq 3514 forked new worker [7411] Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: seq 3514 running Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: passed device to netlink monitor 0x55c10d5c79b0 Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: seq 3514 processed Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: cleanup idle workers Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: Unload module index Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: Unloaded link configuration context. Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: worker [7411] exited Mar 30 09:51:38 ds4-ls0 kernel: [67178.841517] BTRFS info (device dm-45): bdev /dev/dm-31 errs: wr 13870290, rd 9, flush 2798850, corrupt 0, gen 0 Mar 30 09:52:09 ds4-ls0 kernel: [67207.430391] BUG: unable to handle kernel NULL pointer dereference at 01f0 Mar 30 09:52:09 ds4-ls0 kernel: [67207.477511] IP: [] can_overcommit+0x1e/0xf0 [btrfs] Mar 30 09:52:09 ds4-ls0 kernel: [67207.516215] PGD 0 I ran check last night - the output is about 23MB - don't know if that is useful, or where to look. I only posted at the recommendation of someone in IRC, in hopes to be helpful, as a kernel panic seems an extreme result of a corrupted FS. This machine is an off site copy of a file archive, I need to either fix or recreate it to maintain redundancy, but the up-time requirements are basically 0. The old kernel is the result of this machine being built when it was and then basically left as a black box. If poking at this is not of use to anybody I'll just run check --repair and see what I get. Daniel Warren Unix System Admin,Compliance Infrastructure Architect, ITServices MCMC LLC On Tue, Mar 29, 2016 at 6:55 PM, Duncan <1i5t5.dun...@cox.net> wrote: > Warren, Daniel posted on Tue, 29 Mar 2016 16:21:28 -0400 as excerpted: > >> I'm running 4.4.0 from deb sid > > Correction. > > According to the kernel panic you posted at... > > http://pastebin.com/aBF6XmzA > > ... you're running kernel 3.16.something. > > You might be running btrfs-progs userspace 4.4.0, but on mounted > filesystems it's the kernel code that counts, not the userspace code. > > Btrfs is still stabilizing, and kernel 3.16 is ancient history. On this > list we're forward focused and track mainline. If your distro supports > btrfs on that old a kernel, that's their business, but we don't track > what patches they may or may not have backported and thus can't really > support it here very well, so in that case, you really should be looking > to your distro for that support, as they know what they've backported and > what they haven't, and are thus in a far better position to provide that > support. > > On this list, meanwhile, we recommend one of two kernel tracks, both > mainline, current or LTS. On current we recommend and provide the best > support for the latest two kernel series. With 4.5 out that's 4.5 and > 4.4. > > On the LTS track, the former position was similar, the latest two LTS > kernel series, with 4.4 being the latest and 4.1 the previous one. > However, as btrfs has matured, now the second LTS series back, 3.18, > wasn't bad, and while we still really recommend the last couple LTS > series, we do recognize that some people will still be on 3.18 and we > still do our best to support them as well. > > But before 3.18, and on non-mainline-LTS kernels more than two back, so > currently 4.4, while we'll still do the best we can, unless it's a known > issue recognizable on sight, very often that best is simply to ask that > people upgrade to something reasonably current and report back with their > results then, if the problem remains. > > As for btrfs-progs userspace, during normal operations, most of the time > the userspace code simply calls the appropriate kernel functionality to > do the real work, so userspace version isn't as important. Mkfs.btrfs is > an exception, and of course once the filesystem is having issues and > you're using btrfs check or btrfs restore, along with other tools, to try > to diagnose and fix the problem or at least to recover
attempt to mount after crash during rebalance hard crashes server
Greetings all, I'm running 4.4.0 from deb sid My server crashed during a balance after I had added 10 disks to the original 15, I have not been able to bring the FS up since, it causes a system crash btrfs fi sh looks fine, but when I mount , it crashes the server with a NULL pointer dereference error Each Disk in the set is LUKS encrypted btrfs fi sh http://pastebin.com/QLTqSU8L kernel panic http://pastebin.com/aBF6XmzA If it's of any use I can run tests before I attempt check --repair I can let this sit a day or two if any data gathering would be of use. Daniel Warren Unix System Admin,Compliance Infrastructure Architect, ITServices MCMC LLC -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
More memory more jitters?
Hi List, I have read the Gotcha[1] page: Files with a lot of random writes can become heavily fragmented (1+ extents) causing trashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or **large amount a RAM**. Why could large amount of memory worsen the problem? If **too much** memory is a problem, is it possible to limit the memory btrfs use? Background info: I am running a heavy-write database server with 96GB ram. In the worse case it cause multi minutes of high cpu loads. Systemd keeping kill and restarting services, and old job don't die because they stuck in uninterruptable wait... etc. Tried with nodatacow, but it seems only affect new file. It is not an subvolume option either... Regards, Daniel [1] https://btrfs.wiki.kernel.org/index.php/Gotchas#Fragmentation -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs progs 4.1.1 & 4.2 segfault on chunk-recover
Hello guys I think I might found a bug, Lots of text, I dont know what you want from me and not, so I try to get almost everything in one mail, please dont shoot me! :) To make a long store somewhat short, this is about what happend to me; (skip to if you dont care about history) Arch-linux, btrfs-progs 4.1.1 & 4.2, linux 4.1.6-1 Data, RAID5: total=3.11TiB, used=0.00B <-- this one said the other day used=3.05TiB System, RAID1: total=32.00MiB, used=0.00B Metadata, RAID1: total=8.00GiB, used=144.00KiB GlobalReserve, single: total=16.00MiB, used=0.00B Label: 'Isolinear' uuid: 9bb3f369-f2a9-46be-8dde-1106ae740e36 Total devices 9 FS bytes used 144.00KiB devid7 size 2.73TiB used 541.12GiB path /dev/sdi devid9 size 1.36TiB used 533.09GiB path /dev/sdd2 devid 10 size 1.36TiB used 533.09GiB path /dev/sdg2 devid 11 size 1.82TiB used 536.12GiB path /dev/sdj2 devid 12 size 1.82TiB used 538.09GiB path /dev/sdh2 devid 13 size 286.09GiB used 286.09GiB path /dev/sda3 devid 14 size 286.09GiB used 286.09GiB path /dev/sdb3 devid 15 size 372.61GiB used 372.61GiB path /dev/sdf1 *** Some devices missing drive 8 was a 1.36TiB drive 15 is the new drive I added to the system. *one of 8 drives started to fail, smart saw error, I failed in my configure and I didn't get notified - Ran for 3-14 days before I realized. *I tried on active running system to btrfs dev del /dev/sd[failing] - Did not work (I think it was csum errors) *I added one new disk to raid, rebooted and added new disk to array, tried balancing. Power fail and ups fail after x hours *I rebooted realized the failing drive was now dead. I could mount system with degraded and some files gave me kernel panic ( https://goo.gl/photos/UXrZj6YEUW3945b37 )- others were reading fine. -Was unable to dev del missing. At this point I knew the system was probobly broken beyond repair. so I just tried all commands I could think of. check repair, check init-csum-tree etc endless loop - First very fast text scrolling, lots of CPU not much diskIO, after ~48h text slow, lots of cpu, almost no diskIO same type of message repeating (with new numbers): - ref mismatch on [17959857729536 4096] extent item 0, found 1 adding new data backref on 17959857729536 parent 35277570539520 owner 0 offset 0 found 1 Backref 17959857729536 parent 35277570539520 owner 0 offset 0 num_refs 0 not found in extent tree Incorrect local backref count on 17959857729536 parent 35277570539520 owner 0 offset 0 found 1 wanted 0 back 0x145f7800 backpointer mismatch on [17959857729536 4096] ref mismatch on [17959857733632 4096] extent item 0, found 1 adding new data backref on 17959857733632 parent 35277570785280 owner 0 offset 0 found 1 Backref 17959857733632 parent 35277570785280 owner 0 offset 0 num_refs 0 not found in extent tree Incorrect local backref count on 17959857733632 parent 35277570785280 owner 0 offset 0 found 1 wanted 0 back 0x145f7b90 backpointer mismatch on [17959857733632 4096] - Found out that chunk-recover gave segfault.(4.1.1 & kdave 4.2) 4.1.1 said in bt: #0 0x004251bb in btrfs_new_device_extent_record () #1 0x004301cb in ?? () #2 0x0043085d in ?? () #3 0x7fd8071074a4 in start_thread () from /usr/lib/libpthread.so.0 #4 0x7fd806e4513d in clone () from /usr/lib/libc.so.6 not much help, but I compiled -> https://github.com/kdave/btrfs-progs and backtrace: --> http://pastebin.com/XqRrqAB5 I can repeat the segfault. I made two btrfs-image , one is around 4MB the other is around 300MB think it was. So, did I find a bug? I cant find my logs at the beginning of my failing drive, what it said when I tried to remove the broken drive. I might be able to try the setup again (Got one more drive-about-to-fail) ps; Ive tried to make alpine to work, but it wont accept my passwords, I hope gmail web client is ok for you guys, openwrt dev team rejected my posts just because of this email client best regards Daniel end -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: (renamed thread) btrfs metrics, free space reporting
On 05/01/12 11:09, Daniel Pocock wrote: From there on, one could potentially create a matrix: (proportional font art, apologies): | subvol1 | subvol2 | subvol3 | --+--+--+--+ subvol1 | 200M | 20M | 50M | --+--+--+--+ subvol2 |20M |350M | 22M | --+--+--+--+ subvol3 |50M | 22M |634M | --+--+--+--+ The diagonal obviously shows the unique blocks, subvol2 and subvol1 share 20M data, etc. Missing from this plot would be how much is shared between subvol1, subvol2, and subvol3 together, but it's a start and not something that hard to understand. One might add a column for total size of each subvol, which may obviously not be an addition of the rest of the columns in this diagram. Anyway, something like this would be high on my list of `df` numbers I'd like to see - since I think they are useful numbers. This is an interesting way to look at it Ganglia typically records time series data, it is quite conceivable to create a metric for every permutation in each and store that in rrdtool The challenge would then be in reporting on the data: the rrdtool graphs use time as an X-axis, and then it can display multiple Y values However, now that I've started thinking about the type of data generated from btrfs, I was wondering if some kind of rr3dtool is needed - a 3D graphing solution - or potentially making graphs that do not include time on any axis? Has anyone seen anything similar for administering ZFS, for example? I just wanted to follow up on this and see if anybody had any more comments or if the situation has changed? One other thing that came to mind for me is the idea of letting the local system administrator define views (similar to views in SQL) and also nominate which of the views should be used to return values for the standard df command. This would allow existing monitoring tools and scripts to continue getting some data that is considered sensible for a specific context. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
disk failure but no alert
There are two large disks, part of the disks partitioned for MD RAID1 and the rest of the disks partitioned for BtrFs RAID1 One of the disks (/dev/sdd) appears to have failed, there were plenty of alerts from MD (including dmesg and emails) but nothing from the BtrFs filesystem Could this just be a problem on a sector within the MD RAID1 partition (/dev/sdd2) or is BtrFs failing to alert? If there is a failure on another partition on the same disk, should BtrFs be notified by the kernel in some way and should it consider the filesystem to be at risk? Should I do anything proactively to stop BtrFs using the /dev/sdd3 partition now? Unfortunately it is not possible to get a new disk to this server in the same day and it may just be shut down until the disk can be replaced. # uname -a Linux - 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux # btrfs fi show /dev/sdd3 Label: none uuid: - Total devices 2 FS bytes used 1.74TiB devid1 size 4.55TiB used 1.75TiB path /dev/sdd3 devid2 size 4.55TiB used 1.75TiB path /dev/sda3 Btrfs v3.17 Here is the dmesg output: [996932.734999] sd 0:0:3:0: [sdd] [996932.735039] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [996932.735047] sd 0:0:3:0: [sdd] [996932.735053] Sense Key : Illegal Request [current] [996932.735062] Info fld=0x80808 [996932.735069] sd 0:0:3:0: [sdd] [996932.735078] Add. Sense: Logical block address out of range [996932.735085] sd 0:0:3:0: [sdd] CDB: [996932.735089] Write(16): 8a 00 00 00 00 00 00 08 08 08 00 00 00 02 00 00 [996932.735110] end_request: critical target error, dev sdd, sector 526344 [996932.735280] md: super_written gets error=-121, uptodate=0 [996932.735290] md/raid1:md2: Disk failure on sdd2, disabling device. md/raid1:md2: Operation continuing on 1 devices. [996932.777853] RAID1 conf printout: [996932.777917] --- wd:1 rd:2 [996932.777925] disk 0, wo:0, o:1, dev:sda2 [996932.777931] disk 1, wo:1, o:0, dev:sdd2 [996932.794052] RAID1 conf printout: [996932.794063] --- wd:1 rd:2 [996932.794069] disk 0, wo:0, o:1, dev:sda2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Documentation: filesystems: btrfs: Fixed typos and whitespace
I am a high school student trying to become familiar with Linux kernel development. The btrfs documentation in Documentation/filesystems had a few typos and errors in whitespace. This patch corrects both of these. Signed-off-by: Daniel Grimshaw grims...@linux.vnet.ibm.com --- diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt index d11cc2f..57d9d54 100644 --- a/Documentation/filesystems/btrfs.txt +++ b/Documentation/filesystems/btrfs.txt @@ -61,7 +61,7 @@ Options with (*) are default options and will not show in the mount options. check_int enables the integrity checker module, which examines all block write requests to ensure on-disk consistency, at a large -memory and CPU cost. +memory and CPU cost. check_int_data includes extent data in the integrity checks, and implies the check_int option. @@ -113,7 +113,7 @@ Options with (*) are default options and will not show in the mount options. Disable/enable debugging option to be more verbose in some ENOSPC conditions. fatal_errors=action -Action to take when encountering a fatal error: +Action to take when encountering a fatal error: bug - BUG() on a fatal error. This is the default. panic - panic() on a fatal error. @@ -132,10 +132,10 @@ Options with (*) are default options and will not show in the mount options. max_inline=bytes Specify the maximum amount of space, in bytes, that can be inlined in -a metadata B-tree leaf. The value is specified in bytes, optionally +a metadata B-tree leaf. The value is specified in bytes, optionally with a K, M, or G suffix, case insensitive. In practice, this value is limited by the root sector size, with some space unavailable due -to leaf headers. For a 4k sectorsize, max inline data is ~3900 bytes. +to leaf headers. For a 4k sector size, max inline data is ~3900 bytes. metadata_ratio=value Specify that 1 metadata chunk should be allocated after every value @@ -161,7 +161,7 @@ Options with (*) are default options and will not show in the mount options. datasum(*) nodatasum -Enable/disable data checksumming for newly created files. +Enable/disable data check-summing for newly created files. Datasum implies datacow. treelog(*) @@ -170,7 +170,7 @@ Options with (*) are default options and will not show in the mount options. recovery Enable autorecovery attempts if a bad tree root is found at mount time. -Currently this scans a list of several previous tree roots and tries to +Currently this scans a list of several previous tree roots and tries to use the first readable. rescan_uuid_tree @@ -194,7 +194,7 @@ Options with (*) are default options and will not show in the mount options. ssd_spread Options to control ssd allocation schemes. By default, BTRFS will enable or disable ssd allocation heuristics depending on whether a -rotational or nonrotational disk is in use. The ssd and nossd options +rotational or non-rotational disk is in use. The ssd and nossd options can override this autodetection. The ssd_spread mount option attempts to allocate into big chunks @@ -216,13 +216,13 @@ Options with (*) are default options and will not show in the mount options. This allows mounting of subvolumes which are not in the root of the mounted filesystem. You can use btrfs subvolume show to see the object ID for a subvolume. - + thread_pool=number The number of worker threads to allocate. The default number is equal to the number of CPUs + 2, or 8, whichever is smaller. user_subvol_rm_allowed -Allow subvolumes to be deleted by a non-root user. Use with caution. +Allow subvolumes to be deleted by a non-root user. Use with caution. MAILING LIST -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Documentation: filesystems: btrfs: Fixed typos and whitespace
I am a high school student trying to become familiar with Linux kernel development. The btrfs documentation in Documentation/filesystems had a few typos and errors in whitespace. This patch corrects both of these. This is a resend of an earlier patch with corrected patchfile. Signed-off-by: Daniel Grimshaw grims...@linux.vnet.ibm.com --- Documentation/filesystems/btrfs.txt | 16 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt index d11cc2f..c772b47 100644 --- a/Documentation/filesystems/btrfs.txt +++ b/Documentation/filesystems/btrfs.txt @@ -61,7 +61,7 @@ Options with (*) are default options and will not show in the mount options. check_int enables the integrity checker module, which examines all block write requests to ensure on-disk consistency, at a large - memory and CPU cost. + memory and CPU cost. check_int_data includes extent data in the integrity checks, and implies the check_int option. @@ -113,7 +113,7 @@ Options with (*) are default options and will not show in the mount options. Disable/enable debugging option to be more verbose in some ENOSPC conditions. fatal_errors=action - Action to take when encountering a fatal error: + Action to take when encountering a fatal error: bug - BUG() on a fatal error. This is the default. panic - panic() on a fatal error. @@ -132,10 +132,10 @@ Options with (*) are default options and will not show in the mount options. max_inline=bytes Specify the maximum amount of space, in bytes, that can be inlined in - a metadata B-tree leaf. The value is specified in bytes, optionally + a metadata B-tree leaf. The value is specified in bytes, optionally with a K, M, or G suffix, case insensitive. In practice, this value is limited by the root sector size, with some space unavailable due - to leaf headers. For a 4k sectorsize, max inline data is ~3900 bytes. + to leaf headers. For a 4k sector size, max inline data is ~3900 bytes. metadata_ratio=value Specify that 1 metadata chunk should be allocated after every value @@ -170,7 +170,7 @@ Options with (*) are default options and will not show in the mount options. recovery Enable autorecovery attempts if a bad tree root is found at mount time. - Currently this scans a list of several previous tree roots and tries to + Currently this scans a list of several previous tree roots and tries to use the first readable. rescan_uuid_tree @@ -194,7 +194,7 @@ Options with (*) are default options and will not show in the mount options. ssd_spread Options to control ssd allocation schemes. By default, BTRFS will enable or disable ssd allocation heuristics depending on whether a - rotational or nonrotational disk is in use. The ssd and nossd options + rotational or non-rotational disk is in use. The ssd and nossd options can override this autodetection. The ssd_spread mount option attempts to allocate into big chunks @@ -216,13 +216,13 @@ Options with (*) are default options and will not show in the mount options. This allows mounting of subvolumes which are not in the root of the mounted filesystem. You can use btrfs subvolume show to see the object ID for a subvolume. - + thread_pool=number The number of worker threads to allocate. The default number is equal to the number of CPUs + 2, or 8, whichever is smaller. user_subvol_rm_allowed - Allow subvolumes to be deleted by a non-root user. Use with caution. + Allow subvolumes to be deleted by a non-root user. Use with caution. MAILING LIST -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[WIP][PATCH] tux3: preliminatry nospace handling
Hi Josef, This is a rollup patch for preliminary nospace handling in Tux3, in line with my post here: http://lkml.iu.edu/hypermail/linux/kernel/1505.1/03167.html You still have ENOSPC issues. Maybe it would be helpful to look at what we have done. I saw a reproducible case with 1,000 tasks in parallel last week that went nospace while 28% full. You also are not giving a very good picture of the true full state via df. Our algorithm is pretty simple, reliable and fast. I do not see any reason why Btrfs could not do it basically the same way. In one way it is easier for you - you are not forced to commit the entire delta, you can choose the bits you want to force to disk as convenient. You have more different kinds of cache objects to account, but that should be just detail. Your current frontend accounting looks plausible. We're trying something a bit different with df, to see how it flies - we don't always return the same number to f_blocks, we actually return the volume size less the accounting reserve, which is variable. The reserve gets smaller as freespace gets smaller, so it is not a nasty surprise to the user to see it change, rather a pleasant surprise. What it does is make the 100% really be 100%, less just a handful of blocks, and it makes used and available add up exactly to blocks. If the user wants to know how many blocks they really have, they can look at /proc/partitions. Regards, Daniel diff --git a/fs/tux3/commit.c b/fs/tux3/commit.c index 909a222..7043580 100644 --- a/fs/tux3/commit.c +++ b/fs/tux3/commit.c @@ -297,6 +297,7 @@ static int commit_delta(struct sb *sb) tux3_wake_delta_commit(sb); /* Commit was finished, apply defered bfree. */ + sb-defreed = 0; return unstash(sb, sb-defree, apply_defered_bfree); } @@ -321,13 +322,13 @@ static int need_unify(struct sb *sb) /* For debugging */ void tux3_start_backend(struct sb *sb) { - assert(current-journal_info == NULL); + assert(!change_active()); current-journal_info = sb; } void tux3_end_backend(void) { - assert(current-journal_info); + assert(change_active()); current-journal_info = NULL; } @@ -337,12 +338,103 @@ int tux3_under_backend(struct sb *sb) return current-journal_info == sb; } +/* Internal use only */ +static struct delta_ref *to_delta_ref(struct sb *sb, unsigned delta) +{ + return sb-delta_refs[tux3_delta(delta)]; +} + +static block_t newfree(struct sb *sb) +{ + return sb-freeblocks + sb-defreed; +} + +/* + * Reserve size should vary with budget. The reserve can include the + * log block overhead on the assumption that every block in the budget + * is a data block that generates one log record (or two?). + */ +block_t set_budget(struct sb *sb) +{ + block_t reserve = sb-freeblocks 7; /* FIXME: magic number */ + + if (1) { + if (reserve max_reserve_blocks) + reserve = max_reserve_blocks; + if (reserve min_reserve_blocks) + reserve = min_reserve_blocks; + } else if (0) + reserve = 10; + + block_t budget = newfree(sb) - reserve; + if (1) + tux3_msg(sb, set_budget: free %Li, budget %Li, reserve %Li, newfree(sb), budget, reserve); + sb-reserve = reserve; + atomic_set(sb-budget, budget); + return reserve; +} + +/* + * After transition, the front delta may have used some of the balance + * left over from this delta. The charged amount of the back delta is + * now stable and gives the exact balance at transition by subtracting + * from the old budget. The difference between the new budget and the + * balance at transition, which must never be negative, is added to + * the current balance, so the effect is exactly the same as if we had + * set the new budget and balance atomically at transition time. But + * we do not know the new balance at transition time and even if we + * did, we would need to add serialization against frontend changes, + * which are currently lockless and would like to stay that way. So we + * let the current delta charge against the remaining balance until + * flush is done, here, then adjust the balance to what it would have + * been if the budget had been reset exactly at transition. + * + * We have: + * + *consumed = oldfree - free + *oldbudget = oldfree - reserve + *newbudget = free - reserve + *transition_balance = oldbudget - charged + * + * Factoring out the reserve, the balance adjustment is: + * + *adjust = newbudget - transition_balance + * = (free - reserve) - ((oldfree - reserve) - charged) + * = free + (charged - oldfree) + * = charged + (free - oldfree) + * = charged - consumed + * + * To extend for variable reserve size, add the difference between + * old and new reserve size to the balance adjustment. + */ +void reset_balance(struct sb *sb, unsigned delta
Re: Tux3 Report: How fast can we fsync?
On 04/30/2015 04:14 AM, Filipe Manana wrote: On 04/30/2015 11:28 AM, Daniel Phillips wrote: It looks like Btrfs hit a bug, not a huge surprise. Btrfs hit an assert for me earlier this evening. It is rare but it happens. Hi Daniel, Would you mind reporting (to linux-btrfs@vger.kernel.org) the bug/assertion you hit during your tests with btrfs? Kernel 3.19.0 under KVM with BTRFS mounted on a file in /tmp, see the KVM command below. I believe I was running the 10,000 task test using the sync program below: syncs foo 10 1. 346 [ cut here ] 347 kernel BUG at fs/btrfs/extent_io.c:4548! 348 invalid opcode: [#1] PREEMPT SMP 349 Modules linked in: 350 CPU: 2 PID: 5754 Comm: sync6 Not tainted 3.19.0-56544-g65cf1a5 #756 351 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 352 task: ec3c0ea0 ti: ec3ea000 task.ti: ec3ea000 353 EIP: 0060:[c1301a30] EFLAGS: 00010202 CPU: 2 354 EIP is at btrfs_release_extent_buffer_page+0xf0/0x100 355 EAX: 0001 EBX: f47198f0 ECX: EDX: 0001 356 ESI: f47198f0 EDI: f61f1808 EBP: ec3ebbac ESP: ec3ebb9c 357 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 358 CR0: 8005003b CR2: b756a356 CR3: 2c3ce000 CR4: 06d0 359 Stack: 360 0005 f47198f0 f61f1000 f61f1808 ec3ebbc0 c1301a7f f47198f0 361 f6a3d940 ec3ebbcc c1301ee5 d9a6c770 ec3ebbdc c12b436d fff92000 da136b20 362 ec3ebc74 c12e42b6 0c00 1000 363 Call Trace: 364 [c1301a7f] release_extent_buffer+0x3f/0xb0 365 [c1301ee5] free_extent_buffer+0x45/0x80 366 [c12b436d] btrfs_release_path+0x2d/0x90 367 [c12e42b6] cow_file_range_inline+0x466/0x600 368 [c12e495e] cow_file_range+0x50e/0x640 369 [c12fdde1] ? find_lock_delalloc_range.constprop.42+0x2e1/0x320 370 [c12e5af9] run_delalloc_range+0x419/0x450 371 [c12fdf6b] writepage_delalloc.isra.32+0x14b/0x1d0 372 [c12ff20e] __extent_writepage+0xde/0x2b0 373 [c11208fd] ? find_get_pages_tag+0xad/0x120 374 [c130135c] extent_writepages+0x29c/0x350 375 [c12e1530] ? btrfs_direct_IO+0x300/0x300 376 [c12e009f] btrfs_writepages+0x1f/0x30 377 [c11299e5] do_writepages+0x15/0x40 378 [c112199f] __filemap_fdatawrite_range+0x4f/0x60 379 [c1121aa2] filemap_fdatawrite_range+0x22/0x30 380 [c12f4768] btrfs_fdatawrite_range+0x28/0x70 381 [c12f47d1] start_ordered_ops+0x21/0x30 382 [c12f4823] btrfs_sync_file+0x43/0x370 383 [c115c3e5] ? vfs_write+0x135/0x1c0 384 [c12f47e0] ? start_ordered_ops+0x30/0x30 385 [c1183e27] do_fsync+0x47/0x70 386 [c118403d] SyS_fsync+0xd/0x10 387 [c15bd8ae] syscall_call+0x7/0x7 388 Code: 8b 03 f6 c4 20 75 26 f0 80 63 01 f7 c7 43 1c 00 00 00 00 89 d8 e8 61 94 e2 ff eb c3 8d b4 26 00 00 00 00 83 c4 04 5b 5e 5f 5d c3 0f 0b 0f 0b 388 0f 0b 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 57 56 389 EIP: [c1301a30] btrfs_release_extent_buffer_page+0xf0/0x100 SS:ESP 0068:ec3ebb9c 390 ---[ end trace 12b9bbe75d9541a3 ]--- KVM command: mkfs.btrfs -f /tmp/disk.img kvm -kernel /src/linux-tux3/arch/x86/boot/bzImage -append root=/dev/sda1 console=ttyS0 console=tty0 oops=panic tux3.tux3_trace=0 -serial file:serial.txt -hda /more/kvm/hdd.img -hdb /tmp/disk.img -net nic -net user,hostfwd=tcp::1234-:22 -smp 4 -m 2000 Source code: /* * syncs.c * * D.R. Phillips, 2015 * * To build: c99 -Wall syncs.c -o syncs * To run: ./syncs [filename [syncs [tasks]]] */ #include unistd.h #include stdlib.h #include stdio.h #include fcntl.h #include sys/wait.h #include errno.h #include sys/stat.h char text[1024] = { hello world!\n }; int main(int argc, const char *argv[]) { const char *basename = argc 1 ? foo : argv[1]; char name[100]; int steps = argc 3 ? 1 : atoi(argv[2]); int tasks = argc 4 ? 1 : atoi(argv[3]); int err, fd; for (int t = 0; t tasks; t++) { snprintf(name, sizeof name, %s%i, basename, t); if (!fork()) goto child; } for (int t = 0; t tasks; t++) wait(err); return 0; child: fd = creat(name, S_IRWXU); for (int i = 0; i steps; i++) { write(fd, text, sizeof text); fsync(fd); } return 0; } -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RAID1 migrate to bigger disks
I've got a RAID1 on two 1TB partitions, /dev/sda3 and /dev/sdb3 I'm adding two new disks, they will have bigger partitions /dev/sdc3 and /dev/sdd3 I'd like the BtrFs to migrate from the old partitions to the new ones as safely and quickly as possible and if it is reasonable to do so, keeping it online throughout the migration. Should I do the following: btrfs device add /dev/sdc3 /dev/sdd3 /mnt/btrfs0 btrfs device delete /dev/sda3 /dev/sdb3 /mnt/btrfs0 or should I do it this way: btrfs device add /dev/sdc3 /mnt/btrfs0 btrfs device delete /dev/sda3 /mnt/btrfs0 btrfs device add /dev/sdd3 /mnt/btrfs0 btrfs device delete /dev/sdb3 /mnt/btrfs0 or is there some other way to go about it? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RAID1 migrate to bigger disks
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 On 24/01/15 15:36, Hugo Mills wrote: On Sat, Jan 24, 2015 at 03:32:44PM +0100, Daniel Pocock wrote: I've got a RAID1 on two 1TB partitions, /dev/sda3 and /dev/sdb3 I'm adding two new disks, they will have bigger partitions /dev/sdc3 and /dev/sdd3 I'd like the BtrFs to migrate from the old partitions to the new ones as safely and quickly as possible and if it is reasonable to do so, keeping it online throughout the migration. Should I do the following: btrfs device add /dev/sdc3 /dev/sdd3 /mnt/btrfs0 btrfs device delete /dev/sda3 /dev/sdb3 /mnt/btrfs0 or should I do it this way: btrfs device add /dev/sdc3 /mnt/btrfs0 btrfs device delete /dev/sda3 /mnt/btrfs0 btrfs device add /dev/sdd3 /mnt/btrfs0 btrfs device delete /dev/sdb3 /mnt/btrfs0 or is there some other way to go about it? btrfs replace start /dev/sda3 /dev/sdc3 /mountpoint btrfs fi resize 3:max /mountpoint btrfs replace start /dev/sdb3 /dev/sdd3 /mountpoint btrfs fi resize 4:max /mountpoint The 3 and 4 in the resize commands should be the devid of the newly-added device. Thanks for the fast reply In the event of power failure, can I safely shutdown the server during this operation and resume after starting again? I get more than 2 hours runtime from the UPS but I suspect that migrating 1TB will take at least 12 hours. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Using GnuPG with Icedove - http://www.enigmail.net/ iQIcBAEBCAAGBQJUw7A+AAoJEOm1uwJp1aqDIuoQAKLkc0DgDj1bE6b6RPh6cnb9 lm+rjJD6aCo84dKZ3kVjaYCuUrFK5uXdj1D3ZrELD//jyjr6HbMK7CSJzuzfxmol rMRM0NaVb4RJ4WPuzbnUjT8pmNytisfP9oG1mV9JmJ+y6sZ2ApvOQwPyHpHWglSL D+H4clpOa3jXCeNoVxjm1eipLSWnnpSO4NVdXTIgBiHqUaR+LpKpUlh3QGtknvV/ uHegNuTJ+C4/Stp3hrzKy0/OUlIyzucFEXKPJbI/88XvMuZcL/XTO8FoHRP4r/vq qjICj4Dtjv+xOOe7WKT1Gw8wCz/66xMfSXIUSU02insfwfh0/fFpAS6XybKk4UsH i7LwswqduJMgFiVHv9bMvwyx3UdmhVJRotjGobVP3XPbI3GMSCEztXHdSGLOFE9D /IksBehi0XNw/YWOaLcoyA2XTXahBTcsTtktkZStrn5kKXvOPuE7LDyjkHq/o9W8 IYvti9Dvx2IicdJxRM7+5F6bKON2O7foDuSUJFd6/WAkrLVwdudurTqGDmk+uIdS kZVUVpehmdjYltUyb4wY/ATAvKnQTm/U18L04pSQIbdtdQZD7bAVl7PotLctgdHn xf7TokhjJZZmOk4C29m+uAQHy0gobDDXlPi3jtpO4Zj+CR9pXM1/+oa40xhWh4Eh WtDofKi5z7BLVYFNqIix =BR1V -END PGP SIGNATURE- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
BtrFs on drives with error recovery control / TLER?
Hi, Can anybody comment on how BtrFs (particularly RAID1 mirroring) interacts with drives that offer error recovery control (or TLER in WDC terms)? I generally prefer to buy this type of drive for any serious data storage purposes I notice ZFS gets a mention in the Wikipedia article about the topic: http://en.wikipedia.org/wiki/Error_recovery_control Should BtrFs be mentioned there too? Regards, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3
Hello again, Sorry for the delay, I had some things to do this past week, including figuring out the stability problems that I was having, but everything is good now. I rebuilt the Fedora package for btrfs-progs 3.17.2 with your patches, and btrfsck successfully removed the orphan file! The contents seem to be intact in /lost+found. Thank you very much Qu, you've been immensily helpful. Regards, Daniel On Wed, Nov 26, 2014 at 1:07 AM, Daniel Miranda danielk...@gmail.com wrote: Alright, I'll just have to understand how to build btrfs-progs now, since I'm currently just using the packages from the Fedora repo. Thanks for all the help and time spent so far, Daniel On Wed, Nov 26, 2014 at 12:41 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Hi Daniel, With your btrfs-image dump, I tested with my patchset sent to maillist, my patchset succeeds fixing the image. You can get the patchset and then apply it on 3.17.2, and --repair should fix it. The file with nlink error will be moved to 'lost+found' dir. Although the best fixing should be just adding the missing dir_index, but currently the patchset does quite well and does not need to do any modify. The patchset can be extracted using patchwork: 0001: https://patchwork.kernel.org/patch/5364131/mbox/ 0002: https://patchwork.kernel.org/patch/5364141/mbox/ 0003: https://patchwork.kernel.org/patch/5364101/mbox/ 0004 v2: https://patchwork.kernel.org/patch/5383611/mbox/ 0005 v2: https://patchwork.kernel.org/patch/5383601/mbox/ 0006: https://patchwork.kernel.org/patch/5364151/mbox Any feedback is welcomed to improve the patches. Thanks, Qu Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 15:42 I just ran the repair but the ghost file has not disappeared, unfortunately. On Tue, Nov 25, 2014 at 5:26 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 15:20 Here are the logs. I'll send you a link to my dump directly after I finish uploading it. Please notify me when you have downloaded it so I can delete it. checking extents checking free space cache checking fs roots root 5 inode 17149868 errors 2000, link count wrong unresolved ref dir 17182377 index 245 namelen 8 name string.h filetype 1 errors 1, no dir item link count error seems resolved by Josef's patch commit already in 3.17.2. If using 3.17.2, josef's commit will rebuild the dir item and dir index. root 5 inode 17182377 errors 200, dir isize wrong This isize error seems caused by previous line. If 3.17.2 can repair above problem, it should not be a problem and will disappear. According to the above output, btrfsck --repair with btrfs-progs 3.17.2 has a good chance repairing it. Just have a try. Thanks, Qu Checking filesystem on /dev/mapper/fedora_daniel--pc-root UUID: fef8f718-0622-4cb1-9597-749650d366a4 found 55108022156 bytes used err is 1 total csum bytes: 89787396 total tree bytes: 2303455232 total fs tree bytes: 2024841216 total extent tree bytes: 145272832 btree space waste bytes: 529672422 file data blocks allocated: 253414481920 referenced 94127726592 Btrfs v3.17 Regards, Daniel On Tue, Nov 25, 2014 at 3:20 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 13:14 I'll go run that and get you the output. Thanks. I can do the image dump, sure. I don't know how long it might take to upload it somewhere though. Right now `btrfs fi df` shows about 2GiB of metadata (it's a 120GiB volume). I'll see how large it ends up after compression. 120G volume seems quite small, compared the images I received recently (1T x2 RAID1 and 4T single). With '-c 9' it shouldn't be too huge I think(The 1T raid1 is about 1G metadata with -c9). BTW, btrfs-image dump will have all the filenames and hierarchy, even without its data, it is still better considering your privacy twice before uploading. Thanks, Qu Thanks for the quick response, Daniel On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Hi, What's the btrfsck output? Without --repair option. Also, if it is OK for you, would you please dump the btrfs with 'btrfs-image' command? '-c 9' option is highly recommended considering the size of it. This will helps a lot for developers to test the btrfsck repair function. Thanks, Qu Original
Re: [RFC PATCH] Btrfs: add sha256 checksum option
2014-11-25 11:30 GMT+01:00 Liu Bo bo.li@oracle.com: On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote: On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte holger.hoffstae...@googlemail.com wrote: Would there be room for a compromise with e.g. 128 bits? For example, Spooky V2 hash is 128 bits and is very fast. It is noncryptographic, but it is more than adequate for data checksums. http://burtleburtle.net/bob/hash/spooky.html SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine (Xeon E3-1270 V2 @ 3.50Ghz) Thanks for the suggestion, I'll take a look. Btw, it's not in kernel yet, is it? The best option would be blake2b, but it isn't implemented in the kernel. It is not a problem to use it locally (I can upload the code stripped for usage in kernel). from https://blake2.net/ Q: Why do you want BLAKE2 to be fast? Aren't fast hashes bad? A: You want your hash function to be fast if you are using it to compute the secure hash of a large amount of data, such as in distributed filesystems (e.g. Tahoe-LAFS), cloud storage systems (e.g. OpenStack Swift), intrusion detection systems (e.g. Samhain), integrity-checking local filesystems (e.g. ZFS), peer-to-peer file-sharing tools (e.g. BitTorrent), or version control systems (e.g. git). You only want your hash function to be slow if you're using it to stretch user-supplied passwords, in which case see the next question. https://blake2.net/ https://github.com/floodyberry/blake2b-opt Best regards, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3
Alright, I'll just have to understand how to build btrfs-progs now, since I'm currently just using the packages from the Fedora repo. Thanks for all the help and time spent so far, Daniel On Wed, Nov 26, 2014 at 12:41 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Hi Daniel, With your btrfs-image dump, I tested with my patchset sent to maillist, my patchset succeeds fixing the image. You can get the patchset and then apply it on 3.17.2, and --repair should fix it. The file with nlink error will be moved to 'lost+found' dir. Although the best fixing should be just adding the missing dir_index, but currently the patchset does quite well and does not need to do any modify. The patchset can be extracted using patchwork: 0001: https://patchwork.kernel.org/patch/5364131/mbox/ 0002: https://patchwork.kernel.org/patch/5364141/mbox/ 0003: https://patchwork.kernel.org/patch/5364101/mbox/ 0004 v2: https://patchwork.kernel.org/patch/5383611/mbox/ 0005 v2: https://patchwork.kernel.org/patch/5383601/mbox/ 0006: https://patchwork.kernel.org/patch/5364151/mbox Any feedback is welcomed to improve the patches. Thanks, Qu Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 15:42 I just ran the repair but the ghost file has not disappeared, unfortunately. On Tue, Nov 25, 2014 at 5:26 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 15:20 Here are the logs. I'll send you a link to my dump directly after I finish uploading it. Please notify me when you have downloaded it so I can delete it. checking extents checking free space cache checking fs roots root 5 inode 17149868 errors 2000, link count wrong unresolved ref dir 17182377 index 245 namelen 8 name string.h filetype 1 errors 1, no dir item link count error seems resolved by Josef's patch commit already in 3.17.2. If using 3.17.2, josef's commit will rebuild the dir item and dir index. root 5 inode 17182377 errors 200, dir isize wrong This isize error seems caused by previous line. If 3.17.2 can repair above problem, it should not be a problem and will disappear. According to the above output, btrfsck --repair with btrfs-progs 3.17.2 has a good chance repairing it. Just have a try. Thanks, Qu Checking filesystem on /dev/mapper/fedora_daniel--pc-root UUID: fef8f718-0622-4cb1-9597-749650d366a4 found 55108022156 bytes used err is 1 total csum bytes: 89787396 total tree bytes: 2303455232 total fs tree bytes: 2024841216 total extent tree bytes: 145272832 btree space waste bytes: 529672422 file data blocks allocated: 253414481920 referenced 94127726592 Btrfs v3.17 Regards, Daniel On Tue, Nov 25, 2014 at 3:20 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 13:14 I'll go run that and get you the output. Thanks. I can do the image dump, sure. I don't know how long it might take to upload it somewhere though. Right now `btrfs fi df` shows about 2GiB of metadata (it's a 120GiB volume). I'll see how large it ends up after compression. 120G volume seems quite small, compared the images I received recently (1T x2 RAID1 and 4T single). With '-c 9' it shouldn't be too huge I think(The 1T raid1 is about 1G metadata with -c9). BTW, btrfs-image dump will have all the filenames and hierarchy, even without its data, it is still better considering your privacy twice before uploading. Thanks, Qu Thanks for the quick response, Daniel On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Hi, What's the btrfsck output? Without --repair option. Also, if it is OK for you, would you please dump the btrfs with 'btrfs-image' command? '-c 9' option is highly recommended considering the size of it. This will helps a lot for developers to test the btrfsck repair function. Thanks, Qu Original Message Subject: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: linux-btrfs@vger.kernel.org Date: 2014年11月25日 13:04 Hello, After I had some brief stability issues with my computer, it seems some form of metadata corruption took place in my BTRFS filesystem, and now a particular file seems to exist, but I cannot access any details on it or delete it. If I try to `ls
Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3
Hello, After I had some brief stability issues with my computer, it seems some form of metadata corruption took place in my BTRFS filesystem, and now a particular file seems to exist, but I cannot access any details on it or delete it. If I try to `ls` in the directory it is in, that's what I get: ls: cannot access string.h: No such file or directory total 0 drwxr-xr-x. 1 danielkza mock 16 Nov 21 14:18 ./ drwxr-xr-x. 1 danielkza mock 6 Nov 21 14:18 ../ -?? ? ? ? ?? string.h If I try to delete it I get: rm: cannot remove ‘string.h’: No such file or directory I'm using kernel 3.17.3 from Fedora 21. I got no messages on dmesg or anything of the sort. I know the btrfs fsck situation is complicated, but is there any utility I should use to try and repair this? Losing this file is not a problem, it's just one header from the kernel I was building. Regards, Daniel Miranda -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3
I'll go run that and get you the output. I can do the image dump, sure. I don't know how long it might take to upload it somewhere though. Right now `btrfs fi df` shows about 2GiB of metadata (it's a 120GiB volume). I'll see how large it ends up after compression. Thanks for the quick response, Daniel On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Hi, What's the btrfsck output? Without --repair option. Also, if it is OK for you, would you please dump the btrfs with 'btrfs-image' command? '-c 9' option is highly recommended considering the size of it. This will helps a lot for developers to test the btrfsck repair function. Thanks, Qu Original Message Subject: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: linux-btrfs@vger.kernel.org Date: 2014年11月25日 13:04 Hello, After I had some brief stability issues with my computer, it seems some form of metadata corruption took place in my BTRFS filesystem, and now a particular file seems to exist, but I cannot access any details on it or delete it. If I try to `ls` in the directory it is in, that's what I get: ls: cannot access string.h: No such file or directory total 0 drwxr-xr-x. 1 danielkza mock 16 Nov 21 14:18 ./ drwxr-xr-x. 1 danielkza mock 6 Nov 21 14:18 ../ -?? ? ? ? ?? string.h If I try to delete it I get: rm: cannot remove ‘string.h’: No such file or directory I'm using kernel 3.17.3 from Fedora 21. I got no messages on dmesg or anything of the sort. I know the btrfs fsck situation is complicated, but is there any utility I should use to try and repair this? Losing this file is not a problem, it's just one header from the kernel I was building. Regards, Daniel Miranda -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3
Here are the logs. I'll send you a link to my dump directly after I finish uploading it. Please notify me when you have downloaded it so I can delete it. checking extents checking free space cache checking fs roots root 5 inode 17149868 errors 2000, link count wrong unresolved ref dir 17182377 index 245 namelen 8 name string.h filetype 1 errors 1, no dir item root 5 inode 17182377 errors 200, dir isize wrong Checking filesystem on /dev/mapper/fedora_daniel--pc-root UUID: fef8f718-0622-4cb1-9597-749650d366a4 found 55108022156 bytes used err is 1 total csum bytes: 89787396 total tree bytes: 2303455232 total fs tree bytes: 2024841216 total extent tree bytes: 145272832 btree space waste bytes: 529672422 file data blocks allocated: 253414481920 referenced 94127726592 Btrfs v3.17 Regards, Daniel On Tue, Nov 25, 2014 at 3:20 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 13:14 I'll go run that and get you the output. Thanks. I can do the image dump, sure. I don't know how long it might take to upload it somewhere though. Right now `btrfs fi df` shows about 2GiB of metadata (it's a 120GiB volume). I'll see how large it ends up after compression. 120G volume seems quite small, compared the images I received recently (1T x2 RAID1 and 4T single). With '-c 9' it shouldn't be too huge I think(The 1T raid1 is about 1G metadata with -c9). BTW, btrfs-image dump will have all the filenames and hierarchy, even without its data, it is still better considering your privacy twice before uploading. Thanks, Qu Thanks for the quick response, Daniel On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Hi, What's the btrfsck output? Without --repair option. Also, if it is OK for you, would you please dump the btrfs with 'btrfs-image' command? '-c 9' option is highly recommended considering the size of it. This will helps a lot for developers to test the btrfsck repair function. Thanks, Qu Original Message Subject: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: linux-btrfs@vger.kernel.org Date: 2014年11月25日 13:04 Hello, After I had some brief stability issues with my computer, it seems some form of metadata corruption took place in my BTRFS filesystem, and now a particular file seems to exist, but I cannot access any details on it or delete it. If I try to `ls` in the directory it is in, that's what I get: ls: cannot access string.h: No such file or directory total 0 drwxr-xr-x. 1 danielkza mock 16 Nov 21 14:18 ./ drwxr-xr-x. 1 danielkza mock 6 Nov 21 14:18 ../ -?? ? ? ? ?? string.h If I try to delete it I get: rm: cannot remove ‘string.h’: No such file or directory I'm using kernel 3.17.3 from Fedora 21. I got no messages on dmesg or anything of the sort. I know the btrfs fsck situation is complicated, but is there any utility I should use to try and repair this? Losing this file is not a problem, it's just one header from the kernel I was building. Regards, Daniel Miranda -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3
I just ran the repair but the ghost file has not disappeared, unfortunately. On Tue, Nov 25, 2014 at 5:26 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 15:20 Here are the logs. I'll send you a link to my dump directly after I finish uploading it. Please notify me when you have downloaded it so I can delete it. checking extents checking free space cache checking fs roots root 5 inode 17149868 errors 2000, link count wrong unresolved ref dir 17182377 index 245 namelen 8 name string.h filetype 1 errors 1, no dir item link count error seems resolved by Josef's patch commit already in 3.17.2. If using 3.17.2, josef's commit will rebuild the dir item and dir index. root 5 inode 17182377 errors 200, dir isize wrong This isize error seems caused by previous line. If 3.17.2 can repair above problem, it should not be a problem and will disappear. According to the above output, btrfsck --repair with btrfs-progs 3.17.2 has a good chance repairing it. Just have a try. Thanks, Qu Checking filesystem on /dev/mapper/fedora_daniel--pc-root UUID: fef8f718-0622-4cb1-9597-749650d366a4 found 55108022156 bytes used err is 1 total csum bytes: 89787396 total tree bytes: 2303455232 total fs tree bytes: 2024841216 total extent tree bytes: 145272832 btree space waste bytes: 529672422 file data blocks allocated: 253414481920 referenced 94127726592 Btrfs v3.17 Regards, Daniel On Tue, Nov 25, 2014 at 3:20 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Original Message Subject: Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年11月25日 13:14 I'll go run that and get you the output. Thanks. I can do the image dump, sure. I don't know how long it might take to upload it somewhere though. Right now `btrfs fi df` shows about 2GiB of metadata (it's a 120GiB volume). I'll see how large it ends up after compression. 120G volume seems quite small, compared the images I received recently (1T x2 RAID1 and 4T single). With '-c 9' it shouldn't be too huge I think(The 1T raid1 is about 1G metadata with -c9). BTW, btrfs-image dump will have all the filenames and hierarchy, even without its data, it is still better considering your privacy twice before uploading. Thanks, Qu Thanks for the quick response, Daniel On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Hi, What's the btrfsck output? Without --repair option. Also, if it is OK for you, would you please dump the btrfs with 'btrfs-image' command? '-c 9' option is highly recommended considering the size of it. This will helps a lot for developers to test the btrfsck repair function. Thanks, Qu Original Message Subject: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3 From: Daniel Miranda danielk...@gmail.com To: linux-btrfs@vger.kernel.org Date: 2014年11月25日 13:04 Hello, After I had some brief stability issues with my computer, it seems some form of metadata corruption took place in my BTRFS filesystem, and now a particular file seems to exist, but I cannot access any details on it or delete it. If I try to `ls` in the directory it is in, that's what I get: ls: cannot access string.h: No such file or directory total 0 drwxr-xr-x. 1 danielkza mock 16 Nov 21 14:18 ./ drwxr-xr-x. 1 danielkza mock 6 Nov 21 14:18 ../ -?? ? ? ? ?? string.h If I try to delete it I get: rm: cannot remove ‘string.h’: No such file or directory I'm using kernel 3.17.3 from Fedora 21. I got no messages on dmesg or anything of the sort. I know the btrfs fsck situation is complicated, but is there any utility I should use to try and repair this? Losing this file is not a problem, it's just one header from the kernel I was building. Regards, Daniel Miranda -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: disk-io: replace root args iff only fs_info used
This is the 3rd independent patch of a larger project to cleanup btrfs's internal usage of btrfs_root. Many functions take btrfs_root only to grab the fs_info struct. By requiring a root these functions cause programmer overhead. That these functions can accept any valid root is not obvious until inspection. This patch reduces the specificity of such functions to accept the fs_info directly. These patches can be applied independently and thus are not being submitted as a patch series. There should be about 26 patches by the project's completion. Each patch will cleanup between 1 and 34 functions apiece. Each patch covers a single file's functions. This patch affects the following function(s): 1) csum_tree_block 2) csum_dirty_buffer 3) check_tree_block_fsid 4) btrfs_find_tree_block 5) clean_tree_block Signed-off-by: Daniel Dressler danieru.dress...@gmail.com --- fs/btrfs/ctree.c | 26 +- fs/btrfs/disk-io.c | 32 fs/btrfs/disk-io.h | 4 ++-- fs/btrfs/extent-tree.c | 6 +++--- 4 files changed, 34 insertions(+), 34 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 19bc616..e76a6ba 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1075,7 +1075,7 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, ret = btrfs_dec_ref(trans, root, buf, 1); BUG_ON(ret); /* -ENOMEM */ } - clean_tree_block(trans, root, buf); + clean_tree_block(trans, root-fs_info, buf); *last_ref = 1; } return 0; @@ -1681,7 +1681,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans, continue; } - cur = btrfs_find_tree_block(root, blocknr); + cur = btrfs_find_tree_block(root-fs_info, blocknr); if (cur) uptodate = btrfs_buffer_uptodate(cur, gen, 0); else @@ -1946,7 +1946,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, path-locks[level] = 0; path-nodes[level] = NULL; - clean_tree_block(trans, root, mid); + clean_tree_block(trans, root-fs_info, mid); btrfs_tree_unlock(mid); /* once for the path */ free_extent_buffer(mid); @@ -2000,7 +2000,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, if (wret 0 wret != -ENOSPC) ret = wret; if (btrfs_header_nritems(right) == 0) { - clean_tree_block(trans, root, right); + clean_tree_block(trans, root-fs_info, right); btrfs_tree_unlock(right); del_ptr(root, path, level + 1, pslot + 1); root_sub_used(root, right-len); @@ -2044,7 +2044,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, BUG_ON(wret == 1); } if (btrfs_header_nritems(mid) == 0) { - clean_tree_block(trans, root, mid); + clean_tree_block(trans, root-fs_info, mid); btrfs_tree_unlock(mid); del_ptr(root, path, level + 1, pslot); root_sub_used(root, mid-len); @@ -2262,7 +2262,7 @@ static void reada_for_search(struct btrfs_root *root, search = btrfs_node_blockptr(node, slot); blocksize = root-nodesize; - eb = btrfs_find_tree_block(root, search); + eb = btrfs_find_tree_block(root-fs_info, search); if (eb) { free_extent_buffer(eb); return; @@ -2324,7 +2324,7 @@ static noinline void reada_for_balance(struct btrfs_root *root, if (slot 0) { block1 = btrfs_node_blockptr(parent, slot - 1); gen = btrfs_node_ptr_generation(parent, slot - 1); - eb = btrfs_find_tree_block(root, block1); + eb = btrfs_find_tree_block(root-fs_info, block1); /* * if we get -eagain from btrfs_buffer_uptodate, we * don't want to return eagain here. That will loop @@ -2337,7 +2337,7 @@ static noinline void reada_for_balance(struct btrfs_root *root, if (slot + 1 nritems) { block2 = btrfs_node_blockptr(parent, slot + 1); gen = btrfs_node_ptr_generation(parent, slot + 1); - eb = btrfs_find_tree_block(root, block2); + eb = btrfs_find_tree_block(root-fs_info, block2); if (eb btrfs_buffer_uptodate(eb, gen, 1) != 0) block2 = 0; free_extent_buffer(eb); @@ -2455,7 +2455,7 @@ read_block_for_search(struct btrfs_trans_handle *trans, blocknr = btrfs_node_blockptr(b, slot); gen = btrfs_node_ptr_generation(b, slot); - tmp
Re: [PATCH] Btrfs: ctree: reduce args where only fs_info used
Ah thanks David for looking at this. Sorry for the thin paragraphs my vim was warming too early about long lines. I will reformat it to break at 74 chars. No problem, I'll redo everything so it is one function per patch. Now fair warning: there are about 102 functions to cleanup. I was a bit worried that many patches would cause too much maintainer overhead but it is no problem for me. Only a few functions have dependecies on other functions needing cleanup. Thus there will be some small patch series for those function sets. A big benefit of one function one patch is that extent-io.c will no longer be a 34 function monster patch. Thank you David, I'll redo all these patches. Is there any rate limiting I should be doing? I don't want to flood the list with burst of dozen plus patches, or is that an okay volume? Daniel 2014-11-22 0:55 GMT+09:00 David Sterba dste...@suse.cz: On Wed, Nov 12, 2014 at 01:43:09PM +0900, Daniel Dressler wrote: This patch is part of a larger project to cleanup btrfs's internal usage of struct btrfs_root. Many functions take btrfs_root only to grab a pointer to fs_info. Thanks for picking up the project. A mere formality, can you please justify the paragraphs to 74 chars? -- This patch is part of a larger project to cleanup btrfs's internal usage of struct btrfs_root. Many functions take btrfs_root only to grab a pointer to fs_info. -- This patch does not address the two functions in ctree.c (insert_ptr, and split_item) which only use root for BUG_ONs in ctree.c This patch affects the following functions: 1) fixup_low_keys 2) btrfs_set_item_key_safe Please send one patch per function change, unless there are more that are somehow entangled that it would make it hard to separate. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: disk-io: replace root args iff only fs_info used
Thank you David this is helpful feedback. What would a cover letter be like? Would that be a separate email to the list, or maybe the first email in a patch series? Sorry I've twice looked for the integration repo. I found some that look like it could be but those had older commits. Could you direct me to the exact branch I'd love to work against it. These patches were done against linux-next. I think small one function patches might be best. I have the codebase mapped out and each file's functions-to-be-cleaned count varies wildly. If I did batch files together and split large files apart there would be no rhyme or reason for the groupings. With single function patches it is very clear what changes are justified since they should only occur in the affected function or in a call-site. With multiple functions the call-site changes get mixed up would it would be harder to review. Daniel 2014-11-22 1:15 GMT+09:00 David Sterba dste...@suse.cz: On Fri, Nov 21, 2014 at 05:15:07PM +0900, Daniel Dressler wrote: This is the 3rd independent patch of a larger project to cleanup btrfs's internal usage of btrfs_root. Many functions take btrfs_root only to grab the fs_info struct. By requiring a root these functions cause programmer overhead. That these functions can accept any valid root is not obvious until inspection. This patch reduces the specificity of such functions to accept the fs_info directly. These patches can be applied independently and thus are not being submitted as a patch series. There should be about 26 patches by the project's completion. Each patch will cleanup between 1 and 34 functions apiece. Each patch covers a single file's functions. It's good to have this kind of introduction but it really belongs ot the cover letter not the individual patches. This patch affects the following function(s): 1) csum_tree_block 2) csum_dirty_buffer 3) check_tree_block_fsid 4) btrfs_find_tree_block 5) clean_tree_block Now that I see that, I'm not sure that my previous comment about 'one patch per function' is the right way to go. This patch looks good as it stands. The change is simple enough that I won't be opposed to grouping even more functions together as long as it stays revieweable. The patches are likely to clash with a lot of pending patches, so you may want to base it on the integration branch next time. This would make maintainers' life easier and also raises chances to merge the patches. Reviewed-by: David Sterba dste...@suse.cz -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: delayed-inode: replace root args iff only fs_info used
This is the second independent patch of a larger project to cleanup btrfs's internal usage of btrfs_root. Many functions take btrfs_root only to grab the fs_info struct. By requiring a root these functions cause programmer overhead. That these functions can accept any valid root is not obvious until inspection. This patch reduces the specificity of such functions to accept the fs_info directly. These patches can be applied independently and thus are not being submitted as a patch series. There should be about 26 patches by the project's completion. Each patch will cleanup between 1 and 34 functions apiece. Each patch covers a single file's functions. This patch affects the following function(s): 1) btrfs_wq_run_delayed_node Signed-off-by: Daniel Dressler danieru.dress...@gmail.com --- fs/btrfs/delayed-inode.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 054577b..e590da6 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1383,7 +1383,7 @@ out: static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root, -struct btrfs_root *root, int nr) +struct btrfs_fs_info *fs_info, int nr) { struct btrfs_async_delayed_work *async_work; @@ -1399,7 +1399,7 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root, btrfs_async_run_delayed_root, NULL, NULL); async_work-nr = nr; - btrfs_queue_work(root-fs_info-delayed_workers, async_work-work); + btrfs_queue_work(fs_info-delayed_workers, async_work-work); return 0; } @@ -1426,6 +1426,7 @@ static int could_end_wait(struct btrfs_delayed_root *delayed_root, int seq) void btrfs_balance_delayed_items(struct btrfs_root *root) { struct btrfs_delayed_root *delayed_root; + struct btrfs_fs_info *fs_info = root-fs_info; delayed_root = btrfs_get_delayed_root(root); @@ -1438,7 +1439,7 @@ void btrfs_balance_delayed_items(struct btrfs_root *root) seq = atomic_read(delayed_root-items_seq); - ret = btrfs_wq_run_delayed_node(delayed_root, root, 0); + ret = btrfs_wq_run_delayed_node(delayed_root, fs_info, 0); if (ret) return; @@ -1447,7 +1448,7 @@ void btrfs_balance_delayed_items(struct btrfs_root *root) return; } - btrfs_wq_run_delayed_node(delayed_root, root, BTRFS_DELAYED_BATCH); + btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH); } /* Will return 0 or -ENOMEM */ -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS messes up snapshot LV with origin
If a UUID is not unique enough how will adding a second UUID or unique drive identifier help? A UUID only serves any purpose when it is unique. Thus duplicate UUIDs are themselves a failure state. The solution should be to make it harder to get into this failure state. Not to make all programs resilient against running under this failure state. It isn't a btrfs bug that it requires Universal Unique IDs to be universally unique. Daniel 2014-11-17 15:59 GMT+09:00 Brendan Hide bren...@swiftspirit.co.za: cc'd bug-g...@gnu.org for FYI On 2014/11/17 03:42, Duncan wrote: MegaBrutal posted on Sun, 16 Nov 2014 22:35:26 +0100 as excerpted: Hello guys, I think you'll like this... https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429 UUID is an initialism for Universally Unique IDentifier.[1] If the UUID isn't unique, by definition, then, it can't be a UUID, and that's a bug in whatever is making the non-unique would-be UUID that isn't unique and thus cannot be a universally unique ID. In this case that would appear to be LVM. Perhaps the right question to ask is Where should this bug be fixed?. TL;DR: This needs more thought and input from btrfs devs. To LVM, the bug is likely seen as being out of scope. The correct fix probably lies in the ecosystem design, which requires co-operation from btrfs. Making a snapshot in LVM is a fundamental thing - and I feel LVM, in making its snapshot, is doing its job exactly as expected. Additionally, there are other ways to get to a similar state without LVM: ddrescue backup, SAN snapshot, old missing disk re-introduced, etc. That leaves two places where this can be fixed: grub and btrfs Grub is already a little smart here - it avoids snapshots. But in this case it is relying on the UUID and only finding it in the snapshot. So possibly this is a bug in grub affecting the bug reporter specifically - but perhaps the bug is in btrfs where grub is relying on btrfs code. Yes, I'd rather use btrfs' snapshot mechanism - but this is often a choice that is left to the user/admin/distro. I don't think saying LVM snapshots are incompatible with btrfs is the right way to go either. That leaves two aspects of this issue which I view as two separate bugs: a) Btrfs cannot gracefully handle separate filesystems that have the same UUID. At all. b) Grub appears to pick the wrong filesystem when presented with two filesystems with the same UUID. I feel a) is a btrfs bug. I feel b) is a bug that is more about ecosystem design than grub being silly. I imagine a couple of aspects that could help fix a): - Utilise a unique drive identifier in the btrfs metadata (surely this exists already?). This way, any two filesystems will always have different drive identifiers *except* in cases like a ddrescue'd copy or a block-level snapshot. This will provide a sensible mechanism for defined behaviour, preventing corruption - even if that defined behaviour is to simply give out lots of PEBKAC errors and panic. - Utilise a drive list to ensure that two unrelated filesystems with the same UUID cannot get mixed up. Yes, the user/admin would likely be the culprit here (perhaps a VM rollout process that always gives out the same UUID in all its filesystems). Again, does btrfs not already have something like this built-in that we're simply not utilising fully? I'm not exactly sure of the correct way to fix b) except that I imagine it would be trivial to fix once a) is fixed. -- __ Brendan Hide http://swiftspirit.co.za/ http://www.webafrica.co.za/?AFF1E97 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: qgroup: add BUILD_BUG to report pointer cast breakage
Our ulist data structure stores at max 64bit values. qgroup has used this structure to store pointers. In the future when we upgrade to 128bit this casting of pointers to uint64_t will break. This patch adds a BUILD_BUG ensuring that this code will not be left untouched in the upgrade. It also marks this issue on the TODO list so it may be addressed before such an upgrade. Signed-off-by: Daniel Dressler danieru.dress...@gmail.com --- fs/btrfs/qgroup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 48b60db..87f7c98 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -44,6 +44,7 @@ * - caches fuer ulists * - performance benchmarks * - check all ioctl parameters + * - do not cast uintptr_t to uint64_t in ulist usage */ /* @@ -101,6 +102,7 @@ struct btrfs_qgroup_list { #define ptr_to_u64(x) ((u64)(uintptr_t)x) #define u64_to_ptr(x) ((struct btrfs_qgroup *)(uintptr_t)x) +BUILD_BUG_ON(UINTPTR_MAX UINT64_MAX); static int qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: qgroup: add BUILD_BUG to report pointer cast breakage
I am very very sorry, I forgot to even test building. Please pretend this patch was never submitted. Daniel 2014-11-13 0:00 GMT+09:00 Daniel Dressler danieru.dress...@gmail.com: Our ulist data structure stores at max 64bit values. qgroup has used this structure to store pointers. In the future when we upgrade to 128bit this casting of pointers to uint64_t will break. This patch adds a BUILD_BUG ensuring that this code will not be left untouched in the upgrade. It also marks this issue on the TODO list so it may be addressed before such an upgrade. Signed-off-by: Daniel Dressler danieru.dress...@gmail.com --- fs/btrfs/qgroup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 48b60db..87f7c98 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -44,6 +44,7 @@ * - caches fuer ulists * - performance benchmarks * - check all ioctl parameters + * - do not cast uintptr_t to uint64_t in ulist usage */ /* @@ -101,6 +102,7 @@ struct btrfs_qgroup_list { #define ptr_to_u64(x) ((u64)(uintptr_t)x) #define u64_to_ptr(x) ((struct btrfs_qgroup *)(uintptr_t)x) +BUILD_BUG_ON(UINTPTR_MAX UINT64_MAX); static int qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: qgroup: add BUILD_BUG to report pointer cast breakage
Our ulist data structure stores at max 64bit values. qgroup has used this structure to store pointers. In the future when we upgrade to 128bit this casting of pointers to uint64_t will break. This patch adds a BUILD_BUG ensuring that this code will not be left untouched in the upgrade. It also marks this issue on the TODO list so it may be addressed before such an upgrade. Signed-off-by: Daniel Dressler danieru.dress...@gmail.com --- fs/btrfs/qgroup.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 48b60db..a9a4cab 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -44,6 +44,7 @@ * - caches fuer ulists * - performance benchmarks * - check all ioctl parameters + * - do not cast uintptr_t to uint64_t in ulist usage */ /* @@ -99,8 +100,12 @@ struct btrfs_qgroup_list { struct btrfs_qgroup *member; }; -#define ptr_to_u64(x) ((u64)(uintptr_t)x) -#define u64_to_ptr(x) ((struct btrfs_qgroup *)(uintptr_t)x) +#define ptr_to_u64(x) \ + (BUILD_BUG_ON_ZERO(sizeof(uintptr_t) sizeof(u64)) + \ + ((u64)(uintptr_t)x)) +#define u64_to_ptr(x) \ + (BUILD_BUG_ON_ZERO(sizeof(uintptr_t) sizeof(u64)) + \ + ((struct btrfs_qgroup *)(uintptr_t)x)) static int qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Is it safe to refactor struct btrfs_root *root out of these functions?
Hi I'm gearing up to tackle the Pass fs_info instead of root project suggested on the wiki. I've read through the entire codebase and made note of 102 functions which could be refactored. Three of these do not make any use of their root argument at all, is it safe to refactor these as well? Namely: btrfs_block_rsv_check : http://lxr.free-electrons.com/source/fs/btrfs/extent-tree.c#L4743 copy_to_sk : http://lxr.free-electrons.com/source/fs/btrfs/ioctl.c#L1931 wait_for_commit : http://lxr.free-electrons.com/source/fs/btrfs/transaction.c#L597 None of these function's users make indirect calls through function pointers. Is it safe to refactor them? I ask because it seems strange they would have unused arguments and I'm worried there might be a reason I've missed. Daniel -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: ctree: reduce args where only fs_info used
This patch is part of a larger project to cleanup btrfs's internal usage of struct btrfs_root. Many functions take btrfs_root only to grab a pointer to fs_info. This causes programmers to ponder which root can be passed. Since only the fs_info is read affected functions can accept any root, except this is only obvious upon inspection. This patch reduces the specificty of such functions to accept the fs_info directly. This patch does not address the two functions in ctree.c (insert_ptr, and split_item) which only use root for BUG_ONs in ctree.c This patch affects the following functions: 1) fixup_low_keys 2) btrfs_set_item_key_safe Signed-off-by: Daniel Dressler danieru.dress...@gmail.com --- fs/btrfs/ctree.c | 27 +++ fs/btrfs/ctree.h | 3 ++- fs/btrfs/file-item.c | 2 +- fs/btrfs/file.c | 8 4 files changed, 22 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 19bc616..db5a60f 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -3139,7 +3139,8 @@ again: * higher levels * */ -static void fixup_low_keys(struct btrfs_root *root, struct btrfs_path *path, +static void fixup_low_keys(struct btrfs_fs_info *fs_info, + struct btrfs_path *path, struct btrfs_disk_key *key, int level) { int i; @@ -3150,7 +3151,7 @@ static void fixup_low_keys(struct btrfs_root *root, struct btrfs_path *path, if (!path-nodes[i]) break; t = path-nodes[i]; - tree_mod_log_set_node_key(root-fs_info, t, tslot, 1); + tree_mod_log_set_node_key(fs_info, t, tslot, 1); btrfs_set_node_key(t, key, tslot); btrfs_mark_buffer_dirty(path-nodes[i]); if (tslot != 0) @@ -3164,7 +3165,8 @@ static void fixup_low_keys(struct btrfs_root *root, struct btrfs_path *path, * This function isn't completely safe. It's the caller's responsibility * that the new key won't break the order */ -void btrfs_set_item_key_safe(struct btrfs_root *root, struct btrfs_path *path, +void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info, +struct btrfs_path *path, struct btrfs_key *new_key) { struct btrfs_disk_key disk_key; @@ -3186,7 +3188,7 @@ void btrfs_set_item_key_safe(struct btrfs_root *root, struct btrfs_path *path, btrfs_set_item_key(eb, disk_key, slot); btrfs_mark_buffer_dirty(eb); if (slot == 0) - fixup_low_keys(root, path, disk_key, 1); + fixup_low_keys(fs_info, path, disk_key, 1); } /* @@ -3944,7 +3946,7 @@ static noinline int __push_leaf_left(struct btrfs_trans_handle *trans, clean_tree_block(trans, root, right); btrfs_item_key(right, disk_key, 0); - fixup_low_keys(root, path, disk_key, 1); + fixup_low_keys(root-fs_info, path, disk_key, 1); /* then fixup the leaf pointer in the path */ if (path-slots[0] push_items) { @@ -4181,6 +4183,7 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans, int mid; int slot; struct extent_buffer *right; + struct btrfs_fs_info *fs_info = root-fs_info; int ret = 0; int wret; int split; @@ -4284,10 +4287,10 @@ again: btrfs_set_header_backref_rev(right, BTRFS_MIXED_BACKREF_REV); btrfs_set_header_owner(right, root-root_key.objectid); btrfs_set_header_level(right, 0); - write_extent_buffer(right, root-fs_info-fsid, + write_extent_buffer(right, fs_info-fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE); - write_extent_buffer(right, root-fs_info-chunk_tree_uuid, + write_extent_buffer(right, fs_info-chunk_tree_uuid, btrfs_header_chunk_tree_uuid(right), BTRFS_UUID_SIZE); @@ -4310,7 +4313,7 @@ again: path-nodes[0] = right; path-slots[0] = 0; if (path-slots[1] == 0) - fixup_low_keys(root, path, disk_key, 1); + fixup_low_keys(fs_info, path, disk_key, 1); } btrfs_mark_buffer_dirty(right); return ret; @@ -4626,7 +4629,7 @@ void btrfs_truncate_item(struct btrfs_root *root, struct btrfs_path *path, btrfs_set_disk_key_offset(disk_key, offset + size_diff); btrfs_set_item_key(leaf, disk_key, slot); if (slot == 0) - fixup_low_keys(root, path, disk_key, 1); + fixup_low_keys(root-fs_info, path, disk_key, 1); } item = btrfs_item_nr(slot); @@ -4727,7 +4730,7 @@ void setup_items_for_insert(struct btrfs_root *root, struct btrfs_path *path, if (path-slots[0] == 0
bad areas cause btrfs segfault
I've got a couple of directories that cause a btrfs segfault. First one happened at the end of July and I just renamed it to get it out of my way (can't delete it without crashing); the second one just happened and I'll be discarding the filesystem. This crash when touched behavior is frustrating because it makes it iffy to back up everything else. Usually about the second attempt to touch the bad directory requires a reboot. Instead, I would prefer that the filesystem not crash the whole system when it encounters a corrupted area. I've tried btrfs scrub and btrfs check but they don't find anything wrong. I guess the next step would be btrfs restore, but I think I have a good enough backup made with a normal copy skipping the two corrupted directories. Here's my info. $ uname -a Linux cardamom 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux $ btrfs --version Btrfs v3.12 $ btrfs fi show Btrfs v3.12 $ btrfs fi df /home # Replace /home with the mount point of your btrfs-filesystem Data, single: total=110.01GiB, used=108.09GiB System, DUP: total=8.00MiB, used=20.00KiB System, single: total=4.00MiB, used=0.00 Metadata, DUP: total=3.00GiB, used=2.31GiB Metadata, single: total=8.00MiB, used=0.00 [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 3.13.0-36-generic (buildd@toyol) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 (Ubuntu 3.13.0-36.63-generic 3.13.11.6) [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.13.0-36-generic root=UUID=09bae76d-bf8a-47a7-998d-a929626274c1 ro quiet splash vt.handoff=7 [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] AMD AuthenticAMD [0.00] Centaur CentaurHauls [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009ebff] usable [0.00] BIOS-e820: [mem 0x0009ec00-0x0009] reserved [0.00] BIOS-e820: [mem 0x000e4000-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xd3f7] usable [0.00] BIOS-e820: [mem 0xd3f8-0xd3f8dfff] ACPI data [0.00] BIOS-e820: [mem 0xd3f8e000-0xd3fc] ACPI NVS [0.00] BIOS-e820: [mem 0xd3fd-0xd3ff] reserved [0.00] BIOS-e820: [mem 0xff70-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x0001abff] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.5 present. [0.00] DMI: System manufacturer System Product Name/M3A78-EM, BIOS 180505/19/2009 [0.00] e820: update [mem 0x-0x0fff] usable == reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] No AGP bridge found [0.00] e820: last_pfn = 0x1ac000 max_arch_pfn = 0x4 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-E uncachable [0.00] F-F write-protect [0.00] MTRR variable ranges enabled: [0.00] 0 base 00 mask FF8000 write-back [0.00] 1 base 008000 mask FFC000 write-back [0.00] 2 base 00C000 mask FFF000 write-back [0.00] 3 base 00D000 mask FFFC00 write-back [0.00] 4 disabled [0.00] 5 disabled [0.00] 6 disabled [0.00] 7 disabled [0.00] TOM2: 0001ac00 aka 6848M [0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 [0.00] e820: update [mem 0xd400-0x] usable == reserved [0.00] e820: last_pfn = 0xd3f80 max_arch_pfn = 0x4 [0.00] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [880ff780] [0.00] Scanning 1 areas for low memory corruption [0.00] Base memory trampoline at [88098000] 98000 size 24576 [0.00] init_memory_mapping: [mem 0x-0x000f] [0.00] [mem 0x-0x000f] page 4k [0.00] BRK [0x01fdf000, 0x01fd] PGTABLE [0.00] BRK [0x01fe, 0x01fe0fff] PGTABLE [0.00] BRK [0x01fe1000, 0x01fe1fff] PGTABLE [0.00] init_memory_mapping: [mem 0x1abe0-0x1abff] [0.00] [mem 0x1abe0-0x1abff] page 2M [0.00] BRK [0x01fe2000, 0x01fe2fff] PGTABLE [0.00] init_memory_mapping: [mem 0x1a800-0x1abdf] [0.00] [mem 0x1a800-0x1abdf] page 2M [0.00] init_memory_mapping: [mem 0x18000-0x1a7ff] [0.00] [mem 0x18000-0x1a7ff] page 2M [0.00] init_memory_mapping: [mem 0x0010-0xd3f7] [0.00] [mem 0x0010-0x001f] page 4k [
Re: bad areas cause btrfs segfault
Thanks On Sun, Sep 28, 2014 at 9:38 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Hi, This bug seems to be one reported bug before: http://article.gmane.org/gmane.comp.file-systems.btrfs/33270 And Chris has already updated the 3.13 stable branch to fix the bug. If it is OK for you, updating kernel to 3.14 would be a solution. (Since from 3.15, the new btrfs workqueue implementation caused some bug, and will be fixed in 3.17, 3.15~3.16 is not recommended) Thanks Qu Original Message Subject: bad areas cause btrfs segfault From: Daniel Holth dho...@gmail.com To: linux-btrfs@vger.kernel.org Date: 2014年09月29日 09:11 I've got a couple of directories that cause a btrfs segfault. First one happened at the end of July and I just renamed it to get it out of my way (can't delete it without crashing); the second one just happened and I'll be discarding the filesystem. This crash when touched behavior is frustrating because it makes it iffy to back up everything else. Usually about the second attempt to touch the bad directory requires a reboot. Instead, I would prefer that the filesystem not crash the whole system when it encounters a corrupted area. I've tried btrfs scrub and btrfs check but they don't find anything wrong. I guess the next step would be btrfs restore, but I think I have a good enough backup made with a normal copy skipping the two corrupted directories. Here's my info. $ uname -a Linux cardamom 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux $ btrfs --version Btrfs v3.12 $ btrfs fi show Btrfs v3.12 $ btrfs fi df /home # Replace /home with the mount point of your btrfs-filesystem Data, single: total=110.01GiB, used=108.09GiB System, DUP: total=8.00MiB, used=20.00KiB System, single: total=4.00MiB, used=0.00 Metadata, DUP: total=3.00GiB, used=2.31GiB Metadata, single: total=8.00MiB, used=0.00 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
3.14.18 btrfs_set_item_key_safe BUG
On 3.14.18 with a BTRFS partition mounted noatime,autodefrag,compress=lzo, I see the second assertion in btrfs_set_item_key_safe() trip: void btrfs_set_item_key_safe(struct btrfs_root *root, struct btrfs_path *path, struct btrfs_key *new_key) { struct btrfs_disk_key disk_key; struct extent_buffer *eb; int slot; eb = path-nodes[0]; slot = path-slots[0]; if (slot 0) { btrfs_item_key(eb, disk_key, slot - 1); BUG_ON(comp_keys(disk_key, new_key) = 0); } if (slot btrfs_header_nritems(eb) - 1) { btrfs_item_key(eb, disk_key, slot + 1); BUG_ON(comp_keys(disk_key, new_key) = 0); --- } Full backtrace: kernel BUG at /home/apw/COD/linux/fs/btrfs/ctree.c:3215! invalid opcode: [#1] SMP Modules linked in: nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc bonding psmouse serio_raw joydev video mac_hid lpc_ich lp parport hid_generic usbhid hid bcache btrfs raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor ahci xor async_memcpy libahci async_tx raid1 e1000e ptp pps_core raid0 multipath linear CPU: 0 PID: 6742 Comm: btrfs-endio-wri Not tainted 3.14.18-031418-generic #201409060201 Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0b 09/17/2012 task: 880418609d70 ti: 880121e92000 task.ti: 880121e92000 RIP: 0010:[a01693f1] [a01693f1] btrfs_set_item_key_safe+0x141/0x150 [btrfs] RSP: 0018:880121e93b28 EFLAGS: 00010246 RAX: RBX: 0011 RCX: 3e60 RDX: RSI: 880121e93c67 RDI: 880121e93b07 RBP: 880121e93b88 R08: 1000 R09: 880121e93b48 R10: R11: R12: 88009ce9bcc0 R13: 880121e93c67 R14: 880121e93b47 R15: 8804145f7c60 FS: () GS:88042fc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7ff34a8d1890 CR3: 01c0d000 CR4: 001407f0 Stack: 880121e93b88 880405ca 8803140d1000 d900 6c00f6bf 3e60 880121e93b88 8804145f7c60 88009ce9bcc0 3e5e 0001 0c46 Call Trace: [a01a1868] __btrfs_drop_extents+0x5a8/0xc80 [btrfs] [a0165e00] ? tree_mod_log_free_eb+0x240/0x260 [btrfs] [a0191d6b] insert_reserved_file_extent.constprop.60+0xab/0x310 [btrfs] [a018ee10] ? start_transaction.part.35+0x80/0x540 [btrfs] [a0198565] btrfs_finish_ordered_io+0x465/0x500 [btrfs] [a0198615] finish_ordered_fn+0x15/0x20 [btrfs] [a01bd8f0] worker_loop+0xa0/0x330 [btrfs] [a01bd850] ? check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs] [810930c9] kthread+0xc9/0xe0 [81093000] ? flush_kthread_worker+0xb0/0xb0 [81784abc] ret_from_fork+0x7c/0xb0 [81093000] ? flush_kthread_worker+0xb0/0xb0 Code: 00 00 4c 89 f6 4c 89 e7 48 98 48 8d 04 80 48 8d 54 80 65 e8 b2 6c 04 00 4c 89 ee 4c 89 f7 e8 d7 f4 ff ff 85 c0 0f 8f 5c ff ff ff 0f 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 After rebooting, btrfs check (btrfs-tools 3.14.1-1) shows: checking extents checking free space cache cache and super generation don't match, space cache will be invalidated checking fs roots root 5 inode 16170969 errors 80, file extent overlap root 5 inode 17592262 errors 100, file extent discount found 752124326140 bytes used err is 1 total csum bytes: 2415994160 total tree bytes: 18200276992 total fs tree bytes: 14156120064 total extent tree bytes: 1240526848 btree space waste bytes: 2998597745 file data blocks allocated: 2473980772352 referenced 2731118456832 Is it better to not trust compression, autodefrag, or is this filesystem corruption from previous issues, so I should rebuild the FS? Thanks, Daniel -- Daniel J Blueman -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2
Thank you Hugo! Amazing. It almost work all the way, According to some tests I did, echo 2 /proc/cpu/alignment does allow in fact btrfs receive to work in most cases. For the tests, a x86_64 for send, a armv5tel for receive and 2 subvolumes (one with just a few data and binary files and the other a full root partition) were used. The send blobs were md5sum and verified at receive side matched. The small blob was properly process by btrfs receive (file sha1s and metadata all matched). The big blob with the root partition did partially succeeded as it ended abruptly with ERROR: lsetxattr var/log/journal system.posix_acl_default=. failed. Operation not supported. I checked a few restored files and their sha1 and metadata matched. Daniel On 08/19/14 15:22, Hugo Mills wrote: On Tue, Aug 19, 2014 at 03:10:55PM -0700, Zach Brown wrote: On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote: Hello list, I want to use an ARM kirkwood based NSA325v2 NAS (dubbed Receiver) for receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop running kubuntu 14.04 LTS (dubbed Source), storing them on a 3TB WD red disk (having GPT label, partitions created with parted). But all the btrfs receive commands on 'Receiver' fail soon with e.g.: ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File too large ... and that stops reception/snapshot creation. ... Increasing the verbosity with -v -v for btrfs receive shows the following differences between receive operations on 'Receiver' and 'OtherHost', both of them using the identical inputfile /boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send * the chown and chmod operations are different - resulting in weird/wrong permissions and sizes on 'Receiver' side. * what's stransid, this is the first line that differs This is interesting, thanks for going to the trouble to show those diffs. That the commands and strings match up show us that the basic tlv header chaining is working. But the u64 attribute values are sometimes messed up. And messed up in a specific way. A variable number of low order bytes are magically appearing. (gdb) print/x 11709972488 $2 = 0x2b9f80008 (gdb) print/x 178680 $3 = 0x2b9f8 (gdb) print/x 588032 $6 = 0x8f900 (gdb) print/x 2297 $7 = 0x8f9 Some light googling makes me think that the Marvell Kirkwood is not friendly at all to unaligned accesses. ARM isn't in general -- it never has been, even 20 years ago in the ARM3 days when I was writing code in ARM assembler. We've been bitten by this before in btrfs (mkfs on ARM works, mounting it fails fast, because userspace has a trap to fix unaligned accesses, and the kernel doesn't). The (biting tongue) send and receive code is playing some games with casting aligned and unaligned pointers. Maybe that's upsetting the arm toolchain/kirkwood. Almost certainly the toolchain isn't identifying the unaligned accesses, and thus building code that uses them causes stuff to break. There's a workaround for userspace that you can use to verify that this is indeed the problem: echo 2 /proc/cpu/alignment will tell the kernel to fix up unaligned accesses initiated in userspace. It's a performance killer, but it should serve to identify whether the problem is actually this. Hugo. Does this completely untested patch to btrfs-progs, to be run on the receiver, do anything? - z diff --git a/send-stream.c b/send-stream.c index 88e18e2..4f8dd83 100644 --- a/send-stream.c +++ b/send-stream.c @@ -204,7 +204,7 @@ out: int __len; \ TLV_GET(s, attr, (void**)__tmp, __len); \ TLV_CHECK_LEN(sizeof(*__tmp), __len); \ - *v = le##bits##_to_cpu(*__tmp); \ + *v = get_unaligned_le##bits(__tmp); \ } while (0) #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
3.15 btrfs free space cache oops
When running MonetDB against a BTRFS RAID-0 set over 4 SSDs [1] on 3.15.5, we see io_ctl have a bad address of 0x20, causing a fatal pagefault in memcpy(): (gdb) list *(__btrfs_write_out_cache+0x3e4) 0x81365984 is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:521). 516if (io_ctl-index = io_ctl-num_pages) 517return -ENOSPC; 518io_ctl_map_page(io_ctl, 0); 519} 520 521memcpy(io_ctl-cur, bitmap, PAGE_CACHE_SIZE); 522io_ctl_set_crc(io_ctl, io_ctl-index - 1); 523if (io_ctl-index io_ctl-num_pages) 524io_ctl_map_page(io_ctl, 0); 525return 0; I can try to reproduce it if more data is useful? Thanks, Daniel -- [1] mkfs.btrfs -f -m raid0 -d raid0 -n 16k -l 16k -O skinny-metadata /dev/sda2 /dev/sdc2 /dev/sdb2 /dev/sdd2 mount /dev/sda2 /scratch -o noatime,discard,nodatasum,nobarrier,ssd_spread -- [2] BUG: unable to handle kernel paging request at 0020 IP: [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0 PGD 3bca02c067 PUD 3bcf5fb067 PMD 0 Oops: [#1] SMP Modules linked in: CPU: 34 PID: 46645 Comm: mserver5 Not tainted 3.15.5-server #7 Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.1.1 [1.1.54] 10/16/2013 task: 880a8c7234f0 ti: 8809aefcc000 task.ti: 8809aefcc000 RIP: 0010:[8135a374] [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0 RSP: 0018:8809aefcfc40 EFLAGS: 00010246 RAX: 004fb9321000 RBX: 8809aefcfca8 RCX: 0200 RDX: 1000 RSI: 0020 RDI: 884fb9321000 RBP: 8809aefcfd48 R08: 0200 R09: R10: R11: 884fb9320ffc R12: 8831e3303740 R13: 880100579970 R14: 880bb38061c0 R15: 0020 FS: 7fb9447ed700() GS:884bbfc8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0020 CR3: 00329b71c000 CR4: 000407e0 Stack: 8809aefcfc90 0011 000e 884fbbc2c870 880bb38061c0 8809aefcfc90 880bb3806058 880b02ec 883bcd523800 8833d338f2c0 88476b1eb4e0 00b890cde000 Call Trace: [81a75b4b] ? _raw_spin_lock+0xb/0x20 [8135c0e1] btrfs_write_out_cache+0xb1/0xf0 [8130be0b] btrfs_write_dirty_block_groups+0x58b/0x670 [813199c5] commit_cowonly_roots+0x195/0x250 [8131b92f] btrfs_commit_transaction+0x41f/0x9b0 [81358e85] ? btrfs_log_dentry_safe+0x55/0x70 [8132b6b2] btrfs_sync_file+0x182/0x2a0 [8114a450] do_fsync+0x50/0x80 [8114a6de] SyS_fdatasync+0xe/0x20 [81a766e6] system_call_fastpath+0x1a/0x1f Code: ff 4d 89 fc 49 89 c7 e9 ab 00 00 00 0f 1f 00 40 f6 c7 02 0f 85 fe 00 00 00 40 f6 c7 04 0f 85 14 01 00 00 89 d1 c1 e9 03 f6 c2 04 f3 48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f RIP [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0 RSP 8809aefcfc40 CR2: 0020 -- Daniel J Blueman -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
3.15 btrfs free space cache oops
When running MonetDB over a BTRFS RAID-0 set over 4 SSDs [1] on 3.15.5, we see io_ctl have a bad address of 0x20, causing a fatal pagefault in memcpy(): (gdb) list *(__btrfs_write_out_cache+0x3e4) 0x81365984 is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:521). 516if (io_ctl-index = io_ctl-num_pages) 517return -ENOSPC; 518io_ctl_map_page(io_ctl, 0); 519} 520 521memcpy(io_ctl-cur, bitmap, PAGE_CACHE_SIZE); 522io_ctl_set_crc(io_ctl, io_ctl-index - 1); 523if (io_ctl-index io_ctl-num_pages) 524io_ctl_map_page(io_ctl, 0); 525return 0; I can try to reproduce it if more data is useful? Thanks, Daniel -- [1] mkfs.btrfs -f -m raid0 -d raid0 -n 16k -l 16k -O skinny-metadata /dev/sda2 /dev/sdc2 /dev/sdb2 /dev/sdd2 mount /dev/sda2 /scratch -o noatime,discard,nodatasum,nobarrier,ssd_spread -- [2] BUG: unable to handle kernel paging request at 0020 IP: [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0 PGD 3bca02c067 PUD 3bcf5fb067 PMD 0 Oops: [#1] SMP Modules linked in: CPU: 34 PID: 46645 Comm: mserver5 Not tainted 3.15.5-server #7 Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.1.1 [1.1.54] 10/16/2013 task: 880a8c7234f0 ti: 8809aefcc000 task.ti: 8809aefcc000 RIP: 0010:[8135a374] [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0 RSP: 0018:8809aefcfc40 EFLAGS: 00010246 RAX: 004fb9321000 RBX: 8809aefcfca8 RCX: 0200 RDX: 1000 RSI: 0020 RDI: 884fb9321000 RBP: 8809aefcfd48 R08: 0200 R09: R10: R11: 884fb9320ffc R12: 8831e3303740 R13: 880100579970 R14: 880bb38061c0 R15: 0020 FS: 7fb9447ed700() GS:884bbfc8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0020 CR3: 00329b71c000 CR4: 000407e0 Stack: 8809aefcfc90 0011 000e 884fbbc2c870 880bb38061c0 8809aefcfc90 880bb3806058 880b02ec 883bcd523800 8833d338f2c0 88476b1eb4e0 00b890cde000 Call Trace: [81a75b4b] ? _raw_spin_lock+0xb/0x20 [8135c0e1] btrfs_write_out_cache+0xb1/0xf0 [8130be0b] btrfs_write_dirty_block_groups+0x58b/0x670 [813199c5] commit_cowonly_roots+0x195/0x250 [8131b92f] btrfs_commit_transaction+0x41f/0x9b0 [81358e85] ? btrfs_log_dentry_safe+0x55/0x70 [8132b6b2] btrfs_sync_file+0x182/0x2a0 [8114a450] do_fsync+0x50/0x80 [8114a6de] SyS_fdatasync+0xe/0x20 [81a766e6] system_call_fastpath+0x1a/0x1f Code: ff 4d 89 fc 49 89 c7 e9 ab 00 00 00 0f 1f 00 40 f6 c7 02 0f 85 fe 00 00 00 40 f6 c7 04 0f 85 14 01 00 00 89 d1 c1 e9 03 f6 c2 04 f3 48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f RIP [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0 RSP 8809aefcfc40 CR2: 0020 -- Daniel J Blueman -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Questions on incremental backups
On 07/18/14 06:40, Russell Coker wrote: Displaying backups is an issue of backup software. It is above the level that BTRFS development touches. While people here can probably offer generic advice on backup software it's not the topic of the list. As said, I don't mind developing the software. But, is the required information easily available? Is there a way to get a diff, something like a list of changed/added/removed files between snapshots? Your usual diff utility will do it. I guess you could parse the output of btrfs send. Following this thought, one step closer in getting a text diff can be to use fardump. It takes a btrfs send binary stream and outputs the send instructions in plaintext. (https://kernel.googlesource.com/pub/scm/linux/kernel/git/arne/far-progs). It certainly would be awesome if btrfs-progs could have an extra parameter to just generate the list of changed/added/removed files between snapshots as all the needed infrastructure is already in place. And, finally, nobody has mentioned on the possibility of merging multiple snapshots into a single snapshot. Would this be possible, to create a snapshot that contains the most recent version of each file present across all of the snapshots (including files which may be present in only one of the snapshots)? There is no btrfs functionality for that. But I'm sure you could do something with standard Unix utilities and copying files around. Sure, but the management of data deduplication is left to the user (presumably using cp --reflink) which is not trivial. Does anybody knows how safe it is to use duperemove or bedup? Any recommendations on how to effectively deduplicate btrfs at this point? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs data dup on single device?
Will it be possible to use DUP for data as well as for metadata on a single device? And if so, am I going to be able to specify more than 1 copy of the data? Storage is pretty cheap now, and to have multiple copies in btrfs is something that I think could be used a lot. I know I will use multiple copies of my data if made possible. Is it something that might be available when RAID1 gets N mirrors instead of just 1 mirror? Daniel -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs data dup on single device?
It'll be exactly 2 copies at the moment. Note that performance on an SSD will at least halve, and performance on a rotational device will probably suck quite badly. Neither will help you in the case of a full-device failure. You still need backups, kept on a separate machine. Write performance, sure, but reads shouldn't be that much slower? For DUP on same device I was thinking about family photos, source code and such, not for compiles or databases with a lot of queries. Of course you need backups, offsite backups.. I had a fire a couple of years ago, and, well.. If the second machine also is in the vicinity.. We were lucky this time, but a couple of more minutes and all would have been lost. Got me thinking a bit more. The question is, why? If you have enough disk media errors to make it worth using multiple copies, then your storage device is basically broken and needs replacing, and it can't really be relied on for very much longer. I was thinking that DUP on same device was mostly for protection against bit rot and smaller errors, not device failure. If the device starts to misbehave, it might be enough to rescue the data to another device if you have DUPes. Ok, a backup will probably help there too. I'm putting together a new server at home, and want the checksums in btrfs, and multiple copies of the important data. As I understand it it's better than RAID6 that I used earlier, which has it's own set of problems. And multiple offsite backups. I'll try and see if it's possible to use DUP for data on same device, when I looked around it seemed as it wasn't possible. Hugo. Daniel -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs on whole disk (no partitions)
2014-06-19 2:07 GMT+02:00 Russell Coker russ...@coker.com.au: For boot disks I use the traditional partitioning system. So far I don't run any systems that have a boot disk larger than 2TB so I haven't needed to use GPT. I have a BTRFS RAID-1 on 2*3TB disks which have no partition tables, when the filesystem is going to use the entire device and there's no boot loader there is no reason to have a partition table. ok, but what about alignment? This can have a significant impact on performance. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs on whole disk (no partitions)
2014-06-19 11:11 GMT+02:00 Imran Geriskovan imran.gerisko...@gmail.com: On 6/19/14, Russell Coker russ...@coker.com.au wrote: Grub installs itself and boots from Partitionless Btrfs disk. It is handy for straight forward installations. However, IF you need boot partition (ie. initramfs and kernel to boot from encrypted root) its another story. zfs solved this problem in grub (libzfs). I think we can find a solution to work around this problem. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs on whole disk (no partitions)
Hi, I created btrfs directly to disk using such a scheme (no partitions): dd if=/dev/zero of=/dev/sda bs=4096 mkfs.btrfs -L dev_sda /dev/sda mount /dev/sda /mnt cd /mnt btrfs subvolume create __active btrfs subvolume create __active/rootvol btrfs subvolume create __active/usr btrfs subvolume create __active/home btrfs subvolume create __active/var btrfs subvolume create __snapshots cd / umount /mnt mount -o subvol=__active/rootvol /dev/sda /mnt mkdir /mnt/{usr,home,var} mount -o subvol=__active/usr /dev/sda /mnt/usr mount -o subvol=__active/home /dev/sda /mnt/home mount -o subvol=__active/var /dev/sda /mnt/var # /etc/fstab UID=ID/btrfs rw,relative,space_cache,subvol=__active/rootvol0 0 UUID=ID/usrbtrfs rw,relative,space_cache,subvol=__active/usr0 0 UUID=ID/homebtrfs rw,relative,space_cache,subvol=__active/home0 0 UUID=ID/varbtrfs rw,relative,space_cache,subvol=__active/var0 0 Everything works fine. Is such a solution is recommended? In my opinion, the creation of the partitions seems to be completely unnecessary if you can use btrfs. I will be grateful for your feedback. Best regards, Daniel -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is metadata redundant over more than one drive with raid0 too?
On 05/04/2014 12:24 AM, Marc MERLIN wrote: Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects against metadata corruption or a single block loss, but otherwise if you lost a drive in a 2 drive raid0, you'll have lost more than just half your files. The scenario you mentioned at the beginning, if I lose a drive, I'll still have full metadata for the entire filesystem and only missing files is more applicable to using -m raid1 -d single. Single is not geared towards performance and, though it doesn't guarantee a file is only on a single disk, the allocation does mean that the majority of all files smaller than a chunk will be stored on only one disk or the other - not both. Ok, so in other words: -d raid0: if you one 1 drive out of 2, you may end up with small files and the rest will be lost -d single: you're more likely to have files be on one drive or the other, although there is no guarantee there either. Correct? Thanks, Marc This often seems to confuse people and I think there is a common misconception that the btrfs raid/single/dup features work at the file level when in reality they work at a level closer to lvm/md. If someone told you that they lost a device out of a jbod or multi disk lvm group(somewhat analogous to -d single) with ext on top you would expect them to lose data in any file that had a fragment in the lost region (lets ignore metadata for a moment). This is potentially up to 100% of the files but this should not be a surprising result. Similarly, someone who has lost a disk out of a md/lvm raid0 volume should not be surprised to have a hard time recovering any data at all from it. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Which companies are using Btrfs in production?
On 04/23/2014 06:19 PM, Marc MERLIN wrote: Oh while we're at it, are there companies that can say they are using btrfs in production? Marc Netgear uses BTRFS as the filesystem in their refreshed ReadyNAS line. They apparently use Oracle's linux distro so I assume they're relying on them to do most of the heavy lifting as far as support BTRFS and backporting goes since they're still on 3.0! They also have raid5/6 support so they are probably running BTRFS on top of md. http://www.netgear.com/images/BTRFS%20on%20ReadyNAS%20OS%206_9May1318-76105.pdf -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
3.13.5 btrfs read() oops
With kernel 3.13.5 (Ubuntu mainline), when plugging in a (evidently twitchy) USB3 stick with a BTRFS filesystem, I hit an oops in read() [1]. Full dmesg output is at: http://quora.org/2014/btrfs-oops.txt Thanks, Daniel -- [1] IP: 0010:[8135eaf6] [8135eaf6] memcpy+0x6/0x110 RSP: 0018:88025fa1b910 EFLAGS: 00010207 RAX: 88005c3d906e RBX: 027e RCX: 027e RDX: 027e RSI: 00050800 RDI: 88005c3d906e RBP: 88025fa1b948 R08: 1000 R09: 88025fa1b918 R10: R11: R12: 8800560e6350 R13: 1600 R14: 88005c3d92ec R15: 027e FS: 7f9272f79700() GS:88026f3c() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f9264010018 CR3: 00025f79a000 CR4: 001407e0 Stack: a036401c 1000 8800837f3800 8801e041a000 8800763df218 880064c8c4c0 88025fa1ba08 a0348f9c 0f18 1000 Call Trace: [a036401c] ? read_extent_buffer+0xbc/0x110 [btrfs] [a0348f9c] btrfs_get_extent+0x91c/0x970 [btrfs] [a0360217] __do_readpage+0x357/0x730 [btrfs] [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs] [a0360972] __extent_readpages.constprop.41+0x2a2/0x2c0 [btrfs] [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs] [a03627f6] extent_readpages+0x1b6/0x1c0 [btrfs] [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs] [81192f03] ? alloc_pages_current+0xa3/0x160 [a03467df] btrfs_readpages+0x1f/0x30 [btrfs] [811578d9] __do_page_cache_readahead+0x1b9/0x270 [81157dd2] ondemand_readahead+0x152/0x2a0 [81157f51] page_cache_sync_readahead+0x31/0x50 [8114d655] generic_file_aio_read+0x4c5/0x700 [811b671a] do_sync_read+0x5a/0x90 [811b6db5] vfs_read+0x95/0x160 [811b78c9] SyS_read+0x49/0xa0 [81715bff] tracesys+0xe1/0xe6 -- Daniel J Blueman -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Recovering from hard disk failure in a pool
On 02/14/2014 03:04 AM, Axelle wrote: Hi Hugo, Thanks for your answer. Unfortunately, I had also tried sudo mount -o degraded /dev/sdc1 /samples mount: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so and dmesg says: [ 1177.695773] btrfs: open_ctree failed [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 1 transid 31105 /dev/sdc6 [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 1 transid 31105 /dev/sdc6 [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 4013.408280] btrfs: allowing degraded mounts [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, gen 0 [ 4015.600424] Btrfs: too many missing devices, writeable mount is not allowed [ 4015.630841] btrfs: open_ctree failed Did the crashed /dev/sdb have more than 1 partitions in your raid1 filesystem? Yes, I know, I'll probably be losing a lot of data, but it's not too much my concern because I had a backup (sooo happy about that :D). If I can manage to recover a little more on the btrfs volume it's bonus, but in the event I do not, I'll be using my backup. So, how do I fix my volume? I guess there would be a solution apart from scratching/deleting everything and starting again... Regards, Axelle On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills h...@carfax.org.uk wrote: On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote: Hi, I've just encountered a hard disk crash in one of my btrfs pools. sudo btrfs filesystem show failed to open /dev/sr0: No medium found Label: none uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add Total devices 3 FS bytes used 112.70GB devid1 size 100.61GB used 89.26GB path /dev/sdc6 devid2 size 93.13GB used 84.00GB path /dev/sdc1 *** Some devices missing The device which is missing is /dev/sdb. I have replaced it with a new hard disk. How do I add it back to the volume and fix the device missing? The pool is expected to mount to /samples (it is not mounted yet). I tried this - which fails: sudo btrfs device add /dev/sdb /samples ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device Why isn't this working? Because it's not mounted. :) I also tried this: sudo mount -o recovery /dev/sdc1 /samples mount: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so same with /dev/sdc6 Close, but what you want here is: mount -o degraded /dev/sdc1 /samples not recovery. That will tell the FS that there's a missing disk, and it should mount without complaining. If your data is not RAID-1 or RAID-10, then you will almost certainly have lost some data. At that point, since you've removed the dead disk, you can do: btrfs device delete missing /samples which forcibly removes the record of the missing device. Then you can add the new device: btrfs device add /dev/sdb /samples And finally balance to repair the RAID: btrfs balance start /samples It's worth noting that even if you have RAID-1 data and metadata, losing /dev/sdc in your current configuration is likely to cause severe data loss -- probably making the whole FS unrecoverable. This is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices, and will happily put both copies of a piece of RAID-1 data (or metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore wouldn't recommend running like that for very long. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- All hope abandon, Ye who press Enter here. --- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Recovering from hard disk failure in a pool
On 02/14/2014 07:22 AM, Axelle wrote: Did the crashed /dev/sdb have more than 1 partitions in your raid1 filesystem? No, only 1 - as far as I recall. -- Axelle. What does: btrfs filesystem df /samples say now that you've mounted the fs readonly? On Fri, Feb 14, 2014 at 3:58 PM, Daniel Lee longinu...@gmail.com wrote: On 02/14/2014 03:04 AM, Axelle wrote: Hi Hugo, Thanks for your answer. Unfortunately, I had also tried sudo mount -o degraded /dev/sdc1 /samples mount: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so and dmesg says: [ 1177.695773] btrfs: open_ctree failed [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 1 transid 31105 /dev/sdc6 [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 1 transid 31105 /dev/sdc6 [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 4013.408280] btrfs: allowing degraded mounts [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, gen 0 [ 4015.600424] Btrfs: too many missing devices, writeable mount is not allowed [ 4015.630841] btrfs: open_ctree failed Did the crashed /dev/sdb have more than 1 partitions in your raid1 filesystem? Yes, I know, I'll probably be losing a lot of data, but it's not too much my concern because I had a backup (sooo happy about that :D). If I can manage to recover a little more on the btrfs volume it's bonus, but in the event I do not, I'll be using my backup. So, how do I fix my volume? I guess there would be a solution apart from scratching/deleting everything and starting again... Regards, Axelle On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills h...@carfax.org.uk wrote: On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote: Hi, I've just encountered a hard disk crash in one of my btrfs pools. sudo btrfs filesystem show failed to open /dev/sr0: No medium found Label: none uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add Total devices 3 FS bytes used 112.70GB devid1 size 100.61GB used 89.26GB path /dev/sdc6 devid2 size 93.13GB used 84.00GB path /dev/sdc1 *** Some devices missing The device which is missing is /dev/sdb. I have replaced it with a new hard disk. How do I add it back to the volume and fix the device missing? The pool is expected to mount to /samples (it is not mounted yet). I tried this - which fails: sudo btrfs device add /dev/sdb /samples ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device Why isn't this working? Because it's not mounted. :) I also tried this: sudo mount -o recovery /dev/sdc1 /samples mount: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so same with /dev/sdc6 Close, but what you want here is: mount -o degraded /dev/sdc1 /samples not recovery. That will tell the FS that there's a missing disk, and it should mount without complaining. If your data is not RAID-1 or RAID-10, then you will almost certainly have lost some data. At that point, since you've removed the dead disk, you can do: btrfs device delete missing /samples which forcibly removes the record of the missing device. Then you can add the new device: btrfs device add /dev/sdb /samples And finally balance to repair the RAID: btrfs balance start /samples It's worth noting that even if you have RAID-1 data and metadata, losing /dev/sdc in your current configuration is likely to cause severe data loss -- probably making the whole FS unrecoverable. This is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices, and will happily put both copies of a piece of RAID-1 data (or metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore wouldn't recommend running like that for very long. Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- All hope abandon, Ye who press Enter here. --- -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs
Re: Recovering from hard disk failure in a pool
On 02/14/2014 09:53 AM, Axelle wrote: Hi Daniel, This is what it answers now: sudo btrfs filesystem df /samples [sudo] password for axelle: Data, RAID0: total=252.00GB, used=108.99GB System, RAID1: total=8.00MB, used=28.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=5.25GB, used=3.71GB So the issue here is that your data is raid0 which will not tolerate any loss of a device. I'd recommend trashing the current filesystem and creating a new one with some redundancy (use raid1 not raid0, don't add more than one partition from the same disk to a btrfs filesystem, etc.) so you can recover from this sort of scenario in the future. To do this, use wipefs on the remaining partitions to remove all traces of the current btrfs filesystem. By the way, I was happy to recover most of my data :) This is the nice thing about the checksumming in btrfs, knowing that what data you did read off is correct. :) Of course, I still can't add my new /dev/sdb to /samples because it's read-only: sudo btrfs device add /dev/sdb /samples ERROR: error adding the device '/dev/sdb' - Read-only file system Regards Axelle On Fri, Feb 14, 2014 at 5:19 PM, Daniel Lee longinu...@gmail.com wrote: On 02/14/2014 07:22 AM, Axelle wrote: Did the crashed /dev/sdb have more than 1 partitions in your raid1 filesystem? No, only 1 - as far as I recall. -- Axelle. What does: btrfs filesystem df /samples say now that you've mounted the fs readonly? On Fri, Feb 14, 2014 at 3:58 PM, Daniel Lee longinu...@gmail.com wrote: On 02/14/2014 03:04 AM, Axelle wrote: Hi Hugo, Thanks for your answer. Unfortunately, I had also tried sudo mount -o degraded /dev/sdc1 /samples mount: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so and dmesg says: [ 1177.695773] btrfs: open_ctree failed [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 1 transid 31105 /dev/sdc6 [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 1 transid 31105 /dev/sdc6 [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid 2 transid 31105 /dev/sdc1 [ 4013.408280] btrfs: allowing degraded mounts [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, gen 0 [ 4015.600424] Btrfs: too many missing devices, writeable mount is not allowed [ 4015.630841] btrfs: open_ctree failed Did the crashed /dev/sdb have more than 1 partitions in your raid1 filesystem? Yes, I know, I'll probably be losing a lot of data, but it's not too much my concern because I had a backup (sooo happy about that :D). If I can manage to recover a little more on the btrfs volume it's bonus, but in the event I do not, I'll be using my backup. So, how do I fix my volume? I guess there would be a solution apart from scratching/deleting everything and starting again... Regards, Axelle On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills h...@carfax.org.uk wrote: On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote: Hi, I've just encountered a hard disk crash in one of my btrfs pools. sudo btrfs filesystem show failed to open /dev/sr0: No medium found Label: none uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add Total devices 3 FS bytes used 112.70GB devid1 size 100.61GB used 89.26GB path /dev/sdc6 devid2 size 93.13GB used 84.00GB path /dev/sdc1 *** Some devices missing The device which is missing is /dev/sdb. I have replaced it with a new hard disk. How do I add it back to the volume and fix the device missing? The pool is expected to mount to /samples (it is not mounted yet). I tried this - which fails: sudo btrfs device add /dev/sdb /samples ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device Why isn't this working? Because it's not mounted. :) I also tried this: sudo mount -o recovery /dev/sdc1 /samples mount: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so same with /dev/sdc6 Close, but what you want here is: mount -o degraded /dev/sdc1 /samples not recovery. That will tell the FS that there's a missing disk, and it should mount without complaining. If your data is not RAID-1 or RAID-10, then you will almost certainly have lost some data. At that point, since you've removed the dead disk, you can do: btrfs device delete missing /samples which forcibly removes the record of the missing device. Then you can add the new device: btrfs device add /dev/sdb /samples And finally
Re: [PATCH] btrfs-progs: update INSTALL file
2014-01-22 Anand Jain anand.j...@oracle.com: with the changes that has happened since last time it was updated Signed-off-by: Anand Jain anand.j...@oracle.com --- INSTALL |6 ++ 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/INSTALL b/INSTALL index 8ead607..a86878a 100644 --- a/INSTALL +++ b/INSTALL @@ -12,7 +12,8 @@ complete: modprobe libcrc32c insmod btrfs.ko -The Btrfs utility programs require libuuid to build. This can be found +The Btrfs utility programs (btrfs-progs) require libattr zlib libacl +e2fsprogs libblkid lzo2 to build. This can be found in the e2fsprogs sources, and is usually available as libuuid or e2fsprogs-devel from various distros. Which version of libblkid should we use? From e2fsprogs or from util-linux? libattr/libacl: these libraries are not necessary - you need it only to convert. Daniel -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Barrier remount failure
On 3.13-rc5, it's possible to remount a mounted BTRFS filesystem with 'nobarrier', but not possible to remount with 'barrier'. Is this expected? Many thanks, Daniel -- Daniel J Blueman -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Nagios probe for btrfs RAID status?
On 23/11/13 04:59, Anand Jain wrote: For example, would the command btrfs filesystem show --all-devices give a non-zero error status or some other clue if any of the devices are at risk? No there isn't any good way as of now. that's something to fix. Does it require kernel/driver code changes or it should be possible to implement in the user space utility? It would be useful for people testing the filesystem to know when they get into trouble so they can investigate more quickly (and before the point of no return) [btrfs personal user/sysadmin, not a dev, not anything large enough to have personal nagios experience...] AFAIK, btrfs raid modes currently switch the filesystem to read-only on any device-drop error. That has been deemed the simplest/safest policy during development, tho at some point as stable approaches the behavior could theoretically be made optional. None of the warnings about btrfs's experimental status hint at that, some people may be surprised by it. So detection could watch for read-only and act accordingly, either switching back to read-write or rebooting or simply logging the event, as deemed appropriate. It would be relatively trivial to implement a Nagios check for read-only, Nagios probes are just shell scripts What about when btrfs detects a bad block checksum and recovers data from the equivalent block on another disk? The wiki says there will be a syslog event. Does btrfs keep any stats on the number of blocks that it considers unreliable and can this be queried from user space? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Nagios probe for btrfs RAID status?
On 23/11/13 09:37, Daniel Pocock wrote: On 23/11/13 04:59, Anand Jain wrote: For example, would the command btrfs filesystem show --all-devices give a non-zero error status or some other clue if any of the devices are at risk? No there isn't any good way as of now. that's something to fix. Does it require kernel/driver code changes or it should be possible to implement in the user space utility? It would be useful for people testing the filesystem to know when they get into trouble so they can investigate more quickly (and before the point of no return) [btrfs personal user/sysadmin, not a dev, not anything large enough to have personal nagios experience...] AFAIK, btrfs raid modes currently switch the filesystem to read-only on any device-drop error. That has been deemed the simplest/safest policy during development, tho at some point as stable approaches the behavior could theoretically be made optional. None of the warnings about btrfs's experimental status hint at that, some people may be surprised by it. So detection could watch for read-only and act accordingly, either switching back to read-write or rebooting or simply logging the event, as deemed appropriate. It would be relatively trivial to implement a Nagios check for read-only, Nagios probes are just shell scripts Just checked, it already exists, so we are half way there: http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_ro_mounts/details What about when btrfs detects a bad block checksum and recovers data from the equivalent block on another disk? The wiki says there will be a syslog event. Does btrfs keep any stats on the number of blocks that it considers unreliable and can this be queried from user space? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Nagios probe for btrfs RAID status?
On 23/11/13 11:35, Duncan wrote: Daniel Pocock posted on Sat, 23 Nov 2013 09:37:50 +0100 as excerpted: What about when btrfs detects a bad block checksum and recovers data from the equivalent block on another disk? The wiki says there will be a syslog event. Does btrfs keep any stats on the number of blocks that it considers unreliable and can this be queried from user space? The way you phrased that question is strange to me (considers unreliable? does that mean ones that it had to fix, or ones that it had to fix more than once, or...), so I'm not sure this answers it, but from the btrfs manpage... Let me clarify: when I said unreliable, I was referring to those blocks where the block device driver reads the block without reporting any error but where btrfs has decided the checksum is bad and not used the data from the block. Such blocks definitely exist. Sometimes the data was corrupted at the moment of writing and no matter how many times you read the block, you always get a bad checksum. btrfs device stats [-z] {path|device} Read and print the device IO stats for all devices of the filesystem identified by path or for a single device. Options -z Reset stats to zero after reading them. Here's the output for my (dual device btrfs raid1) rootfs, here: btrfs dev stat / [/dev/sdc5].write_io_errs 0 [/dev/sdc5].read_io_errs0 [/dev/sdc5].flush_io_errs 0 [/dev/sdc5].corruption_errs 0 [/dev/sdc5].generation_errs 0 [/dev/sda5].write_io_errs 0 [/dev/sda5].read_io_errs0 [/dev/sda5].flush_io_errs 0 [/dev/sda5].corruption_errs 0 [/dev/sda5].generation_errs 0 As you can see, for multi-device filesystems it gives the stats per component device. Any errors accumulate until a reset using -z, so you can easily see if the numbers are increasing over time and by how much. That looks interesting - are these explained anywhere? Should a Nagios plugin just look for any non-zero value or just focus on some of those? Are they runtime stats (since system boot) or are they maintained in the filesystem on disk? My own version of the btrfs utility doesn't have that command though, I am using a Debian stable system. I tried a newer version and it gives ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) so I probably need to update my kernel too. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Nagios probe for btrfs RAID status?
I just did a search and couldn't find any probe for btrfs RAID status The check_raid plugin seems to recognise mdadm and various other types of RAID but not btrfs Has anybody seen a plugin for Nagios or could anybody comment on how it should work if somebody wants to make one? For example, would the command btrfs filesystem show --all-devices give a non-zero error status or some other clue if any of the devices are at risk? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-convert won't convert ext* - No valid Btrfs found on /dev/sdb1
Hello Josef, Josef Bacik jbacik at fusionio.com writes: On Thu, Sep 05, 2013 at 10:45:23AM -0500, Eric Sandeen wrote: [...] This was a regression around July 3; there was no regression test at the time. [615f2867854c186a37cb2e2e5a2e13e9ed4ab0df] Btrfs-progs: cleanup similar code in open_ctree_* and close_ctree broke it. Patches were sent to the list to fix it on July 17, https://patchwork.kernel.org/patch/2828820/ but they haven't been merged into the main repo. I sent a regression test for it to the list on Aug 4, but nobody reviewed it, so it hasn't been merged into the test suite, either. Winning all around! Alright, alright I'll review it, Jesus. ;), Is there any progress on this or can I help with solving this somehow? Josef Daniel -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
btrfs_join_transaction bug...
When running my btrfs exerciser [1] for ~5 mins on 3.11-rc7, I hit a BUG_ON in merge_reloc_roots that asserts btrfs_join_transaction doesn't generate an error [2]. Is this a valid failure when the filesystem went read-only due to out of space? It can also reproduce a livelock between a 'btrfs filesystem balance' and btrfs-transaction kthread. Thanks, Daniel --- [1] 1. boot your box with ramdisk_size set to a fifth of your box's memory, eg ramdisk_size=1572864 for an 8GB box, if rd is compiled into the kernel 2. install fio 3. fetch http://quora.org/2013/workload (for fio) 4. make sure /dev/ram{0-3} don't have any important data 5. run http://quora.org/2013/btrfsatron --- [2] $ sudo btrfs filesystem balance /tmp/btrfsathon ERROR: defrag failed on /tmp/btrfsathon - Read-only file system total 1 failures ERROR: error during balancing '/tmp/btrfsathon' - Read-only file system There may be more info in syslog - try dmesg | tail $ dmesg ... WARNING: CPU: 5 PID: 22243 at /home/apw/COD/linux/fs/btrfs/super.c:253 __btrfs_abort_transaction+0x135/0x140 [btrfs]() btrfs: Transaction aborted (error -28) Modules linked in: dm_crypt snd_hda_codec_hdmi ipt_REJECT xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables arc4 b43 joydev mac80211 snd_hda_codec_cirrus rfcomm bnep cfg80211 snd_hda_intel snd_hda_codec ssb uvcvideo ax88179_178a usbnet snd_hwdep applesmc videobuf2_vmalloc mii snd_pcm btusb videobuf2_memops videobuf2_core input_polldev bluetooth snd_page_alloc videodev nfsd snd_seq_midi snd_seq_midi_event auth_rpcgss snd_rawmidi bcm5974 nfs_acl snd_seq nfs snd_seq_device lockd snd_timer binfmt_misc bcma mei_me lpc_ich sunrpc snd mei soundcore fscache apple_gmux mac_hid apple_bl nls_iso8859_1 lp parport btrfs xor zlib_deflate raid6_pq libcrc32c microcode hid_generic hid_apple usbhid hid nouveau i915 mxm_wmi wmi ttm i2c_algo_bit ahci drm_kms_helper libahci drm video CPU: 5 PID: 22243 Comm: btrfs Not tainted 3.11.0-031100rc7-generic #201308252135 Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS MBP101.88Z.00EE.B02.1208081132 08/08/2012 00fd 8801c766d6c8 81720d7a 0007 8801c766d718 8801c766d708 8106534c 0033 8801f2793800 8802181d6780 ffe4 163d Call Trace: [81720d7a] dump_stack+0x46/0x58 [8106534c] warn_slowpath_common+0x8c/0xc0 [81065436] warn_slowpath_fmt+0x46/0x50 [a02c8290] ? lookup_extent_backref+0x60/0xf0 [btrfs] [a02b8cd5] __btrfs_abort_transaction+0x135/0x140 [btrfs] [a02caca5] __btrfs_free_extent+0x1e5/0x990 [btrfs] [a02cb59c] run_delayed_tree_ref+0x14c/0x1c0 [btrfs] [a02cf84e] run_one_delayed_ref+0xde/0xf0 [btrfs] [a02cf999] run_clustered_refs+0x139/0x530 [btrfs] [a02d3570] btrfs_run_delayed_refs+0x100/0x5a0 [btrfs] [a02e34fe] btrfs_commit_transaction+0xbe/0x9e0 [btrfs] [8172c44e] ? _raw_spin_lock+0xe/0x20 [a02ce24f] ? btrfs_block_rsv_check+0x6f/0x90 [btrfs] [a02e41c0] __btrfs_end_transaction+0x350/0x390 [btrfs] [a02e4233] btrfs_end_transaction_throttle+0x13/0x20 [btrfs] [a0330fc4] relocate_block_group+0x434/0x570 [btrfs] [a03312b7] btrfs_relocate_block_group+0x1b7/0x2f0 [btrfs] [a03093b6] btrfs_relocate_chunk.isra.62+0x56/0x3e0 [btrfs] [a03080c9] ? should_balance_chunk.isra.66+0x49/0x2f0 [btrfs] [a030cda2] __btrfs_balance+0x312/0x3f0 [btrfs] [a030d1ba] btrfs_balance+0x33a/0x5d0 [btrfs] [a03162af] btrfs_ioctl_balance+0x22f/0x550 [btrfs] [a0317f09] btrfs_ioctl+0x4f9/0xa90 [btrfs] [8109caf6] ? account_user_time+0xa6/0xc0 [8109d134] ? vtime_account_user+0x74/0x90 [811c471c] do_vfs_ioctl+0x7c/0x2f0 [810210a9] ? syscall_trace_enter+0x29/0x270 [811c4a21] SyS_ioctl+0x91/0xb0 [81735aaf] tracesys+0xe1/0xe6 ---[ end trace 552316f62b37bc3a ]--- BTRFS error (device ram1) in __btrfs_free_extent:5693: errno=-28 No space left BTRFS info (device ram1): forced readonly BTRFS debug (device ram1): run_one_delayed_ref returned -28 BTRFS error (device ram1) in btrfs_run_delayed_refs:2677: errno=-28 No space left [ cut here ] Kernel BUG at a0330b83 [verbose debug info unavailable] invalid opcode: [#1] SMP Modules linked in: dm_crypt snd_hda_codec_hdmi ipt_REJECT xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables arc4 b43 joydev mac80211 snd_hda_codec_cirrus rfcomm bnep cfg80211 snd_hda_intel snd_hda_codec ssb uvcvideo ax88179_178a usbnet snd_hwdep applesmc videobuf2_vmalloc mii snd_pcm btusb
Re: Running Apache Derby on 3.8 and BTRFS cause kernel oops
Dne Thu, 28 Feb 2013 16:47:21 +0100 Josef Bacik jba...@fusionio.com napsal(a): On Wed, Feb 27, 2013 at 03:59:35PM -0700, Blair Zajac wrote: On 02/27/2013 02:08 PM, Josef Bacik wrote: On Wed, Feb 27, 2013 at 1:19 PM, Daniel Kozák kozz...@gmail.com wrote: [kozzi@KozziFX ~]$ mkdir derby [kozzi@KozziFX ~]$ cd derby/ [kozzi@KozziFX derby]$ wget -c -q http://mirror.hosting90.cz/apache//db/derby/db-derby-10.9.1.0/db-derby-10.9.1.0-bin.zip [kozzi@KozziFX derby]$ unzip -qq db-derby-10.9.1.0-bin.zip [kozzi@KozziFX derby]$ cd db-derby-10.9.1.0-bin/ [kozzi@KozziFX db-derby-10.9.1.0-bin]$ DERBY_HOME=`pwd` [kozzi@KozziFX db-derby-10.9.1.0-bin]$ java -jar $DERBY_HOME/lib/derbyrun.jar server start [kozzi@KozziFX db-derby-10.9.1.0-bin]$ java -jar $DERBY_HOME/lib/derbyrun.jar ij verze ij 10.9 ij CONNECT 'jdbc:derby://localhost:1527/seconddb;create=true'; BTW. after this I must restart my PC, and after restart, my system doesn't boot anymore :-) (some more btrfs oops). So I must use btrfs check --repair /dev/sdaX. Sigh and of course I can't reproduce myself, even with importing a huge database into derby. So you are just mounting with -o compress=lzo? What about the mkfs, are you using raid or anything? Are you on a ssd? Also when this happens is there any output above the --- [ cut here ] ---? There should be something about length and such. Thanks, I was able to reproduce on with 3.8 using Ubuntu 13.04 running in KVM using the commands exactly as given, but it only after stopping and starting the server again. I use the cloud image from here, boot of an Ubuntu CD-ROM ISO to change from ext4 to btrfs, then installed openjdk. http://cloud-images.ubuntu.com/raring/current/raring-server-cloudimg-amd64-disk1.img I could make my image available for download later if you need it, in a pre-failure state. Let me know. Yeah I still can't reproduce, can either of you send me your kernel config so I can see if it's something in my config that's causing problems? Thanks, Josef Yes, here is it -- Vytvořeno poštovní aplikací Opery: http://www.opera.com/mail/ config.gz Description: GNU Zip compressed data
Re: Running Apache Derby on 3.8 and BTRFS cause kernel oops
[kozzi@KozziFX ~]$ mkdir derby [kozzi@KozziFX ~]$ cd derby/ [kozzi@KozziFX derby]$ wget -c -q http://mirror.hosting90.cz/apache//db/derby/db-derby-10.9.1.0/db-derby-10.9.1.0-bin.zip [kozzi@KozziFX derby]$ unzip -qq db-derby-10.9.1.0-bin.zip [kozzi@KozziFX derby]$ cd db-derby-10.9.1.0-bin/ [kozzi@KozziFX db-derby-10.9.1.0-bin]$ DERBY_HOME=`pwd` [kozzi@KozziFX db-derby-10.9.1.0-bin]$ java -jar $DERBY_HOME/lib/derbyrun.jar server start [kozzi@KozziFX db-derby-10.9.1.0-bin]$ java -jar $DERBY_HOME/lib/derbyrun.jar ij verze ij 10.9 ij CONNECT 'jdbc:derby://localhost:1527/seconddb;create=true'; BTW. after this I must restart my PC, and after restart, my system doesn't boot anymore :-) (some more btrfs oops). So I must use btrfs check --repair /dev/sdaX. Dne Wed, 27 Feb 2013 14:25:13 +0100 Josef Bacik jba...@fusionio.com napsal(a): On Wed, Feb 27, 2013 at 06:20:16AM -0700, Daniel Kozák wrote: Hello, On my both machine I have ArchLinux with recent kernel 3.8.0 btrfs as a filesystem (with lzo compression). When I try use Apache derby (create database), I almost every time get this kernel oops: Sweet somebody else is hitting this and I haven't been able to reproduce. Can you give me the exact commands you run so I can try and reproduce myself? Thanks, Josef -- Vytvořeno poštovní aplikací Opery: http://www.opera.com/mail/ -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Debian 7 (wheezy) support for btrfs RAID
I've been able to run the Debian 7 installer (beta1) and get a working Debian system on btrfs RAID1 root FS. A few manual steps and patches required - it would be useful to get feedback about this process. I might have a go at patching partman to fully support this through the installer menu. I've written up the process as an install report bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686130 Any feedback is welcome. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raw partition or LV for btrfs? (FAQ updated)
On 22/08/12 17:42, David Sterba wrote: On Tue, Aug 14, 2012 at 07:23:48AM -0400, Calvin Walton wrote: A patch to add support for `btrfs fi defrag -c none file` or so would make this easier, and shouldn't be to hard to do :) This one is on my list of 'nice to have', it's needed to extend the ioctl to understand 'none' as to actually use no compression during the defrag, while currently it means 'whatever compression the file has set'. Thanks for all the feedback about this, I've tried to gather the responses into the FAQ: https://btrfs.wiki.kernel.org/index.php/FAQ#Interaction_with_partitions.2C_device_managers_and_logical_volumes -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: interaction with hardware RAID?
Just following up on this... does anyone know if any of this is technically feasible even if not implemented/supported today? Also, do any hardware RAID1 implementations offer something like the full btrfs checksum functionality? I've seen HP promoting their `Advanced Data Mirroring' in new Smart Array products, but I've got no idea if that is just a marketing gimmick, like the way they use the name `Advanced Data Guard' as a moniker for RAID6 Looking around in Google, I was pleasantly disturbed to find so many web sites (including some vendors) using the term `checksum' to refer to a parity bit On 22/08/12 13:05, Daniel Pocock wrote: It is well documented that btrfs data recovery (after silent corruption) is dependent on the use of btrfs's own RAID1. However, I'm curious about whether any hardware RAID vendors are contemplating ways to integrate more closely with btrfs, for example, such that when btrfs detects a bad checksum, it would be able to ask the hardware RAID controller to return all alternate copies of the block. Is this technically possible within any hardware RAID device today, even though not implemented in btrfs? Has there been any suggestion that vendors would support this in future, presumably for the benefit of btrfs, ZFS and other checksumming filesystems? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
interaction with hardware RAID?
It is well documented that btrfs data recovery (after silent corruption) is dependent on the use of btrfs's own RAID1. However, I'm curious about whether any hardware RAID vendors are contemplating ways to integrate more closely with btrfs, for example, such that when btrfs detects a bad checksum, it would be able to ask the hardware RAID controller to return all alternate copies of the block. Is this technically possible within any hardware RAID device today, even though not implemented in btrfs? Has there been any suggestion that vendors would support this in future, presumably for the benefit of btrfs, ZFS and other checksumming filesystems? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
fail to mount after first reboot
I created a 1TB RAID1. So far it is just for testing, no important data on there. After a reboot, I tried to mount it again # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0 mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg00-btrfsvol0_0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so I checked dmesg: [17216.145092] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/mapper/vg00-btrfsvol0_0 [17216.145639] btrfs: disk space caching is enabled [17216.146987] btrfs: failed to read the system array on dm-100 [17216.147556] btrfs: open_ctree failed Then I did btrfsck - it reported no errors, but mounted OK: # btrfsck /dev/mapper/vg00-btrfsvol0_0 checking extents checking fs roots checking root refs found 26848493568 bytes used err is 0 total csum bytes: 26170252 total tree bytes: 48517120 total fs tree bytes: 5492736 btree space waste bytes: 14307930 file data blocks allocated: 26799976448 referenced 26799976448 Btrfs Btrfs v0.19 # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0 # I checked dmesg again, these are the messages from the second mount: [17299.180600] device fsid 928b939f-7f9d-4095-b1ba-e35c5f1277bf devid 1 transid 37928 /dev/dm-96 [17299.204475] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 2 transid 34 /dev/dm-99 [17299.204658] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/dm-100 [17299.288317] device fsid 928b939f-7f9d-4095-b1ba-e35c5f1277bf devid 1 transid 37928 /dev/dm-96 [17299.289024] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 2 transid 34 /dev/dm-99 [17299.289150] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/dm-100 [17310.978518] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/mapper/vg00-btrfsvol0_0 [17310.993882] btrfs: disk space caching is enabled Can anyone comment on this? Also, df is reporting double the actual RAID1 volume size, and double the amount of data stored in this filesystem: # df -lh . FilesystemSize Used Avail Use% Mounted on /dev/mapper/vg00-btrfsvol0_0 1.9T 51G 1.8T 3% /mnt/btrfs0 I would expect to see Size=1T, Used=25G # strace -v -e trace=statfs df -lh /mnt/btrfs0 statfs(/mnt/btrfs0, {f_type=0x9123683e, f_bsize=4096, f_blocks=488374272, f_bfree=475264720, f_bavail=474749786, f_files=0, f_ffree=0, f_fsid={2083217090, -1714407264}, f_namelen=255, f_frsize=4096}) = 0 FilesystemSize Used Avail Use% Mounted on /dev/mapper/vg00-btrfsvol0_0 1.9T 51G 1.8T 3% /mnt/btrfs0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: fail to mount after first reboot
On 19/08/12 14:15, Hugo Mills wrote: On Sun, Aug 19, 2012 at 02:08:17PM +, Daniel Pocock wrote: I created a 1TB RAID1. So far it is just for testing, no important data on there. After a reboot, I tried to mount it again # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0 mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg00-btrfsvol0_0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so With multi-volume btrfs filesystems, you have to run btrfs dev scan before trying to mount it. Usually, the distribution will do this in the initrd (if you've installed its btrfs-progs package). I'm running Debian, I've just updated the system from squeeze to wheezy (with 3.2 kernel) so I could try btrfs and do other QA testing on wheezy (as it is in the beta phase now) I already had the btrfs-tools package installed, before creating the filesystem. So it appears Debian doesn't have an init script It does have /lib/udev/rules.d/60-btrfs.rules: SUBSYSTEM!=block, GOTO=btrfs_end ACTION!=add|change, GOTO=btrfs_end ENV{ID_FS_TYPE}!=btrfs, GOTO=btrfs_end RUN+=/sbin/modprobe btrfs RUN+=/sbin/btrfs device scan $env{DEVNAME} LABEL=btrfs_end but I'm guessing that isn't any use to my logical volumes that are activated early in the boot sequence? Could I be having this problem because I put my btrfs on logical volumes? Here is the package version I have: # dpkg --list | grep btrfs ii btrfs-tools 0.19+20120328-7 Checksumming Copy on Write Filesystem utilities Here is a more thorough dmesg, since boot, does this suggest the scan was invoked? I remember seeing some message about checking for btrfs filesystems just after selecting the kernel in grub (root is ext3) # dmesg | grep btrfs [ 40.677505] btrfs: setting nodatacow [ 40.677514] btrfs: turning off barriers [17216.145092] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/mapper/vg00-btrfsvol0_0 [17216.145639] btrfs: disk space caching is enabled [17216.146987] btrfs: failed to read the system array on dm-100 [17216.147556] btrfs: open_ctree failed [17310.978518] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 34 /dev/mapper/vg00-btrfsvol0_0 [17310.993882] btrfs: disk space caching is enabled [17598.736657] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1 transid 37 /dev/mapper/vg00-btrfsvol0_0 [17598.750849] btrfs: disk space caching is enabled Then I did btrfsck - it reported no errors, but mounted OK: # btrfsck /dev/mapper/vg00-btrfsvol0_0 [...] The first thing that btrfsck does is to do a device scan. [...] Ok, that is most likely why my next mount attempted succeeded -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: fail to mount after first reboot
On 19/08/12 16:51, Hugo Mills wrote: On Sun, Aug 19, 2012 at 02:33:14PM +, Daniel Pocock wrote: On 19/08/12 14:15, Hugo Mills wrote: On Sun, Aug 19, 2012 at 02:08:17PM +, Daniel Pocock wrote: I created a 1TB RAID1. So far it is just for testing, no important data on there. After a reboot, I tried to mount it again # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0 mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg00-btrfsvol0_0, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so With multi-volume btrfs filesystems, you have to run btrfs dev scan before trying to mount it. Usually, the distribution will do this in the initrd (if you've installed its btrfs-progs package). I'm running Debian, I've just updated the system from squeeze to wheezy (with 3.2 kernel) so I could try btrfs and do other QA testing on wheezy (as it is in the beta phase now) I already had the btrfs-tools package installed, before creating the filesystem. So it appears Debian doesn't have an init script It does have /lib/udev/rules.d/60-btrfs.rules: SUBSYSTEM!=block, GOTO=btrfs_end ACTION!=add|change, GOTO=btrfs_end ENV{ID_FS_TYPE}!=btrfs, GOTO=btrfs_end RUN+=/sbin/modprobe btrfs RUN+=/sbin/btrfs device scan $env{DEVNAME} LABEL=btrfs_end but I'm guessing that isn't any use to my logical volumes that are activated early in the boot sequence? Could I be having this problem because I put my btrfs on logical volumes? Possibly. You may need the Device mapper uevents option in the kernel (CONFIG_DM_UEVENT) to trigger that udev rule when you enable your VG(s). Not sure if it's available/enabled in your kernel. I've created a Debian bug report for the issue: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=685311 Thanks for the quick feedback about this -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html