Re: [PATCH v6 2/2] btrfs: Add zstd support to grub btrfs

2018-11-19 Thread Daniel Kiper
On Mon, Nov 19, 2018 at 03:22:51PM +0100, Daniel Kiper wrote:
> On Thu, Nov 15, 2018 at 02:36:03PM -0800, Nick Terrell wrote:
> > - Adds zstd support to the btrfs module.
> > - Adds a test case for btrfs zstd support.
> > - Changes top_srcdir to srcdir in the btrfs module's lzo include
> >   following comments from Daniel Kiper about the zstd include.
> >
> > Tested on Ubuntu-18.04 with a btrfs /boot partition with and without zstd
> > compression. A test case was also added to the test suite that fails before
> > the patch, and passes after.
> >
> > Signed-off-by: Nick Terrell 
>
> Reviewed-by: Daniel Kiper 
>
> If there are no objections I will apply this patch series in a week or so.

Errr... I have just realized that there are too many spaces at the
beginning of most lines. There should be 2 instead of 4. Please take
a look at the currently existing functions. Anyway, I can fix this before
committing. However, if you could repost the whole patch series that would
be much easier for me.

Daniel


Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.

2018-10-11 Thread Daniel Kiper
On Tue, Oct 09, 2018 at 07:51:01PM +0200, Daniel Kiper wrote:
> On Thu, Sep 27, 2018 at 08:34:56PM +0200, Goffredo Baroncelli wrote:
> > From: Goffredo Baroncelli 
> >
> > Signed-off-by: Goffredo Baroncelli 
> 
> Code LGTM. Though the comment begs improvement. I will send you an updated
> comment for approval shortly.

Below you can find the updated patch. Please check that I have not messed something up.

Daniel

>From ecefb12a10d39bdd09e1d2b8fbbcbdb1b35274f8 Mon Sep 17 00:00:00 2001
From: Goffredo Baroncelli 
Date: Thu, 27 Sep 2018 20:34:56 +0200
Subject: [PATCH 1/1] btrfs: Add support for reading a filesystem with a RAID
 5 or RAID 6 profile.

Signed-off-by: Goffredo Baroncelli 
Signed-off-by: Daniel Kiper 
---
 grub-core/fs/btrfs.c |   73 ++
 1 file changed, 73 insertions(+)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index be19544..933a57d 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
 #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
 #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
 #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
+#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
+#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
   grub_uint8_t dummy2[0xc];
   grub_uint16_t nstripes;
   grub_uint16_t nsubstripes;
@@ -766,6 +768,77 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
  csize = chunk_stripe_length - low;
  break;
}
+ case GRUB_BTRFS_CHUNK_TYPE_RAID5:
+ case GRUB_BTRFS_CHUNK_TYPE_RAID6:
+   {
+ grub_uint64_t nparities, stripe_nr, high, low;
+
+ redundancy = 1;   /* no redundancy for now */
+
+ if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+   {
+ grub_dprintf ("btrfs", "RAID5\n");
+ nparities = 1;
+   }
+ else
+   {
+ grub_dprintf ("btrfs", "RAID6\n");
+ nparities = 2;
+   }
+
+ /*
+  * RAID 6 layout consists of several stripes spread over
+  * the disks, e.g.:
+  *
+  *   Disk_0  Disk_1  Disk_2  Disk_3
+  * A0  B0  P0  Q0
+  * Q1  A1  B1  P1
+  * P2  Q2  A2  B2
+  *
+  * Note: placement of the parities depends on the row number.
+  *
+  * Pay attention that the btrfs terminology may differ from
+  * the terminology used in other RAID implementations, e.g. LVM,
+  * dm or md. The main difference is that btrfs calls a contiguous
+  * block of data on a given disk, e.g. A0, a stripe instead of a chunk.
+  *
+  * The variables listed below have the following meaning:
+  *   - stripe_nr is the stripe number excluding the parities
+  * (A0 = 0, B0 = 1, A1 = 2, B1 = 3, etc.),
+  *   - high is the row number (0 for A0...Q0, 1 for Q1...P1, etc.),
+  *   - stripen is the disk number in a row (0 for A0, Q1, P2,
+  * 1 for B0, A1, Q2, etc.),
+  *   - off is the logical address to read,
+  *   - chunk_stripe_length is the size of a stripe (typically 64 KiB),
+  *   - nstripes is the number of disks in a row,
+  *   - low is the offset of the data inside a stripe,
+  *   - stripe_offset is the data offset in an array,
+  *   - csize is the "potential" data to read; it will be reduced
+  * to size if the latter is smaller,
+  *   - nparities is the number of parities (1 for RAID 5, 2 for
+  * RAID 6); used only in RAID 5/6 code.
+  */
+ stripe_nr = grub_divmod64 (off, chunk_stripe_length, &low);
+
+ /*
+  * stripen is computed without the parities
+  * (0 for A0, A1, A2, 1 for B0, B1, B2, etc.).
+  */
+ high = grub_divmod64 (stripe_nr, nstripes - nparities, &stripen);
+
+ /*
+  * The stripes are spread over the disks. In every row their
+  * positions are shifted by 1 place. So, the real disk numbers
+  * change. Hence, we have to take the current row number modulo
+  * nstripes into account (0 for A0, 1 for A1, 2 for A2, etc.).
+  */
+ grub_divmod64 (high + stripen, nstripes, &stripen);
+
+ stripe_offset = low + chunk_stripe_length * high;
+ csize = chunk_stripe_length - low;
+
+ break;
+   }
  default:
grub_dprintf ("btrfs", "unsupported RAID\n");
return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
-- 
1.7.10.4
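
To make the mapping above concrete, here is a minimal standalone C sketch of
the same address math (this is not GRUB code: grub_divmod64() is replaced by
a local divmod64() helper, and a 4-disk RAID 5 array with 64 KiB stripes is
assumed purely for illustration):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Stand-in for grub_divmod64(): returns n / d and stores n % d in *r. */
static uint64_t divmod64 (uint64_t n, uint64_t d, uint64_t *r)
{
  *r = n % d;
  return n / d;
}

int main (void)
{
  const uint64_t chunk_stripe_length = 65536;   /* 64 KiB stripes */
  const uint64_t nstripes = 4;                  /* disks in a row */
  const uint64_t nparities = 1;                 /* RAID 5 */
  uint64_t off = 3 * 65536 + 4096;              /* logical address to read */
  uint64_t low, high, stripen, stripe_nr;

  /* Offset inside the stripe and the data stripe number. */
  stripe_nr = divmod64 (off, chunk_stripe_length, &low);

  /* Row number; stripen is first computed without the parities. */
  high = divmod64 (stripe_nr, nstripes - nparities, &stripen);

  /* Rotate by the row number to get the real disk number. */
  divmod64 (high + stripen, nstripes, &stripen);

  printf ("row=%" PRIu64 " disk=%" PRIu64 " low=%" PRIu64
          " stripe_offset=%" PRIu64 "\n",
          high, stripen, low, low + chunk_stripe_length * high);
  return 0;
}

It prints "row=1 disk=1 low=4096 stripe_offset=69632": the fourth data stripe
(stripe_nr 3) lives in row 1, and because each row is rotated by one place its
real disk number is 1, not 0.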


Re: [PATCH 9/9] btrfs: Add RAID 6 recovery for a btrfs filesystem.

2018-09-27 Thread Daniel Kiper
On Wed, Sep 26, 2018 at 09:56:07PM +0200, Goffredo Baroncelli wrote:
> On 25/09/2018 21.20, Daniel Kiper wrote:
> > On Wed, Sep 19, 2018 at 08:40:40PM +0200, Goffredo Baroncelli wrote:
> >> From: Goffredo Baroncelli 
> >>
> []
> >>   *  - stripe_offset is the disk offset,
> >>   *  - csize is the "potential" data to read. It will be reduced to
> >>   *size if the latter is smaller.
> >> + *  - parities_pos is the position of the parity inside a row (
> >
> > s/inside/in/
> >> + *    2 for P1, 3 for P2...)
>
> +  *  - nparities is the number of parities (1 for RAID5, 2 for RAID6);
> +  *    used only in RAID5/6 code.
>
> >>   */
> >>  block_nr = grub_divmod64 (off, chunk_stripe_length, );
> >>
> >> @@ -1030,6 +1069,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
> >>   */
> >>  grub_divmod64 (high + stripen, nstripes, );
> >>
> >> +grub_divmod64 (high + nstripes - nparities, nstripes,
> >> +   &parities_pos);
> >
> > I think that this math requires a bit of explanation in the comment
> > before grub_divmod64(). Especially I am interested in why high +
> > nstripes - nparities works as expected.
>
>
> What about
>
> /*
>  * parities_pos is equal to "(high - nparities) % nstripes" (see the diagram above).
>  * However "high - nparities" might be negative (eg when high == 0) leading to an
>  * incorrect computation.
>  * Instead "high + nstripes - nparities" is always positive and in modulo nstripes is
>  * equal to "(high - nparities) % nstripes".
>  */

LGTM.

Daniel
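
For the curious, the equivalence can be checked with a tiny standalone C
program (a 4-disk RAID 6 array, i.e. nstripes = 4 and nparities = 2, is
assumed here purely for illustration):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main (void)
{
  const uint64_t nstripes = 4, nparities = 2;   /* RAID 6 on 4 disks */
  uint64_t high;

  /* (high + nstripes - nparities) never underflows in unsigned math, yet
     modulo nstripes it equals the mathematical (high - nparities) mod
     nstripes. */
  for (high = 0; high < 6; high++)
    printf ("high=%" PRIu64 " -> parities_pos=%" PRIu64 "\n",
            high, (high + nstripes - nparities) % nstripes);
  return 0;
}

For high = 0 it prints parities_pos = 2, matching the layout diagram earlier
in the series where P0 sits on Disk_2.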


Re: [PATCH 7/9] btrfs: Add support for recovery for a RAID 5 btrfs profiles.

2018-09-27 Thread Daniel Kiper
On Wed, Sep 26, 2018 at 09:55:57PM +0200, Goffredo Baroncelli wrote:
> On 25/09/2018 21.10, Daniel Kiper wrote:
> > On Wed, Sep 19, 2018 at 08:40:38PM +0200, Goffredo Baroncelli wrote:
> >> From: Goffredo Baroncelli 
> >>
> >> Add support for recovery for a RAID 5 btrfs profile. In addition
> >> it is added some code as preparatory work for RAID 6 recovery code.
> >>
> >> Signed-off-by: Goffredo Baroncelli 
> >> ---
> >>  grub-core/fs/btrfs.c | 169 +--
> >>  1 file changed, 164 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
> >> index 5c1ebae77..55a7eeffc 100644
> >> --- a/grub-core/fs/btrfs.c
> >> +++ b/grub-core/fs/btrfs.c
> >> @@ -29,6 +29,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>
> >>  GRUB_MOD_LICENSE ("GPLv3+");
> >>
> >> @@ -665,6 +666,148 @@ btrfs_read_from_chunk (struct grub_btrfs_data *data,
> >>  return err;
> >>  }
> >>
> >> +struct raid56_buffer {
> >> +  void *buf;
> >> +  int  data_is_valid;
> >> +};
> >> +
> >> +static void
> >> +rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
> >> + grub_uint64_t nstripes, grub_uint64_t csize)
> >> +{
> >> +  grub_uint64_t i;
> >> +  int first;
> >> +
> >> +  i = 0;
> >> +  while (buffers[i].data_is_valid && i < nstripes)
> >> +++i;
> >
> > for (i = 0; buffers[i].data_is_valid && i < nstripes; i++);
> >
> >> +  if (i == nstripes)
> >> +{
> >> +  grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are 
> >> OK\n");
> >> +  return;
> >> +}
> >> +
> >> +  grub_dprintf ("btrfs", "rebuilding RAID 5 stripe #%" PRIuGRUB_UINT64_T 
> >> "\n",
> >> +  i);
> >
> > One line here please.
> >
> >> +  for (i = 0, first = 1; i < nstripes; i++)
> >> +{
> >> +  if (!buffers[i].data_is_valid)
> >> +  continue;
> >> +
> >> +  if (first) {
> >> +  grub_memcpy(dest, buffers[i].buf, csize);
> >> +  first = 0;
> >> +  } else
> >> +  grub_crypto_xor (dest, dest, buffers[i].buf, csize);
> >> +
> >> +}
> >
> > Hmmm... I think that this function can be simpler. You can drop first
> > while/for and "if (i == nstripes)". Then here:
> >
> > if (first) {
> >   grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
> >
> > Am I right?
>
> Ehm.. no. The "if" is an internal check to avoid a BUG. rebuild_raid5() should
> be called only if some disk is missing.
> To perform this check, the code tests whether all buffers are valid; if they
> are, there is an internal BUG.

Something is wrong here. I think that the code checks if there is an invalid
buffer. If there is none then GRUB complains. Right? However, it looks
like I misread the code and made a mistake here. So, please ignore
this change. Though please replace the while() with a for() at the beginning.

Daniel
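
For readers following along, here is a condensed standalone C sketch of the
reconstruction being discussed (this is not the GRUB code itself:
grub_memcpy()/grub_crypto_xor() are replaced by memcpy() and a plain XOR
loop, and the 3-disk row in main() is an invented example):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

struct raid56_buffer {
  void *buf;
  int data_is_valid;
};

/* XOR together all readable stripes of the row (data and parity alike);
   the result is the content of the single missing stripe. */
static void rebuild_raid5 (uint8_t *dest, struct raid56_buffer *buffers,
                           uint64_t nstripes, uint64_t csize)
{
  uint64_t i, j;
  int first = 1;

  for (i = 0; i < nstripes; i++)
    {
      if (!buffers[i].data_is_valid)
        continue;
      if (first)
        {
          memcpy (dest, buffers[i].buf, csize);
          first = 0;
        }
      else
        for (j = 0; j < csize; j++)
          dest[j] ^= ((uint8_t *) buffers[i].buf)[j];
    }
}

int main (void)
{
  uint8_t a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
  uint8_t p[4], rebuilt[4];
  uint64_t j;
  struct raid56_buffer row[3];

  for (j = 0; j < 4; j++)
    p[j] = a[j] ^ b[j];         /* parity of the row */

  /* Pretend the stripe holding "b" is unreadable and rebuild it. */
  row[0] = (struct raid56_buffer) {a, 1};
  row[1] = (struct raid56_buffer) {b, 0};
  row[2] = (struct raid56_buffer) {p, 1};
  rebuild_raid5 (rebuilt, row, 3, 4);
  printf ("rebuilt: %d %d %d %d\n",
          rebuilt[0], rebuilt[1], rebuilt[2], rebuilt[3]);
  return 0;
}

Prints "rebuilt: 5 6 7 8", i.e. the lost stripe recovered from data XOR parity.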


Re: [PATCH 4/9] btrfs: Avoid a rescan for a device which was already not found.

2018-09-27 Thread Daniel Kiper
On Wed, Sep 26, 2018 at 09:55:54PM +0200, Goffredo Baroncelli wrote:
> On 25/09/2018 19.29, Daniel Kiper wrote:
> > On Wed, Sep 19, 2018 at 08:40:35PM +0200, Goffredo Baroncelli wrote:
> >> From: Goffredo Baroncelli 
> >>
> >> If a device is not found, do not return immediately but
> >> record this failure by storing NULL in data->devices_attached[].
> >
> > Still the same question: Where the store happens in the code?
> > I cannot find it in the patch below. This have to be clarified.
> >
> > Daniel
>
>
> What about the following commit description
> -
> Change the behavior of find_device(): before the patch, a read of a
> missed device might trigger a rescan. However, it is never recorded

s/might/may/

> that a device is missed, so each single read of a missed device might
> trigger a rescan.  It is the caller who decides if a rescan is
> performed in case of a missed device. And it does so quite often, without
> considering if in the past a device was already found as "missed".
> This behavior causes a lot of unneeded rescans, causing a huge slowdown
> in case of a missed device.
>
> After the patch, the "missed device" information is stored in the
> cache (as a NULL value). A rescan is triggered only if no information

What do you mean by "cache"? ctx.dev_found? If yes please use the latter
instead of the former. Or both together if it makes sense.

> at all is found in the cache. This means that only the first time a
> read of a missed device triggers a rescan.
>
> The change in the code is done by removing "return NULL" when the disk is
> not found. So the code which stores in the cache

cache?

> the value returned by grub_device_iterate() is always executed: NULL if the
> device is missed, or valid data otherwise.
> -

Otherwise it is much better than the earlier one.

Daniel
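
For illustration, here is a toy standalone C sketch of the caching behaviour
described in the proposed commit message (this is not the actual GRUB code;
scan_for_device() is a hypothetical stand-in for the expensive
grub_device_iterate() rescan):

#include <stdio.h>
#include <stddef.h>

static int rescans;

/* Hypothetical stand-in for the rescan; device 1 is permanently missing. */
static void *scan_for_device (size_t id)
{
  rescans++;
  return id == 0 ? (void *) "disk0" : NULL;
}

struct dev_slot {
  int probed;   /* has this slot been filled at all? */
  void *dev;    /* device handle, or NULL == "known missing" */
};

static void *find_device (struct dev_slot *cache, size_t id)
{
  if (!cache[id].probed)
    {
      cache[id].dev = scan_for_device (id);     /* may legitimately be NULL */
      cache[id].probed = 1;                     /* record even the miss */
    }
  return cache[id].dev;  /* a cached NULL no longer triggers a rescan */
}

int main (void)
{
  struct dev_slot cache[2] = { {0, NULL}, {0, NULL} };
  size_t i;

  for (i = 0; i < 5; i++)
    find_device (cache, 1);            /* five reads of the missing device... */
  printf ("rescans: %d\n", rescans);   /* ...but only one rescan */
  return 0;
}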


Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.

2018-09-27 Thread Daniel Kiper
On Wed, Sep 26, 2018 at 10:40:32PM +0200, Goffredo Baroncelli wrote:
> On 25/09/2018 17.31, Daniel Kiper wrote:
> > On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote:
> >> From: Goffredo Baroncelli 
> >>
> >> Signed-off-by: Goffredo Baroncelli 
> >> ---
> >>  grub-core/fs/btrfs.c | 66 
> >>  1 file changed, 66 insertions(+)
> >>
> >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
> >> index be195448d..56c42746d 100644
> >> --- a/grub-core/fs/btrfs.c
> >> +++ b/grub-core/fs/btrfs.c
> >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
> >>  #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
 
> >>  #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
> >>  #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
> >>grub_uint8_t dummy2[0xc];
> >>grub_uint16_t nstripes;
> >>grub_uint16_t nsubstripes;
> >> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
> >>  stripe_offset = low + chunk_stripe_length
> >>* high;
> >>  csize = chunk_stripe_length - low;
> >> +break;
> >> +  }
> >> +case GRUB_BTRFS_CHUNK_TYPE_RAID5:
> >> +case GRUB_BTRFS_CHUNK_TYPE_RAID6:
> >> +  {
> >> +grub_uint64_t nparities, block_nr, high, low;
> >> +
> >> +redundancy = 1;   /* no redundancy for now */
> >> +
> >> +if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
> >> +  {
> >> +grub_dprintf ("btrfs", "RAID5\n");
> >> +nparities = 1;
> >> +  }
> >> +else
> >> +  {
> >> +grub_dprintf ("btrfs", "RAID6\n");
> >> +nparities = 2;
> >> +  }
> >> +
> >> +/*
> >> + * A RAID 6 layout consists of several blocks spread on the disks.
> >> + * The raid terminology is used to call all the blocks of a row
> >> + * "stripe". Unfortunately the BTRFS terminology confuses block
> >
> > Stripe is data set or parity (parity stripe) on one disk. Block has
> > different meaning. Please stick to btrfs terminology and say it clearly
> > in the comment. And even add a link to btrfs wiki page to ease reading.
> >
> > I think about this one:
> >   
> > https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
> >
> >> + * and stripe.
> >
> > I do not think so. Or at least not so much...
>
> Trust me, generally speaking stripe is the "row" in the disks (without the 
> parity); looking at the ext3 man page:
>
> 
>stride=stride-size
>   Configure  the  filesystem  for  a  RAID  array with
>   stride-size filesystem blocks. This is the number of
>   blocks  read or written to disk before moving to the
>   next disk, which is sometimes  referred  to  as  the
>   chunk   size.   This  mostly  affects  placement  of
>   filesystem metadata like bitmaps at mke2fs  time  to
>   avoid  placing them on a single disk, which can hurt
>   performance.  It may also be used by the block allocator.
>
>stripe_width=stripe-width
>   Configure  the  filesystem  for  a  RAID  array with
>   stripe-width filesystem blocks per stripe.  This  is
>   typically  stride-size * N, where N is the number of
>   data-bearing disks in the  RAID  (e.g.  for  RAID  5
>   there is one parity disk, so N will be the number of
>   disks in the array minus 1).  This allows the  block
>   allocator to prevent read-modify-write of the parity
>   in a RAID stripe if possible when the data is written.
>
> 
> Looking at the RAID5 wikipedia page, it seems that the term "stripe"
> is coherent with the ext3 man page.

Ugh... It looks that I have messe

Re: [PATCH 3/3] btrfs: Add zstd support to btrfs

2018-09-21 Thread Daniel Kiper
> + grub_ssize_t ret = -1;
> +
> + /* Zstd will fail if it can't fit the entire output in the destination
> +  * buffer, so if osize isn't large enough, allocate a temporary buffer.
> +  */
> + if (otmpsize < ZSTD_BTRFS_MAX_INPUT) {
> + allocated = grub_malloc (ZSTD_BTRFS_MAX_INPUT);
> + if (!allocated) {
> + grub_dprintf ("zstd", "outtmpbuf allocation failed\n");
> + goto out;
> + }
> + otmpbuf = (char*)allocated;
> + otmpsize = ZSTD_BTRFS_MAX_INPUT;
> + }
> +
> + /* Allocate space for, and initialize, the ZSTD_DCtx. */
> + wmem = grub_malloc (wmem_size);
> + if (!wmem) {
> + grub_dprintf ("zstd", "wmem allocation failed\n");
> + goto out;
> + }
> + dctx = ZSTD_initDCtx (wmem, wmem_size);
> +
> + /* Get the real input size, there may be junk at the
> +  * end of the frame.
> +  */
> + isize = ZSTD_findFrameCompressedSize (ibuf, isize);
> + if (ZSTD_isError (isize)) {
> + grub_dprintf ("zstd", "first frame is invalid %d\n",
> + (int)ZSTD_getErrorCode (isize));
> + goto out;
> + }
> +
> + /* Decompress and check for errors */
> + zstd_ret = ZSTD_decompressDCtx (dctx, otmpbuf, otmpsize, ibuf, isize);
> + if (ZSTD_isError (zstd_ret)) {
> + grub_dprintf ("zstd", "zstd failed with  code %d\n",
> + (int)ZSTD_getErrorCode (zstd_ret));
> + goto out;
> + }
> +
> + /* Move the requested data into the obuf.
> +  * obuf may be equal to otmpbuf, which is why grub_memmove() is required.
> +  */
> + grub_memmove (obuf, otmpbuf + off, osize);
> + ret = osize;
> +
> +out:

s/out/err/

> + grub_free (allocated);
> + grub_free (wmem);
> + return ret;
> +}
> +
>  static grub_ssize_t
>  grub_btrfs_lzo_decompress(char *ibuf, grub_size_t isize, grub_off_t off,
> char *obuf, grub_size_t osize)
> @@ -1087,7 +1156,8 @@ grub_btrfs_extent_read (struct grub_btrfs_data *data,
>
>if (data->extent->compression != GRUB_BTRFS_COMPRESSION_NONE
> && data->extent->compression != GRUB_BTRFS_COMPRESSION_ZLIB
> -   && data->extent->compression != GRUB_BTRFS_COMPRESSION_LZO)
> +   && data->extent->compression != GRUB_BTRFS_COMPRESSION_LZO
> +   && data->extent->compression != GRUB_BTRFS_COMPRESSION_ZSTD)
>   {
> grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
> "compression type 0x%x not supported",
> @@ -1127,6 +1197,15 @@ grub_btrfs_extent_read (struct grub_btrfs_data *data,
> != (grub_ssize_t) csize)
>   return -1;
>   }
> +   else if (data->extent->compression == GRUB_BTRFS_COMPRESSION_ZSTD)
> + {
> +   if (grub_btrfs_zstd_decompress(data->extent->inl, data->extsize -
> +((grub_uint8_t *) data->extent->inl
> + - (grub_uint8_t *) data->extent),
> +extoff, buf, csize)
> +   != (grub_ssize_t) csize)
> + return -1;
> + }
> else
>   grub_memcpy (buf, data->extent->inl + extoff, csize);
> break;
> @@ -1164,6 +1243,10 @@ grub_btrfs_extent_read (struct grub_btrfs_data *data,
>   ret = grub_btrfs_lzo_decompress (tmp, zsize, extoff
>   + grub_le_to_cpu64 (data->extent->offset),
>   buf, csize);
> +   else if (data->extent->compression == GRUB_BTRFS_COMPRESSION_ZSTD)
> + ret = grub_btrfs_zstd_decompress (tmp, zsize, extoff
> + + grub_le_to_cpu64 (data->extent->offset),
> + buf, csize);
> else
>   ret = -1;
>
> diff --git a/tests/btrfs_test.in b/tests/btrfs_test.in
> index 2b37ddd33..0c9bf3a68 100644
> --- a/tests/btrfs_test.in
> +++ b/tests/btrfs_test.in
> @@ -18,6 +18,7 @@ fi
>  "@builddir@/grub-fs-tester" btrfs
>  "@builddir@/grub-fs-tester" btrfs_zlib
>  "@builddir@/grub-fs-tester" btrfs_lzo
> +"@builddir@/grub-fs-tester" btrfs_zstd
>  "@builddir@/grub-fs-tester" btrfs_raid0
>  "@builddir@/grub-fs-tester" btrfs_raid1
>  "@builddir@/grub-fs-tester" btrfs_single
> diff --git a/tests/util/grub-fs-tester.in b/tests/util/grub-fs-tester.in
> index ef65fbc93..147d946d2 100644
> --- a/tests/util/grub-fs-tester.in
> +++ b/tests/util/grub-fs-tester.in
> @@ -600,7 +600,7 @@ for LOGSECSIZE in $(range "$MINLOGSECSIZE" "$MAXLOGSECSIZE" 1); do
>   GENERATED=n
>   LODEVICES=
>   MOUNTDEVICE=
> -
> +

Ditto.

Daniel


Re: Bad superblock when mounting rw, ro mount works

2018-06-22 Thread Daniel Underwood
Thanks for the help, I started `check --repair --init-extent-tree`
right around a week ago as a last effort before restoring from backup.
Unfortunately, that command is still running. It does seem to be using
about half of the system's RAM (8 of 16GB) and 100% load on a single
core. Is this type of run time expected for an 8TB drive? The byte
numbers it's referencing seem to be a bit odd to me as they're larger
than the number of bytes on the drive. Here's the head and tail of the
current run (separated by -) if that's indicative of progress:

btrfs unable to find ref byte nr 49673574858752 parent 0 root 1  owner
2 offset 0
btrfs unable to find ref byte nr 49673719529472 parent 0 root 1  owner
1 offset 1
btrfs unable to find ref byte nr 62243448012800 parent 0 root 1  owner
0 offset 1
btrfs unable to find ref byte nr 49673575120896 parent 0 root 1  owner
1 offset 1
btrfs unable to find ref byte nr 49673575251968 parent 0 root 1  owner
0 offset 1
checking extents
ref mismatch on [49218307751936 67108864] extent item 0, found 1
data backref 49218307751936 root 5 owner 1359193 offset 536870912
num_refs 0 not found in extent tree
incorrect local backref count on 49218307751936 root 5 owner 1359193
offset 536870912 found 1 wanted 0 back 0x5583559bd790
backpointer mismatch on [49218307751936 67108864]
-
data backref 49230998138880 root 5 owner 1409678 offset 7408779264
num_refs 0 not found in extent tree
incorrect local backref count on 49230998138880 root 5 owner 1409678
offset 7408779264 found 1 wanted 0 back 0x5583b95f0e20
backpointer mismatch on [49230998138880 16384]
adding new data backref on 49230998138880 root 5 owner 1409678 offset
7408779264 found 1
Repaired extent references for 49230998138880
ref mismatch on [49230998155264 16384] extent item 0, found 1
data backref 49230998155264 root 5 owner 669291 offset 3905650688
num_refs 0 not found in extent tree
incorrect local backref count on 49230998155264 root 5 owner 669291
offset 3905650688 found 1 wanted 0 back 0x5582efb12930
backpointer mismatch on [49230998155264 16384]
adding new data backref on 49230998155264 root 5 owner 669291 offset
3905650688 found 1

Thanks,
Daniel


On Thu, Jun 14, 2018 at 10:43 AM, Qu Wenruo  wrote:
> From the output, especially the lowmem mode output (since original mode
> handles extent tree corruption poorly and aborted), it's your extent tree
> that is corrupted and causing the bug.
>
> Thus, you should be able to mount the fs RO and copy all the data back
> without much hassle.
> Just pay attention to csum errors.
>
> And considering how much extent tree corruption there is, I don't think
> it's a good idea to manually fix the fs.
>
> The last chance is to try --repair --init-extent-tree, if you still want
> to salvage the filesystem.
> The lowmem mode shows no extra bug, thus it's possible for
> --init-extent-tree to re-init the extent tree and save the day.
>
> But personally speaking I'm not fully confident of the operation, thus
> it may fail and you may need to use the backup.
>
> BTW, even if --init-extent-tree succeeds, you may still need to run btrfs
> check again to check if all the bugs are fixed.
> But at least from the lowmem output, the remaining errors are all fixable.
>
> Thanks,
> Qu


Re: Bad superblock when mounting rw, ro mount works

2018-06-14 Thread Daniel Underwood
>> Your very first task right now is to mount ro, and update your
>> backups. Don't do anything else until you've done that. It's a
>> testimony to Btrfs that this file system mounts at all, even ro, so
>> take advantage of this fact before you can't mount it anymore.

Backups are in place for the important parts, though I'd prefer not to
use them if possible.

For btrfs-progs, I am using 4.16.1 installed from
https://github.com/kdave/btrfs-progs.

Regarding em5, it had errors when I initially added it to the array
that were due to a faulty SAS card. The card was replaced and I
haven't seen errors since until this popped up. Regarding what you
said about my dmesg, the numbers are actually still the same (bdev
/dev/mapper/em5 errs: wr 164286, rd 3444262, flush 2110, corrupt 3,
gen 181) after doing a backup and reboot. I would think they would
have changed unless btrfs is just ignoring that disk now. Just to make
sure, I've checked to make sure there weren't any loose connections.
The logs for check (with and without lowmem mode), `btrfs fi us`, and
`smartctl -x` are attached. The checks are only for em5, but I can do
other disks if necessary (it's a 6-disk raid10-style setup). Note that
there are a lot of errors on the SMART info, but they seem to be back
from when I was having hardware issues.

---

On Thu, Jun 7, 2018 at 4:50 PM, Chris Murphy  wrote:
> On Thu, Jun 7, 2018 at 2:38 PM, Chris Murphy  wrote:
>
>
>> Your very first task right now is to mount ro, and update your
>> backups. Don't do anything else until you've done that. It's a
>> testimony to Btrfs that this file system mounts at all, even ro, so
>> take advantage of this fact before you can't mount it anymore.
>
> After you've done the backup, you need to find out why one of these
> devices is being so unreliable. That has to be fixed first. You can
> recreate a new Btrfs or some other file system, and you'll just run
> into the exact same problem down the road. Next, it might be useful to
> see the output from btrfs-progs 4.16.1 'btrfs check' and 'btrfs check
> --mode=lowmem'  both of which are slow, the second one is really slow
> but is a different implementation so it's helpful to see both outputs.
> That's safe as long as you do not use --repair.
>
> Also we need to see the output from 'btrfs fi us ' with
> the volume mounted (ro). Off  hand I think the most likely outcome is
> that you get a backup from the ro mounted file system, and you'll have
> to recreate it from scratch and restore from backups. In other words,
> no matter what you need a current backup.
>
> --
> Chris Murphy



-- 
Daniel Underwood
NCSU Physics 2016
Undergraduate Researcher, Triangle Universities Nuclear Laboratory
(704) 244-0244
daniel.underwoo...@gmail.com
djund...@ncsu.edu


media-usage.log
Description: Binary data


sdh-smart.log
Description: Binary data


em5-check.log
Description: Binary data


em5-check-lowmem.log
Description: Binary data


Bad superblock when mounting rw, ro mount works

2018-06-07 Thread Daniel Underwood
Hi,

I have a raid10-like setup that is failing to mount in rw mode with the error


mount: /mnt/media: wrong fs type, bad option, bad superblock on
/dev/mapper/em1, missing codepage or helper program, or other error

read-only mounts seem to work and the files seem to be there.

I started having issues after a system crash during the process of
deleting a number of large files. After this (Ubuntu 16.04/Kernel
4.4), any attempt to mount the array in rw mode would cause a similar
crash. I did an upgrade to Ubuntu 18.04/Kernel 4.15 and now get the
error above.

I have looked through a variety of posts on the mailing list, but
couldn't find anything with the same issue. I have done a scrub on the
array that resulted in 6 verify errors with dmesg showing something
about extent trees. It didn't list them as uncorrectable errors, but
couldn't correct them either as I can't mount in rw. I also tried
`btrfs rescue zero-log /dev/mapper/em1`, which changes the above to
(say) em5, but then zero-log on em5 causes it to go back to em1.

Any direction would be appreciated. From what I could tell, my next
steps would be a check with --repair or --init-extent-tree, though I'm
reluctant to try those without being explicitly told to do so.

I have attached a dmesg.log file and hope I haven't greped out
anything important.

Thanks,
Daniel
-- 
Daniel Underwood


dmesg.log
Description: Binary data


Re: [PATCH 10/14] vgem: separate errno from VM_FAULT_* values

2018-05-16 Thread Daniel Vetter
On Wed, May 16, 2018 at 07:43:44AM +0200, Christoph Hellwig wrote:
> And streamline the code in vgem_fault with early returns so that it is
> a little bit more readable.
> 
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
>  drivers/gpu/drm/vgem/vgem_drv.c | 51 +++--
>  1 file changed, 23 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vgem/vgem_drv.c b/drivers/gpu/drm/vgem/vgem_drv.c
> index 2524ff116f00..a261e0aab83a 100644
> --- a/drivers/gpu/drm/vgem/vgem_drv.c
> +++ b/drivers/gpu/drm/vgem/vgem_drv.c
> @@ -61,12 +61,13 @@ static void vgem_gem_free_object(struct drm_gem_object *obj)
>   kfree(vgem_obj);
>  }
>  
> -static int vgem_gem_fault(struct vm_fault *vmf)
> +static vm_fault_t vgem_gem_fault(struct vm_fault *vmf)
>  {
>   struct vm_area_struct *vma = vmf->vma;
>   struct drm_vgem_gem_object *obj = vma->vm_private_data;
>   /* We don't use vmf->pgoff since that has the fake offset */
>   unsigned long vaddr = vmf->address;
> + struct page *page;
>   int ret;
>   loff_t num_pages;
>   pgoff_t page_offset;
> @@ -85,35 +86,29 @@ static int vgem_gem_fault(struct vm_fault *vmf)
>   ret = 0;
>   }
>   mutex_unlock(&obj->pages_lock);
> - if (ret) {
> - struct page *page;
> -
> - page = shmem_read_mapping_page(
> - file_inode(obj->base.filp)->i_mapping,
> - page_offset);
> - if (!IS_ERR(page)) {
> - vmf->page = page;
> - ret = 0;
> - } else switch (PTR_ERR(page)) {
> - case -ENOSPC:
> - case -ENOMEM:
> - ret = VM_FAULT_OOM;
> - break;
> - case -EBUSY:
> - ret = VM_FAULT_RETRY;
> - break;
> - case -EFAULT:
> - case -EINVAL:
> - ret = VM_FAULT_SIGBUS;
> - break;
> - default:
> - WARN_ON(PTR_ERR(page));
> - ret = VM_FAULT_SIGBUS;
> - break;
> - }
> + if (!ret)
> + return 0;
> +
> + page = shmem_read_mapping_page(file_inode(obj->base.filp)->i_mapping,
> + page_offset);
> + if (!IS_ERR(page)) {
> + vmf->page = page;
> + return 0;
> + }
>  
> + switch (PTR_ERR(page)) {
> + case -ENOSPC:
> + case -ENOMEM:
> + return VM_FAULT_OOM;
> + case -EBUSY:
> +     return VM_FAULT_RETRY;
> + case -EFAULT:
> + case -EINVAL:
> + return VM_FAULT_SIGBUS;
> + default:
> + WARN_ON(PTR_ERR(page));
> + return VM_FAULT_SIGBUS;
>   }
> - return ret;

Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch>

Want me to merge this through drm-misc or plan to pick it up yourself?
-Daniel

>  }
>  
>  static const struct vm_operations_struct vgem_gem_vm_ops = {
> -- 
> 2.17.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [RFC] Add support for BTRFS raid5/6 to GRUB

2018-04-23 Thread Daniel Kiper
On Tue, Apr 17, 2018 at 09:57:40PM +0200, Goffredo Baroncelli wrote:
> Hi All,
>
> Below you can find a patch to add support for accessing files from
> grub in a RAID5/6 btrfs filesystem. This is a RFC because it is
> missing the support for recovery (i.e. if some devices are missed). In
> the next days (weeks ?) I will extend this patch to support also this
> case.
>
> Comments are welcome.

More or less LGTM. Just a nitpick below... I am happy to take full blown
patch into GRUB if it is ready.

> BR
> G.Baroncelli
>
>
> ---
>
> commit 8c80a1b7c913faf50f95c5c76b4666ed17685666
> Author: Goffredo Baroncelli <kreij...@inwind.it>
> Date:   Tue Apr 17 21:40:31 2018 +0200
>
> Add initial support for btrfs raid5/6 chunk
>
> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
> index be195448d..4c5632acb 100644
> --- a/grub-core/fs/btrfs.c
> +++ b/grub-core/fs/btrfs.c
> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
>  #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
>  #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
>  #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
>grub_uint8_t dummy2[0xc];
>grub_uint16_t nstripes;
>grub_uint16_t nsubstripes;
> @@ -764,6 +766,39 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
> stripe_offset = low + chunk_stripe_length
>   * high;
> csize = chunk_stripe_length - low;
> +   break;
> + }
> +   case GRUB_BTRFS_CHUNK_TYPE_RAID5:
> +   case GRUB_BTRFS_CHUNK_TYPE_RAID6:
> + {
> +   grub_uint64_t nparities;
> +   grub_uint64_t parity_pos;
> +   grub_uint64_t stripe_nr, high;
> +   grub_uint64_t low;
> +
> +   redundancy = 1;   /* no redundancy for now */
> +
> +   if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
> + {
> +   grub_dprintf ("btrfs", "RAID5\n");
> +   nparities = 1;
> + }
> +   else
> + {
> +   grub_dprintf ("btrfs", "RAID6\n");
> +   nparities = 2;
> + }
> +
> +   stripe_nr = grub_divmod64 (off, chunk_stripe_length, &low);
> +
> +   high = grub_divmod64 (stripe_nr, nstripes - nparities, &stripen);
> +   grub_divmod64 (high+nstripes-nparities, nstripes, &parity_pos);
> +   grub_divmod64 (parity_pos+nparities+stripen, nstripes, &stripen);

Missing spaces around "+" and "-".

Daniel


Re: [BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization

2018-03-12 Thread Daniel Borkmann
On 03/12/2018 03:06 PM, Masami Hiramatsu wrote:
> On Mon, 12 Mar 2018 11:44:21 +0100
> Daniel Borkmann <dan...@iogearbox.net> wrote:
>> On 03/12/2018 11:27 AM, Masami Hiramatsu wrote:
>>> On Mon, 12 Mar 2018 19:00:49 +0900
>>> Masami Hiramatsu <mhira...@kernel.org> wrote:
>>>
>>>> Since the kprobe which was optimized by jump can not change
>>>> the execution path, the kprobe for error-injection must not
>>>> be optimized. To prohibit it, set a dummy post-handler as
>>>> officially stated in Documentation/kprobes.txt.
>>>
>>> Note that trace-probe based BPF is not affected, because it
>>> ensures the trace-probe is based on ftrace, which is not
>>> jump optimized.
>>
>> Thanks for the fix! I presume this should go via bpf instead of bpf-next
>> tree since 4b1a29a7f542 ("error-injection: Support fault injection framework")
>> is in Linus' tree as well. Unless there are objections I would rather route
>> it that way so it would be for 4.16.
> 
> Ah, right! It should go into 4.16. It should apply cleanly to either tree
> since there is only the above commit on kernel/fail_function.c :)

Applied to bpf tree, thanks Masami!


Re: [BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization

2018-03-12 Thread Daniel Borkmann
Hi Masami,

On 03/12/2018 11:27 AM, Masami Hiramatsu wrote:
> On Mon, 12 Mar 2018 19:00:49 +0900
> Masami Hiramatsu <mhira...@kernel.org> wrote:
> 
>> Since the kprobe which was optimized by jump can not change
>> the execution path, the kprobe for error-injection must not
>> be optimized. To prohibit it, set a dummy post-handler as
>> officially stated in Documentation/kprobes.txt.
> 
> Note that trace-probe based BPF is not affected, because it
> ensures the trace-probe is based on ftrace, which is not
> jump optimized.

Thanks for the fix! I presume this should go via bpf instead of bpf-next
tree since 4b1a29a7f542 ("error-injection: Support fault injection framework")
is in Linus' tree as well. Unless there are objections I would rather route
it that way so it would be for 4.16.

Thanks,
Daniel

> Thanks,
> 
>>
>> Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework")
>> Signed-off-by: Masami Hiramatsu <mhira...@kernel.org>
>> ---
>>  kernel/fail_function.c |   10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/kernel/fail_function.c b/kernel/fail_function.c
>> index 21b0122cb39c..1d5632d8bbcc 100644
>> --- a/kernel/fail_function.c
>> +++ b/kernel/fail_function.c
>> @@ -14,6 +14,15 @@
>>  
>>  static int fei_kprobe_handler(struct kprobe *kp, struct pt_regs *regs);
>>  
>> +static void fei_post_handler(struct kprobe *kp, struct pt_regs *regs,
>> + unsigned long flags)
>> +{
>> +/*
>> + * A dummy post handler is required to prohibit optimizing, because
>> + * jump optimization does not support execution path overriding.
>> + */
>> +}
>> +
>>  struct fei_attr {
>>  struct list_head list;
>>  struct kprobe kp;
>> @@ -56,6 +65,7 @@ static struct fei_attr *fei_attr_new(const char *sym, unsigned long addr)
>>  return NULL;
>>  }
>>  attr->kp.pre_handler = fei_kprobe_handler;
>> +attr->kp.post_handler = fei_post_handler;
>>  attr->retval = adjust_error_retval(addr, 0);
>>  INIT_LIST_HEAD(&attr->list);
>>  }
>>
> 
> 



Limit on the number of btrfs snapshots?

2018-01-12 Thread Daniel E. Shub
A couple of years ago I asked a question on the Unix and Linux Stack
Exchange about the limit on the number of BTRFS snapshots:
https://unix.stackexchange.com/q/140360/22724

Basically, I want to use something like snapper to take time based
snapshots so that I can browse old versions of my data. This would be
in addition to my current off site backup since a drive failure would
wipe out the data and the snapshots. Is there a limit to the number of
snapshots I can take and store? If I have a million snapshots (e.g., a
snapshot every minute for two years) would that cause havoc, assuming
I have enough disk space for the data, the changed data, and the meta
data?

The answers there provided a link to the wiki:
https://btrfs.wiki.kernel.org/index.php/Btrfs_design#Snapshots_and_Subvolumes
that says: "snapshots are writable, and they can be snapshotted again
any number of times."

While I don't doubt that that is technically true, another user
suggested that the practical limit is around 100 snapshots.

While I am not convinced that having minute-by-minute versions of my
data for two years is helpful (how the hell is anyone going to find
the exact minute they are looking for), if there is no cost then I
figure why not.

I guess what I am asking is: what is the story and where is it documented?


Re: [PATCH v10 1/5] add infrastructure for tagging functions as error injectable

2017-12-20 Thread Daniel Borkmann
On 12/20/2017 08:13 AM, Masami Hiramatsu wrote:
> On Tue, 19 Dec 2017 18:14:17 -0800
> Alexei Starovoitov  wrote:
[...]
>> Please make your suggestion as patches based on top of bpf-next.
> 
> bpf-next seems already pick this series. Would you mean I revert it and
> write new patch?

No, please submit as follow-ups instead, thanks Masami!


Re: [PATCH v10 3/5] bpf: add a bpf_override_function helper

2017-12-18 Thread Daniel Borkmann
On 12/18/2017 10:51 AM, Masami Hiramatsu wrote:
> On Fri, 15 Dec 2017 14:12:54 -0500
> Josef Bacik  wrote:
>> From: Josef Bacik 
>>
>> Error injection is sloppy and very ad-hoc.  BPF could fill this niche
>> perfectly with its kprobe functionality.  We could make sure errors are
>> only triggered in specific call chains that we care about with very
>> specific situations.  Accomplish this with the bpf_override_function
>> helper.  This will modify the probed caller's return value to the
>> specified value and set the PC to an override function that simply
>> returns, bypassing the originally probed function.  This gives us a nice
>> clean way to implement systematic error injection for all of our code
>> paths.
> 
> OK, got it. I think the error_injectable function list should be defined
> in kernel/trace/bpf_trace.c because only bpf calls it and needs to care
> about the "safeness".
> 
> [...]
>> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
>> index 8dc0161cec8f..1ea748d682fd 100644
>> --- a/arch/x86/kernel/kprobes/ftrace.c
>> +++ b/arch/x86/kernel/kprobes/ftrace.c
>> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
>>  p->ainsn.boostable = false;
>>  return 0;
>>  }
>> +
>> +asmlinkage void override_func(void);
>> +asm(
>> +".type override_func, @function\n"
>> +"override_func:\n"
>> +"   ret\n"
>> +".size override_func, .-override_func\n"
>> +);
>> +
>> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
>> +{
>> +regs->ip = (unsigned long)&override_func;
>> +}
>> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
> 
> Calling this "override_function" is meaningless. This is a function
> which just returns. So I think a combination of just_return_func() and
> arch_bpf_override_func_just_return() would be better.
> 
> Moreover, this arch/x86/kernel/kprobes/ftrace.c is an architecture-
> dependent implementation of kprobes, not bpf.

Josef, please work out any necessary cleanups that would still need
to be addressed based on Masami's feedback and send them as follow-up
patches, thanks.

> Hmm, arch/x86/net/bpf_jit_comp.c will be better place?

(No, it's JIT only and I'd really prefer to keep it that way, mixing
 this would result in a huge mess.)
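
For context, this is roughly how the helper ends up being used from a BPF
kprobe program; the sketch below is modeled on the open_ctree() sample
shipped with the series, and the header paths follow the samples/bpf style
of that era (both are assumptions here, not a definitive listing):

#include <uapi/linux/bpf.h>
#include <uapi/linux/ptrace.h>
#include "bpf_helpers.h"	/* samples/bpf helper header (assumed path) */

/* Attached to a kprobe on an error-injectable function,
 * bpf_override_return() rewrites the probed function's return value and
 * redirects execution to the dummy override function, so open_ctree()
 * returns -ENOMEM without its body ever running. */
SEC("kprobe/open_ctree")
int override_open_ctree(struct pt_regs *ctx)
{
	unsigned long rc = -12;	/* -ENOMEM */

	bpf_override_return(ctx, rc);
	return 0;
}

char _license[] SEC("license") = "GPL";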


Re: [PATCH v10 3/5] bpf: add a bpf_override_function helper

2017-12-15 Thread Daniel Borkmann
On 12/15/2017 09:34 PM, Alexei Starovoitov wrote:
[...]
> Also how big is the v9-v10 change?
> Maybe do it as a separate patch, since the previous set is already sitting
> in bpf-next and there are patches on top?

+1


Re: [PATCH v8 0/5] Add the ability to do BPF directed error injection

2017-12-08 Thread Daniel Borkmann
On 12/08/2017 09:24 PM, Josef Bacik wrote:
> On Fri, Dec 08, 2017 at 04:35:44PM +0100, Daniel Borkmann wrote:
>> On 12/06/2017 05:12 PM, Josef Bacik wrote:
>>> Jon noticed that I had a typo in my _ASM_KPROBE_ERROR_INJECT macro.  I went to
>>> figure out why the compiler didn't catch it and it's because it was not used
>>> anywhere.  I had copied it from the trace blacklist code without understanding
>>> where it was used as cscope didn't find the original macro I was looking for, so
>>> I assumed it was some voodoo and left it in place.  Turns out cscope failed me
>>> and I didn't need the macro at all, the trace blacklist thing I was looking at
>>> was for marking assembly functions as blacklisted and I have no intention of
>>> marking assembly functions as error injectable at the moment.
>>>
>>> v7->v8:
>>> - removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.
>>
>> The series doesn't apply cleanly to the bpf-next tree, so one last respin with
>> a rebase would unfortunately still be required, thanks!
> 
> I've rebased and let it sit in my git tree to make sure kbuild test bot didn't
> blow up, can you pull from
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git bpf-override-return
> 
> or do you want me to repost the whole series?  Thanks,

Yeah, the patches would need to end up on netdev, so once kbuild bot went
through fine after your rebase, please send the series.

Thanks,
Daniel


Re: [PATCH v7 1/5] add infrastructure for tagging functions as error injectable

2017-11-29 Thread Daniel Borkmann
On 11/28/2017 09:02 PM, Josef Bacik wrote:
> On Tue, Nov 28, 2017 at 11:58:41AM -0700, Jonathan Corbet wrote:
>> On Wed, 22 Nov 2017 16:23:30 -0500
>> Josef Bacik <jo...@toxicpanda.com> wrote:
>>> From: Josef Bacik <jba...@fb.com>
>>>
>>> Using BPF we can override kprob'ed functions and return arbitrary
>>> values.  Obviously this can be a bit unsafe, so make this feature opt-in
>>> for functions.  Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
>>> order to give BPF access to that function for error injection purposes.
>>>
>>> Signed-off-by: Josef Bacik <jba...@fb.com>
>>> Acked-by: Ingo Molnar <mi...@kernel.org>
>>> ---
>>>  arch/x86/include/asm/asm.h|   6 ++
>>>  include/asm-generic/vmlinux.lds.h |  10 +++
>>>  include/linux/bpf.h   |  11 +++
>>>  include/linux/kprobes.h   |   1 +
>>>  include/linux/module.h|   5 ++
>>>  kernel/kprobes.c  | 163 ++
>>>  kernel/module.c   |   6 +-
>>>  7 files changed, 201 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
>>> index b0dc91f4bedc..340f4cc43255 100644
>>> --- a/arch/x86/include/asm/asm.h
>>> +++ b/arch/x86/include/asm/asm.h
>>> @@ -85,6 +85,12 @@
>>> _ASM_PTR (entry);   \
>>> .popsection
>>>  
>>> +# define _ASM_KPROBE_ERROR_INJECT(entry)   \
>>> +   .pushsection "_kprobe_error_inject_list","aw" ;     \
>>> +   _ASM_ALIGN ;\
>>> +   _ASM_PTR (entry);   \
>>> +   .popseciton
>>
>> So this stuff is not my area of greatest expertise, but I do have to wonder
>> how ".popseciton" can work ... ?
> 
> Well fuck, do you want me to send an incremental patch, Daniel/Alexei, or resend
> this patch fixed?  Thanks,

Sorry for the late reply, please rebase + respin the whole series with
this fixed. There were also a few typos in the cover letter / commit
messages that would be good to get fixed along the way.

Also, could you debug why this wasn't caught at compile/runtime during
testing?

Thanks a lot,
Daniel


Re: [PATCH v7 0/4] Add the ability to do BPF directed error injection

2017-11-28 Thread Daniel Borkmann
On 11/22/2017 10:23 PM, Josef Bacik wrote:
> This is hopefully the final version, I've addressed the comment by Ingo and
> added his Acks.
> 
> v6->v7:
> - moved the opt-in macro to bpf.h out of kprobes.h.
> 
> v5->v6:
> - add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this
>   feature.  This way only functions that opt-in will be allowed to be
>   overridden.
> - added a btrfs patch to allow error injection for open_ctree() so that the bpf
>   sample actually works.
> 
> v4->v5:
> - disallow kprobe_override programs from being put in the prog map array so we
>   don't tail call into something we didn't check.  This allows us to make the
>   normal path still fast without a bunch of percpu operations.
> 
> v3->v4:
> - fix a build error found by kbuild test bot (I didn't wait long enough
>   apparently.)
> - Added a warning message as per Daniel's suggestion.
> 
> v2->v3:
> - added a ->kprobe_override flag to bpf_prog.
> - added some sanity checks to disallow attaching bpf progs that have
>   ->kprobe_override set that aren't for ftrace kprobes.
> - added the trace_kprobe_ftrace helper to check if the trace_event_call is a
>   ftrace kprobe.
> - renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
>   value in the kprobe path, and thus only write to it if we're overriding or
>   clearing the override.
> 
> v1->v2:
> - moved things around to make sure that bpf_override_return could really only be
>   used for an ftrace kprobe.
> - killed the special return values from trace_call_bpf.
> - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
>   it was being called from an ftrace kprobe context.
> - reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
> - updated the test as per Alexei's review.
> 
> - Original message -
> 
> A lot of our error paths are not well tested because we have no good way of
> injecting errors generically.  Some subystems (block, memory) have ways to
> inject errors, but they are random so it's hard to get reproduceable results.
> 
> With BPF we can add determinism to our error injection.  We can use kprobes and
> other things to verify we are injecting errors at the exact case we are trying
> to test.  This patch gives us the tool to actual do the error injection part.
> It is very simple, we just set the return value of the pt_regs we're given to
> whatever we provide, and then override the PC with a dummy function that simply
> returns.
> 
> Right now this only works on x86, but it would be simple enough to expand to
> other architectures.  Thanks,

Ok, given the remaining feedback from Ingo was addressed and therefore
the series acked, I've applied it to bpf-next tree, thanks Josef.


FAQ / encryption / error handling?

2017-11-27 Thread Daniel Pocock

Hi all,

The FAQ has a couple of sections on encryption (general and dm-crypt)

One thing that isn't explained there: if you create multiple encrypted
volumes (e.g. using dm-crypt) and use Btrfs to combine them into RAID1,
how does error recovery work when a read operation returns corrupted data?

Without encryption, reading from one disk would give a checksum mismatch
and Btrfs would read from the other disk to (hopefully) get a good copy
of the data.

With this encryption scenario, the failure would potentially be detected
in the decryption layer code and instead of returning bad data to Btrfs,
it would return some error code.  In that case, will Btrfs attempt to
read from the other volume and allow the application to proceed as if
nothing was wrong?

Regards,

Daniel



Re: [PATCH v7 3/5] bpf: add a bpf_override_function helper

2017-11-24 Thread Daniel Borkmann
On 11/22/2017 10:23 PM, Josef Bacik wrote:
> From: Josef Bacik <jba...@fb.com>
> 
> Error injection is sloppy and very ad-hoc.  BPF could fill this niche
> perfectly with its kprobe functionality.  We could make sure errors are
> only triggered in specific call chains that we care about with very
> specific situations.  Accomplish this with the bpf_override_function
> helper.  This will modify the probed caller's return value to the
> specified value and set the PC to an override function that simply
> returns, bypassing the originally probed function.  This gives us a nice
> clean way to implement systematic error injection for all of our code
> paths.
> 
> Acked-by: Alexei Starovoitov <a...@kernel.org>
> Acked-by: Ingo Molnar <mi...@kernel.org>
> Signed-off-by: Josef Bacik <jba...@fb.com>

Series looks good to me as well; BPF bits:

Acked-by: Daniel Borkmann <dan...@iogearbox.net>


Re: [Nouveau] [PATCH 03/10] driver:gpu: return -ENOMEM on allocation failure.

2017-10-12 Thread Daniel Vetter
On Wed, Sep 13, 2017 at 01:02:12PM +0530, Allen Pais wrote:
> Signed-off-by: Allen Pais <allen.l...@gmail.com>

Applied to drm-misc-next, thanks.
-Daniel

> ---
>  drivers/gpu/drm/gma500/mid_bios.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/gma500/mid_bios.c b/drivers/gpu/drm/gma500/mid_bios.c
> index d75ecb3..1fa1633 100644
> --- a/drivers/gpu/drm/gma500/mid_bios.c
> +++ b/drivers/gpu/drm/gma500/mid_bios.c
> @@ -237,7 +237,7 @@ static int mid_get_vbt_data_r10(struct drm_psb_private *dev_priv, u32 addr)
>  
>   gct = kmalloc(sizeof(*gct) * vbt.panel_count, GFP_KERNEL);
>   if (!gct)
> - return -1;
> + return -ENOMEM;
>  
>   gct_virtual = ioremap(addr + sizeof(vbt),
>   sizeof(*gct) * vbt.panel_count);
> -- 
> 2.7.4
> 
> ___
> Nouveau mailing list
> nouv...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: Chunk root problem

2017-07-09 Thread Daniel Brady
On 7/7/2017 1:06 AM, Daniel Brady wrote:
> On 7/6/2017 11:48 PM, Roman Mamedov wrote:
>> On Wed, 5 Jul 2017 22:10:35 -0600
>> Daniel Brady <drbr...@gmail.com> wrote:
>>
>>> parent transid verify failed
>>
>> Typically in Btrfs terms this means "you're screwed", fsck will not fix it, and
>> nobody will know how to fix or what is the cause either. Time to restore from
>> backups! Or look into "btrfs restore" if you don't have any.
>>
>> In your case it's especially puzzling as the difference in transid numbers is
>> really significant (about 100K), almost like the FS was operating for months
>> without updating some parts of itself -- and no checksum errors either, so
>> all looks correct, except that everything is horribly wrong.
>>
>> This kind of error seems to occur more often in RAID setups, either Btrfs
>> native RAID, or with Btrfs on top of other RAID setups -- i.e. where it
>> becomes a complex issue that all writes to multi devices DO complete IN order,
>> in case of an unclean shutdown. (which is much simpler on a single device FS).
>>
>> Also one of your disks or cables is failing (was /dev/sde on that boot, but may
>> get a different index next boot), check SMART data for it and replace.
>>
>>> [   21.230919] BTRFS info (device sdf): bdev /dev/sde errs: wr 402545, rd
>>> 234683174, flush 194501, corrupt 0, gen 0
>>
> 
> Well that's not good news. Unfortunately I made a fatal error in not
> having a backup. Restore looks like I could recover a good chunk of it
> from the dry runs, however it has a lot of trouble reading many files.
> I'm sure that is related to the one disk (sde). Drives were set up as raid56.
> 
> After updating the kernel as suggested in the email from Duncan, the
> "parent transid verify" errors were reduced to just one, and the errs
> on sde still exist.
> 
> [   21.400190] BTRFS info (device sdb): use no compression
> [   21.400191] BTRFS info (device sdb): disk space caching is enabled
> [   21.400192] BTRFS info (device sdb): has skinny extents
> [   21.584923] BTRFS info (device sdb): bdev /dev/sde errs: wr 402545,
> rd 234683174, flush 194501, corrupt 0, gen 0
> [   23.394788] BTRFS error (device sdb): parent transid verify failed on
> 5257838690304 wanted 591492 found 489231
> [   23.416489] BTRFS error (device sdb): parent transid verify failed on
> 5257838690304 wanted 591492 found 489231
> [   23.416524] BTRFS error (device sdb): failed to read block groups: -5
> [   23.448478] BTRFS error (device sdb): open_ctree failed
> 
> I ran a SMART test as you suggested, with a passing result. I also
> swapped SATA cables & power with another drive, and the error followed
> the drive, confirmed by its serial number via SMART. It seems like it just can't
> read from that one drive for whatever reason. I also tried disconnecting
> the drive and mounting degraded, with no luck. Still had the
> transid error, just with null as the bdev.
> 
> smartctl -a /dev/sde
> smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.0-1.el7.elrepo.x86_64]
> (local build)
> Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Family: Western Digital Red (AF)
> Device Model: WDC WD30EFRX-68EUZN0
> Serial Number:WD-WCC4N0PEYTEV
> LU WWN Device Id: 5 0014ee 2b7dbfe54
> Firmware Version: 82.00A82
> User Capacity:3,000,592,982,016 bytes [3.00 TB]
> Sector Sizes: 512 bytes logical, 4096 bytes physical
> Rotation Rate:5400 rpm
> Device is:In smartctl database [for details use: -P show]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
> Local Time is:Fri Jul  7 00:30:10 2017 MDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x00) Offline data collection activity
> was never started.
> Auto Offline Data Collection:
> Disabled.
> Self-test execution status:  (   0) The previous self-test routine
> completed
> without error or no self-test
> has ever
> been run.
> Total time to complete Offline
> data collection:(40500) seconds.
> Offline data collection
> capabilities:(0x7b) SMART execute Offline i

Re: Chunk root problem

2017-07-07 Thread Daniel Brady
On 7/6/2017 11:48 PM, Roman Mamedov wrote:
> On Wed, 5 Jul 2017 22:10:35 -0600
> Daniel Brady <drbr...@gmail.com> wrote:
> 
>> parent transid verify failed
> 
> Typically in Btrfs terms this means "you're screwed", fsck will not fix it, 
> and
> nobody will know how to fix or what is the cause either. Time to restore from
> backups! Or look into "btrfs restore" if you don't have any.
> 
> In your case it's especially puzzling as the difference in transid numbers is
> really significant (about 100K), almost like the FS was operating for months
> without updating some parts of itself -- and no checksum errors either, so
> all looks correct, except that everything is horribly wrong.
> 
> This kind of error seems to occur more often in RAID setups, either Btrfs
> native RAID, or with Btrfs on top of other RAID setups -- i.e. where it
> becomes a complex issue that all writes to multi devices DO complete IN order,
> in case of an unclean shutdown. (which is much simpler on a single device FS).
> 
> Also one of your disks or cables is failing (was /dev/sde on that boot, but 
> may
> get a different index next boot), check SMART data for it and replace.
> 
>> [   21.230919] BTRFS info (device sdf): bdev /dev/sde errs: wr 402545, rd
>> 234683174, flush 194501, corrupt 0, gen 0
> 

Well that's not good news. Unfortunately I made a fatal error in not
having a backup. Restore looks like I could recover a good chunk of it
from the dry runs; however, it has a lot of trouble reading many files.
I'm sure that is related to the one disk (sde). The drives were set up as raid56.

After updating the kernel as suggested in the email from Duncan it
reduced the "parent transid verify" errors down to just one and the errs
on sde still exist.

[   21.400190] BTRFS info (device sdb): use no compression
[   21.400191] BTRFS info (device sdb): disk space caching is enabled
[   21.400192] BTRFS info (device sdb): has skinny extents
[   21.584923] BTRFS info (device sdb): bdev /dev/sde errs: wr 402545,
rd 234683174, flush 194501, corrupt 0, gen 0
[   23.394788] BTRFS error (device sdb): parent transid verify failed on
5257838690304 wanted 591492 found 489231
[   23.416489] BTRFS error (device sdb): parent transid verify failed on
5257838690304 wanted 591492 found 489231
[   23.416524] BTRFS error (device sdb): failed to read block groups: -5
[   23.448478] BTRFS error (device sdb): open_ctree failed

I ran a SMART test as you suggested, with a passing result. I also
swapped SATA cables & power with another drive, and the error followed
the drive, confirmed by its serial number via SMART. It seems like it just can't
read from that one drive for whatever reason. I also tried disconnecting
the drive and mounting degraded, with no luck. Still had the
transid error, just with null as the bdev.

smartctl -a /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.12.0-1.el7.elrepo.x86_64]
(local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD30EFRX-68EUZN0
Serial Number:WD-WCC4N0PEYTEV
LU WWN Device Id: 5 0014ee 2b7dbfe54
Firmware Version: 82.00A82
User Capacity:3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate:5400 rpm
Device is:In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:Fri Jul  7 00:30:10 2017 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection:
Disabled.
Self-test execution status:  (   0) The previous self-test routine
completed
without error or no self-test
has ever
been run.
Total time to complete Offline
data collection:(40500) seconds.
Offline data collection
capabilities:(0x7b) SMART execute Offline immediate.
Auto Offline data collection
on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.

Re: Chunk root problem

2017-07-07 Thread Daniel Brady
On 7/6/2017 2:26 AM, Duncan wrote:
> Daniel Brady posted on Wed, 05 Jul 2017 22:10:35 -0600 as excerpted:
>
>> My system suddenly decided it did not want to mount my BTRFS setup. I
>> recently rebooted the computer. When it came back, the file system was
>> in read only mode. I gave it another boot, but now it does not want to
>> mount at all. Anything I can do to recover? This is a Rockstor setup
>> that I have had running for about a year.
>>
>> uname -a
>> Linux hobonas 4.10.6-1.el7.elrepo.x86_64 #1 SMP Sun Mar 26
>> 12:19:32 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
>>
>> btrfs --version
>> btrfs-progs v4.10.1
>
> FWIW, open ctree failed is the btrfs-generic error, but the transid
> faileds may provide some help.
>
> Addressing the easy answer first...
>
> What btrfs raid mode was it configured for?  If raid56, you want the
> brand new 4.12 kernel at least, as there were serious bugs in previous
> kernels' raid56 mode.  DO NOT ATTEMPT A FIX OF RAID56 MODE WITH AN
> EARLIER KERNEL, IT'S VERY LIKELY TO ONLY CAUSE FURTHER DAMAGE!  But if
> you're lucky, kernel 4.12 can auto-repair it.
>
> With those fixes the known bugs are fixed, but we'll need to wait a
> few
> cycles to see what the reports are.  Even then, however, due to the
> infamous parity-raid write hole and the fact that the parity isn't
> checksummed, it's not going to be as stable as raid1 or raid10 mode.
> Parity-checksumming will take a new implementation and I'm not sure if
> anyone's actually working on that or not.  But at least until we see
> how
> stable the newer raid56 code is, 2-4 kernel cycles, it's not
> recommended
> except for testing only, with even more backups than normal.
>
> If you were raid1 or raid10 mode, the raid mode is stable so it's a
> different issue.  I'll let the experts take it from here.  Single or
> raid0 mode would of course be similar, but without the protection of
> the
> second copy, making it less resilient.

The raid mode was configured for raid56... unfortunately. I learned of
the potential instability after it died. I have not attempted to repair
it yet because of the possible corruption. I've only tried various ways
of mounting it and dry runs of the restore function.
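
For reference, the shape of those dry runs is roughly (a sketch; the
destination is a hypothetical directory on a different, healthy filesystem):

# -D = dry run (list candidates only), -i = ignore errors, -v = verbose
btrfs restore -D -i -v /dev/sdb /mnt/recovery
# then the real pass once the listing looks sane:
btrfs restore -i -v /dev/sdb /mnt/recovery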

I did as you mentioned and upgraded to kernel 4.12. The auto-repair
seemed to fix quite a few things, but it is not quite there. Even with a
few reboots.

uname -r
4.12.0-1.el7.elrepo.x86_64

rpm -qa | grep btrfs
btrfs-progs-4.10.1-0.rockstor.x86_64

dmesg
[   21.400190] BTRFS info (device sdb): use no compression
[   21.400191] BTRFS info (device sdb): disk space caching is enabled
[   21.400192] BTRFS info (device sdb): has skinny extents
[   21.584923] BTRFS info (device sdb): bdev /dev/sde errs: wr 402545,
rd 234683174, flush 194501, corrupt 0, gen 0
[   23.394788] BTRFS error (device sdb): parent transid verify failed on
5257838690304 wanted 591492 found 489231
[   23.416489] BTRFS error (device sdb): parent transid verify failed on
5257838690304 wanted 591492 found 489231
[   23.416524] BTRFS error (device sdb): failed to read block groups: -5
[   23.448478] BTRFS error (device sdb): open_ctree failed

-Dan


Chunk root problem

2017-07-05 Thread Daniel Brady
Hello,

My system suddenly decided it did not want to mount my BTRFS setup. I
recently rebooted the computer. When it came back, the file system was
in read only mode. I gave it another boot, but now it does not want to
mount at all. Anything I can do to recover? This is a Rockstor setup
that I have had running for about a year.

uname -a
Linux hobonas 4.10.6-1.el7.elrepo.x86_64 #1 SMP Sun Mar 26 12:19:32
EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version
btrfs-progs v4.10.1

btrfs fi show
Label: 'rockstor_rockstor'  uuid: 33e2af57-c30a-468a-9ed5-22994780f6b4
Total devices 1 FS bytes used 5.50GiB
devid1 size 215.39GiB used 80.02GiB path /dev/sda3

Label: 'Nexus'  uuid: 1c3595a9-3faa-4973-affc-ee8d14d922bf
Total devices 5 FS bytes used 3.93TiB
devid1 size 2.73TiB used 1.12TiB path /dev/sdd
devid2 size 2.73TiB used 1.12TiB path /dev/sdb
devid3 size 2.73TiB used 1.12TiB path /dev/sdc
devid4 size 2.73TiB used 1.12TiB path /dev/sdf
devid5 size 2.73TiB used 1.12TiB path /dev/sde

dmesg
[   18.572846] BTRFS: device label Nexus devid 2 transid 595679 /dev/sdb
[   18.572933] BTRFS: device label Nexus devid 3 transid 595679 /dev/sdc
[   18.573027] BTRFS: device label Nexus devid 1 transid 595679 /dev/sdd
[   18.573119] BTRFS: device label Nexus devid 5 transid 595679 /dev/sde
[   18.573200] BTRFS: device label Nexus devid 4 transid 595679 /dev/sdf
[   20.846060] device-mapper: uevent: version 1.0.3
[   20.846114] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23)
initialised: dm-de...@redhat.com
[   21.073884] BTRFS info (device sdf): use no compression
[   21.073886] BTRFS info (device sdf): disk space caching is enabled
[   21.073887] BTRFS info (device sdf): has skinny extents
[   21.084353] BTRFS error (device sdf): parent transid verify failed
on 8419247390720 wanted 542466 found 485869
[   21.230919] BTRFS info (device sdf): bdev /dev/sde errs: wr 402545,
rd 234683174, flush 194501, corrupt 0, gen 0
[   21.794749] BTRFS error (device sdf): parent transid verify failed
on 893915128 wanted 594920 found 490791
[   21.841317] BTRFS error (device sdf): parent transid verify failed
on 8939187814400 wanted 594923 found 490824
[   21.870392] BTRFS error (device sdf): parent transid verify failed
on 8418984427520 wanted 594877 found 490575
[   21.951901] BTRFS error (device sdf): parent transid verify failed
on 8939107860480 wanted 594915 found 465207
[   22.015789] BTRFS error (device sdf): parent transid verify failed
on 8939284430848 wanted 594958 found 465274
[   22.034840] BTRFS error (device sdf): parent transid verify failed
on 8418907701248 wanted 594869 found 351596
[   22.070516] BTRFS error (device sdf): parent transid verify failed
on 8939032035328 wanted 594899 found 465175
[   22.091734] BTRFS error (device sdf): parent transid verify failed
on 8939123818496 wanted 594917 found 490777
[   22.110531] BTRFS error (device sdf): parent transid verify failed
on 8939121917952 wanted 594917 found 490775
[   23.393973] BTRFS error (device sdf): failed to read block groups: -5
[   23.419807] BTRFS error (device sdf): open_ctree failed

mount -t btrfs -o recovery,ro /dev/sdb /mnt2/Nexus
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

Thanks,
Dan


could not do orphan cleanup -22, btrfsck using 100% CPU, no activity

2017-01-14 Thread Daniel Pocock

I had a system that experienced a kernel panic and after rebooting, one
of the btrfs filesystems doesn't mount on the first attempt

The filesystem does mount if I run the mount command manually in the
emergency shell

The following messages appear in the kernel log:


BTRFS critical (device sdc1): corrupt leaf, bad key order:
block=790251626496,root=1, slot=46
BTRFS error (device sdc1): Error removing orphan entry, stopping orphan
cleanup
BTRFS error (device sdc1): could not do orphan cleanup -22



There is a particular file that is now inaccessible.  It is not
important, but any attempt to access it gives an IO error.  When I look
at it with 'ls', it shows lots of question marks:

$ ls broken-file.txt
? ??filename



I was able to copy all files off the filesystem, except that one, using
rsync, so I just created a new filesystem and started using that instead.

However, I kept a copy of the broken filesystem for troubleshooting

I tried running

# btrfsck /dev/sdc1

and a lot of output appears:

checking extents
bad key ordering 46 47
bad block 790251626496
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
bad key ordering 46 47
root 367 inode 474635 errors 2000, link count wrong
unresolved ref dir 20136875 index 5 namelen 6 name foobar
filetype 0 errors 3, no dir item, no dir index
  .. many errors like that


root 367 inode 19964842 errors 400, nbytes wrong

root 367 inode 19964855 errors 2001, no inode item, link count wrong
unresolved ref dir 11208629 index 100627 namelen 6 name Tb8Qlf
filetype 1 errors 4, no inode ref
  ... and many like that 

Checking filesystem on /dev/sdc1
UUID:
found 92684783616 bytes used err is 1
total csum bytes: 88527744
total tree bytes: 439730176
total fs tree bytes: 174620672
total extent tree bytes: 127827968
btree space waste bytes: 131661981
file data blocks allocated: 6149394432
 referenced 2896289792



Then I tried btrfsck --repair while monitoring with top and iostat

I notice that there is read activity for about a minute, then btrfsck sits
there for a long time (over 30 minutes) using 100% CPU and a constant
4.4% of RAM, with no more disk activity

If I enable the btrfsck progress indicator, the animation appears, but
still no disk activity
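
A quick way to see where the silently spinning process is burning CPU
(a sketch; the process may be named btrfsck or btrfs depending on how it
was invoked):

perf top -p "$(pidof btrfsck)"
# or summarize its syscalls for ten seconds:
timeout 10 strace -c -f -p "$(pidof btrfsck)"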

I saw previous discussions about the "could not do orphan cleanup -22"
messages and it is not clear if this is something that needs to be fixed.

The kernel is 4.8.0-2-amd64 and I tried both btrfs-progs v4.7.3 (Debian)
and v4.9 (compiled myself)

Regards,

Daniel




Re: Huge load on btrfs subvolume delete

2016-08-15 Thread Daniel Caillibaud
On 15/08/16 at 10:16, "Austin S. Hemmelgarn" <ahferro...@gmail.com> wrote:
ASH> With respect to databases, you might consider backing them up separately 
ASH> too.  In many cases for something like an SQL database, it's a lot more 
ASH> flexible to have a dump of the database as a backup than it is to have 
ASH> the database files themselves, because it decouples it from the 
ASH> filesystem level layout.

With mysql|mariadb, getting a consistent dump requires locking tables
during the dump, which is not acceptable on production servers.

Even with specialised tools for hot dumps, doing the dump on prod servers
is too heavy on I/O (I have huge DBs; writing the dump is expensive and slow).

I used to have a slave just for the dump (easy to stop the slave, dump, and
start the slave again), but after a while it wasn't able to keep up with the
writes all day long (prod was on SSD and it wasn't; the dump HDD was 100%
busy all day long), so for me it's really easier to rsync the raw files
once a day to a cheap host before dumping.

(of course, I need to flush & lock tables while the snapshot is taken, before
the rsync, but that's just one or two seconds, which is still acceptable)
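
The whole sequence is roughly (a sketch; paths and the backup host are
hypothetical, and the FLUSH/UNLOCK statements are issued from a separate
mysql session held open around the snapshot):

# mysql session: FLUSH TABLES WITH READ LOCK;   (keep the session open)
btrfs subvolume snapshot -r /srv/mysql /srv/mysql-snap
# mysql session: UNLOCK TABLES;
rsync -a /srv/mysql-snap/ cheaphost:/backup/mysql/
btrfs subvolume delete /srv/mysql-snap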

-- 
Daniel


Re: Huge load on btrfs subvolume delete

2016-08-15 Thread Daniel Caillibaud
On 15/08/16 at 08:32, "Austin S. Hemmelgarn" <ahferro...@gmail.com> wrote:

ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote:
ASH> > I'm a newbie with btrfs, and I have a problem with high load after each btrfs
subvolume delete
[…]

ASH> Before I start explaining possible solutions, it helps to explain what's 
ASH> actually happening here.
[…]

Thanks a lot for these clear and detailed explanations.

ASH> > Is there a better way to do so ?

ASH> While there isn't any way I know of to do so, there are ways you can 
ASH> reduce the impact by reducing how much your backing up:

Thanks for these clues too !

I'll use --commit-after, in order to wait for the deletion to complete
before starting to rsync the next snapshot, and I'll keep in mind the
benefit of putting /var/log outside the main subvolume of the VM (but I
guess my main problem is with databases, because their datadirs are the
ones with the most writes).
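
Concretely, the per-snapshot loop then becomes something like (a sketch;
the backup destination is hypothetical):

btrfs subvolume snapshot -r "$subvol" "$snap"
rsync -a "$snap/" backuphost:/snapshots/
# block until the deletion's transaction is committed, so the cleaner
# is not still churning when the next snapshot's rsync starts
btrfs subvolume delete --commit-after "$snap"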

-- 
Daniel


Huge load on btrfs subvolume delete

2016-08-15 Thread Daniel Caillibaud
Hi,

I'm a newbie with btrfs, and I have a problem with high load after each btrfs
subvolume delete

I use snapshots on lxc hosts under debian jessie with
- kernel 4.6.0-0.bpo.1-amd64
- btrfs-progs 4.6.1-1~bpo8

For backup, I have each day, for each subvolume

btrfs subvolume snapshot -r $subvol $snap
# then later
ionice -c3 btrfs subvolume delete $snap

but ionice doesn't seem to have any effect here; after a few minutes the
load grows quite high (30~40), and I don't know how to make this deletion
nicer on I/O

Is there a better way to do so ?

Is it a bad idea to set ionice -c3 on the btrfs-transacti process, which
seems to be the one doing a lot of I/O?

Actually, the I/O priorities on my btrfs processes are:

ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}'
  [btrfs-worker] none: prio 4
   [btrfs-worker-hi] none: prio 4
[btrfs-delalloc] none: prio 4
   [btrfs-flush_del] none: prio 4
   [btrfs-cache] none: prio 4
  [btrfs-submit] none: prio 4
   [btrfs-fixup] none: prio 4
   [btrfs-endio] none: prio 4
   [btrfs-endio-met] none: prio 4
   [btrfs-endio-met] none: prio 4
   [btrfs-endio-rai] none: prio 4
   [btrfs-endio-rep] none: prio 4
 [btrfs-rmw] none: prio 4
   [btrfs-endio-wri] none: prio 4
   [btrfs-freespace] none: prio 4
   [btrfs-delayed-m] none: prio 4
   [btrfs-readahead] none: prio 4
   [btrfs-qgroup-re] none: prio 4
   [btrfs-extent-re] none: prio 4
 [btrfs-cleaner] none: prio 0
   [btrfs-transacti] none: prio 0



Thanks

-- 
Daniel


Re: attempt to mount after crash during rebalance hard crashes server

2016-03-30 Thread Warren, Daniel
Sorry, I had about 3.5MB of xterm buffer, including my test to see if
I would get a panic with the old kernel I had left in grub - I grabbed
the wrong panic.

Running 4.4.6 (which deb packages as 4.4.0 for some reason - I was
confused), I am able to capture this on a mount attempt before my ssh
connection fails:

Mar 30 09:51:38 ds4-ls0 kernel: [67178.590745] BTRFS info (device
dm-45): disk space caching is enabled
Mar 30 09:51:38 ds4-ls0 systemd[1]: systemd-udevd.service: Got
notification message from PID 338 (WATCHDOG=1)
Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: seq 3514 queued, 'add' 'bdi'
Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: Validate module index
Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: Check if link
configuration needs reloading.
Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: seq 3514 forked new worker [7411]
Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: seq 3514 running
Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: passed device to netlink
monitor 0x55c10d5c79b0
Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: seq 3514 processed
Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: cleanup idle workers
Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: Unload module index
Mar 30 09:51:38 ds4-ls0 systemd-udevd[7411]: Unloaded link
configuration context.
Mar 30 09:51:38 ds4-ls0 systemd-udevd[338]: worker [7411] exited
Mar 30 09:51:38 ds4-ls0 kernel: [67178.841517] BTRFS info (device
dm-45): bdev /dev/dm-31 errs: wr 13870290, rd 9, flush 2798850,
corrupt 0, gen 0
Mar 30 09:52:09 ds4-ls0 kernel: [67207.430391] BUG: unable to handle
kernel NULL pointer dereference at 01f0
Mar 30 09:52:09 ds4-ls0 kernel: [67207.477511] IP:
[] can_overcommit+0x1e/0xf0 [btrfs]
Mar 30 09:52:09 ds4-ls0 kernel: [67207.516215] PGD 0
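
One way to keep collecting messages right up to the crash is to stream the
kernel log off the box while mounting (a sketch; the log host is hypothetical):

dmesg --follow | ssh loghost 'cat >> mount-crash.log' &
mount /dev/dm-45 /mnt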


I ran check last night - the output is about 23MB - don't know if that
is useful, or where to look.

I only posted at the recommendation of someone in IRC, in hopes to be
helpful, as a kernel panic seems an extreme result of a corrupted FS.

This machine is an off-site copy of a file archive; I need to either
fix or recreate it to maintain redundancy, but the uptime
requirements are basically zero.

The old kernel is the result of this machine being built when it was
and then basically left as a black box.

If poking at this is not of use to anybody I'll just run check
--repair and see what I get.

Daniel Warren
Unix System Admin,Compliance Infrastructure Architect, ITServices
MCMC LLC


On Tue, Mar 29, 2016 at 6:55 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Warren, Daniel posted on Tue, 29 Mar 2016 16:21:28 -0400 as excerpted:
>
>> I'm running 4.4.0 from deb sid
>
> Correction.
>
> According to the kernel panic you posted at...
>
> http://pastebin.com/aBF6XmzA
>
> ... you're running kernel 3.16.something.
>
> You might be running btrfs-progs userspace 4.4.0, but on mounted
> filesystems it's the kernel code that counts, not the userspace code.
>
> Btrfs is still stabilizing, and kernel 3.16 is ancient history.  On this
> list we're forward focused and track mainline.  If your distro supports
> btrfs on that old a kernel, that's their business, but we don't track
> what patches they may or may not have backported and thus can't really
> support it here very well, so in that case, you really should be looking
> to your distro for that support, as they know what they've backported and
> what they haven't, and are thus in a far better position to provide that
> support.
>
> On this list, meanwhile, we recommend one of two kernel tracks, both
> mainline, current or LTS.  On current we recommend and provide the best
> support for the latest two kernel series.  With 4.5 out that's 4.5 and
> 4.4.
>
> On the LTS track, the former position was similar, the latest two LTS
> kernel series, with 4.4 being the latest and 4.1 the previous one.
> However, as btrfs has matured, now the second LTS series back, 3.18,
> wasn't bad, and while we still really recommend the last couple LTS
> series, we do recognize that some people will still be on 3.18 and we
> still do our best to support them as well.
>
> But before 3.18, and on non-mainline-LTS kernels more than two back, so
> currently 4.4, while we'll still do the best we can, unless it's a known
> issue recognizable on sight, very often that best is simply to ask that
> people upgrade to something reasonably current and report back with their
> results then, if the problem remains.
>
> As for btrfs-progs userspace, during normal operations, most of the time
> the userspace code simply calls the appropriate kernel functionality to
> do the real work, so userspace version isn't as important.  Mkfs.btrfs is
> an exception, and of course once the filesystem is having issues and
> you're using btrfs check or btrfs restore, along with other tools, to try
> to diagnose and fix the problem or at least to recover 

attempt to mount after crash during rebalance hard crashes server

2016-03-29 Thread Warren, Daniel
Greetings all,

I'm running 4.4.0 from deb sid

My server crashed during a balance after I had added 10 disks to the
original 15. I have not been able to bring the FS up since; it causes
a system crash.

btrfs fi sh looks fine, but when I mount, it crashes the server with
a NULL pointer dereference error. Each disk in the set is LUKS encrypted.

btrfs fi sh http://pastebin.com/QLTqSU8L
kernel panic http://pastebin.com/aBF6XmzA

If it's of any use I can run tests before I attempt check --repair

I can let this sit a day or two if any data gathering would be of use.


Daniel Warren
Unix System Admin,Compliance Infrastructure Architect, ITServices
MCMC LLC


More memory more jitters?

2015-11-14 Thread CHENG Yuk-Pong, Daniel
Hi List,


I have read the Gotcha[1] page:

   Files with a lot of random writes can become heavily fragmented
(1+ extents) causing thrashing on HDDs and excessive multi-second
spikes of CPU load on systems with an SSD or **large amount of RAM**.

Why could a large amount of memory worsen the problem?

If **too much** memory is a problem, is it possible to limit the
memory btrfs uses?

Background info:

I am running a write-heavy database server with 96GB of RAM. In the worst
case it causes multiple minutes of high CPU load. Systemd keeps killing
and restarting services, and old jobs don't die because they are stuck in
uninterruptible wait... etc.

Tried with nodatacow, but it seems to only affect new files. It is not a
subvolume option either...
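
Two knobs that may help here, as a sketch (the datadir path is hypothetical):
the NOCOW attribute, which newly created files inherit from their directory,
and the global dirty-writeback limits, which cap how much dirty data piles
up in RAM before writeback:

# files created under this directory from now on get NOCOW
chattr +C /var/lib/mysql

# not btrfs-specific: force earlier, smaller writeback batches
sysctl -w vm.dirty_background_bytes=$((256*1024*1024))
sysctl -w vm.dirty_bytes=$((1024*1024*1024))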


Regards,
Daniel


[1] https://btrfs.wiki.kernel.org/index.php/Gotchas#Fragmentation


btrfs progs 4.1.1 & 4.2 segfault on chunk-recover

2015-09-17 Thread Daniel Wiegert
Hello guys

I think I might have found a bug. Lots of text - I don't know what you want
from me and what not, so I'll try to get almost everything in one mail; please
don't shoot me! :)

To make a long story somewhat short, this is about what happened to me
(skip ahead to the chunk-recover part if you don't care about history):

Arch-linux, btrfs-progs 4.1.1 & 4.2, linux 4.1.6-1

Data, RAID5: total=3.11TiB, used=0.00B <-- this one said the other day
used=3.05TiB
System, RAID1: total=32.00MiB, used=0.00B
Metadata, RAID1: total=8.00GiB, used=144.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B

Label: 'Isolinear'  uuid: 9bb3f369-f2a9-46be-8dde-1106ae740e36
Total devices 9 FS bytes used 144.00KiB
devid7 size 2.73TiB used 541.12GiB path /dev/sdi
devid9 size 1.36TiB used 533.09GiB path /dev/sdd2
devid   10 size 1.36TiB used 533.09GiB path /dev/sdg2
devid   11 size 1.82TiB used 536.12GiB path /dev/sdj2
devid   12 size 1.82TiB used 538.09GiB path /dev/sdh2
devid   13 size 286.09GiB used 286.09GiB path /dev/sda3
devid   14 size 286.09GiB used 286.09GiB path /dev/sdb3
devid   15 size 372.61GiB used 372.61GiB path /dev/sdf1
*** Some devices missing

drive 8 was a 1.36TiB
drive 15 is the new drive I added to the system.


* One of the 8 drives started to fail; SMART saw the error, but I had
failed in my configuration and didn't get notified - it ran for 3-14 days
before I realized.
* On the actively running system I tried btrfs dev del /dev/sd[failing] -
did not work (I think it was csum errors).
* I added one new disk to the raid, rebooted and added the new disk to the
array, then tried balancing. Power failed and the UPS failed after x hours.
* I rebooted and realized the failing drive was now dead. I could mount the
system with degraded, and some files gave me a kernel panic
( https://goo.gl/photos/UXrZj6YEUW3945b37 ) - others were reading fine.
Was unable to dev del missing.

At this point I knew the system was probably broken beyond repair, so
I just tried all the commands I could think of: check --repair, check
--init-csum-tree etc., in an endless loop - first very fast text scrolling,
lots of CPU, not much disk IO; after ~48h the text was slow, lots of CPU,
almost no disk IO, with the same type of message repeating (with new numbers):
-
ref mismatch on [17959857729536 4096] extent item 0, found 1
adding new data backref on 17959857729536 parent 35277570539520 owner
0 offset 0 found 1
Backref 17959857729536 parent 35277570539520 owner 0 offset 0 num_refs
0 not found in extent tree
Incorrect local backref count on 17959857729536 parent 35277570539520
owner 0 offset 0 found 1 wanted 0 back 0x145f7800
backpointer mismatch on [17959857729536 4096]
ref mismatch on [17959857733632 4096] extent item 0, found 1
adding new data backref on 17959857733632 parent 35277570785280 owner
0 offset 0 found 1
Backref 17959857733632 parent 35277570785280 owner 0 offset 0 num_refs
0 not found in extent tree
Incorrect local backref count on 17959857733632 parent 35277570785280
owner 0 offset 0 found 1 wanted 0 back 0x145f7b90
backpointer mismatch on [17959857733632 4096]
-

Found out that chunk-recover gave a segfault (with 4.1.1 & kdave 4.2).
4.1.1 said in its backtrace:
#0  0x004251bb in btrfs_new_device_extent_record ()
#1  0x004301cb in ?? ()
#2  0x0043085d in ?? ()
#3  0x7fd8071074a4 in start_thread () from /usr/lib/libpthread.so.0
#4  0x7fd806e4513d in clone () from /usr/lib/libc.so.6

Not much help, but I compiled https://github.com/kdave/btrfs-progs
and got a backtrace:

-->  http://pastebin.com/XqRrqAB5

I can repeat the segfault. I made two btrfs-image dumps; one is around 4MB,
the other around 300MB I think.

So, did I find a bug? I can't find my logs from the beginning of my
drive failing - what it said when I tried to remove the broken drive. I
might be able to try the setup again (got one more
drive-about-to-fail).



PS: I've tried to make alpine work, but it won't accept my passwords. I
hope the gmail web client is OK for you guys; the openwrt dev team rejected
my posts just because of this email client.

best regards
Daniel

end


Re: (renamed thread) btrfs metrics, free space reporting

2015-08-30 Thread Daniel Pocock


On 05/01/12 11:09, Daniel Pocock wrote:
 

 From there on, one could potentially create a matrix: (proportional
 font art, apologies):

   | subvol1  | subvol2  | subvol3  |
 --+--+--+--+
  subvol1  |   200M   | 20M  | 50M  |
 --+--+--+--+
  subvol2  |20M   |350M  | 22M  |
 --+--+--+--+
  subvol3  |50M   | 22M  |634M  |
 --+--+--+--+

 The diagonal obviously shows the unique blocks, subvol2 and subvol1
 share 20M data, etc. Missing from this plot would be how much is
 shared between subvol1, subvol2, and subvol3 together, but it's a
 start and not something that hard to understand. One might add a
 column for total size of each subvol, which may obviously not be an
 addition of the rest of the columns in this diagram.

 Anyway, something like this would be high on my list of `df` numbers
 I'd like to see - since I think they are useful numbers.

 
 This is an interesting way to look at it
 
 Ganglia typically records time series data, it is quite conceivable to
 create a metric for every permutation in each and store that in rrdtool
 
 The challenge would then be in reporting on the data: the rrdtool graphs
 use time as an X-axis, and then it can display multiple Y values
 
 However, now that I've started thinking about the type of data generated
 from btrfs, I was wondering if some kind of rr3dtool is needed - a 3D
 graphing solution - or potentially making graphs that do not include
 time on any axis?
 
 Has anyone seen anything similar for administering ZFS, for example?
 


I just wanted to follow up on this and see if anybody had any more
comments or if the situation has changed?

One other thing that came to mind for me is the idea of letting the
local system administrator define views (similar to views in SQL) and
also nominate which of the views should be used to return values for the
standard df command.

This would allow existing monitoring tools and scripts to continue
getting some data that is considered sensible for a specific context.
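
As a rough approximation of the diagonal of that matrix, qgroups can already
expose per-subvolume exclusive vs. referenced totals today (a sketch; the
mount point is hypothetical):

btrfs quota enable /mnt
# once the rescan completes, 'excl' approximates unique blocks and
# 'rfer' the total referenced by each subvolume:
btrfs qgroup show /mnt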


disk failure but no alert

2015-08-19 Thread Daniel Pocock


There are two large disks; part of each disk is partitioned for MD RAID1
and the rest for BtrFs RAID1.

One of the disks (/dev/sdd) appears to have failed; there were plenty of
alerts from MD (including dmesg and emails) but nothing from the BtrFs
filesystem.

Could this just be a problem on a sector within the MD RAID1 partition
(/dev/sdd2) or is BtrFs failing to alert?  If there is a failure on
another partition on the same disk, should BtrFs be notified by the
kernel in some way and should it consider the filesystem to be at risk?

Should I do anything proactively to stop BtrFs using the /dev/sdd3
partition now?  Unfortunately it is not possible to get a new disk to
this server in the same day and it may just be shut down until the disk
can be replaced.
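
One thing worth checking either way is BtrFs's own per-device error counters,
which are independent of MD (a sketch; substitute the real mount point):

btrfs device stats /mnt
# non-zero write_io_errs/read_io_errs/flush_io_errs against /dev/sdd3
# would confirm btrfs itself has seen failures on that device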

# uname -a
Linux - 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04)
x86_64 GNU/Linux

# btrfs fi show /dev/sdd3
Label: none  uuid: -
Total devices 2 FS bytes used 1.74TiB
devid1 size 4.55TiB used 1.75TiB path /dev/sdd3
devid2 size 4.55TiB used 1.75TiB path /dev/sda3

Btrfs v3.17


Here is the dmesg output:

[996932.734999] sd 0:0:3:0: [sdd] 
[996932.735039] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[996932.735047] sd 0:0:3:0: [sdd] 
[996932.735053] Sense Key : Illegal Request [current]
[996932.735062] Info fld=0x80808
[996932.735069] sd 0:0:3:0: [sdd] 
[996932.735078] Add. Sense: Logical block address out of range
[996932.735085] sd 0:0:3:0: [sdd] CDB:
[996932.735089] Write(16): 8a 00 00 00 00 00 00 08 08 08 00 00 00 02 00 00
[996932.735110] end_request: critical target error, dev sdd, sector 526344
[996932.735280] md: super_written gets error=-121, uptodate=0
[996932.735290] md/raid1:md2: Disk failure on sdd2, disabling device.
md/raid1:md2: Operation continuing on 1 devices.
[996932.777853] RAID1 conf printout:
[996932.777917]  --- wd:1 rd:2
[996932.777925]  disk 0, wo:0, o:1, dev:sda2
[996932.777931]  disk 1, wo:1, o:0, dev:sdd2
[996932.794052] RAID1 conf printout:
[996932.794063]  --- wd:1 rd:2
[996932.794069]  disk 0, wo:0, o:1, dev:sda2




Documentation: filesystems: btrfs: Fixed typos and whitespace

2015-07-08 Thread Daniel Grimshaw
I am a high school student trying to become familiar with Linux kernel 
development. The btrfs documentation in Documentation/filesystems had a few 
typos and errors in whitespace. This patch corrects both of these.

Signed-off-by: Daniel Grimshaw grims...@linux.vnet.ibm.com

---
diff --git a/Documentation/filesystems/btrfs.txt 
b/Documentation/filesystems/btrfs.txt
index d11cc2f..57d9d54 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -61,7 +61,7 @@ Options with (*) are default options and will not show in the 
mount options.
 
 check_int enables the integrity checker module, which examines all
 block write requests to ensure on-disk consistency, at a large
-memory and CPU cost. 
+memory and CPU cost.
 
 check_int_data includes extent data in the integrity checks, and
 implies the check_int option.
@@ -113,7 +113,7 @@ Options with (*) are default options and will not show in 
the mount options.
 Disable/enable debugging option to be more verbose in some ENOSPC 
conditions.
 
  fatal_errors=<action>
-Action to take when encountering a fatal error:
+Action to take when encountering a fatal error:
   bug - BUG() on a fatal error.  This is the default.
   panic - panic() on a fatal error.
 
@@ -132,10 +132,10 @@ Options with (*) are default options and will not show in 
the mount options.
 
  max_inline=<bytes>
 Specify the maximum amount of space, in bytes, that can be inlined in
-a metadata B-tree leaf.  The value is specified in bytes, optionally
+a metadata B-tree leaf.  The value is specified in bytes, optionally
 with a K, M, or G suffix, case insensitive.  In practice, this value
 is limited by the root sector size, with some space unavailable due
-to leaf headers.  For a 4k sectorsize, max inline data is ~3900 bytes.
+to leaf headers.  For a 4k sector size, max inline data is ~3900 bytes.
 
  metadata_ratio=<value>
 Specify that 1 metadata chunk should be allocated after every value
@@ -161,7 +161,7 @@ Options with (*) are default options and will not show in 
the mount options.
 
   datasum(*)
   nodatasum
-Enable/disable data checksumming for newly created files.
+Enable/disable data check-summing for newly created files.
 Datasum implies datacow.
 
   treelog(*)
@@ -170,7 +170,7 @@ Options with (*) are default options and will not show in 
the mount options.
 
   recovery
 Enable autorecovery attempts if a bad tree root is found at mount time.
-Currently this scans a list of several previous tree roots and tries to
+Currently this scans a list of several previous tree roots and tries to
 use the first readable.
 
   rescan_uuid_tree
@@ -194,7 +194,7 @@ Options with (*) are default options and will not show in 
the mount options.
   ssd_spread
 Options to control ssd allocation schemes.  By default, BTRFS will
 enable or disable ssd allocation heuristics depending on whether a
-rotational or nonrotational disk is in use.  The ssd and nossd options
+rotational or non-rotational disk is in use.  The ssd and nossd options
 can override this autodetection.
 
 The ssd_spread mount option attempts to allocate into big chunks
@@ -216,13 +216,13 @@ Options with (*) are default options and will not show in 
the mount options.
 This allows mounting of subvolumes which are not in the root of the mounted
 filesystem.
 You can use btrfs subvolume show  to see the object ID for a subvolume.
-   
+
  thread_pool=<number>
 The number of worker threads to allocate.  The default number is equal
 to the number of CPUs + 2, or 8, whichever is smaller.
 
   user_subvol_rm_allowed
-Allow subvolumes to be deleted by a non-root user. Use with caution.
+Allow subvolumes to be deleted by a non-root user. Use with caution.
 
 MAILING LIST
 



[PATCH] Documentation: filesystems: btrfs: Fixed typos and whitespace

2015-07-08 Thread Daniel Grimshaw
I am a high school student trying to become familiar with
Linux kernel development. The btrfs documentation in
Documentation/filesystems had a few typos and errors in
whitespace. This patch corrects both of these.

This is a resend of an earlier patch with corrected patchfile.

Signed-off-by: Daniel Grimshaw grims...@linux.vnet.ibm.com
---
 Documentation/filesystems/btrfs.txt |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt 
b/Documentation/filesystems/btrfs.txt
index d11cc2f..c772b47 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -61,7 +61,7 @@ Options with (*) are default options and will not show in the 
mount options.
 
check_int enables the integrity checker module, which examines all
block write requests to ensure on-disk consistency, at a large
-   memory and CPU cost.  
+   memory and CPU cost.
 
check_int_data includes extent data in the integrity checks, and
implies the check_int option.
@@ -113,7 +113,7 @@ Options with (*) are default options and will not show in 
the mount options.
Disable/enable debugging option to be more verbose in some ENOSPC 
conditions.
 
   fatal_errors=<action>
-   Action to take when encountering a fatal error: 
+   Action to take when encountering a fatal error:
  bug - BUG() on a fatal error.  This is the default.
  panic - panic() on a fatal error.
 
@@ -132,10 +132,10 @@ Options with (*) are default options and will not show in 
the mount options.
 
   max_inline=<bytes>
Specify the maximum amount of space, in bytes, that can be inlined in
-   a metadata B-tree leaf.  The value is specified in bytes, optionally 
+   a metadata B-tree leaf.  The value is specified in bytes, optionally
with a K, M, or G suffix, case insensitive.  In practice, this value
is limited by the root sector size, with some space unavailable due
-   to leaf headers.  For a 4k sectorsize, max inline data is ~3900 bytes.
+   to leaf headers.  For a 4k sector size, max inline data is ~3900 bytes.
 
   metadata_ratio=<value>
Specify that 1 metadata chunk should be allocated after every value
@@ -170,7 +170,7 @@ Options with (*) are default options and will not show in 
the mount options.
 
   recovery
Enable autorecovery attempts if a bad tree root is found at mount time.
-   Currently this scans a list of several previous tree roots and tries to 
+   Currently this scans a list of several previous tree roots and tries to
use the first readable.
 
   rescan_uuid_tree
@@ -194,7 +194,7 @@ Options with (*) are default options and will not show in 
the mount options.
   ssd_spread
Options to control ssd allocation schemes.  By default, BTRFS will
enable or disable ssd allocation heuristics depending on whether a
-   rotational or nonrotational disk is in use.  The ssd and nossd options
+   rotational or non-rotational disk is in use.  The ssd and nossd options
can override this autodetection.
 
The ssd_spread mount option attempts to allocate into big chunks
@@ -216,13 +216,13 @@ Options with (*) are default options and will not show in 
the mount options.
This allows mounting of subvolumes which are not in the root of the 
mounted
filesystem.
You can use btrfs subvolume show  to see the object ID for a 
subvolume.
-   
+
   thread_pool=<number>
The number of worker threads to allocate.  The default number is equal
to the number of CPUs + 2, or 8, whichever is smaller.
 
   user_subvol_rm_allowed
-   Allow subvolumes to be deleted by a non-root user. Use with caution. 
+   Allow subvolumes to be deleted by a non-root user. Use with caution.
 
 MAILING LIST
 
-- 
1.7.1



[WIP][PATCH] tux3: preliminary nospace handling

2015-05-21 Thread Daniel Phillips
Hi Josef,

This is a rollup patch for preliminary nospace handling in Tux3, in 
line with my post here:

   http://lkml.iu.edu/hypermail/linux/kernel/1505.1/03167.html

You still have ENOSPC issues. Maybe it would be helpful to look at 
what we have done. I saw a reproducible case with 1,000 tasks in 
parallel last week that went nospace while 28% full. You also are not
giving a very good picture of the true full state via df.

Our algorithm is pretty simple, reliable and fast. I do not see any 
reason why Btrfs could not do it basically the same way. In one way it 
is easier for you - you are not forced to commit the entire delta, you 
can choose the bits you want to force to disk as convenient. You have 
more different kinds of cache objects to account, but that should be 
just detail. Your current frontend accounting looks plausible.

We're trying something a bit different with df, to see how it flies - 
we don't always return the same number to f_blocks, we actually return 
the volume size less the accounting reserve, which is variable. The 
reserve gets smaller as freespace gets smaller, so it is not a nasty 
surprise to the user to see it change, rather a pleasant surprise. What 
it does is make the 100% really be 100%, less just a handful of blocks, 
and it makes used and available add up exactly to blocks. If the 
user wants to know how many blocks they really have, they can look at 
/proc/partitions.

Regards,

Daniel

diff --git a/fs/tux3/commit.c b/fs/tux3/commit.c
index 909a222..7043580 100644
--- a/fs/tux3/commit.c
+++ b/fs/tux3/commit.c
@@ -297,6 +297,7 @@ static int commit_delta(struct sb *sb)
 	tux3_wake_delta_commit(sb);
 
 	/* Commit was finished, apply defered bfree. */
+	sb->defreed = 0;
 	return unstash(sb, &sb->defree, apply_defered_bfree);
 }
 
@@ -321,13 +322,13 @@ static int need_unify(struct sb *sb)
 /* For debugging */
 void tux3_start_backend(struct sb *sb)
 {
-	assert(current->journal_info == NULL);
+	assert(!change_active());
 	current->journal_info = sb;
 }
 
 void tux3_end_backend(void)
 {
-	assert(current->journal_info);
+	assert(change_active());
 	current->journal_info = NULL;
 }
 
@@ -337,12 +338,103 @@ int tux3_under_backend(struct sb *sb)
 	return current->journal_info == sb;
 }
 
+/* Internal use only */
+static struct delta_ref *to_delta_ref(struct sb *sb, unsigned delta)
+{
+	return &sb->delta_refs[tux3_delta(delta)];
+}
+
+static block_t newfree(struct sb *sb)
+{
+	return sb->freeblocks + sb->defreed;
+}
+
+/*
+ * Reserve size should vary with budget. The reserve can include the
+ * log block overhead on the assumption that every block in the budget
+ * is a data block that generates one log record (or two?).
+ */
+block_t set_budget(struct sb *sb)
+{
+	block_t reserve = sb->freeblocks >> 7; /* FIXME: magic number */
+
+	if (1) {
+		if (reserve > max_reserve_blocks)
+			reserve = max_reserve_blocks;
+		if (reserve < min_reserve_blocks)
+			reserve = min_reserve_blocks;
+	} else if (0)
+		reserve = 10;
+
+	block_t budget = newfree(sb) - reserve;
+	if (1)
+		tux3_msg(sb, "set_budget: free %Li, budget %Li, reserve %Li", newfree(sb), budget, reserve);
+	sb->reserve = reserve;
+	atomic_set(&sb->budget, budget);
+	return reserve;
+}
+
+/*
+ * After transition, the front delta may have used some of the balance
+ * left over from this delta. The charged amount of the back delta is
+ * now stable and gives the exact balance at transition by subtracting
+ * from the old budget. The difference between the new budget and the
+ * balance at transition, which must never be negative, is added to
+ * the current balance, so the effect is exactly the same as if we had
+ * set the new budget and balance atomically at transition time. But
+ * we do not know the new balance at transition time and even if we
+ * did, we would need to add serialization against frontend changes,
+ * which are currently lockless and would like to stay that way. So we 
+ * let the current delta charge against the remaining balance until
+ * flush is done, here, then adjust the balance to what it would have
+ * been if the budget had been reset exactly at transition.
+ *
+ * We have:
+ *
+ *consumed = oldfree - free
+ *oldbudget = oldfree - reserve
+ *newbudget = free - reserve
+ *transition_balance = oldbudget - charged
+ * 
+ * Factoring out the reserve, the balance adjustment is:
+ * 
+ *adjust = newbudget - transition_balance
+ *   = (free - reserve) - ((oldfree - reserve) - charged)
+ *   = free + (charged - oldfree)
+ *   = charged + (free - oldfree)
+ *   = charged - consumed
+ *
+ * To extend for variable reserve size, add the difference between
+ * old and new reserve size to the balance adjustment.
+ */
+void reset_balance(struct sb *sb, unsigned delta

Re: Tux3 Report: How fast can we fsync?

2015-04-30 Thread Daniel Phillips
On 04/30/2015 04:14 AM, Filipe Manana wrote:
 
 On 04/30/2015 11:28 AM, Daniel Phillips wrote:
 It looks like Btrfs hit a bug, not a huge surprise. Btrfs hit an assert
 for me earlier this evening. It is rare but it happens.
 
 Hi Daniel,
 
 Would you mind reporting (to linux-btrfs@vger.kernel.org) the
 bug/assertion you hit during your tests with btrfs?

Kernel 3.19.0 under KVM with BTRFS mounted on a file in /tmp, see
the KVM command below. I believe I was running the 10,000 task test
using the sync program below: syncs foo 10 10000.

346 [ cut here ]
347 kernel BUG at fs/btrfs/extent_io.c:4548!
348 invalid opcode:  [#1] PREEMPT SMP
349 Modules linked in:
350 CPU: 2 PID: 5754 Comm: sync6 Not tainted 3.19.0-56544-g65cf1a5 #756
351 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 
01/01/2011
352 task: ec3c0ea0 ti: ec3ea000 task.ti: ec3ea000
353 EIP: 0060:[c1301a30] EFLAGS: 00010202 CPU: 2
354 EIP is at btrfs_release_extent_buffer_page+0xf0/0x100
355 EAX: 0001 EBX: f47198f0 ECX:  EDX: 0001
356 ESI: f47198f0 EDI: f61f1808 EBP: ec3ebbac ESP: ec3ebb9c
357  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
358 CR0: 8005003b CR2: b756a356 CR3: 2c3ce000 CR4: 06d0
359 Stack:
360  0005 f47198f0 f61f1000 f61f1808 ec3ebbc0 c1301a7f f47198f0 
361  f6a3d940 ec3ebbcc c1301ee5 d9a6c770 ec3ebbdc c12b436d fff92000 da136b20
362  ec3ebc74 c12e42b6 0c00   1000  
363 Call Trace:
364  [c1301a7f] release_extent_buffer+0x3f/0xb0
365  [c1301ee5] free_extent_buffer+0x45/0x80
366  [c12b436d] btrfs_release_path+0x2d/0x90
367  [c12e42b6] cow_file_range_inline+0x466/0x600
368  [c12e495e] cow_file_range+0x50e/0x640
369  [c12fdde1] ? find_lock_delalloc_range.constprop.42+0x2e1/0x320
370  [c12e5af9] run_delalloc_range+0x419/0x450
371  [c12fdf6b] writepage_delalloc.isra.32+0x14b/0x1d0
372  [c12ff20e] __extent_writepage+0xde/0x2b0
373  [c11208fd] ? find_get_pages_tag+0xad/0x120
374  [c130135c] extent_writepages+0x29c/0x350
375  [c12e1530] ? btrfs_direct_IO+0x300/0x300
376  [c12e009f] btrfs_writepages+0x1f/0x30
377  [c11299e5] do_writepages+0x15/0x40
378  [c112199f] __filemap_fdatawrite_range+0x4f/0x60
379  [c1121aa2] filemap_fdatawrite_range+0x22/0x30
380  [c12f4768] btrfs_fdatawrite_range+0x28/0x70
381  [c12f47d1] start_ordered_ops+0x21/0x30
382  [c12f4823] btrfs_sync_file+0x43/0x370
383  [c115c3e5] ? vfs_write+0x135/0x1c0
384  [c12f47e0] ? start_ordered_ops+0x30/0x30
385  [c1183e27] do_fsync+0x47/0x70
386  [c118403d] SyS_fsync+0xd/0x10
387  [c15bd8ae] syscall_call+0x7/0x7
388 Code: 8b 03 f6 c4 20 75 26 f0 80 63 01 f7 c7 43 1c 00 00 00 00 89 d8 e8 
61 94 e2 ff eb c3 8d
b4 26 00 00 00 00 83 c4 04 5b 5e 5f 5d c3 0f 0b 0f 0b 388 0f 0b 0f 0b 90 
8d b4 26 00 00 00 00
55 89 e5 57 56
389 EIP: [c1301a30] btrfs_release_extent_buffer_page+0xf0/0x100 SS:ESP 
0068:ec3ebb9c
390 ---[ end trace 12b9bbe75d9541a3 ]---

KVM command:

mkfs.btrfs -f /tmp/disk.img  kvm -kernel 
/src/linux-tux3/arch/x86/boot/bzImage -append
root=/dev/sda1 console=ttyS0 console=tty0 oops=panic tux3.tux3_trace=0 
-serial file:serial.txt
-hda /more/kvm/hdd.img -hdb /tmp/disk.img -net nic -net 
user,hostfwd=tcp::1234-:22 -smp 4 -m 2000

Source code:

/*
 * syncs.c
 *
 * D.R. Phillips, 2015
 *
 * To build: c99 -Wall syncs.c -o syncs
 * To run: ./syncs [filename [syncs [tasks]]]
 */

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <errno.h>
#include <sys/stat.h>

char text[1024] = { "hello world!\n" };

int main(int argc, const char *argv[]) {
	const char *basename = argc <= 1 ? "foo" : argv[1];
	char name[100];
	int steps = argc < 3 ? 1 : atoi(argv[2]);
	int tasks = argc < 4 ? 1 : atoi(argv[3]);
	int err, fd;

	for (int t = 0; t < tasks; t++) {
		snprintf(name, sizeof name, "%s%i", basename, t);
		if (!fork())
			goto child;
	}
	for (int t = 0; t < tasks; t++)
		wait(&err);
	return 0;

child:
	fd = creat(name, S_IRWXU);
	for (int i = 0; i < steps; i++) {
		write(fd, text, sizeof text);
		fsync(fd);
	}
	return 0;
}


RAID1 migrate to bigger disks

2015-01-24 Thread Daniel Pocock



I've got a RAID1 on two 1TB partitions, /dev/sda3 and /dev/sdb3

I'm adding two new disks, they will have bigger partitions /dev/sdc3 and
/dev/sdd3

I'd like the BtrFs to migrate from the old partitions to the new ones as
safely and quickly as possible and, if it is reasonable to do so, to keep
it online throughout the migration.

Should I do the following:

btrfs device add /dev/sdc3 /dev/sdd3 /mnt/btrfs0
btrfs device delete /dev/sda3 /dev/sdb3 /mnt/btrfs0

or should I do it this way:

btrfs device add /dev/sdc3 /mnt/btrfs0
btrfs device delete /dev/sda3 /mnt/btrfs0
btrfs device add /dev/sdd3 /mnt/btrfs0
btrfs device delete /dev/sdb3 /mnt/btrfs0

or is there some other way to go about it?



Re: RAID1 migrate to bigger disks

2015-01-24 Thread Daniel Pocock
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256



On 24/01/15 15:36, Hugo Mills wrote:
 On Sat, Jan 24, 2015 at 03:32:44PM +0100, Daniel Pocock wrote:
 
 
 
 I've got a RAID1 on two 1TB partitions, /dev/sda3 and /dev/sdb3
 
 I'm adding two new disks, they will have bigger partitions
 /dev/sdc3 and /dev/sdd3
 
 I'd like the BtrFs to migrate from the old partitions to the new
 ones as safely and quickly as possible and if it is reasonable to
 do so, keeping it online throughout the migration.
 
 Should I do the following:
 
 btrfs device add /dev/sdc3 /dev/sdd3 /mnt/btrfs0 btrfs device
 delete /dev/sda3 /dev/sdb3 /mnt/btrfs0
 
 or should I do it this way:
 
 btrfs device add /dev/sdc3 /mnt/btrfs0 btrfs device delete
 /dev/sda3 /mnt/btrfs0 btrfs device add /dev/sdd3 /mnt/btrfs0 
 btrfs device delete /dev/sdb3 /mnt/btrfs0
 
 or is there some other way to go about it?
 
 btrfs replace start /dev/sda3 /dev/sdc3 /mountpoint
 btrfs fi resize 3:max /mountpoint
 btrfs replace start /dev/sdb3 /dev/sdd3 /mountpoint
 btrfs fi resize 4:max /mountpoint
 
 The 3 and 4 in the resize commands should be the devid of the 
 newly-added device.
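
The devids to use there can be confirmed beforehand - a sketch:

btrfs fi show /mountpoint   # prints "devid N ... path /dev/sdX" for each device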

Thanks for the fast reply

In the event of power failure, can I safely shutdown the server during
this operation and resume after starting again?

I get more than 2 hours runtime from the UPS but I suspect that
migrating 1TB will take at least 12 hours.


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Icedove - http://www.enigmail.net/

iQIcBAEBCAAGBQJUw7A+AAoJEOm1uwJp1aqDIuoQAKLkc0DgDj1bE6b6RPh6cnb9
lm+rjJD6aCo84dKZ3kVjaYCuUrFK5uXdj1D3ZrELD//jyjr6HbMK7CSJzuzfxmol
rMRM0NaVb4RJ4WPuzbnUjT8pmNytisfP9oG1mV9JmJ+y6sZ2ApvOQwPyHpHWglSL
D+H4clpOa3jXCeNoVxjm1eipLSWnnpSO4NVdXTIgBiHqUaR+LpKpUlh3QGtknvV/
uHegNuTJ+C4/Stp3hrzKy0/OUlIyzucFEXKPJbI/88XvMuZcL/XTO8FoHRP4r/vq
qjICj4Dtjv+xOOe7WKT1Gw8wCz/66xMfSXIUSU02insfwfh0/fFpAS6XybKk4UsH
i7LwswqduJMgFiVHv9bMvwyx3UdmhVJRotjGobVP3XPbI3GMSCEztXHdSGLOFE9D
/IksBehi0XNw/YWOaLcoyA2XTXahBTcsTtktkZStrn5kKXvOPuE7LDyjkHq/o9W8
IYvti9Dvx2IicdJxRM7+5F6bKON2O7foDuSUJFd6/WAkrLVwdudurTqGDmk+uIdS
kZVUVpehmdjYltUyb4wY/ATAvKnQTm/U18L04pSQIbdtdQZD7bAVl7PotLctgdHn
xf7TokhjJZZmOk4C29m+uAQHy0gobDDXlPi3jtpO4Zj+CR9pXM1/+oa40xhWh4Eh
WtDofKi5z7BLVYFNqIix
=BR1V
-END PGP SIGNATURE-


BtrFs on drives with error recovery control / TLER?

2015-01-15 Thread Daniel Pocock


Hi,

Can anybody comment on how BtrFs (particularly RAID1 mirroring)
interacts with drives that offer error recovery control (or TLER in WDC
terms)?

I generally prefer to buy this type of drive for any serious data
storage purposes

I notice ZFS gets a mention in the Wikipedia article about the topic:
http://en.wikipedia.org/wiki/Error_recovery_control

Should BtrFs be mentioned there too?

Regards,

Daniel


Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3

2014-12-03 Thread Daniel Miranda
Hello again,

Sorry for the delay; I had some things to do this past week, including
figuring out the stability problems I was having, but everything is
good now. I rebuilt the Fedora package for btrfs-progs 3.17.2 with
your patches, and btrfsck successfully removed the orphan file! The
contents seem to be intact in /lost+found. Thank you very much Qu,
you've been immensely helpful.

Regards,
Daniel


On Wed, Nov 26, 2014 at 1:07 AM, Daniel Miranda danielk...@gmail.com wrote:
 Alright, I'll just have to understand how to build btrfs-progs now,
 since I'm currently just using the packages from the Fedora repo.

 Thanks for all the help and time spent so far,
 Daniel


 On Wed, Nov 26, 2014 at 12:41 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Hi Daniel,

 With your btrfs-image dump, I tested with my patchset sent to the mailing
 list, and my patchset succeeds in fixing the image.

 You can get the patchset and then apply it on 3.17.2, and --repair should
 fix it.
 The file with nlink error will be moved to 'lost+found' dir.

 Although the best fix would be just adding the missing dir_index,
 the patchset currently does quite well and does not need any manual
 modification.

 The patchset can be extracted using patchwork:
 0001: https://patchwork.kernel.org/patch/5364131/mbox/
 0002: https://patchwork.kernel.org/patch/5364141/mbox/
 0003: https://patchwork.kernel.org/patch/5364101/mbox/
 0004 v2: https://patchwork.kernel.org/patch/5383611/mbox/
 0005 v2: https://patchwork.kernel.org/patch/5383601/mbox/
 0006: https://patchwork.kernel.org/patch/5364151/mbox

 Any feedback is welcomed to improve the patches.

 Thanks,
 Qu


  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 15:42

 I just ran the repair but the ghost file has not disappeared,
 unfortunately.

 On Tue, Nov 25, 2014 at 5:26 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 15:20

 Here are the logs. I'll send you a link to my dump directly after I
 finish uploading it. Please notify me when you have downloaded it so I
 can delete it.

 checking extents
 checking free space cache
 checking fs roots
 root 5 inode 17149868 errors 2000, link count wrong
   unresolved ref dir 17182377 index 245 namelen 8 name string.h
 filetype 1 errors 1, no dir item

 The link count error seems to be resolved by Josef's patch, already
 committed in 3.17.2.
 If using 3.17.2, Josef's commit will rebuild the dir item and dir index.

 root 5 inode 17182377 errors 200, dir isize wrong

 This isize error seems to be caused by the previous line.
 If 3.17.2 can repair the above problem, this one should not be an issue
 and will disappear.

 According to the above output, btrfsck --repair with btrfs-progs 3.17.2
 has a good chance of repairing it.
 Just have a try.

 Thanks,
 Qu

 Checking filesystem on /dev/mapper/fedora_daniel--pc-root
 UUID: fef8f718-0622-4cb1-9597-749650d366a4
 found 55108022156 bytes used err is 1
 total csum bytes: 89787396
 total tree bytes: 2303455232
 total fs tree bytes: 2024841216
 total extent tree bytes: 145272832
 btree space waste bytes: 529672422
 file data blocks allocated: 253414481920
referenced 94127726592
 Btrfs v3.17


 Regards,
 Daniel

 On Tue, Nov 25, 2014 at 3:20 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 13:14

 I'll go run that and get you the output.

 Thanks.


 I can do the image dump, sure. I don't know how long it might take to
 upload it somewhere though. Right now `btrfs fi df` shows about 2GiB
 of metadata (it's a 120GiB volume). I'll see how large it ends up
 after compression.

 A 120G volume seems quite small compared to the images I received
 recently (1T x2 RAID1 and 4T single).
 With '-c 9' it shouldn't be too huge, I think (the 1T RAID1 is about 1G
 of metadata with -c9).

 BTW, a btrfs-image dump will have all the filenames and hierarchy, even
 without file data, so it is still better to think twice about your
 privacy before uploading.

 Thanks,
 Qu

 Thanks for the quick response,
 Daniel

 On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

 Hi,

 What's the btrfsck output? Without --repair option.

 Also, if it is OK for you, would you please dump the btrfs with
 'btrfs-image' command?
 '-c 9' option is highly recommended considering the size of it.
 This will help a lot for developers to test the btrfsck repair
 function.

 Thanks,
 Qu


  Original

Re: [RFC PATCH] Btrfs: add sha256 checksum option

2014-11-25 Thread Daniel Cegiełka
2014-11-25 11:30 GMT+01:00 Liu Bo bo.li@oracle.com:
 On Mon, Nov 24, 2014 at 11:34:46AM -0800, John Williams wrote:
 On Mon, Nov 24, 2014 at 12:23 AM, Holger Hoffstätte
 holger.hoffstae...@googlemail.com wrote:

  Would there be room for a compromise with e.g. 128 bits?

 For example, Spooky V2 hash is 128 bits and is very fast. It is
 noncryptographic, but it is more than adequate for data checksums.

 http://burtleburtle.net/bob/hash/spooky.html

 SnapRAID uses this hash, and it runs at about 15 GB/sec on my machine
 (Xeon E3-1270 V2 @ 3.50Ghz)

 Thanks for the suggestion, I'll take a look.

 Btw, it's not in kernel yet, is it?


The best option would be blake2b, but it isn't implemented in the
kernel. It is not a problem to use it locally (I can upload the code
stripped down for use in the kernel).

from https://blake2.net/

Q: Why do you want BLAKE2 to be fast? Aren't fast hashes bad?

A: You want your hash function to be fast if you are using it to
compute the secure hash of a large amount of data, such as in
distributed filesystems (e.g. Tahoe-LAFS), cloud storage systems (e.g.
OpenStack Swift), intrusion detection systems (e.g. Samhain),
integrity-checking local filesystems (e.g. ZFS), peer-to-peer
file-sharing tools (e.g. BitTorrent), or version control systems (e.g.
git). You only want your hash function to be slow if you're using it
to stretch user-supplied passwords, in which case see the next
question.

https://blake2.net/
https://github.com/floodyberry/blake2b-opt

Best regards,
Daniel


Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3

2014-11-25 Thread Daniel Miranda
Alright, I'll just have to understand how to build btrfs-progs now,
since I'm currently just using the packages from the Fedora repo.

Thanks for all the help and time spent so far,
Daniel


On Wed, Nov 26, 2014 at 12:41 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Hi Daniel,

 With your btrfs-image dump, I tested with my patchset sent to the mailing
 list, and my patchset succeeds in fixing the image.

 You can get the patchset and then apply it on 3.17.2, and --repair should
 fix it.
 The file with nlink error will be moved to 'lost+found' dir.

 Although the best fix would be just adding the missing dir_index,
 the patchset currently does quite well and does not need any manual
 modification.

 The patchset can be extracted using patchwork:
 0001: https://patchwork.kernel.org/patch/5364131/mbox/
 0002: https://patchwork.kernel.org/patch/5364141/mbox/
 0003: https://patchwork.kernel.org/patch/5364101/mbox/
 0004 v2: https://patchwork.kernel.org/patch/5383611/mbox/
 0005 v2: https://patchwork.kernel.org/patch/5383601/mbox/
 0006: https://patchwork.kernel.org/patch/5364151/mbox

 Any feedback is welcomed to improve the patches.

 Thanks,
 Qu


  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 15:42

 I just ran the repair but the ghost file has not disappeared,
 unfortunately.

 On Tue, Nov 25, 2014 at 5:26 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 15:20

 Here are the logs. I'll send you a link to my dump directly after I
 finish uploading it. Please notify me when you have downloaded it so I
 can delete it.

 checking extents
 checking free space cache
 checking fs roots
 root 5 inode 17149868 errors 2000, link count wrong
   unresolved ref dir 17182377 index 245 namelen 8 name string.h
 filetype 1 errors 1, no dir item

 The link count error seems to be resolved by Josef's patch, already
 committed in 3.17.2.
 If using 3.17.2, Josef's commit will rebuild the dir item and dir index.

 root 5 inode 17182377 errors 200, dir isize wrong

 This isize error seems to be caused by the previous line.
 If 3.17.2 can repair the above problem, this one should not be an issue
 and will disappear.

 According to the above output, btrfsck --repair with btrfs-progs 3.17.2
 has a good chance of repairing it.
 Just have a try.

 Thanks,
 Qu

 Checking filesystem on /dev/mapper/fedora_daniel--pc-root
 UUID: fef8f718-0622-4cb1-9597-749650d366a4
 found 55108022156 bytes used err is 1
 total csum bytes: 89787396
 total tree bytes: 2303455232
 total fs tree bytes: 2024841216
 total extent tree bytes: 145272832
 btree space waste bytes: 529672422
 file data blocks allocated: 253414481920
referenced 94127726592
 Btrfs v3.17


 Regards,
 Daniel

 On Tue, Nov 25, 2014 at 3:20 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 13:14

 I'll go run that and get you the output.

 Thanks.


 I can do the image dump, sure. I don't know how long it might take to
 upload it somewhere though. Right now `btrfs fi df` shows about 2GiB
 of metadata (it's a 120GiB volume). I'll see how large it ends up
 after compression.

 A 120G volume seems quite small compared to the images I received
 recently (1T x2 RAID1 and 4T single).
 With '-c 9' it shouldn't be too huge, I think (the 1T RAID1 is about 1G
 of metadata with -c9).

 BTW, a btrfs-image dump will have all the filenames and hierarchy, even
 without file data, so it is still better to think twice about your
 privacy before uploading.

 Thanks,
 Qu

 Thanks for the quick response,
 Daniel

 On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

 Hi,

 What's the btrfsck output? Without --repair option.

 Also, if it is OK for you, would you please dump the btrfs with
 'btrfs-image' command?
 '-c 9' option is highly recommended considering the size of it.
 This will help a lot for developers to test the btrfsck repair
 function.

 Thanks,
 Qu


  Original Message 
 Subject: Apparent metadata corruption (file that simultaneously
 does/does
 not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: linux-btrfs@vger.kernel.org
 Date: 2014年11月25日 13:04

 Hello,

 After I had some brief stability issues with my computer, it seems
 some form of metadata corruption took place in my BTRFS filesystem,
 and now a particular file seems to exist, but I cannot access any
 details on it or delete it.

 If I try to `ls

Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3

2014-11-24 Thread Daniel Miranda
Hello,

After I had some brief stability issues with my computer, it seems
some form of metadata corruption took place in my BTRFS filesystem,
and now a particular file seems to exist, but I cannot access any
details on it or delete it.

If I try to `ls` in the directory it is in, that's what I get:

ls: cannot access string.h: No such file or directory
total 0
drwxr-xr-x. 1 danielkza mock 16 Nov 21 14:18 ./
drwxr-xr-x. 1 danielkza mock  6 Nov 21 14:18 ../
-?? ? ? ? ?? string.h

If I try to delete it I get:

rm: cannot remove ‘string.h’: No such file or directory

I'm using kernel 3.17.3 from Fedora 21. I got no messages on dmesg or
anything of the sort. I know the btrfs fsck situation is complicated,
but is there any utility I should use to try and repair this? Losing
this file is not a problem, it's just one header from the kernel I was
building.

Regards,
Daniel Miranda


Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3

2014-11-24 Thread Daniel Miranda
I'll go run that and get you the output.

I can do the image dump, sure. I don't know how long it might take to
upload it somewhere though. Right now `btrfs fi df` shows about 2GiB
of metadata (it's a 120GiB volume). I'll see how large it ends up
after compression.

Thanks for the quick response,
Daniel

On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Hi,

 What's the btrfsck output? Without --repair option.

 Also, if it is OK for you, would you please dump the btrfs with
 'btrfs-image' command?
 '-c 9' option is highly recommended considering the size of it.
 This will help a lot for developers to test the btrfsck repair function.

 Thanks,
 Qu


  Original Message 
 Subject: Apparent metadata corruption (file that simultaneously does/does
 not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: linux-btrfs@vger.kernel.org
 Date: 2014年11月25日 13:04

 Hello,

 After I had some brief stability issues with my computer, it seems
 some form of metadata corruption took place in my BTRFS filesystem,
 and now a particular file seems to exist, but I cannot access any
 details on it or delete it.

 If I try to `ls` in the directory it is in, that's what I get:

 ls: cannot access string.h: No such file or directory
 total 0
 drwxr-xr-x. 1 danielkza mock 16 Nov 21 14:18 ./
 drwxr-xr-x. 1 danielkza mock  6 Nov 21 14:18 ../
 -?? ? ? ? ?? string.h

 If I try to delete it I get:

 rm: cannot remove ‘string.h’: No such file or directory

 I'm using kernel 3.17.3 from Fedora 21. I got no messages on dmesg or
 anything of the sort. I know the btrfs fsck situation is complicated,
 but is there any utility I should use to try and repair this? Losing
 this file is not a problem, it's just one header from the kernel I was
 building.

 Regards,
 Daniel Miranda


Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3

2014-11-24 Thread Daniel Miranda
Here are the logs. I'll send you a link to my dump directly after I
finish uploading it. Please notify me when you have downloaded it so I
can delete it.

checking extents
checking free space cache
checking fs roots
root 5 inode 17149868 errors 2000, link count wrong
unresolved ref dir 17182377 index 245 namelen 8 name string.h
filetype 1 errors 1, no dir item
root 5 inode 17182377 errors 200, dir isize wrong
Checking filesystem on /dev/mapper/fedora_daniel--pc-root
UUID: fef8f718-0622-4cb1-9597-749650d366a4
found 55108022156 bytes used err is 1
total csum bytes: 89787396
total tree bytes: 2303455232
total fs tree bytes: 2024841216
total extent tree bytes: 145272832
btree space waste bytes: 529672422
file data blocks allocated: 253414481920
 referenced 94127726592
Btrfs v3.17


Regards,
Daniel

On Tue, Nov 25, 2014 at 3:20 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 13:14

 I'll go run that and get you the output.

 Thanks.


 I can do the image dump, sure. I don't know how long it might take to
 upload it somewhere though. Right now `btrfs fi df` shows about 2GiB
 of metadata (it's a 120GiB volume). I'll see how large it ends up
 after compression.

 A 120G volume seems quite small compared to the images I received
 recently (1T x2 RAID1 and 4T single).
 With '-c 9' it shouldn't be too huge, I think (the 1T RAID1 is about 1G
 of metadata with -c9).

 BTW, a btrfs-image dump will have all the filenames and hierarchy, even
 without file data, so it is still better to think twice about your
 privacy before uploading.

 Thanks,
 Qu


 Thanks for the quick response,
 Daniel

 On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

 Hi,

 What's the btrfsck output? Without --repair option.

 Also, if it is OK for you, would you please dump the btrfs with
 'btrfs-image' command?
 '-c 9' option is highly recommended considering the size of it.
 This will help a lot for developers to test the btrfsck repair function.

 Thanks,
 Qu


  Original Message 
 Subject: Apparent metadata corruption (file that simultaneously does/does
 not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: linux-btrfs@vger.kernel.org
 Date: 2014年11月25日 13:04

 Hello,

 After I had some brief stability issues with my computer, it seems
 some form of metadata corruption took place in my BTRFS filesystem,
 and now a particular file seems to exist, but I cannot access any
 details on it or delete it.

 If I try to `ls` in the directory it is in, that's what I get:

 ls: cannot access string.h: No such file or directory
 total 0
 drwxr-xr-x. 1 danielkza mock 16 Nov 21 14:18 ./
 drwxr-xr-x. 1 danielkza mock  6 Nov 21 14:18 ../
 -?? ? ? ? ?? string.h

 If I try to delete it I get:

 rm: cannot remove ‘string.h’: No such file or directory

 I'm using kernel 3.17.3 from Fedora 21. I got no messages on dmesg or
 anything of the sort. I know the btrfs fsck situation is complicated,
 but is there any utility I should use to try and repair this? Losing
 this file is not a problem, it's just one header from the kernel I was
 building.

 Regards,
 Daniel Miranda


Re: Apparent metadata corruption (file that simultaneously does/does not exist) on kernel 3.17.3

2014-11-24 Thread Daniel Miranda
I just ran the repair but the ghost file has not disappeared, unfortunately.

On Tue, Nov 25, 2014 at 5:26 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 15:20

 Here are the logs. I'll send you a link to my dump directly after I
 finish uploading it. Please notify me when you have downloaded it so I
 can delete it.

 checking extents
 checking free space cache
 checking fs roots
 root 5 inode 17149868 errors 2000, link count wrong
  unresolved ref dir 17182377 index 245 namelen 8 name string.h
 filetype 1 errors 1, no dir item

 The link count error seems to be resolved by Josef's patch, already
 committed in 3.17.2.
 If using 3.17.2, Josef's commit will rebuild the dir item and dir index.

 root 5 inode 17182377 errors 200, dir isize wrong

 This isize error seems to be caused by the previous line.
 If 3.17.2 can repair the above problem, this one should not be an issue
 and will disappear.

 According to the above output, btrfsck --repair with btrfs-progs 3.17.2
 has a good chance of repairing it.
 Just have a try.

 Thanks,
 Qu

 Checking filesystem on /dev/mapper/fedora_daniel--pc-root
 UUID: fef8f718-0622-4cb1-9597-749650d366a4
 found 55108022156 bytes used err is 1
 total csum bytes: 89787396
 total tree bytes: 2303455232
 total fs tree bytes: 2024841216
 total extent tree bytes: 145272832
 btree space waste bytes: 529672422
 file data blocks allocated: 253414481920
   referenced 94127726592
 Btrfs v3.17


 Regards,
 Daniel

 On Tue, Nov 25, 2014 at 3:20 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

  Original Message 
 Subject: Re: Apparent metadata corruption (file that simultaneously
 does/does not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014年11月25日 13:14

 I'll go run that and get you the output.

 Thanks.


 I can do the image dump, sure. I don't know how long it might take to
 upload it somewhere though. Right now `btrfs fi df` shows about 2GiB
 of metadata (it's a 120GiB volume). I'll see how large it ends up
 after compression.

 A 120G volume seems quite small compared to the images I received
 recently (1T x2 RAID1 and 4T single).
 With '-c 9' it shouldn't be too huge, I think (the 1T RAID1 is about 1G
 of metadata with -c9).

 BTW, a btrfs-image dump will have all the filenames and hierarchy, even
 without file data, so it is still better to think twice about your
 privacy before uploading.

 Thanks,
 Qu

 Thanks for the quick response,
 Daniel

 On Tue, Nov 25, 2014 at 3:10 AM, Qu Wenruo quwen...@cn.fujitsu.com
 wrote:

 Hi,

 What's the btrfsck output? Without --repair option.

 Also, if it is OK for you, would you please dump the btrfs with
 'btrfs-image' command?
 '-c 9' option is highly recommended considering the size of it.
 This will help a lot for developers to test the btrfsck repair
 function.

 Thanks,
 Qu


  Original Message 
 Subject: Apparent metadata corruption (file that simultaneously
 does/does
 not exist) on kernel 3.17.3
 From: Daniel Miranda danielk...@gmail.com
 To: linux-btrfs@vger.kernel.org
 Date: 2014年11月25日 13:04

 Hello,

 After I had some brief stability issues with my computer, it seems
 some form of metadata corruption took place in my BTRFS filesystem,
 and now a particular file seems to exist, but I cannot access any
 details on it or delete it.

 If I try to `ls` in the directory it is in, that's what I get:

 ls: cannot access string.h: No such file or directory
 total 0
 drwxr-xr-x. 1 danielkza mock 16 Nov 21 14:18 ./
 drwxr-xr-x. 1 danielkza mock  6 Nov 21 14:18 ../
 -?? ? ? ? ?? string.h

 If I try to delete it I get:

 rm: cannot remove ‘string.h’: No such file or directory

 I'm using kernel 3.17.3 from Fedora 21. I got no messages on dmesg or
 anything of the sort. I know the btrfs fsck situation is complicated,
 but is there any utility I should use to try and repair this? Losing
 this file is not a problem, it's just one header from the kernel I was
 building.

 Regards,
 Daniel Miranda


[PATCH] Btrfs: disk-io: replace root args iff only fs_info used

2014-11-21 Thread Daniel Dressler
This is the 3rd independent patch of a larger
project to cleanup btrfs's internal usage of
btrfs_root. Many functions take btrfs_root
only to grab the fs_info struct.

By requiring a root these functions cause
programmer overhead. That these functions can
accept any valid root is not obvious until
inspection.

This patch reduces the specificity of such
functions to accept the fs_info directly.

These patches can be applied independently
and thus are not being submitted as a patch
series. There should be about 26 patches by
the project's completion. Each patch will
cleanup between 1 and 34 functions apiece.
Each patch covers a single file's functions.

This patch affects the following function(s):
  1) csum_tree_block
  2) csum_dirty_buffer
  3) check_tree_block_fsid
  4) btrfs_find_tree_block
  5) clean_tree_block

Signed-off-by: Daniel Dressler danieru.dress...@gmail.com
---
 fs/btrfs/ctree.c   | 26 +-
 fs/btrfs/disk-io.c | 32 
 fs/btrfs/disk-io.h |  4 ++--
 fs/btrfs/extent-tree.c |  6 +++---
 4 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 19bc616..e76a6ba 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1075,7 +1075,7 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
ret = btrfs_dec_ref(trans, root, buf, 1);
BUG_ON(ret); /* -ENOMEM */
}
-   clean_tree_block(trans, root, buf);
+	clean_tree_block(trans, root->fs_info, buf);
*last_ref = 1;
}
return 0;
@@ -1681,7 +1681,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
continue;
}
 
-   cur = btrfs_find_tree_block(root, blocknr);
+	cur = btrfs_find_tree_block(root->fs_info, blocknr);
if (cur)
uptodate = btrfs_buffer_uptodate(cur, gen, 0);
else
@@ -1946,7 +1946,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 
	path->locks[level] = 0;
	path->nodes[level] = NULL;
-	clean_tree_block(trans, root, mid);
+	clean_tree_block(trans, root->fs_info, mid);
btrfs_tree_unlock(mid);
/* once for the path */
free_extent_buffer(mid);
@@ -2000,7 +2000,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
	if (wret < 0 && wret != -ENOSPC)
		ret = wret;
	if (btrfs_header_nritems(right) == 0) {
-	clean_tree_block(trans, root, right);
+	clean_tree_block(trans, root->fs_info, right);
btrfs_tree_unlock(right);
del_ptr(root, path, level + 1, pslot + 1);
	root_sub_used(root, right->len);
@@ -2044,7 +2044,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
BUG_ON(wret == 1);
}
if (btrfs_header_nritems(mid) == 0) {
-   clean_tree_block(trans, root, mid);
+	clean_tree_block(trans, root->fs_info, mid);
btrfs_tree_unlock(mid);
del_ptr(root, path, level + 1, pslot);
	root_sub_used(root, mid->len);
@@ -2262,7 +2262,7 @@ static void reada_for_search(struct btrfs_root *root,
 
search = btrfs_node_blockptr(node, slot);
	blocksize = root->nodesize;
-   eb = btrfs_find_tree_block(root, search);
+	eb = btrfs_find_tree_block(root->fs_info, search);
if (eb) {
free_extent_buffer(eb);
return;
@@ -2324,7 +2324,7 @@ static noinline void reada_for_balance(struct btrfs_root *root,
	if (slot > 0) {
		block1 = btrfs_node_blockptr(parent, slot - 1);
		gen = btrfs_node_ptr_generation(parent, slot - 1);
-	eb = btrfs_find_tree_block(root, block1);
+	eb = btrfs_find_tree_block(root->fs_info, block1);
/*
 * if we get -eagain from btrfs_buffer_uptodate, we
 * don't want to return eagain here.  That will loop
@@ -2337,7 +2337,7 @@ static noinline void reada_for_balance(struct btrfs_root *root,
	if (slot + 1 < nritems) {
		block2 = btrfs_node_blockptr(parent, slot + 1);
		gen = btrfs_node_ptr_generation(parent, slot + 1);
-	eb = btrfs_find_tree_block(root, block2);
+	eb = btrfs_find_tree_block(root->fs_info, block2);
	if (eb && btrfs_buffer_uptodate(eb, gen, 1) != 0)
block2 = 0;
free_extent_buffer(eb);
@@ -2455,7 +2455,7 @@ read_block_for_search(struct btrfs_trans_handle *trans,
blocknr = btrfs_node_blockptr(b, slot);
gen = btrfs_node_ptr_generation(b, slot);
 
-   tmp

Re: [PATCH] Btrfs: ctree: reduce args where only fs_info used

2014-11-21 Thread Daniel Dressler
Ah thanks David for looking at this.

Sorry for the thin paragraphs; my vim was warning too early about long
lines. I will reformat it to break at 74 chars.

No problem, I'll redo everything so it is one function per patch. Now
fair warning: there are about 102 functions to cleanup. I was a bit
worried that many patches would cause too much maintainer overhead but
it is no problem for me. Only a few functions have dependencies on
other functions needing cleanup. Thus there will be some small patch
series for those function sets. A big benefit of one function per
patch is that extent-io.c will no longer be a 34-function monster
patch.

Thank you David, I'll redo all these patches.

Is there any rate limiting I should be doing? I don't want to flood
the list with bursts of a dozen-plus patches, or is that an okay volume?

Daniel

2014-11-22 0:55 GMT+09:00 David Sterba dste...@suse.cz:
 On Wed, Nov 12, 2014 at 01:43:09PM +0900, Daniel Dressler wrote:
 This patch is part of a larger project to cleanup
 btrfs's internal usage of struct btrfs_root. Many
 functions take btrfs_root only to grab a pointer
 to fs_info.

 Thanks for picking up the project.

 A mere formality, can you please justify the paragraphs to 74 chars?

 --
 This patch is part of a larger project to cleanup btrfs's internal usage
 of struct btrfs_root. Many functions take btrfs_root only to grab a
 pointer to fs_info.
 --

 This patch does not address the two functions in
 ctree.c (insert_ptr, and split_item) which only
 use root for BUG_ONs in ctree.c

 This patch affects the following functions:
   1) fixup_low_keys
   2) btrfs_set_item_key_safe

 Please send one patch per function change, unless there are more that
 are somehow entangled that it would make it hard to separate.


Re: [PATCH] Btrfs: disk-io: replace root args iff only fs_info used

2014-11-21 Thread Daniel Dressler
Thank you David, this is helpful feedback.

What would a cover letter be like? Would that be a separate email to
the list, or maybe the first email in a patch series?

Sorry, I've looked twice for the integration repo. I found some that
looked like it could be it, but those had older commits. Could you direct
me to the exact branch? I'd love to work against it. These patches were
done against linux-next.

I think small one function patches might be best. I have the codebase
mapped out and each file's functions-to-be-cleaned count varies
wildly. If I did batch files together and split large files apart
there would be no rhyme or reason for the groupings. With single
function patches it is very clear what changes are justified since
they should only occur in the affected function or in a call-site.
With multiple functions the call-site changes get mixed up and it
would be harder to review.

Daniel


2014-11-22 1:15 GMT+09:00 David Sterba dste...@suse.cz:
 On Fri, Nov 21, 2014 at 05:15:07PM +0900, Daniel Dressler wrote:
 This is the 3rd independent patch of a larger
 project to cleanup btrfs's internal usage of
 btrfs_root. Many functions take btrfs_root
 only to grab the fs_info struct.

 By requiring a root these functions cause
 programmer overhead. That these functions can
 accept any valid root is not obvious until
 inspection.

 This patch reduces the specificity of such
 functions to accept the fs_info directly.

 These patches can be applied independently
 and thus are not being submitted as a patch
 series. There should be about 26 patches by
 the project's completion. Each patch will
 cleanup between 1 and 34 functions apiece.
 Each patch covers a single file's functions.

 It's good to have this kind of introduction but it really belongs ot the
 cover letter not the individual patches.

 This patch affects the following function(s):
   1) csum_tree_block
   2) csum_dirty_buffer
   3) check_tree_block_fsid
   4) btrfs_find_tree_block
   5) clean_tree_block

 Now that I see that, I'm not sure that my previous comment about 'one
 patch per function' is the right way to go. This patch looks good as it
 stands. The change is simple enough that I won't be opposed to grouping
 even more functions together as long as it stays revieweable.

 The patches are likely to clash with a lot of pending patches, so you
 may want to base it on the integration branch next time. This would make
 maintainers' life easier and also raises chances to merge the patches.

 Reviewed-by: David Sterba dste...@suse.cz


[PATCH] Btrfs: delayed-inode: replace root args iff only fs_info used

2014-11-17 Thread Daniel Dressler
This is the second independent patch of a larger
project to cleanup btrfs's internal usage of
btrfs_root. Many functions take btrfs_root
only to grab the fs_info struct.

By requiring a root these functions cause
programmer overhead. That these functions can
accept any valid root is not obvious until
inspection.

This patch reduces the specificity of such
functions to accept the fs_info directly.

These patches can be applied independently
and thus are not being submitted as a patch
series. There should be about 26 patches by
the project's completion. Each patch will
cleanup between 1 and 34 functions apiece.
Each patch covers a single file's functions.

This patch affects the following function(s):
  1) btrfs_wq_run_delayed_node

Signed-off-by: Daniel Dressler danieru.dress...@gmail.com
---
 fs/btrfs/delayed-inode.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 054577b..e590da6 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1383,7 +1383,7 @@ out:
 
 
 static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
-struct btrfs_root *root, int nr)
+struct btrfs_fs_info *fs_info, int nr)
 {
struct btrfs_async_delayed_work *async_work;
 
@@ -1399,7 +1399,7 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
	btrfs_async_run_delayed_root, NULL, NULL);
	async_work->nr = nr;
 
-	btrfs_queue_work(root->fs_info->delayed_workers, &async_work->work);
+	btrfs_queue_work(fs_info->delayed_workers, &async_work->work);
return 0;
 }
 
@@ -1426,6 +1426,7 @@ static int could_end_wait(struct btrfs_delayed_root *delayed_root, int seq)
 void btrfs_balance_delayed_items(struct btrfs_root *root)
 {
	struct btrfs_delayed_root *delayed_root;
+	struct btrfs_fs_info *fs_info = root->fs_info;
 
delayed_root = btrfs_get_delayed_root(root);
 
@@ -1438,7 +1439,7 @@ void btrfs_balance_delayed_items(struct btrfs_root *root)
 
	seq = atomic_read(&delayed_root->items_seq);
 
-   ret = btrfs_wq_run_delayed_node(delayed_root, root, 0);
+   ret = btrfs_wq_run_delayed_node(delayed_root, fs_info, 0);
if (ret)
return;
 
@@ -1447,7 +1448,7 @@ void btrfs_balance_delayed_items(struct btrfs_root *root)
return;
}
 
-   btrfs_wq_run_delayed_node(delayed_root, root, BTRFS_DELAYED_BATCH);
+   btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH);
 }
 
 /* Will return 0 or -ENOMEM */
-- 
2.1.0
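
The shape of this change generalizes across the series: a function that
dereferences root only to reach root->fs_info can take the fs_info
directly, making the real dependency visible at every call site. A
minimal user-space sketch of the before/after, with illustrative names
rather than the kernel's actual types:

#include <stdio.h>

struct fs_info { int delayed_workers; };
struct root { struct fs_info *fs_info; };

/* before: takes a root only to reach root->fs_info; any valid root
 * works, but that is not obvious at the call site */
static void queue_work_old(struct root *root)
{
	printf("workers: %d\n", root->fs_info->delayed_workers);
}

/* after: the true dependency is explicit */
static void queue_work_new(struct fs_info *fs_info)
{
	printf("workers: %d\n", fs_info->delayed_workers);
}

int main(void)
{
	struct fs_info fi = { 4 };
	struct root r = { &fi };
	queue_work_old(&r);
	queue_work_new(r.fs_info);
	return 0;
}

Callers that genuinely need the root keep taking it; only the helpers
that read nothing but the fs_info change their signature.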



Re: BTRFS messes up snapshot LV with origin

2014-11-16 Thread Daniel Dressler
If a UUID is not unique enough, how will adding a second UUID or a
unique drive identifier help?

A UUID only serves any purpose when it is unique. Thus duplicate UUIDs
are themselves a failure state.

The solution should be to make it harder to get into this failure
state. Not to make all programs resilient against running under this
failure state. It isn't a btrfs bug that it requires Universally Unique
IDs to be universally unique.

Daniel

2014-11-17 15:59 GMT+09:00 Brendan Hide bren...@swiftspirit.co.za:
 cc'd bug-g...@gnu.org for FYI

 On 2014/11/17 03:42, Duncan wrote:

 MegaBrutal posted on Sun, 16 Nov 2014 22:35:26 +0100 as excerpted:

 Hello guys,

 I think you'll like this...
 https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1391429

 UUID is an initialism for Universally Unique IDentifier.[1]

 If the UUID isn't unique, by definition, then, it can't be a UUID, and
 that's a bug in whatever is making the non-unique would-be UUID that
 isn't unique and thus cannot be a universally unique ID.  In this case
 that would appear to be LVM.

 Perhaps the right question to ask is Where should this bug be fixed?.

 TL;DR: This needs more thought and input from btrfs devs. To LVM, the bug is
 likely seen as being out of scope. The correct fix probably lies in the
 ecosystem design, which requires co-operation from btrfs.

 Making a snapshot in LVM is a fundamental thing - and I feel LVM, in making
 its snapshot, is doing its job exactly as expected.

 Additionally, there are other ways to get to a similar state without LVM:
 ddrescue backup, SAN snapshot, old missing disk re-introduced, etc.

 That leaves two places where this can be fixed: grub and btrfs

 Grub is already a little smart here - it avoids snapshots. But in this case
 it is relying on the UUID and only finding it in the snapshot. So possibly
 this is a bug in grub affecting the bug reporter specifically - but perhaps
 the bug is in btrfs where grub is relying on btrfs code.

 Yes, I'd rather use btrfs' snapshot mechanism - but this is often a choice
 that is left to the user/admin/distro. I don't think saying LVM snapshots
 are incompatible with btrfs is the right way to go either.

 That leaves two aspects of this issue which I view as two separate bugs:
 a) Btrfs cannot gracefully handle separate filesystems that have the same
 UUID. At all.
 b) Grub appears to pick the wrong filesystem when presented with two
 filesystems with the same UUID.

 I feel a) is a btrfs bug.
 I feel b) is a bug that is more about ecosystem design than grub being
 silly.

 I imagine a couple of aspects that could help fix a):
 - Utilise a unique drive identifier in the btrfs metadata (surely this
 exists already?). This way, any two filesystems will always have different
 drive identifiers *except* in cases like a ddrescue'd copy or a block-level
 snapshot. This will provide a sensible mechanism for defined behaviour,
 preventing corruption - even if that defined behaviour is to simply give
 out lots of PEBKAC errors and panic.
 - Utilise a drive list to ensure that two unrelated filesystems with the
 same UUID cannot get mixed up. Yes, the user/admin would likely be the
 culprit here (perhaps a VM rollout process that always gives out the same
 UUID in all its filesystems). Again, does btrfs not already have something
 like this built-in that we're simply not utilising fully?

 I'm not exactly sure of the correct way to fix b) except that I imagine it
 would be trivial to fix once a) is fixed.

 --
 __
 Brendan Hide
 http://swiftspirit.co.za/
 http://www.webafrica.co.za/?AFF1E97




[PATCH] Btrfs: qgroup: add BUILD_BUG to report pointer cast breakage

2014-11-12 Thread Daniel Dressler
Our ulist data structure stores at max 64bit
values. qgroup has used this structure to store
pointers. In the future when we upgrade to 128bit
this casting of pointers to uint64_t will break.

This patch adds a BUILD_BUG ensuring that this
code will not be left untouched in the upgrade.

It also marks this issue on the TODO list so it
may be addressed before such an upgrade.

Signed-off-by: Daniel Dressler danieru.dress...@gmail.com
---
 fs/btrfs/qgroup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48b60db..87f7c98 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -44,6 +44,7 @@
  *  - caches fuer ulists
  *  - performance benchmarks
  *  - check all ioctl parameters
+ *  - do not cast uintptr_t to uint64_t in ulist usage
  */
 
 /*
@@ -101,6 +102,7 @@ struct btrfs_qgroup_list {
 
 #define ptr_to_u64(x) ((u64)(uintptr_t)x)
 #define u64_to_ptr(x) ((struct btrfs_qgroup *)(uintptr_t)x)
+BUILD_BUG_ON(UINTPTR_MAX > UINT64_MAX);
 
 static int
 qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
-- 
2.1.0



Re: [PATCH] Btrfs: qgroup: add BUILD_BUG to report pointer cast breakage

2014-11-12 Thread Daniel Dressler
I am very very sorry, I forgot to even test building. Please pretend
this patch was never submitted.

Daniel

2014-11-13 0:00 GMT+09:00 Daniel Dressler danieru.dress...@gmail.com:
 Our ulist data structure stores at max 64bit
 values. qgroup has used this structure to store
 pointers. In the future when we upgrade to 128bit
 this casting of pointers to uint64_t will break.

 This patch adds a BUILD_BUG ensuring that this
 code will not be left untouched in the upgrade.

 It also marks this issue on the TODO list so it
 may be addressed before such an upgrade.

 Signed-off-by: Daniel Dressler danieru.dress...@gmail.com
 ---
  fs/btrfs/qgroup.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
 index 48b60db..87f7c98 100644
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -44,6 +44,7 @@
   *  - caches fuer ulists
   *  - performance benchmarks
   *  - check all ioctl parameters
 + *  - do not cast uintptr_t to uint64_t in ulist usage
   */

  /*
 @@ -101,6 +102,7 @@ struct btrfs_qgroup_list {

  #define ptr_to_u64(x) ((u64)(uintptr_t)x)
  #define u64_to_ptr(x) ((struct btrfs_qgroup *)(uintptr_t)x)
 +BUILD_BUG_ON(UINTPTR_MAX > UINT64_MAX);

  static int
  qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 --
 2.1.0



[PATCH v2] Btrfs: qgroup: add BUILD_BUG to report pointer cast breakage

2014-11-12 Thread Daniel Dressler
Our ulist data structure stores at max 64bit
values. qgroup has used this structure to store
pointers. In the future when we upgrade to 128bit
this casting of pointers to uint64_t will break.

This patch adds a BUILD_BUG ensuring that this
code will not be left untouched in the upgrade.

It also marks this issue on the TODO list so it
may be addressed before such an upgrade.

Signed-off-by: Daniel Dressler danieru.dress...@gmail.com
---
 fs/btrfs/qgroup.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 48b60db..a9a4cab 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -44,6 +44,7 @@
  *  - caches fuer ulists
  *  - performance benchmarks
  *  - check all ioctl parameters
+ *  - do not cast uintptr_t to uint64_t in ulist usage
  */
 
 /*
@@ -99,8 +100,12 @@ struct btrfs_qgroup_list {
struct btrfs_qgroup *member;
 };
 
-#define ptr_to_u64(x) ((u64)(uintptr_t)x)
-#define u64_to_ptr(x) ((struct btrfs_qgroup *)(uintptr_t)x)
+#define ptr_to_u64(x) \
+   (BUILD_BUG_ON_ZERO(sizeof(uintptr_t) > sizeof(u64)) + \
+   ((u64)(uintptr_t)x))
+#define u64_to_ptr(x) \
+   (BUILD_BUG_ON_ZERO(sizeof(uintptr_t) > sizeof(u64)) + \
+   ((struct btrfs_qgroup *)(uintptr_t)x))
 
 static int
 qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
-- 
2.1.0
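
For readers unfamiliar with BUILD_BUG_ON_ZERO: unlike BUILD_BUG_ON, it
expands to an expression that evaluates to 0, so it can be folded into
the ptr_to_u64/u64_to_ptr macros and checked at every expansion, which
is presumably why v2 moved the check into the macros. A minimal
user-space sketch of the idea (GNU C, mirroring the kernel's
negative-bit-field trick; the names here are made up, not kernel code):

#include <stdint.h>
#include <stdio.h>

/* a negative bit-field width is a hard compile error, so this only
 * compiles when cond is false, and then the sizeof evaluates to 0 */
#define MY_BUILD_BUG_ON_ZERO(cond) ((int)sizeof(struct { int : -!!(cond); }))

#define my_ptr_to_u64(x) \
	(MY_BUILD_BUG_ON_ZERO(sizeof(uintptr_t) > sizeof(uint64_t)) + \
	 ((uint64_t)(uintptr_t)(x)))

int main(void)
{
	int v = 42;
	/* the build breaks here if pointers ever outgrow 64 bits */
	uint64_t key = my_ptr_to_u64(&v);
	printf("%llu\n", (unsigned long long)key);
	return 0;
}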



Is it safe to refactor struct btrfs_root *root out of these functions?

2014-11-11 Thread Daniel Dressler
Hi

I'm gearing up to tackle the "Pass fs_info instead of root" project
suggested on the wiki.

I've read through the entire codebase and made note of 102 functions
which could be refactored. Three of these do not make any use of their
root argument at all; is it safe to refactor these as well?

Namely:
btrfs_block_rsv_check :
http://lxr.free-electrons.com/source/fs/btrfs/extent-tree.c#L4743
copy_to_sk : http://lxr.free-electrons.com/source/fs/btrfs/ioctl.c#L1931
wait_for_commit :
http://lxr.free-electrons.com/source/fs/btrfs/transaction.c#L597

None of these functions' users make indirect calls through function
pointers. Is it safe to refactor them? I ask because it seems strange
they would have unused arguments and I'm worried there might be a
reason I've missed.

Daniel
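
The caveat the question is probing: dropping an unused argument is only
unsafe when the function's address is taken through a function pointer
whose type pins the signature. A small sketch with made-up names (not
the actual btrfs functions):

#include <stdio.h>

struct root;

/* called directly: the unused 'root' can simply be dropped, with the
 * call sites updated */
static int wait_for_commit(struct root *root, int txid)
{
	(void)root;	/* never used */
	return txid;
}

/* called indirectly: this table pins the signature, so the argument
 * would have to stay even if unused */
typedef int (*commit_fn)(struct root *, int);

static commit_fn table[] = { wait_for_commit };

int main(void)
{
	printf("%d\n", table[0](NULL, 7));
	return 0;
}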


[PATCH] Btrfs: ctree: reduce args where only fs_info used

2014-11-11 Thread Daniel Dressler
This patch is part of a larger project to cleanup
btrfs's internal usage of struct btrfs_root. Many
functions take btrfs_root only to grab a pointer
to fs_info.

This causes programmers to ponder which root can
be passed. Since only the fs_info is read, affected
functions can accept any root, but this is only
obvious upon inspection.

This patch reduces the specificity of such functions
to accept the fs_info directly.

This patch does not address the two functions in
ctree.c (insert_ptr, and split_item) which only
use root for BUG_ONs in ctree.c

This patch affects the following functions:
  1) fixup_low_keys
  2) btrfs_set_item_key_safe

Signed-off-by: Daniel Dressler danieru.dress...@gmail.com
---
 fs/btrfs/ctree.c | 27 +++
 fs/btrfs/ctree.h |  3 ++-
 fs/btrfs/file-item.c |  2 +-
 fs/btrfs/file.c  |  8 
 4 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 19bc616..db5a60f 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3139,7 +3139,8 @@ again:
  * higher levels
  *
  */
-static void fixup_low_keys(struct btrfs_root *root, struct btrfs_path *path,
+static void fixup_low_keys(struct btrfs_fs_info *fs_info,
+  struct btrfs_path *path,
   struct btrfs_disk_key *key, int level)
 {
int i;
@@ -3150,7 +3151,7 @@ static void fixup_low_keys(struct btrfs_root *root, struct btrfs_path *path,
	if (!path->nodes[i])
		break;
	t = path->nodes[i];
-	tree_mod_log_set_node_key(root->fs_info, t, tslot, 1);
+	tree_mod_log_set_node_key(fs_info, t, tslot, 1);
	btrfs_set_node_key(t, key, tslot);
	btrfs_mark_buffer_dirty(path->nodes[i]);
	if (tslot != 0)
@@ -3164,7 +3165,8 @@ static void fixup_low_keys(struct btrfs_root *root, struct btrfs_path *path,
  * This function isn't completely safe. It's the caller's responsibility
  * that the new key won't break the order
  */
-void btrfs_set_item_key_safe(struct btrfs_root *root, struct btrfs_path *path,
+void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
+struct btrfs_path *path,
 struct btrfs_key *new_key)
 {
struct btrfs_disk_key disk_key;
@@ -3186,7 +3188,7 @@ void btrfs_set_item_key_safe(struct btrfs_root *root, struct btrfs_path *path,
	btrfs_set_item_key(eb, &disk_key, slot);
	btrfs_mark_buffer_dirty(eb);
	if (slot == 0)
-	fixup_low_keys(root, path, &disk_key, 1);
+	fixup_low_keys(fs_info, path, &disk_key, 1);
 }
 
 /*
@@ -3944,7 +3946,7 @@ static noinline int __push_leaf_left(struct btrfs_trans_handle *trans,
	clean_tree_block(trans, root, right);
 
	btrfs_item_key(right, &disk_key, 0);
-	fixup_low_keys(root, path, &disk_key, 1);
+	fixup_low_keys(root->fs_info, path, &disk_key, 1);
 
/* then fixup the leaf pointer in the path */
	if (path->slots[0] < push_items) {
@@ -4181,6 +4183,7 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
	int mid;
	int slot;
	struct extent_buffer *right;
+	struct btrfs_fs_info *fs_info = root->fs_info;
int ret = 0;
int wret;
int split;
@@ -4284,10 +4287,10 @@ again:
btrfs_set_header_backref_rev(right, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(right, root-root_key.objectid);
btrfs_set_header_level(right, 0);
-	write_extent_buffer(right, root->fs_info->fsid,
+	write_extent_buffer(right, fs_info->fsid,
	btrfs_header_fsid(), BTRFS_FSID_SIZE);
 
-	write_extent_buffer(right, root->fs_info->chunk_tree_uuid,
+	write_extent_buffer(right, fs_info->chunk_tree_uuid,
btrfs_header_chunk_tree_uuid(right),
BTRFS_UUID_SIZE);
 
@@ -4310,7 +4313,7 @@ again:
	path->nodes[0] = right;
	path->slots[0] = 0;
	if (path->slots[1] == 0)
-	fixup_low_keys(root, path, &disk_key, 1);
+	fixup_low_keys(fs_info, path, &disk_key, 1);
}
btrfs_mark_buffer_dirty(right);
return ret;
@@ -4626,7 +4629,7 @@ void btrfs_truncate_item(struct btrfs_root *root, struct btrfs_path *path,
	btrfs_set_disk_key_offset(&disk_key, offset + size_diff);
	btrfs_set_item_key(leaf, &disk_key, slot);
	if (slot == 0)
-	fixup_low_keys(root, path, &disk_key, 1);
+	fixup_low_keys(root->fs_info, path, &disk_key, 1);
}
 
item = btrfs_item_nr(slot);
@@ -4727,7 +4730,7 @@ void setup_items_for_insert(struct btrfs_root *root, struct btrfs_path *path,
 
	if (path->slots[0] == 0
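
A toy model of the fixup_low_keys behaviour the patch touches, with toy
types rather than the kernel's structures: when the lowest key of a node
changes, every ancestor that reaches that node through slot 0 stores the
same key and must be rewritten, until a non-zero slot is hit.

#include <stdio.h>

#define MAX_LEVEL 4

struct toy_node { int keys[8]; };

struct toy_path {
	struct toy_node *nodes[MAX_LEVEL];
	int slots[MAX_LEVEL];
};

/* mirrors the loop shape of the kernel function, nothing more */
static void toy_fixup_low_keys(struct toy_path *path, int new_key, int level)
{
	for (int i = level; i < MAX_LEVEL; i++) {
		if (!path->nodes[i])
			break;
		path->nodes[i]->keys[path->slots[i]] = new_key;
		if (path->slots[i] != 0)
			break;	/* higher ancestors no longer see this key */
	}
}

int main(void)
{
	struct toy_node leaf = { { 10, 20 } }, mid = { { 10, 50 } }, top = { { 5, 10 } };
	struct toy_path path = { { &leaf, &mid, &top, NULL }, { 0, 0, 1, 0 } };

	toy_fixup_low_keys(&path, 7, 0);
	printf("%d %d %d\n", leaf.keys[0], mid.keys[0], top.keys[1]);	/* 7 7 7 */
	return 0;
}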

bad areas cause btrfs segfault

2014-09-28 Thread Daniel Holth
I've got a couple of directories that cause a btrfs segfault. First
one happened at the end of July and I just renamed it to get it out of
my way (can't delete it without crashing); the second one just
happened and I'll be discarding the filesystem.

This "crash when touched" behavior is frustrating because it makes it
iffy to back up everything else. Usually, around the second attempt to
touch the bad directory, a reboot is required.

Instead, I would prefer that the filesystem not crash the whole system
when it encounters a corrupted area.

I've tried btrfs scrub and btrfs check but they don't find
anything wrong. I guess the next step would be btrfs restore, but I
think I have a good enough backup made with a normal copy skipping the
two corrupted directories.

Here's my info.

$   uname -a
Linux cardamom 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux

$   btrfs --version
Btrfs v3.12

$   btrfs fi show
Btrfs v3.12

$   btrfs fi df /home # Replace /home with the mount point of your
btrfs-filesystem
Data, single: total=110.01GiB, used=108.09GiB
System, DUP: total=8.00MiB, used=20.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=3.00GiB, used=2.31GiB
Metadata, single: total=8.00MiB, used=0.00
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.13.0-36-generic (buildd@toyol) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014 (Ubuntu 3.13.0-36.63-generic 3.13.11.6)
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.13.0-36-generic root=UUID=09bae76d-bf8a-47a7-998d-a929626274c1 ro quiet splash vt.handoff=7
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009ebff] usable
[0.00] BIOS-e820: [mem 0x0009ec00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e4000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xd3f7] usable
[0.00] BIOS-e820: [mem 0xd3f8-0xd3f8dfff] ACPI data
[0.00] BIOS-e820: [mem 0xd3f8e000-0xd3fc] ACPI NVS
[0.00] BIOS-e820: [mem 0xd3fd-0xd3ff] reserved
[0.00] BIOS-e820: [mem 0xff70-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x0001abff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: System manufacturer System Product Name/M3A78-EM, BIOS 180505/19/2009
[0.00] e820: update [mem 0x-0x0fff] usable == reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] No AGP bridge found
[0.00] e820: last_pfn = 0x1ac000 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-E uncachable
[0.00]   F-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 00 mask FF8000 write-back
[0.00]   1 base 008000 mask FFC000 write-back
[0.00]   2 base 00C000 mask FFF000 write-back
[0.00]   3 base 00D000 mask FFFC00 write-back
[0.00]   4 disabled
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] TOM2: 0001ac00 aka 6848M
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] e820: update [mem 0xd400-0x] usable == reserved
[0.00] e820: last_pfn = 0xd3f80 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [880ff780]
[0.00] Scanning 1 areas for low memory corruption
[0.00] Base memory trampoline at [88098000] 98000 size 24576
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00]  [mem 0x-0x000f] page 4k
[0.00] BRK [0x01fdf000, 0x01fd] PGTABLE
[0.00] BRK [0x01fe, 0x01fe0fff] PGTABLE
[0.00] BRK [0x01fe1000, 0x01fe1fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x1abe0-0x1abff]
[0.00]  [mem 0x1abe0-0x1abff] page 2M
[0.00] BRK [0x01fe2000, 0x01fe2fff] PGTABLE
[0.00] init_memory_mapping: [mem 0x1a800-0x1abdf]
[0.00]  [mem 0x1a800-0x1abdf] page 2M
[0.00] init_memory_mapping: [mem 0x18000-0x1a7ff]
[0.00]  [mem 0x18000-0x1a7ff] page 2M
[0.00] init_memory_mapping: [mem 0x0010-0xd3f7]
[0.00]  [mem 0x0010-0x001f] page 4k
[  

Re: bad areas cause btrfs segfault

2014-09-28 Thread Daniel Holth
Thanks

On Sun, Sep 28, 2014 at 9:38 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Hi,

 This bug seems to be one reported before:
 http://article.gmane.org/gmane.comp.file-systems.btrfs/33270

 And Chris has already updated the 3.13 stable branch to fix the bug.

 If it is OK for you, updating the kernel to 3.14 would be a solution.
 (Since 3.15, the new btrfs workqueue implementation has caused some bugs
 that will be fixed in 3.17,
 so 3.15~3.16 is not recommended.)

 Thanks
 Qu

  Original Message 
 Subject: bad areas cause btrfs segfault
 From: Daniel Holth dho...@gmail.com
 To: linux-btrfs@vger.kernel.org
 Date: 2014年09月29日 09:11

 I've got a couple of directories that cause a btrfs segfault. First
 one happened at the end of July and I just renamed it to get it out of
 my way (can't delete it without crashing); the second one just
 happened and I'll be discarding the filesystem.

 This "crash when touched" behavior is frustrating because it makes it
 iffy to back up everything else. Usually, around the second attempt to
 touch the bad directory, a reboot is required.

 Instead, I would prefer that the filesystem not crash the whole system
 when it encounters a corrupted area.

 I've tried btrfs scrub and btrfs check but they don't find
 anything wrong. I guess the next step would be btrfs restore, but I
 think I have a good enough backup made with a normal copy skipping the
 two corrupted directories.

 Here's my info.

 $   uname -a
 Linux cardamom 3.13.0-36-generic #63-Ubuntu SMP Wed Sep 3 21:30:07 UTC
 2014 x86_64 x86_64 x86_64 GNU/Linux

 $   btrfs --version
 Btrfs v3.12

 $   btrfs fi show
 Btrfs v3.12

 $   btrfs fi df /home # Replace /home with the mount point of your
 btrfs-filesystem
 Data, single: total=110.01GiB, used=108.09GiB
 System, DUP: total=8.00MiB, used=20.00KiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=3.00GiB, used=2.31GiB
 Metadata, single: total=8.00MiB, used=0.00




3.14.18 btrfs_set_item_key_safe BUG

2014-09-15 Thread Daniel J Blueman
On 3.14.18 with a BTRFS partition mounted
noatime,autodefrag,compress=lzo, I see the second assertion in
btrfs_set_item_key_safe() trip:

void btrfs_set_item_key_safe(struct btrfs_root *root, struct btrfs_path *path,
			     struct btrfs_key *new_key)
{
	struct btrfs_disk_key disk_key;
	struct extent_buffer *eb;
	int slot;

	eb = path->nodes[0];
	slot = path->slots[0];
	if (slot > 0) {
		btrfs_item_key(eb, &disk_key, slot - 1);
		BUG_ON(comp_keys(&disk_key, new_key) >= 0);
	}
	if (slot < btrfs_header_nritems(eb) - 1) {
		btrfs_item_key(eb, &disk_key, slot + 1);
		BUG_ON(comp_keys(&disk_key, new_key) <= 0); <---
	}

Full backtrace:

kernel BUG at /home/apw/COD/linux/fs/btrfs/ctree.c:3215!
invalid opcode:  [#1] SMP
Modules linked in: nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc
bonding psmouse serio_raw joydev video mac_hid lpc_ich lp parport
hid_generic usbhid hid bcache btrfs raid10 raid456 async_raid6_recov
async_pq raid6_pq async_xor ahci xor async_memcpy libahci async_tx
raid1 e1000e ptp pps_core raid0 multipath linear
CPU: 0 PID: 6742 Comm: btrfs-endio-wri Not tainted
3.14.18-031418-generic #201409060201
Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0b 09/17/2012
task: 880418609d70 ti: 880121e92000 task.ti: 880121e92000
RIP: 0010:[a01693f1] [a01693f1]
btrfs_set_item_key_safe+0x141/0x150 [btrfs]
RSP: 0018:880121e93b28 EFLAGS: 00010246
RAX:  RBX: 0011 RCX: 3e60
RDX:  RSI: 880121e93c67 RDI: 880121e93b07
RBP: 880121e93b88 R08: 1000 R09: 880121e93b48
R10:  R11:  R12: 88009ce9bcc0
R13: 880121e93c67 R14: 880121e93b47 R15: 8804145f7c60
FS: () GS:88042fc0() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 7ff34a8d1890 CR3: 01c0d000 CR4: 001407f0
Stack:
 880121e93b88 880405ca 8803140d1000 d900
 6c00f6bf 3e60 880121e93b88 8804145f7c60
 88009ce9bcc0 3e5e 0001 0c46
Call Trace:
 [a01a1868] __btrfs_drop_extents+0x5a8/0xc80 [btrfs]
 [a0165e00] ? tree_mod_log_free_eb+0x240/0x260 [btrfs]
 [a0191d6b]
insert_reserved_file_extent.constprop.60+0xab/0x310 [btrfs]
 [a018ee10] ? start_transaction.part.35+0x80/0x540 [btrfs]
 [a0198565] btrfs_finish_ordered_io+0x465/0x500 [btrfs]
 [a0198615] finish_ordered_fn+0x15/0x20 [btrfs]
 [a01bd8f0] worker_loop+0xa0/0x330 [btrfs]
 [a01bd850] ? check_pending_worker_creates.isra.1+0xe0/0xe0 [btrfs]
 [810930c9] kthread+0xc9/0xe0
 [81093000] ? flush_kthread_worker+0xb0/0xb0
 [81784abc] ret_from_fork+0x7c/0xb0
 [81093000] ? flush_kthread_worker+0xb0/0xb0
Code: 00 00 4c 89 f6 4c 89 e7 48 98 48 8d 04 80 48 8d 54 80 65 e8 b2
6c 04 00 4c 89 ee 4c 89 f7 e8 d7 f4 ff ff 85 c0 0f 8f 5c ff ff ff 0f
0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55

After rebooting, btrfs check (btrfs-tools 3.14.1-1) shows:

checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 5 inode 16170969 errors 80, file extent overlap
root 5 inode 17592262 errors 100, file extent discount
found 752124326140 bytes used err is 1
total csum bytes: 2415994160
total tree bytes: 18200276992
total fs tree bytes: 14156120064
total extent tree bytes: 1240526848
btree space waste bytes: 2998597745
file data blocks allocated: 2473980772352
 referenced 2731118456832

Is it better to not trust compression, autodefrag, or is this
filesystem corruption from previous issues, so I should rebuild the
FS?

Thanks,
  Daniel
-- 
Daniel J Blueman


Re: btrfs receive problem on ARM kirkwood NAS with kernel 3.16.0 and btrfs-progs 3.14.2

2014-08-19 Thread Daniel Mizyrycki

Thank you Hugo!  Amazing. It almost work all the way,

According to some tests I did, echo 2 > /proc/cpu/alignment does in fact
allow btrfs receive to work in most cases. For the tests, an x86_64 box
for send, an armv5tel for receive and 2 subvolumes (one with just a few
data and binary files and the other a full root partition) were used.
The send blobs were md5summed and verified to match on the receive side.
The small blob was properly processed by btrfs receive (file sha1s and
metadata all matched).

The big blob with the root partition only partially succeeded, as it
ended abruptly with ERROR: lsetxattr var/log/journal
system.posix_acl_default=. failed. Operation not supported. I checked
a few restored files and their sha1 and metadata matched.
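
For anyone wanting to repeat the test, a minimal version of the
workaround on the receiving ARM box looks like this (sketch; the
stream file name is made up, and 2 = silently fix up userspace
unaligned accesses, 3 = fix up and also warn in dmesg):

  cat /proc/cpu/alignment       # note the current fault counters
  echo 2 > /proc/cpu/alignment
  btrfs receive /backup < snapshot.send
  cat /proc/cpu/alignment       # a rising "User" count confirms unaligned accesses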

Daniel


On 08/19/14 15:22, Hugo Mills wrote:

On Tue, Aug 19, 2014 at 03:10:55PM -0700, Zach Brown wrote:

On Sun, Aug 17, 2014 at 02:44:34PM +0200, Klaus Holler wrote:

Hello list,

I want to use an ARM kirkwood based NSA325v2 NAS (dubbed Receiver) for
receiving btrfs snapshots done on several hosts, e.g. a Core Duo laptop
running kubuntu 14.04 LTS (dubbed Source), storing them on a 3TB WD
red disk (having GPT label, partitions created with parted).

But all the btrfs receive commands on 'Receiver' fail soon with e.g.:
   ERROR: writing to initrd.img-3.13.0-24-generic.original failed. File
too large
... and that stops reception/snapshot creation.


...


Increasing the verbosity with -v -v for btrfs receive shows the
following differences between receive operations on 'Receiver' and
'OtherHost', both of them using the identical inputfile
/boot/.snapshot/20140816-1310-boot_kernel3.16.0.btrfs-send

* the chown and chmod operations are different - resulting in
weird/wrong permissions and sizes on 'Receiver' side.
* what's stransid, this is the first line that differs


This is interesting, thanks for going to the trouble to show those
diffs.

That the commands and strings match up show us that the basic tlv header
chaining is working.  But the u64 attribute values are sometimes messed
up.  And messed up in a specific way.  A variable number of low order
bytes are magically appearing.

(gdb) print/x 11709972488
$2 = 0x2b9f80008
(gdb) print/x 178680
$3 = 0x2b9f8

(gdb) print/x 588032
$6 = 0x8f900
(gdb) print/x 2297
$7 = 0x8f9

Some light googling makes me think that the Marvell Kirkwood is not
friendly at all to unaligned accesses.


ARM isn't in general -- it never has been, even 20 years ago in the
ARM3 days when I was writing code in ARM assembler. We've been bitten
by this before in btrfs (mkfs on ARM works, mounting it fails fast,
because userspace has a trap to fix unaligned accesses, and the kernel
doesn't).


The (biting tongue) send and receive code is playing some games with
casting aligned and unaligned pointers.  Maybe that's upsetting the arm
toolchain/kirkwood.


Almost certainly the toolchain isn't identifying the unaligned
accesses, and thus building code that uses them causes stuff to break.

There's a workaround for userspace that you can use to verify that
this is indeed the problem: echo 2 > /proc/cpu/alignment will tell the
kernel to fix up unaligned accesses initiated in userspace. It's a
performance killer, but it should serve to identify whether the
problem is actually this.

Hugo.


  Does this completely untested patch to btrfs-progs,
to be run on the receiver, do anything?

- z

diff --git a/send-stream.c b/send-stream.c
index 88e18e2..4f8dd83 100644
--- a/send-stream.c
+++ b/send-stream.c
@@ -204,7 +204,7 @@ out:
 int __len; \
 TLV_GET(s, attr, (void**)&__tmp, &__len); \
 TLV_CHECK_LEN(sizeof(*__tmp), __len); \
-   *v = le##bits##_to_cpu(*__tmp); \
+   *v = get_unaligned_le##bits(__tmp); \
 } while (0)

  #define TLV_GET_U8(s, attr, v) TLV_GET_INT(s, attr, 8, v)





3.15 btrfs free space cache oops

2014-08-11 Thread Daniel J Blueman
When running MonetDB against a BTRFS RAID-0 set over 4 SSDs [1] on
3.15.5, we see io_ctl have a bad address of 0x20, causing a fatal
pagefault in memcpy():

(gdb) list *(__btrfs_write_out_cache+0x3e4)
0x81365984 is in __btrfs_write_out_cache
(fs/btrfs/free-space-cache.c:521).
516		if (io_ctl->index >= io_ctl->num_pages)
517			return -ENOSPC;
518		io_ctl_map_page(io_ctl, 0);
519	}
520
521	memcpy(io_ctl->cur, bitmap, PAGE_CACHE_SIZE);
522	io_ctl_set_crc(io_ctl, io_ctl->index - 1);
523	if (io_ctl->index < io_ctl->num_pages)
524		io_ctl_map_page(io_ctl, 0);
525	return 0;

I can try to reproduce it if more data is useful?

Thanks,
  Daniel

-- [1]

mkfs.btrfs -f -m raid0 -d raid0 -n 16k -l 16k -O skinny-metadata
/dev/sda2 /dev/sdc2 /dev/sdb2 /dev/sdd2
mount /dev/sda2 /scratch -o noatime,discard,nodatasum,nobarrier,ssd_spread

-- [2]

BUG: unable to handle kernel paging request at 0020
IP: [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
PGD 3bca02c067 PUD 3bcf5fb067 PMD 0
Oops:  [#1] SMP
Modules linked in:
CPU: 34 PID: 46645 Comm: mserver5 Not tainted 3.15.5-server #7
Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.1.1 [1.1.54] 10/16/2013
task: 880a8c7234f0 ti: 8809aefcc000 task.ti: 8809aefcc000
RIP: 0010:[8135a374] [8135a374]
__btrfs_write_out_cache+0x3e4/0x8e0
RSP: 0018:8809aefcfc40 EFLAGS: 00010246
RAX: 004fb9321000 RBX: 8809aefcfca8 RCX: 0200
RDX: 1000 RSI: 0020 RDI: 884fb9321000
RBP: 8809aefcfd48 R08: 0200 R09: 
R10:  R11: 884fb9320ffc R12: 8831e3303740
R13: 880100579970 R14: 880bb38061c0 R15: 0020
FS: 7fb9447ed700() GS:884bbfc8() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 0020 CR3: 00329b71c000 CR4: 000407e0
Stack:
 8809aefcfc90 0011 000e 884fbbc2c870
 880bb38061c0 8809aefcfc90 880bb3806058 880b02ec
 883bcd523800 8833d338f2c0 88476b1eb4e0 00b890cde000
Call Trace:
 [81a75b4b] ? _raw_spin_lock+0xb/0x20
 [8135c0e1] btrfs_write_out_cache+0xb1/0xf0
 [8130be0b] btrfs_write_dirty_block_groups+0x58b/0x670
 [813199c5] commit_cowonly_roots+0x195/0x250
 [8131b92f] btrfs_commit_transaction+0x41f/0x9b0
 [81358e85] ? btrfs_log_dentry_safe+0x55/0x70
 [8132b6b2] btrfs_sync_file+0x182/0x2a0
 [8114a450] do_fsync+0x50/0x80
 [8114a6de] SyS_fdatasync+0xe/0x20
 [81a766e6] system_call_fastpath+0x1a/0x1f
Code: ff 4d 89 fc 49 89 c7 e9 ab 00 00 00 0f 1f 00 40 f6 c7 02 0f 85
fe 00 00 00 40 f6 c7 04 0f 85 14 01 00 00 89 d1 c1 e9 03 f6 c2 04 f3
48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
 RSP 8809aefcfc40
CR2: 0020
-- 
Daniel J Blueman




Re: Questions on incremental backups

2014-07-18 Thread Daniel Mizyrycki


On 07/18/14 06:40, Russell Coker wrote:

Displaying backups is an issue of backup software. It is above the
level that BTRFS development touches. While people here can probably
offer generic advice on backup software it's not the topic of the
list.


As said, I don't mind developing the software. But, is the required
information easily available? Is there a way to get a diff, something
like a list of changed/added/removed files between snapshots?


Your usual diff utility will do it.  I guess you could parse the output of
btrfs send.
Following this thought, one step closer to getting a text diff is to
use fardump. It takes a btrfs send binary stream and outputs the send
instructions in plaintext.
(https://kernel.googlesource.com/pub/scm/linux/kernel/git/arne/far-progs).
It certainly would be awesome if btrfs-progs could have an extra 
parameter to just generate the list of changed/added/removed files 
between snapshots as all the needed infrastructure is already in place.
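
Until then, a crude sketch with standard tools (the snapshot paths are
made up):

  # recursive content diff between two read-only snapshots -- reliable,
  # but it reads all the data on both sides:
  diff -rq /snaps/home.2014-07-17 /snaps/home.2014-07-18

  # much cheaper mtime-based approximation with GNU find (misses
  # deletions and metadata-only changes):
  find /snaps/home.2014-07-18 -type f -newermt '2014-07-17 03:00'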





And, finally, nobody has mentioned on the possibility of merging
multiple snapshots into a single snapshot. Would this be possible, to
create a snapshot that contains the most recent version of each file
present across all of the snapshots (including files which may be
present in only one of the snapshots)?


There is no btrfs functionality for that.  But I'm sure you could do something
with standard Unix utilities and copying files around.
Sure, but the management of data deduplication is left to the user 
(presumably using cp --reflink) which is not trivial.

Does anybody knows how safe it is to use duperemove or bedup?
Any recommendations on how to effectively deduplicate btrfs at this point?


btrfs data dup on single device?

2014-06-25 Thread Daniel Landstedt
Will it be possible to use DUP for data as well as for metadata on a
single device?
And if so, am I going to be able to specify more than 1 copy of the data?

Storage is pretty cheap now, and to have multiple copies in btrfs is
something that I think could be used a lot. I know I will use multiple
copies of my data if made possible.

Is it something that might be available when RAID1 gets N mirrors
instead of just 1 mirror?



Daniel


Re: btrfs data dup on single device?

2014-06-25 Thread Daniel Landstedt
It'll be exactly 2 copies at the moment. Note that performance on
 an SSD will at least halve, and performance on a rotational device
 will probably suck quite badly. Neither will help you in the case of a
 full-device failure. You still need backups, kept on a separate machine.


Write performance, sure, but reads shouldn't be that much slower?
For DUP on same device I was thinking about family photos, source code
and such, not for compiles or databases with a lot of queries.
Of course you need backups, offsite backups.. I had a fire a couple of
years ago, and, well.. If the second machine also is in the vicinity..
We were lucky this time, but a couple of more minutes and all would
have been lost. Got me thinking a bit more.

The question is, why? If you have enough disk media errors to make
 it worth using multiple copies, then your storage device is basically
 broken and needs replacing, and it can't really be relied on for very
 much longer.

I was thinking that DUP on same device was mostly for protection
against bit rot and smaller errors, not device failure.
If the device starts to misbehave, it might be enough to rescue the
data to another device if you have DUPes. Ok, a backup will probably
help there too.

I'm putting together a new server at home, and want the checksums in
btrfs, and multiple copies of the important data. As I understand it,
it's better than the RAID6 I used earlier, which has its own set of
problems.
And multiple offsite backups.

I'll try and see if it's possible to use DUP for data on the same
device; when I looked around, it seemed it wasn't possible.
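
For the record, the one workaround I've seen mentioned -- an untested
sketch, /dev/sdX being a placeholder -- is mixed block groups, where
data and metadata share chunks and therefore share the DUP profile
(only sensible on small filesystems):

  mkfs.btrfs --mixed -m dup -d dup /dev/sdX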


Hugo.




Daniel


Re: btrfs on whole disk (no partitions)

2014-06-21 Thread Daniel Cegiełka
2014-06-19 2:07 GMT+02:00 Russell Coker russ...@coker.com.au:

 For boot disks I use the traditional partitioning system.  So far I don't run
 any systems that have a boot disk larger than 2TB so I haven't needed to use
 GPT.

 I have a BTRFS RAID-1 on 2*3TB disks which have no partition tables, when the
 filesystem is going to use the entire device and there's no boot loader there
 is no reason to have a partition table.

ok, but what about alignment? This can have a significant impact on performance.
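
For whoever wants to check: with no partition table the filesystem
starts at sector 0, so there is no partition offset to misalign. What
the kernel thinks of the device can be sanity-checked, e.g. (sda as an
example):

  cat /sys/block/sda/alignment_offset          # 0 means nothing to compensate
  cat /sys/block/sda/queue/physical_block_size
  cat /sys/block/sda/queue/optimal_io_size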


Re: btrfs on whole disk (no partitions)

2014-06-21 Thread Daniel Cegiełka
2014-06-19 11:11 GMT+02:00 Imran Geriskovan imran.gerisko...@gmail.com:
 On 6/19/14, Russell Coker russ...@coker.com.au wrote:


 Grub installs itself and boots from a partitionless Btrfs disk.
 It is handy for straightforward installations.

 However, IF you need a boot partition (i.e. initramfs and kernel to boot
 from an encrypted root) it's another story.

zfs solved this problem in grub (libzfs). I think we can find a
similar workaround here.


btrfs on whole disk (no partitions)

2014-06-18 Thread Daniel Cegiełka
Hi,
I created btrfs directly to disk using such a scheme (no partitions):

dd if=/dev/zero of=/dev/sda bs=4096
mkfs.btrfs -L dev_sda /dev/sda
mount /dev/sda /mnt

cd /mnt
btrfs subvolume create __active
btrfs subvolume create __active/rootvol
btrfs subvolume create __active/usr
btrfs subvolume create __active/home
btrfs subvolume create __active/var
btrfs subvolume create __snapshots

cd /
umount /mnt
mount -o subvol=__active/rootvol /dev/sda /mnt
mkdir /mnt/{usr,home,var}
mount -o subvol=__active/usr /dev/sda /mnt/usr
mount -o subvol=__active/home /dev/sda /mnt/home
mount -o subvol=__active/var /dev/sda /mnt/var

# /etc/fstab
UUID=ID  /      btrfs  rw,relatime,space_cache,subvol=__active/rootvol  0 0
UUID=ID  /usr   btrfs  rw,relatime,space_cache,subvol=__active/usr      0 0
UUID=ID  /home  btrfs  rw,relatime,space_cache,subvol=__active/home     0 0
UUID=ID  /var   btrfs  rw,relatime,space_cache,subvol=__active/var      0 0

Everything works fine. Is such a solution recommended? In my opinion,
creating partitions seems completely unnecessary if you can use btrfs.

I will be grateful for your feedback.
Best regards,
Daniel


Re: Is metadata redundant over more than one drive with raid0 too?

2014-05-04 Thread Daniel Lee
On 05/04/2014 12:24 AM, Marc MERLIN wrote:
  
 Gotcha, thanks for confirming, so -m raid1 -d raid0 really only protects
 against metadata corruption or a single block loss, but otherwise if you
 lost a drive in a 2 drive raid0, you'll have lost more than just half
 your files.

 The scenario you mentioned at the beginning, if I lose a drive,
 I'll still have full metadata for the entire filesystem and only
 missing files is more applicable to using -m raid1 -d single.
 Single is not geared towards performance and, though it doesn't
 guarantee a file is only on a single disk, the allocation does mean
 that the majority of all files smaller than a chunk will be stored
 on only one disk or the other - not both.
 Ok, so in other words:
 -d raid0: if you lose 1 drive out of 2, you may end up with small files
 and the rest will be lost

 -d single: you're more likely to have files be on one drive or the
 other, although there is no guarantee there either.

 Correct?

 Thanks,
 Marc
This often seems to confuse people and I think there is a common
misconception that the btrfs raid/single/dup features work at the file
level when in reality they work at a level closer to lvm/md.

If someone told you that they lost a device out of a jbod or multi disk
lvm group(somewhat analogous to -d single) with ext on top you would
expect them to lose data in any file that had a fragment in the lost
region (let's ignore metadata for a moment). This is potentially up to
100% of the files but this should not be a surprising result. Similarly,
someone who has lost a disk out of a md/lvm raid0 volume should not be
surprised to have a hard time recovering any data at all from it.
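
A quick way to see this for yourself: the profiles are chosen per
chunk type at mkfs time and reported per chunk type afterwards
(sketch; the devices are examples):

  mkfs.btrfs -m raid1 -d single /dev/sdb /dev/sdc
  mount /dev/sdb /mnt
  btrfs filesystem df /mnt   # shows e.g. "Data, single" and "Metadata, RAID1"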



Re: Which companies are using Btrfs in production?

2014-04-24 Thread Daniel Lee
On 04/23/2014 06:19 PM, Marc MERLIN wrote:
 Oh while we're at it, are there companies that can say they are using btrfs
 in production?

 Marc
Netgear uses BTRFS as the filesystem in their refreshed ReadyNAS line.
They apparently use Oracle's linux distro so I assume they're relying on
them to do most of the heavy lifting as far as supporting BTRFS and
backporting goes since they're still on 3.0! They also have raid5/6
support so they are probably running BTRFS on top of md.

http://www.netgear.com/images/BTRFS%20on%20ReadyNAS%20OS%206_9May1318-76105.pdf



3.13.5 btrfs read() oops

2014-03-07 Thread Daniel J Blueman
With kernel 3.13.5 (Ubuntu mainline), when plugging in a (evidently
twitchy) USB3 stick with a BTRFS filesystem, I hit an oops in read()
[1].

Full dmesg output is at:
http://quora.org/2014/btrfs-oops.txt

Thanks,
  Daniel

-- [1]

IP: 0010:[8135eaf6] [8135eaf6] memcpy+0x6/0x110
RSP: 0018:88025fa1b910 EFLAGS: 00010207
RAX: 88005c3d906e RBX: 027e RCX: 027e
RDX: 027e RSI: 00050800 RDI: 88005c3d906e
RBP: 88025fa1b948 R08: 1000 R09: 88025fa1b918
R10:  R11:  R12: 8800560e6350
R13: 1600 R14: 88005c3d92ec R15: 027e
FS: 7f9272f79700() GS:88026f3c() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 7f9264010018 CR3: 00025f79a000 CR4: 001407e0
Stack:
 a036401c 1000 8800837f3800 8801e041a000
  8800763df218 880064c8c4c0 88025fa1ba08
 a0348f9c 0f18  1000
Call Trace:
 [a036401c] ? read_extent_buffer+0xbc/0x110 [btrfs]
 [a0348f9c] btrfs_get_extent+0x91c/0x970 [btrfs]
 [a0360217] __do_readpage+0x357/0x730 [btrfs]
 [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
 [a0360972] __extent_readpages.constprop.41+0x2a2/0x2c0 [btrfs]
 [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
 [a03627f6] extent_readpages+0x1b6/0x1c0 [btrfs]
 [a0348680] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
 [81192f03] ? alloc_pages_current+0xa3/0x160
 [a03467df] btrfs_readpages+0x1f/0x30 [btrfs]
 [811578d9] __do_page_cache_readahead+0x1b9/0x270
 [81157dd2] ondemand_readahead+0x152/0x2a0
 [81157f51] page_cache_sync_readahead+0x31/0x50
 [8114d655] generic_file_aio_read+0x4c5/0x700
 [811b671a] do_sync_read+0x5a/0x90
 [811b6db5] vfs_read+0x95/0x160
 [811b78c9] SyS_read+0x49/0xa0
 [81715bff] tracesys+0xe1/0xe6
-- 
Daniel J Blueman


Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Daniel Lee
On 02/14/2014 03:04 AM, Axelle wrote:
 Hi Hugo,

 Thanks for your answer.
 Unfortunately, I had also tried

 sudo mount -o degraded /dev/sdc1 /samples
 mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

 and dmesg says:
 [ 1177.695773] btrfs: open_ctree failed
 [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 1 transid 31105 /dev/sdc6
 [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 1 transid 31105 /dev/sdc6
 [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 4013.408280] btrfs: allowing degraded mounts
 [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, gen 0
 [ 4015.600424] Btrfs: too many missing devices, writeable mount is not allowed
 [ 4015.630841] btrfs: open_ctree failed
Did the crashed /dev/sdb have more than 1 partition in your raid1
filesystem?

 Yes, I know, I'll probably be losing a lot of data, but it's not too
 much my concern because I had a backup (sooo happy about that :D). If
 I can manage to recover a little more on the btrfs volume it's bonus,
 but in the event I do not, I'll be using my backup.

 So, how do I fix my volume? I guess there would be a solution apart
 from scratching/deleting everything and starting again...


 Regards,
 Axelle



 On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills h...@carfax.org.uk wrote:
 On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
 Hi,
 I've just encountered a hard disk crash in one of my btrfs pools.

 sudo btrfs filesystem show
 failed to open /dev/sr0: No medium found
 Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
 Total devices 3 FS bytes used 112.70GB
 devid1 size 100.61GB used 89.26GB path /dev/sdc6
 devid2 size 93.13GB used 84.00GB path /dev/sdc1
 *** Some devices missing

 The device which is missing is /dev/sdb. I have replaced it with a new
 hard disk. How do I add it back to the volume and fix the device
 missing?
 The pool is expected to mount to /samples (it is not mounted yet).

 I tried this - which fails:
 sudo btrfs device add /dev/sdb /samples
 ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device

 Why isn't this working?
Because it's not mounted. :)

 I also tried this:
 sudo mount -o recovery /dev/sdc1 /samples
 mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so
 same with /dev/sdc6
Close, but what you want here is:

 mount -o degraded /dev/sdc1 /samples

 not recovery. That will tell the FS that there's a missing disk, and
 it should mount without complaining. If your data is not RAID-1 or
 RAID-10, then you will almost certainly have lost some data.

At that point, since you've removed the dead disk, you can do:

 btrfs device delete missing /samples

 which forcibly removes the record of the missing device.

Then you can add the new device:

 btrfs device add /dev/sdb /samples

And finally balance to repair the RAID:

 btrfs balance start /samples

It's worth noting that even if you have RAID-1 data and metadata,
 losing /dev/sdc in your current configuration is likely to cause
 severe data loss -- probably making the whole FS unrecoverable. This
 is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices,
 and will happily put both copies of a piece of RAID-1 data (or
 metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore
 wouldn't recommend running like that for very long.

Hugo.

 --
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- All hope abandon,  Ye who press Enter here. ---



Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Daniel Lee
On 02/14/2014 07:22 AM, Axelle wrote:
 Did the crashed /dev/sdb have more than 1 partitions in your raid1
 filesystem?
 No, only 1 - as far as I recall.

 -- Axelle.
What does:

btrfs filesystem df /samples

say now that you've mounted the fs readonly?
 On Fri, Feb 14, 2014 at 3:58 PM, Daniel Lee longinu...@gmail.com wrote:
 On 02/14/2014 03:04 AM, Axelle wrote:
 Hi Hugo,

 Thanks for your answer.
 Unfortunately, I had also tried

 sudo mount -o degraded /dev/sdc1 /samples
 mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

 and dmesg says:
 [ 1177.695773] btrfs: open_ctree failed
 [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 1 transid 31105 /dev/sdc6
 [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 1 transid 31105 /dev/sdc6
 [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 4013.408280] btrfs: allowing degraded mounts
 [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, 
 gen 0
 [ 4015.600424] Btrfs: too many missing devices, writeable mount is not 
 allowed
 [ 4015.630841] btrfs: open_ctree failed
 Did the crashed /dev/sdb have more than 1 partitions in your raid1
 filesystem?
 Yes, I know, I'll probably be losing a lot of data, but it's not too
 much my concern because I had a backup (sooo happy about that :D). If
 I can manage to recover a little more on the btrfs volume it's bonus,
 but in the event I do not, I'll be using my backup.

 So, how do I fix my volume? I guess there would be a solution apart
 from scratching/deleting everything and starting again...


 Regards,
 Axelle



 On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills h...@carfax.org.uk wrote:
 On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
 Hi,
 I've just encountered a hard disk crash in one of my btrfs pools.

 sudo btrfs filesystem show
 failed to open /dev/sr0: No medium found
 Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
 Total devices 3 FS bytes used 112.70GB
 devid1 size 100.61GB used 89.26GB path /dev/sdc6
 devid2 size 93.13GB used 84.00GB path /dev/sdc1
 *** Some devices missing

 The device which is missing is /dev/sdb. I have replaced it with a new
 hard disk. How do I add it back to the volume and fix the device
 missing?
 The pool is expected to mount to /samples (it is not mounted yet).

 I tried this - which fails:
 sudo btrfs device add /dev/sdb /samples
 ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for device

 Why isn't this working?
Because it's not mounted. :)

 I also tried this:
 sudo mount -o recovery /dev/sdc1 /samples
 mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so
 same with /dev/sdc6
Close, but what you want here is:

 mount -o degraded /dev/sdc1 /samples

 not recovery. That will tell the FS that there's a missing disk, and
 it should mount without complaining. If your data is not RAID-1 or
 RAID-10, then you will almost certainly have lost some data.

At that point, since you've removed the dead disk, you can do:

 btrfs device delete missing /samples

 which forcibly removes the record of the missing device.

Then you can add the new device:

 btrfs device add /dev/sdb /samples

And finally balance to repair the RAID:

 btrfs balance start /samples

It's worth noting that even if you have RAID-1 data and metadata,
 losing /dev/sdc in your current configuration is likely to cause
 severe data loss -- probably making the whole FS unrecoverable. This
 is because the FS sees /dev/sdc1 and /dev/sdc6 as independent devices,
 and will happily put both copies of a piece of RAID-1 data (or
 metadata) on /dev/sdc -- one on each of sdc1 and sdc6. I therefore
 wouldn't recommend running like that for very long.

Hugo.

 --
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- All hope abandon,  Ye who press Enter here. ---

Re: Recovering from hard disk failure in a pool

2014-02-14 Thread Daniel Lee
On 02/14/2014 09:53 AM, Axelle wrote:
 Hi Daniel,

 This is what it answers now:

 sudo btrfs filesystem df /samples
 [sudo] password for axelle:
 Data, RAID0: total=252.00GB, used=108.99GB
 System, RAID1: total=8.00MB, used=28.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=5.25GB, used=3.71GB
So the issue here is that your data is raid0 which will not tolerate any
loss of a device. I'd recommend trashing the current filesystem and
creating a new one with some redundancy (use raid1 not raid0, don't add
more than one partition from the same disk to a btrfs filesystem, etc.)
so you can recover from this sort of scenario in the future. To do this,
use wipefs on the remaining partitions to remove all traces of the
current btrfs filesystem.
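
A sketch of that teardown/rebuild with this thread's device names
(double-check them before running anything destructive):

  umount /samples
  wipefs -a /dev/sdc1 /dev/sdc6       # drop the old btrfs signatures
  mkfs.btrfs -m raid1 -d raid1 /dev/sdb /dev/sdc1   # one device per disk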

 By the way, I was happy to recover most of my data :)

This is the nice thing about the checksumming in btrfs, knowing that
what data you did read off is correct. :)

 Of course, I still can't add my new /dev/sdb to /samples because it's 
 read-only:
 sudo btrfs device add /dev/sdb /samples
 ERROR: error adding the device '/dev/sdb' - Read-only file system

 Regards
 Axelle

 On Fri, Feb 14, 2014 at 5:19 PM, Daniel Lee longinu...@gmail.com wrote:
 On 02/14/2014 07:22 AM, Axelle wrote:
 Did the crashed /dev/sdb have more than 1 partitions in your raid1
 filesystem?
 No, only 1 - as far as I recall.

 -- Axelle.
 What does:

 btrfs filesystem df /samples

 say now that you've mounted the fs readonly?
 On Fri, Feb 14, 2014 at 3:58 PM, Daniel Lee longinu...@gmail.com wrote:
 On 02/14/2014 03:04 AM, Axelle wrote:
 Hi Hugo,

 Thanks for your answer.
 Unfortunately, I had also tried

 sudo mount -o degraded /dev/sdc1 /samples
 mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

 and dmesg says:
 [ 1177.695773] btrfs: open_ctree failed
 [ 1247.448766] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 1247.449700] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 1 transid 31105 /dev/sdc6
 [ 1247.458794] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 1247.459601] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 1 transid 31105 /dev/sdc6
 [ 4013.363254] device fsid 545e95c6-d347-4a8c-8a49-38b9f9cb9add devid
 2 transid 31105 /dev/sdc1
 [ 4013.408280] btrfs: allowing degraded mounts
 [ 4013.555764] btrfs: bdev (null) errs: wr 0, rd 14, flush 0, corrupt 0, 
 gen 0
 [ 4015.600424] Btrfs: too many missing devices, writeable mount is not 
 allowed
 [ 4015.630841] btrfs: open_ctree failed
 Did the crashed /dev/sdb have more than 1 partitions in your raid1
 filesystem?
 Yes, I know, I'll probably be losing a lot of data, but it's not too
 much my concern because I had a backup (sooo happy about that :D). If
 I can manage to recover a little more on the btrfs volume it's bonus,
 but in the event I do not, I'll be using my backup.

 So, how do I fix my volume? I guess there would be a solution apart
 from scratching/deleting everything and starting again...


 Regards,
 Axelle



 On Fri, Feb 14, 2014 at 11:58 AM, Hugo Mills h...@carfax.org.uk wrote:
 On Fri, Feb 14, 2014 at 11:35:56AM +0100, Axelle wrote:
 Hi,
 I've just encountered a hard disk crash in one of my btrfs pools.

 sudo btrfs filesystem show
 failed to open /dev/sr0: No medium found
 Label: none  uuid: 545e95c6-d347-4a8c-8a49-38b9f9cb9add
 Total devices 3 FS bytes used 112.70GB
 devid1 size 100.61GB used 89.26GB path /dev/sdc6
 devid2 size 93.13GB used 84.00GB path /dev/sdc1
 *** Some devices missing

 The device which is missing is /dev/sdb. I have replaced it with a new
 hard disk. How do I add it back to the volume and fix the device
 missing?
 The pool is expected to mount to /samples (it is not mounted yet).

 I tried this - which fails:
 sudo btrfs device add /dev/sdb /samples
 ERROR: error adding the device '/dev/sdb' - Inappropriate ioctl for 
 device

 Why isn't this working?
Because it's not mounted. :)

 I also tried this:
 sudo mount -o recovery /dev/sdc1 /samples
 mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so
 same with /dev/sdc6
Close, but what you want here is:

 mount -o degraded /dev/sdc1 /samples

 not recovery. That will tell the FS that there's a missing disk, and
 it should mount without complaining. If your data is not RAID-1 or
 RAID-10, then you will almost certainly have lost some data.

At that point, since you've removed the dead disk, you can do:

 btrfs device delete missing /samples

 which forcibly removes the record of the missing device.

Then you can add the new device:

 btrfs device add /dev/sdb /samples

And finally

Re: [PATCH] btrfs-progs: update INSTALL file

2014-01-27 Thread Daniel Cegiełka
2014-01-22 Anand Jain anand.j...@oracle.com:
 with the changes that has happened since last time it was updated

 Signed-off-by: Anand Jain anand.j...@oracle.com
 ---
  INSTALL |6 ++
  1 files changed, 2 insertions(+), 4 deletions(-)

 diff --git a/INSTALL b/INSTALL
 index 8ead607..a86878a 100644
 --- a/INSTALL
 +++ b/INSTALL
 @@ -12,7 +12,8 @@ complete:
  modprobe libcrc32c
  insmod btrfs.ko

 -The Btrfs utility programs require libuuid to build.  This can be found
 +The Btrfs utility programs (btrfs-progs) require libattr zlib libacl
 +e2fsprogs libblkid lzo2 to build.  This can be found
  in the e2fsprogs sources, and is usually available as libuuid or
  e2fsprogs-devel from various distros.

Which version of libblkid should we use? From e2fsprogs or from util-linux?

libattr/libacl: these libraries are not necessary - you need them only for btrfs-convert.

Daniel


Barrier remount failure

2013-12-25 Thread Daniel J Blueman
On 3.13-rc5, it's possible to remount a mounted BTRFS filesystem with
'nobarrier', but not possible to remount with 'barrier'.
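
For concreteness (the mount point is an example):

  mount -o remount,nobarrier /mnt   # accepted
  mount -o remount,barrier /mnt     # rejected on 3.13-rc5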

Is this expected?

Many thanks,
  Daniel
-- 
Daniel J Blueman


Re: Nagios probe for btrfs RAID status?

2013-11-23 Thread Daniel Pocock


On 23/11/13 04:59, Anand Jain wrote:
 
 
 For example, would the command

  btrfs filesystem show --all-devices

 give a non-zero error status or some other clue if any of the devices
 are at risk?
 
  No there isn't any good way as of now. That's something to fix.

Does it require kernel/driver code changes, or should it be possible to
implement in the user-space utility?

It would be useful for people testing the filesystem to know when they
get into trouble so they can investigate more quickly (and before the
point of no return)

 [btrfs personal user/sysadmin, not a dev, not anything large enough to
 have personal nagios experience...]
 
 AFAIK, btrfs raid modes currently switch the filesystem to read-only on
 any device-drop error. That has been deemed the simplest/safest policy
 during development, tho at some point as stable approaches the behavior
 could theoretically be made optional.

None of the warnings about btrfs's experimental status hint at that,
some people may be surprised by it.

 So detection could watch for read-only and act accordingly, either
 switching back to read-write or rebooting or simply logging the event,
 as deemed appropriate.

It would be relatively trivial to implement a Nagios check for
read-only; Nagios probes are just shell scripts.
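
For instance, the core test is a one-liner over /proc/mounts (sketch):

  # print the mount point of any btrfs filesystem currently read-only:
  awk '$3 == "btrfs" && $4 ~ /(^|,)ro(,|$)/ { print $2 }' /proc/mounts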

What about when btrfs detects a bad block checksum and recovers data
from the equivalent block on another disk?  The wiki says there will be
a syslog event.  Does btrfs keep any stats on the number of blocks that
it considers unreliable and can this be queried from user space?



Re: Nagios probe for btrfs RAID status?

2013-11-23 Thread Daniel Pocock


On 23/11/13 09:37, Daniel Pocock wrote:
 
 
 On 23/11/13 04:59, Anand Jain wrote:


 For example, would the command

  btrfs filesystem show --all-devices

 give a non-zero error status or some other clue if any of the devices
 are at risk?

  No there isn't any good way as of now. that's something to fix.
 
 Does it require kernel/driver code changes or it should be possible to
 implement in the user space utility?
 
 It would be useful for people testing the filesystem to know when they
 get into trouble so they can investigate more quickly (and before the
 point of no return)
 
 [btrfs personal user/sysadmin, not a dev, not anything large enough to
 have personal nagios experience...]

 AFAIK, btrfs raid modes currently switch the filesystem to read-only on
 any device-drop error. That has been deemed the simplest/safest policy
 during development, tho at some point as stable approaches the behavior
 could theoretically be made optional.
 
 None of the warnings about btrfs's experimental status hint at that,
 some people may be surprised by it.
 
 So detection could watch for read-only and act accordingly, either
 switching back to read-write or rebooting or simply logging the event,
 as deemed appropriate.
 
 It would be relatively trivial to implement a Nagios check for
 read-only, Nagios probes are just shell scripts

Just checked, it already exists, so we are half way there:

http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_ro_mounts/details


 
 What about when btrfs detects a bad block checksum and recovers data
 from the equivalent block on another disk?  The wiki says there will be
 a syslog event.  Does btrfs keep any stats on the number of blocks that
 it considers unreliable and can this be queried from user space?
 


Re: Nagios probe for btrfs RAID status?

2013-11-23 Thread Daniel Pocock


On 23/11/13 11:35, Duncan wrote:
 Daniel Pocock posted on Sat, 23 Nov 2013 09:37:50 +0100 as excerpted:
 
 What about when btrfs detects a bad block checksum and recovers data
 from the equivalent block on another disk?  The wiki says there will be
 a syslog event.  Does btrfs keep any stats on the number of blocks that
 it considers unreliable and can this be queried from user space?
 
 The way you phrased that question is strange to me (considers unreliable?
 does that mean ones that it had to fix, or ones that it had to fix more 
 than once, or...), so I'm not sure this answers it, but from the btrfs 
 manpage...


Let me clarify: when I said unreliable, I was referring to those blocks
where the block device driver reads the block without reporting any
error but where btrfs has decided the checksum is bad and not used the
data from the block.

Such blocks definitely exist. Sometimes the data was corrupted at the
moment of writing and no matter how many times you read the block, you
always get a bad checksum.



 
 btrfs device stats [-z] {path|device}
 
 Read and print the device IO stats for all devices of the filesystem 
 identified by path or for a single device.
 
 Options
 
 -z   Reset stats to zero after reading them.
 
 
 
 Here's the output for my (dual device btrfs raid1) rootfs, here:
 
 btrfs dev stat /
 [/dev/sdc5].write_io_errs   0
 [/dev/sdc5].read_io_errs0
 [/dev/sdc5].flush_io_errs   0
 [/dev/sdc5].corruption_errs 0
 [/dev/sdc5].generation_errs 0
 [/dev/sda5].write_io_errs   0
 [/dev/sda5].read_io_errs0
 [/dev/sda5].flush_io_errs   0
 [/dev/sda5].corruption_errs 0
 [/dev/sda5].generation_errs 0
 
 As you can see, for multi-device filesystems it gives the stats per 
 component device.  Any errors accumulate until a reset using -z, so you 
 can easily see if the numbers are increasing over time and by how much.
 


That looks interesting - are these explained anywhere?

Should a Nagios plugin just look for any non-zero value or just focus on
some of those?

Are they runtime stats (since system boot) or are they maintained in the
filesystem on disk?

My own version of the btrfs utility doesn't have that command though; I
am using a Debian stable system.  I tried a newer version and it gives

ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS)

so I probably need to update my kernel too.
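
In case it helps, here is an untested sketch of what such a probe
could look like, flagging any non-zero counter (it assumes a
btrfs-progs and kernel new enough for 'btrfs device stats'):

  #!/bin/sh
  # check_btrfs_stats <mountpoint> -- hypothetical Nagios probe sketch
  MOUNT=${1:-/}
  OUT=$(btrfs device stats "$MOUNT" 2>&1) || { echo "UNKNOWN: $OUT"; exit 3; }
  BAD=$(printf '%s\n' "$OUT" | awk '$2 != 0')
  if [ -n "$BAD" ]; then
      printf 'CRITICAL: non-zero btrfs error counters on %s\n%s\n' "$MOUNT" "$BAD"
      exit 2
  fi
  echo "OK: all btrfs device error counters are zero on $MOUNT"
  exit 0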




Nagios probe for btrfs RAID status?

2013-11-22 Thread Daniel Pocock

I just did a search and couldn't find any probe for btrfs RAID status

The check_raid plugin seems to recognise mdadm and various other types
of RAID but not btrfs

Has anybody seen a plugin for Nagios or could anybody comment on how it
should work if somebody wants to make one?

For example, would the command

btrfs filesystem show --all-devices

give a non-zero error status or some other clue if any of the devices
are at risk?




Re: btrfs-convert won't convert ext* - No valid Btrfs found on /dev/sdb1

2013-10-11 Thread Daniel
Hello Josef,

Josef Bacik jbacik at fusionio.com writes:

 
 On Thu, Sep 05, 2013 at 10:45:23AM -0500, Eric Sandeen wrote:

[...]

  This was a regression around July 3; there was no regression test at
  the time.
  
  [615f2867854c186a37cb2e2e5a2e13e9ed4ab0df] Btrfs-progs: cleanup similar 
code in open_ctree_* and close_ctree
  
  broke it.
  
  Patches were sent to the list to fix it on July 17,
  
  https://patchwork.kernel.org/patch/2828820/
  
  but they haven't been merged into the main repo.
  
  I sent a regression test for it to the list on Aug 4, but nobody
  reviewed it, so it hasn't been merged into the test suite, either.
  
  Winning all around!
 
 Alright, alright I'll review it, Jesus.  ;),

Is there any progress on this or can I help with solving this somehow?
 
 Josef

Daniel



btrfs_join_transaction bug...

2013-09-02 Thread Daniel J Blueman
When running my btrfs exerciser [1] for ~5 mins on 3.11-rc7, I hit a
BUG_ON in merge_reloc_roots that asserts btrfs_join_transaction
doesn't generate an error [2].

Is this a valid failure when the filesystem went read-only due to running out of space?

It can also reproduce a livelock between a 'btrfs filesystem balance'
and btrfs-transaction kthread.

Thanks,
  Daniel

--- [1]

1. boot your box with ramdisk_size set to a fifth of your box's
memory, eg ramdisk_size=1572864 for an 8GB box, if rd is compiled into
the kernel
2. install fio
3. fetch http://quora.org/2013/workload (for fio)
4. make sure /dev/ram{0-3} don't have any important data
5. run http://quora.org/2013/btrfsatron

--- [2]

$ sudo btrfs filesystem balance /tmp/btrfsathon
ERROR: defrag failed on /tmp/btrfsathon - Read-only file system
total 1 failures
ERROR: error during balancing '/tmp/btrfsathon' - Read-only file system
There may be more info in syslog - try dmesg | tail

$ dmesg
...
WARNING: CPU: 5 PID: 22243 at /home/apw/COD/linux/fs/btrfs/super.c:253
__btrfs_abort_transaction+0x135/0x140 [btrfs]()
btrfs: Transaction aborted (error -28)
Modules linked in: dm_crypt snd_hda_codec_hdmi ipt_REJECT xt_limit
xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp
nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables
arc4 b43 joydev mac80211 snd_hda_codec_cirrus rfcomm bnep cfg80211
snd_hda_intel snd_hda_codec ssb uvcvideo ax88179_178a usbnet snd_hwdep
applesmc videobuf2_vmalloc mii snd_pcm btusb videobuf2_memops
videobuf2_core input_polldev bluetooth snd_page_alloc videodev nfsd
snd_seq_midi snd_seq_midi_event auth_rpcgss snd_rawmidi bcm5974
nfs_acl snd_seq nfs snd_seq_device lockd snd_timer binfmt_misc bcma
mei_me lpc_ich sunrpc snd mei soundcore fscache apple_gmux mac_hid
apple_bl nls_iso8859_1 lp parport btrfs xor zlib_deflate raid6_pq
libcrc32c microcode hid_generic hid_apple usbhid hid nouveau i915
mxm_wmi wmi ttm i2c_algo_bit ahci drm_kms_helper libahci drm video
CPU: 5 PID: 22243 Comm: btrfs Not tainted 3.11.0-031100rc7-generic #201308252135
Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS
MBP101.88Z.00EE.B02.1208081132 08/08/2012
 00fd 8801c766d6c8 81720d7a 0007
 8801c766d718 8801c766d708 8106534c 0033
 8801f2793800 8802181d6780 ffe4 163d
Call Trace:
 [81720d7a] dump_stack+0x46/0x58
 [8106534c] warn_slowpath_common+0x8c/0xc0
 [81065436] warn_slowpath_fmt+0x46/0x50
 [a02c8290] ? lookup_extent_backref+0x60/0xf0 [btrfs]
 [a02b8cd5] __btrfs_abort_transaction+0x135/0x140 [btrfs]
 [a02caca5] __btrfs_free_extent+0x1e5/0x990 [btrfs]
 [a02cb59c] run_delayed_tree_ref+0x14c/0x1c0 [btrfs]
 [a02cf84e] run_one_delayed_ref+0xde/0xf0 [btrfs]
 [a02cf999] run_clustered_refs+0x139/0x530 [btrfs]
 [a02d3570] btrfs_run_delayed_refs+0x100/0x5a0 [btrfs]
 [a02e34fe] btrfs_commit_transaction+0xbe/0x9e0 [btrfs]
 [8172c44e] ? _raw_spin_lock+0xe/0x20
 [a02ce24f] ? btrfs_block_rsv_check+0x6f/0x90 [btrfs]
 [a02e41c0] __btrfs_end_transaction+0x350/0x390 [btrfs]
 [a02e4233] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
 [a0330fc4] relocate_block_group+0x434/0x570 [btrfs]
 [a03312b7] btrfs_relocate_block_group+0x1b7/0x2f0 [btrfs]
 [a03093b6] btrfs_relocate_chunk.isra.62+0x56/0x3e0 [btrfs]
 [a03080c9] ? should_balance_chunk.isra.66+0x49/0x2f0 [btrfs]
 [a030cda2] __btrfs_balance+0x312/0x3f0 [btrfs]
 [a030d1ba] btrfs_balance+0x33a/0x5d0 [btrfs]
 [a03162af] btrfs_ioctl_balance+0x22f/0x550 [btrfs]
 [a0317f09] btrfs_ioctl+0x4f9/0xa90 [btrfs]
 [8109caf6] ? account_user_time+0xa6/0xc0
 [8109d134] ? vtime_account_user+0x74/0x90
 [811c471c] do_vfs_ioctl+0x7c/0x2f0
 [810210a9] ? syscall_trace_enter+0x29/0x270
 [811c4a21] SyS_ioctl+0x91/0xb0
 [81735aaf] tracesys+0xe1/0xe6
---[ end trace 552316f62b37bc3a ]---
BTRFS error (device ram1) in __btrfs_free_extent:5693: errno=-28 No space left
BTRFS info (device ram1): forced readonly
BTRFS debug (device ram1): run_one_delayed_ref returned -28
BTRFS error (device ram1) in btrfs_run_delayed_refs:2677: errno=-28 No
space left
[ cut here ]
Kernel BUG at a0330b83 [verbose debug info unavailable]
invalid opcode:  [#1] SMP
Modules linked in: dm_crypt snd_hda_codec_hdmi ipt_REJECT xt_limit
xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp
nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables
arc4 b43 joydev mac80211 snd_hda_codec_cirrus rfcomm bnep cfg80211
snd_hda_intel snd_hda_codec ssb uvcvideo ax88179_178a usbnet snd_hwdep
applesmc videobuf2_vmalloc mii snd_pcm btusb

Re: Running Apache Derby on 3.8 and BTRFS cause kernel oops

2013-02-28 Thread Daniel Kozák
On Thu, 28 Feb 2013 16:47:21 +0100, Josef Bacik jba...@fusionio.com
wrote:



On Wed, Feb 27, 2013 at 03:59:35PM -0700, Blair Zajac wrote:

On 02/27/2013 02:08 PM, Josef Bacik wrote:
 On Wed, Feb 27, 2013 at 1:19 PM, Daniel Kozák kozz...@gmail.com  
wrote:


 [kozzi@KozziFX ~]$ mkdir derby
 [kozzi@KozziFX ~]$ cd derby/
 [kozzi@KozziFX derby]$ wget -c -q
  
http://mirror.hosting90.cz/apache//db/derby/db-derby-10.9.1.0/db-derby-10.9.1.0-bin.zip

 [kozzi@KozziFX derby]$ unzip -qq db-derby-10.9.1.0-bin.zip
 [kozzi@KozziFX derby]$ cd db-derby-10.9.1.0-bin/
 [kozzi@KozziFX db-derby-10.9.1.0-bin]$ DERBY_HOME=`pwd`
 [kozzi@KozziFX db-derby-10.9.1.0-bin]$ java -jar
 $DERBY_HOME/lib/derbyrun.jar server start &
 [kozzi@KozziFX db-derby-10.9.1.0-bin]$ java -jar
 $DERBY_HOME/lib/derbyrun.jar ij
 ij version 10.9
 ij> CONNECT 'jdbc:derby://localhost:1527/seconddb;create=true';

 BTW. after this I must restart my PC, and after restart, my system  
doesn't

 boot anymore :-) (some more btrfs oops).
 So I must use btrfs check --repair /dev/sdaX.


 Sigh and of course I can't reproduce myself, even with importing a
 huge database into derby.  So you are just mounting with -o
 compress=lzo?  What about the mkfs, are you using raid or anything?
 Are you on a ssd?  Also when this happens is there any output above
 the --- [ cut here ] ---?  There should be something about length and
 such.  Thanks,

I was able to reproduce with 3.8 using Ubuntu 13.04 running in KVM,
using the commands exactly as given, but only after stopping and
starting the server again.

I used the cloud image from here, booted off an Ubuntu CD-ROM ISO to
change from ext4 to btrfs, then installed openjdk.

http://cloud-images.ubuntu.com/raring/current/raring-server-cloudimg-amd64-disk1.img

I could make my image available for download later if you need it, in a
pre-failure state.  Let me know.



Yeah I still can't reproduce, can either of you send me your kernel
config so I can see if it's something in my config that's causing
problems?  Thanks,

Josef


Yes, here it is

--
Created with Opera's mail application: http://www.opera.com/mail/

config.gz
Description: GNU Zip compressed data


Re: Running Apache Derby on 3.8 and BTRFS cause kernel oops

2013-02-27 Thread Daniel Kozák


[kozzi@KozziFX ~]$ mkdir derby
[kozzi@KozziFX ~]$ cd derby/
[kozzi@KozziFX derby]$ wget -c -q  
http://mirror.hosting90.cz/apache//db/derby/db-derby-10.9.1.0/db-derby-10.9.1.0-bin.zip

[kozzi@KozziFX derby]$ unzip -qq db-derby-10.9.1.0-bin.zip
[kozzi@KozziFX derby]$ cd db-derby-10.9.1.0-bin/
[kozzi@KozziFX db-derby-10.9.1.0-bin]$ DERBY_HOME=`pwd`
[kozzi@KozziFX db-derby-10.9.1.0-bin]$ java -jar  
$DERBY_HOME/lib/derbyrun.jar server start &
[kozzi@KozziFX db-derby-10.9.1.0-bin]$ java -jar  
$DERBY_HOME/lib/derbyrun.jar ij

ij version 10.9
ij> CONNECT 'jdbc:derby://localhost:1527/seconddb;create=true';

BTW, after this I must restart my PC, and after the restart my system doesn't
boot anymore :-) (some more btrfs oopses).

So I must use btrfs check --repair /dev/sdaX.

On Wed, 27 Feb 2013 14:25:13 +0100 Josef Bacik jba...@fusionio.com wrote:



On Wed, Feb 27, 2013 at 06:20:16AM -0700, Daniel Kozák wrote:

Hello,

On both of my machines I have ArchLinux with a recent 3.8.0 kernel and btrfs
as the filesystem (with lzo compression). When I try to use Apache Derby
(create a database), I get this kernel oops almost every time:


Sweet, somebody else is hitting this and I haven't been able to reproduce.
Can you give me the exact commands you run so I can try and reproduce myself?
Thanks,

Josef



--
Created with Opera's mail client: http://www.opera.com/mail/


Debian 7 (wheezy) support for btrfs RAID

2012-08-29 Thread Daniel Pocock



I've been able to run the Debian 7 installer (beta1) and get a working
Debian system on btrfs RAID1 root FS.

A few manual steps and patches are required - it would be useful to get
feedback about this process.  I might have a go at patching partman to
fully support this through the installer menu.

I've written up the process as an install report bug:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686130

Any feedback is welcome.


Re: raw partition or LV for btrfs? (FAQ updated)

2012-08-28 Thread Daniel Pocock


On 22/08/12 17:42, David Sterba wrote:
 On Tue, Aug 14, 2012 at 07:23:48AM -0400, Calvin Walton wrote:
 A patch to add support for `btrfs fi defrag -c none file` or so would
 make this easier, and shouldn't be too hard to do :)
 
 This one is on my list of 'nice to have'; it needs the ioctl to be
 extended so that 'none' means actually use no compression during the
 defrag, while currently it means 'whatever compression the file has
 set'.
 

Thanks for all the feedback about this, I've tried to gather the
responses into the FAQ:

https://btrfs.wiki.kernel.org/index.php/FAQ#Interaction_with_partitions.2C_device_managers_and_logical_volumes
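
For reference, the recompress-during-defrag behaviour discussed above is
driven by the -c switch - a minimal sketch of current usage, with a
hypothetical file name:

# btrfs filesystem defragment -clzo /mnt/btrfs0/somefile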


Re: interaction with hardware RAID?

2012-08-27 Thread Daniel Pocock

Just following up on this... does anyone know if any of this is
technically feasible even if not implemented/supported today?

Also, do any hardware RAID1 implementations offer something like the
full btrfs checksum functionality?

I've seen HP promoting their 'Advanced Data Mirroring' in new Smart
Array products, but I've got no idea whether that is just a marketing
gimmick, like the way they use the name 'Advanced Data Guard' as a
moniker for RAID6.

Looking around on Google, I was pleasantly disturbed to find so many web
sites (including some vendors) using the term 'checksum' to refer to a
parity bit.

On 22/08/12 13:05, Daniel Pocock wrote:
 
 
 
 It is well documented that btrfs data recovery (after silent corruption)
 is dependent on the use of btrfs's own RAID1.
 
 However, I'm curious about whether any hardware RAID vendors are
 contemplating ways to integrate more closely with btrfs, for example,
 such that when btrfs detects a bad checksum, it would be able to ask the
 hardware RAID controller to return all alternate copies of the block.
 
 Is this technically possible within any hardware RAID device today, even
 though not implemented in btrfs?
 
 Has there been any suggestion that vendors would support this in future,
 presumably for the benefit of btrfs, ZFS and other checksumming filesystems?


interaction with hardware RAID?

2012-08-22 Thread Daniel Pocock



It is well documented that btrfs data recovery (after silent corruption)
is dependent on the use of btrfs's own RAID1.

However, I'm curious about whether any hardware RAID vendors are
contemplating ways to integrate more closely with btrfs, for example,
such that when btrfs detects a bad checksum, it would be able to ask the
hardware RAID controller to return all alternate copies of the block.

Is this technically possible within any hardware RAID device today, even
though not implemented in btrfs?

Has there been any suggestion that vendors would support this in future,
presumably for the benefit of btrfs, ZFS and other checksumming filesystems?


fail to mount after first reboot

2012-08-19 Thread Daniel Pocock


I created a 1TB RAID1.  So far it is just for testing, no important data
on there.


After a reboot, I tried to mount it again

# mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0
mount: wrong fs type, bad option, bad superblock on
/dev/mapper/vg00-btrfsvol0_0,
   missing codepage or helper program, or other error
   In some cases useful info is found in syslog - try
   dmesg | tail  or so

I checked dmesg:

[17216.145092] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
transid 34 /dev/mapper/vg00-btrfsvol0_0
[17216.145639] btrfs: disk space caching is enabled
[17216.146987] btrfs: failed to read the system array on dm-100
[17216.147556] btrfs: open_ctree failed


Then I ran btrfsck - it reported no errors, and afterwards the volume mounted OK:

# btrfsck /dev/mapper/vg00-btrfsvol0_0
checking extents
checking fs roots
checking root refs
found 26848493568 bytes used err is 0
total csum bytes: 26170252
total tree bytes: 48517120
total fs tree bytes: 5492736
btree space waste bytes: 14307930
file data blocks allocated: 26799976448
 referenced 26799976448
Btrfs Btrfs v0.19
# mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0
#


I checked dmesg again, these are the messages from the second mount:

[17299.180600] device fsid 928b939f-7f9d-4095-b1ba-e35c5f1277bf devid 1
transid 37928 /dev/dm-96
[17299.204475] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 2
transid 34 /dev/dm-99
[17299.204658] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
transid 34 /dev/dm-100
[17299.288317] device fsid 928b939f-7f9d-4095-b1ba-e35c5f1277bf devid 1
transid 37928 /dev/dm-96
[17299.289024] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 2
transid 34 /dev/dm-99
[17299.289150] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
transid 34 /dev/dm-100
[17310.978518] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
transid 34 /dev/mapper/vg00-btrfsvol0_0
[17310.993882] btrfs: disk space caching is enabled


Can anyone comment on this?


Also, df is reporting double the actual RAID1 volume size, and double
the amount of data stored in this filesystem:

# df -lh .
FilesystemSize  Used Avail Use% Mounted on
/dev/mapper/vg00-btrfsvol0_0  1.9T   51G  1.8T   3% /mnt/btrfs0

I would expect to see Size=1T, Used=25G

# strace -v -e trace=statfs df -lh /mnt/btrfs0
statfs("/mnt/btrfs0", {f_type=0x9123683e, f_bsize=4096,
f_blocks=488374272, f_bfree=475264720, f_bavail=474749786, f_files=0,
f_ffree=0, f_fsid={2083217090, -1714407264}, f_namelen=255,
f_frsize=4096}) = 0
FilesystemSize  Used Avail Use% Mounted on
/dev/mapper/vg00-btrfsvol0_0  1.9T   51G  1.8T   3% /mnt/btrfs0
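
One hedged reading of those numbers, assuming btrfs fills statfs with raw
bytes summed over both RAID1 members: f_blocks * f_bsize = 488374272 * 4096
bytes, which is roughly 2.0T of raw space (shown by df -h as 1.9T), and
Used = 51G is roughly 2 * 25G, since RAID1 stores every block twice.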


Re: fail to mount after first reboot

2012-08-19 Thread Daniel Pocock


On 19/08/12 14:15, Hugo Mills wrote:
 On Sun, Aug 19, 2012 at 02:08:17PM +, Daniel Pocock wrote:


 I created a 1TB RAID1.  So far it is just for testing, no important data
 on there.


 After a reboot, I tried to mount it again

 # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0
 mount: wrong fs type, bad option, bad superblock on
 /dev/mapper/vg00-btrfsvol0_0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so
 
With multi-volume btrfs filesystems, you have to run btrfs dev
 scan before trying to mount it. Usually, the distribution will do
 this in the initrd (if you've installed its btrfs-progs package).
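
Spelled out with the volume name from this thread, that would be:

# btrfs device scan
# mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0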
 


I'm running Debian; I've just updated the system from squeeze to wheezy
(with a 3.2 kernel) so I could try btrfs and do other QA testing on wheezy
(as it is in the beta phase now).

I already had the btrfs-tools package installed before creating the
filesystem, so it appears Debian doesn't have an init script for this.

It does have /lib/udev/rules.d/60-btrfs.rules:
SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION!="add|change", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
RUN+="/sbin/modprobe btrfs"
RUN+="/sbin/btrfs device scan $env{DEVNAME}"

LABEL="btrfs_end"

but I'm guessing that isn't any use for my logical volumes, which are
activated early in the boot sequence?

Could I be having this problem because I put my btrfs on logical volumes?

Here is the package version I have:

# dpkg --list | grep btrfs
ii  btrfs-tools   0.19+20120328-7
   Checksumming Copy on Write Filesystem utilities

Here is a more thorough dmesg since boot - does this suggest the scan
was invoked?  I remember seeing some message about checking for btrfs
filesystems just after selecting the kernel in grub (root is ext3).


# dmesg | grep btrfs
[   40.677505] btrfs: setting nodatacow
[   40.677514] btrfs: turning off barriers
[17216.145092] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
transid 34 /dev/mapper/vg00-btrfsvol0_0
[17216.145639] btrfs: disk space caching is enabled
[17216.146987] btrfs: failed to read the system array on dm-100
[17216.147556] btrfs: open_ctree failed
[17310.978518] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
transid 34 /dev/mapper/vg00-btrfsvol0_0
[17310.993882] btrfs: disk space caching is enabled
[17598.736657] device fsid c959d4a5-0713-4685-b572-8a679ec37e20 devid 1
transid 37 /dev/mapper/vg00-btrfsvol0_0
[17598.750849] btrfs: disk space caching is enabled



 Then I ran btrfsck - it reported no errors, and afterwards the volume mounted OK:

 # btrfsck /dev/mapper/vg00-btrfsvol0_0
 [...]
 
The first thing that btrfsck does is to do a device scan.
 
 [...]

Ok, that is most likely why my next mount attempt succeeded.



Re: fail to mount after first reboot

2012-08-19 Thread Daniel Pocock


On 19/08/12 16:51, Hugo Mills wrote:
 On Sun, Aug 19, 2012 at 02:33:14PM +, Daniel Pocock wrote:
 On 19/08/12 14:15, Hugo Mills wrote:
 On Sun, Aug 19, 2012 at 02:08:17PM +, Daniel Pocock wrote:
 I created a 1TB RAID1.  So far it is just for testing, no important data
 on there.

 After a reboot, I tried to mount it again

 # mount /dev/mapper/vg00-btrfsvol0_0 /mnt/btrfs0
 mount: wrong fs type, bad option, bad superblock on
 /dev/mapper/vg00-btrfsvol0_0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

With multi-volume btrfs filesystems, you have to run btrfs dev
 scan before trying to mount it. Usually, the distribution will do
 this in the initrd (if you've installed its btrfs-progs package).

 I'm running Debian, I've just updated the system from squeeze to wheezy
 (with 3.2 kernel) so I could try btrfs and do other QA testing on wheezy
 (as it is in the beta phase now)

 I already had the btrfs-tools package installed, before creating the
 filesystem.  So it appears Debian doesn't have an init script

 It does have /lib/udev/rules.d/60-btrfs.rules:
 SUBSYSTEM!=block, GOTO=btrfs_end
 ACTION!=add|change, GOTO=btrfs_end
 ENV{ID_FS_TYPE}!=btrfs, GOTO=btrfs_end
 RUN+=/sbin/modprobe btrfs
 RUN+=/sbin/btrfs device scan $env{DEVNAME}

 LABEL=btrfs_end

 but I'm guessing that isn't any use to my logical volumes that are
 activated early in the boot sequence?

 Could I be having this problem because I put my btrfs on logical volumes?
 
Possibly. You may need the Device mapper uevents option in the
 kernel (CONFIG_DM_UEVENT) to trigger that udev rule when you enable
 your VG(s). Not sure if it's available/enabled in your kernel.
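
A quick way to check - a minimal sketch, assuming the standard Debian
location for installed kernel configs:

# grep CONFIG_DM_UEVENT /boot/config-$(uname -r)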
 

I've created a Debian bug report for the issue:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=685311

Thanks for the quick feedback about this

