Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit

2012-05-21 Thread Stefan Behrens
On Fri, 18 May 2012 13:57:19 +0200, David Sterba wrote:
> On Wed, May 16, 2012 at 06:50:47PM +0200, Stefan Behrens wrote:
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -823,6 +823,26 @@ struct btrfs_csum_item {
>>  u8 csum;
>>  } __attribute__ ((__packed__));
>>  
>> +struct btrfs_device_stats_item {
>> +/*
>> + * grow this item struct at the end for future enhancements and keep
>> + * the existing values unchanged
>> + */
>> +__le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */
>> +__le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */
>> +__le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */
>> +
>> +/* stats for indirect indications for I/O failures */
>> +__le64 cnt_corruption_errs; /* checksum error, bytenr error or
>> + * contents is illegal: this is an
>> + * indication that the block was damaged
>> + * during read or write, or written to
>> + * wrong location or read from wrong
>> + * location */
>> +__le64 cnt_generation_errs; /* an indication that blocks have not
>> + * been written */
> 
> A few spare u64 would come handy in the future. Currently there are 5,
> so add like 7 or 11. We might be interested in collecting more types of
> stats, ore more fine-grained.
> 
> I see the comment above about enhancing the structure, but then you need
> to version this stucture. Let's say this kernel at version 3.5 add this
> structre as you propose it now, and kernel 3.6 adds another item
> 'cnt_exploded'.

The size of the item is part of the information that is stored on disk.
This is used to downgrade and upgrade the item, and to support
downgrading and upgrading the kernel.

In btrfs_init_device_stats(), where the items are read from disk while
mounting, if the read item is too small, the missing statistic members
at the end are filled with zero. If the item is larger than expected,
the additional values are ignored.
And in update_device_stat_item(), where the items are written to disk
with each commit (only if they have been modified), if the disk item is
too small, it is removed and replaced by a larger one. If the disk item
is larger than expected, the additional values are ignored and not modified.

The ioctl() works similar. In the user mode, the max number of statistic
members is set before the ioctl(). The kernel code uses this parameter
to determine, how much values to write and at the end replaces it with
the number of values written. Then the btrfs tool prints all the values
that it knows by name, but at most the number of values that the kernel
has set.

If I compare this solution to the one that just pads the item size to a
well known size, the disadvantage are the extra lines of code. The
advantage is that no limit at all is set.

Rereading the code, I recognized that I will need to send a v4. The
ioctl structure needs to be padded, or copy_to_user(..., sizeof(struct
...)) would overwrite memory when the kernel struct is grown and the
btrfs tool is old.


> Accessing the 3.6-created image with a 3.5 will be ok (older kernel will
> not touch the new items).
> 
> Accessing the 3.5-created image with a 3.6 will be problematic, as the
> kernel would try to access ->cnt_exploded .
> 
> So, either the 3.6 kernel needs to know not to touch the missing item
> (ie. via reading the struct version from somewhere, stored on disk).
> 
> Or, there are spare items, which are zeroed in versions that do not use
> them and naturally used otherwise, but when new kernel uses old image,
> it finds zeros (and will be safe).
> 
>> +} __attribute__ ((__packed__));
>> +
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit

2012-05-18 Thread David Sterba
On Wed, May 16, 2012 at 06:50:47PM +0200, Stefan Behrens wrote:
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -823,6 +823,26 @@ struct btrfs_csum_item {
>   u8 csum;
>  } __attribute__ ((__packed__));
>  
> +struct btrfs_device_stats_item {
> + /*
> +  * grow this item struct at the end for future enhancements and keep
> +  * the existing values unchanged
> +  */
> + __le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */
> + __le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */
> + __le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */
> +
> + /* stats for indirect indications for I/O failures */
> + __le64 cnt_corruption_errs; /* checksum error, bytenr error or
> +  * contents is illegal: this is an
> +  * indication that the block was damaged
> +  * during read or write, or written to
> +  * wrong location or read from wrong
> +  * location */
> + __le64 cnt_generation_errs; /* an indication that blocks have not
> +  * been written */

A few spare u64 would come handy in the future. Currently there are 5,
so add like 7 or 11. We might be interested in collecting more types of
stats, ore more fine-grained.

I see the comment above about enhancing the structure, but then you need
to version this stucture. Let's say this kernel at version 3.5 add this
structre as you propose it now, and kernel 3.6 adds another item
'cnt_exploded'.

Accessing the 3.6-created image with a 3.5 will be ok (older kernel will
not touch the new items).

Accessing the 3.5-created image with a 3.6 will be problematic, as the
kernel would try to access ->cnt_exploded .

So, either the 3.6 kernel needs to know not to touch the missing item
(ie. via reading the struct version from somewhere, stored on disk).

Or, there are spare items, which are zeroed in versions that do not use
them and naturally used otherwise, but when new kernel uses old image,
it finds zeros (and will be safe).

> +} __attribute__ ((__packed__));
> +
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit

2012-05-17 Thread Stefan Behrens

On 05/17/2012 03:52, Liu Bo wrote:

On 05/17/2012 12:50 AM, Stefan Behrens wrote:


The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.



Hi Stefan,

Just scaned the patch for a while and got a question:

Adding a new key type usually means changing the disk format,
so have you done something for this?



Hi Liu,

New tree entries with new keys are added to the device tree. Old kernels 
do not search for these keys and therefore ignore them. New kernels 
(with this patch) search for these keys and read and write them, or 
create them when required. That works fine.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit

2012-05-16 Thread Liu Bo
On 05/17/2012 12:50 AM, Stefan Behrens wrote:

> The device statistics are written into the device tree with each
> transaction commit. Only modified statistics are written.
> When a filesystem is mounted, the device statistics for each involved
> device are read from the device tree and used to initialize the
> counters.
> 


Hi Stefan,

Just scaned the patch for a while and got a question:

Adding a new key type usually means changing the disk format,
so have you done something for this?

thanks,
liubo

> Signed-off-by: Stefan Behrens 
> ---
>  fs/btrfs/ctree.h   |   51 
>  fs/btrfs/disk-io.c |7 ++
>  fs/btrfs/print-tree.c  |3 +
>  fs/btrfs/transaction.c |4 +
>  fs/btrfs/volumes.c |  205 
> 
>  fs/btrfs/volumes.h |9 +++
>  6 files changed, 279 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index ec42a24..1dd7651 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -823,6 +823,26 @@ struct btrfs_csum_item {
>   u8 csum;
>  } __attribute__ ((__packed__));
>  
> +struct btrfs_device_stats_item {
> + /*
> +  * grow this item struct at the end for future enhancements and keep
> +  * the existing values unchanged
> +  */
> + __le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */
> + __le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */
> + __le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */
> +
> + /* stats for indirect indications for I/O failures */
> + __le64 cnt_corruption_errs; /* checksum error, bytenr error or
> +  * contents is illegal: this is an
> +  * indication that the block was damaged
> +  * during read or write, or written to
> +  * wrong location or read from wrong
> +  * location */
> + __le64 cnt_generation_errs; /* an indication that blocks have not
> +  * been written */
> +} __attribute__ ((__packed__));
> +
>  /* different types of block groups (and chunks) */
>  #define BTRFS_BLOCK_GROUP_DATA   (1ULL << 0)
>  #define BTRFS_BLOCK_GROUP_SYSTEM (1ULL << 1)
> @@ -1508,6 +1528,12 @@ struct btrfs_ioctl_defrag_range_args {
>  #define BTRFS_BALANCE_ITEM_KEY   248
>  
>  /*
> + * Persistantly stores the io stats in the device tree.
> + * One key for all stats, (0, BTRFS_DEVICE_STATS_KEY, devid).
> + */
> +#define BTRFS_DEVICE_STATS_KEY   249
> +
> +/*
>   * string items are for debugging.  They just store a short string of
>   * data in the FS
>   */
> @@ -2415,6 +2441,31 @@ static inline u32 
> btrfs_file_extent_inline_item_len(struct extent_buffer *eb,
>   return btrfs_item_size(eb, e) - offset;
>  }
>  
> +/* btrfs_device_stats_item */
> +BTRFS_SETGET_FUNCS(device_stats_cnt_write_io_errs,
> +struct btrfs_device_stats_item, cnt_write_io_errs, 64);
> +BTRFS_SETGET_FUNCS(device_stats_cnt_read_io_errs,
> +struct btrfs_device_stats_item, cnt_read_io_errs, 64);
> +BTRFS_SETGET_FUNCS(device_stats_cnt_flush_io_errs,
> +struct btrfs_device_stats_item, cnt_flush_io_errs, 64);
> +BTRFS_SETGET_FUNCS(device_stats_cnt_corruption_errs,
> +struct btrfs_device_stats_item, cnt_corruption_errs, 64);
> +BTRFS_SETGET_FUNCS(device_stats_cnt_generation_errs,
> +struct btrfs_device_stats_item, cnt_generation_errs, 64);
> +
> +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_write_io_errs,
> +  struct btrfs_device_stats_item, cnt_write_io_errs, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_read_io_errs,
> +  struct btrfs_device_stats_item, cnt_read_io_errs, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_flush_io_errs,
> +  struct btrfs_device_stats_item, cnt_flush_io_errs, 64);
> +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_corruption_errs,
> +  struct btrfs_device_stats_item, cnt_corruption_errs,
> +  64);
> +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_generation_errs,
> +  struct btrfs_device_stats_item, cnt_generation_errs,
> +  64);
> +
>  static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
>  {
>   return sb->s_fs_info;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index e123629..7ba08f7 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2353,6 +2353,13 @@ retry_root_backup:
>   fs_info->generation = generation;
>   fs_info->last_trans_committed = generation;
>  
> + ret = btrfs_init_device_stats(fs_info);
> + if (ret) {
> + printk(KERN_ERR "btrfs: failed to init device_stats: %d\n",
> +ret);
> + goto fail_block_groups;
> + }
> +
>   ret 

[PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit

2012-05-16 Thread Stefan Behrens
The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.

Signed-off-by: Stefan Behrens 
---
 fs/btrfs/ctree.h   |   51 
 fs/btrfs/disk-io.c |7 ++
 fs/btrfs/print-tree.c  |3 +
 fs/btrfs/transaction.c |4 +
 fs/btrfs/volumes.c |  205 
 fs/btrfs/volumes.h |9 +++
 6 files changed, 279 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ec42a24..1dd7651 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -823,6 +823,26 @@ struct btrfs_csum_item {
u8 csum;
 } __attribute__ ((__packed__));
 
+struct btrfs_device_stats_item {
+   /*
+* grow this item struct at the end for future enhancements and keep
+* the existing values unchanged
+*/
+   __le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */
+   __le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */
+   __le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */
+
+   /* stats for indirect indications for I/O failures */
+   __le64 cnt_corruption_errs; /* checksum error, bytenr error or
+* contents is illegal: this is an
+* indication that the block was damaged
+* during read or write, or written to
+* wrong location or read from wrong
+* location */
+   __le64 cnt_generation_errs; /* an indication that blocks have not
+* been written */
+} __attribute__ ((__packed__));
+
 /* different types of block groups (and chunks) */
 #define BTRFS_BLOCK_GROUP_DATA (1ULL << 0)
 #define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
@@ -1508,6 +1528,12 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_BALANCE_ITEM_KEY 248
 
 /*
+ * Persistantly stores the io stats in the device tree.
+ * One key for all stats, (0, BTRFS_DEVICE_STATS_KEY, devid).
+ */
+#define BTRFS_DEVICE_STATS_KEY 249
+
+/*
  * string items are for debugging.  They just store a short string of
  * data in the FS
  */
@@ -2415,6 +2441,31 @@ static inline u32 
btrfs_file_extent_inline_item_len(struct extent_buffer *eb,
return btrfs_item_size(eb, e) - offset;
 }
 
+/* btrfs_device_stats_item */
+BTRFS_SETGET_FUNCS(device_stats_cnt_write_io_errs,
+  struct btrfs_device_stats_item, cnt_write_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_read_io_errs,
+  struct btrfs_device_stats_item, cnt_read_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_flush_io_errs,
+  struct btrfs_device_stats_item, cnt_flush_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_corruption_errs,
+  struct btrfs_device_stats_item, cnt_corruption_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_generation_errs,
+  struct btrfs_device_stats_item, cnt_generation_errs, 64);
+
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_write_io_errs,
+struct btrfs_device_stats_item, cnt_write_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_read_io_errs,
+struct btrfs_device_stats_item, cnt_read_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_flush_io_errs,
+struct btrfs_device_stats_item, cnt_flush_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_corruption_errs,
+struct btrfs_device_stats_item, cnt_corruption_errs,
+64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_generation_errs,
+struct btrfs_device_stats_item, cnt_generation_errs,
+64);
+
 static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
 {
return sb->s_fs_info;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e123629..7ba08f7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2353,6 +2353,13 @@ retry_root_backup:
fs_info->generation = generation;
fs_info->last_trans_committed = generation;
 
+   ret = btrfs_init_device_stats(fs_info);
+   if (ret) {
+   printk(KERN_ERR "btrfs: failed to init device_stats: %d\n",
+  ret);
+   goto fail_block_groups;
+   }
+
ret = btrfs_init_space_info(fs_info);
if (ret) {
printk(KERN_ERR "Failed to initial space info: %d\n", ret);
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index f38e452..a9e45e4 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -294,6 +294,9 @@ void btrfs_print_leaf(struct btrfs_root *root, struct 
extent_buffer *l)