Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit
On Fri, 18 May 2012 13:57:19 +0200, David Sterba wrote: > On Wed, May 16, 2012 at 06:50:47PM +0200, Stefan Behrens wrote: >> --- a/fs/btrfs/ctree.h >> +++ b/fs/btrfs/ctree.h >> @@ -823,6 +823,26 @@ struct btrfs_csum_item { >> u8 csum; >> } __attribute__ ((__packed__)); >> >> +struct btrfs_device_stats_item { >> +/* >> + * grow this item struct at the end for future enhancements and keep >> + * the existing values unchanged >> + */ >> +__le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */ >> +__le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */ >> +__le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */ >> + >> +/* stats for indirect indications for I/O failures */ >> +__le64 cnt_corruption_errs; /* checksum error, bytenr error or >> + * contents is illegal: this is an >> + * indication that the block was damaged >> + * during read or write, or written to >> + * wrong location or read from wrong >> + * location */ >> +__le64 cnt_generation_errs; /* an indication that blocks have not >> + * been written */ > > A few spare u64 would come handy in the future. Currently there are 5, > so add like 7 or 11. We might be interested in collecting more types of > stats, ore more fine-grained. > > I see the comment above about enhancing the structure, but then you need > to version this stucture. Let's say this kernel at version 3.5 add this > structre as you propose it now, and kernel 3.6 adds another item > 'cnt_exploded'. The size of the item is part of the information that is stored on disk. This is used to downgrade and upgrade the item, and to support downgrading and upgrading the kernel. In btrfs_init_device_stats(), where the items are read from disk while mounting, if the read item is too small, the missing statistic members at the end are filled with zero. If the item is larger than expected, the additional values are ignored. And in update_device_stat_item(), where the items are written to disk with each commit (only if they have been modified), if the disk item is too small, it is removed and replaced by a larger one. If the disk item is larger than expected, the additional values are ignored and not modified. The ioctl() works similar. In the user mode, the max number of statistic members is set before the ioctl(). The kernel code uses this parameter to determine, how much values to write and at the end replaces it with the number of values written. Then the btrfs tool prints all the values that it knows by name, but at most the number of values that the kernel has set. If I compare this solution to the one that just pads the item size to a well known size, the disadvantage are the extra lines of code. The advantage is that no limit at all is set. Rereading the code, I recognized that I will need to send a v4. The ioctl structure needs to be padded, or copy_to_user(..., sizeof(struct ...)) would overwrite memory when the kernel struct is grown and the btrfs tool is old. > Accessing the 3.6-created image with a 3.5 will be ok (older kernel will > not touch the new items). > > Accessing the 3.5-created image with a 3.6 will be problematic, as the > kernel would try to access ->cnt_exploded . > > So, either the 3.6 kernel needs to know not to touch the missing item > (ie. via reading the struct version from somewhere, stored on disk). > > Or, there are spare items, which are zeroed in versions that do not use > them and naturally used otherwise, but when new kernel uses old image, > it finds zeros (and will be safe). > >> +} __attribute__ ((__packed__)); >> + -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit
On Wed, May 16, 2012 at 06:50:47PM +0200, Stefan Behrens wrote: > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -823,6 +823,26 @@ struct btrfs_csum_item { > u8 csum; > } __attribute__ ((__packed__)); > > +struct btrfs_device_stats_item { > + /* > + * grow this item struct at the end for future enhancements and keep > + * the existing values unchanged > + */ > + __le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */ > + __le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */ > + __le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */ > + > + /* stats for indirect indications for I/O failures */ > + __le64 cnt_corruption_errs; /* checksum error, bytenr error or > + * contents is illegal: this is an > + * indication that the block was damaged > + * during read or write, or written to > + * wrong location or read from wrong > + * location */ > + __le64 cnt_generation_errs; /* an indication that blocks have not > + * been written */ A few spare u64 would come handy in the future. Currently there are 5, so add like 7 or 11. We might be interested in collecting more types of stats, ore more fine-grained. I see the comment above about enhancing the structure, but then you need to version this stucture. Let's say this kernel at version 3.5 add this structre as you propose it now, and kernel 3.6 adds another item 'cnt_exploded'. Accessing the 3.6-created image with a 3.5 will be ok (older kernel will not touch the new items). Accessing the 3.5-created image with a 3.6 will be problematic, as the kernel would try to access ->cnt_exploded . So, either the 3.6 kernel needs to know not to touch the missing item (ie. via reading the struct version from somewhere, stored on disk). Or, there are spare items, which are zeroed in versions that do not use them and naturally used otherwise, but when new kernel uses old image, it finds zeros (and will be safe). > +} __attribute__ ((__packed__)); > + -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit
On 05/17/2012 03:52, Liu Bo wrote: On 05/17/2012 12:50 AM, Stefan Behrens wrote: The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters. Hi Stefan, Just scaned the patch for a while and got a question: Adding a new key type usually means changing the disk format, so have you done something for this? Hi Liu, New tree entries with new keys are added to the device tree. Old kernels do not search for these keys and therefore ignore them. New kernels (with this patch) search for these keys and read and write them, or create them when required. That works fine. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit
On 05/17/2012 12:50 AM, Stefan Behrens wrote: > The device statistics are written into the device tree with each > transaction commit. Only modified statistics are written. > When a filesystem is mounted, the device statistics for each involved > device are read from the device tree and used to initialize the > counters. > Hi Stefan, Just scaned the patch for a while and got a question: Adding a new key type usually means changing the disk format, so have you done something for this? thanks, liubo > Signed-off-by: Stefan Behrens > --- > fs/btrfs/ctree.h | 51 > fs/btrfs/disk-io.c |7 ++ > fs/btrfs/print-tree.c |3 + > fs/btrfs/transaction.c |4 + > fs/btrfs/volumes.c | 205 > > fs/btrfs/volumes.h |9 +++ > 6 files changed, 279 insertions(+) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index ec42a24..1dd7651 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -823,6 +823,26 @@ struct btrfs_csum_item { > u8 csum; > } __attribute__ ((__packed__)); > > +struct btrfs_device_stats_item { > + /* > + * grow this item struct at the end for future enhancements and keep > + * the existing values unchanged > + */ > + __le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */ > + __le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */ > + __le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */ > + > + /* stats for indirect indications for I/O failures */ > + __le64 cnt_corruption_errs; /* checksum error, bytenr error or > + * contents is illegal: this is an > + * indication that the block was damaged > + * during read or write, or written to > + * wrong location or read from wrong > + * location */ > + __le64 cnt_generation_errs; /* an indication that blocks have not > + * been written */ > +} __attribute__ ((__packed__)); > + > /* different types of block groups (and chunks) */ > #define BTRFS_BLOCK_GROUP_DATA (1ULL << 0) > #define BTRFS_BLOCK_GROUP_SYSTEM (1ULL << 1) > @@ -1508,6 +1528,12 @@ struct btrfs_ioctl_defrag_range_args { > #define BTRFS_BALANCE_ITEM_KEY 248 > > /* > + * Persistantly stores the io stats in the device tree. > + * One key for all stats, (0, BTRFS_DEVICE_STATS_KEY, devid). > + */ > +#define BTRFS_DEVICE_STATS_KEY 249 > + > +/* > * string items are for debugging. They just store a short string of > * data in the FS > */ > @@ -2415,6 +2441,31 @@ static inline u32 > btrfs_file_extent_inline_item_len(struct extent_buffer *eb, > return btrfs_item_size(eb, e) - offset; > } > > +/* btrfs_device_stats_item */ > +BTRFS_SETGET_FUNCS(device_stats_cnt_write_io_errs, > +struct btrfs_device_stats_item, cnt_write_io_errs, 64); > +BTRFS_SETGET_FUNCS(device_stats_cnt_read_io_errs, > +struct btrfs_device_stats_item, cnt_read_io_errs, 64); > +BTRFS_SETGET_FUNCS(device_stats_cnt_flush_io_errs, > +struct btrfs_device_stats_item, cnt_flush_io_errs, 64); > +BTRFS_SETGET_FUNCS(device_stats_cnt_corruption_errs, > +struct btrfs_device_stats_item, cnt_corruption_errs, 64); > +BTRFS_SETGET_FUNCS(device_stats_cnt_generation_errs, > +struct btrfs_device_stats_item, cnt_generation_errs, 64); > + > +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_write_io_errs, > + struct btrfs_device_stats_item, cnt_write_io_errs, 64); > +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_read_io_errs, > + struct btrfs_device_stats_item, cnt_read_io_errs, 64); > +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_flush_io_errs, > + struct btrfs_device_stats_item, cnt_flush_io_errs, 64); > +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_corruption_errs, > + struct btrfs_device_stats_item, cnt_corruption_errs, > + 64); > +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_generation_errs, > + struct btrfs_device_stats_item, cnt_generation_errs, > + 64); > + > static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb) > { > return sb->s_fs_info; > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index e123629..7ba08f7 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -2353,6 +2353,13 @@ retry_root_backup: > fs_info->generation = generation; > fs_info->last_trans_committed = generation; > > + ret = btrfs_init_device_stats(fs_info); > + if (ret) { > + printk(KERN_ERR "btrfs: failed to init device_stats: %d\n", > +ret); > + goto fail_block_groups; > + } > + > ret
[PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit
The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters. Signed-off-by: Stefan Behrens --- fs/btrfs/ctree.h | 51 fs/btrfs/disk-io.c |7 ++ fs/btrfs/print-tree.c |3 + fs/btrfs/transaction.c |4 + fs/btrfs/volumes.c | 205 fs/btrfs/volumes.h |9 +++ 6 files changed, 279 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ec42a24..1dd7651 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -823,6 +823,26 @@ struct btrfs_csum_item { u8 csum; } __attribute__ ((__packed__)); +struct btrfs_device_stats_item { + /* +* grow this item struct at the end for future enhancements and keep +* the existing values unchanged +*/ + __le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */ + __le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */ + __le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */ + + /* stats for indirect indications for I/O failures */ + __le64 cnt_corruption_errs; /* checksum error, bytenr error or +* contents is illegal: this is an +* indication that the block was damaged +* during read or write, or written to +* wrong location or read from wrong +* location */ + __le64 cnt_generation_errs; /* an indication that blocks have not +* been written */ +} __attribute__ ((__packed__)); + /* different types of block groups (and chunks) */ #define BTRFS_BLOCK_GROUP_DATA (1ULL << 0) #define BTRFS_BLOCK_GROUP_SYSTEM (1ULL << 1) @@ -1508,6 +1528,12 @@ struct btrfs_ioctl_defrag_range_args { #define BTRFS_BALANCE_ITEM_KEY 248 /* + * Persistantly stores the io stats in the device tree. + * One key for all stats, (0, BTRFS_DEVICE_STATS_KEY, devid). + */ +#define BTRFS_DEVICE_STATS_KEY 249 + +/* * string items are for debugging. They just store a short string of * data in the FS */ @@ -2415,6 +2441,31 @@ static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb, return btrfs_item_size(eb, e) - offset; } +/* btrfs_device_stats_item */ +BTRFS_SETGET_FUNCS(device_stats_cnt_write_io_errs, + struct btrfs_device_stats_item, cnt_write_io_errs, 64); +BTRFS_SETGET_FUNCS(device_stats_cnt_read_io_errs, + struct btrfs_device_stats_item, cnt_read_io_errs, 64); +BTRFS_SETGET_FUNCS(device_stats_cnt_flush_io_errs, + struct btrfs_device_stats_item, cnt_flush_io_errs, 64); +BTRFS_SETGET_FUNCS(device_stats_cnt_corruption_errs, + struct btrfs_device_stats_item, cnt_corruption_errs, 64); +BTRFS_SETGET_FUNCS(device_stats_cnt_generation_errs, + struct btrfs_device_stats_item, cnt_generation_errs, 64); + +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_write_io_errs, +struct btrfs_device_stats_item, cnt_write_io_errs, 64); +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_read_io_errs, +struct btrfs_device_stats_item, cnt_read_io_errs, 64); +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_flush_io_errs, +struct btrfs_device_stats_item, cnt_flush_io_errs, 64); +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_corruption_errs, +struct btrfs_device_stats_item, cnt_corruption_errs, +64); +BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_generation_errs, +struct btrfs_device_stats_item, cnt_generation_errs, +64); + static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb) { return sb->s_fs_info; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e123629..7ba08f7 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2353,6 +2353,13 @@ retry_root_backup: fs_info->generation = generation; fs_info->last_trans_committed = generation; + ret = btrfs_init_device_stats(fs_info); + if (ret) { + printk(KERN_ERR "btrfs: failed to init device_stats: %d\n", + ret); + goto fail_block_groups; + } + ret = btrfs_init_space_info(fs_info); if (ret) { printk(KERN_ERR "Failed to initial space info: %d\n", ret); diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index f38e452..a9e45e4 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -294,6 +294,9 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l)