Re: [RFC v2 05/10] vfs: introduce one hash table
On Thu, Sep 27, 2012 at 11:43 AM, Dave Chinner da...@fromorbit.com wrote: On Sun, Sep 23, 2012 at 08:56:30PM +0800, zwu.ker...@gmail.com wrote: From: Zhi Yong Wu wu...@linux.vnet.ibm.com Adds a hash table structure which contains a lot of hash list and is used to efficiently look up the data temperature of a file or its ranges. In each hash list of hash table, the hash node will keep track of temperature info. So, let me see if I've got the relationship straight: - sb-s_hot_info.hot_inode_tree indexes hot_inode_items, one per inode - hot_inode_item contains access frequency data for that inode - hot_inode_item holds a heat hash node to index the access frequency data for that inode - hot_inode_item.hot_range_tree indexes hot_range_items for that inode - hot_range_item contains access frequency data for that range - hot_range_item holds a heat hash node to index the access frequency data for that range - sb-s_hot_info.heat_inode_hl indexes per-inode heat hash nodes - sb-s_hot_info.heat_range_hl indexes per-range heat hash nodes Correct. How about some ascii art? :) Just looking at the hot inode item case (the range item case is the same pattern, though), we have: heat_inode_hl hot_inode_tree | | | V | +---hot_inode_item---+ +---+ | frequency data | | V^ V | ...--hot_inode_item--... |...--hot_inode_item-- | frequency data | frequency data | ^| ^ | || | | || | +--hot_hash_node--hot_hash_node--hot_hash_node-- Great, can we put them in hot_tracking.txt in Documentation? There's no actual data stored in the hot_hash_node, just pointer back to the frequency data, a hlist_node and a pointer to the hashlist head. IOWs, I agree with Ram that this does not need to exist and just embedding a hlist_node inside the hot_inode_item is all that is needed. i.e: heat_inode_hl hot_inode_tree | | | V | +---hot_inode_item---+ | | frequency data | +---+ | hlist_node | | V^ | V | ...--hot_inode_item--... 
| | ...--hot_inode_item-- | frequency data | |frequency data +--hlist_node---+ +---hlist_node---. There's no need for separate allocations, initialisations, locks and reference counting - all that is already in the hot_inode_item. The items have the same lifecycle limitations - a hot_hash_node must be torn down before the frequency data it points to is freed. Finally, there's no difference in how you move it between lists. How will you know if one hot_inode_item should be moved between lists when its freq data is changed? Indeed, calling it a hash is wrong - there's not hashing at all - it keeping an array of list where each entry corresponds to a specific temperature. It is a *heat map*, not a hash list. i.e. inode_heat_map, not heat_inode_hl. HEAT_MAP_SIZE, not HASH_SIZE. OK. As it is, there aren't any users of the heat maps that are generated in this patch set - it's not even exported to userspace or to debugfs, so I'm not sure how it will be used yet. How are these heat maps going to be used by filesystems, Zhi? In hot_hash_calc_temperature(), you can see that one hot_inode or hot_range's freq data will be distilled into one temperature value, then it will be inserted to the heat map based on its temperature. When the file corresponding to the inode or range got hotter or cold, its location will be changed in the heat map based on its new temperature in hot_hash_update_hash_table(). And the user will retrieve those freq data and temperature info via debugfs or ioctl interfaces. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- Regards, Zhi Yong Wu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 06/10] vfs: enable hot data tracking
On Thu, Sep 27, 2012 at 11:54 AM, Dave Chinner da...@fromorbit.com wrote: On Sun, Sep 23, 2012 at 08:56:31PM +0800, zwu.ker...@gmail.com wrote: From: Zhi Yong Wu wu...@linux.vnet.ibm.com Miscellaneous features that implement hot data tracking and generally make the hot data functions a bit more friendly. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/direct-io.c | 10 ++ include/linux/hot_tracking.h | 11 +++ mm/filemap.c |8 mm/page-writeback.c | 21 + mm/readahead.c |9 + 5 files changed, 59 insertions(+), 0 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index f86c720..3773f44 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -37,6 +37,7 @@ #include linux/uio.h #include linux/atomic.h #include linux/prefetch.h +#include hot_tracking.h /* * How many user pages to map in one call to get_user_pages(). This determines @@ -1297,6 +1298,15 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, prefetch(bdev-bd_queue); prefetch((char *)bdev-bd_queue + SMP_CACHE_BYTES); + /* Hot data tracking */ + if (TRACK_THIS_INODE(iocb-ki_filp-f_mapping-host) + iov_length(iov, nr_segs) 0) { + hot_rb_update_freqs(iocb-ki_filp-f_mapping-host, + (u64)offset, + (u64)iov_length(iov, nr_segs), + rw WRITE); + } That's a bit messy. I'd prefer a static inline function that hides all this. e.g. Do you think of moving the condition into hot_inode_udate_freqs(), not adding another new function? 
track_hot_inode_ranges(inode, offset, length, rw) { if (inode-i_sb-s_flags MS_HOT_TRACKING) hot_inode_freq_update(inode, offset, length, rw); } diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 5ad5ce2..552c861 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -35,6 +35,7 @@ #include linux/buffer_head.h /* __set_page_dirty_buffers */ #include linux/pagevec.h #include linux/timer.h +#include linux/hot_tracking.h #include trace/events/writeback.h /* @@ -1895,13 +1896,33 @@ EXPORT_SYMBOL(generic_writepages); int do_writepages(struct address_space *mapping, struct writeback_control *wbc) { int ret; + pgoff_t start = 0; + u64 prev_count = 0, count = 0; if (wbc-nr_to_write = 0) return 0; + + /* Hot data tracking */ + if (TRACK_THIS_INODE(mapping-host) + wbc-range_cyclic) { + start = mapping-writeback_index PAGE_CACHE_SHIFT; + prev_count = (u64)wbc-nr_to_write; + } Why only wbc-range_cyclic? This won't record things like synchronous writes or fsync-triggered writes, are are far more likely to be to hot ranges in a file... sorry, i don't undersand what wbc-range_cyclic means. OK, i will fix it in next version. Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- Regards, Zhi Yong Wu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 07/10] vfs: fork one kthread to update data temperature
On Thu, Sep 27, 2012 at 12:03 PM, Dave Chinner da...@fromorbit.com wrote:
> On Sun, Sep 23, 2012 at 08:56:32PM +0800, zwu.ker...@gmail.com wrote:
> > From: Zhi Yong Wu wu...@linux.vnet.ibm.com
> >
> > Fork and run one kernel kthread to calculate
> > that temperature based on some metrics kept
> > in custom frequency data structs, and store
> > the info in the hash table.
>
> No new kthreads, please. Use a per-superblock workqueue and a struct
> delayed_work to run periodic work on each superblock.

If no new kthread is created, which kthread will work on these
delayed_work tasks?

> That will also remove all the nasty, nasty
> !hot_track_temperature_update_kthread checks from the code, too.
>
> Also, I'd separate the work that the workqueue does from the patch
> that introduces the work queue. That way there is only one new thing
> to comment on in the patch. Further, I'd separate the aging code
> from the code that updates the temperature map into its own patch
> as well.
>
> Finally, you're going to need a shrinker to control the amount of
> memory that is used in tracking hot regions - if we are throwing
> inodes out of memory due to memory pressure, we most definitely are
> going to need to reduce the amount of memory the tracking code is
> using, even if it means losing useful information (i.e. the shrinker
> accelerates the aging process).

Great, I agree with you.

> Given the above, and the other comments earlier in the series,
> there's not a lot of point in me spending time commenting on the
> code in detail here as it will change significantly as a result of
> all the earlier comments

OK, I will complete the code change based on all your earlier comments ASAP.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com

--
Regards,

Zhi Yong Wu
Re: [RFC v2 05/10] vfs: introduce one hash table
On Thu, Sep 27, 2012 at 02:23:16PM +0800, Zhi Yong Wu wrote: On Thu, Sep 27, 2012 at 11:43 AM, Dave Chinner da...@fromorbit.com wrote: On Sun, Sep 23, 2012 at 08:56:30PM +0800, zwu.ker...@gmail.com wrote: From: Zhi Yong Wu wu...@linux.vnet.ibm.com Adds a hash table structure which contains a lot of hash list and is used to efficiently look up the data temperature of a file or its ranges. In each hash list of hash table, the hash node will keep track of temperature info. So, let me see if I've got the relationship straight: - sb-s_hot_info.hot_inode_tree indexes hot_inode_items, one per inode - hot_inode_item contains access frequency data for that inode - hot_inode_item holds a heat hash node to index the access frequency data for that inode - hot_inode_item.hot_range_tree indexes hot_range_items for that inode - hot_range_item contains access frequency data for that range - hot_range_item holds a heat hash node to index the access frequency data for that range - sb-s_hot_info.heat_inode_hl indexes per-inode heat hash nodes - sb-s_hot_info.heat_range_hl indexes per-range heat hash nodes Correct. How about some ascii art? :) Just looking at the hot inode item case (the range item case is the same pattern, though), we have: heat_inode_hl hot_inode_tree | | | V | +---hot_inode_item---+ +---+ | frequency data | | V^ V | ...--hot_inode_item--... |...--hot_inode_item-- | frequency data | frequency data | ^| ^ | || | | || | +--hot_hash_node--hot_hash_node--hot_hash_node-- Great, can we put them in hot_tracking.txt in Documentation? There's no actual data stored in the hot_hash_node, just pointer back to the frequency data, a hlist_node and a pointer to the hashlist head. IOWs, I agree with Ram that this does not need to exist and just embedding a hlist_node inside the hot_inode_item is all that is needed. i.e: heat_inode_hl hot_inode_tree | | | V | +---hot_inode_item---+ | | frequency data | +---+ | hlist_node | | V^ | V | ...--hot_inode_item--... 
| | ...--hot_inode_item-- | frequency data | |frequency data +--hlist_node---+ +---hlist_node---. There's no need for separate allocations, initialisations, locks and reference counting - all that is already in the hot_inode_item. The items have the same lifecycle limitations - a hot_hash_node must be torn down before the frequency data it points to is freed. Finally, there's no difference in how you move it between lists. How will you know if one hot_inode_item should be moved between lists when its freq data is changed? Record the current temperature in the frequency data, and if it changes, change the list it is on. Indeed, calling it a hash is wrong - there's not hashing at all - it keeping an array of list where each entry corresponds to a specific temperature. It is a *heat map*, not a hash list. i.e. inode_heat_map, not heat_inode_hl. HEAT_MAP_SIZE, not HASH_SIZE. OK. As it is, there aren't any users of the heat maps that are generated in this patch set - it's not even exported to userspace or to debugfs, so I'm not sure how it will be used yet. How are these heat maps going to be used by filesystems, Zhi? In hot_hash_calc_temperature(), you can see that one hot_inode or hot_range's freq data will be distilled into one temperature value, then it will be inserted to the heat map based on its temperature. When the file corresponding to the inode or range got hotter or cold, its location will be changed in the heat map based on its new temperature in hot_hash_update_hash_table(). Yes, but a hot_inode_item or hot_range_item can only have one location in the heat map, right? So it doesn't need external structure to point to the frequency data to track this And the user will retrieve those freq data and temperature info via debugfs or ioctl interfaces. Right - but that data is only extracted after an initial hot_inode_tree lookup - The heat map itself is never directly used for lookups. If it's not used for lookups based on temperature, why is it needed? 
Cheers, Dave. -- Dave Chinner da...@fromorbit.com
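Dave's suggestion in the message above (embed the list node in the item, record the current temperature next to the frequency data, and move the item only when the temperature changes) can be pictured with a small userspace sketch. Everything here is an assumption for illustration: the `list_node` type stands in for the kernel's `hlist_node`, `HEAT_MAP_SIZE` is given an arbitrary value, and the struct layout and helper names are hypothetical, not the patch set's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define HEAT_MAP_SIZE 256	/* assumed: one bucket per temperature value */

/* Minimal intrusive circular list, a stand-in for the kernel's hlist. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static int list_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical, simplified item: frequency data plus the embedded node. */
struct hot_inode_item {
	unsigned long nr_reads;
	unsigned long nr_writes;
	unsigned int temperature;	/* bucket the item currently sits in */
	struct list_node heat_node;	/* embedded, no separate hot_hash_node */
};

/* The "heat map": an array of lists indexed by temperature, no hashing. */
struct heat_map {
	struct list_node buckets[HEAT_MAP_SIZE];
};

static void heat_map_init(struct heat_map *map)
{
	for (size_t i = 0; i < HEAT_MAP_SIZE; i++)
		list_init(&map->buckets[i]);
}

static void heat_map_insert(struct heat_map *map, struct hot_inode_item *item,
			    unsigned int temp)
{
	assert(temp < HEAT_MAP_SIZE);
	item->temperature = temp;
	list_init(&item->heat_node);
	list_add(&item->heat_node, &map->buckets[temp]);
}

/* Move the item only if its temperature changed; returns 1 if it moved. */
static int heat_map_update(struct heat_map *map, struct hot_inode_item *item,
			   unsigned int new_temp)
{
	assert(new_temp < HEAT_MAP_SIZE);
	if (item->temperature == new_temp)
		return 0;
	list_del(&item->heat_node);
	item->temperature = new_temp;
	list_add(&item->heat_node, &map->buckets[new_temp]);
	return 1;
}
```

Because the node lives inside the item, there is nothing extra to allocate, lock, or reference-count, which is the point made in the review. Locking is left out of the sketch entirely.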
Re: [RFC v2 06/10] vfs: enable hot data tracking
On Thu, Sep 27, 2012 at 02:28:12PM +0800, Zhi Yong Wu wrote: On Thu, Sep 27, 2012 at 11:54 AM, Dave Chinner da...@fromorbit.com wrote: On Sun, Sep 23, 2012 at 08:56:31PM +0800, zwu.ker...@gmail.com wrote: From: Zhi Yong Wu wu...@linux.vnet.ibm.com Miscellaneous features that implement hot data tracking and generally make the hot data functions a bit more friendly. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/direct-io.c | 10 ++ include/linux/hot_tracking.h | 11 +++ mm/filemap.c |8 mm/page-writeback.c | 21 + mm/readahead.c |9 + 5 files changed, 59 insertions(+), 0 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index f86c720..3773f44 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -37,6 +37,7 @@ #include linux/uio.h #include linux/atomic.h #include linux/prefetch.h +#include hot_tracking.h /* * How many user pages to map in one call to get_user_pages(). This determines @@ -1297,6 +1298,15 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, prefetch(bdev-bd_queue); prefetch((char *)bdev-bd_queue + SMP_CACHE_BYTES); + /* Hot data tracking */ + if (TRACK_THIS_INODE(iocb-ki_filp-f_mapping-host) + iov_length(iov, nr_segs) 0) { + hot_rb_update_freqs(iocb-ki_filp-f_mapping-host, + (u64)offset, + (u64)iov_length(iov, nr_segs), + rw WRITE); + } That's a bit messy. I'd prefer a static inline function that hides all this. e.g. Do you think of moving the condition into hot_inode_udate_freqs(), not adding another new function? Moving it into hot_inode_udate_freqs() will add a function call overhead even when tracking is not enabled. a static inline function will just result in no extra overhead other than the if statement Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
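The overhead argument above can be illustrated with a userspace sketch of the wrapper Dave proposes. The struct definitions, the `MS_HOT_TRACKING` bit value, and the call counter are assumptions made so the example compiles on its own; only the shape of `track_hot_inode_ranges()` follows the review.

```c
#include <assert.h>
#include <stdint.h>

#define MS_HOT_TRACKING (1UL << 0)	/* hypothetical bit value, for illustration */

/* Simplified stand-ins for the kernel structures. */
struct super_block { unsigned long s_flags; };
struct inode { struct super_block *i_sb; };

static unsigned long update_calls;	/* counts out-of-line calls, demo only */

void hot_inode_freq_update(struct inode *inode, uint64_t offset,
			   uint64_t length, int rw);

/* Stand-in for the real, out-of-line frequency update. */
void hot_inode_freq_update(struct inode *inode, uint64_t offset,
			   uint64_t length, int rw)
{
	(void)inode; (void)offset; (void)length; (void)rw;
	update_calls++;
}

/*
 * The flag test is inlined at every call site, so with tracking disabled
 * the cost is a single branch and no function call is made.
 */
static inline void track_hot_inode_ranges(struct inode *inode, uint64_t offset,
					  uint64_t length, int rw)
{
	if (inode->i_sb->s_flags & MS_HOT_TRACKING)
		hot_inode_freq_update(inode, offset, length, rw);
}
```

If the check lived inside `hot_inode_freq_update()` instead, every I/O path would pay for the call even on filesystems that never enable tracking.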
Re: [RFC v2 07/10] vfs: fork one kthread to update data temperature
On Thu, Sep 27, 2012 at 02:54:22PM +0800, Zhi Yong Wu wrote:
> On Thu, Sep 27, 2012 at 12:03 PM, Dave Chinner da...@fromorbit.com wrote:
> > On Sun, Sep 23, 2012 at 08:56:32PM +0800, zwu.ker...@gmail.com wrote:
> > > From: Zhi Yong Wu wu...@linux.vnet.ibm.com
> > >
> > > Fork and run one kernel kthread to calculate
> > > that temperature based on some metrics kept
> > > in custom frequency data structs, and store
> > > the info in the hash table.
> >
> > No new kthreads, please. Use a per-superblock workqueue and a struct
> > delayed_work to run periodic work on each superblock.
> If no new kthread is created, which kthread will work on these
> delayed_work tasks?

One of the kworker threads that service the workqueue
infrastructure.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [RFC v2 03/10] vfs: add one new mount option '-o hottrack'
On Thu, Sep 27, 2012 at 01:25:34PM +0800, Zhi Yong Wu wrote:
> On Tue, Sep 25, 2012 at 5:28 PM, Dave Chinner da...@fromorbit.com wrote:
> > On Sun, Sep 23, 2012 at 08:56:28PM +0800, zwu.ker...@gmail.com wrote:
> > > From: Zhi Yong Wu wu...@linux.vnet.ibm.com
> > >
> > > Introduce one new mount option '-o hottrack',
> > > and add its parsing support.
> > > Its usage looks like:
> > >    mount -o hottrack
> > >    mount -o nouser,hottrack
> > >    mount -o nouser,hottrack,loop
> > >    mount -o hottrack,nouser
> >
> > I think that this option parsing should be done by the filesystem,
> > even though the tracking functionality is in the VFS. That way only
> > the filesystems that can use the tracking information will turn it
> > on, rather than being able to turn it on for everything regardless
> > of whether it is useful or not.
> >
> > Along those lines, just using a normal superblock flag to indicate
> > it is active (e.g. MS_HOT_INODE_TRACKING in sb->s_flags) means you
> > don't need to allocate the sb->s_hot_info structure just to be able
> If we don't allocate one sb->s_hot_info, where will those hash list
> head and btree roots locate?

I wrote that thinking (mistakenly) that s_hot_info was dynamically
allocated rather than being embedded in the struct super_block.

Indeed, if the mount option is held in s_flags, then it could be
dynamically allocated, but I don't think that's really necessary...

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
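As a rough illustration of the suggestion above (let the filesystem parse the option and record it as an ordinary superblock flag), here is a minimal userspace sketch. The function name `example_fs_parse_options()`, the bit value chosen for `MS_HOT_INODE_TRACKING`, and the one-field `super_block` are all hypothetical; a real filesystem would use its existing option parser.

```c
#include <assert.h>
#include <string.h>

#define MS_HOT_INODE_TRACKING (1UL << 0)	/* hypothetical bit value */

/* Simplified stand-in for the kernel's struct super_block. */
struct super_block { unsigned long s_flags; };

/*
 * Walk a comma-separated option string and set the tracking flag when
 * the exact token "hottrack" is present. Other options are ignored here.
 */
static void example_fs_parse_options(struct super_block *sb, const char *options)
{
	const char *p = options;

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of the current token */

		if (len == strlen("hottrack") && strncmp(p, "hottrack", len) == 0)
			sb->s_flags |= MS_HOT_INODE_TRACKING;

		p += len;
		if (*p == ',')
			p++;
	}
}
```

With the state held in a flag, code that only needs to know whether tracking is active can test `s_flags` without touching any tracking structure.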
Re: [RFC v2 05/10] vfs: introduce one hash table
On Thu, Sep 27, 2012 at 2:57 PM, Dave Chinner da...@fromorbit.com wrote: On Thu, Sep 27, 2012 at 02:23:16PM +0800, Zhi Yong Wu wrote: On Thu, Sep 27, 2012 at 11:43 AM, Dave Chinner da...@fromorbit.com wrote: On Sun, Sep 23, 2012 at 08:56:30PM +0800, zwu.ker...@gmail.com wrote: From: Zhi Yong Wu wu...@linux.vnet.ibm.com Adds a hash table structure which contains a lot of hash list and is used to efficiently look up the data temperature of a file or its ranges. In each hash list of hash table, the hash node will keep track of temperature info. So, let me see if I've got the relationship straight: - sb-s_hot_info.hot_inode_tree indexes hot_inode_items, one per inode - hot_inode_item contains access frequency data for that inode - hot_inode_item holds a heat hash node to index the access frequency data for that inode - hot_inode_item.hot_range_tree indexes hot_range_items for that inode - hot_range_item contains access frequency data for that range - hot_range_item holds a heat hash node to index the access frequency data for that range - sb-s_hot_info.heat_inode_hl indexes per-inode heat hash nodes - sb-s_hot_info.heat_range_hl indexes per-range heat hash nodes Correct. How about some ascii art? :) Just looking at the hot inode item case (the range item case is the same pattern, though), we have: heat_inode_hl hot_inode_tree | | | V | +---hot_inode_item---+ +---+ | frequency data | | V^ V | ...--hot_inode_item--... |...--hot_inode_item-- | frequency data | frequency data | ^| ^ | || | | || | +--hot_hash_node--hot_hash_node--hot_hash_node-- Great, can we put them in hot_tracking.txt in Documentation? There's no actual data stored in the hot_hash_node, just pointer back to the frequency data, a hlist_node and a pointer to the hashlist head. IOWs, I agree with Ram that this does not need to exist and just embedding a hlist_node inside the hot_inode_item is all that is needed. 
i.e: heat_inode_hl hot_inode_tree | | | V | +---hot_inode_item---+ | | frequency data | +---+ | hlist_node | | V^ | V | ...--hot_inode_item--... | | ...--hot_inode_item-- | frequency data | |frequency data +--hlist_node---+ +---hlist_node---. There's no need for separate allocations, initialisations, locks and reference counting - all that is already in the hot_inode_item. The items have the same lifecycle limitations - a hot_hash_node must be torn down before the frequency data it points to is freed. Finally, there's no difference in how you move it between lists. How will you know if one hot_inode_item should be moved between lists when its freq data is changed? Record the current temperature in the frequency data, and if it I know how to do it, thanks. changes, change the list it is on. Indeed, calling it a hash is wrong - there's not hashing at all - it keeping an array of list where each entry corresponds to a specific temperature. It is a *heat map*, not a hash list. i.e. inode_heat_map, not heat_inode_hl. HEAT_MAP_SIZE, not HASH_SIZE. OK. As it is, there aren't any users of the heat maps that are generated in this patch set - it's not even exported to userspace or to debugfs, so I'm not sure how it will be used yet. How are these heat maps going to be used by filesystems, Zhi? In hot_hash_calc_temperature(), you can see that one hot_inode or hot_range's freq data will be distilled into one temperature value, then it will be inserted to the heat map based on its temperature. When the file corresponding to the inode or range got hotter or cold, its location will be changed in the heat map based on its new temperature in hot_hash_update_hash_table(). Yes, but a hot_inode_item or hot_range_item can only have one location in the heat map, right? So it doesn't need external Yes. structure to point to the frequency data to track this OK. And the user will retrieve those freq data and temperature info via debugfs or ioctl interfaces. 
Right - but that data is only extracted after an initial hot_inode_tree lookup - The heat map itself is never directly used for lookups. If it's not used for lookups based on temperature, why is it needed? You mean we don't need hot_inode_tree? You know, after those hook functions collect the freq data for inode, they will store those raw info in hot_inode_tree. One private kthread will iterate
Re: [RFC v2 06/10] vfs: enable hot data tracking
On Thu, Sep 27, 2012 at 2:59 PM, Dave Chinner da...@fromorbit.com wrote: On Thu, Sep 27, 2012 at 02:28:12PM +0800, Zhi Yong Wu wrote: On Thu, Sep 27, 2012 at 11:54 AM, Dave Chinner da...@fromorbit.com wrote: On Sun, Sep 23, 2012 at 08:56:31PM +0800, zwu.ker...@gmail.com wrote: From: Zhi Yong Wu wu...@linux.vnet.ibm.com Miscellaneous features that implement hot data tracking and generally make the hot data functions a bit more friendly. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- fs/direct-io.c | 10 ++ include/linux/hot_tracking.h | 11 +++ mm/filemap.c |8 mm/page-writeback.c | 21 + mm/readahead.c |9 + 5 files changed, 59 insertions(+), 0 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index f86c720..3773f44 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -37,6 +37,7 @@ #include linux/uio.h #include linux/atomic.h #include linux/prefetch.h +#include hot_tracking.h /* * How many user pages to map in one call to get_user_pages(). This determines @@ -1297,6 +1298,15 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode, prefetch(bdev-bd_queue); prefetch((char *)bdev-bd_queue + SMP_CACHE_BYTES); + /* Hot data tracking */ + if (TRACK_THIS_INODE(iocb-ki_filp-f_mapping-host) + iov_length(iov, nr_segs) 0) { + hot_rb_update_freqs(iocb-ki_filp-f_mapping-host, + (u64)offset, + (u64)iov_length(iov, nr_segs), + rw WRITE); + } That's a bit messy. I'd prefer a static inline function that hides all this. e.g. Do you think of moving the condition into hot_inode_udate_freqs(), not adding another new function? Moving it into hot_inode_udate_freqs() will add a function call overhead even when tracking is not enabled. a static inline function Can we not directly define hot_inode_udate_freqs to be a static inline?:) will just result in no extra overhead other than the if statement Cheers, Dave. 
-- Dave Chinner da...@fromorbit.com -- Regards, Zhi Yong Wu
Re: [RFC v2 07/10] vfs: fork one kthread to update data temperature
On Thu, Sep 27, 2012 at 3:01 PM, Dave Chinner da...@fromorbit.com wrote:
> On Thu, Sep 27, 2012 at 02:54:22PM +0800, Zhi Yong Wu wrote:
> > On Thu, Sep 27, 2012 at 12:03 PM, Dave Chinner da...@fromorbit.com wrote:
> > > On Sun, Sep 23, 2012 at 08:56:32PM +0800, zwu.ker...@gmail.com wrote:
> > > > From: Zhi Yong Wu wu...@linux.vnet.ibm.com
> > > >
> > > > Fork and run one kernel kthread to calculate
> > > > that temperature based on some metrics kept
> > > > in custom frequency data structs, and store
> > > > the info in the hash table.
> > >
> > > No new kthreads, please. Use a per-superblock workqueue and a struct
> > > delayed_work to run periodic work on each superblock.
> > If no new kthread is created, which kthread will work on these
> > delayed_work tasks?
>
> One of the kworker threads that service the workqueue
> infrastructure.

Got it, thanks

> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com

--
Regards,

Zhi Yong Wu
Re: [RFC v2 03/10] vfs: add one new mount option '-o hottrack'
On Thu, Sep 27, 2012 at 3:05 PM, Dave Chinner da...@fromorbit.com wrote:
> On Thu, Sep 27, 2012 at 01:25:34PM +0800, Zhi Yong Wu wrote:
> > On Tue, Sep 25, 2012 at 5:28 PM, Dave Chinner da...@fromorbit.com wrote:
> > > On Sun, Sep 23, 2012 at 08:56:28PM +0800, zwu.ker...@gmail.com wrote:
> > > > From: Zhi Yong Wu wu...@linux.vnet.ibm.com
> > > >
> > > > Introduce one new mount option '-o hottrack',
> > > > and add its parsing support.
> > > > Its usage looks like:
> > > >    mount -o hottrack
> > > >    mount -o nouser,hottrack
> > > >    mount -o nouser,hottrack,loop
> > > >    mount -o hottrack,nouser
> > >
> > > I think that this option parsing should be done by the filesystem,
> > > even though the tracking functionality is in the VFS. That way only
> > > the filesystems that can use the tracking information will turn it
> > > on, rather than being able to turn it on for everything regardless
> > > of whether it is useful or not.
> > >
> > > Along those lines, just using a normal superblock flag to indicate
> > > it is active (e.g. MS_HOT_INODE_TRACKING in sb->s_flags) means you
> > > don't need to allocate the sb->s_hot_info structure just to be able
> > If we don't allocate one sb->s_hot_info, where will those hash list
> > head and btree roots locate?
>
> I wrote that thinking (mistakenly) that s_hot_info was dynamically
> allocated rather than being embedded in the struct super_block.
>
> Indeed, if the mount option is held in s_flags, then it could be
> dynamically allocated, but I don't think that's really necessary...

Ah, you prefer allocating it. OK, let me try. Thanks for your explanation.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com

--
Regards,

Zhi Yong Wu
Re: typo in inode.c
On Thu, Sep 27, 2012 at 05:04:02AM +0800, ching wrote:
> On 09/26/2012 11:23 PM, David Sterba wrote:
> > On Wed, Sep 26, 2012 at 07:48:47PM +0800, ching wrote:
> > > There is a typo (?) in inode.c (git)
> > What's the top commit and what git tree? This has been fixed in 3.6-rc4 via
> > 287082b0bd10060e9c6b32ed9605174ddf2f672a
>
> This mistake is in
> http://git.kernel.org/?p=linux/kernel/git/mason/linux-btrfs.git;a=summary

I see, this is because the fix was merged into linus' tree via the
trivial tree. If unsure, you may want to check btrfs-next first.

david
Re: [PATCH v4 0/4] btrfs: extended inode refs
On Tue, Sep 25, 2012 at 04:04:46PM -0400, Chris Mason wrote:
> @@ -889,16 +899,23 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
>  		while (cur_offset < item_size) {
>  			extref = (struct btrfs_inode_extref *)base + cur_offset;
> -			victim_name_len = btrfs_inode_extref_name_len(eb, extref);
> -			victim_name = kmalloc(namelen, GFP_NOFS);
> -			leaf = path->nodes[0];
> -			read_extent_buffer(eb, name, (unsigned long)&extref->name, namelen);
> +			victim_name_len = btrfs_inode_extref_name_len(leaf, extref);
> +
> +			if (btrfs_inode_extref_parent(leaf, extref) != parent_objectid)
> +				goto next;
> +
> +			victim_name = kmalloc(victim_name_len, GFP_NOFS);

unchecked kmalloc

> +			read_extent_buffer(leaf, victim_name, (unsigned long)&extref->name,
> +					   victim_name_len);
>  			search_key.objectid = inode_objectid;
>  			search_key.type = BTRFS_INODE_EXTREF_KEY;
>  			search_key.offset = btrfs_extref_hash(parent_objectid,
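The "unchecked kmalloc" remark above asks for the allocation result to be tested before it is used. A userspace analogue of that pattern is sketched below; `malloc()` stands in for `kmalloc(..., GFP_NOFS)`, `memcpy()` stands in for `read_extent_buffer()`, and the helper name `copy_victim_name()` is hypothetical. How the real function should unwind on failure is not shown in the thread and is not guessed at here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy name_len bytes from src into a freshly allocated buffer.
 * Returns 0 and stores the buffer in *out, or -ENOMEM if the
 * allocation fails. The caller owns *out and must free() it.
 */
static int copy_victim_name(const char *src, size_t name_len, char **out)
{
	/* Ask for at least one byte: malloc(0) may legitimately return NULL. */
	char *victim_name = malloc(name_len ? name_len : 1);

	if (!victim_name)
		return -ENOMEM;	/* the check the review asks for */

	memcpy(victim_name, src, name_len);
	*out = victim_name;
	return 0;
}
```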
[PATCH] Btrfs: improve the noflush reservation
In some places (such as evicting an inode), we cannot flush the reserved
space of delalloc, but flushing the delayed directory index and delayed
inode items is fine. Currently we do not try to flush those things and
just give up when there is not enough space to reserve. This patch fixes
that problem.

We define 3 types of flush operation: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
If we are in a transaction, we must not flush anything, or a deadlock
would happen, so use NO_FLUSH. If flushing the reserved space of delalloc
would cause a deadlock, use FLUSH_LIMIT. In all other cases, FLUSH_ALL is
used, and we flush everything.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
This is based on btrfs-next tree.
---
 fs/btrfs/ctree.h         |   26 +-
 fs/btrfs/delayed-inode.c |    6 ++-
 fs/btrfs/extent-tree.c   |   85 ++---
 fs/btrfs/inode-map.c     |    5 ++-
 fs/btrfs/inode.c         |    8 +++--
 fs/btrfs/relocation.c    |   12 --
 fs/btrfs/transaction.c   |   30 +++-
 fs/btrfs/transaction.h   |    2 +-
 8 files changed, 85 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index dbb461f..cb59e9b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2870,6 +2870,18 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags);
 u64 btrfs_get_alloc_profile(struct btrfs_root *root, int data);
 void btrfs_clear_space_info_full(struct btrfs_fs_info *info);
+
+enum btrfs_reserve_flush_enum {
+	/* If we are in the transaction, we can't flush anything.*/
+	BTRFS_RESERVE_NO_FLUSH,
+	/*
+	 * Flushing delalloc may cause deadlock somewhere, in this
+	 * case, use FLUSH LIMIT
+	 */
+	BTRFS_RESERVE_FLUSH_LIMIT,
+	BTRFS_RESERVE_FLUSH_ALL,
+};
+
 int btrfs_check_data_free_space(struct inode *inode, u64 bytes);
 void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes);
 void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
@@ -2889,19 +2901,13 @@ struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
 void btrfs_free_block_rsv(struct btrfs_root *root,
 			  struct btrfs_block_rsv *rsv);
 int btrfs_block_rsv_add(struct btrfs_root *root,
-			struct btrfs_block_rsv *block_rsv,
-			u64 num_bytes);
-int btrfs_block_rsv_add_noflush(struct btrfs_root *root,
-				struct btrfs_block_rsv *block_rsv,
-				u64 num_bytes);
+			struct btrfs_block_rsv *block_rsv, u64 num_bytes,
+			enum btrfs_reserve_flush_enum flush);
 int btrfs_block_rsv_check(struct btrfs_root *root,
 			  struct btrfs_block_rsv *block_rsv, int min_factor);
 int btrfs_block_rsv_refill(struct btrfs_root *root,
-			   struct btrfs_block_rsv *block_rsv,
-			   u64 min_reserved);
-int btrfs_block_rsv_refill_noflush(struct btrfs_root *root,
-				   struct btrfs_block_rsv *block_rsv,
-				   u64 min_reserved);
+			   struct btrfs_block_rsv *block_rsv, u64 min_reserved,
+			   enum btrfs_reserve_flush_enum flush);
 int btrfs_block_rsv_migrate(struct btrfs_block_rsv *src_rsv,
 			    struct btrfs_block_rsv *dst_rsv,
 			    u64 num_bytes);
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index eb768c4..2e2eddb 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -651,7 +651,8 @@ static int btrfs_delayed_inode_reserve_metadata(
 	 */
 	if (!src_rsv || (!trans->bytes_reserved &&
 	    src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
-		ret = btrfs_block_rsv_add_noflush(root, dst_rsv, num_bytes);
+		ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
+					  BTRFS_RESERVE_NO_FLUSH);
 		/*
 		 * Since we're under a transaction reserve_metadata_bytes could
 		 * try to commit the transaction which will make it return
@@ -686,7 +687,8 @@ static int btrfs_delayed_inode_reserve_metadata(
 		 * reserve something strictly for us.  If not be a pain and try
 		 * to steal from the delalloc block rsv.
 		 */
-		ret = btrfs_block_rsv_add_noflush(root, dst_rsv, num_bytes);
+		ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes,
+					  BTRFS_RESERVE_NO_FLUSH);
 		if (!ret)
 			goto out;
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8a01087..73b0255 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3851,24 +3851,31 @@ static int flush_space(struct btrfs_root *root,
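As a rough illustration of the three reservation modes described in the changelog above, the selection rule can be sketched in plain user-space C. The enum names are taken from the patch; the helper pick_flush_mode() and its two boolean parameters are hypothetical and only encode the wording of the commit message, not the actual call sites in btrfs.

```c
#include <stdbool.h>

/* Same three modes as the enum the patch adds to ctree.h. */
enum btrfs_reserve_flush_enum {
	BTRFS_RESERVE_NO_FLUSH,
	BTRFS_RESERVE_FLUSH_LIMIT,
	BTRFS_RESERVE_FLUSH_ALL,
};

/*
 * Hypothetical helper (not part of the patch): choose a flush mode
 * following the rules stated in the commit message.
 */
static enum btrfs_reserve_flush_enum
pick_flush_mode(bool in_transaction, bool delalloc_flush_may_deadlock)
{
	/* Inside a transaction any flushing could deadlock. */
	if (in_transaction)
		return BTRFS_RESERVE_NO_FLUSH;
	/* Delalloc flushing is unsafe here, the other flushes are fine. */
	if (delalloc_flush_may_deadlock)
		return BTRFS_RESERVE_FLUSH_LIMIT;
	/* No restriction: flush everything that can free up space. */
	return BTRFS_RESERVE_FLUSH_ALL;
}
```

Under these assumptions, a caller such as the inode eviction path mentioned in the changelog would end up with BTRFS_RESERVE_FLUSH_LIMIT, while a caller already holding a transaction handle would get BTRFS_RESERVE_NO_FLUSH.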
Re: [PATCH 1/2] btrfs-progs: limit the max value of leafsize and nodesize
On Wed, Sep 26, 2012 at 03:52:07PM +0800, Robin Dong wrote:
> Using mkfs.btrfs like:
>   mkfs.btrfs -l 131072 /dev/sda
> will return no error, but after mount it, the dmesg will report:
>   BTRFS: couldn't mount because metadata blocksize (131072) was too large
>
> The user tools should use BTRFS_MAX_METADATA_BLOCKSIZE to limit leaf and
> node size.

Good catch.

> @@ -1291,11 +1291,13 @@ int main(int ac, char **av)
>  		}
>  	}
>  	sectorsize = max(sectorsize, (u32)getpagesize());
> -	if (leafsize < sectorsize || (leafsize & (sectorsize - 1))) {
> +	if (leafsize < sectorsize || leafsize > BTRFS_MAX_METADATA_BLOCKSIZE ||
> +	    (leafsize & (sectorsize - 1))) {

Could you please separate the BTRFS_MAX_METADATA_BLOCKSIZE check and add
an appropriate error message that actually informs the user what kind of
error happened?

>  		fprintf(stderr, "Illegal leafsize %u\n", leafsize);
>  		exit(1);
>  	}
> -	if (nodesize < sectorsize || (nodesize & (sectorsize - 1))) {
> +	if (nodesize < sectorsize || nodesize > BTRFS_MAX_METADATA_BLOCKSIZE ||
> +	    (nodesize & (sectorsize - 1))) {

(same here)

>  		fprintf(stderr, "Illegal nodesize %u\n", nodesize);
>  		exit(1);
>  	}

Thanks!

david
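One way to read the review request above is to give each failure its own result, so that the caller can print a specific message. The following is only a sketch under assumptions: the names check_metadata_size(), size_check_message() and the SKETCH_ constant are made up for illustration, and the 65536 limit is an assumed value standing in for BTRFS_MAX_METADATA_BLOCKSIZE.

```c
#include <stdint.h>

/* Assumed value for illustration; the real constant lives in the btrfs headers. */
#define SKETCH_MAX_METADATA_BLOCKSIZE 65536u

enum size_check {
	SIZE_OK,
	SIZE_TOO_SMALL,   /* smaller than the sector size */
	SIZE_TOO_LARGE,   /* larger than the metadata block size limit */
	SIZE_UNALIGNED,   /* not a multiple of the sector size */
};

/*
 * Hypothetical helper: validate a leafsize or nodesize value.
 * sectorsize is assumed to be a power of two, as in mkfs.
 */
static enum size_check check_metadata_size(uint32_t size, uint32_t sectorsize)
{
	if (size < sectorsize)
		return SIZE_TOO_SMALL;
	if (size > SKETCH_MAX_METADATA_BLOCKSIZE)
		return SIZE_TOO_LARGE;
	if (size & (sectorsize - 1))
		return SIZE_UNALIGNED;
	return SIZE_OK;
}

/* Map each result to a message that names the actual problem. */
static const char *size_check_message(enum size_check result)
{
	switch (result) {
	case SIZE_TOO_SMALL:
		return "size is smaller than the sector size";
	case SIZE_TOO_LARGE:
		return "size is larger than the maximum metadata block size";
	case SIZE_UNALIGNED:
		return "size is not a multiple of the sector size";
	default:
		return "size is valid";
	}
}
```

With a 4096-byte sector size, the 131072 value from the report would be classified as SIZE_TOO_LARGE instead of falling into a generic "Illegal leafsize" message.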
[PATCH] Btrfs: fix wrong calculation of the available space when reserving the space
According to the comment, we can overcommit up to 1/2 of the space when
we are allowed to flush, and only up to 1/8 of it otherwise. But the code
had the two cases reversed. Fix it.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
This is based on btrfs-next tree.
---
 fs/btrfs/extent-tree.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a010234..8a01087 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3962,9 +3962,9 @@ again:
 		 * 1/2 of the space.
 		 */
 		if (flush)
-			avail >>= 3;
-		else
 			avail >>= 1;
+		else
+			avail >>= 3;
 		spin_unlock(&root->fs_info->free_chunk_lock);
 
 		if (used + num_bytes < space_info->total_bytes + avail) {
-- 
1.6.5.2
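The effect of the swap can be checked with a small user-space sketch. The shifts and the comparison mirror the hunk above; the function names overcommit_allowance() and can_overcommit() and the sample numbers are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: how much of the unallocated space may be
 * overcommitted. With flushing allowed it is 1/2, otherwise 1/8.
 */
static uint64_t overcommit_allowance(uint64_t avail, bool can_flush)
{
	return can_flush ? avail >> 1 : avail >> 3;
}

/* Same comparison as in the patched code, on plain integers. */
static bool can_overcommit(uint64_t used, uint64_t num_bytes,
			   uint64_t total_bytes, uint64_t avail,
			   bool can_flush)
{
	return used + num_bytes <
	       total_bytes + overcommit_allowance(avail, can_flush);
}
```

For example, with 800 units of unallocated space the allowance is 400 when flushing is possible and 100 when it is not. Before the fix, the non-flushing caller was the one that got the larger allowance.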
Re: [PATCH] Btrfs: improve the noflush reservation
On Thu, 27 Sep 2012 16:45:51 +0800, Miao Xie wrote:
> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
> ---
> This is based on btrfs-next tree.

Sorry, I forgot to say that this patch is against:
  [PATCH] Btrfs: fix wrong calculation of the available space when reserving the space

Both this patch and the above patch are based on btrfs-next tree.

Thanks
Miao
BTRF - Storage Usage
Hi,

I've installed a new server using btrfs for my root partition (/). It
uses snapper for snapshot management and everything seems to work fine.

My problem is knowing the REAL remaining free space in my partition.
Using different commands, I get different results, and I don't know how
to interpret them correctly:

poivron:~ # btrfs filesystem df /
Data: total=4.01GB, used=2.16GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=3.00GB, used=429.16MB
Metadata: total=8.00MB, used=0.00

poivron:~ # df -hP /
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda3   132G  3.0G  124G   3% /

poivron:~ # btrfs filesystem show /dev/sda3
Label: none  uuid: 9e68b667-f9f9-490f-9da1-ae4e91558212
        Total devices 1 FS bytes used 2.58GB
        devid    1 size 131.64GB used 10.04GB path /dev/sda3

Btrfs v0.19+

poivron:~ # du -sh /.snapshots
40G     /.snapshots

===

Please help me understand and interpret this information so that I can
know, as accurately as possible, how much space really remains and what
space is used by what.

Also, I don't really understand the output of the command
"btrfs filesystem df /": what exactly are Data, System DUP, System total,
Metadata DUP and Metadata total?

==

Here is some complementary information:

poivron:~ # uname -a
Linux poivron 3.0.26-0.7-default #1 SMP Tue Apr 17 10:27:57 UTC 2012 (3829766) x86_64 x86_64 x86_64 GNU/Linux

poivron:~ # snapper list-configs
Config | Subvolume
-------+----------
root   | /

poivron:~ # cat /etc/snapper/configs/root
# subvolume to snapshot
SUBVOLUME=/

# filesystem type
FSTYPE=btrfs

# run daily number cleanup
NUMBER_CLEANUP=yes

# limit for number cleanup
NUMBER_MIN_AGE=1800
NUMBER_LIMIT=100

# create hourly snapshots
TIMELINE_CREATE=yes

# cleanup hourly snapshots after some time
TIMELINE_CLEANUP=yes

# limits for timeline cleanup
TIMELINE_MIN_AGE=1800
TIMELINE_LIMIT_HOURLY=10
TIMELINE_LIMIT_DAILY=10
TIMELINE_LIMIT_MONTHLY=10
TIMELINE_LIMIT_YEARLY=10

# cleanup empty pre-post-pairs
EMPTY_PRE_POST_CLEANUP=yes

# limits for empty pre-post-pair cleanup
EMPTY_PRE_POST_MIN_AGE=1800

Regards,
Sébastien MAURY
INSERM - DSI - Pôle Infrastructures
[PATCH V4 1/2] Btrfs: cleanup duplicated division functions
div_factor{_fine} has been implemented twice, and the two functions are
very similar, so clean up the duplicated implementation: drop the
original div_factor(), and rename div_factor_fine() to div_factor(). So
the factor of the new div_factor() is relative to 100, not 10.

I also move div_factor() into a separate file named math.h, because it
is a common math function that may be used by every component of btrfs.

Because these functions are mostly used on the hot path, and we are sure
the parameters are right in most cases, we don't add complex checks of
the parameters there. But in the other places, we must check and make
sure the parameters are right. So besides the code cleanup, this patch
also adds a check of the usage value of the space balance; so far it is
the only place where we need a check to make sure the parameters of
div_factor() are right. Besides that, an old kernel may hold a wrong
usage value, so we must rectify it.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
Changelog v3 -> v4:
- deal with the wrong usage that was input on the old kernel

Changelog v2 -> v3:
- drop the original div_factor and rename div_factor_fine to div_factor
- drop the check of the factor

Changelog v1 -> v2:
- add missing check
---
 fs/btrfs/extent-tree.c |   29 +
 fs/btrfs/ioctl.c       |   21 ++
 fs/btrfs/math.h        |   35 ++
 fs/btrfs/relocation.c  |    2 +-
 fs/btrfs/transaction.c |    2 +-
 fs/btrfs/volumes.c     |   55 ++-
 6 files changed, 94 insertions(+), 50 deletions(-)
 create mode 100644 fs/btrfs/math.h

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a010234..bcb9ced 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -33,6 +33,7 @@
 #include "volumes.h"
 #include "locking.h"
 #include "free-space-cache.h"
+#include "math.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -648,24 +649,6 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
 	rcu_read_unlock();
 }
 
-static u64 div_factor(u64 num, int factor)
-{
-	if (factor == 10)
-		return num;
-	num *= factor;
-	do_div(num, 10);
-	return num;
-}
-
-static u64 div_factor_fine(u64 num, int factor)
-{
-	if (factor == 100)
-		return num;
-	num *= factor;
-	do_div(num, 100);
-	return num;
-}
-
 u64 btrfs_find_block_group(struct btrfs_root *root,
 			   u64 search_start, u64 search_hint, int owner)
 {
@@ -674,7 +657,7 @@ u64 btrfs_find_block_group(struct btrfs_root *root,
 	u64 last = max(search_hint, search_start);
 	u64 group_start = 0;
 	int full_search = 0;
-	int factor = 9;
+	int factor = 90;
 	int wrapped = 0;
 again:
 	while (1) {
@@ -708,7 +691,7 @@ again:
 	if (!full_search && factor < 10) {
 		last = search_start;
 		full_search = 1;
-		factor = 10;
+		factor = 100;
 		goto again;
 	}
 found:
@@ -3513,7 +3496,7 @@ static int should_alloc_chunk(struct btrfs_root *root,
 	if (force == CHUNK_ALLOC_LIMITED) {
 		thresh = btrfs_super_total_bytes(root->fs_info->super_copy);
 		thresh = max_t(u64, 64 * 1024 * 1024,
-			       div_factor_fine(thresh, 1));
+			       div_factor(thresh, 1));
 
 		if (num_bytes - num_allocated < thresh)
 			return 1;
@@ -3521,12 +3504,12 @@ static int should_alloc_chunk(struct btrfs_root *root,
 	thresh = btrfs_super_total_bytes(root->fs_info->super_copy);
 
 	/* 256MB or 2% of the FS */
-	thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2));
+	thresh = max_t(u64, 256 * 1024 * 1024, div_factor(thresh, 2));
 	/* system chunks need a much small threshold */
 	if (sinfo->flags & BTRFS_BLOCK_GROUP_SYSTEM)
 		thresh = 32 * 1024 * 1024;
 
-	if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 8))
+	if (num_bytes > thresh && sinfo->bytes_used < div_factor(num_bytes, 80))
 		return 0;
 	return 1;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9384a2a..121339c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3297,6 +3297,23 @@ void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
 	}
 }
 
+static int btrfs_check_balance_args(struct btrfs_ioctl_balance_args *bargs)
+{
+	if ((bargs->data.flags & BTRFS_BALANCE_ARGS_USAGE) &&
+	    (bargs->data.usage < 0 || bargs->data.usage > 100))
+		return -EINVAL;
+
+	if ((bargs->meta.flags & BTRFS_BALANCE_ARGS_USAGE) &&
+	    (bargs->meta.usage < 0 || bargs->meta.usage > 100))
+		return -EINVAL;
+
+	if ((bargs->sys.flags & BTRFS_BALANCE_ARGS_USAGE) &&
+	    (bargs->sys.usage < 0 || bargs->sys.usage > 100))
+		return -EINVAL;
+
+	return 0;
+}
+
 static long
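The rename changes the meaning of the factor argument, which is why the callers in the diff above go from 8, 9 and 10 to 80, 90 and 100. A minimal user-space sketch of the two scalings, assuming plain 64-bit division instead of the kernel's do_div() and using the made-up names div_factor_tenths() and usage_is_valid(), shows that the converted calls give the same result:

```c
#include <stdbool.h>
#include <stdint.h>

/* New semantics: scale num by factor/100 (the former div_factor_fine). */
static uint64_t div_factor(uint64_t num, int factor)
{
	if (factor == 100)
		return num;
	num *= (uint64_t)factor;
	return num / 100;
}

/* Old semantics, kept here only for comparison: scale num by factor/10. */
static uint64_t div_factor_tenths(uint64_t num, int factor)
{
	if (factor == 10)
		return num;
	num *= (uint64_t)factor;
	return num / 10;
}

/*
 * Hypothetical stand-in for the range check the patch adds for the
 * balance "usage" filter: a percentage must not exceed 100.
 */
static bool usage_is_valid(uint64_t usage)
{
	return usage <= 100;
}
```

Neither helper guards against a factor outside its expected range or against overflow of num * factor, which matches the changelog's point that such checks belong at the place where user input arrives rather than on the hot path.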
Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions
Sorry for the late reply.

On Mon, 24 Sep 2012 18:47:42 +0200, David Sterba wrote:
>>> This is the most straightforward transformation I can think of.  It
>>> doesn't result in an unnecessary BUG_ON, keeps churn to a minimum and
>>> doesn't change the style of the balance ioctl.
>>
>> agree with you.
>>
>>> (If I were to check every filter argument that way,
>>> btrfs_balance_ioctl() would be very long and complicated.)
>>
>> I think the check in btrfs_balance_ioctl() is necessary, the reason is
>> above.
>
> btrfs_balance_ioctl does not seem as the right place, it does the
> processing related to the state of balance (resume/cancel etc). Look at
> btrfs_balance() itself, it does lot more sanity checks of the parameters

I think we should not put the check in btrfs_balance(), because the
arguments stay valid once they have passed the check at input time. If we
put the check in btrfs_balance(), it will be done every time we resume
the balance, which is unnecessary.

> We can put the extra checks into helpers (and not only this one) if
> clarity and readability of the function becomes a concern.

Agree. I will put this check into a helper in the next version of this
patch. And I will make a separate patch to move the current checks from
btrfs_balance() to the above helper after this patch is accepted.

Thanks
Miao
Re: BTRF - Storage Usage
On Thu, Sep 27, 2012 at 12:44:27PM +0200, Sébastien Maury wrote:
> I've installed a new server using btrfs for my root partition (/).
> It uses snapper for snapshots management and all seems to work pretty
> fine.
>
> My problem is to be able to know the remaining REAL free space in my
> partition.

This is in the FAQ:

https://btrfs.wiki.kernel.org/index.php/FAQ#Why_are_there_so_many_ways_to_check_the_amount_of_free_space.3F

Short answer: you can't know in general. Longer answer -- see below.

> Using different commands, i have different results, and i don't know
> how to interpret them correctly :
>
> poivron:~ # btrfs filesystem show /dev/sda3
> Label: none  uuid: 9e68b667-f9f9-490f-9da1-ae4e91558212
>         Total devices 1 FS bytes used 2.58GB
>         devid    1 size 131.64GB used 10.04GB path /dev/sda3

You have 131.64 GiB of raw storage in your filesystem. Of that,
10.04 GiB is currently allocated for use by the FS (and it will take
more as it needs it).

> poivron:~ # btrfs filesystem df /
> Data: total=4.01GB, used=2.16GB

4.01 GiB of the 10.04 GiB allocation is assigned for use by data,
and 2.16 GiB of that allocation actually contains data.

> System, DUP: total=8.00MB, used=4.00KB

16 MiB (=2*8.00 MiB) of the 10.04 GiB allocation is assigned for
use as two copies of the system data. There is 4 KiB of system data
actually used.

> System: total=4.00MB, used=0.00
> Metadata, DUP: total=3.00GB, used=429.16MB

6 GiB (=2*3.00 GiB) of your 10.04 GiB allocation is assigned for
use as metadata, with two copies (DUP) being kept. 429.16 MiB of the
3.00 GiB is currently in use.

> Metadata: total=8.00MB, used=0.00
>
> poivron:~ # df -hP /
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sda3   132G  3.0G  124G   3% /

Plain old df can't handle the truth, so this is at best only a
hint at what's actually happening. When Avail reaches zero, your FS
is probably full. Other than that, you can't necessarily say very
much.

> ===
>
> Please help me understand and interpret those information to know the
> most accurately as possible what is my real remaining space, and what
> space is used by what.
>
> Although, i don't really understand the output of the command btrfs
> filesystem df / : what are exactly Data, System DUP, System total,
> Metadata DUP and Metadata total ?

This should all be covered in the glossary on the website:

https://btrfs.wiki.kernel.org/index.php/Glossary

Data is the contents of your files.

Metadata is all the other stuff that the FS needs in order to store
your files -- directory structures, permissions, locations of the
file data, that kind of thing.

System is a particular bit of the metadata (the chunk tree) which
governs an internal physical/virtual mapping, and which needs to be
read before anything else can make any kind of sense.

DUP is a bit like RAID-1: anything stored in a DUP chunk is
actually written to two different places on the disk, and can help
recovery in the case of physical disk corruption (e.g. bad blocks,
head crash).

> ==
>
> Here are some complementary informations :
>
> poivron:~ # uname -a
> Linux poivron 3.0.26-0.7-default #1 SMP Tue Apr 17 10:27:57 UTC 2012
> (3829766) x86_64 x86_64 x86_64 GNU/Linux

You [probably(*)] need to upgrade your kernel as soon as possible.
btrfs code moves very fast, and 3.0 has significant bugs in it. You
should be running the latest released kernel -- right now, that's 3.5,
or 3.6-rc7. Next week, it will probably change to 3.6 when Linus makes
the next release. Most distributions have a repository somewhere which
will give you access to new kernels without too much trouble.

Hugo.

(*) Some of the enterprise distributions do have backported btrfs
fixes in their apparently older kernels.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
                       --- __(_' Squeak! ---
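The arithmetic in the explanation above can be replayed in a few lines of C. This is only a sketch of the accounting, assuming that the "total=" figures are binary units and that a DUP line occupies twice its listed size on disk; the struct, the function names and the table are made up for illustration, and the numbers are copied from the output quoted in this thread.

```c
#include <stddef.h>

/* One line of "btrfs filesystem df" output, converted to GiB. */
struct chunk_type {
	const char *name;
	double total_gib;   /* the "total=" value */
	int copies;         /* 2 for DUP, 1 otherwise */
};

/* Figures from the report: MB values are divided by 1024 to get GiB. */
static const struct chunk_type poivron_chunks[] = {
	{ "Data",          4.01,         1 },
	{ "System, DUP",   8.0 / 1024.0, 2 },
	{ "System",        4.0 / 1024.0, 1 },
	{ "Metadata, DUP", 3.00,         2 },
	{ "Metadata",      8.0 / 1024.0, 1 },
};

static const size_t poivron_chunk_count =
	sizeof(poivron_chunks) / sizeof(poivron_chunks[0]);

/* Raw space allocated to chunks: each total counted once per copy. */
static double allocated_gib(const struct chunk_type *types, size_t count)
{
	double sum = 0.0;

	for (size_t i = 0; i < count; i++)
		sum += types[i].total_gib * types[i].copies;
	return sum;
}

/* Raw space not yet assigned to any chunk type. */
static double unallocated_gib(double raw_gib, double allocated)
{
	return raw_gib - allocated;
}
```

Summing the table gives roughly 10.04 GiB, which matches the "used 10.04GB" figure printed by btrfs filesystem show, and leaves about 121.6 GiB of the 131.64 GiB device unallocated. How much file data fits into that remainder still depends on how it later gets split between data and duplicated metadata, which is why no single free-space number exists.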
Re: BTRF - Storage Usage
Hi,

Thanks for the quick reply, this clarifies a lot of things for me. I had
read the articles you mentioned, but I must admit that your explanations
based on my examples make things even clearer.

Also, if I understand things properly, snapshot sizes aren't included in
the "btrfs filesystem show" command output? So is using, for example,
"du -sh /.snapshots" a correct way to determine the disk usage of my
snapshots?

I will ask the people at my company in charge of maintaining
distributions to provide us with a more recent kernel.

PS: I use the SLES 11 SP2 distribution.

Regards,
Sébastien MAURY
Re: [PATCH] Btrfs: improve the noflush reservation
Please ignore this patch. My btrfs-next tree is old, and this patch will
conflict with Josef's patch:
  [PATCH] Btrfs: run delayed refs first when out of space

I will update this patch as soon as possible.

Thanks
Miao
Re: BTRF - Storage Usage
On Thu, Sep 27, 2012 at 01:25:58PM +0200, Sébastien Maury wrote:
> Hi,
>
> Thanks for the quick reply, this clarify me lots of things.
> I've had read the articles you mentioned, but i must admit that your
> explanations based on my examples makes things even more clearer.
>
> Also, if i understand things properly, snaphots size aren't included
> in the btrfs filesystem show command output ?
> So, the use, for example, of a du -sh /.snapshots is correct to
> determine the disk usage of my snapshots ?

Disk usage of a snapshot has two different answers:

1) The total size of the files listed in the snapshot, which you can
   get from du.

2) The amount of space that would be freed up by deleting the
   snapshot, which isn't currently available, but probably will be
   soon. (The additional bookkeeping code was part of the qgroups
   patches, which are in 3.6).

> I will see with the people of my company in charge of maintaining
> distributions to provide us a more recent kernel.
>
> PS : I use SLES 11 SP2 distribution.

OK, that one's actually one of the few that does keep proper
backports:

https://btrfs.wiki.kernel.org/index.php/Getting_started#Distro_support

That said, I don't know how good they are at keeping up --
probably pretty good, but other people here may be able to answer that
better.

Hugo.
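The difference between the two answers about snapshot disk usage can be made concrete with a toy model. This is not how btrfs or qgroups store anything; the struct, the bitmask of owners, the function names and the sample numbers are all assumptions chosen to show why the two figures diverge when snapshots share data.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy extent: a size and a bitmask of the snapshots that reference it. */
struct extent {
	uint64_t bytes;
	uint32_t owners;   /* bit i set means snapshot i uses this extent */
};

/* Sample data: one extent shared by snapshots 0 and 1, one private to each. */
static const struct extent sample_extents[] = {
	{ 100, 0x3 },
	{  50, 0x1 },
	{  30, 0x2 },
};

static const size_t sample_extent_count =
	sizeof(sample_extents) / sizeof(sample_extents[0]);

/* Answer 1: everything reachable from the snapshot (what du adds up). */
static uint64_t referenced_bytes(const struct extent *extents, size_t count,
				 unsigned int snap)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < count; i++)
		if (extents[i].owners & (UINT32_C(1) << snap))
			sum += extents[i].bytes;
	return sum;
}

/* Answer 2: what deleting the snapshot would free (extents it alone owns). */
static uint64_t exclusive_bytes(const struct extent *extents, size_t count,
				unsigned int snap)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < count; i++)
		if (extents[i].owners == (UINT32_C(1) << snap))
			sum += extents[i].bytes;
	return sum;
}
```

In this sample, snapshot 0 references 150 units but would free only 50 if deleted, and adding up the referenced sizes of both snapshots gives 280 although only 180 units exist. That is the same effect as a du total that is far larger than the space actually allocated on the device.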
Re: BTRF - Storage Usage
Hi,

Thanks a lot for your time and answers. Things look pretty clear to me now.

I'm monitoring my systems using nagios, and I was annoyed about the disk usage monitoring. Thanks to your answers, I should be able to develop a rather accurate script. Or so I hope :)

Regards,
Sebastien.

Hugo Mills h...@carfax.org.uk wrote:
> On Thu, Sep 27, 2012 at 01:25:58PM +0200, Sébastien Maury wrote:
>> Hi,
>>
>> Thanks for the quick reply, this clarifies lots of things for me. I had read the articles you mentioned, but I must admit that your explanations based on my examples make things even clearer.
>>
>> Also, if I understand things properly, snapshot sizes aren't included in the btrfs filesystem show command output? So, the use, for example, of a du -sh /.snapshots is correct to determine the disk usage of my snapshots?
>
> Disk usage of a snapshot has two different answers:
>
> 1) The total size of the files listed in the snapshot, which you can get from du.
>
> 2) The amount of space that would be freed up by deleting the snapshot, which isn't currently available, but probably will be soon. (The additional bookkeeping code was part of the qgroups patches, which are in 3.6).
>
>> I will see with the people of my company in charge of maintaining distributions to provide us a more recent kernel.
>>
>> PS : I use SLES 11 SP2 distribution.
>
> OK, that one's actually one of the few that does keep proper backports:
> https://btrfs.wiki.kernel.org/index.php/Getting_started#Distro_support
>
> That said, I don't know how good they are at keeping up -- probably pretty good, but other people here may be able to answer that better.
>
> Hugo.
>
>> Hugo Mills h...@carfax.org.uk wrote:
>>> On Thu, Sep 27, 2012 at 12:44:27PM +0200, Sébastien Maury wrote:
>>>> I've installed a new server using btrfs for my root partition (/). It uses snapper for snapshots management and all seems to work pretty fine. My problem is to be able to know the remaining REAL free space in my partition.
>>> This is in the FAQ:
>>> https://btrfs.wiki.kernel.org/index.php/FAQ#Why_are_there_so_many_ways_to_check_the_amount_of_free_space.3F
>>>
>>> Short answer: you can't know in general. Longer answer -- see below.
>>>
>>> [snip]
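The figures in this thread can be cross-checked with a little arithmetic: the "used 10.04GB" reported by btrfs filesystem show should equal the sum of the "total=" values from btrfs filesystem df, with DUP classes counted twice. The sketch below does that sum; the struct and function names are made up for illustration, and 4.01 GB is rounded to 4106 MiB, which is an assumption about the rounding.

```c
#include <stdint.h>

/* Copies kept per profile: DUP stores everything twice, single once. */
enum profile { PROFILE_SINGLE = 1, PROFILE_DUP = 2 };

/* One line of "btrfs filesystem df" output (hypothetical representation). */
struct chunk_class {
	const char *type;	/* "Data", "System" or "Metadata" */
	enum profile profile;	/* replication profile of the class */
	uint64_t total_mib;	/* the "total=" value, in MiB */
};

/* Raw space taken from the device: sum of total * copies over all classes. */
static uint64_t raw_allocated_mib(const struct chunk_class *c, int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += c[i].total_mib * (uint64_t)c[i].profile;
	return sum;
}
```

With the five classes from the df output above (4106, 2*8, 4, 2*3072 and 8 MiB) the sum is 10278 MiB, about 10.04 GiB, which agrees with the allocation that btrfs filesystem show reports.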
Re: [PATCH] Btrfs: fix wrong calculation of the available space when reserving the space
Please ignore this patch, it is not based on the new btrfs-next tree. I'll send the right one as soon as possible.

Thanks
Miao

On thu, 27 Sep 2012 17:09:15 +0800, Miao Xie wrote:
> According to the comment, we can overcommit the space up to 1/2 of the total disk space, or we just can overcommit up to 1/8. But the code was written reversedly. Fix it.
>
> Signed-off-by: Miao Xie mi...@cn.fujitsu.com
> ---
> This is based on btrfs-next tree.
> ---
>  fs/btrfs/extent-tree.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a010234..8a01087 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3962,9 +3962,9 @@ again:
>  		 * 1/2 of the space.
>  		 */
>  		if (flush)
> -			avail >>= 3;
> -		else
>  			avail >>= 1;
> +		else
> +			avail >>= 3;
>
>  		spin_unlock(&root->fs_info->free_chunk_lock);
>  		if (used + num_bytes < space_info->total_bytes + avail) {
Re: [PATCH] Btrfs: fix wrong calculation of the available space when reserving the space
On Thu, Sep 27, 2012 at 03:09:15AM -0600, Miao Xie wrote:
> According to the comment, we can overcommit the space up to 1/2 of the total disk space, or we just can overcommit up to 1/8. But the code was written reversedly. Fix it.

Sorry, the comment is wrong, I was actually just looking at this the other day :). Basically we want non-flushers to be able to overcommit more to give those guys more of a chance of being able to make an allocation, but we want flushers to not be able to overcommit too much since they are allowed to make more headroom, so the logic is right, the comment is wrong.

Thanks,

Josef
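The rule Josef describes can be written down as a small standalone sketch. It assumes the shift operators that the archive appears to have stripped from the quoted hunk (avail >>= 3 and avail >>= 1); the function names are hypothetical and this is not the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * How far a reservation may overcommit beyond the allocated space.
 * Flushers can make more headroom themselves, so they get only 1/8 of
 * the available space; non-flushers get 1/2 to improve their chances.
 */
static uint64_t overcommit_allowance(uint64_t avail, bool flush)
{
	if (flush)
		avail >>= 3;	/* flusher: at most 1/8 */
	else
		avail >>= 1;	/* non-flusher: up to 1/2 */
	return avail;
}

/* True if reserving num_bytes more still fits within total + allowance. */
static bool can_overcommit(uint64_t used, uint64_t num_bytes,
			   uint64_t total_bytes, uint64_t avail, bool flush)
{
	return used + num_bytes <
	       total_bytes + overcommit_allowance(avail, flush);
}
```

For example, with 800 units available a flusher may overcommit by 100 and a non-flusher by 400, so the same request can fail for the former and succeed for the latter.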
Re: [PATCH V4 1/2] Btrfs: cleanup duplicated division functions
Hi Miao,

You haven't addressed any of my concerns with v3.

On Thu, Sep 27, 2012 at 06:19:58PM +0800, Miao Xie wrote:
(snipped)
> the parameters are right. So besides the code cleanup, this patch also add a check for the usage of the space balance, it is the only place that we need add check to make sure the parameters of div_factor are right till now. Besides that, the old kernel may hold the wrong usage value, so we must rectify it.

Cleaning up/unifying duplicated functions and changing the existing logic are two very different things. If you, in the course of writing this patch, became unhappy with the way balancing ioctl deals with invalid input, please send a separate patch.

Before your patch, volumes.c had its own copy of div_factor_fine():

static u64 div_factor_fine(u64 num, int factor)
{
	if (factor <= 0)
		return 0;
	if (factor >= 100)
		return num;

	num *= factor;
	do_div(num, 100);
	return num;
}

which was called from chunk_usage_filter() on unvalidated user input. As far as the cleanup part of your patch goes, you've dropped factor <= 0 / factor >= 100 logic, merged volumes.c's copy with extent-tree.c's copy and renamed div_factor_fine() to div_factor(). To make chunk_usage_filter() happy again, it's enough to move the dropped logic directly to the call site:

static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
			      struct btrfs_balance_args *bargs)
{
	...
-	user_thresh = div_factor_fine(cache->key.offset, bargs->usage);
+	if (bargs->usage == 0)
+		user_thresh = 0;
+	else if (bargs->usage >= 100)
+		user_thresh = cache->key.offset;
+	else
+		user_thresh = div_factor(cache->key.offset, bargs->usage);
	...
}

So I would suggest you drop all hunks related to changing the way balancing ioctl works and make the above change to chunk_usage_filter() instead. Once again, if you are unhappy with usage filter argument handling, send a separate patch.

Thanks,

Ilya
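To see why the clamping has to move to the call site once the helper stops checking its argument, here is a userspace sketch of the two pieces. It uses plain 64-bit division in place of do_div(), and usage_threshold() is a hypothetical stand-in for the relevant lines of chunk_usage_filter().

```c
#include <stdint.h>

/* Unified helper: factor is a percentage and is NOT range-checked here. */
static uint64_t div_factor(uint64_t num, int factor)
{
	num *= (uint64_t)factor;
	return num / 100;
}

/*
 * Call-site clamping for a user-supplied usage percentage:
 * 0 means "nothing", anything >= 100 means "the whole chunk".
 */
static uint64_t usage_threshold(uint64_t chunk_size, uint64_t usage)
{
	if (usage == 0)
		return 0;
	if (usage >= 100)
		return chunk_size;
	return div_factor(chunk_size, (int)usage);
}
```

Without the clamp, a usage value of 150 would yield a threshold of 1.5 times the chunk size; with it, the result never exceeds the chunk size.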
Re: btrfs send/receive review by vfs folks
Hi Jan,

I hope to get my proposal working soon, then expect some code from me to look at.

Thanks!
Alex.

On Mon, Sep 24, 2012 at 11:27 AM, Jan Schmidt list.bt...@jan-o-sch.net wrote:
> Hi Alex,
>
> On Mon, September 24, 2012 at 11:13 (+0200), Alex Lyakas wrote:
>> Hi,
>>
>>> write_buf:
>>> Used to write the stream to a user space supplied pipe. Please note the ERESTARTSYS comment there, I need some help here as I don't know how to handle that correctly. If I ignore the return value, it loops forever. If I bail out to user space, it reenters the ioctl and starts from the beginning (which is really bad). I have two possible solutions in my mind.
>>> 1. Store some kind of state in the ioctl arguments so that we can continue where we stopped when the ioctl reenters. This would however complicate the code a lot.
>>> 2. Spawn a thread when the ioctl is called and leave the ioctl immediately. I don't know if ERESTARTSYS can happen in vfs_xxx calls if they happen from a non syscall thread.
>>
>> I am hitting the ERESTARTSYS issue also. The easiest way to repro this is to stop the user process in gdb. As Alexander mentioned, restarting the ioctl from the beginning is really bad, because some commands were already sent to the pipe, and possibly consumed by the user mode (dump_thread). Also the command, on which vfs_write() hit ERESTARTSYS, might not have been pushed fully to the pipe. So if the ioctl() restarts, it starts filling the pipe with duplicate commands, and at least one command in the pipe might be corrupted. So the receive part cannot process such stream successfully (usually it hits crc error).
>>
>> In addition to what Alexander suggested, I have a third suggestion, but I would like to know whether community believes this issue is worth fixing.
>
> It's a must-fix in my opinion. As you mentioned, it's easy to hit. Second, code like this doesn't look like it should be in mainline at all:
>
>	/* TODO handle that correctly */
>	/*if (ret == -ERESTARTSYS) {
>		continue;
>	}*/
>
> I'm looking forward to your proposal, preferably in form of a patch :-)
>
> -Jan
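For readers unfamiliar with the failure mode, the following is only a userspace analogy, not a proposed fix for the ioctl: a write loop stays correct across interruptions if it remembers how far it got and resumes from that position rather than from the beginning. The helper name write_all() is hypothetical, and EINTR is the userspace counterpart of the condition discussed above, not the same thing as ERESTARTSYS inside the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly len bytes to fd, resuming after short writes and
 * interruptions. Returns 0 on success or a negative errno value.
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* nothing written: retry at the same position */
			return -errno;
		}
		p += n;			/* short write: advance past what was sent */
		len -= (size_t)n;
	}
	return 0;
}
```

The point of the sketch is the position tracking: bytes already accepted by the pipe are never sent again, which is exactly what a restart of the whole ioctl cannot guarantee.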
Re: [PATCH v4 0/4] btrfs: extended inode refs
On Tue, Sep 25, 2012 at 04:04:46PM -0400, Chris Mason wrote:
> On Mon, Aug 20, 2012 at 02:29:17PM -0600, Mark Fasheh wrote:
>> Testing wise, the basic namespace operations work well (link, unlink, etc). The rest has gotten less debugging (and I really don't have a great way of testing the code in tree-log.c) Attached to this e-mail are btrfs-progs patches which make testing of the changes possible.
>
> Hi Mark,
>
> I hit a few problems testing this, so I have the patch below that I plan on folding into your commits (to keep bisect from crashing in tree log).
>
> Just let me know if this is a problem, or if you see any bugs in there. I'm still doing a last round of checks on it, but I wanted to send along early for comments.
>
> The biggest change in here is to always check the ref_objectid when returning a backref. Hash collisions mean we may return a ref for a completely different parent id otherwise. I think I caught all the places missing that logic, but please double check me.

Ahh yes of course. I missed that in a couple key areas. Thanks for fixing it.

> Other than that I went through and fixed up bugs in tree-log.c. __add_inode_ref had a bunch of cut and paste errors, and you carefully preserved a huge use-after-free bug in the original add_inode_ref.

Cool, everything in there looks good to me. Thanks again Chris!
	--Mark

--
Mark Fasheh
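The collision problem Chris mentions is generic to any index keyed by a hash. A minimal sketch, with a made-up table layout and made-up names (ref_entry, find_ref) rather than the btrfs on-disk format: a hash match alone is not enough, the parent id and the name must be compared as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One back reference in a hypothetical in-memory index. */
struct ref_entry {
	uint64_t hash;			/* index key; may collide */
	uint64_t parent_objectid;	/* directory that holds the name */
	const char *name;		/* link name inside that directory */
};

/*
 * Look up a ref by hash, then verify parent id and name so that a
 * colliding entry for a different parent is never returned.
 */
static const struct ref_entry *find_ref(const struct ref_entry *tbl, size_t n,
					uint64_t hash,
					uint64_t parent_objectid,
					const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].hash != hash)
			continue;
		if (tbl[i].parent_objectid != parent_objectid)
			continue;	/* same hash, different parent */
		if (strcmp(tbl[i].name, name) != 0)
			continue;	/* same hash, different name */
		return &tbl[i];
	}
	return NULL;
}
```

If the parent check were dropped, a lookup for parent 257 could return the entry belonging to parent 256 whenever the two share a hash value.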
[RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On 09/27/2012 12:44 PM, Sébastien Maury wrote:
> Hi,
>
> I've installed a new server using btrfs for my root partition (/). It uses snapper for snapshots management and all seems to work pretty fine. My problem is to be able to know the remaining REAL free space in my partition.
>
> Using different commands, i have different results, and i don't know how to interpret them correctly :
>
> poivron:~ # btrfs filesystem df /
> Data: total=4.01GB, used=2.16GB
> System, DUP: total=8.00MB, used=4.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=3.00GB, used=429.16MB
> Metadata: total=8.00MB, used=0.00

In effect the output of btrfs filesystem df / is not very friendly. What about changing the output as below:

$ btrfs filesystem disk-free /
Summary:
  Total:                    135.00GB
  Allocated:                 10.51GB
  Unallocated:              124.49GB
  Free_(Estimated)           86.56GB
  Average_disk_efficiency:   62 %

Details:
  Chunk-type  Mode    Allocated      Used      Free
  ----------  ------  ---------  --------  --------
  Data        Single     4.01GB    2.16GB    1.87GB
  System      DUP       16.00MB    4.00KB    7.99MB
  System      Single     4.00MB      0.00    4.00MB
  Metadata    DUP        6.00GB  429.16MB    2.57GB
  Metadata    Single     8.00MB      0.00    8.00MB

Where the Free_(Estimated) and Average_disk_efficiency are computed as:

  Average_disk_efficiency = ratio of average disk usage
      = (sum(ChunkUsed) + sum(ChunkFree)) / sum(ChunkAllocated)

  Estimated_available = Average_disk_efficiency * Unallocated + sum(ChunkFree)

I am open to suggestions about the terms: Used vs Allocated and Free vs Available, or a better description of Average disk efficiency.

BR
G.Baroncelli

P.S. the source can be found at http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git branch disk_free
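The two formulas above are easy to try out in isolation. This is a sketch under the assumption that "Allocated" is raw space while "Used" and "Free" are logical space (so a DUP class contributes at most half of its allocation); the struct and function names are hypothetical and are not taken from the disk_free branch.

```c
#include <stddef.h>

/* One row of the proposed "Details" table, all values in the same unit. */
struct chunk_row {
	double allocated;	/* raw space taken from the device */
	double used;		/* logical space holding data */
	double free_space;	/* logical space still free in the chunks */
};

/* (sum(ChunkUsed) + sum(ChunkFree)) / sum(ChunkAllocated) */
static double average_disk_efficiency(const struct chunk_row *rows, size_t n)
{
	double usable = 0.0, allocated = 0.0;

	for (size_t i = 0; i < n; i++) {
		usable += rows[i].used + rows[i].free_space;
		allocated += rows[i].allocated;
	}
	return allocated > 0.0 ? usable / allocated : 0.0;
}

/* Average_disk_efficiency * Unallocated + sum(ChunkFree) */
static double estimated_available(const struct chunk_row *rows, size_t n,
				  double unallocated)
{
	double free_sum = 0.0;

	for (size_t i = 0; i < n; i++)
		free_sum += rows[i].free_space;
	return average_disk_efficiency(rows, n) * unallocated + free_sum;
}
```

The estimate extrapolates the current mix of single and DUP chunks to the unallocated space, so it changes whenever that mix changes.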
[PATCH] Btrfs: cache extent state when writing out dirty metadata pages
Everytime we write out dirty pages we search for an offset in the tree, convert the bits in the state, and then when we wait we search for the offset again and clear the bits. So for every dirty range in the io tree we are doing 4 rb searches, which is suboptimal. With this patch we are only doing 2 searches for every cycle (modulo weird things happening). Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/disk-io.c          |    4 ++--
 fs/btrfs/extent-tree.c      |    5 +++--
 fs/btrfs/extent_io.c        |   43 +--
 fs/btrfs/extent_io.h        |    6 --
 fs/btrfs/free-space-cache.c |    2 +-
 fs/btrfs/relocation.c       |    2 +-
 fs/btrfs/transaction.c      |   14 +-
 fs/btrfs/tree-log.c         |    3 ++-
 8 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c47a3ae..032cce2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3569,7 +3569,7 @@ static int btrfs_destroy_marked_extents(struct btrfs_root *root,
 	while (1) {
 		ret = find_first_extent_bit(dirty_pages, start, &start, &end,
-					    mark);
+					    mark, NULL);
 		if (ret)
 			break;
@@ -3624,7 +3624,7 @@ static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
 again:
 	while (1) {
 		ret = find_first_extent_bit(unpin, 0, &start, &end,
-					    EXTENT_DIRTY);
+					    EXTENT_DIRTY, NULL);
 		if (ret)
 			break;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index efb044e..65941d7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -313,7 +313,8 @@ static u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
 	while (start < end) {
 		ret = find_first_extent_bit(info->pinned_extents, start,
 					    &extent_start, &extent_end,
-					    EXTENT_DIRTY | EXTENT_UPTODATE);
+					    EXTENT_DIRTY | EXTENT_UPTODATE,
+					    NULL);
 		if (ret)
 			break;
@@ -5028,7 +5029,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
 	while (1) {
 		ret = find_first_extent_bit(unpin, 0, &start, &end,
-					    EXTENT_DIRTY);
+					    EXTENT_DIRTY, NULL);
 		if (ret)
 			break;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8f0f03b..1038f85 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -937,6 +937,7 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, int bits,
  * @end:	the end offset in bytes (inclusive)
  * @bits:	the bits to set in this range
  * @clear_bits:	the bits to clear in this range
+ * @cached_state:	state that we're going to cache
  * @mask:	the allocation mask
  *
  * This will go through and set bits for the given range.  If any states exist
@@ -946,7 +947,8 @@ int set_extent_bit(struct extent_io_tree *tree, u64 start, u64 end, int bits,
  * boundary bits like LOCK.
  */
 int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-		       int bits, int clear_bits, gfp_t mask)
+		       int bits, int clear_bits,
+		       struct extent_state **cached_state, gfp_t mask)
 {
 	struct extent_state *state;
 	struct extent_state *prealloc = NULL;
@@ -963,6 +965,15 @@ again:
 	}
 
 	spin_lock(&tree->lock);
+	if (cached_state && *cached_state) {
+		state = *cached_state;
+		if (state->start <= start && state->end > start &&
+		    state->tree) {
+			node = &state->rb_node;
+			goto hit_next;
+		}
+	}
+
 	/*
 	 * this search will find all the extents that end after
 	 * our range starts.
@@ -993,6 +1004,7 @@ hit_next:
 	 */
 	if (state->start == start && state->end <= end) {
 		set_state_bits(tree, state, &bits);
+		cache_state(state, cached_state);
 		state = clear_state_bit(tree, state, &clear_bits, 0);
 		if (last_end == (u64)-1)
 			goto out;
@@ -1033,6 +1045,7 @@ hit_next:
 			goto out;
 		if (state->end <= end) {
 			set_state_bits(tree, state, &bits);
+			cache_state(state, cached_state);
 			state = clear_state_bit(tree, state, &clear_bits, 0);
 			if (last_end == (u64)-1)
 				goto out;
@@ -1071,6 +1084,7 @@
[PATCH] btrfs-convert: show progress
Hello,

I'm sending a patch to show the progress of the btrfs-convert command. I put a progress bar in the only heavy process: the btrfs metadata creation (due to CRC calculation):

./btrfs-convert /dev/loop1
Creating btrfs metadata [] 100%
Creating ext2fs image file... [DONE]
Cleaning up system chunk... [DONE]
Conversion complete.

I just used \r. I think it is a simple but effective approach without ncurses or other dependencies.

Suggestions are welcome.

Alfredo

convert-progress-bar.patch
Description: Binary data
Re: [PATCH] btrfs-convert: show progress
On Thu, Sep 27, 2012 at 6:02 PM, Alfredo Esteban aedelato...@gmail.com wrote:
> Hello,
>
> I'm sending a patch to show progress of btrfs-convert command. I put a progress bar in the only heavy process: the btrfs metadata creation (due to CRC calculation):

Please include patches inline in the email, not as an attachment.

> ./btrfs-convert /dev/loop1
> Creating btrfs metadata [] 100%
> Creating ext2fs image file... [DONE]
> Cleaning up system chunk... [DONE]
> Conversion complete.
>
> I just used \r. I think it is a simple but effective approach without ncurses either other dependencies.

There should probably be some way to disable the progress bar (ideally defaulting to an isatty() check) so that log files don't capture hundreds if not thousands of lines of [ ].
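A minimal sketch of what such a check could look like, assuming the bar is redrawn with \r as in the original mail. Neither function is taken from the attached patch; format_bar() and show_progress() are hypothetical names, and only isatty(), write() and snprintf() are real library calls.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Render "[=====     ]  50%" into buf. Returns 0 on success, -1 if the
 * buffer cannot hold a bar of the requested width.
 */
static int format_bar(char *buf, size_t size, int percent, int width)
{
	char *p = buf;
	int filled;

	if (percent < 0)
		percent = 0;
	if (percent > 100)
		percent = 100;
	/* '[' + width + ']' + " 100%" + terminating NUL = width + 8 */
	if (width < 1 || size < (size_t)width + 8)
		return -1;

	filled = percent * width / 100;
	*p++ = '[';
	for (int i = 0; i < width; i++)
		*p++ = i < filled ? '=' : ' ';
	*p++ = ']';
	snprintf(p, size - (size_t)(p - buf), " %3d%%", percent);
	return 0;
}

/* Redraw the bar in place, but only when fd refers to a terminal. */
static void show_progress(int fd, int percent)
{
	char bar[64];
	char line[72];
	ssize_t n;

	if (!isatty(fd))
		return;		/* log file or pipe: print nothing */
	if (format_bar(bar, sizeof(bar), percent, 40) != 0)
		return;
	snprintf(line, sizeof(line), "\r%s", bar);
	n = write(fd, line, strlen(line));
	(void)n;		/* progress output is best effort */
}
```

Keeping the formatting separate from the terminal check makes the bar testable and leaves room for an explicit command-line switch on top of the isatty() default.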
[PATCH V5 1/2] Btrfs: cleanup duplicated division functions
div_factor{_fine} has been implemented for two times, and these two functions are very similar, so cleanup the reduplicate implement and drop the original div_factor(), and then rename div_factor_fine() to div_factor(). So the factor of the new div_factor() is 100, not 10. And I move div_factor into a independent file named math.h because it is a common math function, may be used by every composition of btrfs. Because these functions are mostly used on the hot path, and we are sure the parameters are right in the most cases, we don't add complex checks for the parameters. But in the other place, we must check and make sure the parameters are right. So besides the code cleanup, this patch also add a check for the usage of the space balance, it is the only place that we need add check to make sure the parameters of div_factor are right till now. Signed-off-by: Miao Xie mi...@cn.fujitsu.com --- Changelog v4 - v5: - drop the check in the space balance, and make the churn to a minimum Changelog v3 - v4: - deal with the wrong usage that was input on the old kernel Changelog v2 - v3: - drop the original div_factor and rename div_factor_fine to div_factor - drop the check of the factor Changelog v1 - v2: - add missing check --- fs/btrfs/extent-tree.c | 29 ++--- fs/btrfs/math.h| 35 +++ fs/btrfs/relocation.c |2 +- fs/btrfs/transaction.c |2 +- fs/btrfs/volumes.c | 35 ++- 5 files changed, 53 insertions(+), 50 deletions(-) create mode 100644 fs/btrfs/math.h diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index a010234..bcb9ced 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -33,6 +33,7 @@ #include volumes.h #include locking.h #include free-space-cache.h +#include math.h #undef SCRAMBLE_DELAYED_REFS @@ -648,24 +649,6 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info) rcu_read_unlock(); } -static u64 div_factor(u64 num, int factor) -{ - if (factor == 10) - return num; - num *= factor; - do_div(num, 10); - return num; -} - -static u64 
div_factor_fine(u64 num, int factor) -{ - if (factor == 100) - return num; - num *= factor; - do_div(num, 100); - return num; -} - u64 btrfs_find_block_group(struct btrfs_root *root, u64 search_start, u64 search_hint, int owner) { @@ -674,7 +657,7 @@ u64 btrfs_find_block_group(struct btrfs_root *root, u64 last = max(search_hint, search_start); u64 group_start = 0; int full_search = 0; - int factor = 9; + int factor = 90; int wrapped = 0; again: while (1) { @@ -708,7 +691,7 @@ again: if (!full_search factor 10) { last = search_start; full_search = 1; - factor = 10; + factor = 100; goto again; } found: @@ -3513,7 +3496,7 @@ static int should_alloc_chunk(struct btrfs_root *root, if (force == CHUNK_ALLOC_LIMITED) { thresh = btrfs_super_total_bytes(root-fs_info-super_copy); thresh = max_t(u64, 64 * 1024 * 1024, - div_factor_fine(thresh, 1)); + div_factor(thresh, 1)); if (num_bytes - num_allocated thresh) return 1; @@ -3521,12 +3504,12 @@ static int should_alloc_chunk(struct btrfs_root *root, thresh = btrfs_super_total_bytes(root-fs_info-super_copy); /* 256MB or 2% of the FS */ - thresh = max_t(u64, 256 * 1024 * 1024, div_factor_fine(thresh, 2)); + thresh = max_t(u64, 256 * 1024 * 1024, div_factor(thresh, 2)); /* system chunks need a much small threshold */ if (sinfo-flags BTRFS_BLOCK_GROUP_SYSTEM) thresh = 32 * 1024 * 1024; - if (num_bytes thresh sinfo-bytes_used div_factor(num_bytes, 8)) + if (num_bytes thresh sinfo-bytes_used div_factor(num_bytes, 80)) return 0; return 1; } diff --git a/fs/btrfs/math.h b/fs/btrfs/math.h new file mode 100644 index 000..4fef49f --- /dev/null +++ b/fs/btrfs/math.h @@ -0,0 +1,35 @@ + +/* + * Copyright (C) 2012 Fujitsu. All rights reserved. + * Written by Miao Xie mi...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program; if not, write to the + * Free Software Foundation, Inc., 59
Re: [PATCH V4 1/2] Btrfs: cleanup duplicated division functions
On thu, 27 Sep 2012 19:56:24 +0300, Ilya Dryomov wrote:
>> the parameters are right. So besides the code cleanup, this patch also add a check for the usage of the space balance, it is the only place that we need add check to make sure the parameters of div_factor are right till now. Besides that, the old kernel may hold the wrong usage value, so we must rectify it.
>
> Cleaning up/unifying duplicated functions and changing the existing logic are two very different things. If you, in the course of writing this patch, became unhappy with the way balancing ioctl deals with invalid input, please send a separate patch.
>
> Before your patch, volumes.c had its own copy of div_factor_fine():
>
> static u64 div_factor_fine(u64 num, int factor)
> {
>	if (factor <= 0)
>		return 0;
>	if (factor >= 100)
>		return num;
>
>	num *= factor;
>	do_div(num, 100);
>	return num;
> }
>
> which was called from chunk_usage_filter() on unvalidated user input. As far as the cleanup part of your patch goes, you've dropped factor <= 0 / factor >= 100 logic, merged volumes.c's copy with extent-tree.c's copy and renamed div_factor_fine() to div_factor(). To make chunk_usage_filter() happy again, it's enough to move the dropped logic directly to the call site:
>
> static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset,
>			      struct btrfs_balance_args *bargs)
> {
>	...
> -	user_thresh = div_factor_fine(cache->key.offset, bargs->usage);
> +	if (bargs->usage == 0)
> +		user_thresh = 0;
> +	else if (bargs->usage >= 100)
> +		user_thresh = cache->key.offset;
> +	else
> +		user_thresh = div_factor(cache->key.offset, bargs->usage);
>	...
> }
>
> So I would suggest you drop all hunks related to changing the way balancing ioctl works and make the above change to chunk_usage_filter() instead. Once again, if you are unhappy with usage filter argument handling, send a separate patch.

Fine. (I forgot the rule that one patch should do just one thing.)

Thanks
Miao
[PATCH v2 1/2] btrfs-progs: limit the max value of leafsize and nodesize
From: Robin Dong san...@taobao.com

Using mkfs.btrfs like:

	mkfs.btrfs -l 131072 /dev/sda

will return no error, but after mount it, the dmesg will report:

	BTRFS: couldn't mount because metadata blocksize (131072) was too large

The leafsize and nodesize are equal at present, so we just use one function check_leaf_or_node_size to limit leaf and node size below BTRFS_MAX_METADATA_BLOCKSIZE.

Signed-off-by: Robin Dong san...@taobao.com
Reviewed-by: David Sterba d...@jikos.cz
---
 ctree.h |    6 ++
 mkfs.c  |   29 +++--
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/ctree.h b/ctree.h
index 7f55229..75c1e0a 100644
--- a/ctree.h
+++ b/ctree.h
@@ -111,6 +111,12 @@ struct btrfs_trans_handle;
 #define BTRFS_DEV_ITEMS_OBJECTID 1ULL
 
 /*
+ * the max metadata block size.  This limit is somewhat artificial,
+ * but the memmove costs go through the roof for larger blocks.
+ */
+#define BTRFS_MAX_METADATA_BLOCKSIZE 65536
+
+/*
  * we can actually store much bigger names, but lets not confuse the rest
  * of linux
  */
diff --git a/mkfs.c b/mkfs.c
index dff5eb8..8420482 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1201,6 +1201,27 @@ static int zero_output_file(int out_fd, u64 size, u32 sectorsize)
 	return ret;
 }
 
+static int check_leaf_or_node_size(u32 size, u32 sectorsize)
+{
+	if (size < sectorsize) {
+		fprintf(stderr,
+			"Illegal leafsize (or nodesize) %u (smaller than %u)\n",
+			size, sectorsize);
+		return -1;
+	} else if (size > BTRFS_MAX_METADATA_BLOCKSIZE) {
+		fprintf(stderr,
+			"Illegal leafsize (or nodesize) %u (larger than %u)\n",
+			size, BTRFS_MAX_METADATA_BLOCKSIZE);
+		return -1;
+	} else if (size & (sectorsize - 1)) {
+		fprintf(stderr,
+			"Illegal leafsize (or nodesize) %u (not align to %u)\n",
+			size, sectorsize);
+		return -1;
+	}
+	return 0;
+}
+
 int main(int ac, char **av)
 {
 	char *file;
@@ -1291,14 +1312,10 @@ int main(int ac, char **av)
 		}
 	}
 	sectorsize = max(sectorsize, (u32)getpagesize());
-	if (leafsize < sectorsize || (leafsize & (sectorsize - 1))) {
-		fprintf(stderr, "Illegal leafsize %u\n", leafsize);
+	if (check_leaf_or_node_size(leafsize, sectorsize))
 		exit(1);
-	}
-	if (nodesize < sectorsize || (nodesize & (sectorsize - 1))) {
-		fprintf(stderr, "Illegal nodesize %u\n", nodesize);
+	if (check_leaf_or_node_size(nodesize, sectorsize))
 		exit(1);
-	}
 	ac = ac - optind;
 	if (ac == 0)
 		print_usage();
-- 
1.7.3.2
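The three conditions in check_leaf_or_node_size() can be exercised in isolation. The sketch below mirrors them without the fprintf() calls; the enum and function names are hypothetical, the 65536 limit is the value the patch defines, and the alignment test assumes the sector size is a power of two.

```c
#include <stdint.h>

#define MAX_METADATA_BLOCKSIZE 65536u	/* value of BTRFS_MAX_METADATA_BLOCKSIZE in the patch */

enum size_check {
	SIZE_OK = 0,
	SIZE_TOO_SMALL,		/* smaller than one sector */
	SIZE_TOO_LARGE,		/* above the metadata block size limit */
	SIZE_UNALIGNED,		/* not a multiple of the sector size */
};

/* Classify a leaf or node size the same way the patch does. */
static enum size_check classify_metadata_size(uint32_t size,
					      uint32_t sectorsize)
{
	if (size < sectorsize)
		return SIZE_TOO_SMALL;
	if (size > MAX_METADATA_BLOCKSIZE)
		return SIZE_TOO_LARGE;
	/* Low bits set means a remainder; valid only for power-of-two sectors. */
	if (size & (sectorsize - 1))
		return SIZE_UNALIGNED;
	return SIZE_OK;
}
```

With a 4096-byte sector, the 131072 from the commit message is rejected as too large, while 65536 itself is still accepted.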
[PATCH v2 2/2] btrfs-progs: limit the min value of total_bytes
From: Robin Dong san...@taobao.com

Using mkfs.btrfs like:

	mkfs.btrfs -b 1048576 /dev/sda

will report error:

	mkfs.btrfs: volumes.c:796: btrfs_alloc_chunk: Assertion `!(ret)' failed.
	Aborted

because the length of dev_extent is 4MB.

But if we use mkfs.btrfs with 8MB total bytes, the newly mounted btrfs filesystem would not contain even one empty file. So 12MB will be good min-value for block_count.

Signed-off-by: Robin Dong san...@taobao.com
---
 mkfs.c |    6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 8420482..496faa8 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1345,7 +1345,11 @@ int main(int ac, char **av)
 					   &dev_block_count, &mixed, nodiscard);
 		if (block_count == 0)
 			block_count = dev_block_count;
-		else if (block_count > dev_block_count) {
+		else if (block_count < 3 * BTRFS_MKFS_SYSTEM_GROUP_SIZE) {
+			fprintf(stderr, "Illegal total number of bytes %u\n",
+				block_count);
+			exit(1);
+		} else if (block_count > dev_block_count) {
 			fprintf(stderr, "%s is smaller than requested size\n", file);
 			exit(1);
 		}
-- 
1.7.3.2
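The arithmetic behind the 12MB figure, as a sketch: the 4 MiB value for the system group size is an assumption inferred from the commit message (4MB dev_extent, 12MB minimum, factor of 3 in the patch), and the names below are local to the example.

```c
#include <stdint.h>

/* Assumed value of BTRFS_MKFS_SYSTEM_GROUP_SIZE: 4 MiB. */
#define MKFS_SYSTEM_GROUP_SIZE (4ULL * 1024 * 1024)

/* Smallest accepted filesystem size: three system-group-sized extents. */
static uint64_t min_total_bytes(void)
{
	return 3 * MKFS_SYSTEM_GROUP_SIZE;
}

/* Non-zero if a requested size passes the new lower bound. */
static int total_bytes_ok(uint64_t block_count)
{
	return block_count >= min_total_bytes();
}
```

Under that assumption the bound is 12582912 bytes, so both the 1048576 from the failing command and an 8 MiB request are rejected, and 12 MiB is the first size accepted.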
Re: [RFC] btrfs fi df output [Was Re: BTRF - Storage Usage]
On Thu, 27 Sep 2012 23:02:35 +0200 Goffredo Baroncelli kreij...@libero.it wrote:
> Sorry for the space error: Below a more correct example
>
> $ btrfs filesystem disk-free /
> Summary:
>   Total:                    135.00GB
>   Allocated:                 10.51GB
>   Unallocated:              124.49GB
>   Free_(Estimated)           86.56GB
>   Average_disk_efficiency:   62 %

How do you estimate Free here? Sorry I didn't check the source code in git, but from the Details below nothing leads me to believe that this FS is doomed to only be able to usefully utilize only ~86GB of the partition, and not more.

Are you ready to answer the flood of questions from people why their disk is only 62% efficient, and how to tune it to 100%? :-)

Why use underscores instead of spaces?

> Details:
>   Chunk-type  Mode    Allocated      Used      Free
>   ----------  ------  ---------  --------  --------
>   Data        Single     4.01GB    2.16GB    1.87GB
>   System      DUP       16.00MB    4.00KB    7.99MB
>   System      Single     4.00MB      0.00    4.00MB
>   Metadata    DUP        6.00GB  429.16MB    2.57GB
>   Metadata    Single     8.00MB      0.00    8.00MB

-- 
With respect,
Roman

~~~
Stallman had a printer, with code he could not see.
So he began to tinker, and set the software free.