[PATCH v2] Btrfs: set keep_lock when necessary in btrfs_defrag_leaves
path->keep_locks may force us to lock the tree root and other high-level nodes, and make lock contention worse, thus it needs to be avoided as much as possible. In btrfs_defrag_leaves(), path->keep_locks is set before the first search, but @path immediately gets released after that search, so keeping those locks is not necessary at all.

Signed-off-by: Liu Bo
---
v2: update commit log with more details.

 fs/btrfs/tree-defrag.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-defrag.c b/fs/btrfs/tree-defrag.c
index 3c0987ab587d..c12747904d4c 100644
--- a/fs/btrfs/tree-defrag.c
+++ b/fs/btrfs/tree-defrag.c
@@ -65,8 +65,6 @@ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
 		memcpy(&key, &root->defrag_progress, sizeof(key));
 	}
 
-	path->keep_locks = 1;
-
 	ret = btrfs_search_forward(root, &key, path, BTRFS_OLDEST_GENERATION);
 	if (ret < 0)
 		goto out;
@@ -81,6 +79,7 @@ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
 	 * a deadlock (attempting to write lock an already write locked leaf).
 	 */
 	path->lowest_level = 1;
+	path->keep_locks = 1;
 	wret = btrfs_search_slot(trans, root, &key, path, 0, 1);
 
 	if (wret < 0) {
--
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: set keep_lock when necessary in btrfs_defrag_leaves
On Thu, Apr 26, 2018 at 4:01 AM, David Sterba wrote:
> On Wed, Apr 25, 2018 at 09:40:34AM +0800, Liu Bo wrote:
>> path->keep_lock is set but @path immediately gets released, this sets
>> ->keep_lock only when it's necessary.
>
> Can you please write down more details for context? This mostly repeats
> what the code does, but not why. Thanks.

Urr, right, I missed the important piece.

->keep_locks may hold the locks of all nodes on the path instead of only the level=1 node and the level=0 leaf. As it's more likely that lock contention happens on the tree root and higher nodes, we should hold as few locks as possible.

Will update in v2.

thanks,
liubo
Re: [PATCH] Btrfs: set keep_lock when necessary in btrfs_defrag_leaves
On Wed, Apr 25, 2018 at 09:40:34AM +0800, Liu Bo wrote:
> path->keep_lock is set but @path immediately gets released, this sets
> ->keep_lock only when it's necessary.

Can you please write down more details for context? This mostly repeats what the code does, but not why. Thanks.
Re: status page
Gandalf Corvotempesta posted on Wed, 25 Apr 2018 14:30:42 +0200 as excerpted:

> For me, RAID56 is mandatory. Any ETA for a stable RAID56 ?
> Is something we should expect this year, next year, next 10 years, ?

It's complicated... is the best short answer to that. Here's my take at a somewhat longer, admin/user-oriented (as I'm not a dev, just a btrfs user and list regular) answer.

AFAIK, the current status of raid56/parity-raid is "no known major bugs left in the current code, but one major caveat": the "degraded-mode parity-raid write hole" common to parity-raid unless worked around some other way. This arguably has somewhat more significance in btrfs than in other parity-raid implementations, because the current raid56 implementation doesn't checksum the parity itself, thus losing some of the data-integrity safeguards people normally choose btrfs for in the first place. The implications are particularly disturbing with regard to metadata because, due to parity-raid's read-modify-write cycle, it's not just newly written/changed data/metadata that's put at risk, but potentially otherwise old and stable data as well.

Again, this is a known issue with parity-raid in general, which simply has additional implications on btrfs. But because it's a generally well known issue, there are generally well accepted mitigations available. *If* your storage plans account for that with sufficient safeguards, such as a good (tested) backup routine that ensures that you are actually defining your data as appropriately valuable by the number and frequency of backups you have of it...

(Data without a backup is simply being defined as of less value than the time/trouble/resources necessary to do that backup, because if it were more valuable, there'd *BE* that backup.)

...
Then AFAIK at this point the only thing btrfs raid56 mode lacks, stability-wise, is the testing of time, since until recently there *were* severe known bugs, and altho they've now been fixed, the fixes are recent enough that it's quite possible that other bugs still remain to show themselves, now that the older bugs have been fixed.

My own suggestion for such time-testing is a year, five kernel cycles, after the last known severe bug has been fixed. If there's no hint of further reset-the-clock level bugs in that time, then it's reasonable to consider, still with some caution and additional safeguards, deployment beyond testing.

Meanwhile, as others have mentioned, there are a number of proposals out there for write-hole mitigation. The theoretically cleanest but also the most intensive, since it requires rewriting and retesting much of the existing raid56 mode, would be rewriting raid56 mode to COW and checksum parity as well. If this happens, it's almost certainly at least five years out to well tested, and could well be a decade out.

Another possibility is taking a technique from zfs, doing stripes of varying size (a varying number of strips, less than the total number of devices) depending on how much data is being written. Btrfs raid56 mode can already deal with this to some extent, and does so when some devices are smaller than others and thus run out of space, so stripes written after that don't include them. A similar situation occurs when devices are added, until a balance redoes existing stripes to take the new device into account. What btrfs raid56 mode /could/ do is extend this and handle small writes much as zfs does, deliberately writing less-than-full-width stripes when there's less data, thus avoiding read-modify-write of existing data/metadata. A balance could then be scheduled periodically to restripe these "short stripes" to full width. A variant of the above would simply write full-width, but partially empty, stripes.
Both of these should be less work to code than the first/cleanest solution above, since they to a large extent simply repurpose existing code, but they're somewhat more complicated and thus potentially more bug prone, and they both would require periodic rebalancing of the short or partially empty stripes to full width for full efficiency.

Finally, there's the possibility of logging partial-width writes before actually writing them. This would be an extension to existing code, and would require writing small writes twice, once to the log and then rewriting to the main storage at full stripe width with parity. As a result, it'd slow things down (tho only for less-than-full-width stripe writes; full-width stripes would be written as normal, as they don't involve the risky read-modify-write cycle), but people don't choose parity-raid for write speed anyway, /because/ of the read-modify-write penalty it imposes. This last solution should involve the least change to existing code, and thus should be the fastest to implement, with the least chance of introducing new bugs, so the testing and bugfixing cycle should be shorter as well. But ouch, that logged-write penalty...
Re: Btrfs progs release 4.16.1
David Sterba posted on Wed, 25 Apr 2018 13:02:34 +0200 as excerpted:

> On Wed, Apr 25, 2018 at 06:31:20AM +0000, Duncan wrote:
>> David Sterba posted on Tue, 24 Apr 2018 13:58:57 +0200 as excerpted:
>>
>>> btrfs-progs version 4.16.1 have been released. This is a bugfix
>>> release.
>>>
>>> Changes:
>>>
>>>   * remove obsolete tools: btrfs-debug-tree, btrfs-zero-log,
>>>     btrfs-show-super, btrfs-calc-size
>>
>> Cue the admin-side gripes about developer definitions of micro-upgrade
>> explicit "bugfix release" that allow disappearance of "obsolete tools".
>>
>> Arguably such removals can be expected in a "feature release", but
>> shouldn't surprise unsuspecting admins doing a micro-version upgrade
>> that's specifically billed as a "bugfix release".
>
> A major version release would be a better time for the removal, I agree,
> and should have considered that.
>
> However, the tools have been obsoleted for a long time (since 2015 or
> 2016) so I wonder if the deprecation warnings have been ignored by the
> admins all the time.

Indeed, in practice, anybody still using the stand-alone tools in a current version has been ignoring deprecation warnings for awhile, and the difference between 4.16.1 and 4.17(.0) isn't likely to make much of a difference to them.

It's just that, from here anyway, if I did a big multi-version upgrade and saw tools go missing I'd expect it, and if I did an upgrade from 4.16 to 4.17 I'd expect it and blame myself for not getting with the program sooner. But on an upgrade from 4.16 to 4.16.1, I'd be annoyed with upstream when they went missing, because it's just not expected in such a minor release, particularly one explicitly billed as a "bugfix release".

>> (Further support for btrfs being "still stabilizing, not yet fully
>> stable and mature." But development mode habits need to end
>> /sometime/, if stability is indeed a goal.)
> What happened here was a bad release management decision, a minor one in
> my opinion, but I hear your complaint and will keep that in mind for
> future releases.

That's all I was after. A mere trifle indeed in the filesystem context, where there's a real chance that bugs can eat data, but equally trivially held off for a .0 release. What's behind is done, but it can and should be used to inform the future, and I simply mentioned it here with the goal /of/ informing future release decisions. To the extent that it does so, my post accomplished its purpose. =:^)

Seems my way of saying that ended up coming across way more negative than intended. So I have some changes to make in the way I handle things in the future as well. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [PATCH] btrfs: Change bit number of BTRFS_FS_BALANCE_RUNNING
On 25.04.2018 18:18, Anand Jain wrote:
>
> On 04/25/2018 09:16 PM, David Sterba wrote:
>> On Wed, Apr 25, 2018 at 03:53:29PM +0300, Nikolay Borisov wrote:
>>> Commit ddd93ef3b9d6 ("btrfs: track running balance in a simpler way")
>>> which introduced this bit assigned it number 17. Unfortunately this bit
>>> is already occupied by the BTRFS_FS_NEED_ASYNC_COMMIT bit.
>
>>> This was
>>> causing bit 17 to be cleared while __btrfs_balance was running which
>>> resulted in fs_info->balance_ctl being deleted
>
> Any idea which thread deleted the fs_info::balance_ctl while balance
> was still running?

So what happened is that the test's btrfs balance stop ioctl was allowed to proceed while __btrfs_balance was running. And this was due to bit 17 being cleared by btrfs_commit_transaction.

> Thanks for catching.
> -Anand
>
>>> while we have balance
>>> running. This manifested in an UAF crash. Fix it by putting the
>>> definition of the newly introduced balance bit after NEED_ASYNC_COMMIT
>>> and giving it number 18.
>>>
>>> Fixes: ddd93ef3b9d6 ("btrfs: track running balance in a simpler way")
>>
>> Uh, thanks for catching it. The bit was free when the volume mutex
>> removal patchset was in development, but the number 17 got used by the
>> recent qgroup patch and I did not adjust it afterwards.
>>
>>> Signed-off-by: Nikolay Borisov
>>> ---
>>>  fs/btrfs/ctree.h | 12 ++++++------
>>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 59998d5f6985..5a7893d7c668 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -733,16 +733,16 @@ struct btrfs_delayed_root;
>>>   */
>>>  #define BTRFS_FS_EXCL_OP 16
>>>  /*
>>> - * Indicate that balance has been set up from the ioctl and is in the main
>>> - * phase. The fs_info::balance_ctl is initialized.
>>> - */
>>> -#define BTRFS_FS_BALANCE_RUNNING 17
>>> -
>>> -/*
>>>   * To info transaction_kthread we need an immediate commit so it doesn't
>>>   * need to wait for commit_interval
>>>   */
>>>  #define BTRFS_FS_NEED_ASYNC_COMMIT 17
>>> +/*
>>> + * Indicate that balance has been set up from the ioctl and is in the main
>>> + * phase. The fs_info::balance_ctl is initialized.
>>> + */
>>> +#define BTRFS_FS_BALANCE_RUNNING 18
>>
>> I'll fold the fix so we don't have an intermediate breakage in the
>> history.
Re: [PATCH] btrfs: Change bit number of BTRFS_FS_BALANCE_RUNNING
On 04/25/2018 09:16 PM, David Sterba wrote:
> On Wed, Apr 25, 2018 at 03:53:29PM +0300, Nikolay Borisov wrote:
>> Commit ddd93ef3b9d6 ("btrfs: track running balance in a simpler way")
>> which introduced this bit assigned it number 17. Unfortunately this bit
>> is already occupied by the BTRFS_FS_NEED_ASYNC_COMMIT bit. This was
>> causing bit 17 to be cleared while __btrfs_balance was running which
>> resulted in fs_info->balance_ctl being deleted

Any idea which thread deleted the fs_info::balance_ctl while balance was still running?

Thanks for catching.
-Anand

>> while we have balance
>> running. This manifested in an UAF crash. Fix it by putting the
>> definition of the newly introduced balance bit after NEED_ASYNC_COMMIT
>> and giving it number 18.
>>
>> Fixes: ddd93ef3b9d6 ("btrfs: track running balance in a simpler way")
>
> Uh, thanks for catching it. The bit was free when the volume mutex
> removal patchset was in development, but the number 17 got used by the
> recent qgroup patch and I did not adjust it afterwards.
>
>> Signed-off-by: Nikolay Borisov
>> ---
>>  fs/btrfs/ctree.h | 12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 59998d5f6985..5a7893d7c668 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -733,16 +733,16 @@ struct btrfs_delayed_root;
>>   */
>>  #define BTRFS_FS_EXCL_OP 16
>>  /*
>> - * Indicate that balance has been set up from the ioctl and is in the main
>> - * phase. The fs_info::balance_ctl is initialized.
>> - */
>> -#define BTRFS_FS_BALANCE_RUNNING 17
>> -
>> -/*
>>   * To info transaction_kthread we need an immediate commit so it doesn't
>>   * need to wait for commit_interval
>>   */
>>  #define BTRFS_FS_NEED_ASYNC_COMMIT 17
>> +/*
>> + * Indicate that balance has been set up from the ioctl and is in the main
>> + * phase. The fs_info::balance_ctl is initialized.
>> + */
>> +#define BTRFS_FS_BALANCE_RUNNING 18
>
> I'll fold the fix so we don't have an intermediate breakage in the
> history.
Re: [PATCH 0/3] enhance btrfs_raid_array[]
On 25.04.2018 14:01, Anand Jain wrote:
> Cleanup patches as in the individual change log.
>
> These patches were sent independently as they aren't related as such,
> but as I am updating the 1/3, the 2/3 would end up with a conflict. So
> here I am putting all of them together with the conflict fixed at
> my end. And as such there is no change in 2/3 and 3/3 from its v1.
>
> Anand Jain (3):
>   btrfs: kill btrfs_raid_type_names[]
>   btrfs: kill btrfs_raid_group[]
>   btrfs: kill btrfs_raid_mindev_error[]
>
>  fs/btrfs/disk-io.c     |  2 +-
>  fs/btrfs/extent-tree.c | 20 +--
>  fs/btrfs/volumes.c     | 54 +++---
>  fs/btrfs/volumes.h     |  7 +--
>  4 files changed, 36 insertions(+), 47 deletions(-)

For the whole series:

Reviewed-by: Nikolay Borisov
Re: Inconsistent behavior of fsync in btrfs
On Wed, Apr 25, 2018 at 7:36 AM, Ashlie Martinez wrote:
> I don't really know all that much about the btrfs code, but I was
> digging around to try and find the cause of this. Given the fact that
> inserting a sleep between the fsyncs gives correct behavior, do you
> think the issue could be related to how btrfs determines whether or
> not to log a change to an inode? I found some code in
> btrfs_log_inode_parent() (part of the fsync path for both files and
> directories) that appears to check if the inode being fsync-ed is
> already in the log and, if it is, returns BTRFS_NO_LOG_SYNC [1]. Since
> in both cases we saw issues where the same inode either changed
> parent directories (rename) or was present in multiple directories
> (hard link), it seems plausible that this could be the problem. Do you
> have any thoughts on this?
>
> [1] https://www.google.com/url?q=https://elixir.bootlin.com/linux/v4.16-rc7/source/fs/btrfs/tree-log.c%23L5563=D=hangouts=1524702498584000=AFQjCNE_KadcgkZ7xiIhLOzQCFQoet8Lqw
>
> Thanks,
> Ashlie
>
> On Tue, Apr 24, 2018 at 10:16 PM, Vijaychidambaram Velayudhan Pillai
> wrote:
>> Hi Chris,
>>
>> On Tue, Apr 24, 2018 at 10:07 PM, Chris Murphy wrote:
>>> I don't have answer to your question,

Sending inline just to make sure I get a response (sorry if this is spam):

I don't really know all that much about the btrfs code, but I was digging around to try and find the cause of this. Given the fact that inserting a sleep between the fsyncs gives correct behavior, do you think the issue could be related to how btrfs determines whether or not to log a change to an inode? I found some code in btrfs_log_inode_parent() (part of the fsync path for both files and directories) that appears to check if the inode being fsync-ed is already in the log and, if it is, returns BTRFS_NO_LOG_SYNC [1]. Since in both cases we saw issues where the same inode either changed parent directories (rename) or was present in multiple directories (hard link), it seems plausible that this could be the problem. Do you have any thoughts on this?

[1] https://www.google.com/url?q=https://elixir.bootlin.com/linux/v4.16-rc7/source/fs/btrfs/tree-log.c%23L5563=D=hangouts=1524702498584000=AFQjCNE_KadcgkZ7xiIhLOzQCFQoet8Lqw

Thanks,
Ashlie

>>> but I'm curious exactly how you
>>> simulate a crash? For my own really rudimentary testing I've been doing
>>> crazy things like:
>>>
>>> # grub-mkconfig -o /boot/efi && echo b > /proc/sysrq-trigger
>>>
>>> And seeing what makes it to disk - or not. And I'm finding that some
>>> non-deterministic results are possible even in a VM, which is a bit
>>> confusing. I'm sure with real hardware I'd find even more inconsistency.
>>
>> We are using software we developed called CrashMonkey [1]. It
>> simulates the state on storage after a crash (taking into account
>> FLUSH and FUA flags). Talk slides on how it works can be found here
>> [2].
>>
>> It is similar to dm-log-writes if you have used that in the past.
>>
>> [1] https://github.com/utsaslab/crashmonkey
>> [2] http://www.cs.utexas.edu/~vijay/papers/hotstorage17-crashmonkey-slides.pdf
>>
>> Thanks,
>> Vijay Chidambaram
Re: [PATCH] btrfs: Change bit number of BTRFS_FS_BALANCE_RUNNING
On Wed, Apr 25, 2018 at 03:53:29PM +0300, Nikolay Borisov wrote:
> Commit ddd93ef3b9d6 ("btrfs: track running balance in a simpler way")
> which introduced this bit assigned it number 17. Unfortunately this bit
> is already occupied by the BTRFS_FS_NEED_ASYNC_COMMIT bit. This was
> causing bit 17 to be cleared while __btrfs_balance was running which
> resulted in fs_info->balance_ctl being deleted while we have balance
> running. This manifested in an UAF crash. Fix it by putting the
> definition of the newly introduced balance bit after NEED_ASYNC_COMMIT
> and giving it number 18.
>
> Fixes: ddd93ef3b9d6 ("btrfs: track running balance in a simpler way")

Uh, thanks for catching it. The bit was free when the volume mutex removal patchset was in development, but the number 17 got used by the recent qgroup patch and I did not adjust it afterwards.

> Signed-off-by: Nikolay Borisov
> ---
>  fs/btrfs/ctree.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 59998d5f6985..5a7893d7c668 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -733,16 +733,16 @@ struct btrfs_delayed_root;
>   */
>  #define BTRFS_FS_EXCL_OP 16
>  /*
> - * Indicate that balance has been set up from the ioctl and is in the main
> - * phase. The fs_info::balance_ctl is initialized.
> - */
> -#define BTRFS_FS_BALANCE_RUNNING 17
> -
> -/*
>   * To info transaction_kthread we need an immediate commit so it doesn't
>   * need to wait for commit_interval
>   */
>  #define BTRFS_FS_NEED_ASYNC_COMMIT 17
> +/*
> + * Indicate that balance has been set up from the ioctl and is in the main
> + * phase. The fs_info::balance_ctl is initialized.
> + */
> +#define BTRFS_FS_BALANCE_RUNNING 18

I'll fold the fix so we don't have an intermediate breakage in the history.
[PATCH] btrfs: Change bit number of BTRFS_FS_BALANCE_RUNNING
Commit ddd93ef3b9d6 ("btrfs: track running balance in a simpler way") which introduced this bit assigned it number 17. Unfortunately this bit is already occupied by the BTRFS_FS_NEED_ASYNC_COMMIT bit. This was causing bit 17 to be cleared while __btrfs_balance was running, which resulted in fs_info->balance_ctl being deleted while we have balance running. This manifested in an UAF crash. Fix it by putting the definition of the newly introduced balance bit after NEED_ASYNC_COMMIT and giving it number 18.

Fixes: ddd93ef3b9d6 ("btrfs: track running balance in a simpler way")
Signed-off-by: Nikolay Borisov
---
 fs/btrfs/ctree.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 59998d5f6985..5a7893d7c668 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -733,16 +733,16 @@ struct btrfs_delayed_root;
  */
 #define BTRFS_FS_EXCL_OP 16
 /*
- * Indicate that balance has been set up from the ioctl and is in the main
- * phase. The fs_info::balance_ctl is initialized.
- */
-#define BTRFS_FS_BALANCE_RUNNING 17
-
-/*
  * To info transaction_kthread we need an immediate commit so it doesn't
  * need to wait for commit_interval
  */
 #define BTRFS_FS_NEED_ASYNC_COMMIT 17
+/*
+ * Indicate that balance has been set up from the ioctl and is in the main
+ * phase. The fs_info::balance_ctl is initialized.
+ */
+#define BTRFS_FS_BALANCE_RUNNING 18
+
 struct btrfs_fs_info {
 	u8 fsid[BTRFS_FSID_SIZE];
--
2.7.4
Re: status page
On Wed, Apr 25, 2018 at 02:30:42PM +0200, Gandalf Corvotempesta wrote:
> 2018-04-25 13:39 GMT+02:00 Austin S. Hemmelgarn:
>> Define 'stable'.
>
> Something ready for production use like ext or xfs, with no critical
> bugs or easy data loss.
>
>> If you just want 'safe for critical data', it's mostly there already
>> provided that your admins and operators are careful. Assuming you avoid
>> qgroups and parity raid, don't run the filesystem near full all the time,
>> and keep an eye on the chunk allocations (which is easy to automate with
>> newer kernels), you will generally be fine. We've been using it in
>> production where I work for a couple of years now, with the only issues
>> we've encountered arising from the fact that we're stuck using an older
>> kernel which doesn't automatically deallocate empty chunks.
>
> For me, RAID56 is mandatory. Any ETA for a stable RAID56 ?
> Is something we should expect this year, next year, next 10 years, ?

There's not really any ETAs for anything in the kernel, in general, unless the relevant code has already been committed and accepted (when it has a fairly deterministic path from then onwards). ETAs for finding even known bugs are pretty variable, depending largely on how easily the bug can be reproduced by the reporter and by the developer.

As for a stable version -- you'll have to define "stable" in a way that's actually measurable to get any useful answer, and even then, see my previous comment about ETAs. There have been example patches in the last few months on the subject of closing the write hole, so there's clear ongoing work on that particular item, but again, see the comment on ETAs. It'll be done when it's done.

Hugo.

-- 
Hugo Mills              | Nothing wrong with being written in Perl... Some of
hugo@... carfax.org.uk  | my best friends are written in Perl.
http://carfax.org.uk/   |
PGP: E2AB1DE4           | dark
Re: Inconsistent behavior of fsync in btrfs
I don't really know all that much about the btrfs code, but I was digging around to try and find the cause of this. Given the fact that inserting a sleep between the fsyncs gives correct behavior, do you think the issue could be related to how btrfs determines whether or not to log a change to an inode? I found some code in btrfs_log_inode_parent() (part of the fsync path for both files and directories) that appears to check if the inode being fsync-ed is already in the log and, if it is, returns BTRFS_NO_LOG_SYNC [1]. Since in both cases we saw issues where the same inode either changed parent directories (rename) or was present in multiple directories (hard link), it seems plausible that this could be the problem. Do you have any thoughts on this?

[1] https://www.google.com/url?q=https://elixir.bootlin.com/linux/v4.16-rc7/source/fs/btrfs/tree-log.c%23L5563=D=hangouts=1524702498584000=AFQjCNE_KadcgkZ7xiIhLOzQCFQoet8Lqw

Thanks,
Ashlie

On Tue, Apr 24, 2018 at 10:16 PM, Vijaychidambaram Velayudhan Pillai wrote:
> Hi Chris,
>
> On Tue, Apr 24, 2018 at 10:07 PM, Chris Murphy wrote:
>> I don't have answer to your question, but I'm curious exactly how you
>> simulate a crash? For my own really rudimentary testing I've been doing
>> crazy things like:
>>
>> # grub-mkconfig -o /boot/efi && echo b > /proc/sysrq-trigger
>>
>> And seeing what makes it to disk - or not. And I'm finding that some
>> non-deterministic results are possible even in a VM, which is a bit
>> confusing. I'm sure with real hardware I'd find even more inconsistency.
>
> We are using software we developed called CrashMonkey [1]. It
> simulates the state on storage after a crash (taking into account
> FLUSH and FUA flags). Talk slides on how it works can be found here
> [2].
>
> It is similar to dm-log-writes if you have used that in the past.
>
> [1] https://github.com/utsaslab/crashmonkey
> [2] http://www.cs.utexas.edu/~vijay/papers/hotstorage17-crashmonkey-slides.pdf
>
> Thanks,
> Vijay Chidambaram
Re: status page
2018-04-25 13:39 GMT+02:00 Austin S. Hemmelgarn:
> Define 'stable'.

Something ready for production use like ext or xfs, with no critical bugs or easy data loss.

> If you just want 'safe for critical data', it's mostly there already
> provided that your admins and operators are careful. Assuming you avoid
> qgroups and parity raid, don't run the filesystem near full all the time,
> and keep an eye on the chunk allocations (which is easy to automate with
> newer kernels), you will generally be fine. We've been using it in
> production where I work for a couple of years now, with the only issues
> we've encountered arising from the fact that we're stuck using an older
> kernel which doesn't automatically deallocate empty chunks.

For me, RAID56 is mandatory. Any ETA for a stable RAID56 ?
Is it something we should expect this year, next year, next 10 years, ?
Re: Btrfs progs release 4.16.1
On 2018-04-25 07:29, Christoph Anton Mitterer wrote:
> On Wed, 2018-04-25 at 07:22 -0400, Austin S. Hemmelgarn wrote:
>> While I can understand Duncan's point here, I'm inclined to agree
>> with David
>
> Same from my side... and I run a multi-PiB storage site (though not
> with btrfs).
>
> Cosmetically one shouldn't do this in a bugfix release, but this should
> have really no impact on the real world.
>
> The typical sysadmin will anyway use some stable distribution... and is
> there any which ships already 4.16?

Arch, Gentoo, and Void all have it ATM, but whether or not you want to consider them stable is another question. OpenSUSE Tumbleweed and Fedora Rawhide also have 4.16, though those are also of questionable stability.
Re: status page
On 2018-04-25 07:13, Gandalf Corvotempesta wrote:
> 2018-04-23 17:16 GMT+02:00 David Sterba:
>> Reviewed and updated for 4.16, there's no change regarding the overall
>> status, though 4.16 has some raid56 fixes.
>
> Thank you!
> Any ETA for a stable RAID56 ? (or, even better, for a stable btrfs
> ready for production use)
> I've seen many improvements in these months and a stable btrfs seems
> to be not that far.

Define 'stable'. If you want 'bug free', that won't happen ever. Even 'stable' filesystems like XFS and ext4 still have bugs found and fixed on a somewhat regular basis. The only filesystem drivers that don't have bugs reported are either so trivial that there really are no bugs (see for example minix and vfat) or aren't under active development (and therefore all the bugs have been fixed already).

If you just want 'safe for critical data', it's mostly there already provided that your admins and operators are careful. Assuming you avoid qgroups and parity raid, don't run the filesystem near full all the time, and keep an eye on the chunk allocations (which is easy to automate with newer kernels), you will generally be fine. We've been using it in production where I work for a couple of years now, with the only issues we've encountered arising from the fact that we're stuck using an older kernel which doesn't automatically deallocate empty chunks.
Re: Btrfs progs release 4.16.1
On Wed, 2018-04-25 at 07:22 -0400, Austin S. Hemmelgarn wrote:
> While I can understand Duncan's point here, I'm inclined to agree with
> David
Same from my side... and I run a multi-PiB storage site (though not with btrfs).

Cosmetically one shouldn't do this in a bugfix release, this should have really no impact on the real world.

The typical sysadmin will anyway use some stable distribution... and is there any which ships already 4.16?

Cheers,
Chris.
Re: Btrfs progs release 4.16.1
On 2018-04-25 07:02, David Sterba wrote:
> On Wed, Apr 25, 2018 at 06:31:20AM +, Duncan wrote:
>> David Sterba posted on Tue, 24 Apr 2018 13:58:57 +0200 as excerpted:
>>> btrfs-progs version 4.16.1 has been released. This is a bugfix
>>> release.
>>>
>>> Changes:
>>>
>>>   * remove obsolete tools: btrfs-debug-tree, btrfs-zero-log,
>>>     btrfs-show-super, btrfs-calc-size
>> Cue the admin-side gripes about developer definitions of micro-upgrade
>> explicit "bugfix release" that allow disappearance of "obsolete tools".
>>
>> Arguably such removals can be expected in a "feature release", but
>> shouldn't surprise unsuspecting admins doing a micro-version upgrade
>> that's specifically billed as a "bugfix release".
> A major version release would be a better time for the removal, I agree
> and should have considered that. However, the tools have been obsoleted
> for a long time (since 2015 or 2016) so I wonder if the deprecation
> warnings have been ignored by the admins all the time.
While I can understand Duncan's point here, I'm inclined to agree with David, with the further addendum that these are all debug tools, and therefore no sane sysadmin should be depending on them for production operation anyway.
>> (Further support for btrfs being "still stabilizing, not yet fully
>> stable and mature." But development mode habits need to end /sometime/,
>> if stability is indeed a goal.)
> What happened here was a bad release management decision, a minor one
> in my opinion but I hear your complaint and will keep that in mind for
> future releases.
>
> Do you really want to use that to perpetuate the 'still stabilizing and
> not mature' claim? If you expect 0 bugs and essentially no other
> visible problems, then I don't think you should use linux. Or wait
> until it's fully stable, whatever that means.
I think you mean 'I don't think you should use computers', given that other platforms are just as bad in slightly different ways.
> In terms of features, btrfs is not done and will be actively developed
> and maintained. Bugs will be found, reported and fixed, new features
> will add more code that will have to be stabilized over time. This is
> how the entire linux kernel evolves.
>
> The focus in recent releases has been on cleanups and refactoring,
> besides bugfixes. No big feature has been merged, to some
> disappointment of developers and users, but this is namely to minimize
> the fallout of new code that does not have enough review and testing.
> My target is to do slow and steady incremental changes with no
> regressions.
Re: status page
2018-04-23 17:16 GMT+02:00 David Sterba:
> Reviewed and updated for 4.16, there's no change regarding the overall
> status, though 4.16 has some raid56 fixes.
Thank you!
Any ETA for a stable RAID56? (or, even better, for a stable btrfs ready for production use)
I've seen many improvements in these months and a stable btrfs seems to be not that far.
Re: Btrfs progs release 4.16.1
On Wed, Apr 25, 2018 at 06:31:20AM +, Duncan wrote:
> David Sterba posted on Tue, 24 Apr 2018 13:58:57 +0200 as excerpted:
>
> > btrfs-progs version 4.16.1 has been released. This is a bugfix
> > release.
> >
> > Changes:
> >
> >   * remove obsolete tools: btrfs-debug-tree, btrfs-zero-log,
> >     btrfs-show-super, btrfs-calc-size
>
> Cue the admin-side gripes about developer definitions of micro-upgrade
> explicit "bugfix release" that allow disappearance of "obsolete tools".
>
> Arguably such removals can be expected in a "feature release", but
> shouldn't surprise unsuspecting admins doing a micro-version upgrade
> that's specifically billed as a "bugfix release".

A major version release would be a better time for the removal, I agree and should have considered that. However, the tools have been obsoleted for a long time (since 2015 or 2016) so I wonder if the deprecation warnings have been ignored by the admins all the time.

> (Further support for btrfs being "still stabilizing, not yet fully
> stable and mature." But development mode habits need to end /sometime/,
> if stability is indeed a goal.)

What happened here was a bad release management decision, a minor one in my opinion but I hear your complaint and will keep that in mind for future releases.

Do you really want to use that to perpetuate the 'still stabilizing and not mature' claim? If you expect 0 bugs and essentially no other visible problems, then I don't think you should use linux. Or wait until it's fully stable, whatever that means.

In terms of features, btrfs is not done and will be actively developed and maintained. Bugs will be found, reported and fixed, new features will add more code that will have to be stabilized over time. This is how the entire linux kernel evolves.

The focus in recent releases has been on cleanups and refactoring, besides bugfixes. No big feature has been merged, to some disappointment of developers and users, but this is namely to minimize the fallout of new code that does not have enough review and testing. My target is to do slow and steady incremental changes with no regressions.
Re: [PATCH v2] btrfs: kill btrfs_raid_type_names[]
On 04/25/2018 05:45 PM, David Sterba wrote:
> On Wed, Apr 25, 2018 at 05:44:13PM +0800, Anand Jain wrote:
>> Add a new member struct btrfs_raid_attr::raid_name so that
>> btrfs_raid_array[] can maintain the name of the raid type,
>> and so we can kill btrfs_raid_type_names[].
>>
>> Signed-off-by: Anand Jain
>> Reviewed-by: Qu Wenruo
>> Reviewed-by: Nikolay Borisov
>> ---
>> v1->v2:
>>   add space after =. Such as..
>>    + .raid_name = "raid10",
>>                 ^
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -342,6 +342,7 @@ struct btrfs_raid_attr {
>>  	int tolerated_failures; /* max tolerated fail devs */
>>  	int devs_increment;	/* ndevs has to be a multiple of this */
>>  	int ncopies;	/* how many copies to data has */
>> +	char *raid_name;	/* name of the raid */
> There was another comment under v1:
>
> 	const char raid_name[8]

Thanks. I fixed this in v3.

-Anand
[PATCH 2/3] btrfs: kill btrfs_raid_group[]
Add a new member struct btrfs_raid_attr::bg_flag so that btrfs_raid_array[] can maintain the bit map flag of the raid type, and so we can kill btrfs_raid_group[]. Signed-off-by: Anand Jain--- fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/volumes.c | 19 --- fs/btrfs/volumes.h | 2 +- 4 files changed, 11 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b62559dfb053..2fa063c3ccec 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3525,7 +3525,7 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags) for (raid_type = 0; raid_type < BTRFS_NR_RAID_TYPES; raid_type++) { if (raid_type == BTRFS_RAID_SINGLE) continue; - if (!(flags & btrfs_raid_group[raid_type])) + if (!(flags & btrfs_raid_array[raid_type].bg_flag)) continue; min_tolerated = min(min_tolerated, btrfs_raid_array[raid_type]. diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 4742734a73d7..19b4e24854ca 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4178,7 +4178,7 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags) /* First, mask out the RAID levels which aren't possible */ for (raid_type = 0; raid_type < BTRFS_NR_RAID_TYPES; raid_type++) { if (num_devices >= btrfs_raid_array[raid_type].devs_min) - allowed |= btrfs_raid_group[raid_type]; + allowed |= btrfs_raid_array[raid_type].bg_flag; } allowed &= flags; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5bb18ad6433d..de3eea8b393e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -53,6 +53,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 2, .ncopies= 2, .raid_name = "raid10", + .bg_flag= BTRFS_BLOCK_GROUP_RAID10, }, [BTRFS_RAID_RAID1] = { .sub_stripes= 1, @@ -63,6 +64,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 2, .ncopies= 2, .raid_name = "raid1", + .bg_flag= BTRFS_BLOCK_GROUP_RAID1, }, [BTRFS_RAID_DUP] = 
{ .sub_stripes= 1, @@ -73,6 +75,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 2, .raid_name = "dup", + .bg_flag= BTRFS_BLOCK_GROUP_DUP, }, [BTRFS_RAID_RAID0] = { .sub_stripes= 1, @@ -83,6 +86,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 1, .raid_name = "raid0", + .bg_flag= BTRFS_BLOCK_GROUP_RAID0, }, [BTRFS_RAID_SINGLE] = { .sub_stripes= 1, @@ -93,6 +97,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 1, .raid_name = "single", + .bg_flag= 0, }, [BTRFS_RAID_RAID5] = { .sub_stripes= 1, @@ -103,6 +108,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 2, .raid_name = "raid5", + .bg_flag= BTRFS_BLOCK_GROUP_RAID5, }, [BTRFS_RAID_RAID6] = { .sub_stripes= 1, @@ -113,6 +119,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 3, .raid_name = "raid6", + .bg_flag= BTRFS_BLOCK_GROUP_RAID6, }, }; @@ -124,16 +131,6 @@ const char *get_raid_name(enum btrfs_raid_types type) return btrfs_raid_array[type].raid_name; } -const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = { - [BTRFS_RAID_RAID10] = BTRFS_BLOCK_GROUP_RAID10, - [BTRFS_RAID_RAID1] = BTRFS_BLOCK_GROUP_RAID1, - [BTRFS_RAID_DUP]= BTRFS_BLOCK_GROUP_DUP, - [BTRFS_RAID_RAID0] = BTRFS_BLOCK_GROUP_RAID0, - [BTRFS_RAID_SINGLE] = 0, - [BTRFS_RAID_RAID5] = BTRFS_BLOCK_GROUP_RAID5, - [BTRFS_RAID_RAID6] = BTRFS_BLOCK_GROUP_RAID6, -}; - /* * Table to convert BTRFS_RAID_* to the error code if minimum number of devices * condition is not met. Zero means there's no corresponding @@ -1898,7 +1895,7 @@ static int btrfs_check_raid_min_devices(struct btrfs_fs_info
[PATCH v3 1/3] btrfs: kill btrfs_raid_type_names[]
Add a new member struct btrfs_raid_attr::raid_name so that btrfs_raid_array[] can maintain the name of the raid type, and so we can kill btrfs_raid_type_names[]. Signed-off-by: Anand JainReviewed-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- v2->v3: use const char raid_name[8] v1->v2: add space after =. Such as.. + .raid_name = "raid10", fs/btrfs/extent-tree.c | 18 -- fs/btrfs/volumes.c | 15 +++ fs/btrfs/volumes.h | 3 +++ 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 90d28a3727c6..4742734a73d7 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7338,24 +7338,6 @@ wait_block_group_cache_done(struct btrfs_block_group_cache *cache) return ret; } -static const char *btrfs_raid_type_names[BTRFS_NR_RAID_TYPES] = { - [BTRFS_RAID_RAID10] = "raid10", - [BTRFS_RAID_RAID1] = "raid1", - [BTRFS_RAID_DUP]= "dup", - [BTRFS_RAID_RAID0] = "raid0", - [BTRFS_RAID_SINGLE] = "single", - [BTRFS_RAID_RAID5] = "raid5", - [BTRFS_RAID_RAID6] = "raid6", -}; - -static const char *get_raid_name(enum btrfs_raid_types type) -{ - if (type >= BTRFS_NR_RAID_TYPES) - return NULL; - - return btrfs_raid_type_names[type]; -} - enum btrfs_loop_type { LOOP_CACHING_NOWAIT = 0, LOOP_CACHING_WAIT = 1, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9c29fdca9075..5bb18ad6433d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -52,6 +52,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 2, .ncopies= 2, + .raid_name = "raid10", }, [BTRFS_RAID_RAID1] = { .sub_stripes= 1, @@ -61,6 +62,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 2, .ncopies= 2, + .raid_name = "raid1", }, [BTRFS_RAID_DUP] = { .sub_stripes= 1, @@ -70,6 +72,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies= 2, + 
.raid_name = "dup", }, [BTRFS_RAID_RAID0] = { .sub_stripes= 1, @@ -79,6 +82,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies= 1, + .raid_name = "raid0", }, [BTRFS_RAID_SINGLE] = { .sub_stripes= 1, @@ -88,6 +92,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies= 1, + .raid_name = "single", }, [BTRFS_RAID_RAID5] = { .sub_stripes= 1, @@ -97,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 1, .ncopies= 2, + .raid_name = "raid5", }, [BTRFS_RAID_RAID6] = { .sub_stripes= 1, @@ -106,9 +112,18 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 2, .devs_increment = 1, .ncopies= 3, + .raid_name = "raid6", }, }; +const char *get_raid_name(enum btrfs_raid_types type) +{ + if (type >= BTRFS_NR_RAID_TYPES) + return NULL; + + return btrfs_raid_array[type].raid_name; +} + const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = { [BTRFS_RAID_RAID10] = BTRFS_BLOCK_GROUP_RAID10, [BTRFS_RAID_RAID1] = BTRFS_BLOCK_GROUP_RAID1, diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index ef220d541d4b..d6edff847871 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -342,6 +342,7 @@ struct btrfs_raid_attr { int tolerated_failures; /* max tolerated fail devs */ int devs_increment; /* ndevs has to be a multiple of this */ int ncopies;/* how many copies to data has */ + const char raid_name[8];/* name of the raid */ }; extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES]; @@ -563,6 +564,8 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags) return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */ } +const char *get_raid_name(enum btrfs_raid_types
[PATCH 3/3] btrfs: kill btrfs_raid_mindev_error[]
Add a new member struct btrfs_raid_attr::mindev_error so that btrfs_raid_array[] can maintain the error code to return if the minimum number of devices required condition is not met while trying to delete a device in the given raid. And so we can kill btrfs_raid_mindev_error[]. Signed-off-by: Anand Jain--- fs/btrfs/volumes.c | 24 fs/btrfs/volumes.h | 2 +- 2 files changed, 9 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index de3eea8b393e..14efa98b7183 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -54,6 +54,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 2, .raid_name = "raid10", .bg_flag= BTRFS_BLOCK_GROUP_RAID10, + .mindev_error = BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET, }, [BTRFS_RAID_RAID1] = { .sub_stripes= 1, @@ -65,6 +66,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 2, .raid_name = "raid1", .bg_flag= BTRFS_BLOCK_GROUP_RAID1, + .mindev_error = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET, }, [BTRFS_RAID_DUP] = { .sub_stripes= 1, @@ -76,6 +78,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 2, .raid_name = "dup", .bg_flag= BTRFS_BLOCK_GROUP_DUP, + .mindev_error = 0, }, [BTRFS_RAID_RAID0] = { .sub_stripes= 1, @@ -87,6 +90,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 1, .raid_name = "raid0", .bg_flag= BTRFS_BLOCK_GROUP_RAID0, + .mindev_error = 0, }, [BTRFS_RAID_SINGLE] = { .sub_stripes= 1, @@ -98,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 1, .raid_name = "single", .bg_flag= 0, + .mindev_error = 0, }, [BTRFS_RAID_RAID5] = { .sub_stripes= 1, @@ -109,6 +114,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 2, .raid_name = "raid5", .bg_flag= BTRFS_BLOCK_GROUP_RAID5, + .mindev_error = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET, }, [BTRFS_RAID_RAID6] = { .sub_stripes= 1, @@ -120,6 +126,7 @@ const struct
btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 3, .raid_name = "raid6", .bg_flag= BTRFS_BLOCK_GROUP_RAID6, + .mindev_error = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET, }, }; @@ -131,21 +138,6 @@ const char *get_raid_name(enum btrfs_raid_types type) return btrfs_raid_array[type].raid_name; } -/* - * Table to convert BTRFS_RAID_* to the error code if minimum number of devices - * condition is not met. Zero means there's no corresponding - * BTRFS_ERROR_DEV_*_NOT_MET value. - */ -const int btrfs_raid_mindev_error[BTRFS_NR_RAID_TYPES] = { - [BTRFS_RAID_RAID10] = BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET, - [BTRFS_RAID_RAID1] = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET, - [BTRFS_RAID_DUP]= 0, - [BTRFS_RAID_RAID0] = 0, - [BTRFS_RAID_SINGLE] = 0, - [BTRFS_RAID_RAID5] = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET, - [BTRFS_RAID_RAID6] = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET, -}; - static int init_first_rw_device(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info); @@ -1899,7 +1891,7 @@ static int btrfs_check_raid_min_devices(struct btrfs_fs_info *fs_info, continue; if (num_devices < btrfs_raid_array[i].devs_min) { - int ret = btrfs_raid_mindev_error[i]; + int ret = btrfs_raid_array[i].mindev_error; if (ret) return ret; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index d9c4c734d854..cc53a99116aa 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -344,10 +344,10 @@ struct btrfs_raid_attr { int ncopies;/* how many copies to data has */ const char raid_name[8];/* name of the raid */ u64 bg_flag;/* block group flag of the raid */ + int mindev_error; /* error code if min devs requisite is unmet */ }; extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES]; -extern const int
[PATCH 0/3] enhance btrfs_raid_array[]
Cleanup patches as in the individual change log.

These patches were sent independently as they aren't related as such, but as I am updating the 1/3 the 2/3 would end up with conflict. So here I am putting all of them together with the conflict fixed at my end. And as such there is no change in 2/3 and 3/3 from its v1.

Anand Jain (3):
  btrfs: kill btrfs_raid_type_names[]
  btrfs: kill btrfs_raid_group[]
  btrfs: kill btrfs_raid_mindev_error[]

 fs/btrfs/disk-io.c     |  2 +-
 fs/btrfs/extent-tree.c | 20 +--
 fs/btrfs/volumes.c     | 54 +++---
 fs/btrfs/volumes.h     |  7 +--
 4 files changed, 36 insertions(+), 47 deletions(-)

-- 
2.7.0
[PATCH] btrfs: kill btrfs_raid_mindev_error[]
Add a new member struct btrfs_raid_attr::mindev_error so that btrfs_raid_array[] can maintain the error code to return if the minimum number of devices required condition is not met while trying to delete a device in the given raid. And so we can kill btrfs_raid_mindev_error[]. Signed-off-by: Anand Jain--- fs/btrfs/volumes.c | 24 fs/btrfs/volumes.h | 2 +- 2 files changed, 9 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index de3eea8b393e..14efa98b7183 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -54,6 +54,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 2, .raid_name = "raid10", .bg_flag= BTRFS_BLOCK_GROUP_RAID10, + .mindev_error = BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET, }, [BTRFS_RAID_RAID1] = { .sub_stripes= 1, @@ -65,6 +66,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 2, .raid_name = "raid1", .bg_flag= BTRFS_BLOCK_GROUP_RAID1, + .mindev_error = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET, }, [BTRFS_RAID_DUP] = { .sub_stripes= 1, @@ -76,6 +78,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 2, .raid_name = "dup", .bg_flag= BTRFS_BLOCK_GROUP_DUP, + .mindev_error = 0, }, [BTRFS_RAID_RAID0] = { .sub_stripes= 1, @@ -87,6 +90,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 1, .raid_name = "raid0", .bg_flag= BTRFS_BLOCK_GROUP_RAID0, + .mindev_error = 0, }, [BTRFS_RAID_SINGLE] = { .sub_stripes= 1, @@ -98,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 1, .raid_name = "single", .bg_flag= 0, + .mindev_error = 0, }, [BTRFS_RAID_RAID5] = { .sub_stripes= 1, @@ -109,6 +114,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 2, .raid_name = "raid5", .bg_flag= BTRFS_BLOCK_GROUP_RAID5, + .mindev_error = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET, }, [BTRFS_RAID_RAID6] = { .sub_stripes= 1, @@ -120,6 +126,7 @@ const struct
btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .ncopies= 3, .raid_name = "raid6", .bg_flag= BTRFS_BLOCK_GROUP_RAID6, + .mindev_error = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET, }, }; @@ -131,21 +138,6 @@ const char *get_raid_name(enum btrfs_raid_types type) return btrfs_raid_array[type].raid_name; } -/* - * Table to convert BTRFS_RAID_* to the error code if minimum number of devices - * condition is not met. Zero means there's no corresponding - * BTRFS_ERROR_DEV_*_NOT_MET value. - */ -const int btrfs_raid_mindev_error[BTRFS_NR_RAID_TYPES] = { - [BTRFS_RAID_RAID10] = BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET, - [BTRFS_RAID_RAID1] = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET, - [BTRFS_RAID_DUP]= 0, - [BTRFS_RAID_RAID0] = 0, - [BTRFS_RAID_SINGLE] = 0, - [BTRFS_RAID_RAID5] = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET, - [BTRFS_RAID_RAID6] = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET, -}; - static int init_first_rw_device(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info); @@ -1899,7 +1891,7 @@ static int btrfs_check_raid_min_devices(struct btrfs_fs_info *fs_info, continue; if (num_devices < btrfs_raid_array[i].devs_min) { - int ret = btrfs_raid_mindev_error[i]; + int ret = btrfs_raid_array[i].mindev_error; if (ret) return ret; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 8417cb7059f6..8070a734877a 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -344,10 +344,10 @@ struct btrfs_raid_attr { int ncopies;/* how many copies to data has */ char *raid_name;/* name of the raid */ u64 bg_flag;/* block group flag of the raid */ + int mindev_error; /* error code if min devs requisite is unmet */ }; extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES]; -extern const int
[PATCH] btrfs: kill btrfs_raid_group[]
Add a new member struct btrfs_raid_attr::bg_flag so that btrfs_raid_array[] can maintain the bit map flag of the raid type, and so we can kill btrfs_raid_group[]. Signed-off-by: Anand Jain--- fs/btrfs/disk-io.c | 2 +- fs/btrfs/extent-tree.c | 2 +- fs/btrfs/volumes.c | 19 --- fs/btrfs/volumes.h | 2 +- 4 files changed, 11 insertions(+), 14 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b62559dfb053..2fa063c3ccec 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3525,7 +3525,7 @@ int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags) for (raid_type = 0; raid_type < BTRFS_NR_RAID_TYPES; raid_type++) { if (raid_type == BTRFS_RAID_SINGLE) continue; - if (!(flags & btrfs_raid_group[raid_type])) + if (!(flags & btrfs_raid_array[raid_type].bg_flag)) continue; min_tolerated = min(min_tolerated, btrfs_raid_array[raid_type]. diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 4742734a73d7..19b4e24854ca 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4178,7 +4178,7 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_fs_info *fs_info, u64 flags) /* First, mask out the RAID levels which aren't possible */ for (raid_type = 0; raid_type < BTRFS_NR_RAID_TYPES; raid_type++) { if (num_devices >= btrfs_raid_array[raid_type].devs_min) - allowed |= btrfs_raid_group[raid_type]; + allowed |= btrfs_raid_array[raid_type].bg_flag; } allowed &= flags; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5bb18ad6433d..de3eea8b393e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -53,6 +53,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 2, .ncopies= 2, .raid_name = "raid10", + .bg_flag= BTRFS_BLOCK_GROUP_RAID10, }, [BTRFS_RAID_RAID1] = { .sub_stripes= 1, @@ -63,6 +64,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 2, .ncopies= 2, .raid_name = "raid1", + .bg_flag= BTRFS_BLOCK_GROUP_RAID1, }, [BTRFS_RAID_DUP] = 
{ .sub_stripes= 1, @@ -73,6 +75,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 2, .raid_name = "dup", + .bg_flag= BTRFS_BLOCK_GROUP_DUP, }, [BTRFS_RAID_RAID0] = { .sub_stripes= 1, @@ -83,6 +86,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 1, .raid_name = "raid0", + .bg_flag= BTRFS_BLOCK_GROUP_RAID0, }, [BTRFS_RAID_SINGLE] = { .sub_stripes= 1, @@ -93,6 +97,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 1, .raid_name = "single", + .bg_flag= 0, }, [BTRFS_RAID_RAID5] = { .sub_stripes= 1, @@ -103,6 +108,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 2, .raid_name = "raid5", + .bg_flag= BTRFS_BLOCK_GROUP_RAID5, }, [BTRFS_RAID_RAID6] = { .sub_stripes= 1, @@ -113,6 +119,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .devs_increment = 1, .ncopies= 3, .raid_name = "raid6", + .bg_flag= BTRFS_BLOCK_GROUP_RAID6, }, }; @@ -124,16 +131,6 @@ const char *get_raid_name(enum btrfs_raid_types type) return btrfs_raid_array[type].raid_name; } -const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = { - [BTRFS_RAID_RAID10] = BTRFS_BLOCK_GROUP_RAID10, - [BTRFS_RAID_RAID1] = BTRFS_BLOCK_GROUP_RAID1, - [BTRFS_RAID_DUP]= BTRFS_BLOCK_GROUP_DUP, - [BTRFS_RAID_RAID0] = BTRFS_BLOCK_GROUP_RAID0, - [BTRFS_RAID_SINGLE] = 0, - [BTRFS_RAID_RAID5] = BTRFS_BLOCK_GROUP_RAID5, - [BTRFS_RAID_RAID6] = BTRFS_BLOCK_GROUP_RAID6, -}; - /* * Table to convert BTRFS_RAID_* to the error code if minimum number of devices * condition is not met. Zero means there's no corresponding @@ -1898,7 +1895,7 @@ static int btrfs_check_raid_min_devices(struct btrfs_fs_info
Re: [PATCH v2] btrfs: kill btrfs_raid_type_names[]
On Wed, Apr 25, 2018 at 05:44:13PM +0800, Anand Jain wrote:
> Add a new member struct btrfs_raid_attr::raid_name so that
> btrfs_raid_array[] can maintain the name of the raid type,
> and so we can kill btrfs_raid_type_names[].
>
> Signed-off-by: Anand Jain
> Reviewed-by: Qu Wenruo
> Reviewed-by: Nikolay Borisov
> ---
> v1->v2:
>   add space after =. Such as..
>    + .raid_name = "raid10",
>                 ^
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -342,6 +342,7 @@ struct btrfs_raid_attr {
> 	int tolerated_failures; /* max tolerated fail devs */
> 	int devs_increment;	/* ndevs has to be a multiple of this */
> 	int ncopies;	/* how many copies to data has */
> +	char *raid_name;	/* name of the raid */

There was another comment under v1:

	const char raid_name[8]
[PATCH v2] btrfs: kill btrfs_raid_type_names[]
Add a new member struct btrfs_raid_attr::raid_name so that btrfs_raid_array[] can maintain the name of the raid type, and so we can kill btrfs_raid_type_names[]. Signed-off-by: Anand JainReviewed-by: Qu Wenruo Reviewed-by: Nikolay Borisov --- v1->v2: add space after =. Such as.. + .raid_name = "raid10", ^ fs/btrfs/extent-tree.c | 18 -- fs/btrfs/volumes.c | 15 +++ fs/btrfs/volumes.h | 3 +++ 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 90d28a3727c6..4742734a73d7 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7338,24 +7338,6 @@ wait_block_group_cache_done(struct btrfs_block_group_cache *cache) return ret; } -static const char *btrfs_raid_type_names[BTRFS_NR_RAID_TYPES] = { - [BTRFS_RAID_RAID10] = "raid10", - [BTRFS_RAID_RAID1] = "raid1", - [BTRFS_RAID_DUP]= "dup", - [BTRFS_RAID_RAID0] = "raid0", - [BTRFS_RAID_SINGLE] = "single", - [BTRFS_RAID_RAID5] = "raid5", - [BTRFS_RAID_RAID6] = "raid6", -}; - -static const char *get_raid_name(enum btrfs_raid_types type) -{ - if (type >= BTRFS_NR_RAID_TYPES) - return NULL; - - return btrfs_raid_type_names[type]; -} - enum btrfs_loop_type { LOOP_CACHING_NOWAIT = 0, LOOP_CACHING_WAIT = 1, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9c29fdca9075..5bb18ad6433d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -52,6 +52,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 2, .ncopies= 2, + .raid_name = "raid10", }, [BTRFS_RAID_RAID1] = { .sub_stripes= 1, @@ -61,6 +62,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 2, .ncopies= 2, + .raid_name = "raid1", }, [BTRFS_RAID_DUP] = { .sub_stripes= 1, @@ -70,6 +72,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies= 2, + .raid_name = "dup", }, [BTRFS_RAID_RAID0] = 
{ .sub_stripes= 1, @@ -79,6 +82,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies= 1, + .raid_name = "raid0", }, [BTRFS_RAID_SINGLE] = { .sub_stripes= 1, @@ -88,6 +92,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies= 1, + .raid_name = "single", }, [BTRFS_RAID_RAID5] = { .sub_stripes= 1, @@ -97,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 1, .ncopies= 2, + .raid_name = "raid5", }, [BTRFS_RAID_RAID6] = { .sub_stripes= 1, @@ -106,9 +112,18 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 2, .devs_increment = 1, .ncopies= 3, + .raid_name = "raid6", }, }; +const char *get_raid_name(enum btrfs_raid_types type) +{ + if (type >= BTRFS_NR_RAID_TYPES) + return NULL; + + return btrfs_raid_array[type].raid_name; +} + const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = { [BTRFS_RAID_RAID10] = BTRFS_BLOCK_GROUP_RAID10, [BTRFS_RAID_RAID1] = BTRFS_BLOCK_GROUP_RAID1, diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index ef220d541d4b..2acd32ce1573 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -342,6 +342,7 @@ struct btrfs_raid_attr { int tolerated_failures; /* max tolerated fail devs */ int devs_increment; /* ndevs has to be a multiple of this */ int ncopies;/* how many copies to data has */ + char *raid_name;/* name of the raid */ }; extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES]; @@ -563,6 +564,8 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags) return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */ } +const char *get_raid_name(enum btrfs_raid_types
Re: [PATCH] btrfs: kill btrfs_raid_type_names[]
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index ef220d541d4b..2acd32ce1573 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -342,6 +342,7 @@ struct btrfs_raid_attr { int tolerated_failures; /* max tolerated fail devs */ int devs_increment; /* ndevs has to be a multiple of this */ int ncopies;/* how many copies to data has */ + char *raid_name;/* name of the raid */ nit: const char * ? It's declared as const when allocated. const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: kill btrfs_raid_type_names[]
On Wed, Apr 25, 2018 at 03:26:08PM +0800, Anand Jain wrote: > Add a new member struct btrfs_raid_attr::raid_name so that > btrfs_raid_array[] can maintain the name of the raid type, > and so we can kill btrfs_raid_type_names[]. > > Signed-off-by: Anand JainOk, nice. > + .raid_name ="raid10", .raid_name = "raid10", spaces around binary operators > }, > [BTRFS_RAID_RAID1] = { > .sub_stripes= 1, > @@ -61,6 +62,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 1, > .devs_increment = 2, > .ncopies= 2, > + .raid_name ="raid1", > }, > [BTRFS_RAID_DUP] = { > .sub_stripes= 1, > @@ -70,6 +72,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 2, > + .raid_name ="dup", > }, > [BTRFS_RAID_RAID0] = { > .sub_stripes= 1, > @@ -79,6 +82,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 1, > + .raid_name ="raid0", > }, > [BTRFS_RAID_SINGLE] = { > .sub_stripes= 1, > @@ -88,6 +92,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 1, > + .raid_name ="single", > }, > [BTRFS_RAID_RAID5] = { > .sub_stripes= 1, > @@ -97,6 +102,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 1, > .devs_increment = 1, > .ncopies= 2, > + .raid_name ="raid5", > }, > [BTRFS_RAID_RAID6] = { > .sub_stripes= 1, > @@ -106,9 +112,18 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 2, > .devs_increment = 1, > .ncopies= 3, > + .raid_name ="raid6", > }, > }; > > +const char *get_raid_name(enum btrfs_raid_types type) > +{ > + if (type >= BTRFS_NR_RAID_TYPES) > + return NULL; > + > + return btrfs_raid_array[type].raid_name; > +} > + > const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = { > 
[BTRFS_RAID_RAID10] = BTRFS_BLOCK_GROUP_RAID10, > [BTRFS_RAID_RAID1] = BTRFS_BLOCK_GROUP_RAID1, > diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h > index ef220d541d4b..2acd32ce1573 100644 > --- a/fs/btrfs/volumes.h > +++ b/fs/btrfs/volumes.h > @@ -342,6 +342,7 @@ struct btrfs_raid_attr { > int tolerated_failures; /* max tolerated fail devs */ > int devs_increment; /* ndevs has to be a multiple of this */ > int ncopies;/* how many copies to data has */ > + char *raid_name;/* name of the raid */ const char raid_name[8]; This stores the name in the array that is const, so there's no indirection and extra pointers. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V4] Btrfs: enchanse raid1/10 balance heuristic
On 2018/04/25 17:15, Timofey Titovets wrote: > 2018-04-25 10:54 GMT+03:00 Misono Tomohiro: >> On 2018/04/25 9:20, Timofey Titovets wrote: >>> Currently btrfs raid1/10 balancer bаlance requests to mirrors, >>> based on pid % num of mirrors. >>> >>> Make logic understood: >>> - if one of underline devices are non rotational >>> - Queue leght to underline devices >>> >>> By default try use pid % num_mirrors guessing, but: >>> - If one of mirrors are non rotational, repick optimal to it >>> - If underline mirror have less queue leght then optimal, >>>repick to that mirror >>> >>> For avoid round-robin request balancing, >>> lets round down queue leght: >>> - By 8 for rotational devs >>> - By 2 for all non rotational devs >>> >>> Changes: >>> v1 -> v2: >>> - Use helper part_in_flight() from genhd.c >>> to get queue lenght >>> - Move guess code to guess_optimal() >>> - Change balancer logic, try use pid % mirror by default >>> Make balancing on spinning rust if one of underline devices >>> are overloaded >>> v2 -> v3: >>> - Fix arg for RAID10 - use sub_stripes, instead of num_stripes >>> v3 -> v4: >>> - Rebased on latest misc-next >>> >>> Signed-off-by: Timofey Titovets >>> --- >>> block/genhd.c | 1 + >>> fs/btrfs/volumes.c | 111 - >>> 2 files changed, 110 insertions(+), 2 deletions(-) >>> >>> diff --git a/block/genhd.c b/block/genhd.c >>> index 9656f9e9f99e..5ea5acc88d3c 100644 >>> --- a/block/genhd.c >>> +++ b/block/genhd.c >>> @@ -81,6 +81,7 @@ void part_in_flight(struct request_queue *q, struct >>> hd_struct *part, >>> atomic_read(>in_flight[1]); >>> } >>> } >>> +EXPORT_SYMBOL_GPL(part_in_flight); >>> >>> struct hd_struct *__disk_get_part(struct gendisk *disk, int partno) >>> { >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >>> index c95af358b71f..fa7dd6ac087f 100644 >>> --- a/fs/btrfs/volumes.c >>> +++ b/fs/btrfs/volumes.c >>> @@ -16,6 +16,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> #include >>> #include "ctree.h" >>> #include 
"extent_map.h" >>> @@ -5148,7 +5149,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, >>> u64 logical, u64 len) >>> /* >>>* There could be two corrupted data stripes, we need >>>* to loop retry in order to rebuild the correct data. >>> - * >>> + * >>>* Fail a stripe at a time on every retry except the >>>* stripe under reconstruction. >>>*/ >>> @@ -5201,6 +5202,111 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info >>> *fs_info, u64 logical, u64 len) >>> return ret; >>> } >>> >>> +/** >>> + * bdev_get_queue_len - return rounded down in flight queue lenght of bdev >>> + * >>> + * @bdev: target bdev >>> + * @round_down: round factor big for hdd and small for ssd, like 8 and 2 >>> + */ >>> +static int bdev_get_queue_len(struct block_device *bdev, int round_down) >>> +{ >>> + int sum; >>> + struct hd_struct *bd_part = bdev->bd_part; >>> + struct request_queue *rq = bdev_get_queue(bdev); >>> + uint32_t inflight[2] = {0, 0}; >>> + >>> + part_in_flight(rq, bd_part, inflight); >>> + >>> + sum = max_t(uint32_t, inflight[0], inflight[1]); >>> + >>> + /* >>> + * Try prevent switch for every sneeze >>> + * By roundup output num by some value >>> + */ >>> + return ALIGN_DOWN(sum, round_down); >>> +} >>> + >>> +/** >>> + * guess_optimal - return guessed optimal mirror >>> + * >>> + * Optimal expected to be pid % num_stripes >>> + * >>> + * That's generaly ok for spread load >>> + * Add some balancer based on queue leght to device >>> + * >>> + * Basic ideas: >>> + * - Sequential read generate low amount of request >>> + *so if load of drives are equal, use pid % num_stripes balancing >>> + * - For mixed rotate/non-rotate mirrors, pick non-rotate as optimal >>> + *and repick if other dev have "significant" less queue lenght >> >> The code looks always choosing the queue with the lowest length regardless >> of the amount of queue length difference. So, this "significant" may be >> wrong? 
> > yes, but before code looks at queue len, we do round_down by 8, > may be you confused because i hide ALIGN_DOWN in bdev_get_queue_len() > > I'm not think what this is "best" leveling queue leveling solution, > but that's works. > In theory ABS() <> some_num, can be used, but that's will lead to > random send of some requests to hdd. > > i.e. for ABS() > 8: > ssd hdd > 90 -> hdd > 91 -> ssd > 10 1 -> hdd > 10 2 -> ssd > And so on. > > So in general that will looks like: > ssd rount_down(7, 8) = 0, hdd round_down(0, 8) = 0 > ssd will be used > And > ssd round_down(9, 8) = 8, hdd round_down(0,8) = 0 > hdd will be used, while hdd qlen < 8. > > i.e. 'significant' depends on
round_down factor.
Re: [PATCH V4] Btrfs: enchanse raid1/10 balance heuristic
2018-04-25 10:54 GMT+03:00 Misono Tomohiro: > On 2018/04/25 9:20, Timofey Titovets wrote: >> Currently btrfs raid1/10 balancer bаlance requests to mirrors, >> based on pid % num of mirrors. >> >> Make logic understood: >> - if one of underline devices are non rotational >> - Queue leght to underline devices >> >> By default try use pid % num_mirrors guessing, but: >> - If one of mirrors are non rotational, repick optimal to it >> - If underline mirror have less queue leght then optimal, >>repick to that mirror >> >> For avoid round-robin request balancing, >> lets round down queue leght: >> - By 8 for rotational devs >> - By 2 for all non rotational devs >> >> Changes: >> v1 -> v2: >> - Use helper part_in_flight() from genhd.c >> to get queue lenght >> - Move guess code to guess_optimal() >> - Change balancer logic, try use pid % mirror by default >> Make balancing on spinning rust if one of underline devices >> are overloaded >> v2 -> v3: >> - Fix arg for RAID10 - use sub_stripes, instead of num_stripes >> v3 -> v4: >> - Rebased on latest misc-next >> >> Signed-off-by: Timofey Titovets >> --- >> block/genhd.c | 1 + >> fs/btrfs/volumes.c | 111 - >> 2 files changed, 110 insertions(+), 2 deletions(-) >> >> diff --git a/block/genhd.c b/block/genhd.c >> index 9656f9e9f99e..5ea5acc88d3c 100644 >> --- a/block/genhd.c >> +++ b/block/genhd.c >> @@ -81,6 +81,7 @@ void part_in_flight(struct request_queue *q, struct >> hd_struct *part, >> atomic_read(>in_flight[1]); >> } >> } >> +EXPORT_SYMBOL_GPL(part_in_flight); >> >> struct hd_struct *__disk_get_part(struct gendisk *disk, int partno) >> { >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index c95af358b71f..fa7dd6ac087f 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -16,6 +16,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include "ctree.h" >> #include "extent_map.h" >> @@ -5148,7 +5149,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, >> u64 logical, u64 len) >> 
/* >>* There could be two corrupted data stripes, we need >>* to loop retry in order to rebuild the correct data. >> - * >> + * >>* Fail a stripe at a time on every retry except the >>* stripe under reconstruction. >>*/ >> @@ -5201,6 +5202,111 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info >> *fs_info, u64 logical, u64 len) >> return ret; >> } >> >> +/** >> + * bdev_get_queue_len - return rounded down in flight queue lenght of bdev >> + * >> + * @bdev: target bdev >> + * @round_down: round factor big for hdd and small for ssd, like 8 and 2 >> + */ >> +static int bdev_get_queue_len(struct block_device *bdev, int round_down) >> +{ >> + int sum; >> + struct hd_struct *bd_part = bdev->bd_part; >> + struct request_queue *rq = bdev_get_queue(bdev); >> + uint32_t inflight[2] = {0, 0}; >> + >> + part_in_flight(rq, bd_part, inflight); >> + >> + sum = max_t(uint32_t, inflight[0], inflight[1]); >> + >> + /* >> + * Try prevent switch for every sneeze >> + * By roundup output num by some value >> + */ >> + return ALIGN_DOWN(sum, round_down); >> +} >> + >> +/** >> + * guess_optimal - return guessed optimal mirror >> + * >> + * Optimal expected to be pid % num_stripes >> + * >> + * That's generaly ok for spread load >> + * Add some balancer based on queue leght to device >> + * >> + * Basic ideas: >> + * - Sequential read generate low amount of request >> + *so if load of drives are equal, use pid % num_stripes balancing >> + * - For mixed rotate/non-rotate mirrors, pick non-rotate as optimal >> + *and repick if other dev have "significant" less queue lenght > > The code looks always choosing the queue with the lowest length regardless > of the amount of queue length difference. So, this "significant" may be wrong? yes, but before code looks at queue len, we do round_down by 8, may be you confused because i hide ALIGN_DOWN in bdev_get_queue_len() I'm not think what this is "best" leveling queue leveling solution, but that's works. 
In theory ABS() <> some_num, can be used, but that's will lead to random send of some requests to hdd. i.e. for ABS() > 8: ssd hdd 90 -> hdd 91 -> ssd 10 1 -> hdd 10 2 -> ssd And so on. So in general that will looks like: ssd rount_down(7, 8) = 0, hdd round_down(0, 8) = 0 ssd will be used And ssd round_down(9, 8) = 8, hdd round_down(0,8) = 0 hdd will be used, while hdd qlen < 8. i.e. 'significant' depends on round_down factor. Thanks. >> + * - Repick optimal if queue leght of other mirror are less >> + */ >> +static int guess_optimal(struct map_lookup *map, int num, int optimal) >> +{ >> + int i; >> + int
Re: [PATCH V4] Btrfs: enchanse raid1/10 balance heuristic
On 2018/04/25 9:20, Timofey Titovets wrote: > Currently btrfs raid1/10 balancer bаlance requests to mirrors, > based on pid % num of mirrors. > > Make logic understood: > - if one of underline devices are non rotational > - Queue leght to underline devices > > By default try use pid % num_mirrors guessing, but: > - If one of mirrors are non rotational, repick optimal to it > - If underline mirror have less queue leght then optimal, >repick to that mirror > > For avoid round-robin request balancing, > lets round down queue leght: > - By 8 for rotational devs > - By 2 for all non rotational devs > > Changes: > v1 -> v2: > - Use helper part_in_flight() from genhd.c > to get queue lenght > - Move guess code to guess_optimal() > - Change balancer logic, try use pid % mirror by default > Make balancing on spinning rust if one of underline devices > are overloaded > v2 -> v3: > - Fix arg for RAID10 - use sub_stripes, instead of num_stripes > v3 -> v4: > - Rebased on latest misc-next > > Signed-off-by: Timofey Titovets> --- > block/genhd.c | 1 + > fs/btrfs/volumes.c | 111 - > 2 files changed, 110 insertions(+), 2 deletions(-) > > diff --git a/block/genhd.c b/block/genhd.c > index 9656f9e9f99e..5ea5acc88d3c 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -81,6 +81,7 @@ void part_in_flight(struct request_queue *q, struct > hd_struct *part, > atomic_read(>in_flight[1]); > } > } > +EXPORT_SYMBOL_GPL(part_in_flight); > > struct hd_struct *__disk_get_part(struct gendisk *disk, int partno) > { > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index c95af358b71f..fa7dd6ac087f 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include "ctree.h" > #include "extent_map.h" > @@ -5148,7 +5149,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 > logical, u64 len) > /* >* There could be two corrupted data stripes, we need >* to loop retry in order to rebuild the correct data. 
> - * > + * >* Fail a stripe at a time on every retry except the >* stripe under reconstruction. >*/ > @@ -5201,6 +5202,111 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info > *fs_info, u64 logical, u64 len) > return ret; > } > > +/** > + * bdev_get_queue_len - return rounded down in flight queue lenght of bdev > + * > + * @bdev: target bdev > + * @round_down: round factor big for hdd and small for ssd, like 8 and 2 > + */ > +static int bdev_get_queue_len(struct block_device *bdev, int round_down) > +{ > + int sum; > + struct hd_struct *bd_part = bdev->bd_part; > + struct request_queue *rq = bdev_get_queue(bdev); > + uint32_t inflight[2] = {0, 0}; > + > + part_in_flight(rq, bd_part, inflight); > + > + sum = max_t(uint32_t, inflight[0], inflight[1]); > + > + /* > + * Try prevent switch for every sneeze > + * By roundup output num by some value > + */ > + return ALIGN_DOWN(sum, round_down); > +} > + > +/** > + * guess_optimal - return guessed optimal mirror > + * > + * Optimal expected to be pid % num_stripes > + * > + * That's generaly ok for spread load > + * Add some balancer based on queue leght to device > + * > + * Basic ideas: > + * - Sequential read generate low amount of request > + *so if load of drives are equal, use pid % num_stripes balancing > + * - For mixed rotate/non-rotate mirrors, pick non-rotate as optimal > + *and repick if other dev have "significant" less queue lenght The code looks always choosing the queue with the lowest length regardless of the amount of queue length difference. So, this "significant" may be wrong? 
> + * - Repick optimal if queue leght of other mirror are less > + */ > +static int guess_optimal(struct map_lookup *map, int num, int optimal) > +{ > + int i; > + int round_down = 8; > + int qlen[num]; > + bool is_nonrot[num]; > + bool all_bdev_nonrot = true; > + bool all_bdev_rotate = true; > + struct block_device *bdev; > + > + if (num == 1) > + return optimal; > + > + /* Check accessible bdevs */ > + for (i = 0; i < num; i++) { > + /* Init for missing bdevs */ > + is_nonrot[i] = false; > + qlen[i] = INT_MAX; > + bdev = map->stripes[i].dev->bdev; > + if (bdev) { > + qlen[i] = 0; > + is_nonrot[i] = blk_queue_nonrot(bdev_get_queue(bdev)); > + if (is_nonrot[i]) > + all_bdev_rotate = false; > + else > + all_bdev_nonrot = false; > + } > + } > + > + /* > + * Don't
Re: [PATCH] btrfs: kill btrfs_raid_type_names[]
On 2018年04月25日 15:26, Anand Jain wrote: > Add a new member struct btrfs_raid_attr::raid_name so that > btrfs_raid_array[] can maintain the name of the raid type, > and so we can kill btrfs_raid_type_names[]. > > Signed-off-by: Anand JainLooks much better than the old way. Reviewed-by: Qu Wenruo Thanks, Qu > --- > fs/btrfs/extent-tree.c | 18 -- > fs/btrfs/volumes.c | 15 +++ > fs/btrfs/volumes.h | 3 +++ > 3 files changed, 18 insertions(+), 18 deletions(-) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 90d28a3727c6..4742734a73d7 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -7338,24 +7338,6 @@ wait_block_group_cache_done(struct > btrfs_block_group_cache *cache) > return ret; > } > > -static const char *btrfs_raid_type_names[BTRFS_NR_RAID_TYPES] = { > - [BTRFS_RAID_RAID10] = "raid10", > - [BTRFS_RAID_RAID1] = "raid1", > - [BTRFS_RAID_DUP]= "dup", > - [BTRFS_RAID_RAID0] = "raid0", > - [BTRFS_RAID_SINGLE] = "single", > - [BTRFS_RAID_RAID5] = "raid5", > - [BTRFS_RAID_RAID6] = "raid6", > -}; > - > -static const char *get_raid_name(enum btrfs_raid_types type) > -{ > - if (type >= BTRFS_NR_RAID_TYPES) > - return NULL; > - > - return btrfs_raid_type_names[type]; > -} > - > enum btrfs_loop_type { > LOOP_CACHING_NOWAIT = 0, > LOOP_CACHING_WAIT = 1, > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 9c29fdca9075..04b8d602dc08 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -52,6 +52,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 1, > .devs_increment = 2, > .ncopies= 2, > + .raid_name ="raid10", > }, > [BTRFS_RAID_RAID1] = { > .sub_stripes= 1, > @@ -61,6 +62,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 1, > .devs_increment = 2, > .ncopies= 2, > + .raid_name ="raid1", > }, > [BTRFS_RAID_DUP] = { > .sub_stripes= 1, > @@ -70,6 +72,7 @@ const struct btrfs_raid_attr > 
btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 2, > + .raid_name ="dup", > }, > [BTRFS_RAID_RAID0] = { > .sub_stripes= 1, > @@ -79,6 +82,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 1, > + .raid_name ="raid0", > }, > [BTRFS_RAID_SINGLE] = { > .sub_stripes= 1, > @@ -88,6 +92,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 1, > + .raid_name ="single", > }, > [BTRFS_RAID_RAID5] = { > .sub_stripes= 1, > @@ -97,6 +102,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 1, > .devs_increment = 1, > .ncopies= 2, > + .raid_name ="raid5", > }, > [BTRFS_RAID_RAID6] = { > .sub_stripes= 1, > @@ -106,9 +112,18 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 2, > .devs_increment = 1, > .ncopies= 3, > + .raid_name ="raid6", > }, > }; > > +const char *get_raid_name(enum btrfs_raid_types type) > +{ > + if (type >= BTRFS_NR_RAID_TYPES) > + return NULL; > + > + return btrfs_raid_array[type].raid_name; > +} > + > const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = { > [BTRFS_RAID_RAID10] = BTRFS_BLOCK_GROUP_RAID10, > [BTRFS_RAID_RAID1] = BTRFS_BLOCK_GROUP_RAID1, > diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h > index ef220d541d4b..2acd32ce1573 100644 > --- a/fs/btrfs/volumes.h > +++ b/fs/btrfs/volumes.h > @@ -342,6 +342,7 @@ struct btrfs_raid_attr { > int tolerated_failures; /* max tolerated fail devs */ > int devs_increment; /* ndevs has to be a multiple of this */ > int ncopies;/* how many copies to data has */ > + char *raid_name;/* name of the raid */ > }; > > extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES]; > @@ -563,6 +564,8 @@ static inline enum btrfs_raid_types > btrfs_bg_flags_to_raid_index(u64 flags) > 
return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */ > } > > +const
char *get_raid_name(enum btrfs_raid_types type);
Re: [PATCH] btrfs: kill btrfs_raid_type_names[]
On 25.04.2018 10:26, Anand Jain wrote: > Add a new member struct btrfs_raid_attr::raid_name so that > btrfs_raid_array[] can maintain the name of the raid type, > and so we can kill btrfs_raid_type_names[]. > > Signed-off-by: Anand JainMakes sense: Reviewed-by: Nikolay Borisov > --- > fs/btrfs/extent-tree.c | 18 -- > fs/btrfs/volumes.c | 15 +++ > fs/btrfs/volumes.h | 3 +++ > 3 files changed, 18 insertions(+), 18 deletions(-) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 90d28a3727c6..4742734a73d7 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -7338,24 +7338,6 @@ wait_block_group_cache_done(struct > btrfs_block_group_cache *cache) > return ret; > } > > -static const char *btrfs_raid_type_names[BTRFS_NR_RAID_TYPES] = { > - [BTRFS_RAID_RAID10] = "raid10", > - [BTRFS_RAID_RAID1] = "raid1", > - [BTRFS_RAID_DUP]= "dup", > - [BTRFS_RAID_RAID0] = "raid0", > - [BTRFS_RAID_SINGLE] = "single", > - [BTRFS_RAID_RAID5] = "raid5", > - [BTRFS_RAID_RAID6] = "raid6", > -}; > - > -static const char *get_raid_name(enum btrfs_raid_types type) > -{ > - if (type >= BTRFS_NR_RAID_TYPES) > - return NULL; > - > - return btrfs_raid_type_names[type]; > -} > - > enum btrfs_loop_type { > LOOP_CACHING_NOWAIT = 0, > LOOP_CACHING_WAIT = 1, > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 9c29fdca9075..04b8d602dc08 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -52,6 +52,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 1, > .devs_increment = 2, > .ncopies= 2, > + .raid_name ="raid10", > }, > [BTRFS_RAID_RAID1] = { > .sub_stripes= 1, > @@ -61,6 +62,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 1, > .devs_increment = 2, > .ncopies= 2, > + .raid_name ="raid1", > }, > [BTRFS_RAID_DUP] = { > .sub_stripes= 1, > @@ -70,6 +72,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > 
.tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 2, > + .raid_name ="dup", > }, > [BTRFS_RAID_RAID0] = { > .sub_stripes= 1, > @@ -79,6 +82,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 1, > + .raid_name ="raid0", > }, > [BTRFS_RAID_SINGLE] = { > .sub_stripes= 1, > @@ -88,6 +92,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 0, > .devs_increment = 1, > .ncopies= 1, > + .raid_name ="single", > }, > [BTRFS_RAID_RAID5] = { > .sub_stripes= 1, > @@ -97,6 +102,7 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 1, > .devs_increment = 1, > .ncopies= 2, > + .raid_name ="raid5", > }, > [BTRFS_RAID_RAID6] = { > .sub_stripes= 1, > @@ -106,9 +112,18 @@ const struct btrfs_raid_attr > btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { > .tolerated_failures = 2, > .devs_increment = 1, > .ncopies= 3, > + .raid_name ="raid6", > }, > }; > > +const char *get_raid_name(enum btrfs_raid_types type) > +{ > + if (type >= BTRFS_NR_RAID_TYPES) > + return NULL; > + > + return btrfs_raid_array[type].raid_name; > +} > + > const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = { > [BTRFS_RAID_RAID10] = BTRFS_BLOCK_GROUP_RAID10, > [BTRFS_RAID_RAID1] = BTRFS_BLOCK_GROUP_RAID1, > diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h > index ef220d541d4b..2acd32ce1573 100644 > --- a/fs/btrfs/volumes.h > +++ b/fs/btrfs/volumes.h > @@ -342,6 +342,7 @@ struct btrfs_raid_attr { > int tolerated_failures; /* max tolerated fail devs */ > int devs_increment; /* ndevs has to be a multiple of this */ > int ncopies;/* how many copies to data has */ > + char *raid_name;/* name of the raid */ nit: const char * ? 
> }; > > extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES]; > @@ -563,6 +564,8 @@ static inline enum btrfs_raid_types > btrfs_bg_flags_to_raid_index(u64 flags) > return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */ > } > > +const char
*get_raid_name(enum btrfs_raid_types type);
[PATCH] btrfs: kill btrfs_raid_type_names[]
Add a new member struct btrfs_raid_attr::raid_name so that btrfs_raid_array[] can maintain the name of the raid type, and so we can kill btrfs_raid_type_names[]. Signed-off-by: Anand Jain--- fs/btrfs/extent-tree.c | 18 -- fs/btrfs/volumes.c | 15 +++ fs/btrfs/volumes.h | 3 +++ 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 90d28a3727c6..4742734a73d7 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -7338,24 +7338,6 @@ wait_block_group_cache_done(struct btrfs_block_group_cache *cache) return ret; } -static const char *btrfs_raid_type_names[BTRFS_NR_RAID_TYPES] = { - [BTRFS_RAID_RAID10] = "raid10", - [BTRFS_RAID_RAID1] = "raid1", - [BTRFS_RAID_DUP]= "dup", - [BTRFS_RAID_RAID0] = "raid0", - [BTRFS_RAID_SINGLE] = "single", - [BTRFS_RAID_RAID5] = "raid5", - [BTRFS_RAID_RAID6] = "raid6", -}; - -static const char *get_raid_name(enum btrfs_raid_types type) -{ - if (type >= BTRFS_NR_RAID_TYPES) - return NULL; - - return btrfs_raid_type_names[type]; -} - enum btrfs_loop_type { LOOP_CACHING_NOWAIT = 0, LOOP_CACHING_WAIT = 1, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 9c29fdca9075..04b8d602dc08 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -52,6 +52,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 2, .ncopies= 2, + .raid_name ="raid10", }, [BTRFS_RAID_RAID1] = { .sub_stripes= 1, @@ -61,6 +62,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 2, .ncopies= 2, + .raid_name ="raid1", }, [BTRFS_RAID_DUP] = { .sub_stripes= 1, @@ -70,6 +72,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies= 2, + .raid_name ="dup", }, [BTRFS_RAID_RAID0] = { .sub_stripes= 1, @@ -79,6 +82,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { 
.tolerated_failures = 0, .devs_increment = 1, .ncopies= 1, + .raid_name ="raid0", }, [BTRFS_RAID_SINGLE] = { .sub_stripes= 1, @@ -88,6 +92,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 0, .devs_increment = 1, .ncopies= 1, + .raid_name ="single", }, [BTRFS_RAID_RAID5] = { .sub_stripes= 1, @@ -97,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 1, .devs_increment = 1, .ncopies= 2, + .raid_name ="raid5", }, [BTRFS_RAID_RAID6] = { .sub_stripes= 1, @@ -106,9 +112,18 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = { .tolerated_failures = 2, .devs_increment = 1, .ncopies= 3, + .raid_name ="raid6", }, }; +const char *get_raid_name(enum btrfs_raid_types type) +{ + if (type >= BTRFS_NR_RAID_TYPES) + return NULL; + + return btrfs_raid_array[type].raid_name; +} + const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = { [BTRFS_RAID_RAID10] = BTRFS_BLOCK_GROUP_RAID10, [BTRFS_RAID_RAID1] = BTRFS_BLOCK_GROUP_RAID1, diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index ef220d541d4b..2acd32ce1573 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -342,6 +342,7 @@ struct btrfs_raid_attr { int tolerated_failures; /* max tolerated fail devs */ int devs_increment; /* ndevs has to be a multiple of this */ int ncopies;/* how many copies to data has */ + char *raid_name;/* name of the raid */ }; extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES]; @@ -563,6 +564,8 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags) return BTRFS_RAID_SINGLE; /* BTRFS_BLOCK_GROUP_SINGLE */ } +const char *get_raid_name(enum btrfs_raid_types type); + void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info); void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans); -- 2.15.0 -- To unsubscribe from this list: send the line
Re: Btrfs progs release 4.16.1
David Sterba posted on Tue, 24 Apr 2018 13:58:57 +0200 as excerpted: > btrfs-progs version 4.16.1 have been released. This is a bugfix > release. > > Changes: > > * remove obsolete tools: btrfs-debug-tree, btrfs-zero-log, > btrfs-show-super, btrfs-calc-size Cue the admin-side gripes about developer definitions of micro-upgrade explicit "bugfix release" that allow disappearance of "obsolete tools". Arguably such removals can be expected in a "feature release", but shouldn't surprise unsuspecting admins doing a micro-version upgrade that's specifically billed as a "bugfix release". (Further support for btrfs being "still stabilizing, not yet fully stable and mature." But development mode habits need to end /sometime/, if stability is indeed a goal.) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: [PATCH V2.1] btrfs-progs: do not merge tree block refs have different root_id
On 2018年04月25日 13:19, Su Yue wrote: > For an extent item which contains many tree block backrefs, like > = > In 020-extent-ref-cases/keyed_block_ref.img > > item 10 key (29470720 METADATA_ITEM 0) itemoff 3450 itemsize 222 > refs 23 gen 10 flags TREE_BLOCK > tree block skinny level 0 > tree block backref root 278 > tree block backref root 277 > tree block backref root 276 > tree block backref root 275 > tree block backref root 274 > tree block backref root 273 > tree block backref root 272 > tree block backref root 271 > tree block backref root 270 > tree block backref root 269 > tree block backref root 268 > tree block backref root 267 > tree block backref root 266 > tree block backref root 265 > tree block backref root 264 > tree block backref root 263 > tree block backref root 262 > tree block backref root 261 > tree block backref root 260 > tree block backref root 259 > tree block backref root 258 > tree block backref root 257 > = > In find_parent_nodes(), these refs's parents are 0, then __merge_refs > will merge refs to one ref. It causes only one root to be returned. > > So, if both parents are 0, do not merge refs. > > Lowmem check calls find_parent_nodes frequently to decide whether > check an extent buffer or not. The bug influences bytes accounting. > > Signed-off-by: Su YueReviewed-by: Qu Wenruo Waiting for the kernel port of backref.c to solve the problem permanently. Thanks, Qu > --- > Changelog: > v2: > Put judgment of ref->parent above comparison. > Add the comment. > Fix typos. > v2.1: > Remove the change of adding a new line. > --- > backref.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/backref.c b/backref.c > index 51553c702187..23a394edfd02 100644 > --- a/backref.c > +++ b/backref.c > @@ -505,6 +505,12 @@ static void __merge_refs(struct pref_state *prefstate, > int mode) > if (!ref_for_same_block(ref1, ref2)) > continue; > } else { > + /* > + * Parent == 0 means that the ref is tree block > + * backref or its parent is unresolved. 
> + */ > + if (!ref1->parent || !ref2->parent) > + continue; > if (ref1->parent != ref2->parent) > continue; > } > signature.asc Description: OpenPGP digital signature