Re: [PATCH] btrfs: properly track when rescan worker is running
At 08/16/2016 12:10 AM, Jeff Mahoney wrote:

The qgroup_flags field is overloaded such that it reflects both the on-disk status of qgroups and the runtime state. The BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is used to indicate that a rescan operation is in progress, but if the file system is unmounted while a rescan is running, the rescan operation is paused. If the file system is then mounted read-only, the flag will still be present but the rescan operation will not have been resumed. When we go to umount, btrfs_qgroup_wait_for_completion will see the flag and interpret it to mean that the rescan worker is still running and will wait for a completion that will never come. This patch uses a separate flag to indicate when the worker is running. The locking and state surrounding the qgroup rescan worker needs a lot of attention beyond this patch, but this is enough to avoid a hung umount.

Cc: # v4.4+
Signed-off-by: Jeff Mahoney

Reviewed-by: Qu Wenruo

Looks good to me. Would you mind submitting a test case for it?

Thanks,
Qu

---
 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/disk-io.c | 1 +
 fs/btrfs/qgroup.c  | 9 ++++++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1771,6 +1771,7 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *qgroup_rescan_workers;
 	struct completion qgroup_rescan_completion;
 	struct btrfs_work qgroup_rescan_work;
+	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */

 	/* filesystem state */
 	unsigned long fs_state;
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2275,6 +2275,7 @@ static void btrfs_init_qgroup(struct btr
 	fs_info->quota_enabled = 0;
 	fs_info->pending_quota_state = 0;
 	fs_info->qgroup_ulist = NULL;
+	fs_info->qgroup_rescan_running = false;
 	mutex_init(&fs_info->qgroup_rescan_lock);
 }
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2302,6 +2302,10 @@ static void btrfs_qgroup_rescan_worker(s
 	int err = -ENOMEM;
 	int ret = 0;

+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	fs_info->qgroup_rescan_running = true;
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
+
 	path = btrfs_alloc_path();
 	if (!path)
 		goto out;
@@ -2368,6 +2372,9 @@ out:
 	}

 done:
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	fs_info->qgroup_rescan_running = false;
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
 	complete_all(&fs_info->qgroup_rescan_completion);
 }

@@ -2494,7 +2501,7 @@ int btrfs_qgroup_wait_for_completion(str
 	mutex_lock(&fs_info->qgroup_rescan_lock);
 	spin_lock(&fs_info->qgroup_lock);
-	running = fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	running = fs_info->qgroup_rescan_running;
 	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
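For illustration, a rough sketch of the kind of reproducer Qu is asking about, derived only from the commit message above; the device (/dev/sdX), mount point and file count are placeholders, and the umount has to land while the rescan worker is still busy, so the window is timing-dependent:

mkfs.btrfs -f /dev/sdX
mount /dev/sdX /mnt
mkdir /mnt/dir
for i in $(seq 1 10000); do echo data > /mnt/dir/file.$i; done
btrfs quota enable /mnt
btrfs quota rescan /mnt     # kick off a rescan over the many small files
umount /mnt                 # unmount mid-rescan: the rescan is paused and the
                            # RESCAN status flag stays set on disk
mount -o ro /dev/sdX /mnt   # read-only mount: the rescan is not resumed
umount /mnt                 # without the patch this hangs waiting in
                            # btrfs_qgroup_wait_for_completion()

A proper xfstests case would of course need to control the timing more carefully than a plain script like this.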
Re: About minimal device number for RAID5/6
At 08/15/2016 10:10 PM, Austin S. Hemmelgarn wrote:

On 2016-08-15 10:08, Anand Jain wrote:

IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. Any comment is welcomed.

Based on looking at the code, we do in fact support 2/3 devices for raid5/6 respectively. Personally, I agree that we should warn when trying to do this, but I absolutely don't think we should stop it from happening.

How does 2 disks RAID5 work ?

One disk is your data, the other is your parity. In essence, it works like a really computationally expensive version of RAID1 with 2 disks, which is why it's considered a degenerate configuration.

I totally agree that 2-disk raid5 is just a slow raid1.

Three disks in RAID6 is similar, but has a slight advantage at the moment in BTRFS because it's the only way to configure three disks so you can lose two and not lose any data, as we have no support for higher order replication than 2 copies yet.

It's true that btrfs doesn't support any other raid level which can provide 2 parities. But the use case of gaining the ability to lose 2 disks in a 3-disk raid6 setup seems more like a trick than a normal use case.

Either in the mkfs man page, or as a warning at mkfs time (but still allowing it), IMHO it's better to tell the user "yes, you can do it, but it's not a really good idea".

Thanks,
Qu

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs quota issues
At 08/16/2016 03:11 AM, Rakesh Sankeshi wrote:

yes, subvol level.

qgroupid  rfer       excl       max_rfer   max_excl  parent  child
--------  ----       ----       --------   --------  ------  -----
0/5       16.00KiB   16.00KiB   none       none      ---     ---
0/258     119.48GiB  119.48GiB  200.00GiB  none      ---     ---
0/259     92.57GiB   92.57GiB   200.00GiB  none      ---     ---

although I have a 200GB limit on the 2 subvols, I'm running into the issue at about 120GB and 92GB already

1) About workload

Would you mind mentioning the write pattern of your workload? Just dd data with LZO compression?

For the compression part, it's a little complicated, as the reserved data size and the on-disk extent size are different. It's possible that in some code path we leaked some reserved data space.

2) Behavior after EDQUOT

And, after EDQUOT happens, can you still write data into the subvolume?

If you can still write a lot of data (at least several gigabytes), it seems to be something related to temporary reserved space.

If not, and you can't even remove any file due to EDQUOT, then it's almost certain we have underflowed the reserved data. In that case, unmounting and mounting again will be the only workaround. (In fact, not a workaround at all.)

3) Behavior without compression

If it's OK for you, would you mind testing it without compression?

Currently we mostly rely on the assumption that the on-disk extent size is the same as the in-memory extent size (no compression). So qgroup + compression was not the main concern before and is buggy.

If qgroup works sanely without compression, at least we can be sure that the cause is qgroup + compression.

Thanks,
Qu

On Sun, Aug 14, 2016 at 7:11 PM, Qu Wenruo wrote:

At 08/12/2016 01:32 AM, Rakesh Sankeshi wrote:

I set 200GB limit to one user and 100GB to another user. as soon as I reached 139GB and 53GB each, hitting the quota errors. anyway to workaround quota functionality on btrfs LZO compressed filesystem?

Please paste "btrfs qgroup show -prce " output if you are using the btrfs qgroup/quota function.

And, AFAIK btrfs qgroup is applied to a subvolume, not a user. So did you mean you limit one subvolume that belongs to one user?

Thanks,
Qu

4.7.0-040700-generic #201608021801 SMP
btrfs-progs v4.7

Label: none  uuid: 66a78faf-2052-4864-8a52-c5aec7a56ab8
	Total devices 2 FS bytes used 150.62GiB
	devid    1 size 1.00TiB used 78.01GiB path /dev/xvdc
	devid    2 size 1.00TiB used 78.01GiB path /dev/xvde

Data, RAID0: total=150.00GiB, used=149.12GiB
System, RAID1: total=8.00MiB, used=16.00KiB
Metadata, RAID1: total=3.00GiB, used=1.49GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc       2.0T  153G  1.9T   8% /test_lzo

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
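To make the test in 3) concrete, one possible (purely illustrative) sequence, assuming LZO was enabled via the compress mount option; the mount point, subvolume name and data size are placeholders:

mount -o remount,compress=no /test_lzo
dd if=/dev/zero of=/test_lzo/subvol1/testfile bs=1M count=4096 conv=fsync
btrfs qgroup show -prce /test_lzo   # check whether rfer/excl now track the
                                    # real usage without drifting

If the limits behave correctly here, that points the finger at the qgroup + compression combination.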
Re: BTRFS constantly reports "No space left on device" even with a huge unallocated space
On Mon, Aug 15, 2016 at 5:12 PM, Ronan Chagas wrote:
> Hi guys!
>
> It happened again. The computer was completely unusable. The only useful
> message I saw was this one:
>
> http://img.ctrlv.in/img/16/08/16/57b24b0bb2243.jpg
>
> Does it help?
>
> I decided to format and reinstall tomorrow. This is a production machine and
> I have to fix this ASAP.

Looks similar to this:
https://lkml.org/lkml/2016/3/28/230

Can you describe the workload happening at the time?

--
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About minimal device number for RAID5/6
On Mon, Aug 15, 2016 at 8:30 PM, Hugo Millswrote: > On Mon, Aug 15, 2016 at 10:32:25PM +0800, Anand Jain wrote: >> >> >> On 08/15/2016 10:10 PM, Austin S. Hemmelgarn wrote: >> >On 2016-08-15 10:08, Anand Jain wrote: >> >> >> >> >> IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. >> >> Any comment is welcomed. >> >> >>>Based on looking at the code, we do in fact support 2/3 devices for >> >>>raid5/6 respectively. >> >>> >> >>>Personally, I agree that we should warn when trying to do this, but I >> >>>absolutely don't think we should stop it from happening. About a year ago I had a raid5 array in an disk upgrade situation from 5x 2TB to 4x 4TB. As intermediate I had 2x 2TB + 2x 4TB situation for several weeks. The 2x 2TB were getting really full and the fs was slow. just wondering if an enospc would happen, I started an filewrite task doing several 100 GB's and it simply did work to my surprise. At some point, chunks only occupying the 4TB disks must have been created. I also saw the expected write rate on the 4TB disks. CPU load was not especially high as far as I remember, like a raid1 fs as far as I remember. So it is good that in such a situation, one can still use the fs. I don't remember how the allocated/free space accounting was, probably not correct, but I did not fill up the whole fs to see/experience that. I have no strong opinion whether we should warn about amount of devices at mkfs time for raid56. It's just that the other known issues with raid56 draw more attention. >> >> How does 2 disks RAID5 work ? >> >One disk is your data, the other is your parity. >> >> >> >In essence, it works >> >like a really computationally expensive version of RAID1 with 2 disks, >> >which is why it's considered a degenerate configuration. >> >>How do you generate parity with only one data ? > >For plain parity calculations, parity is the value p which solves > the expression: > > x_1 XOR x_2 XOR ... XOR x_n XOR p = 0 > > for corresponding bits in the n data volumes. With one data volume, > n=1, and hence p = x_1. > >What's the problem? :) > >Hugo. > >> -Anand >> >> >> > Three disks in >> >RAID6 is similar, but has a slight advantage at the moment in BTRFS >> >because it's the only way to configure three disks so you can lose two >> >and not lose any data as we have no support for higher order replication >> >than 2 copies yet. > > -- > Hugo Mills | I always felt that as a C programmer, I was becoming > hugo@... carfax.org.uk | typecast. > http://carfax.org.uk/ | > PGP: E2AB1DE4 | -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Huge load on btrfs subvolume delete
On 15/08/16 at 10:16, "Austin S. Hemmelgarn" wrote:

ASH> With respect to databases, you might consider backing them up separately
ASH> too. In many cases for something like an SQL database, it's a lot more
ASH> flexible to have a dump of the database as a backup than it is to have
ASH> the database files themselves, because it decouples it from the
ASH> filesystem level layout.

With mysql|mariadb, getting a consistent dump requires locking tables during the dump, which is not acceptable on production servers. Even with specialised tools for hot dumps, doing the dump on prod servers is too heavy on I/O (I have huge DBs; writing the dump is expensive and long).

I used to have a slave just for the dump (easy to stop the slave, dump, and start the slave again), but after a while it wasn't able to keep up with the writes all day long (prod was on SSD and it wasn't; the dump HD was 100% busy all day long), so for me it's really easier to rsync the raw files once a day to a cheap host before dumping.

(Of course, I need to flush & lock tables during the snapshot, before the rsync, but it's just one or two seconds, still acceptable.)

--
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
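A hypothetical sketch of the flush/lock + snapshot + rsync sequence Daniel describes; the paths, snapshot name and backup host are placeholders, and it assumes the mysql client's \! shell escape works non-interactively so the read lock is held on the same connection while the snapshot is taken:

mysql -u root <<'EOF'
FLUSH TABLES WITH READ LOCK;
\! btrfs subvolume snapshot -r /var/lib/mysql /snapshots/mysql-daily
UNLOCK TABLES;
EOF

# The lock is only held for the second or two the snapshot takes; the slow
# rsync then reads from the read-only snapshot, not the live datadir.
rsync -a --delete /snapshots/mysql-daily/ backuphost:/backup/mysql/
btrfs subvolume delete --commit-after /snapshots/mysql-daily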
Re: Extents for a particular subvolume
On 03/08/16 22:55, Graham Cobb wrote: > On 03/08/16 21:37, Adam Borowski wrote: >> On Wed, Aug 03, 2016 at 08:56:01PM +0100, Graham Cobb wrote: >>> Are there any btrfs commands (or APIs) to allow a script to create a >>> list of all the extents referred to within a particular (mounted) >>> subvolume? And is it a reasonably efficient process (i.e. doesn't >>> involve backrefs and, preferably, doesn't involve following directory >>> trees)? In case anyone else is interested in this, I ended up creating some simple scripts to allow me to do this. They are slow because they walk the directory tree and they use filefrag to get the extent data, but they do let me answer questions like: * How much space am I wasting by keeping historical snapshots? * How much data is being shared between two subvolumes * How much of the data in my latest snapshot is unique to that snapshot? * How much data would I actually free up if I removed (just) these particular subvolumes? If they are useful to anyone else you can find them at: https://github.com/GrahamCobb/extents-lists If anyone knows of more efficient ways to get this information please let me know. And, of course, feel free to suggest improvements/bugfixes! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
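As a minimal (hypothetical) starting point for doing the same thing by hand, the per-file extent data for one mounted subvolume can be dumped with filefrag; the path is a placeholder, and the raw output still needs post-processing to work out which extents are shared between snapshots:

find /mnt/snapshots/2016-08-15 -xdev -type f -exec filefrag -v {} + > extents.txt
grep -c 'extents found' extents.txt   # rough count of files processed

The scripts linked above do essentially this walk, plus the bookkeeping needed to answer the space questions listed.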
Re: btrfs quota issues
yes, subvol level.

qgroupid  rfer       excl       max_rfer   max_excl  parent  child
--------  ----       ----       --------   --------  ------  -----
0/5       16.00KiB   16.00KiB   none       none      ---     ---
0/258     119.48GiB  119.48GiB  200.00GiB  none      ---     ---
0/259     92.57GiB   92.57GiB   200.00GiB  none      ---     ---

although I have 200GB limit on 2 subvols, running into issue at about 120 and 92GB itself

On Sun, Aug 14, 2016 at 7:11 PM, Qu Wenruo wrote:
>
> At 08/12/2016 01:32 AM, Rakesh Sankeshi wrote:
>>
>> I set 200GB limit to one user and 100GB to another user.
>>
>> as soon as I reached 139GB and 53GB each, hitting the quota errors.
>> anyway to workaround quota functionality on btrfs LZO compressed
>> filesystem?
>>
>
> Please paste "btrfs qgroup show -prce " output if you are using btrfs
> qgroup/quota function.
>
> And, AFAIK btrfs qgroup is applied to subvolume, not user.
>
> So did you mean limit it to one subvolume belongs to one user?
>
> Thanks,
> Qu
>
>>
>> 4.7.0-040700-generic #201608021801 SMP
>>
>> btrfs-progs v4.7
>>
>> Label: none  uuid: 66a78faf-2052-4864-8a52-c5aec7a56ab8
>>
>> Total devices 2 FS bytes used 150.62GiB
>>
>> devid    1 size 1.00TiB used 78.01GiB path /dev/xvdc
>>
>> devid    2 size 1.00TiB used 78.01GiB path /dev/xvde
>>
>> Data, RAID0: total=150.00GiB, used=149.12GiB
>>
>> System, RAID1: total=8.00MiB, used=16.00KiB
>>
>> Metadata, RAID1: total=3.00GiB, used=1.49GiB
>>
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>> Filesystem      Size  Used Avail Use% Mounted on
>>
>> /dev/xvdc       2.0T  153G  1.9T   8% /test_lzo
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About minimal device number for RAID5/6
On Mon, Aug 15, 2016 at 10:32:25PM +0800, Anand Jain wrote: > > > On 08/15/2016 10:10 PM, Austin S. Hemmelgarn wrote: > >On 2016-08-15 10:08, Anand Jain wrote: > >> > >> > IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. > > Any comment is welcomed. > > >>>Based on looking at the code, we do in fact support 2/3 devices for > >>>raid5/6 respectively. > >>> > >>>Personally, I agree that we should warn when trying to do this, but I > >>>absolutely don't think we should stop it from happening. > >> > >> > >> How does 2 disks RAID5 work ? > >One disk is your data, the other is your parity. > > > >In essence, it works > >like a really computationally expensive version of RAID1 with 2 disks, > >which is why it's considered a degenerate configuration. > >How do you generate parity with only one data ? For plain parity calculations, parity is the value p which solves the expression: x_1 XOR x_2 XOR ... XOR x_n XOR p = 0 for corresponding bits in the n data volumes. With one data volume, n=1, and hence p = x_1. What's the problem? :) Hugo. > -Anand > > > > Three disks in > >RAID6 is similar, but has a slight advantage at the moment in BTRFS > >because it's the only way to configure three disks so you can lose two > >and not lose any data as we have no support for higher order replication > >than 2 copies yet. -- Hugo Mills | I always felt that as a C programmer, I was becoming hugo@... carfax.org.uk | typecast. http://carfax.org.uk/ | PGP: E2AB1DE4 | signature.asc Description: Digital signature
Re: [GIT PULL] [PATCH v4 00/26] Delete CURRENT_TIME and CURRENT_TIME_SEC macros
On Sat, Aug 13, 2016 at 03:48:12PM -0700, Deepa Dinamani wrote:
> The series is aimed at getting rid of CURRENT_TIME and CURRENT_TIME_SEC macros.
> The macros are not y2038 safe. There is no plan to transition them into being
> y2038 safe.
> ktime_get_* api's can be used in their place. And, these are y2038 safe.

Who are you expecting to pull this huge patch series?

Why not just introduce the new api call, wait for that to be merged, and then push the individual patches through the different subsystems? After half of those get ignored, then provide a single set of patches that can go through Andrew or my trees.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: properly track when rescan worker is running
The qgroup_flags field is overloaded such that it reflects both the on-disk status of qgroups and the runtime state. The BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is used to indicate that a rescan operation is in progress, but if the file system is unmounted while a rescan is running, the rescan operation is paused. If the file system is then mounted read-only, the flag will still be present but the rescan operation will not have been resumed. When we go to umount, btrfs_qgroup_wait_for_completion will see the flag and interpret it to mean that the rescan worker is still running and will wait for a completion that will never come. This patch uses a separate flag to indicate when the worker is running. The locking and state surrounding the qgroup rescan worker needs a lot of attention beyond this patch, but this is enough to avoid a hung umount.

Cc: # v4.4+
Signed-off-by: Jeff Mahoney

---
 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/disk-io.c | 1 +
 fs/btrfs/qgroup.c  | 9 ++++++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1771,6 +1771,7 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *qgroup_rescan_workers;
 	struct completion qgroup_rescan_completion;
 	struct btrfs_work qgroup_rescan_work;
+	bool qgroup_rescan_running;	/* protected by qgroup_rescan_lock */

 	/* filesystem state */
 	unsigned long fs_state;
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2275,6 +2275,7 @@ static void btrfs_init_qgroup(struct btr
 	fs_info->quota_enabled = 0;
 	fs_info->pending_quota_state = 0;
 	fs_info->qgroup_ulist = NULL;
+	fs_info->qgroup_rescan_running = false;
 	mutex_init(&fs_info->qgroup_rescan_lock);
 }
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2302,6 +2302,10 @@ static void btrfs_qgroup_rescan_worker(s
 	int err = -ENOMEM;
 	int ret = 0;

+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	fs_info->qgroup_rescan_running = true;
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
+
 	path = btrfs_alloc_path();
 	if (!path)
 		goto out;
@@ -2368,6 +2372,9 @@ out:
 	}

 done:
+	mutex_lock(&fs_info->qgroup_rescan_lock);
+	fs_info->qgroup_rescan_running = false;
+	mutex_unlock(&fs_info->qgroup_rescan_lock);
 	complete_all(&fs_info->qgroup_rescan_completion);
 }

@@ -2494,7 +2501,7 @@ int btrfs_qgroup_wait_for_completion(str
 	mutex_lock(&fs_info->qgroup_rescan_lock);
 	spin_lock(&fs_info->qgroup_lock);
-	running = fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN;
+	running = fs_info->qgroup_rescan_running;
 	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);

--
Jeff Mahoney
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About minimal device number for RAID5/6
On 2016-08-15 10:32, Anand Jain wrote: On 08/15/2016 10:10 PM, Austin S. Hemmelgarn wrote: On 2016-08-15 10:08, Anand Jain wrote: IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. Any comment is welcomed. Based on looking at the code, we do in fact support 2/3 devices for raid5/6 respectively. Personally, I agree that we should warn when trying to do this, but I absolutely don't think we should stop it from happening. How does 2 disks RAID5 work ? One disk is your data, the other is your parity. In essence, it works like a really computationally expensive version of RAID1 with 2 disks, which is why it's considered a degenerate configuration. How do you generate parity with only one data ? You treat the data as a stripe of width 1. That's really all there is to it, it's just the same as using 3 or 4 or 5 disks, just with a smaller stripe size. In other systems, 4 is the minimum disk count for RAID5. I'm not sure why they usually disallow 3 disks (it's perfectly legitimate usage, it's just almost never seen in practice (largely because nothing supports it and erasure coding only makes sense from an economic perspective when dealing with lots of data)), but they disallow 2 because it gives no benefit over RAID1 with 2 copies and gives worse performance, not because the math doesn't work with 2 disks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About minimal device number for RAID5/6
On 08/15/2016 10:10 PM, Austin S. Hemmelgarn wrote: On 2016-08-15 10:08, Anand Jain wrote: IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. Any comment is welcomed. Based on looking at the code, we do in fact support 2/3 devices for raid5/6 respectively. Personally, I agree that we should warn when trying to do this, but I absolutely don't think we should stop it from happening. How does 2 disks RAID5 work ? One disk is your data, the other is your parity. In essence, it works like a really computationally expensive version of RAID1 with 2 disks, which is why it's considered a degenerate configuration. How do you generate parity with only one data ? -Anand Three disks in RAID6 is similar, but has a slight advantage at the moment in BTRFS because it's the only way to configure three disks so you can lose two and not lose any data as we have no support for higher order replication than 2 copies yet. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Huge load on btrfs subvolume delete
On 2016-08-15 10:06, Daniel Caillibaud wrote: Le 15/08/16 à 08:32, "Austin S. Hemmelgarn"a écrit : ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote: ASH> > I'm newbie with btrfs, and I have pb with high load after each btrfs subvolume delete […] ASH> Before I start explaining possible solutions, it helps to explain what's ASH> actually happening here. […] Thanks a lot for these clear and detailed explanations. Glad I could help. ASH> > Is there a better way to do so ? ASH> While there isn't any way I know of to do so, there are ways you can ASH> reduce the impact by reducing how much your backing up: Thanks for these clues too ! I'll use --commit-after, in order to wait for complete deletion before starting rsync the next snapshot, and I keep in mind the benefit of putting /var/log outside the main subvolume of the vm (but I guess my main pb is about databases, because their datadir are the ones with most writes). With respect to databases, you might consider backing them up separately too. In many cases for something like an SQL database, it's a lot more flexible to have a dump of the database as a backup than it is to have the database files themselves, because it decouples it from the filesystem level layout. Most good databases should be able to give you a stable dump (assuming of course that the application using the databases is sanely written) a whole lot faster than you could back up the files themselves. For the couple of databases we use internally where I work, we actually back them up separately not only to retain this flexibility, but also because we have them on a separate backup schedule from the rest of the systems because they change a lot more frequently than anything else. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About minimal device number for RAID5/6
Have a look at this.. http://www.spinics.net/lists/linux-btrfs/msg54779.html -- RAID5&6 devs_min values are in the context of degraded volume. RAID1&10.. devs_min values are in the context of healthy volume. RAID56 is correct. We already have devs_max to know the number of devices in a healthy volumes. RAID1's devs_min is wrong so it ended up being same as devs_max. -- Any comments? Also you may use the btrfs-raid-cal simulator tool to verify. https://github.com/asj/btrfs-raid-cal/blob/master/state-table Thanks, Anand On 08/15/2016 03:50 PM, Qu Wenruo wrote: Hi, Recently I found that manpage of mkfs is saying minimal device number for RAID5 and RAID6 is 2 and 3. Personally speaking, although I understand that RAID5/6 only requires 1/2 devices for parity stripe, it is still quite strange behavior. Under most case, user use raid5/6 for striping AND parity. For 2 devices RAID5, it's just a more expensive RAID1. IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. Any comment is welcomed. Thanks, Qu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About minimal device number for RAID5/6
On 2016-08-15 10:08, Anand Jain wrote: IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. Any comment is welcomed. Based on looking at the code, we do in fact support 2/3 devices for raid5/6 respectively. Personally, I agree that we should warn when trying to do this, but I absolutely don't think we should stop it from happening. How does 2 disks RAID5 work ? One disk is your data, the other is your parity. In essence, it works like a really computationally expensive version of RAID1 with 2 disks, which is why it's considered a degenerate configuration. Three disks in RAID6 is similar, but has a slight advantage at the moment in BTRFS because it's the only way to configure three disks so you can lose two and not lose any data as we have no support for higher order replication than 2 copies yet. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About minimal device number for RAID5/6
IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. Any comment is welcomed. Based on looking at the code, we do in fact support 2/3 devices for raid5/6 respectively. Personally, I agree that we should warn when trying to do this, but I absolutely don't think we should stop it from happening. How does 2 disks RAID5 work ? -Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Huge load on btrfs subvolume delete
On 15/08/16 at 08:32, "Austin S. Hemmelgarn" wrote:

ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote:
ASH> > I'm a newbie with btrfs, and I have problems with high load after each btrfs subvolume delete
[…]
ASH> Before I start explaining possible solutions, it helps to explain what's
ASH> actually happening here.
[…]

Thanks a lot for these clear and detailed explanations.

ASH> > Is there a better way to do so ?
ASH> While there isn't any way I know of to do so, there are ways you can
ASH> reduce the impact by reducing how much you're backing up:

Thanks for these clues too!

I'll use --commit-after, in order to wait for complete deletion before starting to rsync the next snapshot, and I'll keep in mind the benefit of putting /var/log outside the main subvolume of the vm (but I guess my main problem is the databases, because their datadirs are the ones with the most writes).

--
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
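A minimal sketch of the adjusted per-subvolume backup step Daniel describes, with $subvol, $snap and the rsync destination as placeholders:

btrfs subvolume snapshot -r "$subvol" "$snap"
rsync -a --delete "$snap/" "backuphost:/backup/$(basename "$subvol")/"
# Per the suggestion above, --commit-after makes the delete return only after
# the transaction is committed, so the heavy work is front-loaded instead of
# colliding with the next day's rsync.
btrfs subvolume delete --commit-after "$snap"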
Re: How to stress test raid6 on 122 disk array
On 2016-08-15 09:39, Martin wrote: That really is the case, there's currently no way to do this with BTRFS. You have to keep in mind that the raid5/6 code only went into the mainline kernel a few versions ago, and it's still pretty immature as far as kernel code goes. I don't know when (if ever) such a feature might get put in, but it's definitely something to add to the list of things that would be nice to have. For the moment, the only option to achieve something like this is to set up a bunch of separate 8 device filesystems, but I would be willing to bet that the way you have it configured right now is closer to what most people would be doing in a regular deployment, and therefore is probably more valuable for testing. I see. Right now on our +500TB zfs filesystems we used raid6 with a 6 disk vdev, which is often in the zfs world, and for btrfs I would be the same when stable/possible. A while back there was talk of implementing a system where you could specify any arbitrary number of replicas, stripes or parity (for example, if you had 16 devices, you could tell it to do two copies with double parity using full width stripes), and in theory, it would be possible there (parity level of 2 with a stripe width of 6 or 8 depending on how it's implemented), but I don't think it's likely that that functionality will exist any time soon. Implementing such a system would pretty much require re-writing most of the allocation code (which probably would be a good idea for other reasons now too), and that's not likely to happen given the amount of coding that went into the raid5/6 support. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to stress test raid6 on 122 disk array
On Mon, Aug 15, 2016 at 7:38 AM, Martinwrote: >> Looking at the kernel log itself, you've got a ton of write errors on >> /dev/sdap. I would suggest checking that particular disk with smartctl, and >> possibly checking the other hardware involved (the storage controller and >> cabling). >> >> I would kind of expect BTRFS to crash with that many write errors regardless >> of what profile is being used, but we really should get better about >> reporting errors to user space in a sane way (making people dig through >> kernel logs to figure out their having issues like this is not particularly >> user friendly). > > Interesting! > > Why does it speak of "device sdq" and /dev/sdap ? > > [337411.703937] BTRFS error (device sdq): bdev /dev/sdap errs: wr > 36973, rd 0, flush 1, corrupt 0, gen 0 > [337411.704658] BTRFS warning (device sdq): lost page write due to IO > error on /dev/sdap > > /dev/sdap doesn't exist. OK well journalctl -b | grep -A10 -B10 "sdap" See in what other context it appears. And also 'btrfs fi show' and see if it appears associated with this Btrfs volume. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to stress test raid6 on 122 disk array
On 2016-08-15 09:38, Martin wrote: Looking at the kernel log itself, you've got a ton of write errors on /dev/sdap. I would suggest checking that particular disk with smartctl, and possibly checking the other hardware involved (the storage controller and cabling). I would kind of expect BTRFS to crash with that many write errors regardless of what profile is being used, but we really should get better about reporting errors to user space in a sane way (making people dig through kernel logs to figure out their having issues like this is not particularly user friendly). Interesting! Why does it speak of "device sdq" and /dev/sdap ? [337411.703937] BTRFS error (device sdq): bdev /dev/sdap errs: wr 36973, rd 0, flush 1, corrupt 0, gen 0 [337411.704658] BTRFS warning (device sdq): lost page write due to IO error on /dev/sdap /dev/sdap doesn't exist. I'm not quite certain, something in the kernel might have been confused, but it's hard to be sure. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to stress test raid6 on 122 disk array
> That really is the case, there's currently no way to do this with BTRFS. > You have to keep in mind that the raid5/6 code only went into the mainline > kernel a few versions ago, and it's still pretty immature as far as kernel > code goes. I don't know when (if ever) such a feature might get put in, but > it's definitely something to add to the list of things that would be nice to > have. > > For the moment, the only option to achieve something like this is to set up > a bunch of separate 8 device filesystems, but I would be willing to bet that > the way you have it configured right now is closer to what most people would > be doing in a regular deployment, and therefore is probably more valuable > for testing. > I see. Right now on our +500TB zfs filesystems we used raid6 with a 6 disk vdev, which is often in the zfs world, and for btrfs I would be the same when stable/possible. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to stress test raid6 on 122 disk array
> Looking at the kernel log itself, you've got a ton of write errors on > /dev/sdap. I would suggest checking that particular disk with smartctl, and > possibly checking the other hardware involved (the storage controller and > cabling). > > I would kind of expect BTRFS to crash with that many write errors regardless > of what profile is being used, but we really should get better about > reporting errors to user space in a sane way (making people dig through > kernel logs to figure out their having issues like this is not particularly > user friendly). Interesting! Why does it speak of "device sdq" and /dev/sdap ? [337411.703937] BTRFS error (device sdq): bdev /dev/sdap errs: wr 36973, rd 0, flush 1, corrupt 0, gen 0 [337411.704658] BTRFS warning (device sdq): lost page write due to IO error on /dev/sdap /dev/sdap doesn't exist. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to stress test raid6 on 122 disk array
On Mon, Aug 15, 2016 at 6:19 AM, Martin wrote:
>
> I have now had the first crash, can you take a look if I have provided
> the needed info?
>
> https://bugzilla.kernel.org/show_bug.cgi?id=153141

[337406.626175] BTRFS warning (device sdq): lost page write due to IO error on /dev/sdap

Anytime there are I/O-related errors like that, you need to go back farther in the log to find out what really happened. You can play around with 'journalctl --since' for this. It'll accept things like -1m or -2h for "back one minute or back two hours", or also "today", "yesterday", or an explicit date and time.

--
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 10/26] fs: btrfs: Use ktime_get_real_ts for root ctime
On Sat, Aug 13, 2016 at 03:48:22PM -0700, Deepa Dinamani wrote: > btrfs_root_item maintains the ctime for root updates. > This is not part of vfs_inode. > > Since current_time() uses struct inode* as an argument > as Linus suggested, this cannot be used to update root > times unless, we modify the signature to use inode. > > Since btrfs uses nanosecond time granularity, it can also > use ktime_get_real_ts directly to obtain timestamp for > the root. It is necessary to use the timespec time api > here because the same btrfs_set_stack_timespec_*() apis > are used for vfs inode times as well. These can be > transitioned to using timespec64 when btrfs internally > changes to use timespec64 as well. > > Signed-off-by: Deepa Dinamani> Acked-by: David Sterba > Reviewed-by: Arnd Bergmann > Cc: Chris Mason > Cc: David Sterba Acked-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to stress test raid6 on 122 disk array
On 2016-08-15 08:19, Martin wrote: I'm not sure what Arch does any differently to their kernels from kernel.org kernels. But bugzilla.kernel.org offers a Mainline and Fedora drop down for identifying the kernel source tree. IIRC, they're pretty close to mainline kernels. I don't think they have any patches in the filesystem or block layer code at least, but I may be wrong, it's been a long time since I looked at an Arch kernel. Perhaps I should use Arch then, as Fedora rawhide kernel wouldn't boot on my hw, so I am running the stock Fedora 24 kernel right now for the tests... If I want to compile a mainline kernel. Are there anything I need to tune? Fedora kernels do not have these options set. # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set # CONFIG_BTRFS_DEBUG is not set # CONFIG_BTRFS_ASSERT is not set The sanity and integrity tests are both compile time and mount time options, i.e. it has to be compiled enabled for the mount option to do anything. I can't recall any thread where a developer asked a user to set any of these options for testing though. FWIW, I actually have the integrity checking code built in on most kernels I build. I don't often use it, but it has near zero overhead when not enabled, and it's helped me track down lower-level storage configuration issues on occasion. I'll give that a shot tomorrow. When I do the tests, how do I log the info you would like to see, if I find a bug? bugzilla.kernel.org for tracking, and then reference the URL for the bug with a summary in an email to list is how I usually do it. The main thing is going to be the exact reproduce steps. It's also better, I think, to have complete dmesg (or journalctl -k) attached to the bug report because not all problems are directly related to Btrfs, they can have contributing factors elsewhere. And various MTAs, or more commonly MUAs, have a tendancy to wrap such wide text as found in kernel or journald messages. Aside from kernel messages, the other general stuff you want to have is: 1. Kernel version and userspace tools version (`uname -a` and `btrfs --version`) 2. Any underlying storage configuration if it's not just plain a SSD/HDD or partitions (for example, usage of dm-crypt, LVM, mdadm, and similar things). 3. Output from `btrfs filesystem show` (this can be trimmed to the filesystem that's having the issue). 4. If you can still mount the filesystem, `btrfs filesystem df` output can be helpful. 5. If you can't mount the filesystem, output from `btrfs check` run without any options will usually be asked for. I have now had the first crash, can you take a look if I have provided the needed info? https://bugzilla.kernel.org/show_bug.cgi?id=153141 How long should I keep the host untouched? Or is all interesting idea provided? Looking at the kernel log itself, you've got a ton of write errors on /dev/sdap. I would suggest checking that particular disk with smartctl, and possibly checking the other hardware involved (the storage controller and cabling). I would kind of expect BTRFS to crash with that many write errors regardless of what profile is being used, but we really should get better about reporting errors to user space in a sane way (making people dig through kernel logs to figure out their having issues like this is not particularly user friendly). -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
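A small helper along these lines (hypothetical, with the mount point as a placeholder) can collect everything listed above in one go for attaching to the bugzilla entry:

{
  uname -a
  btrfs --version
  btrfs filesystem show
  btrfs filesystem df /mnt/btrfs-raid6
  journalctl -k --no-pager        # full kernel log for the current boot
} > btrfs-bug-report.txt 2>&1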
Re: How to stress test raid6 on 122 disk array
On 2016-08-15 08:19, Martin wrote: The smallest disk of the 122 is 500GB. Is it possible to have btrfs see each disk as only e.g. 10GB? That way I can corrupt and resilver more disks over a month. Well, at least you can easily partition the devices for that to happen. Can it be done with btrfs or should I do it with gdisk? With gdisk. BTRFS includes some volume management features, but it doesn't handle partitioning itself. However, I would also suggest that would it be more useful use of the resource to run many arrays in parallel? Ie. one 6-device raid6, one 20-device raid6, and then perhaps use the rest of the devices for a very large btrfs filesystem? Or if you have been using partitioning the large btrfs volume can also be composed of all the 122 devices; in fact you could even run multiple 122-device raid6s and use different kind of testing on each. For performance testing you might only excert one of the file systems at a time, though. Very interesting idea, which leads me to the following question: For the past weeks have I had all 122 disks in one raid6 filesystem, and since I didn't entered any vdev (zfs term) size, I suspect only 2 of the 122 disks are parity. If, how can I make the filesystem, so for every 6 disks, 2 of them are parity? Reading the mkfs.btrfs man page gives me the impression that it can't be done, which I find hard to believe. That really is the case, there's currently no way to do this with BTRFS. You have to keep in mind that the raid5/6 code only went into the mainline kernel a few versions ago, and it's still pretty immature as far as kernel code goes. I don't know when (if ever) such a feature might get put in, but it's definitely something to add to the list of things that would be nice to have. For the moment, the only option to achieve something like this is to set up a bunch of separate 8 device filesystems, but I would be willing to bet that the way you have it configured right now is closer to what most people would be doing in a regular deployment, and therefore is probably more valuable for testing. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
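A rough sketch of that approach, using sgdisk (the scriptable counterpart of the gdisk tool mentioned above); the device names, the 10 GiB partition size and the grouping into sets of 8 are all placeholders:

# Carve a small test partition out of each disk...
for dev in /dev/sd{b..i}; do
    sgdisk --zap-all "$dev"
    sgdisk -n 1:0:+10G -t 1:8300 "$dev"
done
# ...then build an independent raid6 filesystem per group of 8 partitions.
mkfs.btrfs -f -d raid6 -m raid6 /dev/sd{b..i}1

Repeating this over further groups of disks gives several small, independent arrays that can be corrupted and rebuilt in parallel.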
Re: Huge load on btrfs subvolume delete
On 2016-08-15 06:39, Daniel Caillibaud wrote: Hi, I'm newbie with btrfs, and I have pb with high load after each btrfs subvolume delete I use snapshots on lxc hosts under debian jessie with - kernel 4.6.0-0.bpo.1-amd64 - btrfs-progs 4.6.1-1~bpo8 For backup, I have each day, for each subvolume btrfs subvolume snapshot -r $subvol $snap # then later ionice -c3 btrfs subvolume delete $snap but ionice doesn't seems to have any effect here and after a few minutes the load grows up quite high (30~40), and I don't know how to make this deletion nicer with I/O Before I start explaining possible solutions, it helps to explain what's actually happening here. When you create a snapshot, BTRFS just scans down the tree for the subvolume in question and creates new references to everything in that subvolume in a separate tree. This is usually insanely fast because all that needs to be done is updating metadata. When you delete a snapshot however, it has to remove any remaining references within the snapshot to the parent subvolume, and also has to process any changed data that is now different from the parent subvolume for deletion just like it would for deleting a file. As a result of this, the work to create a snapshot only depends on the complexity of the directory structure within the subvolume, while the work to delete it depends on both that and how much the snapshot has changed from the parent subvolume. The spike in load your seeing is the filesystem handling all that internal accounting in the background, and I'd be willing to bet that it varies based on how fast things are changing in the parent subvolume. Setting idle I/O scheduling priority on the command to delete the snapshot does nothing because all that command does is tell the kernel to delete the snapshot, the actual deletion is handled in the filesystem driver. While it won't help with the spike in load, you probably want to add `--commit-after` to that subvolume deletion command. That will cause the spike to happen almost immediately, and the command won't return until the filesystem is finished with the accounting and thus the load should be back to normal when it returns. Is there a better way to do so ? While there isn't any way I know of to do so, there are ways you can reduce the impact by reducing how much your backing up: 1. You almost certainly don't need to back up the logs, and if you do, they should probably be backed up independently from the rest of the system image. In most cases, logs just add extra size to a backup, and have little value when you restore a backup, so it makes little sense in most cases to include them in a backup. The simplest way to exclude them in your case is to make /var/log in the LXC containers be a separate subvolume. This will exclude it from the snapshot for the backup, which will both speed up the backup, and reduce the amount of changes from the parent that occur while creating the backup. 2. Assuming you're using a distribution compliant with the filesystem hierarchy standard, there are a couple of directories you can safely exclude from all backups simply because portable programs are designed to handle losing data from these directories gracefully. Such directories include /tmp, /var/tmp, and /var/cache, and they can be excluded the same way as /var/log. 3. Similar arguments apply to $HOME/.cache, which is essentially a per-user /var/cache. This is less likely to have an impact if you don't have individual users doing things on these systems. 4. 
Look for other similar areas you may be able to safely exclude. For example, I use Gentoo, and I build all my packages with external debugging symbols which get stored in /usr/lib/debug. I only have this set up for convenience, so there's no point in me backing it up because I can just rebuild the package to regenerate the debugging symbols if I need them after restoring from a backup. Similarly, I also exclude any VCS repositories that I have copies of elsewhere, simply because I can just clone that copy if I need it. Is it a bad idea to set ionice -c3 on the btrfs-transacti process which seems the one doing a lot of I/O ? Yes, it's always a bad idea to mess with any scheduling properties other than CPU affinity for kernel threads (and even messing with CPU affinity is usually a bad idea too). The btrfs-transaction kthread (the name gets cut off by the length limits built into the kernel) is a particularly bad one to mess with, because it handles committing updates to the filesystem. Setting an idle scheduling priority on it would probably put you at severe risk of data loss or cause your system to lock up. Actually my io priority on btrfs process are ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}' [btrfs-worker] none: prio 4 [btrfs-worker-hi] none: prio 4 [btrfs-delalloc] none: prio 4 [btrfs-flush_del] none: prio 4 [btrfs-cache] none:
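As an illustration of the first point in the list above, a hypothetical way to turn /var/log of an existing container into its own subvolume (the paths are placeholders, and the container should be stopped while this is done):

cd /var/lib/lxc/mycontainer/rootfs/var
mv log log.old
btrfs subvolume create log
cp -a log.old/. log/        # carry the existing logs over
rm -rf log.old

Since snapshots are not recursive, later snapshots of the container's root subvolume will simply leave the new log subvolume out.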
Re: How to stress test raid6 on 122 disk array
>> The smallest disk of the 122 is 500GB. Is it possible to have btrfs >> see each disk as only e.g. 10GB? That way I can corrupt and resilver >> more disks over a month. > > Well, at least you can easily partition the devices for that to happen. Can it be done with btrfs or should I do it with gdisk? > However, I would also suggest that would it be more useful use of the > resource to run many arrays in parallel? Ie. one 6-device raid6, one > 20-device raid6, and then perhaps use the rest of the devices for a very > large btrfs filesystem? Or if you have been using partitioning the large > btrfs volume can also be composed of all the 122 devices; in fact you > could even run multiple 122-device raid6s and use different kind of > testing on each. For performance testing you might only excert one of > the file systems at a time, though. Very interesting idea, which leads me to the following question: For the past weeks have I had all 122 disks in one raid6 filesystem, and since I didn't entered any vdev (zfs term) size, I suspect only 2 of the 122 disks are parity. If, how can I make the filesystem, so for every 6 disks, 2 of them are parity? Reading the mkfs.btrfs man page gives me the impression that it can't be done, which I find hard to believe. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to stress test raid6 on 122 disk array
>> I'm not sure what Arch does any differently to their kernels from >> kernel.org kernels. But bugzilla.kernel.org offers a Mainline and >> Fedora drop down for identifying the kernel source tree. > > IIRC, they're pretty close to mainline kernels. I don't think they have any > patches in the filesystem or block layer code at least, but I may be wrong, > it's been a long time since I looked at an Arch kernel. Perhaps I should use Arch then, as Fedora rawhide kernel wouldn't boot on my hw, so I am running the stock Fedora 24 kernel right now for the tests... >>> If I want to compile a mainline kernel. Are there anything I need to >>> tune? >> >> >> Fedora kernels do not have these options set. >> >> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set >> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set >> # CONFIG_BTRFS_DEBUG is not set >> # CONFIG_BTRFS_ASSERT is not set >> >> The sanity and integrity tests are both compile time and mount time >> options, i.e. it has to be compiled enabled for the mount option to do >> anything. I can't recall any thread where a developer asked a user to >> set any of these options for testing though. > FWIW, I actually have the integrity checking code built in on most kernels I > build. I don't often use it, but it has near zero overhead when not > enabled, and it's helped me track down lower-level storage configuration > issues on occasion. I'll give that a shot tomorrow. >>> When I do the tests, how do I log the info you would like to see, if I >>> find a bug? >> >> >> bugzilla.kernel.org for tracking, and then reference the URL for the >> bug with a summary in an email to list is how I usually do it. The >> main thing is going to be the exact reproduce steps. It's also better, >> I think, to have complete dmesg (or journalctl -k) attached to the bug >> report because not all problems are directly related to Btrfs, they >> can have contributing factors elsewhere. And various MTAs, or more >> commonly MUAs, have a tendancy to wrap such wide text as found in >> kernel or journald messages. > > Aside from kernel messages, the other general stuff you want to have is: > 1. Kernel version and userspace tools version (`uname -a` and `btrfs > --version`) > 2. Any underlying storage configuration if it's not just plain a SSD/HDD or > partitions (for example, usage of dm-crypt, LVM, mdadm, and similar things). > 3. Output from `btrfs filesystem show` (this can be trimmed to the > filesystem that's having the issue). > 4. If you can still mount the filesystem, `btrfs filesystem df` output can > be helpful. > 5. If you can't mount the filesystem, output from `btrfs check` run without > any options will usually be asked for. I have now had the first crash, can you take a look if I have provided the needed info? https://bugzilla.kernel.org/show_bug.cgi?id=153141 How long should I keep the host untouched? Or is all interesting idea provided? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About minimal device number for RAID5/6
On 2016-08-15 03:50, Qu Wenruo wrote: Hi, Recently I found that manpage of mkfs is saying minimal device number for RAID5 and RAID6 is 2 and 3. Personally speaking, although I understand that RAID5/6 only requires 1/2 devices for parity stripe, it is still quite strange behavior. Under most case, user use raid5/6 for striping AND parity. For 2 devices RAID5, it's just a more expensive RAID1. IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. Any comment is welcomed. Based on looking at the code, we do in fact support 2/3 devices for raid5/6 respectively. Personally, I agree that we should warn when trying to do this, but I absolutely don't think we should stop it from happening. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: checksum error in metadata node - best way to move root fs to new drive?
On 2016-08-12 11:06, Duncan wrote: Austin S. Hemmelgarn posted on Fri, 12 Aug 2016 08:04:42 -0400 as excerpted: On a file server? No, I'd ensure proper physical security is established and make sure it's properly secured against network based attacks and then not worry about it. Unless you have things you want to hide from law enforcement or your government (which may or may not be legal where you live) or can reasonably expect someone to steal the system, you almost certainly don't actually need whole disk encryption. There are two specific exceptions to this though: 1. If your employer requires encryption on this system, that's their call. 2. Encrypted swap is a good thing regardless, because it prevents security credentials from accidentally being written unencrypted to persistent storage. In the US, medical records are pretty well protected under penalty of law (HIPPA, IIRC?). Anyone storing medical records here would do well to have full filesystem encryption for that reason. Of course financial records are sensitive as well, or even just forum login information, and then there's the various industrial spies from various countries (China being the one most frequently named) that would pay good money for unencrypted devices from the right sources. Medical and even financial records really fall under my first exception, but it's still no substitute for proper physical security. As far as user account information, that depends on what your legal or PR department promised, but in many cases there, there's minimal improvement in security when using full disk encryption in place of just encrypting the database file used to store the information. In either case though, it's still a better investment in terms of both time and money to properly secure the network and physical access to the hardware. All that disk encryption protects is data at rest, and for a _server_ system, the data is almost always online, and therefore lack of protection of the system as a whole is usually more of a security issue in general than lack of protection for a single disk that's powered off. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Huge load on btrfs subvolume delete
Hi, I'm newbie with btrfs, and I have pb with high load after each btrfs subvolume delete I use snapshots on lxc hosts under debian jessie with - kernel 4.6.0-0.bpo.1-amd64 - btrfs-progs 4.6.1-1~bpo8 For backup, I have each day, for each subvolume btrfs subvolume snapshot -r $subvol $snap # then later ionice -c3 btrfs subvolume delete $snap but ionice doesn't seems to have any effect here and after a few minutes the load grows up quite high (30~40), and I don't know how to make this deletion nicer with I/O Is there a better way to do so ? Is it a bad idea to set ionice -c3 on the btrfs-transacti process which seems the one doing a lot of I/O ? Actually my io priority on btrfs process are ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}' [btrfs-worker] none: prio 4 [btrfs-worker-hi] none: prio 4 [btrfs-delalloc] none: prio 4 [btrfs-flush_del] none: prio 4 [btrfs-cache] none: prio 4 [btrfs-submit] none: prio 4 [btrfs-fixup] none: prio 4 [btrfs-endio] none: prio 4 [btrfs-endio-met] none: prio 4 [btrfs-endio-met] none: prio 4 [btrfs-endio-rai] none: prio 4 [btrfs-endio-rep] none: prio 4 [btrfs-rmw] none: prio 4 [btrfs-endio-wri] none: prio 4 [btrfs-freespace] none: prio 4 [btrfs-delayed-m] none: prio 4 [btrfs-readahead] none: prio 4 [btrfs-qgroup-re] none: prio 4 [btrfs-extent-re] none: prio 4 [btrfs-cleaner] none: prio 0 [btrfs-transacti] none: prio 0 Thanks -- Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
About minimal device number for RAID5/6
Hi, Recently I found that manpage of mkfs is saying minimal device number for RAID5 and RAID6 is 2 and 3. Personally speaking, although I understand that RAID5/6 only requires 1/2 devices for parity stripe, it is still quite strange behavior. Under most case, user use raid5/6 for striping AND parity. For 2 devices RAID5, it's just a more expensive RAID1. IMHO it's better to warn user about 2 devices RAID5 or 3 devices RAID6. Any comment is welcomed. Thanks, Qu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] code cleanup
On Sun, Aug 14, 2016 at 04:11:31PM -0400, Harinath Nampally wrote:
> This patch checks the ret value and jumps to clean up in case the
> btrfs_add_system_chunk call fails
>
> Signed-off-by: Harinath Nampally
> ---
>  fs/btrfs/volumes.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 366b335..fedb301 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4880,12 +4880,15 @@ int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans,
>
>  	ret = btrfs_insert_item(trans, chunk_root, &key, chunk, item_size);
>  	if (ret == 0 && map->type & BTRFS_BLOCK_GROUP_SYSTEM) {
> -		/*
> -		 * TODO: Cleanup of inserted chunk root in case of
> -		 * failure.
> -		 */
>  		ret = btrfs_add_system_chunk(chunk_root, &key, chunk,
>  					     item_size);
> +		if (ret) {
> +			/*
> +			 * Cleanup of inserted chunk root in case of
> +			 * failure.
> +			 */
> +			goto out;
> +		}
>  	}
>
>  out:

NAK. This patch doesn't do anything. That's just jumping to the exact same location that we were previously returning to anyways.

--
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html