Re: RAID6 questions
Hi, FYI: RAID5/6 hit mainline in 3.9; with 3.8 you will not be able to use those RAID levels. Regards, Felix

On Sat, Jun 1, 2013 at 11:23 PM, Hugo Mills h...@carfax.org.uk wrote:
> On Sat, Jun 01, 2013 at 02:07:53PM -0700, ronnie sahlberg wrote:
>> Hi List, I have a filesystem that is spanning about 10 devices. It is currently using RAID1 for both data and metadata. In order to get higher availability and be able to handle multi-device failures, I would like to change from RAID1 to RAID6. Is it possible/stable/supported/recommended to change data from RAID1 to RAID6? (I assume btrfs fi balance ... is used for this?)
>
> Yes.
>
>> Metadata is currently RAID1; is it supported to put metadata as RAID6 too? It would be odd to have lesser protection for metadata than data. Optimally I would like a mode where metadata is mirrored onto all the spindles in the filesystem, not just 2 in RAID1 or n in RAID6.
>
> Yes, that should be supported.
>
>> I'm running a 3.8.0 kernel.
>
> The btrfs RAID-5 and RAID-6 implementations aren't really ready for production use, so right now I wouldn't recommend using them for anything other than testing purposes with data that's replaceable.
>
> Hugo.
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
> PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
> --- w.w.w. : England's batting scorecard ---
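For reference, the RAID1-to-RAID6 conversion asked about above is done with balance filters. A minimal hedged sketch, assuming a 3.9+ kernel with the RAID5/6 code and a btrfs-progs that supports the convert filters (the mountpoint name is illustrative):

    # convert both data and metadata chunks from RAID1 to RAID6;
    # a full balance rewrites every chunk, so this can take a long time
    btrfs balance start -dconvert=raid6 -mconvert=raid6 /mnt/pool

    # watch progress from another shell
    btrfs balance status /mnt/pool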
Is there a way to flag specific directories nodatacow?
I am seeing massive journal corruptions that seem to be unique to btrfs and I am suspecting that cow might be causing them. My band-aid fix for this will be to mark the /var filesystem nodatacow at boot. But I am wondering if there is any way to flag a particular directory as nodatacow outside of the mount process. I would like to be able to mark /var/log/journal as nodatacow, for example, without having to declare it a subvolume and mount it separately.
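A hedged sketch of the boot-time band-aid described above, assuming /var is a separate btrfs filesystem on its own device (device name is illustrative). nodatacow is a real btrfs mount option, but note it applies to the whole filesystem and also means no data checksumming for files written under it:

    # /etc/fstab entry marking all of /var nodatacow at mount time
    /dev/sdb1   /var   btrfs   defaults,nodatacow   0 0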
Possible solution to the open_ctree boot bug ...
I am seeing a huge improvement in boot performance since doing a system-wide, file-by-file defragmentation of metadata. In fact, in the four sequential boots since completing this process, I have not seen one open_ctree failure so far. This leads me to suspect that the open_ctree boot failures that have been plaguing me since install have been related to metadata fragmentation. So I would advise anyone else experiencing open_ctree boot problems to defragment their metadata and see if that helps. It certainly seems to have helped me in that regard.
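The poster doesn't show the exact commands used; a hedged sketch of one way to do such a file-by-file pass (assuming btrfs-progs with the `btrfs filesystem defragment` subcommand, and that running it on a directory path operates on the metadata describing that directory rather than on file contents):

    # defragment the metadata for every directory on the root filesystem
    find / -xdev -type d -exec btrfs filesystem defragment {} \;

    # optionally also defragment file data, file by file
    find / -xdev -type f -exec btrfs filesystem defragment {} \;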
Re: [PATCH 2/6] Btrfs-progs: add btrfsck functionality to btrfs
Hi everybody,

On 08.02.2013 01:36, Ian Kumlien wrote:
> diff --git a/cmds-check.c b/cmds-check.c
> index 71e98de..8e4cce0 100644
> --- a/cmds-check.c
> +++ b/cmds-check.c
> [...]
> @@ -3574,7 +3579,8 @@ int main(int ac, char **av)
>  				(unsigned long long)bytenr);
>  			break;
>  		case '?':
> -			print_usage();
> +		case 'h':
> +			usage(cmd_check_usage);
>  		}
>  		if (option_index == 1) {
>  			printf("enabling repair mode\n");

For this to have any effect, 'h' must be added to getopt_long(), see attached patch 1. However, this results in btrfsck -h and --help doing different things: --help prints the usage message to stdout and exits with exit(0); -h prints the usage message to stderr and exits with exit(129). I made a patch to fix this, see attached patch 2. What it doesn't fix, though, is that -h/--help and -? don't do the same thing. This is more complicated, as getopt_long() returns '?' for unknown options.

Cheers,
Dieter

From 11aabdb018aed3c5b6a1616178883fd879152856 Mon Sep 17 00:00:00 2001
From: Dieter Ries m...@dieterries.net
Date: Sun, 2 Jun 2013 17:30:09 +0200
Subject: [PATCH 1/2] Btrfs-progs: Fix 'btrfsck/btrfs check -h'

For the '-h' option to be usable, getopt_long() has to know about it.

Signed-off-by: Dieter Ries m...@dieterries.net
---
 cmds-check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 1e5e005..ff9298d 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4065,7 +4065,7 @@ int cmd_check(int argc, char **argv)
 	while(1) {
 		int c;
-		c = getopt_long(argc, argv, "as:", long_options,
+		c = getopt_long(argc, argv, "ahs:", long_options,
 				&option_index);
 		if (c < 0)
 			break;
-- 
1.8.1.3

From 52d9e47bfa0936a14baa48e8ad6ecdd820295809 Mon Sep 17 00:00:00 2001
From: Dieter Ries m...@dieterries.net
Date: Sun, 2 Jun 2013 17:32:15 +0200
Subject: [PATCH 2/2] Btrfs-progs: Fix '--help' to '-h' inconsistency in btrfsck/btrfs check

This patch fixes the following inconsistency between calling btrfsck/btrfs check with the -h or --help options: --help prints the usage message to stdout and exits with exit(0); -h prints the usage message to stderr and exits with exit(129). To achieve this, usage_command_usagestr() is made available via commands.h.
Signed-off-by: Dieter Ries m...@dieterries.net
---
 cmds-check.c | 5 ++++-
 commands.h   | 2 ++
 help.c       | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index ff9298d..093c859 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4078,8 +4078,11 @@ int cmd_check(int argc, char **argv)
 				(unsigned long long)bytenr);
 			break;
 		case '?':
-		case 'h':
 			usage(cmd_check_usage);
+			break;
+		case 'h':
+			usage_command_usagestr(cmd_check_usage, "check", 1, 0);
+			exit(0);
 		}
 		if (option_index == 1) {
 			printf("enabling repair mode\n");
diff --git a/commands.h b/commands.h
index 15c616d..814452f 100644
--- a/commands.h
+++ b/commands.h
@@ -73,6 +73,8 @@ extern const char * const generic_cmd_help_usage[];
 void usage(const char * const *usagestr);
 void usage_command(const struct cmd_struct *cmd, int full, int err);
 void usage_command_group(const struct cmd_group *grp, int all, int err);
+void usage_command_usagestr(const char * const *usagestr,
+		const char *token, int full, int err);
 void help_unknown_token(const char *arg, const struct cmd_group *grp);
 void help_ambiguous_token(const char *arg, const struct cmd_group *grp);
diff --git a/help.c b/help.c
index 6d04293..effb72e 100644
--- a/help.c
+++ b/help.c
@@ -102,7 +102,7 @@ static int usage_command_internal(const char * const *usagestr,
 	return ret;
 }
 
-static void usage_command_usagestr(const char * const *usagestr,
+void usage_command_usagestr(const char * const *usagestr,
 				   const char *token, int full, int err)
 {
 	FILE *outf = err ? stderr : stdout;
-- 
1.8.1.3
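For illustration only, a hedged sketch of the behaviour the two patches aim for (the shell session is illustrative, not captured from an actual patched build):

    $ btrfsck --help > /dev/null; echo $?
    0
    $ btrfsck -h > /dev/null; echo $?
    0    # before patch 2, -h printed the usage text to stderr and exited with 129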
RAID10 total capacity incorrect
Hi list,

I have a 4-device RAID10 array of 2TB drives on btrfs. It works great. I recently added an additional 4 drives to the array. There is only about 2TB in use across the whole array (which should have an effective capacity of about 8TB). However, I have noticed that when I issue btrfs filesystem df against the mountpoint, in the total field I get the same value as the used field:

root@mckinley:/# btrfs fi df /mnt/shares/btrfsvol0
Data, RAID10: total=2.06TB, used=2.06TB
System, RAID10: total=64.00MB, used=188.00KB
System: total=4.00MB, used=0.00
Metadata, RAID10: total=3.00GB, used=2.29GB

Here's my btrfs filesystem show:

root@mckinley:/# btrfs fi show
Label: 'btrfsvol0'  uuid: 1a735971-3ad7-4046-b25b-e834a74f2fbb
	Total devices 8 FS bytes used 2.06TB
	devid 7 size 1.82TB used 527.77GB path /dev/sdk1
	devid 8 size 1.82TB used 527.77GB path /dev/sdg1
	devid 6 size 1.82TB used 527.77GB path /dev/sdi1
	devid 5 size 1.82TB used 527.77GB path /dev/sde1
	devid 4 size 1.82TB used 527.77GB path /dev/sdj1
	devid 2 size 1.82TB used 527.77GB path /dev/sdf1
	devid 1 size 1.82TB used 527.77GB path /dev/sdh1
	devid 3 size 1.82TB used 527.77GB path /dev/sdc1

This is running the Ubuntu build of kernel 3.9.4 and btrfs-progs from git (v0.20-rc1-324-g650e656). Am I being an idiot and missing something here? I must admit that I still find the df output a bit cryptic (entirely my failure to understand, nothing else), but on another system with only a single device the total field returns the capacity of the device.

Cheers!

---tim
Re: RAID10 total capacity incorrect
On Sun, Jun 02, 2013 at 05:17:11PM +0100, Tim Eggleston wrote:
> Hi list, I have a 4-device RAID10 array of 2TB drives on btrfs. It works great. I recently added an additional 4 drives to the array. There is only about 2TB in use across the whole array (which should have an effective capacity of about 8TB). However, I have noticed that when I issue btrfs filesystem df against the mountpoint, in the total field I get the same value as the used field:
>
> root@mckinley:/# btrfs fi df /mnt/shares/btrfsvol0
> Data, RAID10: total=2.06TB, used=2.06TB
> System, RAID10: total=64.00MB, used=188.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID10: total=3.00GB, used=2.29GB
>
> Here's my btrfs filesystem show:
>
> root@mckinley:/# btrfs fi show
> Label: 'btrfsvol0'  uuid: 1a735971-3ad7-4046-b25b-e834a74f2fbb
> 	Total devices 8 FS bytes used 2.06TB
> 	devid 7 size 1.82TB used 527.77GB path /dev/sdk1
> 	devid 8 size 1.82TB used 527.77GB path /dev/sdg1
> 	devid 6 size 1.82TB used 527.77GB path /dev/sdi1
> 	devid 5 size 1.82TB used 527.77GB path /dev/sde1
> 	devid 4 size 1.82TB used 527.77GB path /dev/sdj1
> 	devid 2 size 1.82TB used 527.77GB path /dev/sdf1
> 	devid 1 size 1.82TB used 527.77GB path /dev/sdh1
> 	devid 3 size 1.82TB used 527.77GB path /dev/sdc1

You have 8 * 527.77 GB = 4222.16 GB of raw space allocated for all purposes. Since RAID-10 takes twice the raw bytes to store data, that gives you 2111.08 GB of usable space so far. From the df output, 2.06 TB ~= 2109.44 GB is allocated as data, and all of that space is used. 3.00 GB is allocated as metadata, and most of that is used. That adds up (within rounding errors) to the 2111.08 GB above. Additional space will be allocated from the available unallocated space as the FS needs it.

> This is running the Ubuntu build of kernel 3.9.4 and btrfs-progs from git (v0.20-rc1-324-g650e656). Am I being an idiot and missing something here? I must admit that I still find the df output a bit cryptic (entirely my failure to understand, nothing else), but on another system with only a single device the total field returns the capacity of the device.

That's probably already fully allocated, so used=size in btrfs fi show. If it's a single device, then you're probably not using any replication, so the raw storage is equal to the possible storage.

HTH,
Hugo.

-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I can resist everything except temptation ---
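As a hedged illustration of where the 4222.16 GB figure comes from, a one-liner that sums the per-device "used" column from the btrfs fi show output above (it assumes, as in this output, that every used figure is reported in GB, and that the field positions match this output format):

    # btrfs fi show | awk '/devid/ { raw += $6 } END { printf "%.2f GB raw allocated, %.2f GB usable as RAID10\n", raw, raw/2 }'
    4222.16 GB raw allocated, 2111.08 GB usable as RAID10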
Re: RAID10 total capacity incorrect
Hi Hugo,

Thanks for your reply, good to know it's not an error as such (just me being an idiot!).

> Additional space will be allocated from the available unallocated space as the FS needs it.

So I guess my question becomes, how much of that available unallocated space do I have? Instinctively the btrfs df output feels like it's missing an equivalent to the size column from vanilla df. Is there a method of getting this in a RAID situation? I understand that btrfs RAID is more complicated than md RAID, so it's OK if the answer at this point is no...

Thanks again,

---tim
Re: RAID10 total capacity incorrect
On Jun 2, 2013, at 12:17 PM, Tim Eggleston li...@timeggleston.co.uk wrote:
> root@mckinley:/# btrfs fi df /mnt/shares/btrfsvol0
> Data, RAID10: total=2.06TB, used=2.06TB
> System, RAID10: total=64.00MB, used=188.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID10: total=3.00GB, used=2.29GB
>
> Am I being an idiot and missing something here?

No, it's confusing. btrfs fi df doesn't show free space. The first value is what space the fs has allocated for the data usage type, and the second value is how much of that allocation is actually being used. I personally think the allocated value is useless for mortal users. I'd rather have some idea of what free space I have left, and the regular df command presents this in an annoying way also, because it shows the total volume size, not accounting for the double consumption of raid1. So no matter how you slice it, it's confusing.

Chris Murphy
Re: RAID10 total capacity incorrect
On Sun, Jun 02, 2013 at 05:52:38PM +0100, Tim Eggleston wrote:
> Hi Hugo, Thanks for your reply, good to know it's not an error as such (just me being an idiot!).
>
>> Additional space will be allocated from the available unallocated space as the FS needs it.
>
> So I guess my question becomes, how much of that available unallocated space do I have? Instinctively the btrfs df output feels like it's missing an equivalent to the size column from vanilla df.

Look at btrfs fi show -- you have size and used there, so the difference there will give you the unallocated space.

> Is there a method of getting this in a RAID situation? I understand that btrfs RAID is more complicated than md RAID, so it's ok if the answer at this point is no...

Not in any obvious (and non-surprising) way. Basically, any way you could work it out is going to give someone a surprise, because they were thinking of it some other way around. The problem is that until the space is allocated, the FS can't know how that space needs to be allocated (to data/metadata, or with what replication type and hence overheads), so we can't necessarily give a reliable estimate.

Hugo.

-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- If you're not part of the solution, you're part of the precipitate. ---
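To make the size-minus-used suggestion concrete, a hedged back-of-the-envelope calculation using the btrfs fi show figures from earlier in the thread (it assumes future chunks are also allocated as two-copy RAID10, which, as noted above, is exactly what the filesystem cannot promise):

    unallocated raw   = 8 x (1.82 TB - 527.77 GB)
                      = 8 x (1863.68 GB - 527.77 GB) = 10687.28 GB
    usable if RAID10  = 10687.28 GB / 2              ~ 5.2 TB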
Re: RAID10 total capacity incorrect
On Sun, Jun 02, 2013 at 12:52:40PM -0400, Chris Murphy wrote:
> On Jun 2, 2013, at 12:17 PM, Tim Eggleston li...@timeggleston.co.uk wrote:
>> root@mckinley:/# btrfs fi df /mnt/shares/btrfsvol0
>> Data, RAID10: total=2.06TB, used=2.06TB
>> System, RAID10: total=64.00MB, used=188.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, RAID10: total=3.00GB, used=2.29GB
>>
>> Am I being an idiot and missing something here?
>
> No, it's confusing. btrfs fi df doesn't show free space. The first value is what space the fs has allocated for the data usage type, and the second value is how much of that allocation is actually being used. I personally think the allocated value is useless for mortal users. I'd rather have some idea of what free space I have left, and the regular df command presents this in an annoying way also, because it shows the total volume size, not accounting for the double consumption of raid1. So no matter how you slice it, it's confusing.

It's the nature of the beast, unfortunately. So far, nobody's managed to come up with a simple method of showing free space and space usage that isn't going to be misleading somehow.

Hugo.

-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- If you're not part of the solution, you're part of the precipitate. ---
Re: Is there a way to flag specific directories nodatacow?
On Sun, Jun 02, 2013 at 07:40:52AM -0700, George Mitchell wrote:
> I am seeing massive journal corruptions that seem to be unique to btrfs and I am suspecting that cow might be causing them. My band-aid fix for this will be to mark the /var filesystem nodatacow at boot. But I am wondering if there is any way to flag a particular directory as nodatacow outside of the mount process. I would like to be able to mark /var/log/journal as nodatacow, for example, without having to declare it a subvolume and mount it separately.

Hi George,

We actually have per-file/directory nodatacow :) But please note that if you set nodatacow on a particular directory, only newly created or zero-size files in the directory will follow the nocow rule. 'chattr' in the latest e2fsprogs can fit your requirements:

# chattr +C /var/log/journal

Also, what kind of massive journal corruptions? Does it look like a btrfs-specific bug?

thanks,
liubo
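As a hedged illustration of the "only newly created or zero-size files" caveat, one way to retrofit the attribute onto a journal file that already has data (file names are illustrative; all commands are plain coreutils/e2fsprogs, and journald should not be writing to the file at the time):

    # move the existing file aside, recreate it empty with +C, then copy the data back
    mv system.journal system.journal.old
    touch system.journal
    chattr +C system.journal
    cat system.journal.old > system.journal
    rm system.journal.old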
csum failed during rebalance
Hi,

I added a new drive to an existing RAID 0 array. Every attempt to rebalance the array fails:

# btrfs filesystem balance /share/bd8
ERROR: error during balancing '/share/bd8' - Input/output error

# dmesg | tail
btrfs: found 1 extents
btrfs: relocating block group 10752513540096 flags 1
btrfs: found 5 extents
btrfs: found 5 extents
btrfs: relocating block group 10751439798272 flags 1
btrfs: found 1 extents
btrfs: found 1 extents
btrfs: relocating block group 10048138903552 flags 1
btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028
btrfs csum failed ino 365 off 221745152 csum 3391451932 private 3121065028

An earlier rebalance attempt had the same csum error on a different inode:

btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028
btrfs csum failed ino 312 off 221745152 csum 3391451932 private 3121065028

Every rebalance attempt fails the same way, but with a different inum. Here is the array:

# btrfs filesystem show
Label: 'bd8'  uuid: b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
	Total devices 4 FS bytes used 7.37TB
	devid 4 size 3.64TB used 52.00GB path /dev/sde
	devid 1 size 3.64TB used 3.32TB path /dev/sdf1
	devid 3 size 3.64TB used 2.92TB path /dev/sdc
	devid 2 size 3.64TB used 2.97TB path /dev/sdb

While I didn't finish the scrub, no errors were found:

# btrfs scrub status -d /share/bd8
scrub status for b39f475f-3ebf-40ea-b088-4ce7f4d4d8f4
scrub device /dev/sdf1 (id 1) status
	scrub resumed at Sun Jun 2 20:29:06 2013, running for 10360 seconds
	total bytes scrubbed: 845.53GB with 0 errors
scrub device /dev/sdb (id 2) status
	scrub resumed at Sun Jun 2 20:29:06 2013, running for 10360 seconds
	total bytes scrubbed: 869.38GB with 0 errors
scrub device /dev/sdc (id 3) status
	scrub resumed at Sun Jun 2 20:29:06 2013, running for 10360 seconds
	total bytes scrubbed: 706.04GB with 0 errors
scrub device /dev/sde (id 4) history
	scrub started at Sun Jun 2 12:48:36 2013 and finished after 0 seconds
	total bytes scrubbed: 0.00 with 0 errors

Mount options:

/dev/sdf1 on /share/bd8 type btrfs (rw,flushoncommit)

Kernel 3.9.4

John
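Not part of the original report, but a hedged way to see which file the failing checksum belongs to (assuming a btrfs-progs build with the inspect-internal subcommand; the inode number is interpreted relative to the subvolume that produced the message):

    # map the inode from the "csum failed ino 365" message back to a path
    btrfs inspect-internal inode-resolve 365 /share/bd8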
Re: Is there a way to flag specific directories nodatacow?
On 06/02/2013 06:28 PM, Liu Bo wrote:
> On Sun, Jun 02, 2013 at 07:40:52AM -0700, George Mitchell wrote:
>> I am seeing massive journal corruptions that seem to be unique to btrfs and I am suspecting that cow might be causing them. My band-aid fix for this will be to mark the /var filesystem nodatacow at boot. But I am wondering if there is any way to flag a particular directory as nodatacow outside of the mount process. I would like to be able to mark /var/log/journal as nodatacow, for example, without having to declare it a subvolume and mount it separately.
>
> Hi George,
>
> We actually have per-file/directory nodatacow :) But please note that if you set nodatacow on a particular directory, only newly created or zero-size files in the directory will follow the nocow rule. 'chattr' in the latest e2fsprogs can fit your requirements:
>
> # chattr +C /var/log/journal
>
> Also, what kind of massive journal corruptions? Does it look like a btrfs-specific bug?
>
> thanks,
> liubo

Thanks Liu, that helps a lot! I am very familiar with chattr/lsattr from my ext3 days, but didn't know where to look for btrfs options. From what you are telling me, the nodatacow option is identical to the nodatacow option for ext3. Do the other ext3 options work for btrfs also?

As for the corruption issue, I actually don't know whether the corruptions are real or whether they are being caused by the way the `journalctl --verify` command is interfacing with the filesystem. My suspicion is that metadata fragmentation *might* be somehow messing with `journalctl --verify`, since I can use simply `journalctl` and all the data flows out without error. I just cleaned out the /var/log/journal directory and started fresh, and in no time I am seeing corruptions according to `journalctl --verify`. Here is what the output looks like:

==
[root@localhost aide]# journalctl --verify
Invalid object contents at 130624 0%
File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0628-0004de2c1807989c.journal:130624 (of 131072, 99%).
FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0628-0004de2c1807989c.journal (Bad message)
PASS: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/user-501@e1447322cf904d028439c2d3f17d032e-065a-0004de2c18d6d96d.journal
Invalid object contents at 125264 0%
File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-069a-0004de2c5e323847.journal:125264 (of 131072, 95%).
FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-069a-0004de2c5e323847.journal (Bad message)
PASS: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/user-501@e1447322cf904d028439c2d3f17d032e-06a8-0004de2c73b5f19d.journal
Invalid object contents at 128408 0%
File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0709-0004de2cedab583c.journal:128408 (of 131072, 97%).
FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0709-0004de2cedab583c.journal (Bad message)
Invalid object contents at 126736 0%
File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-077f-0004de2d20abe261.journal:126736 (of 131072, 96%).
FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-077f-0004de2d20abe261.journal (Bad message)
Invalid object contents at 129600 0%
File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-07ec-0004de2d7c50c186.journal:129600 (of 131072, 98%).
FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-07ec-0004de2d7c50c186.journal (Bad message)
PASS: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/user-501@e1447322cf904d028439c2d3f17d032e-07f1-0004de2d87392b08.journal
Invalid object contents at 129256 0%
File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0862-0004de2e9a6decf4.journal:129256 (of 131072, 98%).
FAIL:
Re: Is there a way to flag specific directories nodatacow?
On 06/02/2013 06:28 PM, Liu Bo wrote:
> On Sun, Jun 02, 2013 at 07:40:52AM -0700, George Mitchell wrote:
>> I am seeing massive journal corruptions that seem to be unique to btrfs and I am suspecting that cow might be causing them. My band-aid fix for this will be to mark the /var filesystem nodatacow at boot. But I am wondering if there is any way to flag a particular directory as nodatacow outside of the mount process. I would like to be able to mark /var/log/journal as nodatacow, for example, without having to declare it a subvolume and mount it separately.
>
> Hi George,
>
> We actually have per-file/directory nodatacow :) But please note that if you set nodatacow on a particular directory, only newly created or zero-size files in the directory will follow the nocow rule. 'chattr' in the latest e2fsprogs can fit your requirements:
>
> # chattr +C /var/log/journal
>
> Also, what kind of massive journal corruptions? Does it look like a btrfs-specific bug?
>
> thanks,
> liubo

I am also assuming that all directories later created under /var/log/journal will inherit the nodatacow profile?
Re: Is there a way to flag specific directories nodatacow?
On Sun, Jun 02, 2013 at 07:19:50PM -0700, George Mitchell wrote:
> On 06/02/2013 06:28 PM, Liu Bo wrote:
>> On Sun, Jun 02, 2013 at 07:40:52AM -0700, George Mitchell wrote:
>>> I am seeing massive journal corruptions that seem to be unique to btrfs and I am suspecting that cow might be causing them. My band-aid fix for this will be to mark the /var filesystem nodatacow at boot. But I am wondering if there is any way to flag a particular directory as nodatacow outside of the mount process. I would like to be able to mark /var/log/journal as nodatacow, for example, without having to declare it a subvolume and mount it separately.
>>
>> Hi George,
>>
>> We actually have per-file/directory nodatacow :) But please note that if you set nodatacow on a particular directory, only newly created or zero-size files in the directory will follow the nocow rule. 'chattr' in the latest e2fsprogs can fit your requirements:
>>
>> # chattr +C /var/log/journal
>>
>> Also, what kind of massive journal corruptions? Does it look like a btrfs-specific bug?
>>
>> thanks,
>> liubo
>
> I am also assuming that all directories later created under /var/log/journal will inherit the nodatacow profile?

Yes, indeed.
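A hedged way to double-check that inheritance on a live system (the directory names are illustrative; chattr and lsattr come from e2fsprogs):

    # chattr +C /var/log/journal
    # mkdir /var/log/journal/example
    # lsattr -d /var/log/journal/example   # the 'C' (no-COW) flag should appear on the new directory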
Re: Is there a way to flag specific directories nodatacow?
On Sun, Jun 02, 2013 at 07:11:10PM -0700, George Mitchell wrote:
> On 06/02/2013 06:28 PM, Liu Bo wrote:
>> On Sun, Jun 02, 2013 at 07:40:52AM -0700, George Mitchell wrote:
>>> I am seeing massive journal corruptions that seem to be unique to btrfs and I am suspecting that cow might be causing them. My band-aid fix for this will be to mark the /var filesystem nodatacow at boot. But I am wondering if there is any way to flag a particular directory as nodatacow outside of the mount process. I would like to be able to mark /var/log/journal as nodatacow, for example, without having to declare it a subvolume and mount it separately.
>>
>> Hi George,
>>
>> We actually have per-file/directory nodatacow :) But please note that if you set nodatacow on a particular directory, only newly created or zero-size files in the directory will follow the nocow rule. 'chattr' in the latest e2fsprogs can fit your requirements:
>>
>> # chattr +C /var/log/journal
>>
>> Also, what kind of massive journal corruptions? Does it look like a btrfs-specific bug?
>>
>> thanks,
>> liubo
>
> Thanks Liu, that helps a lot! I am very familiar with chattr/lsattr from my ext3 days, but didn't know where to look for btrfs options. From what you are telling me, the nodatacow option is identical to the nodatacow option for ext3. Do the other ext3 options work for btrfs also?

Besides nodatacow, compression is also supported on a per-file/directory basis.

> As for the corruption issue, I actually don't know whether the corruptions are real or whether they are being caused by the way the `journalctl --verify` command is interfacing with the filesystem. My suspicion is that metadata fragmentation *might* be somehow messing with `journalctl --verify`, since I can use simply `journalctl` and all the data flows out without error. I just cleaned out the /var/log/journal directory and started fresh, and in no time I am seeing corruptions according to `journalctl --verify`. Here is what the output looks like:

That's weird, AFAIK it shouldn't be. Does 'dmesg' also complain when these corruptions from 'journalctl --verify' occur? (well, I'm expecting some csum errors, maybe...)

> ==
> [root@localhost aide]# journalctl --verify
> Invalid object contents at 130624 0%
> File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0628-0004de2c1807989c.journal:130624 (of 131072, 99%).
> FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0628-0004de2c1807989c.journal (Bad message)
> PASS: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/user-501@e1447322cf904d028439c2d3f17d032e-065a-0004de2c18d6d96d.journal
> Invalid object contents at 125264 0%
> File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-069a-0004de2c5e323847.journal:125264 (of 131072, 95%).
> FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-069a-0004de2c5e323847.journal (Bad message)
> PASS: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/user-501@e1447322cf904d028439c2d3f17d032e-06a8-0004de2c73b5f19d.journal
> Invalid object contents at 128408 0%
> File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0709-0004de2cedab583c.journal:128408 (of 131072, 97%).
> FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-0709-0004de2cedab583c.journal (Bad message)
> Invalid object contents at 126736 0%
> File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-077f-0004de2d20abe261.journal:126736 (of 131072, 96%).
> FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-077f-0004de2d20abe261.journal (Bad message)
> Invalid object contents at 129600 0%
> File corruption detected at /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-07ec-0004de2d7c50c186.journal:129600 (of 131072, 98%).
> FAIL: /var/log/journal/8846d97f611b49aa9f3d48eeac6a81f2/system@e1447322cf904d028439c2d3f17d032e-07ec-0004de2d7c50c186.journal (Bad message)
> PASS:
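For completeness, a hedged example of the per-file compression control mentioned in the reply above (same chattr interface; lowercase 'c' requests compression, as opposed to the uppercase 'C' used earlier for nodatacow; the directory name is illustrative):

    # request compression for everything newly written under this directory
    chattr +c /var/log/someapp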
Re: Is there a way to flag specific directories nodatacow?
On Sun, Jun 2, 2013 at 11:11 PM, George Mitchell geo...@chinilu.com wrote:
> So I want to try forcing nodatacow on this directory and see what happens. If that doesn't work, I suppose the next step will be to place this one directory on an ext4 filesystem and mount it externally to the btrfs /var/log.

I have the same kind of errors on an ext4 file system (Arch Linux 64-bit on a MacBook Air). To me they seem to be related to power-loss events, caused by battery depletion when sleeping for a long time. Anyway, besides long initialization times for journalctl displays, there is no error in dmesg or in the /var/log files. The errors should be related to log metadata info.

--
A. C. Censi
accensi [em] gmail [ponto] com
accensi [em] montreal [ponto] com [ponto] br
Re: RAID10 total capacity incorrect
Hugo Mills posted on Sun, 02 Jun 2013 18:43:59 +0100 as excerpted:

> On Sun, Jun 02, 2013 at 12:52:40PM -0400, Chris Murphy wrote:
>> [I]t's confusing. btrfs fi df doesn't show free space. The first value is what space the fs has allocated for the data usage type, and the 2nd value is how much of that allocation is actually being used. I personally think the allocated value is useless for mortal users. I'd rather have some idea of what free space I have left, and the regular df command presents this in an annoying way also because it shows the total volume size, not accounting for the double consumption of raid1. So no matter how you slice it, it's confusing.
>
> It's the nature of the beast, unfortunately. So far, nobody's managed to come up with a simple method of showing free space and space usage that isn't going to be misleading somehow.

btrfs.wiki.kernel.org covers this topic as well as I guess it's possible to be covered at this point, in the FAQ. I definitely recommend reading the user documentation section there, to any btrfs or potential btrfs user who hasn't done so already, as it really does cover a lot of questions, tho certainly not all (as my posting history here, after reading it, demonstrates).

Home page (easiest to remember): https://btrfs.wiki.kernel.org

Direct link to the documentation section on that page (perhaps more useful as a bookmark): https://btrfs.wiki.kernel.org/index.php/Main_Page#Documentation

The FAQ: https://btrfs.wiki.kernel.org/index.php/FAQ

Direct link to FAQ section 4.4, which starts the questions that deal with space (4.4-4.9): https://btrfs.wiki.kernel.org/index.php/FAQ#Why_does_df_show_incorrect_free_space_for_my_RAID_volume.3F

In addition, people using multiple devices should read the sysadmin guide and multiple devices pages (which can be found under the docs link above), tho they don't really cover space questions. (But the raid-10 diagram in the sysadmin guide may be helpful in visualizing what's going on.)

In particular, see the "Why is free space so complicated?" question/answer, which explains the why of Hugo's answer -- I don't believe it's yet implemented, but the plan is to allow different subvolumes, which can be created at any time, to have different raid levels. Between differing data and metadata levels and differing subvolume levels, in the general case there's simply no reasonable way to reliably report the unallocated space, since there's no way to know which raid level it'll be allocated as until it actually happens.

Of course the answer in limited specific cases can be known. Here, I'm just deploying multiple btrfs filesystems across two SSD devices, generally raid1[1] for both data/metadata, with no intention of having differing-level subvolumes, so I can simply run regular df and divide the results in half in my head. btrfs filesystem df gives me different, much more technical information, so it's useful, but not as simply useful as regular df, halving the numbers in my head.

Tim (the OP)'s case is similarly knowable, since he's raid10 for both data/metadata across originally four, now eight, similarly sized 2TB devices (unlike me, he's apparently using the same btrfs across the entire physical device, all now eight devices), assuming he never chooses anything other than raid10 data/metadata for subvolumes, and sticks with two-mirror-copy raid10 once N-way mirroring becomes possible.
btrfs raid10, like its raid1, is limited to two mirror-copies, so with eight similarly-sized devices and the caveat that he has already rebalanced across all eight devices since doubling from four, he's raid10 with 4-way striping and two-way mirroring. I'd guess normal df (not btrfs filesystem df) and doing the math in his head will be the simplest for him, as it is for me.

But it's worth noting that normal df with math in your head isn't /always/ going to be the answer, as things start getting rather more complex as soon as different-sized devices get thrown into the mix, or raid1/10 on an /odd/ number of devices (tho there the math simply gets a bit more complex since it's no longer integers), let alone the case of differing data/metadata allocation mode, without even considering the case of subvolumes having different modes, since I don't think that's implemented yet.

But in the simple cases of data/metadata of the same raid level on either just one or an even number of devices, regular df, doing the math in your head, should be the simplest and most direct answer.

---
[1] My single exception is a separate tiny /boot, one to each device, mixed data/metadata DUP mode, as they're a quarter gig each. I went separate here and separately installed grub2 to each device as well, so I can independently boot from