Re: Corrupt btrfs filesystem recovery... What best instructions?
Martin posted on Sun, 29 Sep 2013 03:10:37 +0100 as excerpted:

> So...
>
> Any options for btrfsck to fix things?
>
> Or is anything/everything that is fixable automatically fixed on the
> next mount?
>
> Or should:
>
> btrfs scrub /dev/sdX
>
> be run first?
>
> Or?
>
> What does btrfs do (or can do) for recovery?

Here's a general-case answer (courtesy gmane) to the order-in-which-to-try-recovery question, which Hugo posted a few weeks ago:

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999

Note that in specific cases someone who knew what they were doing could omit some steps and focus on others, but I'm not at that level of "know what I'm doing", so...

Scrub would go before this, if it's useful. But scrub depends on a second, valid copy being available in order to fix the bad-checksum one. On a single-device btrfs, btrfs defaults to DUP metadata (unless it's an SSD), so you may have a second copy of that, but you won't have a second copy of the data. This is a very strong reason to go btrfs raid1 mode (for both data and metadata) if you can, because that gives you a second copy of everything, thereby actually making use of btrfs' checksum and scrub ability. (Unfortunately, there is as yet no way to do N-way mirroring; there's only the second copy, not a third, no matter how many devices you have in that "raid1".)

Finally, if you mentioned your kernel (and btrfs-tools) version(s) I missed it, but [boilerplate recommendation, stressed repeatedly both in the wiki and on-list]: btrfs is still labeled experimental and under serious development, and lots of bugs are still fixed in every kernel release. So as Chris Murphy said, if you're not on 3.11-stable or 3.12-rcX already, get there. Not only can the safety of your data depend on it, but by choosing to run experimental code we're all testers, and our reports if something does go wrong will be far more usable if we're on a current kernel.
Similarly, btrfs-tools 0.20-rc1 is already somewhat old; you really should be on a git snapshot beyond that. (The master branch is kept stable; work is done in other branches and only merged to master when it's considered suitably stable, so a recently updated btrfs-tools master HEAD is, at least in theory, always the best possible version you can be running. If that's ever NOT the case, then testers need to be reporting that ASAP so it can be fixed, too.)

Back to the kernel: it's worth noting that 3.12-rcX includes an option that turns off most btrfs BUG_ONs by default. Unless you're a btrfs developer (which it doesn't sound like you are), you'll want to activate that (turning off the BUG_ONs), as they're not helpful for ordinary users and just force unnecessary reboots when something minor and otherwise immediately recoverable goes wrong. That's just one of the latest fixes.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: fix memory leak of chunks' extent map
As we hold a ref from looking up the extent map, we need to drop the ref before returning to callers.

Signed-off-by: Liu Bo
---
v2: add the missing changelog.

 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0431147..ee1fdac 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4483,6 +4483,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 		btrfs_crit(fs_info, "Invalid mapping for %Lu-%Lu, got "
 			   "%Lu-%Lu\n", logical, logical+len, em->start,
 			   em->start + em->len);
+		free_extent_map(em);
 		return 1;
 	}
@@ -4663,6 +4664,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 		btrfs_crit(fs_info, "found a bad mapping, wanted %Lu, "
 			   "found %Lu-%Lu\n", logical, em->start,
 			   em->start + em->len);
+		free_extent_map(em);
 		return -EINVAL;
 	}
-- 
1.8.1.4
[PATCH] Btrfs: fix memory leak of chunks' extent map
Signed-off-by: Liu Bo
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0431147..ee1fdac 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4483,6 +4483,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 		btrfs_crit(fs_info, "Invalid mapping for %Lu-%Lu, got "
 			   "%Lu-%Lu\n", logical, logical+len, em->start,
 			   em->start + em->len);
+		free_extent_map(em);
 		return 1;
 	}
@@ -4663,6 +4664,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 		btrfs_crit(fs_info, "found a bad mapping, wanted %Lu, "
 			   "found %Lu-%Lu\n", logical, em->start,
 			   em->start + em->len);
+		free_extent_map(em);
 		return -EINVAL;
 	}
-- 
1.8.1.4
Re: Corrupt btrfs filesystem recovery... (Due to *sata* errors)
Chris,

Thanks for the good comments/discussion.

On 29/09/13 03:06, Chris Murphy wrote:
> On Sep 28, 2013, at 4:51 PM, Martin wrote:
>
> Stick with forced 3Gbps, but I think it's worthwhile to find out
> what the actual problem is. One day you forget about this 3Gbps SATA
> link, upgrade or regress to another kernel and you don't have the
> 3Gbps forced speed on the parameter line, and poof - you've got more
> problems again. The hardware shouldn't negotiate a 6Gbps link and
> then do a backwards swan dive at 30,000' with your data as if it's an
> afterthought.

I've got an engineer's curiosity, so that one is very definitely marked for revisiting at some time... If only to blog that the x-y-z combination is a tar pit for your data...

>> In any case, for the existing HDD - motherboard combination, using
>> sata2 rather than sata3 speeds shouldn't noticeably impact
>> performance. (Other than that sata2 works reliably and so is
>> infinitely better for this case!)
>
> It's true.

Well, the IO data rate for badblocks is exactly the same as before, limited by the speed of the physical rust spinning and the data density...

> I would also separately unmount the file system, note the latest
> kernel message, then mount the file system and see if there are any
> kernel messages that might indicate recognition of problems with the
> fs.
>
> I would not use btrfsck --repair until someone says it's a good idea.
> That person would not be me.

It sits unmounted until some informed opinion is gained...

Thanks again for your notes,

Regards,
Martin
Re: Corrupt btrfs filesystem recovery... What best instructions?
On 28/09/13 23:54, Martin wrote:
> On 28/09/13 20:26, Martin wrote:
>
>> ... btrfsck bombs out with LOTs of errors...
>>
>> How best to recover from this?
>>
>> (This is a 'backup' disk so not 'critical' but it would be nice to avoid
>> rewriting about 1.5TB of data over the network...)
>>
>> Is there an obvious sequence/recipe to follow for recovery?
>
> I've got the drive reliably working with the sata limited to 3Gbit/s.
> What is the best sequence to try to tidy up and carry on with the 1.5TB
> or so of data on there, rather than working from scratch?
>
> So far, I've only run btrfsck since the corruption...

So...

Any options for btrfsck to fix things?

Or is anything/everything that is fixable automatically fixed on the next mount?

Or should:

btrfs scrub /dev/sdX

be run first?

Or?

What does btrfs do (or can do) for recovery?

Advice welcomed,

Thanks,
Martin
Re: Corrupt btrfs filesystem recovery... (Due to *sata* errors)
On Sep 28, 2013, at 4:51 PM, Martin wrote:
> Indeed. However, are not the sata errors reported back to btrfs so that
> it knows whatever parts haven't been updated?

It's a good question. My doubtful speculation about such a mechanism is that it is really not the responsibility of the file system to be prepared for the hardware face-planting this spectacularly. The hardware really should do better than this. There are specifications that apply here, and the drive and controller and driver all agreed long before the mounting of a volume and the writes started to occur. But then later on, at some point in the middle of the really important part of the conversation (writing your data), something in the hardware chain puked and said "OHHh wait, about that prior conversation, I'm really confused, let's talk at a slower speed shall we?" So the before part is just a lost conversation, is my speculation.

The other thing is that SATA and SAS handle these things differently. When there's such a serious error that it results in a link speed change, usually the bus is reset, and for SATA that means the command queue is lost. And I don't think Btrfs is informed of which commands were completed vs failed in such a case. But I'd love someone who actually knows what they're talking about to answer that question. My expectation, though, is that unlike perhaps other file systems, Btrfs's design goal is to handle the data that did get written, better - in that it's still accessible where other file systems will possibly have more difficulty.

> Is there not a mechanism to then go "read-only"?

I don't know. In this case it does seem sorta reasonable. But the dmesg might still be revealing. The PHY Event counters indicate a lot of retries of over 1000 sectors.

> Also, should not the journal limit the damage?

Well, it's COW, so it's not quite like a journaled file system, but yeah, it should be in a position to know, at the next mount time, the most recent state of file system consistency.
But that doesn't mean it can fix the parts that are just fundamentally broken. And I think it's a valid question, "now what?", because I don't actually know the state of your file system or how to determine it. So maybe Hugo, or someone else, has some thoughts. But for sure I would move to kernel 3.11.2 or 3.12-rc2 before mounting this file system again.

>>> How best to recover from this?
>>
>> Why you're getting I/O errors at SATA 6Gbps link speed needs to be
>> understood. Is it a bad cable? Bad SATA port? Drive or controller
>> firmware bug? Or libata driver bug?
>
> I systematically eliminated such things as leads, PSU, and NCQ.
> Limiting libata to only use 3Gbit/s is the one change that gives a
> consistent fix. The HDD and motherboard both support 6Gbit/s, but
> hey-ho, that's an experiment I can try again some other time when I
> have another HDD/SSD to test in there.

Stick with forced 3Gbps, but I think it's worthwhile to find out what the actual problem is. One day you forget about this 3Gbps SATA link, upgrade or regress to another kernel and you don't have the 3Gbps forced speed on the parameter line, and poof - you've got more problems again. The hardware shouldn't negotiate a 6Gbps link and then do a backwards swan dive at 30,000' with your data as if it's an afterthought.

> In any case, for the existing HDD - motherboard combination, using sata2
> rather than sata3 speeds shouldn't noticeably impact performance. (Other
> than that sata2 works reliably and so is infinitely better for this case!)

It's true.

>>> Lots of sata error noise omitted.
>>
>> An entire dmesg might still be useful. I don't know if the list will
>> handle the whole dmesg in one email, but it's worth a shot (reply to
>> an email in the thread, don't change the subject).
>
> I can email directly if of use/interest. Let me know offlist.
Use pastebin.com and post the link if it's really huge, but I'd consider setting it to no expiration, because if something interesting is learned, people doing searches have a better chance of finding the problem if the link hasn't expired.

I would also separately unmount the file system, note the latest kernel message, then mount the file system and see if there are any kernel messages that might indicate recognition of problems with the fs.

I would not use btrfsck --repair until someone says it's a good idea. That person would not be me.

Chris Murphy
Re: Questions regarding logging upon fsync in btrfs
On Sun, Sep 29, 2013 at 01:35:15AM +0200, Aastha Mehta wrote:
> Hi,
>
> I have a few questions regarding logging triggered by calling fsync in BTRFS:
>
> 1. If I understand correctly, fsync will log the entire inode in
> the log tree. Does this mean that the data extents are also logged
> into the log tree? Are they copied into the log tree, or just
> referenced? Are they copied into the subvolume's extent tree again
> upon replay?

The data extents are copied as well - as in the metadata that points to the data, not the actual data itself. For 3.1 it's all of the extents in the inode; from 3.8 on it's only the extents that have changed in this transaction.

> 2. During replay, when the extents are added into the extent
> allocation tree, do they acquire the physical extent number during
> replay? Does the physical extent allocated to the data in the log
> tree differ from that in the subvolume?

No, the physical location was picked when we wrote the data out during fsync. If we crash and re-mount, the replay will just insert the ref into the extent tree for the disk offset as it replays the extents.

> 3. I see there is a mount option of notreelog available. After
> disabling tree logging, does fsync still lead to flushing of buffers
> to the disk directly?

notreelog just means that we write the data, wait on the ordered data extents, and then commit the transaction. So you get the data for the inode you are fsyncing and all of the metadata for the entire file system that has changed in that transaction.

> 4. Is it possible to selectively identify certain files in the log
> tree and flush them to disk directly, without waiting for the replay
> to do it?

I don't understand this question. Replay only happens on mount after a crash/power loss, and everything in the log is replayed; there is no way to select which inode is replayed.
Thanks,
Josef
Re: Questions regarding logging upon fsync in btrfs
On Sun, Sep 29, 2013 at 01:46:23AM +0200, Aastha Mehta wrote:
> I am using linux kernel 3.1.10-1.16, just to let you know.

Not that it invalidates the questions below, but that's a really old kernel. You should update to something recent (3.11, or 3.12-rc2) as soon as possible. There are major problems in 3.1 (and most of the subsequent kernels) that have been fixed in 3.11. Of course, there are still major problems in 3.11 that haven't been fixed yet, but we don't know about very many of those. :) (And when we do, we'll be recommending that you upgrade to whatever has them fixed...)

Hugo.

> Thanks
>
> On 29 September 2013 01:35, Aastha Mehta wrote:
> > Hi,
> >
> > I have a few questions regarding logging triggered by calling fsync in BTRFS:
> >
> > 1. If I understand correctly, fsync will log the entire inode in
> > the log tree. Does this mean that the data extents are also logged
> > into the log tree? Are they copied into the log tree, or just
> > referenced? Are they copied into the subvolume's extent tree again
> > upon replay?
> >
> > 2. During replay, when the extents are added into the extent
> > allocation tree, do they acquire the physical extent number during
> > replay? Does the physical extent allocated to the data in the log
> > tree differ from that in the subvolume?
> >
> > 3. I see there is a mount option of notreelog available. After
> > disabling tree logging, does fsync still lead to flushing of buffers
> > to the disk directly?
> >
> > 4. Is it possible to selectively identify certain files in the log
> > tree and flush them to disk directly, without waiting for the replay
> > to do it?
> >
> > Thanks

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- My code is never released, it escapes from the git repo and kills a few beta testers on the way out. ---

signature.asc
Description: Digital signature
Re: Questions regarding logging upon fsync in btrfs
I am using linux kernel 3.1.10-1.16, just to let you know.

Thanks

On 29 September 2013 01:35, Aastha Mehta wrote:
> Hi,
>
> I have a few questions regarding logging triggered by calling fsync in BTRFS:
>
> 1. If I understand correctly, fsync will log the entire inode in
> the log tree. Does this mean that the data extents are also logged
> into the log tree? Are they copied into the log tree, or just
> referenced? Are they copied into the subvolume's extent tree again
> upon replay?
>
> 2. During replay, when the extents are added into the extent
> allocation tree, do they acquire the physical extent number during
> replay? Does the physical extent allocated to the data in the log
> tree differ from that in the subvolume?
>
> 3. I see there is a mount option of notreelog available. After
> disabling tree logging, does fsync still lead to flushing of buffers
> to the disk directly?
>
> 4. Is it possible to selectively identify certain files in the log
> tree and flush them to disk directly, without waiting for the replay
> to do it?
>
> Thanks
>
> --
> Aastha Mehta

-- 
Aastha Mehta
Questions regarding logging upon fsync in btrfs
Hi,

I have a few questions regarding logging triggered by calling fsync in BTRFS:

1. If I understand correctly, fsync will log the entire inode in the log tree. Does this mean that the data extents are also logged into the log tree? Are they copied into the log tree, or just referenced? Are they copied into the subvolume's extent tree again upon replay?

2. During replay, when the extents are added into the extent allocation tree, do they acquire the physical extent number during replay? Does the physical extent allocated to the data in the log tree differ from that in the subvolume?

3. I see there is a mount option of notreelog available. After disabling tree logging, does fsync still lead to flushing of buffers to the disk directly?

4. Is it possible to selectively identify certain files in the log tree and flush them to disk directly, without waiting for the replay to do it?

Thanks

-- 
Aastha Mehta
Re: Corrupt btrfs filesystem recovery... (Due to *sata* errors)
On 28/09/13 20:26, Martin wrote:
> ... btrfsck bombs out with LOTs of errors...
>
> How best to recover from this?
>
> (This is a 'backup' disk so not 'critical' but it would be nice to avoid
> rewriting about 1.5TB of data over the network...)
>
> Is there an obvious sequence/recipe to follow for recovery?

I've got the drive reliably working with the sata limited to 3Gbit/s. What is the best sequence to try to tidy up and carry on with the 1.5TB or so of data on there, rather than working from scratch?

So far, I've only run btrfsck since the corruption errors for the three sectors...

Suggestions for recovery?

Thanks,
Martin
Re: Corrupt btrfs filesystem recovery... (Due to *sata* errors)
Chris,

All agreed. Further comments inlined.

(I should have mentioned more prominently that the hardware problem has been worked around by limiting the sata to 3Gbit/s on bootup.)

On 28/09/13 21:51, Chris Murphy wrote:
> On Sep 28, 2013, at 1:26 PM, Martin wrote:
>
>> Writing data via rsync at the 6Gbit/s sata rate caused IO errors
>> for just THREE sectors...
>>
>> Yet btrfsck bombs out with LOTs of errors...
>
> Any fs will bomb out on write errors.

Indeed. However, are not the sata errors reported back to btrfs so that it knows whatever parts haven't been updated? Is there not a mechanism to then go "read-only"?

Also, should not the journal limit the damage?

>> How best to recover from this?
>
> Why you're getting I/O errors at SATA 6Gbps link speed needs to be
> understood. Is it a bad cable? Bad SATA port? Drive or controller
> firmware bug? Or libata driver bug?

I systematically eliminated such things as leads, PSU, and NCQ. Limiting libata to only use 3Gbit/s is the one change that gives a consistent fix. The HDD and motherboard both support 6Gbit/s, but hey-ho, that's an experiment I can try again some other time when I have another HDD/SSD to test in there.

In any case, for the existing HDD - motherboard combination, using sata2 rather than sata3 speeds shouldn't noticeably impact performance. (Other than that sata2 works reliably and so is infinitely better for this case!)

>> Lots of sata error noise omitted.
>
> An entire dmesg might still be useful. I don't know if the list will
> handle the whole dmesg in one email, but it's worth a shot (reply to
> an email in the thread, don't change the subject).

I can email directly if of use/interest. Let me know offlist.

> do a smartctl -x on the drive, chances are it's recording PHY Event

(smartctl -x errors shown further down...)
Nothing untoward noticed:

# smartctl -a /dev/sdc

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-...
LU WWN Device Id: ...
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Sep 28 23:35:57 2013 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[...]

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           9
  3 Spin_Up_Time            0x0027 253   159   021    Pre-fail Always  -           1983
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           55
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 099   099   000    Old_age  Always  -           800
 10 Spin_Retry_Count        0x0032 100   253   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           53
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           31
193 Load_Cycle_Count        0x0032 199   199   000    Old_age  Always  -           3115
194 Temperature_Celsius     0x0022 118   110   000    Old_age  Always  -           32
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0

# smartctl -x /dev/sdc
...
also shows the errors it saw (just the last 4 copied, which look timed for when the HDD was last exposed to 6Gbit/s sata):

Error 46 [21] occurred at disk power-on lifetime: 755 hours (31 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  01 -- 51 00 08 00 00 6c 1a 4b b0 e0 00  Error: AMNF 8 sectors at LBA = 0x6c1a4bb0 = 1813662640

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
Re: Corrupt btrfs filesystem recovery... (Due to *sata* errors)
On Sep 28, 2013, at 1:26 PM, Martin wrote:
> Writing data via rsync at the 6Gbit/s sata rate caused
> IO errors for just THREE sectors...
>
> Yet btrfsck bombs out with LOTs of errors...

Any fs will bomb out on write errors.

> How best to recover from this?

Why you're getting I/O errors at SATA 6Gbps link speed needs to be understood. Is it a bad cable? Bad SATA port? Drive or controller firmware bug? Or libata driver bug?

> Lots of sata error noise omitted.

An entire dmesg might still be useful. I don't know if the list will handle the whole dmesg in one email, but it's worth a shot (reply to an email in the thread, don't change the subject). It's possible software or hardware problems are detected well before writes are even initiated.

> Running "badblocks" twice in succession (non-destructive data test!)
> shows no surface errors and no further errors on the sata interface.

SATA link speed related errors aren't related to bad blocks. If you do a smartctl -x on the drive, chances are it's recording PHY Event errors that might be relevant, and SMART might also record UDMA CRC errors that would just corroborate that the drive also found link errors.

> Running btrfsck twice gives the same result, giving a failure with:

Well, honestly, at this point I expect file system corruption, as it's entirely possible that before the hardware dropped the link speed down to SATA 3Gbps, corrupt data was already sent to the drive, and that's not something Btrfs can know about until trying to read the data back in. So *shrug* - I don't see Btrfs as a way to totally mitigate hardware problems. It's the same problem with bad RAM, and Btrfs doesn't like that either.

Chris Murphy
Not possible to read device stats for devices added after mount
Hi,

I discovered a minor bug in the BTRFS filesystem. I made a nagios check for btrfs which reads device statistics for all devices in a mounted btrfs filesystem by calling: btrfs dev stats /btrfs. But there is one problem... its output looks like this:

[/dev/sda].corruption_errs 0
...
[/dev/sdt].generation_errs 0
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sdb2 failed: No such device
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sdh failed: No such device
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sdj failed: No such device
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sdk failed: No such device
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sdl failed: No such device
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sdp failed: No such device
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sdq failed: No such device
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sds failed: No such device
ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on /dev/sde failed: No such device

But this is not true... all of the specified devices exist and are members of the btrfs filesystem. In dmesg I see this:

[973077.098957] btrfs: get dev_stats failed, not yet valid
[973077.098984] btrfs: get dev_stats failed, not yet valid
[973077.099011] btrfs: get dev_stats failed, not yet valid
[973077.099038] btrfs: get dev_stats failed, not yet valid
[973077.099065] btrfs: get dev_stats failed, not yet valid
[973077.099092] btrfs: get dev_stats failed, not yet valid
[973077.099118] btrfs: get dev_stats failed, not yet valid

What makes device statistics valid? I tried doing a full filesystem scrub, but it did not fix the issue.

Thank you for any hints.

Using this kernel (if it matters): 3.10-2-amd64 #1 SMP Debian 3.10.7-1 (2013-08-17) x86_64 GNU/Linux

Ondřej Kunc
Corrupt btrfs filesystem recovery... (Due to *sata* errors)
This may be of interest for the fail cause as well as how to recover...

I have a known good 2TB (4kByte physical sectors) HDD that supports sata3 (6Gbit/s). Writing data via rsync at the 6Gbit/s sata rate caused IO errors for just THREE sectors... Yet btrfsck bombs out with LOTs of errors...

How best to recover from this?

(This is a 'backup' disk so not 'critical', but it would be nice to avoid rewriting about 1.5TB of data over the network...)

Is there an obvious sequence/recipe to follow for recovery?

Thanks,
Martin


Further details:

Linux 3.10.7-gentoo-r1 #2 SMP Fri Sep 27 23:38:06 BST 2013 x86_64 AMD E-450 APU with Radeon(tm) HD Graphics AuthenticAMD GNU/Linux

# btrfs version
Btrfs v0.20-rc1-358-g194aa4a

Single 2TB HDD using default mkbtrfs. Entire disk (/dev/sdc) is btrfs (no partitions).

The IO errors were:

kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3215049328
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3206563752
kernel: end_request: I/O error, dev sdc, sector 3213925248
kernel: end_request: I/O error, dev sdc, sector 3213925248
kernel: end_request: I/O error, dev sdc, sector 3213925248
kernel: end_request: I/O error, dev sdc, sector 3213925248
kernel: end_request: I/O error, dev sdc, sector 3213925248

Lots of sata error noise omitted.

The sata problem was fixed by limiting libata to 3Gbit/s:

libata.force=3.0G

added onto the Grub kernel line.

Running "badblocks" twice in succession (non-destructive data test!) shows no surface errors and no further errors on the sata interface.
Running btrfsck twice gives the same result, failing with:

Ignoring transid failure
btrfsck: cmds-check.c:1066: process_file_extent: Assertion `!(rec->ino != key->objectid || rec->refs > 1)' failed.

An abridged summary is:

checking extents
parent transid verify failed on 907185082368 wanted 15935 found 12264
parent transid verify failed on 907185082368 wanted 15935 found 12264
parent transid verify failed on 907185127424 wanted 15935 found 12264
parent transid verify failed on 907185127424 wanted 15935 found 12264
leaf parent key incorrect 907185135616
bad block 907185135616
parent transid verify failed on 915444707328 wanted 16974 found 13021
parent transid verify failed on 915444707328 wanted 16974 found 13021
parent transid verify failed on 915445092352 wanted 16974 found 13021
parent transid verify failed on 915445092352 wanted 16974 found 13021
leaf parent key incorrect 915444883456
bad block 915444883456
leaf parent key incorrect 915445014528
bad block 915445014528
parent transid verify failed on 907185082368 wanted 15935 found 12264
parent transid verify failed on 907185082368 wanted 15935 found 12264
parent transid verify failed on 907185127424 wanted 15935 found 12264
parent transid verify failed on 907185127424 wanted 15935 found 12264
leaf parent key incorrect 907183771648
bad block 907183771648
leaf parent key incorrect 907183779840
bad block 907183779840
leaf parent key incorrect 907183783936
bad block 907183783936
[...]
leaf parent key incorrect 907185913856
bad block 907185913856
leaf parent key incorrect 907185917952
bad block 907185917952
parent transid verify failed on 915431579648 wanted 16974 found 16972
parent transid verify failed on 915431579648 wanted 16974 found 16972
parent transid verify failed on 915432382464 wanted 16974 found 16972
parent transid verify failed on 915432382464 wanted 16974 found 16972
parent transid verify failed on 915444707328 wanted 16974 found 13021
parent transid verify failed on 915444707328 wanted 16974 found 13021
parent transid verify failed on 915445092352 wanted 16974 found 13021
parent transid verify failed on 915445092352 wanted 16974 found 13021
parent transid verify failed on 915445100544 wanted 16974 found 13021
parent transid verify failed on 915445100544 wanted 16974 found 13021
parent transid verify failed on 915432734720 wanted 16974 found 16972
parent transid verify failed on 915432734720 wanted 16974 found 16972
parent transid verify failed on 915433144320 wanted 16974 found 16972
parent transid verify failed on 915433144320 wanted 16974 found 16972
parent transid verify failed on 915431862272 wanted 16974 found 16972
parent transid verify failed on 915431862272 wanted 16974 found 16972
parent transid verify failed on 915444715520 wanted 16974 found 13021
parent transid verify failed on 915444715520 wanted 16974 found 13021
parent transid verify failed on 915445166080 wanted 16974 found 13021
parent transid verify failed on 915445166080 wanted 16974 found 13021
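[Editorial aside: the escalation order usually suggested on-list at the
time (per Hugo's posting referenced in the reply below) runs from least to
most invasive. The commands below are illustrative of that sequence for a
whole-disk btrfs like this one, not a guaranteed-safe recipe; read each
tool's documentation before using it, and try --repair only last.]

```shell
# 1. Try the backup tree roots, read-only, to assess the damage:
mount -o ro,recovery /dev/sdc /mnt

# 2. If log-tree replay itself is what fails at mount time:
btrfs-zero-log /dev/sdc

# 3. Salvage files to another disk without mounting at all:
btrfs restore /dev/sdc /path/to/salvage

# 4. Last resort only -- let fsck rewrite metadata in place:
btrfsck --repair /dev/sdc
```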
Re: Issue building a file based rootfs image with mkfs.btrfs
On 09/28/2013 05:29 AM, Chris Mason wrote:
> Quoting Saul Wold (2013-09-19 14:19:34)
>> Hi there,
>>
>> I am attempting to build a rootfs image from an existing rootfs
>> directory tree. I am using the 0.20 @ 194aa4a of Chris's git repo.
>>
>> The couple of problems I saw were that the target image file needed to
>> exist, although I think I can patch that; then the FS size was much
>> larger than the actual size. I tracked this to the usage of ftw not
>> accounting for symlinks; I have a patch for that which I will send once
>> I finish getting the other issues resolved.
>>
>> Next issue I hit was an assertion failure after getting a "not enough
>> free space" message:
>>
>> not enough free space
>> add_file_items failed
>> unable to traverse_directory
>> Making image is aborted.
>> mkfs.btrfs: mkfs.c:1542: main: Assertion `!(ret)' failed.
>>
>> I am kind of stuck on this one, took it as far as I can right now.
>> Would I be better off dropping back to 0.19 or can we move forward
>> fixing this?
>
> Hi Saul,
>
> Update on my end: the problem is the image code expects every file to
> fit inside a single chunk. It's only creating 8MB chunks, so any file
> over 8MB in size is causing problems. I'm fixing it up here, I should
> have a patch for you on Monday.

Ah, great news!

One thing I want to verify: is your git repo for btrfs-progs the main
upstream? I see loads of other patches flying around, but not applied
there.

Thanks again
Sau!

> Thanks!
> -chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: calculate disk space that a subvol could free
On 09/28/2013 03:10 AM, Zach Brown wrote:
>> diff --git a/cmds-subvolume.c b/cmds-subvolume.c
>> index de246ab..0f36cde 100644
>> --- a/cmds-subvolume.c
>> +++ b/cmds-subvolume.c
>> @@ -809,6 +809,7 @@ static int cmd_subvol_show(int argc, char **argv)
>>  	int fd = -1, mntfd = -1;
>>  	int ret = 1;
>>  	DIR *dirstream1 = NULL, *dirstream2 = NULL;
>> +	u64 freeable_bytes;
>>
>>  	if (check_argc_exact(argc, 2))
>>  		usage(cmd_subvol_show_usage);
>> @@ -878,6 +879,8 @@ static int cmd_subvol_show(int argc, char **argv)
>>  		goto out;
>>  	}
>>
>> +	freeable_bytes = get_subvol_freeable_bytes(fd);
>> +
>>  	ret = 0;
>>  	/* print the info */
>>  	printf("%s\n", fullpath);
>> @@ -915,6 +918,8 @@ static int cmd_subvol_show(int argc, char **argv)
>>  	else
>>  		printf("\tFlags: \t\t\t-\n");
>>
>> +	printf("\tUnshared space: \t%s\n",
>> +		pretty_size(freeable_bytes));
>
> There's no reason to have a local variable:
>
> 	printf("\tUnshared space: \t%s\n",
> 	       pretty_size(get_subvol_freeable_bytes(fd)));
>
>>  	printf("\tSnapshot(s):\n");
>>  	filter_set = btrfs_list_alloc_filter_set();
>> diff --git a/utils.c b/utils.c
>> index ccb5199..ca30485 100644
>> --- a/utils.c
>> +++ b/utils.c
>> @@ -2062,3 +2062,157 @@ int lookup_ino_rootid(int fd, u64 *rootid)
>>
>>  	return 0;
>>  }
>> +
>> +/* gets the ref count for given extent
>> + * 0 = didn't find the item
>> + * n = number of references
>> + */
>> +u64 get_extent_refcnt(int fd, u64 disk_blk)
>> +{
>> +	int ret = 0, i, e;
>> +	struct btrfs_ioctl_search_args args;
>> +	struct btrfs_ioctl_search_key *sk = &args.key;
>> +	struct btrfs_ioctl_search_header sh;
>> +	unsigned long off = 0;
>> +
>> +	memset(&args, 0, sizeof(args));
>> +
>> +	sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID;
>> +
>> +	sk->min_type = BTRFS_EXTENT_ITEM_KEY;
>> +	sk->max_type = BTRFS_EXTENT_ITEM_KEY;
>> +
>> +	sk->min_objectid = disk_blk;
>> +	sk->max_objectid = disk_blk;
>> +
>> +	sk->max_offset = (u64)-1;
>> +	sk->max_transid = (u64)-1;
>> +
>> +	while (1) {
>> +		sk->nr_items = 4096;
>> +
>> +		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
>> +		e = errno;
>> +		if (ret < 0) {
>> +			fprintf(stderr, "ERROR: search failed - %s\n",
>> +				strerror(e));
>> +			return 0;
>> +		}
>> +		if (sk->nr_items == 0)
>> +			break;
>> +
>> +		off = 0;
>> +		for (i = 0; i < sk->nr_items; i++) {
>> +			struct btrfs_extent_item *ei;
>> +			u64 ref;
>> +
>> +			memcpy(&sh, args.buf + off, sizeof(sh));
>> +			off += sizeof(sh);
>> +
>> +			if (sh.type != BTRFS_EXTENT_ITEM_KEY) {
>> +				off += sh.len;
>> +				continue;
>> +			}
>> +
>> +			ei = (struct btrfs_extent_item *)(args.buf + off);
>> +			ref = btrfs_stack_extent_refs(ei);
>> +			return ref;
>> +		}
>> +		sk->min_objectid = sh.objectid;
>> +		sk->min_offset = sh.offset;
>> +		sk->min_type = sh.type;
>> +		if (sk->min_offset < (u64)-1)
>> +			sk->min_offset++;
>> +		else if (sk->min_objectid < (u64)-1) {
>> +			sk->min_objectid++;
>> +			sk->min_offset = 0;
>> +			sk->min_type = 0;
>> +		} else
>> +			break;
>> +	}
>> +	return 0;
>> +}
>
> These two fiddly functions only differ in the tree search and what they
> do with each item.  So replace them with a function that takes a
> description of the search and calls the caller's callback for each item.
>
> 	typedef void (*item_func_t)(struct btrfs_key *key, void *data,
> 				    void *arg);
>
> 	int btrfs_for_each_item(int fd, min and max and junk,
> 				item_func_t func, void *arg);
>
> 	u64 get_subvol_freeable_bytes(int fd)
> 	{
> 		u64 size_bytes = 0;
>
> 		btrfs_for_each_item(fd, , sum_extents, &size_bytes);
> 		return size_bytes;
> 	}
>
> Etc.  You get the idea.

Will fix them. Thanks!
-Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] btrfs-progs: device add should check existing FS before adding
On 09/28/2013 02:32 AM, Zach Brown wrote:
>> @@ -49,14 +50,17 @@ static int cmd_add_dev(int argc, char **argv)
>>  	int i, fdmnt, ret=0, e;
>>  	DIR *dirstream = NULL;
>>  	int discard = 1;
>> +	int force = 0;
>> +	char estr[100];
>>
>> +		res = test_dev_for_mkfs(argv[i], force, estr);
>> +		if (res) {
>> +			fprintf(stderr, "%s", estr);
>>  			continue;
>>  		}
>
> This test_dev_for_mkfs() error string interface is bad.  The caller
> should not have to magically guess the string size that the function is
> going to use, especially because users can trivially provide giant paths
> that exhaust that tiny buffer.  If an arbitrarily small buffer in the
> caller were needed at all, its length should have been passed in with
> the string pointer.  (Or a string struct that all C projects eventually
> grow.)
>
> But all the callers just immediately print it anyway.  Get rid of that
> string argument entirely and just have test_dev_for_mkfs() print the
> strings.

Right. But this patch didn't introduce test_dev_for_mkfs(); a revamp of it
will be better done in a separate patch, as it touches other functions as
well.

Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Issue building a file based rootfs image with mkfs.btrfs
Quoting Saul Wold (2013-09-19 14:19:34)
> Hi there,
>
> I am attempting to build a rootfs image from an existing rootfs
> directory tree.  I am using the 0.20 @ 194aa4a of Chris's git repo.
>
> The couple problem I saw was that the target image file needed to exist,
> although I think I can patch that then the FS size was much larger than
> the actual size, I tracked this to the usage of ftw not accounting for
> symlinks, I have a patch for that which I will send once I finish
> getting the other issues resolved.
>
> Next issue I hit was an assertion failure after getting "not enough free
> space" message:
>
> not enough free space
> add_file_items failed
> unable to traverse_directory
> Making image is aborted.
> mkfs.btrfs: mkfs.c:1542: main: Assertion `!(ret)' failed.
>
> I am kind of stuck on this one, took it as far as I can right now.
> Would I be better off dropping back to 0.19 or can we move forward
> fixing this?

Hi Saul,

Update on my end: the problem is the image code expects every file to fit
inside a single chunk.  It's only creating 8MB chunks, so any file over
8MB in size is causing problems.  I'm fixing it up here, I should have a
patch for you on Monday.

Thanks!
-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html