> On 6 Jul 2016, at 02:25, Henk Slager <eye...@gmail.com> wrote:
>
> On Wed, Jul 6, 2016 at 2:32 AM, Tomasz Kusmierz <tom.kusmi...@gmail.com> wrote:
>>
>> On 6 Jul 2016, at 00:30, Henk Slager <eye...@gmail.com> wrote:
>>
>> On Mon, Jul 4, 2016 at 11:28 PM, Tomasz Kusmierz <tom.kusmi...@gmail.com> wrote:
>>
>> I did consider that, but:
>> - some files were NOT accessed by anything, with 100% certainty (well, if
>> there is a rootkit on my system or something of that shape, then maybe yes)
>> - the only application that could access those files is totem (well,
>> Nautilus checks the extension -> directs it to totem), so in that case we
>> would hear about an outbreak of totem killing people's files.
>> - if it was a kernel bug, then other large files would be affected.
>>
>> Maybe I'm wrong and it's actually related to the fact that all those files
>> are located in a single location on the file system (a single folder) that
>> might have a historical bug in some structure somewhere?
>>
>> I find it hard to imagine that this has something to do with the
>> folder structure, unless maybe the folder is a subvolume with
>> non-default attributes or so. How the files in that folder were created
>> (at full disk transfer speed, or over a day or even a week) might give
>> some hint. You could run filefrag and see if that rings a bell.
>>
>> Files that are 4096 bytes show:
>> 1 extent found
>
> I actually meant filefrag for the files that are not (yet) truncated
> to 4k. For example, for virtual machine image files (CoW), one could see
> an MBR write.

117 extents found, file size 15468645003 bytes.
Good or bad?
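For reference, that number came from plain filefrag on one of the
still-intact large files, roughly like this (the file name is just an
example from the victim folder); filefrag -v prints the individual extents
if that is more telling:

filefrag /mnt/share/victim_folder/some_large_file.mkv
filefrag -v /mnt/share/victim_folder/some_large_file.mkv    # per-extent detail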
>
>> I forgot to add that the file system was created a long time ago and it
>> was created with leaf & node size = 16k.
>>
>> If this long time ago is >2 years then you have likely specifically
>> set node size = 16k, otherwise with older tools it would have been 4K.
>>
>> You are right, I used -l 16K -n 16K.
>>
>> Have you created it as raid10 or has it undergone profile conversions?
>>
>> Due to a lack of spare disks
>> (it may sound odd to some, but spending on more than 6 disks for home use
>> seems like overkill) and with only what I had at hand, I had to migrate
>> all data to a new file system. It went like this:
>> 1. from the original FS I removed 2 disks
>> 2. created a RAID1 on those 2 disks
>> 3. shifted 2TB
>> 4. removed 2 disks from the source FS and added those to the destination FS
>> 5. shifted a further 2TB
>> 6. destroyed the original FS and added its 2 disks to the destination FS
>> 7. converted the destination FS to RAID10
>>
>> FYI, when I convert to raid 10 I use:
>> btrfs balance start -mconvert=raid10 -dconvert=raid10 -sconvert=raid10 -f /path/to/FS
>>
>> This filesystem has 5 subvolumes. The affected files are located in a
>> separate folder, a "victim folder", that is within one subvolume.
>>
>> It could also be that the on-disk format is somewhat corrupted (btrfs
>> check should find that) and that that causes the issue.
>>
>> root@noname_server:/mnt# btrfs check /dev/sdg1
>> Checking filesystem on /dev/sdg1
>> UUID: d4cd1d5f-92c4-4b0f-8d45-1b378eff92a1
>> checking extents
>> checking free space cache
>> checking fs roots
>> checking csums
>> checking root refs
>> found 4424060642634 bytes used err is 0
>> total csum bytes: 4315954936
>> total tree bytes: 4522786816
>> total fs tree bytes: 61702144
>> total extent tree bytes: 41402368
>> btree space waste bytes: 72430813
>> file data blocks allocated: 4475917217792
>> referenced 4420407603200
>>
>> No luck there :/
>
> Indeed looks all normal.
>
>> In-lining on raid10 has caused me some trouble (I had 4k nodes) over
>> time; it happened over a year ago with kernels recent at that time,
>> but that fs was converted from raid5.
>>
>> Could you please elaborate on that? Did you also end up with files that
>> got truncated to 4096 bytes?
>
> I did not have truncated-to-4k files, but your case makes me think of
> small-file inlining. The default max_inline mount option is 8k, and that
> means that files of 0 to ~3k end up in metadata. I had size corruptions
> for several of those small files that were updated quite frequently,
> also within commit time AFAIK. Btrfs check lists this as
> errors 400, although fs operation is not disturbed. I don't know what
> happens if those small files are being updated/rewritten and are just
> below or just above the max_inline limit.
>
> The only thing I was thinking of is that your files started out small,
> so inline, and were then extended to multi-GB. In the past, there were
> 'bad extent/chunk type' issues and it was suggested that the fs would
> have been an ext4-converted one (which had non-compliant mixed
> metadata and data), but for most people that was not the case. So there
> was/is something unclear, but a full balance or so fixed it as far as I
> remember. But this is guessing, I do not have any failure cases like the
> one you see.

When I think of it, I moved this folder in first, when the filesystem was
RAID 1 (or not even RAID at all), and it was only later upgraded to RAID 1
and then RAID 10. Was there a faulty balance around August 2014? Please
remember that I'm using Ubuntu, so it was probably the kernel from Ubuntu
14.04 LTS.
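In case it helps to narrow things down, this is roughly how I have been
listing the affected files, assuming they always end up at exactly 4096
bytes (the folder path is just how my layout looks):

find /mnt/share/victim_folder -type f -name '*.mkv' -size 4096c -printf '%s %p\n'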
Also, I would like to hear it from the horse's mouth: dos & don'ts for
long-term storage where you moderately care about the data:
RAID10 - flaky? Would RAID1 give similar performance?
leaf & node size = 16k - pointless / flaky / untested / phased out?
growing an FS: add disks, rebalance, and then convert to a different RAID
level, or does it not matter?! (a sketch of what I currently do is right
below this list)
RAID level on system data - am I an idiot to even touch it?
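For the "growing FS" point, this is roughly what I do today when adding
disks; the device name is just an example, the mount point is my real one:

btrfs device add /dev/sdX1 /mnt/share
btrfs balance start -dconvert=raid10 -mconvert=raid10 -sconvert=raid10 -f /mnt/share
btrfs fi df /mnt/share    # check which profile Data/Metadata/System chunks actually use

If that order (add all the disks first, convert/balance once at the end) is
known to be a bad idea, I would like to know.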
>
>> You might want to run the python scripts from here:
>> https://github.com/knorrie/python-btrfs
>>
>> Will do.
>>
>> so that maybe you see how block-groups/chunks are filled etc.
>>
>> (ps. this email client on OS X is driving me up the wall … have to correct
>> the corrections all the time :/)
>>
>> On 4 Jul 2016, at 22:13, Henk Slager <eye...@gmail.com> wrote:
>>
>> On Sun, Jul 3, 2016 at 1:36 AM, Tomasz Kusmierz <tom.kusmi...@gmail.com> wrote:
>>
>> Hi,
>>
>> My setup is that I use one file system for / and /home (on SSD) and a
>> larger raid10 for /mnt/share (6 x 2TB).
>>
>> Today I've discovered that 14 of the files that are supposed to be over
>> 2GB are in fact just 4096 bytes. I've checked the content of those 4KB
>> and it seems that they do contain the information that was at the
>> beginning of the files.
>>
>> I've experienced this problem in the past (3 - 4 years ago?) but
>> attributed it to a different problem that I spoke with you guys here
>> about (corruption due to non-ECC RAM). At that time I deleted the
>> affected files (56), and a similar problem was discovered a year, but not
>> more than 2 years, ago and I believe I deleted those files as well.
>>
>> I periodically (once a month) run a scrub on my system to eliminate
>> any errors sneaking in. I believe I did a balance half a year ago,
>> to reclaim space after I deleted a large database.
>>
>> root@noname_server:/mnt/share# btrfs fi show
>> Label: none  uuid: 060c2345-5d2f-4965-b0a2-47ed2d1a5ba2
>>         Total devices 1 FS bytes used 177.19GiB
>>         devid    3 size 899.22GiB used 360.06GiB path /dev/sde2
>>
>> Label: none  uuid: d4cd1d5f-92c4-4b0f-8d45-1b378eff92a1
>>         Total devices 6 FS bytes used 4.02TiB
>>         devid    1 size 1.82TiB used 1.34TiB path /dev/sdg1
>>         devid    2 size 1.82TiB used 1.34TiB path /dev/sdh1
>>         devid    3 size 1.82TiB used 1.34TiB path /dev/sdi1
>>         devid    4 size 1.82TiB used 1.34TiB path /dev/sdb1
>>         devid    5 size 1.82TiB used 1.34TiB path /dev/sda1
>>         devid    6 size 1.82TiB used 1.34TiB path /dev/sdf1
>>
>> root@noname_server:/mnt/share# uname -a
>> Linux noname_server 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24
>> 10:09:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>> root@noname_server:/mnt/share# btrfs --version
>> btrfs-progs v4.4
>> root@noname_server:/mnt/share#
>>
>> Problem is that stuff on this filesystem moves so slowly that it's
>> hard to remember historical events ... it's like AWS Glacier. What I
>> can state with 100% certainty is that:
>> - files that are affected are 2GB and over (safe to assume 4GB and over)
>> - affected files were only ever read (and some not even read), never
>> written after being put into storage
>> - in the past I assumed the affected files were down to size, but I have
>> quite a few ISO files and some backups of virtual machines ... no problems
>> there - seems like the problem originates in one folder & size > 2GB &
>> extension .mkv
>>
>> In case some application is the root cause of the issue, I would say
>> try to keep some ro snapshots done by a tool like snapper for example,
>> but maybe you do that already. It sounds also like this is some kernel
>> bug, snapshots won't help that much then I think.
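If some application touching the files does turn out to be the cause, I
could start keeping read-only snapshots of that subvolume whenever new
files go in, so the moment of truncation at least gets pinned down. A
minimal sketch of what I have in mind (the subvolume and snapshot names are
made up, I do not run this yet):

btrfs subvolume snapshot -r /mnt/share/victim_subvol /mnt/share/victim_subvol-ro-20160706

As you say, though, if it is a kernel bug the snapshots won't help much
beyond that.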