Hi,
I have had this error for some time. It is not easy to reproduce, so I
am writing down everything I know at the moment.
I maintain some servers running Xen (4.5.1) with a Gentoo dom0 and recent
kernels (3.18.*, 4.1.6, 4.2.3, 4.2.4). I use the gentoo-sources patchset.
The servers run Xen domUs for www and MySQL.
I have MySQL servers in domUs under high load (lots of reads and writes).
These systems are identical in terms of configuration and kernel.
I get MySQL errors at random (sometimes more than one a day, sometimes
one a week), but they are more frequent under high load.
The MySQL errors occur because a file cannot be read from the
filesystem. If I run md5sum on the file, it reports an I/O error.
At this point mysql stop && umount && mount && mysql start solves the
problem.
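To find out which files are affected at a given moment, the md5sum check
can be applied to the whole mount point. The following is only a minimal
sketch in Python, assuming that a failed read surfaces as an OSError (for
an I/O error, errno.EIO); the function name find_unreadable is
hypothetical and not part of any existing tool.

```python
import errno
import os


def find_unreadable(root, chunk_size=1 << 20):
    """Return (path, errno) pairs for regular files under root that fail to read.

    Every file is read to the end, like md5sum does, so an error in any
    block of the file is noticed.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs, broken symlinks
            try:
                with open(path, "rb") as fh:
                    while fh.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, exc.errno))
    return failures


def describe(err):
    """Map an errno value to its symbolic name, e.g. errno.EIO -> 'EIO'."""
    return errno.errorcode.get(err, str(err))
```

Calling find_unreadable("/mnt/mysql_naplo_b2") and printing describe(err)
for each result would list the files that currently cannot be read,
together with the error name.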
Calling
echo 3 > /proc/sys/vm/drop_caches
sometimes clears the I/O error, but not every time. Rarely, the problem
goes away on its own without a remount.
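To record how often dropping the caches actually helps, the check can be
automated. This is a hedged sketch in Python, not something I run in
production: the names read_ok and check_with_cache_drop are hypothetical,
it must run as root to write to drop_caches, and it calls sync first
because dirty pages cannot be dropped.

```python
import os


def read_ok(path, chunk_size=1 << 20):
    """Return True if the whole file can be read without an OSError."""
    try:
        with open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except OSError:
        return False


def check_with_cache_drop(path, drop_caches_file="/proc/sys/vm/drop_caches"):
    """Read a file; on failure drop the caches once and try again.

    Returns "ok", "fixed_by_drop_caches" or "still_failing".
    """
    if read_ok(path):
        return "ok"
    os.sync()  # flush dirty data before dropping caches
    with open(drop_caches_file, "w") as fh:
        fh.write("3\n")  # same effect as: echo 3 > /proc/sys/vm/drop_caches
    return "fixed_by_drop_caches" if read_ok(path) else "still_failing"
```

Logging the return value with a timestamp for each failing file would show
whether the cases that survive a cache drop have anything in common.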
The problem seems to have no connection to the dom0 kernel or the Xen
version. For example, I have this problem on these dom0s:
kernel: 3.19.3 xen 4.5.0
kernel: 4.2.3 xen 4.5.1
The problem seems to have started with the 4.0 kernel series, but I am
not sure. In the summer the load was low, and the problem occurred very
rarely.
When the I/O error occurs:
btrfs scrub finds no errors.
There is no memory or HDD/SSD hardware error (SMART, memtest, etc.), more
than one physical server is affected, and there are no errors in dmesg
at all.
I tried different kernel configs, but I don't think I have anything
extraordinary.
I use the deadline scheduler.
I use these mount options:
/dev/xvdb1 on /mnt/mysql_naplo_b2 type btrfs
(rw,noatime,compress=zlib,nossd,noacl,space_cache,subvolid=5,subvol=/)
I tried reformatting the filesystem with recent btrfs-progs (and older
versions before that):
btrfs-progs v4.2.2
I use the default mkfs options (skinny extents).
After reformatting, the problem disappeared for some days. (So it seems
to correlate with the age of the filesystem?)
I defragment the filesystem manually with a script that recursively runs
"filefrag" on each file to count its fragments and defragments the file
if the count is more than 50 and the file is larger than 64 KB. (This
sometimes lowers the frequency of the problem.)
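The script itself is not included here. A minimal sketch of the logic
described above, written in Python, could look like this. It assumes that
the last line of the filefrag output has the form "<path>: N extents
found" and that "btrfs filesystem defragment <file>" is used for the
defragmentation; the function names and the dry_run parameter are
hypothetical.

```python
import os
import re
import subprocess

# Matches the summary line printed by filefrag, e.g. "file: 2 extents found".
EXTENTS_RE = re.compile(r":\s*(\d+) extents? found\s*$")


def parse_extent_count(filefrag_line):
    """Return the extent count from a filefrag summary line, or None."""
    match = EXTENTS_RE.search(filefrag_line)
    return int(match.group(1)) if match else None


def needs_defrag(extents, size_bytes, max_extents=50, min_size=64 * 1024):
    """True if the file has more than 50 extents and is larger than 64 KB."""
    return extents is not None and extents > max_extents and size_bytes > min_size


def defragment_tree(root, dry_run=True):
    """Walk root and defragment files that exceed both thresholds.

    Returns the list of selected paths; with dry_run=True nothing is changed.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            result = subprocess.run(["filefrag", path],
                                    capture_output=True, text=True)
            lines = result.stdout.strip().splitlines()
            extents = parse_extent_count(lines[-1]) if lines else None
            if needs_defrag(extents, os.path.getsize(path)):
                selected.append(path)
                if not dry_run:
                    subprocess.run(["btrfs", "filesystem", "defragment", path])
    return selected
```

With dry_run=True the function only reports which files would be
defragmented, which makes it possible to check the thresholds before
touching a loaded MySQL data directory.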
The unreadable files are usually small, for example:
filefrag:
/mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD: 2
extents found
ls -l:
-rw-rw---- 1 mysql mysql 8092 okt 22 08.24
/mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD
There are no errors in dmesg at all: no I/O errors, no kernel panic, etc.
The (virtual) servers have 3-4 GB of memory, and I use a 2 GB tmpfs for
the temporary tables (so the physical memory usage fluctuates a lot).
The filesystem has no snapshots, but sometimes (for rebuilding
replication) I take one and then delete it. (The problem also happens on
filesystems where no snapshot was ever created.)
I did not try downgrading the kernel (to 3.18); I always try to
upgrade instead.
I guess this problem has some connection to the memory usage (but there
is no out of memory).
I am able to try any debug mode if you suggest one, but the problem is
not reproducible, it happens randomly. I think there should be some
errors in dmesg when I encounter I/O errors, but I am not sure whether
this error is directly connected to btrfs at all. I didn't try other
filesystems.
The problem has occurred with kernel versions 4.0.1, 4.0.4, 4.1.6,
4.2.1, 4.2.3 and 4.2.4.
I checked Bugzilla and searched Google for similar problems, but I
couldn't find any.
This problem (I think it is the same one) sometimes happens on a www
server too, with Apache log files (they are heavily fragmented), but
very rarely. I don't have any problem with this configuration on other
servers, even on MySQL servers with lower load.
I welcome any suggestions.
László Szalma