random i/o error without error in dmesg

Szalma László Mon, 26 Oct 2015 04:33:51 -0700

Hi,

I have had this error for some time. It is not easy to reproduce, so I am writing down everything I know at the moment.

I maintain some servers running Xen (4.5.1) with a Gentoo dom0 and recent kernels (3.18.*, 4.1.6, 4.2.3, 4.2.4). I use the gentoo-sources patchset.

They run Xen domUs for www and MySQL.

I have MySQL servers in domUs under high load (lots of reads and writes). These systems are identical in terms of configuration and kernel.

Sometimes I get MySQL errors at random (sometimes more than one a day, sometimes one a week), but they are more frequent under high load.

The MySQL errors occur because a file cannot be read from the filesystem. If I run md5sum on it, it reports an I/O error.

At this point mysql stop && umount && mount && mysql start solves the problem.


Calling
echo 3 > /proc/sys/vm/drop_caches

sometimes solves the I/O error, but not every time. Rarely, the problem goes away on its own without a remount.
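The md5sum check described above can be extended to a whole mount point to list every file that currently fails to read. This is only a sketch: the function names check_readable and list_unreadable are made up for illustration, and it reports nothing more than what a plain read returns at that moment.

```shell
# check_readable FILE: succeed only if the whole file can be read.
# Any read failure (including EIO) makes cat exit non-zero.
check_readable() {
    cat -- "$1" > /dev/null 2>&1
}

# list_unreadable DIR [CHECK]: print every regular file under DIR for
# which CHECK fails. CHECK defaults to check_readable; it is a parameter
# only so that the loop can be tried with a different check.
# Note: file names containing newlines are not handled.
list_unreadable() {
    dir=$1
    check=${2:-check_readable}
    find "$dir" -type f -print | while IFS= read -r f; do
        "$check" "$f" || printf '%s\n' "$f"
    done
}
```

For example, list_unreadable /mnt/mysql_naplo_b2 would print the affected files, one per line, and print nothing if every file reads cleanly.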

The problem seems to have no connection to the dom0 kernel or the Xen version. I have this problem, for example, on these dom0s:


kernel: 3.19.3  xen 4.5.0
kernel: 4.2.3 xen 4.5.1

The problem seems to have started with the kernel 4.0 series, but I am not sure. In the summer the load was low, and the problem occurred very rarely.


When the I/O error occurs:
btrfs scrub finds no errors.

There are no memory or HDD/SSD hardware errors (SMART, memtest, etc.; more than one physical server is affected) and no errors in dmesg at all. I tried different kernel configs, but I don't think I have anything extraordinary.

I use the deadline scheduler.
I use these mount options:

/dev/xvdb1 on /mnt/mysql_naplo_b2 type btrfs (rw,noatime,compress=zlib,nossd,noacl,space_cache,subvolid=5,subvol=/)

I tried reformatting the filesystem with a recent btrfs-progs (and older ones before that):

btrfs-progs v4.2.2
I use default mkfs options (skinny extents)

After the format the problem disappeared for some days (it seems to correlate with the age of the filesystem?).

I defragment the filesystem manually with a script that simply walks the tree recursively, uses "filefrag" to count the extents of each file, and defragments the file if it has more than 50 extents and is larger than 64 kbyte (this sometimes lowers the frequency of the problem).
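The defragmentation rule described above could look roughly like the sketch below. It is not the actual script: the function names are invented for illustration, "64 kbyte" is assumed to mean 65536 bytes, and it relies on filefrag printing a summary line of the form "<path>: N extents found" and on GNU stat for the file size.

```shell
# extent_count LINE: extract N from a filefrag summary line such as
# "/path/file: 2 extents found" (or "1 extent found").
extent_count() {
    printf '%s\n' "$1" | sed -n 's/.*: \([0-9][0-9]*\) extents\{0,1\} found$/\1/p'
}

# needs_defrag EXTENTS SIZE_BYTES: true if the file has more than 50
# extents and is larger than 64 kbyte (assumed to be 65536 bytes).
needs_defrag() {
    [ "$1" -gt 50 ] && [ "$2" -gt 65536 ]
}

# defrag_tree DIR: walk DIR and defragment every file that qualifies.
# File names containing newlines are not handled.
defrag_tree() {
    find "$1" -type f -print | while IFS= read -r f; do
        n=$(extent_count "$(filefrag "$f" 2>/dev/null)")
        s=$(stat -c %s -- "$f")
        if [ -n "$n" ] && [ -n "$s" ] && needs_defrag "$n" "$s"; then
            btrfs filesystem defragment "$f"
        fi
    done
}
```

With these thresholds a small file like the 8092-byte example with 2 extents is never touched; only larger, heavily fragmented files are passed to btrfs filesystem defragment.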

The unreadable files are usually small files, for example:

filefrag:

/mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD: 2 extents found

ls -l:

-rw-rw---- 1 mysql mysql 8092 okt 22 08.24 /mnt/mysql_naplo_b2/mozanaplo_boly_altisk_2015/n_helyettesites.MYD


There is nothing in dmesg at all: no I/O errors, no kernel panic, etc.
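Since dmesg stays silent, one option is to save the relevant part of the kernel log at the moment a read fails, so that it can be compared with a snapshot taken while everything works. A minimal sketch, with invented function names; the grep patterns are an assumption and certainly not a complete list of what the kernel can print:

```shell
# io_related_lines: filter kernel log text on stdin, keeping lines that
# mention btrfs, blkfront, or an I/O error (case-insensitive).
# The pattern list is a guess for illustration.
io_related_lines() {
    grep -i -E 'btrfs|blkfront|i/o error'
}

# snapshot_dmesg OUTFILE: store the filtered kernel log in OUTFILE.
# Returns non-zero when nothing matched (grep's exit status).
snapshot_dmesg() {
    dmesg | io_related_lines > "$1"
}
```

Running snapshot_dmesg right after md5sum reports the error, and again after a remount, would at least show whether any matching line appeared in between.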

The (virtual) servers have 3-4 GB of memory, and I use a 2 GB tmpfs for the temporary tables (so the physical memory usage is somewhat erratic).

The filesystem has no snapshots, but sometimes (for rebuilding replication) I take one and then delete it (the problem also happens on filesystems where no snapshot was ever created).

I did not try downgrading the kernel (to 3.18), but I always try to upgrade.

I guess this problem has some connection to memory usage (but there is no out-of-memory condition).

I am able to try any debug mode if you suggest one, but the problem is not reproducible; it happens randomly. I think there should be some errors in dmesg if I encounter I/O errors, but I am not sure whether this error is directly connected to btrfs at all. I didn't try other filesystems.

The problem has occurred with kernel versions 4.0.1, 4.0.4, 4.1.6, 4.2.1, 4.2.3 and 4.2.4.

I checked the bugzilla and searched Google for similar problems, but I couldn't find anything similar.

This problem (I think it is the same one) sometimes happens on a www server too, with Apache log files (they are heavily fragmented), but very rarely. I don't have any problem with this configuration on other servers, even MySQL servers with lower load.


I welcome any suggestions.

László Szalma
