On 2016-05-07 12:11, Niccolò Belli wrote:
On 2016-05-07 17:58, Clemens Eisserer wrote:
Hi Niccolo,
btrfs + dmcrypt + compress=lzo + autodefrag = corruption at first boot
Just to be curious - couldn't it be a hardware issue? I use almost the
same setup (compress-force=lzo instead of compress=lzo) on my
laptop for 2-3 years and haven't experienced any issues since
~kernel-3.14 or so.
Br, Clemens Eisserer
Hi,
Which kind of hardware issue? I did a full memtest86 check, a full
smartmontools extended check and even a badblocks -wsv.
If this is really a hardware issue that we can identify, I would be more
than happy, because Dell will replace my laptop and this nightmare will
finally be over. I'm open to suggestions.
First, some general advice:
1. It is entirely possible to have bad RAM that still passes memtest86
consistently, and in fact, most of the time this will be the case (if
you're seeing anything other than the bit-fade test in memtest86 fail,
then your system probably won't boot fully). Memtest doesn't replicate
typical usage patterns very well. My usual testing for RAM involves not
just memtest, but also booting into a LiveCD (usually SystemRescueCD),
pulling down a copy of the kernel source, and then running as many
concurrent kernel builds as cores, each with as many make jobs as cores
(so on a quad-core CPU, or a dual-core with hyperthreading, it would be
running 4 builds with -j4 passed to make). GCC seems to
have memory usage patterns that reliably trigger memory errors that
aren't caught by memtest, so this generally gives good results.
Secondarily, if it's a big system and I am not pressed for time, I do a
quick Gentoo install with Xen, and then spin up twice as many Xen VMs
as cores and run memtest in those concurrently (this seems to catch
things a bit more reliably than just a plain memtest).
2. On a similar note, badblocks doesn't replicate filesystem-like access
patterns; it just runs sequentially through the entire disk. This is
less likely to give misleading results than memtest, but it's still
important to know. In
particular, try running it over a dmcrypt volume a couple of times
(preferably with a different key each time, pulling keys from
/dev/urandom works well for this), as that will result in writing
different data. For what it's worth, when I'm doing initial testing of
new disks, I always use ddrescue to copy /dev/zero over the whole disk,
then do it twice through dmcrypt with different keys, copying from the
disk to /dev/null after each pass. This gives random data on disk as a
starting point (which is good if you're going to use dmcrypt), and
usually triggers reallocation of any bad sectors as early as possible.
If I have time and access to an existing system I can connect the disk
to, I often do testing with fio as well.
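As a rough illustration of the parallel kernel build described in item 1 above, here is a minimal Python sketch. It is not the exact procedure I run, just one way to script it. The assumptions are mine: the source tree is a freshly unpacked (clean) kernel tree, each build gets its own output directory via the kernel's O= option, defconfig is an acceptable configuration, and the names plan_builds, run_builds and the build-N directory layout are purely hypothetical.

```python
import os
import subprocess
from pathlib import Path


def plan_builds(source_dir, work_dir, cores):
    """Return one (out_dir, configure_cmd, build_cmd) triple per core.

    Every build gets its own output directory and runs with -j<cores>,
    so a CPU with 4 hardware threads ends up with 4 builds of 4 jobs each.
    """
    source = Path(source_dir).resolve()
    plans = []
    for i in range(cores):
        # Hypothetical layout: <work_dir>/build-0, build-1, ...
        out_dir = Path(work_dir).resolve() / f"build-{i}"
        base = ["make", "-C", str(source), f"O={out_dir}"]
        plans.append((out_dir, base + ["defconfig"], base + [f"-j{cores}"]))
    return plans


def run_builds(source_dir, work_dir, cores=None):
    """Start all builds at once and return their exit codes."""
    # os.cpu_count() counts logical CPUs, so hyperthreads are included.
    cores = cores or os.cpu_count() or 1
    running = []
    for out_dir, configure, build in plan_builds(source_dir, work_dir, cores):
        out_dir.mkdir(parents=True, exist_ok=True)
        # Assumption: a default config is enough to keep GCC busy.
        subprocess.run(configure, check=True)
        running.append(subprocess.Popen(build))
    return [proc.wait() for proc in running]
```

Calling run_builds("linux", "/tmp/stress") would start the builds and return one exit code per build. A non-zero exit code (or a compiler crash in the log) on sources that are known to build is what I would treat as a hint of memory trouble.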
Now, to slightly more specific advice:
1. If you have an eSATA port, try plugging your hard disk in there and
see if things work. If that works but having the hard drive plugged in
internally doesn't, then the issue is probably either that specific SATA
port (in which case your chipset is bad and you should get a new
system), or the SATA connector itself (or the wiring, but that's not as
likely when it's traces on a PCB). Normally I'd suggest just swapping
cables and SATA ports, but that's not really possible with a laptop.
2. If you have access to a reasonably large flash drive, or to a USB to
SATA adapter, try that as well. If it works on that but not internally
(or on an eSATA port), you've probably got a bad SATA controller, and
should get a new system.
3. Try things without dmcrypt. Adding extra layers makes it harder to
determine what is actually wrong. If it works without dmcrypt, try
using different parameters for the encryption (different ciphers are what
I would try first). If it works reliably without dmcrypt, then it's
either a bug in dmcrypt (which I don't think is very likely), or it's
a bad interaction between dmcrypt and BTRFS. If it works with some
encryption parameters but not others, then that will help narrow down
where the issue is.
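To make the reasoning in the three steps above easier to follow, here is a small Python sketch that maps test outcomes to the conclusions I gave. It is only a way of writing the decision logic down, not a diagnostic tool. The function name interpret, the dictionary keys and the message strings are all hypothetical, True means "no corruption seen", a missing key means "not tested", and the cipher names in the usage note are just examples.

```python
PORT_OR_CONNECTOR = "internal SATA port (chipset) or SATA connector is suspect"
CONTROLLER = "SATA controller is suspect"
DMCRYPT_OR_INTERACTION = "dmcrypt bug (unlikely) or dmcrypt/BTRFS interaction"
PARAMETERS = "issue depends on the encryption parameters"


def interpret(results):
    """Map observed outcomes to the conclusions from steps 1-3.

    `results` may contain the boolean keys "internal", "esata", "usb" and
    "without_dmcrypt", plus "ciphers", a dict of cipher name -> bool.
    """
    findings = []
    internal = results.get("internal")
    esata = results.get("esata")
    usb = results.get("usb")

    if internal is False and esata is True:
        # Step 1: works over eSATA but not when plugged in internally.
        findings.append(PORT_OR_CONNECTOR)
    elif usb is True and (internal is False or esata is False):
        # Step 2: works over USB but not on a SATA or eSATA port.
        findings.append(CONTROLLER)

    if results.get("without_dmcrypt") is True:
        # Step 3: the problem goes away once dmcrypt is out of the stack.
        findings.append(DMCRYPT_OR_INTERACTION)

    ciphers = results.get("ciphers", {})
    if ciphers and any(ciphers.values()) and not all(ciphers.values()):
        # Step 3, continued: some parameters work and others do not.
        findings.append(PARAMETERS)

    return findings
```

For example, interpret({"without_dmcrypt": True, "ciphers": {"aes-xts-plain64": False, "serpent-xts-plain64": True}}) reports both the dmcrypt conclusion and the parameter dependence, which is the case where comparing the failing and working parameters should narrow things down.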