Bug#549660: md raid1 + lvm2 + snapshot resulted in lvm2 hang

2009-10-05 Thread Phil Ten
Package: lvm2
Version: 2.02.39-7
Severity: important

Hello,

This problem looks similar to #419209, but I believe
it is different: in my case the snapshot
was created successfully, and the volume stalled
later, while writing to the snapshot.

I am using Proxmox 1.3:
Linux ns300364.ovh.net 2.6.24-7-pve #1 SMP PREEMPT Fri Aug 21 09:07:39 CEST 2009 x86_64 GNU/Linux

The system was installed a couple of weeks ago from scratch and is not an upgrade.
Snapshots worked fine until the problem occurred,
despite no changes to the disk configuration.


My configuration:

md1: md RAID 1 + ext3, mounted as /
md0: md RAID 1 + lvm2, divided into 2 ext3 volumes, vmdata and vmbackups,
mounted as /var/lib/vz and /backups.


r...@ns300364:/backups/tmp# lvdisplay
  --- Logical volume ---
  LV Name                /dev/data/vmdata
  VG Name                data
  LV UUID                9CzFBp-k7fV-wlls-qeeG-v7Or-u1pq-9XhKKy
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                309.57 GB
  Current LE             79250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:1

  --- Logical volume ---
  LV Name                /dev/data/vmbackup
  VG Name                data
  LV UUID                jzCjXx-IodU-chBx-Aw3L-JUbv-dRho-vaOFCl
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                309.57 GB
  Current LE             79250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:4

r...@ns300364:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Tue Sep 15 17:48:43 2009
     Raid Level : raid1
     Array Size : 664986496 (634.18 GiB 680.95 GB)
  Used Dev Size : 664986496 (634.18 GiB 680.95 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Oct  4 02:02:04 2009
          State : active, recovering
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

 Rebuild Status : 29% complete

           UUID : ab296276:ea3e622e:7008e345:84b8f442 (local to host ns300364.ovh.net)
         Events : 0.17

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3

Symptoms:

- snapshot creation for vmdata: OK
- backup onto vmbackups started: OK
- after writing about 1 GB, the snapshot stalled. By that I mean that any
request to read a file on the lvm volume hangs.
However, ls and cd do work and I can get directory listings.
Any command that reads file contents (e.g. cat, cp, mv) stalls the ssh session.
In particular, cat /backups/phil.log also stalls the ssh session.
Remember that the snapshot is for volume vmdata, while the cat above concerns
volume vmbackups.
- smartctl does not report any problem (including the long test)
- wa in top is stuck at 99%; CPU usage is near zero
- the snapshot is visible in /dev/mapper
- the snapshot cannot be removed (lvremove -f); again no error is reported, it
just hangs with no output at all
- the system seems to work fine as long as nothing tries to read from either of
the 2 lvm2 volumes
- no errors are reported in messages or syslog
- it seems an md check started after the snapshot creation. This check also
stalled, at 29% (speed=0K/sec); again no error was reported
- a soft reboot did not work
- a hard reboot worked, but an md resync started and stalled at 0.1%, leaving the
system in the same state as before the hard reboot

To recover a working system I marked sdb3 as faulty, removed it from the raid1
array, and hard rebooted. That worked, and I could remove the snapshot and
access the data on both lvm volumes. Since then I have not tried to create a
snapshot, and the system seems to work fine.
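The recovery steps above correspond roughly to the following commands (device names as in the mdadm output above). This is a sketch of what I did, not a recipe to run blindly on a healthy array:

```shell
# Sketch of the recovery sequence (approximate, not verbatim):
mdadm --manage /dev/md0 --fail /dev/sdb3     # mark one mirror half faulty
mdadm --manage /dev/md0 --remove /dev/sdb3   # drop it from the raid1
reboot -f                                    # hard reboot; array comes up degraded
# afterwards: lvremove the stale snapshot, then re-add sdb3 so md can resync
```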

Greetings,

Phil Ten



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#549660: md raid1 + lvm2 + snapshot resulted in lvm2 hang

2009-10-05 Thread Phil Ten
Thank you for your post.

I assume it won't change your opinion on this
problem, but just to correct my earlier post: I made
an incorrect statement. When the problem
occurred, the application was copying from the
snapshot to the vmbackup volume. Therefore
the 1 GB was read from the snapshot, not
written to the snapshot as I said previously.

To my knowledge it seems impossible that
the snapshot was full. Besides, I always thought
that a full snapshot was automatically disabled
while the origin file system remained usable.

In this case, nothing was reported in the logs, and the complete
lvm2 volume group became unusable (reads and writes hang),
including the second volume not involved in the snapshot.

I can't believe that a full snapshot would affect other
lvm2 volumes, but I suppose it may be possible.

Thanks anyway.

Phil Ten




