Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size

2019-03-05 Thread Ilia Zykov
Hello.

>> THAT is a crucial observation.  It's not an LVM bug, but the filesystem
>> trying to read 1024 bytes on a 4096 device.
> Yes, that's probably the reason. Nevertheless, it's not really the FS's fault,
> since it was moved by LVM to a 4096 device.
> The FS does not know anything about the move, so it reads in the block size
> it was created with (1024 in this case).
> 
> I still think LVM should prevent one from mixing devices with different
> physical block sizes, or at least warn when pvmoving or lvextending onto a PV
> with a larger block size, since this can cause trouble.
> 
> 

In this case, the "dd" tool and others should prevent it too.

Because after:

dd if=/dev/DiskWith512block bs=4096 of=/dev/DiskWith4Kblock

you couldn't mount "/dev/DiskWith4Kblock" either; it fails with the same error ;)
(Here /dev/DiskWith512block has an ext4 fs with a 1k block size.)

P.S.
LVM, dd, etc. are low-level tools and don't know anything about the higher
levels; in your case and in other cases they can't know. If you need to, you
should check the block size with other tools before moving or copying.
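For example, the logical and physical block sizes can be checked like this
(a sketch; the device name is a placeholder):

blockdev --getss --getpbsz /dev/DiskWith4Kblock   # logical and physical block size
lsblk -o NAME,LOG-SEC,PHY-SEC                     # the same, for all block devices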
Not an LVM bug.
Thank you.




Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size

2019-03-05 Thread Ingo Franzki
On 05.03.2019 10:29, Ilia Zykov wrote:
> Hello.
> 
>>> THAT is a crucial observation.  It's not an LVM bug, but the filesystem
>>> trying to read 1024 bytes on a 4096 device.
>> Yes, that's probably the reason. Nevertheless, it's not really the FS's fault,
>> since it was moved by LVM to a 4096 device.
>> The FS does not know anything about the move, so it reads in the block size
>> it was created with (1024 in this case).
>>
>> I still think LVM should prevent one from mixing devices with different
>> physical block sizes, or at least warn when pvmoving or lvextending onto a
>> PV with a larger block size, since this can cause trouble.
>>
> 
> In this case, the "dd" tool and others should prevent it too.

Well, no, it's LVM's pvmove that moves the data from a 512 byte block size
device to a 4096 byte block size device.
So it's not dd that's causing the problem, but pvmove.

> 
> Because after:
> 
> dd if=/dev/DiskWith512block bs=4096 of=/dev/DiskWith4Kblock
> 
> you couldn't mount "/dev/DiskWith4Kblock" either; it fails with the same error ;)
> (Here /dev/DiskWith512block has an ext4 fs with a 1k block size.)
> 
> P.S.
> LVM, dd, etc. are low-level tools and don't know anything about the higher
> levels; in your case and in other cases they can't know. If you need to, you
> should check the block size with other tools before moving or copying.
> Not an LVM bug.
Well, maybe not a bug, but I would still like to see LVM's pvmove/lvextend
and/or vgextend issue at least a warning when you have mixed block sizes.

LVM knows the block sizes of the PVs it manages, and it also knows when it
changes the block size of an LV due to a pvmove or lvextend. So it can issue a
warning (or better, a confirmation prompt) when this happens.
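Until such a warning exists, one can check by hand whether a VG mixes block
sizes. A rough sketch (vg0 is a placeholder name):

for pv in $(pvs --noheadings -o pv_name --select vg_name=vg0); do
    # print the logical and physical block size of each PV in the VG
    printf '%s: logical %s, physical %s\n' "$pv" \
        "$(blockdev --getss "$pv")" "$(blockdev --getpbsz "$pv")"
done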
> Thank you.
> 


-- 
Ingo Franzki
eMail: ifran...@linux.ibm.com  
Tel: ++49 (0)7031-16-4648
Fax: ++49 (0)7031-16-3456
Linux on IBM Z Development, Schoenaicher Str. 220, 71032 Boeblingen, Germany



Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size

2019-03-05 Thread David Teigland
On Tue, Mar 05, 2019 at 06:29:31PM +0200, Nir Soffer wrote:
> I don't think this way of thinking is useful. If we go this way, then
> write() should not let you write data, and later maybe the disk controller
> should avoid this?
> 
> LVM is not a low-level tool like dd. It is a high-level tool for managing
> device mapper, providing high-level abstractions to users. We can expect it
> to prevent a system administrator from doing the wrong thing.
> 
> Maybe LVM should let you mix PVs with different logical block sizes, but it
> should require --force.
> 
> David, what do you think?

LVM needs to fix this, your solution sounds like the right one.



Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size

2019-03-05 Thread Stuart D. Gathman

On Tue, 5 Mar 2019, David Teigland wrote:

> On Tue, Mar 05, 2019 at 06:29:31PM +0200, Nir Soffer wrote:
>> Maybe LVM should let you mix PVs with different logical block sizes, but
>> it should require --force.
>
> LVM needs to fix this, your solution sounds like the right one.

Also, since nearly every modern device has a physical block size of
4k or more, and performance degradation occurs with smaller filesystem
blocks even when the logical block size is (emulated) 512, the savvy
admin should ensure that all filesystems have a minimum 4k block size,
except in special circumstances.
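For ext4, for example, the block size can be set at creation time and checked
afterwards (a sketch; the LV path is a placeholder):

mkfs.ext4 -b 4096 /dev/vg0/mylv                # create the fs with a 4k block size
tune2fs -l /dev/vg0/mylv | grep 'Block size'   # verify an existing filesystem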


--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.



Re: [linux-lvm] LVM thin provisioning on encrypted root unreliable

2019-03-05 Thread kurcze

Hello again,

Thank you a lot for your answer. It led me to the solution.


> I'd start with
> rd.udev.debug
> 
> There's a nearly 1 minute delay with at least ata6.0, a.k.a. /dev/sdc.
> I can't imagine what device is taking that long to be discovered. The
> transient nature makes it sound like a race could be happening. So the
> gotcha with debug options is that this can affect the race condition.
> Other options for debugging:
> 
> rd.debug will show dracut/initrd debug messages
> systemd.log_level=debug will show systemd debug messages

After your suggestion I focused on udev (also turned on udev debugging). 
After that I got messages similar to this in dmesg (for various block 
devices):


[   26.374699] systemd-udevd[150]: Process '/sbin/lvm pvscan --cache --activate ay --major 259 --minor 2' failed with exit code 5.


Then I checked /lib/udev/rules.d/69-lvm-metad.rules and found a rule 
with this line:


RUN+="/sbin/lvm pvscan --cache --activate ay --major $major --minor $minor", ENV{LVM_SCANNED}="1"


So udev had already tried to automatically activate the LVs. The LVM scripts
in the initramfs try to do that later anyway, so I changed it to:


RUN+="/sbin/lvm pvscan --cache --major $major --minor $minor", ENV{LVM_SCANNED}="1"


It solved the problem.
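Note that this rule is also baked into the initramfs image, so the image has
to be regenerated after such an edit; on Debian with initramfs-tools that is,
for example:

update-initramfs -u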

Why udev can't cope with LV activation is not quite clear to me. Maybe this
call trace will be of use if somebody wants to look at it:


[  182.362588] systemd-udevd[122]: worker [144] terminated by signal 9 (KILL)

[  242.841504] INFO: task lvm:265 blocked for more than 120 seconds.
[  242.841573]   Not tainted 4.19.0-0.bpo.1-amd64 #1 Debian 4.19.12-1~bpo9+1
[  242.841639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[  242.841714] lvm D    0   265  1 0x
[  242.841781] Call Trace:
[  242.841852]  ? __schedule+0x3f5/0x880
[  242.841918]  ? wait_for_completion+0x140/0x190
[  242.841984]  schedule+0x32/0x80
[  242.842049]  schedule_preempt_disabled+0xa/0x10
[  242.842116]  __mutex_lock.isra.4+0x296/0x4c0
[  242.842194]  ? table_load+0x370/0x370 [dm_mod]
[  242.842265]  ? dm_suspend+0x1f/0xc0 [dm_mod]
[  242.842335]  dm_suspend+0x1f/0xc0 [dm_mod]
[  242.842406]  dev_suspend+0x186/0x220 [dm_mod]
[  242.842478]  ctl_ioctl+0x1b5/0x4b0 [dm_mod]
[  242.842552]  dm_ctl_ioctl+0xa/0x10 [dm_mod]
[  242.842617]  do_vfs_ioctl+0xa2/0x640
[  242.842685]  ? vfs_write+0x144/0x190
[  242.842749]  ksys_ioctl+0x70/0x80
[  242.842813]  __x64_sys_ioctl+0x16/0x20
[  242.842878]  do_syscall_64+0x55/0x110
[  242.842943]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  242.843008] RIP: 0033:0x7f671fc93dd7
[  242.843075] Code: Bad RIP value.
[  242.843137] RSP: 002b:7fffc07ba778 EFLAGS: 0246 ORIG_RAX: 0010
[  242.843214] RAX: ffda RBX: 5600a8beb190 RCX: 7f671fc93dd7
[  242.843279] RDX: 5600a8beb190 RSI: c138fd06 RDI: 0008
[  242.843345] RBP: 000c R08: 7f67203e1648 R09: 7fffc07ba5e0
[  242.843410] R10: 7f67203dc413 R11: 0246 R12: 5600a8beb1c0
[  242.843476] R13: 7f67203e0b53 R14: 5600a8bf6540 R15: 




> This has no rd.luks or rd.lvm hints for dracut to do early activation.
> I can't tell offhand what the layout is: whether you've encrypted
> partitions on each drive and then made the dmcrypt devices PVs, or
> whether the partitions are PVs and you encrypt each LV separately.
> Either method is valid but will make a difference in how it gets
> assembled and therefore why and where it's failing. And anyway it
> seems like that command line needs the proper hints, but I'm not
> convinced that's the central problem, because there's this huge 60
> second gap in the dmesg where udevd is apparently waiting for a drive
> to even appear, and that's pretty strange.
>
> So maybe that gap is when you're entering the LUKS passphrase. So
> maybe the problem is that you enter that in, and dracut is only
> passing the passphrase to one or two devices, and the third device
> isn't there yet, so it never gets unlocked (?), and that's why volume
> assembly fails: one device is just coming up a bit too slow.
>
> In that case you might need a delay somewhere to improve the chance
> the slow device is discovered. But that's speculation. Really we just
> need more information on the storage stack, like a partition by
> partition summary. If you get a successful boot, a sorted blkid (it
> comes out unsorted by default) would be useful. And also can you
> figure out which drive is taking a long time to be discovered? It
> wouldn't happen to be a drive in a USB enclosure, would it?




Actually I used initramfs-tools (standard on Debian) to generate the initrd
image. The passphrase prompt happens after LVM activates the devices (and it
hung before that). The layout is:


Normal type 83 partitions are PVs

These PVs are pooled into VGs

LVs are made from these VGs

LVs are LUKS encrypted

LUKS devices are ext4 formatted
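
A stack like this can be inspected with, for example:

lsblk -o NAME,TYPE,FSTYPE,SIZE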


Anyway. Thanks. Would be inte