Hi Gang,

The following patch sent to the list should fix the issue.

https://patchwork.kernel.org/patch/10002583/

Thanks,
Ashish


On 10/27/2017 02:47 AM, Gang He wrote:
> Hello Guys,
>
> I got a bug report from a customer: the fstrim command corrupted an ocfs2
> file system on their SSD SAN, and the file system became read-only. The SSD
> LUN was configured with multipath.
> After unmounting the file system, the customer ran fsck.ocfs2 on it; the
> file system could then be mounted again until the next fstrim run.
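> For reference, the recovery sequence was roughly the following (the mount
> point /xensan1 and device dm-5 are taken from the log below; the exact
> names come from the customer's setup):
> rz-xen10:/ # umount /xensan1
> rz-xen10:/ # fsck.ocfs2 -y /dev/dm-5    # repair, answering yes to all fixes
> rz-xen10:/ # mount /dev/dm-5 /xensan1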
> The error messages were as follows:
> 2017-10-02T00:00:00.334141+02:00 rz-xen10 systemd[1]: Starting Discard unused 
> blocks...
> 2017-10-02T00:00:00.383805+02:00 rz-xen10 fstrim[36615]: fstrim: /xensan1: 
> FITRIM ioctl fehlgeschlagen: Das Dateisystem ist nur lesbar
> [German: "FITRIM ioctl failed: the file system is read-only"]
> 2017-10-02T00:00:00.385233+02:00 rz-xen10 kernel: [1092967.091821] OCFS2: 
> ERROR (device dm-5): ocfs2_validate_gd_self: Group descriptor #8257536 has 
> bad signature  <<== here
> 2017-10-02T00:00:00.385251+02:00 rz-xen10 kernel: [1092967.091831] On-disk 
> corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
> 2017-10-02T00:00:00.385254+02:00 rz-xen10 kernel: [1092967.091836] 
> (fstrim,36615,5):ocfs2_trim_fs:7422 ERROR: status = -30
> 2017-10-02T00:00:00.385854+02:00 rz-xen10 systemd[1]: fstrim.service: Main 
> process exited, code=exited, status=32/n/a
> 2017-10-02T00:00:00.386756+02:00 rz-xen10 systemd[1]: Failed to start Discard 
> unused blocks.
> 2017-10-02T00:00:00.387236+02:00 rz-xen10 systemd[1]: fstrim.service: Unit 
> entered failed state.
> 2017-10-02T00:00:00.387601+02:00 rz-xen10 systemd[1]: fstrim.service: Failed 
> with result 'exit-code'.
>
> A similar bug appears to be
> https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1681410 .
> Then I tried to reproduce this bug locally.
> Since I do not have an SSD SAN, I used a PC server that has an SSD disk.
> I set up a two-node ocfs2 cluster in VMs on this server and attached the
> SSD disk to each VM instance twice, so that I could configure the disk
> with the multipath tool. The configuration on each node looks like this:
> sle12sp3-nd1:/ # multipath -l
> INTEL_SSDSA2M040G2GC_CVGB0490002C040NGN dm-0 ATA,INTEL SSDSA2M040
> size=37G features='1 retain_attached_hw_handler' hwhandler='0' wp=rw
> |-+- policy='service-time 0' prio=0 status=active
> | `- 0:0:0:0 sda 8:0  active undef unknown
> `-+- policy='service-time 0' prio=0 status=enabled
>    `- 0:0:0:1 sdb 8:16 active undef unknown
>
> Next, I ran fstrim commands from each node simultaneously,
> and I also ran dd commands to write data to the shared SSD disk while the
> fstrim commands were running.
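> For reference, the load was along these lines (the mount point /mnt/ocfs2,
> the file name, and the looping are illustrative, not the exact commands):
> # on both nodes at the same time:
> sle12sp3-nd1:/ # while true; do fstrim -v /mnt/ocfs2; done
> # on one node, concurrent writes:
> sle12sp3-nd1:/ # dd if=/dev/zero of=/mnt/ocfs2/testfile bs=1M count=1024 conv=fsync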
> But I could not reproduce the issue; everything worked fine.
>
> So I'd like to ask the list: has anyone else encountered this bug? If so,
> please share any information you have.
> I think three factors may be involved: the SSD device type, the multipath
> configuration, and simultaneous fstrim runs.
>
> Thanks a lot.
> Gang


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
