[Kernel-packages] [Bug 1756315] Re: fstrim and discard operations take too long to complete - Ubuntu 16.04

2020-09-14 Thread Matthew Ruffell
Hello Alexandre,

I tried to reproduce this bug, and I believe it has been fixed.

I started an i3.4xlarge instance on AWS, with Xenial as the distro:

$ uname -rv
4.4.0-1112-aws #124-Ubuntu SMP Fri Jul 24 11:10:25 UTC 2020

From there, I checked the NVMe disks:

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk 
`-xvda1 202:1    0    8G  0 part /
nvme0n1 259:0    0  1.7T  0 disk 
nvme1n1 259:1    0  1.7T  0 disk

Made a RAID 0 array:

$ sudo mdadm --create --verbose --level=0 /dev/md0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
mdadm: chunk size defaults to 512K
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started
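
As a quick sanity check before formatting (not captured in the paste above),
/proc/mdstat and mdadm --detail should show both NVMe devices active in the
raid0 array:

$ cat /proc/mdstat
$ sudo mdadm --detail /dev/md0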

And formatted it:

$ time sudo mkfs.xfs /dev/md0
meta-data=/dev/md0               isize=512    agcount=32, agsize=28989568 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=927666176, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=452968, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    0m24.414s
user    0m0.000s
sys     0m7.664s

I also tried fstrim:

$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
$ sudo fstrim /mnt/disk
$ time sudo fstrim /mnt/disk

real    0m22.083s
user    0m0.000s
sys     0m7.560s
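
If anyone still sees slow trims, it may also be worth comparing the discard
limits that md0 and the underlying NVMe disks advertise; a very small
discard_max_bytes on the md device would mean discards get split into many
tiny requests. These are standard block-layer sysfs attributes:

$ cat /sys/block/md0/queue/discard_granularity
$ cat /sys/block/md0/queue/discard_max_bytes
$ cat /sys/block/nvme0n1/queue/discard_max_bytes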

Things look okay; this appears to have been fixed somewhere between
4.4.0-1052-aws and 4.4.0-1112-aws.

I'm going to mark this bug as Fix Released, but let me know if you still
have problems.
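
If anyone is still running one of the affected 4.4 -aws kernels, updating to
the current Xenial AWS kernel should pick up the fix; roughly (assuming the
stock linux-aws metapackage is installed):

$ sudo apt-get update
$ sudo apt-get install linux-aws
$ sudo reboot
$ uname -rv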

** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: linux-aws (Ubuntu)

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu)
   Status: New => Fix Released

** Changed in: linux (Ubuntu Xenial)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1756315

Title:
  fstrim and discard operations take too long to complete - Ubuntu 16.04

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  1-) Ubuntu Release : Ubuntu 16.04 LTS

  2-) linux-image-4.4.0.1052.54-aws

  3-) mkfs.xfs and fstrim -v on a raid0 array using nvme and md should
  not take more than a few seconds to complete.

  4-) Formatting the raid0 array with xfs took around 2 hours. Running
  fstrim -v on the mount point mounted on top of the raid array took
  around 2 hours.

  How to reproduce the issue:

  - Launch an i3.4xlarge instance on Amazon AWS using an Ubuntu 16.04 AMI
  (ami-78d2be01 on EU-WEST-1). This will generate an instance with one 8 GB
  EBS root volume and two 1.9T SSD drives that are presented to the instance
  using the nvme driver.
  - Compose a raid0 array with the following command:

   # mdadm --create --verbose --level=0 /dev/md0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

  - When trying to format the raid0 array ( /dev/md0 ) using xfs it
  takes around 2 hours to complete. I tried other AMIs like RHEL7,
  CentOS7 and Ubuntu 18.04 and the time needed was around 2 seconds.

  root@ip-172-31-30-133:~# time mkfs.xfs /dev/md0

  real    120m45.725s
  user    0m0.000s
  sys     0m18.248s

  - Running fstrim -v on a filesystem mounted on top of /dev/md0 can
  take around 2 hours to complete. With other AMIs like RHEL7, CentOS7
  and Ubuntu 18.04 the time needed was around 2 seconds.

  - When I try the same with any of the nvme SSD devices alone, let's
  say /dev/nvme0n1, the issue doesn't happen.

  - I tried to replicate this issue using LVM and striping; fstrim and
  mkfs.xfs complete without taking hours:

  root@ip-172-31-27-69:~# pvcreate /dev/nvme0n1
Physical volume "/dev/nvme0n1" successfully created

  root@ip-172-31-27-69:~# pvcreate /dev/nvme1n1
Physical volume "/dev/nvme1n1" successfully created

  root@ip-172-31-27-69:~# vgcreate raid0 /dev/nvme0n1 /dev/nvme1n1
Volume group "raid0" successfully created

  root@ip-172-31-27-69:~# lvcreate --type striped --stripes 2 --extents 100%FREE raid0 /dev/nvme0n1 /dev/nvme1n1
Using default stripesize 64.00 KiB.
Logical volume "lvol0" created.

  root@ip-172-31-27-69:~# vgchange -ay
1 logical volume(s) in volume group "raid0" now active

  root@ip-172-31-27-69:~# lvchange -ay /dev/raid0/lvol0

  root@ip-172-31-27-69:~# lvs -a /dev/raid0/lvol0
  LV    VG    Attr   LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0 raid0 -wi-a- 3.46t

  root@ip-172-31-27-69:~# time mkfs.xfs /dev/raid0/lvol0
  meta-data=/dev/raid0/lvol0       isize=512    agcount=32, agsize=28991664 blks
           =                       sectsz=512   attr=2, projid32bit=1
           =                       crc=1        finobt=1, sparse=0
  data     =                       bsize=4096   blocks=927733248, imaxpct=5
           =                       sunit=16     swidth=32 blks
  naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
  log      =internal log           bsize=4096   blocks=453008, version=2
           =                       sectsz=512   sunit=16 blks, lazy-count=1
  realtime =none                   extsz=4096   blocks=0, rtextents=0

  real    0m2.926s
  user    0m0.180s
  sys     0m0.000s

  root@ip-172-31-27-69:~# mount /dev/raid0/lvol0 /mnt

  root@ip-172-31-27-69:~# time fstrim -v /mnt
  /mnt: 3.5 TiB (3798138650624 bytes) trimmed

  real    0m1.794s
  user    0m0.000s
  sys     0m0.000s

  So the issue only happens when using nvme and md to compose the raid0
  array.

  Below follows some information that may be useful:

  Started formatting the md array with mkfs.xfs. The process looks hung.

  root@ip-172-31-24-66:~# ps aux | grep -i mkfs.xfs
  root   1693 12.0  0.0  12728   968 pts/1    D+   07:54   0:03 mkfs.xfs /dev/md0

  PID 1693 is in uninterruptible sleep ( D ).

  Looking at /proc/1693/stack:

  root@ip-172-31-24-66:~# cat /proc/1693/stack
  [] blkdev_issue_discard+0x232/0x2a0
  [] blkdev_ioctl+0x61d/0x7d0
  [] block_ioctl+0x41/0x50
  [] do_vfs_ioctl+0x2e3/0x4d0
  [] SyS_ioctl+0x81/0xa0
  [] system_call_fastpath+0x1a/0x1f
  [] 0x

  Looking at the stack, it looks like it is hung on a discard operation.

  root@ip-172-31-24-66:~# ps -flp 1693
  F S UID  PID   PPID  C PRI  NI ADDR SZ    WCHAN  STIME TTY   TIME     CMD
  4 D root 1693  1682  2  80   0 -    3182 blkdev 07:54 pts/1 00:00:03 mkfs.xfs /dev/md0

  root@ip-172-31-24-66:~# cat /proc/1693/wchan
  blkdev_issue_discard

  Process stuck with function --> blkdev_issue_discard

[Kernel-packages] [Bug 1756315] Re: fstrim and discard operations take too long to complete - Ubuntu 16.04

2019-08-23 Thread Lukas Dzunko
Update to my previous comment: running fstrim on 4.15.0-58-generic
results in much more reported trimmed data (206.6 GiB) than on
4.15.0-55-generic (144.4 GiB), i.e. this was not caused by snapshot
release. On 4.15.0-58-generic I see errors like this in the kernel log:

BTRFS warning (device dm-0): failed to trim 307 block group(s), last error -512
BTRFS warning (device dm-0): failed to trim 1 device(s), last error -512

Scrub finished with no problems, so the data on disk are fine (yeah!).
(Error -512 here appears to be -ERESTARTSYS, i.e. the trim ioctl was
interrupted rather than failing on the device.)

[Kernel-packages] [Bug 1756315] Re: fstrim and discard operations take too long to complete - Ubuntu 16.04

2019-08-23 Thread Lukas Dzunko
I can confirm this bug on Ubuntu 16.04.6 LTS. Apparently it is related
to the latest kernel update (in my case HWE).

---
Affected version:

Linux version 4.15.0-58-generic (buildd@lgw01-amd64-037) (gcc version
5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #64~16.04.1-Ubuntu SMP
Wed Aug 7 14:10:35 UTC 2019


root@lukas:~# time fstrim -v /
/: 206.6 GiB (221849395200 bytes) trimmed

real    22m18.448s
user    0m0.001s
sys     0m18.827s
root@lukas:~# 

lukas@lukas:~$ collectl -s d -i 1
waiting for 1 second sample...
#<--Disks--->
#KBRead  Reads KBWrit Writes 
      0      0      0      0
      8      1      0      0
      8      1   4344   1083
      0      0  21848   1939
      0      0  26420   1360
      0      0  24468   1275

... Normally I see a spike in writes and fstrim is done within seconds. On
4.15.0-58-generic it takes 20 to 30 minutes to complete. During this
time I see writes to disk at around 25-30 MB/s, and the overall system
feels slow (UI response, load times, speed of file operations).

---
Unaffected version:

Linux version 4.15.0-55-generic (buildd@lgw01-amd64-038) (gcc version
5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #60~16.04.2-Ubuntu SMP
Thu Jul 4 09:03:09 UTC 2019


root@lukas:~# time fstrim -v /
/: 144.4 GiB (155019710464 bytes) trimmed

real    0m3.185s
user    0m0.006s
sys     0m0.000s
root@lukas:~# 

lukas@lukas:~$ collectl -s d -i 1
waiting for 1 second sample...
#<--Disks--->
#KBRead  Reads KBWrit Writes 
      0      0      0      0
      0      0 86398K     71
      0      0 61439K     45
      0      0      0      0
      0      0      0      0

... I rebooted my laptop to the previous kernel version, i.e. same
userspace, only the kernel changed. fstrim finished within a few seconds.
collectl and the system load indicator showed much higher write throughput
during the fstrim operation.
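
(If anyone else needs to drop back to the previous kernel just once for
testing, grub-reboot can do it without changing the default; the menu entry
title below is only an example and has to match what is actually in
/boot/grub/grub.cfg:)

$ grep -o "menuentry '[^']*4.15.0-55[^']*'" /boot/grub/grub.cfg
$ sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 4.15.0-55-generic"
$ sudo reboot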

---

I see the same behaviour on both my laptops. Both are configured as SSD
-> partitions -> LUKS -> btrfs. The primary laptop has a "SanDisk
SD7SN6S-512G-1006" disk; the backup laptop a "Micron C400 RealSSD 256GB
mSATA". I don't see this problem on disks attached via iSCSI.
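
(With a stack like SSD -> partitions -> LUKS -> btrfs it is also worth
confirming that discards are passed through the dm-crypt layer at all; lsblk
can show the discard granularity and maximum per layer, and all-zero values
on the crypt device would mean fstrim never reaches the SSD. The device name
below is just a placeholder:)

$ lsblk --discard
$ sudo cryptsetup status <mapped-crypt-device>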

---

... While writing this post I noticed that a different amount of
"trimmed" data is reported. This may be due to snapshots being released
(I was capturing data roughly at the time daily snapshots are released)
and not some kind of destructive problem ...

[Kernel-packages] [Bug 1756315] Re: fstrim and discard operations take too long to complete - Ubuntu 16.04

2018-12-05 Thread Chris Trotman
I did some quick testing with different kernel versions on Ubuntu
14.04. The issue seems to exist in versions up to 4.10, but with 4.11 it
is fixed.
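
For 16.04 users who want a >= 4.11 kernel without a release upgrade, the HWE
stack should be an option; a rough sketch, assuming the standard Xenial HWE
metapackage:

$ sudo apt-get update
$ sudo apt-get install --install-recommends linux-generic-hwe-16.04
$ sudo reboot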

[Kernel-packages] [Bug 1756315] Re: fstrim and discard operations take too long to complete - Ubuntu 16.04

2018-12-04 Thread Chris Trotman
I ran into this issue on 14.04 and 16.04 as well. Ubuntu 18.04 does not
seem to be affected.

[Kernel-packages] [Bug 1756315] Re: fstrim and discard operations take too long to complete - Ubuntu 16.04

2018-12-04 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-aws (Ubuntu)
   Status: New => Confirmed

[Kernel-packages] [Bug 1756315] Re: fstrim and discard operations take too long to complete - Ubuntu 16.04

2018-03-21 Thread Alexandre Makoto Tanno
Hello David!

I launched an instance and tried to create a RAID0 array using a chunk
size of 4096 bytes. The result was the same: mkfs.xfs took around 2
hours, and the same for fstrim -v on the mount point where /dev/md0 was
mounted.
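
For reference, the 4096-byte-chunk array was presumably created with
something along these lines (mdadm takes --chunk in KiB, so 4 KiB here):

 # mdadm --create --verbose --level=0 --chunk=4 /dev/md0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1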
