from:"Diego Viola"

[TRIVIAL] drm/amd/display: fix typo: liason -> liaison

2021-03-27 Thread Diego Viola

Signed-off-by: Diego Viola 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 573cf17262da..1b4b4f508662 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -121,7 +121,7 @@ MODULE_FIRMWARE(FIRMWARE_NAVI12_DMCU);
  * DOC: overview
  *
  * The AMDgpu display manager, **amdgpu_dm** (or even simpler,
- * **dm**) sits between DRM and DC. It acts as a liason, converting DRM
+ * **dm**) sits between DRM and DC. It acts as a liaison, converting DRM
  * requests into DC requests, and DC responses into DRM responses.
  *
  * The root control structure is  amdgpu_display_manager.
-- 
2.31.1

Re: [PATCH v2] drm/nouveau/pmu: fix timeout on GP108

2021-02-24 Thread Diego Viola

On Thu, Feb 25, 2021 at 2:22 AM Diego Viola  wrote:
>
> Hi Ben,
>
> I can confirm that your last two patches[0][1] fix the timeout issues
> (those from a normal boot and from suspend/resume).
>
> [0] 
> https://github.com/skeggsb/linux/commit/90224a17437b1f39dbecbb385567c1fce958f992
> [1] 
> https://github.com/skeggsb/linux/commit/0ee6dc49601359042fd254bbd8ba6b4685b4d0d7
>
> Tested-by: Diego Viola 
>
> on both patches.
>
> Thanks, I appreciate it a lot.
>
> Diego
>
> On Wed, Feb 24, 2021 at 6:50 AM Ben Skeggs  wrote:
> >
> > On Wed, 17 Feb 2021 at 13:30, Alexandre Courbot  wrote:
> > >
> > > On Wed, Feb 17, 2021 at 1:20 AM Diego Viola  wrote:
> > > >
> > > > This code times out on GP108, probably because the BIOS puts it into a
> > > > bad state.
> > > >
> > > > Since we reset the PMU on driver load anyway, we are at no risk from
> > > > missing a response from it since we are not waiting for one to begin
> > > > with.
> > >
> > > This looks safe to me, provided indeed that the PMU's reset is not
> > > > called outside of initialization (which for GP108 it shouldn't be
> > > IIRC?).
> > ISTR that the PMU FW we use prior to GM200 might depend on that being there.
> >
> > I've posted a proposed alternate fix here[1], as we probably shouldn't
> > have been touching PMU there anyway on those GPUs.
> >
> > Ben.
> >
> > [1] 
> > https://github.com/skeggsb/linux/commit/90224a17437b1f39dbecbb385567c1fce958f992
> >
> > >
> > > >
> > > > Signed-off-by: Diego Viola 
> > > > ---
> > > >  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c | 6 +-
> > > >  1 file changed, 1 insertion(+), 5 deletions(-)
> > > >
> > > > > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > > index a0fe607c9c07..5c802f2d00cb 100644
> > > > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > > @@ -102,12 +102,8 @@ nvkm_pmu_reset(struct nvkm_pmu *pmu)
> > > > if (!pmu->func->enabled(pmu))
> > > > return 0;
> > > >
> > > > -   /* Inhibit interrupts, and wait for idle. */
> > > > +   /* Inhibit interrupts. */
> > > > nvkm_wr32(device, 0x10a014, 0x);
> > > > -   nvkm_msec(device, 2000,
> > > > -   if (!nvkm_rd32(device, 0x10a04c))
> > > > -   break;
> > > > -   );
> > > >
> > > > /* Reset. */
> > > > if (pmu->func->reset)
> > > > --
> > > > 2.30.1
> > > >
> > > ___
> > > dri-devel mailing list
> > > dri-de...@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel

Ugh, sorry for breaking the regular email flow/order. Damn you gmail!

Regards,
Diego

Re: [PATCH v2] drm/nouveau/pmu: fix timeout on GP108

2021-02-24 Thread Diego Viola

Hi Ben,

I can confirm that your last two patches[0][1] fix the timeout issues
(those from a normal boot and from suspend/resume).

[0] 
https://github.com/skeggsb/linux/commit/90224a17437b1f39dbecbb385567c1fce958f992
[1] 
https://github.com/skeggsb/linux/commit/0ee6dc49601359042fd254bbd8ba6b4685b4d0d7

Tested-by: Diego Viola 

on both patches.

Thanks, I appreciate it a lot.

Diego

On Wed, Feb 24, 2021 at 6:50 AM Ben Skeggs  wrote:
>
> On Wed, 17 Feb 2021 at 13:30, Alexandre Courbot  wrote:
> >
> > On Wed, Feb 17, 2021 at 1:20 AM Diego Viola  wrote:
> > >
> > > This code times out on GP108, probably because the BIOS puts it into a
> > > bad state.
> > >
> > > Since we reset the PMU on driver load anyway, we are at no risk from
> > > missing a response from it since we are not waiting for one to begin
> > > with.
> >
> > This looks safe to me, provided indeed that the PMU's reset is not
> > called outside of initialization (which for GP108 it shouldn't be
> > IIRC?).
> ISTR that the PMU FW we use prior to GM200 might depend on that being there.
>
> I've posted a proposed alternate fix here[1], as we probably shouldn't
> have been touching PMU there anyway on those GPUs.
>
> Ben.
>
> [1] 
> https://github.com/skeggsb/linux/commit/90224a17437b1f39dbecbb385567c1fce958f992
>
> >
> > >
> > > Signed-off-by: Diego Viola 
> > > ---
> > >  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c | 6 +-
> > >  1 file changed, 1 insertion(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > index a0fe607c9c07..5c802f2d00cb 100644
> > > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > @@ -102,12 +102,8 @@ nvkm_pmu_reset(struct nvkm_pmu *pmu)
> > > if (!pmu->func->enabled(pmu))
> > > return 0;
> > >
> > > -   /* Inhibit interrupts, and wait for idle. */
> > > +   /* Inhibit interrupts. */
> > > nvkm_wr32(device, 0x10a014, 0x);
> > > -   nvkm_msec(device, 2000,
> > > -   if (!nvkm_rd32(device, 0x10a04c))
> > > -   break;
> > > -   );
> > >
> > > /* Reset. */
> > > if (pmu->func->reset)
> > > --
> > > 2.30.1
> > >
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2] drm/nouveau/pmu: fix timeout on GP108

2021-02-24 Thread Diego Viola

Hi Ben,

On Wed, Feb 24, 2021 at 6:50 AM Ben Skeggs  wrote:
>
> On Wed, 17 Feb 2021 at 13:30, Alexandre Courbot  wrote:
> >
> > On Wed, Feb 17, 2021 at 1:20 AM Diego Viola  wrote:
> > >
> > > This code times out on GP108, probably because the BIOS puts it into a
> > > bad state.
> > >
> > > Since we reset the PMU on driver load anyway, we are at no risk from
> > > missing a response from it since we are not waiting for one to begin
> > > with.
> >
> > This looks safe to me, provided indeed that the PMU's reset is not
> > called outside of initialization (which for GP108 it shouldn't be
> > IIRC?).
> ISTR that the PMU FW we use prior to GM200 might depend on that being there.
>
> I've posted a proposed alternate fix here[1], as we probably shouldn't
> have been touching PMU there anyway on those GPUs.
>
> Ben.
>
> [1] 
> https://github.com/skeggsb/linux/commit/90224a17437b1f39dbecbb385567c1fce958f992
>
> >
> > >
> > > Signed-off-by: Diego Viola 
> > > ---
> > >  drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c | 6 +-
> > >  1 file changed, 1 insertion(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > index a0fe607c9c07..5c802f2d00cb 100644
> > > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
> > > @@ -102,12 +102,8 @@ nvkm_pmu_reset(struct nvkm_pmu *pmu)
> > > if (!pmu->func->enabled(pmu))
> > > return 0;
> > >
> > > -   /* Inhibit interrupts, and wait for idle. */
> > > +   /* Inhibit interrupts. */
> > > nvkm_wr32(device, 0x10a014, 0x);
> > > -   nvkm_msec(device, 2000,
> > > -   if (!nvkm_rd32(device, 0x10a04c))
> > > -   break;
> > > -   );
> > >
> > > /* Reset. */
> > > if (pmu->func->reset)
> > > --
> > > 2.30.1
> > >
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel

I tested your patch and can confirm that the timeout is gone after
booting my system, but unfortunately it's back after doing a
suspend/resume.

Any ideas about that?

Thanks,
Diego

[PATCH v2] drm/nouveau/pmu: fix timeout on GP108

2021-02-16 Thread Diego Viola

This code times out on GP108, probably because the BIOS puts it into a
bad state.

Since we reset the PMU on driver load anyway, we are at no risk from
missing a response from it since we are not waiting for one to begin
with.

Signed-off-by: Diego Viola 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
index a0fe607c9c07..5c802f2d00cb 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
@@ -102,12 +102,8 @@ nvkm_pmu_reset(struct nvkm_pmu *pmu)
if (!pmu->func->enabled(pmu))
return 0;
 
-   /* Inhibit interrupts, and wait for idle. */
+   /* Inhibit interrupts. */
nvkm_wr32(device, 0x10a014, 0x);
-   nvkm_msec(device, 2000,
-   if (!nvkm_rd32(device, 0x10a04c))
-   break;
-   );
 
/* Reset. */
if (pmu->func->reset)
-- 
2.30.1

[PATCH] drm/nouveau/pmu: fix timeout on GP108

2021-02-13 Thread Diego Viola

This code times out on GP108, probably because the BIOS puts it into a
bad state.

Since we reset the PMU on driver load anyway, we are at no risk from
missing a response from it since we are not waiting for one to begin
with.

Signed-off-by: Diego Viola 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
index a0fe607c9c07..4f02c4f14f04 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c
@@ -104,10 +104,6 @@ nvkm_pmu_reset(struct nvkm_pmu *pmu)
 
/* Inhibit interrupts, and wait for idle. */
nvkm_wr32(device, 0x10a014, 0x);
-   nvkm_msec(device, 2000,
-   if (!nvkm_rd32(device, 0x10a04c))
-   break;
-   );
 
/* Reset. */
if (pmu->func->reset)
-- 
2.30.1

[PATCH] drm/virtio: fix unblank

2020-08-25 Thread Diego Viola

From: Gerd Hoffmann 

When going through a disable/enable cycle without changing the
framebuffer the optimization added by commit 3954ff10e06e ("drm/virtio:
skip set_scanout if framebuffer didn't change") causes the screen to stay
blank.  Add a bool to force an update to fix that.

v2: use drm_atomic_crtc_needs_modeset() (Daniel).

Cc: 1882...@bugs.launchpad.net
Fixes: 3954ff10e06e ("drm/virtio: skip set_scanout if framebuffer didn't change")
Signed-off-by: Gerd Hoffmann 
Tested-by: Jiri Slaby 
Tested-by: Diego Viola 
---
 drivers/gpu/drm/virtio/virtgpu_display.c | 11 +++
 drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
 drivers/gpu/drm/virtio/virtgpu_plane.c   |  4 +++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index af55b334be2f..35b5c80f5d85 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -123,6 +123,17 @@ static int virtio_gpu_crtc_atomic_check(struct drm_crtc *crtc,
 static void virtio_gpu_crtc_atomic_flush(struct drm_crtc *crtc,
 struct drm_crtc_state *old_state)
 {
+   struct virtio_gpu_output *output = drm_crtc_to_virtio_gpu_output(crtc);
+
+   /*
+* virtio-gpu can't do modeset and plane update operations
+* independent from each other.  So the actual modeset happens
+* in the plane update callback, and here we just check
+* whether we must force the modeset.
+*/
+   if (drm_atomic_crtc_needs_modeset(crtc->state)) {
+   output->needs_modeset = true;
+   }
 }
 
 static const struct drm_crtc_helper_funcs virtio_gpu_crtc_helper_funcs = {
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 9ff9f4ac0522..4ab1b0ba2925 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -138,6 +138,7 @@ struct virtio_gpu_output {
int cur_x;
int cur_y;
bool enabled;
+   bool needs_modeset;
 };
 #define drm_crtc_to_virtio_gpu_output(x) \
container_of(x, struct virtio_gpu_output, crtc)
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index 52d24179bcec..65757409d9ed 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -163,7 +163,9 @@ static void virtio_gpu_primary_plane_update(struct drm_plane *plane,
plane->state->src_w != old_state->src_w ||
plane->state->src_h != old_state->src_h ||
plane->state->src_x != old_state->src_x ||
-   plane->state->src_y != old_state->src_y) {
+   plane->state->src_y != old_state->src_y ||
+   output->needs_modeset) {
+   output->needs_modeset = false;
DRM_DEBUG("handle 0x%x, crtc %dx%d+%d+%d, src %dx%d+%d+%d\n",
  bo->hw_res_handle,
  plane->state->crtc_w, plane->state->crtc_h,
-- 
2.28.0

Re: Nested KVM is broken on a AMD Ryzen 5 2400G

2019-01-24 Thread Diego Viola

Hi Joerg,

On Thu, Jan 24, 2019 at 6:42 AM Joerg Roedel  wrote:
>
> Hi Diego,
>
> thanks for the report!

You're welcome, many thanks for your quick reply.

>
> On Thu, Jan 24, 2019 at 01:57:31AM -0200, Diego Viola wrote:
> > [10499.577192] WARNING: CPU: 2 PID: 3487 at arch/x86/kvm/mmu.c:2066
> > nonpaging_update_pte+0x5/0x10 [kvm]
>
> Seems like the nested-paging emulation got broken somewhere in the past.
> Can you try to load the KVM module in the guest with 'npt=0' and see if
> it works then?

I tried this and snapcraft is still timing out on the guest:

diego@diego-Standard-PC-i440FX-PIIX-1996:~/mysnaps/hello$ snapcraft
Launching a VM.
start failed: timed out waiting for instance to respond
Stopping snapcraft-my-snap-name /^C
Aborted!
diego@diego-Standard-PC-i440FX-PIIX-1996:~/mysnaps/hello$

That said, I no longer see the dmesg stack trace on the host. Should I
try a more recent kernel on the guest?

>
> Thanks,
>
> Joerg

Regards,
Diego

Fwd: Nested KVM is broken on a AMD Ryzen 5 2400G

2019-01-23 Thread Diego Viola

-- Forwarded message -
From: Diego Viola 
Date: Thu, Jan 24, 2019 at 1:57 AM
Subject: Nested KVM is broken on a AMD Ryzen 5 2400G


Hello,

I am trying to do nested KVM on a Ryzen 5 2400G, my use case is the following:

- Arch Linux as the host OS.
- Ubuntu 18.04.1 as the guest OS.

I am using qemu 3.1.0-1 (from the extra repository) on Arch Linux.

This is the command I am using to start the VM:

qemu-system-x86_64 -enable-kvm -hda ubuntu.qcow2 -m 4G -smp 4 -vga virtio -cpu host

The reason I need nested KVM is that I am trying to build some snap
packages on Ubuntu, and that uses a utility called "multipass" which
seems to run some VMs.

Anyway, everything works until I run "snapcraft", which then calls
multipass and I get the following on the host dmesg:

[10499.577192] WARNING: CPU: 2 PID: 3487 at arch/x86/kvm/mmu.c:2066
nonpaging_update_pte+0x5/0x10 [kvm]
[10499.577194] Modules linked in: kvm_amd fuse cfg80211 8021q garp mrp
stp llc nls_iso8859_1 nls_cp437 vfat fat amdgpu edac_mce_amd ccp
rng_core kvm irqbypass chash amd_iommu_v2 gpu_sched crct10dif_pclmul
i2c_algo_bit crc32_pclmul ghash_clmulni_intel ttm
snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper eeepc_wmi
asus_wmi sparse_keymap snd_hda_codec_hdmi rfkill drm wmi_bmof
snd_hda_intel aesni_intel snd_hda_codec snd_hda_core aes_x86_64
crypto_simd snd_hwdep cryptd r8169 glue_helper snd_pcm agpgart
syscopyarea sysfillrect snd_timer libphy sysimgblt fb_sys_fops snd
joydev mousedev input_leds soundcore sp5100_tco i2c_piix4 pcspkr
k10temp wmi evdev pinctrl_amd mac_hid gpio_amdpt pcc_cpufreq
acpi_cpufreq ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2
fscrypto hid_generic usbhid hid sd_mod crc32c_intel ahci libahci
libata xhci_pci xhci_hcd scsi_mod [last unloaded: kvm_amd]
[10499.577229] CPU: 2 PID: 3487 Comm: qemu-system-x86 Tainted: G
 W 4.20.3-arch1-1-ARCH #1
[10499.577230] Hardware name: System manufacturer System Product
Name/PRIME A320M-K/BR, BIOS 4023 08/20/2018
[10499.577241] RIP: 0010:nonpaging_update_pte+0x5/0x10 [kvm]
[10499.577243] Code: 00 00 00 00 00 0f 1f 44 00 00 31 c0 c3 0f 1f 84
00 00 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 <0f> 0b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 eb 16
83 68
[10499.577244] RSP: 0018:99fe480c7a90 EFLAGS: 00010202
[10499.577245] RAX: c08d4b10 RBX: 0701 RCX: 99fe480c7ac0
[10499.577246] RDX: 95480a349000 RSI: 954884b73460 RDI: 9548aef88000
[10499.577246] RBP: 954884b73460 R08: 95480a349000 R09: 
[10499.577247] R10: 0008 R11: 0007 R12: 
[10499.577248] R13: 95480a349000 R14: 9548aef88000 R15: 99fe480c7ac8
[10499.577249] FS:  7f8d9c7ff700() GS:954997a8()
knlGS:
[10499.577250] CS:  0010 DS:  ES:  CR0: 80050033
[10499.577250] CR2: 55761b641000 CR3: 000121cfe000 CR4: 003406e0
[10499.577251] Call Trace:
[10499.577265]  kvm_mmu_pte_write+0x487/0x4a0 [kvm]
[10499.577277]  kvm_page_track_write+0x7c/0xa0 [kvm]
[10499.577288]  emulator_write_phys+0x36/0x50 [kvm]
[10499.577299]  emulator_read_write_onepage+0xef/0x330 [kvm]
[10499.577309]  emulator_read_write+0xc8/0x180 [kvm]
[10499.577320]  segmented_write+0x5d/0x80 [kvm]
[10499.577332]  writeback+0xf4/0x260 [kvm]
[10499.577343]  ? em_in+0x13a/0x240 [kvm]
[10499.577354]  x86_emulate_insn+0x7b4/0x10a0 [kvm]
[10499.577364]  x86_emulate_instruction+0x33e/0x720 [kvm]
[10499.577374]  complete_emulated_pio+0x33/0x60 [kvm]
[10499.577384]  kvm_arch_vcpu_ioctl_run+0x1652/0x1b30 [kvm]
[10499.577387]  ? pollwake+0x74/0x90
[10499.577397]  ? kvm_vm_ioctl_irq_line+0x23/0x30 [kvm]
[10499.577404]  kvm_vcpu_ioctl+0x2b8/0x600 [kvm]
[10499.577407]  ? wake_up_q+0x70/0x70
[10499.577409]  do_vfs_ioctl+0xa4/0x630
[10499.577412]  ksys_ioctl+0x60/0x90
[10499.577413]  __x64_sys_ioctl+0x16/0x20
[10499.577416]  do_syscall_64+0x5b/0x170
[10499.577419]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10499.577421] RIP: 0033:0x7f8da3cd380b
[10499.577422] Code: 0f 1e fa 48 8b 05 55 b6 0c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 25 b6 0c 00 f7 d8 64 89
01 48
[10499.577422] RSP: 002b:7f8d9c7fcec8 EFLAGS: 0246 ORIG_RAX:
0010
[10499.577424] RAX: ffda RBX: ae80 RCX: 7f8da3cd380b
[10499.577424] RDX:  RSI: ae80 RDI: 0011
[10499.577425] RBP:  R08: 559ab1460b50 R09: 0004
[10499.577425] R10: 0001 R11: 0246 R12: 7f8d9e58d3c0
[10499.577426] R13: 7f8da2111000 R14:  R15: 7f8d9e58d3c0
[10499.577428] ---[ end trace 4f89a414fced52ea ]---

Please let me know if you need more information. I've tried the same
thing on a Broadwell laptop (T450) and nested KVM works

Re: [PATCH RFC 05/15] parisc: replace **** with a hug

2018-11-30 Thread Diego Viola

On Fri, Nov 30, 2018 at 5:30 PM Jarkko Sakkinen
 wrote:
>
> In order to comply with the CoC, replace **** with a hug.
>
> Signed-off-by: Jarkko Sakkinen 
> ---
>  arch/parisc/kernel/sys_parisc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
> index 376ea0d1b275..4782c41f00c8 100644
> --- a/arch/parisc/kernel/sys_parisc.c
> +++ b/arch/parisc/kernel/sys_parisc.c
> @@ -287,7 +287,7 @@ asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
> }
>  }
>
> -/* Fucking broken ABI */
> +/* Hugging broken ABI */
>
>  #ifdef CONFIG_64BIT
>  asmlinkage long parisc_truncate64(const char __user * path,
> --
> 2.19.1
>

What about profanity in commit messages?

Re: [PATCH RESEND] Blacklist SAMSUNG MZ7TD256HAFV-000L9 SSD

2018-11-12 Thread Diego Viola

On Mon, Nov 12, 2018 at 5:42 PM Greg KH  wrote:
>
> On Mon, Nov 12, 2018 at 05:22:52PM -0200, Diego Viola wrote:
> > med_power_with_dipm still causes freezes after updating the firmware to
> > the latest version (DXT04L5Q).
> >
> > Set model_rev to NULL and blacklist the device.
> >
> > Signed-off-by: Diego Viola 
> > Reviewed-by: Hans de Goede 
> > ---
> >  drivers/ata/libata-core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> 
>
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read:
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.
>
> 

Sorry about that, I'll get it right next time.

Thanks,
Diego

Re: [PATCH RESEND] Blacklist SAMSUNG MZ7TD256HAFV-000L9 SSD

2018-11-12 Thread Diego Viola

On Mon, Nov 12, 2018 at 5:30 PM Jens Axboe  wrote:
>
> On 11/12/18 12:22 PM, Diego Viola wrote:
> > med_power_with_dipm still causes freezes after updating the firmware to
> > the latest version (DXT04L5Q).
> >
> > Set model_rev to NULL and blacklist the device.
>
> Thanks, applied.
>
> --
> Jens Axboe
>

Thanks.

[PATCH RESEND] Blacklist SAMSUNG MZ7TD256HAFV-000L9 SSD

2018-11-12 Thread Diego Viola

med_power_with_dipm still causes freezes after updating the firmware to
the latest version (DXT04L5Q).

Set model_rev to NULL and blacklist the device.

Signed-off-by: Diego Viola 
Reviewed-by: Hans de Goede 
---
 drivers/ata/libata-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6e594644cb1d..a7f5202a4815 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4553,7 +4553,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
/* These specific Samsung models/firmware-revs do not handle LPM well */
{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
{ "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", ATA_HORKAGE_NOLPM, },
-   { "SAMSUNG MZ7TD256HAFV-000L9", "DXT02L5Q", ATA_HORKAGE_NOLPM, },
+   { "SAMSUNG MZ7TD256HAFV-000L9", NULL,   ATA_HORKAGE_NOLPM, },
 
/* devices that don't properly handle queued TRIM commands */
{ "Micron_M500IT_*","MU01", ATA_HORKAGE_NO_NCQ_TRIM |
-- 
2.19.1

Re: [PATCH] Blacklist SAMSUNG MZ7TD256HAFV-000L9 SSD

2018-11-09 Thread Diego Viola

On Sun, Oct 28, 2018 at 11:34 AM Diego Viola  wrote:
>
> med_power_with_dipm still causes freezes after updating the firmware to
> the latest version (DXT04L5Q).
>
> Set model_rev to NULL and blacklist the device.
>
> Signed-off-by: Diego Viola 
> ---
>  drivers/ata/libata-core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 6e594644cb1d..a7f5202a4815 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -4553,7 +4553,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
> /* These specific Samsung models/firmware-revs do not handle LPM well */
> { "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
> { "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", ATA_HORKAGE_NOLPM, },
> -   { "SAMSUNG MZ7TD256HAFV-000L9", "DXT02L5Q", ATA_HORKAGE_NOLPM, },
> +   { "SAMSUNG MZ7TD256HAFV-000L9", NULL,   ATA_HORKAGE_NOLPM, },
>
> /* devices that don't properly handle queued TRIM commands */
> { "Micron_M500IT_*","MU01", ATA_HORKAGE_NO_NCQ_TRIM |
> --
> 2.19.1
>

Hi Jens,

When do you plan to merge my patch?

Do you have any objections?

Thanks,
Diego

[PATCH] Blacklist SAMSUNG MZ7TD256HAFV-000L9 SSD

2018-10-28 Thread Diego Viola

med_power_with_dipm still causes freezes after updating the firmware to
the latest version (DXT04L5Q).

Set model_rev to NULL and blacklist the device.

Signed-off-by: Diego Viola 
---
 drivers/ata/libata-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6e594644cb1d..a7f5202a4815 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4553,7 +4553,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
/* These specific Samsung models/firmware-revs do not handle LPM well */
{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
{ "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", ATA_HORKAGE_NOLPM, },
-   { "SAMSUNG MZ7TD256HAFV-000L9", "DXT02L5Q", ATA_HORKAGE_NOLPM, },
+   { "SAMSUNG MZ7TD256HAFV-000L9", NULL,   ATA_HORKAGE_NOLPM, },
 
/* devices that don't properly handle queued TRIM commands */
{ "Micron_M500IT_*","MU01", ATA_HORKAGE_NO_NCQ_TRIM |
-- 
2.19.1

Re: [PATCH v3] libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9

2018-10-28 Thread Diego Viola

Hi Hans,

On Sun, Oct 28, 2018 at 7:48 AM Hans de Goede  wrote:
>
> Hi,
>
> On 28-10-18 05:13, Diego Viola wrote:
> > On Fri, Oct 26, 2018 at 5:36 PM Diego Viola  wrote:
> >>
> >> On Fri, Oct 26, 2018 at 11:21 AM Jens Axboe  wrote:
> >>>
> >>> On 10/26/18 7:45 AM, Diego Viola wrote:
> >>>> med_power_with_dipm causes my T450 to freeze with a SAMSUNG
> >>>> MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).
> >>>>
> >>>> Switching the LPM to max_performance fixes this issue.
> >>>
> >>> Applied, thanks.
> >>>
> >>> --
> >>> Jens Axboe
> >>>
> >>
> >> Jens, Hans,
> >>
> >> Thank you.
> >>
> >> Diego
> >
> > Hi Hans and Jens,
> >
> > I just wanted to give you guys an update about this problem.
> >
> > I've managed to update my SSD firmware to the latest version[1].
> >
> > For running the update, I had to install Windows 10, run the
> > firmware update, remove Windows and reinstall Arch Linux.
> >
> > The latest version of the firmware is DXT04L5Q, and it looks like
> > there hasn't been a new update since 2015.
> >
> > I'll be running with med_power_with_dipm and hope this firmware update
> > fixes the problem, if it doesn't and I get another freeze, I'll send
> > another patch blacklisting the drive completely. Is that OK?
>
> Yes, if it still happens with the latest firmware then blacklisting
> it completely is the right thing to do.
>
> Unfortunately for reasons which I do not understand OEM SSDs often use
> different (customized?) firmware compared to the model on which they
> are based and often see fewer updates and seem to have more bugs :|
>
> Regards,
>
> Hans

I've just got another freeze with the latest firmware while I was
watching some YouTube videos. I think it's safe to assume this drive
is borked and won't get better, I'll send another patch.

Thanks,
Diego

Re: [PATCH v3] libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9

2018-10-27 Thread Diego Viola

On Fri, Oct 26, 2018 at 5:36 PM Diego Viola  wrote:
>
> On Fri, Oct 26, 2018 at 11:21 AM Jens Axboe  wrote:
> >
> > On 10/26/18 7:45 AM, Diego Viola wrote:
> > > med_power_with_dipm causes my T450 to freeze with a SAMSUNG
> > > MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).
> > >
> > > Switching the LPM to max_performance fixes this issue.
> >
> > Applied, thanks.
> >
> > --
> > Jens Axboe
> >
>
> Jens, Hans,
>
> Thank you.
>
> Diego

Hi Hans and Jens,

I just wanted to give you guys an update about this problem.

I've managed to update my SSD firmware to the latest version[1].

To run the update, I had to install Windows 10, run the
firmware update, remove Windows and reinstall Arch Linux.

The latest version of the firmware is DXT04L5Q, and it looks like
there hasn't been a new update since 2015.

I'll be running with med_power_with_dipm and hope this firmware update
fixes the problem. If it doesn't and I get another freeze, I'll send
another patch blacklisting the drive completely. Is that OK?

1. https://support.lenovo.com/br/en/downloads/ds038904

Thanks and Best Regards,
Diego

Re: [PATCH v3] libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9

2018-10-26 Thread Diego Viola

On Fri, Oct 26, 2018 at 11:21 AM Jens Axboe  wrote:
>
> On 10/26/18 7:45 AM, Diego Viola wrote:
> > med_power_with_dipm causes my T450 to freeze with a SAMSUNG
> > MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).
> >
> > Switching the LPM to max_performance fixes this issue.
>
> Applied, thanks.
>
> --
> Jens Axboe
>

Jens, Hans,

Thank you.

Diego

Re: [PATCH v2] libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9

2018-10-26 Thread Diego Viola

On Fri, Oct 26, 2018 at 10:48 AM Diego Viola  wrote:
>
> Hi Hans,
>
> On Fri, Oct 26, 2018 at 8:59 AM Hans de Goede  wrote:
> >
> > Hi,
> >
> > On 26-10-18 03:58, Diego Viola wrote:
> > > med_power_with_dipm causes my T450 to freeze with a SAMSUNG
> > > MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).
> > >
> > > Switching the LPM to max_performance fixes this issue.
> > >
> > > Signed-off-by: Diego Viola 
> >
> > Thank you for the patch, have you updated the firmware and
> > confirmed that this happens with the latest firmware version too ?
>
> No, I haven't tired the latest one yet.

s/tired/tried/

>
> The last time I saw, Samsung/Lenovo doesn't make it very easy for
> Linux users, and I don't have a Windows installation to perform the
> upgrade.
>
> >
> > If not please send a new version limiting the blacklist to your
> > firmware version.
>
> OK. Sent.
>
> >
> > Regards,
> >
> > Hans
> >
> >
> >
> >
> > > ---
> > >   drivers/ata/libata-core.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > > index a9dd4ea7467d..a7f5202a4815 100644
> > > --- a/drivers/ata/libata-core.c
> > > +++ b/drivers/ata/libata-core.c
> > > @@ -4553,6 +4553,7 @@ static const struct ata_blacklist_entry 
> > > ata_device_blacklist [] = {
> > >   /* These specific Samsung models/firmware-revs do not handle LPM 
> > > well */
> > >   { "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
> > >   { "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", ATA_HORKAGE_NOLPM, },
> > > + { "SAMSUNG MZ7TD256HAFV-000L9", NULL,   ATA_HORKAGE_NOLPM, },
> > >
> > >   /* devices that don't properly handle queued TRIM commands */
> > >   { "Micron_M500IT_*","MU01", ATA_HORKAGE_NO_NCQ_TRIM |
> > >
>
> Regards,
> Diego

Re: [PATCH v2] libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9

2018-10-26 Thread Diego Viola

Hi Hans,

On Fri, Oct 26, 2018 at 8:59 AM Hans de Goede  wrote:
>
> Hi,
>
> On 26-10-18 03:58, Diego Viola wrote:
> > med_power_with_dipm causes my T450 to freeze with a SAMSUNG
> > MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).
> >
> > Switching the LPM to max_performance fixes this issue.
> >
> > Signed-off-by: Diego Viola 
>
> Thank you for the patch, have you updated the firmware and
> confirmed that this happens with the latest firmware version too ?

No, I haven't tired the latest one yet.

The last time I saw, Samsung/Lenovo doesn't make it very easy for
Linux users, and I don't have a Windows installation to perform the
upgrade.

>
> If not please send a new version limiting the blacklist to your
> firmware version.

OK. Sent.

>
> Regards,
>
> Hans
>
>
>
>
> > ---
> >   drivers/ata/libata-core.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > index a9dd4ea7467d..a7f5202a4815 100644
> > --- a/drivers/ata/libata-core.c
> > +++ b/drivers/ata/libata-core.c
> > @@ -4553,6 +4553,7 @@ static const struct ata_blacklist_entry 
> > ata_device_blacklist [] = {
> >   /* These specific Samsung models/firmware-revs do not handle LPM well 
> > */
> >   { "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
> >   { "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", ATA_HORKAGE_NOLPM, },
> > + { "SAMSUNG MZ7TD256HAFV-000L9", NULL,   ATA_HORKAGE_NOLPM, },
> >
> >   /* devices that don't properly handle queued TRIM commands */
> >   { "Micron_M500IT_*","MU01", ATA_HORKAGE_NO_NCQ_TRIM |
> >

Regards,
Diego

[PATCH v3] libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9

2018-10-26 Thread Diego Viola

med_power_with_dipm causes my T450 to freeze with a SAMSUNG
MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).

Switching the LPM to max_performance fixes this issue.

Signed-off-by: Diego Viola 
---
 drivers/ata/libata-core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a9dd4ea7467d..6e594644cb1d 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4553,6 +4553,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	/* These specific Samsung models/firmware-revs do not handle LPM well */
 	{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
 	{ "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", ATA_HORKAGE_NOLPM, },
+	{ "SAMSUNG MZ7TD256HAFV-000L9", "DXT02L5Q", ATA_HORKAGE_NOLPM, },
 
 	/* devices that don't properly handle queued TRIM commands */
 	{ "Micron_M500IT_*",		"MU01",	ATA_HORKAGE_NO_NCQ_TRIM |
-- 
2.19.1
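
The difference between v2 and v3 of this patch is the second field of the table entry: v2 passes NULL (any firmware revision), while v3 names "DXT02L5Q" so that only that revision is quirked. The following is a minimal user-space sketch of that kind of lookup, not the kernel's code. The names quirk_entry, quirk_table, lookup_quirks and the QUIRK_NOLPM value are hypothetical, and POSIX fnmatch() stands in for the kernel's own glob matcher.

```c
#include <fnmatch.h>   /* POSIX glob matching; a stand-in for the kernel's matcher */
#include <stddef.h>

#define QUIRK_NOLPM (1u << 0)   /* hypothetical flag value, not the kernel's */

struct quirk_entry {
    const char *model_num;   /* glob pattern for the drive's model string */
    const char *model_rev;   /* glob pattern for the firmware rev; NULL = any */
    unsigned int quirks;
};

/* Entries copied from the patch context; the last one is the v3 form. */
static const struct quirk_entry quirk_table[] = {
    { "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", QUIRK_NOLPM },
    { "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", QUIRK_NOLPM },
    { "SAMSUNG MZ7TD256HAFV-000L9", "DXT02L5Q", QUIRK_NOLPM },
    { NULL, NULL, 0 }   /* terminator */
};

/* Return the quirk flags of the first entry matching model and firmware rev. */
unsigned int lookup_quirks(const char *model, const char *rev)
{
    const struct quirk_entry *e;

    for (e = quirk_table; e->model_num != NULL; e++) {
        if (fnmatch(e->model_num, model, 0) != 0)
            continue;   /* model does not match this entry */
        if (e->model_rev == NULL || fnmatch(e->model_rev, rev, 0) == 0)
            return e->quirks;
    }
    return 0;
}
```

Under these assumptions, a drive reporting firmware DXT04L5Q no longer matches the v3 entry, so the quirk stops applying after a firmware update. With the v2 form (NULL revision) every firmware revision of the model would match.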

[PATCH v2] libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9

2018-10-25 Thread Diego Viola

med_power_with_dipm causes my T450 to freeze with a SAMSUNG
MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).

Switching the LPM to max_performance fixes this issue.

Signed-off-by: Diego Viola 
---
 drivers/ata/libata-core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a9dd4ea7467d..a7f5202a4815 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4553,6 +4553,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	/* These specific Samsung models/firmware-revs do not handle LPM well */
 	{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
 	{ "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", ATA_HORKAGE_NOLPM, },
+	{ "SAMSUNG MZ7TD256HAFV-000L9", NULL,	ATA_HORKAGE_NOLPM, },
 
 	/* devices that don't properly handle queued TRIM commands */
 	{ "Micron_M500IT_*",		"MU01",	ATA_HORKAGE_NO_NCQ_TRIM |
-- 
2.19.1

[PATCH] libata: Apply NOLPM quirk for SAMSUNG MZ7TD256HAFV-000L9

2018-10-25 Thread Diego Viola

med_power_with_dipm causes my T450 to freeze with a SAMSUNG
MZ7TD256HAFV-000L9 SSD (firmware DXT02L5Q).

Switching the LPM to max_performance fixes this issue.

Signed-off-by: Diego Viola 
---
 drivers/ata/libata-core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index a9dd4ea7467d..57706ab1b545 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4553,6 +4553,7 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	/* These specific Samsung models/firmware-revs do not handle LPM well */
 	{ "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM, },
 	{ "SAMSUNG SSD PM830 mSATA *",  "CXM13D1Q", ATA_HORKAGE_NOLPM, },
+	{ "SAMSUNG MZ7TD256HAFV-000L9", NULL, ATA_HORKAGE_NOLPM, },
 
 	/* devices that don't properly handle queued TRIM commands */
 	{ "Micron_M500IT_*",		"MU01",	ATA_HORKAGE_NO_NCQ_TRIM |
-- 
2.19.1
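
For readers who want to try the workaround described in the commit message without a patched kernel, the LPM policy can be changed at run time through the SCSI host's link_power_management_policy sysfs attribute. Below is a rough sketch of doing that from C. The helper name set_lpm_policy is hypothetical, and the list of accepted policy names is an assumption based on kernels from around the time of this thread, so newer kernels may accept more values.

```c
#include <stdio.h>
#include <string.h>

/* Policy names assumed to be accepted by the sysfs attribute (may be incomplete). */
static const char *const lpm_policies[] = {
    "min_power", "medium_power", "med_power_with_dipm", "max_performance",
};

/*
 * Write policy to the attribute file at attr_path.
 * Returns 0 on success, -1 for an unknown policy name or an I/O error.
 */
int set_lpm_policy(const char *attr_path, const char *policy)
{
    size_t i, n = sizeof(lpm_policies) / sizeof(lpm_policies[0]);
    FILE *f;
    int ok;

    for (i = 0; i < n; i++)
        if (strcmp(policy, lpm_policies[i]) == 0)
            break;
    if (i == n)
        return -1;   /* reject names outside the assumed list */

    f = fopen(attr_path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(policy, f) >= 0;
    if (fclose(f) != 0)   /* a failed write may only surface on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

A call would look something like set_lpm_policy("/sys/class/scsi_host/host0/link_power_management_policy", "max_performance"). The host number is an assumption that depends on the machine, and writing the attribute requires root.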

[PATCH TRIVIAL v3 2/3] MAINTAINERS: punctuation fix

2018-09-27 Thread Diego Viola

Signed-off-by: Diego Viola 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 02a39617e..db46fa83c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -93,7 +93,7 @@ Descriptions of section entries:
   Supported:   Someone is actually paid to look after this.
   Maintained:  Someone actually looks after it.
   Odd Fixes:   It has a maintainer but they don't have time to do
-   much other than throw the odd patch in. See below..
+   much other than throw the odd patch in. See below.
   Orphan:  No current maintainer [but maybe you could take the
role as you write your new code].
   Obsolete:Old code. Something tagged obsolete generally means
-- 
2.19.0

[PATCH TRIVIAL v3 1/3] CREDITS: punctuation fix

2018-09-27 Thread Diego Viola

Signed-off-by: Diego Viola 
---
 CREDITS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CREDITS b/CREDITS
index 5befd2d71..b82efb36d 100644
--- a/CREDITS
+++ b/CREDITS
@@ -1473,7 +1473,7 @@ W: http://www.linux-ide.org/
 W: http://www.linuxdiskcert.org/
 D: Random SMP kernel hacker...
 D: Uniform Multi-Platform E-IDE driver
-D: Active-ATA-Chipset maddness..
+D: Active-ATA-Chipset maddness...
 D: Ultra DMA 133/100/66/33 w/48-bit Addressing
 D: ATA-Disconnect, ATA-TCQ
 D: ATA-Smart Kernel Daemon
-- 
2.19.0

[PATCH TRIVIAL v3 3/3] Makefile: punctuation fix

2018-09-27 Thread Diego Viola

Signed-off-by: Diego Viola 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 0c90c4354..3dd188d52 100644
--- a/Makefile
+++ b/Makefile
@@ -1099,7 +1099,7 @@ archprepare: archheaders archscripts prepare1 scripts_basic
 prepare0: archprepare gcc-plugins
 	$(Q)$(MAKE) $(build)=.
 
-# All the preparing..
+# All the preparing...
 prepare: prepare0 prepare-objtool
 
 # Support for using generic headers in asm-generic
-- 
2.19.0

Re: [PATCH TRIVIAL] Punctuation fixes

2018-09-24 Thread Diego Viola

On Sun, Sep 23, 2018 at 5:59 AM Richard Weinberger
 wrote:
>
> On Sun, Sep 23, 2018 at 3:00 AM Miguel Ojeda
>  wrote:
> >
> > Hi Diego,
> >
> > A few things, since it looks like this is your first patch.
>
> git log...
>
> > On Sat, Sep 22, 2018 at 5:56 PM, Diego Viola  wrote:
> > > On Wed, Sep 12, 2018 at 12:54 AM Diego Viola  
> > > wrote:
> > >>
> > >> Signed-off-by: Diego Viola 
> > >> ---
> > >>  CREDITS | 2 +-
> > >>  MAINTAINERS | 2 +-
> > >>  Makefile| 2 +-
> > >>  3 files changed, 3 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/CREDITS b/CREDITS
> > >> index 5befd2d71..b82efb36d 100644
> > >> --- a/CREDITS
> > >> +++ b/CREDITS
> > >> @@ -1473,7 +1473,7 @@ W: http://www.linux-ide.org/
> > >>  W: http://www.linuxdiskcert.org/
> > >>  D: Random SMP kernel hacker...
> > >>  D: Uniform Multi-Platform E-IDE driver
> > >> -D: Active-ATA-Chipset maddness..
> > >> +D: Active-ATA-Chipset maddness...
> >
> > I think the extra dots is what was intended here :)
>
> Diego, I know you can do better. :-)
> How about running linux-next and hunting down regressions?
> I think we suggested this already on IRC.

OK. I'll do that after I'm done with the documentation.

>
> --
> Thanks,
> //richard

Thanks,
Diego

Re: [PATCH TRIVIAL] Punctuation fixes

2018-09-24 Thread Diego Viola

Hi Miguel,

On Sat, Sep 22, 2018 at 9:59 PM Miguel Ojeda
 wrote:
>
> Hi Diego,
>
> A few things, since it looks like this is your first patch.
>
> On Sat, Sep 22, 2018 at 5:56 PM, Diego Viola  wrote:
> > On Wed, Sep 12, 2018 at 12:54 AM Diego Viola  wrote:
> >>
> >> Signed-off-by: Diego Viola 
> >> ---
> >>  CREDITS | 2 +-
> >>  MAINTAINERS | 2 +-
> >>  Makefile| 2 +-
> >>  3 files changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/CREDITS b/CREDITS
> >> index 5befd2d71..b82efb36d 100644
> >> --- a/CREDITS
> >> +++ b/CREDITS
> >> @@ -1473,7 +1473,7 @@ W: http://www.linux-ide.org/
> >>  W: http://www.linuxdiskcert.org/
> >>  D: Random SMP kernel hacker...
> >>  D: Uniform Multi-Platform E-IDE driver
> >> -D: Active-ATA-Chipset maddness..
> >> +D: Active-ATA-Chipset maddness...
>
> I think the extra dots is what was intended here :)

OK.

>
> >>  D: Ultra DMA 133/100/66/33 w/48-bit Addressing
> >>  D: ATA-Disconnect, ATA-TCQ
> >>  D: ATA-Smart Kernel Daemon
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index d870cb57c..6567bf245 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -93,7 +93,7 @@ Descriptions of section entries:
> >>Supported:   Someone is actually paid to look after this.
> >>Maintained:  Someone actually looks after it.
> >>Odd Fixes:   It has a maintainer but they don't have time to do
> >> -   much other than throw the odd patch in. See below..
> >> +   much other than throw the odd patch in. See below.
> >>Orphan:  No current maintainer [but maybe you could take the
> >> role as you write your new code].
> >>Obsolete:Old code. Something tagged obsolete generally means
> >> diff --git a/Makefile b/Makefile
> >> index 4d5c883a9..7b5c5d634 100644
> >> --- a/Makefile
> >> +++ b/Makefile
> >> @@ -1109,7 +1109,7 @@ archprepare: archheaders archscripts prepare1 
> >> scripts_basic
> >>  prepare0: archprepare gcc-plugins
> >> $(Q)$(MAKE) $(build)=.
> >>
> >> -# All the preparing..
> >> +# All the preparing...
> >>  prepare: prepare0 prepare-objtool
> >>
> >>  # Support for using generic headers in asm-generic
> >> --
> >> 2.19.0
> >>
> >
> > Jonathan,
> >
> > can you please merge my patch? After you merge this one, I'll go
> > through the rest of the documentation to make it more consistent.
>
> Jonathan is not responsible for these files. Jiri Kosina is the
> maintainer of the trivial tree.

OK, I'm CCing him with this email.

>
> There are 1700+ instances of a "two-stops ellipsis" in the kernel. I
> am not sure they were mistakes, given their prevalence, even if they are
> not formal/common English. Also, I am not sure a commit for each 3 is
> really appropriate! :) Maybe do the full sweep of the Documentation
> ones in a single patch?

Good idea, do you know what approach I can use for the full sweep? sed
perhaps? Doing this manually would be very tedious and using sed might
be error prone.

>
> Also, typically you would note in the subject that this is a "v2" of
> your previous patch (if I am not wrong, you sent a first version
> fixing only one of the 3 mistakes).

Correct, this is my V2 of the first patch. Sorry for not specifying that.

>
> Thanks for your first patch!
>
> Cheers,
> Miguel

Thanks,
Diego
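
On the question of how to do a full sweep: one cautious option is to flag candidate lines for manual review before changing anything. The sketch below counts runs of exactly two dots in a line of text. The function name count_two_dot_runs is hypothetical, and this is only an illustration of the detection step, not a tool anyone in the thread proposed.

```c
#include <stddef.h>

/*
 * Count runs of exactly two consecutive dots in s.
 * A single dot (full stop) and runs of three or more (a regular
 * ellipsis) are not counted.
 */
size_t count_two_dot_runs(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        if (*s == '.') {
            size_t run = 0;

            while (*s == '.') {   /* consume the whole run of dots */
                run++;
                s++;
            }
            if (run == 2)
                count++;
        } else {
            s++;
        }
    }
    return count;
}
```

Note that a relative path such as "../foo" also contains a two-dot run, so a blind search-and-replace would corrupt it. That is one concrete reason why an automated replacement is error prone here and why reviewing each hit by hand is the safer route.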

Re: [PATCH TRIVIAL] Punctuation fixes

2018-09-22 Thread Diego Viola

On Wed, Sep 12, 2018 at 12:54 AM Diego Viola  wrote:
>
> Signed-off-by: Diego Viola 
> ---
>  CREDITS | 2 +-
>  MAINTAINERS | 2 +-
>  Makefile| 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/CREDITS b/CREDITS
> index 5befd2d71..b82efb36d 100644
> --- a/CREDITS
> +++ b/CREDITS
> @@ -1473,7 +1473,7 @@ W: http://www.linux-ide.org/
>  W: http://www.linuxdiskcert.org/
>  D: Random SMP kernel hacker...
>  D: Uniform Multi-Platform E-IDE driver
> -D: Active-ATA-Chipset maddness..
> +D: Active-ATA-Chipset maddness...
>  D: Ultra DMA 133/100/66/33 w/48-bit Addressing
>  D: ATA-Disconnect, ATA-TCQ
>  D: ATA-Smart Kernel Daemon
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d870cb57c..6567bf245 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -93,7 +93,7 @@ Descriptions of section entries:
>Supported:   Someone is actually paid to look after this.
>Maintained:  Someone actually looks after it.
>Odd Fixes:   It has a maintainer but they don't have time to do
> -   much other than throw the odd patch in. See below..
> +   much other than throw the odd patch in. See below.
>Orphan:  No current maintainer [but maybe you could take the
> role as you write your new code].
>Obsolete:Old code. Something tagged obsolete generally means
> diff --git a/Makefile b/Makefile
> index 4d5c883a9..7b5c5d634 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1109,7 +1109,7 @@ archprepare: archheaders archscripts prepare1 
> scripts_basic
>  prepare0: archprepare gcc-plugins
> $(Q)$(MAKE) $(build)=.
>
> -# All the preparing..
> +# All the preparing...
>  prepare: prepare0 prepare-objtool
>
>  # Support for using generic headers in asm-generic
> --
> 2.19.0
>

Jonathan,

can you please merge my patch? After you merge this one, I'll go
through the rest of the documentation to make it more consistent.

Diego

[PATCH TRIVIAL] Punctuation fixes

2018-09-11 Thread Diego Viola

Signed-off-by: Diego Viola 
---
 CREDITS | 2 +-
 MAINTAINERS | 2 +-
 Makefile| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/CREDITS b/CREDITS
index 5befd2d71..b82efb36d 100644
--- a/CREDITS
+++ b/CREDITS
@@ -1473,7 +1473,7 @@ W: http://www.linux-ide.org/
 W: http://www.linuxdiskcert.org/
 D: Random SMP kernel hacker...
 D: Uniform Multi-Platform E-IDE driver
-D: Active-ATA-Chipset maddness..
+D: Active-ATA-Chipset maddness...
 D: Ultra DMA 133/100/66/33 w/48-bit Addressing
 D: ATA-Disconnect, ATA-TCQ
 D: ATA-Smart Kernel Daemon
diff --git a/MAINTAINERS b/MAINTAINERS
index d870cb57c..6567bf245 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -93,7 +93,7 @@ Descriptions of section entries:
   Supported:   Someone is actually paid to look after this.
   Maintained:  Someone actually looks after it.
   Odd Fixes:   It has a maintainer but they don't have time to do
-   much other than throw the odd patch in. See below..
+   much other than throw the odd patch in. See below.
   Orphan:  No current maintainer [but maybe you could take the
role as you write your new code].
   Obsolete:Old code. Something tagged obsolete generally means
diff --git a/Makefile b/Makefile
index 4d5c883a9..7b5c5d634 100644
--- a/Makefile
+++ b/Makefile
@@ -1109,7 +1109,7 @@ archprepare: archheaders archscripts prepare1 
scripts_basic
 prepare0: archprepare gcc-plugins
$(Q)$(MAKE) $(build)=.
 
-# All the preparing..
+# All the preparing...
 prepare: prepare0 prepare-objtool
 
 # Support for using generic headers in asm-generic
-- 
2.19.0

[PATCH TRIVIAL] MAINTAINERS: small punctuation fix

2018-09-09 Thread Diego Viola

Signed-off-by: Diego Viola 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d870cb57c..6567bf245 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -93,7 +93,7 @@ Descriptions of section entries:
   Supported:   Someone is actually paid to look after this.
   Maintained:  Someone actually looks after it.
   Odd Fixes:   It has a maintainer but they don't have time to do
-   much other than throw the odd patch in. See below..
+   much other than throw the odd patch in. See below.
   Orphan:  No current maintainer [but maybe you could take the
role as you write your new code].
   Obsolete:Old code. Something tagged obsolete generally means
-- 
2.18.0

Re: Regression: x86/tsc: Fix mark_tsc_unstable()

2018-06-11 Thread Diego Viola

On Mon, Jun 11, 2018 at 3:11 PM, Jeremy Cline  wrote:
> On 06/11/2018 01:56 PM, Jeremy Cline wrote:
>> On 06/11/2018 11:30 AM, Peter Zijlstra wrote:
>>> On Mon, Jun 11, 2018 at 04:38:01PM +0200, Peter Zijlstra wrote:
>>>> On Mon, Jun 11, 2018 at 04:17:42PM +0200, Peter Zijlstra wrote:
>>>>> On Mon, Jun 11, 2018 at 01:59:15PM +, Jeremy Cline wrote:
>>>>>> A user has bisected the problem to the v4.16 commit 1ab4ca7c59d4
>>>>>> ("x86/tsc: Fix mark_tsc_unstable()"). According to the reporter,
>>>>>> explicitly setting "tsc=" on the kernel command line causes the boot to
>>>>>> always succeed. All the users have Thinkpad T500s or T400s (Core 2 Duos)
>>>>>
>>>>> Weird. So Core2 typically triggers mark_tsc_unstable() in either
>>>>> intel_idle or processor_idle. ISTR testing that when I did the patches.
>>>>>
>>>>> When I make that mark_tsc_unstable() in the idle drivers unconditional
>>>>> and boot my ivb with that, it doesn't want to fail. I've booted the
>>>>> machine 5 consecutive times without issue.
>>>>>
>>>>> Let me try and checkout -stable, maybe something's up with that.
>>>>
>>>> Nope -stable seems to be working as well on the IVB (with modification).
>>>> I just dug up my T500 and that's actually still running the test kernel.
>>>> Let me try and build the -stable kernel for that.
>>>
>>> 4.16.8 works without issue on my T500 with a debian/ubuntu like distro
>>> config.
>>>
>>
>> Adding mmarget (who bisected the problem) to the CC.
>>
>> It might well be something Fedora-specific, then. I just noticed mmarget
>> commented over the weekend noting that they couldn't reproduce the
>> problem without using the initramfs generated during the RPM install of
>> the kernel. mmarget's theory was that it's a race condition that doesn't
>> occur when the initramfs takes long enough to unpack, but I don't know
>> enough about the early boot process *or* how Fedora's generating the
>> initramfs for RPM installs vs "make install" yet to know how likely that
>> is. I'm going to have to do some research.
>>
>> Thanks for looking into this so quickly and also sorry if this turns out
>> to be a Fedora problem :(
>
> Attached is the Fedora configuration for 4.16.8, as well, in case you'd
> like to test it with that.
>
> Thanks,
> Jeremy

Hi Jeremy,

I've compiled 4.16.8 with your config and booted my machine about 10
times with this kernel, and I'm unable to reproduce the issue.

Maybe it's an issue with the Fedora initramfs?

Diego
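
For scale, here is a back-of-the-envelope check on what ten clean boots can show. It assumes, purely for illustration, that boots are independent and that an affected setup hangs on about 1/3 of them, as in the original report; `prob_all_boots_ok` is a hypothetical helper, not something from the thread.

```c
/*
 * Probability that n independent boots all succeed when each boot
 * hangs with probability p_hang. Hypothetical helper for illustration.
 */
double prob_all_boots_ok(double p_hang, int n)
{
	double p = 1.0;

	for (int i = 0; i < n; i++)
		p *= 1.0 - p_hang;	/* one more boot that did not hang */
	return p;
}
```

Under those assumptions, ten clean boots in a row would happen about 1.7% of the time on an affected setup, while five would happen about 13% of the time, so a ten-boot run is reasonably strong evidence that this configuration does not hit the hang.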

Re: Regression: x86/tsc: Fix mark_tsc_unstable()

2018-06-11 Thread Diego Viola

On Mon, Jun 11, 2018 at 10:59 AM, Jeremy Cline  wrote:
> Hi folks,
>
> A few Fedora users have reported[0] a regression starting in v4.16.8
> where the boot will hang ~1/3 of the time with the following RCU stall
> warning:
>
> INFO: rcu_sched detected stalls on CPUs/tasks:
> o1-...!: (0 ticks this GP) idle=688/0/0 softirq=171/171 fqs=0
> o(detected by 0, t=60002 jiffies, g=-142, c=-143, q=9)
> Sending NMI from CPU 0 to CPU 1:
> NMI backtrace for cpu 1 skipped: idling at
> acpi_processor_ffh_cstate_enter+0x65/0xb0
> rcu_sched kthread starved for 60002 jiffies! g18446744073709551474
> c1844674407370955143 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 -> cpu=1
> RCU grace-period kthread stack dump:
> rcu_sched   I0 9  2 0x8000
> Call Trace:
>  ? __schedule+0x234/0x850
>  schedule+0x28/0x80
>  schedule_timeout+0x166/0x380
>  ? __next_timer_interrupt+0xc0/0xc0
>  rcu_gp_kthread+0x368/0x830
>  ? rcu_process_callbacks+0x4f0/0x4f0
>  kthread+0x112/0x130
>  ? kthread_create_worker_on_cpu+0x70/0x70
>  ret_from_fork+0x35/0x40
>
> A user has bisected the problem to the v4.16 commit 1ab4ca7c59d4
> ("x86/tsc: Fix mark_tsc_unstable()"). According to the reporter,
> explicitly setting "tsc=" on the kernel command line causes the boot to
> always succeed. All the users have Thinkpad T500s or T400s (Core 2 Duos)
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1579925
>
>
> Thanks,
> Jeremy

Everything works fine here with 4.16.8+ on my desktop with E5500 CPU.

[diego@dualcore ~]$ uname -a
Linux dualcore 4.16.13-2-ARCH #1 SMP PREEMPT Fri Jun 1 18:46:11 UTC
2018 x86_64 GNU/Linux
[diego@dualcore ~]$

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-23 Thread Diego Viola

On Mon, Apr 23, 2018 at 12:10 PM, Peter Zijlstra  wrote:
> On Mon, Apr 23, 2018 at 04:00:52PM +0200, Peter Zijlstra wrote:
>> On Mon, Apr 23, 2018 at 02:38:47PM +0200, Peter Zijlstra wrote:
>> > @@ -834,8 +857,12 @@ static int clocksource_unbind(struct clocksource *cs)
>> > if (curr_clocksource == cs)
>> > return -EBUSY;
>> > }
>> > +
>> > +   clocksource_watchdog_lock();
>> > clocksource_dequeue_watchdog(cs);
>> > list_del_init(&cs->list);
>> > +   clocksource_watchdog_lock();
>>
>> It works a lot better when this is _unlock();
>>
>> > +
>> > return 0;
>> >  }
>> >
>>
>> With that, and intel_idle.c modified to always issue the "halts in
>> idle" thing, my IVB comes up in hpet mode. Let me dig up a Core2 era
>> machine and see if that works too.
>
> OK, that was fairly painful, but the box managed to boot a modified
> kernel and seems to function correctly, lemme try suspend/resume.
>
>

I can confirm that applying Peter's patch to the 4.17-rc2 kernel and
changing kernel/time/clocksource.c:864 to _unlock solves my problem.

Peter, thank you so much, I appreciate your help a lot.

Diego
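
The one-word correction matters because the hunk as first posted takes the watchdog lock twice in clocksource_unbind() and never drops it. The toy model below shows the difference. It assumes the helper wraps a non-recursive lock, and `toy_lock`, `unbind_as_posted` and `unbind_corrected` are names invented here, not kernel code.

```c
#include <stdbool.h>

/* Toy stand-in for a non-recursive lock: tracks only whether it is held. */
struct toy_lock {
	bool held;
};

/* Returns 0 on success, -1 where a real non-recursive lock would hang. */
int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -1;
	l->held = true;
	return 0;
}

void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Shape of the hunk as first posted: lock, work, lock again. */
int unbind_as_posted(struct toy_lock *l)
{
	if (toy_lock_acquire(l))
		return -1;
	/* ... dequeue from the watchdog, remove from the list ... */
	return toy_lock_acquire(l);	/* second acquire on a held lock */
}

/* Same shape with the follow-up correction: lock, work, unlock. */
int unbind_corrected(struct toy_lock *l)
{
	if (toy_lock_acquire(l))
		return -1;
	/* ... dequeue from the watchdog, remove from the list ... */
	toy_lock_release(l);
	return 0;
}
```

In the model the posted version fails on its second acquire and leaves the lock held, where a real non-recursive lock would hang the caller instead; the corrected version returns with the lock free and can be called again.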

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-23 Thread Diego Viola

On Mon, Apr 23, 2018 at 11:00 AM, Peter Zijlstra  wrote:
> On Mon, Apr 23, 2018 at 02:38:47PM +0200, Peter Zijlstra wrote:
>> @@ -834,8 +857,12 @@ static int clocksource_unbind(struct clocksource *cs)
>>   if (curr_clocksource == cs)
>>   return -EBUSY;
>>   }
>> +
>> + clocksource_watchdog_lock();
>>   clocksource_dequeue_watchdog(cs);
>>   list_del_init(&cs->list);
>> + clocksource_watchdog_lock();
>
> It works a lot better when this is _unlock();
>
>> +
>>   return 0;
>>  }
>>
>
> With that, and intel_idle.c modified to always issue the "halts in
> idle" thing, my IVB comes up in hpet mode. Let me dig up a Core2 era
> machine and see if that works too.

OK, thanks.

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-23 Thread Diego Viola

On Mon, Apr 23, 2018 at 9:49 AM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Mon, Apr 23, 2018 at 9:38 AM, Peter Zijlstra <pet...@infradead.org> wrote:
>> On Mon, Apr 23, 2018 at 08:49:25AM -0300, Diego Viola wrote:
>>> On Mon, Apr 23, 2018 at 8:48 AM, Peter Zijlstra <pet...@infradead.org> 
>>> wrote:
>>> > On Mon, Apr 23, 2018 at 08:23:24AM -0300, Diego Viola wrote:
>>> >> > That's a Core2 era chip; does it actually have stable TSC ?
>>> >>
>>> >> I'm not sure.
>>> >
>>> > dmesg | grep -i tsc
>>> >
>>> > should be able to tell you.
>>>
>>> [diego@dualcore ~]$ dmesg | grep -i tsc
>>> [0.00] tsc: Fast TSC calibration using PIT
>>> [0.01] tsc: Fast TSC calibration using PIT
>>> [0.01] tsc: Detected 2793.087 MHz processor
>>> [0.01] clocksource: tsc-early: mask: 0x
>>> max_cycles: 0x2842be30f1f, max_idle_ns: 440795236296 ns
>>> [0.162058] clocksource: Switched to clocksource tsc-early
>>> [0.300076] tsc: Marking TSC unstable due to TSC halts in idle
>>> [diego@dualcore ~]$
>>
>> Much thanks.. I suspect there a bunch of fail when marking unstable
>> before we register clocksource_tsc.
>>
>> The below patch is a bit ugly, but should cure a number of things; it
>> compiles but hasn't otherwise been tested, can you give it a spin?
>>
>> ---
>>
>>  - when TSC is unstable and we've already registered tsc-early, don't
>>forget to unregister it; this then leaves us without a tsc
>>clocksource entirely -- which is good.
>>
>>  - when we call mark_tsc_unstable() before we've registered
>>clocksource_tsc things go wobbly because it doesn't know about
>>clocksource_tsc_early. Fix that by:
>>
>> - Make clocksource_mark_unstable() work for unregistered
>>   clocksources.
>>
>>- which means we have to be able to detect this; use cs.list
>>  for this; initialize it empty.
>>- means we also have to place all cs.list manipulation under
>>  watchdog_lock -- bit ugly.
>>
>> - Make __clocksource_unstable() de-rate the clocksource.
>>
>> - Call clocksource_mark_unstable() on both tsc and tsc_early.
>>
>> This way we should either end up with a derated tsc clocksource marked
>> UNSTABLE or no tsc clocksource at all, either should result in it not
>> becoming the active clocksource.
>>
>> ---
>>  arch/x86/kernel/tsc.c | 22 +++---
>>  kernel/time/clocksource.c | 43 +++
>>  2 files changed, 46 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
>> index 91e6da48cbb6..74392d9d51e0 100644
>> --- a/arch/x86/kernel/tsc.c
>> +++ b/arch/x86/kernel/tsc.c
>> @@ -1067,6 +1067,7 @@ static struct clocksource clocksource_tsc_early = {
>> .resume = tsc_resume,
>> .mark_unstable  = tsc_cs_mark_unstable,
>> .tick_stable= tsc_cs_tick_stable,
>> +   .list   = LIST_HEAD_INIT(clocksource_tsc_early.list),
>>  };
>>
>>  /*
>> @@ -1086,6 +1087,7 @@ static struct clocksource clocksource_tsc = {
>> .resume = tsc_resume,
>> .mark_unstable  = tsc_cs_mark_unstable,
>> .tick_stable= tsc_cs_tick_stable,
>> +   .list   = LIST_HEAD_INIT(clocksource_tsc.list),
>>  };
>>
>>  void mark_tsc_unstable(char *reason)
>> @@ -1098,13 +1100,9 @@ void mark_tsc_unstable(char *reason)
>> clear_sched_clock_stable();
>> disable_sched_clock_irqtime();
>> pr_info("Marking TSC unstable due to %s\n", reason);
>> -   /* Change only the rating, when not registered */
>> -   if (clocksource_tsc.mult) {
>> -   clocksource_mark_unstable(&clocksource_tsc);
>> -   } else {
>> -   clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
>> -   clocksource_tsc.rating = 0;
>> -   }
>> +
>> +   clocksource_mark_unstable(&clocksource_tsc_early);
>> +   clocksource_mark_unstable(&clocksource_tsc);
>>  }
>>
>>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
>> @@ -1244,7 +1242,7 @@ static void tsc_refine_calibration_work(struct 
>> work_struct *work)
>>
>> /* Don't bother refining TSC on unstable systems */
>> if (tsc_unstable

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-23 Thread Diego Viola

On Mon, Apr 23, 2018 at 9:49 AM, Diego Viola  wrote:
> On Mon, Apr 23, 2018 at 9:38 AM, Peter Zijlstra  wrote:
>> On Mon, Apr 23, 2018 at 08:49:25AM -0300, Diego Viola wrote:
>>> On Mon, Apr 23, 2018 at 8:48 AM, Peter Zijlstra  
>>> wrote:
>>> > On Mon, Apr 23, 2018 at 08:23:24AM -0300, Diego Viola wrote:
>>> >> > That's a Core2 era chip; does it actually have stable TSC ?
>>> >>
>>> >> I'm not sure.
>>> >
>>> > dmesg | grep -i tsc
>>> >
>>> > should be able to tell you.
>>>
>>> [diego@dualcore ~]$ dmesg | grep -i tsc
>>> [0.00] tsc: Fast TSC calibration using PIT
>>> [0.01] tsc: Fast TSC calibration using PIT
>>> [0.01] tsc: Detected 2793.087 MHz processor
>>> [0.01] clocksource: tsc-early: mask: 0x
>>> max_cycles: 0x2842be30f1f, max_idle_ns: 440795236296 ns
>>> [0.162058] clocksource: Switched to clocksource tsc-early
>>> [0.300076] tsc: Marking TSC unstable due to TSC halts in idle
>>> [diego@dualcore ~]$
>>
>> Much thanks.. I suspect there a bunch of fail when marking unstable
>> before we register clocksource_tsc.
>>
>> The below patch is a bit ugly, but should cure a number of things; it
>> compiles but hasn't otherwise been tested, can you give it a spin?
>>
>> ---
>>
>>  - when TSC is unstable and we've already registered tsc-early, don't
>>forget to unregister it; this then leaves us without a tsc
>>clocksource entirely -- which is good.
>>
>>  - when we call mark_tsc_unstable() before we've registered
>>clocksource_tsc things go wobbly because it doesn't know about
>>clocksource_tsc_early. Fix that by:
>>
>> - Make clocksource_mark_unstable() work for unregistered
>>   clocksources.
>>
>>- which means we have to be able to detect this; use cs.list
>>  for this; initialize it empty.
>>- means we also have to place all cs.list manipulation under
>>  watchdog_lock -- bit ugly.
>>
>> - Make __clocksource_unstable() de-rate the clocksource.
>>
>> - Call clocksource_mark_unstable() on both tsc and tsc_early.
>>
>> This way we should either end up with a derated tsc clocksource marked
>> UNSTABLE or no tsc clocksource at all, either should result in it not
>> becoming the active clocksource.
>>
>> ---
>>  arch/x86/kernel/tsc.c | 22 +++---
>>  kernel/time/clocksource.c | 43 +++
>>  2 files changed, 46 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
>> index 91e6da48cbb6..74392d9d51e0 100644
>> --- a/arch/x86/kernel/tsc.c
>> +++ b/arch/x86/kernel/tsc.c
>> @@ -1067,6 +1067,7 @@ static struct clocksource clocksource_tsc_early = {
>> .resume = tsc_resume,
>> .mark_unstable  = tsc_cs_mark_unstable,
>> .tick_stable= tsc_cs_tick_stable,
>> +   .list   = LIST_HEAD_INIT(clocksource_tsc_early.list),
>>  };
>>
>>  /*
>> @@ -1086,6 +1087,7 @@ static struct clocksource clocksource_tsc = {
>> .resume = tsc_resume,
>> .mark_unstable  = tsc_cs_mark_unstable,
>> .tick_stable= tsc_cs_tick_stable,
>> +   .list   = LIST_HEAD_INIT(clocksource_tsc.list),
>>  };
>>
>>  void mark_tsc_unstable(char *reason)
>> @@ -1098,13 +1100,9 @@ void mark_tsc_unstable(char *reason)
>> clear_sched_clock_stable();
>> disable_sched_clock_irqtime();
>> pr_info("Marking TSC unstable due to %s\n", reason);
>> -   /* Change only the rating, when not registered */
>> -   if (clocksource_tsc.mult) {
>> -   clocksource_mark_unstable(&clocksource_tsc);
>> -   } else {
>> -   clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
>> -   clocksource_tsc.rating = 0;
>> -   }
>> +
>> +   clocksource_mark_unstable(&clocksource_tsc_early);
>> +   clocksource_mark_unstable(&clocksource_tsc);
>>  }
>>
>>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
>> @@ -1244,7 +1242,7 @@ static void tsc_refine_calibration_work(struct 
>> work_struct *work)
>>
>> /* Don't bother refining TSC on unstable systems */
>> if (tsc_unstable)
>> -   return;
>> +   goto unreg;
>>

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-23 Thread Diego Viola

On Mon, Apr 23, 2018 at 9:38 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Mon, Apr 23, 2018 at 08:49:25AM -0300, Diego Viola wrote:
>> On Mon, Apr 23, 2018 at 8:48 AM, Peter Zijlstra <pet...@infradead.org> wrote:
>> > On Mon, Apr 23, 2018 at 08:23:24AM -0300, Diego Viola wrote:
>> >> > That's a Core2 era chip; does it actually have stable TSC ?
>> >>
>> >> I'm not sure.
>> >
>> > dmesg | grep -i tsc
>> >
>> > should be able to tell you.
>>
>> [diego@dualcore ~]$ dmesg | grep -i tsc
>> [0.00] tsc: Fast TSC calibration using PIT
>> [0.01] tsc: Fast TSC calibration using PIT
>> [0.01] tsc: Detected 2793.087 MHz processor
>> [0.01] clocksource: tsc-early: mask: 0x
>> max_cycles: 0x2842be30f1f, max_idle_ns: 440795236296 ns
>> [0.162058] clocksource: Switched to clocksource tsc-early
>> [0.300076] tsc: Marking TSC unstable due to TSC halts in idle
>> [diego@dualcore ~]$
>
> Much thanks.. I suspect there a bunch of fail when marking unstable
> before we register clocksource_tsc.
>
> The below patch is a bit ugly, but should cure a number of things; it
> compiles but hasn't otherwise been tested, can you give it a spin?
>
> ---
>
>  - when TSC is unstable and we've already registered tsc-early, don't
>forget to unregister it; this then leaves us without a tsc
>clocksource entirely -- which is good.
>
>  - when we call mark_tsc_unstable() before we've registered
>clocksource_tsc things go wobbly because it doesn't know about
>clocksource_tsc_early. Fix that by:
>
> - Make clocksource_mark_unstable() work for unregistered
>   clocksources.
>
>- which means we have to be able to detect this; use cs.list
>  for this; initialize it empty.
>- means we also have to place all cs.list manipulation under
>  watchdog_lock -- bit ugly.
>
> - Make __clocksource_unstable() de-rate the clocksource.
>
> - Call clocksource_mark_unstable() on both tsc and tsc_early.
>
> This way we should either end up with a derated tsc clocksource marked
> UNSTABLE or no tsc clocksource at all, either should result in it not
> becoming the active clocksource.
>
> ---
>  arch/x86/kernel/tsc.c | 22 +++---
>  kernel/time/clocksource.c | 43 +++
>  2 files changed, 46 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 91e6da48cbb6..74392d9d51e0 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1067,6 +1067,7 @@ static struct clocksource clocksource_tsc_early = {
> .resume = tsc_resume,
> .mark_unstable  = tsc_cs_mark_unstable,
> .tick_stable= tsc_cs_tick_stable,
> +   .list   = LIST_HEAD_INIT(clocksource_tsc_early.list),
>  };
>
>  /*
> @@ -1086,6 +1087,7 @@ static struct clocksource clocksource_tsc = {
> .resume = tsc_resume,
> .mark_unstable  = tsc_cs_mark_unstable,
> .tick_stable= tsc_cs_tick_stable,
> +   .list   = LIST_HEAD_INIT(clocksource_tsc.list),
>  };
>
>  void mark_tsc_unstable(char *reason)
> @@ -1098,13 +1100,9 @@ void mark_tsc_unstable(char *reason)
> clear_sched_clock_stable();
> disable_sched_clock_irqtime();
> pr_info("Marking TSC unstable due to %s\n", reason);
> -   /* Change only the rating, when not registered */
> -   if (clocksource_tsc.mult) {
> -   clocksource_mark_unstable(&clocksource_tsc);
> -   } else {
> -   clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
> -   clocksource_tsc.rating = 0;
> -   }
> +
> +   clocksource_mark_unstable(&clocksource_tsc_early);
> +   clocksource_mark_unstable(&clocksource_tsc);
>  }
>
>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> @@ -1244,7 +1242,7 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>
> /* Don't bother refining TSC on unstable systems */
> if (tsc_unstable)
> -   return;
> +   goto unreg;
>
> /*
>  * Since the work is started early in boot, we may be
> @@ -1297,11 +1295,12 @@ static void tsc_refine_calibration_work(struct work_struct *work)
>
>  out:
> if (tsc_unstable)
> -   return;
> +   goto unreg;
>
> if (boot_cpu_has(X86_FEATURE_ART))
> art_related_clocksource = &clocksource_tsc;
> clocksource_register_

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-23 Thread Diego Viola

On Mon, Apr 23, 2018 at 9:38 AM, Peter Zijlstra  wrote:
> On Mon, Apr 23, 2018 at 08:49:25AM -0300, Diego Viola wrote:
>> On Mon, Apr 23, 2018 at 8:48 AM, Peter Zijlstra  wrote:
>> > On Mon, Apr 23, 2018 at 08:23:24AM -0300, Diego Viola wrote:
>> >> > That's a Core2 era chip; does it actually have stable TSC ?
>> >>
>> >> I'm not sure.
>> >
>> > dmesg | grep -i tsc
>> >
>> > should be able to tell you.
>>
>> [diego@dualcore ~]$ dmesg | grep -i tsc
>> [0.00] tsc: Fast TSC calibration using PIT
>> [0.01] tsc: Fast TSC calibration using PIT
>> [0.01] tsc: Detected 2793.087 MHz processor
>> [0.01] clocksource: tsc-early: mask: 0x
>> max_cycles: 0x2842be30f1f, max_idle_ns: 440795236296 ns
>> [0.162058] clocksource: Switched to clocksource tsc-early
>> [0.300076] tsc: Marking TSC unstable due to TSC halts in idle
>> [diego@dualcore ~]$
>
> Much thanks.. I suspect there's a bunch of fail when marking unstable
> before we register clocksource_tsc.
>
> The below patch is a bit ugly, but should cure a number of things; it
> compiles but hasn't otherwise been tested, can you give it a spin?
>
> ---
>
>  - when TSC is unstable and we've already registered tsc-early, don't
>forget to unregister it; this then leaves us without a tsc
>clocksource entirely -- which is good.
>
>  - when we call mark_tsc_unstable() before we've registered
>clocksource_tsc things go wobbly because it doesn't know about
>clocksource_tsc_early. Fix that by:
>
> - Make clocksource_mark_unstable() work for unregistered
>   clocksources.
>
>- which means we have to be able to detect this; use cs.list
>  for this; initialize it empty.
>- means we also have to place all cs.list manipulation under
>  watchdog_lock -- bit ugly.
>
> - Make __clocksource_unstable() de-rate the clocksource.
>
> - Call clocksource_mark_unstable() on both tsc and tsc_early.
>
> This way we should either end up with a derated tsc clocksource marked
> UNSTABLE or no tsc clocksource at all, either should result in it not
> becoming the active clocksource.
>
> ---
>  arch/x86/kernel/tsc.c | 22 +++---
>  kernel/time/clocksource.c | 43 +++
>  2 files changed, 46 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 91e6da48cbb6..74392d9d51e0 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1067,6 +1067,7 @@ static struct clocksource clocksource_tsc_early = {
> .resume = tsc_resume,
> .mark_unstable  = tsc_cs_mark_unstable,
> .tick_stable= tsc_cs_tick_stable,
> +   .list   = LIST_HEAD_INIT(clocksource_tsc_early.list),
>  };
>
>  /*
> @@ -1086,6 +1087,7 @@ static struct clocksource clocksource_tsc = {
> .resume = tsc_resume,
> .mark_unstable  = tsc_cs_mark_unstable,
> .tick_stable= tsc_cs_tick_stable,
> +   .list   = LIST_HEAD_INIT(clocksource_tsc.list),
>  };
>
>  void mark_tsc_unstable(char *reason)
> @@ -1098,13 +1100,9 @@ void mark_tsc_unstable(char *reason)
> clear_sched_clock_stable();
> disable_sched_clock_irqtime();
> pr_info("Marking TSC unstable due to %s\n", reason);
> -   /* Change only the rating, when not registered */
> -   if (clocksource_tsc.mult) {
> -   clocksource_mark_unstable(&clocksource_tsc);
> -   } else {
> -   clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
> -   clocksource_tsc.rating = 0;
> -   }
> +
> +   clocksource_mark_unstable(&clocksource_tsc_early);
> +   clocksource_mark_unstable(&clocksource_tsc);
>  }
>
>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> @@ -1244,7 +1242,7 @@ static void tsc_refine_calibration_work(struct work_struct *work)
>
> /* Don't bother refining TSC on unstable systems */
> if (tsc_unstable)
> -   return;
> +   goto unreg;
>
> /*
>  * Since the work is started early in boot, we may be
> @@ -1297,11 +1295,12 @@ static void tsc_refine_calibration_work(struct work_struct *work)
>
>  out:
> if (tsc_unstable)
> -   return;
> +   goto unreg;
>
> if (boot_cpu_has(X86_FEATURE_ART))
> art_related_clocksource = &clocksource_tsc;
> clocksource_register_khz(&clocksource_tsc, tsc_khz);
> +unreg:
> clocksource_unregister(&clocksource_tsc

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-23 Thread Diego Viola

On Mon, Apr 23, 2018 at 8:48 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Mon, Apr 23, 2018 at 08:23:24AM -0300, Diego Viola wrote:
>> > That's a Core2 era chip; does it actually have stable TSC ?
>>
>> I'm not sure.
>
> dmesg | grep -i tsc
>
> should be able to tell you.

[diego@dualcore ~]$ dmesg | grep -i tsc
[0.00] tsc: Fast TSC calibration using PIT
[0.01] tsc: Fast TSC calibration using PIT
[0.01] tsc: Detected 2793.087 MHz processor
[0.01] clocksource: tsc-early: mask: 0x
max_cycles: 0x2842be30f1f, max_idle_ns: 440795236296 ns
[0.162058] clocksource: Switched to clocksource tsc-early
[0.300076] tsc: Marking TSC unstable due to TSC halts in idle
[diego@dualcore ~]$
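
As a complement to the dmesg output above, the CPU flags can also hint at TSC behaviour. The sketch below is not from the thread: tsc_flags is a hypothetical helper name, and it assumes an x86 Linux system where /proc/cpuinfo has a "flags" line. It reports whether constant_tsc and nonstop_tsc are listed. A missing nonstop_tsc would be consistent with the "TSC halts in idle" message, but the kernel log remains the authoritative verdict.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original thread): report whether the
# TSC-related CPU flags appear in a cpuinfo-style file.
# The optional argument is an assumption for testing; default is /proc/cpuinfo.
tsc_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    for flag in constant_tsc nonstop_tsc; do
        # Look only at the first "flags" line; -w avoids matching plain "tsc".
        if grep -m1 '^flags' "$cpuinfo" | grep -qw "$flag"; then
            echo "$flag: yes"
        else
            echo "$flag: no"
        fi
    done
}
```

Calling tsc_flags with no arguments reads /proc/cpuinfo on the running machine; passing a file path lets the same check run against a saved copy from another system.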

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-23 Thread Diego Viola

On Mon, Apr 23, 2018 at 6:16 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Sat, Apr 21, 2018 at 11:19:55PM -0300, Diego Viola wrote:
>
>> The problem only happens *after* resuming from suspend, in normal S0
>> state it never happens.
>
> What kind of suspend are you doing? s2disk, s2ram ?

Suspend to RAM with systemctl suspend.

>
>> My machine is a desktop and has a dual core E5500 CPU with 2GB of RAM,
>> I run Arch Linux and I use the i3wm.
>
> That's a Core2 era chip; does it actually have stable TSC ?

I'm not sure.

>
>> aa83c45762a242acce9b35020363225a7b59d7c9 is the first bad commit
>
> That is most curious; that was supposed to only affect early boot. I'll
> try and figure out if there's some intersection with suspend.

Thanks.

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-22 Thread Diego Viola

On Sun, Apr 22, 2018 at 2:47 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Sat, Apr 21, 2018 at 11:19 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>> Hi,
>>
>> I'm currently experiencing freezes after resuming from suspend when
>> using various applications on my desktop.
>>
>> For example, when using Firefox and switching between tabs the screen
>> would freeze, allowing me to only move the mouse pointer but focusing
>> the mouse pointer on an application won't have any effect.
>>
>> The keyboard would still work and the only way to make the issue go
>> away would be to switch to a virtual desktop and then switch back,
>> then I'd be able to use the mouse again until the same problem
>> re-occurs.
>>
>> This never happened before upgrading to 4.16.3 and 4.15.18 is unaffected.
>>
>> The problem only happens *after* resuming from suspend, in normal S0
>> state it never happens.
>>
>> My machine is a desktop and has a dual core E5500 CPU with 2GB of RAM,
>> I run Arch Linux and I use the i3wm.
>>
>> I don't think it's a problem with user space because I run exactly the
>> same Arch Linux setup on my T450 and the issue can't be reproduced on
>> that machine.
>>
>> I did a bisect and the cause of the bug seems to be this commit:
>>
>> aa83c45762a242acce9b35020363225a7b59d7c9 is the first bad commit
>>
>> Here is the bisect log:
>>
>> git bisect start
>> # bad: [0adb32858b0bddf4ada5f364a84ed60b196dbcda] Linux 4.16
>> git bisect bad 0adb32858b0bddf4ada5f364a84ed60b196dbcda
>> # good: [d8a5b80568a9cb66810e75b182018e9edb68e8ff] Linux 4.15
>> git bisect good d8a5b80568a9cb66810e75b182018e9edb68e8ff
>> # bad: [c14376de3a1befa70d9811ca2872d47367b48767] printk: Wake klogd
>> when passing console_lock owner
>> git bisect bad c14376de3a1befa70d9811ca2872d47367b48767
>> # bad: [a103950e0dd2058df5e8a8d4a915707bdcf205f0] Merge branch 'linus'
>> of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
>> git bisect bad a103950e0dd2058df5e8a8d4a915707bdcf205f0
>> # bad: [d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c] Merge branch
>> 'perf-core-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect bad d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c
>> # good: [9697e9da84299d0d715d515dd2cc48f1eceb277d] Merge tag
>> 'edac_for_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
>> git bisect good 9697e9da84299d0d715d515dd2cc48f1eceb277d
>> # good: [d0bd31dc5c0b46b9c778112900cf8f910ac26e1b] Merge tag
>> 'xtensa-20180129' of git://github.com/jcmvbkbc/linux-xtensa
>> git bisect good d0bd31dc5c0b46b9c778112900cf8f910ac26e1b
>> # good: [0337cf74ccf2a43437bff2e23b278e4f2dc4c6e2] perf util:
>> Introduce architecture specific errno/name mapping
>> git bisect good 0337cf74ccf2a43437bff2e23b278e4f2dc4c6e2
>> # bad: [6304672b7f0a5c010002e63a075160856dc4f88d] Merge branch
>> 'x86-pti-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect bad 6304672b7f0a5c010002e63a075160856dc4f88d
>> # good: [a46d3f9b1c9888a244ed1ce8da0eca98c3f378e2] Merge branch
>> 'timers-core-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect good a46d3f9b1c9888a244ed1ce8da0eca98c3f378e2
>> # good: [669c0f762ed19bd9ec128ebc97ae8641b6e1a4a3] Merge branch
>> 'x86-platform-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect good 669c0f762ed19bd9ec128ebc97ae8641b6e1a4a3
>> # good: [7a32fc51ca938e67974cbb9db31e1a43f98345a9] x86/nospec: Fix
>> header guards names
>> git bisect good 7a32fc51ca938e67974cbb9db31e1a43f98345a9
>> # bad: [942633523cde99c5bfa0329996dd839ca66a87aa] Merge branch
>> 'x86-mm-for-linus' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect bad 942633523cde99c5bfa0329996dd839ca66a87aa
>> # bad: [aa83c45762a242acce9b35020363225a7b59d7c9] x86/tsc: Introduce
>> early tsc clocksource
>> git bisect bad aa83c45762a242acce9b35020363225a7b59d7c9
>> # good: [6d671e1b85c63e7a337ba76c1a154c091545cff8] x86/time:
>> Unconditionally register legacy timer interrupt
>> git bisect good 6d671e1b85c63e7a337ba76c1a154c091545cff8
>> # first bad commit: [aa83c45762a242acce9b35020363225a7b59d7c9]
>> x86/tsc: Introduce early tsc clocksource
>>
>> Current git HEAD is also affected.
>>
>> I already tried HEAD with that commit reverted and the problem can't
>> be reproduced anymore.
>>
>> Please let me know if I can include more info, thanks in advance.
>>
>> Diego
>
> When booting with clocksource=hpet I cannot reproduce the problem.

I can reproduce the freeze with clocksource=tsc as well.

Re: Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-22 Thread Diego Viola

On Sat, Apr 21, 2018 at 11:19 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> Hi,
>
> I'm currently experiencing freezes after resuming from suspend when
> using various applications on my desktop.
>
> For example, when using Firefox and switching between tabs the screen
> would freeze, allowing me to only move the mouse pointer but focusing
> the mouse pointer on an application won't have any effect.
>
> The keyboard would still work and the only way to make the issue go
> away would be to switch to a virtual desktop and then switch back,
> then I'd be able to use the mouse again until the same problem
> re-occurs.
>
> This never happened before upgrading to 4.16.3 and 4.15.18 is unaffected.
>
> The problem only happens *after* resuming from suspend, in normal S0
> state it never happens.
>
> My machine is a desktop and has a dual core E5500 CPU with 2GB of RAM,
> I run Arch Linux and I use the i3wm.
>
> I don't think it's a problem with user space because I run exactly the
> same Arch Linux setup on my T450 and the issue can't be reproduced on
> that machine.
>
> I did a bisect and the cause of the bug seems to be this commit:
>
> aa83c45762a242acce9b35020363225a7b59d7c9 is the first bad commit
>
> Here is the bisect log:
>
> git bisect start
> # bad: [0adb32858b0bddf4ada5f364a84ed60b196dbcda] Linux 4.16
> git bisect bad 0adb32858b0bddf4ada5f364a84ed60b196dbcda
> # good: [d8a5b80568a9cb66810e75b182018e9edb68e8ff] Linux 4.15
> git bisect good d8a5b80568a9cb66810e75b182018e9edb68e8ff
> # bad: [c14376de3a1befa70d9811ca2872d47367b48767] printk: Wake klogd
> when passing console_lock owner
> git bisect bad c14376de3a1befa70d9811ca2872d47367b48767
> # bad: [a103950e0dd2058df5e8a8d4a915707bdcf205f0] Merge branch 'linus'
> of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> git bisect bad a103950e0dd2058df5e8a8d4a915707bdcf205f0
> # bad: [d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c] Merge branch
> 'perf-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c
> # good: [9697e9da84299d0d715d515dd2cc48f1eceb277d] Merge tag
> 'edac_for_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
> git bisect good 9697e9da84299d0d715d515dd2cc48f1eceb277d
> # good: [d0bd31dc5c0b46b9c778112900cf8f910ac26e1b] Merge tag
> 'xtensa-20180129' of git://github.com/jcmvbkbc/linux-xtensa
> git bisect good d0bd31dc5c0b46b9c778112900cf8f910ac26e1b
> # good: [0337cf74ccf2a43437bff2e23b278e4f2dc4c6e2] perf util:
> Introduce architecture specific errno/name mapping
> git bisect good 0337cf74ccf2a43437bff2e23b278e4f2dc4c6e2
> # bad: [6304672b7f0a5c010002e63a075160856dc4f88d] Merge branch
> 'x86-pti-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 6304672b7f0a5c010002e63a075160856dc4f88d
> # good: [a46d3f9b1c9888a244ed1ce8da0eca98c3f378e2] Merge branch
> 'timers-core-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good a46d3f9b1c9888a244ed1ce8da0eca98c3f378e2
> # good: [669c0f762ed19bd9ec128ebc97ae8641b6e1a4a3] Merge branch
> 'x86-platform-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good 669c0f762ed19bd9ec128ebc97ae8641b6e1a4a3
> # good: [7a32fc51ca938e67974cbb9db31e1a43f98345a9] x86/nospec: Fix
> header guards names
> git bisect good 7a32fc51ca938e67974cbb9db31e1a43f98345a9
> # bad: [942633523cde99c5bfa0329996dd839ca66a87aa] Merge branch
> 'x86-mm-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 942633523cde99c5bfa0329996dd839ca66a87aa
> # bad: [aa83c45762a242acce9b35020363225a7b59d7c9] x86/tsc: Introduce
> early tsc clocksource
> git bisect bad aa83c45762a242acce9b35020363225a7b59d7c9
> # good: [6d671e1b85c63e7a337ba76c1a154c091545cff8] x86/time:
> Unconditionally register legacy timer interrupt
> git bisect good 6d671e1b85c63e7a337ba76c1a154c091545cff8
> # first bad commit: [aa83c45762a242acce9b35020363225a7b59d7c9]
> x86/tsc: Introduce early tsc clocksource
>
> Current git HEAD is also affected.
>
> I already tried HEAD with that commit reverted and the problem can't
> be reproduced anymore.
>
> Please let me know if I can include more info, thanks in advance.
>
> Diego

When booting with clocksource=hpet I cannot reproduce the problem.
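
For anyone repeating this test, it may help to confirm which clocksource the kernel actually selected after boot and after resume. The following is a minimal sketch, not something posted in the thread: clocksource_info is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed at a copy of the files. The sysfs paths are the standard ones on Linux.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original thread): print the active
# clocksource and the ones the kernel considers available.
# The optional argument is an assumption for testing; default is the sysfs dir.
clocksource_info() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    echo "current: $(cat "$dir/current_clocksource")"
    echo "available: $(cat "$dir/available_clocksource")"
}
```

The same current_clocksource file can be written to as root to switch clocksources at runtime, which avoids a reboot when comparing hpet and tsc.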

Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-21 Thread Diego Viola

Hi,

I'm currently experiencing freezes after resuming from suspend when
using various applications on my desktop.

For example, when using Firefox and switching between tabs the screen
would freeze, allowing me to only move the mouse pointer but focusing
the mouse pointer on an application won't have any effect.

The keyboard would still work and the only way to make the issue go
away would be to switch to a virtual desktop and then switch back,
then I'd be able to use the mouse again until the same problem
re-occurs.

This never happened before upgrading to 4.16.3 and 4.15.18 is unaffected.

The problem only happens *after* resuming from suspend, in normal S0
state it never happens.

My machine is a desktop and has a dual core E5500 CPU with 2GB of RAM,
I run Arch Linux and I use the i3wm.

I don't think it's a problem with user space because I run exactly the
same Arch Linux setup on my T450 and the issue can't be reproduced on
that machine.

I did a bisect and the cause of the bug seems to be this commit:

aa83c45762a242acce9b35020363225a7b59d7c9 is the first bad commit

Here is the bisect log:

git bisect start
# bad: [0adb32858b0bddf4ada5f364a84ed60b196dbcda] Linux 4.16
git bisect bad 0adb32858b0bddf4ada5f364a84ed60b196dbcda
# good: [d8a5b80568a9cb66810e75b182018e9edb68e8ff] Linux 4.15
git bisect good d8a5b80568a9cb66810e75b182018e9edb68e8ff
# bad: [c14376de3a1befa70d9811ca2872d47367b48767] printk: Wake klogd
when passing console_lock owner
git bisect bad c14376de3a1befa70d9811ca2872d47367b48767
# bad: [a103950e0dd2058df5e8a8d4a915707bdcf205f0] Merge branch 'linus'
of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad a103950e0dd2058df5e8a8d4a915707bdcf205f0
# bad: [d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c] Merge branch
'perf-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c
# good: [9697e9da84299d0d715d515dd2cc48f1eceb277d] Merge tag
'edac_for_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
git bisect good 9697e9da84299d0d715d515dd2cc48f1eceb277d
# good: [d0bd31dc5c0b46b9c778112900cf8f910ac26e1b] Merge tag
'xtensa-20180129' of git://github.com/jcmvbkbc/linux-xtensa
git bisect good d0bd31dc5c0b46b9c778112900cf8f910ac26e1b
# good: [0337cf74ccf2a43437bff2e23b278e4f2dc4c6e2] perf util:
Introduce architecture specific errno/name mapping
git bisect good 0337cf74ccf2a43437bff2e23b278e4f2dc4c6e2
# bad: [6304672b7f0a5c010002e63a075160856dc4f88d] Merge branch
'x86-pti-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 6304672b7f0a5c010002e63a075160856dc4f88d
# good: [a46d3f9b1c9888a244ed1ce8da0eca98c3f378e2] Merge branch
'timers-core-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good a46d3f9b1c9888a244ed1ce8da0eca98c3f378e2
# good: [669c0f762ed19bd9ec128ebc97ae8641b6e1a4a3] Merge branch
'x86-platform-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 669c0f762ed19bd9ec128ebc97ae8641b6e1a4a3
# good: [7a32fc51ca938e67974cbb9db31e1a43f98345a9] x86/nospec: Fix
header guards names
git bisect good 7a32fc51ca938e67974cbb9db31e1a43f98345a9
# bad: [942633523cde99c5bfa0329996dd839ca66a87aa] Merge branch
'x86-mm-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 942633523cde99c5bfa0329996dd839ca66a87aa
# bad: [aa83c45762a242acce9b35020363225a7b59d7c9] x86/tsc: Introduce
early tsc clocksource
git bisect bad aa83c45762a242acce9b35020363225a7b59d7c9
# good: [6d671e1b85c63e7a337ba76c1a154c091545cff8] x86/time:
Unconditionally register legacy timer interrupt
git bisect good 6d671e1b85c63e7a337ba76c1a154c091545cff8
# first bad commit: [aa83c45762a242acce9b35020363225a7b59d7c9]
x86/tsc: Introduce early tsc clocksource

Current git HEAD is also affected.

I already tried HEAD with that commit reverted and the problem can't
be reproduced anymore.

Please let me know if I can include more info, thanks in advance.

Diego
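
A bisect log like the one above can be saved with "git bisect log" and fed back with "git bisect replay" by anyone who wants to repeat the search. As a small sketch under stated assumptions (first_bad_commit is a hypothetical helper name, and the log is assumed to be saved unquoted in a file, in the format shown above), the offending hash can be pulled out of the saved log like this:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original thread): print the hash from
# the "# first bad commit: [<sha>]" line of a saved git bisect log.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]*\)\].*/\1/p' "$1"
}
```

The printed hash can then be passed to "git revert" on a test branch to check, as described above, whether the problem goes away without that commit.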

Experiencing freezes with kernel 4.16.3 on a desktop with E5500 CPU (bisect included)

2018-04-21 Thread Diego Viola

Hi,

I'm currently experiencing freezes after resuming from suspend when
using various applications on my desktop.

For example, when using Firefox and switching between tabs the screen
would freeze, allowing me to only move the mouse pointer but focusing
the mouse pointer on a application won't have any effect.

The keyboard would still work and the only way to make the issue go
away would be to switch to a virtual desktop and then switch back,
then I'd be able to use the mouse again until the same problem
re-occurs.

This never happened before upgrading to 4.16.3 and 4.15.18 is unaffected.

The problem only happens *after* resuming from suspend, in normal S0
state it never happens.

My machine is a desktop and has a dual core E5500 CPU with 2GB of RAM,
I run Arch Linux and I use the i3wm.

I don't think it's a problem with user space because I run exactly the
same Arch Linux setup on my T450 and the issue can't be reproduced on
that machine.

I did a bisect and the cause of the bug seems to be this commit:

aa83c45762a242acce9b35020363225a7b59d7c9 is the first bad commit

Here is the bisect log:

git bisect start
# bad: [0adb32858b0bddf4ada5f364a84ed60b196dbcda] Linux 4.16
git bisect bad 0adb32858b0bddf4ada5f364a84ed60b196dbcda
# good: [d8a5b80568a9cb66810e75b182018e9edb68e8ff] Linux 4.15
git bisect good d8a5b80568a9cb66810e75b182018e9edb68e8ff
# bad: [c14376de3a1befa70d9811ca2872d47367b48767] printk: Wake klogd when passing console_lock owner
git bisect bad c14376de3a1befa70d9811ca2872d47367b48767
# bad: [a103950e0dd2058df5e8a8d4a915707bdcf205f0] Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad a103950e0dd2058df5e8a8d4a915707bdcf205f0
# bad: [d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad d8b91dde38f4c43bd0bbbf17a90f735b16aaff2c
# good: [9697e9da84299d0d715d515dd2cc48f1eceb277d] Merge tag 'edac_for_4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
git bisect good 9697e9da84299d0d715d515dd2cc48f1eceb277d
# good: [d0bd31dc5c0b46b9c778112900cf8f910ac26e1b] Merge tag 'xtensa-20180129' of git://github.com/jcmvbkbc/linux-xtensa
git bisect good d0bd31dc5c0b46b9c778112900cf8f910ac26e1b
# good: [0337cf74ccf2a43437bff2e23b278e4f2dc4c6e2] perf util: Introduce architecture specific errno/name mapping
git bisect good 0337cf74ccf2a43437bff2e23b278e4f2dc4c6e2
# bad: [6304672b7f0a5c010002e63a075160856dc4f88d] Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 6304672b7f0a5c010002e63a075160856dc4f88d
# good: [a46d3f9b1c9888a244ed1ce8da0eca98c3f378e2] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good a46d3f9b1c9888a244ed1ce8da0eca98c3f378e2
# good: [669c0f762ed19bd9ec128ebc97ae8641b6e1a4a3] Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 669c0f762ed19bd9ec128ebc97ae8641b6e1a4a3
# good: [7a32fc51ca938e67974cbb9db31e1a43f98345a9] x86/nospec: Fix header guards names
git bisect good 7a32fc51ca938e67974cbb9db31e1a43f98345a9
# bad: [942633523cde99c5bfa0329996dd839ca66a87aa] Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 942633523cde99c5bfa0329996dd839ca66a87aa
# bad: [aa83c45762a242acce9b35020363225a7b59d7c9] x86/tsc: Introduce early tsc clocksource
git bisect bad aa83c45762a242acce9b35020363225a7b59d7c9
# good: [6d671e1b85c63e7a337ba76c1a154c091545cff8] x86/time: Unconditionally register legacy timer interrupt
git bisect good 6d671e1b85c63e7a337ba76c1a154c091545cff8
# first bad commit: [aa83c45762a242acce9b35020363225a7b59d7c9] x86/tsc: Introduce early tsc clocksource

Current git HEAD is also affected.

I already tried HEAD with that commit reverted and the problem can't
be reproduced anymore.
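
The revert test can be repeated along these lines. This is only a minimal
sketch, assuming it is run inside the kernel git tree where the bisect was
done; the helper name first_bad_commit is made up for illustration, while
git bisect log, git bisect reset and git revert --no-edit are standard git
commands.

```shell
# Hypothetical helper (not part of git): read a bisect log on stdin and
# print the hash from its "# first bad commit: [<hash>] ..." line.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)].*/\1/p'
}

# Usage, inside the kernel tree where the bisect was run:
#   bad=$(git bisect log | first_bad_commit)   # capture before resetting
#   git bisect reset                           # back to the pre-bisect HEAD
#   git revert --no-edit "$bad"
# then rebuild, boot the result and repeat the suspend/resume test.
```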

Please let me know if I can include more info, thanks in advance.

Diego

Re: 4.11-rc8 makes my Dell Inspiron 5558 crash

2017-05-09 Thread Diego Viola

On Fri, Apr 28, 2017 at 2:32 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Thu, Apr 27, 2017 at 8:09 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>> On Thu, Apr 27, 2017 at 7:56 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>>> Hello,
>>>
>>> For some reason 4.11-rc8 makes my Dell Inspiron 5558 crash. The
>>> problem is very random and I don't know how to reproduce it.
>>>
>>> That said, this only started happening with 4.11-rc8 and I've been
>>> using 4.11 since rc3 and it didn't happen at all with the previous
>>> RCs.
>>>
>>> Not sure how I could debug this, there is nothing in the logs either.
>>>
>>> I'm on Arch Linux x86_64.
>>>
>>> Thanks,
>>> Diego
>>
>> Not sure if "crash" is the best way to describe it, but when I'm using
>> i3 and browsing X will freeze and I won't be able to do anything, Num
>> Lock won't work either.
>>
>> I can only do a hard reset at that point.
>>
>> Diego
>
> BTW, I should specify that this issue has nothing to do with the other
> issue I had:
>
> https://lkml.org/lkml/2017/3/8/603
>
> That issue was with suspend/resume and it's fixed.
>
> This issue I have now is with normal usage (S0) and not with suspend/resume.
>
> In this case, everything freezes, X, etc. and only a hard reboot
> works, I'm not sure I can git-bisect the problem also because it's
> hard to reproduce and very random.
>
> I'll see if the final 4.11 release fixes it and report back.
>
> Diego

Hello,

I've been using 4.11 for about a week, and it has not frozen since then.

Thanks,
Diego

Re: 4.11-rc8 makes my Dell Inspiron 5558 crash

2017-04-28 Thread Diego Viola

On Thu, Apr 27, 2017 at 8:09 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Thu, Apr 27, 2017 at 7:56 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>> Hello,
>>
>> For some reason 4.11-rc8 makes my Dell Inspiron 5558 crash. The
>> problem is very random and I don't know how to reproduce it.
>>
>> That said, this only started happening with 4.11-rc8 and I've been
>> using 4.11 since rc3 and it didn't happen at all with the previous
>> RCs.
>>
>> Not sure how I could debug this, there is nothing in the logs either.
>>
>> I'm on Arch Linux x86_64.
>>
>> Thanks,
>> Diego
>
> Not sure if "crash" is the best way to describe it, but when I'm using
> i3 and browsing X will freeze and I won't be able to do anything, Num
> Lock won't work either.
>
> I can only do a hard reset at that point.
>
> Diego

BTW, I should specify that this issue has nothing to do with the other
issue I had:

https://lkml.org/lkml/2017/3/8/603

That issue was with suspend/resume and it's fixed.

This issue I have now is with normal usage (S0) and not with suspend/resume.

In this case everything freezes (X, etc.) and only a hard reboot
works. I'm also not sure I can git-bisect the problem, because it's
hard to reproduce and very random.

I'll see if the final 4.11 release fixes it and report back.

Diego

Re: 4.11-rc8 makes my Dell Inspiron 5558 crash

2017-04-27 Thread Diego Viola

On Thu, Apr 27, 2017 at 7:56 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> Hello,
>
> For some reason 4.11-rc8 makes my Dell Inspiron 5558 crash. The
> problem is very random and I don't know how to reproduce it.
>
> That said, this only started happening with 4.11-rc8 and I've been
> using 4.11 since rc3 and it didn't happen at all with the previous
> RCs.
>
> Not sure how I could debug this, there is nothing in the logs either.
>
> I'm on Arch Linux x86_64.
>
> Thanks,
> Diego

Not sure if "crash" is the best way to describe it, but when I'm using
i3 and browsing, X will freeze and I won't be able to do anything; Num
Lock won't work either.

I can only do a hard reset at that point.

Diego

4.11-rc8 makes my Dell Inspiron 5558 crash

2017-04-27 Thread Diego Viola

Hello,

For some reason 4.11-rc8 makes my Dell Inspiron 5558 crash. The
problem is very random and I don't know how to reproduce it.

That said, this only started happening with 4.11-rc8. I've been
using 4.11 since rc3, and it didn't happen at all with the previous
RCs.

Not sure how I could debug this; there is nothing in the logs either.

I'm on Arch Linux x86_64.

Thanks,
Diego

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-24 Thread Diego Viola

On Thu, Mar 23, 2017 at 2:12 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 2:02 PM, Mathias Nyman
> <mathias.ny...@linux.intel.com> wrote:
>> On 22.03.2017 19:51, Mathias Nyman wrote:
>>>
>>> On 22.03.2017 00:52, Diego Viola wrote:
>>>>
>>>> On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <diego.vi...@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <diego.vi...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <diego.vi...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <diego.vi...@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>>>>>> <mathias.ny...@linux.intel.com> wrote:
>>>>>>>>>
>>>>>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>>>>>> <mathias.ny...@linux.intel.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> xhci tracing can be added with:
>>>>>>>>>>>
>>>>>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>
>>>>
>>>> Here's the log I was able to obtain today, dmesg + ftrace at the time
>>>> of the crash:
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=255419
>>>>
>>>> USB keyboard and mouse was plugged when I reproduced this.
>>>>
>>>> Please let me know if you need more info.
>>>>
>>>
>>> Thanks, I'm looking at the logs and so far the most suspicious looking
>>> entry is:
>>>
>>> [  257.060941] rtsx_usb-254 0 119946155us : xhci_urb_enqueue:
>>> ep1out-bulk: urb 880105a93300 pipe 3221259520 length 0/12 sgs 0/0 stream
>>> 0 flags 0001
>>> [  257.063601] rtsx_usb-254 0 119946162us : xhci_urb_enqueue:
>>> ep0out-control: urb 880105a93300 pipe 2147484928 length 0/0 sgs 0/0
>>> stream 0 flags 0010
>>>
>>> It enqueues the same URB, without ever giving it back or actually queuing
>>> any trbs for the urb, well, it might just fail to enqueue it in the
>>> first place.
>>>
>>> I need to search for a URB that has been dequeued but never given back
>>> in the trace
>>
>>
>> Ok, found a much more likely candidate:
>>
>> [  258.004078] kworker/-544 0d..1 121599183us : xhci_urb_dequeue:
>> ep1out-bulk: urb 880105a930c0 pipe 3221259520...
>>
>> We try to kill this URB "880105a930c0", twice, and it's never given back.
>> Trace is missing "xhci_dbg_cancel_urb: Cancel URB..." entry in log after
>> xhci_urb_dequeue, so it never got added to the list for cancellation in
>> xhci driver.
>>
>> xhci_urb_dequeue() has one place where it just returns an error without
>> giving back the urb or queuing it for cancellation.
>> This is in my opinion a bug in xhci_urb_dequeue()
>>
>> rtsx_usb_ms is a good test for usb, it seems to be constantly queuing urbs
>> at all inappropriate times.
>>
>> If I write a patch can you try it out?
>
> Yes.
>
>>
>> -Mathias
>>
>>
>>
>
> Thanks,
> Diego

Hi Mathias,

I tested your patch with Linux 4.11-rc3 and can confirm that it solves
the problem.

I've tested suspend and resume with i3lock 150 times and it works.
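
A repeated suspend/resume test like this can be scripted. The following is
a minimal sketch only: the thread does not say how the 150 cycles were
actually driven, the helper name suspend_cycles is hypothetical, and the
usage line assumes rtcwake from util-linux is available.

```shell
# Hypothetical helper: run a suspend command N times and stop at the
# first failure. $1 is the cycle count, the rest is the command to run.
suspend_cycles() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" || { echo "cycle $i failed" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $count suspend/resume cycles"
}

# Usage (as root): suspend to RAM and wake via the RTC after 30 seconds.
#   suspend_cycles 150 rtcwake -m mem -s 30
```

A hang on resume simply stops the loop at that cycle, so the last cycle
number printed or logged shows how far the test got.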

Thank you, I appreciate it a lot.

Diego

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-23 Thread Diego Viola

On Thu, Mar 23, 2017 at 2:02 PM, Mathias Nyman
<mathias.ny...@linux.intel.com> wrote:
> On 22.03.2017 19:51, Mathias Nyman wrote:
>>
>> On 22.03.2017 00:52, Diego Viola wrote:
>>>
>>> On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <diego.vi...@gmail.com>
>>> wrote:
>>>>
>>>> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <diego.vi...@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <diego.vi...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <diego.vi...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>>>>> <mathias.ny...@linux.intel.com> wrote:
>>>>>>>>
>>>>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>>>>> <mathias.ny...@linux.intel.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> xhci tracing can be added with:
>>>>>>>>>>
>>>>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>
>>>
>>> Here's the log I was able to obtain today, dmesg + ftrace at the time
>>> of the crash:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=255419
>>>
>>> USB keyboard and mouse was plugged when I reproduced this.
>>>
>>> Please let me know if you need more info.
>>>
>>
>> Thanks, I'm looking at the logs and so far the most suspicious looking
>> entry is:
>>
>> [  257.060941] rtsx_usb-254 0 119946155us : xhci_urb_enqueue:
>> ep1out-bulk: urb 880105a93300 pipe 3221259520 length 0/12 sgs 0/0 stream
>> 0 flags 0001
>> [  257.063601] rtsx_usb-254 0 119946162us : xhci_urb_enqueue:
>> ep0out-control: urb 880105a93300 pipe 2147484928 length 0/0 sgs 0/0
>> stream 0 flags 0010
>>
>> It enqueues the same URB, without ever giving it back or actually queuing
>> any trbs for the urb, well, it might just fail to enqueue it in the
>> first place.
>>
>> I need to search for a URB that has been dequeued but never given back
>> in the trace
>
>
> Ok, found a much more likely candidate:
>
> [  258.004078] kworker/-544 0d..1 121599183us : xhci_urb_dequeue:
> ep1out-bulk: urb 880105a930c0 pipe 3221259520...
>
> We try to kill this URB "880105a930c0", twice, and it's never given back.
> Trace is missing "xhci_dbg_cancel_urb: Cancel URB..." entry in log after
> xhci_urb_dequeue, so it never got added to the list for cancellation in
> xhci driver.
>
> xhci_urb_dequeue() has one place where it just returns an error without
> giving back the urb or queuing it for cancellation.
> This is in my opinion a bug in xhci_urb_dequeue()
>
> rtsx_usb_ms is a good test for usb, it seems to be constantly queuing urbs
> at all inappropriate times.
>
> If I write a patch can you try it out?

Yes.

>
> -Mathias
>
>
>

Thanks,
Diego

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-21 Thread Diego Viola

On Tue, Mar 21, 2017 at 12:29 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <diego.vi...@gmail.com> wrote:
>> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>>> <mathias.ny...@linux.intel.com> wrote:
>>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>>> <mathias.ny...@linux.intel.com> wrote:
>>>>>>>
>>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>>
>>>>>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>>>>>> and
>>>>>>> giveback.
>>>>>>>
>>>>>>> Could you try enabling xhci tracing before suspending (not the same as
>>>>>>> xhci
>>>>>>> verbose dynamic debug)
>>>>>>> It will generate a lot of data, so better to remove all extra USB
>>>>>>> devices.
>>>>>>>
>>>>>>> xhci tracing can be added with:
>>>>>>>
>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>>>>
>>>>>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>>>>>
>>>>>>
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>>>>>
>>>>>> This is with Linux 4.11.0-rc3-ARCH.
>>>>>>
>>>>>> USB mouse/keyboard was unplugged before booting the machine.
>>>>>>
>>>>>> I didn't do a suspend/resume before getting this trace, should I do that?
>>>>>>
>>>>>> Should I reproduce the hang and get a netconsole dmesg capture with
>>>>>> tracing enabled?
>>>>>
>>>>>
>>>>> A trace and a dmesg of the same suspend/resume hang would be great.
>>>>
>>>> I can capture the dmesg with netconsole once the machine hangs, but
>>>> I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
>>>> the hang. I'm unable to use ssh after the hang.
>>>>
>>>>> And if you can then one of a successful suspend/resume for reference.
>>>>
>>>> Here's the trace after a successful suspend/resume:
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=255369
>>>>
>>>>>
>>>>> (I haven't yet checked the one you added to bugzilla)
>>>>>
>>>>> -Mathias
>>>>
>>>> Diego
>>>
>>> ftrace_dump_on_oops is what I was looking for.
>>>
>>> Diego
>>
>> I tried ftrace_dump_on_oops but I can't see the trace coming in, not
>> sure what I'm doing wrong. :(
>>
>> Diego
>
> I was able to obtain the trace with this: hung_task_panic=1
> no_console_suspend=1 ftrace_dump_on_oops
>
> Diego

Here's the log I was able to obtain today, dmesg + ftrace at the time
of the crash:

https://bugzilla.kernel.org/attachment.cgi?id=255419

USB keyboard and mouse were plugged in when I reproduced this.

Please let me know if you need more info.

Diego

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-21 Thread Diego Viola

On Tue, Mar 21, 2017 at 10:04 AM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>>> <mathias.ny...@linux.intel.com> wrote:
>>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>>
>>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>>> <mathias.ny...@linux.intel.com> wrote:
>>>>>>
>>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>>
>>>>>>>
>>>>
>>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>>
>>>>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>>>>> and
>>>>>> giveback.
>>>>>>
>>>>>> Could you try enabling xhci tracing before suspending (not the same as
>>>>>> xhci
>>>>>> verbose dynamic debug)
>>>>>> It will generate a lot of data, so better to remove all extra USB
>>>>>> devices.
>>>>>>
>>>>>> xhci tracing can be added with:
>>>>>>
>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>>>
>>>>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>>>>
>>>>>
>>>>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>>>>
>>>>> This is with Linux 4.11.0-rc3-ARCH.
>>>>>
>>>>> USB mouse/keyboard was unplugged before booting the machine.
>>>>>
>>>>> I didn't do a suspend/resume before getting this trace, should I do that?
>>>>>
>>>>> Should I reproduce the hang and get a netconsole dmesg capture with
>>>>> tracing enabled?
>>>>
>>>>
>>>> A trace and a dmesg of the same suspend/resume hang would be great.
>>>
>>> I can capture the dmesg with netconsole once the machine hangs, but
>>> I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
>>> the hang. I'm unable to use ssh after the hang.
>>>
>>>> And if you can then one of a successful suspend/resume for reference.
>>>
>>> Here's the trace after a successful suspend/resume:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=255369
>>>
>>>>
>>>> (I haven't yet checked the one you added to bugzilla)
>>>>
>>>> -Mathias
>>>
>>> Diego
>>
>> ftrace_dump_on_oops is what I was looking for.
>>
>> Diego
>
> I tried ftrace_dump_on_oops but I can't see the trace coming in, not
> sure what I'm doing wrong. :(
>
> Diego

I was able to obtain the trace with this: hung_task_panic=1
no_console_suspend=1 ftrace_dump_on_oops
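
For reference, the capture setup described in this thread can be sketched
roughly as follows. The parameter string is copied from the line above and
the two tracing commands are the ones quoted from Mathias; the variable
name, the function name and its optional debugfs argument are assumptions
made only for illustration.

```shell
# Kernel command-line parameters used to get the trace out at hang time
# (copied verbatim from the message above; append them to the boot entry).
CAPTURE_PARAMS="hung_task_panic=1 no_console_suspend=1 ftrace_dump_on_oops"

# Hypothetical helper: enable the xhci-hcd trace events before suspending.
# $1 is the debugfs mount point (assumed default: /sys/kernel/debug).
enable_xhci_tracing() {
    debugfs=${1:-/sys/kernel/debug}
    # Mount debugfs only if the tracing directory is not there yet (needs root).
    [ -d "$debugfs/tracing" ] || mount -t debugfs none "$debugfs" || return 1
    echo xhci-hcd >> "$debugfs/tracing/set_event"
}
```

Run enable_xhci_tracing as root before suspending; after a normal resume
the events can be read from /sys/kernel/debug/tracing/trace.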

Diego

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-21 Thread Diego Viola

On Mon, Mar 20, 2017 at 8:15 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
>> <mathias.ny...@linux.intel.com> wrote:
>>> On 20.03.2017 17:39, Diego Viola wrote:
>>>>
>>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>>> <mathias.ny...@linux.intel.com> wrote:
>>>>>
>>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>>
>>>>>>
>>>
>>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>>
>>>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>>>> and
>>>>> giveback.
>>>>>
>>>>> Could you try enabling xhci tracing before suspending (not the same as
>>>>> xhci
>>>>> verbose dynamic debug)
>>>>> It will generate a lot of data, so better to remove all extra USB
>>>>> devices.
>>>>>
>>>>> xhci tracing can be added with:
>>>>>
>>>>> mount -t debugfs none /sys/kernel/debug
>>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>>
>>>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>>>
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>>>
>>>> This is with Linux 4.11.0-rc3-ARCH.
>>>>
>>>> USB mouse/keyboard was unplugged before booting the machine.
>>>>
>>>> I didn't do a suspend/resume before getting this trace, should I do that?
>>>>
>>>> Should I reproduce the hang and get a netconsole dmesg capture with
>>>> tracing enabled?
>>>
>>>
>>> A trace and a dmesg of the same suspend/resume hang would be great.
>>
>> I can capture the dmesg with netconsole once the machine hangs, but
>> I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
>> the hang. I'm unable to use ssh after the hang.
>>
>>> And if you can, then one of a successful suspend/resume for reference.
>>
>> Here's the trace after a successful suspend/resume:
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=255369
>>
>>>
>>> (I haven't yet checked the one you added to bugzilla)
>>>
>>> -Mathias
>>
>> Diego
>
> ftrace_dump_on_oops is what I was looking for.
>
> Diego

I tried ftrace_dump_on_oops but I can't see the trace coming in, not
sure what I'm doing wrong. :(

Diego

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-20 Thread Diego Viola

On Mon, Mar 20, 2017 at 3:27 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
> <mathias.ny...@linux.intel.com> wrote:
>> On 20.03.2017 17:39, Diego Viola wrote:
>>>
>>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>>> <mathias.ny...@linux.intel.com> wrote:
>>>>
>>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>>
>>>>>
>>
>>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>>
>>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>>> and
>>>> giveback.
>>>>
>>>> Could you try enabling xhci tracing before suspending (not the same as
>>>> xhci
>>>> verbose dynamic debug)
>>>> It will generate a lot of data, so better to remove all extra USB
>>>> devices.
>>>>
>>>> xhci tracing can be added with:
>>>>
>>>> mount -t debugfs none /sys/kernel/debug
>>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>>
>>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>>
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>>
>>> This is with Linux 4.11.0-rc3-ARCH.
>>>
>>> USB mouse/keyboard was unplugged before booting the machine.
>>>
>>> I didn't do a suspend/resume before getting this trace, should I do that?
>>>
>>> Should I reproduce the hang and get a netconsole dmesg capture with
>>> tracing enabled?
>>
>>
>> A trace and a dmesg of the same suspend/resume hang would be great.
>
> I can capture the dmesg with netconsole once the machine hangs, but
> I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
> the hang. I'm unable to use ssh after the hang.
>
>> And if you can, then one of a successful suspend/resume for reference.
>
> Here's the trace after a successful suspend/resume:
>
> https://bugzilla.kernel.org/attachment.cgi?id=255369
>
>>
>> (I haven't yet checked the one you added to bugzilla)
>>
>> -Mathias
>
> Diego

ftrace_dump_on_oops is what I was looking for.

Diego

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-20 Thread Diego Viola

On Mon, Mar 20, 2017 at 1:32 PM, Mathias Nyman
<mathias.ny...@linux.intel.com> wrote:
> On 20.03.2017 17:39, Diego Viola wrote:
>>
>> On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
>> <mathias.ny...@linux.intel.com> wrote:
>>>
>>> On 19.03.2017 23:29, Diego Viola wrote:
>>>>
>>>>
>
>>>>> Still a problem with 4.11.0-rc2-ARCH+
>>>
>>> 4.11-rc2 has better xhci tracing, it shows each URB enqueue and dequeue
>>> and
>>> giveback.
>>>
>>> Could you try enabling xhci tracing before suspending (not the same as
>>> xhci
>>> verbose dynamic debug)
>>> It will generate a lot of data, so better to remove all extra USB
>>> devices.
>>>
>>> xhci tracing can be added with:
>>>
>>> mount -t debugfs none /sys/kernel/debug
>>> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>>>
>>> and then send the output of cat /sys/kernel/debug/tracing/trace
>>
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=255367
>>
>> This is with Linux 4.11.0-rc3-ARCH.
>>
>> USB mouse/keyboard was unplugged before booting the machine.
>>
>> I didn't do a suspend/resume before getting this trace, should I do that?
>>
>> Should I reproduce the hang and get a netconsole dmesg capture with
>> tracing enabled?
>
>
> A trace and a dmesg of the same suspend/resume hang would be great.

I can capture the dmesg with netconsole once the machine hangs, but
I'm not sure how I could capture /sys/kernel/debug/tracing/trace after
the hang. I'm unable to use ssh after the hang.

> And if you can, then one of a successful suspend/resume for reference.

Here's the trace after a successful suspend/resume:

https://bugzilla.kernel.org/attachment.cgi?id=255369

>
> (I haven't yet checked the one you added to bugzilla)
>
> -Mathias

Diego
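
The tracing steps quoted above can be wrapped in two small shell functions, one to enable the xhci-hcd events before suspending and one to save the trace buffer afterwards. This is a sketch only: the function names, the directory argument and the default output file name are hypothetical, and only the debugfs paths and the xhci-hcd event name come from the thread.

```shell
#!/bin/sh
# Enable the xhci-hcd trace events.
# $1: tracing directory (hypothetical parameter; defaults to the debugfs
#     path quoted in the thread). Needs root on a real system.
enable_xhci_trace() {
    tracing_dir=${1:-/sys/kernel/debug/tracing}
    # Mount debugfs only when the tracing directory is not there yet.
    if [ ! -d "$tracing_dir" ]; then
        mount -t debugfs none /sys/kernel/debug || return 1
    fi
    echo xhci-hcd >> "$tracing_dir/set_event"
}

# Copy the current trace buffer to a file so it can be attached to a bug.
# $1: tracing directory, $2: output file (both hypothetical parameters).
save_xhci_trace() {
    tracing_dir=${1:-/sys/kernel/debug/tracing}
    out_file=${2:-xhci-trace.txt}
    cat "$tracing_dir/trace" > "$out_file"
}
```

This only helps for a suspend/resume cycle that succeeds. After a hang the trace file cannot be read any more, which is the problem raised in the message above.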

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-20 Thread Diego Viola

On Mon, Mar 20, 2017 at 11:21 AM, Mathias Nyman
<mathias.ny...@linux.intel.com> wrote:
> On 19.03.2017 23:29, Diego Viola wrote:
>>
>> On Fri, Mar 17, 2017 at 5:18 PM, Diego Viola <diego.vi...@gmail.com>
>> wrote:
>>>
>>> On Fri, Mar 17, 2017 at 1:57 PM, Diego Viola <diego.vi...@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Mar 17, 2017 at 1:24 PM, Alan Stern <st...@rowland.harvard.edu>
>>>> wrote:
>>>>>
>>>>> On Fri, 17 Mar 2017, Diego Viola wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Here's the log to the netconsole dmesg capture, I've used
>>>>>> xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
>>>>>>
>>>>>> I did the usual suspend/resume cycle with i3lock, it hung after the
>>>>>> third attempt when trying to resume from suspend.
>>>>>>
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=255309
>>>>>
>>>>>
>>>>> I'm not an expert on xHCI.  This should be CC'ed to the xhci-hcd
>>>>> maintainer.
>>>>>
>>>>> Alan Stern
>>>>>
>>>>>>
>>>>>> Please let me know if I should provide something else.
>>>>>>
>>>>>> Thanks,
>>>>>> Diego
>>>>>>
>>>>>
>>>>
>>>> I've forwarded my email to Mathias Nyman.
>>>>
>>>> Diego
>>>
>>>
>>> Still a problem with 4.11.0-rc2-ARCH+
>>>
>
> From a quick glance it looks like rtsx_usb_ms probably takes a mutex
> (ucr->dev_mutex), then issues a usb_bulk_msg() and waits for it to
> complete with the mutex held.
> The usb message times out, usb core kills the urb but the URB probably
> never gets completed, and the function never returns.
>
> Everything using ucr->dev_mutex would block, for example the kthread
> rtsx_usb_detect_ms_card, which continuously tries to detect a ms card and
> takes and releases the same ucr->dev_mutex for each try.
>
> [  614.026502] INFO: task kworker/u8:0:5 blocked for more than 120 seconds.
> [  614.027865]   Not tainted 4.10.3-1-ARCH #1
> [  614.029116] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  614.030467] kworker/u8:0D0 5  2 0x
> [  614.031812] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> [  614.033179] Call Trace:
> [  614.034550]  __schedule+0x22f/0x700
> [  614.035940]  schedule+0x3d/0x90
> [  614.037334]  schedule_preempt_disabled+0x15/0x20
> [  614.038680]  __mutex_lock_slowpath+0x19b/0x2d0
> [  614.040067]  ? flush_workqueue+0x204/0x580
> [  614.041456]  mutex_lock+0x23/0x30
> [  614.042163]  acpi_device_hotplug+0x43/0x3e7
> [  614.042882]  acpi_hotplug_work_fn+0x1e/0x29
> [  614.043612]  process_one_work+0x1e5/0x470
> [  614.044356]  worker_thread+0x48/0x4e0
> [  614.045077]  kthread+0x101/0x140
> [  614.045788]  ? process_one_work+0x470/0x470
> [  614.046495]  ? kthread_create_on_node+0x60/0x60
> [  614.047215]  ret_from_fork+0x2c/0x40
> [  614.047950] INFO: task rtsx_usb_ms_1:235 blocked for more than 120 seconds.
> [  614.048697]   Not tainted 4.10.3-1-ARCH #1
> [  614.049465] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  614.050265] rtsx_usb_ms_1   D0   235  2 0x
> [  614.051064] Call Trace:
> [  614.051841]  __schedule+0x22f/0x700
> [  614.052626]  schedule+0x3d/0x90
> [  614.053411]  usb_kill_urb.part.4+0x6c/0xa0 [usbcore]
> [  614.054198]  ? wake_atomic_t_function+0x60/0x60
> [  614.055005]  usb_kill_urb+0x21/0x30 [usbcore]
> [  614.055819]  usb_start_wait_urb+0xe5/0x170 [usbcore]
> [  614.056652]  usb_bulk_msg+0xbd/0x160 [usbcore]
> [  614.057489]  rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]
> [  614.058306]  rtsx_usb_read_register+0x6c/0xc0 [rtsx_usb]
> [  614.059118]  rtsx_usb_detect_ms_card+0x98/0x120 [rtsx_usb_ms]
>
> There is a lot going on in xhci during the last suspend before this.
> URBs are canceled, devices reset and re-enumerated, timeout while reading
> descriptor, device firmware changed.
>
> It's possible we end up in a situation where xhci never gives back the URB.
>
> 4.11-rc2 has better xhci tracing, it shows each URB enqueue, dequeue and
> giveback.
>
> Could you try enabling xhci tracing before suspending (not the same as xhci
> verbose dynamic debug)?
> It will generate a lot of data, so better to remove all extra USB devices.
>
> xhci tracing can be added with:
>
> mount -t debugfs none /sys/kernel/debug
> echo xhci-hcd >> /sys/kernel/debug/tracing/set_event
>
> and then send the output of cat /sys/kernel/debug/tracing/trace
>
> -Mathias
>
>

https://bugzilla.kernel.org/attachment.cgi?id=255367

This is with Linux 4.11.0-rc3-ARCH.

USB mouse/keyboard was unplugged before booting the machine.

I didn't do a suspend/resume before getting this trace, should I do that?

Should I reproduce the hang and get a netconsole dmesg capture with
tracing enabled?

Thanks,
Diego
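
When going through a saved netconsole capture, it can help to list which tasks the hung task detector complained about before reading the individual call traces. A minimal sketch, assuming the log was saved to a file with each message on one line; the function name hung_tasks is hypothetical:

```shell
#!/bin/sh
# Print the name of every task reported as blocked by the hung task
# detector. $1: log file (hypothetical; e.g. a saved netconsole capture).
hung_tasks() {
    # Keep only the task name, dropping the trailing ":PID".
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than.*/\1/p' "$1"
}
```

Run against the log quoted above, this would list kworker/u8:0 and rtsx_usb_ms_1, the two tasks whose traces Mathias analysed.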

Re: Dell Inspiron 5558/0VNM2T hangs at resume from suspend when USB 3 is enabled

2017-03-19 Thread Diego Viola

On Fri, Mar 17, 2017 at 5:18 PM, Diego Viola <diego.vi...@gmail.com> wrote:
> On Fri, Mar 17, 2017 at 1:57 PM, Diego Viola <diego.vi...@gmail.com> wrote:
>> On Fri, Mar 17, 2017 at 1:24 PM, Alan Stern <st...@rowland.harvard.edu> 
>> wrote:
>>> On Fri, 17 Mar 2017, Diego Viola wrote:
>>>
>>>> Hi,
>>>>
>>>> Here's the log to the netconsole dmesg capture, I've used
>>>> xhci_hcd.dyndbg no_console_suspend=1 as the kernel parameters.
>>>>
>>>> I did the usual suspend/resume cycle with i3lock, it hung after the
>>>> third attempt when trying to resume from suspend.
>>>>
>>>> https://bugzilla.kernel.org/attachment.cgi?id=255309
>>>
>>> I'm not an expert on xHCI.  This should be CC'ed to the xhci-hcd
>>> maintainer.
>>>
>>> Alan Stern
>>>
>>>>
>>>> Please let me know if I should provide something else.
>>>>
>>>> Thanks,
>>>> Diego
>>>>
>>>
>>
>> I've forwarded my email to Mathias Nyman.
>>
>> Diego
>
> Still a problem with 4.11.0-rc2-ARCH+
>
> commit d528ae0d3dfedea553812c957a6ed1e87feeed8a

I had a conversation with oiaohm over IRC about this. Here are some
interesting things he said about the issue:

2017-03-18 18:08:02 oiaohm  That driver that was going dead
because it was physical port less was usb stack.   So maybe it
that bit of hardware still doing stupid.

2017-03-18 18:21:44 oiaohm  I guess this current log of yours is
with the realtek memstick black listed.
2017-03-18 18:21:55 oiaohm  because it does not exist.
2017-03-18 18:22:09 oiaohm  physically.

2017-03-18 18:23:04 oiaohm  Maybe.   If the hardware is not inited
the usb stack might not try to suspend it.

2017-03-18 18:26:30 oiaohm  No matter how you look at it the thing
is broken hardware.   I don't know if that realtek is USB 3.0
2017-03-18 18:27:02 oiaohm  Or it sitting on a USB 3.0 hub inside
the machine.

2017-03-18 18:27:39 oiaohm  You cannot expect a driver to work
when the hardware is portless.
2017-03-18 18:27:51 oiaohm  and it should have a port.
2017-03-18 18:27:52 oiaohm  either.

2017-03-18 18:29:03 oiaohm  rtsx_usb_ms this is a memstick driver
there should be a memstick port on your system or a header for a
memstick port both mean the pull down and pull up circuits are present
so the hardware cannot function right.
2017-03-18 18:29:38 oiaohm  You gone over the machine and there is
no memstick port exposed being a laptop the odds of internal header is
basically never happens.
2017-03-18 18:30:27 oiaohm  so it broken hardware.
2017-03-18 18:31:18 oiaohm  the correct answer with broken
hardware is don't init the part blacklist the driver.

2017-03-18 18:40:36 oiaohm  You can think of it this way the
hardware gets lost because it cannot tell if something is connected so
is sending messages and waiting for responses that will never come.
But when hardware is there due to different speeds of cards it has no
clear clue what the time frame is.
2017-03-18 18:41:27 oiaohm  So the hardware being lost and kinda
jammed is purely to be expected if it does not have all its required
circuits to function.

2017-03-18 18:48:42 oiaohm  You have the realtek controller for a
memstick port and it cannot tell if the proper hardware is present or
not that is what is triggering the driver to load.
2017-03-18 18:49:18 oiaohm  There is a difference when you have
the USB 3.0 controller active.
2017-03-18 18:49:49 oiaohm  You will see a lot of windows users
noting they need to disable the USB 3.0 controller to hibernate.
2017-03-18 18:50:19 oiaohm  In usb 2.0 the operating system polls
the USB ports and does a lot of the messaging.   In USB 3.0 controller
it does that polling.
2017-03-18 18:50:40 oiaohm  USB 3.0 controllers normally presume
all the hardware that is inited is functional.
2017-03-18 18:51:07 oiaohm  Linux kernel doing USB 2.0 polling
itself presumes the hardware could be busted.
2017-03-18 18:56:37 oiaohm  USB 3.0 controller is interrupt driven
to the OS so it does a lot of heavy lifting of USB by itself. USB
2.0 and before controllers are like win modems basically brainless and
depending on the OS to do everything.

2017-03-18 18:58:27 oiaohm  So usb 2.0 controller not showing the
issue and the usb 3.0 showing the issue is kind of expected.   If you
did not init the hardware and usb 3.0 controller still showed a issue
then there would be a problem.
2017-03-18 19:02:59 oiaohm  dviola I guess the only thing you were
missing is that the USB 3.0 controller had proper controller so can
think for itself and USB 2.0 and before is like a brainless winmodem
so the OS can work around a few USB hardware issues in USB 2.0
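
The "blacklist the driver" suggestion above is usually done with a modprobe configuration file. A minimal sketch, assuming the conventional /etc/modprobe.d location; the function name and the file naming scheme are hypothetical:

```shell
#!/bin/sh
# Write a modprobe configuration file that blacklists a module.
# $1: module name, $2: config directory (hypothetical parameter; the
#     conventional location is /etc/modprobe.d and needs root).
blacklist_module() {
    module=$1
    conf_dir=${2:-/etc/modprobe.d}
    printf 'blacklist %s\n' "$module" > "$conf_dir/blacklist-$module.conf"
}
```

For this thread the call would be blacklist_module rtsx_usb_ms. Note that a blacklist entry only stops the module from being loaded automatically through its aliases; it can still be loaded by name or pulled in as a dependency.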
