Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread Hans Petter Selasky

On 2020-05-27 23:38, John Baldwin wrote:
> No.  I get that constantly on a desktop that never suspends/resumes.
> It only started after upgrading to 12.0.


If you have time, could you investigate why the USB host controller's
root hub PCI register flips to -1U?  That causes these spurious events
...  Maybe some kind of PCI power-saving feature is not timed
correctly ...


--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread John Baldwin
On 5/27/20 2:05 PM, Hans Petter Selasky wrote:
> On 2020-05-27 15:41, Justin Hibbits wrote:
>> On Wed, 27 May 2020 06:27:16 -0700
>> John Baldwin  wrote:
>>
>>> On 5/27/20 2:39 AM, Andriy Gapon wrote:
>>>> On 27/05/2020 11:13, Andriy Gapon wrote:
>>>>> I added more diagnostics and it seems to support the idea that the
>>>>> problem is related to I/O cycles and bridges.
>>>>>
>>>>> ACPI timer suddenly starts returning 0x and that lasts for
>>>>> tens of microseconds before the timer goes back to returning
>>>>> normal values with an expected increase.
>>>>> AMD provides a proprietary way to access ACPI registers via MMIO
>>>>> (0xfed808xx). That mechanism is unaffected, ACPI timer register
>>>>> always returns good values.
>>>>>
>>>>> The problem seems to happen when restoring configuration of a
>>>>> particular PCI bridge.  What's interesting is that the bridge
>>>>> decodes one memory range and one I/O range.
>>>>>
>>>>> Looking at pci_cfg_restore() I wonder if it is wise to restore
>>>>> PCIR_COMMAND so early.  Could it be that after the resume the
>>>>> bridge is configured with a wrong I/O range (e.g., too wide) and
>>>>> by writing PCIR_COMMAND we enable that decoding. So, the bridge
>>>>> steals I/O cycles destined for ACPI support hardware.  If there is
>>>>> nothing behind the bridge to handle those ports, then we get those
>>>>> bad readings. Once the bridge configuration is fully restored, the
>>>>> I/O handling goes back to normal.
>>>>
>>>> From what I see, this looks like a BIOS bug.
>>>> Upon resume, it swaps window configurations of pcib1 and pcib2
>>>> (until FreeBSD restores them).  pcib1 originally does not have an
>>>> I/O window.  So, BIOS programs both base and limit of pcib2 I/O
>>>> window to zero.   When FreeBSD writes its command register to
>>>> enable I/O decoding it starts claiming 0x0 - 0xFFF I/O port range.
>>>> That covers the ACPI ports at 0x8xx.
>>>>
>>>> Some printf-s.
>>>> From (verbose) boot time:
>>>> pcib1:   domain 0
>>>> pcib1:   secondary bus 1
>>>> pcib1:   subordinate bus   1
>>>> pcib1:   memory decode 0xfea0-0xfeaf
>>>> pcib2:   domain 0
>>>> pcib2:   secondary bus 2
>>>> pcib2:   subordinate bus   2
>>>> pcib2:   I/O decode 0xf000-0x
>>>> pcib2:   memory decode 0xfe90-0xfe9f
>>>>
>>>> My printf-s from resume time:
>>>> pcib1: old I/O base (low): 0xf1
>>>> pcib1: old I/O base (high): 0x0
>>>> pcib1: old I/O limit (low): 0x1
>>>> pcib1: old I/O limit (high): 0x0
>>>> pcib2: old I/O base (low): 0x1
>>>> pcib2: old I/O base (high): 0x0
>>>> pcib2: old I/O limit (low): 0x1
>>>> pcib2: old I/O limit (high): 0x0
>>>
>>> The "solution" I think is to have resume be multi-pass and to resume
>>> all the bridges first before trying to resume leaf devices (including
>>> timers), but that's a fair bit of work.  It might be that we just
>>> need to resume timer interrupts later after the new-bus resume (I
>>> think we currently do it before?), though the reason for that was to
>>> allow resume methods in devices to sleep (I'm not sure if any do).
>>>
>>
>> That sounds like a good fit for https://reviews.freebsd.org/D203 .
>> Someone (TM) just needs to take it over the finish line... 6 years
>> later.
> 
> Is this perhaps related to:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666

No.  I get that constantly on a desktop that never suspends/resumes.
It only started after upgrading to 12.0.

-- 
John Baldwin


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread Hans Petter Selasky

On 2020-05-27 15:41, Justin Hibbits wrote:
> On Wed, 27 May 2020 06:27:16 -0700
> John Baldwin wrote:
>
>> On 5/27/20 2:39 AM, Andriy Gapon wrote:
>>> On 27/05/2020 11:13, Andriy Gapon wrote:
>>>> I added more diagnostics and it seems to support the idea that the
>>>> problem is related to I/O cycles and bridges.
>>>>
>>>> ACPI timer suddenly starts returning 0x and that lasts for
>>>> tens of microseconds before the timer goes back to returning
>>>> normal values with an expected increase.
>>>> AMD provides a proprietary way to access ACPI registers via MMIO
>>>> (0xfed808xx). That mechanism is unaffected, ACPI timer register
>>>> always returns good values.
>>>>
>>>> The problem seems to happen when restoring configuration of a
>>>> particular PCI bridge.  What's interesting is that the bridge
>>>> decodes one memory range and one I/O range.
>>>>
>>>> Looking at pci_cfg_restore() I wonder if it is wise to restore
>>>> PCIR_COMMAND so early.  Could it be that after the resume the
>>>> bridge is configured with a wrong I/O range (e.g., too wide) and
>>>> by writing PCIR_COMMAND we enable that decoding. So, the bridge
>>>> steals I/O cycles destined for ACPI support hardware.  If there is
>>>> nothing behind the bridge to handle those ports, then we get those
>>>> bad readings. Once the bridge configuration is fully restored, the
>>>> I/O handling goes back to normal.
>>>
>>> From what I see, this looks like a BIOS bug.
>>> Upon resume, it swaps window configurations of pcib1 and pcib2
>>> (until FreeBSD restores them).  pcib1 originally does not have an
>>> I/O window.  So, BIOS programs both base and limit of pcib2 I/O
>>> window to zero.   When FreeBSD writes its command register to
>>> enable I/O decoding it starts claiming 0x0 - 0xFFF I/O port range.
>>> That covers the ACPI ports at 0x8xx.
>>>
>>> Some printf-s.
>>> From (verbose) boot time:
>>> pcib1:   domain 0
>>> pcib1:   secondary bus 1
>>> pcib1:   subordinate bus   1
>>> pcib1:   memory decode 0xfea0-0xfeaf
>>> pcib2:   domain 0
>>> pcib2:   secondary bus 2
>>> pcib2:   subordinate bus   2
>>> pcib2:   I/O decode 0xf000-0x
>>> pcib2:   memory decode 0xfe90-0xfe9f
>>>
>>> My printf-s from resume time:
>>> pcib1: old I/O base (low): 0xf1
>>> pcib1: old I/O base (high): 0x0
>>> pcib1: old I/O limit (low): 0x1
>>> pcib1: old I/O limit (high): 0x0
>>> pcib2: old I/O base (low): 0x1
>>> pcib2: old I/O base (high): 0x0
>>> pcib2: old I/O limit (low): 0x1
>>> pcib2: old I/O limit (high): 0x0
>>
>> The "solution" I think is to have resume be multi-pass and to resume
>> all the bridges first before trying to resume leaf devices (including
>> timers), but that's a fair bit of work.  It might be that we just
>> need to resume timer interrupts later after the new-bus resume (I
>> think we currently do it before?), though the reason for that was to
>> allow resume methods in devices to sleep (I'm not sure if any do).
>
> That sounds like a good fit for https://reviews.freebsd.org/D203 .
> Someone (TM) just needs to take it over the finish line... 6 years
> later.

Is this perhaps related to:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666

--HPS


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread Justin Hibbits
On Wed, 27 May 2020 06:27:16 -0700
John Baldwin  wrote:

> On 5/27/20 2:39 AM, Andriy Gapon wrote:
> > On 27/05/2020 11:13, Andriy Gapon wrote:  
> >> I added more diagnostics and it seems to support the idea that the
> >> problem is related to I/O cycles and bridges.
> >>
> >> ACPI timer suddenly starts returning 0x and that lasts for
> >> tens of microseconds before the timer goes back to returning
> >> normal values with an expected increase.
> >> AMD provides a proprietary way to access ACPI registers via MMIO
> >> (0xfed808xx). That mechanism is unaffected, ACPI timer register
> >> always returns good values.
> >>
> >> The problem seems to happen when restoring configuration of a
> >> particular PCI bridge.  What's interesting is that the bridge
> >> decodes one memory range and one I/O range.
> >>
> >> Looking at pci_cfg_restore() I wonder if it is wise to restore
> >> PCIR_COMMAND so early.  Could it be that after the resume the
> >> bridge is configured with a wrong I/O range (e.g., too wide) and
> >> by writing PCIR_COMMAND we enable that decoding. So, the bridge
> >> steals I/O cycles destined for ACPI support hardware.  If there is
> >> nothing behind the bridge to handle those ports, then we get those
> >> bad readings. Once the bridge configuration is fully restored, the
> >> I/O handling goes back to normal.  
> > 
> > From what I see, this looks like a BIOS bug.
> > Upon resume, it swaps window configurations of pcib1 and pcib2
> > (until FreeBSD restores them).  pcib1 originally does not have an
> > I/O window.  So, BIOS programs both base and limit of pcib2 I/O
> > window to zero.   When FreeBSD writes its command register to
> > enable I/O decoding it starts claiming 0x0 - 0xFFF I/O port range.
> > That covers the ACPI ports at 0x8xx.
> > 
> > Some printf-s.
> > From (verbose) boot time:
> > pcib1:   domain 0
> > pcib1:   secondary bus 1
> > pcib1:   subordinate bus   1
> > pcib1:   memory decode 0xfea0-0xfeaf
> > pcib2:   domain 0
> > pcib2:   secondary bus 2
> > pcib2:   subordinate bus   2
> > pcib2:   I/O decode 0xf000-0x
> > pcib2:   memory decode 0xfe90-0xfe9f
> > 
> > My printf-s from resume time:
> > pcib1: old I/O base (low): 0xf1
> > pcib1: old I/O base (high): 0x0
> > pcib1: old I/O limit (low): 0x1
> > pcib1: old I/O limit (high): 0x0
> > pcib2: old I/O base (low): 0x1
> > pcib2: old I/O base (high): 0x0
> > pcib2: old I/O limit (low): 0x1
> > pcib2: old I/O limit (high): 0x0  
> 
> The "solution" I think is to have resume be multi-pass and to resume
> all the bridges first before trying to resume leaf devices (including
> timers), but that's a fair bit of work.  It might be that we just
> need to resume timer interrupts later after the new-bus resume (I
> think we currently do it before?), though the reason for that was to
> allow resume methods in devices to sleep (I'm not sure if any do).
> 

That sounds like a good fit for https://reviews.freebsd.org/D203 .
Someone (TM) just needs to take it over the finish line... 6 years
later.

- Justin


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread Andriy Gapon
On 27/05/2020 16:27, John Baldwin wrote:
> The "solution" I think is to have resume be multi-pass and to resume all the
> bridges first before trying to resume leaf devices (including timers), but
> that's a fair bit of work.  It might be that we just need to resume timer
> interrupts later after the new-bus resume (I think we currently do it
> before?), though the reason for that was to allow resume methods in devices
> to sleep (I'm not sure if any do).

But it's not only about timers.
{sbin,bin,micro,etc}uptime() calls can return garbage as well and confuse their
callers.

-- 
Andriy Gapon


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread John Baldwin
On 5/27/20 2:39 AM, Andriy Gapon wrote:
> On 27/05/2020 11:13, Andriy Gapon wrote:
>> I added more diagnostics and it seems to support the idea that the problem is
>> related to I/O cycles and bridges.
>>
>> ACPI timer suddenly starts returning 0x and that lasts for tens of
>> microseconds before the timer goes back to returning normal values with an
>> expected increase.
>> AMD provides a proprietary way to access ACPI registers via MMIO
>> (0xfed808xx).  That mechanism is unaffected, ACPI timer register always
>> returns good values.
>>
>> The problem seems to happen when restoring configuration of a particular PCI
>> bridge.  What's interesting is that the bridge decodes one memory range and
>> one I/O range.
>>
>> Looking at pci_cfg_restore() I wonder if it is wise to restore PCIR_COMMAND
>> so early.  Could it be that after the resume the bridge is configured with a
>> wrong I/O range (e.g., too wide) and by writing PCIR_COMMAND we enable that
>> decoding.  So, the bridge steals I/O cycles destined for ACPI support
>> hardware.  If there is nothing behind the bridge to handle those ports, then
>> we get those bad readings.  Once the bridge configuration is fully restored,
>> the I/O handling goes back to normal.
> 
> From what I see, this looks like a BIOS bug.
> Upon resume, it swaps window configurations of pcib1 and pcib2 (until FreeBSD
> restores them).  pcib1 originally does not have an I/O window.  So, BIOS
> programs both base and limit of pcib2 I/O window to zero.   When FreeBSD
> writes its command register to enable I/O decoding it starts claiming
> 0x0 - 0xFFF I/O port range.  That covers the ACPI ports at 0x8xx.
> 
> Some printf-s.
> From (verbose) boot time:
> pcib1:   domain 0
> pcib1:   secondary bus 1
> pcib1:   subordinate bus   1
> pcib1:   memory decode 0xfea0-0xfeaf
> pcib2:   domain 0
> pcib2:   secondary bus 2
> pcib2:   subordinate bus   2
> pcib2:   I/O decode 0xf000-0x
> pcib2:   memory decode 0xfe90-0xfe9f
> 
> My printf-s from resume time:
> pcib1: old I/O base (low): 0xf1
> pcib1: old I/O base (high): 0x0
> pcib1: old I/O limit (low): 0x1
> pcib1: old I/O limit (high): 0x0
> pcib2: old I/O base (low): 0x1
> pcib2: old I/O base (high): 0x0
> pcib2: old I/O limit (low): 0x1
> pcib2: old I/O limit (high): 0x0

The "solution" I think is to have resume be multi-pass and to resume all the
bridges first before trying to resume leaf devices (including timers), but
that's a fair bit of work.  It might be that we just need to resume timer
interrupts later after the new-bus resume (I think we currently do it
before?), though the reason for that was to allow resume methods in devices
to sleep (I'm not sure if any do).
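
A toy sketch of the two-pass idea above, in Python rather than kernel C.
The Device class, its "kind" tag, and resume_two_pass() are hypothetical
names for illustration only; this is not the new-bus API, and a real
implementation would have to handle much more (interrupts, power states,
ordering within a pass).

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    kind: str                      # "bridge" or "leaf" (hypothetical tags)
    children: list = field(default_factory=list)


def walk(dev):
    """Yield dev and all its descendants, parents before children."""
    yield dev
    for child in dev.children:
        yield from walk(child)


def resume_two_pass(root):
    """Return the resume order: every bridge first, then every leaf."""
    order = [d.name for d in walk(root) if d.kind == "bridge"]
    order += [d.name for d in walk(root) if d.kind == "leaf"]
    return order


# Example tree: device names other than pcib1/pcib2 are made up.
tree = Device("pci0", "bridge", [
    Device("pcib1", "bridge", [Device("nic0", "leaf")]),
    Device("timer0", "leaf"),
    Device("pcib2", "bridge", [Device("disk0", "leaf")]),
])
```

A single depth-first pass over this tree would visit timer0 before pcib2;
in the two-pass order every bridge is resumed before any leaf device.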

-- 
John Baldwin


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread Andriy Gapon
On 27/05/2020 11:13, Andriy Gapon wrote:
> I added more diagnostics and it seems to support the idea that the problem is
> related to I/O cycles and bridges.
> 
> ACPI timer suddenly starts returning 0x and that lasts for tens of
> microseconds before the timer goes back to returning normal values with an
> expected increase.
> AMD provides a proprietary way to access ACPI registers via MMIO (0xfed808xx).
> That mechanism is unaffected, ACPI timer register always returns good values.
> 
> The problem seems to happen when restoring configuration of a particular PCI
> bridge.  What's interesting is that the bridge decodes one memory range and
> one I/O range.
> 
> Looking at pci_cfg_restore() I wonder if it is wise to restore PCIR_COMMAND so
> early.  Could it be that after the resume the bridge is configured with a
> wrong I/O range (e.g., too wide) and by writing PCIR_COMMAND we enable that
> decoding.  So, the bridge steals I/O cycles destined for ACPI support
> hardware.  If there is nothing behind the bridge to handle those ports, then
> we get those bad readings.  Once the bridge configuration is fully restored,
> the I/O handling goes back to normal.

From what I see, this looks like a BIOS bug.
Upon resume, it swaps window configurations of pcib1 and pcib2 (until FreeBSD
restores them).  pcib1 originally does not have an I/O window.  So, BIOS
programs both base and limit of pcib2 I/O window to zero.   When FreeBSD writes
its command register to enable I/O decoding it starts claiming 0x0 - 0xFFF I/O
port range.  That covers the ACPI ports at 0x8xx.

Some printf-s.
From (verbose) boot time:
pcib1:   domain 0
pcib1:   secondary bus 1
pcib1:   subordinate bus   1
pcib1:   memory decode 0xfea0-0xfeaf
pcib2:   domain 0
pcib2:   secondary bus 2
pcib2:   subordinate bus   2
pcib2:   I/O decode 0xf000-0x
pcib2:   memory decode 0xfe90-0xfe9f

My printf-s from resume time:
pcib1: old I/O base (low): 0xf1
pcib1: old I/O base (high): 0x0
pcib1: old I/O limit (low): 0x1
pcib1: old I/O limit (high): 0x0
pcib2: old I/O base (low): 0x1
pcib2: old I/O base (high): 0x0
pcib2: old I/O limit (low): 0x1
pcib2: old I/O limit (high): 0x0
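
The register values above can be decoded by hand.  A minimal sketch in
Python, assuming the standard PCI-to-PCI bridge layout: bits 7:4 of the
8-bit I/O base and limit registers hold address bits 15:12, a low nibble
of 1 means 32-bit I/O addressing is supported (the upper 16-bit registers
then supply address bits 31:16), and the low 12 bits of the limit are
implied to be 0xFFF.  The function names and the port number 0x808 are
assumptions for illustration; the text above only says that the ACPI
ports are at 0x8xx.

```python
def decode_io_window(base_lo, base_hi, limit_lo, limit_hi):
    """Decode a bridge I/O window; return (base, limit) or None if closed."""
    base = (base_lo & 0xF0) << 8
    limit = ((limit_lo & 0xF0) << 8) | 0xFFF
    if (base_lo & 0x0F) == 0x01:       # 32-bit I/O addressing supported
        base |= base_hi << 16
        limit |= limit_hi << 16
    if base > limit:                   # base above limit means no window
        return None
    return base, limit


def claims_port(window, port):
    """True if an open window covers the given I/O port."""
    return window is not None and window[0] <= port <= window[1]


# Values taken from the resume-time printf-s above.
pcib1 = decode_io_window(0xF1, 0x0, 0x01, 0x0)
pcib2 = decode_io_window(0x01, 0x0, 0x01, 0x0)
```

Under these assumptions pcib2 decodes to 0x0000-0x0FFF, which covers
0x808, while pcib1 has its base above its limit and so has no open I/O
window.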

-- 
Andriy Gapon


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-27 Thread Andriy Gapon
On 27/05/2020 01:14, John Baldwin wrote:
> On 5/26/20 11:55 AM, Konstantin Belousov wrote:
>> On Tue, May 26, 2020 at 06:22:13PM +0300, Andriy Gapon wrote:
>>> I am not sure if this is just a coincidence but it appears as if a write to
>>> some PCI configuration register could temporarily interfere with access to
>>> the PM timer I/O port.
>>> Is that plausible?
>> If something disabled a BAR, then typical response of x86 chipset for timed
>> out read from PCIe is 0xf... .
> 
> And the ACPI timer might be "behind" the isab0 bridge device which would
> indeed cause this.

I added more diagnostics and it seems to support the idea that the problem is
related to I/O cycles and bridges.

ACPI timer suddenly starts returning 0x and that lasts for tens of
microseconds before the timer goes back to returning normal values with an
expected increase.
AMD provides a proprietary way to access ACPI registers via MMIO (0xfed808xx).
That mechanism is unaffected, ACPI timer register always returns good values.

The problem seems to happen when restoring configuration of a particular PCI
bridge.  What's interesting is that the bridge decodes one memory range and one
I/O range.

Looking at pci_cfg_restore() I wonder if it is wise to restore PCIR_COMMAND so
early.  Could it be that after the resume the bridge is configured with a wrong
I/O range (e.g., too wide) and by writing PCIR_COMMAND we enable that decoding?
If so, the bridge steals I/O cycles destined for ACPI support hardware.  If
there is nothing behind the bridge to handle those ports, then we get those bad
readings.  Once the bridge configuration is fully restored, the I/O handling
goes back to normal.

Is this possible?
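
To make the hypothesis concrete, here is a toy model in Python.  It is
not the pci_cfg_restore() code: the Bridge class, restore(), and the
dictionary keys used as register names are hypothetical.  The only
hardware fact assumed is that bit 0 of the PCI command register enables
I/O space decoding.  The port 0x808, the stale window 0x0000-0x0FFF and
the saved window 0xF000-0xFFFF are assumed example values.

```python
IO_ENABLE = 0x0001  # PCI command register bit 0: I/O space enable


class Bridge:
    def __init__(self, command, io_base, io_limit):
        self.regs = {"command": command, "io_base": io_base,
                     "io_limit": io_limit}

    def claims(self, port):
        """True if I/O decoding is on and the window covers the port."""
        r = self.regs
        enabled = bool(r["command"] & IO_ENABLE)
        return enabled and r["io_base"] <= port <= r["io_limit"]


def restore(bridge, saved, order, port):
    """Write saved registers in the given order; report whether the
    bridge claims the port at any point along the way."""
    stolen = False
    for name in order:
        bridge.regs[name] = saved[name]
        stolen = stolen or bridge.claims(port)
    return stolen


saved = {"command": IO_ENABLE, "io_base": 0xF000, "io_limit": 0xFFFF}


def after_resume():
    # Assumed post-resume state: decoding off, but a wrong (stale) window.
    return Bridge(command=0, io_base=0x0000, io_limit=0x0FFF)


command_first = restore(after_resume(), saved,
                        ["command", "io_base", "io_limit"], 0x808)
command_last = restore(after_resume(), saved,
                       ["io_base", "io_limit", "command"], 0x808)
```

In this model, writing the command register first leaves a short interval
in which the bridge claims port 0x808; writing it last does not.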

P.S.
pci_cfg_restore() also attempts to restore PCIR_INTPIN, but it's read-only?

-- 
Andriy Gapon


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-26 Thread John Baldwin
On 5/26/20 11:55 AM, Konstantin Belousov wrote:
> On Tue, May 26, 2020 at 06:22:13PM +0300, Andriy Gapon wrote:
>> On 25/05/2020 11:37, Andriy Gapon wrote:
>>> Also, there is another issue related to atrtc.
>>> When I have both drivers attached, and also when I have only atrtc attached
>>> (efi.rt.disabled=1), system clock jumps 10 minutes forward after each
>>> suspend / resume cycle (S0 -> S3 -> S0).  That does not happen for reboot
>>> and shutdown cycles.  I haven't investigated this deeper, but it is a
>>> curious problem.
>>
>> Actually, I was wrong.  The problem can also occur with efirtc alone.
>> Also, sometimes there is a different problem where there are no callouts for
>> a period of time on the order of minutes.  I tracked it to cc_lastscan being
>> set to a value greater than the current uptime.  So, any scheduled callout
>> gets scheduled at cc_lastscan and it is a while before the uptime catches up.
>>
>> It seemed that both issues were connected and were a result of the uptime
>> jumping forward by some minutes and then jumping back to a sane value.
>> If something important happened during the weird period, like getting time of
>> day from hardware or invoking a callout, it led to the observed effects.
>>
>> So, that gave me some ideas where to add debugging checks.
>> What I determined is that ACPI timer (ACPI-fast) could produce a reading of
>> all 1-s like happens when there is no hardware response.
>>
>> I caught one such instance and got a stack trace for it (but no crash dump
>> because devices had not resumed yet):
>> tc_windup() at tc_windup+0x318/frame 0xfe00a7a19300
>> tc_ticktock() at tc_ticktock+0x4b/frame 0xfe00a7a19320
>> hardclock() at hardclock+0x107/frame 0xfe00a7a19360
>> handleevents() at handleevents+0xb3/frame 0xfe00a7a193a0
>> timercb() at timercb+0x196/frame 0xfe00a7a193f0
>> lapic_handle_timer() at lapic_handle_timer+0x98/frame 0xfe00a7a19420
>> Xtimerint() at Xtimerint+0xb1/frame 0xfe00a7a19420
>> --- interrupt, rip = 0x80b34500, rsp = 0xfe00a7a194f8, rbp =
>> 0xfe00a7a19540 ---
>> acpi_pcib_write_config() at acpi_pcib_write_config/frame 0xfe00a7a19540
>> pci_cfg_restore() at pci_cfg_restore+0x2cc/frame 0xfe00a7a195a0
>> pci_resume_child() at pci_resume_child+0xee/frame 0xfe00a7a195e0
>> pci_resume() at pci_resume+0x49/frame 0xfe00a7a19630
>> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
>> 0xfe00a7a19650
>> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19680
>> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
>> 0xfe00a7a196a0
>> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a196d0
>> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
>> 0xfe00a7a196f0
>> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19720
>> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
>> 0xfe00a7a19740
>> root_resume() at root_resume+0x29/frame 0xfe00a7a19770
>> acpi_EnterSleepState() at acpi_EnterSleepState+0x73b/frame 0xfe00a7a197f0
>> acpi_AckSleepState() at acpi_AckSleepState+0x144/frame 0xfe00a7a19820
>> devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfe00a7a19870
>> vn_ioctl() at vn_ioctl+0x132/frame 0xfe00a7a19980
>> devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfe00a7a199a0
>> kern_ioctl() at kern_ioctl+0x27b/frame 0xfe00a7a19a00
>> sys_ioctl() at sys_ioctl+0x123/frame 0xfe00a7a19ad0
>> amd64_syscall() at amd64_syscall+0x140/frame 0xfe00a7a19bf0
>> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe00a7a19bf0
>>
>> I am not sure if this is just a coincidence but it appears as if a write to
>> some PCI configuration register could temporarily interfere with access to
>> the PM timer I/O port.
>> Is that plausible?
> If something disabled a BAR, then typical response of x86 chipset for timed
> out read from PCIe is 0xf... .

And the ACPI timer might be "behind" the isab0 bridge device which would indeed
cause this.

-- 
John Baldwin


Re: acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-26 Thread Konstantin Belousov
On Tue, May 26, 2020 at 06:22:13PM +0300, Andriy Gapon wrote:
> On 25/05/2020 11:37, Andriy Gapon wrote:
> > Also, there is another issue related to atrtc.
> > When I have both drivers attached, and also when I have only atrtc attached
> > (efi.rt.disabled=1), system clock jumps 10 minutes forward after each
> > suspend / resume cycle (S0 -> S3 -> S0).  That does not happen for reboot
> > and shutdown cycles.  I haven't investigated this deeper, but it is a
> > curious problem.
> 
> Actually, I was wrong.  The problem can also occur with efirtc alone.
> Also, sometimes there is a different problem where there are no callouts for a
> period of time on the order of minutes.  I tracked it to cc_lastscan being set
> to a value greater than the current uptime.  So, any scheduled callout gets
> scheduled at cc_lastscan and it is a while before the uptime catches up.
> 
> It seemed that both issues were connected and were a result of the uptime
> jumping forward by some minutes and then jumping back to a sane value.
> If something important happened during the weird period, like getting time of
> day from hardware or invoking a callout, it led to the observed effects.
> 
> So, that gave me some ideas where to add debugging checks.
> What I determined is that ACPI timer (ACPI-fast) could produce a reading of
> all 1-s like happens when there is no hardware response.
> 
> I caught one such instance and got a stack trace for it (but no crash dump
> because devices had not resumed yet):
> tc_windup() at tc_windup+0x318/frame 0xfe00a7a19300
> tc_ticktock() at tc_ticktock+0x4b/frame 0xfe00a7a19320
> hardclock() at hardclock+0x107/frame 0xfe00a7a19360
> handleevents() at handleevents+0xb3/frame 0xfe00a7a193a0
> timercb() at timercb+0x196/frame 0xfe00a7a193f0
> lapic_handle_timer() at lapic_handle_timer+0x98/frame 0xfe00a7a19420
> Xtimerint() at Xtimerint+0xb1/frame 0xfe00a7a19420
> --- interrupt, rip = 0x80b34500, rsp = 0xfe00a7a194f8, rbp =
> 0xfe00a7a19540 ---
> acpi_pcib_write_config() at acpi_pcib_write_config/frame 0xfe00a7a19540
> pci_cfg_restore() at pci_cfg_restore+0x2cc/frame 0xfe00a7a195a0
> pci_resume_child() at pci_resume_child+0xee/frame 0xfe00a7a195e0
> pci_resume() at pci_resume+0x49/frame 0xfe00a7a19630
> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
> 0xfe00a7a19650
> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19680
> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
> 0xfe00a7a196a0
> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a196d0
> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
> 0xfe00a7a196f0
> bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19720
> bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 
> 0xfe00a7a19740
> root_resume() at root_resume+0x29/frame 0xfe00a7a19770
> acpi_EnterSleepState() at acpi_EnterSleepState+0x73b/frame 0xfe00a7a197f0
> acpi_AckSleepState() at acpi_AckSleepState+0x144/frame 0xfe00a7a19820
> devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfe00a7a19870
> vn_ioctl() at vn_ioctl+0x132/frame 0xfe00a7a19980
> devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfe00a7a199a0
> kern_ioctl() at kern_ioctl+0x27b/frame 0xfe00a7a19a00
> sys_ioctl() at sys_ioctl+0x123/frame 0xfe00a7a19ad0
> amd64_syscall() at amd64_syscall+0x140/frame 0xfe00a7a19bf0
> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe00a7a19bf0
> 
> I am not sure if this is just a coincidence but it appears as if a write to
> some PCI configuration register could temporarily interfere with access to
> the PM timer I/O port.
> Is that plausible?
If something disabled a BAR, then typical response of x86 chipset for timed
out read from PCIe is 0xf... .


acpi timer reads all ones [Was: efirtc + atrtc at the same time]

2020-05-26 Thread Andriy Gapon
On 25/05/2020 11:37, Andriy Gapon wrote:
> Also, there is another issue related to atrtc.
> When I have both drivers attached, and also when I have only atrtc attached
> (efi.rt.disabled=1), system clock jumps 10 minutes forward after each
> suspend / resume cycle (S0 -> S3 -> S0).  That does not happen for reboot and
> shutdown cycles.  I haven't investigated this deeper, but it is a curious
> problem.

Actually, I was wrong.  The problem can also occur with efirtc alone.
Also, sometimes there is a different problem where there are no callouts for a
period of time on the order of minutes.  I tracked it to cc_lastscan being set
to a value greater than the current uptime.  So, any scheduled callout gets
scheduled at cc_lastscan and it is a while before the uptime catches up.
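
A toy model of that effect in Python, based only on the description
above: a callout's due time is clamped so that it is never earlier than
the last-scan time.  schedule(), wait_until_fired() and the numbers are
hypothetical; the real logic in the kernel's callout code is more
involved.

```python
def schedule(requested, lastscan):
    """Due time of a callout: never earlier than the last scan time."""
    return max(requested, lastscan)


def wait_until_fired(now, timeout, lastscan):
    """Seconds of uptime that must pass before the callout can fire."""
    due = schedule(now + timeout, lastscan)
    return due - now


uptime = 100.0                   # current (sane) uptime, seconds
normal = wait_until_fired(uptime, 1.0, lastscan=99.9)
stuck = wait_until_fired(uptime, 1.0, lastscan=uptime + 600.0)
```

With the last-scan time ten minutes ahead of the uptime, a one-second
callout waits ten minutes in this model.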

It seemed that both issues were connected and were a result of the uptime
jumping forward by some minutes and then jumping back to a sane value.
If something important happened during the weird period, like getting time of
day from hardware or invoking a callout, it led to the observed effects.

So, that gave me some ideas where to add debugging checks.
What I determined is that ACPI timer (ACPI-fast) could produce a reading of all
1-s like happens when there is no hardware response.
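
A back-of-the-envelope sketch, in Python, of why a single all-ones read
matters.  It assumes the usual timecounter arithmetic, where the elapsed
tick count is the masked difference between the current and the previous
counter reading, a 32-bit ACPI PM timer, and the standard PM timer
frequency of 3.579545 MHz.  The function names and the sample readings
are made up for illustration.

```python
ACPI_PM_FREQ = 3579545          # Hz, ACPI PM timer frequency
MASK32 = 0xFFFFFFFF             # assumes a 32-bit counter


def tc_delta(now, last, mask=MASK32):
    """Ticks elapsed between two readings, modulo the counter width."""
    return (now - last) & mask


def ticks_to_seconds(ticks, freq=ACPI_PM_FREQ):
    """Convert a tick count to seconds at the given counter frequency."""
    return ticks / freq


last = 0x10000000               # hypothetical previous good reading
good = last + 3580              # roughly one millisecond later
bogus = 0xFFFFFFFF              # read with no hardware response

good_jump = ticks_to_seconds(tc_delta(good, last))
bogus_jump = ticks_to_seconds(tc_delta(bogus, last))
```

Here the good reading advances time by about a millisecond, while the
bogus one advances it by more than a thousand seconds.  The exact size of
the jump depends on where the counter happened to be.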

I caught one such instance and got a stack trace for it (but no crash dump
because devices had not resumed yet):
tc_windup() at tc_windup+0x318/frame 0xfe00a7a19300
tc_ticktock() at tc_ticktock+0x4b/frame 0xfe00a7a19320
hardclock() at hardclock+0x107/frame 0xfe00a7a19360
handleevents() at handleevents+0xb3/frame 0xfe00a7a193a0
timercb() at timercb+0x196/frame 0xfe00a7a193f0
lapic_handle_timer() at lapic_handle_timer+0x98/frame 0xfe00a7a19420
Xtimerint() at Xtimerint+0xb1/frame 0xfe00a7a19420
--- interrupt, rip = 0x80b34500, rsp = 0xfe00a7a194f8, rbp = 0xfe00a7a19540 ---
acpi_pcib_write_config() at acpi_pcib_write_config/frame 0xfe00a7a19540
pci_cfg_restore() at pci_cfg_restore+0x2cc/frame 0xfe00a7a195a0
pci_resume_child() at pci_resume_child+0xee/frame 0xfe00a7a195e0
pci_resume() at pci_resume+0x49/frame 0xfe00a7a19630
bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 0xfe00a7a19650
bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19680
bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 0xfe00a7a196a0
bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a196d0
bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 0xfe00a7a196f0
bus_generic_resume() at bus_generic_resume+0x29/frame 0xfe00a7a19720
bus_generic_resume_child() at bus_generic_resume_child+0x43/frame 0xfe00a7a19740
root_resume() at root_resume+0x29/frame 0xfe00a7a19770
acpi_EnterSleepState() at acpi_EnterSleepState+0x73b/frame 0xfe00a7a197f0
acpi_AckSleepState() at acpi_AckSleepState+0x144/frame 0xfe00a7a19820
devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfe00a7a19870
vn_ioctl() at vn_ioctl+0x132/frame 0xfe00a7a19980
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfe00a7a199a0
kern_ioctl() at kern_ioctl+0x27b/frame 0xfe00a7a19a00
sys_ioctl() at sys_ioctl+0x123/frame 0xfe00a7a19ad0
amd64_syscall() at amd64_syscall+0x140/frame 0xfe00a7a19bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe00a7a19bf0

I am not sure if this is just a coincidence but it appears as if a write to some
PCI configuration register could temporarily interfere with access to the PM
timer I/O port.
Is that plausible?

I'll try to dig up more data.

-- 
Andriy Gapon


Re: efirtc + atrtc at the same time

2020-05-26 Thread Ian Lepore
On Mon, 2020-05-25 at 11:37 +0300, Andriy Gapon wrote:
> I see that on my laptop both efirtc and atrtc get attached.
> The latter is via an ACPI attachment:
> efirtc0: 
> efirtc0: registered as a time-of-day clock, resolution 1.00s
> atrtc0:  port 0x70-0x71 on acpi0
> atrtc0: registered as a time-of-day clock, resolution 1.00s
> 
> I am not sure if this is a problem by itself, but it certainly seems redundant
> to have two drivers controlling the same(?) hardware via different platform
> mechanisms.
> Maybe there is a nice way to automatically disable (or "neutralize") one of
> the drivers?
> 

I thought I had done something long ago to prevent atrtc and efirtc
from both attaching, but apparently not.  I intended to, I even
mentioned it in https://reviews.freebsd.org/D14399 but it looks like I
never followed up and did the work.

> Also, there is another issue related to atrtc.
> When I have both drivers attached, and also when I have only atrtc attached
> (efi.rt.disabled=1), system clock jumps 10 minutes forward after each
> suspend / resume cycle (S0 -> S3 -> S0).  That does not happen for reboot and
> shutdown cycles.  I haven't investigated this deeper, but it is a curious
> problem.
> 

I've looked at the code for messing with the clock around
suspend/resume and never felt like it was doing the right thing (or
even anything useful).  But I've never owned a FreeBSD machine that
could successfully resume from suspend, so I've never been able to
experiment with it.

-- Ian




efirtc + atrtc at the same time

2020-05-25 Thread Andriy Gapon


I see that on my laptop both efirtc and atrtc get attached.
The latter is via an ACPI attachment:
efirtc0: 
efirtc0: registered as a time-of-day clock, resolution 1.00s
atrtc0:  port 0x70-0x71 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.00s

I am not sure if this is a problem by itself, but it certainly seems redundant
to have two drivers controlling the same(?) hardware via different platform
mechanisms.
Maybe there is a nice way to automatically disable (or "neutralize") one of the
drivers?

Also, there is another issue related to atrtc.
When I have both drivers attached, and also when I have only atrtc attached
(efi.rt.disabled=1), system clock jumps 10 minutes forward after each suspend /
resume cycle (S0 -> S3 -> S0).  That does not happen for reboot and shutdown
cycles.  I haven't investigated this deeper, but it is a curious problem.

-- 
Andriy Gapon