Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Linus Torvalds wrote:

> Can somebody tell which softirq it is that dvb/usb cares about?

I don't know about the DVB part.  The USB part is a little difficult to
analyze, mostly because the bug reports I've seen are mostly from
people running non-vanilla kernels.  For example, Josef is using a
Raspberry Pi 3B with a non-standard USB host controller driver:
dwc_otg_hcd is built into raspbian in place of the normal dwc2_hsotg
driver.

Both dwc2_hsotg and ehci-hcd use the tasklets embedded in the 
giveback_urb_bh member of struct usb_hcd.  See usb_hcd_giveback_urb() 
in drivers/usb/core/hcd.c; the calls are

	else if (high_prio_bh)
		tasklet_hi_schedule(&bh->bh);
	else
		tasklet_schedule(&bh->bh);

As it turns out, high_prio_bh gets set for interrupt and isochronous
URBs but not for bulk and control URBs.  The DVB driver in question
uses bulk transfers.

xhci-hcd, on the other hand, does not use these tasklets (it doesn't
set the HCD_BH bit in the hc_driver's .flags member).

Alan Stern



Re: Aw: Re: Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Josef Griebichler wrote:

> No I can't sorry. There's no sat connection near to my workstation.

Can we ask the person who made this post:

https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965

to run the test?  The post says that the testing was done on an x86_64 
machine.

> Gesendet: Montag, 08. Januar 2018 um 17:31 Uhr
> Von: "Alan Stern" <st...@rowland.harvard.edu>
> An: "Josef Griebichler" <griebichler.jo...@gmx.at>
> Cc: "Mauro Carvalho Chehab" <mche...@s-opensource.com>, "Greg Kroah-Hartman" 
> <gre...@linuxfoundation.org>, linux-...@vger.kernel.org, "Eric Dumazet" 
> <eduma...@google.com>, "Rik van Riel" <r...@redhat.com>, "Paolo Abeni" 
> <pab...@redhat.com>, "Hannes Frederic Sowa" <han...@redhat.com>, "Jesper 
> Dangaard Brouer" <jbro...@redhat.com>, linux-kernel 
> <linux-ker...@vger.kernel.org>, netdev <netdev@vger.kernel.org>, "Jonathan 
> Corbet" <cor...@lwn.net>, LMML <linux-me...@vger.kernel.org>, "Peter 
> Zijlstra" <pet...@infradead.org>, "David Miller" <da...@davemloft.net>, 
> torva...@linux-foundation.org
> Betreff: Re: Aw: Re: dvb usb issues since kernel 4.9
> On Mon, 8 Jan 2018, Josef Griebichler wrote:
> > Hi Maro,
> >
> > I tried your mentioned patch but unfortunately no real improvement for me.
> > dmesg http://ix.io/DOg
> > tvheadend service log http://ix.io/DOi
> > Errors during recording are still there.
> > Errors increase if there is additional tcp load on raspberry.
> >
> > Unfortunately there's no usbmon or tshark on libreelec so I can't provide
> > further logs.
> Can you try running the same test on an x86_64 system?
> Alan Stern

It appears that you are using a non-standard kernel.  The vanilla 
kernel does not include any "dwc_otg_hcd" driver.

Alan Stern



Re: Aw: Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Josef Griebichler wrote:

> Hi Maro,
> 
> I tried your mentioned patch but unfortunately no real improvement for me.
> dmesg http://ix.io/DOg
> tvheadend service log http://ix.io/DOi
> Errors during recording are still there.
> Errors increase if there is additional tcp load on raspberry.
> 
> Unfortunately there's no usbmon or tshark on libreelec so I can't provide 
> further logs.

Can you try running the same test on an x86_64 system?

Alan Stern



Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Mauro Carvalho Chehab wrote:

> > Let find the root-cause of this before reverting, as this will hurt the
> > networking use-case.
> > 
> > I want to see if the increase buffer will solve the issue (the current
> > buffer of 0.63 ms seem too small).
> 
> For TV, high latency has mainly two practical consequences:
> 
> 1) it increases the time to switch channels. MPEG-TS based transmissions
>usually takes some time to start showing the channel contents. Adding
>more buffers make it worse;
> 
> 2) specially when watching sports, a higher latency means that you'll know
>that your favorite team made a score when your neighbors start
>celebrating... seeing the actual event only after them.
> 
> So, the lower, the merrier, but I think that 5 ms would be acceptable.

That value 65 for the number of buffers was calculated based on a
misunderstanding of the actual bandwidth requirement.  Still, increasing
the number of buffers shouldn't hurt, and it's worth trying.

But there is another misunderstanding here which needs to be cleared 
up.  Adding more buffers does _not_ increase latency; it increases 
capacity.  Making each buffer larger _would_ increase latency, but 
that's not what I proposed.

Going through this more explicitly...  Suppose you receive 8 KB of data
every ms, and suppose you have four 8-KB buffers.  Then the latency is
1 ms, because that's how long you have to wait for the first buffer to
be filled up after you submit an I/O request.  (The driver does _not_
need to wait for all four buffers to be filled before it can start
displaying the data in the first buffer.)  The capacity would be 4 ms,
because that's how much data your buffers can store.  If you end up
waiting longer than 4 ms before ksoftirqd gets around to processing any
of the data, then some data will inevitably get lost.

That's why the way to deal with the delays caused by deferring softirqs
to ksoftirqd is to add more buffers (and not make the buffers larger
than they already are).

> > I would also like to see experiments with adjusting adjust the sched
> > priority of the kthread's and/or the userspace prog. (e.g use command
> > like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)' ).
> 
> If this fixes the issue, we'll need to do something inside the Kernel
> to change the priority, as TV userspace apps should not run as root. Not
> sure where such change should be done (USB? media?).

It would be interesting to try this, but I agree that it's not likely 
to be a practical solution.  Anyway, shouldn't ksoftirqd already be 
running with very high priority?

> > Are we really sure that the regression is cause by 4cd13c21b207
> > ("softirq: Let ksoftirqd do its job"), the forum thread also report
> > that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
> > overflow in get_next_timer_interrupt")
> >  https://git.kernel.org/torvalds/c/34f41c0316ed

That is a good point.  It's hard to see how the issues in the two 
commits could be related, but who knows?

> I'll see if I can mount a test scenario here in order to try reproduce
> the reported bug. I suspect that I won't be able to reproduce it on my
> "standard" i7core-based test machine, even with KPTI enabled.

If you're using the same sort of hardware as Josef, under similar 
circumstances, the buggy behavior should be the same.  If not, there 
must be something else going on that we're not aware of.

> > It makes me suspicious that this fix changes things...
> > After this fix, I suspect that changing the sched priorities, will fix
> > the remaining glitches.
> > 
> > 
> > > It is hard to foresee the consequences of the softirq changes for other
> > > devices, though.  
> > 
> > Yes, it is hard to foresee, I can only cover networking.
> > 
> > For networking, if reverting this, we will (again) open the kernel for
> > an easy DDoS vector with UDP packets.  As mentioned in the commit desc,
> > before you could easily cause softirq to take all the CPU time from the
> > application, resulting in very low "good-put" in the UDP-app. (That's why
> > it was so easy to DDoS DNS servers before...)
> > 
> > With the softirqd patch in place, ksoftirqd is scheduled fairly between
> > other applications running on the same CPU.  But in some cases this is
> > not what you want, so as the also commit mentions, the admin can now
> > more easily tune process scheduling parameters if needed, to adjust for
> > such use-cases (it was not really an admin choice before).
> 
> Can't the ksoftirq patch be modified to only apply to the networking
> IRQ handling? That sounds less risky of affecting unrelated subsystems[1].

That might work.  Or

Re: dvb usb issues since kernel 4.9

2018-01-08 Thread Alan Stern
On Mon, 8 Jan 2018, Mauro Carvalho Chehab wrote:

> Em Sun, 7 Jan 2018 10:41:37 -0500 (EST)
> Alan Stern <st...@rowland.harvard.edu> escreveu:
> 
> > On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:
> > 
> > > > > It seems that the original patch were designed to solve some IRQ 
> > > > > issues
> > > > > with network cards with causes data losses on high traffic. However,
> > > > > it is also causing bad effects on sustained high bandwidth demands
> > > > > required by DVB cards, at least on some USB host drivers.
> > > > > 
> > > > > Alan/Greg/Eric/David:
> > > > > 
> > > > > Any ideas about how to fix it without causing regressions to
> > > > > network?
> > > > 
> > > > It would be good to know what hardware was involved on the x86 system
> > > > and to have some timing data.  Can we see the output from lsusb and
> > > > usbmon, running on a vanilla kernel that gets plenty of video glitches? 
> > > >  
> > > 
> > > From Josef's report, and from the BZ, the affected hardware seems
> > > to be based on Montage Technology M88DS3103/M88TS2022 chipset.  
> > 
> > What type of USB host controller does the x86_64 system use?  EHCI or
> > xHCI?
> 
> I'll let Josef answer this.
> 
> > 
> > > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> > > with shares a USB implementation that is used by a lot more drivers.
> > > The URB handling code is at:
> > > 
> > >   drivers/media/usb/dvb-usb-v2/usb_urb.c
> > > 
> > > This particular driver allocates 8 buffers with 4096 bytes each
> > > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> > > 
> > > This become a popular USB hardware nowadays. I have one S960c
> > > myself, so I can send you the lsusb from it. You should notice, however,
> > > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> > > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> > > of payload after removing URB headers.  
> > 
> > You mentioned earlier that the driver uses bulk transfers.  In USB-2.0,
> > the maximum possible payload data transfer rate using bulk transfers is
> > 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s).  And
> > even this is possible only if almost nothing else is using the bus at
> > the same time.
> 
> No, I said 58 Mbits/s (not bytes).

Well, what you actually _wrote_ was "58 Mpps of payload" (see above),
and I couldn't tell how to interpret that.  :-)

58 Mb/s is obviously almost 8 times less than the full USB bus 
bandwidth.

> On DVB-C and DVB-S2 specs, AFAIKT, there's no hard limit for the maximum
> payload data rate, although industry seems to limit it to be around
> 60 Mbits/s. On those standards, the maximal bit rate is defined by the
> modulation type and by the channel symbol rate.
> 
> To give you a practical example, my DVB-S2 provider modulates each
> transponder with 8/PSK (3 bits/symbol), and define channels with a
> symbol rate of 30 Mbauds/s. So, it could, theoretically, transport
> a MPEG-TS stream up to 90 Mbits/s (minus headers and guard intervals).
> In practice, the streams there are transmitted with 58,026.5 Kbits/s.

Okay.  This is 58 Kb/ms or 7.25 KB/ms.  So your scheme of eight 4-KB 
buffers gives a latency of 0.57 ms with a total capacity of 4.5 ms, 
which is a lot better than what I was thinking.

> > In any case, you might be able to attack the problem simply by using
> > more than 8 buffers.  With just eight 4096-byte buffers, the total
> > pipeline capacity is only about 0.62 ms (at the maximum possible
> > transfer rate).  Increasing the number of buffers to 65 would give a
> > capacity of 5 ms, which is probably a lot better suited for situations
> > where completions are handled by the ksoftirqd thread.
> 
> Increasing it to 65 shouldn't be hard. Not sure, however, if the hardware
> will actually fill the 65 buffers, but it is worth to try.

Given the new information, 65 would be overkill.  But going from 8 to 
16 might help.

> > > Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> > > in order to revert the kernel logic to prioritize latency instead of
> > > throughput.  
> > 
> > It can't be done without pervasive changes to the USB subsystem, which
> > I would greatly prefer to avoid.  Besides, this wouldn't really solve
> > the problem.  Decreasing the latency for one device will cause it to be
> > increased for others.
> 
> If there is a TV streaming traffic at a USB bus, it means that the
> user wants to either watch and/or record a TV program. On such
> usecase scenario, a low latency is highly desired for the TV capture
> (and display, if the GPU is USB), even it means a higher latency for
> other traffic.

Not if the other traffic is also a TV capture.  :-)

It might make sense to classify softirq sources as "high priority" or 
"low priority", and only defer the "low priority" work to ksoftirqd.

Alan Stern



Re: dvb usb issues since kernel 4.9

2018-01-07 Thread Alan Stern
On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:

> > > It seems that the original patch were designed to solve some IRQ issues
> > > with network cards with causes data losses on high traffic. However,
> > > it is also causing bad effects on sustained high bandwidth demands
> > > required by DVB cards, at least on some USB host drivers.
> > > 
> > > Alan/Greg/Eric/David:
> > > 
> > > Any ideas about how to fix it without causing regressions to
> > > network?  
> > 
> > It would be good to know what hardware was involved on the x86 system
> > and to have some timing data.  Can we see the output from lsusb and
> > usbmon, running on a vanilla kernel that gets plenty of video glitches?
> 
> From Josef's report, and from the BZ, the affected hardware seems
> to be based on Montage Technology M88DS3103/M88TS2022 chipset.

What type of USB host controller does the x86_64 system use?  EHCI or
xHCI?

> The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> with shares a USB implementation that is used by a lot more drivers.
> The URB handling code is at:
> 
>   drivers/media/usb/dvb-usb-v2/usb_urb.c
> 
> This particular driver allocates 8 buffers with 4096 bytes each
> for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> 
> This become a popular USB hardware nowadays. I have one S960c
> myself, so I can send you the lsusb from it. You should notice, however,
> that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> of payload after removing URB headers.

You mentioned earlier that the driver uses bulk transfers.  In USB-2.0,
the maximum possible payload data transfer rate using bulk transfers is
53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s).  And
even this is possible only if almost nothing else is using the bus at
the same time.

> A 10 minutes record with the
> entire data (with typically contains 5-10 channels) can easily go
> above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if
> a usbmon dump would be useful.

It might not be helpful at all.  However, I'm not interested in the 
payload data (which would be unintelligible to me anyway) but rather 
the timing of URB submissions and completions.  A usbmon trace which 
didn't keep much of the payload data would only require on the order of 
50 MB per minute -- and Josef said that glitches usually would show up 
within a minute or so.

> I'm enclosing the lsusb from a S960C device, with is based on those
> Montage chipsets:

What I wanted to see was the output from "lsusb" on the affected
system, not the output from "lsusb -v -s B:D" on your system.

> > Overall, this may be a very difficult problem to solve.  The
> > 4cd13c21b207 commit was intended to improve throughput at the cost of
> > increased latency.  But then what do you do when the latency becomes
> > too high for the video subsystem to handle?
> 
> Latency can't be too high, otherwise frames will be dropped.

Yes, that's the whole point.

> Even if the Kernel itself doesn't drop, if the delay goes higher
> than a certain threshold, userspace will need to drop, as it
> should be presenting audio and video on real time. Yet, typically,
> userspace will delay it by one or two seconds, with would mean
> 1500-3500 buffers, with I suspect it is a lot more than the hardware
> limits. So I suspect that the hardware starves free buffers a way 
> before userspace, as media hardware don't have unlimited buffers
> inside them, as they assume that the Kernel/userspace will be fast
> enough to sustain bit rates up to 66 Mbps of payload.

The timing information would tell us how large the latency is.

In any case, you might be able to attack the problem simply by using
more than 8 buffers.  With just eight 4096-byte buffers, the total
pipeline capacity is only about 0.62 ms (at the maximum possible
transfer rate).  Increasing the number of buffers to 65 would give a
capacity of 5 ms, which is probably a lot better suited for situations
where completions are handled by the ksoftirqd thread.

> Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> in order to revert the kernel logic to prioritize latency instead of
> throughput.

It can't be done without pervasive changes to the USB subsystem, which
I would greatly prefer to avoid.  Besides, this wouldn't really solve
the problem.  Decreasing the latency for one device will cause it to be
increased for others.

Alan Stern



Re: dvb usb issues since kernel 4.9

2018-01-06 Thread Alan Stern
On Sat, 6 Jan 2018, Mauro Carvalho Chehab wrote:

> Hi Josef,
> 
> Em Sat, 6 Jan 2018 16:04:16 +0100
> "Josef Griebichler" <griebichler.jo...@gmx.at> escreveu:
> 
> > Hi,
> > 
> > the causing commit has been identified.
> > After reverting commit 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > its working again.
> 
> Just replying to me won't magically fix this. The ones that were involved on
> this patch should also be c/c, plus USB people. Just added them.
> 
> > Please have a look into the thread 
> > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?pageNo=13
> > here are several users aknowledging the revert solves their issues with usb 
> > dvb cards.
> 
> I read the entire (long) thread there. In order to make easier for the
> others, from what I understand, the problem happens on both x86 and arm,
> although almost all comments there are mentioning tests with raspbian
> Kernel (with uses a different USB host driver than the upstream one).
> 
> It happens when watching digital TV DVB-C channels, with usually means
> a sustained bit rate of 11 MBps to 54 MBps.
> 
> The reports mention the dvbsky, with uses USB URB bulk transfers.
> On every several minutes (5 to 10 mins), the stream suffer "glitches"
> caused by frame losses.
> 
> The part of the thread that contains the bisect is at:
>   
> https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965
> 
> It indirectly mentions another comment on the thread with points
> to:
>   https://github.com/raspberrypi/linux/issues/2134
> 
> There, it says that this fix part of the issues:
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34f41c0316ed52b0b44542491d89278efdaa70e4
> 
> but it affects URB packet losses on a lesser extend.
> 
> The main issue is really the logic changes a the core softirq logic.
> 
> Using Kernel 4.14.10 on a Raspberry Pi 3 with 4cd13c2 commit reverted
> fixed the issue. 
> 
> Joseph, is the above right? Anything else to mention? Does the
> same issue affect also on x86 with vanilla Kernel 4.14.10?
> 
> -
> 
> It seems that the original patch were designed to solve some IRQ issues
> with network cards with causes data losses on high traffic. However,
> it is also causing bad effects on sustained high bandwidth demands
> required by DVB cards, at least on some USB host drivers.
> 
> Alan/Greg/Eric/David:
> 
> Any ideas about how to fix it without causing regressions to
> network?

It would be good to know what hardware was involved on the x86 system
and to have some timing data.  Can we see the output from lsusb and
usbmon, running on a vanilla kernel that gets plenty of video glitches?

Overall, this may be a very difficult problem to solve.  The
4cd13c21b207 commit was intended to improve throughput at the cost of
increased latency.  But then what do you do when the latency becomes
too high for the video subsystem to handle?

Alan Stern



Re: [PATCH v2 0/9] Remove spin_unlock_wait()

2017-07-08 Thread Alan Stern
Pardon me for barging in, but I found this whole interchange extremely 
confusing...

On Sat, 8 Jul 2017, Ingo Molnar wrote:

> * Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:
> 
> > On Sat, Jul 08, 2017 at 10:35:43AM +0200, Ingo Molnar wrote:
> > > 
> > > * Manfred Spraul <manf...@colorfullife.com> wrote:
> > > 
> > > > Hi Ingo,
> > > > 
> > > > On 07/07/2017 10:31 AM, Ingo Molnar wrote:
> > > > > 
> > > > > There's another, probably just as significant advantage: 
> > > > > queued_spin_unlock_wait()
> > > > > is 'read-only', while spin_lock()+spin_unlock() dirties the lock 
> > > > > cache line. On
> > > > > any bigger system this should make a very measurable difference - if
> > > > > spin_unlock_wait() is ever used in a performance critical code path.
> > > > At least for ipc/sem:
> > > > Dirtying the cacheline (in the slow path) allows to remove a smp_mb() 
> > > > in the
> > > > hot path.
> > > > So for sem_lock(), I either need a primitive that dirties the cacheline 
> > > > or
> > > > sem_lock() must continue to use spin_lock()/spin_unlock().

This statement doesn't seem to make sense.  Did Manfred mean to write 
"smp_mb()" instead of "spin_lock()/spin_unlock()"?

> > > Technically you could use spin_trylock()+spin_unlock() and avoid the lock 
> > > acquire 
> > > spinning on spin_unlock() and get very close to the slow path performance 
> > > of a 
> > > pure cacheline-dirtying behavior.

This is even more confusing.  Did Ingo mean to suggest using 
"spin_trylock()+spin_unlock()" in place of "spin_lock()+spin_unlock()" 
could provide the desired ordering guarantee without delaying other 
CPUs that may try to acquire the lock?  That seems highly questionable.

> > > But adding something like spin_barrier(), which purely dirties the lock 
> > > cacheline, 
> > > would be even faster, right?
> > 
> > Interestingly enough, the arm64 and powerpc implementations of
> > spin_unlock_wait() were very close to what it sounds like you are
> > describing.
> 
> So could we perhaps solve all our problems by defining the generic version 
> thusly:
> 
> void spin_unlock_wait(spinlock_t *lock)
> {
>   if (spin_trylock(lock))
>   spin_unlock(lock);
> }

How could this possibly be a generic version of spin_unlock_wait()?  
It does nothing at all (with no ordering properties) if some other CPU
currently holds the lock, whereas the real spin_unlock_wait() would
wait until the other CPU released the lock (or possibly longer).

And if no other CPU currently holds the lock, this has exactly the same
performance properties as spin_lock()+spin_unlock(), so what's the
advantage?

Alan Stern

> ... and perhaps rename it to spin_barrier() [or whatever proper name there 
> would 
> be]?
> 
> Architectures can still optimize it, to remove the small window where the 
> lock is 
> held locally - as long as the ordering is at least as strong as the generic 
> version.
> 
> This would have various advantages:
> 
>  - semantics are well-defined
> 
>  - the generic implementation is already pretty well optimized (no spinning)
> 
>  - it would make it usable for the IPC performance optimization
> 
>  - architectures could still optimize it to eliminate the window where the 
> lock is
>held locally - if there's such instructions available.
> 
> Was this proposed before, or am I missing something?
> 
> Thanks,
> 
>   Ingo



Re: [PATCH v2 0/9] Remove spin_unlock_wait()

2017-07-06 Thread Alan Stern
On Thu, 6 Jul 2017, Peter Zijlstra wrote:

> On Thu, Jul 06, 2017 at 12:49:12PM -0400, Alan Stern wrote:
> > On Thu, 6 Jul 2017, Paul E. McKenney wrote:
> > 
> > > On Thu, Jul 06, 2017 at 06:10:47PM +0200, Peter Zijlstra wrote:
> > > > On Thu, Jul 06, 2017 at 08:21:10AM -0700, Paul E. McKenney wrote:
> > > > > And yes, there are architecture-specific optimizations for an
> > > > > empty spin_lock()/spin_unlock() critical section, and the current
> > > > > arch_spin_unlock_wait() implementations show some of these 
> > > > > optimizations.
> > > > > But I expect that performance benefits would need to be demonstrated 
> > > > > at
> > > > > the system level.
> > > > 
> > > > I do in fact contended there are any optimizations for the exact
> > > > lock+unlock semantics.
> > > 
> > > You lost me on this one.
> > > 
> > > > The current spin_unlock_wait() is weaker. Most notably it will not (with
> > > > exception of ARM64/PPC for other reasons) cause waits on other CPUs.
> > > 
> > > Agreed, weaker semantics allow more optimizations.  So use cases needing
> > > only the weaker semantics should more readily show performance benefits.
> > > But either way, we need compelling use cases, and I do not believe that
> > > any of the existing spin_unlock_wait() calls are compelling.  Perhaps I
> > > am confused, but I am not seeing it for any of them.
> > 
> > If somebody really wants the full spin_unlock_wait semantics and
> > doesn't want to interfere with other CPUs, wouldn't synchronize_sched()
> > or something similar do the job?  It wouldn't be as efficient as
> > lock+unlock, but it also wouldn't affect other CPUs.
> 
> So please don't do that. That'll create massive pain for RT. Also I
> don't think it works. The whole point was that spin_unlock_wait() is
> _cheaper_ than lock()+unlock(). If it gets to be more expensive there is
> absolutely no point in using it.

Of course; that is obvious.

I was making a rhetorical point: You should not try to justify
spin_unlock_wait() on the basis that it doesn't cause waits on other
CPUs.

Alan Stern



Re: [PATCH v2 0/9] Remove spin_unlock_wait()

2017-07-06 Thread Alan Stern
On Thu, 6 Jul 2017, Paul E. McKenney wrote:

> On Thu, Jul 06, 2017 at 06:10:47PM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 06, 2017 at 08:21:10AM -0700, Paul E. McKenney wrote:
> > > And yes, there are architecture-specific optimizations for an
> > > empty spin_lock()/spin_unlock() critical section, and the current
> > > arch_spin_unlock_wait() implementations show some of these optimizations.
> > > But I expect that performance benefits would need to be demonstrated at
> > > the system level.
> > 
> > I do in fact contended there are any optimizations for the exact
> > lock+unlock semantics.
> 
> You lost me on this one.
> 
> > The current spin_unlock_wait() is weaker. Most notably it will not (with
> > exception of ARM64/PPC for other reasons) cause waits on other CPUs.
> 
> Agreed, weaker semantics allow more optimizations.  So use cases needing
> only the weaker semantics should more readily show performance benefits.
> But either way, we need compelling use cases, and I do not believe that
> any of the existing spin_unlock_wait() calls are compelling.  Perhaps I
> am confused, but I am not seeing it for any of them.

If somebody really wants the full spin_unlock_wait semantics and
doesn't want to interfere with other CPUs, wouldn't synchronize_sched()
or something similar do the job?  It wouldn't be as efficient as
lock+unlock, but it also wouldn't affect other CPUs.

Alan Stern



Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair

2017-07-03 Thread Alan Stern
On Mon, 3 Jul 2017, Paul E. McKenney wrote:

> On Mon, Jul 03, 2017 at 10:39:49AM -0400, Alan Stern wrote:
> > On Sat, 1 Jul 2017, Manfred Spraul wrote:
> > 
> > > As we want to remove spin_unlock_wait() and replace it with explicit
> > > spin_lock()/spin_unlock() calls, we can use this to simplify the
> > > locking.
> > > 
> > > In addition:
> > > - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
> > > - The new code avoids the backwards loop.
> > > 
> > > Only slightly tested, I did not manage to trigger calls to
> > > nf_conntrack_all_lock().
> > > 
> > > Fixes: b16c29191dc8
> > > Signed-off-by: Manfred Spraul <manf...@colorfullife.com>
> > > Cc: <sta...@vger.kernel.org>
> > > Cc: Sasha Levin <sasha.le...@oracle.com>
> > > Cc: Pablo Neira Ayuso <pa...@netfilter.org>
> > > Cc: netfilter-de...@vger.kernel.org
> > > ---
> > >  net/netfilter/nf_conntrack_core.c | 44 ++++++++++++++++++++++++--------------------
> > >  1 file changed, 24 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index e847dba..1193565 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -96,19 +96,24 @@ static struct conntrack_gc_work conntrack_gc_work;
> > >  
> > >  void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
> > >  {
> > > + /* 1) Acquire the lock */
> > >   spin_lock(lock);
> > > - while (unlikely(nf_conntrack_locks_all)) {
> > > - spin_unlock(lock);
> > >  
> > > - /*
> > > -  * Order the 'nf_conntrack_locks_all' load vs. the
> > > -  * spin_unlock_wait() loads below, to ensure
> > > -  * that 'nf_conntrack_locks_all_lock' is indeed held:
> > > -  */
> > > - smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> > > - spin_unlock_wait(&nf_conntrack_locks_all_lock);
> > > - spin_lock(lock);
> > > - }
> > > + /* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
> > > + if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
> > > + return;
> > 
> > As far as I can tell, this read does not need to have ACQUIRE
> > semantics.
> > 
> > You need to guarantee that two things can never happen:
> > 
> > (1) We read nf_conntrack_locks_all == false, and this routine's
> > critical section for nf_conntrack_locks[i] runs after the
> > (empty) critical section for that lock in 
> > nf_conntrack_all_lock().
> > 
> > (2) We read nf_conntrack_locks_all == true, and this routine's 
> > critical section for nf_conntrack_locks_all_lock runs before 
> > the critical section in nf_conntrack_all_lock().
> > 
> > In fact, neither one can happen even if smp_load_acquire() is replaced
> > with READ_ONCE().  The reason is simple enough, using this property of
> > spinlocks:
> > 
> > If critical section CS1 runs before critical section CS2 (for 
> > the same lock) then: (a) every write coming before CS1's
> > spin_unlock() will be visible to any read coming after CS2's
> > spin_lock(), and (b) no write coming after CS2's spin_lock()
> > will be visible to any read coming before CS1's spin_unlock().
> > 
> > Thus for (1), assuming the critical sections run in the order mentioned
> > above, since nf_conntrack_all_lock() writes to nf_conntrack_locks_all
> > before releasing nf_conntrack_locks[i], and since nf_conntrack_lock()
> > acquires nf_conntrack_locks[i] before reading nf_conntrack_locks_all,
> > by (a) the read will always see the write.
> > 
> > Similarly for (2), since nf_conntrack_all_lock() acquires 
> > nf_conntrack_locks_all_lock before writing to nf_conntrack_locks_all, 
> > and since nf_conntrack_lock() reads nf_conntrack_locks_all before 
> > releasing nf_conntrack_locks_all_lock, by (b) the read cannot see the 
> > write.
> 
> And the Linux kernel memory model (https://lwn.net/Articles/718628/
> and https://lwn.net/Articles/720550/) agrees with Alan.  Here is
> a litmus test, which emulates spin_lock() with xchg_acquire() and
> spin_unlock() with smp_store_release():
> 
> 
> 
> C C-ManfredSpraul-L1G1xchgnr.litmus
> 
> (* Expected result: Never.  *)
> 
>

Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair

2017-07-03 Thread Alan Stern
On Mon, 3 Jul 2017, Manfred Spraul wrote:

> >>> + /* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
> >>> + if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
> >>> + return;
> >> As far as I can tell, this read does not need to have ACQUIRE
> >> semantics.
> >>
> >> You need to guarantee that two things can never happen:
> >>
> >>  (1) We read nf_conntrack_locks_all == false, and this routine's
> >>critical section for nf_conntrack_locks[i] runs after the
> >>(empty) critical section for that lock in
> >>nf_conntrack_all_lock().
> >>
> >>  (2) We read nf_conntrack_locks_all == true, and this routine's
> >>critical section for nf_conntrack_locks_all_lock runs before
> >>the critical section in nf_conntrack_all_lock().
> I was looking at nf_conntrack_all_unlock:
> There is a smp_store_release() - which memory barrier does this pair with?
> 
> nf_conntrack_all_unlock()
>  
>  smp_store_release(a, false)
>  spin_unlock(b);
> 
> nf_conntrack_lock()
>  spin_lock(c);
>  xx=read_once(a)
>  if (xx==false)
>  return
>  

Ah, I see your point.  Yes, I did wonder about what would happen when
nf_conntrack_locks_all was set back to false.  But I didn't think about
it any further, because the relevant code wasn't in your patch.

> I tried to pair the memory barriers:
> nf_conntrack_all_unlock() contains a smp_store_release().
> What does that pair with?

You are right, this does need to be smp_load_acquire() after all.  
Perhaps the preceding comment should mention that it pairs with the 
smp_store_release() from an earlier invocation of 
nf_conntrack_all_unlock().

(Alternatively, you could make nf_conntrack_all_unlock() do a
lock+unlock on all the locks in the array, just like
nf_conntrack_all_lock().  But of course, that would be a lot less
efficient.)

Alan Stern



Re: [PATCH RFC 01/26] netfilter: Replace spin_unlock_wait() with lock/unlock pair

2017-07-03 Thread Alan Stern
On Sat, 1 Jul 2017, Manfred Spraul wrote:

> As we want to remove spin_unlock_wait() and replace it with explicit
> spin_lock()/spin_unlock() calls, we can use this to simplify the
> locking.
> 
> In addition:
> - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering.
> - The new code avoids the backwards loop.
> 
> Only slightly tested, I did not manage to trigger calls to
> nf_conntrack_all_lock().
> 
> Fixes: b16c29191dc8
> Signed-off-by: Manfred Spraul <manf...@colorfullife.com>
> Cc: <sta...@vger.kernel.org>
> Cc: Sasha Levin <sasha.le...@oracle.com>
> Cc: Pablo Neira Ayuso <pa...@netfilter.org>
> Cc: netfilter-de...@vger.kernel.org
> ---
>  net/netfilter/nf_conntrack_core.c | 44 ++++++++++++++++++++--------------------
>  1 file changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index e847dba..1193565 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -96,19 +96,24 @@ static struct conntrack_gc_work conntrack_gc_work;
>  
>  void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
>  {
> + /* 1) Acquire the lock */
>   spin_lock(lock);
> - while (unlikely(nf_conntrack_locks_all)) {
> - spin_unlock(lock);
>  
> - /*
> -  * Order the 'nf_conntrack_locks_all' load vs. the
> -  * spin_unlock_wait() loads below, to ensure
> -  * that 'nf_conntrack_locks_all_lock' is indeed held:
> -  */
> - smp_rmb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> - spin_unlock_wait(&nf_conntrack_locks_all_lock);
> - spin_lock(lock);
> - }
> + /* 2) read nf_conntrack_locks_all, with ACQUIRE semantics */
> + if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
> + return;

As far as I can tell, this read does not need to have ACQUIRE
semantics.

You need to guarantee that two things can never happen:

(1) We read nf_conntrack_locks_all == false, and this routine's
critical section for nf_conntrack_locks[i] runs after the
(empty) critical section for that lock in 
nf_conntrack_all_lock().

(2) We read nf_conntrack_locks_all == true, and this routine's 
critical section for nf_conntrack_locks_all_lock runs before 
the critical section in nf_conntrack_all_lock().

In fact, neither one can happen even if smp_load_acquire() is replaced
with READ_ONCE().  The reason is simple enough, using this property of
spinlocks:

If critical section CS1 runs before critical section CS2 (for 
the same lock) then: (a) every write coming before CS1's
spin_unlock() will be visible to any read coming after CS2's
spin_lock(), and (b) no write coming after CS2's spin_lock()
will be visible to any read coming before CS1's spin_unlock().

Thus for (1), assuming the critical sections run in the order mentioned
above, since nf_conntrack_all_lock() writes to nf_conntrack_locks_all
before releasing nf_conntrack_locks[i], and since nf_conntrack_lock()
acquires nf_conntrack_locks[i] before reading nf_conntrack_locks_all,
by (a) the read will always see the write.

Similarly for (2), since nf_conntrack_all_lock() acquires 
nf_conntrack_locks_all_lock before writing to nf_conntrack_locks_all, 
and since nf_conntrack_lock() reads nf_conntrack_locks_all before 
releasing nf_conntrack_locks_all_lock, by (b) the read cannot see the 
write.

Alan Stern

> +
> + /* fast path failed, unlock */
> + spin_unlock(lock);
> +
> + /* Slow path 1) get global lock */
> + spin_lock(&nf_conntrack_locks_all_lock);
> +
> + /* Slow path 2) get the lock we want */
> + spin_lock(lock);
> +
> + /* Slow path 3) release the global lock */
> + spin_unlock(&nf_conntrack_locks_all_lock);
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_lock);
>  
> @@ -149,18 +154,17 @@ static void nf_conntrack_all_lock(void)
>   int i;
>  
>   spin_lock(&nf_conntrack_locks_all_lock);
> - nf_conntrack_locks_all = true;
>  
> - /*
> -  * Order the above store of 'nf_conntrack_locks_all' against
> -  * the spin_unlock_wait() loads below, such that if
> -  * nf_conntrack_lock() observes 'nf_conntrack_locks_all'
> -  * we must observe nf_conntrack_locks[] held:
> -  */
> - smp_mb(); /* spin_lock(&nf_conntrack_locks_all_lock) */
> + nf_conntrack_locks_all = true;
>  
>   for (i = 0; i < CONNTRACK_LOCKS; i++) {
> - spin_unlock_wait(&nf_conntrack_locks[i]);
> + spin_lock(&nf_conntrack_locks[i]);
> +
> + /* This spin_unlock provides the "release" to ensure that
> +  * nf_conntrack_locks_all==true is visible to everyone that
> +  * acquired spin_lock(&nf_conntrack_locks[]).
> +  */
> + spin_unlock(&nf_conntrack_locks[i]);
>   }
>  }





Re: [PATCH RFC 02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

2017-06-30 Thread Alan Stern
On Fri, 30 Jun 2017, Oleg Nesterov wrote:

> On 06/30, Paul E. McKenney wrote:
> >
> > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > >
> > > I do not think the overhead will be noticeable in this particular case.
> > >
> > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
>^
> 
> if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> 
> > > it has some problems, but still...
> > >
> > > The code above looks strange for me. If we are going to repeat this 
> > > pattern
> > > then perhaps we should add a helper for lock+unlock and name it 
> > > unlock_wait2 ;)
> > >
> > > If not, we should probably change this code more:
> >
> > This looks -much- better than my patch!  May I have your Signed-off-by?
> 
> Only if you promise to replace all RCU flavors with a single simple 
> implementation
> based on rwlock ;)
> 
> Seriously, of course I won't argue, and it seems that nobody except me likes
> this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> sometimes it makes sense.

If it looks like synchronize_rcu(), why not actually use 
synchronize_rcu()?

Alan Stern

> Including this particular case. task_work_run() is going to flush/destroy the
> ->task_works list, so it needs to wait until all currently executing "readers"
> (task_work_cancel()'s which have started before ->task_works was updated) have
> completed.



Re: [PATCH] r8152: Allocate interrupt buffer as part of struct r8152

2017-01-31 Thread Alan Stern
On Tue, 31 Jan 2017, Guenter Roeck wrote:

> When unloading the r8152 driver using the 'unbind' sysfs attribute
> in a system with KASAN enabled, the following error message is seen
> on a regular basis.

...

> The two-byte allocation in conjunction with code analysis suggests that
> the interrupt buffer has been overwritten. Added instrumentation in the
> driver shows that the interrupt handler is called after RTL8152_UNPLUG
> was set, and that this event is associated with the error message above.
> This suggests that there are situations where the interrupt buffer is used
> after it has been freed.
> 
> To avoid the problem, allocate the interrupt buffer as part of struct
> r8152.
> 
> Cc: Hayes Wang <hayesw...@realtek.com>
> Signed-off-by: Guenter Roeck <li...@roeck-us.net>
> ---
> The problem is seen in chromeos-4.4, but there is not reason to believe
> that it does not occur with the upstream kernel. It is still seen in
> chromeos-4.4 after all patches from upstream and linux-next have been
> applied to the driver.
> 
> While relatively simple, I am not really convinced that this is the best
> (or even an acceptable) solution for this problem. I am open to suggestions
> for a better fix.

The proper approach is to keep the allocation as it is, but _before_
deallocating the buffer, make sure that the interrupt buffer won't be
accessed any more.  This may involve calling usb_kill_urb(), or
synchronize_irq(), or something similar.

Alan Stern



Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-10 Thread Alan Stern
On Thu, 10 Nov 2016, Kai-Heng Feng wrote:

> Hi,
> 
> On Wed, Nov 9, 2016 at 8:32 PM, Bjørn Mork <bj...@mork.no> wrote:
> > Oliver Neukum <oneu...@suse.com> writes:
> >
> >> On Tue, 2016-11-08 at 13:44 -0500, Alan Stern wrote:
> >>
> >>> These problems could very well be caused by running at SuperSpeed
> >>> (USB-3) instead of high speed (USB-2).
> 
> Yes, it's running at SuperSpeed, on a Kabylake laptop.
> 
> It does not have this issue on a Broadwell laptop, also running at SuperSpeed.
> 
> >>>
> >>> Is there any way to test what happens when the device is attached to
> >>> the computer by a USB-2 cable?  That would prevent it from operating at
> >>> SuperSpeed.
> 
> I recall old Intel PCH can change the USB host from XHCI to EHCI,
> newer PCH does not have this option.
> 
> Is there a way to force XHCI to run at HighSpeed?

Yes, like I said above: Use a USB-2 cable instead of a USB-3 cable.

Alan Stern



Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-08 Thread Alan Stern
On Tue, 8 Nov 2016, Bjørn Mork wrote:

> Alan Stern <st...@rowland.harvard.edu> writes:
> 
> > On Tue, 8 Nov 2016, Kai-Heng Feng wrote:
> >
> >> Hi,
> >> 
> >> On Mon, Nov 7, 2016 at 7:02 PM, Oliver Neukum <oneu...@suse.com> wrote:
> >> > On Fri, 2016-11-04 at 17:57 +0800, Kai-Heng Feng wrote:
> >> >> Sometimes cdc_mbim failed to probe if runtime pm is enabled:
> >> >> [9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22
> >> >>
> >> >> This can be solved by increasing its pm usage counter.
> >> >>
> >> >> Signed-off-by: Kai-Heng Feng <kai.heng.f...@canonical.com>
> >> >
> >> > For the record:
> >> >
> >> > NAK. This fixes a symptom. If this patch helps something is broken in
> >> > device core. We need to find that.
> >> >
> >> 
> >> Please check attached dmesg with usbcore.dyndbg="+p".
> >
> > The log shows that the device went into suspend _before_ the cdc_mbim 
> > driver was probed, not during the probe.  Then just before the probe 
> > was started, the USB core tried to resume the device and the resume 
> > failed.
> >
> > The log shows a bunch of other problems with this device:
> >
> > [3.862253] usb 2-2: config 1 has an invalid interface number: 12 but max is 1
> > [3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
> > [3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
> > [3.862255] usb 2-2: config 1 has no interface number 0
> > [3.862256] usb 2-2: config 1 has no interface number 1
> 
> These messages are completely harmless and normal for Sierra Wireless
> devices.  They use the interface number to identify the type of
> function, causing this mismatch between the number of interfaces and the
> > interface numbers. Boy, that looks weird in writing :)
> 
> Ref this discussion we had a few years ago:
> http://www.spinics.net/lists/linux-usb/msg77499.html
> 
> No, I didn't expect you to remember that :)

You're right; I didn't remember it.  But seeing those messages again in
the mailing list archives, they do look a little familiar.

> > [8.295180] usb 2-2: Disable of device-initiated U1 failed.
> > [8.295322] usb 2-2: Disable of device-initiated U2 failed.
> >
> > I get the impression that the device won't work properly with runtime 
> > PM at all.
> 
> I suspect the device is an EM7455?  If so, then it does work fine with
> runtime PM, as long as we're talking USB2.  Not sure about USB3 runtime
> PM though.  Cannot test it. The Lenovo laptop I got with one of these
> modems has disabled the USB3 link on the m.2 modem slot for some reason.

These problems could very well be caused by running at SuperSpeed
(USB-3) instead of high speed (USB-2).

Is there any way to test what happens when the device is attached to 
the computer by a USB-2 cable?  That would prevent it from operating at 
SuperSpeed.

The main point, however, is that the proposed patch doesn't seem to
address the true problem, which is that the device gets suspended
between probes.  The patch only tries to prevent it from being
suspended during a probe -- which is already prevented by the USB core.

Alan Stern



Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-08 Thread Alan Stern
On Tue, 8 Nov 2016, Kai-Heng Feng wrote:

> Hi,
> 
> On Mon, Nov 7, 2016 at 7:02 PM, Oliver Neukum <oneu...@suse.com> wrote:
> > On Fri, 2016-11-04 at 17:57 +0800, Kai-Heng Feng wrote:
> >> Sometimes cdc_mbim failed to probe if runtime pm is enabled:
> >> [9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22
> >>
> >> This can be solved by increasing its pm usage counter.
> >>
> >> Signed-off-by: Kai-Heng Feng <kai.heng.f...@canonical.com>
> >
> > For the record:
> >
> > NAK. This fixes a symptom. If this patch helps something is broken in
> > device core. We need to find that.
> >
> 
> Please check attached dmesg with usbcore.dyndbg="+p".

The log shows that the device went into suspend _before_ the cdc_mbim 
driver was probed, not during the probe.  Then just before the probe 
was started, the USB core tried to resume the device and the resume 
failed.

The log shows a bunch of other problems with this device:

[3.862253] usb 2-2: config 1 has an invalid interface number: 12 but max is 1
[3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
[3.862254] usb 2-2: config 1 has an invalid interface number: 13 but max is 1
[3.862255] usb 2-2: config 1 has no interface number 0
[3.862256] usb 2-2: config 1 has no interface number 1
...
[8.295180] usb 2-2: Disable of device-initiated U1 failed.
[8.295322] usb 2-2: Disable of device-initiated U2 failed.

I get the impression that the device won't work properly with runtime 
PM at all.

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-22 Thread Alan Stern
On Tue, 22 Mar 2016, Oliver Neukum wrote:

> On Tue, 2016-03-22 at 10:21 -0400, Alan Stern wrote:
> > I don't see any point in resuming the device just in order to collect 
> > operating statistics.  If it was already suspended then it wasn't 
> > operating, so there will be no statistics to collect.
> 
> Indeed. In that case the point is moot. But it is correct to ask
> the core whether the device is autosuspended at that point rather
> than keep a private flag if you can.

That's why we have pm_runtime_status_suspended().

> All that is relevant only if the upper layers ask for information
> that the driver cannot provide without resuming the device.
> Those are fundamentally different issues.

Of course.

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-22 Thread Alan Stern
On Tue, 22 Mar 2016, Oliver Neukum wrote:

> On Mon, 2016-03-21 at 15:30 -0400, Alan Stern wrote:
> > On Mon, 21 Mar 2016, Oliver Neukum wrote:
> > 
> 
> > > We have an autosuspend timeout because we think that IO, if it will
> > > come at all, is likeliest to come soon. If, however, the IO is
> > > periodic that heuristics is false.
> > > To save most power the driver must either decide that the interval
> > > is too short or suspend immediately. So if we are lucky enough
> > > to have the frequency in the kernel, we should use that information.
> > 
> > The autosuspend timeout is set by userspace.  The kernel may assign a
> 
> Thus it should apply to all IO originating in user space.
> But only to that IO.

Fair enough.

> > default value, but the user can always override it.  Given this, I 
> > don't see how the kernel can use frequency information (and I'm not 
> > sure where that information would come from in the first place).
> 
> It can ignore internal IO for the purpose of the timeout.
> If such IO is performed while the device is active, don't
> alter the timer.

Come to think of it, we don't.  If pm_runtime_get_sync() and then
pm_runtime_put() are called while the device is already at full power, 
the PM core doesn't update the last_busy time.  So if the driver 
doesn't update it either, the statistics collection won't interfere 
with autosuspend (except when it races with the autosuspend timer 
expiration).

> Otherwise resume the device and look at
> the provided hint and suspend again immediately if the period is long
> enough.

I don't see any point in resuming the device just in order to collect 
operating statistics.  If it was already suspended then it wasn't 
operating, so there will be no statistics to collect.

> If IO is generated periodically in the kernel, the kernel must know that
> period.

Alan Stern



RE: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016 woojung@microchip.com wrote:

> > > > But this leaves open the issue that querying the device too often will
> > > > prevent it from going into autosuspend.  It seems to me that the best
> > > > way to deal with this is to make sure that the autosuspend timeout is
> > > > shorter than the interval between queries, not to make the querying
> > > > conditional on !runtime_auto.
> > >
> > > Short autosuspend timeout can affect performance. For instance our
> > > experiments showed that shorter than 10sec timeout made Ethernet
> > > performance degrade because of wakeup delays.
> > > So, just putting shorter timeout may have some side effects.
> > 
> > Sure.  This just means that you need a long statistics interval --
> > longer than the autosuspend timeout.  That's why I suggested making the
> > interval adjustable.
> 
> What do you mean statistics interval?
> Interval calling ndo_get_stats64 or another thread/timer or else getting 
> statistics?

The time between calls to ndo_get_stats64, since that's the routine 
which queries the device.

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016, Oliver Neukum wrote:

> On Mon, 2016-03-21 at 14:24 -0400, Alan Stern wrote:
> > On Mon, 21 Mar 2016, Oliver Neukum wrote:
> > 
> > > On Mon, 2016-03-21 at 10:57 -0400, Alan Stern wrote:
> > > 
> > > > One possible solution is to export a sysfs parameter to prevent 
> > > > statistics collection (or more generally, to change the interval at 
> > > > which it occurs).
> > > 
> > > Surely, not performing a task can hardly be beaten in terms of power
> > > consumption. That is not meant to be flippant, but I think these
> > > issues are orthogonal. The question of how much to do doesn't
> > > solve the question of doing efficiently what we do.
> > 
> > In other words, what's the best way to collect the statistics without 
> > interfering with runtime PM, right?
> 
> Yes.
> 
> > If the device is suspended, presumably we know there's nothing to
> > collect -- especially if we already collected the statistics at the
> > time the device got suspended.  Hence my suggestion to avoid querying 
> > the device while it is suspended.
> 
> That is perfectly alright if we just collect statistics.
> As a generic mechanism it is bad. Think about the polling
> for media detection.

True.  Here I'm talking specifically about collecting statistics.  
Media detection has its own requirements.

> > But this leaves open the issue that querying the device too often will 
> > prevent it from going into autosuspend.  It seems to me that the best 
> > way to deal with this is to make sure that the autosuspend timeout is 
> > shorter than the interal between queries, not to make the querying 
> > conditional on !runtime_auto.
> [..]
> > > If we know when the next activity will come, why not pass this
> > > information down?
> 
> We have an autosuspend timeout because we think that IO, if it will
> come at all, is likeliest to come soon. If, however, the IO is
> periodic that heuristics is false.
> To save most power the driver must either decide that the interval
> is too short or suspend immediately. So if we are lucky enough
> to have the frequency in the kernel, we should use that information.

The autosuspend timeout is set by userspace.  The kernel may assign a 
default value, but the user can always override it.  Given this, I 
don't see how the kernel can use frequency information (and I'm not 
sure where that information would come from in the first place).

Alan Stern



RE: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016 woojung@microchip.com wrote:

> > But this leaves open the issue that querying the device too often will
> > prevent it from going into autosuspend.  It seems to me that the best
> > way to deal with this is to make sure that the autosuspend timeout is
> > shorter than the interval between queries, not to make the querying
> > conditional on !runtime_auto.
> 
> Short autosuspend timeout can affect performance. For instance our 
> experiments showed that
> shorter than 10sec timeout made Ethernet performance degrade because of 
> wakeup delays.
> So, just putting shorter timeout may have some side effects.

Sure.  This just means that you need a long statistics interval --
longer than the autosuspend timeout.  That's why I suggested making the
interval adjustable.

Alternatively, you could automatically set the statistics interval to
the autosuspend timeout (or 0 if autosuspend is disabled) plus some
fixed value.

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016, Oliver Neukum wrote:

> On Mon, 2016-03-21 at 10:57 -0400, Alan Stern wrote:
> 
> > One possible solution is to export a sysfs parameter to prevent 
> > statistics collection (or more generally, to change the interval at 
> > which it occurs).
> 
> Surely, not performing a task can hardly be beaten in terms of power
> consumption. That is not meant to be flippant, but I think these
> issues are orthogonal. The question of how much to do doesn't
> solve the question of doing efficiently what we do.

In other words, what's the best way to collect the statistics without 
interfering with runtime PM, right?

If the device is suspended, presumably we know there's nothing to
collect -- especially if we already collected the statistics at the
time the device got suspended.  Hence my suggestion to avoid querying 
the device while it is suspended.

But this leaves open the issue that querying the device too often will 
prevent it from going into autosuspend.  It seems to me that the best 
way to deal with this is to make sure that the autosuspend timeout is 
shorter than the interval between queries, not to make the querying 
conditional on !runtime_auto.

> > But checking the runtime_auto flag is probably not a great idea.  Even
> > if it isn't set, collecting statistics is likely to wake up a device
> > that otherwise would have remained suspended.
> > 
> > Perhaps the best solution is to collect the statistics only when the 
> > device is not suspended or is about to suspend.
> 
> If we know when the next activity will come, why not pass this
> information down?

I don't follow.

Alan Stern



Re: [PATCH] lan78xx: Protect runtime_auto check by #ifdef CONFIG_PM

2016-03-21 Thread Alan Stern
On Mon, 21 Mar 2016, Oliver Neukum wrote:

> On Sun, 2016-03-20 at 11:43 +0100, Geert Uytterhoeven wrote:
> > If CONFIG_PM=n:
> > 
> > drivers/net/usb/lan78xx.c: In function ‘lan78xx_get_stats64’:
> > drivers/net/usb/lan78xx.c:3274: error: ‘struct dev_pm_info’ has no
> > member named ‘runtime_auto’
> > 
> > If PM is disabled, the runtime_auto flag is not available, but auto
> > suspend is not enabled anyway.  Hence protect the check for
> > runtime_auto
> > by #ifdef CONFIG_PM to fix this.
> > 
> > Fixes: a59f8c5b048dc938 ("lan78xx: add ndo_get_stats64")
> > Reported-by: Guenter Roeck <li...@roeck-us.net>
> > Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org>
> > ---
> > Alternatively, we can add a dev_pm_runtime_auto_is_enabled() wrapper
> > to
> > include/linux/pm.h, which always return false if CONFIG_PM is
> > disabled.
> 
> That is what we do for almost everything else in include/pm_runtime.h
> So it is the best solution to add the stub.

Guenter's question about whether drivers should try to access
runtime_auto in the first place was a good one.  A similar problem
arises in the block layer: When a block device has removable media, the
block layer probes at 1-second intervals looking for media changes.  
This defeats autosuspend in the same way as we're talking about here.

One possible solution is to export a sysfs parameter to prevent 
statistics collection (or more generally, to change the interval at 
which it occurs).

But checking the runtime_auto flag is probably not a great idea.  Even
if it isn't set, collecting statistics is likely to wake up a device
that otherwise would have remained suspended.

Perhaps the best solution is to collect the statistics only when the 
device is not suspended or is about to suspend.

Alan Stern



Re: [PATCH v2] r8152: fix lockup when runtime PM is enabled

2015-12-24 Thread Alan Stern
On Thu, 24 Dec 2015, Oliver Neukum wrote:

> On Thu, 2015-12-24 at 10:14 -0500, Alan Stern wrote:
> > On Thu, 24 Dec 2015, Oliver Neukum wrote:
> > 
> > > On Wed, 2015-12-23 at 20:32 -0500, Alan Stern wrote:
> 
> > > But you cannot keep that setting if the system goes down
> > > or any broadcast packet would resume the whole system.
> > > Yet you cannot just disable remote wake up, as WoL packets
> > > still must trigger a remote wake up.
> > 
> > This means that sometimes you want to avoid losing packets and other 
> > times you do want to lose packets.  That is a policy decision, and 
> > therefore it should be made by the user, not the kernel.
> 
> Indeed it is and there is a tool for this with a defined
> interface called "ethtool"

No; ethtool affects the wakeup setting for system suspend, but not
for runtime suspend.  I was referring to something that would specify 
the setting for both cases.  But perhaps that doesn't make sense, 
because you never want to drop relevant packets during runtime suspend.  
If you did, you would run "ifconfig down" instead.

> > the PM core aren't adequate for what the driver needs.  The PM core 
> > distinguishes between wakeup enabled or disabled; it doesn't 
> > distinguish among different levels of wakeup.
> 
> True and sanely it cannot. We could only distinguish between drivers
> which need their devices to be resumed before the system suspends and
> the rest.
> Or we tell driver coders to use the notifier chains.

"Resume before system suspend" sounds like a reasonable thing to
implement, for devices that have multiple levels of wakeup settings.  
Would you like to post a proposal on linux-pm for this?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] r8152: fix lockup when runtime PM is enabled

2015-12-24 Thread Alan Stern
On Thu, 24 Dec 2015, Oliver Neukum wrote:

> On Wed, 2015-12-23 at 20:32 -0500, Alan Stern wrote:
> 
> > I don't understand why the wakeup conditions are different.  It seems
> > to me that the choice of which packets will generate a wakeup ought to
> > depend on the user's selection, not on the kind of suspend.  For
> > instance, if the user says that only a magic packet should cause a
> > wakeup then that should be true for both runtime suspend and system
> > suspend.
> > 
> > To put it another way, as far as the device is concerned a suspend is
> > just a suspend -- there's no difference between a runtime suspend and a
> > system suspend.
> 
> This literally true, but the host and the driver care.
> If we autosuspend a running network device, any packet
> (maybe filtered for MAC) should cause a remote wake up,
> else we'd lose packets.

That's also true during system suspend.

> But you cannot keep that setting if the system goes down
> or any broadcast packet would resume the whole system.
> Yet you cannot just disable remote wake up, as WoL packets
> still must trigger a remote wake up.

This means that sometimes you want to avoid losing packets and other 
times you do want to lose packets.  That is a policy decision, and 
therefore it should be made by the user, not the kernel.

> So there are drivers which must change settings on devices
> as the system goes to sleep, even if their devices have
> already been autosuspended. We could use the notifier chains
> for that. But can this solution be called elegant?

Instead of the driver trying to do this automatically, you could rely 
on userspace telling the driver which packets should cause a wakeup.  
The setting could be updated immediately before and after each system 
suspend.

I admit this is more awkward than having the driver make a choice based 
on the type of suspend.  This is a case where the resources provided by 
the PM core aren't adequate for what the driver needs.  The PM core 
distinguishes between wakeup enabled or disabled; it doesn't 
distinguish among different levels of wakeup.

Alan Stern



RE: [PATCH v2] r8152: fix lockup when runtime PM is enabled

2015-12-23 Thread Alan Stern
On Wed, 23 Dec 2015, Hayes Wang wrote:

> Oliver Neukum [mailto:oneu...@suse.de]
> > Sent: Wednesday, December 23, 2015 4:20 PM
> [...]
> > No, step (2) does not exist. Calls to suspend() and [reset_]resume()
> > always balance. Usually a driver shouldn't care about system suspend.
> > The way the driver is currently coded will also fail for Port-Power Off.
> 
> It is different with Windows. The Windows would resume the device before
> system suspend, if the system suspend follows the autosuspend.
> 
> Would this be a problem? After system suspend, the device may wake up
> the system when receiving any packet, not only magic packet. The wake
> events are different for system suspend and autosuspend. However, I
> couldn't change the wake event, because the autosuspend occurs first,
> and the suspend() is only called once.

I don't understand why the wakeup conditions are different.  It seems
to me that the choice of which packets will generate a wakeup ought to
depend on the user's selection, not on the kind of suspend.  For
instance, if the user says that only a magic packet should cause a
wakeup then that should be true for both runtime suspend and system
suspend.

To put it another way, as far as the device is concerned a suspend is
just a suspend -- there's no difference between a runtime suspend and a
system suspend.

Alan Stern



Re: [PATCH] Adding Reset resume support for CDC-EEM driver.

2015-12-21 Thread Alan Stern
On Mon, 21 Dec 2015, Vikas Bansal wrote:

> Pre-Condition
> At the time of reset-resume of a USB device, both self- and bus-powered 
> devices might go to a low-power state or power off entirely, depending on 
> the acceptable suspend-time power of the system.
> In case the device experiences a power glitch or completely powers off during 
> suspend-resume, it will lose its internal state and hence will again 
> need a Set-Interface from the class driver on the reset-resume operation.
> 
> Issue
> So far our experiments were based on a USB gadget using the cdc_eem 
> protocol.
> We have seen that the device is unable to continue packet transfers on bulk 
> endpoints after the reset-resume operation.
> 
> Solution
> We have added a .reset_resume function for the cdc_eem protocol which sends 
> a Set-Interface command on the control endpoint and then calls the existing 
> usbnet_resume.

You know, the USB core already issues a Set-Interface request on the
control endpoint whenever a reset-resume occurs (unless the interface
was using altsetting 0 beforehand).  Issuing another Set-Interface
shouldn't make any difference.

Alan Stern



Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Alan Stern
On Mon, 24 Aug 2015, David Miller wrote:

 From: Eugene Shatokhin eugene.shatok...@rosalab.ru
 Date: Wed, 19 Aug 2015 14:59:01 +0300
 
  So the following might be possible, although unlikely:
  
  CPU0 CPU1
   clear_bit: read dev->flags
   clear_bit: clear EVENT_RX_KILL in the read value
  
  dev->flags = 0;
  
   clear_bit: write updated dev->flags
  
  As a result, dev->flags may become non-zero again.
 
 Is this really possible?
 
 Stores really are atomic in the sense that they do their update
 in one indivisible operation.

Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to 
call it now.

 Atomic operations like clear_bit also will behave that way.

Are you certain about that?  I couldn't find any mention of it in
Documentation/atomic_ops.txt.

In theory, an architecture could implement atomic bit operations using 
a spinlock to ensure atomicity.  I don't know if any architectures do 
this, but if they do then the scenario above could arise.

Alan Stern



Re: [PATCH] usbnet: Fix two races between usbnet_stop() and the BH

2015-08-24 Thread Alan Stern
On Mon, 24 Aug 2015, Alan Stern wrote:

 On Mon, 24 Aug 2015, David Miller wrote:
 
  From: Eugene Shatokhin eugene.shatok...@rosalab.ru
  Date: Wed, 19 Aug 2015 14:59:01 +0300
  
    So the following might be possible, although unlikely:
    
    CPU0 CPU1
 clear_bit: read dev->flags
 clear_bit: clear EVENT_RX_KILL in the read value
    
    dev->flags = 0;
    
 clear_bit: write updated dev->flags
    
    As a result, dev->flags may become non-zero again.
   
   Is this really possible?
   
   Stores really are atomic in the sense that they do their update
   in one indivisible operation.
 
 Provided you use ACCESS_ONCE or WRITE_ONCE or whatever people like to 
 call it now.
 
  Atomic operations like clear_bit also will behave that way.
 
 Are you certain about that?  I couldn't find any mention of it in
 Documentation/atomic_ops.txt.
 
 In theory, an architecture could implement atomic bit operations using 
 a spinlock to ensure atomicity.  I don't know if any architectures do 
 this, but if they do then the scenario above could arise.

Now that I see this in writing, I realize it's not possible after all.  
clear_bit() et al. will work with a single unsigned long, which doesn't
leave any place for spinlocks or other mechanisms.  I was thinking of 
atomic_t.

So never mind...

Alan Stern



Re: [PATCH] net: fec: fix initial runtime PM refcount

2015-08-04 Thread Alan Stern
On Tue, 4 Aug 2015, Lucas Stach wrote:

 Am Montag, den 03.08.2015, 14:28 -0400 schrieb Alan Stern:
  On Mon, 3 Aug 2015, Uwe [iso-8859-1] Kleine-Knig wrote:
  
   Hello,
   
   I have no clue about runtime-pm, but I added a few people to Cc: who
   should know better ...
   
   Best regards
   Uwe
   
   On Mon, Aug 03, 2015 at 06:15:54PM +0200, Andrew Lunn wrote:
On Mon, Aug 03, 2015 at 05:50:11PM +0200, Lucas Stach wrote:
 The clocks are initially active and thus the device is marked active.
 This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend()
 call at the end of probe then leaves us with an invalid refcount of -1,
 which in turn leads to the device staying in suspended state even though
 netdev open had been called.
 
 Fix this by initializing the refcount to be coherent with the initial
 device status.
 
 Fixes:
 8fff755e9f8 (net: fec: Ensure clocks are enabled while using mdio bus)
 
 Signed-off-by: Lucas Stach l.st...@pengutronix.de
 ---
 Please apply this as a fix for 4.2
 ---
  drivers/net/ethernet/freescale/fec_main.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/drivers/net/ethernet/freescale/fec_main.c 
 b/drivers/net/ethernet/freescale/fec_main.c
 index 32e3807c650e..271bb5862346 100644
 --- a/drivers/net/ethernet/freescale/fec_main.c
 +++ b/drivers/net/ethernet/freescale/fec_main.c
 @@ -3433,6 +3433,7 @@ fec_probe(struct platform_device *pdev)
  
   pm_runtime_set_autosuspend_delay(&pdev->dev, FEC_MDIO_PM_TIMEOUT);
   pm_runtime_use_autosuspend(&pdev->dev);
 + pm_runtime_get_noresume(&pdev->dev);
   pm_runtime_set_active(&pdev->dev);
   pm_runtime_enable(&pdev->dev);

This might work, but is it the correct fix?
  
  It looks reasonable to me.  It might also make sense to move all of
  that pm_runtime_* stuff to the end of the probe routine.  Notice that
  they don't get undone if register_netdev() fails.
  
 Unfortunately we can not move RPM enabling to the end of probe, as the
 MDIO read/write functions that rely on RPM are already called while we
 are still in the middle of the probe function.

In that case, adding a call pm_runtime_get_noresume() is the right 
thing to do.

 I agree that we need better error handling here, but that comment
 applies to the whole FEC probe function. I think that this might be
 invasive enough to justify a delay to the next merge window, not really
 material for the late RCs.

That's entirely up to you, of course.

Alan Stern



Re: [PATCH] net: fec: fix initial runtime PM refcount

2015-08-04 Thread Alan Stern
On Tue, 4 Aug 2015, Uwe Kleine-König wrote:

 Hello,
 
 On Tue, Aug 04, 2015 at 10:20:55AM -0400, Alan Stern wrote:
  In that case, adding a call pm_runtime_get_noresume() is the right 
  thing to do.
 Is this an ack for Lucas' patch?

Yes:

Acked-by: Alan Stern st...@rowland.harvard.edu

Alan Stern



Re: [PATCH] net: fec: fix initial runtime PM refcount

2015-08-03 Thread Alan Stern
On Mon, 3 Aug 2015, Uwe Kleine-König wrote:

 Hello,
 
 I have no clue about runtime-pm, but I added a few people to Cc: who
 should know better ...
 
 Best regards
 Uwe
 
 On Mon, Aug 03, 2015 at 06:15:54PM +0200, Andrew Lunn wrote:
  On Mon, Aug 03, 2015 at 05:50:11PM +0200, Lucas Stach wrote:
   The clocks are initially active and thus the device is marked active.
   This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend()
   call at the end of probe then leaves us with an invalid refcount of -1,
   which in turn leads to the device staying in suspended state even though
   netdev open had been called.
   
   Fix this by initializing the refcount to be coherent with the initial
   device status.
   
   Fixes:
   8fff755e9f8 (net: fec: Ensure clocks are enabled while using mdio bus)
   
   Signed-off-by: Lucas Stach l.st...@pengutronix.de
   ---
   Please apply this as a fix for 4.2
   ---
drivers/net/ethernet/freescale/fec_main.c | 1 +
1 file changed, 1 insertion(+)
   
   diff --git a/drivers/net/ethernet/freescale/fec_main.c 
   b/drivers/net/ethernet/freescale/fec_main.c
   index 32e3807c650e..271bb5862346 100644
   --- a/drivers/net/ethernet/freescale/fec_main.c
   +++ b/drivers/net/ethernet/freescale/fec_main.c
   @@ -3433,6 +3433,7 @@ fec_probe(struct platform_device *pdev)

  pm_runtime_set_autosuspend_delay(&pdev->dev, FEC_MDIO_PM_TIMEOUT);
  pm_runtime_use_autosuspend(&pdev->dev);
    + pm_runtime_get_noresume(&pdev->dev);
  pm_runtime_set_active(&pdev->dev);
  pm_runtime_enable(&pdev->dev);
  
  This might work, but is it the correct fix?

It looks reasonable to me.  It might also make sense to move all of
that pm_runtime_* stuff to the end of the probe routine.  Notice that
they don't get undone if register_netdev() fails.

  Documentation/power/runtime_pm.txt says:
  
  534 In addition to that, the initial runtime PM status of all devices is
  535 'suspended', but it need not reflect the actual physical state of the 
  device.
  536 Thus, if the device is initially active (i.e. it is able to process 
  I/O), its
  537 runtime PM status must be changed to 'active', with the help of
  538 pm_runtime_set_active(), before pm_runtime_enable() is called for the 
  device.
  
  At the point we call the pm_runtime_ functions above, all the clocks
  are ticking. So according to the documentation pm_runtime_set_active()
  is the right thing to do. But it makes no mention of have to call
  pm_runtime_get_noresume(). I would have expected pm_runtime_set_active()
  to set the count to the correct value.

pm_runtime_set_active() doesn't change the usage count.  All it does is 
set the runtime PM status to active.

A call to pm_runtime_get_noresume() (or something similar) is necessary
to balance the call to pm_runtime_put_autosuspend() at the end of the
probe routine.  Both the _get_ and the _put_ should be present or
neither should be.

For instance, an alternative way to accomplish the same result is to
replace pm_runtime_put_autosuspend() with pm_runtime_autosuspend().  
The only difference is that the usage counter would not be elevated
during the register_netdev() call, so in theory the device could be
suspended while that routine is running.  But if all the pm_runtime_*
calls were moved to the end of the probe function, even that couldn't
happen.

Alan Stern



Re: [RESEND][PATCH] Bluetooth: Make request workqueue freezable

2015-05-21 Thread Alan Stern
On Thu, 21 May 2015, Marcel Holtmann wrote:

 Hi Alan,
 
  Then avoiding the failed firmware is no solution, indeed.
  If it's a new probe, it should be never executed during resume.
  
  Can you expand this comment?  What's wrong with probing during resume?
  
  The USB stack does carry out probes during resume under certain
  circumstances.  A driver lacking a reset_resume callback is one of
  those circumstances.
 
 in case the platform kills the power to the USB lines, we can never
 do anything about this. I do not want to hack around this in the
 driver.
 
 What are the cases where we should implement reset_resume and would
 it really help here. Since the btusb.ko driver implements
 suspend/resume support, would reset_resume ever be called?

One of those cases is exactly what you have been talking about: when
the platform kills power to the USB lines during suspend.  The driver's
reset_resume routine will be called during resume, as opposed to the
probe routine being called.  Therefore the driver will be able to tell
that this is not a new device instance.

The other cases are less likely to occur: a device is unable to resume 
normally and requires a reset before it will start working again, or 
something else goes wrong along those lines.

 However I get the feeling someone needs to go back and see if the
 device is the same one and just gets probed again or if it is a new
 one from the USB host stack perspective.

That can be done easily enough by enabling usbcore debugging before 
carrying out the system suspend:

echo 'module usbcore =p' > /debug/dynamic_debug/control

The debugging information in the kernel log will tell just what 
happened.


On Thu, 21 May 2015, Takashi Iwai wrote:

 At Thu, 21 May 2015 10:18:08 -0400 (EDT),
 Alan Stern wrote:
  
  On Thu, 21 May 2015, Takashi Iwai wrote:
  
   Then avoiding the failed firmware is no solution, indeed.
   If it's a new probe, it should be never executed during resume.
  
  Can you expand this comment?  What's wrong with probing during resume?
 
 Well, if the probe requires the access to a user-space file, it can't
 be done during resume.  That's the very problem we're seeing now.
 The firmware loader can't help much alone if it's a new device
 object.

But the same thing happens during early boot, if the driver is built 
into the kernel.  When the probe occurs, userspace isn't up and running 
yet, so the firmware loader can't do anything.

Why should probe during resume be any worse than probe during early 
boot?

  The USB stack does carry out probes during resume under certain
  circumstances.  A driver lacking a reset_resume callback is one of
  those circumstances.
 
 So, having a proper reset_resume in btusb would help in the end?

It might, depending on how the driver is written.  I don't know enough 
about the details of btusb to say.

Alan Stern



Re: [RESEND][PATCH] Bluetooth: Make request workqueue freezable

2015-05-21 Thread Alan Stern
On Thu, 21 May 2015, Takashi Iwai wrote:

 Then avoiding the failed firmware is no solution, indeed.
 If it's a new probe, it should be never executed during resume.

Can you expand this comment?  What's wrong with probing during resume?

The USB stack does carry out probes during resume under certain
circumstances.  A driver lacking a reset_resume callback is one of
those circumstances.

Alan Stern



Re: [RESEND][PATCH] Bluetooth: Make request workqueue freezable

2015-05-21 Thread Alan Stern
On Thu, 21 May 2015, Takashi Iwai wrote:

 At Thu, 21 May 2015 11:26:17 -0400 (EDT),
 Alan Stern wrote:
  
  On Thu, 21 May 2015, Takashi Iwai wrote:
  
   At Thu, 21 May 2015 10:18:08 -0400 (EDT),
   Alan Stern wrote:

On Thu, 21 May 2015, Takashi Iwai wrote:

 Then avoiding the failed firmware is no solution, indeed.
 If it's a new probe, it should be never executed during resume.

Can you expand this comment?  What's wrong with probing during resume?
   
   Well, if the probe requires the access to a user-space file, it can't
   be done during resume.  That's the very problem we're seeing now.
   The firmware loader can't help much alone if it's a new device
   object.
  
  But the same thing happens during early boot, if the driver is built 
  into the kernel.  When the probe occurs, userspace isn't up and running 
  yet, so the firmware loader can't do anything.
  
  Why should probe during resume be any worse than probe during early 
  boot?
 
 The early boot has initrd, so the files can be there.  But the resume
 has no way to fetch the file except for cached data.

I suppose USB could delay re-probing until userspace is running again,
if we knew when that was.  But it would be awkward and prone to races.  
It also would leave a user-visible window of time during which the 
device does not exist, which we want to avoid.  (This may not matter 
for bluetooth, but it does matter for other kinds of devices.)

I would prefer to solve this problem in a different way, if possible.

Alan Stern



Re: [usb regression] Re: [PATCH 2.6.24-rc3] Fix /proc/net breakage

2008-01-02 Thread Alan Stern
On Tue, 1 Jan 2008, Greg KH wrote:

 On Mon, Dec 31, 2007 at 11:26:43AM -0800, Greg KH wrote:
  On Mon, Dec 31, 2007 at 12:49:52PM -0500, Alan Stern wrote:
   On Sun, 30 Dec 2007, Greg KH wrote:
   
 It looks like Greg misused the debugfs API -- which is ironic, because
 he wrote debugfs in the first place!  :-)

 Ok, no, I didn't write that patch, I'm getting very confused here.
 
 In 2.6.24-rc6 there is no usage of debugfs in the ohci driver.
 
 In the -mm tree there is a patch, from Tony Jones, that moves some debug
 code out of sysfs and into debugfs where it belongs.  It does it for
 both the ehci and ohci USB host controller drivers, and this is the code
 that is incorrect if CONFIG_DEBUGFS is not enabled.

My mistake; I got the impression you had written that new code rather
than Tony.  BTW, I don't recall ever seeing Tony's patch announced on
linux-usb or linux-usb-devel.  Did I simply miss it?

 So, for the 2.6.24 release, nothing needs to be changed, all is good,
 and there is no regression.
 
 Right?  Or am I still confused about this whole thing?

Correct.  The problem exists only in -mm and your development tree.

 I will go fix up Tony's patches in the -mm tree that do not handle the
 error return values from debugfs properly, but that is not such a rush,
 as Tony is on vacation for a few weeks, and those patches are going to
 be going in only after 2.6.24 is out.

The fix I posted earlier in this thread should simply be merged into 
Tony's patches.

Alan Stern



Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-27 Thread Alan Stern
On Fri, 26 Oct 2007, Maxim Levitsky wrote:

   Looking through the dmfe code, I noticed yet another possible race.
   A race between the .suspend, and a timer that serves both as a watchdog, 
   and link state detector.
   Again I need to prevent it from running during the suspend/resume, but 
   how?
   
   I can use del_timer in .suspend, and mod_timer in .resume, but that 
   doesn't protect against
   race with already running timer.
   I can use del_timer_sync, but it states that it is useless if timer 
   re-enables itself, and I agree with that.
   In dmfe case the timer does re-enable itself.
  
  That comment isn't right.  del_timer_sync works perfectly well even if
  the timer routine re-enables itself, provided it stops doing so after a
  small number of iterations.
 Thanks for the info. but
 Due to the don't access the hardware, while powered-off rule, I must know 
 that the timer isn't running.
 and won't be.
 So what function to use (if possible) to be sure that the timer won't run 
 anymore?
 (Taking in the account the fact that it re-enables itself)

Use del_timer_sync().  It guarantees that when it returns, the timer 
will be stopped and the timer routine will no longer be running on any 
CPU.

Alan Stern



Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-27 Thread Alan Stern
On Sat, 27 Oct 2007, Maxim Levitsky wrote:

  Use del_timer_sync().  It guarantees that when it returns, the timer 
  will be stopped and the timer routine will no longer be running on any 
  CPU.
  
 Even if the timer re-enables itself, are you sure?

Last time I looked at the source code, that's what it did.  I'll look
again...  Yep, it still does.  It checks to see if the timer routine is
currently running; if so then it waits a little while and tries again.

Alan Stern



Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-25 Thread Alan Stern
On Thu, 25 Oct 2007, Maxim Levitsky wrote:

 Hi,
 
 Recently, trying to fix saa7134 suspend/resume problems, I found that there 
 is a race between the IRQ handler and .suspend, and that I can't let the 
 driver access the device while it's in D3 since it can lock up some systems.
 
 Now I am looking to fix those issues in two drivers that have my 
 .suspend/.resume routines.
 the saa7134 capture chip and dmfe, the davicom network driver.
 
 Looking through the dmfe code, I noticed yet another possible race.
 A race between the .suspend, and a timer that serves both as a watchdog, and 
 link state detector.
 Again I need to prevent it from running during the suspend/resume, but how?
 
 I can use del_timer in .suspend, and mod_timer in .resume, but that doesn't 
 protect against
 race with already running timer.
 I can use del_timer_sync, but it states that it is useless if timer 
 re-enables itself, and I agree with that.
 In dmfe case the timer does re-enable itself.

That comment isn't right.  del_timer_sync works perfectly well even if
the timer routine re-enables itself, provided it stops doing so after a
small number of iterations.

 I can put checks in the timer for ->insuspend, and not re-enable it if set,
 but that opens a new can of worms with memory barriers, etc...

You don't have to worry about any of that stuff.  Just check the 
insuspend flag and don't re-enable the timer if it is set.  Even 
without any memory barriers, the timer routine won't iterate more than 
once or twice.

Alan Stern



Questions about IPsec and Netfilter

2007-05-10 Thread Alan Stern
I've got a few questions about the relationship between the IPsec 
implementation and Netfilter.

Q1: At what points during packet processing do the IPsec transformations 
occur?  In particular, which netfilter hooks do they come before and 
after?  And likewise, which routing operations do they come before and 
after?

Q2: When a packet using IPsec tunnel mode is encapsulated or 
de-encapsulated, does the newly-formed packet return to some earlier point 
in the stack for further netfilter processing or routing?  What about 
transport mode?

Q3: How can iptables rules determine whether they are dealing with a 
packet which has been de-encapsulated from (or encapsulated within) an 
IPsec wrapper?

Q4: Is it true that NAT-Traversal isn't implemented for transport mode?

In RFC 2401 (Security Architecture for the Internet Protocol), section 5
includes this text:

   As mentioned in Section 4.4.1 The Security Policy Database (SPD),
   the SPD must be consulted during the processing of all traffic
   (INBOUND and OUTBOUND), including non-IPsec traffic.  If no policy is
   found in the SPD that matches the packet (for either inbound or
   outbound traffic), the packet MUST be discarded.

But on Linux systems the SPD is empty by default (as shown by
setkey -DP) and all packets are allowed to pass unhindered.

Q5: Isn't this a violation of the RFC?  Or is there some implicit policy 
entry which accepts all packets without applying any security association?


Thanks for any answers.  I may think up more questions later...

Alan Stern



Runtime power management for network interfaces

2006-07-25 Thread Alan Stern
During a Power Management session at the Ottawa Linux Symposium, it was
generally agreed that network interface drivers ought to automatically
suspend their devices (if possible) whenever:

(1) The interface is ifconfig'ed down, or

(2) No link is available.

Presumably (1) should be easy enough to implement.  (2) might or might not
be feasible, depending on how much WOL support is available.  (It might
not be feasible at all for wireless networking.)  Still, there can be no
question that it would be a Good Thing for laptops to power-down their
ethernet controllers when the network cable is unplugged.

Has any progress been made in this direction?  If not, a natural approach 
would be to start with a reference implementation in one driver which 
could then be copied to other drivers.

Any suggestions?

Alan Stern
