Re: What is 2.4 Linux networking performance like compared to BSD?

2001-02-28 Thread Todd

hans,

we've found that the TCP and UDP performance on 2.4 is *dramatically*
better than 2.2.  with the acenic gig-e driver on PIII-933 UP (66MHz x
64bits PCI) we are getting 993 Mb/s with 2.4.0 with jumbo frames (about
850 Mb/s with standard ethernet frames).  the best number we got with 2.2
was about 650 with jumbos and 550 with standard.

i'd recommend it's networking performance to anyone.

todd underwood
[EMAIL PROTECTED]

On Thu, 1 Mar 2001, Hans Reiser wrote:

> Date: Thu, 01 Mar 2001 02:26:20 +0300
> From: Hans Reiser <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: What is 2.4 Linux networking performance like compared to BSD?
>
> I have a client that wants to implement a webcache, but is very leery of
> implementing it on Linux rather than BSD.
>
> They know that iMimic's polymix performance on Linux 2.2.* is half what it is on
> BSD.  Has the Linux 2.4 networking code caught up to BSD?
>
> Can I tell them not to worry about the Linux networking code strangling their
> webcache product's performance, or not?
>
> Hans
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



UDP performance drop from -test9 to 2.4.0 (fwd)

2001-01-08 Thread Todd

sorry for the resend. i sent this earlier today but still haven't seen it,
so i'm resending without attachments.

the originally attached netperf results are at:

http://www.unm.edu/~todd/udp.2.4.0-test9.9000mtu
http://www.unm.edu/~todd/udp.2.4.0-test9.1500mtu
http://www.unm.edu/~todd/udp.2.4.0.9000mtu
http://www.unm.edu/~todd/udp.2.4.0.1500mtu

-- Forwarded message --
Date: Mon, 8 Jan 2001 09:38:24 -0700 (MST)
From: Todd <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: UDP performance drop from -test9 to 2.4.0

folx,

i'm seeing a signficant performance decline between the -test9 and the
release version 2.4.0 on udp on acenic gig-e cards.  i should say that
the performance is startlingly better than it was under 2.2.14 (where the
best we could get was 150+ Mbps on udp) but it still represents a
signficant decline.

i'm attaching netperf numbers with 1500 and 9000 byte mtus for both kernel
versions (with interrupt coalescing and everything else as default--this
is just a compile/boot/test scenario with no tuning in particular).
standard kernel, standard /proc settings, standard acenic firmware and
driver that shipped with the version in question.

the summary of results:  12% reduction in performance at 1500-byte mtu, 6%
reduction at 9000.  when looking at the results remember that the second
line in each case is the 'receive bandwith' line, which is the only useful
number to report.

kernel version  max udp bandwithmtu message size
==  === 
2.4.0-test9 554.9   15001470
2.4.0-test9 636.7   90008870
2.4.0   489.4   15001470
2.4.0   597.3   90007270

both versions show the horrible effect that the reassembly of fragments
has on receive bandwith, but that's not really my question for the day
:-).

any info would be helpful.  i'd be happy to run any further tests to get
at the bottom of this.

todd

=
Todd Underwood, [EMAIL PROTECTED]
=


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



hw or other prob?

2000-11-16 Thread Todd

folx,

on 2.4.0-test6 i got a bunch of the following log messages and then the
console went dead (well, power-saving monitor and no amount of activity
on the keyboard or mouse made it wake up), but the machine still responded
to pings (could not connect w/ telnet or ssh or rlogin). 

does this look like a genuine hw problem or should i upgrade to a newer
2.4.0 test kernel?  i'm happy to do any troubleshooting suggested.

log messages (lots and lots of them.  they all look just like this):

Nov 12 06:43:30 zapata kernel: APIC error interrupt on CPU#0, should never
happen. 
Nov 12 06:43:30 zapata kernel: ... APIC ESR0: 0002 
Nov 12 06:43:30 zapata kernel: ... APIC ESR1: 0002 
Nov 12 06:43:30 zapata kernel: ... bit 1: APIC Receive CS Error (hw
problem). 
Nov 12 06:43:30 zapata kernel: APIC error interrupt on CPU#1, should never
happen. 
Nov 12 06:43:30 zapata kernel: ... APIC ESR0: 0002 
Nov 12 06:43:30 zapata kernel: ... APIC ESR1: 0002 
Nov 12 06:43:30 zapata kernel: ... bit 1: APIC Receive CS Error (hw
problem). 
Nov 12 07:43:18 zapata kernel: APIC error interrupt on CPU#1, should never
happen. 
Nov 12 07:43:18 zapata kernel: ... APIC ESR0: 0002 
Nov 12 07:43:18 zapata kernel: ... APIC ESR1: 0002 
Nov 12 07:43:18 zapata kernel: ... bit 1: APIC Receive CS Error (hw
problem). 
Nov 12 07:43:18 zapata kernel: APIC error interrupt on CPU#0, should never
happen. 

machine is smp pIII450 running on a intel N440bx mb.  main disk is ide,
but i'm using sym53c8xx to access two scsi disks in a logical volume.

thanks,


todd underwood
[EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: zero-copy TCP

2000-09-12 Thread Todd

> > Jes Sorensen wrote:
> > > It took me a little while in the beginning to convince Alteon to open
> > > up and provide docs, but since they saw the light they have been
> > > extremely helpful and went much further in their openness than I had
> > > ever expected or dared to hope for.
> > 
> > And now it's really showing in their favour.  An amazing number of
> > research groups are writing applications to run on the Alteon cards.

While I agree with what's going on right now, the recent purchase of
Alteon by Nortel (primarily for their switch line, not for the
NICs) leaves quite a bit of doubt in my mind about the future of the card
and the openness of the firmware in particular.

todd

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [ANNOUNCE] Dolphin PCI-SCI RPM Drivers 1.1-4 released

2001-01-29 Thread Todd

folx,

i must be missing something here.  i'm not aware of a PCI bus that only
supports 70 MBps but i am probably ignorant.  this is why i was confused
by jeff's performance numbers.  33MHz 32-bit PCI busses should do around
120MB/s (just do the math 33*32/8 allowing for some overhead of PCI bus
negotiation), much greater than the numbers jeff is reporting.  66 MHz
64bit busses should do on the order of 500MB/s.

the performance numbers that jeff is reporting are not very impressive
even for the slowest PCI bus.  we're seeing 993 Mbps (124MB/s) using the
alteon acenic gig-e cards on 32-bit cards on a 66MHz bus.  i would expect
to get somewhat slower on a 33MHz bus but not catastrophically so
(certainly nothing as slow as 60MB/s or 480Mb/s).

what am i misunderstanding here?

todd

On Mon, 29 Jan 2001, Jeff V. Merkey wrote:

> Date: Mon, 29 Jan 2001 16:49:53 -0700
> From: Jeff V. Merkey <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: [ANNOUNCE] Dolphin PCI-SCI RPM Drivers 1.1-4 released
>
>
> Relative to some performance questions folks have asked, the SCI
> adapters are limited by PCI bus speeds.  If your system supports
> 64-bit PCI you get much higher numbers.  If you have a system
> that supports 100+ Megabyte/second PCI throughput, the SCI
> adapters will exploit it.
>
> This test was performed in on a 32-bit PCI system with a PCI bus
> architecture that's limited to 70 MB/S.
>
> Jeff
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/
>

=
Todd Underwood, [EMAIL PROTECTED]

criticaltv.com
news, analysis and criticism.  about tv.
and other stuff.

=

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [ANNOUNCE] Dolphin PCI-SCI RPM Drivers 1.1-4 released

2001-01-30 Thread Todd

folx,

On Tue, 30 Jan 2001, Jeff V. Merkey wrote:
> What numbers does G-Enet provide
> doing userspace -> userspace transfers, and at what processor
> overhead?

using stock 2.4 kernel and alteon acenic cards with stock firmware we're
seeing 993 MBps userspace->userspace (running netperf UDP_STREAM tests,
which run as userspace client and server) with 88% CPU utilization.

Using a modified version of the firmware that we wrote we're getting
993Mbps with 55% CPU utilization.

> I posted the **ACCURATE** numbers from my test, but I did clarify that I
> was using a system with a limp PCI bus.
>
> Jeff

i appreciate that.  i'm just trying to figure out why the numbers are so
low compared to the network speed you mentioned.

todd

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [ANNOUNCE] Dolphin PCI-SCI RPM Drivers 1.1-4 released

2001-01-30 Thread Todd

jeff,

On Tue, 30 Jan 2001, Jeff V. Merkey wrote:

> On 32 bit PCI, the average we are seeing going userpace -> userspace is
> 120-140 MB/S ranges in those systems that have a PCI bus with
> bridge chipsets that can support these data rates.
>
> That's 2 x G-Enet.

good numbers.  not really 2 x Gig-e, though, is it.  we're getting 993Mbps
(124Mb/s) on alteon acenic gig-e adapters on 32bit/66MHz pci machines.
i'm not saying that the nubmers aren't good or that the technology doesn't
sound promising (SCI is definitely cool stuff).  I'm just trying to put
the numbers in perspective and show that the targets should be higher.

i like your argument about cpu utilization. that's really the key.
anything that moves significant traffic and can reduce cpu utilization
will help enormously.  unfortunately, you've not posted any cpu numbers.

this is analogous to transmeta's complaint about battery life and
processor speed:  no one benchmarks both metrics at the same time.  if
they did, intel chips would not look nearly as good as they do.
similarly, most people are still not posting bandwidth numbers along with
cpu utilization numbers.  if they did, lots of fast networks would be less
appealing.

todd

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Website question

2005-02-11 Thread Todd
Dear Linux Kernel,

   We have visited your website spinics.net today , and would like to 
request a link exchange between our websites.

You may visit our website at barretire.com. Our website contains 
quality automotive content and tools that visitors of your website will 

find useful.

Please reply if you are interested in exchanging links with us, and we 
will post a link to your website immediately.

Here is the information you will need to create a link to our website:

Title: Wheels & Tires shipped to your door 
Link: http://www.barretire.com

Desc: Your internet wheel & tire source, Specializing in American 
Racing wheels, Motegi Wheels, Motto wheels, Riax wheels, Nitto tires, 
BF Goodrich tires, Yokohama tires & more. We also offer package deals 
shipped directly to your door.



We look forward to partnering together.


Best regards,

Todd Hebert
LA Wheels Direct

   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-11-26 Thread Fujinaka, Todd
On Tue, 20 Nov 2012, Joe Jin wrote:

> On 11/20/12 16:59, Dave, Tushar N wrote:
>> Have you power off the system completely after modifying eeprom? If not 
>> please do so.
>
> Hi Tushar,
>
> Seems not works for me, would you please help to check what is wrong of my 
> operations?

...

> # lspci -s :52:00.1 -vvv
> 52:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet 
> Controller (rev 06)
> <--snip-->
>   Capabilities: [e0] Express (v1) Endpoint, MSI 00
>   DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, 
> L1 <64us

>   ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>   DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ 
> Unsupported+
>   RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>   MaxPayload 128 bytes, MaxReadReq 4096 bytes
>   ^
>
 <--snip-->

If you look at the previous section, DevCap, you'll see that it's
correctly advertising 256 bytes but the system is negotiating 128 for
the link to the Ethernet controller. Things on the "other" side of the
link are controlled outside of the e1000 driver.

Tushar's first suggestion was to check the PCIe payload settings in the
entire chain. Have you done that? Mismatches will cause hangs.

Todd Fujinaka
Technical Marketing Engineer
LAN Access Division (LAD)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-11-27 Thread Fujinaka, Todd
Forgive me if I'm being too repetitious as I think some of this has been 
mentioned in the past.

We (and by we I mean the Ethernet part and driver) can only change the 
advertised availability of a larger MaxPayloadSize. The size is negotiated by 
both sides of the link when the link is established. The driver should not 
change the size of the link as it would be poking at registers outside of its 
scope and is controlled by the upstream bridge (not us).

You also need to check all the PCIe links to get to the device. There can be 
several to get from the root complex, through bridges, to the endpoint Ethernet 
controller. The Ethernet part and driver has no control over any other links. 
You'll have to talk to the motherboard manufacturer about those links.

Your original problem appears to be hangs and Tushar asked you to the entire 
path of PCIe connections from the root complex to the endpoint. Any mismatches 
in payload can cause hangs and I believe you have had the problem in the past. 
I'm sure you remember all the lspci commands to list the tree view and to dump 
all the details from each of the links and I would suggest you do that to check 
to see that the payload sizes match. What I do is "lspci -tvvv" to see what's 
connected, then "lspci -s xx:xx.x -vvv" to check the devices on the link.

Thanks.

Todd Fujinaka
Technical Marketing Engineer
LAN Access Division (LAD)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565


-Original Message-
From: Mary Mcgrath [mailto:mary.mcgr...@oracle.com] 
Sent: Monday, November 26, 2012 6:07 PM
To: Joe Jin
Cc: net...@vger.kernel.org; e1000-de...@lists.sf.net; 
linux-kernel@vger.kernel.org
Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

Joe
Thank you for working this.
I would love to find out how they expect a customer to make the modification To 
 "word  0x1A, and see if the 8th bit is 0 or 1, and to change to 0."

I have in turn asked the ct for the lspci command on eth3, maybe the incorrect 
setting is upstream.

Again,  thank you.
Regards
Mary



-Original Message-
From: Joe Jin
Sent: Monday, November 26, 2012 8:00 PM
To: Fujinaka, Todd
Cc: Dave, Tushar N; net...@vger.kernel.org; e1000-de...@lists.sf.net; 
linux-kernel@vger.kernel.org; Mary Mcgrath
Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

On 11/27/12 00:23, Fujinaka, Todd wrote:
> If you look at the previous section, DevCap, you'll see that it's 
> correctly advertising 256 bytes but the system is negotiating 128 for 
> the link to the Ethernet controller. Things on the "other" side of the 
> link are controlled outside of the e1000 driver.
> 
> Tushar's first suggestion was to check the PCIe payload settings in 
> the entire chain. Have you done that? Mismatches will cause hangs.

Hi Todd,

So far I had to know how to modify the maxpayload size, since BIOS have not 
entry to change this, so I had to use ethtool, now I need to get the offset of 
MaxPayload size in eeprom, I ever tried to find from Intel online document but 
failed, any idea?

Thanks in advance,
Joe

--
Monitor your physical, virtual and cloud infrastructure from a single web 
console. Get in-depth insight into apps, servers, databases, vmware, SAP, cloud 
infrastructure, etc. Download 30-day Free Trial.
Pricing starts from $795 for 25 servers or applications!
http://p.sf.net/sfu/zoho_dev2dev_nov
___
E1000-devel mailing list
e1000-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about IntelĀ® Ethernet, visit 
http://communities.intel.com/community/wired
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-11-27 Thread Fujinaka, Todd
Thanks for the clarification. I was just going by the PCIe spec, which says the 
lowest value of both ends is used, and I figured SOMETHING had to be looking at 
that and doing some sort of negotiation. I'm no BIOS guy, so I'm not sure 
what's actually going on, whether something walks the PCIe tree or if the BIOS 
just sets all the values to the minimum.

Todd Fujinaka
Technical Marketing Engineer
LAN Access Division (LAD)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565


-Original Message-
From: Ben Hutchings [mailto:bhutchi...@solarflare.com] 
Sent: Tuesday, November 27, 2012 10:11 AM
To: Fujinaka, Todd; Mary Mcgrath
Cc: Joe Jin; net...@vger.kernel.org; e1000-de...@lists.sf.net; 
linux-kernel@vger.kernel.org; linux-pci
Subject: RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang

On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
> Forgive me if I'm being too repetitious as I think some of this has 
> been mentioned in the past.
> 
> We (and by we I mean the Ethernet part and driver) can only change the 
> advertised availability of a larger MaxPayloadSize. The size is 
> negotiated by both sides of the link when the link is established. The 
> driver should not change the size of the link as it would be poking at 
> registers outside of its scope and is controlled by the upstream 
> bridge (not us).
[...]

MaxPayloadSize (MPS) is not negotiated between devices but is programmed by the 
system firmware (at least for devices present at boot - the kernel may be 
responsible in case of hotplug).  You can use the kernel parameter 
'pci=pcie_bus_perf' (or one of several others) to set a policy that overrides 
this, but no policy will allow setting MPS above the device's 
MaxPayloadSizeSupported (MPSS).

(These parameters are not documented in
Documentation/kernel-parameters.txt!  Someone ought to fix that.)

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's 
the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-11-28 Thread Fujinaka, Todd
The only EEPROM I know about or can speak to is the one attached to the 82571 
and it doesn't set the MaxPayloadSize. That's done by the BIOS.

Todd Fujinaka
Technical Marketing Engineer
LAN Access Division (LAD)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565


-Original Message-
From: Joe Jin [mailto:joe@oracle.com] 
Sent: Wednesday, November 28, 2012 12:31 AM
To: Ben Hutchings
Cc: Fujinaka, Todd; Mary Mcgrath; net...@vger.kernel.org; 
e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

On 11/28/12 02:10, Ben Hutchings wrote:
> On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
>> Forgive me if I'm being too repetitious as I think some of this has 
>> been mentioned in the past.
>>
>> We (and by we I mean the Ethernet part and driver) can only change 
>> the advertised availability of a larger MaxPayloadSize. The size is 
>> negotiated by both sides of the link when the link is established. 
>> The driver should not change the size of the link as it would be 
>> poking at registers outside of its scope and is controlled by the 
>> upstream bridge (not us).
> [...]
> 
> MaxPayloadSize (MPS) is not negotiated between devices but is 
> programmed by the system firmware (at least for devices present at 
> boot - the kernel may be responsible in case of hotplug).  You can use 
> the kernel parameter 'pci=pcie_bus_perf' (or one of several others) to 
> set a policy that overrides this, but no policy will allow setting MPS 
> above the device's MaxPayloadSizeSupported (MPSS).
> 

Ben,

Unfortunately I'm using 3.0.x kernel and this is not included in the kernel.
So I'm trying to use ethtool modify it from eeprom to see if help or no.


Todd, I'll review all MaxPayload for all devices, but need to say if it 
mismatch, customer could not modify it from BIOS for there was not entry at 
there, to test it, we have to find how to verify if this is the root cause, so 
still need to find the offset in eeprom.

Thanks in advance,
Joe



RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2012-11-29 Thread Fujinaka, Todd
Someone else pointed this out to me locally. If you have a non-client BIOS, you 
should be able to set the MaxPayloadSize using setpci. You have to make sure 
that you're being consistent throughout all the associated links.

Todd Fujinaka
Technical Marketing Engineer
LAN Access Division (LAD)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565


-Original Message-
From: Ethan Zhao [mailto:ethan.ker...@gmail.com] 
Sent: Wednesday, November 28, 2012 7:10 PM
To: Fujinaka, Todd
Cc: Joe Jin; Ben Hutchings; Mary Mcgrath; net...@vger.kernel.org; 
e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

Joe,
Possibly your customer is running a kernel without source code on a 
platform whose vendor wouldn't like to fix BIOS issue( Is that a HP/Dell server 
?).
Anyway, to see if is a payload issue or,  you could change the payload size 
with setpci tool to those devices and set the link retrain bit to trigger the 
link retraining to debug the issue and identity the root cause.  I thinks it is 
much easier than modify the BIOS or  eeprom of NIC.

e.g.
   set device control register to 0f 00   (128 bytes payload size)
   #   setpci -v -s 00:02.0 98.w=000f
   set device link control register to 60h (retrain the link)
   #  setpci -v -s 00:02.0 a0.b=60

  Hope it works,  Just my 2 cents.

ethan.z...@oracle.com

On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd  
wrote:
> The only EEPROM I know about or can speak to is the one attached to the 82571 
> and it doesn't set the MaxPayloadSize. That's done by the BIOS.
>
> Todd Fujinaka
> Technical Marketing Engineer
> LAN Access Division (LAD)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
>
> -Original Message-
> From: Joe Jin [mailto:joe@oracle.com]
> Sent: Wednesday, November 28, 2012 12:31 AM
> To: Ben Hutchings
> Cc: Fujinaka, Todd; Mary Mcgrath; net...@vger.kernel.org; 
> e1000-de...@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
> Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
>
> On 11/28/12 02:10, Ben Hutchings wrote:
>> On Tue, 2012-11-27 at 17:32 +, Fujinaka, Todd wrote:
>>> Forgive me if I'm being too repetitious as I think some of this has 
>>> been mentioned in the past.
>>>
>>> We (and by we I mean the Ethernet part and driver) can only change 
>>> the advertised availability of a larger MaxPayloadSize. The size is 
>>> negotiated by both sides of the link when the link is established.
>>> The driver should not change the size of the link as it would be 
>>> poking at registers outside of its scope and is controlled by the 
>>> upstream bridge (not us).
>> [...]
>>
>> MaxPayloadSize (MPS) is not negotiated between devices but is 
>> programmed by the system firmware (at least for devices present at 
>> boot - the kernel may be responsible in case of hotplug).  You can 
>> use the kernel parameter 'pci=pcie_bus_perf' (or one of several 
>> others) to set a policy that overrides this, but no policy will allow 
>> setting MPS above the device's MaxPayloadSizeSupported (MPSS).
>>
>
> Ben,
>
> Unfortunately I'm using 3.0.x kernel and this is not included in the kernel.
> So I'm trying to use ethtool modify it from eeprom to see if help or no.
>
>
> Todd, I'll review all MaxPayload for all devices, but need to say if it 
> mismatch, customer could not modify it from BIOS for there was not entry at 
> there, to test it, we have to find how to verify if this is the root cause, 
> so still need to find the offset in eeprom.
>
> Thanks in advance,
> Joe
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] alarmtimer: add functions for timerfd support

2013-05-15 Thread Todd Poynor
Add functions needed for hooking up alarmtimer to timerfd:

* alarm_restart: Similar to hrtimer_restart, restart an alarmtimer after
  the expires time has already been updated (as with alarm_forward).

* alarm_forward_now: Similar to hrtimer_forward_now, move the expires
  time forward to an interval from the current time of the associated clock.

* alarm_start_relative: Start an alarmtimer with an expires time relative to
  the current time of the associated clock.

* alarm_expires_remaining: Similar to hrtimer_expires_remaining, return the
  amount of time remaining until alarm expiry.

Signed-off-by: Todd Poynor 
---
 include/linux/alarmtimer.h |  4 
 kernel/time/alarmtimer.c   | 39 ++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/include/linux/alarmtimer.h b/include/linux/alarmtimer.h
index 9069694..a899402 100644
--- a/include/linux/alarmtimer.h
+++ b/include/linux/alarmtimer.h
@@ -44,10 +44,14 @@ struct alarm {
 void alarm_init(struct alarm *alarm, enum alarmtimer_type type,
enum alarmtimer_restart (*function)(struct alarm *, ktime_t));
 int alarm_start(struct alarm *alarm, ktime_t start);
+int alarm_start_relative(struct alarm *alarm, ktime_t start);
+void alarm_restart(struct alarm *alarm);
 int alarm_try_to_cancel(struct alarm *alarm);
 int alarm_cancel(struct alarm *alarm);
 
 u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval);
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval);
+ktime_t alarm_expires_remaining(const struct alarm *alarm);
 
 /* Provide way to access the rtc device being used by alarmtimers */
 struct rtc_device *alarmtimer_get_rtcdev(void);
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index f11d83b..3e5cba2 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -199,6 +199,12 @@ static enum hrtimer_restart alarmtimer_fired(struct 
hrtimer *timer)
 
 }
 
+ktime_t alarm_expires_remaining(const struct alarm *alarm)
+{
+   struct alarm_base *base = &alarm_bases[alarm->type];
+   return ktime_sub(alarm->node.expires, base->gettime());
+}
+
 #ifdef CONFIG_RTC_CLASS
 /**
  * alarmtimer_suspend - Suspend time callback
@@ -305,7 +311,7 @@ void alarm_init(struct alarm *alarm, enum alarmtimer_type 
type,
 }
 
 /**
- * alarm_start - Sets an alarm to fire
+ * alarm_start - Sets an absolute alarm to fire
  * @alarm: ptr to alarm to set
  * @start: time to run the alarm
  */
@@ -325,6 +331,31 @@ int alarm_start(struct alarm *alarm, ktime_t start)
 }
 
 /**
+ * alarm_start_relative - Sets a relative alarm to fire
+ * @alarm: ptr to alarm to set
+ * @start: time relative to now to run the alarm
+ */
+int alarm_start_relative(struct alarm *alarm, ktime_t start)
+{
+   struct alarm_base *base = &alarm_bases[alarm->type];
+
+   start = ktime_add(start, base->gettime());
+   return alarm_start(alarm, start);
+}
+
+void alarm_restart(struct alarm *alarm)
+{
+   struct alarm_base *base = &alarm_bases[alarm->type];
+   unsigned long flags;
+
+   spin_lock_irqsave(&base->lock, flags);
+   hrtimer_set_expires(&alarm->timer, alarm->node.expires);
+   hrtimer_restart(&alarm->timer);
+   alarmtimer_enqueue(base, alarm);
+   spin_unlock_irqrestore(&base->lock, flags);
+}
+
+/**
  * alarm_try_to_cancel - Tries to cancel an alarm timer
  * @alarm: ptr to alarm to be canceled
  *
@@ -394,6 +425,12 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, 
ktime_t interval)
return overrun;
 }
 
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+{
+   struct alarm_base *base = &alarm_bases[alarm->type];
+
+   return alarm_forward(alarm, base->gettime(), interval);
+}
 
 
 
-- 
1.8.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/2] timerfd: add alarm timers

2013-05-15 Thread Todd Poynor
Add support for clocks CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM,
thereby enabling wakeup alarm timers via file descriptors.

Signed-off-by: Todd Poynor 
---
 fs/timerfd.c | 131 ---
 1 file changed, 108 insertions(+), 23 deletions(-)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 32b644f..9293121 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -8,6 +8,7 @@
  *
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -26,7 +27,10 @@
 #include 
 
 struct timerfd_ctx {
-   struct hrtimer tmr;
+   union {
+   struct hrtimer tmr;
+   struct alarm alarm;
+   } t;
ktime_t tintv;
ktime_t moffs;
wait_queue_head_t wqh;
@@ -41,14 +45,19 @@ struct timerfd_ctx {
 static LIST_HEAD(cancel_list);
 static DEFINE_SPINLOCK(cancel_lock);
 
+static inline bool isalarm(struct timerfd_ctx *ctx)
+{
+   return ctx->clockid == CLOCK_REALTIME_ALARM ||
+   ctx->clockid == CLOCK_BOOTTIME_ALARM;
+}
+
 /*
  * This gets called when the timer event triggers. We set the "expired"
  * flag, but we do not re-arm the timer (in case it's necessary,
  * tintv.tv64 != 0) until the timer is accessed.
  */
-static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
+static void timerfd_triggered(struct timerfd_ctx *ctx)
 {
-   struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx, tmr);
unsigned long flags;
 
spin_lock_irqsave(&ctx->wqh.lock, flags);
@@ -56,10 +65,25 @@ static enum hrtimer_restart timerfd_tmrproc(struct hrtimer 
*htmr)
ctx->ticks++;
wake_up_locked(&ctx->wqh);
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+}
 
+static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
+{
+   struct timerfd_ctx *ctx = container_of(htmr, struct timerfd_ctx,
+  t.tmr);
+   timerfd_triggered(ctx);
return HRTIMER_NORESTART;
 }
 
+static enum alarmtimer_restart timerfd_alarmproc(struct alarm *alarm,
+   ktime_t now)
+{
+   struct timerfd_ctx *ctx = container_of(alarm, struct timerfd_ctx,
+  t.alarm);
+   timerfd_triggered(ctx);
+   return ALARMTIMER_NORESTART;
+}
+
 /*
  * Called when the clock was set to cancel the timers in the cancel
  * list. This will wake up processes waiting on these timers. The
@@ -107,8 +131,9 @@ static bool timerfd_canceled(struct timerfd_ctx *ctx)
 
 static void timerfd_setup_cancel(struct timerfd_ctx *ctx, int flags)
 {
-   if (ctx->clockid == CLOCK_REALTIME && (flags & TFD_TIMER_ABSTIME) &&
-   (flags & TFD_TIMER_CANCEL_ON_SET)) {
+   if ((ctx->clockid == CLOCK_REALTIME ||
+ctx->clockid == CLOCK_REALTIME_ALARM) &&
+   (flags & TFD_TIMER_ABSTIME) && (flags & TFD_TIMER_CANCEL_ON_SET)) {
if (!ctx->might_cancel) {
ctx->might_cancel = true;
spin_lock(&cancel_lock);
@@ -124,7 +149,11 @@ static ktime_t timerfd_get_remaining(struct timerfd_ctx 
*ctx)
 {
ktime_t remaining;
 
-   remaining = hrtimer_expires_remaining(&ctx->tmr);
+   if (isalarm(ctx))
+   remaining = alarm_expires_remaining(&ctx->t.alarm);
+   else
+   remaining = hrtimer_expires_remaining(&ctx->t.tmr);
+
return remaining.tv64 < 0 ? ktime_set(0, 0): remaining;
 }
 
@@ -142,11 +171,28 @@ static int timerfd_setup(struct timerfd_ctx *ctx, int 
flags,
ctx->expired = 0;
ctx->ticks = 0;
ctx->tintv = timespec_to_ktime(ktmr->it_interval);
-   hrtimer_init(&ctx->tmr, clockid, htmode);
-   hrtimer_set_expires(&ctx->tmr, texp);
-   ctx->tmr.function = timerfd_tmrproc;
+
+   if (isalarm(ctx)) {
+   alarm_init(&ctx->t.alarm,
+  ctx->clockid == CLOCK_REALTIME_ALARM ?
+  ALARM_REALTIME : ALARM_BOOTTIME,
+  timerfd_alarmproc);
+   } else {
+   hrtimer_init(&ctx->t.tmr, clockid, htmode);
+   hrtimer_set_expires(&ctx->t.tmr, texp);
+   ctx->t.tmr.function = timerfd_tmrproc;
+   }
+
if (texp.tv64 != 0) {
-   hrtimer_start(&ctx->tmr, texp, htmode);
+   if (isalarm(ctx)) {
+   if (flags & TFD_TIMER_ABSTIME)
+   alarm_start(&ctx->t.alarm, texp);
+   else
+   alarm_start_relative(&ctx->t.alarm, texp);
+   } else {
+   hrtimer_start(&ctx->t.tmr, texp, htmode);
+   }
+
if (timerfd_canceled(ctx))
return -ECANCELED;
}
@@ -

[PATCH 0/2] timerfd support for wakeup alarm timers

2013-05-15 Thread Todd Poynor
Enable wakeup alarm timers via file descriptors.  Hook up alarmtimer
to timerfd via clocks CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM,
and add needed functions to alarmtimer.

 fs/timerfd.c   |  131 +
 include/linux/alarmtimer.h |4 +
 kernel/time/alarmtimer.c   |   39 +
 3 files changed, 150 insertions(+), 24 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] alarmtimer: implement minimum alarm interval for allowing suspend

2012-08-09 Thread Todd Poynor
alarmtimer suspend return -EBUSY if the next alarm will fire in less
than 2 seconds.  This allows one RTC seconds tick to occur subsequent
to this check before the alarm wakeup time is set, ensuring the wakeup
time is still in the future (assuming the RTC does not tick one more
second prior to setting the alarm).

If suspend is rejected due to an imminent alarm, hold a wakeup source
for 2 seconds to process the alarm prior to reattempting suspend.

If setting the alarm incurs an -ETIME for an alarm set in the past,
or any other problem setting the alarm, abort suspend and hold a
wakelock for 1 second while the alarm is allowed to be serviced or
other hopefully transient conditions preventing the alarm clear up.

Signed-off-by: Todd Poynor 
---
 kernel/time/alarmtimer.c |   18 +-
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index aa27d39..f979d85 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -46,6 +46,8 @@ static struct alarm_base {
 static ktime_t freezer_delta;
 static DEFINE_SPINLOCK(freezer_delta_lock);
 
+static struct wakeup_source *ws;
+
 #ifdef CONFIG_RTC_CLASS
 /* rtc timer and device for setting alarm wakeups at suspend */
 static struct rtc_timerrtctimer;
@@ -250,6 +252,7 @@ static int alarmtimer_suspend(struct device *dev)
unsigned long flags;
struct rtc_device *rtc;
int i;
+   int ret;
 
spin_lock_irqsave(&freezer_delta_lock, flags);
min = freezer_delta;
@@ -279,8 +282,10 @@ static int alarmtimer_suspend(struct device *dev)
if (min.tv64 == 0)
return 0;
 
-   /* XXX - Should we enforce a minimum sleep time? */
-   WARN_ON(min.tv64 < NSEC_PER_SEC);
+   if (ktime_to_ns(min) < 2 * NSEC_PER_SEC) {
+   __pm_wakeup_event(ws, 2 * MSEC_PER_SEC);
+   return -EBUSY;
+   }
 
/* Setup an rtc timer to fire that far in the future */
rtc_timer_cancel(rtc, &rtctimer);
@@ -288,9 +293,11 @@ static int alarmtimer_suspend(struct device *dev)
now = rtc_tm_to_ktime(tm);
now = ktime_add(now, min);
 
-   rtc_timer_start(rtc, &rtctimer, now, ktime_set(0, 0));
-
-   return 0;
+   /* Set alarm, if in the past reject suspend briefly to handle */
+   ret = rtc_timer_start(rtc, &rtctimer, now, ktime_set(0, 0));
+   if (ret < 0)
+   __pm_wakeup_event(ws, 1 * MSEC_PER_SEC);
+   return ret;
 }
 #else
 static int alarmtimer_suspend(struct device *dev)
@@ -821,6 +828,7 @@ static int __init alarmtimer_init(void)
error = PTR_ERR(pdev);
goto out_drv;
}
+   ws = wakeup_source_register("alarmtimer");
return 0;
 
 out_drv:
-- 
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Hang when booting 2.4.0

2001-01-26 Thread Todd Goodman

My apologies in advance if I missed this in the archives (I found three
or four threads discussing hangs at boot, but none related to 2.4.0 and
the suggestions for older kernels that still applied to code in 2.4.0
didn't change the symptoms).

My kernel hangs after the "Ok, booting the kernel." message.

This is on a machine with a DX440LX motherboard and only a single
300Mhz PII installed.

I can boot 2.2.16 and 2.2.18 kernels successfully.

I've tried paring down the config to a minimum (.config included below),
but still have the problem.  Specifically removing any SCSI since I
had heard of problems with the AIC-78xx.

Thanks for any ideas or insight,

Todd Goodman




#
# Automatically generated by make menuconfig: don't edit
#
CONFIG_X86=y
CONFIG_ISA=y
# CONFIG_SBUS is not set
CONFIG_UID16=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y

#
# Loadable module support
#
# CONFIG_MODULES is not set

#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
CONFIG_M686FXSR=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_FXSR=y
CONFIG_X86_XMM=y
# CONFIG_TOSHIBA is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
CONFIG_MTRR=y
# CONFIG_SMP is not set
# CONFIG_X86_UP_IOAPIC is not set

#
# General setup
#
CONFIG_NET=y
# CONFIG_VISWS is not set
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set
CONFIG_SYSVIPC=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
# CONFIG_KCORE_AOUT is not set
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
# CONFIG_PM is not set
# CONFIG_ACPI is not set
# CONFIG_APM is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play configuration
#
# CONFIG_PNP is not set
# CONFIG_ISAPNP is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_LOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_BLK_DEV_INITRD is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set
# CONFIG_BLK_DEV_MD is not set
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
# CONFIG_MD_RAID1 is not set
# CONFIG_MD_RAID5 is not set
# CONFIG_BLK_DEV_LVM is not set
# CONFIG_LVM_PROC_FS is not set

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
# CONFIG_NETLINK is not set
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_INET_ECN is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_IPV6 is not set
# CONFIG_KHTTPD is not set
# CONFIG_ATM is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set
# CONFIG_PHONE_IXJ is not set

#
# ATA/IDE/MFM/RLL support
#
CONFIG_IDE=y

#
# IDE, ATA and ATAPI Block devices
#
CONFIG_BLK_DEV_IDE=y
# CONFIG_BLK_DEV_HD_IDE is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
# CONFIG_BLK_DEV_IDEDISK_VENDOR is not set
# CONFIG_BLK_DEV_IDEDISK_FUJITSU is not set
# CONFIG_BLK_DEV_IDEDISK_IBM is not set
# CONFIG_BLK_DEV_IDEDISK_MAXTOR is not set
# CONFIG_BLK_DEV_IDEDISK_QUANTUM is not set
# CONFIG_BLK_DEV_IDEDISK_SEAGATE is not set
# CONFIG_BLK_DEV_IDEDISK_WD is not set
# CONFIG_BLK_DEV_COMMERIAL is not set
# CONFIG_BLK_DEV_TIVO is not set
# CONFIG_BLK_DEV_IDECS is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_B

RE: Hang when booting 2.4.0

2001-01-27 Thread Todd Goodman

Sorry, I was experiencing a senior moment and had the wrong CPU type
configured.

Once corrected everything is right in the world again.

Thanks to Mark Hahn for pointing out my mistake.

Todd

> -Original Message-
> My apologies in advance if I missed this in the archives (I found three
> or four threads discussing hangs at boot, but none related to 2.4.0 and
> the suggestions for older kernels that still applied to code in 2.4.0
> didn't change the symptoms).
> 
> My kernel hangs after the "Ok, booting the kernel." message.
> 
> This is on a machine with a DX440LX motherboard and only a single
> 300Mhz PII installed.
> 
> I can boot 2.2.16 and 2.2.18 kernels successfully.
> 
> I've tried paring down the config to a minimum (.config included below),
> but still have the problem.  Specifically removing any SCSI since I
> had heard of problems with the AIC-78xx.
> 
> Thanks for any ideas or insight,
> 
> Todd Goodman

[SNIP of .config]

> Please read the FAQ at http://www.tux.org/lkml/>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



PowerOP 0/3: System power operating point management API

2005-08-08 Thread Todd Poynor
PowerOP is a system power parameter management API submitted for
discussion.  PowerOP writes and reads power "operating points",
comprised of arbitrary integer-valued values, called power parameters,
that correspond to registers, clocks, dividers, voltage regulators,
etc. that may be modified to set a basic power/performance point for the
system.  The core basically passes an array of integer-valued power
parameters (with very little additional structure imposed by the core)
to a platform-specific backend that interprets those values and makes
the requested adjustments.  PowerOP is intended to leave all power
policy decisions to higher layers.  An optional sysfs representation of
power parameters is also available, primarily for diagnostic use.

PowerOP can be thought of as a layer below cpufreq that actually
accesses the hardware to make cpu frequency, voltage, core bus, and
perhaps other modifications to set a power point, leaving cpufreq to
manage the interfaces based around the "cpu frequency" abstraction, the
policies and governors that select the frequency, its notifiers, and so
forth.  An example hooking up support for one cpufreq platform to
PowerOP is in patch 3/3.

Depending on the ability of the hardware to make software-controlled
power/performance adjustments, this may be useful to select custom
voltages, bus speeds, etc. in desktop/server systems.  Various embedded
systems have several parameters that can be set.  For example, an XScale
PXA27x could be considered to have six basic power parameters (mainly
cpu run mode and memory and bus dividers) that for the most part should
be set in tandem to known good sets of values as validated by the
silicon vendor, plus other parameters possible for disabling PLLs during
low-speed execution, and so forth.  PowerOP is aimed at supporting this
kind of system, where the cpu frequency abstraction specifies only part
of the operating point that may be managed from software.  It also
pushes the hardware-level power parameter management down to a level
that can be shared with other power management policy frameworks, in use
in some embedded systems, that wish to deal with entire operating points
as the basic unit of system power management..

There are many ways to tackle those issues, of course, and a new API
layer is arguably rather heavyweight.  This is one suggested way that
tries to minimize disturbing existing power management code.  Comments
very much appreciated.

Patch 2/3 is a desktop-oriented example of PowerOP; embedded examples
will follow soon.

--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PowerOP 1/3: PowerOP core

2005-08-08 Thread Todd Poynor
The PowerOP core provides the struct powerop_point that defines an
operating point, and trivial routines for calling the platform-specific
backend to read and set operating points and for registering a single
machine-dependent backend.

Operating points are an array of 32-bit integer-valued power parameters,
almost entirely interpreted by the platform-specific backend.
Higher-layer software shares information on the power parameters made
available by the backend (that is, the interpretation of the integers in
the array) via header files.  The value -1 is commonly used to denote an
unspecified value, but in situations where all ones is a valid power
parameter value some extra smarts may be needed.

It optionally adds sysfs attributes for reading and writing individual
power parameter values, mainly for diagnostic purposes.  For example,
using the Intel Centrino patch that follows, parameters for frequency
and core voltage for CPU #0 appear:

   # ls /sys/powerop/param/
   cpu0 v0

Index: linux-2.6.12/drivers/Makefile
===
--- linux-2.6.12.orig/drivers/Makefile  2005-06-30 01:32:17.0 +
+++ linux-2.6.12/drivers/Makefile   2005-07-29 00:39:44.0 +
@@ -58,6 +58,7 @@
 obj-$(CONFIG_ISDN) += isdn/
 obj-$(CONFIG_MCA)  += mca/
 obj-$(CONFIG_EISA) += eisa/
+obj-$(CONFIG_POWEROP)  += powerop/
 obj-$(CONFIG_CPU_FREQ) += cpufreq/
 obj-$(CONFIG_MMC)  += mmc/
 obj-$(CONFIG_INFINIBAND)   += infiniband/
Index: linux-2.6.12/drivers/powerop/Kconfig
===
--- linux-2.6.12.orig/drivers/powerop/Kconfig   1970-01-01 00:00:00.0 
+
+++ linux-2.6.12/drivers/powerop/Kconfig2005-07-28 07:27:26.0 
+
@@ -0,0 +1,17 @@
+#
+# powerop
+#
+
+menu "Platform Core Power Management"
+
+config POWEROP
+   bool "PowerOP Platform Core Power Management"
+   help
+
+config POWEROP_SYSFS
+   bool "  Enable PowerOP sysfs interface"
+   depends on POWEROP && SYSFS
+   help
+
+endmenu
+
Index: linux-2.6.12/drivers/powerop/Makefile
===
--- linux-2.6.12.orig/drivers/powerop/Makefile  1970-01-01 00:00:00.0 
+
+++ linux-2.6.12/drivers/powerop/Makefile   2005-07-28 07:29:04.0 
+
@@ -0,0 +1,2 @@
+obj-$(CONFIG_POWEROP)  += powerop.o
+
Index: linux-2.6.12/drivers/powerop/powerop.c
===
--- linux-2.6.12.orig/drivers/powerop/powerop.c 1970-01-01 00:00:00.0 
+
+++ linux-2.6.12/drivers/powerop/powerop.c  2005-08-04 19:50:38.0 
+
@@ -0,0 +1,253 @@
+/*
+ * PowerOP generic core functions
+ *
+ * Author: Todd Poynor <[EMAIL PROTECTED]>
+ *
+ * 2005 (c) MontaVista Software, Inc. This file is licensed under
+ * the terms of the GNU General Public License version 2. This program
+ * is licensed "as is" without any warranty of any kind, whether express
+ * or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct powerop_driver *powerop_driver;
+static int powerop_subsys_init;
+
+int
+powerop_set_point(struct powerop_point *point)
+{
+   return powerop_driver && powerop_driver->set_point ? 
+   powerop_driver->set_point(point) : -EINVAL;
+}
+
+
+int
+powerop_get_point(struct powerop_point *point)
+{
+   return powerop_driver && powerop_driver->get_point ? 
+   powerop_driver->get_point(point) : -EINVAL;
+}
+
+
+EXPORT_SYMBOL_GPL(powerop_set_point);
+EXPORT_SYMBOL_GPL(powerop_get_point);
+
+#ifdef CONFIG_POWEROP_SYSFS
+decl_subsys(powerop, NULL, NULL);
+
+static void powerop_kobj_release(struct kobject *kobj)
+{
+   return;
+}
+
+struct powerop_param_attribute {
+   int index;
+struct attributeattr;
+};
+
+#define to_param_attr(_attr) container_of(_attr,\
+   struct powerop_param_attribute,attr)
+
+static ssize_t
+powerop_param_attr_show(struct kobject * kobj, struct attribute * attr,
+   char * buf)
+{
+   struct powerop_param_attribute * param_attr = to_param_attr(attr);
+   struct powerop_point point;
+   ssize_t ret = 0;
+
+   if ((ret = powerop_get_point(&point)) == 0)
+   ret = sprintf(buf, "%d\n", point.param[param_attr->index]);
+   return ret;
+}
+
+static ssize_t
+powerop_param_attr_store(struct kobject * kobj, struct attribute * attr, 
+const char * buf, size_t count)
+{
+   struct powerop_param_attribute * param_attr = to_param_attr(attr);
+   struct powerop_point point;
+   int i;
+   ssize_t ret = 0;
+
+   for (i = 0; i < powerop_driver->nr_params; i++) 
+   if (i == pa

PowerOP 2/3: Intel Centrino support

2005-08-08 Thread Todd Poynor
Minimal PowerOP support for Intel Enhanced Speedstep "Centrino"
notebooks.  These systems run at an operating point comprised of a cpu
frequency and a core voltage.  The voltage could be set from the values
recommended in the datasheets if left unspecified (-1) in the operating
point, as cpufreq does.


Index: linux-2.6.12/arch/i386/Kconfig
===
--- linux-2.6.12.orig/arch/i386/Kconfig 2005-08-02 23:39:05.0 +
+++ linux-2.6.12/arch/i386/Kconfig  2005-08-03 01:11:21.0 +
@@ -1098,6 +1098,7 @@
 endmenu
 
 source "arch/i386/kernel/cpu/cpufreq/Kconfig"
+source "arch/i386/kernel/cpu/powerop/Kconfig"
 
 endmenu
 
Index: linux-2.6.12/arch/i386/kernel/cpu/Makefile
===
--- linux-2.6.12.orig/arch/i386/kernel/cpu/Makefile 2005-08-02 
23:39:05.0 +
+++ linux-2.6.12/arch/i386/kernel/cpu/Makefile  2005-08-03 01:11:21.0 
+
@@ -17,3 +17,4 @@
 
 obj-$(CONFIG_MTRR) +=  mtrr/
 obj-$(CONFIG_CPU_FREQ) +=  cpufreq/
+obj-$(CONFIG_POWEROP)  +=  powerop/
Index: linux-2.6.12/arch/i386/kernel/cpu/powerop/Kconfig
===
--- linux-2.6.12.orig/arch/i386/kernel/cpu/powerop/Kconfig  1970-01-01 
00:00:00.0 +
+++ linux-2.6.12/arch/i386/kernel/cpu/powerop/Kconfig   2005-08-03 
01:11:21.0 +
@@ -0,0 +1,6 @@
+source "drivers/powerop/Kconfig"
+
+config POWEROP_CENTRINO
+   tristate "PowerOP for Intel Centrino Enhanced Speedstep"
+   depends on POWEROP
+   default n
Index: linux-2.6.12/arch/i386/kernel/cpu/powerop/Makefile
===
--- linux-2.6.12.orig/arch/i386/kernel/cpu/powerop/Makefile 1970-01-01 
00:00:00.0 +
+++ linux-2.6.12/arch/i386/kernel/cpu/powerop/Makefile  2005-08-03 
01:11:21.0 +
@@ -0,0 +1,2 @@
+obj-$(CONFIG_POWEROP_CENTRINO) +=  powerop-centrino.o
+
Index: linux-2.6.12/arch/i386/kernel/cpu/powerop/powerop-centrino.c
===
--- linux-2.6.12.orig/arch/i386/kernel/cpu/powerop/powerop-centrino.c   
1970-01-01 00:00:00.0 +
+++ linux-2.6.12/arch/i386/kernel/cpu/powerop/powerop-centrino.c
2005-08-03 06:41:37.0 +
@@ -0,0 +1,160 @@
+/*
+ * PowerOP support for Intel Enhanced Speedstep "Centrino" platforms.
+ *
+ * Based heavily on speedstep-centrino.c of cpufreq Copyright (c) 2003
+ * by Jeremy Fitzhardinge <[EMAIL PROTECTED]>
+ *
+ * 2005 (c) MontaVista Software, Inc. This file is licensed under
+ * the terms of the GNU General Public License version 2. This program
+ * is licensed "as is" without any warranty of any kind, whether express
+ * or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static int
+powerop_centrino_get_point(struct powerop_point *point)
+{
+   unsigned l, h;
+   unsigned cpu_freq;
+
+   rdmsr(MSR_IA32_PERF_STATUS, l, h);
+   if (unlikely((cpu_freq = ((l >> 8) & 0xff) * 100) == 0)) {
+   /*
+* On some CPUs, we can see transient MSR values (which are
+* not present in _PSS), while CPU is doing some automatic
+* P-state transition (like TM2). Get the last freq set 
+* in PERF_CTL.
+*/
+   rdmsr(MSR_IA32_PERF_CTL, l, h);
+   cpu_freq = ((l >> 8) & 0xff) * 100;
+   }
+
+   point->param[POWEROP_CPU + smp_processor_id()] = cpu_freq;
+   point->param[POWEROP_V + smp_processor_id()] = ((l & 0xff) * 16) + 700;
+   return 0;
+}
+
+static int
+powerop_centrino_set_point(struct powerop_point *point)
+{
+   unsigned int msr = 0, oldmsr, h, mask = 0;
+
+   if (point->param[POWEROP_CPU + smp_processor_id()] != -1) {
+   msr |= (point->param[POWEROP_CPU + smp_processor_id()]/100)
+   << 8;
+   mask |= 0xff00;
+   }
+
+   if (point->param[POWEROP_V + smp_processor_id()] != -1) {
+   msr |= ((point->param[POWEROP_V + smp_processor_id()] - 700)
+   / 16);
+   mask |= 0xff;
+   }
+
+   rdmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+
+   if (msr == (oldmsr & mask))
+   return 0;
+
+   /* all but 16 LSB are "reserved", so treat them with
+  care */
+   oldmsr &= ~mask;
+   msr &= mask;
+   oldmsr |= msr;
+   
+   wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
+   return 0;
+}
+
+#ifdef CONFIG_POWEROP_SYSFS
+static char *powerop_param_names[POWEROP_DRIVER_MAX_PARAMS];
+
+static int __init powerop_centrino_sysfs_init(void)
+{
+   int i;
+
+   for (i = 0; i < NR_CPUS; i++) {
+   if (! (powerop_param_names[POWEROP_CPU + i] = 
+  kmalloc(5 + i / 10, GFP_KERNEL)) ||
+

PowerOP 3/3: Intel Centrino cpufreq integration

2005-08-08 Thread Todd Poynor
A minimal example of modifying cpufreq to use PowerOP for reading and
writing power parameters on Intel Centrino platforms.  It would be
possible to move voltage table lookups to the PowerOP layer.


Index: linux-2.6.12/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
===
--- linux-2.6.12.orig/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c 
2005-08-04 19:49:29.0 +
+++ linux-2.6.12/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c  
2005-08-05 01:21:45.0 +
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_SPEEDSTEP_CENTRINO_ACPI
 #include 
@@ -322,29 +323,21 @@
 /* Return the current CPU frequency in kHz */
 static unsigned int get_cur_freq(unsigned int cpu)
 {
-   unsigned l, h;
-   unsigned clock_freq;
+   struct powerop_point point;
+   unsigned clock_freq = 0;
cpumask_t saved_mask;

saved_mask = current->cpus_allowed;
set_cpus_allowed(current, cpumask_of_cpu(cpu));
if (smp_processor_id() != cpu)
return 0;
 
-   rdmsr(MSR_IA32_PERF_STATUS, l, h);
-   clock_freq = extract_clock(l, cpu, 0);
+   if (powerop_get_point(&point))
+   goto out;
 
-   if (unlikely(clock_freq == 0)) {
-   /*
-* On some CPUs, we can see transient MSR values (which are
-* not present in _PSS), while CPU is doing some automatic
-* P-state transition (like TM2). Get the last freq set 
-* in PERF_CTL.
-*/
-   rdmsr(MSR_IA32_PERF_CTL, l, h);
-   clock_freq = extract_clock(l, cpu, 1);
-   }
+   clock_freq = point.param[POWEROP_CPU + cpu] * 1000;
 
+out:
set_cpus_allowed(current, saved_mask);
return clock_freq;
 }
@@ -610,6 +603,7 @@
unsigned intmsr, oldmsr, h, cpu = policy->cpu;
struct cpufreq_freqsfreqs;
cpumask_t   saved_mask;
+   struct powerop_pointpoint;
int retval;
 
if (centrino_model[cpu] == NULL)
@@ -650,14 +644,9 @@
 
cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
 
-   /* all but 16 LSB are "reserved", so treat them with
-  care */
-   oldmsr &= ~0x;
-   msr &= 0x;
-   oldmsr |= msr;
-
-   wrmsr(MSR_IA32_PERF_CTL, oldmsr, h);
-
+   point.param[POWEROP_CPU + cpu] = ((msr >> 8) & 0xff) * 100;
+   point.param[POWEROP_V + cpu] = ((msr & 0xff) * 16) + 700;
+   powerop_set_point(&point);
cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
 
retval = 0;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] PowerOP 1/3: PowerOP core

2005-08-09 Thread Todd Poynor

Geoff Levand wrote:

I'm wondering if anything could be gained by having the whole 
struct powerop_point defined in asm/powerop.h, and treat it as an 
opaque structure at this level.  That way, things other than just 
ints could be passed between the policy manager and the backend, 
although I guess that breaks the beauty of the simplicity and would 
complicate the sys-fs interface, etc.  I'm interested to hear your 
comments.


Making the "operating point" data structure entirely platform-specific 
should be OK.  There's a little value to having generic pieces handle 
some common chores (such as the sysfs interfaces), but even for integers 
decimal vs. hex formatting is nicer depending on the type of value. 
Since most values that have been managed using similar interfaces thus 
far have been flags, register values, voltages, etc. using integers has 
worked well and nicely simplified the platform backend, but if there's a 
need for other data types then should be doable.


Another point is that a policy manager would need to poll the system 
and/or get events and then act.  Your powerop work here only provides 
a (one way) piece of the final action.  Any comments regarding a more 
general interface?


What's discussed here is probably the bottommost layer of a power 
management software stack: to read and write the platform-specific 
system power parameters, optionally arranged into a mutually-consistent 
set called an "operating point".  Power policy management is a large, 
thorny topic that I wasn't trying to tackle now.


So far as kernel-to-userspace event notification goes (assuming the 
power policy manager is in userspace, which is certainly where I'd 
recommend), ACPI has a procfs-based communication channel but the 
kobject_uevent stuff looks like the way I'd go, and it's somewhere on my 
list to come up with a patch that does that as well.


If these general ideas of arbitrary platform power parameters and 
operating points are deemed worthy of continued consideration, I'll 
propose what I view is the next step: interfaces to create and activate 
operating points from userspace.


At that point it should be possible to write power policy management 
applications for systems that can benefit from this generalized notion 
of operating points: create the operating points that match the system 
usage models (in the case of many embedded systems, the system is some 
mode with different power/performance characteristics such as audio 
playback vs. mobile phone call in progress) and power needs (e.g., low 
battery strength vs. high strength) and activate operating points based 
on events received (new app running, low battery warning, etc.).


Any opinions on all that?  Thanks,

--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] PowerOP 0/3: System power operating point management API

2005-08-09 Thread Todd Poynor

Patrick Mochel wrote:

On Mon, 8 Aug 2005, Todd Poynor wrote:
(apologies for use of obsolete cpufreq mailing list address in my 
initial message.)

...

PowerOP is intended to leave all power
policy decisions to higher layers.


What do those higher layers look like? Do you have a userspace component
that uses this interface?


cpufreq is one example, it manages an abstraction of system 
power/performance levels based on cpu speed, which maps onto the 
PowerOP-level hardware capabilities in some fashion, and has both kernel 
and userspace components to manage the desired policy associated with 
this.  Regardless of whether this notion of configurable operating 
points would remain a separate layer from cpufreq or was more tightly 
integrated, the code to set these operating points can handle things 
such as setting validated voltage levels to match cpu speeds, etc.


For embedded systems, I am aware only of the Dynamic Power Management 
project, which you also mention and does indeed manage power policy 
based on the notions of power parameters and operating points.  The 
settings of these are configured entirely from userspace via sysfs, 
using shell scripts or convenience libraries that access the sysfs 
attributes.  A system designer chooses the operating points to be 
employed in the system based on the information from the processor or 
board vendor that describes validated, supported operating points and 
based on the characteristics of the system (how fast it needs to run 
while in use for different purposes and how much battery power can be 
spent for those purposes).


For example, a designer implementing a system based on an Intel XScale 
PXA27x processor can choose from among about 16 validated operating 
points listed in the most recent specification update.  Those operating 
points are comprised of register settings with inscrutable names such as 
CCCR[L], CCCR[2N], CLKCFG[T], CCCR[A], and two or three others.  A few 
of those operating points run the CPU at identical frequencies, but have 
other changes in memory clocking, system bus clocking, and the ability 
to quickly switch between certain cpu frequencies based on other 
properties of the platform (so-called "Turbo-mode" frequency scaling). 
A DPM- or PowerOP-based system can be configured with the subset of 
desired operating points and a particular operating point activated as 
needed.  The policy decision as to what operating point is appropriate 
to activate is a matter for custom code provided by the designer, 
tailored to their system.  It is also possible to write automated 
operating point selection algorithms based on such criteria as system 
busyness.



Who is using this code? Are there vendors that are already shipping
systems with this enabled?

Is this part of the DPM project? If so, what other components are left in
DPM?


The concepts and general Linux implementation of power parameters and 
operating points stems from the power-aware computing work done by 
Bishop Brock and Karthick Rajamani of IBM Research, and a somewhat 
different implementation is a part of the DPM project, which MontaVista 
(and reportedly others in the near future) does ship.  So far as I 
understand there are or soon will be mobile phones that use that code as 
the low- to mid-layers of the power management stack (the high-layer 
policy management is performed by a custom application of which I have 
no knowledge).


I mentioned in a previous email the next step of creating and activating 
operating points from userspace.  If that were in place, DPM would 
additionally consist primarily of:


1. Machine-specific backends to set operating points for the systems 
that DPM has been ported to.  If something like PowerOP is accepted into 
a broader community then that code would come along for the ride. 
XScale PXA27x and various ARM OMAPs are among the systems supported, as 
well as potentially others not yet making an appearance in open source.


2. DPM has further concepts of "operating state" (generally, whether the 
system is idle, processing interrupts, running a normal-power-usage 
task, running a background task without deadlines that can be assigned a 
low power/performance level, etc.) and the unfortunately-named "policy" 
that maps each operating state to an operating point, along with the 
code to switch in different operating points as the system switches 
operating states.  The "policy" is a bit of a misnomer; a system 
designer must create the desired operating points and decide upon the 
state -> point mappings appropriate, as well as make decisions on when 
to update the mappings based on external events, changing workloads, 
etc.  There are a few extra ramifications of modifying operating points 
in this fashion, including the need to handle such transitions while in 
interrupt context or in the idle loop, as well as a general concern for 
low overhead since switching may occur very freque

Re: PowerOP 0/3: System power operating point management API

2005-08-10 Thread Todd Poynor

Pavel Machek wrote:

Depending on the ability of the hardware to make software-controlled
power/performance adjustments, this may be useful to select custom
voltages, bus speeds, etc. in desktop/server systems.  Various embedded
systems have several parameters that can be set.  For example, an XScale
PXA27x could be considered to have six basic power parameters (mainly
cpu run mode and memory and bus dividers) that for the most part
should



This scares me a bit. Is table enough to handle this? I'm afraid that
table will get very large on systems that allow you to do "almost
anything".


Exhaustive tables for all combinations of possible parameters aren't 
expected (or practical for many systems as you note).  In practice, a 
subset of these possible operating points are created and activated over 
the lifetime of the system, where the subset is chosen by a system 
designer according to the needs of the particular system.  It's a matter 
for the higher-layer power management software to decide whether to have 
in-kernel tables of the possible operating points (as cpufreq does for 
various platforms) or whether to require userspace to create only the 
ones wanted (as does DPM).  There are cpufreq patches for PXA27x 
somewhere, for example, and in that case a subset of the supported 
operating points (and there are still only about 16 of those even for 
such a complicated piece of hardware) are represented in the kernel 
tables, choosing one of the possible combinations of memory/bus/etc. 
parameters for each unique cpu frequency.  Thanks,


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PowerOP 2/3: Intel Centrino support

2005-08-10 Thread Todd Poynor

Bruno Ducrot wrote:
 > ATM I'm wondering what are the pro for those patches wrt current cpufreq

infrastructure (especially cpufreq's notion of notifiers).

I still don't find a good one but I'm surely missing something obvious.


This is lower layer than cpufreq, intended for use by cpufreq (and 
potentially DPM and other frameworks, although moving embedded to use 
cpufreq is certainly an option) to do the actual hardware modifications 
at the lowest layer.  There are no notifiers et al because the 
higher-layer frameworks handle that portion of it.  More on that in a 
moment.


The main goal is to open up interfaces to operating points comprised of 
arbitrary power parameters, as an alternative to the "cpu frequency" 
parameter that cpufreq is designed around.  In the desktop/server world 
this may not be a high concern for two reasons: (1) although various 
folks do want to manage voltages separately (namely so-called 
"undervolting"), in most cases the safe answer is to stick with a 
voltage preprogrammed into cpufreq that matches what the silicon vendor 
as validated and supports; (2) various additional clocks, dividers, etc. 
potential power parameters might exist in the hardware, but interfaces 
for software control are often not readily available, and the associated 
parameters are typically set once by the BIOS at boot time (possibly 
with a menu to allow you override the defaults).


In the embedded world, however, silicon vendors typically produce 
hardware that are run by, and specify validated operating points that 
are comprised of, 6-7 basic parameters (clocks, voltages, dividers, 
etc.), and in many cases it is useful to manage all these parameters for 
additional battery savings at runtime (and these systems are often 
powered by much smaller batteries than the notebooks powered by 
desktop-class processors, and so more aggressive reductions in power 
consumption may be pursued).  Similar interfaces are already in use for 
managing such hardware, with benefits that largely depend on the 
capabilities of the hardware (in the case of the IBM PowerPC 405LP for 
which it was originally developed this was pretty extensively studied).


It is possible, of course, to choose operating points according to cpu 
frequency and hardcode the settings of other power parameters to match, 
restricting the choices for the other parameters.  There are, however, 
power consumption and performance (primarily latency of transition 
between operating points) advantages to being able to choose memory bus 
speeds, core PLL speeds, voltages (yes, some embedded platforms do allow 
this to be chosen within limits imposed by other clock speeds), etc. 
The embedded hardware designers could probably express this more 
eloquently than I can.


A secondary goal is to separate the code to perform platform-specific 
hardware modifications from the upper layers which by necessity carry 
various assumptions that may not be appropriate for all situations.  The 
notifiers are one example of this: DPM typically minimizes use of 
notifiers and requires notifiers to be able to operate in interrupt and 
in idle loop context, due to those additional situations in which DPM 
manages power parameters.


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: PowerOP 2/3: Intel Centrino support

2005-08-10 Thread Todd Poynor

Dave Jones wrote:

I'm glad I'm not the only one who feels he's too dumb to see
the advantages of this. The added complexity to expose something
that in all cases, we actually don't want to expose seems a little
pointless to me.

For example, most of the x86 drivers, if you set a speed, and then
start fiddling with the voltage, you can pretty much guarantee
you'll crash within the next few seconds.  They have to match,
or at the least, be within a very small margin.


I've attempted to extoll the benefits of adding these interfaces in 
previous emails, and if after that it still seems mystifying why anybody 
would want to do this then I'll take the heat for doing a lousy job of 
extolling.  I've also admitted that it is primarily of use in 
embedded-specific hardware, and of less use in x86 and in desktop/server 
usage for various reasons (unless it turns out there's some fantastic 
savings to be had in modifying common x86 bus speeds independently of 
cpu speed, which seems unlikely).  In the case of x86, undervolting is 
in practice by some folks, but yes it is risky and support for it in the 
basic interfaces probably shouldn't be a high priority.



Given how long its taken us to get sane userspace parts for cpufreq,
I'm loathe to changing the interfaces yet again unless there's
a clear advantage to doing so, as it'll take at least another 12 months
for userspace to catch up.


Just to be clear, there are no cpufreq userspace interface changes 
required by this, it simply occupies the bottom layer of code that 
modifies platform power registers etc.  The same speed/policy/governor 
etc. interfaces are used to specify the cpu speed.  Interfaces to the 
power parameters can optionally be used for diagnostic purposes, and in 
the near future I'll propose alternative interfaces for setting entire 
operating points, but the existing cpufreq interfaces will work just 
fine regardless.


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] PowerOP 1/3: PowerOP core

2005-08-10 Thread Todd Poynor

Daniel Petrini wrote:


I'd like to have an idea of how the powerop would evolve to address:

a) exporting all operating points to sysfs - that would be useful for
a policy manager in user space, or the user policy will already be
aware of the ops?


For different usage models I'd expect to see both situations.  In one 
situation (classic desktop/server), a canned set of operating points may 
bee preconfigured (for example, the cpufreq support for Centrino), and 
automated software or an interactive user knows how to select an 
appropriate point (like the existing cpufreq algorithms such as "choose 
the lowest (powersave) point").  In the other situation (classic 
embedded), a system designer has chosen a number of useful operating 
points and configures software to choose an appropriate point based on 
system state ("the MPEG4 video app is running"), and that software knows 
what points are available and in what situations the points should apply.



b) Constraints: if I would like to change to a op and such a
transition is not allowed in hardware, what part of the software will
test it? The set/get powerop functions, the higher layers or even some
lower layer (don't know if expected) ?


Any sort of policy on what operating points should be allowed is 
targeted for a higher layer than PowerOP.  As you mention, there are 
situations where device needs constrain the operating points that can be 
set (for example, a PXA27x has modes that disable PLLs and/or run clocks 
at low speeds such that certain devices, if powered on, will wedge). 
Intelligence on what to do about that situation, if anything, can be 
placed in the high-layer power policy management application (this is 
probably the best answer), and/or the mid-layer power management 
framework code can also attempt to enforce these rules.


Stepping up on my soapbox for a moment, it is always recommended to have 
the power policy management application be aware of what the constraints 
are on operating points and set an appropriate power policy based on 
that information.  This may entail sending event messages from the 
drivers to the power policy manager app, to coordinate changes in device 
state with changes in policy.  If the constraints change very rapidly 
then it may make sense to preconfigure the rules on which operating 
point to choose in the kernel to avoid userspace interactions (and this 
is what the DPM device constraints feature is intended to do).  Relying 
on the power management mid-layer to block attempts to set an 
incompatible operating point is not recommended, but can function 
reasonably well if the mid layer is smart enough to know what the next 
best choice is and set that operating point instead.


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PowerOP 2/3: Intel Centrino support

2005-08-12 Thread Todd Poynor

Jordan Crouse wrote:


When it comes
to embedded power management concepts, a consistant theme is that people
often question the usefulness, redundancy or complexity of a solution.  This
is perfectly understandable, since such a huge majority of the power
management experts and users are concentrating intently on x86 desktops and
servers.  


It also occurs to me that another reason for the disconnect between x86 
desktop/server and embedded on this point is the lack of ACPI.  We want 
to do things analogous to the power management performed by ACPI, but 
entirely in Linux, so we need to expose some of those low-level machine 
resources to our power management software.  In many cases those power 
management decisions do not revolve around the question of the MHz at 
which the CPU is to run.  Embedded Linux system power management exists 
for many of the same reasons ACPI exists.



--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] PowerOP 1/3: PowerOP core

2005-08-12 Thread Todd Poynor

[EMAIL PROTECTED] wrote:

How well would _this_ notion of an operating point scale up?

I have this feeling that it's maybe better attuned to "scale down"
sorts of problems (maybe cell phones) than to a big NUMA box.  I can
see how a batch scheduled server might want to fire up only enough
components to run the next simulation, but I wonder if maybe systems
dynamically managing lots of resources might not be better off with
some other model ... where userspace makes higher level decisions,
and the kernel is adaptive within a potentially big solution space.
(Likewise, maybe some of the smaller systems would too.)


If I understand correctly, that does seem to describe how such systems 
are used today: out of a potentially large number of choices (and some 
embedded SOCs do have a good variety of clocking, core voltage, and 
power domain choices), configure only those useful for the problem at 
hand into the kernel and then userspace gives marching orders to the 
kernel to activate whatever's appropriate according to system-specific 
logic in userspace.



  - Why have any parsing at all?  It's opaque data; so insist that
the kernel just get raw bytes.  That's the traditional solution,
not having the kernel parse arrays of integers.

  - Why try to standardize a data-based abstraction at all?  Surely
it'd be easier to use modprobe, and have it register operating
points that include not just the data, but its interpretation.


Configuring the definitions of desired operating points (and updating 
these on-the-fly) from userspace has its advantages, but inserting into 
the kernel as a module should work fine and avoids some awkward 
userspace interfaces.  In the current code it is intended that both 
in-kernel interfaces (without parsing) and userspace interfaces are 
available, but I think non-diagnostic userspace interfaces could easily 
be dropped.  One of the nice things about having it done from userspace 
is that apps can be in charge of configuring operating points and 
activating those operating points in response to changes in system state 
(with reduced chance of mismatched app + module).  And since it can be 
done in userspace it's nice to simplify the kernel that way, which some 
folks feel strongly about.  But not a big deal to me.


Standard data structures have small benefits for implementing some code 
once instead of per-platform.  Another possibility in addition to the 
configuration or diagnostic interfaces is the concept of "constraints" 
on operating points according to device needs that you and others have 
alluded to: the constraints can be expressed in a form that can be 
checked by platform-independent code ("check the value at this power 
parameter index in the array for this range of integer values").  Also 
not a big deal.


All in all, it sounds like the next version of this should not have a 
userspace configuration interface and should not provide a 
platform-independent structure for the operating points.  The operating 
point get and set functions are in machine-specific code and take an 
operating point argument defined in header files shared between the 
machine-specific code and whatever kernel code is creating and 
activating operating points.  Any diagnostic interfaces need to be 
machine-dependent as well.


Now that the operating points are created without a generic layer or 
structure, if one wants a generic interface to set (activate) one of 
those operating points, probably identified by name, from userspace then 
a little extra is needed.  Can worry about that later.



  - If those numbers are needed, having single-valued sysfs attributes
(maybe /sys/power/runstate/policy_name/XXX) would be preferable
to relying on getting position right within a multivalued input.


In case it helps, the code thus far and proposed interfaces use 
single-valued sysfs interfaces.  Since you also mentioned:



It's easier for me to see how "echo policy_name > /sys/power/runtime"
would scale up and down (given pluggable policy_name components!)
than "echo 0 35 98 141 66 -3 0x7efc0 > /sys/power/foo" would.


and:


That'd also be less error prone than "whoops,
there wasn't supposed to be a space between 35 and 98" or "darn, I
switched the 141 and 66 around again".


It may be worth pointing out that these interfaces wouldn't do that (a 
two level hierarchy of operating point name and single power parameter 
attribute would be used, and the ordering into the array is handled by 
the generic PowerOP core, which knows how to associate parameter names 
with array indices).  Older versions of DPM did use interfaces similar 
to what you describe, in case you've got that in mind.  They weren't 
intended to be used interactively.


Thanks,


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel&q

Re: PowerOP 0/3: System power operating point management API

2005-08-16 Thread Todd Poynor

Dominik Brodowski wrote:


First, the table interface you suggest is ugly. If there's indeed the need for
such an abstraction, I'd favour something like


I'm planning to adopt the previous suggestions of an opaque data 
structure and stop trying to have any generic structure to it.  I'll try 
to leave dependency checking etc. to the upper layers as much as 
possible, since platforms vary greatly in this and so do the needs of 
different PM s/w stacks.



Secondly, you do not adress the cross-relationships between operation points
correctly. If you change the CPU frequency, you may have to switch other
(memory, video) settings; you might even have to validate the frequency
settings for these or even additional reasons (thermal and battery reasons -
ACPI _PPC).


This lowest layer basically assumes that upper-layer software has 
created an appropriate operating point (for example, in DPM we pretty 
much require a system designer to create operating points that match the 
h/w specs and don't go to great lengths to encode rules about this), 
and/or will call driver notifiers etc. as needed to adapt to the 
changes.  Although there may be some sanity checking appropriate at the 
PowerOP level, cpufreq, DPM, etc. can for the most part continue to 
handle the larger issues of how valid operating points are constructed, 
driver callbacks, etc.  If you do want to handle various dependencies at 
the PowerOP layer then there's nothing that prevents that, but PM 
frameworks tend to embody assumptions about how frequently operating 
points will change and in what contexts (interrupt, idle...), and this 
can influence the code for such things.



Thirdly, who is to decide on the power management settings? The first and
intuitive answer is the kernel. Therefore, kernel-space cpufreq governors
exist. Only under rare circumstances, you want full userspace control --
that's what the userspace cpufreq governor is for.


Also something left to the existing upper layers; PowerOP isn't intended 
to handle any of that.  In the embedded space we usually let the system 
designer choose operating points supported by their h/w vendor and that 
match their particular system states (hardware enabled at any point in 
time, type and power/performance needs of software currently running). 
We do recommend that a userspace power policy manager be the component 
in charge of PM settings, based on messages from drivers and other apps 
on the state of the system.  And so that userspace component activates 
the operating point (or set of operating points in the case of DPM) 
appropriate for current state.



Foruthly, the code duplication which your implementation leads to is obvious
for the speedstep-centrino case. 


We could move the tables of valid cpu speeds and corresponding voltages 
down to the PowerOP level, and there would probably be little 
duplication at that point (in fact, with the current patch there's not a 
lot of duplication since the actual MSR access was moved to PowerOP and 
PowerOP contains little else, but both levels know how to understand the 
MSR format, and a more aggressive port to PowerOP could do away with that).


Your suggestions of changes to cpufreq governors and policies to handle 
governance of non-cpu-speed parameters sound interesting, and I'd be 
happy to help figure out what to do about those vs. the lower machine 
access layer I've discussed up until now.  I'll think more about this 
real soon now.  Thanks,


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PowerOP 0/3: System power operating point management API

2005-08-16 Thread Todd Poynor

Dominik Brodowski wrote:

A small add-on:

We need to make sure that we're capable of handling smart CPUs like Transmeta
Crusoe processors in a sane way. This means



b)  Setting of "values"



is optional if the hardware itself can be set to a min/max value (step a
above in previous mail).


Although I haven't looked into the Crusoe processor support, it may be 
that there is a different set of power parameters, not cpu speed 
directly, that are appropriate to manage on these platforms (after a 
brief look, seems to be a range of frequencies and some sort of flags)? 
 If so, these sorts of machine-specific power parameters are what 
PowerOP is trying to address, allowing management of the underlying 
machine-specific stuff to upper layers that may be presenting an 
abstracted view of power/performance, such as CPU speed or speed ranges, 
to the user.  Thanks,


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Resend:[RFC/Patch] Robust futexes

2005-07-05 Thread Todd Kneisel

This is a resend of my patch to add robust futex support to the existing
sys_futex system call. The patch applies to 2.6.12. Any comments or
discussion will be welcome.

Changes since my last posted version:
- Applies to 2.6.12, was 2.6.12-rc6
- Added config option CONFIG_ROBUST_FUTEX, depends on existing CONFIG_FUTEX
  and defaults to no.
- Commented functions, using kernel-doc style comments
- Cleaned up some CodingStyle violations

Sys_futex provides operations on futexes that can be local to a process, or
shared between processes by placing the futex in shared memory. However, if
a process terminates while it owns a locked shared futex, any other
processes that use the same futex will hang.

With this patch, if a process terminates while it owns a locked robust
futex, the ownership of the lock will be transferred to the next waiting
process, the waiting process will be awakened and will receive the status
EOWNERDEAD. If there is no waiting process at the time of termination, then
the next process that attempts to wait will receive ownership of the futex
and the EOWNERDEAD status. The new owner can recover the futex and unlock it,
in which case the futex can continue to be used. If the new owner only
unlocks the futex, then the futex becomes unrecoverable and any attempt to
use the futex will get the status ENOTRECOVERABLE.

The patch does not change the existing sys_futex operations on non-robust
mutexes, so the patch should not affect existing code that uses futexes.
New op codes are added to the sys_futex system call for use by code that
requires robust futexes. I have a patch to glibc and the nptl thread
library that uses robust futexes. We are in the process of getting
copyright assignments to the Free Software Foundation so that we can
submit the glibc and nptl patches.

Robust futexes have a different format from the non-robust futexes.
The non-robust futexes can have the values 0 (unlocked), 1 (locked) or
2 (locked with waiters). In a robust futex the high bit indicates if there
are processes waiting on the futex, the next bit indicates if the owning
process died, and the next bit indicates if the futex is not recoverable.
The rest of the futex contains the pid of the task that owns the futex lock
or zero if the futex is not locked.

Signed-off-by: Todd Kneisel <[EMAIL PROTECTED]>

 fs/dcache.c   |3
 fs/inode.c|2
 include/linux/fs.h|4
 include/linux/futex.h |   20 +
 init/Kconfig  |   11
 kernel/exit.c |3
 kernel/futex.c|  686 +-
 7 files changed, 728 insertions(+), 1 deletion(-)


diff -uprN -X dontdiff linux-2.6.12/fs/dcache.c linux-2.6.12-todd/fs/dcache.c
--- linux-2.6.12/fs/dcache.c2005-06-17 12:48:29.0 -0700
+++ linux-2.6.12-todd/fs/dcache.c   2005-06-20 10:44:40.738407891 -0700
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 

 /* #define DCACHE_DEBUG 1 */

@@ -158,6 +159,8 @@ repeat:
return;
}

+   futex_free_robust_list(dentry->d_inode);
+
/*
 * AV: ->d_delete() is _NOT_ allowed to block now.
 */
diff -uprN -X dontdiff linux-2.6.12/fs/inode.c linux-2.6.12-todd/fs/inode.c
--- linux-2.6.12/fs/inode.c 2005-06-17 12:48:29.0 -0700
+++ linux-2.6.12-todd/fs/inode.c2005-06-20 15:08:36.428029628 -0700
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * This is needed for the following functions:
@@ -202,6 +203,7 @@ void inode_init_once(struct inode *inode
INIT_LIST_HEAD(&inode->i_data.i_mmap_nonlinear);
spin_lock_init(&inode->i_lock);
i_size_ordered_init(inode);
+   futex_init_inode(inode);
 }

 EXPORT_SYMBOL(inode_init_once);
diff -uprN -X dontdiff linux-2.6.12/include/linux/fs.h 
linux-2.6.12-todd/include/linux/fs.h
--- linux-2.6.12/include/linux/fs.h 2005-06-17 12:48:29.00000 -0700
+++ linux-2.6.12-todd/include/linux/fs.h2005-06-20 14:56:16.977970185 
-0700
@@ -350,6 +350,10 @@ struct address_space {
spinlock_t  private_lock;   /* for use by the address_space 
*/
struct list_headprivate_list;   /* ditto */
struct address_space*assoc_mapping; /* ditto */
+#ifdef CONFIG_ROBUST_FUTEX
+   struct list_headrobust_list;/* list of robust futexes */
+   struct semaphorerobust_sem; /* protect list of robust 
futexes */
+#endif
 } __attribute__((aligned(sizeof(long;
/*
 * On most architectures that alignment is already the case; but
diff -uprN -X dontdiff linux-2.6.12/include/linux/futex.h 
linux-2.6.12-todd/include/linux/futex.h
--- linux-2.6.12/include/linux/futex.h  2005-06-17 12:48:29.00000 -0700
+++ linux-2.6.12-todd/include/linux/futex.h 2005-06-20 16:43:46.293122664 
-0700
@@ -1,6 +1,8 @@
 #ifndef _LINUX_FUTEX_H
 #define _LINUX_FUTEX_H

+#include 
+
 /* Second argument to futex syscall */



Re: Resend:[RFC/Patch] Robust futexes

2005-07-06 Thread Todd . Kneisel
I've made relatively minor changes to the glibc side since posting
the patch to the robust mutexes list.
- added EOWNERDEAD and ENOTRECOVERABLE to the glibc error reporting
  mechanism, so that strerror() works.
- added function prototypes to pthread.h for 
pthread_mutexattr_getrobust_np,
  pthread_mutexattr_setrobust_np, and pthread_mutex_consistent_np.
- defined a constant for accessing the pid that's stored in the futex.
- Changed copyrights from Bull SA (the French division of our company)
  to Bull HN (The U.S. division).

We expect to receive the FSF contracts for copyright assignment any
day now. If anyone would like to see the glibc changes, I can provide
the patch under the Bull HN copyright. The patch applies to glibc-2.3.4.

Todd.


Daniel Walker <[EMAIL PROTECTED]> wrote on 07/05/2005 05:00:32 PM:

> 
> You might want to CC Andrew Morton , and Rusty Russell.
> 
> What is the status of the glibc side of this?
> 
> Daniel
> 
> 
> On Tue, 2005-07-05 at 16:11 -0700, Todd Kneisel wrote:
> > This is a resend of my patch to add robust futex support to the 
existing
> > sys_futex system call. The patch applies to 2.6.12. Any comments or
> > discussion will be welcome.
> > 
> > Changes since my last posted version:
> > - Applies to 2.6.12, was 2.6.12-rc6
> > - Added config option CONFIG_ROBUST_FUTEX, depends on existing 
CONFIG_FUTEX
> >and defaults to no.
> > - Commented functions, using kernel-doc style comments
> > - Cleaned up some CodingStyle violations
> > 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PowerOP Take 2 1/3: ARM OMAP1 platform support

2005-08-24 Thread Todd Poynor
PowerOP support for OMAP1 platforms.  Currently handles these power
parameters:

   dpllmult DPLL_CTL reg PLL MULT bits
   dplldiv  DPLL_CTL reg PLL DIV bits
   armdiv   ARM_CKCTL reg ARMDIV bits
   dspdiv   ARM_CKCTL reg DSPDIV bits
   tcdivARM_CKCTL reg TCDIV bits
   lowpwr   1 = assert ULPD LOW_PWR, voltage scale low

Other parameters such as DSPMMUDIV, LCDDIV, and ARM_PERDIV might also be
useful.

Example usage will be shown with a follow-on sysfs UI patch.

Index: linux-2.6.13-rc4/arch/arm/mach-omap1/powerop.c
===
--- /dev/null
+++ linux-2.6.13-rc4/arch/arm/mach-omap1/powerop.c
@@ -0,0 +1,157 @@
+/*
+ * PowerOP support for OMAP1
+ *
+ * Based on DPM OMAP code by Matthew Locke, Dmitry Chigirev, Vladimir
+ * Barinov, and Todd Poynor.
+ *
+ * 2005 (c) MontaVista Software, Inc. This file is licensed under
+ * the terms of the GNU General Public License version 2. This program
+ * is licensed "as is" without any warranty of any kind, whether express
+ * or implied.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+/* ARM_CKCTL bit shifts */
+#define CKCTL_PERDIV_OFFSET0
+#define CKCTL_LCDDIV_OFFSET2
+#define CKCTL_ARMDIV_OFFSET4
+#define CKCTL_DSPDIV_OFFSET6
+#define CKCTL_TCDIV_OFFSET 8
+#define CKCTL_DSPMMUDIV_OFFSET 10
+
+#define DPLL_CTL_MASK  0xfe0
+
+#define ULPD_MIN_MAX_REG (1 << 11)
+#define ULPD_DVS_ENABLE  (1 << 10)
+#define ULPD_LOW_PWR_REQ (1 << 1)
+#define LOW_POWER (ULPD_MIN_MAX_REG | ULPD_DVS_ENABLE | ULPD_LOW_PWR_REQ | \
+  ULPD_LOW_PWR_EN)
+
+int
+powerop_get_point(struct powerop_point *point)
+{
+   unsigned long flags;
+   int dpll_ctl, arm_cktl;
+
+   local_irq_save(flags);
+   dpll_ctl = omap_readw(DPLL_CTL);
+   arm_cktl = omap_readw(ARM_CKCTL);
+
+   point->dpllmult = dpll_ctl >> 7 & 0x1f;
+   point->dplldiv = dpll_ctl >> 5 & 0x3;
+   point->armdiv = arm_cktl >> CKCTL_ARMDIV_OFFSET & 0x3;
+   point->dspdiv = arm_cktl >> CKCTL_DSPDIV_OFFSET & 0x3;
+   point->tcdiv = arm_cktl >> CKCTL_TCDIV_OFFSET & 0x3;
+   point->lowpwr = (omap_readw(ULPD_POWER_CTRL) & (ULPD_LOW_PWR_REQ)) ?
+   1 : 0;
+   local_irq_restore(flags);
+   return 0;
+}
+
+static void scale_dpll(int dpll_ctl)
+{
+   int i;
+
+   omap_writew((omap_readw(DPLL_CTL) & ~DPLL_CTL_MASK) | dpll_ctl,
+   DPLL_CTL);
+
+   for (i = 0; i < 0x1FF; i++)
+   nop();
+
+   /*
+* Wait for PLL relock.
+*/
+
+   while ((omap_readw(DPLL_CTL) & 0x1) == 0);
+}
+
+static void set_low_pwr(int lowpwr)
+{
+   int cur_lowpwr;
+
+   if (lowpwr == -1)
+   return;
+
+   cur_lowpwr = (omap_readw(ULPD_POWER_CTRL) & (ULPD_LOW_PWR_REQ)) ?
+   1 : 0;
+
+   if (cur_lowpwr != lowpwr) {
+   if (lowpwr)
+   omap_writew(omap_readw(ULPD_POWER_CTRL) | LOW_POWER,
+   ULPD_POWER_CTRL);
+   else
+   omap_writew(omap_readw(ULPD_POWER_CTRL) & ~LOW_POWER,
+   ULPD_POWER_CTRL);
+   }
+}
+
+int
+powerop_set_point(struct powerop_point *point)
+{
+   int dpll_ctl = 0;
+   int dpll_mod = 0;
+   int arm_ctl = 0;
+   int arm_msk = 0;
+   int cur_dpll_ctl;
+   unsigned long flags;
+
+   if ((point->dpllmult != -1) && (point->dplldiv != -1)) {
+   dpll_ctl = (point->dpllmult << 7) |
+   (point->dplldiv << 5);
+   dpll_mod = 1;
+   }
+
+   if (point->armdiv != -1) {
+   arm_ctl |= (point->armdiv << CKCTL_ARMDIV_OFFSET);
+   arm_msk |= (3 << CKCTL_ARMDIV_OFFSET);
+   }
+
+   if (point->dspdiv != -1) {
+   arm_ctl |= (point->dspdiv << CKCTL_DSPDIV_OFFSET);
+   arm_msk |= (3 << CKCTL_DSPDIV_OFFSET);
+   }
+
+   if (point->tcdiv != -1) {
+   arm_ctl |= (point->tcdiv << CKCTL_TCDIV_OFFSET);
+   arm_msk |= (3 << CKCTL_TCDIV_OFFSET);
+   }
+
+   local_irq_save(flags);
+   cur_dpll_ctl = omap_readw(DPLL_CTL) & DPLL_CTL_MASK;
+
+   if (dpll_mod && (dpll_ctl < cur_dpll_ctl))
+   scale_dpll(dpll_ctl);
+
+   if (arm_msk != 0)
+   omap_writew((omap_readw(ARM_CKCTL) & ~arm_msk) | arm_ctl,
+   ARM_CKCTL);
+
+   if (dpll_mod && (dpll_ctl > cur_dpll_ctl))
+   scale_dpll(dpll_ctl);
+
+   set_low_pwr(point->lowpwr);
+   local_irq_restore(flags);
+   return 0;
+}
+
+EXPORT_SYMBOL_GPL(powerop_get_point);
+EXPORT_SYMBOL_GPL(powerop_se

PowerOP Take 2 0/3 Intro

2005-08-24 Thread Todd Poynor
PowerOP is a system power parameter management API submitted for
discussion.  PowerOP writes and reads power "operating points",
comprised of arbitrary values, called power parameters, that correspond
to registers, clocks, dividers, voltage regulators, etc. that may be
modified to set a basic power/performance point for the system.
Higher-layer power management software passes a platform-specific struct
of power parameters to a backend that makes the requested adjustments.

PowerOP can be thought of as a layer below cpufreq that actually
accesses the hardware to make cpu frequency, voltage, core bus, and
perhaps other modifications to set a power point.  The main goal is to
open up interfaces to operating points comprised of arbitrary power
parameters, as an alternative to the "cpu frequency" parameter that
cpufreq is designed around.  It pushes the hardware-level power
parameter management down to a level that can be shared with other power
management policy frameworks that deal with entire operating points as
the basic unit of system power management for reasons described below,
and that 

The new layer is intended to leave all power policy decisions, and
various other power management chores such as driver notification, to
higher layers in a power management software stack.

This work is aimed in part at supporting embedded systems, which often
have several parameters that can be set and the cpu frequency
abstraction specifies only part of the operating point that may be
managed from software.  For example, an XScale PXA27x could be
considered to have six basic power parameters (mainly cpu run mode and
memory and bus dividers) that for the most part should be set in tandem
to known good sets of values as validated by the silicon vendor.
Embedded systems typically handle more hardware clock manipulation
directly in Linux than do, for example, desktop/server systems based on
ACPI interfaces.  Desktop/server systems may also benefit from the
ability to select custom voltages, bus speeds, etc., although deviating
from values validated by the hardware vendor is risky and controversial.

An optional sysfs interface to create and activate operating points is
also submitted for discussion.  This interface could be used in an
actual power management stack, or at least can serve as a starting point
for discussing how to manage operating points at the next higher layer
in the power management soctware stack.

cpufreq is another possible interface. PowerOP can fit below cpufreq to
make cpu frequency, voltage, core bus, and perhaps other modifications
to set a power point, leaving cpufreq to manage the interfaces based
around the "cpu frequency" abstraction, the policies and governors that
select the frequency, its notifiers, and so forth.  An example hooking
up support for one cpufreq platform to PowerOP has been previously
submitted.  Some discussion regarding incorporating an expanded set of
power parameters into the cpufreq interfaces has also taken place.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PowerOP Take 2 3/3: OMAP1 sysfs UI

2005-08-24 Thread Todd Poynor
A PowerOP sysfs UI backend for OMAP1 platforms, adding sysfs attributes
and show/store functions that correspond to the platform power
parameters.  

An example usage on an OMAP1510 Innovator which does not have voltage
scaling and ignoring the DSP:

# echo -n 168-168-84 > /sys/powerop/new # DPLL 168MHz, ARM 168MHz, TC 84MHz
# echo -n 14 > /sys/powerop/168-168-84/dpllmult
# echo -n 0 > /sys/powerop/168-168-84/dplldiv
# echo -n 0 > /sys/powerop/168-168-84/armdiv
# echo -n 1 > /sys/powerop/168-168-84/tcdiv
# echo -n -1 > /sys/powerop/168-168-84/dspdiv
# echo -n -1 > /sys/powerop/168-168-84/lowpwr

# echo -n 60-60-60 > /sys/powerop/new # DPLL 60MHz, ARM 60MHz, TC 60MHz
# echo -n 5 > /sys/powerop/60-60-60/dpllmult
# echo -n 0 > /sys/powerop/60-60-60/dplldiv
# echo -n 0 > /sys/powerop/60-60-60/armdiv
# echo -n 0 > /sys/powerop/60-60-60/tcdiv
# echo -n -1 > /sys/powerop/60-60-60/dspdiv
# echo -n -1 > /sys/powerop/60-60-60/lowpwr

# echo -n 60-60-60 > /sys/powerop/active
# cat /sys/powerop/hw/dpllmult
5

The lower-powered operating point consumes about .07A less power using
my .config (it should be noted this is not a proper low-power evaluation
platform).


Index: linux-2.6.13-rc4/arch/arm/mach-omap1/Kconfig
===
--- linux-2.6.13-rc4.orig/arch/arm/mach-omap1/Kconfig
+++ linux-2.6.13-rc4/arch/arm/mach-omap1/Kconfig
@@ -155,3 +155,5 @@ source "arch/arm/plat-omap/dsp/Kconfig"
 config POWEROP
bool "PowerOP Platform Core Power Management for OMAP1"
help
+
+source "drivers/powerop/Kconfig"
Index: linux-2.6.13-rc4/arch/arm/mach-omap1/powerop.c
===
--- linux-2.6.13-rc4.orig/arch/arm/mach-omap1/powerop.c
+++ linux-2.6.13-rc4/arch/arm/mach-omap1/powerop.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -144,13 +145,141 @@ powerop_set_point(struct powerop_point *
 EXPORT_SYMBOL_GPL(powerop_get_point);
 EXPORT_SYMBOL_GPL(powerop_set_point);
 
+#ifdef CONFIG_POWEROP_SYSFS
+
+ssize_t lowpwr_show(struct powerop_point *op, char * buf)
+{
+   return sprintf(buf,"%d\n",op->lowpwr);
+}
+
+ssize_t lowpwr_store(struct powerop_point *op, const char * buf, size_t count)
+{
+   int error;
+
+   if ((error = sscanf(buf, "%d", &op->lowpwr)) == 1)
+   return count;
+   return error;
+}
+
+powerop_param_attr(lowpwr);
+
+ssize_t dpllmult_show(struct powerop_point *op, char * buf)
+{
+   return sprintf(buf,"%d\n",op->dpllmult);
+}
+
+ssize_t dpllmult_store(struct powerop_point *op, const char * buf, size_t 
count)
+{
+   int error;
+
+   if ((error = sscanf(buf, "%d", &op->dpllmult)) == 1)
+   return count;
+   return error;
+}
+
+powerop_param_attr(dpllmult);
+
+ssize_t dplldiv_show(struct powerop_point *op, char * buf)
+{
+   return sprintf(buf,"%d\n",op->dplldiv);
+}
+
+ssize_t dplldiv_store(struct powerop_point *op, const char * buf, size_t count)
+{
+   int error;
+
+   if ((error = sscanf(buf, "%d", &op->dplldiv)) == 1)
+   return count;
+   return error;
+}
+
+powerop_param_attr(dplldiv);
+
+ssize_t armdiv_show(struct powerop_point *op, char * buf)
+{
+   return sprintf(buf,"%d\n",op->armdiv);
+}
+
+ssize_t armdiv_store(struct powerop_point *op, const char * buf, size_t count)
+{
+   int error;
+
+   if ((error = sscanf(buf, "%d", &op->armdiv)) == 1)
+   return count;
+   return error;
+}
+
+powerop_param_attr(armdiv);
+
+ssize_t dspdiv_show(struct powerop_point *op, char * buf)
+{
+   return sprintf(buf,"%d\n",op->dspdiv);
+}
+
+ssize_t dspdiv_store(struct powerop_point *op, const char * buf, size_t count)
+{
+   int error;
+
+   if ((error = sscanf(buf, "%d", &op->dspdiv)) == 1)
+   return count;
+   return error;
+}
+
+powerop_param_attr(dspdiv);
+
+ssize_t tcdiv_show(struct powerop_point *op, char * buf)
+{
+   return sprintf(buf,"%d\n",op->tcdiv);
+}
+
+ssize_t tcdiv_store(struct powerop_point *op, const char * buf, size_t count)
+{
+   int error;
+
+   if ((error = sscanf(buf, "%d", &op->tcdiv)) == 1)
+   return count;
+   return error;
+}
+
+powerop_param_attr(tcdiv);
+
+static struct attribute *powerop_sysfs_param_attrs[] = {
+   &lowpwr_attr.attr,
+   &dpllmult_attr.attr,
+   &dplldiv_attr.attr,
+   &armdiv_attr.attr,
+   &dspdiv_attr.attr,
+   &tcdiv_attr.attr,
+   NULL,
+};
+
+static void powerop_omap1_sysfs_init()
+{
+   powerop_sysfs_register(powerop_sysfs_param_attrs);
+}
+
+static void powerop_omap1_sysfs_exit()
+{
+   powerop_sysfs_unregister(powerop_sysfs_param_attrs);
+}
+#else /* CONFIG_POWEROP_SYSFS */
+static void powerop_omap1_sysfs_init()
+{
+}
+static void powerop_omap1_sysfs_exit()
+{
+}
+#endif /* CONFIG_POWEROP_SYSFS */
+
 static int __init powerop_init(void)
 {
+   powerop_omap1_sysfs_init();
return 0;
 }
 
 static

PowerOP Take 2 2/3: sysfs UI core

2005-08-24 Thread Todd Poynor
A sysfs interface for PowerOP that allows operating points to be created
and activated from userspace.

The platform-specific backend provides the code to read and write sysfs
attributes for each power parameter; the core sysfs interface has no
knowledge of the struct powerop_point contents.  This interface could be
used independently of an integrated cpufreq or DPM interface.  It is not
an integral part of PowerOP and is provided in part to facilitate
discussion and experimentation with PowerOP, but could serve as a basis
for a basic userspace power policy management stack.

Operating points are created by writing the name of the operating point
to /sys/powerop/new.  This may be a job for configfs.
/sys/powerop// will contain an attribute for each power parameter
that may be written to set the associated parameter for the new
operating point.  An operating point may be activated by writing its
name to /sys/powerop/active.  The hardware power parameters currently
set may be read and written via /sys/powerop/hw/, a special operating
point that reads and writes parameter attribute values immediately,
primarily for diagnostic purposes.

Buried in this interface is also the notion of a registry of "named
operating points", allowing operating points created by some other
interface (such as cpufreq or loading a module with the definitions as
suggested previously by David Brownell) to be activated from userspace
via /sys/powerop/active.

Changing operating points (or other power-policy-based information that
triggers changes in operating points) from userspace is a common
scenario in some embedded systems, where power/performance needs change
based on system state changes that are coordinated by a userspace
process (for example, a mobile phone starting a multimedia application).

Index: linux-2.6.13-rc4/drivers/powerop/Kconfig
===
--- /dev/null
+++ linux-2.6.13-rc4/drivers/powerop/Kconfig
@@ -0,0 +1,4 @@
+config POWEROP_SYSFS
+   bool "  Enable PowerOP sysfs interface"
+   depends on POWEROP && SYSFS
+   help
Index: linux-2.6.13-rc4/drivers/powerop/Makefile
===
--- /dev/null
+++ linux-2.6.13-rc4/drivers/powerop/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_POWEROP_SYSFS)+= powerop_sysfs.o
Index: linux-2.6.13-rc4/drivers/powerop/powerop_sysfs.c
===
--- /dev/null
+++ linux-2.6.13-rc4/drivers/powerop/powerop_sysfs.c
@@ -0,0 +1,398 @@
+/*
+ * PowerOP sysfs UI
+ *
+ * Author: Todd Poynor <[EMAIL PROTECTED]>
+ *
+ * 2005 (c) MontaVista Software, Inc. This file is licensed under
+ * the terms of the GNU General Public License version 2. This program
+ * is licensed "as is" without any warranty of any kind, whether express
+ * or implied.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+int powerop_register_point(const char *id, struct powerop_point *point);
+int powerop_select_point(const char *id);
+
+struct namedop {
+   struct kobject kobj;
+   struct powerop_point *point;
+   struct list_head node;
+   struct completion released;
+};
+
+#define to_namedop(_kobj) container_of(_kobj,\
+   struct namedop,kobj)
+
+static DECLARE_MUTEX(namedop_list_mutex);
+static struct list_head namedop_list;
+static struct namedop *activeop;
+
+struct sysfsop {
+   struct powerop_point point;
+   struct list_head node;
+};
+
+static DECLARE_MUTEX(sysfsop_list_mutex);
+static struct list_head sysfsop_list;
+
+static struct powerop_point *hwop;
+
+#define powerop_attr(_name) \
+static struct subsys_attribute _name##_attr = { \
+   .attr   = { \
+   .name = __stringify(_name), \
+   .mode = 0644,   \
+   .owner = THIS_MODULE,   \
+   },  \
+   .show   = _name##_show, \
+   .store  = _name##_store,\
+}
+
+static struct attribute_group param_attr_group;
+
+#define to_powerop_param_attr(_attr) container_of(_attr,\
+   struct powerop_param_attribute,attr)
+
+
+decl_subsys(powerop, NULL, NULL);
+static int subsys_reg;
+static int sysfs_init;
+
+static void namedop_release(struct kobject *kobj)
+{
+   struct namedop *op = to_namedop(kobj);
+
+   complete(&op->released);
+   return;
+}
+
+static struct sysfsop *sysfsop_create(const char *id)
+{
+   struct sysfsop *op;
+   int error;
+
+   if ((op = kmalloc(sizeof(struct sysfsop), GFP_KERNEL)) == NULL)
+   return ERR_PTR(-ENOMEM);
+
+   down(&sysfsop_list_mutex);
+   list_add_tail(&op->node, &sysfsop_list);
+   up(&sysfsop_list_mutex);
+   memset(&op->point, 0, sizeof(struct powerop_

Re: PowerOP Take 2 0/3 Intro

2005-08-25 Thread Todd Poynor

Jordan Crouse wrote:

Todd - do you have a ChangeLog from Take 1? :)


Right, here's what's changed in this version...

The generic structure of an operating point as an array of integers is 
dropped.  A struct powerop_point is now an entirely backend-defined 
struct of arbitrary fields.


There is no more PowerOP core layer; all data structures and functions 
for core functionality are provided by the machine-specific backend.


The diagnostic sysfs UI has been split out into a separate, optional 
patch.  A more full-featured UI allowing operating point creation and 
activation via sysfs has also been provided in that patch.  This UI 
primarily serves as an example for experimentation purposes, but is 
pretty close to what a basic userspace-based policy manager might need 
to switch operating points in response to infrequent changes in system 
state.


The UI also embodies the notion of a list of "named operating points" 
that could be registered by other means, such as loading a module with 
data structures that encode the desired operating points (as David 
Brownell has suggested).  The named operating points registered from 
such other interfaces can also be activated from the sysfs UI (that is, 
the hardware can be told to run at that operating point), as an example 
of how to tie in userspace policy managers with such a scheme.


The example platform backend this time is for an embedded system: the TI 
OMAP1 family of processors used for numerous mobile phones and PDAs.  It 
may better illustrate why managing multiple power parameters might be a 
useful capability.  I haven't put out an example of cpufreq integration 
this time, but the idea has changed little from before.


In case it's getting lost in all these details, the main point of all 
this is to pose the question: "are arbitrary power parameters arranged 
into a set with mutually consistent values (called here an operating 
point) a good low-level abstraction for system power management of a 
wide variety of platforms?"  Thanks,


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] PowerOP Take 2 1/3: ARM OMAP1 platform support

2005-08-31 Thread Todd Poynor
 it is that runs on the 
DSP (such as via shared memory message libs from the silicon vendor). 
Soon the other core will be running Linux as well, and the two OSes will 
need to coordinate the system power management, which will be an 
interesting thing to tackle.



  lowpwr1 = assert ULPD LOW_PWR, voltage scale low



Could you describe the policy effect of this bit?  I suspect
a good "PCs don't work like that!" example is lurking here.
That interacts with some other bits, and code ... when would
setting this be good, or bad?


This is how Dynamic Voltage Scaling is done on OMAP1 platforms. 
Assuming you've setup an operating point that is validated to work at 
the reduced voltage level on your hardware by TI (these are two voltage 
levels available), you can optionally specify to run at reduced voltage, 
possibly at an increased cost in latency of transitioning between 
operating points as voltage ramps up or down.  In the case of DPM 
running on an OMAP 1610 H2, you could tell the system to run at 1.5V 
when not idle and at 1.1V when idle, although depending on the ramp time 
(I can't recall for that board, but for some non-OMAPs this can be 
significant) and the realtime constraints of your app there could be 
missed deadlines under such a policy.  If the system isn't running 
anything with a tight deadline then it may be fine to stay at 1.1V or 
voltage scale between the two.



Other parameters such as DSPMMUDIV, LCDDIV, and ARM_PERDIV might also be
useful.



Again, PERDIV changes would need to involve clock.c to cascade
the changes through the clock tree.  Change PERDIV and you'll
need to recalculate the peripheral clocks that derive from it...
better not do it while an I/O operation is actively using it!


On some other platforms this actually becomes necessary, but for OMAP1's 
the trouble with doing so probably precludes anybody from using it.



As with TCDIV, that makes a useful example of something that is
clearly not within the "cpufreq" domain.  


I'll try to cook up an XScale PXA27x example, which adds multiple memory 
and system bus frequencies supported per CPU MHz, quick run vs.turbo 
mode switching of CPU MHz, and some other exotic features.  It has a 
very specific set of "product points" validated by Intel that correspond 
to the operating point abstraction.  If nothing else, it may be 
instructive to consider the variety of ways embedded platforms are being 
designed to be power managed.


--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


2.4.x kernel BUG at filemap.c:81

2005-02-08 Thread Todd Shetter
Running slackware 10 and 10.1, with kernels 2.4.26, 2.4.27, 2.4.28, 
2.4.29 with highmem 4GB, and highmem i/o support enabled, I get a system 
lockup. This happens in both X and console. Happens with and without my 
Nvidia drivers loaded. I cannot determine what makes this bug present it 
self besides highmem and high i/o support enabled. Im guessing the 
system is fine until highmem is actually used to some point and then it 
borks, but I really have no idea and so im just making a random guess. I 
ran memtest86 for a few hours a while ago thinking that it may be bad 
memory, but that did not seem to be the problem.

If you need anymore information, or have questions, or wish me to test 
anything, PLEASE feel free to contact me, I would really like to see 
this bug resolved. =)

--
Todd Shetter
Feb  8 19:49:31 quark kernel: kernel BUG at filemap.c:81!
Feb  8 19:49:31 quark kernel: invalid operand: 
Feb  8 19:49:31 quark kernel: CPU:0
Feb  8 19:49:31 quark kernel: EIP:0010:[]Tainted: P
Feb  8 19:49:31 quark kernel: EFLAGS: 00210206
Feb  8 19:49:31 quark kernel: eax: 4000   ebx: c18012f0   ecx: 
   edx: f7f06e60
Feb  8 19:49:31 quark kernel: esi: 0001   edi: f7f06e60   ebp: 
f3461e9c   esp: f3461e98
Feb  8 19:49:31 quark kernel: ds: 0018   es: 0018   ss: 0018
Feb  8 19:49:31 quark kernel: Process hdparm (pid: 2191, stackpage=f3461000)
Feb  8 19:49:31 quark kernel: Stack: c18012f0 f3461eb0 c0128c78 f07d0360 
c18012f0 47a0 f3461edc c0128d20
Feb  8 19:49:31 quark kernel:c18012f0 f07d0360 47a0 f7f06e60 
f7f06e60 eed4ab34 47a0 
Feb  8 19:49:31 quark kernel:eed4ab34 f3461f00 c012941e 001f 
479f 0020 012a1f16 f7f06de4
Feb  8 19:49:31 quark kernel: Call Trace:[] [] 
[] [] []
Feb  8 19:49:31 quark kernel:   [] [] [] 
[] []
Feb  8 19:49:31 quark kernel:
Feb  8 19:49:31 quark kernel: Code: 0f 0b 51 00 a8 05 2e c0 6a 01 53 e8 
ef 7e 00 00 8b 5d fc 89


00:00.0 Host bridge: Intel Corp. 82875P Memory Controller Hub (rev 02)
   Subsystem: ABIT Computer Corp.: Unknown device 1014
   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
   Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
   Latency: 0
   Region 0: Memory at e800 (32-bit, prefetchable) [size=128M]
   Capabilities: [e4] #09 [2106]
   Capabilities: [a0] AGP version 3.0
   Status: RQ=32 Iso- ArqSz=2 Cal=2 SBA+ ITACoh- GART64- HTrans- 
64bit- FW+ AGP3+ Rate=x4,x8
   Command: RQ=1 ArqSz=0 Cal=2 SBA+ AGP+ GART64- 64bit- FW- Rate=x8

00:01.0 PCI bridge: Intel Corp. 82875P Processor to AGP Controller (rev 
02) (prog-if 00 [Normal decode])
   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B-
   Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
   Latency: 64
   Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
   I/O behind bridge: f000-0fff
   Memory behind bridge: f800-faff
   Prefetchable memory behind bridge: f000-f7ff
   BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-

00:03.0 PCI bridge: Intel Corp. 82875P Processor to PCI to CSA Bridge 
(rev 02) (prog-if 00 [Normal decode])
   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B-
   Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
   Latency: 32
   Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
   I/O behind bridge: a000-afff
   Memory behind bridge: fb00-fb0f
   Prefetchable memory behind bridge: fff0-000f
   BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-

00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface to PCI 
Bridge (rev c2) (prog-if 00 [Normal decode])
   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B-
   Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
   Latency: 0
   Bus: primary=00, secondary=03, subordinate=03, sec-latency=32
   I/O behind bridge: f000-0fff
   Memory behind bridge: fff0-000f
   Prefetchable memory behind bridge: fff0-000f
   BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-

00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
   Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
   Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- 
   Latency: 0

00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage 
Controller (rev 02) (prog-if 8a [Master SecP PriP])
   Subsystem: ABIT Computer Corp.: Unknown device 1014
   Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
   Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- 
   Latency: 0
   Interrupt: pin A routed to IRQ 11
   Region 0: I/O ports at

Re: 2.4.x kernel BUG at filemap.c:81

2005-02-09 Thread Todd Shetter
Marcelo Tosatti wrote:
On Wed, Feb 09, 2005 at 12:15:03AM -0500, Todd Shetter wrote:
 

Running slackware 10 and 10.1, with kernels 2.4.26, 2.4.27, 2.4.28, 
2.4.29 with highmem 4GB, and highmem i/o support enabled, I get a system 
lockup. This happens in both X and console. Happens with and without my 
Nvidia drivers loaded. I cannot determine what makes this bug present it 
self besides highmem and high i/o support enabled. Im guessing the 
system is fine until highmem is actually used to some point and then it 
borks, but I really have no idea and so im just making a random guess. I 
ran memtest86 for a few hours a while ago thinking that it may be bad 
memory, but that did not seem to be the problem.

If you need anymore information, or have questions, or wish me to test 
anything, PLEASE feel free to contact me, I would really like to see 
this bug resolved. =)

--
Todd Shetter
Feb  8 19:49:31 quark kernel: kernel BUG at filemap.c:81!
Feb  8 19:49:31 quark kernel: invalid operand: 
Feb  8 19:49:31 quark kernel: CPU:0
Feb  8 19:49:31 quark kernel: EIP:0010:[]Tainted: P
   

Hi Todd, 

Why is your kernel tainted ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
 

I had the nvidia 1.0-6629 driver loaded when I got that error. I 
compiled the kernel using the slackware 10.1 config, enabled highmem 4GB 
support, highmem i/o, and then some kernel hacking options including 
debugging for highmen related things.

I booted, loaded X with KDE, opened firefox a few times, and then 
started running hdparm because some newer 2.4.x kernels dont play nice 
with my SATA, ICH5, and DMA. hdparm segfaulted while running the drive 
read access portion of its tests, and things locked up from there in 
about 30secs.

I've gotten the same error with the nvidia driver not loaded, so I dont 
think that is part of the problem.

As I said, if you want me to test or try anything feel free to ask.  =)
--
Todd Shetter
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.4.x kernel BUG at filemap.c:81

2005-02-09 Thread Todd Shetter
Marcelo Tosatti wrote:
On Wed, Feb 09, 2005 at 11:30:05AM -0500, Todd Shetter wrote:
 

Marcelo Tosatti wrote:
   

On Wed, Feb 09, 2005 at 12:15:03AM -0500, Todd Shetter wrote:
 

Running slackware 10 and 10.1, with kernels 2.4.26, 2.4.27, 2.4.28, 
2.4.29 with highmem 4GB, and highmem i/o support enabled, I get a system 
lockup. This happens in both X and console. Happens with and without my 
Nvidia drivers loaded. I cannot determine what makes this bug present it 
self besides highmem and high i/o support enabled. Im guessing the 
system is fine until highmem is actually used to some point and then it 
borks, but I really have no idea and so im just making a random guess. I 
ran memtest86 for a few hours a while ago thinking that it may be bad 
memory, but that did not seem to be the problem.

If you need anymore information, or have questions, or wish me to test 
anything, PLEASE feel free to contact me, I would really like to see 
this bug resolved. =)

--
Todd Shetter
Feb  8 19:49:31 quark kernel: kernel BUG at filemap.c:81!
Feb  8 19:49:31 quark kernel: invalid operand: 
Feb  8 19:49:31 quark kernel: CPU:0
Feb  8 19:49:31 quark kernel: EIP:0010:[]Tainted: P
 

   

Hi Todd, 

Why is your kernel tainted ?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

 

I had the nvidia 1.0-6629 driver loaded when I got that error. I 
compiled the kernel using the slackware 10.1 config, enabled highmem 4GB 
support, highmem i/o, and then some kernel hacking options including 
debugging for highmen related things.

I booted, loaded X with KDE, opened firefox a few times, and then 
started running hdparm because some newer 2.4.x kernels dont play nice 
with my SATA, ICH5, and DMA. hdparm segfaulted while running the drive 
read access portion of its tests, and things locked up from there in 
about 30secs.

I've gotten the same error with the nvidia driver not loaded, so I dont 
think that is part of the problem.

As I said, if you want me to test or try anything feel free to ask.  =)
   

Todd,
Would be interesting to have the oops output without the kernel nvidia module. 

Do you have that saved?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
 

Sorry, it took me FOREVER to get this bug to appear again, and this time 
its a little different.


Feb  9 15:20:37 quark kernel: kernel BUG at page_alloc.c:142!
Feb  9 15:20:37 quark kernel: invalid operand: 
Feb  9 15:20:37 quark kernel: CPU:0
Feb  9 15:20:37 quark kernel: EIP:0010:[]Not tainted
Feb  9 15:20:37 quark kernel: EFLAGS: 00013206
Feb  9 15:20:37 quark kernel: eax: 0114   ebx: c17e1160   ecx: 
4000   edx: 
Feb  9 15:20:37 quark kernel: esi:    edi: eea037f0   ebp: 
f0f27f24   esp: f0f27ef0
Feb  9 15:20:37 quark kernel: ds: 0018   es: 0018   ss: 0018
Feb  9 15:20:37 quark kernel: Process X (pid: 2206, stackpage=f0f27000)
Feb  9 15:20:37 quark kernel: Stack: c0324d68 00037000 c120 c17e11c0 
c0324cb8 c1030020 c0324cf0 3207
Feb  9 15:20:37 quark kernel:c17e1160 f0f27f24 2a05c067 001fc000 
eea037f0 f0f27f68 c012531c c17e1160
Feb  9 15:20:37 quark kernel:c17e1160 01fd 002f 4980 
 496f f0eee494 4940
Feb  9 15:20:37 quark kernel: Call Trace:[] [] 
[] []
Feb  9 15:20:37 quark kernel:
Feb  9 15:20:37 quark kernel: Code: 0f 0b 8e 00 bf 06 2e c0 8b 53 08 85 
d2 74 08 0f 0b 90 00 bf
Feb  9 15:30:41 quark kernel:  <6>SysRq : SAK

00:00.0 Host bridge: Intel Corp. 82875P Memory Controller Hub (rev 02)
   Subsystem: ABIT Computer Corp.: Unknown device 1014
   Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
   Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
   Latency: 0
   Region 0: Memory at e800 (32-bit, prefetchable) [size=128M]
   Capabilities: [e4] #09 [2106]
   Capabilities: [a0] AGP version 3.0
   Status: RQ=32 Iso- ArqSz=2 Cal=2 SBA+ ITACoh- GART64- HTrans- 
64bit- FW+ AGP3+ Rate=x4,x8
   Command: RQ=1 ArqSz=0 Cal=2 SBA+ AGP- GART64- 64bit- FW- Rate=

00:01.0 PCI bridge: Intel Corp. 82875P Processor to AGP Controller (rev 
02) (prog-if 00 [Normal decode])
   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B-
   Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- 
SERR- 
   Latency: 64
   Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
   I/O behind bridge: f000-0fff
   Memory behind bridge: f800-faff
   Prefetchable memory behind bridge: f000-f7ff
   BridgeCtl: Par

Re: 2.4.x kernel BUG at filemap.c:81

2005-02-09 Thread Todd Shetter
Marcelo Tosatti wrote:
On Wed, Feb 09, 2005 at 03:47:28PM -0500, Todd Shetter wrote:
 

Running slackware 10 and 10.1, with kernels 2.4.26, 2.4.27, 2.4.28, 
2.4.29 with highmem 4GB, and highmem i/o support enabled, I get a 
system lockup. This happens in both X and console. Happens with and 
without my Nvidia drivers loaded. I cannot determine what makes this 
bug present it self besides highmem and high i/o support enabled. Im 
guessing the system is fine until highmem is actually used to some 
point and then it borks, but I really have no idea and so im just 
making a random guess. I ran memtest86 for a few hours a while ago 
thinking that it may be bad memory, but that did not seem to be the 
problem.

If you need anymore information, or have questions, or wish me to test 
anything, PLEASE feel free to contact me, I would really like to see 
this bug resolved. =)

--
Todd Shetter
Feb  8 19:49:31 quark kernel: kernel BUG at filemap.c:81!
Feb  8 19:49:31 quark kernel: invalid operand: 
Feb  8 19:49:31 quark kernel: CPU:0
Feb  8 19:49:31 quark kernel: EIP:0010:[]Tainted: P
 

   

Hi Todd, 

Why is your kernel tainted ?
 

I had the nvidia 1.0-6629 driver loaded when I got that error. I 
compiled the kernel using the slackware 10.1 config, enabled highmem 4GB 
support, highmem i/o, and then some kernel hacking options including 
debugging for highmen related things.

I booted, loaded X with KDE, opened firefox a few times, and then 
started running hdparm because some newer 2.4.x kernels dont play nice 
with my SATA, ICH5, and DMA. hdparm segfaulted while running the drive 
read access portion of its tests, and things locked up from there in 
about 30secs.

I've gotten the same error with the nvidia driver not loaded, so I dont 
think that is part of the problem.

As I said, if you want me to test or try anything feel free to ask.  =)
 
   

Todd,
Would be interesting to have the oops output without the kernel nvidia 
module. 
Do you have that saved?


 

Sorry, it took me FOREVER to get this bug to appear again, and this time 
its a little different.
   

Hum, both BUGs are due to a page with alive ->buffers mapping.
Did it crashed right after hdparm now too? 

Can you boot your box without SATA drivers, configuring the interface to IDE 
mode ?

Which problems are you facing with newer v2.4.x kernels and SATA? 
 

Im waiting for the system to crash, so I figured I might as well get on 
with the SATA problems

Running 2.4.29 neither the CONFIG_BLK_DEV_IDE_SATA nor the 
CONFIG_SCSI_SATA are set currently and DMA is not enabled on either of 
my drives,  hda: ST380013AS,  hdb: WDC WD2500SD-01KCB0,  hdc: Maxtor 
94610U6. Setting DMA manually on the hard drives yields a HDIO_SET_DMA 
failed: Operation not permitted error.

Using 2.4.26, DMA worked fine on the drives. Under 2.4.27, 2.4.28, and 
2.4.29 using CONFIG_SCSI_SATA does not allow setting of DMA on the 
drives, yielding a HDIO_SET_DMA failed: Operation not permitted error, 
and the transfer speeds reported by hdparm are at about 3MB/s.

Under 2.4.29 using CONFIG_BLK_DEV_IDE_SATA the DMA is set fine upon 
boot, and I get good transfers, hdparm reports 58MB/s on my Western 
Digital drive. I have not tested using CONFIG_BLK_DEV_IDE_SATA on any 
previous kernel versions.

Well, still no crash yetAgain, anything else you want me to try or 
do just let me know.

--
Todd Shetter
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.4.x kernel BUG at filemap.c:81

2005-02-13 Thread Todd Shetter
Jeff Garzik wrote:
Marcelo Tosatti wrote:
On Wed, Feb 09, 2005 at 11:23:42PM -0500, Todd Shetter wrote:
Marcelo Tosatti wrote:

On Wed, Feb 09, 2005 at 03:47:28PM -0500, Todd Shetter wrote:

Running slackware 10 and 10.1, with kernels 2.4.26, 2.4.27, 
2.4.28, 2.4.29 with highmem 4GB, and highmem i/o support 
enabled, I get a system lockup. This happens in both X and 
console. Happens with and without my Nvidia drivers loaded. I 
cannot determine what makes this bug present it self besides 
highmem and high i/o support enabled. Im guessing the system 
is fine until highmem is actually used to some point and then 
it borks, but I really have no idea and so im just making a 
random guess. I ran memtest86 for a few hours a while ago 
thinking that it may be bad memory, but that did not seem to 
be the problem.

If you need anymore information, or have questions, or wish me 
to test anything, PLEASE feel free to contact me, I would 
really like to see this bug resolved. =)

Todd Shetter
Feb  8 19:49:31 quark kernel: kernel BUG at filemap.c:81!
Feb  8 19:49:31 quark kernel: invalid operand: 
Feb  8 19:49:31 quark kernel: CPU:0
Feb  8 19:49:31 quark kernel: EIP:0010:[]
Tainted: P
 

Hi Todd,
Why is your kernel tainted ?
I had the nvidia 1.0-6629 driver loaded when I got that error. I 
compiled the kernel using the slackware 10.1 config, enabled 
highmem 4GB support, highmem i/o, and then some kernel hacking 
options including debugging for highmen related things.

I booted, loaded X with KDE, opened firefox a few times, and 
then started running hdparm because some newer 2.4.x kernels 
dont play nice with my SATA, ICH5, and DMA. hdparm segfaulted 
while running the drive read access portion of its tests, and 
things locked up from there in about 30secs.

I've gotten the same error with the nvidia driver not loaded, so 
I dont think that is part of the problem.

As I said, if you want me to test or try anything feel free to 
ask.  =)

Todd,
Would be interesting to have the oops output without the kernel 
nvidia module. Do you have that saved?

Sorry, it took me FOREVER to get this bug to appear again, and 
this time its a little different.
 

Hum, both BUGs are due to a page with alive ->buffers mapping.
Did it crashed right after hdparm now too?
Can you boot your box without SATA drivers, configuring the 
interface to IDE mode ?

Which problems are you facing with newer v2.4.x kernels and SATA?
Im waiting for the system to crash, so I figured I might as well get 
on with the SATA problems

Running 2.4.29 neither the CONFIG_BLK_DEV_IDE_SATA nor the 
CONFIG_SCSI_SATA are set currently and DMA is not enabled on either 
of my drives,  hda: ST380013AS,  hdb: WDC WD2500SD-01KCB0,  hdc: 
Maxtor 94610U6. Setting DMA manually on the hard drives yields a 
HDIO_SET_DMA failed: Operation not permitted error.

Using 2.4.26, DMA worked fine on the drives. Under 2.4.27, 2.4.28, 
and 2.4.29 using CONFIG_SCSI_SATA does not allow setting of DMA on 
the drives, yielding a HDIO_SET_DMA failed: Operation not permitted 
error, and the transfer speeds reported by hdparm are at about 3MB/s.

I think thats expected. Jeff?

If (a) this is Intel ICH hardware and (b) it is running in combined 
mode, it will get poor performance.

For the SATA drivers, they ALWAYS run in DMA mode, so there is no need 
to support HDIO_SET_DMA.

Jeff

Yes, I am running the SATA in combined mode. That, along with Auto mode, 
is the only way I can get all my devices to work properly under linux. 
Under Enhanced mode there is no primary IDE, only second and third. My 
ATA drive and CDROM are hdc and hdd, which is normal, my first SATA 
drive which would normally be hda should be hde and the other drive 
should be hdf. But the kernel doesnt see them and I cant boot off my 
normally hda, I get an invalid root= option kernel panic. I did edit 
lilo to reflect the change, but the drive wasnt even listed when hdc and 
hdd were. Under SATA only, I only get the SATA drives, and not my ATA 
drive nor my CDROM. With SATA diabled, I cant get the SATA drives at all 
which I expected.

Under Auto, which is the same as combined as far as I can tell, the 
kernel doesnt enable DMA for the drives using libata, and trying to 
enable it by hand yields the HDIO_SET_DMA failed: Operation not 
permitted error.

So, unless there is better performance from libata with ICH5 combined 
mode, I am 'forced' to stay with CONFIG_BLK_DEV_IDE_SATA, which isnt a 
bad thing really. I get good performance from the drives, and havent had 
any problems yet.

On a another note, kernel 2.4.30-pre1 with highmem is working fine for 
me, using CONFIG_BLK_DEV_IDE_SATA, uptime was a little over 2 days.

--
Todd Shetter
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Custom power states for non-ACPI systems

2005-03-01 Thread Todd Poynor
Advertise custom sets of system power states for non-ACPI systems.
Currently, /sys/power/state shows and accepts a static set of choices
that are not necessarily meaningful on all platforms (for example,
suspend-to-disk is an option even on diskless embedded systems, and the
meaning of standby vs. suspend-to-mem is not well-defined on
non-ACPI-systems).  This patch allows the platform to register power
states with meaningful names that correspond to the platform's
conventions (for example, "big sleep" and "deep sleep" on TI OMAP), and
only those states that make sense for the platform.

For the time being, the canned set of PM_SUSPEND_STANDBY/MEM/DISK
etc. symbols are preserved, since knowledge of the meanings of those
values have crept into drivers.  There is a separate effort underway to
divorce driver suspend flags from the platform suspend state
identifiers.  Once that is accomplished, we can then replace the suspend
states available with an entirely custom set.  For example, various
embedded platforms have multiple power states that roughly correspond to
suspend-to-mem, and each could be advertised and requested via the PM
interfaces, once drivers no longer look for the one and only
PM_SUSPEND_MEM system suspend state.

If the platform does not register a custom set of power states then the
present-day set remains available as a default.  Will send separately a
patch for an embedded platform to show usage.  Comments appreciated.

Index: linux-2.6.10/include/linux/pm.h
===
--- linux-2.6.10.orig/include/linux/pm.h2005-03-02 00:41:43.0 
+
+++ linux-2.6.10/include/linux/pm.h 2005-03-02 01:12:14.0 +
@@ -216,8 +216,14 @@
 #definePM_DISK_REBOOT  ((__force suspend_disk_method_t) 4)
 #definePM_DISK_MAX ((__force suspend_disk_method_t) 5)
 
+struct pm_suspend_method {
+   char *name;
+   suspend_state_t state;
+};
+
 struct pm_ops {
suspend_disk_method_t pm_disk_mode;
+   struct pm_suspend_method *pm_suspend_methods;
int (*prepare)(suspend_state_t state);
int (*enter)(suspend_state_t state);
int (*finish)(suspend_state_t state);
Index: linux-2.6.10/kernel/power/main.c
===
--- linux-2.6.10.orig/kernel/power/main.c   2005-03-02 00:41:41.0 
+
+++ linux-2.6.10/kernel/power/main.c2005-03-02 01:15:21.0 +
@@ -228,11 +228,22 @@
 
 
 
-char * pm_states[] = {
-   [PM_SUSPEND_STANDBY]= "standby",
-   [PM_SUSPEND_MEM]= "mem",
-   [PM_SUSPEND_DISK]   = "disk",
-   NULL,
+struct pm_suspend_method pm_default_suspend_methods[] = {
+   {
+   .name = "standby",
+   .state = PM_SUSPEND_STANDBY,
+   },
+   {
+   .name = "mem",
+   .state = PM_SUSPEND_MEM,
+   },
+   {
+   .name = "disk",
+   .state = PM_SUSPEND_DISK,
+   },
+   {
+   .name = NULL,
+   },
 };
 
 
@@ -324,19 +335,22 @@
 {
int i;
char * s = buf;
+   struct pm_suspend_method *methods = pm_ops->pm_suspend_methods;
+
+   if (! methods)
+   methods = pm_default_suspend_methods;
+
+   for (i=0; methods[i].name; i++)
+   s += sprintf(s,"%s ",methods[i].name);
 
-   for (i = 0; i < PM_SUSPEND_MAX; i++) {
-   if (pm_states[i])
-   s += sprintf(s,"%s ",pm_states[i]);
-   }
s += sprintf(s,"\n");
return (s - buf);
 }
 
 static ssize_t state_store(struct subsystem * subsys, const char * buf, size_t 
n)
 {
-   suspend_state_t state = PM_SUSPEND_STANDBY;
-   char ** s;
+   struct pm_suspend_method *methods = pm_ops->pm_suspend_methods;
+   int i;
char *p;
int error;
int len;
@@ -344,12 +358,15 @@
p = memchr(buf, '\n', n);
len = p ? p - buf : n;
 
-   for (s = &pm_states[state]; state < PM_SUSPEND_MAX; s++, state++) {
-   if (*s && !strncmp(buf, *s, len))
+   if (! methods)
+   methods = pm_default_suspend_methods;
+
+   for (i = 0; methods[i].name; i++) {
+   if (!strncmp(buf, methods[i].name, len))
break;
}
-   if (*s)
-   error = enter_state(state);
+   if (methods[i].name)
+   error = enter_state(methods[i].state);
else
error = -EINVAL;
return error ? error : n;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Custom power states for non-ACPI systems

2005-03-01 Thread Todd Poynor
An example of custom power states for the TI OMAP family.
/sys/power/states supports a state named "deepsleep", which corresponds
to the platform state actually entered by the present-day system suspend
handler.  It no longer offers the option of "disk" suspend which would
not normally be available in an OMAP-based system, nor does it offer the
choices "standby" or "mem", which are currently somewhat arbitrarily
mapped to actual platform power states on OMAPs.  In the future the OMAP
could be extended to offer the choice of "big sleep" as well, another
platform-specific low-power mode that falls under the general category
of suspend-to-mem, once it is feasible to no longer use the same set of
system suspend state values for all platforms and drivers (as mentioned
in the base note).

Index: linux-2.6.10/arch/arm/mach-omap/pm.c
===
--- linux-2.6.10.orig/arch/arm/mach-omap/pm.c   2005-03-02 01:10:27.0 
+
+++ linux-2.6.10/arch/arm/mach-omap/pm.c2005-03-02 01:13:41.0 
+
@@ -576,8 +576,20 @@
 }
 
 
+static struct pm_suspend_method omap_pm_suspend_methods[] = {
+   {
+   .name = "deepsleep",
+   .state = PM_SUSPEND_MEM,
+   },
+   {
+   .name = NULL,
+   },
+};
+
+
 struct pm_ops omap_pm_ops ={
.pm_disk_mode = 0,
+   .pm_suspend_methods = omap_pm_suspend_methods,
 .prepare= omap_pm_prepare,
 .enter  = omap_pm_enter,
 .finish = omap_pm_finish,
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] [PATCH] Custom power states for non-ACPI systems

2005-03-02 Thread Todd Poynor
Pavel Machek wrote:
Hi!

Advertise custom sets of system power states for non-ACPI systems.
Currently, /sys/power/state shows and accepts a static set of choices
that are not necessarily meaningful on all platforms (for example,
suspend-to-disk is an option even on diskless embedded systems, and the
meaning of standby vs. suspend-to-mem is not well-defined on
non-ACPI-systems).  This patch allows the platform to register power
states with meaningful names that correspond to the platform's
conventions (for example, "big sleep" and "deep sleep" on TI OMAP), and
only those states that make sense for the platform.

Maybe this is a bit overdone?
Of course you can have suspend-to-disk on most embedded systems; CF
flash card looks just like disk, and you should be able to suspend to
it.
It's possible (on those with CF/PCMCIA etc.), although due to various 
problems with things like flash size, write speed, and wear leveling 
it's not very common to do so (I've seen two vendors abandon plans for 
this, but no doubt somebody does do it) -- that's why I'd like to have 
the particular platform register the capability if it happens to have 
it, but no, not a big deal.

If OMAP has "big sleep" and "deep sleep", why not simply map them to
"standby" and "suspend-to-ram"?
In fact that's more or less what happens (or will happen once drivers 
like USB stop looking for PM_SUSPEND_MEM, etc.).  There are other 
platforms with more than 2 sleep states (say, XScale PXA27x), so this 
will start to get a bit problematic.  And it seens so easy to truly 
handle the platform's states instead of pretending ACPI S1/S3/S4 are the 
only methods to suspend any system.

If it's preferable, how about replacing the /sys/power/state "standby" 
and "mem" values  to "sleep", and have a /sys/power/sleep attribute that 
tells the methods of sleep available for the platform, much like 
suspend-to-disk methods are handled today?  So the sleep attribute would 
handle "standby" and "mem" for ACPI systems, and other values for 
non-ACPI systems.  Thanks,

--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] [PATCH] Custom power states for non-ACPI systems

2005-03-02 Thread Todd Poynor
Pavel Machek wrote:
...
...but adding new /sys/power/state might be okay. We should not have
introduced "standby" in the first place [but I guess it is not worth
removing now]. If something has more than 2 states (does user really
want to enter different states in different usage?), I guess we can
add something like "deepmem" or whatever. Is there something with more
than 3 states?
In most of the cases I'm thinking of, it wouldn't be a user requesting a 
state but rather software (say, a cell phone progressively entering 
lower power states due to inactivity).  I haven't noticed a platform 
with more than 3 low-power modes so far, but I'm sure it'll happen soon. 
 If the time isn't right for incompatible changes to these interfaces 
then I guess mapping standby and mem to platform-specific things will 
work for now, maybe with some tweak to allow a choice of actual state 
entered.  At some more opportune time in the future I'll suggest an 
attribute that allows a choice of platform-specific method of 
suspend-to-mem, somewhat like the "disk" attribute for suspend-to-disk. 
 Thanks,

--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] kernel/power/disk.c trivial cleanups

2005-03-03 Thread Todd Poynor
* Remove duplicate include.

* Avoid "mode set to ''" message when error updating /sys/power/disk.

Signed-off-by: Todd Poynor <[EMAIL PROTECTED]>

--- linux-2.6.11-rc4-orig/kernel/power/disk.c   2005-02-23 09:47:03.0 
-0800
+++ linux-2.6.11-rc4-pm/kernel/power/disk.c 2005-03-03 05:00:18.609860968 
-0800
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "power.h"
 
 
@@ -321,8 +320,9 @@
} else
error = -EINVAL;
 
-   pr_debug("PM: suspend-to-disk mode set to '%s'\n",
-pm_disk_modes[mode]);
+   if (mode == pm_disk_mode)
+   pr_debug("PM: suspend-to-disk mode set to '%s'\n",
+pm_disk_modes[mode]);
up(&pm_sem);
return error ? error : n;
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] [PATCH] Custom power states for non-ACPI systems

2005-03-03 Thread Todd Poynor
Pavel Machek wrote:
...
In most of the cases I'm thinking of, it wouldn't be a user 
requesting a state but rather software (say, a cell phone 
progressively entering lower power states due to inactivity).  I 
haven't noticed a platform with more than 3 low-power modes so far, 

Are not your power states more like cpu power states?
These are expected to be system states, and sleeping system
does not take calls, etc...
There's a great variety of behaviors and usage models out there, not 
sure I can draw a useful distinction between cpu power states vs. system 
states, but the net effect could be considered to be approximately the 
same in typical embedded uses: the drivers are called to place 
appropriate devices in a low(er)-power state, various platform thingies 
are slowed or powered off, and the system stops waiting for something to 
wake it up.  In some cases the system does not wake up until an explicit 
user action (button press, etc.), but more commonly 
wake-on-device-activity (including ring from telephony unit) or 
time-based actions (including wake on alarm from event in user's 
datebook) is also wanted (rather like wake-on-LAN et al).  I don't think 
this would correspond well to hardware-managed CPU power states like 
ACPI C states, for example.  Thanks,

--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] [PATCH] Custom power states for non-ACPI systems

2005-03-04 Thread Todd Poynor
Nigel Cunningham wrote:
...
Two way communication between a userspace policy manager and kernel
drivers is implemented via DBus.
In this scheme, 'kernel drivers' doesn't just refer to the drivers for
hardware. It refers to anything remotely power management related,
including code to implement suspend-to-RAM, to disk or the like, ACPI
drivers or code to implement system power states.
The policy manager can enumerate devices and inter-relationships,
capabilities, settings and status information, set and query policies
and implementation results. The drivers can notify events. This
communication doesn't use complicated structures or type definitions.
Rather, all the nous regarding interpretation of the messages that are
sent is in the policy manager and the drivers. One driver might say it's
capable of states called "D0, D1 and D3", another (system) states called
"Deep Sleep" and "Big Sleep". Nothing but the driver itself and
userspace manager need to how to interpret & use these states.
Inter-relationships between drivers are _not_ included in this
information. The policy manager sets policy, the drivers deal with the
specifics of implementing it.
This all sounds exactly like the way we're headed as well, so I'm 
definitely interested in anything I can do to help.  Was thinking that 
can start defining kobject_uevent power events and attributes (with 
enough detail that acpid could use it instead of /proc if the ACPI 
drivers were to convert to it).

Capturing the relationships between drivers is difficult.  If nobody's 
already looking into this then I'll take this up soon.

The userspace manager can in turn [en|dis]able capabilites and send a
list of run-time states that the driver can move between according to
its own logic (eg lack of active children) without notifying the
userspace manager. This would fit in with your power modes above, even
to the level of "cpu idle".
At dynamicpower.sf.net we do something similar for cpufreq-style scaling 
of platform clocks and voltages, setting up desired policy for various 
platform clocks/voltages according to changes in low-level system state 
(primarily scheduler state) from userspace and then letting the state 
machine run without interaction.  Similar policy objects for devices 
sounds intriguing, although the device-specific nature of event triggers 
probably makes this quite difficult.

Mac OS X support for some of these concepts is documented at 
developer.apple.com, looking for ideas to steal...  Thanks,

--
Todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


SMP races in proc with thread_struct

2001-05-01 Thread Todd Inglett

Perhaps this is old news...but...

I can easily create a race when reading /proc//stat
(fs/proc/{base.c,array.c}) where a rapidly reading application, such as
"top", starts reading stats for a thread which goes away during the
read.  This is easily reproduced with a program that rapidly forks and
exits while top is running.

On inspection, I don't see how the code can expect the thread_struct to
stay around since it is not holding any lock for the duration of its
use.  The code could hold the thread_struct's lock (after verifying it
still exists while holding tasklist_lock I would imagine), but for
performance I would think a better solution would be to copy the struct
since stale data is probably ok in this case.

Dereferencing a non-existent thread_struct is clearly not ok.

Would anyone familiar with this code care to comment?
-- 
-todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP races in proc with thread_struct

2001-05-03 Thread Todd Inglett

Alexander Viro wrote:
> 
> On Tue, 1 May 2001, Todd Inglett wrote:
> 
> > Perhaps this is old news...but...
> >
> > I can easily create a race when reading /proc//stat
> > (fs/proc/{base.c,array.c}) where a rapidly reading application, such as
> > "top", starts reading stats for a thread which goes away during the
> > read.  This is easily reproduced with a program that rapidly forks and
> > exits while top is running.
> >
> > On inspection, I don't see how the code can expect the thread_struct to
> > stay around since it is not holding any lock for the duration of its
> > use.  The code could hold the thread_struct's lock (after verifying it
> > still exists while holding tasklist_lock I would imagine), but for
> > performance I would think a better solution would be to copy the struct
> > since stale data is probably ok in this case.
> >
> > Dereferencing a non-existent thread_struct is clearly not ok.
> >
> > Would anyone familiar with this code care to comment?
> 
> Code bumps the reference count of couple of pages that hold task_struct
> when it opens the file.

Yes that's right.  [Note that I erroneously wrote thread_struct when I
meant task_struct].

On closer inspection I see it is the *parent* task_struct that is the
problem here.  The task has indeed exited and the memory for the
task_struct is correctly held.  However, the /proc code will grab
tasklist_lock and dereference the parent task_struct to get the parent
pid.  Grabbing tasklist_lock is good, of course, when the task is live
because the parent could be switched at any time.  But in this case the
child is already done.  And so is the parent.  So the child's parent
task_struct is gone, but the stale held task_struct still points to
where it once was.

Does this sound like a possible scenerio?

-- 
-todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP races in proc with thread_struct

2001-05-04 Thread Todd Inglett

Ok, I've got this isolated.  Here's the sequence of events:

1.  Some process T (probably "top") opens /proc/N/stat.
2.  While holding tasklist_lock the proc code does a get_task_struct()
to add a ref count to the page.
3.  Process N exits.
4.  The parent of process N exits.
5.  Process T reads from the open file.  This calls proc_pid_stat()
which dereferences N's task_struct.  This is ok as Alexander points out
because a reference is held.
6.  Using N's task_struct process T attempt to dereference the *parent*
task struct.  It assumes this is ok because:
A) it is holding tasklist_lock so N cannot be reparented in a race.
B) every process *always* has a valid parent.

But this is where hell breaks loose.  Every process has a valid parent
-- unless it is dead and nobody cares.  Process N has already exited and
released from the tasklist while its parent was still alive.  There was
no reason to reparent it.  It just got released.  So N's task_struct has
a dangling ptr to its parent.  Nobody is holding the parent task_struct,
either.  When the parent died memory for its task_struct was released. 
This is ungood.


My opinion here is that this is proc's problem.  When we free a
task_struct it could be "cleaned up" of dangling ptrs, but this is a
hack to cover a bug in proc.

This is not isolated to the parent task_struct, either.  The task_struct
mm is also dereferenced.  It is pretty easy to validate a parent
task_struct ptr (just hold tasklist_lock and run the list to check if it
is still valid -- might not be the *right* task, but it will still be
valid).  However, how do you validate the mm is ok?

It would be nice if there were an easy way to detect when the process
gets into this state.  Certainly it is in this state if the page
reference count is 1.  But multiple processes *could* be reading the
same proc file holding additional artificial ref counts.

Hmm...perhaps if I scan the tasklist for my own task?  If I am not in
the list I either return an error or faked info.  I'll try this

-- 
-todd

BTW, by adding code to validate the parent my test has now run for 18
hours on a 4-way without failure.  It otherwise would fail within the
first 5 minutes.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP races in proc with thread_struct

2001-05-05 Thread Todd Inglett

Andreas Ferber wrote:
> 
> On Fri, May 04, 2001 at 10:46:43PM +1000, Keith Owens wrote:
> 
> > For a read only case, the only important
> > thing is not to die, one occurrence of bad data is tolerable.
> 
> Strong NACK. The pages where the bad data comes from may in some cases
> already be reclaimed for other data, probably something security
> relevant, which should never ever be given even read access by an
> unauthorized user. Even if this event may be a very rare case, one
> single occurrence of this is one to much.

Agreed.  Worse, it is not readonly.  The /proc code task_lock's the task
struct, thus writing to it.

I'll post a patch shortly once I've tested it.  Worse case only if the
task is exiting I sweep the tasklist looking for the parent to see if
the parent is still valid.  I am not verifying if it is the actual
parent (it might be a new task allocated at the same spot).  I could
just report 0 (or 1) for the parent for any process that is exiting, but
then you won't be able to see the ppid for zombies.  Or is there another
state I can look for?  What I really need is PF_EXITED :).

I am a little concerned also about mm, file, tty and sig fields. These
appear to be NULLed in do_exit(), but I haven't tracked down tty and sig
yet.
-- 
-todd
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



[PATCH] android: binder: Disable preemption while holding the global binder lock

2016-09-08 Thread Todd Kjos
In Android systems, the display pipeline relies on low
latency binder transactions and is therefore sensitive to
delays caused by contention for the global binder lock.
Jank is siginificantly reduced by disabling preemption
while the global binder lock is held.

Originally-from: Riley Andrews 
Signed-off-by: Todd Kjos 
---
 drivers/android/binder.c | 194 +++
 1 file changed, 146 insertions(+), 48 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 16288e7..c36e420 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -379,6 +379,7 @@ static int task_get_unused_fd_flags(struct
binder_proc *proc, int flags)
  struct files_struct *files = proc->files;
  unsigned long rlim_cur;
  unsigned long irqs;
+ int ret;

  if (files == NULL)
  return -ESRCH;
@@ -389,7 +390,11 @@ static int task_get_unused_fd_flags(struct
binder_proc *proc, int flags)
  rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
  unlock_task_sighand(proc->tsk, &irqs);

- return __alloc_fd(files, 0, rlim_cur, flags);
+ preempt_enable_no_resched();
+ ret = __alloc_fd(files, 0, rlim_cur, flags);
+ preempt_disable();
+
+ return ret;
 }

 /*
@@ -398,8 +403,11 @@ static int task_get_unused_fd_flags(struct
binder_proc *proc, int flags)
 static void task_fd_install(
  struct binder_proc *proc, unsigned int fd, struct file *file)
 {
- if (proc->files)
+ if (proc->files) {
+ preempt_enable_no_resched();
  __fd_install(proc->files, fd, file);
+ preempt_disable();
+ }
 }

 /*
@@ -427,6 +435,7 @@ static inline void binder_lock(const char *tag)
 {
  trace_binder_lock(tag);
  mutex_lock(&binder_main_lock);
+ preempt_disable();
  trace_binder_locked(tag);
 }

@@ -434,8 +443,65 @@ static inline void binder_unlock(const char *tag)
 {
  trace_binder_unlock(tag);
  mutex_unlock(&binder_main_lock);
+ preempt_enable();
+}
+
+static inline void *kzalloc_nopreempt(size_t size)
+{
+ void *ptr;
+
+ ptr = kzalloc(size, GFP_NOWAIT);
+ if (ptr)
+ return ptr;
+
+ preempt_enable_no_resched();
+ ptr = kzalloc(size, GFP_KERNEL);
+ preempt_disable();
+
+ return ptr;
+}
+
+static inline long copy_to_user_nopreempt(void __user *to,
+  const void *from, long n)
+{
+ long ret;
+
+ preempt_enable_no_resched();
+ ret = copy_to_user(to, from, n);
+ preempt_disable();
+ return ret;
+}
+
+static inline long copy_from_user_nopreempt(void *to,
+const void __user *from,
+long n)
+{
+ long ret;
+
+ preempt_enable_no_resched();
+ ret = copy_from_user(to, from, n);
+ preempt_disable();
+ return ret;
 }

+#define get_user_nopreempt(x, ptr) \
+({ \
+ int __ret; \
+ preempt_enable_no_resched(); \
+ __ret = get_user(x, ptr); \
+ preempt_disable(); \
+ __ret; \
+})
+
+#define put_user_nopreempt(x, ptr) \
+({ \
+ int __ret; \
+ preempt_enable_no_resched(); \
+ __ret = put_user(x, ptr); \
+ preempt_disable(); \
+ __ret; \
+})
+
 static void binder_set_nice(long nice)
 {
  long min_nice;
@@ -568,6 +634,8 @@ static int binder_update_page_range(struct
binder_proc *proc, int allocate,
  else
  mm = get_task_mm(proc->tsk);

+ preempt_enable_no_resched();
+
  if (mm) {
  down_write(&mm->mmap_sem);
  vma = proc->vma;
@@ -622,6 +690,9 @@ static int binder_update_page_range(struct
binder_proc *proc, int allocate,
  up_write(&mm->mmap_sem);
  mmput(mm);
  }
+
+ preempt_disable();
+
  return 0;

 free_range:
@@ -644,6 +715,9 @@ err_no_vma:
  up_write(&mm->mmap_sem);
  mmput(mm);
  }
+
+ preempt_disable();
+
  return -ENOMEM;
 }

@@ -903,7 +977,7 @@ static struct binder_node *binder_new_node(struct
binder_proc *proc,
  return NULL;
  }

- node = kzalloc(sizeof(*node), GFP_KERNEL);
+ node = kzalloc_nopreempt(sizeof(*node));
  if (node == NULL)
  return NULL;
  binder_stats_created(BINDER_STAT_NODE);
@@ -1040,7 +1114,7 @@ static struct binder_ref
*binder_get_ref_for_node(struct binder_proc *proc,
  else
  return ref;
  }
- new_ref = kzalloc(sizeof(*ref), GFP_KERNEL);
+ new_ref = kzalloc_nopreempt(sizeof(*ref));
  if (new_ref == NULL)
  return NULL;
  binder_stats_created(BINDER_STAT_REF);
@@ -1438,14 +1512,14 @@ static void binder_transaction(struct binder_proc *proc,
  e->to_proc = target_proc->pid;

  /* TODO: reuse incoming transaction for reply */
- t = kzalloc(sizeof(*t), GFP_KERNEL);
+ t = kzalloc_nopreempt(sizeof(*t));
  if (t == NULL) {
  return_error = BR_FAILED_REPLY;
  goto err_alloc_t_failed;
  }
  binder_stats_created(BINDER_STAT_TRANSACTION);

- tcomplete = kzalloc(sizeof(*tcomplete), GFP_KERNEL);
+ tcomplete = kzalloc_nopreempt(sizeof(*tcomplete));
  if (tcomplete == NULL) {
  return_error = BR_FAILED_REPLY;
  goto err_alloc_tcomplete_failed;
@@ -1502,15 +1576,16 @@ static void binder_transaction(struct binder_proc *proc,
  offp = (binder_size_t *)(t->buffer->data +
  ALIGN(tr->data_size, sizeof(void *)));

- if (copy_from_user(t->buffer->data, (const void __user *)(uintptr_t)
-   tr->data.ptr.buffer, tr-&g

Re: [PATCH] android: binder: Disable preemption while holding the global binder lock

2016-09-08 Thread Todd Kjos
This was introduced in the 2015 Nexus devices and should have been
submitted to the kernel then since we keep forward porting it to each
new device.

On Thu, Sep 8, 2016 at 9:12 AM, Todd Kjos  wrote:
> In Android systems, the display pipeline relies on low
> latency binder transactions and is therefore sensitive to
> delays caused by contention for the global binder lock.
> Jank is siginificantly reduced by disabling preemption
> while the global binder lock is held.
>
> Originally-from: Riley Andrews 
> Signed-off-by: Todd Kjos 
> ---
>  drivers/android/binder.c | 194 
> +++
>  1 file changed, 146 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index 16288e7..c36e420 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -379,6 +379,7 @@ static int task_get_unused_fd_flags(struct
> binder_proc *proc, int flags)
>   struct files_struct *files = proc->files;
>   unsigned long rlim_cur;
>   unsigned long irqs;
> + int ret;
>
>   if (files == NULL)
>   return -ESRCH;
> @@ -389,7 +390,11 @@ static int task_get_unused_fd_flags(struct
> binder_proc *proc, int flags)
>   rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
>   unlock_task_sighand(proc->tsk, &irqs);
>
> - return __alloc_fd(files, 0, rlim_cur, flags);
> + preempt_enable_no_resched();
> + ret = __alloc_fd(files, 0, rlim_cur, flags);
> + preempt_disable();
> +
> + return ret;
>  }
>
>  /*
> @@ -398,8 +403,11 @@ static int task_get_unused_fd_flags(struct
> binder_proc *proc, int flags)
>  static void task_fd_install(
>   struct binder_proc *proc, unsigned int fd, struct file *file)
>  {
> - if (proc->files)
> + if (proc->files) {
> + preempt_enable_no_resched();
>   __fd_install(proc->files, fd, file);
> + preempt_disable();
> + }
>  }
>
>  /*
> @@ -427,6 +435,7 @@ static inline void binder_lock(const char *tag)
>  {
>   trace_binder_lock(tag);
>   mutex_lock(&binder_main_lock);
> + preempt_disable();
>   trace_binder_locked(tag);
>  }
>
> @@ -434,8 +443,65 @@ static inline void binder_unlock(const char *tag)
>  {
>   trace_binder_unlock(tag);
>   mutex_unlock(&binder_main_lock);
> + preempt_enable();
> +}
> +
> +static inline void *kzalloc_nopreempt(size_t size)
> +{
> + void *ptr;
> +
> + ptr = kzalloc(size, GFP_NOWAIT);
> + if (ptr)
> + return ptr;
> +
> + preempt_enable_no_resched();
> + ptr = kzalloc(size, GFP_KERNEL);
> + preempt_disable();
> +
> + return ptr;
> +}
> +
> +static inline long copy_to_user_nopreempt(void __user *to,
> +  const void *from, long n)
> +{
> + long ret;
> +
> + preempt_enable_no_resched();
> + ret = copy_to_user(to, from, n);
> + preempt_disable();
> + return ret;
> +}
> +
> +static inline long copy_from_user_nopreempt(void *to,
> +const void __user *from,
> +long n)
> +{
> + long ret;
> +
> + preempt_enable_no_resched();
> + ret = copy_from_user(to, from, n);
> + preempt_disable();
> + return ret;
>  }
>
> +#define get_user_nopreempt(x, ptr) \
> +({ \
> + int __ret; \
> + preempt_enable_no_resched(); \
> + __ret = get_user(x, ptr); \
> + preempt_disable(); \
> + __ret; \
> +})
> +
> +#define put_user_nopreempt(x, ptr) \
> +({ \
> + int __ret; \
> + preempt_enable_no_resched(); \
> + __ret = put_user(x, ptr); \
> + preempt_disable(); \
> + __ret; \
> +})
> +
>  static void binder_set_nice(long nice)
>  {
>   long min_nice;
> @@ -568,6 +634,8 @@ static int binder_update_page_range(struct
> binder_proc *proc, int allocate,
>   else
>   mm = get_task_mm(proc->tsk);
>
> + preempt_enable_no_resched();
> +
>   if (mm) {
>   down_write(&mm->mmap_sem);
>   vma = proc->vma;
> @@ -622,6 +690,9 @@ static int binder_update_page_range(struct
> binder_proc *proc, int allocate,
>   up_write(&mm->mmap_sem);
>   mmput(mm);
>   }
> +
> + preempt_disable();
> +
>   return 0;
>
>  free_range:
> @@ -644,6 +715,9 @@ err_no_vma:
>   up_write(&mm->mmap_sem);
>   mmput(mm);
>   }
> +
> + preempt_disable();
> +
>   return -ENOMEM;
>  }
>
> @@ -903,7 +977,7 @@ static struct binder_node *binder_new_node(struct
> binder_proc *proc,
>   return NULL;
>   }
>
> - node = kzalloc(sizeof(*node), GFP_KERNEL);
> + node = kzalloc_nopreempt(sizeof(*node));
>   if (node == NULL)
>   return NULL;
>   binder_stats_created(BINDER_STAT_NODE);
> @@ -1040,7 +1114,7 @@ static struct binder_ref
> *binder_get_ref_for_node(struct binder_proc *proc,
>   else
&g

[PATCH] sg: recheck MMAP_IO request length with lock held

2017-08-15 Thread Todd Poynor
Commit 1bc0eb044615 ("scsi: sg: protect accesses to 'reserved' page
array") adds needed concurrency protection for the "reserve" buffer.
Some checks that are initially made outside the lock are replicated once
the lock is taken to ensure the checks and resulting decisions are made
using consistent state.

The check that a request with flag SG_FLAG_MMAP_IO set fits in the
reserve buffer also needs to be performed again under the lock to
ensure the reserve buffer length compared against matches the value in
effect when the request is linked to the reserve buffer.  An -ENOMEM
should be returned in this case, instead of switching over to an
indirect buffer as for non-MMAP_IO requests.

Signed-off-by: Todd Poynor 
---
 drivers/scsi/sg.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d7ff71e0c85c..3a44b4bc872b 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1735,9 +1735,12 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
!sfp->res_in_use) {
sfp->res_in_use = 1;
sg_link_reserve(sfp, srp, dxfer_len);
-   } else if ((hp->flags & SG_FLAG_MMAP_IO) && sfp->res_in_use) {
+   } else if (hp->flags & SG_FLAG_MMAP_IO) {
+   res = -EBUSY; /* sfp->res_in_use == 1 */
+   if (dxfer_len > rsv_schp->bufflen)
+   res = -ENOMEM;
mutex_unlock(&sfp->f_mutex);
-   return -EBUSY;
+   return res;
} else {
res = sg_build_indirect(req_schp, sfp, dxfer_len);
if (res) {
-- 
2.14.1.480.gb18f417b89-goog



[PATCH] sg: protect against races between mmap() and SG_SET_RESERVED_SIZE

2017-08-15 Thread Todd Poynor
Take f_mutex around mmap() processing to protect against races with
the SG_SET_RESERVED_SIZE ioctl.  Ensure the reserve buffer length
remains consistent during the mapping operation, and set the
"mmap called" flag to prevent further changes to the reserved buffer
size as an atomic operation with the mapping.

Signed-off-by: Todd Poynor 
---
 drivers/scsi/sg.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 3a44b4bc872b..a20718e9f1f4 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1233,6 +1233,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
unsigned long req_sz, len, sa;
Sg_scatter_hold *rsv_schp;
int k, length;
+int ret = 0;
 
if ((!filp) || (!vma) || (!(sfp = (Sg_fd *) filp->private_data)))
return -ENXIO;
@@ -1243,8 +1244,11 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
if (vma->vm_pgoff)
return -EINVAL; /* want no offset */
rsv_schp = &sfp->reserve;
-   if (req_sz > rsv_schp->bufflen)
-   return -ENOMEM; /* cannot map more than reserved buffer */
+   mutex_lock(&sfp->f_mutex);
+   if (req_sz > rsv_schp->bufflen) {
+   ret = -ENOMEM;  /* cannot map more than reserved buffer */
+   goto out;
+   }
 
sa = vma->vm_start;
length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
@@ -1258,7 +1262,9 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
vma->vm_private_data = sfp;
vma->vm_ops = &sg_mmap_vm_ops;
-   return 0;
+out:
+   mutex_unlock(&sfp->f_mutex);
+   return ret;
 }
 
 static void
-- 
2.14.1.480.gb18f417b89-goog



[PATCH] binder: fix proc->files use-after-free

2017-11-14 Thread Todd Kjos
proc->files cleanup is initiated by binder_vma_close. Therefore
a reference on the binder_proc is not enough to prevent the
files_struct from being released while the binder_proc still has
a reference. This can lead to an attempt to dereference the
stale pointer obtained from proc->files prior to proc->files
cleanup. This has been seen once in task_get_unused_fd_flags()
when __alloc_fd() is called with a stale "files".

The fix is to always use get_files_struct() to obtain struct_files
so that the refcount on the files_struct is used to prevent
a premature free. proc->files is removed since we get it every
time.

Signed-off-by: Todd Kjos 
---
 drivers/android/binder.c | 63 +++-
 1 file changed, 30 insertions(+), 33 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index fddf76ef5bd6..794e58c14b15 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -458,9 +458,8 @@ struct binder_ref {
 };
 
 enum binder_deferred_state {
-   BINDER_DEFERRED_PUT_FILES= 0x01,
-   BINDER_DEFERRED_FLUSH= 0x02,
-   BINDER_DEFERRED_RELEASE  = 0x04,
+   BINDER_DEFERRED_FLUSH= 0x01,
+   BINDER_DEFERRED_RELEASE  = 0x02,
 };
 
 /**
@@ -481,8 +480,6 @@ enum binder_deferred_state {
  *(invariant after initialized)
  * @tsk   task_struct for group_leader of process
  *(invariant after initialized)
- * @files files_struct for process
- *(invariant after initialized)
  * @deferred_work_node:   element for binder_deferred_list
  *(protected by binder_deferred_lock)
  * @deferred_work:bitmap of deferred work to perform
@@ -529,7 +526,6 @@ struct binder_proc {
struct list_head waiting_threads;
int pid;
struct task_struct *tsk;
-   struct files_struct *files;
struct hlist_node deferred_work_node;
int deferred_work;
bool is_dead;
@@ -875,22 +871,34 @@ static void binder_free_thread(struct binder_thread 
*thread);
 static void binder_free_proc(struct binder_proc *proc);
 static void binder_inc_node_tmpref_ilocked(struct binder_node *node);
 
+struct files_struct *binder_get_files_struct(struct binder_proc *proc)
+{
+   return get_files_struct(proc->tsk);
+}
+
 static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
 {
-   struct files_struct *files = proc->files;
+   struct files_struct *files;
unsigned long rlim_cur;
unsigned long irqs;
+   int ret;
 
+   files = binder_get_files_struct(proc);
if (files == NULL)
return -ESRCH;
 
-   if (!lock_task_sighand(proc->tsk, &irqs))
-   return -EMFILE;
+   if (!lock_task_sighand(proc->tsk, &irqs)) {
+   ret = -EMFILE;
+   goto err;
+   }
 
rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
unlock_task_sighand(proc->tsk, &irqs);
 
-   return __alloc_fd(files, 0, rlim_cur, flags);
+   ret = __alloc_fd(files, 0, rlim_cur, flags);
+err:
+   put_files_struct(files);
+   return ret;
 }
 
 /*
@@ -899,8 +907,12 @@ static int task_get_unused_fd_flags(struct binder_proc 
*proc, int flags)
 static void task_fd_install(
struct binder_proc *proc, unsigned int fd, struct file *file)
 {
-   if (proc->files)
-   __fd_install(proc->files, fd, file);
+   struct files_struct *files = binder_get_files_struct(proc);
+
+   if (files) {
+   __fd_install(files, fd, file);
+   put_files_struct(files);
+   }
 }
 
 /*
@@ -908,18 +920,20 @@ static void task_fd_install(
  */
 static long task_close_fd(struct binder_proc *proc, unsigned int fd)
 {
+   struct files_struct *files = binder_get_files_struct(proc);
int retval;
 
-   if (proc->files == NULL)
+   if (files == NULL)
return -ESRCH;
 
-   retval = __close_fd(proc->files, fd);
+   retval = __close_fd(files, fd);
/* can't restart close syscall because file table entry was cleared */
if (unlikely(retval == -ERESTARTSYS ||
 retval == -ERESTARTNOINTR ||
 retval == -ERESTARTNOHAND ||
 retval == -ERESTART_RESTARTBLOCK))
retval = -EINTR;
+   put_files_struct(files);
 
return retval;
 }
@@ -4561,7 +4575,6 @@ static void binder_vma_close(struct vm_area_struct *vma)
 (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags,
 (unsigned long)pgprot_val(vma->vm_page_prot));
binder_alloc_vma_close(&proc->alloc);
-   binder_defer_work(proc, BINDER_DEFERRED_PUT_FILES);
 }
 
 static int binder_vm_fault(struct vm_fault *vmf)
@@ -4603,10 +4616,8 @@ static int binder_mmap(struct fil

[PATCH 2/4] staging: gasket: core: device register debug log cleanups

2018-08-02 Thread Todd Poynor
From: Todd Poynor 

At device/driver registration time, convert a not-very-informative
info message to a more informative debug message, drop some not overly
helpful debug messages.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index fa477d0c3c74c..91db71c238804 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -1735,7 +1735,8 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
int desc_idx = -1;
struct gasket_internal_desc *internal;
 
-   pr_info("Initializing Gasket framework device\n");
+   pr_debug("Loading %s driver version %s\n", driver_desc->name,
+driver_desc->driver_version);
/* Check for duplicates and find a free slot. */
mutex_lock(&g_mutex);
 
@@ -1764,8 +1765,6 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
return -EBUSY;
}
 
-   /* Internal structure setup. */
-   pr_debug("Performing initial internal structure setup.\n");
internal = &g_descs[desc_idx];
mutex_init(&internal->mutex);
memset(internal->devs, 0, sizeof(struct gasket_dev *) * GASKET_DEV_MAX);
@@ -1788,7 +1787,6 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
 * Not using pci_register_driver() (without underscores), as it
 * depends on KBUILD_MODNAME, and this is a shared file.
 */
-   pr_debug("Registering PCI driver.\n");
ret = __pci_register_driver(&internal->pci, driver_desc->module,
driver_desc->name);
if (ret) {
@@ -1796,7 +1794,6 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
goto fail1;
}
 
-   pr_debug("Registering char driver.\n");
ret = register_chrdev_region(MKDEV(driver_desc->major,
   driver_desc->minor), GASKET_DEV_MAX,
 driver_desc->name);
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 4/4] Revert "staging: gasket: core: hold reference to pci_dev while used"

2018-08-02 Thread Todd Poynor
From: Todd Poynor 

There's no need to take an additional reference on the pci_dev structure
for the pointer copy saved in gasket data structures.

This reverts commit:
8dd8a48b9a7d ("staging: gasket: core: hold reference to pci_dev while used")

Reported-by: Dmitry Torokhov 
Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 93a4d9f08eaab..2d209e36cf372 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -255,7 +255,6 @@ static void gasket_free_dev(struct gasket_dev *gasket_dev)
internal_desc->devs[gasket_dev->dev_idx] = NULL;
mutex_unlock(&internal_desc->mutex);
put_device(gasket_dev->dev);
-   pci_dev_put(gasket_dev->pci_dev);
kfree(gasket_dev);
 }
 
@@ -1477,7 +1476,7 @@ static int gasket_pci_probe(struct pci_dev *pci_dev,
ret = gasket_alloc_dev(internal_desc, parent, &gasket_dev, kobj_name);
if (ret)
return ret;
-   gasket_dev->pci_dev = pci_dev_get(pci_dev);
+   gasket_dev->pci_dev = pci_dev;
if (IS_ERR_OR_NULL(gasket_dev->dev_info.device)) {
pr_err("Cannot create %s device %s [ret = %ld]\n",
   driver_desc->name, gasket_dev->dev_info.name,
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 3/4] staging: gasket: core: add subsystem and device info to logs

2018-08-02 Thread Todd Poynor
From: Todd Poynor 

Identify gasket as the subsystem printing various messages.
Add the driver name to appropriate messages to indicate which driver
has a problem.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 91db71c238804..93a4d9f08eaab 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -6,6 +6,9 @@
  *
  * Copyright (C) 2018 Google, Inc.
  */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include "gasket_core.h"
 
 #include "gasket_interrupt.h"
@@ -208,7 +211,7 @@ static int gasket_alloc_dev(struct gasket_internal_desc 
*internal_desc,
 
gasket_dev = *pdev = kzalloc(sizeof(*gasket_dev), GFP_KERNEL);
if (!gasket_dev) {
-   pr_err("no memory for device\n");
+   pr_err("no memory for device %s\n", kobj_name);
return -ENOMEM;
}
internal_desc->devs[dev_idx] = gasket_dev;
@@ -1760,7 +1763,7 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
mutex_unlock(&g_mutex);
 
if (desc_idx == -1) {
-   pr_err("Too many Gasket drivers loaded: %d\n",
+   pr_err("too many drivers loaded, max %d\n",
   GASKET_FRAMEWORK_DESC_MAX);
return -EBUSY;
}
@@ -1790,7 +1793,8 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
ret = __pci_register_driver(&internal->pci, driver_desc->module,
driver_desc->name);
if (ret) {
-   pr_err("cannot register pci driver [ret=%d]\n", ret);
+   pr_err("cannot register %s pci driver [ret=%d]\n",
+  driver_desc->name, ret);
goto fail1;
}
 
@@ -1798,7 +1802,8 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
   driver_desc->minor), GASKET_DEV_MAX,
 driver_desc->name);
if (ret) {
-   pr_err("cannot register char driver [ret=%d]\n", ret);
+   pr_err("cannot register %s char driver [ret=%d]\n",
+  driver_desc->name, ret);
goto fail2;
}
 
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 0/4 v2] staging: gasket: cleanups du jour

2018-08-02 Thread Todd Poynor
From: Todd Poynor 

More cleanups for the gasket+apex drivers.

Patched changed in v2 from v1:
  staging: gasket: core: print driver version code at registration time
  staging: gasket: core: move driver loaded log after error cases
 Above 2 patches replaced by new patch:
staging: gasket: core: remove registration logs
  staging: gasket: core: device register debug log cleanups
 Drop explicit "gasket:" prefix, use pr_fmt instead (in following patch)
  staging: gasket: core: add subsystem and device info to error logs
 Renamed: staging: gasket: core: add subsystem and device info to logs
 Use pr_fmt for modname prefix on logs
  Revert "staging: gasket: core: hold reference to pci_dev while used"
 Fixup SHA format in commit text.

Patches dropped from v2, already merged from v1:
  staging: gasket: apex: enable power save mode by default
  staging: gasket: remove "reset type" param from framework
  staging: gasket: apex: drop reset type param

Todd Poynor (4):
  staging: gasket: core: remove registration logs
  staging: gasket: core: device register debug log cleanups
  staging: gasket: core: add subsystem and device info to logs
  Revert "staging: gasket: core: hold reference to pci_dev while used"

 drivers/staging/gasket/gasket_core.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 1/4] staging: gasket: core: remove registration logs

2018-08-02 Thread Todd Poynor
From: Todd Poynor 

Remove logs for loading gasket drivers.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 2b75f100da4d3..fa477d0c3c74c 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -1758,9 +1758,6 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
}
mutex_unlock(&g_mutex);
 
-   pr_info("Loaded %s driver, framework version %s\n",
-   driver_desc->name, GASKET_FRAMEWORK_VERSION);
-
if (desc_idx == -1) {
pr_err("Too many Gasket drivers loaded: %d\n",
   GASKET_FRAMEWORK_DESC_MAX);
@@ -1808,7 +1805,6 @@ int gasket_register_device(const struct 
gasket_driver_desc *driver_desc)
goto fail2;
}
 
-   pr_info("Driver registered successfully.\n");
return 0;
 
 fail2:
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 05/15] staging: gasket: core: remove device enable and disable callbacks

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Device enable/disable operations are moving from being initiated through
the gasket framework to being initiated by the gasket device driver.
The driver can perform any processing needed for these operations before
or after the calls into the framework.  Neither of these callbacks are
implemented for the only gasket driver upstream today, apex.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c |  9 -
 drivers/staging/gasket/gasket_core.h | 27 ++-
 2 files changed, 2 insertions(+), 34 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 2741256eacfe8..b070efaf0d41c 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -648,8 +648,6 @@ static void gasket_disable_dev(struct gasket_dev 
*gasket_dev)
gasket_page_table_cleanup(gasket_dev->page_table[i]);
}
}
-
-   check_and_invoke_callback(gasket_dev, driver_desc->disable_dev_cb);
 }
 
 /*
@@ -1408,13 +1406,6 @@ static int gasket_enable_dev(struct gasket_internal_desc 
*internal_desc,
}
gasket_dev->hardware_revision = ret;
 
-   ret = check_and_invoke_callback(gasket_dev, driver_desc->enable_dev_cb);
-   if (ret) {
-   dev_err(gasket_dev->dev, "Error in enable device cb: %d\n",
-   ret);
-   return ret;
-   }
-
/* device_status_cb returns a device status, not an error code. */
gasket_dev->status = gasket_get_hw_status(gasket_dev);
if (gasket_dev->status == GASKET_STATUS_DEAD)
diff --git a/drivers/staging/gasket/gasket_core.h 
b/drivers/staging/gasket/gasket_core.h
index 9f9bc66a0daa0..5d40bc7f52e91 100644
--- a/drivers/staging/gasket/gasket_core.h
+++ b/drivers/staging/gasket/gasket_core.h
@@ -473,34 +473,11 @@ struct gasket_driver_desc {
 */
int (*device_close_cb)(struct gasket_dev *dev);
 
-   /*
-* enable_dev_cb: Callback immediately before enabling the device.
-* @dev: Pointer to the gasket_dev struct for this driver instance.
-*
-* This callback is invoked after the device has been added and all BAR
-* spaces mapped, immediately before registering and enabling the
-* [character] device via cdev_add. If this call fails (returns
-* nonzero), disable_dev_cb will be called.
-*
-* Note that cdev are initialized but not active
-* (cdev_add has not yet been called) when this callback is invoked.
-*/
-   int (*enable_dev_cb)(struct gasket_dev *dev);
-
-   /*
-* disable_dev_cb: Callback immediately after disabling the device.
-* @dev: Pointer to the gasket_dev struct for this driver instance.
-*
-* Called during device shutdown, immediately after disabling device
-* operations via cdev_del.
-*/
-   int (*disable_dev_cb)(struct gasket_dev *dev);
-
/*
 * sysfs_setup_cb: Callback to set up driver-specific sysfs nodes.
 * @dev: Pointer to the gasket_dev struct for this device.
 *
-* Called just before enable_dev_cb.
+* Called during the add gasket device call.
 *
 */
int (*sysfs_setup_cb)(struct gasket_dev *dev);
@@ -509,7 +486,7 @@ struct gasket_driver_desc {
 * sysfs_cleanup_cb: Callback to clean up driver-specific sysfs nodes.
 * @dev: Pointer to the gasket_dev struct for this device.
 *
-* Called just before disable_dev_cb.
+* Called during device disable processing.
 *
 */
int (*sysfs_cleanup_cb)(struct gasket_dev *dev);
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 02/15] staging: gasket: core: move core PCI calls to device drivers

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Remove gasket wrapping of PCI probe, enable, disable, and remove
functions.  Replace with calls to add and remove PCI gasket devices,
to be called by the gasket device drivers.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 82 
 drivers/staging/gasket/gasket_core.h |  6 ++
 2 files changed, 28 insertions(+), 60 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 2d209e36cf372..01cafe1ff6605 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -51,9 +51,6 @@ struct gasket_internal_desc {
/* Kernel-internal device class. */
struct class *class;
 
-   /* PCI subsystem metadata associated with this driver. */
-   struct pci_driver pci;
-
/* Instantiated / present devices of this type. */
struct gasket_dev *devs[GASKET_DEV_MAX];
 };
@@ -368,10 +365,10 @@ static void gasket_unmap_pci_bar(struct gasket_dev *dev, 
int bar_num)
 }
 
 /*
- * Setup PCI & set up memory mapping for the specified device.
+ * Setup PCI memory mapping for the specified device.
  *
- * Enables the PCI device, reads the BAR registers and sets up pointers to the
- * device's memory mapped IO space.
+ * Reads the BAR registers and sets up pointers to the device's memory mapped
+ * IO space.
  *
  * Returns 0 on success and a negative value otherwise.
  */
@@ -380,14 +377,6 @@ static int gasket_setup_pci(struct pci_dev *pci_dev,
 {
int i, mapped_bars, ret;
 
-   ret = pci_enable_device(pci_dev);
-   if (ret) {
-   dev_err(gasket_dev->dev, "cannot enable PCI device\n");
-   return ret;
-   }
-
-   pci_set_master(pci_dev);
-
for (i = 0; i < GASKET_NUM_BARS; i++) {
ret = gasket_map_pci_bar(gasket_dev, i);
if (ret) {
@@ -402,19 +391,16 @@ static int gasket_setup_pci(struct pci_dev *pci_dev,
for (i = 0; i < mapped_bars; i++)
gasket_unmap_pci_bar(gasket_dev, i);
 
-   pci_disable_device(pci_dev);
return -ENOMEM;
 }
 
-/* Unmaps memory and cleans up PCI for the specified device. */
+/* Unmaps memory for the specified device. */
 static void gasket_cleanup_pci(struct gasket_dev *gasket_dev)
 {
int i;
 
for (i = 0; i < GASKET_NUM_BARS; i++)
gasket_unmap_pci_bar(gasket_dev, i);
-
-   pci_disable_device(gasket_dev->pci_dev);
 }
 
 /* Determine the health of the Gasket device. */
@@ -1443,15 +1429,14 @@ static int gasket_enable_dev(struct 
gasket_internal_desc *internal_desc,
 }
 
 /*
- * PCI subsystem probe function.
- *
- * Called when a Gasket device is found. Allocates device metadata, maps device
- * memory, and calls gasket_enable_dev to prepare the device for active use.
+ * Add PCI gasket device.
  *
- * Returns 0 if successful and a negative value otherwise.
+ * Called by Gasket device probe function.
+ * Allocates device metadata, maps device memory, and calls gasket_enable_dev
+ * to prepare the device for active use.
  */
-static int gasket_pci_probe(struct pci_dev *pci_dev,
-   const struct pci_device_id *id)
+int gasket_pci_add_device(struct pci_dev *pci_dev,
+ struct gasket_dev **gasket_devp)
 {
int ret;
const char *kobj_name = dev_name(&pci_dev->dev);
@@ -1460,13 +1445,14 @@ static int gasket_pci_probe(struct pci_dev *pci_dev,
const struct gasket_driver_desc *driver_desc;
struct device *parent;
 
-   pr_info("Add Gasket device %s\n", kobj_name);
+   pr_debug("add PCI device %s\n", kobj_name);
 
mutex_lock(&g_mutex);
internal_desc = lookup_internal_desc(pci_dev);
mutex_unlock(&g_mutex);
if (!internal_desc) {
-   pr_err("PCI probe called for unknown driver type\n");
+   dev_err(&pci_dev->dev,
+   "PCI add device called for unknown driver type\n");
return -ENODEV;
}
 
@@ -1530,6 +1516,7 @@ static int gasket_pci_probe(struct pci_dev *pci_dev,
goto fail5;
}
 
+   *gasket_devp = gasket_dev;
return 0;
 
 fail5:
@@ -1545,14 +1532,10 @@ static int gasket_pci_probe(struct pci_dev *pci_dev,
gasket_free_dev(gasket_dev);
return ret;
 }
+EXPORT_SYMBOL(gasket_pci_add_device);
 
-/*
- * PCI subsystem remove function.
- *
- * Called to remove a Gasket device. Finds the device in the device list and
- * cleans up metadata.
- */
-static void gasket_pci_remove(struct pci_dev *pci_dev)
+/* Remove a PCI gasket device. */
+void gasket_pci_remove_device(struct pci_dev *pci_dev)
 {
int i;
struct gasket_internal_desc *internal_desc;
@@ -1583,8 +1566,8 @@ static void gasket_pci_remove(struct pci_dev *pci_dev)
if (!gasket_dev

[PATCH 03/15] staging: gasket: apex: move PCI core calls to apex driver

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Apex driver moves PCI core calls like probe, enable, and remove from
gasket to apex.  Call new functions in gasket to register apex as a PCI
device to the gasket framework.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/apex_driver.c | 49 +++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/gasket/apex_driver.c 
b/drivers/staging/gasket/apex_driver.c
index 42cef68eb4c19..b47661442009d 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -621,6 +622,36 @@ static void apex_pci_fixup_class(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_CLASS_HEADER(APEX_PCI_VENDOR_ID, APEX_PCI_DEVICE_ID,
   PCI_CLASS_NOT_DEFINED, 8, apex_pci_fixup_class);
 
+static int apex_pci_probe(struct pci_dev *pci_dev,
+ const struct pci_device_id *id)
+{
+   int ret;
+   struct gasket_dev *gasket_dev;
+
+   ret = pci_enable_device(pci_dev);
+   if (ret) {
+   dev_err(&pci_dev->dev, "error enabling PCI device\n");
+   return ret;
+   }
+
+   pci_set_master(pci_dev);
+
+   ret = gasket_pci_add_device(pci_dev, &gasket_dev);
+   if (ret) {
+   dev_err(&pci_dev->dev, "error adding gasket device\n");
+   pci_disable_device(pci_dev);
+   return ret;
+   }
+
+   return 0;
+}
+
+static void apex_pci_remove(struct pci_dev *pci_dev)
+{
+   gasket_pci_remove_device(pci_dev);
+   pci_disable_device(pci_dev);
+}
+
 static struct gasket_driver_desc apex_desc = {
.name = "apex",
.driver_version = APEX_DRIVER_VERSION,
@@ -672,13 +703,29 @@ static struct gasket_driver_desc apex_desc = {
.device_reset_cb = apex_reset,
 };
 
+static struct pci_driver apex_pci_driver = {
+   .name = "apex",
+   .probe = apex_pci_probe,
+   .remove = apex_pci_remove,
+   .id_table = apex_pci_ids,
+};
+
 static int __init apex_init(void)
 {
-   return gasket_register_device(&apex_desc);
+   int ret;
+
+   ret = gasket_register_device(&apex_desc);
+   if (ret)
+   return ret;
+   ret = pci_register_driver(&apex_pci_driver);
+   if (ret)
+   gasket_unregister_device(&apex_desc);
+   return ret;
 }
 
 static void apex_exit(void)
 {
+   pci_unregister_driver(&apex_pci_driver);
gasket_unregister_device(&apex_desc);
 }
 MODULE_DESCRIPTION("Google Apex driver");
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 04/15] staging: gasket: core: convert remaining info logs to debug

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Remaining info-level logs in gasket core converted to debug-level; the
information is not needed during normal system operation.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 01cafe1ff6605..2741256eacfe8 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -1819,7 +1819,7 @@ void gasket_unregister_device(const struct 
gasket_driver_desc *driver_desc)
g_descs[desc_idx].driver_desc = NULL;
mutex_unlock(&g_mutex);
 
-   pr_info("removed %s driver\n", driver_desc->name);
+   pr_debug("removed %s driver\n", driver_desc->name);
 }
 EXPORT_SYMBOL(gasket_unregister_device);
 
@@ -1827,7 +1827,7 @@ static int __init gasket_init(void)
 {
int i;
 
-   pr_info("Performing one-time init of the Gasket framework.\n");
+   pr_debug("%s\n", __func__);
/* Check for duplicates and find a free slot. */
mutex_lock(&g_mutex);
for (i = 0; i < GASKET_FRAMEWORK_DESC_MAX; i++) {
@@ -1843,8 +1843,7 @@ static int __init gasket_init(void)
 
 static void __exit gasket_exit(void)
 {
-   /* No deinit/dealloc needed at present. */
-   pr_info("Removing Gasket framework module.\n");
+   pr_debug("%s\n", __func__);
 }
 MODULE_DESCRIPTION("Google Gasket driver framework");
 MODULE_VERSION(GASKET_FRAMEWORK_VERSION);
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 09/15] staging: gasket: core: delete device add and remove callbacks

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Gasket device drivers are now in charge of orchestrating the device add
and removal sequences, so the callbacks from the framework to the device
drivers for these events are no longer needed.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 10 --
 drivers/staging/gasket/gasket_core.h | 29 
 2 files changed, 39 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index fad4883e6332c..0d76e18fcde5b 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -1468,12 +1468,6 @@ int gasket_pci_add_device(struct pci_dev *pci_dev,
if (ret)
goto fail2;
 
-   ret = check_and_invoke_callback(gasket_dev, driver_desc->add_dev_cb);
-   if (ret) {
-   dev_err(gasket_dev->dev, "Error in add device cb: %d\n", ret);
-   goto fail2;
-   }
-
ret = gasket_sysfs_create_mapping(gasket_dev->dev_info.device,
  gasket_dev);
if (ret)
@@ -1512,7 +1506,6 @@ int gasket_pci_add_device(struct pci_dev *pci_dev,
gasket_sysfs_remove_mapping(gasket_dev->dev_info.device);
 fail2:
gasket_cleanup_pci(gasket_dev);
-   check_and_invoke_callback(gasket_dev, driver_desc->remove_dev_cb);
device_destroy(internal_desc->class, gasket_dev->dev_info.devt);
 fail1:
gasket_free_dev(gasket_dev);
@@ -1559,9 +1552,6 @@ void gasket_pci_remove_device(struct pci_dev *pci_dev)
 
check_and_invoke_callback(gasket_dev, driver_desc->sysfs_cleanup_cb);
gasket_sysfs_remove_mapping(gasket_dev->dev_info.device);
-
-   check_and_invoke_callback(gasket_dev, driver_desc->remove_dev_cb);
-
device_destroy(internal_desc->class, gasket_dev->dev_info.devt);
gasket_free_dev(gasket_dev);
 }
diff --git a/drivers/staging/gasket/gasket_core.h 
b/drivers/staging/gasket/gasket_core.h
index 9c143ebeba452..0ef0a2640f0fe 100644
--- a/drivers/staging/gasket/gasket_core.h
+++ b/drivers/staging/gasket/gasket_core.h
@@ -302,12 +302,6 @@ struct gasket_dev {
/* Hardware revision value for this device. */
int hardware_revision;
 
-   /*
-* Device-specific data; allocated in gasket_driver_desc.add_dev_cb()
-* and freed in gasket_driver_desc.remove_dev_cb().
-*/
-   void *cb_data;
-
/* Protects access to per-device data (i.e. this structure). */
struct mutex mutex;
 
@@ -415,29 +409,6 @@ struct gasket_driver_desc {
int interrupt_pack_width;
 
/* Driver callback functions - all may be NULL */
-   /*
-* add_dev_cb: Callback when a device is found.
-* @dev: The gasket_dev struct for this driver instance.
-*
-* This callback should initialize the device-specific cb_data.
-* Called when a device is found by the driver,
-* before any BAR ranges have been mapped. If this call fails (returns
-* nonzero), remove_dev_cb will be called.
-*
-*/
-   int (*add_dev_cb)(struct gasket_dev *dev);
-
-   /*
-* remove_dev_cb: Callback for when a device is removed from the system.
-* @dev: The gasket_dev struct for this driver instance.
-*
-* This callback should free data allocated in add_dev_cb.
-* Called immediately before a device is unregistered by the driver.
-* All framework-managed resources will have been cleaned up by the time
-* this callback is invoked (PCI BARs, character devices, ...).
-*/
-   int (*remove_dev_cb)(struct gasket_dev *dev);
-
/*
 * device_open_cb: Callback for when a device node is opened in write
 * mode.
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 00/15] staging: gasket: unwrap pci core and more

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Stop wrapping PCI core calls like probe, enable, remove, etc. in the
gasket framework, move these calls to the device driver instead.  Have
gasket drivers call into framework on init, enable, disable,
etc. sequences, rather than the other way around.  Remove the
gasket-to-device callbacks associated with these sequences.

Plus a few other fixes and cleanups.

Todd Poynor (15):
  staging: gasket: sysfs: clean up state if ENOMEM removing mapping
  staging: gasket: core: move core PCI calls to device drivers
  staging: gasket: apex: move PCI core calls to apex driver
  staging: gasket: core: convert remaining info logs to debug
  staging: gasket: core: remove device enable and disable callbacks
  staging: gasket: apex: remove device enable and disable callbacks
  staging: gasket: core: let device driver enable/disable gasket device
  staging: gasket: apex: enable/disable gasket device from apex
  staging: gasket: core: delete device add and remove callbacks
  staging: gasket: apex: fold device add/remove logic inline
  staging: gasket: core: remove sysfs setup and cleanup callbacks
  staging: gasket: apex: move sysfs setup code to probe function
  staging: gasket: core: protect against races during unregister
  staging: gasket: apex: place in low power reset until opened
  staging: gasket: core: remove incorrect extraneous comment

 drivers/staging/gasket/apex_driver.c  | 145 +-
 drivers/staging/gasket/gasket_core.c  | 140 ++---
 drivers/staging/gasket/gasket_core.h  |  82 +++
 drivers/staging/gasket/gasket_sysfs.c |  13 ++-
 4 files changed, 148 insertions(+), 232 deletions(-)

-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 01/15] staging: gasket: sysfs: clean up state if ENOMEM removing mapping

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

If kcalloc() returns NULL in put_mapping(), continue to clean up state,
including dropping the reference on the struct device and free attribute
memory.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_sysfs.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/gasket/gasket_sysfs.c 
b/drivers/staging/gasket/gasket_sysfs.c
index 56d62aea51118..fc45f0d13e87d 100644
--- a/drivers/staging/gasket/gasket_sysfs.c
+++ b/drivers/staging/gasket/gasket_sysfs.c
@@ -101,13 +101,12 @@ static void put_mapping(struct gasket_sysfs_mapping 
*mapping)
files_to_remove = kcalloc(num_files_to_remove,
  sizeof(*files_to_remove),
  GFP_KERNEL);
-   if (!files_to_remove) {
-   mutex_unlock(&mapping->mutex);
-   return;
-   }
-
-   for (i = 0; i < num_files_to_remove; i++)
-   files_to_remove[i] = mapping->attributes[i].attr;
+   if (files_to_remove)
+   for (i = 0; i < num_files_to_remove; i++)
+   files_to_remove[i] =
+   mapping->attributes[i].attr;
+   else
+   num_files_to_remove = 0;
 
kfree(mapping->attributes);
mapping->attributes = NULL;
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 15/15] staging: gasket: core: remove incorrect extraneous comment

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

A copy-and-pasted comment from another code sequence is removed from
gasket core init sequence.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index a6462b6d702f1..d12ab560411f7 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -1792,7 +1792,6 @@ static int __init gasket_init(void)
int i;
 
pr_debug("%s\n", __func__);
-   /* Check for duplicates and find a free slot. */
mutex_lock(&g_mutex);
for (i = 0; i < GASKET_FRAMEWORK_DESC_MAX; i++) {
g_descs[i].driver_desc = NULL;
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 07/15] staging: gasket: core: let device driver enable/disable gasket device

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Move gasket device enable/disable functions from internal calls to
external calls from the gasket device drivers.  The device driver will
call these functions at appropriate times in its processing, placing
the device driver in control of this sequence and reducing the need for
callbacks from framework back to the device drivers during the
enable/disable sequences.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 22 --
 drivers/staging/gasket/gasket_core.h |  6 ++
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index b070efaf0d41c..fad4883e6332c 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -628,7 +628,7 @@ static int gasket_add_cdev(struct gasket_cdev_info 
*dev_info,
 }
 
 /* Disable device operations. */
-static void gasket_disable_dev(struct gasket_dev *gasket_dev)
+void gasket_disable_device(struct gasket_dev *gasket_dev)
 {
const struct gasket_driver_desc *driver_desc =
gasket_dev->internal_desc->driver_desc;
@@ -649,6 +649,7 @@ static void gasket_disable_dev(struct gasket_dev 
*gasket_dev)
}
}
 }
+EXPORT_SYMBOL(gasket_disable_device);
 
 /*
  * Registered descriptor lookup.
@@ -1350,13 +1351,12 @@ static const struct file_operations gasket_file_ops = {
 };
 
 /* Perform final init and marks the device as active. */
-static int gasket_enable_dev(struct gasket_internal_desc *internal_desc,
-struct gasket_dev *gasket_dev)
+int gasket_enable_device(struct gasket_dev *gasket_dev)
 {
int tbl_idx;
int ret;
const struct gasket_driver_desc *driver_desc =
-   internal_desc->driver_desc;
+   gasket_dev->internal_desc->driver_desc;
 
ret = gasket_interrupt_init(gasket_dev, driver_desc->name,
driver_desc->interrupt_type,
@@ -1418,13 +1418,15 @@ static int gasket_enable_dev(struct 
gasket_internal_desc *internal_desc,
 
return 0;
 }
+EXPORT_SYMBOL(gasket_enable_device);
 
 /*
  * Add PCI gasket device.
  *
  * Called by Gasket device probe function.
- * Allocates device metadata, maps device memory, and calls gasket_enable_dev
- * to prepare the device for active use.
+ * Allocates device metadata and maps device memory.  The device driver must
+ * call gasket_enable_device after driver init is complete to place the device
+ * in active use.
  */
 int gasket_pci_add_device(struct pci_dev *pci_dev,
  struct gasket_dev **gasket_devp)
@@ -1500,13 +1502,6 @@ int gasket_pci_add_device(struct pci_dev *pci_dev,
goto fail5;
}
 
-   ret = gasket_enable_dev(internal_desc, gasket_dev);
-   if (ret) {
-   pr_err("cannot setup %s device\n", driver_desc->name);
-   gasket_disable_dev(gasket_dev);
-   goto fail5;
-   }
-
*gasket_devp = gasket_dev;
return 0;
 
@@ -1560,7 +1555,6 @@ void gasket_pci_remove_device(struct pci_dev *pci_dev)
dev_dbg(gasket_dev->dev, "remove %s PCI gasket device\n",
internal_desc->driver_desc->name);
 
-   gasket_disable_dev(gasket_dev);
gasket_cleanup_pci(gasket_dev);
 
check_and_invoke_callback(gasket_dev, driver_desc->sysfs_cleanup_cb);
diff --git a/drivers/staging/gasket/gasket_core.h 
b/drivers/staging/gasket/gasket_core.h
index 5d40bc7f52e91..9c143ebeba452 100644
--- a/drivers/staging/gasket/gasket_core.h
+++ b/drivers/staging/gasket/gasket_core.h
@@ -590,6 +590,12 @@ int gasket_pci_add_device(struct pci_dev *pci_dev,
 /* Remove a PCI gasket device. */
 void gasket_pci_remove_device(struct pci_dev *pci_dev);
 
+/* Enable a Gasket device. */
+int gasket_enable_device(struct gasket_dev *gasket_dev);
+
+/* Disable a Gasket device. */
+void gasket_disable_device(struct gasket_dev *gasket_dev);
+
 /*
  * Reset the Gasket device.
  * @gasket_dev: Gasket device struct.
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 14/15] staging: gasket: apex: place in low power reset until opened

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

The apex device is left out of reset mode at the end of device
probe/initialize processing.  Add a call to enter reset at the end of
the sequence, triggering power gating and other low power features.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/apex_driver.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/staging/gasket/apex_driver.c 
b/drivers/staging/gasket/apex_driver.c
index 55319619b2e66..c747e9ca45186 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -644,6 +644,10 @@ static int apex_pci_probe(struct pci_dev *pci_dev,
goto remove_device;
}
 
+   /* Place device in low power mode until opened */
+   if (allow_power_save)
+   apex_enter_reset(gasket_dev);
+
return 0;
 
 remove_device:
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 13/15] staging: gasket: core: protect against races during unregister

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Keep mutex held across the unregistration operation, until the
driver_desc field of the global table is removed, to prevent a
concurrent accessor from looking up the driver_desc while
gasket_unregister_device() is in the processing of removing it.

Reported-by: Guenter Roeck 
Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index ace92f107ed58..a6462b6d702f1 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -1765,9 +1765,9 @@ void gasket_unregister_device(const struct 
gasket_driver_desc *driver_desc)
break;
}
}
-   mutex_unlock(&g_mutex);
 
if (!internal_desc) {
+   mutex_unlock(&g_mutex);
pr_err("request to unregister unknown desc: %s, %d:%d\n",
   driver_desc->name, driver_desc->major,
   driver_desc->minor);
@@ -1780,7 +1780,6 @@ void gasket_unregister_device(const struct 
gasket_driver_desc *driver_desc)
class_destroy(internal_desc->class);
 
/* Finally, effectively "remove" the driver. */
-   mutex_lock(&g_mutex);
g_descs[desc_idx].driver_desc = NULL;
mutex_unlock(&g_mutex);
 
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 11/15] staging: gasket: core: remove sysfs setup and cleanup callbacks

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Gasket device drivers now call into the gasket framework to initialize
and de-initialize, rather than the other way around.  The calling code
can perform sysfs setup and cleanup actions without callbacks from the
framework.  Remove the sysfs setup and cleanup callbacks.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 10 --
 drivers/staging/gasket/gasket_core.h | 18 --
 2 files changed, 28 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 0d76e18fcde5b..ace92f107ed58 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -1489,18 +1489,9 @@ int gasket_pci_add_device(struct pci_dev *pci_dev,
if (ret)
goto fail4;
 
-   ret = check_and_invoke_callback(gasket_dev,
-   driver_desc->sysfs_setup_cb);
-   if (ret) {
-   dev_err(gasket_dev->dev, "Error in sysfs setup cb: %d\n", ret);
-   goto fail5;
-   }
-
*gasket_devp = gasket_dev;
return 0;
 
-fail5:
-   check_and_invoke_callback(gasket_dev, driver_desc->sysfs_cleanup_cb);
 fail4:
 fail3:
gasket_sysfs_remove_mapping(gasket_dev->dev_info.device);
@@ -1550,7 +1541,6 @@ void gasket_pci_remove_device(struct pci_dev *pci_dev)
 
gasket_cleanup_pci(gasket_dev);
 
-   check_and_invoke_callback(gasket_dev, driver_desc->sysfs_cleanup_cb);
gasket_sysfs_remove_mapping(gasket_dev->dev_info.device);
device_destroy(internal_desc->class, gasket_dev->dev_info.devt);
gasket_free_dev(gasket_dev);
diff --git a/drivers/staging/gasket/gasket_core.h 
b/drivers/staging/gasket/gasket_core.h
index 0ef0a2640f0fe..275fd0b345b6e 100644
--- a/drivers/staging/gasket/gasket_core.h
+++ b/drivers/staging/gasket/gasket_core.h
@@ -444,24 +444,6 @@ struct gasket_driver_desc {
 */
int (*device_close_cb)(struct gasket_dev *dev);
 
-   /*
-* sysfs_setup_cb: Callback to set up driver-specific sysfs nodes.
-* @dev: Pointer to the gasket_dev struct for this device.
-*
-* Called during the add gasket device call.
-*
-*/
-   int (*sysfs_setup_cb)(struct gasket_dev *dev);
-
-   /*
-* sysfs_cleanup_cb: Callback to clean up driver-specific sysfs nodes.
-* @dev: Pointer to the gasket_dev struct for this device.
-*
-* Called during device disable processing.
-*
-*/
-   int (*sysfs_cleanup_cb)(struct gasket_dev *dev);
-
/*
 * get_mappable_regions_cb: Get descriptors of mappable device memory.
 * @gasket_dev: Pointer to the struct gasket_dev for this device.
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 12/15] staging: gasket: apex: move sysfs setup code to probe function

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

The gasket framework no longer provides callbacks to the device driver
for sysfs setup and teardown.  Move the sysfs setup code to the device
probe function.  Apex does not implement sysfs cleanup code.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/apex_driver.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/gasket/apex_driver.c 
b/drivers/staging/gasket/apex_driver.c
index 69ca7fb10eddc..55319619b2e66 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -568,12 +568,6 @@ static struct gasket_sysfs_attribute apex_sysfs_attrs[] = {
GASKET_END_OF_ATTR_ARRAY
 };
 
-static int apex_sysfs_setup_cb(struct gasket_dev *gasket_dev)
-{
-   return gasket_sysfs_create_entries(gasket_dev->dev_info.device,
-  apex_sysfs_attrs);
-}
-
 /* On device open, perform a core reinit reset. */
 static int apex_device_open_cb(struct gasket_dev *gasket_dev)
 {
@@ -639,6 +633,11 @@ static int apex_pci_probe(struct pci_dev *pci_dev,
goto remove_device;
}
 
+   ret = gasket_sysfs_create_entries(gasket_dev->dev_info.device,
+ apex_sysfs_attrs);
+   if (ret)
+   dev_err(&pci_dev->dev, "error creating device sysfs entries\n");
+
ret = gasket_enable_device(gasket_dev);
if (ret) {
dev_err(&pci_dev->dev, "error enabling gasket device\n");
@@ -695,9 +694,6 @@ static struct gasket_driver_desc apex_desc = {
.interrupts = apex_interrupts,
.interrupt_pack_width = 7,
 
-   .sysfs_setup_cb = apex_sysfs_setup_cb,
-   .sysfs_cleanup_cb = NULL,
-
.device_open_cb = apex_device_open_cb,
.device_close_cb = apex_device_cleanup,
 
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 08/15] staging: gasket: apex: enable/disable gasket device from apex

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Gasket framework now places device drivers in charge of calling APIs to
enable and disable gasket device operations.  Make the appropriate calls
from the apex driver.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/apex_driver.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/staging/gasket/apex_driver.c 
b/drivers/staging/gasket/apex_driver.c
index e2bc06b5244f7..1d8a100c52885 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -643,11 +643,23 @@ static int apex_pci_probe(struct pci_dev *pci_dev,
return ret;
}
 
+   pci_set_drvdata(pci_dev, gasket_dev);
+   ret = gasket_enable_device(gasket_dev);
+   if (ret) {
+   dev_err(&pci_dev->dev, "error enabling gasket device\n");
+   gasket_pci_remove_device(pci_dev);
+   pci_disable_device(pci_dev);
+   return ret;
+   }
+
return 0;
 }
 
 static void apex_pci_remove(struct pci_dev *pci_dev)
 {
+   struct gasket_dev *gasket_dev = pci_get_drvdata(pci_dev);
+
+   gasket_disable_device(gasket_dev);
gasket_pci_remove_device(pci_dev);
pci_disable_device(pci_dev);
 }
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 10/15] staging: gasket: apex: fold device add/remove logic inline

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

Gasket device drivers are now in charge of the device add and remove
sequences; the framework callbacks for these are deleted.  Move the
apex device add callback code to the probe function.  Apex did not
implement the removal callback.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/apex_driver.c | 69 +---
 1 file changed, 32 insertions(+), 37 deletions(-)

diff --git a/drivers/staging/gasket/apex_driver.c 
b/drivers/staging/gasket/apex_driver.c
index 1d8a100c52885..69ca7fb10eddc 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -448,37 +448,6 @@ static int apex_reset(struct gasket_dev *gasket_dev)
return ret;
 }
 
-static int apex_add_dev_cb(struct gasket_dev *gasket_dev)
-{
-   ulong page_table_ready, msix_table_ready;
-   int retries = 0;
-
-   apex_reset(gasket_dev);
-
-   while (retries < APEX_RESET_RETRY) {
-   page_table_ready =
-   gasket_dev_read_64(gasket_dev, APEX_BAR_INDEX,
-  
APEX_BAR2_REG_KERNEL_HIB_PAGE_TABLE_INIT);
-   msix_table_ready =
-   gasket_dev_read_64(gasket_dev, APEX_BAR_INDEX,
-  
APEX_BAR2_REG_KERNEL_HIB_MSIX_TABLE_INIT);
-   if (page_table_ready && msix_table_ready)
-   break;
-   schedule_timeout(msecs_to_jiffies(APEX_RESET_DELAY));
-   retries++;
-   }
-
-   if (retries == APEX_RESET_RETRY) {
-   if (!page_table_ready)
-   dev_err(gasket_dev->dev, "Page table init timed out\n");
-   if (!msix_table_ready)
-   dev_err(gasket_dev->dev, "MSI-X table init timed 
out\n");
-   return -ETIMEDOUT;
-   }
-
-   return 0;
-}
-
 /*
  * Check permissions for Apex ioctls.
  * Returns true if the current user may execute this ioctl, and false 
otherwise.
@@ -626,6 +595,8 @@ static int apex_pci_probe(struct pci_dev *pci_dev,
  const struct pci_device_id *id)
 {
int ret;
+   ulong page_table_ready, msix_table_ready;
+   int retries = 0;
struct gasket_dev *gasket_dev;
 
ret = pci_enable_device(pci_dev);
@@ -644,15 +615,42 @@ static int apex_pci_probe(struct pci_dev *pci_dev,
}
 
pci_set_drvdata(pci_dev, gasket_dev);
+   apex_reset(gasket_dev);
+
+   while (retries < APEX_RESET_RETRY) {
+   page_table_ready =
+   gasket_dev_read_64(gasket_dev, APEX_BAR_INDEX,
+  
APEX_BAR2_REG_KERNEL_HIB_PAGE_TABLE_INIT);
+   msix_table_ready =
+   gasket_dev_read_64(gasket_dev, APEX_BAR_INDEX,
+  
APEX_BAR2_REG_KERNEL_HIB_MSIX_TABLE_INIT);
+   if (page_table_ready && msix_table_ready)
+   break;
+   schedule_timeout(msecs_to_jiffies(APEX_RESET_DELAY));
+   retries++;
+   }
+
+   if (retries == APEX_RESET_RETRY) {
+   if (!page_table_ready)
+   dev_err(gasket_dev->dev, "Page table init timed out\n");
+   if (!msix_table_ready)
+   dev_err(gasket_dev->dev, "MSI-X table init timed 
out\n");
+   ret = -ETIMEDOUT;
+   goto remove_device;
+   }
+
ret = gasket_enable_device(gasket_dev);
if (ret) {
dev_err(&pci_dev->dev, "error enabling gasket device\n");
-   gasket_pci_remove_device(pci_dev);
-   pci_disable_device(pci_dev);
-   return ret;
+   goto remove_device;
}
 
return 0;
+
+remove_device:
+   gasket_pci_remove_device(pci_dev);
+   pci_disable_device(pci_dev);
+   return ret;
 }
 
 static void apex_pci_remove(struct pci_dev *pci_dev)
@@ -697,9 +695,6 @@ static struct gasket_driver_desc apex_desc = {
.interrupts = apex_interrupts,
.interrupt_pack_width = 7,
 
-   .add_dev_cb = apex_add_dev_cb,
-   .remove_dev_cb = NULL,
-
.sysfs_setup_cb = apex_sysfs_setup_cb,
.sysfs_cleanup_cb = NULL,
 
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 06/15] staging: gasket: apex: remove device enable and disable callbacks

2018-08-05 Thread Todd Poynor
From: Todd Poynor 

These are not implemented for apex, and are now being removed from the
gasket framework.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/apex_driver.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/staging/gasket/apex_driver.c 
b/drivers/staging/gasket/apex_driver.c
index b47661442009d..e2bc06b5244f7 100644
--- a/drivers/staging/gasket/apex_driver.c
+++ b/drivers/staging/gasket/apex_driver.c
@@ -688,9 +688,6 @@ static struct gasket_driver_desc apex_desc = {
.add_dev_cb = apex_add_dev_cb,
.remove_dev_cb = NULL,
 
-   .enable_dev_cb = NULL,
-   .disable_dev_cb = NULL,
-
.sysfs_setup_cb = apex_sysfs_setup_cb,
.sysfs_cleanup_cb = NULL,
 
-- 
2.18.0.597.ga71716f1ad-goog



[RFC] vruntime updated incorrectly when rt_mutex boots prio?

2018-08-07 Thread Todd Kjos
This issue was discovered on a 4.9-based android device, but the
relevant mainline code appears to be the same. The symptom is that
over time the some workloads become sluggish resulting in missed
frames or sluggishness. It appears to be the same issue described in
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/567836.html.

Here is the scenario: A task is deactivated while still in the fair
class. The task is then boosted to RT, so rt_mutex_setprio() is
called. This changes the task to RT and calls check_class_changed(),
which eventually calls detach_task_cfs_rq(), which is where
vruntime_normalized() sees that the task's state is TASK_WAKING, which
results in skipping the subtraction of the rq's min_vruntime from the
task's vruntime. Later, when the prio is deboosted and the task is
moved back to the fair class, the fair rq's min_vruntime is added to
the task's vruntime, resulting in vruntime inflation.

When investigating the problem, it was found that the change below
fixes the problem by forcing vruntime_normalized() to return false if
the sched_class is not CFS (though we're concerned that it might
introduce other issues):

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 91f7b3322a15..267056f2e2ca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11125,7 +11125,7 @@ static inline bool vruntime_normalized(struct
task_struct *p)
 * - A task which has been woken up by try_to_wake_up() and
 *   waiting for actually being woken up by sched_ttwu_pending().
 */
-   if (!se->sum_exec_runtime || p->state == TASK_WAKING)
+   if (!se->sum_exec_runtime || (p->state == TASK_WAKING &&
p->sched_class == &fair_sched_class))
return true;

return false;

Do folks agree that this is incorrect behavior? Does this fix look
appropriate and safe? Other ideas?

-Todd


[PATCH] acpi: bus: Fixed a pointer coding style issue

2018-08-07 Thread Tom Todd
Fixed a coding style issue.

Signed-off-by: Tom Todd 
---
 drivers/acpi/bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 84b4a62018eb..73e43d81ec63 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -962,7 +962,7 @@ static int acpi_device_probe(struct device *dev)
return 0;
 }
 
-static int acpi_device_remove(struct device * dev)
+static int acpi_device_remove(struct device *dev)
 {
struct acpi_device *acpi_dev = to_acpi_device(dev);
struct acpi_driver *acpi_drv = acpi_dev->driver;
-- 
2.18.0



[PATCH v2] drivers: base: initcall_debug logs for driver probe times

2018-06-20 Thread Todd Poynor
From: Todd Poynor 

Add initcall_debug logs for each driver device probe call, for example:

   probe of a380.ramoops returned 1 after 3007 usecs

This replaces the previous code added to report times for deferred
probes.  It also reports OF platform bus device creates that were
formerly lumped together in a single entry for function
of_platform_default_populate_init, as well as helping to annotate other
initcalls that involve device probing.

Remove restriction on printing probe times only during initcalls, since
initcall_debug now continues to show driver timing info past the boot
phase.

Signed-off-by: Todd Poynor 
---
 drivers/base/dd.c | 50 +--
 1 file changed, 22 insertions(+), 28 deletions(-)

Changes from v1:
* unsigned long long -> s64
* use ktime_to_us() instead of ktime_to_ns with shift
* print probe return value
* drop code that stopped output on end of initcall processing

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 1435d7281c66..6ea9c5cece71 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -53,7 +53,6 @@ static DEFINE_MUTEX(deferred_probe_mutex);
 static LIST_HEAD(deferred_probe_pending_list);
 static LIST_HEAD(deferred_probe_active_list);
 static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
-static bool initcalls_done;
 
 /*
  * In some cases, like suspend to RAM or hibernation, It might be reasonable
@@ -62,26 +61,6 @@ static bool initcalls_done;
  */
 static bool defer_all_probes;
 
-/*
- * For initcall_debug, show the deferred probes executed in late_initcall
- * processing.
- */
-static void deferred_probe_debug(struct device *dev)
-{
-   ktime_t calltime, delta, rettime;
-   unsigned long long duration;
-
-   printk(KERN_DEBUG "deferred probe %s @ %i\n", dev_name(dev),
-  task_pid_nr(current));
-   calltime = ktime_get();
-   bus_probe_device(dev);
-   rettime = ktime_get();
-   delta = ktime_sub(rettime, calltime);
-   duration = (unsigned long long) ktime_to_ns(delta) >> 10;
-   printk(KERN_DEBUG "deferred probe %s returned after %lld usecs\n",
-  dev_name(dev), duration);
-}
-
 /*
  * deferred_probe_work_func() - Retry probing devices in the active list.
  */
@@ -125,11 +104,7 @@ static void deferred_probe_work_func(struct work_struct 
*work)
device_pm_move_to_tail(dev);
 
dev_dbg(dev, "Retrying from deferred list\n");
-   if (initcall_debug && !initcalls_done)
-   deferred_probe_debug(dev);
-   else
-   bus_probe_device(dev);
-
+   bus_probe_device(dev);
mutex_lock(&deferred_probe_mutex);
 
put_device(dev);
@@ -237,7 +212,6 @@ static int deferred_probe_initcall(void)
driver_deferred_probe_trigger();
/* Sort as many dependencies as possible before exiting initcalls */
flush_work(&deferred_probe_work);
-   initcalls_done = true;
return 0;
 }
 late_initcall(deferred_probe_initcall);
@@ -527,6 +501,23 @@ static int really_probe(struct device *dev, struct 
device_driver *drv)
return ret;
 }
 
+/*
+ * For initcall_debug, show the driver probe time.
+ */
+static int really_probe_debug(struct device *dev, struct device_driver *drv)
+{
+   ktime_t calltime, delta, rettime;
+   int ret;
+
+   calltime = ktime_get();
+   ret = really_probe(dev, drv);
+   rettime = ktime_get();
+   delta = ktime_sub(rettime, calltime);
+   printk(KERN_DEBUG "probe of %s returned %d after %lld usecs\n",
+  dev_name(dev), ret, (s64) ktime_to_us(delta));
+   return ret;
+}
+
 /**
  * driver_probe_done
  * Determine if the probe sequence is finished or not.
@@ -585,7 +576,10 @@ int driver_probe_device(struct device_driver *drv, struct 
device *dev)
pm_runtime_get_sync(dev->parent);
 
pm_runtime_barrier(dev);
-   ret = really_probe(dev, drv);
+   if (initcall_debug)
+   ret = really_probe_debug(dev, drv);
+   else
+   ret = really_probe(dev, drv);
pm_request_idle(dev);
 
if (dev->parent)
-- 
2.18.0.rc1.244.gcf134e6275-goog



Re: [GIT PULL] Staging/IIO driver patches for 4.19-rc1

2018-08-28 Thread Todd Poynor
On Tue, Aug 28, 2018 at 3:38 AM Ahmed S. Darwish  wrote:
>[...]
> On Sat, 18 Aug 2018 17:57:24 +0200, Greg KH wrote:
> [...]
> > addition of some new IIO drivers.  Also added was a "gasket" driver from
> > Google that needs loads of work and the erofs filesystem.
> >
>
> Why are we adding __a whole new in-kernel framework__ for
> developing basic user-space drivers?
>
> We already have a frameowrk for that, and it's UIO. [1] The UIO
> code is a very stable and simple subsystem; it's also heavily used
> in the embedded industry..

Yeah, it's clear all the userspace I/O framework code needs to move to
UIO with some patches to add agreed-upon new pieces.  I think everyone
agrees this code shouldn't move out of staging until that happens.  A
whole lot of work is needed on these drivers, and UIO conversion is on
my list to address soon.

>
> I've looked at the gasket documentation [2], and the first user
> of this new in-kernel API [3], and this is almost replicating UIO
> it's not funny. [4] True, the gasket APIs adds some extra new
> conveniences (PCI BAR re-mapping, MSI, ..), but there's no
> technical reason this cannot be added to the UIO code instead.
>
> More-over, the exposed user-space API is just some ioctls. So if
> google hase some shipped user-space code that is using this,
> hopefully the driver can still be re-implemented through UIO
> without changing these bits..
>
> Thanks,
>
> [1] https://www.kernel.org/doc/html/v4.18/driver-api/uio-howto.html
> [2] drivers/staging/gasket/gasket_core.h :: struct gasket_driver_desc
> [3] drivers/staging/gasket/apex_driver.c
> [4] include/linux/uio_driver.h
>
> --
> Darwi
> http://darwish.chasingpointers.com

Thanks -- Todd


[PATCH] binder: use standard functions to allocate fds

2018-08-28 Thread Todd Kjos
Binder uses internal fs interfaces to allocate and install fds:

__alloc_fd
__fd_install
__close_fd
get_files_struct
put_files_struct

These were used to support the passing of fds between processes
as part of a transaction. The actual allocation and installation
of the fds in the target process was handled by the sending
process so the standard functions, alloc_fd() and fd_install()
which assume task==current couldn't be used.

This patch refactors this mechanism so that the fds are
allocated and installed by the target process allowing the
standard functions to be used.

The sender now creates a list of fd fixups that contains the
struct *file and the address to fixup with the new fd once
it is allocated. This list is processed by the target process
when the transaction is dequeued.

A new error case is introduced by this change. If an async
transaction with file descriptors cannot allocate new
fds in the target (probably due to out of file descriptors),
the transaction is discarded with a log message. In the old
implementation this would have been detected in the sender
context and failed prior to sending.

Signed-off-by: Todd Kjos 
---
 drivers/android/Kconfig|   2 +-
 drivers/android/binder.c   | 387 -
 drivers/android/binder_trace.h |  36 ++-
 3 files changed, 260 insertions(+), 165 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 432e9ad770703..51e8250d113fa 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -10,7 +10,7 @@ if ANDROID
 
 config ANDROID_BINDER_IPC
bool "Android Binder IPC Driver"
-   depends on MMU
+   depends on MMU && !CPU_CACHE_VIVT
default n
---help---
  Binder is used in Android for both communication between processes,
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d58763b6b0090..50856319bc7da 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -457,9 +458,8 @@ struct binder_ref {
 };
 
 enum binder_deferred_state {
-   BINDER_DEFERRED_PUT_FILES= 0x01,
-   BINDER_DEFERRED_FLUSH= 0x02,
-   BINDER_DEFERRED_RELEASE  = 0x04,
+   BINDER_DEFERRED_FLUSH= 0x01,
+   BINDER_DEFERRED_RELEASE  = 0x02,
 };
 
 /**
@@ -480,9 +480,6 @@ enum binder_deferred_state {
  *(invariant after initialized)
  * @tsk   task_struct for group_leader of process
  *(invariant after initialized)
- * @files files_struct for process
- *(protected by @files_lock)
- * @files_lockmutex to protect @files
  * @deferred_work_node:   element for binder_deferred_list
  *(protected by binder_deferred_lock)
  * @deferred_work:bitmap of deferred work to perform
@@ -527,8 +524,6 @@ struct binder_proc {
struct list_head waiting_threads;
int pid;
struct task_struct *tsk;
-   struct files_struct *files;
-   struct mutex files_lock;
struct hlist_node deferred_work_node;
int deferred_work;
bool is_dead;
@@ -611,6 +606,23 @@ struct binder_thread {
bool is_dead;
 };
 
+/**
+ * struct binder_txn_fd_fixup - transaction fd fixup list element
+ * @fixup_entry:  list entry
+ * @file: struct file to be associated with new fd
+ * @offset:   offset in buffer data to this fixup
+ *
+ * List element for fd fixups in a transaction. Since file
+ * descriptors need to be allocated in the context of the
+ * target process, we pass each fd to be processed in this
+ * struct.
+ */
+struct binder_txn_fd_fixup {
+   struct list_head fixup_entry;
+   struct file *file;
+   size_t offset;
+};
+
 struct binder_transaction {
int debug_id;
struct binder_work work;
@@ -628,6 +640,7 @@ struct binder_transaction {
longpriority;
longsaved_priority;
kuid_t  sender_euid;
+   struct list_head fd_fixups;
/**
 * @lock:  protects @from, @to_proc, and @to_thread
 *
@@ -920,66 +933,6 @@ static void binder_free_thread(struct binder_thread 
*thread);
 static void binder_free_proc(struct binder_proc *proc);
 static void binder_inc_node_tmpref_ilocked(struct binder_node *node);
 
-static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
-{
-   unsigned long rlim_cur;
-   unsigned long irqs;
-   int ret;
-
-   mutex_lock(&proc->files_lock);
-   if (proc->files == NULL) {
-   ret = -ESRCH;
-   goto err;
-   }
-   if (!lock_task_sighand(proc->tsk, &irqs)) {
-   ret = -EMFILE;
-   goto err;
-   }
-   rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
-   unlock_task_sighand(proc->tsk, &am

[PATCH v2] binder: use standard functions to allocate fds

2018-08-28 Thread Todd Kjos
Binder uses internal fs interfaces to allocate and install fds:

__alloc_fd
__fd_install
__close_fd
get_files_struct
put_files_struct

These were used to support the passing of fds between processes
as part of a transaction. The actual allocation and installation
of the fds in the target process was handled by the sending
process so the standard functions, alloc_fd() and fd_install()
which assume task==current couldn't be used.

This patch refactors this mechanism so that the fds are
allocated and installed by the target process allowing the
standard functions to be used.

The sender now creates a list of fd fixups that contains the
struct *file and the address to fixup with the new fd once
it is allocated. This list is processed by the target process
when the transaction is dequeued.

A new error case is introduced by this change. If an async
transaction with file descriptors cannot allocate new
fds in the target (probably due to out of file descriptors),
the transaction is discarded with a log message. In the old
implementation this would have been detected in the sender
context and failed prior to sending.

Signed-off-by: Todd Kjos 
---
v2: use "%zu" printk format for size_t

 drivers/android/Kconfig|   2 +-
 drivers/android/binder.c   | 387 -
 drivers/android/binder_trace.h |  36 ++-
 3 files changed, 260 insertions(+), 165 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 432e9ad770703..51e8250d113fa 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -10,7 +10,7 @@ if ANDROID
 
 config ANDROID_BINDER_IPC
bool "Android Binder IPC Driver"
-   depends on MMU
+   depends on MMU && !CPU_CACHE_VIVT
default n
---help---
  Binder is used in Android for both communication between processes,
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d58763b6b0090..50856319bc7da 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -457,9 +458,8 @@ struct binder_ref {
 };
 
 enum binder_deferred_state {
-   BINDER_DEFERRED_PUT_FILES= 0x01,
-   BINDER_DEFERRED_FLUSH= 0x02,
-   BINDER_DEFERRED_RELEASE  = 0x04,
+   BINDER_DEFERRED_FLUSH= 0x01,
+   BINDER_DEFERRED_RELEASE  = 0x02,
 };
 
 /**
@@ -480,9 +480,6 @@ enum binder_deferred_state {
  *(invariant after initialized)
  * @tsk   task_struct for group_leader of process
  *(invariant after initialized)
- * @files files_struct for process
- *(protected by @files_lock)
- * @files_lockmutex to protect @files
  * @deferred_work_node:   element for binder_deferred_list
  *(protected by binder_deferred_lock)
  * @deferred_work:bitmap of deferred work to perform
@@ -527,8 +524,6 @@ struct binder_proc {
struct list_head waiting_threads;
int pid;
struct task_struct *tsk;
-   struct files_struct *files;
-   struct mutex files_lock;
struct hlist_node deferred_work_node;
int deferred_work;
bool is_dead;
@@ -611,6 +606,23 @@ struct binder_thread {
bool is_dead;
 };
 
+/**
+ * struct binder_txn_fd_fixup - transaction fd fixup list element
+ * @fixup_entry:  list entry
+ * @file: struct file to be associated with new fd
+ * @offset:   offset in buffer data to this fixup
+ *
+ * List element for fd fixups in a transaction. Since file
+ * descriptors need to be allocated in the context of the
+ * target process, we pass each fd to be processed in this
+ * struct.
+ */
+struct binder_txn_fd_fixup {
+   struct list_head fixup_entry;
+   struct file *file;
+   size_t offset;
+};
+
 struct binder_transaction {
int debug_id;
struct binder_work work;
@@ -628,6 +640,7 @@ struct binder_transaction {
longpriority;
longsaved_priority;
kuid_t  sender_euid;
+   struct list_head fd_fixups;
/**
 * @lock:  protects @from, @to_proc, and @to_thread
 *
@@ -920,66 +933,6 @@ static void binder_free_thread(struct binder_thread 
*thread);
 static void binder_free_proc(struct binder_proc *proc);
 static void binder_inc_node_tmpref_ilocked(struct binder_node *node);
 
-static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
-{
-   unsigned long rlim_cur;
-   unsigned long irqs;
-   int ret;
-
-   mutex_lock(&proc->files_lock);
-   if (proc->files == NULL) {
-   ret = -ESRCH;
-   goto err;
-   }
-   if (!lock_task_sighand(proc->tsk, &irqs)) {
-   ret = -EMFILE;
-   goto err;
-   }
-   rlim_cur = task_rlimit(proc->tsk, RLIMIT_

[PATCH] binder: use standard functions to allocate fds

2018-08-28 Thread Todd Kjos
Binder uses internal fs interfaces to allocate and install fds:

__alloc_fd
__fd_install
__close_fd
get_files_struct
put_files_struct

These were used to support the passing of fds between processes
as part of a transaction. The actual allocation and installation
of the fds in the target process was handled by the sending
process so the standard functions, alloc_fd() and fd_install()
which assume task==current couldn't be used.

This patch refactors this mechanism so that the fds are
allocated and installed by the target process allowing the
standard functions to be used.

The sender now creates a list of fd fixups that contains the
struct *file and the address to fixup with the new fd once
it is allocated. This list is processed by the target process
when the transaction is dequeued.

A new error case is introduced by this change. If an async
transaction with file descriptors cannot allocate new
fds in the target (probably due to out of file descriptors),
the transaction is discarded with a log message. In the old
implementation this would have been detected in the sender
context and failed prior to sending.

Signed-off-by: Todd Kjos 
---
v2: use "%zu" printk format for size_t

 drivers/android/Kconfig|   2 +-
 drivers/android/binder.c   | 387 -
 drivers/android/binder_trace.h |  36 ++-
 3 files changed, 260 insertions(+), 165 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 432e9ad770703..51e8250d113fa 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -10,7 +10,7 @@ if ANDROID
 
 config ANDROID_BINDER_IPC
bool "Android Binder IPC Driver"
-   depends on MMU
+   depends on MMU && !CPU_CACHE_VIVT
default n
---help---
  Binder is used in Android for both communication between processes,
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d58763b6b0090..50856319bc7da 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -457,9 +458,8 @@ struct binder_ref {
 };
 
 enum binder_deferred_state {
-   BINDER_DEFERRED_PUT_FILES= 0x01,
-   BINDER_DEFERRED_FLUSH= 0x02,
-   BINDER_DEFERRED_RELEASE  = 0x04,
+   BINDER_DEFERRED_FLUSH= 0x01,
+   BINDER_DEFERRED_RELEASE  = 0x02,
 };
 
 /**
@@ -480,9 +480,6 @@ enum binder_deferred_state {
  *(invariant after initialized)
  * @tsk   task_struct for group_leader of process
  *(invariant after initialized)
- * @files files_struct for process
- *(protected by @files_lock)
- * @files_lockmutex to protect @files
  * @deferred_work_node:   element for binder_deferred_list
  *(protected by binder_deferred_lock)
  * @deferred_work:bitmap of deferred work to perform
@@ -527,8 +524,6 @@ struct binder_proc {
struct list_head waiting_threads;
int pid;
struct task_struct *tsk;
-   struct files_struct *files;
-   struct mutex files_lock;
struct hlist_node deferred_work_node;
int deferred_work;
bool is_dead;
@@ -611,6 +606,23 @@ struct binder_thread {
bool is_dead;
 };
 
+/**
+ * struct binder_txn_fd_fixup - transaction fd fixup list element
+ * @fixup_entry:  list entry
+ * @file: struct file to be associated with new fd
+ * @offset:   offset in buffer data to this fixup
+ *
+ * List element for fd fixups in a transaction. Since file
+ * descriptors need to be allocated in the context of the
+ * target process, we pass each fd to be processed in this
+ * struct.
+ */
+struct binder_txn_fd_fixup {
+   struct list_head fixup_entry;
+   struct file *file;
+   size_t offset;
+};
+
 struct binder_transaction {
int debug_id;
struct binder_work work;
@@ -628,6 +640,7 @@ struct binder_transaction {
longpriority;
longsaved_priority;
kuid_t  sender_euid;
+   struct list_head fd_fixups;
/**
 * @lock:  protects @from, @to_proc, and @to_thread
 *
@@ -920,66 +933,6 @@ static void binder_free_thread(struct binder_thread 
*thread);
 static void binder_free_proc(struct binder_proc *proc);
 static void binder_inc_node_tmpref_ilocked(struct binder_node *node);
 
-static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
-{
-   unsigned long rlim_cur;
-   unsigned long irqs;
-   int ret;
-
-   mutex_lock(&proc->files_lock);
-   if (proc->files == NULL) {
-   ret = -ESRCH;
-   goto err;
-   }
-   if (!lock_task_sighand(proc->tsk, &irqs)) {
-   ret = -EMFILE;
-   goto err;
-   }
-   rlim_cur = task_rlimit(proc->tsk, RLIMIT_

Re: [PATCH v2] binder: use standard functions to allocate fds

2018-08-28 Thread Todd Kjos
Sorry, forgot to bump the version. Ignore this one.
On Tue, Aug 28, 2018 at 1:43 PM Todd Kjos  wrote:
>
> Binder uses internal fs interfaces to allocate and install fds:
>
> __alloc_fd
> __fd_install
> __close_fd
> get_files_struct
> put_files_struct
>
> These were used to support the passing of fds between processes
> as part of a transaction. The actual allocation and installation
> of the fds in the target process was handled by the sending
> process so the standard functions, alloc_fd() and fd_install()
> which assume task==current couldn't be used.
>
> This patch refactors this mechanism so that the fds are
> allocated and installed by the target process allowing the
> standard functions to be used.
>
> The sender now creates a list of fd fixups that contains the
> struct *file and the address to fixup with the new fd once
> it is allocated. This list is processed by the target process
> when the transaction is dequeued.
>
> A new error case is introduced by this change. If an async
> transaction with file descriptors cannot allocate new
> fds in the target (probably due to out of file descriptors),
> the transaction is discarded with a log message. In the old
> implementation this would have been detected in the sender
> context and failed prior to sending.
>
> Signed-off-by: Todd Kjos 
> ---
> v2: use "%zu" printk format for size_t
>
>  drivers/android/Kconfig|   2 +-
>  drivers/android/binder.c   | 387 -
>  drivers/android/binder_trace.h |  36 ++-
>  3 files changed, 260 insertions(+), 165 deletions(-)
>
> diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
> index 432e9ad770703..51e8250d113fa 100644
> --- a/drivers/android/Kconfig
> +++ b/drivers/android/Kconfig
> @@ -10,7 +10,7 @@ if ANDROID
>
>  config ANDROID_BINDER_IPC
> bool "Android Binder IPC Driver"
> -   depends on MMU
> +   depends on MMU && !CPU_CACHE_VIVT
> default n
> ---help---
>   Binder is used in Android for both communication between processes,
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index d58763b6b0090..50856319bc7da 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -71,6 +71,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>
> @@ -457,9 +458,8 @@ struct binder_ref {
>  };
>
>  enum binder_deferred_state {
> -   BINDER_DEFERRED_PUT_FILES= 0x01,
> -   BINDER_DEFERRED_FLUSH= 0x02,
> -   BINDER_DEFERRED_RELEASE  = 0x04,
> +   BINDER_DEFERRED_FLUSH= 0x01,
> +   BINDER_DEFERRED_RELEASE  = 0x02,
>  };
>
>  /**
> @@ -480,9 +480,6 @@ enum binder_deferred_state {
>   *(invariant after initialized)
>   * @tsk   task_struct for group_leader of process
>   *(invariant after initialized)
> - * @files files_struct for process
> - *(protected by @files_lock)
> - * @files_lockmutex to protect @files
>   * @deferred_work_node:   element for binder_deferred_list
>   *(protected by binder_deferred_lock)
>   * @deferred_work:bitmap of deferred work to perform
> @@ -527,8 +524,6 @@ struct binder_proc {
> struct list_head waiting_threads;
> int pid;
> struct task_struct *tsk;
> -   struct files_struct *files;
> -   struct mutex files_lock;
> struct hlist_node deferred_work_node;
> int deferred_work;
> bool is_dead;
> @@ -611,6 +606,23 @@ struct binder_thread {
> bool is_dead;
>  };
>
> +/**
> + * struct binder_txn_fd_fixup - transaction fd fixup list element
> + * @fixup_entry:  list entry
> + * @file: struct file to be associated with new fd
> + * @offset:   offset in buffer data to this fixup
> + *
> + * List element for fd fixups in a transaction. Since file
> + * descriptors need to be allocated in the context of the
> + * target process, we pass each fd to be processed in this
> + * struct.
> + */
> +struct binder_txn_fd_fixup {
> +   struct list_head fixup_entry;
> +   struct file *file;
> +   size_t offset;
> +};
> +
>  struct binder_transaction {
> int debug_id;
> struct binder_work work;
> @@ -628,6 +640,7 @@ struct binder_transaction {
> longpriority;
> longsaved_priority;
> kuid_t  sender_euid;
> +   struct list_head fd_fixups;
> /**
>  * @lock:  protects @from, @to_proc

[PATCH v2] binder: use standard functions to allocate fds

2018-08-28 Thread Todd Kjos
Binder uses internal fs interfaces to allocate and install fds:

__alloc_fd
__fd_install
__close_fd
get_files_struct
put_files_struct

These were used to support the passing of fds between processes
as part of a transaction. The actual allocation and installation
of the fds in the target process was handled by the sending
process so the standard functions, alloc_fd() and fd_install()
which assume task==current couldn't be used.

This patch refactors this mechanism so that the fds are
allocated and installed by the target process allowing the
standard functions to be used.

The sender now creates a list of fd fixups that contains the
struct *file and the address to fixup with the new fd once
it is allocated. This list is processed by the target process
when the transaction is dequeued.

A new error case is introduced by this change. If an async
transaction with file descriptors cannot allocate new
fds in the target (probably due to out of file descriptors),
the transaction is discarded with a log message. In the old
implementation this would have been detected in the sender
context and failed prior to sending.

Signed-off-by: Todd Kjos 
---
v2: use "%zu" printk format for size_t

 drivers/android/Kconfig|   2 +-
 drivers/android/binder.c   | 387 -
 drivers/android/binder_trace.h |  36 ++-
 3 files changed, 260 insertions(+), 165 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 432e9ad770703..51e8250d113fa 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -10,7 +10,7 @@ if ANDROID
 
 config ANDROID_BINDER_IPC
bool "Android Binder IPC Driver"
-   depends on MMU
+   depends on MMU && !CPU_CACHE_VIVT
default n
---help---
  Binder is used in Android for both communication between processes,
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index d58763b6b0090..50856319bc7da 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -457,9 +458,8 @@ struct binder_ref {
 };
 
 enum binder_deferred_state {
-   BINDER_DEFERRED_PUT_FILES= 0x01,
-   BINDER_DEFERRED_FLUSH= 0x02,
-   BINDER_DEFERRED_RELEASE  = 0x04,
+   BINDER_DEFERRED_FLUSH= 0x01,
+   BINDER_DEFERRED_RELEASE  = 0x02,
 };
 
 /**
@@ -480,9 +480,6 @@ enum binder_deferred_state {
  *(invariant after initialized)
  * @tsk   task_struct for group_leader of process
  *(invariant after initialized)
- * @files files_struct for process
- *(protected by @files_lock)
- * @files_lockmutex to protect @files
  * @deferred_work_node:   element for binder_deferred_list
  *(protected by binder_deferred_lock)
  * @deferred_work:bitmap of deferred work to perform
@@ -527,8 +524,6 @@ struct binder_proc {
struct list_head waiting_threads;
int pid;
struct task_struct *tsk;
-   struct files_struct *files;
-   struct mutex files_lock;
struct hlist_node deferred_work_node;
int deferred_work;
bool is_dead;
@@ -611,6 +606,23 @@ struct binder_thread {
bool is_dead;
 };
 
+/**
+ * struct binder_txn_fd_fixup - transaction fd fixup list element
+ * @fixup_entry:  list entry
+ * @file: struct file to be associated with new fd
+ * @offset:   offset in buffer data to this fixup
+ *
+ * List element for fd fixups in a transaction. Since file
+ * descriptors need to be allocated in the context of the
+ * target process, we pass each fd to be processed in this
+ * struct.
+ */
+struct binder_txn_fd_fixup {
+   struct list_head fixup_entry;
+   struct file *file;
+   size_t offset;
+};
+
 struct binder_transaction {
int debug_id;
struct binder_work work;
@@ -628,6 +640,7 @@ struct binder_transaction {
longpriority;
longsaved_priority;
kuid_t  sender_euid;
+   struct list_head fd_fixups;
/**
 * @lock:  protects @from, @to_proc, and @to_thread
 *
@@ -920,66 +933,6 @@ static void binder_free_thread(struct binder_thread 
*thread);
 static void binder_free_proc(struct binder_proc *proc);
 static void binder_inc_node_tmpref_ilocked(struct binder_node *node);
 
-static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
-{
-   unsigned long rlim_cur;
-   unsigned long irqs;
-   int ret;
-
-   mutex_lock(&proc->files_lock);
-   if (proc->files == NULL) {
-   ret = -ESRCH;
-   goto err;
-   }
-   if (!lock_task_sighand(proc->tsk, &irqs)) {
-   ret = -EMFILE;
-   goto err;
-   }
-   rlim_cur = task_rlimit(proc->tsk, RLIMIT_

Re: [PATCH] binder: use standard functions to allocate fds

2018-08-30 Thread Todd Kjos
On Wed, Aug 29, 2018 at 12:00 AM Christoph Hellwig  wrote:
>
> >  config ANDROID_BINDER_IPC
> >   bool "Android Binder IPC Driver"
> > - depends on MMU
> > + depends on MMU && !CPU_CACHE_VIVT
>
> Thats is a purely arm specific symbol which should not be
> used in common code.  Nevermind that there generally should
> be no good reason for it.

It is true that the current design (10+ years old) has this flaw. VIVT
cache support was the rationale for using the non-standard functions
to allocate file descriptors from the sender context. There are no
known cases of recent android releases running on a device with
a VIVT cache architecture, but I did want to call this out.

We're working on refactoring to eliminate this issue.

>
> > + fixup->offset = (uintptr_t)fdp - (uintptr_t)t->buffer->data;
>
> This looks completely broken.  Why would you care at what exact
> place the fd is placed?  Oh, because you share an array with fds
> with userspace, which is a hell of a bad idea, and then maninpulate
> that buffer mapped to userspace from kernel threads.

Well, android apps rely on this behavior (and have for 10+ years) so
there is no way to avoid the need to manipulate the buffer from
the kernel. We are working to refactor the driver to do this
using standard mechanisms instead of relying on low-level internal
functions.

>
> I think we just need to rm -rf drivers/android/binder*.c and be done
> with it, as this piece of crap should never have been merged to start
> with.


[PATCH 04/16] staging: gasket: core: remove kobj_name param from gasket_alloc_dev

2018-08-09 Thread Todd Poynor
From: Todd Poynor 

gasket_alloc_dev can retrieve the device name from the parent parameter,
a separate parameter isn't needed for this.  Rename the variable to
better reflect its meaning, as the name of the parent device for which a
gasket device is being allocated.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 3fb805204d700..5f54b3615f67c 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -189,26 +189,26 @@ static int gasket_find_dev_slot(struct 
gasket_internal_desc *internal_desc,
  * Returns 0 if successful, a negative error code otherwise.
  */
 static int gasket_alloc_dev(struct gasket_internal_desc *internal_desc,
-   struct device *parent, struct gasket_dev **pdev,
-   const char *kobj_name)
+   struct device *parent, struct gasket_dev **pdev)
 {
int dev_idx;
const struct gasket_driver_desc *driver_desc =
internal_desc->driver_desc;
struct gasket_dev *gasket_dev;
struct gasket_cdev_info *dev_info;
+   const char *parent_name = dev_name(parent);
 
-   pr_debug("Allocating a Gasket device %s.\n", kobj_name);
+   pr_debug("Allocating a Gasket device, parent %s.\n", parent_name);
 
*pdev = NULL;
 
-   dev_idx = gasket_find_dev_slot(internal_desc, kobj_name);
+   dev_idx = gasket_find_dev_slot(internal_desc, parent_name);
if (dev_idx < 0)
return dev_idx;
 
gasket_dev = *pdev = kzalloc(sizeof(*gasket_dev), GFP_KERNEL);
if (!gasket_dev) {
-   pr_err("no memory for device %s\n", kobj_name);
+   pr_err("no memory for device, parent %s\n", parent_name);
return -ENOMEM;
}
internal_desc->devs[dev_idx] = gasket_dev;
@@ -217,7 +217,7 @@ static int gasket_alloc_dev(struct gasket_internal_desc 
*internal_desc,
 
gasket_dev->internal_desc = internal_desc;
gasket_dev->dev_idx = dev_idx;
-   snprintf(gasket_dev->kobj_name, GASKET_NAME_MAX, "%s", kobj_name);
+   snprintf(gasket_dev->kobj_name, GASKET_NAME_MAX, "%s", parent_name);
gasket_dev->dev = get_device(parent);
/* gasket_bar_data is uninitialized. */
gasket_dev->num_page_tables = driver_desc->num_page_tables;
@@ -1431,13 +1431,12 @@ int gasket_pci_add_device(struct pci_dev *pci_dev,
  struct gasket_dev **gasket_devp)
 {
int ret;
-   const char *kobj_name = dev_name(&pci_dev->dev);
struct gasket_internal_desc *internal_desc;
struct gasket_dev *gasket_dev;
const struct gasket_driver_desc *driver_desc;
struct device *parent;
 
-   pr_debug("add PCI device %s\n", kobj_name);
+   dev_dbg(&pci_dev->dev, "add PCI gasket device\n");
 
mutex_lock(&g_mutex);
internal_desc = lookup_internal_desc(pci_dev);
@@ -1451,7 +1450,7 @@ int gasket_pci_add_device(struct pci_dev *pci_dev,
driver_desc = internal_desc->driver_desc;
 
parent = &pci_dev->dev;
-   ret = gasket_alloc_dev(internal_desc, parent, &gasket_dev, kobj_name);
+   ret = gasket_alloc_dev(internal_desc, parent, &gasket_dev);
if (ret)
return ret;
gasket_dev->pci_dev = pci_dev;
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 01/16] MAINTAINERS: Switch a maintainer for drivers/staging/gasket

2018-08-09 Thread Todd Poynor
From: Todd Poynor 

Todd Poynor takes over for John Joseph.

Signed-off-by: John Joseph 
Signed-off-by: Todd Poynor 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index af64fe0f0b41f..f3466b5c50482 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5939,7 +5939,7 @@ F:Documentation/gcc-plugins.txt
 
 GASKET DRIVER FRAMEWORK
 M: Rob Springer 
-M: John Joseph 
+M: Todd Poynor 
 M: Ben Chan 
 S: Maintained
 F: drivers/staging/gasket/
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 15/16] staging: gasket: interrupt: simplify interrupt init parameters

2018-08-09 Thread Todd Poynor
From: Todd Poynor 

Pass the gasket driver descriptor to the interrupt init function, rather
than exploding out separate parameters from various fields of that
structure.  This allows us to make more localized changes to the types
of interrupts supported (MSIX vs. wire, etc.) without affecting the
calling sequence, and seems nicer for simplification purposes.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.c  |  8 +--
 drivers/staging/gasket/gasket_interrupt.c | 27 +++
 drivers/staging/gasket/gasket_interrupt.h | 24 +---
 3 files changed, 15 insertions(+), 44 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.c 
b/drivers/staging/gasket/gasket_core.c
index 99f3f11d75ce2..f230bec76ae4e 100644
--- a/drivers/staging/gasket/gasket_core.c
+++ b/drivers/staging/gasket/gasket_core.c
@@ -1357,13 +1357,7 @@ int gasket_enable_device(struct gasket_dev *gasket_dev)
const struct gasket_driver_desc *driver_desc =
gasket_dev->internal_desc->driver_desc;
 
-   ret = gasket_interrupt_init(gasket_dev, driver_desc->name,
-   driver_desc->interrupt_type,
-   driver_desc->interrupts,
-   driver_desc->num_interrupts,
-   driver_desc->interrupt_pack_width,
-   driver_desc->interrupt_bar_index,
-   driver_desc->wire_interrupt_offsets);
+   ret = gasket_interrupt_init(gasket_dev);
if (ret) {
dev_err(gasket_dev->dev,
"Critical failure to allocate interrupts: %d\n", ret);
diff --git a/drivers/staging/gasket/gasket_interrupt.c 
b/drivers/staging/gasket/gasket_interrupt.c
index f94e4ea9a7ded..eb5dfbe08e214 100644
--- a/drivers/staging/gasket/gasket_interrupt.c
+++ b/drivers/staging/gasket/gasket_interrupt.c
@@ -331,31 +331,30 @@ static struct gasket_sysfs_attribute 
interrupt_sysfs_attrs[] = {
GASKET_END_OF_ATTR_ARRAY,
 };
 
-int gasket_interrupt_init(struct gasket_dev *gasket_dev, const char *name,
- int type,
- const struct gasket_interrupt_desc *interrupts,
- int num_interrupts, int pack_width, int bar_index,
- const struct gasket_wire_interrupt_offsets 
*wire_int_offsets)
+int gasket_interrupt_init(struct gasket_dev *gasket_dev)
 {
int ret;
struct gasket_interrupt_data *interrupt_data;
+   const struct gasket_driver_desc *driver_desc =
+   gasket_get_driver_desc(gasket_dev);
 
interrupt_data = kzalloc(sizeof(struct gasket_interrupt_data),
 GFP_KERNEL);
if (!interrupt_data)
return -ENOMEM;
gasket_dev->interrupt_data = interrupt_data;
-   interrupt_data->name = name;
-   interrupt_data->type = type;
+   interrupt_data->name = driver_desc->name;
+   interrupt_data->type = driver_desc->interrupt_type;
interrupt_data->pci_dev = gasket_dev->pci_dev;
-   interrupt_data->num_interrupts = num_interrupts;
-   interrupt_data->interrupts = interrupts;
-   interrupt_data->interrupt_bar_index = bar_index;
-   interrupt_data->pack_width = pack_width;
+   interrupt_data->num_interrupts = driver_desc->num_interrupts;
+   interrupt_data->interrupts = driver_desc->interrupts;
+   interrupt_data->interrupt_bar_index = driver_desc->interrupt_bar_index;
+   interrupt_data->pack_width = driver_desc->interrupt_pack_width;
interrupt_data->num_configured = 0;
-   interrupt_data->wire_interrupt_offsets = wire_int_offsets;
+   interrupt_data->wire_interrupt_offsets =
+   driver_desc->wire_interrupt_offsets;
 
-   interrupt_data->eventfd_ctxs = kcalloc(num_interrupts,
+   interrupt_data->eventfd_ctxs = kcalloc(driver_desc->num_interrupts,
   sizeof(struct eventfd_ctx *),
   GFP_KERNEL);
if (!interrupt_data->eventfd_ctxs) {
@@ -363,7 +362,7 @@ int gasket_interrupt_init(struct gasket_dev *gasket_dev, 
const char *name,
return -ENOMEM;
}
 
-   interrupt_data->interrupt_counts = kcalloc(num_interrupts,
+   interrupt_data->interrupt_counts = kcalloc(driver_desc->num_interrupts,
   sizeof(ulong),
   GFP_KERNEL);
if (!interrupt_data->interrupt_counts) {
diff --git a/drivers/staging/gasket/gasket_interrupt.h 
b/drivers/staging/gasket/gasket_interrupt.h
index 835af439e96a9..85526a1374a1a 100644
--- a/drivers/staging/gasket/gasket_interrupt.h
+++ b/drivers/staging/gask

[PATCH 09/16] staging: gasket: core: switch to relaxed memory-mapped I/O

2018-08-09 Thread Todd Poynor
From: Todd Poynor 

Use of readl() is deprecated; readl_relaxed() with appropriate memory
barriers is preferred.  Switch to relaxed reads and writes for better
performance as well.  Memory barriers required for I/O vs. normal
memory access on Apex devices have already been explicitly coded in the
page table routines.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.h 
b/drivers/staging/gasket/gasket_core.h
index 275fd0b345b6e..fd7e75b765a6d 100644
--- a/drivers/staging/gasket/gasket_core.h
+++ b/drivers/staging/gasket/gasket_core.h
@@ -590,25 +590,25 @@ const char *gasket_num_name_lookup(uint num,
 static inline ulong gasket_dev_read_64(struct gasket_dev *gasket_dev, int bar,
   ulong location)
 {
-   return readq(&gasket_dev->bar_data[bar].virt_base[location]);
+   return readq_relaxed(&gasket_dev->bar_data[bar].virt_base[location]);
 }
 
 static inline void gasket_dev_write_64(struct gasket_dev *dev, u64 value,
   int bar, ulong location)
 {
-   writeq(value, &dev->bar_data[bar].virt_base[location]);
+   writeq_relaxed(value, &dev->bar_data[bar].virt_base[location]);
 }
 
 static inline void gasket_dev_write_32(struct gasket_dev *dev, u32 value,
   int bar, ulong location)
 {
-   writel(value, &dev->bar_data[bar].virt_base[location]);
+   writel_relaxed(value, &dev->bar_data[bar].virt_base[location]);
 }
 
 static inline u32 gasket_dev_read_32(struct gasket_dev *dev, int bar,
 ulong location)
 {
-   return readl(&dev->bar_data[bar].virt_base[location]);
+   return readl_relaxed(&dev->bar_data[bar].virt_base[location]);
 }
 
 static inline void gasket_read_modify_write_64(struct gasket_dev *dev, int bar,
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 10/16] staging: gasket: page table: remove extraneous memory barriers

2018-08-09 Thread Todd Poynor
From: Todd Poynor 

Some explicit memory barriers in the page table code are not necessary,
either because:

(a) The barrier follows a non-relaxed MMIO access that already performs
a read or write memory barrier.

(b) The barrier follows DMA API calls for which the device-visible
effects of IOMMU programming are guaranteed to be flushed to the IOMMU
prior to the call returning, and doesn't need to sync with normal memory
access.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_page_table.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/gasket/gasket_page_table.c 
b/drivers/staging/gasket/gasket_page_table.c
index 4d2499269499b..53492f4fad6aa 100644
--- a/drivers/staging/gasket/gasket_page_table.c
+++ b/drivers/staging/gasket/gasket_page_table.c
@@ -317,8 +317,6 @@ static void gasket_free_extended_subtable(struct 
gasket_page_table *pg_tbl,
 
/* Release the page table from the device */
writeq(0, slot);
-   /* Force sync around the address release. */
-   mb();
 
if (pte->dma_addr)
dma_unmap_page(pg_tbl->device, pte->dma_addr, PAGE_SIZE,
@@ -504,8 +502,6 @@ static int gasket_perform_mapping(struct gasket_page_table 
*pg_tbl,
(void *)page_to_phys(page));
return -1;
}
-   /* Wait until the page is mapped. */
-   mb();
}
 
/* Make the DMA-space address available to the device. */
@@ -604,12 +600,13 @@ static void gasket_perform_unmapping(struct 
gasket_page_table *pg_tbl,
 */
for (i = 0; i < num_pages; i++) {
/* release the address from the device, */
-   if (is_simple_mapping || ptes[i].status == PTE_INUSE)
+   if (is_simple_mapping || ptes[i].status == PTE_INUSE) {
writeq(0, &slots[i]);
-   else
+   } else {
((u64 __force *)slots)[i] = 0;
-   /* Force sync around the address release. */
-   mb();
+   /* sync above PTE update before updating mappings */
+   wmb();
+   }
 
/* release the address from the driver, */
if (ptes[i].status == PTE_INUSE) {
@@ -898,8 +895,6 @@ static int gasket_alloc_extended_subtable(struct 
gasket_page_table *pg_tbl,
/* Map the page into DMA space. */
pte->dma_addr = dma_map_page(pg_tbl->device, pte->page, 0, PAGE_SIZE,
 DMA_BIDIRECTIONAL);
-   /* Wait until the page is mapped. */
-   mb();
 
/* make the addresses available to the device */
dma_addr = (pte->dma_addr + pte->offset) | GASKET_VALID_SLOT_FLAG;
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 16/16] staging: gasket: interrupt: remove unimplemented interrupt types

2018-08-09 Thread Todd Poynor
From: Todd Poynor 

Interrupt types PCI_MSI and PLATFORM_WIRE are unused and unimplemented.
Remove these.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_core.h  | 11 
 drivers/staging/gasket/gasket_interrupt.c | 34 +--
 2 files changed, 1 insertion(+), 44 deletions(-)

diff --git a/drivers/staging/gasket/gasket_core.h 
b/drivers/staging/gasket/gasket_core.h
index fd7e75b765a6d..0203460f48957 100644
--- a/drivers/staging/gasket/gasket_core.h
+++ b/drivers/staging/gasket/gasket_core.h
@@ -50,8 +50,6 @@ enum gasket_interrupt_packing {
 /* Type of the interrupt supported by the device. */
 enum gasket_interrupt_type {
PCI_MSIX = 0,
-   PCI_MSI = 1,
-   PLATFORM_WIRE = 2,
 };
 
 /*
@@ -69,12 +67,6 @@ struct gasket_interrupt_desc {
int packing;
 };
 
-/* Offsets to the wire interrupt handling registers */
-struct gasket_wire_interrupt_offsets {
-   u64 pending_bit_array;
-   u64 mask_array;
-};
-
 /*
  * This enum is used to identify memory regions being part of the physical
  * memory that belongs to a device.
@@ -384,9 +376,6 @@ struct gasket_driver_desc {
 */
struct gasket_coherent_buffer_desc coherent_buffer_description;
 
-   /* Offset of wire interrupt registers. */
-   const struct gasket_wire_interrupt_offsets *wire_interrupt_offsets;
-
/* Interrupt type. (One of gasket_interrupt_type). */
int interrupt_type;
 
diff --git a/drivers/staging/gasket/gasket_interrupt.c 
b/drivers/staging/gasket/gasket_interrupt.c
index eb5dfbe08e214..2cd262be65ca0 100644
--- a/drivers/staging/gasket/gasket_interrupt.c
+++ b/drivers/staging/gasket/gasket_interrupt.c
@@ -45,9 +45,6 @@ struct gasket_interrupt_data {
/* The width of a single interrupt in a packed interrupt register. */
int pack_width;
 
-   /* offset of wire interrupt registers */
-   const struct gasket_wire_interrupt_offsets *wire_interrupt_offsets;
-
/*
 * Design-wise, these elements should be bundled together, but
 * pci_enable_msix's interface requires that they be managed
@@ -92,19 +89,6 @@ static void gasket_interrupt_setup(struct gasket_dev 
*gasket_dev)
 
dev_dbg(gasket_dev->dev, "Running interrupt setup\n");
 
-   if (interrupt_data->type == PLATFORM_WIRE ||
-   interrupt_data->type == PCI_MSI) {
-   /* Nothing needs to be done for platform or PCI devices. */
-   return;
-   }
-
-   if (interrupt_data->type != PCI_MSIX) {
-   dev_dbg(gasket_dev->dev,
-   "Cannot handle unsupported interrupt type %d\n",
-   interrupt_data->type);
-   return;
-   }
-
/* Setup the MSIX table. */
 
for (i = 0; i < interrupt_data->num_interrupts; i++) {
@@ -351,8 +335,6 @@ int gasket_interrupt_init(struct gasket_dev *gasket_dev)
interrupt_data->interrupt_bar_index = driver_desc->interrupt_bar_index;
interrupt_data->pack_width = driver_desc->interrupt_pack_width;
interrupt_data->num_configured = 0;
-   interrupt_data->wire_interrupt_offsets =
-   driver_desc->wire_interrupt_offsets;
 
interrupt_data->eventfd_ctxs = kcalloc(driver_desc->num_interrupts,
   sizeof(struct eventfd_ctx *),
@@ -379,12 +361,7 @@ int gasket_interrupt_init(struct gasket_dev *gasket_dev)
force_msix_interrupt_unmasking(gasket_dev);
break;
 
-   case PCI_MSI:
-   case PLATFORM_WIRE:
default:
-   dev_err(gasket_dev->dev,
-   "Cannot handle unsupported interrupt type %d\n",
-   interrupt_data->type);
ret = -EINVAL;
}
 
@@ -439,12 +416,7 @@ int gasket_interrupt_reinit(struct gasket_dev *gasket_dev)
force_msix_interrupt_unmasking(gasket_dev);
break;
 
-   case PCI_MSI:
-   case PLATFORM_WIRE:
default:
-   dev_dbg(gasket_dev->dev,
-   "Cannot handle unsupported interrupt type %d\n",
-   gasket_dev->interrupt_data->type);
ret = -EINVAL;
}
 
@@ -489,12 +461,8 @@ void gasket_interrupt_cleanup(struct gasket_dev 
*gasket_dev)
gasket_interrupt_msix_cleanup(interrupt_data);
break;
 
-   case PCI_MSI:
-   case PLATFORM_WIRE:
default:
-   dev_dbg(gasket_dev->dev,
-   "Cannot handle unsupported interrupt type %d\n",
-   interrupt_data->type);
+   break;
}
 
kfree(interrupt_data->interrupt_counts);
-- 
2.18.0.597.ga71716f1ad-goog



[PATCH 08/16] staging: gasket: page table: use dma_mapping_error for error detection

2018-08-09 Thread Todd Poynor
From: Todd Poynor 

gasket_perform_mapping() call dma_mapping_error() to determine if
mapping failed.

Signed-off-by: Todd Poynor 
---
 drivers/staging/gasket/gasket_page_table.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/gasket/gasket_page_table.c 
b/drivers/staging/gasket/gasket_page_table.c
index bd921dc6094de..4d2499269499b 100644
--- a/drivers/staging/gasket/gasket_page_table.c
+++ b/drivers/staging/gasket/gasket_page_table.c
@@ -493,7 +493,8 @@ static int gasket_perform_mapping(struct gasket_page_table 
*pg_tbl,
(void *)page_to_pfn(page),
(unsigned long long)ptes[i].dma_addr);
 
-   if (ptes[i].dma_addr == -1) {
+   if (dma_mapping_error(pg_tbl->device,
+ ptes[i].dma_addr)) {
dev_dbg(pg_tbl->device,
"%s i %d -> fail to map page %llx "
"[pfn %p ohys %p]\n",
-- 
2.18.0.597.ga71716f1ad-goog



  1   2   3   4   5   6   7   8   >