Re: [PATCHv2 3/4] tty: n_gsm: add helper to convert mux-num to/from tty-base

2019-07-10 Thread Alan Cox
On Tue,  9 Jul 2019 08:46:32 +0200
Martin Hundebøll  wrote:

> Make it obvious how the gsm mux number relates to the virtual tty lines
> by using helper function instead of shifting 6 bits.
> 
> Signed-off-by: Martin Hundebøll 
> ---
>  drivers/tty/n_gsm.c | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/tty/n_gsm.c b/drivers/tty/n_gsm.c
> index c4e16b31f9ab..cba06063c44a 100644
> --- a/drivers/tty/n_gsm.c
> +++ b/drivers/tty/n_gsm.c
> @@ -2171,6 +2171,16 @@ static inline void mux_put(struct gsm_mux *gsm)
>   kref_put(>ref, gsm_free_muxr);
>  }
>  
> +static inline int mux_num_to_base(struct gsm_mux *gsm)
> +{
> + return gsm->num * NUM_DLCI;
> +}
> +
> +static inline unsigned int mux_line_to_num(int line)
> +{
> + return line / NUM_DLCI;

If you are going to convert shifts to multiply and divide then used
unsigned maths so the compiler can optimize it nicely on some of the low
end processors.

Alan


Re: [PATCH 4/4] tty: n_gsm: add ioctl to map serial device to mux'ed tty

2019-07-09 Thread Alan Cox
On Mon,  8 Jul 2019 21:02:52 +0200
Martin Hundebøll  wrote:

> Guessing the base tty for a gsm0710 multiplexed serial device is not
> currently possible, which makes it racy to use with multiple modems.
> 
> Add a way to map the physical serial tty to its related mux devices
> using a ioctl.

That looks very sensible

> + int base;
>  
>   /* open the serial port connected to the modem */
>   fd = open(SERIAL_PORT, O_RDWR | O_NOCTTY | O_NDELAY);
> @@ -58,6 +61,11 @@ Major parts of the initialization program :
>   c.mtu = 127;
>   /* set the new configuration */
>   ioctl(fd, GSMIOC_SETCONF, );
> + /* get and print base gsmtty device node */
> + ioctl(fd, GSMIOC_GETBASE, );

Can we at least use a specific sized type ? uint32_t or whatever is fine.

Alan


Re: [PATCH v3 0/3] net: fddi: skfp: Use PCI generic definitions instead of private duplicates

2019-06-21 Thread Alan Cox
On Fri, 21 Jun 2019 15:16:04 +0530
Puranjay Mohan  wrote:

> This patch series removes the private duplicates of PCI definitions in
> favour of generic definitions defined in pci_regs.h.

Why bother ? It's an ancient obsolete card ?

Do you even have one to test ?

> 
> This driver only uses some of the generic PCI definitons,
> which are included from pci_regs.h and thier private versions
> are removed from skfbi.h with all other private defines.
> 
> The skfbi.h defines PCI_REV_ID and other private defines with different
> names, these are renamed to Generic PCI names to make them
> compatible with defines in pci_regs.h.
> 
> All unused defines are removed from skfbi.h.

I sincerely doubt anyone on the planet is using this card any more.

Alan


Re: [PATCH v2 2/2] tty: add rpmsg driver

2019-05-16 Thread Alan Cox


> +static int rpmsg_tty_data_handler(struct rpmsg_device *rpdev, void *data,
> +   int len, void *priv, u32 src)
> +{
> + struct rpmsg_tty_port *cport = dev_get_drvdata(>dev);
> + u8 *cbuf;
> + int space;
> +
> + dev_dbg(>dev, "msg(<- src 0x%x) len %d\n", src, len);
> +
> + if (!len)
> + return 0;
> +
> + dev_dbg(>dev, "space available: %d\n",
> + tty_buffer_space_avail(>port));
> +
> + space = tty_prepare_flip_string(>port, , len);
> + if (space <= 0) {
> + dev_err(>dev, "No memory for tty_prepare_flip_string\n");
> + return -ENOMEM;
> + }

Really bad idea.

1. It's not an 'error'. It's normal that in the case the consumer is
blocked you can run out of processing space

2. You will trigger this when being hit by a very large fast flow of data
so responding by burning CPU spewing messages (possibly even out of this
tty) is bad.

It's not a bug - just keep statistics and throw it away. If anyone cares
they will do flow control.


> +
> +static int rpmsg_tty_write_control(struct tty_struct *tty, u8 ctrl, u8 
> *values,
> +unsigned int n_value)
> +{
> + struct rpmsg_tty_port *cport = idr_find(_idr, tty->index);
> + struct rpmsg_tty_payload *msg;
> + struct rpmsg_tty_ctrl *m_ctrl;
> + struct rpmsg_device *rpdev;
> + unsigned int msg_size;
> + int ret;
> +
> + if (!cport) {
> + dev_err(tty->dev, "cannot get cport\n");
> + return -ENODEV;
> + }
> +
> + rpdev = cport->rpdev;
> +
> + msg_size = sizeof(*msg) + sizeof(*m_ctrl) + n_value;
> + msg = kzalloc(msg_size, GFP_KERNEL);


Out of memory check ?

> +static int rpmsg_tty_write(struct tty_struct *tty, const u8 *buf, int len)
> +{
> + struct rpmsg_tty_port *cport = idr_find(_idr, tty->index);
> + struct rpmsg_device *rpdev;
> + int msg_size, msg_max_size, ret = 0;
> + int cmd_sz = sizeof(struct rpmsg_tty_payload);
> + u8 *tmpbuf;
> +
> + if (!cport) {
> + dev_err(tty->dev, "cannot get cport\n");
> + return -ENODEV;
> + }
> +
> + /* If cts not set, the message is not sent*/
> + if (!cport->cts)
> + return 0;
> +
> + rpdev = cport->rpdev;
> +
> + dev_dbg(>dev, "%s: send msg from tty->index = %d, len = %d\n",
> + __func__, tty->index, len);
> + if (!buf) {
> + dev_err(>dev, "buf shouldn't be null.\n");
> + return -ENOMEM;
> + }
> +
> + msg_max_size = rpmsg_get_buf_payload_size(rpdev->ept);
> + if (msg_max_size < 0)
> + return msg_max_size;
> +
> + msg_size = min(len + cmd_sz, msg_max_size);
> + tmpbuf = kzalloc(msg_size, GFP_KERNEL);

Allocation failure check ?



Re: [PATCH v3] serial: Add Milbeaut serial control

2019-04-26 Thread Alan Cox
O
> +static void mlb_usio_set_termios(struct uart_port *port,
> + struct ktermios *termios, struct ktermios *old)
> +{
> + unsigned int escr, smr = MLB_USIO_SMR_SOE;
> + unsigned long flags, baud, quot;
> +
> + switch (termios->c_cflag & CSIZE) {
> + case CS5:
> + escr = MLB_USIO_ESCR_L_5BIT;
> + break;
> + case CS6:
> + escr = MLB_USIO_ESCR_L_6BIT;
> + break;
> + case CS7:
> + escr = MLB_USIO_ESCR_L_7BIT;
> + break;
> + case CS8:
> + default:
> + escr = MLB_USIO_ESCR_L_8BIT;
> + break;
> + }
> +
> + if (termios->c_cflag & CSTOPB)
> + smr |= MLB_USIO_SMR_SBL;
> +
> + if (termios->c_cflag & PARENB) {
> + escr |= MLB_USIO_ESCR_PEN;
> + if (termios->c_cflag & PARODD)
> + escr |= MLB_USIO_ESCR_P;
> + }

If you don't suport CMSPAR then clear that bit in termios as well

> + /* Set hard flow control */
> + if (of_property_read_bool(port->dev->of_node, "auto-flow-control") ||
> + (termios->c_cflag & CRTSCTS))
> + escr |= MLB_USIO_ESCR_FLWEN;

That's just broken. The termios bits are the definitive things for the
port, and in addition even if they are forced you need to correct the
termios data.

You might want to control flow control *at boot* with an OF property but
doing it post boot is just busted.

Alan


Re: [PATCH] ata: pata_oldpiix: Add missing device ID for INTEL_82371AB

2019-03-22 Thread Alan Cox
> > If the virtual Sparc emulator is using it does that also mean actual
> > Sparc hardware has it. In which case presumably it needs fixing, or at
> > least moving to the generic driver assuming the firmware sets it up ?
> >   
> 
> The qemu works perfectly with the new Linux driver.

For some configurations both drivers will work with even older chips. The
question is whether your chip has separate slave timings, if not then
while it'll work single device with ata_piix some combinations will fail.

Digging into the datasheet the part in question has slave timing (0x44) so
should indeed be ata_piix.

Alan



Re: Staging status of speakup

2019-03-19 Thread Alan Cox
On Sat, 16 Mar 2019 10:35:43 +0100
Samuel Thibault  wrote:

> Chris Brannon, le ven. 15 mars 2019 18:19:39 -0700, a ecrit:
> > Okash Khawaja  writes:  
> > > Finally there is an issue where text in output buffer sometimes gets
> > > garbled on SMP systems, but we can continue working on it after the
> > > driver is moved out of staging, if that's okay. Basically we need a
> > > reproducer of this issue.  
> > 
> > What kind of reproducer do you need here?  It's straightforward to
> > reproduce in casual use, at least with a software synthesizer.  
> 
> The problem is that neither Okash nor I are even casual users of
> speakup, so we need a walk-through of the kind of operation that
> produces the issue. It does not have to be reproducible each time it is
> done. Perhaps (I really don't know what that bug is about actually) it
> is a matter of putting text in the selection buffer, and try to paste it
> 100 times, and once every 10 times it will be garbled, for instance.

paste_selection still says

/* Insert the contents of the selection buffer into the
 * queue of the tty associated with the current console.
 * Invoked by ioctl().
 *
 * Locking: called without locks. Calls the ldisc wrongly with
 * unsafe methods,
 */

from which I deduce that with everyone using X nobody ever bothered to
fix it. So before you look too hard at the speakup code you might want to
review the interaction with selection.c too.

Alan


Re: [PATCH 09/27] hibernate: Disable when the kernel is locked down

2019-03-18 Thread Alan Cox
> Suse have a solution for this that I'd like to see pushed again, but
> from a practical perspective enterprise distributions have been
> shipping this for some time without significant obvious customer
> complaint.

Probably because their IT department hasn't noticed 8)

Alan


Re: [PATCH] ata: pata_oldpiix: Add missing device ID for INTEL_82371AB

2019-03-12 Thread Alan Cox
On Tue, 12 Mar 2019 11:41:02 +0100
LABBE Corentin  wrote:

> On Mon, Dec 10, 2018 at 05:52:35PM +0300, Sergei Shtylyov wrote:
> > Hello!
> > 
> > On 12/10/2018 04:46 PM, Corentin Labbe wrote:
> >   
> > > When playing with a virtual SPARC machine with qemu, I found that the
> > > IDE emulated device was not probing with the ata/pata_oldpiix driver.  
> > 
> >Correctly, it should probe with ata_piix,
> >   
> > > But with the old ide/piix, it was probed.> 
> > > This is due to this PCI devid was not migrated from the old ide/piix.  
> > 
> >It wasn't on purpose -- the IDE driver supports the original PIIX
> > incorrectly.
> >   
> 
> Hello
> 
> What about removing this old driver totally if it dont work ?

If the virtual Sparc emulator is using it does that also mean actual
Sparc hardware has it. In which case presumably it needs fixing, or at
least moving to the generic driver assuming the firmware sets it up ?

Alan


Re: [PATCH 09/27] hibernate: Disable when the kernel is locked down

2019-03-07 Thread Alan Cox
On Wed,  6 Mar 2019 15:58:55 -0800
Matthew Garrett  wrote:

> From: Josh Boyer 
> 
> There is currently no way to verify the resume image when returning
> from hibernate.  This might compromise the signed modules trust model,
> so until we can work with signed hibernate images we disable it when the
> kernel is locked down.

That one is a bit worrying since whilst the other stuff may be useful in
some business environments, mandatory hibernate not suspend to RAM is a
common corporate IT policy because of concerns about theft and recovery
of memory contents.

Alan


Re: a.out coredumping: fix or delete?

2019-03-06 Thread Alan Cox
On Wed, 6 Mar 2019 09:11:44 -0500
"Theodore Y. Ts'o"  wrote:

> On Wed, Mar 06, 2019 at 01:25:17PM +0100, Thomas Gleixner wrote:
> > > It's been 25 years since Linux added support for ELF.  Can we just
> > > delete the a.out support entirely now?  According to the Linux-ELF HOWTO,
> > > support was added in 1.1.52 (August 1994).  It's pretty much necromancy
> > > at this point.  
> > 
> > The Kernel-Necrophilia cult members might disagree. :)
> > 
> > But yes, good riddance.  
> 
> Doesn't Minix 1.0 use a.out?  It *is* cool to be able to binaries from
> run dead operating systems.  :-)

Minixemu compiled fine ELF last time I checked 8). It does need  a 32bit
system as it still uses virtual 86 model.

Alan


Re: a.out coredumping: fix or delete?

2019-03-05 Thread Alan Cox
> It's been 25 years since Linux added support for ELF.  Can we just
> delete the a.out support entirely now?  According to the Linux-ELF HOWTO,
> support was added in 1.1.52 (August 1994).  It's pretty much necromancy
> at this point.

In the unlikely event that someone actually has an a.out binary they
can't live with they can also just write an a.out loader as an ELF
program entirely in userspace.

I'd vote for giving it the boot unless there are any architectures that
kept using a.out far longer due to tool chain issues ?

Alan


Re: [PATCH][RFC] module: Cure the MODULE_LICENSE "GPL" vs. "GPL v2" bogosity

2019-01-30 Thread Alan Cox
On Wed, 30 Jan 2019 15:31:20 +1030
Rusty Russell  wrote:

> Thanks taking on such a thankless task Thomas,
> 
> Might have been overzealous in assuming a verionless GPL string meant
> "or later" (I'm happy for that for my own code, FWIW).  My memory is
> fuzzy, but I don't think anyone cared at the time.

Versionless always meant 'or later' outside of the tags. It's the default
version of the licence. (Whether v2 only has any meaning beyond intent is
another debate that I guess some year a lawyer will have to figure out).

I think the description change makes sense given the ambiguity and the
fact we now have SPDX headers. (IANAL etc)

> >  2) The dual licensed strings became ill defined as well because following
> > the "GPL" vs. "GPL v2" distinction all dual licensed (or additional

The dual ones were IMHO a mistake. They should just have used GPL and
additional rights. Either you have GPL rights (and it's ok to use in the
kernel) or you don't (and it's proprietary and the rest is down to
derivative works).

We don't actually care whether its dual licensed BSD, or whether it
merely grants you an additional right to cheap pizza.

> > As of 5.0-rc2 2873 out of 9200 instances of MODULE_LICENSE() strings are
> > conflicting with the actual license in the source code (either SPDX or
> > license boilerplate/reference). A comparison between the scan of the

The SPDX tag isn't correctly capable of expressing the licence anyway. If
you have functions in a file and two of them are GPL v2+ and someone
added a GPLv2 only one and updated the header there isn't a valid SPDX tag
for it because I can still use the GPLv2+ bits with GPLv3.

This is nothing new - the headers on the files provided no more data on
that and took up lots more space 8) We've simply never tracked licence
data by line.

Alan


Re: [PATCH][RFC] module: Cure the MODULE_LICENSE "GPL" vs. "GPL v2" bogosity

2019-01-30 Thread Alan Cox
> > > +"GPL and additional rights"   Historical variant of expressing that 
> > > the
> > > +   module source is dual licensed under a
> > > +   GPL v2 variant and MIT license. Please do
> > > +   not use in new code.

Actually it was a historic fix for the fact that some slimeballs were
going to use a proposed "BSD" tag and just ship binaries only whilst
claiming that they were totally compliant with the BSD licence.

Alan


Re: [RFC] spectre hardware-software cooperative mitigation

2019-01-18 Thread Alan Cox
> This is going to be a mammoth task. The alternatives are to continue
> as things are, which is a mess that cannot be cleaned up by either of
> (mutually exclusive) hardware or software alone.
> 
> Thoughts and feedback appreciated.

You need to be talking to the JIT developers not asking here I think.
Speculative attacks in JIT environments is a topic an order of magnitude
or more complex than the kernel cases because there isn't even process
isolation between the JIT, JIT engin eand support logic.

Alan


Re: Official Linux system wrapper library?

2018-11-16 Thread Alan Cox


> I think the issue is a bit more complex :
>   - linux doesn't support a single libc
>   - glibc doesn't support a single OS
> 
> In practice we all know (believe?) that both statements above are
> true but in practice 99% of the time there's a 1:1 relation between
> these two components. 

The top linux C library is probably the Android one. Given the number
of containers now running Alpine and the number of embedded devices it's
probably a good fight going on for 2nd, 3rd and 4th. It is certainly not
a Linux/Glibc world any more.

Alan


Re: Official Linux system wrapper library?

2018-11-16 Thread Alan Cox


> I think the issue is a bit more complex :
>   - linux doesn't support a single libc
>   - glibc doesn't support a single OS
> 
> In practice we all know (believe?) that both statements above are
> true but in practice 99% of the time there's a 1:1 relation between
> these two components. 

The top linux C library is probably the Android one. Given the number
of containers now running Alpine and the number of embedded devices it's
probably a good fight going on for 2nd, 3rd and 4th. It is certainly not
a Linux/Glibc world any more.

Alan


Re: [PATCH 2/2] exec: increase BINPRM_BUF_SIZE to 256

2018-11-16 Thread Alan Cox
On Mon, 12 Nov 2018 17:09:56 +0100
Oleg Nesterov  wrote:

> Large enterprise clients often times run applications out of networked
> file systems where the IT mandated layout of project volumes can end up
> leading to paths that are longer than 128 characters. Bumping this up to
> the next order of two solves this problem in all but the most egregious
> case while still fitting into a 512b slab.

You also need to update the execve manual page as it explicitly documents
the 128 byte limit.

Alan


Re: [PATCH 2/2] exec: increase BINPRM_BUF_SIZE to 256

2018-11-16 Thread Alan Cox
On Mon, 12 Nov 2018 17:09:56 +0100
Oleg Nesterov  wrote:

> Large enterprise clients often times run applications out of networked
> file systems where the IT mandated layout of project volumes can end up
> leading to paths that are longer than 128 characters. Bumping this up to
> the next order of two solves this problem in all but the most egregious
> case while still fitting into a 512b slab.

You also need to update the execve manual page as it explicitly documents
the 128 byte limit.

Alan


Re: 32-bit PTI with THP = userspace corruption

2018-10-22 Thread Alan Cox
On Mon, 22 Oct 2018 09:56:42 +0200
Joerg Roedel  wrote:

> On Sun, Oct 21, 2018 at 02:37:45PM +0200, Pavel Machek wrote:
> > On Tue 2018-09-18 14:00:30, Alan Cox wrote:  
> > > There are pretty much no machines that don't support PAE and are still
> > > even vaguely able to boot a modern Linux kernel. The oddity is the
> > > Pentium-M but most distros shipped a hack to use PAE on the Pentium M
> > > anyway as it seems to work fine.  
> > 
> > I do have some AMD Geode here, in form of subnotebook. Definitely
> > newer then Pentium Ms, but no PAE...  
> 
> Are the AMD Geode chips affected by Meltdown?

Geode for AMD was just a marketing name.

The AMD athlon labelled as 'Geode' will behave like any other Athlon but
I've not seen anyone successfully implement Meltdown on the Athlon so it's
probably ok. 

The earlier NatSemi ones are not AFAIK vulnerable to either. The later
ones might do Spectre (they have branch prediction which is disabled on
the earlier ones) but quite possibly not enough to be attacked usefully -
and you can turn it off anyway if you care.

And I doubt your subnotebook can usefully run modern Linux since the
memory limit on most Geode was about 64MB.

Alan


Re: 32-bit PTI with THP = userspace corruption

2018-10-22 Thread Alan Cox
On Mon, 22 Oct 2018 09:56:42 +0200
Joerg Roedel  wrote:

> On Sun, Oct 21, 2018 at 02:37:45PM +0200, Pavel Machek wrote:
> > On Tue 2018-09-18 14:00:30, Alan Cox wrote:  
> > > There are pretty much no machines that don't support PAE and are still
> > > even vaguely able to boot a modern Linux kernel. The oddity is the
> > > Pentium-M but most distros shipped a hack to use PAE on the Pentium M
> > > anyway as it seems to work fine.  
> > 
> > I do have some AMD Geode here, in form of subnotebook. Definitely
> > newer then Pentium Ms, but no PAE...  
> 
> Are the AMD Geode chips affected by Meltdown?

Geode for AMD was just a marketing name.

The AMD athlon labelled as 'Geode' will behave like any other Athlon but
I've not seen anyone successfully implement Meltdown on the Athlon so it's
probably ok. 

The earlier NatSemi ones are not AFAIK vulnerable to either. The later
ones might do Spectre (they have branch prediction which is disabled on
the earlier ones) but quite possibly not enough to be attacked usefully -
and you can turn it off anyway if you care.

And I doubt your subnotebook can usefully run modern Linux since the
memory limit on most Geode was about 64MB.

Alan


Re: [Ksummit-discuss] [PATCH 6/7] Code of Conduct: Change the contact email address

2018-10-20 Thread Alan Cox
> > Data protection law, reporting laws in some
> > countries and the like mean that anyone expecting an incident to remain
> > confidential from the person it was reported against is living in
> > dreamland and are going to get a nasty shock.  
> 
> OK - you seem to be talking about keeping the incident and reporter
> confidential from the person reported against.
> Certainly the  person reported against has to have the incident
> identified to them, so that part is not confidential.  Many legal
> jurisdictions require that the accused can know their accuser.
> But these things come into play mostly when items have veered
> into legal territory.  Most violation reports are not in the territory.
> There's no legal requirement that the Code of Conduct committee
> tell someone who it was that said they were rude on the mailing list.

The 'who said' is generally safe (except from the it's in court case -
which is fine when that happens the legal process deals with it). The
other details are not so while someone accused of something might not
know who said it they can ask for what personal data (which would include
that email with names etc scrubbed).

You can possibly fight that in court of course, if you've got lots of
money and nothing better to do for six weeks.

> > You should also reserving the right to report serious incidents directly
> > to law enforcement. Unless of course you want to be forced to sit on
> > multiple reports of physical abuse from different people about
> > someone - unable to tell them about each others report, unable to prove
> > anything, and in twenty years time having to explain to the media why
> > nothing was done.  
> 
> The scope of the code of conduct basically means that it covers
> online interactions (communication via mailing list, git commits
> and Bugzilla).  

I don't see it specifically stating that 'If someone is offensive at a
kernel summit we are going to refuse to listen'

Seriously ?

Alan


Re: [Ksummit-discuss] [PATCH 6/7] Code of Conduct: Change the contact email address

2018-10-20 Thread Alan Cox
> > Data protection law, reporting laws in some
> > countries and the like mean that anyone expecting an incident to remain
> > confidential from the person it was reported against is living in
> > dreamland and are going to get a nasty shock.  
> 
> OK - you seem to be talking about keeping the incident and reporter
> confidential from the person reported against.
> Certainly the  person reported against has to have the incident
> identified to them, so that part is not confidential.  Many legal
> jurisdictions require that the accused can know their accuser.
> But these things come into play mostly when items have veered
> into legal territory.  Most violation reports are not in the territory.
> There's no legal requirement that the Code of Conduct committee
> tell someone who it was that said they were rude on the mailing list.

The 'who said' is generally safe (except from the it's in court case -
which is fine when that happens the legal process deals with it). The
other details are not so while someone accused of something might not
know who said it they can ask for what personal data (which would include
that email with names etc scrubbed).

You can possibly fight that in court of course, if you've got lots of
money and nothing better to do for six weeks.

> > You should also reserving the right to report serious incidents directly
> > to law enforcement. Unless of course you want to be forced to sit on
> > multiple reports of physical abuse from different people about
> > someone - unable to tell them about each others report, unable to prove
> > anything, and in twenty years time having to explain to the media why
> > nothing was done.  
> 
> The scope of the code of conduct basically means that it covers
> online interactions (communication via mailing list, git commits
> and Bugzilla).  

I don't see it specifically stating that 'If someone is offensive at a
kernel summit we are going to refuse to listen'

Seriously ?

Alan


Re: [PATCH 6/7] Code of Conduct: Change the contact email address

2018-10-20 Thread Alan Cox


> +to the circumstances. The Code of Conduct Committee is obligated to
> +maintain confidentiality with regard to the reporter of an incident.
> +Further details of specific enforcement policies may be posted
> +separately.

Unfortunately by ignoring the other suggestions on this you've left this
bit broken.

The committee can't keep most stuff confidential so it's misleading and
wrong to imply they can. Data protection law, reporting laws in some
countries and the like mean that anyone expecting an incident to remain
confidential from the person it was reported against is living in
dreamland and are going to get a nasty shock.

At the very least it should say '(except where required by law)'.

There is a separate issue that serious things should always go to law
enforcement - you are setting up a policy akin to the one that got the
catholic church and many others in trouble.

You should also reserving the right to report serious incidents directly
to law enforcement. Unless of course you want to be forced to sit on
multiple reports of physical abuse from different people about
someone - unable to tell them about each others report, unable to prove
anything, and in twenty years time having to explain to the media why
nothing was done.

Alan


Re: [PATCH 6/7] Code of Conduct: Change the contact email address

2018-10-20 Thread Alan Cox


> +to the circumstances. The Code of Conduct Committee is obligated to
> +maintain confidentiality with regard to the reporter of an incident.
> +Further details of specific enforcement policies may be posted
> +separately.

Unfortunately by ignoring the other suggestions on this you've left this
bit broken.

The committee can't keep most stuff confidential so it's misleading and
wrong to imply they can. Data protection law, reporting laws in some
countries and the like mean that anyone expecting an incident to remain
confidential from the person it was reported against is living in
dreamland and are going to get a nasty shock.

At the very least it should say '(except where required by law)'.

There is a separate issue that serious things should always go to law
enforcement - you are setting up a policy akin to the one that got the
catholic church and many others in trouble.

You should also reserving the right to report serious incidents directly
to law enforcement. Unless of course you want to be forced to sit on
multiple reports of physical abuse from different people about
someone - unable to tell them about each others report, unable to prove
anything, and in twenty years time having to explain to the media why
nothing was done.

Alan


Re: [PATCH] Input: uinput - fix Spectre v1 vulnerability

2018-10-18 Thread Alan Cox
On Tue, 16 Oct 2018 20:12:43 +0200
"Gustavo A. R. Silva"  wrote:

> On 10/16/18 8:09 PM, Dmitry Torokhov wrote:
> 
> > 
> > /dev/uinput   
> 
> I've got it. This explains it all. :)
> 
> > must be 0600, or accessible to equally privileged user, or you'll be 
> > opening your system to much mischief.

Still a correct change.

CAP_SYS_RAWIO is not the same as being root, especially in a container.

Alan


Re: [PATCH] Input: uinput - fix Spectre v1 vulnerability

2018-10-18 Thread Alan Cox
On Tue, 16 Oct 2018 20:12:43 +0200
"Gustavo A. R. Silva"  wrote:

> On 10/16/18 8:09 PM, Dmitry Torokhov wrote:
> 
> > 
> > /dev/uinput   
> 
> I've got it. This explains it all. :)
> 
> > must be 0600, or accessible to equally privileged user, or you'll be 
> > opening your system to much mischief.

Still a correct change.

CAP_SYS_RAWIO is not the same as being root, especially in a container.

Alan


Re: [PATCH] kernel/signal: Signal-based pre-coredump notification

2018-10-15 Thread Alan Cox
> +/*
> + * Returns true if current's euid is same as p's uid or euid,
> + * or has CAP_SYS_ADMIN.
> + *
> + * Called with rcu_read_lock, creds are safe.
> + *
> + * Adapted from set_one_prio_perm().
> + */
> +static bool set_predump_signal_perm(struct task_struct *p)
> +{
> + const struct cred *cred = current_cred(), *pcred = __task_cred(p);
> +
> + return uid_eq(pcred->uid, cred->euid) ||
> +uid_eq(pcred->euid, cred->euid) ||
> +capable(CAP_SYS_ADMIN);
> +}

This makes absolutely no security sense whatsoever. The uid and euid of
the parent and child can both change between the test and the signal
delivery.

There are reasons that the child signal control code is incredibly
careful about either the parent or child using execve or doing a
privilege change that might pose a risk.

Until this code gets the same protections I don't believe it's safe.

Alan


Re: [PATCH] kernel/signal: Signal-based pre-coredump notification

2018-10-15 Thread Alan Cox
> +/*
> + * Returns true if current's euid is same as p's uid or euid,
> + * or has CAP_SYS_ADMIN.
> + *
> + * Called with rcu_read_lock, creds are safe.
> + *
> + * Adapted from set_one_prio_perm().
> + */
> +static bool set_predump_signal_perm(struct task_struct *p)
> +{
> + const struct cred *cred = current_cred(), *pcred = __task_cred(p);
> +
> + return uid_eq(pcred->uid, cred->euid) ||
> +uid_eq(pcred->euid, cred->euid) ||
> +capable(CAP_SYS_ADMIN);
> +}

This makes absolutely no security sense whatsoever. The uid and euid of
the parent and child can both change between the test and the signal
delivery.

There are reasons that the child signal control code is incredibly
careful about either the parent or child using execve or doing a
privilege change that might pose a risk.

Until this code gets the same protections I don't believe it's safe.

Alan


Re: [PATCH v2] platform/x86: Add Intel AtomISP2 dummy / power-management driver

2018-10-15 Thread Alan Cox
On Sun, 14 Oct 2018 19:54:27 +0200
Hans de Goede  wrote:

> The Image Signal Processor found on Cherry Trail devices is brought up in
> D0 state on devices which have camera sensors attached to it. The ISP will
> not enter D3 state again without some massaging of its registers beforehand
> and the ISP not being in D3 state blocks the SoC from entering S0ix modes.
> 
> There was a driver for the ISP in drivers/staging but that got removed
> again because it never worked. It does not seem likely that a real
> driver for the ISP will be added to the mainline kernel anytime soon.
> 
> This commit adds a dummy driver which contains the necessary magic from
> the staging driver to powerdown the ISP, so that Cherry Trail devices where
> the ISP is used will properly use S0ix modes when suspended.
> 
> Together with other recent S0ix related fixes this allows S0ix modes to
> be entered on e.g. a Chuwi Hi8 Pro and a HP x2 210.
> 
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=196915
> Signed-off-by: Hans de Goede 

Reviewed-by: Alan Cox 

Thanks


Re: [PATCH v2] platform/x86: Add Intel AtomISP2 dummy / power-management driver

2018-10-15 Thread Alan Cox
On Sun, 14 Oct 2018 19:54:27 +0200
Hans de Goede  wrote:

> The Image Signal Processor found on Cherry Trail devices is brought up in
> D0 state on devices which have camera sensors attached to it. The ISP will
> not enter D3 state again without some massaging of its registers beforehand
> and the ISP not being in D3 state blocks the SoC from entering S0ix modes.
> 
> There was a driver for the ISP in drivers/staging but that got removed
> again because it never worked. It does not seem likely that a real
> driver for the ISP will be added to the mainline kernel anytime soon.
> 
> This commit adds a dummy driver which contains the necessary magic from
> the staging driver to powerdown the ISP, so that Cherry Trail devices where
> the ISP is used will properly use S0ix modes when suspended.
> 
> Together with other recent S0ix related fixes this allows S0ix modes to
> be entered on e.g. a Chuwi Hi8 Pro and a HP x2 210.
> 
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=196915
> Signed-off-by: Hans de Goede 

Reviewed-by: Alan Cox 

Thanks


Re: [PATCH v3 1/3] x86: baytrail/cherrytrail: Rework and move P-Unit PMIC bus semaphore code

2018-10-12 Thread Alan Cox
> > It should be.  
> 
> You mean that the problem should be purely academic, IOW that registers 
> touched
> by the P-Unit are never touched through ACPI Opregions / power-resources?

As far as I am aware. Holding the lock over both is definitely better
regardless

> >> 2) To safely access the shared I2C bus, we need to do 3 things:
> >> a) Notify the GPU driver that we are starting a window in which it may not
> >> access the P-Unit, since the P-Unit seems to ignore the semaphore for
> >> explicit power-level requests made by the GPU driver  
> > 
> > That's not what happens. It's more a problem of
> > 
> > We take the SEM
> > The GPU driver pokes the GPU
> > The GPU decides it wants to change the power situation
> > The GPU asks
> > It blocks on the SEM
> > 
> > and the system deadlocks.  
> 
> That may be, but why does it deadlock?

As I understand it because the CPU is stuck waiting for the GPU which is
waiting for the SEM which the CPU is holding. This isn't purely software
remember.

> I can understand that you are reluctant to change this code, but this
> commit is not changing the logic, it mostly just moves the code around
> and I do believe that overall doing this is worthwhile.

Fair enough

Alan


Re: [PATCH v3 1/3] x86: baytrail/cherrytrail: Rework and move P-Unit PMIC bus semaphore code

2018-10-12 Thread Alan Cox
> > It should be.  
> 
> You mean that the problem should be purely academic, IOW that registers 
> touched
> by the P-Unit are never touched through ACPI Opregions / power-resources?

As far as I am aware. Holding the lock over both is definitely better
regardless

> >> 2) To safely access the shared I2C bus, we need to do 3 things:
> >> a) Notify the GPU driver that we are starting a window in which it may not
> >> access the P-Unit, since the P-Unit seems to ignore the semaphore for
> >> explicit power-level requests made by the GPU driver  
> > 
> > That's not what happens. It's more a problem of
> > 
> > We take the SEM
> > The GPU driver pokes the GPU
> > The GPU decides it wants to change the power situation
> > The GPU asks
> > It blocks on the SEM
> > 
> > and the system deadlocks.  
> 
> That may be, but why does it deadlock?

As I understand it because the CPU is stuck waiting for the GPU which is
waiting for the SEM which the CPU is holding. This isn't purely software
remember.

> I can understand that you are reluctant to change this code, but this
> commit is not changing the logic, it mostly just moves the code around
> and I do believe that overall doing this is worthwhile.

Fair enough

Alan


Re: [PATCH] x86: entry: flush the cache if syscall error

2018-10-12 Thread Alan Cox
> My understanding is that the standard “breadcrumb” is that a cache line is 
> fetched into L1D, and that the cacheline in question will go into L1D even if 
> it was previously not cached at all. So flushing L1D will cause the timing 
> from a probe to be different, but the breadcrumb is still there, and the 
> attack will still work.

Flush not write back. The L1D is empty (or full of other stuff the way
the prototype I tested did it as x86 lacked a true L1 flushing primitive)
 
> > At best you have a microscopic window to attack it on the SMT pair.  
> 
> So only the extra clever attackers will pull it off. This isn’t all that 
> reassuring.

It's going to be very hard for them to do that. You don't really have the
timing control in userspace to do that sort of thing easily.

At the end of the day it's an additional hardenening not a fix. It's
designed to provide extra protection for the cases we don't know about
until we find them and lfence them. All of this (and all sidechannel of
any kind) is merely about reducing the bandwidth an attacker can achieve.

The idea is that it's cheap enough that it's worth doing.

> Or I would get a fancy new CPU and use UMONITOR and, unless UMONITOR is much 
> cleverer than I suspect it is, the gig is up.  The time window for the attack 
> could be as small as you want, and UMONITOR will catch it.

That would be an interesting line of attack. You would still have to
navigate a fair bit of noise.

Alan


Re: [PATCH] x86: entry: flush the cache if syscall error

2018-10-12 Thread Alan Cox
> My understanding is that the standard “breadcrumb” is that a cache line is 
> fetched into L1D, and that the cacheline in question will go into L1D even if 
> it was previously not cached at all. So flushing L1D will cause the timing 
> from a probe to be different, but the breadcrumb is still there, and the 
> attack will still work.

Flush not write back. The L1D is empty (or full of other stuff the way
the prototype I tested did it as x86 lacked a true L1 flushing primitive)
 
> > At best you have a microscopic window to attack it on the SMT pair.  
> 
> So only the extra clever attackers will pull it off. This isn’t all that 
> reassuring.

It's going to be very hard for them to do that. You don't really have the
timing control in userspace to do that sort of thing easily.

At the end of the day it's an additional hardenening not a fix. It's
designed to provide extra protection for the cases we don't know about
until we find them and lfence them. All of this (and all sidechannel of
any kind) is merely about reducing the bandwidth an attacker can achieve.

The idea is that it's cheap enough that it's worth doing.

> Or I would get a fancy new CPU and use UMONITOR and, unless UMONITOR is much 
> cleverer than I suspect it is, the gig is up.  The time window for the attack 
> could be as small as you want, and UMONITOR will catch it.

That would be an interesting line of attack. You would still have to
navigate a fair bit of noise.

Alan


Re: [PATCH] x86: entry: flush the cache if syscall error

2018-10-12 Thread Alan Cox
> But this really needs to be clarified.  Alan said that a bunch of the
> "yet another Spectre variant" attacks would have been mitigated by
> this patch.  An explanation of *how* would be in order.

Today you have the situation where something creates a speculative
disclosure gadget. So we run around and we try and guess where to fix
them all with lfence. If you miss one then it leaves a trace in the L1D
cache, which is what you measure.

In almost every case we have looked at when you leave a footprint in the
L1D you resolve to an error path so the syscall errors.

In other words every time we fail to find a

if (foo < limit) {
gadget(array[foo]);
} else
return -EINVAL;

we turn that from being an easy to use gadget into something really
tricky because by the time the code flow has gotten back to the caller
the breadcrumbs have been eaten by the L1D flush.

The current process of trying to find them all with smatch and the like
is a game of whack-a-mole that will go on for a long long time. In the
meantime (and until the tools get better) it's nice to have an option
that takes a totally non-hot path (the fast path change is a single test
for >= 0) and provides additional defence in depth.

They are not hot paths: when I started playing with this idea I did
indeed strace my entire desktop for a day. There are couple of other
cases I would add to the pass list (EAGAIN, EWOULDBLOCK)). Tracing
other stuff you see the same - real world workloads simply don't generate
vast spews of erroring syscalls.

Look just how many examples we are still committing like
de916736dddbd6061472969f667b14204aa9
7b6924d94a60c6b8c1279ca003e8744e6cd9e8b1
46feb6b495f7628a6dbf36c4e6d80faf378372d4
55690c07b44a82cc3359ce0c233f4ba7d80ba145


This approach turns the ones we miss into

if (type > = 

[speculatively create breadcrumb]

condition resolves, return -EINVAL

L1D flush

No breadcrumbs

At best you have a microscopic window to attack it on the SMT pair.

Alan


Re: [PATCH] x86: entry: flush the cache if syscall error

2018-10-12 Thread Alan Cox
> But this really needs to be clarified.  Alan said that a bunch of the
> "yet another Spectre variant" attacks would have been mitigated by
> this patch.  An explanation of *how* would be in order.

Today you have the situation where something creates a speculative
disclosure gadget. So we run around and we try and guess where to fix
them all with lfence. If you miss one then it leaves a trace in the L1D
cache, which is what you measure.

In almost every case we have looked at when you leave a footprint in the
L1D you resolve to an error path so the syscall errors.

In other words every time we fail to find a

if (foo < limit) {
gadget(array[foo]);
} else
return -EINVAL;

we turn that from being an easy to use gadget into something really
tricky because by the time the code flow has gotten back to the caller
the breadcrumbs have been eaten by the L1D flush.

The current process of trying to find them all with smatch and the like
is a game of whack-a-mole that will go on for a long long time. In the
meantime (and until the tools get better) it's nice to have an option
that takes a totally non-hot path (the fast path change is a single test
for >= 0) and provides additional defence in depth.

They are not hot paths: when I started playing with this idea I did
indeed strace my entire desktop for a day. There are couple of other
cases I would add to the pass list (EAGAIN, EWOULDBLOCK)). Tracing
other stuff you see the same - real world workloads simply don't generate
vast spews of erroring syscalls.

Look just how many examples we are still committing like
de916736dddbd6061472969f667b14204aa9
7b6924d94a60c6b8c1279ca003e8744e6cd9e8b1
46feb6b495f7628a6dbf36c4e6d80faf378372d4
55690c07b44a82cc3359ce0c233f4ba7d80ba145


This approach turns the ones we miss into

if (type > = 

[speculatively create breadcrumb]

condition resolves, return -EINVAL

L1D flush

No breadcrumbs

At best you have a microscopic window to attack it on the SMT pair.

Alan


Re: [PATCH v3 1/3] x86: baytrail/cherrytrail: Rework and move P-Unit PMIC bus semaphore code

2018-10-11 Thread Alan Cox
> 1) PMIC accesses often come in the form of a read-modify-write on one of
> the PMIC registers, we currently release the P-Unit's PMIC bus semaphore
> between the read and the write. If the P-Unit modifies the register during
> this window?, then we end up overwriting the P-Unit's changes.
> I believe that this is mostly an academic problem, but I'm not sure.

It should be.

> 2) To safely access the shared I2C bus, we need to do 3 things:
> a) Notify the GPU driver that we are starting a window in which it may not
> access the P-Unit, since the P-Unit seems to ignore the semaphore for
> explicit power-level requests made by the GPU driver

That's not what happens. It's more a problem of

We take the SEM
The GPU driver pokes the GPU
The GPU decides it wants to change the power situation
The GPU asks
It blocks on the SEM

and the system deadlocks.

> b) Make a pm_qos request to force all CPU cores out of C6/C7 since entering
> C6/C7 while we hold the semaphore hangs the SoC

Not just C6/C7 necessarily. We need to stop assorted transitions.

Given how horrible this lot was to debug originally do you have any
meaningful test data and performance numbers to justify it ? As an ahem
'feature' it's gone away in modern chips so is it worth the attention ?

Alan


Re: [PATCH v3 1/3] x86: baytrail/cherrytrail: Rework and move P-Unit PMIC bus semaphore code

2018-10-11 Thread Alan Cox
> 1) PMIC accesses often come in the form of a read-modify-write on one of
> the PMIC registers, we currently release the P-Unit's PMIC bus semaphore
> between the read and the write. If the P-Unit modifies the register during
> this window?, then we end up overwriting the P-Unit's changes.
> I believe that this is mostly an academic problem, but I'm not sure.

It should be.

> 2) To safely access the shared I2C bus, we need to do 3 things:
> a) Notify the GPU driver that we are starting a window in which it may not
> access the P-Unit, since the P-Unit seems to ignore the semaphore for
> explicit power-level requests made by the GPU driver

That's not what happens. It's more a problem of

We take the SEM
The GPU driver pokes the GPU
The GPU decides it wants to change the power situation
The GPU asks
It blocks on the SEM

and the system deadlocks.

> b) Make a pm_qos request to force all CPU cores out of C6/C7 since entering
> C6/C7 while we hold the semaphore hangs the SoC

Not just C6/C7 necessarily. We need to stop assorted transitions.

Given how horrible this lot was to debug originally do you have any
meaningful test data and performance numbers to justify it ? As an ahem
'feature' it's gone away in modern chips so is it worth the attention ?

Alan


Re: [PATCH] x86: entry: flush the cache if syscall error

2018-10-11 Thread Alan Cox
> Ugh.
> 
> What exactly is this trying to protect against?  And how many cycles

Most attacks by speculation rely upon leaving footprints in the L1 cache.
They also almost inevitably resolve non speculatively to errors. If you
look through all the 'yet another potential spectre case' patches people
have found they would have been rendered close to useless by this change.

It's a way to deal with the ones we don't know about, all the ones the
tools won't find and it has pretty much zero cost

(If you are bored strace an entire days desktop session, bang it through
a script or two to extract the number of triggerig error returns and do
the maths...)

> should we expect L1D_FLUSH to take?

More to the point you pretty much never trigger it. Errors are not the
normal path in real code. The original version of this code emptied the
L1 the hard way - and even then it was in the noise for real workloads we
tried.

You can argue that the other thread could be some evil task that
deliberately triggers flushes, but it can already thrash the L1 on
processors that share L1 between threads using perfectly normal memory
instructions.

Alan



Re: [PATCH] x86: entry: flush the cache if syscall error

2018-10-11 Thread Alan Cox
> Ugh.
> 
> What exactly is this trying to protect against?  And how many cycles

Most attacks by speculation rely upon leaving footprints in the L1 cache.
They also almost inevitably resolve non speculatively to errors. If you
look through all the 'yet another potential spectre case' patches people
have found they would have been rendered close to useless by this change.

It's a way to deal with the ones we don't know about, all the ones the
tools won't find and it has pretty much zero cost

(If you are bored strace an entire days desktop session, bang it through
a script or two to extract the number of triggerig error returns and do
the maths...)

> should we expect L1D_FLUSH to take?

More to the point you pretty much never trigger it. Errors are not the
normal path in real code. The original version of this code emptied the
L1 the hard way - and even then it was in the noise for real workloads we
tried.

You can argue that the other thread could be some evil task that
deliberately triggers flushes, but it can already thrash the L1 on
processors that share L1 between threads using perfectly normal memory
instructions.

Alan



Re: Insanely high baud rates

2018-10-11 Thread Alan Cox
> I'm mostly wondering if it is worth future-proofing for new transports. It 
> sounds like we can have a consensus on leaving the upper 4 bits of the speed 
> fields reserved, but leave the details of implementation for the future?

It seems reasonable, although I think the reality is that any future
transport is not going to be a true serial link, but some kind of serial
emulation layer. For those the speed really only matters to tell editors
and the like not to bother being clever.

I mean - what is the baud rate of a pty  ?

Alan


Re: Insanely high baud rates

2018-10-11 Thread Alan Cox
> I'm mostly wondering if it is worth future-proofing for new transports. It 
> sounds like we can have a consensus on leaving the upper 4 bits of the speed 
> fields reserved, but leave the details of implementation for the future?

It seems reasonable, although I think the reality is that any future
transport is not going to be a true serial link, but some kind of serial
emulation layer. For those the speed really only matters to tell editors
and the like not to bother being clever.

I mean - what is the baud rate of a pty  ?

Alan


Re: Insanely high baud rates

2018-10-10 Thread Alan Cox
On Tue, 9 Oct 2018 12:19:04 -0700
"H. Peter Anvin"  wrote:

> [Resending to a wider audience]
> 
> In trying to get the termios2 interface actually implemented in glibc,
> the question came up if we will ever care about baud rates in excess of
> 4 Gbps, even in the relatively remote future.

Even RS485 at 4MBits involves deep magic. I think we are fairly safe. Not
only that but our entire tty layer isn't capable of sustaining anything
even remotely in that range.

I think its non issue.

Alan


Re: Insanely high baud rates

2018-10-10 Thread Alan Cox
On Tue, 9 Oct 2018 12:19:04 -0700
"H. Peter Anvin"  wrote:

> [Resending to a wider audience]
> 
> In trying to get the termios2 interface actually implemented in glibc,
> the question came up if we will ever care about baud rates in excess of
> 4 Gbps, even in the relatively remote future.

Even RS485 at 4MBits involves deep magic. I think we are fairly safe. Not
only that but our entire tty layer isn't capable of sustaining anything
even remotely in that range.

I think its non issue.

Alan


Re: [PATCH v2 3/3] code-of-conduct: Add back the TAB as the central reporting point

2018-10-10 Thread Alan Cox
On Wed, 10 Oct 2018 13:10:30 -0700
> +appropriate to the circumstances. The TAB is obligated to maintain
> +confidentiality with regard to the reporter of an incident.

I would add (except where required by law.)

Alan


Re: [PATCH v2 3/3] code-of-conduct: Add back the TAB as the central reporting point

2018-10-10 Thread Alan Cox
On Wed, 10 Oct 2018 13:10:30 -0700
> +appropriate to the circumstances. The TAB is obligated to maintain
> +confidentiality with regard to the reporter of an incident.

I would add (except where required by law.)

Alan


Re: [Ksummit-discuss] [PATCH 2/2] code-of-conduct: Strip the enforcement paragraph pending community discussion

2018-10-10 Thread Alan Cox
On Wed, 10 Oct 2018 14:19:17 -0300
Mauro Carvalho Chehab  wrote:

> Em Wed, 10 Oct 2018 16:53:08 +0100
> Alan Cox  escreveu:
> 
> > > -Maintainers have the right and responsibility to remove, edit, or reject
> > > +Maintainers may remove, edit, or reject
> > >  comments, commits, code, wiki edits, issues, and other contributions 
> > > that are
> > >  not aligned to this Code of Conduct, or to ban temporarily or 
> > > permanently any
> > >  contributor for other behaviors that they deem inappropriate, 
> > > threatening,
> > > 
> > > The previous text seems too much legal for my taste.
> > > 
> > 
> > That is just as confusing. Maintainers have the right to remove, edit,
> > reject commits that *are* aligned with the code as well.  
> 
> Good point. Yeah, a maintainer can do whatever he thinks it is 
> appropriate for a patch - even when it follows the CoC.
> 
> > So what exactly is the point here ?  
> 
> The point is "responsibility" - that sounds like it is bounding a legal
> duty to a maintainer.

If you remove the responsibility aspect you might as well remove the
entire clause. It doesn't say anything as it's simply a subset of what
maintainers do anyway.

So how about

"Maintainers should remove, edit or reject..."

that keeps the sense that there should be pressure against abusive
behaviour.

except of course someone will attach a zero day exploit and fix to a
coc-violating rant and then you are a bit stuffed 8)

Alan


Re: [Ksummit-discuss] [PATCH 2/2] code-of-conduct: Strip the enforcement paragraph pending community discussion

2018-10-10 Thread Alan Cox
On Wed, 10 Oct 2018 14:19:17 -0300
Mauro Carvalho Chehab  wrote:

> Em Wed, 10 Oct 2018 16:53:08 +0100
> Alan Cox  escreveu:
> 
> > > -Maintainers have the right and responsibility to remove, edit, or reject
> > > +Maintainers may remove, edit, or reject
> > >  comments, commits, code, wiki edits, issues, and other contributions 
> > > that are
> > >  not aligned to this Code of Conduct, or to ban temporarily or 
> > > permanently any
> > >  contributor for other behaviors that they deem inappropriate, 
> > > threatening,
> > > 
> > > The previous text seems too much legal for my taste.
> > > 
> > 
> > That is just as confusing. Maintainers have the right to remove, edit,
> > reject commits that *are* aligned with the code as well.  
> 
> Good point. Yeah, a maintainer can do whatever he thinks it is 
> appropriate for a patch - even when it follows the CoC.
> 
> > So what exactly is the point here ?  
> 
> The point is "responsibility" - that sounds like it is bounding a legal
> duty to a maintainer.

If you remove the responsibility aspect you might as well remove the
entire clause. It doesn't say anything as it's simply a subset of what
maintainers do anyway.

So how about

"Maintainers should remove, edit or reject..."

that keeps the sense that there should be pressure against abusive
behaviour.

except of course someone will attach a zero day exploit and fix to a
coc-violating rant and then you are a bit stuffed 8)

Alan


Re: [Ksummit-discuss] [PATCH 2/2] code-of-conduct: Strip the enforcement paragraph pending community discussion

2018-10-10 Thread Alan Cox
> -Maintainers have the right and responsibility to remove, edit, or reject
> +Maintainers may remove, edit, or reject
>  comments, commits, code, wiki edits, issues, and other contributions that are
>  not aligned to this Code of Conduct, or to ban temporarily or permanently any
>  contributor for other behaviors that they deem inappropriate, threatening,
> 
> The previous text seems too much legal for my taste.
> 

That is just as confusing. Maintainers have the right to remove, edit,
reject commits that *are* aligned with the code as well. So what exactly
is the point here ?

Alan


Re: [Ksummit-discuss] [PATCH 2/2] code-of-conduct: Strip the enforcement paragraph pending community discussion

2018-10-10 Thread Alan Cox
> -Maintainers have the right and responsibility to remove, edit, or reject
> +Maintainers may remove, edit, or reject
>  comments, commits, code, wiki edits, issues, and other contributions that are
>  not aligned to this Code of Conduct, or to ban temporarily or permanently any
>  contributor for other behaviors that they deem inappropriate, threatening,
> 
> The previous text seems too much legal for my taste.
> 

That is just as confusing. Maintainers have the right to remove, edit,
reject commits that *are* aligned with the code as well. So what exactly
is the point here ?

Alan


Re: [Ksummit-discuss] [PATCH] code-of-conduct: Remove explicit list of discrimination factors

2018-10-08 Thread Alan Cox
> In any case, this is not the appropriate place for such patches, any
> more than it's the place for patches to the GPL.

I disagree. We had the GPLv2 or GPLv3 discussion on the kernel mailing
list. The syscall clarification was discussed on the list. The
EXPORT_SYMBOL and EXPORT_SYMBOL_GPL stuff was discussed on the list.

Alan


Re: [Ksummit-discuss] [PATCH] code-of-conduct: Remove explicit list of discrimination factors

2018-10-08 Thread Alan Cox
> In any case, this is not the appropriate place for such patches, any
> more than it's the place for patches to the GPL.

I disagree. We had the GPLv2 or GPLv3 discussion on the kernel mailing
list. The syscall clarification was discussed on the list. The
EXPORT_SYMBOL and EXPORT_SYMBOL_GPL stuff was discussed on the list.

Alan


Re: [Ksummit-discuss] [PATCH 2/2] code-of-conduct: Strip the enforcement paragraph pending community discussion

2018-10-08 Thread Alan Cox
 
> I happen to think that the fact that the TAB cannot compel where it
> cannot persuade is a huge strength of the system because it means
> there's no power structure to subvert if someone were interested in
> using it to try to impose their own viewpoint on the community.  But
> that's just my opinion and I did write the TAB charter, so I'm probably
> biased in this viewpoint.

The TAB can't handle it anyway because the privacy promise about
reporting is incompatible with reality for three reasons (and I bet there
are more)

1. Things like the EUCD can force almost all but the name to be revealed
to the person complained about as the tab has no legal privilege.
2. There are lots of laws in lots of locations where some allegations
*MUST* be reported to law enforcement.
3. We know from things like the catholic church debacle that serious
allegations need to be fast-pathed to the legal system - yet the privacy
promises are incompatible with that.

It ever got really nasty then the scenario that unfolds is potentially
the following

Developer A makes a complaint about developer B
Developer B's employer fires developer B

Developer B then uses things like the EUCD to force the TAB to provide
the complaint details (with personal data redacted) and the TAB has no
real defence as it's not legally privileged. 

Developer B then sues developer A, the TAB for all sorts of things, the
LF and their employer.

In court what's going to happen to the TAB ?

= Where is your written policy ?
= Who approved it and reviewed it for legal compliance ?
= What are your qualifications in this area ?
= Where are the full minutes of the decision ?
= Which of you work for rival companies ?
= What personal connections do or your frends have to A and B ?

Needless to say answers like 'we don't have one, nobody, none, umm I think
there's an email thread' are not going to go down well.

This sort of mess works with big company HR departments because they've
got lawyers and they have lots of written process. If it hits a court
then B's employer is able to point at all their rules and policies,
employment contracts etc. All of the decisions were either legally
privileged or minuted properly. The people who made the decisions have
appropriate professional qualifications.

The TAB can't enforce anything. If maintainers decide to carry on
accepting patches from someone what can they do ?

So both patches:

Reviewed-by: Alan Cox 




Re: [Ksummit-discuss] [PATCH 2/2] code-of-conduct: Strip the enforcement paragraph pending community discussion

2018-10-08 Thread Alan Cox
 
> I happen to think that the fact that the TAB cannot compel where it
> cannot persuade is a huge strength of the system because it means
> there's no power structure to subvert if someone were interested in
> using it to try to impose their own viewpoint on the community.  But
> that's just my opinion and I did write the TAB charter, so I'm probably
> biased in this viewpoint.

The TAB can't handle it anyway because the privacy promise about
reporting is incompatible with reality for three reasons (and I bet there
are more)

1. Things like the EUCD can force almost all but the name to be revealed
to the person complained about as the tab has no legal privilege.
2. There are lots of laws in lots of locations where some allegations
*MUST* be reported to law enforcement.
3. We know from things like the catholic church debacle that serious
allegations need to be fast-pathed to the legal system - yet the privacy
promises are incompatible with that.

It ever got really nasty then the scenario that unfolds is potentially
the following

Developer A makes a complaint about developer B
Developer B's employer fires developer B

Developer B then uses things like the EUCD to force the TAB to provide
the complaint details (with personal data redacted) and the TAB has no
real defence as it's not legally privileged. 

Developer B then sues developer A, the TAB for all sorts of things, the
LF and their employer.

In court what's going to happen to the TAB ?

= Where is your written policy ?
= Who approved it and reviewed it for legal compliance ?
= What are your qualifications in this area ?
= Where are the full minutes of the decision ?
= Which of you work for rival companies ?
= What personal connections do or your frends have to A and B ?

Needless to say answers like 'we don't have one, nobody, none, umm I think
there's an email thread' are not going to go down well.

This sort of mess works with big company HR departments because they've
got lawyers and they have lots of written process. If it hits a court
then B's employer is able to point at all their rules and policies,
employment contracts etc. All of the decisions were either legally
privileged or minuted properly. The people who made the decisions have
appropriate professional qualifications.

The TAB can't enforce anything. If maintainers decide to carry on
accepting patches from someone what can they do ?

So both patches:

Reviewed-by: Alan Cox 




Re: Leaking Path in XFS's ioctl interface(missing LSM check)

2018-10-01 Thread Alan Cox
> /* only root can play with this */
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
> 
> Think about it - if DM control ioctls only require CAP_SYS_ADMIN,
> then if have that cap you can use DM to remap any block in a block
> device to any other block. You don't need to the filesystem to move
> stuff around, it can be moved around without the filesystem knowing
> anything about it.

Yes - I am not surprised the XFS is not the only problem area. The fact
XFS also isn't going via the security hooks so security hooks can fix it
just makes it worse.

> > That's what people said about setuid shell scripts.  
> 
> Completely different. setuid shell scripts got abused as a hack for
> the lazy to avoid setting up permissions properly and hence were
> easily exploited.

Sounds to me like an accurate description of the current capabilities
mess in the kernel (and not just XFS and not just file systems)

> Systems restricted by LSMs to the point where CAP_SYS_ADMIN is not
> trusted have exactly the same issues. i.e. there's nobody trusted by
> the kernel to administer the storage stack, and nobody has defined a
> workable security model that can prevent untrusted users from
> violating the existing storage trust model

With a proper set of LSM checks you can lock the filesystem management
and enforcement to a particular set of objects. You can build that model
where for example only an administrative login from a trusted console may
launch processes to do that management.

Or you could - if things were not going around the LSM hooks.

Alan


Re: Leaking Path in XFS's ioctl interface(missing LSM check)

2018-10-01 Thread Alan Cox
> /* only root can play with this */
> if (!capable(CAP_SYS_ADMIN))
> return -EACCES;
> 
> Think about it - if DM control ioctls only require CAP_SYS_ADMIN,
> then if have that cap you can use DM to remap any block in a block
> device to any other block. You don't need to the filesystem to move
> stuff around, it can be moved around without the filesystem knowing
> anything about it.

Yes - I am not surprised the XFS is not the only problem area. The fact
XFS also isn't going via the security hooks so security hooks can fix it
just makes it worse.

> > That's what people said about setuid shell scripts.  
> 
> Completely different. setuid shell scripts got abused as a hack for
> the lazy to avoid setting up permissions properly and hence were
> easily exploited.

Sounds to me like an accurate description of the current capabilities
mess in the kernel (and not just XFS and not just file systems)

> Systems restricted by LSMs to the point where CAP_SYS_ADMIN is not
> trusted have exactly the same issues. i.e. there's nobody trusted by
> the kernel to administer the storage stack, and nobody has defined a
> workable security model that can prevent untrusted users from
> violating the existing storage trust model

With a proper set of LSM checks you can lock the filesystem management
and enforcement to a particular set of objects. You can build that model
where for example only an administrative login from a trusted console may
launch processes to do that management.

Or you could - if things were not going around the LSM hooks.

Alan


Re: Leaking Path in XFS's ioctl interface(missing LSM check)

2018-09-30 Thread Alan Cox
> > CAP_SYS_ADMIN is also a bit weird because low level access usually
> > implies you can bypass access controls so you should also check
> > CAP_SYS_DAC ?  
> 
> Do you mean CAP_DAC_READ_SEARCH as per the newer handle syscalls?
> But that only allows bypassing directory search operations, so maybe
> you mean CAP_DAC_OVERRIDE?

It depends what the ioctl allows you to do. If it allows me to bypass
DAC and manipulate the file system to move objects around then it's a
serious issue.

The underlying problem is if CAP_SYS_ADMIN is able to move objects around
then I can move modules around. We already have a problem with
CAP_DAC_OVERRIDE giving you CAP_SYS_RAWIO (ie totally owning the machine)
unless the modules are signed, if xfs allows ADMIN as well then
CAP_SYS_ADMIN is much easier to obtain and you'd get total system
ownership from it.

Not good.

> Regardless, this horse bolted long before those syscalls were
> introduced.  The time to address this issue was when XFS was merged
> into linux all those years ago, back when the apps that run in
> highly secure restricted environments that use these interfaces were
> being ported to linux. We can't change this now without breaking
> userspace

That's what people said about setuid shell scripts.

I'd like to understand better what can be done. We can argue afterwards
about what if anything to do about it and if it is possible to abuse it.

Alan


Re: Leaking Path in XFS's ioctl interface(missing LSM check)

2018-09-30 Thread Alan Cox
> > CAP_SYS_ADMIN is also a bit weird because low level access usually
> > implies you can bypass access controls so you should also check
> > CAP_SYS_DAC ?  
> 
> Do you mean CAP_DAC_READ_SEARCH as per the newer handle syscalls?
> But that only allows bypassing directory search operations, so maybe
> you mean CAP_DAC_OVERRIDE?

It depends what the ioctl allows you to do. If it allows me to bypass
DAC and manipulate the file system to move objects around then it's a
serious issue.

The underlying problem is if CAP_SYS_ADMIN is able to move objects around
then I can move modules around. We already have a problem with
CAP_DAC_OVERRIDE giving you CAP_SYS_RAWIO (ie totally owning the machine)
unless the modules are signed, if xfs allows ADMIN as well then
CAP_SYS_ADMIN is much easier to obtain and you'd get total system
ownership from it.

Not good.

> Regardless, this horse bolted long before those syscalls were
> introduced.  The time to address this issue was when XFS was merged
> into linux all those years ago, back when the apps that run in
> highly secure restricted environments that use these interfaces were
> being ported to linux. We can't change this now without breaking
> userspace

That's what people said about setuid shell scripts.

I'd like to understand better what can be done. We can argue afterwards
about what if anything to do about it and if it is possible to abuse it.

Alan


Re: [PATCH] x86/cpu: Rename Denverton and Gemini Lake

2018-09-28 Thread Alan Cox
> > SILVERMONT_CLIENT   0x37 Baytrail, Valleyview

There is no such product as Valleyview. Some things talk about
Valleyview 2 but shouldn't as that is Baytrail.

> /* "Small Core" Processors (Atom) */
> 
> #define INTEL_FAM6_ATOM_BONNELL   0x1C /* Diamondville, Pineview 
> */
> #define INTEL_FAM6_ATOM_BONNELL_MID   0x26 /* Silverthorne, Lincroft */
> 
> #define INTEL_FAM6_ATOM_SALTWELL  0x36 /* Cedarview */
> #define INTEL_FAM6_ATOM_SALTWELL_MID  0x27 /* Penwell */
> #define INTEL_FAM6_ATOM_SALTWELL_TABLET   0x35 /* Cloverview */
> 
> #define INTEL_FAM6_ATOM_SILVERMONT0x37 /* Bay Trail, Valleyview */
> #define INTEL_FAM6_ATOM_SILVERMONT_X  0x4D /* Avaton, Rangely */
> #define INTEL_FAM6_ATOM_SILVERMONT_MID0x4A /* Merriefield */
> 
> #define INTEL_FAM6_ATOM_AIRMONT   0x4C /* Cherry Trail, Braswell 
> */
> #define INTEL_FAM6_ATOM_AIRMONT_MID   0x5A /* Moorefield */
> 
> #define INTEL_FAM6_ATOM_GOLDMONT  0x5C /* Apollo Lake */
> #define INTEL_FAM6_ATOM_GOLDMONT_X0x5F /* Denvertor */
> #define INTEL_FAM6_ATOM_GOLDMONT_PLUS 0x7A /* Gemini Lake */

Can you decide if you are going to consistently use the -view or the
-trail naming ? I think for our use cases you mean Pinetrail, Cedartrail,
Clovertrail, Cherrytrail as you are talking about platform (at least in
the Intel sense of the word).

Alan


Re: [PATCH] x86/cpu: Rename Denverton and Gemini Lake

2018-09-28 Thread Alan Cox
> > SILVERMONT_CLIENT   0x37 Baytrail, Valleyview

There is no such product as Valleyview. Some things talk about
Valleyview 2 but shouldn't as that is Baytrail.

> /* "Small Core" Processors (Atom) */
> 
> #define INTEL_FAM6_ATOM_BONNELL   0x1C /* Diamondville, Pineview 
> */
> #define INTEL_FAM6_ATOM_BONNELL_MID   0x26 /* Silverthorne, Lincroft */
> 
> #define INTEL_FAM6_ATOM_SALTWELL  0x36 /* Cedarview */
> #define INTEL_FAM6_ATOM_SALTWELL_MID  0x27 /* Penwell */
> #define INTEL_FAM6_ATOM_SALTWELL_TABLET   0x35 /* Cloverview */
> 
> #define INTEL_FAM6_ATOM_SILVERMONT0x37 /* Bay Trail, Valleyview */
> #define INTEL_FAM6_ATOM_SILVERMONT_X  0x4D /* Avaton, Rangely */
> #define INTEL_FAM6_ATOM_SILVERMONT_MID0x4A /* Merriefield */
> 
> #define INTEL_FAM6_ATOM_AIRMONT   0x4C /* Cherry Trail, Braswell 
> */
> #define INTEL_FAM6_ATOM_AIRMONT_MID   0x5A /* Moorefield */
> 
> #define INTEL_FAM6_ATOM_GOLDMONT  0x5C /* Apollo Lake */
> #define INTEL_FAM6_ATOM_GOLDMONT_X0x5F /* Denvertor */
> #define INTEL_FAM6_ATOM_GOLDMONT_PLUS 0x7A /* Gemini Lake */

Can you decide if you are going to consistently use the -view or the
-trail naming ? I think for our use cases you mean Pinetrail, Cedartrail,
Clovertrail, Cherrytrail as you are talking about platform (at least in
the Intel sense of the word).

Alan


Re: Code of Conduct: Let's revamp it.

2018-09-28 Thread Alan Cox
> Well, then I have to repeat myself: Signed-off source code (in form of
> patches) in a well-known programming language for a (nowadays)
> well-known GPLv2 licensed project mailed on "everyone can subscribe"
> mailinglists, (thus) to be found in several $SEARCH_ENGINE-indexed
> mailinglist archives, if accepted to be found in lots of publicly
> accessible git repos can be not intended to be published?
> 
> I wonder what else must happen.

There is a bigger problem in the ambiguity.

It's easy to deal with signed off by lines because I had the sense to
make sure that the DCO covered us for EU data protection and thus it's
explicit.

It's relatively easy to deal with the case of 'I contributed some code'.

It's really not at all obvious what happens with 'I got some code from
another project that contains it's authors name'.

The wording IMHO just needs tightening up - and that's a useful
discussion that ought to he bad. I tihnk everyone understands the *inent*
of such wording - don't go around doxing people, or posting their home
address on facebook and calling for people to attend with pitchforks.

There's a second related area that needs sorting out in wording which is
the implication of any kind of privacy in a complaint - which is really
bad in two ways

As it is set up now the tab is not a lawyer so the tab could not claim
any kind of legal privilege. That means in the event of a complaint the
tab would be powerless not to release almost all the info in the
complaint if hit by a data protectin request in many jurisdictions. Sure
they'd have to (and be required to) remove some of the information that
might identify the complainant.

Secondly one thing that we've learned repeatedly (and notably from the
church scandals) is that there are some complaints that should upon
receipt be handed directly to law enforcement, but there is no carve out
for this.

The other issue is that whoever handles any complaint system needs a
budget and lawyers because they will potentially have to field judicial
reviews and other challenges. That means the TAB needs to have
exemplary record keeping and process because anyone who stands up in a
legal challenge and says 'Umm.. we read it and talked about it and kind
of decided X but I don't remember why and there are no minutes and there
is on process document' is going to get fried. Someone needs to have that
process in place well in advance.

Alan


Re: Code of Conduct: Let's revamp it.

2018-09-28 Thread Alan Cox
> Well, then I have to repeat myself: Signed-off source code (in form of
> patches) in a well-known programming language for a (nowadays)
> well-known GPLv2 licensed project mailed on "everyone can subscribe"
> mailinglists, (thus) to be found in several $SEARCH_ENGINE-indexed
> mailinglist archives, if accepted to be found in lots of publicly
> accessible git repos can be not intended to be published?
> 
> I wonder what else must happen.

There is a bigger problem in the ambiguity.

It's easy to deal with signed off by lines because I had the sense to
make sure that the DCO covered us for EU data protection and thus it's
explicit.

It's relatively easy to deal with the case of 'I contributed some code'.

It's really not at all obvious what happens with 'I got some code from
another project that contains it's authors name'.

The wording IMHO just needs tightening up - and that's a useful
discussion that ought to he bad. I tihnk everyone understands the *inent*
of such wording - don't go around doxing people, or posting their home
address on facebook and calling for people to attend with pitchforks.

There's a second related area that needs sorting out in wording which is
the implication of any kind of privacy in a complaint - which is really
bad in two ways

As it is set up now the tab is not a lawyer so the tab could not claim
any kind of legal privilege. That means in the event of a complaint the
tab would be powerless not to release almost all the info in the
complaint if hit by a data protectin request in many jurisdictions. Sure
they'd have to (and be required to) remove some of the information that
might identify the complainant.

Secondly one thing that we've learned repeatedly (and notably from the
church scandals) is that there are some complaints that should upon
receipt be handed directly to law enforcement, but there is no carve out
for this.

The other issue is that whoever handles any complaint system needs a
budget and lawyers because they will potentially have to field judicial
reviews and other challenges. That means the TAB needs to have
exemplary record keeping and process because anyone who stands up in a
legal challenge and says 'Umm.. we read it and talked about it and kind
of decided X but I don't remember why and there are no minutes and there
is on process document' is going to get fried. Someone needs to have that
process in place well in advance.

Alan


Re: Howto prevent kernel from evicting code pages ever? (to avoid disk thrashing when about to run out of RAM)

2018-09-28 Thread Alan Cox
On Wed, 22 Aug 2018 11:25:35 +0200
Marcus Linsner  wrote:

> Hi. How to make the kernel keep(lock?) all code pages in RAM so that
> kswapd0 won't evict them when the system is under low memory
> conditions ?
> 
> The purpose of this is to prevent the kernel from causing lots of disk
> reads(effectively freezing the whole system) when about to run out of
> RAM, even when there is no swap enabled, but well before(in real time
> minutes) OOM-killer triggers to kill the offending process (eg. ld)!

Having no swap is not helping you at all.

In Linux you can do several things. Firstly add some swap - even a
swap file because if you have no swap you fill up memory with pages that
are not backed by disk and the kernel has to pick less and less optimal
things to swap out so begins to thrash. Even slowish swap is better than
no swap as it can dump out little used data pages.

You can tune the OOM killer to taste and you can even guide it on what to
shoot first.

You can use cgroups to constrain the resources some group of things are
allowed to use.

You can play with no overcommit mode, although that is much more about
'cannot fail' embedded applications usually. In that mode the kernel
tightly constrains the resource overcommit permissible. It's very
conservative and you end up needing a lot of 'just in case' wasted
resource, although you can tune the amount you leverage the real
resources.

To be fair Linux *is* really bad at handling this case. What other systems
did (being older and from the days where RAM wasn't reasonably assumed
infinite) was two fold.  The first was under high swap load to switch to
swapping out entire processes, which with all the shared resources and
fast I/O today isn't quite so relevant. The second was that it would
ensure a process got a certain amount of real CPU time before it's pages
could be booted out again (and would then boot out lots of them). That
turns the thrashing into forward progress but still feels unpleasant.
However you still need swap or you don't have anywhere to boot out all
the dirty non code pages in order to manage progress.

There is a reason swap exists. If you don't have enough RAM to run
smoothly without swap, add swap (or RAM). Even then some things usually
need swap - I've got things that make the compiler consume over 16GB
building one file. With swap it's fine even on a 4GB machine.

Alan


Re: Howto prevent kernel from evicting code pages ever? (to avoid disk thrashing when about to run out of RAM)

2018-09-28 Thread Alan Cox
On Wed, 22 Aug 2018 11:25:35 +0200
Marcus Linsner  wrote:

> Hi. How to make the kernel keep(lock?) all code pages in RAM so that
> kswapd0 won't evict them when the system is under low memory
> conditions ?
> 
> The purpose of this is to prevent the kernel from causing lots of disk
> reads(effectively freezing the whole system) when about to run out of
> RAM, even when there is no swap enabled, but well before(in real time
> minutes) OOM-killer triggers to kill the offending process (eg. ld)!

Having no swap is not helping you at all.

In Linux you can do several things. Firstly add some swap - even a
swap file because if you have no swap you fill up memory with pages that
are not backed by disk and the kernel has to pick less and less optimal
things to swap out so begins to thrash. Even slowish swap is better than
no swap as it can dump out little used data pages.

You can tune the OOM killer to taste and you can even guide it on what to
shoot first.

You can use cgroups to constrain the resources some group of things are
allowed to use.

You can play with no overcommit mode, although that is much more about
'cannot fail' embedded applications usually. In that mode the kernel
tightly constrains the resource overcommit permissible. It's very
conservative and you end up needing a lot of 'just in case' wasted
resource, although you can tune the amount you leverage the real
resources.

To be fair Linux *is* really bad at handling this case. What other systems
did (being older and from the days where RAM wasn't reasonably assumed
infinite) was two fold.  The first was under high swap load to switch to
swapping out entire processes, which with all the shared resources and
fast I/O today isn't quite so relevant. The second was that it would
ensure a process got a certain amount of real CPU time before it's pages
could be booted out again (and would then boot out lots of them). That
turns the thrashing into forward progress but still feels unpleasant.
However you still need swap or you don't have anywhere to boot out all
the dirty non code pages in order to manage progress.

There is a reason swap exists. If you don't have enough RAM to run
smoothly without swap, add swap (or RAM). Even then some things usually
need swap - I've got things that make the compiler consume over 16GB
building one file. With swap it's fine even on a 4GB machine.

Alan


Re: Leaking path for set_task_comm

2018-09-28 Thread Alan Cox


> Trying to depend on task name for anything security sensitive is at
> _really_ bad idea, so it seems unlikely that a LSM would want to
> protect the process name.  (And if they did, the first thing I would
> ask is "Why?  What are you trying to do?  Do you realize how many
> *other* ways the process name can be spoofed or otherwise controlled
> by a potentially malicious user?")

Two processes that should not be able to otherwise communicate can keep
changing their name to a chunk of data, waiting for an ack flag name
change back.

Alan


Re: Leaking path for set_task_comm

2018-09-28 Thread Alan Cox


> Trying to depend on task name for anything security sensitive is at
> _really_ bad idea, so it seems unlikely that a LSM would want to
> protect the process name.  (And if they did, the first thing I would
> ask is "Why?  What are you trying to do?  Do you realize how many
> *other* ways the process name can be spoofed or otherwise controlled
> by a potentially malicious user?")

Two processes that should not be able to otherwise communicate can keep
changing their name to a chunk of data, waiting for an ack flag name
change back.

Alan


Re: POSIX violation by writeback error

2018-09-27 Thread Alan Cox
On Wed, 26 Sep 2018 17:49:09 -0400
"Theodore Y. Ts'o"  wrote:

> On Wed, Sep 26, 2018 at 07:10:55PM +0100, Alan Cox wrote:
> > In almost all cases you don't care so you wouldn't use it. In those cases
> > where it might matter it's almost always the case that a reader won't
> > consume it before it hits the media.
> > 
> > That's why I suggested having an fbarrier() so you can explicitly say 'in
> > the even that case does happen then stall and write it'. It's kind of
> > lazy fsync. That can be used with almost no cost by things like mail
> > daemons.  
> 
> How could mail daemons use it?  They *have* to do an fsync() before
> they send a 2xx SMTP return code.

Point - so actually it would be less useful

> > Another way given that this only really makes sense with locks
> > is to add that fbarrier notion as a file locking optional semantic so you
> > can 'unlock with barrier' and 'lock with barrier honoured'  
> 
> I'm not sure what you're suggesting?

If someone has an actual case you could in theory constrain it to a range
specified in a file lock and only between two people who care. That said
seems like a lot of complexity to make a case nobody cares about only
affect people who care about it

Alan


Re: POSIX violation by writeback error

2018-09-27 Thread Alan Cox
On Wed, 26 Sep 2018 17:49:09 -0400
"Theodore Y. Ts'o"  wrote:

> On Wed, Sep 26, 2018 at 07:10:55PM +0100, Alan Cox wrote:
> > In almost all cases you don't care so you wouldn't use it. In those cases
> > where it might matter it's almost always the case that a reader won't
> > consume it before it hits the media.
> > 
> > That's why I suggested having an fbarrier() so you can explicitly say 'in
> > the even that case does happen then stall and write it'. It's kind of
> > lazy fsync. That can be used with almost no cost by things like mail
> > daemons.  
> 
> How could mail daemons use it?  They *have* to do an fsync() before
> they send a 2xx SMTP return code.

Point - so actually it would be less useful

> > Another way given that this only really makes sense with locks
> > is to add that fbarrier notion as a file locking optional semantic so you
> > can 'unlock with barrier' and 'lock with barrier honoured'  
> 
> I'm not sure what you're suggesting?

If someone has an actual case you could in theory constrain it to a range
specified in a file lock and only between two people who care. That said
seems like a lot of complexity to make a case nobody cares about only
affect people who care about it

Alan


Re: Leaking Path in XFS's ioctl interface(missing LSM check)

2018-09-26 Thread Alan Cox
On Wed, 26 Sep 2018 11:33:29 +1000
Dave Chinner  wrote:

> On Tue, Sep 25, 2018 at 08:51:50PM -0400, TongZhang wrote:
> > Hi,
> > 
> > I'm bringing up this issue again to let of LSM developers know the 
> > situation, and would like to know your thoughts.
> > Several weeks ago I sent an email to the security list to discuss the issue 
> > where
> > XFS's ioctl interface can do things like vfs_readlink without asking LSM's
> > permission, which we think is kind of weird and this kind of operation 
> > should be
> > audited by LSM.  
> 
> These aren't user interfaces. They are filesystem maintenance and
> extension interfaces.  They are intended for low level filesystem
> utilities that require complete, unrestricted access to the
> underlying filesystem via holding CAP_SYSADMIN in the initns.

CAP_SYS_ADMIN is meaningless in an environment using a strong LSM set up.
So what if I have CAP_SYS_ADMIN ? In a secure environment low level
complete unrestricted access to the file system is most definitely
something that should be mediated.

CAP_SYS_ADMIN is also a bit weird because low level access usually
implies you can bypass access controls so you should also check
CAP_SYS_DAC ?

Alan


Re: Leaking Path in XFS's ioctl interface(missing LSM check)

2018-09-26 Thread Alan Cox
On Wed, 26 Sep 2018 11:33:29 +1000
Dave Chinner  wrote:

> On Tue, Sep 25, 2018 at 08:51:50PM -0400, TongZhang wrote:
> > Hi,
> > 
> > I'm bringing up this issue again to let of LSM developers know the 
> > situation, and would like to know your thoughts.
> > Several weeks ago I sent an email to the security list to discuss the issue 
> > where
> > XFS's ioctl interface can do things like vfs_readlink without asking LSM's
> > permission, which we think is kind of weird and this kind of operation 
> > should be
> > audited by LSM.  
> 
> These aren't user interfaces. They are filesystem maintenance and
> extension interfaces.  They are intended for low level filesystem
> utilities that require complete, unrestricted access to the
> underlying filesystem via holding CAP_SYSADMIN in the initns.

CAP_SYS_ADMIN is meaningless in an environment using a strong LSM set up.
So what if I have CAP_SYS_ADMIN ? In a secure environment low level
complete unrestricted access to the file system is most definitely
something that should be mediated.

CAP_SYS_ADMIN is also a bit weird because low level access usually
implies you can bypass access controls so you should also check
CAP_SYS_DAC ?

Alan


Re: POSIX violation by writeback error

2018-09-26 Thread Alan Cox
> And I think that's fine.  The only way we can make any guarantees is
> if we do what Alan suggested, which is to imply that a read on a dirty
> page *block* until the the page is successfully written back.  This
> would destroy performance.

In almost all cases you don't care so you wouldn't use it. In those cases
where it might matter it's almost always the case that a reader won't
consume it before it hits the media.

That's why I suggested having an fbarrier() so you can explicitly say 'in
the even that case does happen then stall and write it'. It's kind of
lazy fsync. That can be used with almost no cost by things like mail
daemons. Another way given that this only really makes sense with locks
is to add that fbarrier notion as a file locking optional semantic so you
can 'unlock with barrier' and 'lock with barrier honoured'

Alan


Re: POSIX violation by writeback error

2018-09-26 Thread Alan Cox
> And I think that's fine.  The only way we can make any guarantees is
> if we do what Alan suggested, which is to imply that a read on a dirty
> page *block* until the the page is successfully written back.  This
> would destroy performance.

In almost all cases you don't care so you wouldn't use it. In those cases
where it might matter it's almost always the case that a reader won't
consume it before it hits the media.

That's why I suggested having an fbarrier() so you can explicitly say 'in
the even that case does happen then stall and write it'. It's kind of
lazy fsync. That can be used with almost no cost by things like mail
daemons. Another way given that this only really makes sense with locks
is to add that fbarrier notion as a file locking optional semantic so you
can 'unlock with barrier' and 'lock with barrier honoured'

Alan


Re: POSIX violation by writeback error

2018-09-25 Thread Alan Cox
> Unlike O_TMPFILE, this would require file system changes to support,
> so maybe it's not worth having something which automatically cleans up
> files that were in the middle of being written at the time of a system
> crash.

Would it. If you open a file unlink it and write to it and then have a
linkf(fd, path); your underlying fs behaviour isn't really changed it's
just you are allowed to name a file late ?

Alan


Re: POSIX violation by writeback error

2018-09-25 Thread Alan Cox
> Unlike O_TMPFILE, this would require file system changes to support,
> so maybe it's not worth having something which automatically cleans up
> files that were in the middle of being written at the time of a system
> crash.

Would it. If you open a file unlink it and write to it and then have a
linkf(fd, path); your underlying fs behaviour isn't really changed it's
just you are allowed to name a file late ?

Alan


Re: POSIX violation by writeback error

2018-09-24 Thread Alan Cox
> write()
> kernel attempts to write back page and fails
> page is marked clean and evicted from the cache
> read()
> 
> Now your write is gone and there were no calls between the write and
> read.
> 
> The question we still need to answer is this:
> 
> When we attempt to write back some data from the cache and that fails,
> what should happen to the dirty pages?

Why do you care about the content of the pages at that point. The only
options are to use the data (todays model), or to report that you are on
fire.

If you are going to error you don't need to use the data so you could in
fact compress dramatically the amount of stuff you need to save
somewhere. You need the page information so you can realize what page
this is, but you can point the data into oblivion somewhere because you
are no longer going to give it to anyone (assuming you can successfully
force unmap it from everyone once it's not locked by a DMA or similar).

In the real world though it's fairly unusual to just lose a bit of I/O.
Flash devices in particular have a nasty tendancy to simply go *poof* and
the first you know about an I/O error is the last data the drive ever
gives you short of jtag. NFS is an exception and NFS soft timeouts are
nasty.

Alan


Re: POSIX violation by writeback error

2018-09-24 Thread Alan Cox
> write()
> kernel attempts to write back page and fails
> page is marked clean and evicted from the cache
> read()
> 
> Now your write is gone and there were no calls between the write and
> read.
> 
> The question we still need to answer is this:
> 
> When we attempt to write back some data from the cache and that fails,
> what should happen to the dirty pages?

Why do you care about the content of the pages at that point. The only
options are to use the data (todays model), or to report that you are on
fire.

If you are going to error you don't need to use the data so you could in
fact compress dramatically the amount of stuff you need to save
somewhere. You need the page information so you can realize what page
this is, but you can point the data into oblivion somewhere because you
are no longer going to give it to anyone (assuming you can successfully
force unmap it from everyone once it's not locked by a DMA or similar).

In the real world though it's fairly unusual to just lose a bit of I/O.
Flash devices in particular have a nasty tendancy to simply go *poof* and
the first you know about an I/O error is the last data the drive ever
gives you short of jtag. NFS is an exception and NFS soft timeouts are
nasty.

Alan


Re: POSIX violation by writeback error

2018-09-24 Thread Alan Cox
> The other thing which you seem to be assuming is that applications
> which care about precious data won't use fsync(2).  And in general,
> it's been fairly well known for decades that if you care about your
> data, you have to use fsync(2) or O_DIRECT writes; and you *must*
> check the error return of both the fsync(2) and the close(2) system
> calls.  Emacs got that right in the mid-1980's --- over 30 years ago.
> We mocked GNOME and KDE's toy notepad applications for getting this
> wrong a decade ago, and they've since fixed it.

That's also because our fsync no longer sucks rocks. It used to be
possible for a box under heavy disk I/O to take minutes to fsync a file
because our disk scheduling was so awful (hours if doing a backup to USB
stick).

The problem I think actually is a bit different. There isn't an

int fbarrier(int fd, ...);

call with more relaxed semantics so that you can say 'what I have done so
far must not be consumed by a reader until we are sure it is stable, but
I don't actually need it to hit disk right now'. That's just a flag on
buffers saying 'if we try to read this, make sure we write it out first',
and the flag is cleared as the buffer hits media in writeback.

All of this is still probabilities. I can do all the fsync's I like,
consume the stable data, cause actions over the network like sending
people goods, and then the server is destroyed by a power surge.
Transactions are a higher level concept and the kernel can't fix that.

Alan


Re: POSIX violation by writeback error

2018-09-24 Thread Alan Cox
> The other thing which you seem to be assuming is that applications
> which care about precious data won't use fsync(2).  And in general,
> it's been fairly well known for decades that if you care about your
> data, you have to use fsync(2) or O_DIRECT writes; and you *must*
> check the error return of both the fsync(2) and the close(2) system
> calls.  Emacs got that right in the mid-1980's --- over 30 years ago.
> We mocked GNOME and KDE's toy notepad applications for getting this
> wrong a decade ago, and they've since fixed it.

That's also because our fsync no longer sucks rocks. It used to be
possible for a box under heavy disk I/O to take minutes to fsync a file
because our disk scheduling was so awful (hours if doing a backup to USB
stick).

The problem I think actually is a bit different. There isn't an

int fbarrier(int fd, ...);

call with more relaxed semantics so that you can say 'what I have done so
far must not be consumed by a reader until we are sure it is stable, but
I don't actually need it to hit disk right now'. That's just a flag on
buffers saying 'if we try to read this, make sure we write it out first',
and the flag is cleared as the buffer hits media in writeback.

All of this is still probabilities. I can do all the fsync's I like,
consume the stable data, cause actions over the network like sending
people goods, and then the server is destroyed by a power surge.
Transactions are a higher level concept and the kernel can't fix that.

Alan


Re: POSIX violation by writeback error

2018-09-24 Thread Alan Cox
On Thu, 6 Sep 2018 11:17:18 +0200
Rogier Wolff  wrote:

> On Thu, Sep 06, 2018 at 12:57:09PM +1000, Dave Chinner wrote:
> > On Wed, Sep 05, 2018 at 02:07:46PM +0200, Rogier Wolff wrote:  
> 
> > > And this has worked for years because
> > > the kernel caches stuff from inodes and data-blocks. If you suddenly
> > > write stuff to harddisk at 10ms for each seek between inode area and
> > > data-area..  
> > 
> > You're assuming an awful lot about filesystem implementation here.
> > Neither ext4, btrfs or XFS issue physical IO like this when flushing
> > data.  
> 
> My thinking is: When fsync (implicit or explicit)  needs to know 
> the result of the underlying IO, it needs to wait for it to have
> happened.

Worse than that. In many cases it needs to wait for the I/O command to
have been accepted and confirmed by the drive, then tell the disk to do a
commit to physical media, then see if that blows up. A confirmation the
disk got the data is not a confirmation that it's stable. Your disk can
also reply from its internal cache with data that will fail to hit the
media a few seconds later.

Given a cache flush on an ATA disk can take 7 seconds I'm not fond of it
8) Fortunately spinning rust is on the way out.

It's even uglier in truth. Spinning rust rewrites sectors under you
by magic without your knowledge and in freaky cases you can have data
turn error that you've not even touched this month. Flash has some
similar behaviour although it can at least use a supercap to do real work.

You can also issue things like a single 16K write and have only the last
8K succeed and the drive report an error, which freaks out some supposedly
robust techniques.

Alan


Re: POSIX violation by writeback error

2018-09-24 Thread Alan Cox
On Thu, 6 Sep 2018 11:17:18 +0200
Rogier Wolff  wrote:

> On Thu, Sep 06, 2018 at 12:57:09PM +1000, Dave Chinner wrote:
> > On Wed, Sep 05, 2018 at 02:07:46PM +0200, Rogier Wolff wrote:  
> 
> > > And this has worked for years because
> > > the kernel caches stuff from inodes and data-blocks. If you suddenly
> > > write stuff to harddisk at 10ms for each seek between inode area and
> > > data-area..  
> > 
> > You're assuming an awful lot about filesystem implementation here.
> > Neither ext4, btrfs or XFS issue physical IO like this when flushing
> > data.  
> 
> My thinking is: When fsync (implicit or explicit)  needs to know 
> the result of the underlying IO, it needs to wait for it to have
> happened.

Worse than that. In many cases it needs to wait for the I/O command to
have been accepted and confirmed by the drive, then tell the disk to do a
commit to physical media, then see if that blows up. A confirmation the
disk got the data is not a confirmation that it's stable. Your disk can
also reply from its internal cache with data that will fail to hit the
media a few seconds later.

Given a cache flush on an ATA disk can take 7 seconds I'm not fond of it
8) Fortunately spinning rust is on the way out.

It's even uglier in truth. Spinning rust rewrites sectors under you
by magic without your knowledge and in freaky cases you can have data
turn error that you've not even touched this month. Flash has some
similar behaviour although it can at least use a supercap to do real work.

You can also issue things like a single 16K write and have only the last
8K succeed and the drive report an error, which freaks out some supposedly
robust techniques.

Alan


Re: arc vendor prefix

2018-09-20 Thread Alan Cox
On Tue, 18 Sep 2018 21:05:20 -0400
Brian Dodge  wrote:

> Hi,
> 
> In the linux kernel commit 91ab076e3a2f092254fe5231bbfa92b37fd52e38 the 
> vendor prefix "arctic" was added to vendor-prefixes.txt.
> 
> The original change I authored used "arc" not "arctic", and the device 
> tree bindings were added assuming that prefix in commit 
> ce9d22573d85341c987f997461ac712e4dbe47b1
> 
> Perhaps odeju changed the name to arctic in the patch after I created it 
> and before it was approved.
> 
> This pairing isn't really compatible is it?  Either the prefix should be 
> reverted to "arc" or the bindings (and driver source) changed to use 
> "arctic", as in "arctic,arc2c0608" vs. "arc,arc2c0608"

ARC is a Synopsys registered trademark for the ARC processor line. I'm not
sure what the right answer is but I think it might be better to keep to
arctic to avoid confusion with arc based products.

Alan


Re: arc vendor prefix

2018-09-20 Thread Alan Cox
On Tue, 18 Sep 2018 21:05:20 -0400
Brian Dodge  wrote:

> Hi,
> 
> In the linux kernel commit 91ab076e3a2f092254fe5231bbfa92b37fd52e38 the 
> vendor prefix "arctic" was added to vendor-prefixes.txt.
> 
> The original change I authored used "arc" not "arctic", and the device 
> tree bindings were added assuming that prefix in commit 
> ce9d22573d85341c987f997461ac712e4dbe47b1
> 
> Perhaps odeju changed the name to arctic in the patch after I created it 
> and before it was approved.
> 
> This pairing isn't really compatible is it?  Either the prefix should be 
> reverted to "arc" or the bindings (and driver source) changed to use 
> "arctic", as in "arctic,arc2c0608" vs. "arc,arc2c0608"

ARC is a Synopsys registered trademark for the ARC processor line. I'm not
sure what the right answer is but I think it might be better to keep to
arctic to avoid confusion with arc based products.

Alan


Re: 32-bit PTI with THP = userspace corruption

2018-09-18 Thread Alan Cox
On Tue, 11 Sep 2018 14:12:22 +0200
Joerg Roedel  wrote:

> On Tue, Sep 11, 2018 at 02:58:10PM +0300, Meelis Roos wrote:
> > The machines where I have PAE off are the ones that have less memory. 
> > PAE is off just for performance reasons, not lack of PAE. PAE should be 
> > present on all of my affected machines anyway and current distributions 
> > seem to mostly assume 686 and PAE anyway for 32-bit systems.  
> 
> Right, most distributions don't even provide a non-PAE kernel for their
> users anymore.
> 
> How big is the performance impact of using PAE over legacy paging?

On what system. In the days of the original 36bit PAE Xeons it was around
10% when we measured it at Red Hat, but that was long ago and as you go
newer it really ought to be vanishingly small.

There are pretty much no machines that don't support PAE and are still
even vaguely able to boot a modern Linux kernel. The oddity is the
Pentium-M but most distros shipped a hack to use PAE on the Pentium M
anyway as it seems to work fine.

Alan



Re: 32-bit PTI with THP = userspace corruption

2018-09-18 Thread Alan Cox
On Tue, 11 Sep 2018 14:12:22 +0200
Joerg Roedel  wrote:

> On Tue, Sep 11, 2018 at 02:58:10PM +0300, Meelis Roos wrote:
> > The machines where I have PAE off are the ones that have less memory. 
> > PAE is off just for performance reasons, not lack of PAE. PAE should be 
> > present on all of my affected machines anyway and current distributions 
> > seem to mostly assume 686 and PAE anyway for 32-bit systems.  
> 
> Right, most distributions don't even provide a non-PAE kernel for their
> users anymore.
> 
> How big is the performance impact of using PAE over legacy paging?

On what system. In the days of the original 36bit PAE Xeons it was around
10% when we measured it at Red Hat, but that was long ago and as you go
newer it really ought to be vanishingly small.

There are pretty much no machines that don't support PAE and are still
even vaguely able to boot a modern Linux kernel. The oddity is the
Pentium-M but most distros shipped a hack to use PAE on the Pentium M
anyway as it seems to work fine.

Alan



Re: [PATCH 0/4] Add specific vt input's key map

2018-09-12 Thread Alan Cox
On Tue, 11 Sep 2018 22:23:55 +0200
Remi Pommarel  wrote:

> This patchset adds a way to have a specific keyboard config (i.e.
> keycode to keysym map) for a vt attached input.

Who actually needs this given that you can't even render most
international symbols in text mode and X and friends already deal with
mapping of keyboards at this degree of fine-ness and even more so ?

> - kbd_detach_conf is a bit clumsy because it tries to copy a shared sparse
>   pointer array without using GFP_ATOMIC.

It's not a performance critical path at least. You could look at
switching it to use RCU but firstly I think the question is whether
anyone cares enough to bother putting it in the upstream kernel ?

Alan


Re: [PATCH 0/4] Add specific vt input's key map

2018-09-12 Thread Alan Cox
On Tue, 11 Sep 2018 22:23:55 +0200
Remi Pommarel  wrote:

> This patchset adds a way to have a specific keyboard config (i.e.
> keycode to keysym map) for a vt attached input.

Who actually needs this given that you can't even render most
international symbols in text mode and X and friends already deal with
mapping of keyboards at this degree of fine-ness and even more so ?

> - kbd_detach_conf is a bit clumsy because it tries to copy a shared sparse
>   pointer array without using GFP_ATOMIC.

It's not a performance critical path at least. You could look at
switching it to use RCU but firstly I think the question is whether
anyone cares enough to bother putting it in the upstream kernel ?

Alan


Re: IRQ number question.

2018-09-03 Thread Alan Cox
On Mon, 3 Sep 2018 19:16:39 +0200
Rogier Wolff  wrote:

> Hi, 
> 
> I'm writing a kernel driver. It is not going to be widely used, so I'm
> not motivated to make things nice enough for inclusion in the standard
> kernel.
> 
> But lspci shows my device: 
> 
> 03:01.0 Serial bus controller [0c80]: Phoenix Contact GmbH & Co. Device 0002 
> (rev b7)
> Flags: bus master, stepping, medium devsel, latency 32, IRQ 14
> I/O ports at e070 [size=16]
> Memory at f7d0 (32-bit, non-prefetchable) [size=256K]
> 
> Now when I start my module and prod the device a bit, it will generate
> an interrupt. (in this case the monitor program needs to start sending
> messages to the card.)
> 
> Then the kernel reports: 
> 
> irq 18: nobody cared (try booting with the "irqpoll" option)
> 
> I've been writing device drivers in the past, but in the past
> when the lspci listed "IRQ 14" then I'd have to request_irq (14, ...

The IRQ number in the PCI configuration space is just a label really for
legacy OS stuff. Nothing actually routes interrupts according to it (*).
If it's coming up as 14 that looks more like the BIOS mislabelled it.
Legacy PCI interrupts care about lines and pins not irq numbers.

Are you looking at values after things like pci_enable_device were called
or before ? Are you also looking at what is in pcidev->irq after the
enable ?
 
> Has this changed? Or is this hardware "odd"/"bad"/"broken" in that it
> initializes the PCI devices wrong? (*)
> 
> My driver now works with the interrupts coming in nicely on IRQ18...
> 
> I have this card where I'm writing my own driver, and another PCI card
> that uses an "included-in-the-kernel" driver, and it too behaves as if
> it doesn't get any interrupts.
> 
>   Roger. 
> 
> (*) Obviously according to everybody "windows works", so could it be
> that modern windows simply activates an irq and polls to see what
> driver handles it? Or something like that? Ah! That would be somewhat
> similar to what "irqpoll" does on Linux!

Does the hardware also support modern INTX interrupts or just the old
legacy ones. Could be Windows is using INTX if so

Alan
(*) PCI doesn't. Certain slightly stoned PCI 'emulations' in some
hardware do.



Re: IRQ number question.

2018-09-03 Thread Alan Cox
On Mon, 3 Sep 2018 19:16:39 +0200
Rogier Wolff  wrote:

> Hi, 
> 
> I'm writing a kernel driver. It is not going to be widely used, so I'm
> not motivated to make things nice enough for inclusion in the standard
> kernel.
> 
> But lspci shows my device: 
> 
> 03:01.0 Serial bus controller [0c80]: Phoenix Contact GmbH & Co. Device 0002 
> (rev b7)
> Flags: bus master, stepping, medium devsel, latency 32, IRQ 14
> I/O ports at e070 [size=16]
> Memory at f7d0 (32-bit, non-prefetchable) [size=256K]
> 
> Now when I start my module and prod the device a bit, it will generate
> an interrupt. (in this case the monitor program needs to start sending
> messages to the card.)
> 
> Then the kernel reports: 
> 
> irq 18: nobody cared (try booting with the "irqpoll" option)
> 
> I've been writing device drivers in the past, but in the past
> when the lspci listed "IRQ 14" then I'd have to request_irq (14, ...

The IRQ number in the PCI configuration space is just a label really for
legacy OS stuff. Nothing actually routes interrupts according to it (*).
If it's coming up as 14 that looks more like the BIOS mislabelled it.
Legacy PCI interrupts care about lines and pins not irq numbers.

Are you looking at values after things like pci_enable_device were called
or before ? Are you also looking at what is in pcidev->irq after the
enable ?
 
> Has this changed? Or is this hardware "odd"/"bad"/"broken" in that it
> initializes the PCI devices wrong? (*)
> 
> My driver now works with the interrupts coming in nicely on IRQ18...
> 
> I have this card where I'm writing my own driver, and another PCI card
> that uses an "included-in-the-kernel" driver, and it too behaves as if
> it doesn't get any interrupts.
> 
>   Roger. 
> 
> (*) Obviously according to everybody "windows works", so could it be
> that modern windows simply activates an irq and polls to see what
> driver handles it? Or something like that? Ah! That would be somewhat
> similar to what "irqpoll" does on Linux!

Does the hardware also support modern INTX interrupts or just the old
legacy ones. Could be Windows is using INTX if so

Alan
(*) PCI doesn't. Certain slightly stoned PCI 'emulations' in some
hardware do.



Re: Contiguous DMA buffer view for a custom device (Intel/x86)

2018-08-20 Thread Alan Cox
> b) IOMMU can solve this problem for me by providing a device-specific
> contiguous view of a fragmented physical memory allocation
> c) In order to enable IOMMU do the above, I need to allocate DRHDs and
> DMARs in BIOS initialization (I build my own BIOS)

Yes. The EDK2 firmware toolkit has all the bits you need in it I think.

https://firmware.intel.com/sites/default/files/Intel_WhitePaper_Using_IOMMU_for_DMA_Protection_in_UEFI.pdf

isn't quite on the topic you want but it does explain it fairly well in
passing.

https://software.intel.com/en-us/blogs/2009/03/02/intels-virtualization-for-directed-io-aka-iommu-part-1

is a bit out of date but may help too.

> Please, let me know if I am on the right track? Of course I realize that
> implementing SGDMA would be the best option, but short of that and blocking
> out some physical memory on boot, what are my options?

If performance is absolutely critical simply stealing a chunk of memory
in the firmware and describing it your device some other way is ugly, but
for a custom solution I guess anything goes 8)

Alan


Re: Contiguous DMA buffer view for a custom device (Intel/x86)

2018-08-20 Thread Alan Cox
> b) IOMMU can solve this problem for me by providing a device-specific
> contiguous view of a fragmented physical memory allocation
> c) In order to enable IOMMU do the above, I need to allocate DRHDs and
> DMARs in BIOS initialization (I build my own BIOS)

Yes. The EDK2 firmware toolkit has all the bits you need in it I think.

https://firmware.intel.com/sites/default/files/Intel_WhitePaper_Using_IOMMU_for_DMA_Protection_in_UEFI.pdf

isn't quite on the topic you want but it does explain it fairly well in
passing.

https://software.intel.com/en-us/blogs/2009/03/02/intels-virtualization-for-directed-io-aka-iommu-part-1

is a bit out of date but may help too.

> Please, let me know if I am on the right track? Of course I realize that
> implementing SGDMA would be the best option, but short of that and blocking
> out some physical memory on boot, what are my options?

If performance is absolutely critical simply stealing a chunk of memory
in the firmware and describing it your device some other way is ugly, but
for a custom solution I guess anything goes 8)

Alan


Re: [PATCH] Fix kexec forbidding kernels signed with custom platform keys to boot

2018-08-17 Thread Alan Cox
> I'm actually not talking about UEFI storage, just the UEFI secure boot
> database.  I think we might come up with a viable model for adding keys
> from a UEFI variable that isn't part of the secure boot database.

You also need to be able to remove or not trust the existing ones
including the Microsoft keys. Not trust probably makes the most sense.

If you trust those you are dead already, because there are existing
signed kernel images and Windows boot images that have known remote ring 0
exploits so I can just boot one of those and own you.

Since you are never going to make a Fedora update blacklist the previous
kernel image it's also not going to get fixed because it's a usability
problem.

> The reason for keeping this boundary is to do with the politics of
> breaches.  If we get a breach to the secure boot boundary, Microsoft
> and all the ODMs will help us hunt it down and plug it (They have no
> option because Windows is threatened by any breach to that boundary). 
> If we use the keys beyond the secure boot boundary and get a breach
> that only affects our use case no-one will help us because no-one will
> care.

Also the politics of control. A world where Red Hat, Microsoft and some
other parties together effectively control who gets to load modules into
a free OS for most users is not a good one - whatever the module licence.

Plus I'd have thought if your lawyers are scared of signing keys they'll
be even more terrified of the competition law impacts of gatekeeping 8)

Alan


Re: [PATCH] Fix kexec forbidding kernels signed with custom platform keys to boot

2018-08-17 Thread Alan Cox
> I'm actually not talking about UEFI storage, just the UEFI secure boot
> database.  I think we might come up with a viable model for adding keys
> from a UEFI variable that isn't part of the secure boot database.

You also need to be able to remove or not trust the existing ones
including the Microsoft keys. Not trust probably makes the most sense.

If you trust those you are dead already, because there are existing
signed kernel images and Windows boot images that have known remote ring 0
exploits so I can just boot one of those and own you.

Since you are never going to make a Fedora update blacklist the previous
kernel image it's also not going to get fixed because it's a usability
problem.

> The reason for keeping this boundary is to do with the politics of
> breaches.  If we get a breach to the secure boot boundary, Microsoft
> and all the ODMs will help us hunt it down and plug it (They have no
> option because Windows is threatened by any breach to that boundary). 
> If we use the keys beyond the secure boot boundary and get a breach
> that only affects our use case no-one will help us because no-one will
> care.

Also the politics of control. A world where Red Hat, Microsoft and some
other parties together effectively control who gets to load modules into
a free OS for most users is not a good one - whatever the module licence.

Plus I'd have thought if your lawyers are scared of signing keys they'll
be even more terrified of the competition law impacts of gatekeeping 8)

Alan


Re: [PATCH] tty: vt_ioctl: fix potential Spectre v1

2018-08-17 Thread Alan Cox
On Thu, 16 Aug 2018 15:30:38 -0500
"Gustavo A. R. Silva"  wrote:

> vsa.console is indirectly controlled by user-space, hence leading to
> a potential exploitation of the Spectre variant 1 vulnerability.
> 
> This issue was detected with the help of Smatch:
> 
> drivers/tty/vt/vt_ioctl.c:711 vt_ioctl() warn: potential spectre issue
> 'vc_cons' [r]
> 
> Fix this by sanitizing vsa.console before using it to index vc_cons
> 
> Notice that given that speculation windows are large, the policy is
> to kill the speculation on the first load and not worry if it can be
> completed with a dependent load/store [1].
> 
> [1] https://marc.info/?l=linux-kernel=152449131114778=2
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Alan Cox 

Alan


Re: [PATCH] tty: vt_ioctl: fix potential Spectre v1

2018-08-17 Thread Alan Cox
On Thu, 16 Aug 2018 15:30:38 -0500
"Gustavo A. R. Silva"  wrote:

> vsa.console is indirectly controlled by user-space, hence leading to
> a potential exploitation of the Spectre variant 1 vulnerability.
> 
> This issue was detected with the help of Smatch:
> 
> drivers/tty/vt/vt_ioctl.c:711 vt_ioctl() warn: potential spectre issue
> 'vc_cons' [r]
> 
> Fix this by sanitizing vsa.console before using it to index vc_cons
> 
> Notice that given that speculation windows are large, the policy is
> to kill the speculation on the first load and not worry if it can be
> completed with a dependent load/store [1].
> 
> [1] https://marc.info/?l=linux-kernel=152449131114778=2
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Alan Cox 

Alan


Re: [PATCH] tty: rocket: Fix possible buffer overwrite on register_PCI

2018-08-02 Thread Alan Cox
On Fri, 27 Jul 2018 16:39:31 +0300
Anton Vasilyev  wrote:

> If number of isa and pci boards exceed NUM_BOARDS on the path
> rp_init()->init_PCI()->register_PCI() then buffer overwrite occurs
> in register_PCI() on assign rcktpt_io_addr[i].
> 
> The patch adds check on upper bound for index of registered
> board in register_PCI.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Anton Vasilyev 
> ---
>  drivers/tty/rocket.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/rocket.c b/drivers/tty/rocket.c
> index bdd17d2aaafd..b121d8f8f3d7 100644
> --- a/drivers/tty/rocket.c
> +++ b/drivers/tty/rocket.c
> @@ -1881,7 +1881,7 @@ static __init int register_PCI(int i, struct pci_dev 
> *dev)
>   ByteIO_t UPCIRingInd = 0;
>  
>   if (!dev || !pci_match_id(rocket_pci_ids, dev) ||
> - pci_enable_device(dev))
> + pci_enable_device(dev) || i >= NUM_BOARDS)
>   return 0;
>  
>   rcktpt_io_addr[i] = pci_resource_start(dev, 0);

This is a real fix but you want to check i >= NUM_BOARDS before you
enable the device

Alan



Re: [PATCH] tty: rocket: Fix possible buffer overwrite on register_PCI

2018-08-02 Thread Alan Cox
On Fri, 27 Jul 2018 16:39:31 +0300
Anton Vasilyev  wrote:

> If number of isa and pci boards exceed NUM_BOARDS on the path
> rp_init()->init_PCI()->register_PCI() then buffer overwrite occurs
> in register_PCI() on assign rcktpt_io_addr[i].
> 
> The patch adds check on upper bound for index of registered
> board in register_PCI.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Anton Vasilyev 
> ---
>  drivers/tty/rocket.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/rocket.c b/drivers/tty/rocket.c
> index bdd17d2aaafd..b121d8f8f3d7 100644
> --- a/drivers/tty/rocket.c
> +++ b/drivers/tty/rocket.c
> @@ -1881,7 +1881,7 @@ static __init int register_PCI(int i, struct pci_dev 
> *dev)
>   ByteIO_t UPCIRingInd = 0;
>  
>   if (!dev || !pci_match_id(rocket_pci_ids, dev) ||
> - pci_enable_device(dev))
> + pci_enable_device(dev) || i >= NUM_BOARDS)
>   return 0;
>  
>   rcktpt_io_addr[i] = pci_resource_start(dev, 0);

This is a real fix but you want to check i >= NUM_BOARDS before you
enable the device

Alan



Re: How to secure erase PCI-E NVME SSD connected via USB3?

2018-08-02 Thread Alan Cox
> # hdparm --user-master u --security-erase p /dev/sda
> (returns immediately and does nothing).
> 
> I've tried hdparm on an SSD connected via USB3 and it secure-erased ok.
> 
> Anyone working on this?

Sounds to me like you need to contact the vendor of the interface in
question. If it accepted a security erase command and didn't do it then
it's broken. It's at liberty to refuse it, or report it doesn't know what
you are talking about, but if it just returned and after re-plugging the
device its still using the old keys then it or the device is busted and
it's not something the OS can do much about.

Alan


Re: How to secure erase PCI-E NVME SSD connected via USB3?

2018-08-02 Thread Alan Cox
> # hdparm --user-master u --security-erase p /dev/sda
> (returns immediately and does nothing).
> 
> I've tried hdparm on an SSD connected via USB3 and it secure-erased ok.
> 
> Anyone working on this?

Sounds to me like you need to contact the vendor of the interface in
question. If it accepted a security erase command and didn't do it then
it's broken. It's at liberty to refuse it, or report it doesn't know what
you are talking about, but if it just returned and after re-plugging the
device its still using the old keys then it or the device is busted and
it's not something the OS can do much about.

Alan


Re: [PATCH v2 3/3] tty: Replace goldfish_tty_line_count with a #define

2018-08-02 Thread Alan Cox
On Tue, 24 Jul 2018 17:51:33 -0700
r...@google.com wrote:

> From: Roman Kiryanov 
> 
> The driver never mutates this variable - no benefits of
> keeping it mutable.
> 
> Signed-off-by: Roman Kiryanov 
> ---
> Changes in v2:
>  - Replaced "const u32" with "#define".

v1 was IMHO definitely better. And you are correct that gcc will not
allocate memory for a static const object unless you do things like take
its address. It will (unlike a define) however properly typecheck
everything.

Alan


  1   2   3   4   5   6   7   8   9   10   >