Re: [GIT PULL] KVM fixes for 4.10 merge window
> On Fri, Dec 16, 2016 at 8:57 AM, Paolo Bonzini wrote: > > > > git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus > > This piece-of-shit branch has obviously never been even compile-tested: > > arch/x86/kernel/kvm.c: In function ‘__kvm_vcpu_is_preempted’: > arch/x86/kernel/kvm.c:596:14: error: ‘struct kvm_steal_time’ has no > member named ‘preempted’ > > where commit b94c3698b4b0 ("Revert "x86/kvm: Support the vCPU > preemption check"") removed the "preempted" field from struct > kvm_steal_time, but you left this in place: > > __visible bool __kvm_vcpu_is_preempted(int cpu) > { > struct kvm_steal_time *src = &per_cpu(steal_time, cpu); > > return !!src->preempted; > } > > And no, that is not a merge artifact in my tree (although that > function did come in from Ingo). That compile failure comes from your > very own branch. Yes, it does. Well, to be honest I did test this (not just compile-test it) but I didn't have KVM guest support turned on, only KVM host support. Sorry, I'll resend it and make sure I do a "make allmodconfig" in the future (and not send pull requests at 6 PM on Friday). Paolo
Re: [GIT PULL] KVM fixes for 4.10 merge window
- Original Message - > From: "Pan Xinhui" > To: "Linus Torvalds" , "Paolo Bonzini" > > Cc: "Linux Kernel Mailing List" , "Radim > Krčmář" , "KVM list" > > Sent: Saturday, December 17, 2016 4:09:16 AM > Subject: Re: [GIT PULL] KVM fixes for 4.10 merge window > > > > On 2016/12/17 03:42, Linus Torvalds wrote: > > On Fri, Dec 16, 2016 at 8:57 AM, Paolo Bonzini wrote: > >> > >> git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus > > > > This piece-of-shit branch has obviously never been even compile-tested: > > > > arch/x86/kernel/kvm.c: In function ‘__kvm_vcpu_is_preempted’: > > arch/x86/kernel/kvm.c:596:14: error: ‘struct kvm_steal_time’ has no > > member named ‘preempted’ > > > hi, Linus > oh, my bad also. I introduce this struct member and use it in same > patch. > Better to separate them into two patches. I make one fix patch below. sorry > again. Hi Xinhui, don't worry, it's purely my fault. :) > I have known where is the problem, I think if we can set this ->preempted > later after preempted_enable() > or just introduce something like write_guest_nosleep (per cpu memory section > in guest, so there is no page_fault or any other cannot sleep problems)? Yes, there is already kvm_read_guest_inatomic; we can add an equivalent one for writes. It will be for 4.11 anyway, so there's time. Paolo
Re: [PATCH] scsi: esas2r: Fix format string type mistakes
On 12/16/2016 10:50 PM, Kees Cook wrote: > diff --git a/drivers/scsi/esas2r/esas2r_ioctl.c > b/drivers/scsi/esas2r/esas2r_ioctl.c > index 3e8483410f61..34976f9a1a10 100644 > --- a/drivers/scsi/esas2r/esas2r_ioctl.c > +++ b/drivers/scsi/esas2r/esas2r_ioctl.c > @@ -1301,7 +1301,7 @@ int esas2r_ioctl_handler(void *hostdata, int cmd, void > __user *arg) > ioctl = kzalloc(sizeof(struct atto_express_ioctl), GFP_KERNEL); > if (ioctl == NULL) { > esas2r_log(ESAS2R_LOG_WARN, > -"ioctl_handler kzalloc failed for %d bytes", > +"ioctl_handler kzalloc failed for %lu bytes", > sizeof(struct atto_express_ioctl)); > return -ENOMEM; > } Please use %zu to format size_t. Bart.
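Bart's point can be illustrated outside the kernel. The following is a minimal userspace sketch, not driver code: sizeof yields a size_t, whose width differs between 32-bit and 64-bit targets, so neither %d nor %lu matches on every platform, while %zu always does. The struct and helper names (demo_ioctl, format_alloc_failure) are hypothetical stand-ins, not part of esas2r.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct atto_express_ioctl; only its size matters. */
struct demo_ioctl {
	unsigned char payload[100];
};

/*
 * Format an allocation-failure message into buf. The byte count is a
 * size_t (as produced by sizeof), so %zu is the matching conversion on
 * both 32-bit and 64-bit targets. Returns whatever snprintf() returns.
 */
static int format_alloc_failure(char *buf, size_t buflen, size_t nbytes)
{
	return snprintf(buf, buflen,
			"ioctl_handler kzalloc failed for %zu bytes", nbytes);
}
```

A caller would pass sizeof(struct demo_ioctl) as nbytes, mirroring the esas2r_log() call in the patch.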
Re: [GIT PULL (resend)] readlink cleanup
On Sat, Dec 17, 2016 at 12:08 AM, Al Viro wrote: > On Fri, Dec 16, 2016 at 11:48:59PM +0100, Miklos Szeredi wrote: > >> This is a rework of the readlink cleanup patchset from the last cycle. Now >> readlink(2) does the following: >> >> - if i_op->readlink() is non-NULL (only proc and afs mountpoints for now) >>then it calls that >> >> - otherwise call i_op->get_link() >> >> - signature of ->readlink() now matches that of ->get_link() >> >> In particular this last bullet point buys us: >> >> - less complexity, because we already handle the delayed free of the >>buffer and copying to userspace due to ->get_link() being the normal way >>to read the symlink > > Less complexity where, exactly? In the caller the life does not become > any simpler - instead of "call ->readlink() and bugger off" you have > "call ->readlink() and go through the same motions as in ->get_link()-based > case". In the instances it becomes _more_ complex. Have you looked? Because in actual fact they don't. Theoretically it's either: - kmalloc + fill + readlink_copy + kfree --> kmalloc + fill + set_delayed_call - declare char[] on stack + fill + readlink_copy --> kmalloc + fill + set_delayed_call Presumably it's the second one you are talking about becoming more complex. There's exactly one instance of that in the tree and it actually becomes cleaner after the change. Current code does: - guess max link size to be 50 (very scientifically I'm sure, but no explanation given) - call filler - hope it didn't get truncated Which becomes: - call filler which allocates correctly sized buffer. > What's more, this new signature for ->readlink() makes no sense - instead of > "symlink traversal does not involve resolving a pathname, so we have to > fake one for readlink(2)" you get something resembling ->get_link(), which > would _not_ function as ->get_link() ought to. 
But it can be called by the > same codepath that calls ->get_link(), saving us the burden of returning > without doing what ->get_link-based case would - we still get to check if > ->readlink() is there, but we rejoin the common path immediately. And AFAICS > that's the _only_ benefit of that signature change - making it possible to > reuse a few lines that adapt ->get_link() to readlinkat(2) needs. With the signature change we get a consistent interface for reading the contents of symlinks. With that it will never make sense to play the stupid get_ds/set_ds() games that we've had. And no need to duplicate helper functions, like page_readlink() that is exactly the same as page_getlink() only for the different interface. And no need to export readlink_copy() which is something the filesystems never actually want to care about. Having different interfaces for the same thing is going to be more complex. I just don't get it what you are opposed to here. Thanks, Miklos
Re: [PATCH 2/2] iio: adc: hx711: Add IIO driver for AVIA HX711
On Tue, Dec 13, 2016 at 10:02 AM, Andreas Klinger wrote: > This is the IIO driver for AVIA HX711 ADC which ist mostly used in weighting > cells. First off cool that this is finally getting a driver... I'll have to get the SparkFun breakout and really cheap scale to test :). > > The protocol is quite simple and using GPIO's: > One GPIO is used as clock (SCK) while another GPIO is read (DOUT) > > Signed-off-by: Andreas Klinger > --- > drivers/iio/adc/Kconfig | 13 +++ > drivers/iio/adc/Makefile | 1 + > drivers/iio/adc/hx711.c | 269 > +++ > 3 files changed, 283 insertions(+) > create mode 100644 drivers/iio/adc/hx711.c > > diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig > index 932de1f9d1e7..7902b50fcf32 100644 > --- a/drivers/iio/adc/Kconfig > +++ b/drivers/iio/adc/Kconfig > @@ -205,6 +205,19 @@ config HI8435 > This driver can also be built as a module. If so, the module will be > called hi8435. > > +config HX711 > + tristate "AVIA HX711 ADC for weight cells" > + depends on GPIOLIB > + help > + If you say yes here you get support for AVIA HX711 ADC which is used > + for weight cells > + > + This driver uses two GPIO's, one for setting the clock and the other > + one for getting the data > + > + This driver can also be built as a module. If so, the module will be > + called hx711. 
> + > config INA2XX_ADC > tristate "Texas Instruments INA2xx Power Monitors IIO driver" > depends on I2C && !SENSORS_INA2XX > diff --git a/drivers/iio/adc/Makefile b/drivers/iio/adc/Makefile > index b1aa456e6af3..d46e289900ef 100644 > --- a/drivers/iio/adc/Makefile > +++ b/drivers/iio/adc/Makefile > @@ -21,6 +21,7 @@ obj-$(CONFIG_CC10001_ADC) += cc10001_adc.o > obj-$(CONFIG_DA9150_GPADC) += da9150-gpadc.o > obj-$(CONFIG_EXYNOS_ADC) += exynos_adc.o > obj-$(CONFIG_HI8435) += hi8435.o > +obj-$(CONFIG_HX711) += hx711.o > obj-$(CONFIG_IMX7D_ADC) += imx7d_adc.o > obj-$(CONFIG_INA2XX_ADC) += ina2xx-adc.o > obj-$(CONFIG_LP8788_ADC) += lp8788_adc.o > diff --git a/drivers/iio/adc/hx711.c b/drivers/iio/adc/hx711.c > new file mode 100644 > index ..cbc89e467985 > --- /dev/null > +++ b/drivers/iio/adc/hx711.c > @@ -0,0 +1,269 @@ > +/* > + * HX711: analog to digital converter for weight sensor module > + * > + * Copyright (c) Andreas Klinger > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. 
> + * > + */ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define HX711_GAIN_32 2 /* gain = 32 for channel B */ > +#define HX711_GAIN_64 3 /* gain = 64 for channel A */ > +#define HX711_GAIN_128 1 /* gain = 128 for channel A */ > + > + > +struct hx711_data { > + struct device *dev; > + dev_t devt; > + struct gpio_desc*gpiod_sck; > + struct gpio_desc*gpiod_dout; > + int gain_pulse; > + struct mutexlock; > +}; > + > +static void hx711_reset(struct hx711_data *hx711_data) > +{ > + int val; > + int i; > + > + val = gpiod_get_value(hx711_data->gpiod_dout); could move the val assignment here to the initialization don't think it will hit 80 chars > + if (val) { move "int i" here to avoid compiler initialization warnings > + dev_warn(hx711_data->dev, "RESET-HX711\n"); > + > + gpiod_set_value(hx711_data->gpiod_sck, 1); > + udelay(80); IIRC this chip has quite a bit of latency thresholds, can't use usleep_range? Single core embedded systems could have an issue with continuous polling. > + gpiod_set_value(hx711_data->gpiod_sck, 0); > + > + for (i = 0; i < 1000; i++) { > + val = gpiod_get_value(hx711_data->gpiod_dout); > + if (!val) > + break; > + /* sleep at least 1 ms*/ > + msleep(1); > + } > + } > +} > + > +static int hx711_cycle(struct hx711_data *hx711_data) > +{ > + int val; > + > + /* if preempted for more then 60us while SCK is high: > +* hx711 is going in reset > +* ==> measuring is false > +*/ > + preempt_disable(); > + gpiod_set_value(hx711_data
Re: [TSN RFC v2 5/9] Add TSN header for the driver
On Fri, Dec 16, 2016 at 11:09:38PM +0100, Richard Cochran wrote: > On Fri, Dec 16, 2016 at 06:59:09PM +0100, hen...@austad.us wrote: > > +/* > > + * List of current subtype fields in the common header of AVTPDU > > + * > > + * Note: AVTPDU is a remnant of the standards from when it was AVB. > > + * > > + * The list has been updated with the recent values from IEEE 1722, draft > > 16. > > + */ > > +enum avtp_subtype { > > + TSN_61883_IIDC = 0, /* IEC 61883/IIDC Format */ > > + TSN_MMA_STREAM, /* MMA Streams */ > > + TSN_AAF,/* AVTP Audio Format */ > > + TSN_CVF,/* Compressed Video Format */ > > + TSN_CRF,/* Clock Reference Format */ > > + TSN_TSCF, /* Time-Synchronous Control Format */ > > + TSN_SVF,/* SDI Video Format */ > > + TSN_RVF,/* Raw Video Format */ > > + /* 0x08 - 0x6D reserved */ > > + TSN_AEF_CONTINOUS = 0x6e, /* AES Encrypted Format Continous */ > > + TSN_VSF_STREAM, /* Vendor Specific Format Stream */ > > + /* 0x70 - 0x7e reserved */ > > + TSN_EF_STREAM = 0x7f, /* Experimental Format Stream */ > > + /* 0x80 - 0x81 reserved */ > > + TSN_NTSCF = 0x82, /* Non Time-Synchronous Control Format */ > > + /* 0x83 - 0xed reserved */ > > + TSN_ESCF = 0xec,/* ECC Signed Control Format */ > > + TSN_EECF, /* ECC Encrypted Control Format */ > > + TSN_AEF_DISCRETE, /* AES Encrypted Format Discrete */ > > + /* 0xef - 0xf9 reserved */ > > + TSN_ADP = 0xfa, /* AVDECC Discovery Protocol */ > > + TSN_AECP, /* AVDECC Enumeration and Control Protocol */ > > + TSN_ACMP, /* AVDECC Connection Management Protocol */ > > + /* 0xfd reserved */ > > + TSN_MAAP = 0xfe,/* MAAP Protocol */ > > + TSN_EF_CONTROL, /* Experimental Format Control */ > > +}; > > The kernel shouldn't be in the business of assembling media packets. No, but assembling the packets and shipping frames to a destination is not neccessarily the same thing. 
A nice workflow would be to signal to the shim that "I'm sending a compressed video format" and then the shim/tsn_core will ship out the frames over the network - and then you need to set TSN_CVF as subtype in each header. That does not mean you should do H.264 encode/decode *in* the kernel. Perhaps this is better placed in include/uapi/tsn.h so that userspace and kernel share the same header? -- Henrik Austad
Re: [GIT PULL] kbuild changes for v4.9-rc1
On 12/16/2016, 08:57 PM, Linus Torvalds wrote: > On Fri, Dec 16, 2016 at 11:55 AM, Jiri Slaby wrote: >> >> what happened to this? I had to apply this to fix 4.9-pae kernel here. > > Did you actually have to do that? Yes, disk drivers won't load: [2.141973] virtio_pci: disagrees about version of symbol mcount [2.144415] virtio_pci: Unknown symbol mcount (err -22) [2.164547] virtio_pci: disagrees about version of symbol mcount [2.166309] virtio_pci: Unknown symbol mcount (err -22) [2.180651] virtio_pci: disagrees about version of symbol mcount [2.182823] virtio_pci: Unknown symbol mcount (err -22) [2.210943] virtio_pci: disagrees about version of symbol mcount [2.220097] virtio_pci: Unknown symbol mcount (err -22) [2.220173] ata_piix: disagrees about version of symbol mcount [2.220174] ata_piix: Unknown symbol mcount (err -22) and whole machine gets stuck with systemd waiting for /dev/sd*. > Because a missing CRC shouldn't be fatal in 4.9. > > What was the failure mode? I am not sure what you mean? The kernel is rpm-ized 4.9 vanilla and this is the config: http://kernel.suse.com/cgit/kernel-source/tree/config/i386/default?h=stable thanks, -- js suse labs
Re: [TSN RFC v2 0/9] TSN driver for the kernel
Hi Richard, On Fri, Dec 16, 2016 at 11:05:30PM +0100, Richard Cochran wrote: > On Fri, Dec 16, 2016 at 06:59:04PM +0100, hen...@austad.us wrote: > > The driver is directed via ConfigFS as we need userspace to handle > > stream-reservation (MSRP), discovery and enumeration (IEEE 1722.1) and > > whatever other management is needed. > > I complained about configfs before, but you didn't listen. Yes you did, I remember quite well, and no, I didn't listen :) At the time, there were other issues that I had to address. The configfs-part is fairly isolated. As I tried to explain the last round, the *reason* I've used ConfigFS thus far is that it makes it pretty easy for userspace to signal the driver to create a new alsa-device. And the reason I haven't changed configfs is that, so far, that part has worked fairly well and has made testing quite easy. At this stage, *this* is what is helpful, not a perfect interface. This does not mean that configfs is set in stone. To clarify: I'm sending out a new set now because what I have works _fairly_ well for testing and as a way to see what you can do with AVB. Using spotify to play music on random machines is quite entertaining. It is by no means -done-, nor do I consider it done. I have been tight on time, and instead of sitting in an office polishing some code, I thought it better to send out a new (and not done) set of patches so that others could see it is still being worked on. If this turned out to be noise-only, I apologize! > > 2 new fields in netdev_ops have been introduced, and the Intel > > igb-driver has been updated (as this an AVB-capable NIC which is > > available as a PCI-e card). > > The igb hacks show that you are on the wrong track. We can and should > be able to support TSN without resorting to driver specific hacks and > module parameters. 
I was not able to find a sane way to change the mode of the NIC; some of the settings required to enable Qav-mode must be done when bringing the NIC up, so I needed hooks in _probe(). Another element needed is a way for tsn_core to ascertain if a NIC is capable of TSN or not (this would be ndo_tsn_capable). Then finally, you need to update values in a per-tx-queue manner when a new stream is ready (hence ndo_tsn_link_configure). What you mean by 'driver specific hacks' is not obvious though; TSN requires a set of fairly standardized parameters (priority code points, size of frames to send in a new stream and so on), and adding this to the hw-registers in the NIC is an operation that will be common for all TSN-capable NICs. > > Before reading on - this is not even beta, but I'd really appreciate if > > people would comment on the overall architecture and perhaps provide > > some pointers to where I should improve/fix/update > > As I said before about V1, this architecture stinks. I like feedback when it's short, sweet and to the point. 2 out of 3 ain't that bad ;) > You appear to have continued hacking along and posted the same design > again. Did you even address any of the points I raised back then? So you did raise a lot of good points the last round, and no, I have not had the time to address them properly. That does not mean I do not *want* to (apart from configfs actually having worked quite nicely thus far and 'shim' being a name I like ;) From the last round of discussion: > 1. A proper userland stack for AVDECC, MAAP, FQTSS, and so on. The > OpenAVB project does not offer much beyond simple examples. Yes, I fully agree; as far as I know, no-one is working on this. That being said, I have not paid much attention to the userspace tooling lately. But all of this must be handled in userspace, having avdecc in the kernel would be an utter nightmare :) > 2. 
A user space audio application that puts it all together, making > use of the services in #1, the linuxptp gPTP service, the ALSA > services, and the network connections. This program will have all > the knowledge about packet formats, AV encodings, and the local HW > capabilities. This program cannot yet be written, as we still need > some kernel work in the audio and networking subsystems. And therein lies the problem. It cannot yet be written, so we have to start in *some* end. And as I repeatedly stated in June, I'm at an RFC here, trying to spark some interest and lure other developers in :) Also, I really do not want a media-application to care about _where_ the frames are going. Sure, I see the issue of configuring a link, but that can be done from _outside_ the media-application. VLC (or aplay, or totem, or .. take your pick) should not have to worry about this. Applications that require finer control over timestamping is easier to adapt to AVB than all the others, I'd rather add special knobs for those who are interested than adding a set of knobs that -all- applications must be aware of. Could be that we are talking about the same thing, just from different perspectives.
Re: [PATCH 3.12 00/38] 3.12.69-stable review
On 12/14/2016, 01:51 AM, Shuah Khan wrote: > Compiled and booted on my test system. No dmesg regressions. On 12/14/2016, 04:42 AM, Guenter Roeck wrote: > Build results: > total: 128 pass: 128 fail: 0 > Qemu test results: > total: 93 pass: 93 fail: 0 > > Details are available at http://kerneltests.org/builders. Thank you both! -- js suse labs
Re: Document accounting of FDs passed over UNIX domain sockets
Hi Willy, On 12/17/2016 08:04 AM, Willy Tarreau wrote: > Hi Michael, > > On Fri, Dec 16, 2016 at 12:08:33PM +0100, Michael Kerrisk (man-pages) wrote: >> Hello Willy, >> >> Your commit 712f4aad406bb1 ("unix: properly account for FDs passed over >> unix sockets") added accounting to ensure that the RLIMIT_NOFILE limit >> could not be bypassed when passing file descriptors across UNIX >> domain sockets. >> >> Such patches should be CCed to linux-...@vger.kernel.org ;-) > > Yes, I learned this after your presentation at kernel recipes, but this > patch pre-dates it ;-) But the note in Documentation/SubmittingPatches predates that ;-) >> A documentation patch would be great as well, but I had a shot >> at cobbling some text together. Does the text below (for the unix(7) >> man page) look okay? > > I think so, though maybe we can arrange it very slightly given that > this was considered as a fix for a vulnerability and backported to > various kernels : > >> ETOOMANYREFS >> This error can occur for sendmsg(2) when sending a file >> descriptor as ancillary data over a UNIX domain socket (see >> the description of SCM_RIGHTS, above). It occurs if the >> number of "in-flight" file descriptors exceeds the >> RLIMIT_NOFILE resource limit and the caller does not have >> the CAP_SYS_RESOURCE capability. An in-flight file >> descriptor is one that has been sent using sendmsg(2) but >> has not yet been accepted in the recipient process using >> recvmsg(2). >> >> This error is diagnosed since Linux 4.5. In earlier kernel >> versions, it was possible to place an unlimited number of >> file descriptors in flight, by sending each file descriptor >> with sendmsg(2) and then closing the file descriptor so >> that it was not accounted against the RLIMIT_NOFILE >> resource limit. > > - resource limit. > + resource limit. Some older stable kernels might have > + included the same check by backporting the fix from 4.5. 
> > I've just checked the exact versions containing this, but I don't think > it's worth providing the list, in my opinion mentioning that it could be > observed on some older versions is enough to help developers who see it > in the field: > - 3.2.78 > - 3.10.99 > - 3.12.57 > - 3.14.63 > - 3.16.35 > - 3.18.27 > - 4.1.19 > - 4.4.4 Yeah. This is a tricky issue that I run into now and then. I've added some different wording that expresses the same idea you intended. Thanks for noting this. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/
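For readers unfamiliar with the mechanism being documented, here is a minimal userspace sketch of passing one file descriptor as SCM_RIGHTS ancillary data, following the pattern shown in the cmsg(3) manual page. The helper name send_fd is hypothetical. On kernels that carry the accounting fix discussed in this thread, the sendmsg(2) call is where ETOOMANYREFS would surface.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Send one file descriptor over a UNIX domain socket as SCM_RIGHTS
 * ancillary data. Returns 0 on success, -1 on error with errno set.
 * Per the text above, errno may be ETOOMANYREFS when the sender has
 * too many descriptors in flight and lacks CAP_SYS_RESOURCE.
 */
static int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* at least one byte of real data is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {			/* union guarantees cmsghdr alignment */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	if (sendmsg(sock, &msg, 0) < 0)
		return -1;
	return 0;
}
```

The descriptor stays "in flight" until the peer calls recvmsg(2); closing it in the sender right after sendmsg(2) is the pattern the man page text describes as formerly escaping the RLIMIT_NOFILE accounting.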
Re: wl1251 NVS calibration data format
Hi, On Fri, Dec 16, 2016 at 12:01:48PM +0100, Pali Rohár wrote: > Hi! Do you know format of wl1251 NVS calibration data file? > > I found that there is tool for changing NVS file for wl1271 and newer > chips (so not for wl1251!) at: https://github.com/gxk/ti-utils > > And wl1271 has in NVS data already place for MAC address. And in wlcore > (for wl1271 and newer) there is really kernel code which is doing > something with MAC address in NVS, see: > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ti/wlcore/boot.c#n352 > > So... I would like to know if in wl1251 NVS calibration file is also > some place for MAC address or not. > > Default wl1251 NVS calibration file is available in linux-firmware: > https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/tree/ti-connectivity/wl1251-nvs.bin Pandora people [0] have a description of the format at [1]. [0] https://pandorawiki.org/WiFi [1] http://notaz.gp2x.de/misc/pnd/wl1251/nvs_map.txt -- Sebastian
Re: [GIT PULL] kbuild changes for v4.9-rc1
On Sat, Dec 17, 2016 at 09:57:47AM +0100, Jiri Slaby wrote: > On 12/16/2016, 08:57 PM, Linus Torvalds wrote: > > On Fri, Dec 16, 2016 at 11:55 AM, Jiri Slaby wrote: > >> > >> what happened to this? I had to apply this to fix 4.9-pae kernel here. > > > > Did you actually have to do that? > > Yes, disk drivers won't load: > [2.141973] virtio_pci: disagrees about version of symbol mcount > [2.144415] virtio_pci: Unknown symbol mcount (err -22) > and whole machine gets stuck with systemd waiting for /dev/sd*. > > > Because a missing CRC shouldn't be fatal in 4.9. Most of us get just a scary-looking warning, but whatever the problem is for you, it's good to hear this patch works around it. Whatever the long-term solution will be, for 4.10 an updated[1] version of this fix is on kbuild/kbuild (and kbuild/for-next). I guess we'll bother stable@ once it is merged. Note that it handles only x86, there's a bunch of other architectures affected, alpha m68k s390 sparc ia64 might still need fixing. Meow! [1]. Turns out there was a missing symbol on 486; people build-test those but don't try to actually boot, and even when they do, they don't read warnings. -- Autotools hint: to do a zx-spectrum build on a pdp11 host, type: ./configure --host=zx-spectrum --build=pdp11
Re: [PATCH] ALSA: use designated initializers
On Dec 17 2016 09:59, Kees Cook wrote: Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. Signed-off-by: Kees Cook --- sound/synth/emux/emux_seq.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) Reviewed-by: Takashi Sakamoto diff --git a/sound/synth/emux/emux_seq.c b/sound/synth/emux/emux_seq.c index a0209204ae48..55579f6b8cb2 100644 --- a/sound/synth/emux/emux_seq.c +++ b/sound/synth/emux/emux_seq.c @@ -33,13 +33,13 @@ static int snd_emux_unuse(void *private_data, struct snd_seq_port_subscribe *inf * MIDI emulation operators */ static struct snd_midi_op emux_ops = { - snd_emux_note_on, - snd_emux_note_off, - snd_emux_key_press, - snd_emux_terminate_note, - snd_emux_control, - snd_emux_nrpn, - snd_emux_sysex, + .note_on = snd_emux_note_on, + .note_off = snd_emux_note_off, + .key_press = snd_emux_key_press, + .note_terminate = snd_emux_terminate_note, + .control = snd_emux_control, + .nrpn = snd_emux_nrpn, + .sysex = snd_emux_sysex, }; Regards Takashi Sakamoto
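To see why the conversion matters, consider the sketch below. It uses a hypothetical three-member ops table (demo_midi_op, not the real struct snd_midi_op) and deliberately lists the initializers in a different order than the members are declared. With designated initializers each callback is bound by name, so the table stays correct even if the member order changes, which is exactly what structure layout randomization does.

```c
/* Hypothetical ops table, loosely modelled on struct snd_midi_op. */
struct demo_midi_op {
	int (*note_on)(int note, int vel);
	int (*note_off)(int note, int vel);
	int (*control)(int type, int val);
};

static int demo_note_on(int note, int vel)  { return note + vel; }
static int demo_note_off(int note, int vel) { (void)vel; return -note; }
static int demo_control(int type, int val)  { return type * val; }

/*
 * Initializers are listed out of declaration order on purpose: each
 * callback is bound to its member by name, not by position.
 */
static const struct demo_midi_op demo_ops = {
	.control  = demo_control,
	.note_on  = demo_note_on,
	.note_off = demo_note_off,
};
```

A positional initializer such as { demo_control, demo_note_on, demo_note_off } would compile just as cleanly here, because all three callbacks share a signature, but it would silently bind them to the wrong members.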
Re: Revised request_key(2) man page for review
Hello David, On 12/15/2016 11:10 AM, David Howells wrote: > Michael Kerrisk (man-pages) wrote: > >>>│Is 'keyring' allowed to be 0? Reading the source, it │ >>>│appears so. In this case, by default, the key is │ >>>│assigned to the session keyring. But, the │ >>>│KEYCTL_SET_REQKEY_KEYRING also seems to have an │ >>>│influence here. What are the details here? │ > > Yes, the destination keyring can be 0. If you don't specify a destination > keyring, then: > > (1) If the key is found to already exist, the serial number is returned, but > no extra link is made. > > (2) If an error occurs other than "this key doesn't exist", then you'll just > get the error. > > (3) If we have to construct a new key, this will be attached to the default > keyring (as there's no destination keyring to attach to). Okay. Please take a look at the revised text that I'll send out after applying Eugene's patch. (Mail in a few minutes.) >>># echo 'create user mtk:* * /bin/keyctl instantiate %k %c %S' \ >>> > /etc/request-keys.conf > > There's a /etc/request-keys.d/ directory now. Yes, I'm aware. Did you mean I should fix something on this page? Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/
[PATCH] block: loose check on sg gap
If the last bvec of the 1st bio and the 1st bvec of the next bio are physically contiguous, and the latter can be merged into the last segment of the 1st bio, we should treat them as not violating the sg gap (or virt boundary) limit. Both Vitaly and Dexuan reported that lots of unmergeable small bios are observed when running mkfs on Hyper-V virtual storage, and performance becomes quite low, so this patch fixes that performance issue. The same issue should exist on NVMe too, since it also sets a virt boundary. Reported-by: Vitaly Kuznetsov Reported-by: Dexuan Cui Tested-by: Dexuan Cui Cc: Keith Busch Signed-off-by: Ming Lei --- include/linux/blkdev.h | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 286b2a264383..1ce26e771bcc 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1608,6 +1608,25 @@ static inline bool bvec_gap_to_prev(struct request_queue *q, return __bvec_gap_to_prev(q, bprv, offset); } +/* + * Check if the two bvecs from two bios can be merged to one segment. + * If yes, no need to check gap between the two bios since the 1st bio + * and the 1st bvec in the 2nd bio can be handled in one segment. 
+ */ +static inline bool bios_segs_mergeable(struct request_queue *q, + struct bio *prev, struct bio_vec *prev_last_bv, + struct bio_vec *next_first_bv) +{ + if (!BIOVEC_PHYS_MERGEABLE(prev_last_bv, next_first_bv)) + return false; + if (!BIOVEC_SEG_BOUNDARY(q, prev_last_bv, next_first_bv)) + return false; + if (prev->bi_seg_back_size + next_first_bv->bv_len > + queue_max_segment_size(q)) + return false; + return true; +} + static inline bool bio_will_gap(struct request_queue *q, struct bio *prev, struct bio *next) { @@ -1617,7 +1636,8 @@ static inline bool bio_will_gap(struct request_queue *q, struct bio *prev, bio_get_last_bvec(prev, &pb); bio_get_first_bvec(next, &nb); - return __bvec_gap_to_prev(q, &pb, nb.bv_offset); + if (!bios_segs_mergeable(q, prev, &pb, &nb)) + return __bvec_gap_to_prev(q, &pb, nb.bv_offset); } return false; -- 2.7.4
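The logic of the patch can be modelled in userspace as follows. This is a simplified sketch under stated assumptions, not the kernel code: the demo_* types and functions are hypothetical, physical addresses are plain integers, and the three checks only approximate what BIOVEC_PHYS_MERGEABLE, BIOVEC_SEG_BOUNDARY and the virt-boundary test in __bvec_gap_to_prev are understood to do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of a bio_vec: where the data sits. */
struct demo_vec {
	uint64_t phys;		/* physical address of the first byte */
	uint32_t offset;	/* offset of the data within its page */
	uint32_t len;		/* length in bytes */
};

/* Hypothetical subset of the queue limits involved in the check. */
struct demo_limits {
	uint64_t virt_boundary_mask;	/* 0 means no virt boundary */
	uint64_t seg_boundary_mask;
	uint32_t max_segment_size;
};

/* Old rule: a gap exists if next starts mid-page or prev ends mid-page. */
static bool demo_gap_to_prev(const struct demo_limits *l,
			     const struct demo_vec *prev, uint32_t next_offset)
{
	if (!l->virt_boundary_mask)
		return false;
	return next_offset != 0 ||
	       ((prev->offset + prev->len) & l->virt_boundary_mask) != 0;
}

/* New rule: can next simply extend the last segment of the previous bio? */
static bool demo_segs_mergeable(const struct demo_limits *l,
				const struct demo_vec *prev,
				uint32_t prev_seg_size,
				const struct demo_vec *next)
{
	if (prev->phys + prev->len != next->phys)
		return false;	/* not physically contiguous */
	if ((prev->phys | l->seg_boundary_mask) !=
	    ((next->phys + next->len - 1) | l->seg_boundary_mask))
		return false;	/* would straddle a segment boundary */
	if (prev_seg_size + next->len > l->max_segment_size)
		return false;	/* merged segment would be too large */
	return true;
}

/* Patched decision: only consult the gap rule if merging is impossible. */
static bool demo_bio_will_gap(const struct demo_limits *l,
			      const struct demo_vec *prev,
			      uint32_t prev_seg_size,
			      const struct demo_vec *next)
{
	if (demo_segs_mergeable(l, prev, prev_seg_size, next))
		return false;
	return demo_gap_to_prev(l, prev, next->offset);
}
```

With a 4 KiB virt boundary, two 512-byte vectors that sit back to back in the same page are flagged as a gap by the old rule alone (the second starts at a non-zero page offset), but not by the patched decision, which matches the small-bio mkfs case described in the changelog.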
Re: [PATCH v1 2/2] firmware: dmi_scan: Pass dmi_entry_point to kexec'ed kernel
On 12/16/16 at 02:18pm, Andy Shevchenko wrote: > On Fri, 2016-12-16 at 10:32 +0800, Dave Young wrote: > > On 12/15/16 at 12:28pm, Jean Delvare wrote: > > > Hi Andy, > > > > > > On Fri, 2 Dec 2016 21:54:16 +0200, Andy Shevchenko wrote: > > > > Until now kexec'ed kernel has no clue where to look for DMI entry > > > > point. > > > > > > > > Pass it via kernel command line parameter in the same way as it's > > > > done for ACPI > > > > RSDP. > > > > > > I am no kexec expert but this confuses me. Shouldn't the second > > > kernel > > > have access to the EFI systab as the first kernel does? It includes > > > many more pointers than just ACPI and DMI tables, and it would seem > > > inconvenient to have to pass all these addresses individually > > > explicitly. > > > > Yes, in modern linux kernel, kexec has the support for EFI, I think it > > should work naturally at least in x86_64. > > Thanks for this good news! > > Unfortunately Intel Galileo is 32-bit platform. Maybe you can try the efi=noruntime kernel parameter in the kexec/kdump kernel and see whether it works. > > -- > Andy Shevchenko > Intel Finland Oy
Re: [PATCH v1 2/2] firmware: dmi_scan: Pass dmi_entry_point to kexec'ed kernel
Ccing efi people. On 12/16/16 at 02:33pm, Jean Delvare wrote: > On Fri, 16 Dec 2016 14:18:58 +0200, Andy Shevchenko wrote: > > On Fri, 2016-12-16 at 10:32 +0800, Dave Young wrote: > > > On 12/15/16 at 12:28pm, Jean Delvare wrote: > > > > I am no kexec expert but this confuses me. Shouldn't the second > > > > kernel have access to the EFI systab as the first kernel does? It > > > > includes many more pointers than just ACPI and DMI tables, and it > > > > would seem inconvenient to have to pass all these addresses > > > > individually explicitly. > > > > > > Yes, in modern linux kernel, kexec has the support for EFI, I think it > > > should work naturally at least in x86_64. > > > > Thanks for this good news! > > > > Unfortunately Intel Galileo is 32-bit platform. > > If it was done for X86_64 then maybe it can be generalized to X86? For X86_64 we have a new way of mapping EFI runtime memory; the i386 code still uses the old ioremap approach. It is impossible to use the same approach as X86_64 there since the virtual address space is limited. Maybe on 32-bit the kexec kernel can run in physical mode, but I'm not sure. I would suggest that Andy first test with efi=noruntime for the kexec'ed 2nd kernel. Thanks Dave > > -- > Jean Delvare > SUSE L3 Support
Re: wl1251 NVS calibration data format
On Saturday 17 December 2016 10:37:05 Sebastian Reichel wrote: > Hi, > > On Fri, Dec 16, 2016 at 12:01:48PM +0100, Pali Rohár wrote: > > Hi! Do you know format of wl1251 NVS calibration data file? > > > > I found that there is tool for changing NVS file for wl1271 and > > newer chips (so not for wl1251!) at: > > https://github.com/gxk/ti-utils > > > > And wl1271 has in NVS data already place for MAC address. And in > > wlcore (for wl1271 and newer) there is really kernel code which is > > doing something with MAC address in NVS, see: > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ti/wlcore/boot.c#n352 > > > > So... I would like to know if in wl1251 NVS calibration file is > > also some place for MAC address or not. > > > > Default wl1251 NVS calibration file is available in linux-firmware: > > https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/tree/ti-connectivity/wl1251-nvs.bin > > Pandora people [0] have a description of the format at [1]. > > [0] https://pandorawiki.org/WiFi > [1] http://notaz.gp2x.de/misc/pnd/wl1251/nvs_map.txt Thank you very very much! I tried to search for something, but did not find anything. That description says something about the STA MAC address:
01a 6d //STA_ADDR_L Register Address. (STA MAC Address)
01b 54 //
01c 00 //STA_ADDR_L Register
01d 00 //
01e 32 //
01f 28 //
020 00 //STA_ADDR_H Register Data.
STA would be an abbreviation for station, so should it really be set to the MAC address of that chip? If yes, that could allow us to set a permanent MAC address at the time of loading and sending the NVS calibration data... exactly the way wl1271 and newer drivers work. I will try to play with the driver to see if it is really true! I already looked into TI's original multiplatform HAL driver for the wl1251 chip (big mess) and found that there is a wl1251 command to read the MAC address from the chip. 
It could be done with this wl1251 function: wl1251_cmd_interrogate(wl, DOT11_STATION_ID, mac, sizeof(*mac)) (the same ID as for setting the permanent MAC address, but used in the opposite direction, to read it) -- Pali Rohár pali.ro...@gmail.com
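As a rough illustration of what "reading the MAC address out of the NVS data" could look like, here is a minimal user-space sketch. It rests on assumptions the thread does not confirm: that the six bytes starting at offset 0x1c (the first STA_ADDR_L data byte in the quoted map) hold the address, and that they are stored in reverse byte order. The function name and the byte order are hypothetical and are not taken from the wl1251 driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the first STA_ADDR_L data byte in the quoted NVS map. */
#define NVS_STA_ADDR_OFF 0x1c
#define MAC_LEN 6

/*
 * Copy a MAC address out of an NVS image.
 *
 * ASSUMPTION (not confirmed by the thread): the six bytes at
 * 0x1c..0x21 hold the address in reverse order, i.e. the last
 * octet of the MAC comes first in the file.
 *
 * Returns 0 on success, -1 if the buffer is missing or too short.
 */
int nvs_read_sta_mac(const uint8_t *nvs, size_t len, uint8_t mac[MAC_LEN])
{
	size_t i;

	if (nvs == NULL || len < NVS_STA_ADDR_OFF + MAC_LEN)
		return -1;

	for (i = 0; i < MAC_LEN; i++)
		mac[i] = nvs[NVS_STA_ADDR_OFF + MAC_LEN - 1 - i];

	return 0;
}
```

If the byte order turns out to be the other way round, only the index expression in the loop changes; the bounds check stays the same.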
Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Michal Hocko wrote: > On Fri 16-12-16 12:31:51, Johannes Weiner wrote: >>> @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int >>> order, >>> */ >>> WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER); >>> >>> + /* >>> +* Help non-failing allocations by giving them access to memory >>> +* reserves but do not use ALLOC_NO_WATERMARKS because this >>> +* could deplete whole memory reserves which would just make >>> +* the situation worse >>> +*/ >>> + page = __alloc_pages_cpuset_fallback(gfp_mask, order, >>> ALLOC_HARDER, ac); >>> + if (page) >>> + goto got_pg; >>> + >> >> But this should be a separate patch, IMO. >> >> Do we observe GFP_NOFS lockups when we don't do this? > > this is hard to tell but considering users like grow_dev_page we can get > stuck with a very slow progress I believe. Those allocations could see > some help. > >> Don't we risk >> premature exhaustion of the memory reserves, and it's better to wait >> for other reclaimers to make some progress instead? > > waiting for other reclaimers would be preferable but we should at least > give these some priority, which is what ALLOC_HARDER should help with. > >> Should we give >> reserve access to all GFP_NOFS allocations, or just the ones from a >> reclaim/cleaning context? > > I would focus only for those which are important enough. Which are those > is a harder question. But certainly those with GFP_NOFAIL are important > enough. > >> All that should go into the changelog of a separate allocation booster >> patch, I think. > > The reason I did both in the same patch is to address the concern about > potential lockups when NOFS|NOFAIL cannot make any progress. I've chosen > ALLOC_HARDER to give the minimum portion of the reserves so that we do > not risk other high priority users to be blocked out but still help a > bit at least and prevent from starvation when other reclaimers are > faster to consume the reclaimed memory. 
> > I can extend the changelog of course but I believe that having both > changes together makes some sense. NOFS|NOFAIL allocations are not all > that rare and sometimes we really depend on them making a further > progress. > I feel that allowing access to memory reserves based on __GFP_NOFAIL might not make sense. My understanding is that the actual I/O operations triggered by I/O requests from filesystem code are processed by other threads. Even if we grant access to memory reserves to GFP_NOFS | __GFP_NOFAIL allocations by fs code, I think it is possible that memory allocations by the underlying bio code fail to make further progress unless they are granted access to memory reserves as well. Below is a typical trace which I observe in an OOM lockup situation (though this trace is from an OOM stress test using XFS).
[ 1845.187246] MemAlloc: kworker/2:1(14498) flags=0x4208060 switches=323636 seq=48 gfp=0x240(GFP_NOIO) order=0 delay=430400 uninterruptible
[ 1845.187248] kworker/2:1 D12712 14498 2 0x0080
[ 1845.187251] Workqueue: events_freezable_power_ disk_events_workfn
[ 1845.187252] Call Trace:
[ 1845.187253] ? __schedule+0x23f/0xba0
[ 1845.187254] schedule+0x38/0x90
[ 1845.187255] schedule_timeout+0x205/0x4a0
[ 1845.187256] ? del_timer_sync+0xd0/0xd0
[ 1845.187257] schedule_timeout_uninterruptible+0x25/0x30
[ 1845.187258] __alloc_pages_nodemask+0x1035/0x10e0
[ 1845.187259] ? alloc_request_struct+0x14/0x20
[ 1845.187261] alloc_pages_current+0x96/0x1b0
[ 1845.187262] ? bio_alloc_bioset+0x20f/0x2e0
[ 1845.187264] bio_copy_kern+0xc4/0x180
[ 1845.187265] blk_rq_map_kern+0x6f/0x120
[ 1845.187268] __scsi_execute.isra.23+0x12f/0x160
[ 1845.187270] scsi_execute_req_flags+0x8f/0x100
[ 1845.187271] sr_check_events+0xba/0x2b0 [sr_mod]
[ 1845.187274] cdrom_check_events+0x13/0x30 [cdrom]
[ 1845.187275] sr_block_check_events+0x25/0x30 [sr_mod]
[ 1845.187276] disk_check_events+0x5b/0x150
[ 1845.187277] disk_events_workfn+0x17/0x20
[ 1845.187278] process_one_work+0x1fc/0x750
[ 1845.187279] ? process_one_work+0x167/0x750
[ 1845.187279] worker_thread+0x126/0x4a0
[ 1845.187280] kthread+0x10a/0x140
[ 1845.187281] ? process_one_work+0x750/0x750
[ 1845.187282] ? kthread_create_on_node+0x60/0x60
[ 1845.187283] ret_from_fork+0x2a/0x40
I think that this GFP_NOIO allocation request needs to consume more memory reserves than the GFP_NOFS allocation request in order to make progress. Do we want to add __GFP_NOFAIL to this GFP_NOIO allocation request as well, so that it gets the same access to memory reserves as a GFP_NOFS | __GFP_NOFAIL allocation request?
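To make the difference between ALLOC_HARDER and ALLOC_NO_WATERMARKS discussed above a bit more concrete, here is a small user-space model of the watermark check. It is a simplified sketch from memory of how the page allocator of that era scaled the min watermark (halved for ALLOC_HIGH, reduced by a further quarter for ALLOC_HARDER), not the kernel's code: the flag values and the function name are made up for illustration, and lowmem reserves, high-order checks and highatomic reserves are left out.

```c
#include <stdbool.h>

/* Illustrative flag values only; the kernel's own definitions differ. */
#define MODEL_ALLOC_HARDER        0x1u
#define MODEL_ALLOC_HIGH          0x2u
#define MODEL_ALLOC_NO_WATERMARKS 0x4u

/*
 * Simplified model of a zone watermark check.
 *
 * free_pages: free pages currently in the zone
 * min:        the zone's min watermark, in pages
 *
 * ALLOC_NO_WATERMARKS skips the check entirely, so such a request can
 * drain the reserves completely. ALLOC_HIGH and ALLOC_HARDER only
 * lower the bar, so part of the reserve always stays untouched.
 */
bool model_watermark_ok(long free_pages, long min, unsigned int flags)
{
	if (flags & MODEL_ALLOC_NO_WATERMARKS)
		return true;

	if (flags & MODEL_ALLOC_HIGH)
		min -= min / 2;
	if (flags & MODEL_ALLOC_HARDER)
		min -= min / 4;

	return free_pages > min;
}
```

With min = 1000 pages, a plain request needs more than 1000 free pages, while an ALLOC_HARDER request already passes with more than 750. That is the "minimum portion of the reserves" trade-off argued about in the thread; whether it is enough for the GFP_NOIO case in the trace is exactly the open question.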
Re: [RFC] minimum gcc version for kernel: raise to gcc-4.3 or 4.6?
On 2016-12-16 23:00:27 [+0100], Arnd Bergmann wrote: > On Friday, December 16, 2016 6:00:43 PM CET Sebastian Andrzej Siewior wrote: > > On 2016-12-16 11:56:21 [+0100], Arnd Bergmann wrote: > > > The original gcc-4.3 release was in early 2008. If we decide to still > > > support that, we probably want the first 10 quirks in this series, > > > while gcc-4.6 (released in 2011) requires none of them. > > > > Is this min gcc thingy ARM only? > > This is part of the question that I'm trying to figure out myself. > > Clearly having the same minimum version across all architectures simplifies > things a lot, because many of the bugs in old versions are architecture > independent. Agreed. > Then again, some architectures implicitly require a new version > because an old one never existed (e.g. arm64 or risc-v), while some other > architectures may require an old version. A new version is understandable. But why would an old version be required? One case is an enterprise distro that is "current" or "supported" and still stuck with gcc 4.1 because that is the version they decided to include in their release. This is sad. But you might want to ask yourself why you want the latest kernel but an old gcc / binutils. If you have an architecture that compiles with gcc v4.1 and not with gcc latest stable / trunk, then it is a sign that this port is not properly supported / not healthy. Another case is something like avr32, which is not part of upstream gcc for some legal reason (that was my understanding a few years ago). That might become a problem for them once large parts of userland switch to a later C++ standard, which needs gcc-5+. > Arnd Sebastian
[PATCH] drivers: remoteproc: constify rproc_ops structures
Declare rproc_ops structures as const as they are only passed as an
argument to the function rproc_alloc. This argument is of type const, so
rproc_ops structures having this property can be declared const too.

Done using Coccinelle:

@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct rproc_ops i@p = {...};

@ok1@
identifier r1.i;
position p;
@@
rproc_alloc(...,&i@p,...)

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const struct rproc_ops i;

File sizes before:
   text	   data	    bss	    dec	    hex	filename
   1258	    416	      0	   1674	    68a	remoteproc/omap_remoteproc.o
   2402	    240	      0	   2642	    a52	remoteproc/st_remoteproc.o
   2064	    272	      0	   2336	    920	remoteproc/st_slim_rproc.o
   2160	    240	      0	   2400	    960	remoteproc/wkup_m3_rproc.o

File sizes after:
   text	   data	    bss	    dec	    hex	filename
   1297	    368	      0	   1665	    681	remoteproc/omap_remoteproc.o
   2434	    192	      0	   2626	    a42	remoteproc/st_remoteproc.o
   2112	    240	      0	   2352	    930	remoteproc/st_slim_rproc.o
   2200	    192	      0	   2392	    958	remoteproc/wkup_m3_rproc.o

Signed-off-by: Bhumika Goyal
---
 drivers/remoteproc/omap_remoteproc.c | 2 +-
 drivers/remoteproc/st_remoteproc.c   | 2 +-
 drivers/remoteproc/st_slim_rproc.c   | 2 +-
 drivers/remoteproc/wkup_m3_rproc.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/omap_remoteproc.c b/drivers/remoteproc/omap_remoteproc.c
index fa63bf2..a96ce90 100644
--- a/drivers/remoteproc/omap_remoteproc.c
+++ b/drivers/remoteproc/omap_remoteproc.c
@@ -177,7 +177,7 @@ static int omap_rproc_stop(struct rproc *rproc)
 	return 0;
 }
 
-static struct rproc_ops omap_rproc_ops = {
+static const struct rproc_ops omap_rproc_ops = {
 	.start = omap_rproc_start,
 	.stop = omap_rproc_stop,
 	.kick = omap_rproc_kick,
diff --git a/drivers/remoteproc/st_remoteproc.c b/drivers/remoteproc/st_remoteproc.c
index da4e152..f21787b 100644
--- a/drivers/remoteproc/st_remoteproc.c
+++ b/drivers/remoteproc/st_remoteproc.c
@@ -107,7 +107,7 @@ static int st_rproc_stop(struct rproc *rproc)
 	return sw_err ?: pwr_err;
 }
 
-static struct rproc_ops st_rproc_ops = {
+static const struct rproc_ops st_rproc_ops = {
 	.start = st_rproc_start,
 	.stop = st_rproc_stop,
 };
diff --git a/drivers/remoteproc/st_slim_rproc.c b/drivers/remoteproc/st_slim_rproc.c
index 507716c..6cfd862 100644
--- a/drivers/remoteproc/st_slim_rproc.c
+++ b/drivers/remoteproc/st_slim_rproc.c
@@ -200,7 +200,7 @@ static void *slim_rproc_da_to_va(struct rproc *rproc, u64 da, int len)
 	return va;
 }
 
-static struct rproc_ops slim_rproc_ops = {
+static const struct rproc_ops slim_rproc_ops = {
 	.start = slim_rproc_start,
 	.stop = slim_rproc_stop,
 	.da_to_va = slim_rproc_da_to_va,
diff --git a/drivers/remoteproc/wkup_m3_rproc.c b/drivers/remoteproc/wkup_m3_rproc.c
index 18175d0..1ada0e5 100644
--- a/drivers/remoteproc/wkup_m3_rproc.c
+++ b/drivers/remoteproc/wkup_m3_rproc.c
@@ -111,7 +111,7 @@ static void *wkup_m3_rproc_da_to_va(struct rproc *rproc, u64 da, int len)
 	return va;
 }
 
-static struct rproc_ops wkup_m3_rproc_ops = {
+static const struct rproc_ops wkup_m3_rproc_ops = {
 	.start = wkup_m3_rproc_start,
 	.stop = wkup_m3_rproc_stop,
 	.da_to_va = wkup_m3_rproc_da_to_va,
-- 
1.9.1
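For readers unfamiliar with the pattern, the following stand-alone sketch shows why the constification in the patch above is safe: an ops table that is only ever handed to a function taking a pointer-to-const can itself be const. All names here (demo_ops, demo_dev, demo_alloc and so on) are hypothetical stand-ins, not the remoteproc API.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rproc_ops: a table of callbacks. */
struct demo_ops {
	int (*start)(void);
	int (*stop)(void);
};

/* Hypothetical stand-in for struct rproc: it only keeps a const pointer. */
struct demo_dev {
	const struct demo_ops *ops;
};

static int demo_start(void)
{
	return 1;
}

static int demo_stop(void)
{
	return 0;
}

/*
 * Because nothing writes to the table after initialisation, it can be
 * const. The compiler may then place it in read-only data rather than
 * in writable .data.
 */
static const struct demo_ops demo_dev_ops = {
	.start = demo_start,
	.stop  = demo_stop,
};

/* Like rproc_alloc() in the patch, this takes a pointer to const ops. */
void demo_alloc(struct demo_dev *dev, const struct demo_ops *ops)
{
	dev->ops = ops;
}

/* Register the const table and call through it. */
int demo_boot(void)
{
	struct demo_dev dev;

	demo_alloc(&dev, &demo_dev_ops);
	return dev.ops->start();
}
```

This also matches the size tables in the patch: the data column shrinks while the text column grows, since the Berkeley-style output of size counts read-only data under text.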
Re: [PATCH 32/60] block: implement sp version of bvec iterator helpers
Hi Guys, On Sat, Oct 29, 2016 at 7:06 PM, kbuild test robot wrote: > Hi Ming, Thanks for the report! > > [auto build test ERROR on linus/master] > [also build test ERROR on v4.9-rc2 next-20161028] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > [Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for > convenience) to record what (public, well-known) commit your patch series was > built on] > [Check https://git-scm.com/docs/git-format-patch for more information] > > url: > https://github.com/0day-ci/linux/commits/Ming-Lei/block-support-multipage-bvec/20161029-163910 > config: sparc-defconfig (attached as .config) > compiler: sparc-linux-gcc (GCC) 6.2.0 > reproduce: > wget > https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross > -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > make.cross ARCH=sparc > > All error/warnings (new ones prefixed by >>): > >In file included from arch/sparc/include/asm/oplib.h:6:0, > from arch/sparc/include/asm/pgtable_32.h:21, > from arch/sparc/include/asm/pgtable.h:6, > from include/linux/mm.h:68, This issue is probably caused by something in the sparc arch code: this patch only adds '#include <linux/mm.h>' to 'include/linux/bvec.h' so that it can use nth_page(). So Cc'ing the sparc list.
Thanks, Ming > from include/linux/bvec.h:25, > from include/linux/blk_types.h:9, > from include/linux/fs.h:31, > from include/linux/proc_fs.h:8, > from arch/sparc/include/asm/prom.h:22, > from include/linux/of.h:232, > from arch/sparc/include/asm/openprom.h:14, > from arch/sparc/include/asm/device.h:9, > from include/linux/device.h:30, > from include/linux/node.h:17, > from include/linux/cpu.h:16, > from include/linux/stop_machine.h:4, > from kernel/sched/sched.h:10, > from kernel/sched/loadavg.c:11: >>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct >>> linux_prom_registers' declared inside parameter list will not be visible >>> outside of this definition or declaration > int prom_startcpu(int cpunode, struct linux_prom_registers *context_table, > ^~~~ >arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct > linux_prom_registers' declared inside parameter list will not be visible > outside of this definition or declaration > void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int > nregs); >^~~~ >arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct > linux_prom_registers' declared inside parameter list will not be visible > outside of this definition or declaration > struct linux_prom_registers *sbusregs, int nregs); > ^~~~ > -- >In file included from arch/sparc/include/asm/oplib.h:6:0, > from arch/sparc/include/asm/pgtable_32.h:21, > from arch/sparc/include/asm/pgtable.h:6, > from include/linux/mm.h:68, > from include/linux/bvec.h:25, > from include/linux/blk_types.h:9, > from include/linux/fs.h:31, > from include/linux/proc_fs.h:8, > from arch/sparc/include/asm/prom.h:22, > from include/linux/of.h:232, > from arch/sparc/include/asm/openprom.h:14, > from arch/sparc/prom/mp.c:12: >>> arch/sparc/include/asm/oplib_32.h:105:39: warning: 'struct >>> linux_prom_registers' declared inside parameter list will not be visible >>> outside of this definition or declaration > int prom_startcpu(int cpunode, struct linux_prom_registers 
*context_table, > ^~~~ >arch/sparc/include/asm/oplib_32.h:168:36: warning: 'struct > linux_prom_registers' declared inside parameter list will not be visible > outside of this definition or declaration > void prom_apply_obio_ranges(struct linux_prom_registers *obioregs, int > nregs); >^~~~ >arch/sparc/include/asm/oplib_32.h:172:18: warning: 'struct > linux_prom_registers' declared inside parameter list will not be visible > outside of this definition or declaration > struct linux_prom_registers *sbusregs, int nregs); > ^~~~ >>> arch/sparc/prom/mp.c:23:1: error: conflicting types for 'prom_startcpu' > prom_startcpu(int cpunode, struct linux_prom_r
[tip:x86/urgent] x86/mpx: Move bd_addr to mm_context_t
Commit-ID: cb02de96ec724b84373488dd349e53897ab432f5
Gitweb: http://git.kernel.org/tip/cb02de96ec724b84373488dd349e53897ab432f5
Author: Mark Rutland
AuthorDate: Fri, 16 Dec 2016 12:40:55 +
Committer: Thomas Gleixner
CommitDate: Sat, 17 Dec 2016 12:29:56 +0100

x86/mpx: Move bd_addr to mm_context_t

Currently bd_addr lives in mm_struct, which is otherwise architecture
independent. Architecture-specific data is supposed to live within
mm_context_t (itself contained in mm_struct).

Other x86-specific context like the pkey accounting data lives in
mm_context_t, and there's no reason the MPX data can't also live there.
So as to keep the arch-specific data together, and to set a good example
for others, this patch moves bd_addr into x86's mm_context_t.

Signed-off-by: Mark Rutland
Acked-by: Dave Hansen
Cc: Andrew Morton
Link: http://lkml.kernel.org/r/1481892055-24596-1-git-send-email-mark.rutl...@arm.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/include/asm/mmu.h |  4 ++++
 arch/x86/include/asm/mpx.h |  4 ++--
 arch/x86/mm/mpx.c          | 10 +++++-----
 include/linux/mm_types.h   |  4 ----
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 72198c6..f9813b6 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -31,6 +31,10 @@ typedef struct {
 	u16 pkey_allocation_map;
 	s16 execute_only_pkey;
 #endif
+#ifdef CONFIG_X86_INTEL_MPX
+	/* address of the bounds directory */
+	void __user *bd_addr;
+#endif
 } mm_context_t;
 
 #ifdef CONFIG_SMP
diff --git a/arch/x86/include/asm/mpx.h b/arch/x86/include/asm/mpx.h
index 7a35495..0b416d4 100644
--- a/arch/x86/include/asm/mpx.h
+++ b/arch/x86/include/asm/mpx.h
@@ -59,7 +59,7 @@ siginfo_t *mpx_generate_siginfo(struct pt_regs *regs);
 int mpx_handle_bd_fault(void);
 static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
 {
-	return (mm->bd_addr != MPX_INVALID_BOUNDS_DIR);
+	return (mm->context.bd_addr != MPX_INVALID_BOUNDS_DIR);
 }
 static inline void mpx_mm_init(struct mm_struct *mm)
 {
@@ -67,7 +67,7 @@ static inline void mpx_mm_init(struct mm_struct *mm)
 	 * NULL is theoretically a valid place to put the bounds
 	 * directory, so point this at an invalid address.
 	 */
-	mm->bd_addr = MPX_INVALID_BOUNDS_DIR;
+	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
 }
 void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long start, unsigned long end);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index e4f8009..324e571 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -350,12 +350,12 @@ int mpx_enable_management(void)
 	 * The copy_xregs_to_kernel() beneath get_xsave_field_ptr() is
 	 * expected to be relatively expensive. Storing the bounds
 	 * directory here means that we do not have to do xsave in the
-	 * unmap path; we can just use mm->bd_addr instead.
+	 * unmap path; we can just use mm->context.bd_addr instead.
 	 */
 	bd_base = mpx_get_bounds_dir();
 	down_write(&mm->mmap_sem);
-	mm->bd_addr = bd_base;
-	if (mm->bd_addr == MPX_INVALID_BOUNDS_DIR)
+	mm->context.bd_addr = bd_base;
+	if (mm->context.bd_addr == MPX_INVALID_BOUNDS_DIR)
 		ret = -ENXIO;
 
 	up_write(&mm->mmap_sem);
@@ -370,7 +370,7 @@ int mpx_disable_management(void)
 		return -ENXIO;
 
 	down_write(&mm->mmap_sem);
-	mm->bd_addr = MPX_INVALID_BOUNDS_DIR;
+	mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
 	up_write(&mm->mmap_sem);
 	return 0;
 }
@@ -947,7 +947,7 @@ static int try_unmap_single_bt(struct mm_struct *mm,
 		end = bta_end_vaddr;
 	}
 
-	bde_vaddr = mm->bd_addr + mpx_get_bd_entry_offset(mm, start);
+	bde_vaddr = mm->context.bd_addr + mpx_get_bd_entry_offset(mm, start);
 	ret = get_bt_addr(mm, bde_vaddr, &bt_addr);
 	/*
 	 * No bounds table there, so nothing to unmap.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4a8aced..ce70ceb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -508,10 +508,6 @@ struct mm_struct {
 	bool tlb_flush_pending;
 #endif
 	struct uprobes_state uprobes_state;
-#ifdef CONFIG_X86_INTEL_MPX
-	/* address of the bounds directory */
-	void __user *bd_addr;
-#endif
 #ifdef CONFIG_HUGETLB_PAGE
 	atomic_long_t hugetlb_usage;
 #endif
RE: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
> On Fri, Dec 16, 2016 at 01:12:21AM +, Li, Liang Z wrote: > > There still exist the case if the MAX_ORDER is configured to a large > > value, e.g. 36 for a system with huge amount of memory, then there is only > 28 bits left for the pfn, which is not enough. > > Not related to the balloon but how would it help to set MAX_ORDER to 36? > My point here is MAX_ORDER may be configured to a big value. > What the MAX_ORDER affects is that you won't be able to ask the kernel > page allocator for contiguous memory bigger than 1<<(MAX_ORDER-1), but > that's a driver issue not relevant to the amount of RAM. Drivers won't > suddenly start to ask the kernel allocator to allocate compound pages at > orders >= 11 just because more RAM was added. > > The higher the MAX_ORDER the slower the kernel runs simply so the smaller > the MAX_ORDER the better. > > > Should we limit the MAX_ORDER? I don't think so. > > We shouldn't strictly depend on MAX_ORDER value but it's mostly limited > already even if configurable at build time. > I didn't know that and will take a look, thanks for your information. Liang > We definitely need it to reach at least the hugepage size, then it's mostly > driver issue, but drivers requiring large contiguous allocations should rely > on > CMA only or vmalloc if they only require it virtually contiguous, and not rely > on larger MAX_ORDER that would slowdown all kernel allocations/freeing.
Re: [PATCH] staging: android: ion: return -ENOMEM in ion_cma_heap allocation failure
2016-12-14 1:04 GMT+09:00 Laura Abbott : > On 12/08/2016 09:05 PM, Jaewon Kim wrote: >> Initial Commit 349c9e138551 ("gpu: ion: add CMA heap") returns -1 in >> allocation >> failure. The returned value is passed up to userspace through ioctl. So user >> can >> misunderstand error reason as -EPERM(1) rather than -ENOMEM(12). >> >> This patch simply changed this to return -ENOMEM. >> >> Signed-off-by: Jaewon Kim >> --- >> drivers/staging/android/ion/ion_cma_heap.c | 6 ++ >> 1 file changed, 2 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/staging/android/ion/ion_cma_heap.c >> b/drivers/staging/android/ion/ion_cma_heap.c >> index 6c7de74..22b9582 100644 >> --- a/drivers/staging/android/ion/ion_cma_heap.c >> +++ b/drivers/staging/android/ion/ion_cma_heap.c >> @@ -24,8 +24,6 @@ >> #include "ion.h" >> #include "ion_priv.h" >> >> -#define ION_CMA_ALLOCATE_FAILED -1 >> - >> struct ion_cma_heap { >> struct ion_heap heap; >> struct device *dev; >> @@ -59,7 +57,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct >> ion_buffer *buffer, >> >> info = kzalloc(sizeof(struct ion_cma_buffer_info), GFP_KERNEL); >> if (!info) >> - return ION_CMA_ALLOCATE_FAILED; >> + return -ENOMEM; >> >> info->cpu_addr = dma_alloc_coherent(dev, len, &(info->handle), >> GFP_HIGHUSER | __GFP_ZERO); >> @@ -88,7 +86,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct >> ion_buffer *buffer, >> dma_free_coherent(dev, len, info->cpu_addr, info->handle); >> err: >> kfree(info); >> - return ION_CMA_ALLOCATE_FAILED; >> + return -ENOMEM; >> } >> >> static void ion_cma_free(struct ion_buffer *buffer) >> > > Happy to see cleanup > > Acked-by: Laura Abbott Thank you, Laura Abbott. I'm honored to get an Ack from you. I have looked at many of your patches. I hope this patch will be mainlined.
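The confusion described in the changelog above comes from the syscall return convention: a negative value returned by the kernel ends up in userspace as errno, so a bare -1 is indistinguishable from -EPERM. The following sketch only models that mapping in user space; the helper name is hypothetical, and the numeric values are the Linux ones quoted in the patch description.

```c
#include <errno.h>

/*
 * Model of how a negative in-kernel return value is seen by userspace:
 * the ioctl() wrapper returns -1 and stores the positive error number
 * in errno. Returns 0 for non-negative (successful) results.
 */
int errno_from_kernel_ret(long ret)
{
	return ret < 0 ? (int)-ret : 0;
}
```

A caller that checks errno after the failing ioctl therefore sees EPERM (1) for the old `return -1` and ENOMEM (12) after the patch, which is why returning a proper error code gives userspace an accurate failure reason.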
Re: wl1251 NVS calibration data format
Hi, On Sat, Dec 17, 2016 at 12:14:50PM +0100, Pali Rohár wrote: > On Saturday 17 December 2016 10:37:05 Sebastian Reichel wrote: > > On Fri, Dec 16, 2016 at 12:01:48PM +0100, Pali Rohár wrote: > > > Hi! Do you know format of wl1251 NVS calibration data file? > > > > > > I found that there is tool for changing NVS file for wl1271 and > > > newer chips (so not for wl1251!) at: > > > https://github.com/gxk/ti-utils > > > > > > And wl1271 has in NVS data already place for MAC address. And in > > > wlcore (for wl1271 and newer) there is really kernel code which is > > > doing something with MAC address in NVS, see: > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tre > > > e/drivers/net/wireless/ti/wlcore/boot.c#n352 > > > > > > So... I would like to know if in wl1251 NVS calibration file is > > > also some place for MAC address or not. > > > > > > Default wl1251 NVS calibration file is available in linux-firmware: > > > https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmwar > > > e.git/tree/ti-connectivity/wl1251-nvs.bin > > > > Pandora people [0] have a description of the format at [1]. > > > > [0] https://pandorawiki.org/WiFi > > [1] http://notaz.gp2x.de/misc/pnd/wl1251/nvs_map.txt > > Thank you very very much! You are welcome. > I tried to search for something, but I have not find anything. > In that description is something about STA mac address: > > 01a 6d //STA_ADDR_L Register Address. (STA MAC Address) > 01b 54 // > 01c 00 //STA_ADDR_L Register > 01d 00 // > 01e 32 // > 01f 28 // > 020 00 //STA_ADDR_H Register Data. > > STA would be abbreviation for station and so it should be really set to > mac address of that chip? Yes, STA is a common abbreviation: https://en.wikipedia.org/wiki/Station_(networking) > If yes, that could allow us to set permanent MAC address at time when > loading & sending NVS calibration data... Exactly same as wl1271 and new > drivers are working. > > I will try to play with driver if it is really truth! 
Thanks for your work. > I already looked into original TI's multiplatform HAL driver for wl1251 > chip (big mess) and found there that there is wl1251 command to read mac > address from chip. It could be done by this wl1251 function: > > wl1251_cmd_interrogate(wl, DOT11_STATION_ID, mac, sizeof(*mac)) > > (same id as for setting permanent mac address, but opposite to read it) -- Sebastian
Re: netfilter regression causes lost pings "operation not permitted"
Trevor Cordes wrote: Sorry for the late reply. > On 2016-12-07 Trevor Cordes wrote: > > Bisected down to: > > 870190a9ec9075205c0fa795a09fa931694a3ff1 > > 7c9664351980aaa6a4b8837a314360b3a4ad382a > > Oh! I forgot to mention the most important point: iptable_nat module > MUST be loaded for the bug to show up! > > modprobe iptable_nat > > If you rmmod it, the bug goes away. Interestingly, the bug occurs even > if you have every iptables table (including -t nat) completely empty > (no rules). All that is required is iptable_nat simply to be loaded. Pablo, I think stable should revert both patches. The alternative is for stable to pick up the fixes from the 4.10 tree, but that would require pulling in rhashtable's new rhlist interface too... So I think a revert is the way to go. Should I take care of that?
RE: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
> Subject: Re: [Qemu-devel] [PATCH kernel v5 0/5] Extend virtio-balloon for > fast (de)inflating & fast live migration > > On Thu, Dec 15, 2016 at 05:40:45PM -0800, Dave Hansen wrote: > > On 12/15/2016 05:38 PM, Li, Liang Z wrote: > > > > > > Use 52 bits for 'pfn', 12 bits for 'length', when the 12 bits is not long > enough for the 'length' > > > Set the 'length' to a special value to indicate the "actual length in > > > next 8 > bytes". > > > > > > That will be much more simple. Right? > > > > Sounds fine to me. > > > > Sounds fine to me too indeed. > > I'm only wondering what is the major point for compressing gpfn+len in > 8 bytes in the common case, you already use sg_init_table to send down two > pages, we could send three as well and avoid all math and bit shifts and ors, > or not? > Yes, we can use more pages for that. > I agree with the above because from a performance prospective I tend to > think the above proposal will run at least theoretically faster because the > other way is to waste double amount of CPU cache, and bit mangling in the > encoding and the later decoding on qemu side should be faster than > accessing an array of double size, but then I'm not sure if it's measurable > optimization. So I'd be curious to know the exact motivation and if it is to > reduce the CPU cache usage or if there's some other fundamental reason to > compress it. > The header already tells qemu how big is the array payload, couldn't we just > add more pages if one isn't enough? > The original intention of compressing the PFN and length was to reduce the memory required. Even though the code has changed a lot from the previous versions, I think this is still true. Currently we allocate a buffer of a specified size to hold the 'PFN|length' entries; when that buffer is not big enough to hold all the page info for a specified order, a buffer of double the size is allocated. This is what we want to avoid, because the allocation may fail and allocation takes some time. For fast live migration, time is a critical factor we have to consider: more time taken means more unnecessary pages are sent, because live migration starts before the request for unused pages gets a response. Thanks Liang > Thanks, > Andrea
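The 52-bit PFN plus 12-bit length encoding proposed in the quoted text can be sketched as follows. This is an illustration only, not the virtio-balloon code: the function names are made up, the length is assumed to be counted in pages, and the "special value" that signals an extended length is assumed to be 0, which the thread does not specify.

```c
#include <stddef.h>
#include <stdint.h>

#define RANGE_LEN_BITS 12
#define RANGE_LEN_MASK ((UINT64_C(1) << RANGE_LEN_BITS) - 1)
/* ASSUMPTION: a length field of 0 means "real length is in the next word". */
#define RANGE_LEN_EXTENDED UINT64_C(0)

/*
 * Encode one (pfn, len) range. The caller must make sure pfn fits in
 * 52 bits. Returns the number of 64-bit words written: 1 when the
 * length fits in 12 bits, 2 when it has to spill into a second word.
 */
size_t range_encode(uint64_t pfn, uint64_t len, uint64_t out[2])
{
	if (len != RANGE_LEN_EXTENDED && len <= RANGE_LEN_MASK) {
		out[0] = (pfn << RANGE_LEN_BITS) | len;
		return 1;
	}

	out[0] = (pfn << RANGE_LEN_BITS) | RANGE_LEN_EXTENDED;
	out[1] = len;
	return 2;
}

/*
 * Decode one range starting at in[0]. Returns the number of 64-bit
 * words consumed (1 or 2), so the caller can advance its cursor.
 */
size_t range_decode(const uint64_t *in, uint64_t *pfn, uint64_t *len)
{
	*pfn = in[0] >> RANGE_LEN_BITS;
	*len = in[0] & RANGE_LEN_MASK;
	if (*len != RANGE_LEN_EXTENDED)
		return 1;

	*len = in[1];
	return 2;
}
```

In the common case each range costs 8 bytes instead of 16, which is the memory saving Liang describes; only ranges longer than 4095 pages pay for the second word.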
Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
BTW, here's some SipHash code I wrote for Linux a while ago. My target application was ext4 directory hashing, resulting in different implementation choices, although I still think that a rolled-up implementation like this is reasonable. Reducing I-cache impact speeds up the calling code.

One thing I'd like to suggest you steal is the way it handles the fetch of the final partial word. It's a lot smaller and faster than an 8-way case statement.

#include	/* For rol64 */
#include
#include
#include

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
	(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
	 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few percent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *             i386     i386     i386   x86_64   x86_64   x86_64   x86_64
 * Length     small   unroll  halfmd4    small   unroll  halfmd4  teahash
 * 1..4        1069     1029     1608      195      160      399      690
 * 1..8        2483     2381     3851      410      360      988     1659
 * 1..12       4303     4152     6207      690      618     1642     2690
 * 1..16       6122     5931     8668      968      876     2363     3786
 * 1..20       8348     8137    11245     1323     1185     3162     5567
 * 1..24      10580    10327    13935     1657     1504     4066     7635
 * 1..28      13211    12956    16803     2069     1871     5028     9759
 * 1..32      15843    15572    19725     2470     2260     6084    11932
 * 1..36      18864    18609    24259     2934     2678     7566    14794
 * 1..1024  5890194  6130242 10264816   881933   881244  3617392  7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
	uint64_t a = 0x736f6d6570736575;	/* somepseu */
	uint64_t b = 0x646f72616e646f6d;	/* dorandom */
	uint64_t c = 0x6c7967656e657261;	/* lygenera */
	uint64_t d = 0x7465646279746573;	/* tedbytes */
	uint64_t m = 0;
	uint8_t padbyte = len;

	/*
	 * Mix in the 128-bit hash seed.  This is in a format convenient
	 * to the ext3/ext4 code.  Please feel free to adapt the
	 *
	 */
	if (seed) {
		m = seed[2] | (uint64_t)seed[3] << 32;
		b ^= m;
		d ^= m;
		m = seed[0] | (uint64_t)seed[1] << 32;
		/* a ^= m; is done in loop below */
		c ^= m;
	}

	/*
	 * By using the same SipRound code for all iterations, we
	 * save space, at the expense of some branch prediction.  But
	 * branch prediction is hard because of variable length anyway.
	 */
	len = len/8 + 3;	/* Now number of rounds to perform */
	do {
		a ^= m;

		switch (--len) {
			unsigned bytes;

		default:	/* Full words */
			d ^= m = get_unaligned_le64(in);
			in += 8;
			break;
		case 2:		/* Final partial word */
			/*
			 * We'd like to do one 64-bit fetch rather than
			 * mess around with bytes, but reading past the end
			 * might hit a protection boundary.  Fortunately,
			 * we know that protection boundaries are aligned,
			 * so we can consider only three cases:
			 * - The remainder occupies zero words
			 * - The remainder fits into one word
			 * - The remainder straddles two words
			 */
			bytes = padbyte & 7;

			if (bytes == 0) {
				m = 0;
			} else {
				unsigned offset = (unsigned)(uintptr_t)in & 7;

				if (offset + bytes <= 8) {
					m = le64_to_cpup((uint64_t const *)
								(in - offset));
					m >>= 8*offset;
				} else {
					m = get_unaligned_le64(in);
				}
				m &= ((uint64_t)1 << 8*
Re: [PATCH 1/2] dt-bindings: usb: add DT binding for s3c2410 USB device controller
On Tue, Dec 13, 2016 at 12:59:15PM -0600, Rob Herring wrote: > > +Samsung S3C2410 and compatible USB device controller > > + > > +Required properties: > > + - compatible: Should be one of the following > > + "samsung,s3c2410-udc" > > + "samsung,s3c2440-udc" > > + - reg: address and length of the controller memory mapped region > > + - interrupts: interrupt number for the USB device controller > > + - clocks: Should reference the bus and host clocks > > + - clock-names: Should contain two strings > > + "usb-bus-gadget" for the USB bus clock > > Pretty sure the h/w clock name in the datasheet does not use the Linux > term gadget. You are right. The datasheet calls it UCLK. In the S3C2410 clock driver (clk-s3c2410.c) there is a clock alias for UCLK called "usb-bus-gadget", which is what the USB device controller's driver has been using. We can change the driver and the DT binding to use "uclk" to better reflect the name used in the datasheet. What do you think? > > > + "usb-device" for the USB device clock > > + > > +Optional properties: > > + - samsung,vbus-gpio: If present, specifies a gpio that needs to be > > + activated for the bus to be powered. > > Isn't it the host side that controls Vbus? Yes. I'll change the description to "specifies a gpio that allows detecting whether vbus is present (USB is connected)." > > > + - samsung,pullup-gpio: If present, specifies a gpio to control the > > Both GPIOs need to specify the active state. OK. > > > + USB D+ pullup. > > + > > +usb1: udc@5200 { > > + compatible = "samsung,s3c2440-udc"; > > + reg = <0x5200 0x10>; > > + interrupts = <0 0 25 3>; > > + clocks = <&clocks UCLK>, <&clocks HCLK_USBD>; > > + clock-names = "usb-bus-gadget", "usb-device"; > > + samsung,pullup-gpio = <&gpc 5 GPIO_ACTIVE_HIGH>; > > +}; > > -- > > 1.9.1 > > Best regards, -- Sergio Prado Embedded Labworks
Re: OOM: Better, but still there on
On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote: > On Fri 16-12-16 19:47:00, Nils Holland wrote: > > > > Dec 16 18:56:24 boerne.fritz.box kernel: Purging GPU memory, 37 pages > > freed, 10219 pages still pinned. > > Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd invoked oom-killer: > > gfp_mask=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), > > nodemask=0, order=1, oom_score_adj=0 > > Dec 16 18:56:29 boerne.fritz.box kernel: kthreadd cpuset=/ mems_allowed=0 > [...] > > Dec 16 18:56:29 boerne.fritz.box kernel: Normal free:41008kB min:41100kB > > low:51372kB high:61644kB active_anon:0kB inactive_anon:0kB > > active_file:470556kB inactive_file:148kB unevictable:0kB > > writepending:1616kB present:897016kB managed:831480kB mlocked:0kB > > slab_reclaimable:213172kB slab_unreclaimable:86236kB kernel_stack:1864kB > > pagetables:3572kB bounce:0kB free_pcp:532kB local_pcp:456kB free_cma:0kB > > this is a GFP_KERNEL allocation so it cannot use the highmem zone again. > There is no anonymous memory in this zone but the allocation > context implies the full reclaim context so the file LRU should be > reclaimable. For some reason ~470MB of the active file LRU is still > there. This is quite unexpected. It is harder to tell more without > further data. It would be great if you could enable reclaim related > tracepoints: > > mount -t tracefs none /debug/trace > echo 1 > /debug/trace/events/vmscan/enable > cat /debug/trace/trace_pipe > trace.log > > should help > [...] No problem! I enabled writing the trace data to a file and then tried to trigger another OOM situation. That worked, this time without a complete kernel panic, but with only my processes being killed and the system becoming unresponsive. When that happened, I let it run for another minute or two so that in case it was still logging something to the trace file, it could continue to do so some time longer. Then I rebooted with the only thing that still worked, i.e. 
by means of magic SysRequest. The trace file has actually become rather big (around 21 MB). I didn't dare to cut anything from it because I didn't want to risk deleting something that might turn out important. So, due to the size, I'm not attaching the trace file to this message, but it's up compressed (about 536 KB) to be grabbed at: http://ftp.tisys.org/pub/misc/trace.log.xz For reference, here's the OOM report that goes along with this incident and the trace file: Dec 17 13:31:06 boerne.fritz.box kernel: Purging GPU memory, 145 pages freed, 10287 pages still pinned. Dec 17 13:31:07 boerne.fritz.box kernel: awesome invoked oom-killer: gfp_mask=0x25000c0(GFP_KERNEL_ACCOUNT), nodemask=0, order=0, oom_score_adj=0 Dec 17 13:31:07 boerne.fritz.box kernel: awesome cpuset=/ mems_allowed=0 Dec 17 13:31:07 boerne.fritz.box kernel: CPU: 1 PID: 5599 Comm: awesome Not tainted 4.9.0-gentoo #3 Dec 17 13:31:07 boerne.fritz.box kernel: Hardware name: TOSHIBA Satellite L500/KSWAA, BIOS V1.80 10/28/2009 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c18 Dec 17 13:31:07 boerne.fritz.box kernel: c1433406 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37d48 Dec 17 13:31:07 boerne.fritz.box kernel: c5319280 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c48 Dec 17 13:31:07 boerne.fritz.box kernel: c1170011 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c9c Dec 17 13:31:07 boerne.fritz.box kernel: 00200286 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c48 Dec 17 13:31:07 boerne.fritz.box kernel: c1438fff Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c4c Dec 17 13:31:07 boerne.fritz.box kernel: c72479c0 Dec 17 13:31:07 boerne.fritz.box kernel: c60dd200 Dec 17 13:31:07 boerne.fritz.box kernel: c5319280 Dec 17 13:31:07 boerne.fritz.box kernel: c1ad1899 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37d48 Dec 17 13:31:07 boerne.fritz.box kernel: c5a37c8c Dec 17 13:31:07 boerne.fritz.box kernel: c1114407 Dec 17 13:31:07 boerne.fritz.box kernel: c10513a5 Dec 17 13:31:07 boerne.fritz.box kernel: 
c5a37c78 Dec 17 13:31:07 boerne.fritz.box kernel: c11140a1 Dec 17 13:31:07 boerne.fritz.box kernel: 0005 Dec 17 13:31:07 boerne.fritz.box kernel: Dec 17 13:31:07 boerne.fritz.box kernel: Dec 17 13:31:07 boerne.fritz.box kernel: Call Trace: Dec 17 13:31:07 boerne.fritz.box kernel: [] dump_stack+0x47/0x61 Dec 17 13:31:07 boerne.fritz.box kernel: [] dump_header+0x5f/0x175 Dec 17 13:31:07 boerne.fritz.box kernel: [] ? ___ratelimit+0x7f/0xe0 Dec 17 13:31:07 boerne.fritz.box kernel: [] oom_kill_process+0x207/0x3c0 Dec 17 13:31:07 boerne.fritz.box kernel: [] ? has_capability_noaudit+0x15/0x20 Dec 17 13:31:07 boerne.fritz.box kernel: [] ? oom_badness.part.13+0xb1/0x120 Dec 17 13:31:07 boerne.fritz.box kernel: [] out_of_memory+0xd4/0x270 Dec 17 13:31:07 boerne.fritz.box kernel: [] __alloc_pages_nodemask+0xcf5/0xd60 Dec 17 13:31:07 boerne.fritz.box kernel: [] ? skb_queue_purge+0x30/0x30 Dec 17 13:31:07 boerne.fritz.box kernel: [] alloc_skb_with_fr
MAC address in wl1251 NVS data (Was: Re: wl1251 NVS calibration data format)
> On Sat, Dec 17, 2016 at 12:14:50PM +0100, Pali Rohár wrote:
> > > [1] http://notaz.gp2x.de/misc/pnd/wl1251/nvs_map.txt
> >
> > In that description is something about STA mac address:
> >
> > 019 02 //length
> > 01a 6d //STA_ADDR_L Register Address. (STA MAC Address)
> > 01b 54 //
> > 01c 00 //STA_ADDR_L Register
> > 01d 00 //
> > 01e 32 //
> > 01f 28 //
> > 020 00 //STA_ADDR_H Register Data.
> > 021 08 //
> > 022 00 //
> > 023 00 //

So... above data means:

019 - number of words
01a - low bits of offset applied with mask 0xfe
01b - high bits of offset
01c-01f first word
020-023 second word

Interpreted as: at address offset 0x546c are written two 32-bit
little-endian words, 0x28320000 and 0x00000800

wl1271 driver has in linux/drivers/net/wireless/ti/wlcore/boot.c this:

/* update current MAC address to NVS */
nvs_ptr[11] = wl->addresses[0].addr[0];
nvs_ptr[10] = wl->addresses[0].addr[1];
nvs_ptr[6] = wl->addresses[0].addr[2];
nvs_ptr[5] = wl->addresses[0].addr[3];
nvs_ptr[4] = wl->addresses[0].addr[4];
nvs_ptr[3] = wl->addresses[0].addr[5];

Looking at wl1271-nvs.bin file (which is "modified" in kernel by boot.c)

000: 01
001: 6d
002: 54
003: 00
004: 00
005: ef
006: be

Means: at address offset 0x546c is written one word 0xBEEF0000

007: 01
008: 71
009: 54
00a: ad
00b: de
00c: 00
00d: 00

Means: at address offset 0x5470 is written one word 0x0000DEAD

The boot.c kernel code above overwrites this data with the MAC address,
so the four low bytes of the MAC address are written to address offset
0x546c and the remaining two bytes to 0x5470. So DE:AD:BE:EF:00:00

So conclusion: the address offset for wl1271 (where the MAC address is
written) is exactly the same as for wl1251, which that documentation
marks as the STA_ADDR_L Register.

Btw, in our wl1251-nvs.bin found in Maemo rootfs, which is exactly the
same as in the linux-firmware.git tree, there are those data:

019: 02
01a: 6d
01b: 54
01c: 09
01d: 03
01e: 07
01f: 20
020: 00
021: 00
022: 00
023: 00

So hardcoded MAC address in wl1251-nvs.bin is: 00:00:20:07:03:09.
Which is assigned to DIAB.
Strange that it is not TI...

-- 
Pali Rohár
pali.ro...@gmail.com
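For anyone who wants to check NVS images by hand, here is a minimal user-space sketch of the two decoding rules discussed in the mail above. The helper names nvs_read_mac() and nvs_record_offset() are made up for illustration and are not driver functions. The byte indices are taken from the boot.c snippet quoted in the mail, and the offset rule follows the "low bits masked with 0xfe, then high bits" description. It assumes the wl1271-style layout (two one-word records at the start of the file).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest index touched by the boot.c snippet is 11, so 12 bytes are needed. */
#define NVS_MAC_MIN_LEN 12

/*
 * Hypothetical helper: read the MAC address back out of a wl1271-style
 * NVS image by inverting the assignments quoted from boot.c.
 */
static inline void nvs_read_mac(const uint8_t *nvs, size_t len, uint8_t mac[6])
{
    assert(len >= NVS_MAC_MIN_LEN);
    mac[0] = nvs[11];
    mac[1] = nvs[10];
    mac[2] = nvs[6];
    mac[3] = nvs[5];
    mac[4] = nvs[4];
    mac[5] = nvs[3];
}

/*
 * Hypothetical helper: destination offset of one record. rec[0] is the
 * word count, rec[1] holds the low bits (masked with 0xfe) and rec[2]
 * the high bits.
 */
static inline uint16_t nvs_record_offset(const uint8_t *rec)
{
    return (uint16_t)((rec[1] & 0xfe) | (rec[2] << 8));
}
```

Feeding it the first 14 bytes of wl1271-nvs.bin listed above is a quick way to cross-check the manual decoding.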
[PATCH] livepatch: fixup klp-convert tool integration
I've found some minor problems, this patch fixes: * save cmd_ld_ko_o into .module.cmd, if_changed_rule doesn't do that * fix bashisms for debian where /bin/sh is a symlink to /bin/dash * rename rule_link_module to rule_ld_ko_o, otherwise arg-check inside if_changed_rule compares cmd_link_module and cmd_ld_ko_o * use HOSTLOADLIBES_$module instead of HOSTLDFLAGS: -lelf must be at the end * check modinfo -F livepatch only if CONFIG_LIVEPATCH is true I think "modinfo -F" could be replaced with explicit mark in makefile, for example: LIVEPATCH_module.ko := y (like KASAN_SANITIZE_obj.o := n). Signed-off-by: Konstantin Khlebnikov --- scripts/Kbuild.include |4 +++- scripts/Makefile.modpost | 24 +++- scripts/livepatch/Makefile |2 +- 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include index 179219845dfc..e299fde3423b 100644 --- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -247,6 +247,8 @@ endif # (needed for the shell) make-cmd = $(call escsq,$(subst \#,\\\#,$(subst $$,,$(cmd_$(1) +save-cmd = printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd + # Find any prerequisites that is newer than target or that does not exist. # PHONY targets skipped in both cases. any-prereq = $(filter-out $(PHONY),$?) $(filter-out $(PHONY) $(wildcard $^),$^) @@ -256,7 +258,7 @@ any-prereq = $(filter-out $(PHONY),$?) $(filter-out $(PHONY) $(wildcard $^),$^) if_changed = $(if $(strip $(any-prereq) $(arg-check)), \ @set -e; \ $(echo-cmd) $(cmd_$(1)); \ - printf '%s\n' 'cmd_$@ := $(make-cmd)' > $(dot-target).cmd, @:) + $(save-cmd), @:) # Execute the command and also postprocess generated .d dependencies file. 
if_changed_dep = $(if $(strip $(any-prereq) $(arg-check) ), \ diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost index 916dd347e8f6..5d149d0b05c2 100644 --- a/scripts/Makefile.modpost +++ b/scripts/Makefile.modpost @@ -123,24 +123,22 @@ quiet_cmd_ld_ko_o = LD [M] $@ $(LD) -r $(LDFLAGS) \ $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \ -o $@ $(filter-out FORCE,$^) ; \ - $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true) ; + $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true) -ifdef CONFIG_LIVEPATCH KLP_CONVERT = scripts/livepatch/klp-convert - cmd_klp_convert = \ - if [[ -n "`modinfo -F livepatch $@`" ]]; then \ - mv $@ $(@:.ko=.klp.o); \ - $(KLP_CONVERT) $(@:.ko=.klp.o) $@; \ - fi ; -endif - -define rule_link_module - $(call echo-cmd,ld_ko_o) $(cmd_ld_ko_o) \ - $(cmd_klp_convert) +quiet_cmd_klp_convert = LIVEPATCH $@ + cmd_klp_convert = mv $@ $(@:.ko=.klp.o); $(KLP_CONVERT) $(@:.ko=.klp.o) $@ + +define rule_ld_ko_o + $(call echo-cmd,ld_ko_o) $(cmd_ld_ko_o) ;\ + $(call save-cmd,ld_ko_o) ; \ + $(if $(CONFIG_LIVEPATCH),\ + if [ -n "`modinfo -F livepatch $@`" ] ; then \ + $(call echo-cmd,klp_convert) $(cmd_klp_convert) ; fi) endef $(modules): %.ko :%.o %.mod.o FORCE - +$(call if_changed_rule,link_module) + +$(call if_changed_rule,ld_ko_o) targets += $(modules) diff --git a/scripts/livepatch/Makefile b/scripts/livepatch/Makefile index 221829bb34c7..bd5c1ae553ab 100644 --- a/scripts/livepatch/Makefile +++ b/scripts/livepatch/Makefile @@ -4,4 +4,4 @@ always := $(hostprogs-y) klp-convert-objs := klp-convert.o elf.o HOSTCFLAGS := -g -I$(INSTALL_HDR_PATH)/include -Wall -HOSTLDFLAGS:= -lelf +HOSTLOADLIBES_klp-convert := -lelf
RE: [RFC 00/10] implement alternative and much simpler id allocator
From: Matthew Wilcox
> From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> > This sounds good. I think there may still be a lot of users that never
> > allocate more than a handful of IDAs, making a 128 byte allocation still
> > somewhat excessive. One thing I considered was (exactly as it's done for
> > file descriptor tables) to embed a single word in the struct ida and
> > use that initially; I haven't looked closely at newIDA, so I don't know
> > how easy that would be or if it's worth the complexity.
>
> Heh, I was thinking about that too. The radix tree supports "exceptional
> entries" which have the bottom bit set. On a 64-bit machine, we could use 62
> of the bits in the radix tree root to store the ID bitmap. I'm a little wary
> of the potential complexity, but we should try it out.

Test patch here:
http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/idr-2016-12-16

It passes the test suite ... which I actually had to adjust because it now
succeeds in cases where it hadn't (allocating ID 0 without preallocating),
and it will now fail in cases where it hadn't previously (assuming a single
preallocation would be enough). There shouldn't be any examples of that in
the kernel proper; it was simply me being lazy when I wrote the test suite.
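To make the "62 bits in the root" idea concrete, here is a rough user-space sketch, not the code from the test patch. Everything in it is an assumption for illustration: the struct tiny_ida type, the tag value and the choice of reserving the two low bits of a 64-bit word as the marker, leaving 62 bits as the ID bitmap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: low 2 bits are a tag, the upper 62 bits are the ID bitmap. */
#define INLINE_TAG   0x2ULL  /* hypothetical "this word is a bitmap" marker */
#define INLINE_SHIFT 2
#define INLINE_BITS  62

struct tiny_ida {
    uint64_t root;  /* stands in for the radix tree root slot */
};

static inline void tiny_ida_init(struct tiny_ida *ida)
{
    ida->root = INLINE_TAG;
}

/* Returns the lowest free ID in [0, 62), or -1 when the inline word is full. */
static inline int tiny_ida_alloc(struct tiny_ida *ida)
{
    assert((ida->root & 0x3) == INLINE_TAG);
    for (int id = 0; id < INLINE_BITS; id++) {
        uint64_t bit = 1ULL << (id + INLINE_SHIFT);

        if (!(ida->root & bit)) {
            ida->root |= bit;
            return id;
        }
    }
    return -1;  /* a real allocator would fall back to a bitmap node here */
}

static inline void tiny_ida_free(struct tiny_ida *ida, int id)
{
    assert(id >= 0 && id < INLINE_BITS);
    ida->root &= ~(1ULL << (id + INLINE_SHIFT));
}
```

The point of the sketch is only that the first 62 IDs need no memory allocation at all, which is why allocating ID 0 without preallocating can now succeed.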
Re: [PATCH v2 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context
On Fri, Dec 16, 2016 at 02:17:25PM +0100, Nicolai Hähnle wrote:
> On 06.12.2016 16:25, Peter Zijlstra wrote:
> >On Thu, Dec 01, 2016 at 03:06:47PM +0100, Nicolai Hähnle wrote:
> >
> >>@@ -640,10 +640,11 @@ __mutex_lock_common(struct mutex *lock, long state,
> >>unsigned int subclass,
> >>struct mutex_waiter waiter;
> >>unsigned long flags;
> >>bool first = false;
> >>- struct ww_mutex *ww;
> >>int ret;
> >>
> >>- if (use_ww_ctx) {
> >>+ if (use_ww_ctx && ww_ctx) {
> >>+ struct ww_mutex *ww;
> >>+
> >>ww = container_of(lock, struct ww_mutex, base);
> >>if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
> >>return -EALREADY;
> >
> >So I don't see the point of removing *ww from the function scope, we can
> >still compute that container_of() even if !ww_ctx, right? That would
> >save a ton of churn below, adding all those struct ww_mutex declarations
> >and container_of() casts.
> >
> >(and note that the container_of() is a fancy NO-OP because base is the
> >first member).
>
> Sorry for taking so long to get back to you.
>
> In my experience, the undefined behavior sanitizer in GCC for userspace
> programs complains about merely casting a pointer to the wrong type. I never
> went into the standards rabbit hole to figure out the details. It might be a
> C++ only thing (ubsan cannot tell the difference otherwise anyway), but that
> was the reason for doing the change in this more complicated way.

Note that C only has what C++ calls reinterpret_cast<>(). It cannot
complain about a 'wrong' cast, there is no such thing.

Also, container_of() works, irrespective of what C language says about
it -- note that the kernel in general hard relies on a lot of things C
calls undefined behaviour.

> Are you sure that this is defined behavior in C? If so, I'd be happy to go
> with the version that has less churn.

It should very much work with kernel C.
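The "fancy NO-OP" remark can be checked in user space with a small sketch. The container_of() below is a simplified stand-in for the kernel macro (no type checking), and the two structs are stripped-down placeholders that only keep the property that matters here: base is the first member. The helper name to_ww_mutex() is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space version of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Placeholder types; only the layout matters for this illustration. */
struct mutex {
    int owner;
};

struct ww_mutex {
    struct mutex base;  /* first member, so its offset is 0 */
    void *ctx;
};

/* The subtraction inside container_of() is by zero for the first member. */
static_assert(offsetof(struct ww_mutex, base) == 0,
              "base must be the first member");

/* Hypothetical helper: maps a struct mutex back to its enclosing ww_mutex. */
static inline struct ww_mutex *to_ww_mutex(struct mutex *lock)
{
    return container_of(lock, struct ww_mutex, base);
}
```

With offsetof() equal to zero the conversion returns the same address it was given, only with a different pointer type.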
Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
On Fri, Dec 16, 2016 at 03:31:40PM -0800, Darren Hart wrote: > On Tue, Dec 13, 2016 at 09:36:38AM +0100, Peter Zijlstra wrote: > > That way, when we drop hb->lock to wait, futex and rt_mutex wait state is > > consistent. > > > > > > In any case, it passes our inadequate testing. > > It passed my CI tools/testing/selftests/futex/functional/run.sh. Did you also > happen to run a fuzz tester? I did not. I'm not sure how good trinity is at poking holes in futexes. I would love a domain specific fuzzer for futex, but I suspect it would end up being me writing it :-(
Re: [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI
On Fri, Dec 16, 2016 at 04:06:39PM -0800, Darren Hart wrote: > On Tue, Dec 13, 2016 at 09:36:40AM +0100, Peter Zijlstra wrote: > > Thomas spotted that fixup_pi_state_owner() can return errors and we > > fail to unlock the rt_mutex in that case. > > > > We handled this explicitly before Patch 1/10, so can this be rolled into 1/10 > (er 9) as a single commit? I don't think we did, see how this branch doesn't set pi_mutex.
Re: [alsa-devel] [PATCH v6 1/3] clk: x86: Add Atom PMC platform clocks
On Sat, Dec 17, 2016 at 3:33 AM, Stephen Boyd wrote:
> On 12/15, Pierre-Louis Bossart wrote:
>>Clients use devm_clk_get() with a "pmc_plt_clk_"
>> argument.
>
> This is the problem. Clients should be calling clk_get() like:
>
> clk_get(dev, "signal name in datasheet")
>
> where the first argument is the device and the second argument is
> some string that is meaningful to the device, not the system as a
> whole. The way clkdev is intended is so that the dev argument's
> dev_name() is combined with the con_id that matches some signal
> name in the datasheet. This way when the same IP is put into some
> other chip, the globally unique name doesn't need to change, just
> the device name that's registered with the lookup. Obviously this
> breaks down quite badly when dev_name() isn't stable. Is that
> happening here?

PMC Atom is a PCI device and thus each platform would have a different
dev_name(). Do you want each consumer to list all of them if it wants
to work on all platforms, or did I miss something?

So, the question is what getting the clock should look like in order to
work on both CherryTrail and BayTrail.

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH] rbtree: use designated initializers
On Fri, Dec 16, 2016 at 05:02:53PM -0800, Kees Cook wrote: > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. Works for me. Acked-by: Peter Zijlstra (Intel) One note on these structures, the intent is that GCC value propagation completely does away with everything and results in inlining the actual functions. Older versions of GCC had a wee bit of trouble with this, but recent versions do just that, not a single actual structure should end up being emitted in the object code. > Signed-off-by: Kees Cook > --- > include/linux/rbtree_augmented.h | 4 +++- > lib/rbtree.c | 4 +++- > 2 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/include/linux/rbtree_augmented.h > b/include/linux/rbtree_augmented.h > index d076183e49be..9702b6e183bc 100644 > --- a/include/linux/rbtree_augmented.h > +++ b/include/linux/rbtree_augmented.h > @@ -90,7 +90,9 @@ rbname ## _rotate(struct rb_node *rb_old, struct rb_node > *rb_new) \ > old->rbaugmented = rbcompute(old); \ > }\ > rbstatic const struct rb_augment_callbacks rbname = { > \ > - rbname ## _propagate, rbname ## _copy, rbname ## _rotate\ > + .propagate = rbname ## _propagate, \ > + .copy = rbname ## _copy,\ > + .rotate = rbname ## _rotate \ > }; > > > diff --git a/lib/rbtree.c b/lib/rbtree.c > index 1f8b112a7c35..4ba2828a67c0 100644 > --- a/lib/rbtree.c > +++ b/lib/rbtree.c > @@ -427,7 +427,9 @@ static inline void dummy_copy(struct rb_node *old, struct > rb_node *new) {} > static inline void dummy_rotate(struct rb_node *old, struct rb_node *new) {} > > static const struct rb_augment_callbacks dummy_callbacks = { > - dummy_propagate, dummy_copy, dummy_rotate > + .propagate = dummy_propagate, > + .copy = dummy_copy, > + .rotate = dummy_rotate > }; > > void rb_insert_color(struct rb_node *node, struct rb_root *root) > -- > 
2.7.4 > > > -- > Kees Cook > Nexus Security
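As a small illustration of why the patch matters for structure randomization, here is a user-space sketch with a made-up struct callbacks standing in for struct rb_augment_callbacks; the demo_* functions and struct node are hypothetical. With designated initializers each pointer lands in the right field no matter how the members are ordered, which a positional initializer cannot guarantee once the layout is shuffled.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int weight;
};

/* Stand-in for struct rb_augment_callbacks: three function pointers. */
struct callbacks {
    void (*propagate)(struct node *n, struct node *stop);
    void (*copy)(struct node *old, struct node *new_node);
    void (*rotate)(struct node *old, struct node *new_node);
};

static void demo_propagate(struct node *n, struct node *stop)
{
    (void)stop;
    n->weight += 1;
}

static void demo_copy(struct node *old, struct node *new_node)
{
    assert(old != NULL && new_node != NULL);
    new_node->weight = old->weight;
}

static void demo_rotate(struct node *old, struct node *new_node)
{
    int tmp = old->weight;

    old->weight = new_node->weight;
    new_node->weight = tmp;
}

/* Fields are named explicitly and deliberately listed out of order. */
static const struct callbacks demo_callbacks = {
    .rotate    = demo_rotate,
    .propagate = demo_propagate,
    .copy      = demo_copy,
};
```

Reordering the members of struct callbacks leaves demo_callbacks correct, whereas `{ demo_propagate, demo_copy, demo_rotate }` would silently depend on the declaration order.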
[PATCH v3 0/7] Runtime PM for Thunderbolt on Macs
Power down Thunderbolt controllers on Macs when nothing is plugged in to save around 2W per controller. Apple provides an ACPI-based (but nonstandard) mechanism to cut power and signal hotplug during powerdown. The usual way to implement such nonstandard mechanisms seems to be a struct dev_pm_domain. E.g. vga_switcheroo uses that for Optimus GPUs which control power with ACPI DSMs. Hence this third iteration of the series uses that as well. In v2 a more complicated approach was employed wherein power control was exerted by a PCIe port service driver instead. All the prep work went into 4.9 and 4.10, shrinking this series to just 7 patches: - The actual "meat" of the series (to borrow a term from Bjorn) is in patches [6/7] and [7/7]. These two need an ack from Andreas. - Patches [1/7] to [3/7] need an ack from Bjorn (and possibly Rafael or Mika). They're fairly small and just add a bit to struct pci_dev signifying that a device is part of a Thunderbolt daisy chain, then use that bit to modify runtime PM for PCIe ports. I'm also cc'ing Tomas and Amir at Intel Israel, if you guys have comments please shout. - Patches [4/7] and [5/7] need an ack from Rafael. Their sole purpose is to avoid a gratuitous WARN splat when assigning the struct dev_pm_domain. 
I've pushed the patches to GitHub to ease reviewing/fetching: https://github.com/l1k/linux/commits/thunderbolt_runpm_v3 Link to the previous iteration (v2, May 2016): http://www.spinics.net/lists/linux-pci/msg51158.html Thanks, Lukas Lukas Wunner (7): PCI: Recognize Thunderbolt devices PCI: Allow runtime PM on Thunderbolt ports PCI: Don't block runtime PM for Thunderbolt host hotplug ports Revert "PM / Runtime: Remove the exported function pm_children_suspended()" PM: Make requirements of dev_pm_domain_set() more precise thunderbolt: Power down controller when idle thunderbolt: Runtime suspend NHI when idle drivers/base/power/common.c | 15 +- drivers/base/power/runtime.c | 3 +- drivers/pci/pci.c| 20 ++- drivers/pci/pci.h| 2 + drivers/pci/probe.c | 34 + drivers/thunderbolt/Kconfig | 3 +- drivers/thunderbolt/Makefile | 4 +- drivers/thunderbolt/nhi.c| 5 + drivers/thunderbolt/power.c | 356 +++ drivers/thunderbolt/power.h | 37 + drivers/thunderbolt/switch.c | 9 ++ drivers/thunderbolt/tb.c | 13 ++ drivers/thunderbolt/tb.h | 2 + include/linux/pci.h | 1 + include/linux/pm_runtime.h | 7 + 15 files changed, 500 insertions(+), 11 deletions(-) create mode 100644 drivers/thunderbolt/power.c create mode 100644 drivers/thunderbolt/power.h -- 2.10.2
[PATCH v3 2/7] PCI: Allow runtime PM on Thunderbolt ports
Currently PCIe ports are only allowed to go to D3 if the BIOS is dated 2015 or newer to avoid potential issues with old chipsets. However for Thunderbolt we know that even the oldest controller, Light Ridge (2010), is able to suspend its ports to D3 just fine. We're about to add runtime PM for Thunderbolt on the Mac. Apple has released two EFI security updates in 2015 which encompass all machines with Thunderbolt, but the achieved power saving should be made available to users even if they haven't updated their BIOS. To this end, special-case Thunderbolt in pci_bridge_d3_possible(). This allows the Thunderbolt controller to power down but the root port to which the Thunderbolt controller is attached remains in D0 unless the EFI update is installed. Users can pass pcie_port_pm=force on the kernel command line if they cannot install the EFI update but still want to benefit from the additional power saving of putting the root port into D3. In practice, root ports can be suspended to D3 without issues at least on 2012 Ivy Bridge machines. If the BIOS cut-off date is ever lowered to 2010, the Thunderbolt special case can be removed. Cc: Mika Westerberg Cc: Rafael J. Wysocki Cc: Andreas Noever Cc: Tomas Winkler Cc: Amir Levy Signed-off-by: Lukas Wunner --- drivers/pci/pci.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index a881c0d..8ed098d 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2224,7 +2224,7 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev) * @bridge: Bridge to check * * This function checks if it is possible to move the bridge to D3. - * Currently we only allow D3 for recent enough PCIe ports. + * Currently we only allow D3 for recent enough PCIe ports and Thunderbolt. 
*/ bool pci_bridge_d3_possible(struct pci_dev *bridge) { @@ -2258,6 +2258,11 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge) year >= 2015) { return true; } + + /* Even the oldest 2010 Thunderbolt controller supports D3. */ + if (bridge->is_thunderbolt) + return true; + break; } -- 2.10.2
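The resulting policy can be summarized by the following user-space model. It is a deliberately simplified sketch and not the kernel function: struct bridge_model and bridge_d3_possible() are made-up names, and the real pci_bridge_d3_possible() checks further conditions that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the inputs that matter for this patch. */
struct bridge_model {
    bool force;           /* models pcie_port_pm=force on the command line */
    int  bios_year;       /* year of the BIOS date */
    bool is_thunderbolt;  /* bit introduced in patch 1/7 */
};

/* Simplified decision: other checks of the real function are omitted. */
static inline bool bridge_d3_possible(const struct bridge_model *b)
{
    assert(b != NULL);

    if (b->force)
        return true;

    /* Existing rule: only trust BIOSes dated 2015 or newer. */
    if (b->bios_year >= 2015)
        return true;

    /* New rule: even the oldest Thunderbolt controller supports D3. */
    return b->is_thunderbolt;
}
```

On a 2011 machine without the EFI update this lets the Thunderbolt ports go to D3 while the (non-Thunderbolt) root port stays in D0 unless the force option is given, which matches the behaviour described in the commit message.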
[PATCH v3 3/7] PCI: Don't block runtime PM for Thunderbolt host hotplug ports
Hotplug ports generally block their parents from suspending to D3hot as otherwise their interrupts couldn't be delivered. An exception are Thunderbolt host controllers: They have a separate GPIO pin to side-band signal plug events even if the controller is powered down or its parent ports are suspended to D3. They can be told apart from Thunderbolt controllers in attached devices by checking if they're situated below a non-Thunderbolt device (typically a root port, or the downstream port of a PCIe switch in the case of the MacPro6,1). To enable runtime PM for Thunderbolt on the Mac, the downstream bridges of a host controller must not block runtime PM on the upstream bridge as power to the chip is only cut once the upstream bridge has suspended. Amend the condition in pci_dev_check_d3cold() accordingly. Cc: Mika Westerberg Cc: Rafael J. Wysocki Cc: Andreas Noever Cc: Tomas Winkler Cc: Amir Levy Signed-off-by: Lukas Wunner --- drivers/pci/pci.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 8ed098d..0b03fe7 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2271,6 +2271,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge) static int pci_dev_check_d3cold(struct pci_dev *dev, void *data) { + struct pci_dev *parent, *grandparent; bool *d3cold_ok = data; if (/* The device needs to be allowed to go D3cold ... */ @@ -2284,7 +2285,17 @@ static int pci_dev_check_d3cold(struct pci_dev *dev, void *data) !pci_power_manageable(dev) || /* Hotplug interrupts cannot be delivered if the link is down. */ - dev->is_hotplug_bridge) + (dev->is_hotplug_bridge && + + /* +* Exception: Thunderbolt host controllers have a pin to +* side-band signal plug events. Their hotplug ports are +* recognizable by having a non-Thunderbolt device as +* grandparent. 
+*/ + !(dev->is_thunderbolt && (parent = pci_upstream_bridge(dev)) && +(grandparent = pci_upstream_bridge(parent)) && + !grandparent->is_thunderbolt))) *d3cold_ok = false; -- 2.10.2
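The grandparent test in the hunk above is dense, so here is the same condition spelled out as a user-space sketch. struct dev_model and both helper names are hypothetical; the upstream pointer plays the role of pci_upstream_bridge().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few struct pci_dev fields used here. */
struct dev_model {
    struct dev_model *upstream;  /* NULL above a root port */
    bool is_thunderbolt;
    bool is_hotplug_bridge;
};

/*
 * A hotplug port belongs to a Thunderbolt host controller if it is a
 * Thunderbolt device whose grandparent exists and is not Thunderbolt.
 */
static inline bool is_tbt_host_hotplug_port(const struct dev_model *dev)
{
    const struct dev_model *parent, *grandparent;

    assert(dev != NULL);
    if (!dev->is_thunderbolt)
        return false;
    parent = dev->upstream;
    if (!parent)
        return false;
    grandparent = parent->upstream;
    return grandparent && !grandparent->is_thunderbolt;
}

/* Models only the hotplug clause of pci_dev_check_d3cold(). */
static inline bool hotplug_blocks_d3cold(const struct dev_model *dev)
{
    return dev->is_hotplug_bridge && !is_tbt_host_hotplug_port(dev);
}
```

In a chain of root port, host upstream bridge, host downstream bridge, attached device upstream bridge and attached device downstream bridge, only the host's downstream bridge is exempted; the downstream bridge of the attached device has a Thunderbolt grandparent and keeps blocking.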
[PATCH v3 1/7] PCI: Recognize Thunderbolt devices
We're about to allow runtime PM on Thunderbolt ports in pci_bridge_d3_possible() and unblock runtime PM for Thunderbolt host hotplug ports in pci_dev_check_d3cold(). In both cases we need to uniquely identify if a PCI device belongs to a Thunderbolt controller. We also have the need to detect presence of a Thunderbolt controller in drivers/platform/x86/apple-gmux.c because dual GPU MacBook Pros cannot switch external DP/HDMI ports between GPUs if they have Thunderbolt. Furthermore, in multiple places in the DRM subsystem we need to detect whether a GPU is on-board or attached with Thunderbolt. As an example, Thunderbolt-attached GPUs shall not be registered with vga_switcheroo. Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234 on devices belonging to a Thunderbolt controller which allows us to recognize them. Detect presence of this VSEC on device probe and cache it in a newly added is_thunderbolt bit in struct pci_dev which can then be queried by pci_bridge_d3_possible(), pci_dev_check_d3cold(), apple-gmux and others. 
Cc: Andreas Noever Cc: Tomas Winkler Cc: Amir Levy Signed-off-by: Lukas Wunner --- drivers/pci/pci.h | 2 ++ drivers/pci/probe.c | 34 ++ include/linux/pci.h | 1 + 3 files changed, 37 insertions(+) diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index cb17db2..45c2b81 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -3,6 +3,8 @@ #define PCI_FIND_CAP_TTL 48 +#define PCI_VSEC_ID_INTEL_TBT 0x1234 /* Thunderbolt */ + extern const unsigned char pcie_link_speed[]; bool pcie_cap_has_lnkctl(const struct pci_dev *dev); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index e164b5c..891a8fa 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1206,6 +1206,37 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev) pdev->is_hotplug_bridge = 1; } +static void set_pcie_vendor_specific(struct pci_dev *dev) +{ + int vsec = 0; + u32 header; + + while ((vsec = pci_find_next_ext_capability(dev, vsec, + PCI_EXT_CAP_ID_VNDR))) { + pci_read_config_dword(dev, vsec + PCI_VNDR_HEADER, &header); + + /* Is the device part of a Thunderbolt controller? */ + if (dev->vendor == PCI_VENDOR_ID_INTEL && + PCI_VNDR_HEADER_ID(header) == PCI_VSEC_ID_INTEL_TBT) + dev->is_thunderbolt = 1; + } + + /* +* Is the device attached with Thunderbolt? Walk upwards and check for +* each encountered bridge if it's part of a Thunderbolt controller. +* Reaching the host bridge means dev is soldered to the mainboard. +*/ + if (!dev->is_thunderbolt) { + struct pci_dev *parent = dev; + + while ((parent = pci_upstream_bridge(parent))) + if (parent->is_thunderbolt) { + dev->is_thunderbolt = 1; + break; + } + } +} + /** * pci_ext_cfg_is_aliased - is ext config space just an alias of std config? 
* @dev: PCI device @@ -1358,6 +1389,9 @@ int pci_setup_device(struct pci_dev *dev) /* need to have dev->class ready */ dev->cfg_size = pci_cfg_space_size(dev); + /* need to have dev->cfg_size ready */ + set_pcie_vendor_specific(dev); + /* "Unknown power state" */ dev->current_state = PCI_UNKNOWN; diff --git a/include/linux/pci.h b/include/linux/pci.h index e2d1a12..3c775e8 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -358,6 +358,7 @@ struct pci_dev { unsigned intis_virtfn:1; unsigned intreset_fn:1; unsigned intis_hotplug_bridge:1; + unsigned intis_thunderbolt:1; /* part of Thunderbolt daisy chain */ unsigned int__aer_firmware_first_valid:1; unsigned int__aer_firmware_first:1; unsigned intbroken_intx_masking:1; -- 2.10.2
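The second half of set_pcie_vendor_specific() (inheriting the bit from an upstream bridge) can be modelled in user space as below. This is a sketch under assumptions: struct dev_model, the has_tbt_vsec flag (standing in for the result of the VSEC scan) and set_is_thunderbolt() are made-up names, and devices are assumed to be set up parent first, as happens during enumeration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev. */
struct dev_model {
    struct dev_model *upstream;  /* NULL when the host bridge is reached */
    bool has_tbt_vsec;           /* result of the VSEC 0x1234 scan */
    bool is_thunderbolt;
};

/* Must be called for parents before their children. */
static inline void set_is_thunderbolt(struct dev_model *dev)
{
    const struct dev_model *parent;

    assert(dev != NULL);

    /* Part of a Thunderbolt controller itself? */
    dev->is_thunderbolt = dev->has_tbt_vsec;
    if (dev->is_thunderbolt)
        return;

    /* Otherwise: attached behind one? Walk towards the host bridge. */
    for (parent = dev->upstream; parent; parent = parent->upstream) {
        if (parent->is_thunderbolt) {
            dev->is_thunderbolt = true;
            break;
        }
    }
}
```

A GPU behind a Thunderbolt bridge ends up with the bit set even though it carries no VSEC itself, while a device hanging directly off a root port does not.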
[PATCH v3 6/7] thunderbolt: Power down controller when idle
Document and implement Apple's ACPI-based (but nonstandard) pm mechanism for Thunderbolt. Briefly, an ACPI method provided by Apple is used to cut power to the controller. A GPE is enabled while the controller is powered down which sideband-signals a plug event, whereupon we reinstate power using the ACPI method. This saves 1.7 W on machines with a Light Ridge controller and is reported to save 4 W on Cactus Ridge 4C and Falcon Ridge 4C. (I believe 4 W includes the bus power drawn by Apple's Gigabit Ethernet adapter.) It fixes (at least partially) a power regression introduced in 3.17 by commit 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly"). A Thunderbolt controller appears to the OS as a set of virtual devices: One upstream bridge, multiple downstream bridges and one NHI (Native Host Interface). The upstream and downstream bridges represent a PCIe switch (see definition of a switch in the PCIe spec). The NHI device is used to manage the switch fabric. Hotplugged devices appear behind the downstream bridges: (Root Port) Upstream Bridge --+-- Downstream Bridge 0 NHI +-- Downstream Bridge 1 -- +-- Downstream Bridge 2 -- ... Power is cut to the entire set of devices. The Linux pm model is hierarchical and assumes that a child cannot resume before its parent. To conform to this model, power control must be governed by the Thunderbolt controller's topmost device, which is the upstream bridge. The NHI and downstream bridges go to D3hot independently and the upstream bridge goes to D3cold once all its children have suspended. This commit only adds runtime pm for the upstream bridge. Runtime pm for the NHI is added in a separate commit to signify its independence. Runtime pm for the downstream bridges is handled by the pcieport driver. Because Apple's ACPI methods are nonstandard, a struct dev_pm_domain is used to override the PCI bus pm_ops. 
The thunderbolt driver binds to the NHI, thus the dev_pm_domain is
assigned to the upstream bridge when its grandchild ->probes and evicted
when it ->removes.

There are no Thunderbolt specs publicly available from Intel or Apple,
so I've included documentation to the extent that I was able to
reverse-engineer things.

Documentation on the Go2Sx and Ok2Go2Sx pins is tentative as those are
missing on my Light Ridge. Apple only uses them on Cactus Ridge 4C.
Someone with such a controller needs to find out through experimentation
if the documentation is accurate and amend it if necessary.

To maximize power saving, the controller utilizes the PM core's
direct-complete procedure, i.e. it stays suspended during the system
sleep process.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92111
Cc: Andreas Noever
Signed-off-by: Lukas Wunner
---
 drivers/thunderbolt/Kconfig  | 3 +-
 drivers/thunderbolt/Makefile | 4 +-
 drivers/thunderbolt/nhi.c    | 3 +
 drivers/thunderbolt/power.c  | 347 +++
 drivers/thunderbolt/power.h  | 37 +
 drivers/thunderbolt/tb.h     | 2 +
 6 files changed, 393 insertions(+), 3 deletions(-)
 create mode 100644 drivers/thunderbolt/power.c
 create mode 100644 drivers/thunderbolt/power.h

diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
index d35db16..41625cf 100644
--- a/drivers/thunderbolt/Kconfig
+++ b/drivers/thunderbolt/Kconfig
@@ -1,9 +1,10 @@
 menuconfig THUNDERBOLT
 	tristate "Thunderbolt support for Apple devices"
-	depends on PCI
+	depends on PCI && ACPI
 	depends on X86 || COMPILE_TEST
 	select APPLE_PROPERTIES if EFI_STUB && X86
 	select CRC32
+	select PM
 	help
 	  Cactus Ridge Thunderbolt Controller driver
 	  This driver is required if you want to hotplug Thunderbolt devices on
diff --git a/drivers/thunderbolt/Makefile b/drivers/thunderbolt/Makefile
index 5d1053c..b220825 100644
--- a/drivers/thunderbolt/Makefile
+++ b/drivers/thunderbolt/Makefile
@@ -1,3 +1,3 @@
 obj-${CONFIG_THUNDERBOLT} := thunderbolt.o
-thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o eeprom.o
-
+thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o \
+		    eeprom.o power.o
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index a8c2041..88fb2fb 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -605,6 +605,8 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	}
 	pci_set_drvdata(pdev, tb);
 
+	thunderbolt_power_init(tb);
+
 	return 0;
 }
 
@@ -612,6 +614,7 @@ static void nhi_remove(struct pci_dev *pdev)
 {
 	struct tb *tb = pci_get_drvdata(pdev);
 	struct tb_nhi *nhi = tb->nhi;
+	thunderbolt_power_fini(tb);
 	thunderbolt_shutdown_and_free(tb);
 	nhi_shutdown(nhi);
 }
diff --git a/drivers/thunderbolt/power.c b/drive
[PATCH v3 4/7] Revert "PM / Runtime: Remove the exported function pm_children_suspended()"
This reverts commit 62006c1702b3b1be0c0726949e0ee0ea2326be9c which
removed pm_children_suspended() because it had only a single caller.
We're about to add a second caller, so establish the status quo ante.

Cc: Ulf Hansson
Cc: Rafael J. Wysocki
Signed-off-by: Lukas Wunner
---
 drivers/base/power/runtime.c | 3 +--
 include/linux/pm_runtime.h   | 7 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 872eac4..03293c3 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -243,8 +243,7 @@ static int rpm_check_suspend_allowed(struct device *dev)
 		retval = -EACCES;
 	else if (atomic_read(&dev->power.usage_count) > 0)
 		retval = -EAGAIN;
-	else if (!dev->power.ignore_children &&
-			atomic_read(&dev->power.child_count))
+	else if (!pm_children_suspended(dev))
 		retval = -EBUSY;
 
 	/* Pending resume requests take precedence over suspends. */
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index ca4823e..7de2aa5c 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -66,6 +66,12 @@ static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
 	dev->power.ignore_children = enable;
 }
 
+static inline bool pm_children_suspended(struct device *dev)
+{
+	return dev->power.ignore_children
+		|| !atomic_read(&dev->power.child_count);
+}
+
 static inline void pm_runtime_get_noresume(struct device *dev)
 {
 	atomic_inc(&dev->power.usage_count);
@@ -161,6 +167,7 @@ static inline void pm_runtime_allow(struct device *dev) {}
 static inline void pm_runtime_forbid(struct device *dev) {}
 
 static inline void pm_suspend_ignore_children(struct device *dev, bool enable) {}
+static inline bool pm_children_suspended(struct device *dev) { return false; }
 static inline void pm_runtime_get_noresume(struct device *dev) {}
 static inline void pm_runtime_put_noidle(struct device *dev) {}
 static inline bool device_run_wake(struct device *dev) { return false; }
-- 
2.10.2
[PATCH v3 5/7] PM: Make requirements of dev_pm_domain_set() more precise
Since commit 989561de9b51 ("PM / Domains: add setter for dev.pm_domain")
a PM domain may only be assigned to unbound devices. The motivation was
not made explicit in the changelog other than "in the general case that
can cause problems and also [...] we can simplify code quite a bit if we
can always assume that". Rafael J. Wysocki elaborated in a mailing list
conversation that "setting a PM domain generally changes the set of PM
callbacks for the device and it may not be safe to call it after the
driver has been bound".

The concern seems to be that if a device is put to sleep and its PM
callbacks are changed, the device may end up in an undefined state or
not resume at all. The real underlying requirement is thus to ensure
that the device is awake and execution of its PM callbacks is prevented
while the PM domain is assigned. Unbound devices happen to fulfill this
requirement, but bound devices can be made to satisfy it as well: The
caller can prevent execution of PM ops with lock_system_sleep() and by
holding a runtime PM reference to the device.

Accordingly, adjust dev_pm_domain_set() to WARN only if the device is in
the midst of a system sleep transition, or runtime PM is enabled and the
device is either not active or may become inactive imminently (because
it has no active children or its refcount is zero).

The change is required to support runtime PM for Thunderbolt on the Mac,
which poses the unique issue that a child device (the NHI) needs to
assign a PM domain to its grandparent (the upstream bridge). Because the
grandparent's driver is built-in and the child's driver is a module, the
grandparent is usually already bound when the child probes, resulting in
a WARN splat when calling dev_pm_domain_set(). However the PM core
guarantees both that the grandparent is active and that system sleep is
not commenced until the child has finished probing. So in this case it
is safe to call dev_pm_domain_set() from the child's ->probe hook and
the WARN splat is entirely gratuitous.

Note that commit e79aee49bcf9 ("PM: Avoid false-positive warnings in
dev_pm_domain_set()") modified the WARN to not apply if a PM domain is
removed. This is unsafe as it allows removal of the PM domain while
the device is asleep. The present commit rectifies this.

Cc: Ulf Hansson
Cc: Tomeu Vizoso
Cc: Rafael J. Wysocki
Signed-off-by: Lukas Wunner
---
 drivers/base/power/common.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/base/power/common.c b/drivers/base/power/common.c
index f6a9ad5..d02c1e0 100644
--- a/drivers/base/power/common.c
+++ b/drivers/base/power/common.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #include "power.h"
 
@@ -136,8 +137,10 @@ EXPORT_SYMBOL_GPL(dev_pm_domain_detach);
  * @dev: Device whose PM domain is to be set.
  * @pd: PM domain to be set, or NULL.
  *
- * Sets the PM domain the device belongs to. The PM domain of a device needs
- * to be set before its probe finishes (it's bound to a driver).
+ * Sets the PM domain the device belongs to. The PM domain of a device needs
+ * to be set while the device is awake. This is guaranteed during ->probe.
+ * Otherwise the caller is responsible for ensuring wakefulness, e.g. by
+ * holding a runtime PM reference as well as invoking lock_system_sleep().
  *
  * This function must be called with the device lock held.
  */
@@ -146,8 +149,12 @@ void dev_pm_domain_set(struct device *dev, struct dev_pm_domain *pd)
 	if (dev->pm_domain == pd)
 		return;
 
-	WARN(pd && device_is_bound(dev),
-	     "PM domains can only be changed for unbound devices\n");
+	WARN(dev->power.is_prepared || dev->power.is_suspended ||
+	     (pm_runtime_enabled(dev) &&
+	      (dev->power.runtime_status != RPM_ACTIVE ||
+	       (pm_children_suspended(dev) &&
+		!atomic_read(&dev->power.usage_count)))),
+	     "PM domains can only be changed for awake devices\n");
 	dev->pm_domain = pd;
 	device_pm_check_callbacks(dev);
 }
-- 
2.10.2
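For readers who want to reason about when the new WARN in dev_pm_domain_set() fires, the condition can be restated as a small user-space model. This is only a sketch: struct pm_model and the model_* functions are hypothetical stand-ins for the kernel fields read by the patch, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the dev->power fields the check reads. */
enum rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

struct pm_model {
	bool is_prepared;		/* system sleep: ->prepare has run */
	bool is_suspended;		/* system sleep: device is suspended */
	bool runtime_pm_enabled;	/* models pm_runtime_enabled(dev) */
	enum rpm_status runtime_status;
	bool ignore_children;
	int child_count;		/* number of active children */
	int usage_count;		/* runtime PM references held */
};

/* Same logic as pm_children_suspended() restored by patch 4/7. */
bool model_children_suspended(const struct pm_model *d)
{
	assert(d != NULL);
	return d->ignore_children || d->child_count == 0;
}

/* Returns true when changing the PM domain would trigger the WARN. */
bool model_domain_change_warns(const struct pm_model *d)
{
	assert(d != NULL);
	/* In the midst of a system sleep transition: always unsafe. */
	if (d->is_prepared || d->is_suspended)
		return true;
	/* Runtime PM disabled: nothing can put the device to sleep. */
	if (!d->runtime_pm_enabled)
		return false;
	/* Not active right now: unsafe. */
	if (d->runtime_status != RPM_ACTIVE)
		return true;
	/* Active, but nothing prevents an imminent runtime suspend. */
	return model_children_suspended(d) && d->usage_count == 0;
}
```

In this model an active device with an active child (the Thunderbolt case, where the NHI is probing below the upstream bridge) does not warn, whereas an active device with no active children and a zero usage count does.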
[PATCH v3 7/7] thunderbolt: Runtime suspend NHI when idle
Runtime suspend the NHI when no Thunderbolt devices have been plugged in
for 10 sec (user-configurable via autosuspend_delay_ms in sysfs).

The NHI is not able to detect plug events while suspended; it relies on
the GPE handler to resume it on hotplug.

After the NHI resumes, it takes about 700 ms until a hotplug event
appears on the RX ring. In case autosuspend_delay_ms has been reduced to
0 by the user, we need to wait in tb_resume() to avoid going back to
sleep before we had a chance to detect a hotplugged device.

A runtime pm ref is held for the duration of tb_handle_hotplug() to keep
the NHI awake while the hotplug event is processed.

Apart from that we acquire a runtime pm ref for each newly allocated
switch (except for the root switch) and drop one when a switch is freed,
thereby ensuring the NHI stays active as long as devices are plugged in.
This behaviour is identical to the macOS driver.

Cc: Andreas Noever
Signed-off-by: Lukas Wunner
---
 drivers/thunderbolt/nhi.c    | 2 ++
 drivers/thunderbolt/power.c  | 9 +
 drivers/thunderbolt/switch.c | 9 +
 drivers/thunderbolt/tb.c     | 13 +
 4 files changed, 33 insertions(+)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 88fb2fb..319ed81 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -632,6 +632,8 @@ static const struct dev_pm_ops nhi_pm_ops = {
 	 * pci-tunnels stay alive.
 	 */
 	.restore_noirq = nhi_resume_noirq,
+	.runtime_suspend = nhi_suspend_noirq,
+	.runtime_resume = nhi_resume_noirq,
 };
 
 static struct pci_device_id nhi_ids[] = {
diff --git a/drivers/thunderbolt/power.c b/drivers/thunderbolt/power.c
index 4d7c6a0..1b5f066 100644
--- a/drivers/thunderbolt/power.c
+++ b/drivers/thunderbolt/power.c
@@ -320,6 +320,12 @@ void thunderbolt_power_init(struct tb *tb)
 
 	tb->power = power;
 
+	pm_runtime_allow(nhi_dev);
+	pm_runtime_set_autosuspend_delay(nhi_dev, 10000);
+	pm_runtime_use_autosuspend(nhi_dev);
+	pm_runtime_mark_last_busy(nhi_dev);
+	pm_runtime_put_autosuspend(nhi_dev);
+
 	return;
 
 err:
@@ -336,6 +342,9 @@ void thunderbolt_power_fini(struct tb *tb)
 	if (!power)
 		return;
 
+	pm_runtime_get(nhi_dev);
+	pm_runtime_forbid(nhi_dev);
+
 	tb->power = NULL;
 	dev_pm_domain_set(upstream_dev, NULL);
 
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index c6f30b1..422fe6e 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -5,6 +5,7 @@
  */
 
 #include
+#include
 #include
 
 #include "tb.h"
@@ -326,6 +327,11 @@ void tb_switch_free(struct tb_switch *sw)
 	if (!sw->is_unplugged)
 		tb_plug_events_active(sw, false);
 
+	if (sw != sw->tb->root_switch) {
+		pm_runtime_mark_last_busy(&sw->tb->nhi->pdev->dev);
+		pm_runtime_put_autosuspend(&sw->tb->nhi->pdev->dev);
+	}
+
 	kfree(sw->ports);
 	kfree(sw->drom);
 	kfree(sw);
@@ -420,6 +426,9 @@ struct tb_switch *tb_switch_alloc(struct tb *tb, u64 route)
 	if (tb_plug_events_active(sw, true))
 		goto err;
 
+	if (tb->root_switch)
+		pm_runtime_get(&tb->nhi->pdev->dev);
+
 	return sw;
 err:
 	kfree(sw->ports);
diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index 24b6d30..a3fedf9 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #include "tb.h"
 #include "tb_regs.h"
@@ -217,8 +218,11 @@ static void tb_handle_hotplug(struct work_struct *work)
 {
 	struct tb_hotplug_event *ev = container_of(work, typeof(*ev), work);
 	struct tb *tb = ev->tb;
+	struct device *dev = &tb->nhi->pdev->dev;
 	struct tb_switch *sw;
 	struct tb_port *port;
+
+	pm_runtime_get(dev);
 	mutex_lock(&tb->lock);
 	if (!tb->hotplug_active)
 		goto out; /* during init, suspend or shutdown */
@@ -274,6 +278,8 @@ static void tb_handle_hotplug(struct work_struct *work)
 out:
 	mutex_unlock(&tb->lock);
 	kfree(ev);
+	pm_runtime_mark_last_busy(dev);
+	pm_runtime_put_autosuspend(dev);
 }
 
 /**
@@ -433,4 +439,11 @@ void thunderbolt_resume(struct tb *tb)
 	tb->hotplug_active = true;
 	mutex_unlock(&tb->lock);
 	tb_info(tb, "resume finished\n");
+
+	/*
+	 * If runtime resuming due to a hotplug event (rather than resuming
+	 * from system sleep), wait for it to arrive. May take about 700 ms.
+	 */
+	if (tb->nhi->pdev->dev.power.runtime_status == RPM_RESUMING)
+		msleep(1000);
 }
-- 
2.10.2
Re: OOM: Better, but still there on
On 2016/12/17 21:59, Nils Holland wrote:
> On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
>> mount -t tracefs none /debug/trace
>> echo 1 > /debug/trace/events/vmscan/enable
>> cat /debug/trace/trace_pipe > trace.log
>>
>> should help
>> [...]
>
> No problem! I enabled writing the trace data to a file and then tried
> to trigger another OOM situation. That worked, this time without a
> complete kernel panic, but with only my processes being killed and the
> system becoming unresponsive. When that happened, I let it run for
> another minute or two so that in case it was still logging something
> to the trace file, it could continue to do so some time longer. Then I
> rebooted with the only thing that still worked, i.e. by means of magic
> SysRequest.

In an OOM situation, writing to a file on disk is unlikely to work. Logging
via the network ("cat /debug/trace/trace_pipe > /dev/udp/$ip/$port" if you
are using bash) may work better. (I wish we could do it from the kernel so
that /bin/cat is not disturbed by delays due to page faults.)

If you can configure netconsole for logging OOM killer messages and a UDP
socket for logging trace_pipe messages, udplogger at
https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/
might be a good fit for logging both outputs with timestamps into a single
file.
Re: [PATCH v1 & v6 1/2] PM/devfreq: add suspend frequency support
Hi Lin, 2016-11-24 18:54 GMT+09:00 Chanwoo Choi : > Hi Lin, > > On 2016년 11월 24일 18:28, Chanwoo Choi wrote: >> Hi Lin, >> >> On 2016년 11월 24일 17:34, hl wrote: >>> Hi Chanwoo Choi, >>> >>> >>> On 2016年11月24日 16:16, Chanwoo Choi wrote: Hi Lin, On 2016년 11월 24일 16:34, hl wrote: > Hi Chanwoo Choi, > > I think the dev_pm_opp_get_suspend_opp() have implement most of > the funtion, all we need is just define the node in dts, like following: > > &dmc_opp_table { > opp06 { > opp-suspend; > }; > }; Two approaches use the 'opp-suspend' property. I think that the method to support suspend-opp have to guarantee following conditions: - Support the all of devfreq's governors. >>> As MyungJoo Ham suggestion, i will set the suspend frequency in >>> devfreq_suspend_device(), >>> which will ingore governor. >> >> Other approach already support the all of governors. >> Before calling the mail, I discussed with Myungjoo Ham. >> Myungjoo prefer to use the devfreq_suspend/devfreq_resume(). > > It is not correct expression. We need to wait the reply from Myungjoo > to clarify this. > >> >> To Myungjoo, >> Please add your opinion how to support the suspend frequency. > >> - Devfreq framework have the responsibility to change the frequency/voltage for suspend-opp. If we uses the new devfreq_suspend(), each devfreq device don't care how to support the suspend-opp. Just the developer of each devfreq device need to add 'opp-suspend' propet to OPP entry in DT file. >>> Why should support change the voltage in devfreq framework, i think it >>> shuold be handle in >>> specific driver, i think the devfreq only handle it can get the right >>> frequency, then pass it to >> >> No, the frequency should be handled by governor or framework. >> The each devfreq device has no any responsibility of next frequency/voltage. >> The governor and core of devfreq can decide the next frequency/voltage. >> You can refer to the cpufreq subsystem. 
>> >>> specific driver, i think the voltage should handle in the >>> devfreq->profile->target(); >> >> The call of devfreq->profile->target() have to be handled by devfreq >> framework. >> If user want to set the suspend frequency, user can add the 'suspend-opp' >> property. >> It think this way is easy. >> >> But, >> If the each devfreq device want to decide the next frequency/voltage only for >> suspend state. We can check the cpufreq subsystem. >> >> If specific devfreq device want to handle the suspend frequency, >> each devfreq will add the own suspend/resume functions as following: >> >> struct devfreq_dev_profile { >> int (*suspend)(struct devfreq *dev);// new function pointer >> int (*resume)(struct devfreq *dev); // new function pointer >> } a_profile; >> >> a_profile = devfreq_generic_suspend; >> >> The devfreq framework will provide the devfreq_generic_suspend() >> funticon. >> int devfreq_generic_suspend(struce devfreq *dev) { >> ... >> devfreq->profile->target(..., devfreq->suspend_freq); >> ... >> } >> >> or >> >> a_profile = a_devfreq_suspend; // specific function of each devfreq >> device >> >> The devfreq_suspend() will call 'devfreq->profile->suspend()' function >> instead of devfreq->profile->target(); >> >> The devfreq call the 'devfreq->profile->suspend()' >> to support the suspend frequency. >> >> Regards, >> Chanwoo Choi > > The key difference between two approaches: > > Your approach: > - The each developer should add the 'opp-suspend' property to the dts file. > - The each devfreq should call the devfreq_suspend_device() > to support the suspend frequency. > > If each devfreq doesn't call the devfreq_suspend_device(), devfreq framework > can support the suspend frequency. > > Other approach: > - The each developer only should add the 'opp-suspend' property to the dts > file > without the additional behavior. 
>
> In the cpufreq subsystem,
> When support the suspend frequency of cpufreq, we just add 'opp-suspend'
> property
> without the additional behavior.

I missed the use case where devfreq_suspend_device() is used before
entering suspend mode. We should consider the case where a devfreq device
calls devfreq_suspend_device() directly. Because devfreq_suspend_device()
is an exported function, each devfreq device can call it on the fly
without entering suspend mode.

I correct my opinion. Your approach is necessary. I'm sorry to have
confused you.

So, I made the following patch. This patch sets the suspend frequency
in devfreq_suspend_device() after stopping the governor. It takes all
of devfreq's governors into account.

What do you think? If you are OK with it, I'll send this patch with you
as the author.

int devfreq_suspend_device(struct devfreq *dev
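Since the patch itself is cut off in this archive, the idea described in the message (stop the governor, then program the frequency of the OPP marked 'opp-suspend') can be illustrated with a user-space model. This is a sketch under assumptions, not the posted patch: struct devfreq_model and the model_* functions are hypothetical, and restoring the previous frequency on resume is an assumption of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct devfreq used here. */
struct devfreq_model {
	bool governor_running;
	unsigned long cur_freq;		/* frequency currently programmed */
	unsigned long resume_freq;	/* frequency to restore on resume */
	unsigned long suspend_freq;	/* from the opp-suspend OPP; 0 if none */
	int (*target)(struct devfreq_model *df, unsigned long freq);
};

/* Example driver callback: only records the requested frequency. */
int model_target(struct devfreq_model *df, unsigned long freq)
{
	df->cur_freq = freq;
	return 0;
}

/* Stop the governor first, then switch to the suspend frequency. */
int model_suspend_device(struct devfreq_model *df)
{
	assert(df != NULL && df->target != NULL);
	df->governor_running = false;
	if (!df->suspend_freq)
		return 0;	/* no opp-suspend OPP: leave the frequency alone */
	df->resume_freq = df->cur_freq;
	return df->target(df, df->suspend_freq);
}

/* Assumption of this sketch: restore the old frequency, restart governor. */
int model_resume_device(struct devfreq_model *df)
{
	int ret = 0;

	assert(df != NULL && df->target != NULL);
	if (df->suspend_freq && df->resume_freq)
		ret = df->target(df, df->resume_freq);
	df->governor_running = true;
	return ret;
}
```

Because the frequency change happens in the framework-level suspend path and goes through the driver's target() callback, it does not depend on which governor is in use, which is the property asked for earlier in the thread.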
Re: [PATCH v4] dt-bindings: power: supply: bq24735: reverse the polarity of ac-detect
Hi,

On Fri, Dec 16, 2016 at 10:44:00AM +0100, Peter Rosin wrote:
> The ACOK pin on the bq24735 is active-high, of course meaning that when
> AC is OK the pin is high. However, all Tegra dts files have incorrectly
> specified active-high even though the signal is inverted on the Tegra
> boards. This has worked since the Linux driver has also inverted the
> meaning of the GPIO. Fix this situation by simply specifying in the
> bindings what everybody else agrees on; that the ti,ac-detect-gpios is
> active on AC adapter absence.
>
> Signed-off-by: Peter Rosin

Thanks for your patch. We are currently in the merge window and your
patch will appear in linux-next once 4.10-rc1 has been tagged by Linus
Torvalds. Until then I queued it into this branch:

https://git.kernel.org/cgit/linux/kernel/git/sre/linux-power-supply.git/log/?h=for-next-next

-- Sebastian
Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
> diff --git a/lib/test_siphash.c b/lib/test_siphash.c > new file mode 100644 > index ..93549e4e22c5 > --- /dev/null > +++ b/lib/test_siphash.c > @@ -0,0 +1,83 @@ > +/* Test cases for siphash.c > + * > + * Copyright (C) 2016 Jason A. Donenfeld . All Rights > Reserved. > + * > + * This file is provided under a dual BSD/GPLv2 license. > + * > + * SipHash: a fast short-input PRF > + * https://131002.net/siphash/ > + * > + * This implementation is specifically for SipHash2-4. > + */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include > +#include > +#include > +#include > +#include > + > +/* Test vectors taken from official reference source available at: > + * https://131002.net/siphash/siphash24.c > + */ > +static const u64 test_vectors[64] = { > + 0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL, > + 0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL, > + 0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL, > + 0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL, > + 0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL, > + 0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL, > + 0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL, > + 0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL, > + 0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL, > + 0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL, > + 0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL, > + 0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL, > + 0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL, > + 0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL, > + 0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL, > + 0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL, > + 0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 
0xee64435a9752fe72ULL,
> +	0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
> +	0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
> +	0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
> +	0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
> +	0x958a324ceb064572ULL
> +};
> +static const siphash_key_t test_key =
> +	{ 0x0706050403020100ULL , 0x0f0e0d0c0b0a0908ULL };
> +
> +static int __init siphash_test_init(void)
> +{
> +	u8 in[64] __aligned(SIPHASH_ALIGNMENT);
> +	u8 in_unaligned[65];
> +	u8 i;
> +	int ret = 0;
> +
> +	for (i = 0; i < 64; ++i) {
> +		in[i] = i;
> +		in_unaligned[i + 1] = i;
> +		if (siphash(in, i, test_key) != test_vectors[i]) {
> +			pr_info("self-test aligned %u: FAIL\n", i + 1);
> +			ret = -EINVAL;
> +		}
> +		if (siphash_unaligned(in_unaligned + 1, i, test_key) != test_vectors[i]) {
> +			pr_info("self-test unaligned %u: FAIL\n", i + 1);
> +			ret = -EINVAL;
> +		}
> +	}
> +	if (!ret)
> +		pr_info("self-tests: pass\n");
> +	return ret;
> +}
> +
> +static void __exit siphash_test_exit(void)
> +{
> +}
> +
> +module_init(siphash_test_init);
> +module_exit(siphash_test_exit);
> +
> +MODULE_AUTHOR("Jason A. Donenfeld ");
> +MODULE_LICENSE("Dual BSD/GPL");
> --
> 2.11.0
>

I believe the output of SipHash depends upon endianness. Folks who
request a digest through the af_alg interface will likely expect a byte
array.

I think that means on little endian machines, values like element 0 must
be byte reversed:

    0x726fdb47dd0e0e31ULL => 31,0e,0e,dd,47,db,6f,72

If I am not mistaken, that value (and other tv's) are returned here:

    return (v0 ^ v1) ^ (v2 ^ v3);

It may be prudent to include the endian reversal in the test to ensure
big endian machines produce expected results. Some closely related
testing on an old Apple PowerMac G5 revealed that the result needed to be
reversed before returning it to a caller.

Jeff
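The byte sequence quoted above for element 0 can be produced independently of host endianness by serializing the 64-bit result explicitly, least significant byte first. The following is a minimal user-space sketch; the helper names are made up for illustration and are not part of the proposed siphash API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value as 8 bytes, least significant byte first. */
void u64_to_le_bytes(uint64_t v, uint8_t out[8])
{
	size_t i;

	assert(out != NULL);
	for (i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Load 8 little-endian bytes into a 64-bit value on any host. */
uint64_t le_bytes_to_u64(const uint8_t in[8])
{
	uint64_t v = 0;
	size_t i;

	assert(in != NULL);
	for (i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```

Calling u64_to_le_bytes(0x726fdb47dd0e0e31ULL, buf) yields 31,0e,0e,dd,47,db,6f,72 on both little endian and big endian hosts, because only shifts are used and the buffer is never reinterpreted as an integer in memory.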
Re: [PATCH v3] power: supply: bq24735-charger: optionally poll the ac-detect gpio
Hi,

On Thu, Dec 15, 2016 at 10:28:46AM +0100, Peter Rosin wrote:
> If the ac-detect gpio does not support interrupts, provide a fallback
> to poll the gpio at a configurable interval.
>
> Signed-off-by: Peter Rosin

Thanks for your patch. We are currently in the merge window and your
patch will appear in linux-next once 4.10-rc1 has been tagged by Linus
Torvalds. Until then I queued it into this branch:

https://git.kernel.org/cgit/linux/kernel/git/sre/linux-power-supply.git/log/?h=for-next-next

-- Sebastian
Re: [PATCH v2 1/2] mfd: axp20x: Add a few missing defines for AXP288 specific registers
On Thu, Dec 15, 2016 at 10:07 PM, Hans de Goede wrote: > Add defines for the AXP288_POWER_REASON and AXP288_RT_BATT_V_H and > AXP288_RT_BATT_V_L registers. While at it also move the > AXP288_TS_ADC_H-AXP288_GP_ADC_L defines, which for some reason where > in a different place, together with the rest of the AXP288 specific > defines. > > Signed-off-by: Hans de Goede Acked-by: Chen-Yu Tsai
Re: [PATCH v2 2/2] mfd: axp20x: Fix axp288 volatile ranges
On Thu, Dec 15, 2016 at 10:07 PM, Hans de Goede wrote: > The axp288 pmic has a lot more volatile registers then we were > listing in axp288_volatile_ranges, fix this. > > Signed-off-by: Hans de Goede Acked-by: Chen-Yu Tsai
Re: [PATCH v3 1/2] mfd: axp20x: Add a few missing defines for AXP288 specific registers
On Sat, Dec 17, 2016 at 4:09 AM, Hans de Goede wrote: > Add defines for the AXP288_POWER_REASON and AXP288_RT_BATT_V_H and > AXP288_RT_BATT_V_L and AXP288_BC_* registers. While at it also move the > AXP288_TS_ADC_H-AXP288_GP_ADC_L defines, which for some reason where > in a different place, together with the rest of the AXP288 specific > defines. > > Signed-off-by: Hans de Goede Acked-by: Chen-Yu Tsai
Re: [PATCH v3 2/2] mfd: axp20x: Fix axp288 volatile ranges
On Sat, Dec 17, 2016 at 4:09 AM, Hans de Goede wrote: > The axp288 pmic has a lot more volatile registers then we were > listing in axp288_volatile_ranges, fix this. > > Signed-off-by: Hans de Goede Acked-by: Chen-Yu Tsai FYI, if you're going to add support for the battery charger detection module later, you would need to add the remaining AXP288_BC_* registers to the writable table.
Re: [PATCH v1 & v6 1/2] PM/devfreq: add suspend frequency support
Hey guys, Chanwoo Choi wrote: > Hi Lin, > > 2016-11-24 18:54 GMT+09:00 Chanwoo Choi : >> Hi Lin, >> >> On 2016년 11월 24일 18:28, Chanwoo Choi wrote: >>> Hi Lin, >>> >>> On 2016년 11월 24일 17:34, hl wrote: Hi Chanwoo Choi, On 2016年11月24日 16:16, Chanwoo Choi wrote: > Hi Lin, > > On 2016년 11월 24일 16:34, hl wrote: >> Hi Chanwoo Choi, >> >> I think the dev_pm_opp_get_suspend_opp() have implement most of >> the funtion, all we need is just define the node in dts, like following: >> >> &dmc_opp_table { >> opp06 { >> opp-suspend; >> }; >> }; > Two approaches use the 'opp-suspend' property. > > I think that the method to support suspend-opp have to > guarantee following conditions: > - Support the all of devfreq's governors. As MyungJoo Ham suggestion, i will set the suspend frequency in devfreq_suspend_device(), which will ingore governor. >>> >>> Other approach already support the all of governors. >>> Before calling the mail, I discussed with Myungjoo Ham. >>> Myungjoo prefer to use the devfreq_suspend/devfreq_resume(). >> >> It is not correct expression. We need to wait the reply from Myungjoo >> to clarify this. >> >>> >>> To Myungjoo, >>> Please add your opinion how to support the suspend frequency. >> >>> > - Devfreq framework have the responsibility to change the >frequency/voltage for suspend-opp. If we uses the >new devfreq_suspend(), each devfreq device don't care >how to support the suspend-opp. Just the developer of each >devfreq device need to add 'opp-suspend' propet to OPP entry in DT > file. Why should support change the voltage in devfreq framework, i think it shuold be handle in specific driver, i think the devfreq only handle it can get the right frequency, then pass it to >>> >>> No, the frequency should be handled by governor or framework. >>> The each devfreq device has no any responsibility of next frequency/voltage. >>> The governor and core of devfreq can decide the next frequency/voltage. >>> You can refer to the cpufreq subsystem. 
>>> specific driver, i think the voltage should handle in the devfreq->profile->target(); >>> >>> The call of devfreq->profile->target() have to be handled by devfreq >>> framework. >>> If user want to set the suspend frequency, user can add the 'suspend-opp' >>> property. >>> It think this way is easy. >>> >>> But, >>> If the each devfreq device want to decide the next frequency/voltage only >>> for >>> suspend state. We can check the cpufreq subsystem. >>> >>> If specific devfreq device want to handle the suspend frequency, >>> each devfreq will add the own suspend/resume functions as following: >>> >>> struct devfreq_dev_profile { >>> int (*suspend)(struct devfreq *dev);// new function >>> pointer >>> int (*resume)(struct devfreq *dev); // new function >>> pointer >>> } a_profile; >>> >>> a_profile = devfreq_generic_suspend; >>> >>> The devfreq framework will provide the devfreq_generic_suspend() >>> funticon. >>> int devfreq_generic_suspend(struce devfreq *dev) { >>> ... >>> devfreq->profile->target(..., devfreq->suspend_freq); >>> ... >>> } >>> >>> or >>> >>> a_profile = a_devfreq_suspend; // specific function of each devfreq >>> device >>> >>> The devfreq_suspend() will call 'devfreq->profile->suspend()' function >>> instead of devfreq->profile->target(); >>> >>> The devfreq call the 'devfreq->profile->suspend()' >>> to support the suspend frequency. >>> >>> Regards, >>> Chanwoo Choi >> >> The key difference between two approaches: >> >> Your approach: >> - The each developer should add the 'opp-suspend' property to the dts file. >> - The each devfreq should call the devfreq_suspend_device() >> to support the suspend frequency. >> >> If each devfreq doesn't call the devfreq_suspend_device(), devfreq >> framework >> can support the suspend frequency. >> >> Other approach: >> - The each developer only should add the 'opp-suspend' property to the dts >> file >> without the additional behavior. 
>> >> In the cpufreq subsystem, >> When support the suspend frequency of cpufreq, we just add 'opp-suspend' >> property >> without the additional behavior. > > I'm missing the use-case when using the devfreq_suspend_device() > before entering the suspend mode. We should consider the case when > devfreq device > calls the devfreq_suspend_device() directly. Because devfreq_suspend_device() > is exported function, each devfreq device call this function on the fly > without entering the suspend mode. > > I correct my opinion. Your approach is necessary. I'm sorry to confuse you. > So, I make the following patch. This patch set the suspend frequency > in devfreq_s
Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
To follow up on my comments that your benchmark results were peculiar, here's my benchmark code. It just computes the hash of all n*(n+1)/2 possible non-empty substrings of a buffer of n (called "max" below) bytes. "cpb" is "cycles per byte". (The average length is (n+2)/3, c.f. https://oeis.org/A000292) On x86-32, HSipHash is asymptotically twice the speed of SipHash, rising to 2.5x for short strings: SipHash/HSipHash benchmark, sizeof(long) = 4 SipHash: max= 4 cycles= 10495 cpb=524.7500 (sum=47a4f5554869fa97) HSipHash: max= 4 cycles= 3400 cpb=170. (sum=146a863e) SipHash: max= 8 cycles= 24468 cpb=203.9000 (sum=21c41a86355affcc) HSipHash: max= 8 cycles= 9237 cpb= 76.9750 (sum=d3b5e0cd) SipHash: max= 16 cycles= 94622 cpb=115.9583 (sum=26d816b72721e48f) HSipHash: max= 16 cycles= 34499 cpb= 42.2782 (sum=16bb7475) SipHash: max= 32 cycles=418767 cpb= 69.9811 (sum=dd5a97694b8a832d) HSipHash: max= 32 cycles=156695 cpb= 26.1857 (sum=eed00fcb) SipHash: max= 64 cycles= 2119152 cpb= 46.3101 (sum=a2a725aecc09ed00) HSipHash: max= 64 cycles= 1008678 cpb= 22.0428 (sum=99b9f4f) SipHash: max= 128 cycles= 12728659 cpb= 35.5788 (sum=420878cd20272817) HSipHash: max= 128 cycles= 5452931 cpb= 15.2419 (sum=f1f4ad18) SipHash: max= 256 cycles= 38931946 cpb= 13.7615 (sum=e05dfb28b90dfd98) HSipHash: max= 256 cycles= 13807312 cpb= 4.8805 (sum=ceeafcc1) SipHash: max= 512 cycles= 205537380 cpb= 9.1346 (sum=7d129d4de145fbea) HSipHash: max= 512 cycles= 103420960 cpb= 4.5963 (sum=7f15a313) SipHash: max=1024 cycles=1540259472 cpb= 8.5817 (sum=cca7cbdc778ca8af) HSipHash: max=1024 cycles= 796090824 cpb= 4.4355 (sum=d8f3374f) On x86-64, SipHash is consistently faster, asymptotically approaching 2x for long strings: SipHash/HSipHash benchmark, sizeof(long) = 8 SipHash: max= 4 cycles= 2642 cpb=132.1000 (sum=47a4f5554869fa97) HSipHash: max= 4 cycles= 2498 cpb=124.9000 (sum=146a863e) SipHash: max= 8 cycles= 5270 cpb= 43.9167 (sum=21c41a86355affcc) HSipHash: max= 8 cycles= 7140 cpb= 59.5000 (sum=d3b5e0cd) 
SipHash: max= 16 cycles= 19950 cpb= 24.4485 (sum=26d816b72721e48f) HSipHash: max= 16 cycles= 23546 cpb= 28.8554 (sum=16bb7475) SipHash: max= 32 cycles= 80188 cpb= 13.4004 (sum=dd5a97694b8a832d) HSipHash: max= 32 cycles=101218 cpb= 16.9148 (sum=eed00fcb) SipHash: max= 64 cycles=373286 cpb= 8.1575 (sum=a2a725aecc09ed00) HSipHash: max= 64 cycles=535568 cpb= 11.7038 (sum=99b9f4f) SipHash: max= 128 cycles= 2075224 cpb= 5.8006 (sum=420878cd20272817) HSipHash: max= 128 cycles= 3336820 cpb= 9.3270 (sum=f1f4ad18) SipHash: max= 256 cycles= 14276278 cpb= 5.0463 (sum=e05dfb28b90dfd98) HSipHash: max= 256 cycles= 28847880 cpb= 10.1970 (sum=ceeafcc1) SipHash: max= 512 cycles= 50135180 cpb= 2.2281 (sum=7d129d4de145fbea) HSipHash: max= 512 cycles= 86145916 cpb= 3.8286 (sum=7f15a313) SipHash: max=1024 cycles= 334111900 cpb= 1.8615 (sum=cca7cbdc778ca8af) HSipHash: max=1024 cycles= 640432452 cpb= 3.5682 (sum=d8f3374f) Here's the code; compile with -DSELFTEST. (The main purpose of printing the sum is to prevent dead code elimination.) #if SELFTEST #include #include static inline uint64_t rol64(uint64_t word, unsigned int shift) { return word << shift | word >> (64 - shift); } static inline uint32_t rol32(uint32_t word, unsigned int shift) { return word << shift | word >> (32 - shift); } static inline uint64_t get_unaligned_le64(void const *p) { return *(uint64_t const *)p; } static inline uint32_t get_unaligned_le32(void const *p) { return *(uint32_t const *)p; } static inline uint64_t le64_to_cpup(uint64_t const *p) { return *p; } static inline uint32_t le32_to_cpup(uint32_t const *p) { return *p; } #else #include/* For rol64 */ #include #include #include #endif /* The basic ARX mixing function, taken from Skein */ #define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a)) /* * The complete SipRound. Note that, when unrolled twice like below, * the 32-bit rotates drop out on 32-bit machines. 
 */
#define SIP_ROUND(a, b, c, d) \
	(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
	 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few percent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *          i386   i386    i386     x86_64  x86_64  x86_64   x86_64
 * Length   small  unroll  halfmd4  small   unroll  halfmd4  teahash
 * 1..4     1069   1029    1608     195     160     399      690
 * 1..8     2483   2381    3851     410     360     988      1659
 * 1..12    4303   4152    6207     690     618     1642     2690
 * 1..16    6122   5931    866
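For readers checking the cpb columns in the two result tables of this message: the benchmark hashes every non-empty substring of an n-byte buffer, so the total number of bytes hashed is n(n+1)(n+2)/6 (the tetrahedral number behind the OEIS link), and cpb is simply cycles divided by that. A minimal sketch, assuming the helper names total_bytes() and cycles_per_byte(), which are illustrative and not part of the posted benchmark:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the posted code): total bytes hashed
 * when every non-empty substring of an n-byte buffer is hashed once.
 * There are (n - len + 1) substrings of each length len, so the sum
 * of len * (n - len + 1) over len = 1..n is n(n+1)(n+2)/6.
 */
static uint64_t total_bytes(uint64_t n)
{
	return n * (n + 1) * (n + 2) / 6;
}

/* Cycles per byte for one benchmark row. */
static double cycles_per_byte(uint64_t cycles, uint64_t n)
{
	return (double)cycles / (double)total_bytes(n);
}
```

With the x86-32 SipHash rows, cycles_per_byte(10495, 4) gives 524.75 and cycles_per_byte(24468, 8) gives 203.9, which matches the cpb column.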
Re: [PATCH v3] net: macb: Added PCI wrapper for Platform Driver.
From: Bartosz Folta Date: Wed, 14 Dec 2016 06:39:15 + > There are hardware PCI implementations of Cadence GEM network > controller. This patch will allow to use such hardware with reuse of > existing Platform Driver. > > Signed-off-by: Bartosz Folta > --- > Changed in v3: > Fixed dependencies in Kconfig. > --- > Changed in v2: > Respin to net-next. Changed patch formatting. Applied.
Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
On Fri, Dec 16, 2016 at 09:15:03PM -0500, George Spelvin wrote:
> >> - Ted, Andy Lutorminski and I will try to figure out a construction of
> >>   get_random_long() that we all like.

We don't have to find the most optimal solution right away; we can approach this incrementally, after all.

So long as we replace get_random_{long,int}() with something which is (a) strictly better in terms of security given today's use of MD5, and (b) which is strictly *faster* than the current construction on 32-bit and 64-bit systems, we can do that, and can try to make it faster while maintaining some minimum level of security which is sufficient for all current users of get_random_{long,int}() and which can be clearly articulated for future users of get_random_{long,int}().

The main worry I have at this point is benchmarking siphash on a 32-bit system. It may be that simply batching the chacha20 output so that we're using the urandom construction more efficiently is the better way to go, since that *does* meet the criterion of strictly more secure and strictly faster than the current MD5 solution. I'm open to using siphash, but I want to see the 32-bit numbers first.

As far as half-siphash is concerned, it occurs to me that the main problem will be those users who need to guarantee that output can't be guessed over a long period of time. For example, if you have a long-running process, then the output needs to remain unguessable over potentially months or years, or else you might be weakening the ASLR protections. If on the other hand, the hash table or the process will be going away in a matter of seconds or minutes, the requirements with respect to cryptographic strength go down significantly.
Now, maybe this doesn't matter that much if we can guarantee (or make assumptions) that the attacker doesn't have unlimited access to the output stream of get_random_{long,int}(), or if it's being used in an anti-DOS use case where it ultimately only needs to be harder than alternate ways of attacking the system.

Rekeying every five minutes doesn't necessarily help with respect to ASLR, but it might reduce the amount of the output stream that would be available to the attacker in order to be able to attack the get_random_{long,int}() generator, and it also reduces the value of doing that attack to only compromising the ASLR for those processes started within that five minute window.

Cheers,

- Ted

P.S. I'm using ASLR as an example use case, above; of course we will need to make similar examinations of the other uses of get_random_{long,int}().

P.P.S. We might also want to think about potentially defining get_random_{long,int}() to be unambiguously strong, and then creating a get_weak_random_{long,int}() which, on platforms where performance might be a consideration, uses a weaker algorithm, perhaps with some kind of rekeying interval.
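The batching idea mentioned in this message can be illustrated roughly as follows. This is only a user-space sketch under stated assumptions, not the kernel implementation: struct batched_random, batched_random_init(), refill() and get_random_u64_batched() are hypothetical names, and refill() uses a splitmix64 stand-in where real code would generate one block of ChaCha20 output, so the sketch is NOT cryptographically secure. It only shows how the expensive step is amortised over several calls.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_WORDS 8	/* 8 x 64 bits = 64 bytes, the size of one ChaCha20 block */

struct batched_random {
	uint64_t buf[BATCH_WORDS];
	size_t pos;		/* next unused word; BATCH_WORDS means "empty" */
	uint64_t state;		/* stand-in generator state (NOT secure) */
	unsigned long refills;	/* how often the expensive path ran */
};

/* Stand-in for "generate one block of CRNG output": splitmix64. */
static void refill(struct batched_random *b)
{
	for (size_t i = 0; i < BATCH_WORDS; i++) {
		uint64_t z = (b->state += 0x9e3779b97f4a7c15ULL);

		z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
		z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
		b->buf[i] = z ^ (z >> 31);
	}
	b->pos = 0;
	b->refills++;
}

static void batched_random_init(struct batched_random *b, uint64_t seed)
{
	b->state = seed;
	b->pos = BATCH_WORDS;	/* force a refill on first use */
	b->refills = 0;
}

/* Cheap path: hand out the next buffered word, refilling only when empty. */
static uint64_t get_random_u64_batched(struct batched_random *b)
{
	if (b->pos >= BATCH_WORDS)
		refill(b);
	return b->buf[b->pos++];
}
```

In this sketch seventeen calls trigger three refills, so the block generation cost is paid once per eight outputs. A rekeying interval, as suggested in the P.P.S., would be an additional policy on top of this and is not shown.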
Re: [PATCH v2 00/11] add support for VBUS max current and min voltage limits AXP20X and AXP22X PMICs
Hi Quentin, On Fri, Dec 09, 2016 at 12:04:08PM +0100, Quentin Schulz wrote: > The X-Powers AXP209 and AXP20X PMICs are able to set a limit for the > VBUS power supply for both max current and min voltage supplied. This > series of patch adds the possibility to set these limits from sysfs. > > Also, the AXP223 PMIC shares most of its behaviour with the AXP221 but > the former can set the VBUS power supply max current to 100mA, unlike > the latter. The AXP223 VBUS power supply driver used to probe on the > AXP221 compatible. This series of patch introduces a new compatible for > the AXP223 to be able to set the current max limit to 100mA. > > With that new compatible, boards having the AXP223 see their DT updated > to use the VBUS power supply driver with the correct compatible. > > This series of patch also migrates from of_device_is_compatible function > to the data field of of_device_id to identify the compatible used to > probe. This improves the code readability. > > Mostly cosmetic changes in v2 and adding volatile and writeable regs to > AXP20X and AXP22X MFD cells for the VBUS power supply driver. > > Quentin Schulz (11): > power: supply: axp20x_usb_power: use of_device_id data field instead > of device_is_compatible > mfd: axp20x: add volatile and writeable reg ranges for VBUS power > supply driver > power: supply: axp20x_usb_power: set min voltage and max current from > sysfs > Documentation: DT: binding: axp20x_usb_power: add axp223 compatible > power: supply: axp20x_usb_power: add 100mA max current limit for > AXP223 > mfd: axp20x: add separate MFD cell for AXP223 > ARM: dtsi: add DTSI for AXP223 > ARM: dts: sun8i-a33-olinuxino: use AXP223 DTSI > ARM: dts: sun8i-a33-sinlinx-sina33: use AXP223 DTSI > ARM: dts: sun8i-r16-parrot: use AXP223 DTSI > ARM: dtsi: sun8i-reference-design-tablet: use AXP223 DTSI Thanks for your patchset. 
We are currently in the merge window and patches 1 & 3-5 will appear in linux-next once 4.10-rc1 has been tagged by Linus Torvalds. Until then I queued them into this branch: https://git.kernel.org/cgit/linux/kernel/git/sre/linux-power-supply.git/log/?h=for-next-next -- Sebastian
Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
> As far as half-siphash is concerned, it occurs to me that the main > problem will be those users who need to guarantee that output can't be > guessed over a long period of time. For example, if you have a > long-running process, then the output needs to remain unguessable over > potentially months or years, or else you might be weakening the ASLR > protections. If on the other hand, the hash table or the process will > be going away in a matter of seconds or minutes, the requirements with > respect to cryptographic strength go down significantly. Perhaps SipHash-4-8 should be used instead of SipHash-2-4. I believe SipHash-4-8 is recommended for the security conscious who want to be more conservative in their security estimates. SipHash-4-8 does not add much more processing. If you are clocking SipHash-2-4 at 2.0 or 2.5 cpb, then SipHash-4-8 will run at 3.0 to 4.0. Both are well below MD5 times. (At least with the data sets I've tested). > Now, maybe this doesn't matter that much if we can guarantee (or make > assumptions) that the attacker doesn't have unlimited access the > output stream of get_random_{long,int}(), or if it's being used in an > anti-DOS use case where it ultimately only needs to be harder than > alternate ways of attacking the system. > > Rekeying every five minutes doesn't necessarily help the with respect > to ASLR, but it might reduce the amount of the output stream that > would be available to the attacker in order to be able to attack the > get_random_{long,int}() generator, and it also reduces the value of > doing that attack to only compromising the ASLR for those processes > started within that five minute window. Forgive my ignorance... I did not find reading on using the primitive in a PRNG. Does anyone know what Aumasson or Bernstein have to say? Aumasson's site does not seem to discuss the use case: https://www.google.com/search?q=siphash+rng+site%3A131002.net. 
(And their paper only mentions random-number once in a different context). Making the leap from internal hash tables and short-lived network packets to the rng case may leave something to be desired, especially if the bits get used in unanticipated ways, like creating long term private keys. Jeff
Re: [PATCH] x86/floppy: use designated initializers
* Kees Cook wrote: > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. > > Signed-off-by: Kees Cook > --- > arch/x86/include/asm/floppy.h | 20 ++-- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/include/asm/floppy.h b/arch/x86/include/asm/floppy.h > index 1c7eefe32502..d0e4702883b9 100644 > --- a/arch/x86/include/asm/floppy.h > +++ b/arch/x86/include/asm/floppy.h > @@ -229,18 +229,18 @@ static struct fd_routine_l { > int (*_dma_setup)(char *addr, unsigned long size, int mode, int io); > } fd_routine[] = { > { > - request_dma, > - free_dma, > - get_dma_residue, > - dma_mem_alloc, > - hard_dma_setup > + ._request_dma = request_dma, > + ._free_dma = free_dma, > + ._get_dma_residue = get_dma_residue, > + ._dma_mem_alloc = dma_mem_alloc, > + ._dma_setup = hard_dma_setup > }, > { > - vdma_request_dma, > - vdma_nop, > - vdma_get_dma_residue, > - vdma_mem_alloc, > - vdma_dma_setup > + ._request_dma = vdma_request_dma, > + ._free_dma = vdma_nop, > + ._get_dma_residue = vdma_get_dma_residue, > + ._dma_mem_alloc = vdma_mem_alloc, > + ._dma_setup = vdma_dma_setup Please align the two columns vertically while at it. Thanks, Ingo
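As background for this series of "use designated initializers" patches: a positional initializer silently depends on the declaration order of the struct members, which is exactly what structure layout randomization would change. A minimal sketch of the difference, using a hypothetical ops table (struct dma_ops and the helper functions are invented for illustration and are not the floppy driver's real types):

```c
#include <stddef.h>

/* Hypothetical ops table, loosely modelled on the fd_routine[] idea. */
struct dma_ops {
	int  (*request)(unsigned int chan);
	void (*release)(unsigned int chan);
	int  (*residue)(unsigned int chan);
};

static int  real_request(unsigned int chan) { return (int)chan; }
static void real_release(unsigned int chan) { (void)chan; }
static int  real_residue(unsigned int chan) { (void)chan; return 0; }
static int  fake_residue(unsigned int chan) { (void)chan; return -1; }

static const struct dma_ops ops_table[] = {
	{
		.request = real_request,
		.release = real_release,
		.residue = real_residue,
	},
	{
		/* Order differs from the declaration; members still bind by name. */
		.residue = fake_residue,
		.request = real_request,
		/* .release is omitted and is therefore implicitly NULL. */
	},
};
```

Because every member is named, reordering the members of struct dma_ops would not change which function ends up in which slot.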
Re: [PATCH 4/5] irda: irnet: Remove unused IRNET_MAJOR define
From: Corentin Labbe Date: Thu, 15 Dec 2016 11:42:49 +0100 > The IRNET_MAJOR define is not used, so this patch remove it. > > Signed-off-by: Corentin Labbe Applied.
Re: [PATCH 5/5] irda: irnet: add member name to the miscdevice declaration
From: Corentin Labbe Date: Thu, 15 Dec 2016 11:42:50 +0100 > Since the struct miscdevice have many members, it is dangerous to init > it without members name relying only on member order. > > This patch add member name to the init declaration. > > Signed-off-by: Corentin Labbe Applied.
Re: [PATCH 1/5] irda: irproc.c: Remove unneeded linux/miscdevice.h include
From: Corentin Labbe Date: Thu, 15 Dec 2016 11:42:46 +0100 > irproc.c does not use any miscdevice so this patch remove this > unnecessary inclusion. > > Signed-off-by: Corentin Labbe Applied.
Re: [PATCH 3/5] irnet: ppp: move IRNET_MINOR to include/linux/miscdevice.h
From: Corentin Labbe Date: Thu, 15 Dec 2016 11:42:48 +0100 > This patch move the define for IRNET_MINOR to include/linux/miscdevice.h > It is better that all minor number definitions are in the same place. > > Signed-off-by: Corentin Labbe Applied.
Re: [PATCH 2/5] irda: irnet: Move linux/miscdevice.h include
From: Corentin Labbe Date: Thu, 15 Dec 2016 11:42:47 +0100 > The only use of miscdevice is irda_ppp so no need to include > linux/miscdevice.h for all irda files. > This patch move the linux/miscdevice.h include to irnet_ppp.h > > Signed-off-by: Corentin Labbe Applied.
[RFC][PATCH] spinlock_debug: report spinlock lockup from unlock
There is a race window between the point when __spin_lock_debug() detects spinlock lockup and the time when CPU that caused the lockup receives its backtrace interrupt. Before __spin_lock_debug() triggers all_cpu_backtrace() it calls spin_dump() to printk() the current state of the lock and CPU backtrace. These printk() calls can take some time to print the messages to serial console, for instance (we are not talking about console_unlock() loop and a flood of messages from other CPUs, but just spin_dump() printk() and serial console). All those preparation steps can give CPU that caused the lockup enough time to run away, so when it receives a backtrace interrupt it can look completely innocent. The patch extends `struct raw_spinlock' with additional variable that stores jiffies of successful do_raw_spin_lock() and checks in debug_spin_unlock() whether the spin_lock has been locked for too long. So we will have a reliable backtrace from CPU that locked up and a reliable backtrace from CPU that caused the lockup. Missed spin_lock unlock deadline report (example): BUG: spinlock missed unlock deadline on CPU#0, bash/327 lock: lock.25562+0x0/0x60, .magic: dead4ead, .owner: bash/327, .owner_cpu: 0 CPU: 0 PID: 327 Comm: bash Call Trace: dump_stack+0x4f/0x65 spin_dump+0x8a/0x8f spin_bug+0x2b/0x2d do_raw_spin_unlock+0x92/0xa3 _raw_spin_unlock+0x27/0x44 ... 
Signed-off-by: Sergey Senozhatsky --- include/linux/spinlock_types.h | 4 +++- kernel/locking/spinlock_debug.c | 5 + 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h index 73548eb13a5d..8972e56eeefb 100644 --- a/include/linux/spinlock_types.h +++ b/include/linux/spinlock_types.h @@ -25,6 +25,7 @@ typedef struct raw_spinlock { #ifdef CONFIG_DEBUG_SPINLOCK unsigned int magic, owner_cpu; void *owner; + unsigned long acquire_tstamp; #endif #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; @@ -45,7 +46,8 @@ typedef struct raw_spinlock { # define SPIN_DEBUG_INIT(lockname) \ .magic = SPINLOCK_MAGIC,\ .owner_cpu = -1,\ - .owner = SPINLOCK_OWNER_INIT, + .owner = SPINLOCK_OWNER_INIT, \ + .acquire_tstamp = 0, #else # define SPIN_DEBUG_INIT(lockname) #endif diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c index 0374a596cffa..daeab4bc86ff 100644 --- a/kernel/locking/spinlock_debug.c +++ b/kernel/locking/spinlock_debug.c @@ -12,6 +12,7 @@ #include #include #include +#include void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name, struct lock_class_key *key) @@ -27,6 +28,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name, lock->magic = SPINLOCK_MAGIC; lock->owner = SPINLOCK_OWNER_INIT; lock->owner_cpu = -1; + lock->acquire_tstamp = 0; } EXPORT_SYMBOL(__raw_spin_lock_init); @@ -90,6 +92,7 @@ static inline void debug_spin_lock_after(raw_spinlock_t *lock) { lock->owner_cpu = raw_smp_processor_id(); lock->owner = current; + lock->acquire_tstamp = jiffies; } static inline void debug_spin_unlock(raw_spinlock_t *lock) @@ -99,6 +102,8 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock) SPIN_BUG_ON(lock->owner != current, lock, "wrong owner"); SPIN_BUG_ON(lock->owner_cpu != raw_smp_processor_id(), lock, "wrong CPU"); + SPIN_BUG_ON(time_after_eq(jiffies, lock->acquire_tstamp + HZ), + lock, "missed unlock deadline"); lock->owner = 
SPINLOCK_OWNER_INIT; lock->owner_cpu = -1; } -- 2.11.0
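The deadline check in this patch relies on a wrap-safe comparison of tick counters. A minimal user-space sketch of that idiom follows; sketch_time_after_eq(), SKETCH_HZ and missed_unlock_deadline() are illustrative names, the tick rate of 100 is an assumption, and the cast from unsigned long to long for large values is implementation-defined in ISO C (it behaves as two's complement wrap-around on the usual compilers).

```c
/*
 * User-space restatement of the wrap-safe comparison idiom:
 * true when tick value a is at or after tick value b, even if
 * the counter has wrapped around in between.
 */
#define sketch_time_after_eq(a, b) ((long)((a) - (b)) >= 0)

#define SKETCH_HZ 100UL	/* assumed ticks per second, for illustration only */

/* Returns 1 if the lock has been held for at least one second of ticks. */
static int missed_unlock_deadline(unsigned long now, unsigned long acquired)
{
	return sketch_time_after_eq(now, acquired + SKETCH_HZ);
}
```

A plain `now >= acquired + SKETCH_HZ` would give the wrong answer when the lock is taken shortly before the counter wraps and released shortly after, which is why the subtraction-and-sign form is used.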
Re: [PATCH net 0/3] dpaa_eth: a couple of fixes
From: Madalin Bucur Date: Thu, 15 Dec 2016 15:13:03 +0200 > This patch set introduces big endian accessors in the dpaa_eth driver > making sure accesses to the QBMan HW are correct on little endian > platforms. Removing a redundant Kconfig dependency on FSL_SOC. > Adding myself as maintainer of the dpaa_eth driver. Series applied, thanks.
Re: [PATCH v2 0/8] power: supply: tps65217: Support USB charger feature
Hi, On Fri, Dec 09, 2016 at 04:48:58PM +0900, Milo Kim wrote: > TPS65217 device supports two charger inputs - AC and USB. > Currently, only AC charger is supported. This patch-set adds USB charger > feature. Tested on Beaglebone black. > > Patch 1: Main patch > Patch 2, 3: Clean up for charger driver data > Patch 4 ~ 8: Naming changes for generic power supply class structure > > v2: > Regenerate the patchset for better code review > > Milo Kim (8): > power: supply: tps65217: Support USB charger interrupt > power: supply: tps65217: Use 'poll_task' on unloading the module Patches look fine, but these two patches must be reordered to fix bisectability. Otherwise after patch 1 the thread is not properly killed during driver removal. -- Sebastian
Re: [PATCH v3 2/2] dt-bindings: power: add bindings for sbs-charger
Hi, On Thu, Nov 24, 2016 at 01:33:43PM +0100, Nicolas Saenz Julienne wrote: > Adds device tree documentation for SBS charger compilant devices as defined > here: http://sbs-forum.org/specs/sbc110.pdf > > Signed-off-by: Nicolas Saenz Julienne > --- > v2 -> v3: > - add part number as compatible > > .../bindings/power/supply/sbs_sbs-charger.txt | 24 > ++ > 1 file changed, 24 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/power/supply/sbs_sbs-charger.txt > > diff --git > a/Documentation/devicetree/bindings/power/supply/sbs_sbs-charger.txt > b/Documentation/devicetree/bindings/power/supply/sbs_sbs-charger.txt > new file mode 100644 > index 000..f6b6027 > --- /dev/null > +++ b/Documentation/devicetree/bindings/power/supply/sbs_sbs-charger.txt > @@ -0,0 +1,24 @@ > +SBS sbs-charger > +~~ > + > +Required properties: > + - compatible: should contain one of the following: > + - "lltc,ltc4100" > + - "sbs,sbs-charger" That's not what I meant. The idea is to specify "lltc,ltc4100" with "sbs,sbs-charger" as fallback. Then the driver for now only handles "sbs,sbs-charger", but if any vendor registers need to be supported we have a more specific compatible value in DT, that can be used to identify the device. > +Optional properties: > +- interrupt-parent: Should be the phandle for the interrupt controller. Use > in > +conjunction with "interrupts". > +- interrupts: Interrupt mapping for GPIO IRQ. Use in conjunction with > +"interrupt-parent". If an interrupt is not provided the driver will > switch > +automatically to polling. > + > +Example: > + > + ltc4100@9 { > + compatible = "sbs,sbs-charger"; > + reg = <0x9>; > + interrupt-parent = <&gpio6>; > + interrupts = <7 IRQ_TYPE_LEVEL_LOW>; > + }; So the example would look like compatible = "lltc,ltc4100", "sbs,sbs-charger"; -- Sebastian
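The fallback scheme described in this reply can be illustrated with a small user-space model. It is only a sketch: match_compatible() is a hypothetical stand-in for the kernel's OF matching, and the two driver tables are invented to show the "generic driver today, part-specific handling later" point.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for OF matching: walk the node's "compatible"
 * list from most to least specific and return the first string that
 * the driver's table knows about, or NULL if nothing matches.
 */
static const char *match_compatible(const char *const *node_compat, size_t n_node,
				    const char *const *drv_table, size_t n_drv)
{
	for (size_t i = 0; i < n_node; i++)
		for (size_t j = 0; j < n_drv; j++)
			if (strcmp(node_compat[i], drv_table[j]) == 0)
				return node_compat[i];
	return NULL;
}

/* Node as in the suggested example: specific part first, generic fallback second. */
static const char *const ltc4100_node[] = { "lltc,ltc4100", "sbs,sbs-charger" };

/* A driver that only knows the generic string still binds via the fallback ... */
static const char *const generic_driver[] = { "sbs,sbs-charger" };

/* ... and a later driver can key on the specific part without any DT change. */
static const char *const quirk_driver[] = { "sbs,sbs-charger", "lltc,ltc4100" };
```

With these tables the generic driver matches on "sbs,sbs-charger", while the extended driver matches on "lltc,ltc4100" because the node lists it first.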
Re: [PATCH v3 0/2] power: supply: add sbs-charger driver
Hi, On Tue, Dec 13, 2016 at 11:41:01AM +0100, Nicolas Saenz Julienne wrote: > On 24/11/16 13:33, Nicolas Saenz Julienne wrote: > > Hi, > > > > This series adds support for all SBS compatible battery chargers, as defined > > here: http://sbs-forum.org/specs/sbc110.pdf. > > > > The first patch changes the sbs-battery device name in order to be able to > > create a proper supplier/supplied relation between the two of them. > > > > The second introduces the driver. > > > > Regards, > > Nicolas > > > > changes since v2: > > - updated driver and dt-binding with Sebatian's comments > > > > changes since v1: > > - added dt bindings > > - updated driver with Sebastian's comments > > - s/Nicola/Nicolas/ in commits > > > > Nicolas Saenz Julienne (2): > > power: supply: add sbs-charger driver > > dt-bindings: power: add bindings for sbs-charger > > > > .../bindings/power/supply/sbs_sbs-charger.txt | 24 ++ > > drivers/power/supply/Kconfig | 6 + > > drivers/power/supply/Makefile | 1 + > > drivers/power/supply/sbs-charger.c | 275 > > + > > 4 files changed, 306 insertions(+) > > create mode 100644 > > Documentation/devicetree/bindings/power/supply/sbs_sbs-charger.txt > > create mode 100644 drivers/power/supply/sbs-charger.c > > > Hi, > any update? Sorry, I was busy. -- Sebastian
Re: [PATCH v1 & v6 1/2] PM/devfreq: add suspend frequency support
2016-12-18 0:13 GMT+09:00 Tobias Jakobi : > Hey guys, > > Chanwoo Choi wrote: >> Hi Lin, >> >> 2016-11-24 18:54 GMT+09:00 Chanwoo Choi : >>> Hi Lin, >>> >>> On 24 Nov 2016 18:28, Chanwoo Choi wrote: Hi Lin, On 24 Nov 2016 17:34, hl wrote: > Hi Chanwoo Choi, > > > On 24 Nov 2016 16:16, Chanwoo Choi wrote: >> Hi Lin, >> >> On 24 Nov 2016 16:34, hl wrote: >>> Hi Chanwoo Choi, >>> >>> I think the dev_pm_opp_get_suspend_opp() have implement most of >>> the function, all we need is just define the node in dts, like following: >>> >>> &dmc_opp_table { >>> opp06 { >>> opp-suspend; >>> }; >>> }; >> Two approaches use the 'opp-suspend' property. >> >> I think that the method to support suspend-opp have to >> guarantee following conditions: >> - Support the all of devfreq's governors. > As MyungJoo Ham suggestion, i will set the suspend frequency in > devfreq_suspend_device(), > which will ignore governor. Other approach already support the all of governors. Before calling the mail, I discussed with Myungjoo Ham. Myungjoo prefer to use the devfreq_suspend/devfreq_resume(). >>> >>> It is not correct expression. We need to wait the reply from Myungjoo >>> to clarify this. >>> To Myungjoo, Please add your opinion how to support the suspend frequency. >>> >> - Devfreq framework have the responsibility to change the >>frequency/voltage for suspend-opp. If we uses the >>new devfreq_suspend(), each devfreq device don't care >>how to support the suspend-opp. Just the developer of each >>devfreq device need to add 'opp-suspend' property to OPP entry in DT >> file. > Why should support change the voltage in devfreq framework, i think it > should be handle in > specific driver, i think the devfreq only handle it can get the right > frequency, then pass it to No, the frequency should be handled by governor or framework. The each devfreq device has no any responsibility of next frequency/voltage. The governor and core of devfreq can decide the next frequency/voltage.
You can refer to the cpufreq subsystem. > specific driver, i think the voltage should handle in the > devfreq->profile->target(); The call of devfreq->profile->target() have to be handled by devfreq framework. If user want to set the suspend frequency, user can add the 'opp-suspend' property. I think this way is easy. But, If the each devfreq device want to decide the next frequency/voltage only for suspend state. We can check the cpufreq subsystem. If specific devfreq device want to handle the suspend frequency, each devfreq will add the own suspend/resume functions as following: struct devfreq_dev_profile { int (*suspend)(struct devfreq *dev);// new function pointer int (*resume)(struct devfreq *dev); // new function pointer } a_profile; a_profile.suspend = devfreq_generic_suspend; The devfreq framework will provide the devfreq_generic_suspend() function. int devfreq_generic_suspend(struct devfreq *dev) { ... devfreq->profile->target(..., devfreq->suspend_freq); ... } or a_profile.suspend = a_devfreq_suspend; // specific function of each devfreq device The devfreq_suspend() will call 'devfreq->profile->suspend()' function instead of devfreq->profile->target(); The devfreq call the 'devfreq->profile->suspend()' to support the suspend frequency. Regards, Chanwoo Choi >>> >>> The key difference between two approaches: >>> >>> Your approach: >>> - The each developer should add the 'opp-suspend' property to the dts file. >>> - The each devfreq should call the devfreq_suspend_device() >>> to support the suspend frequency. >>> >>> If each devfreq doesn't call the devfreq_suspend_device(), devfreq >>> framework >>> can support the suspend frequency. >>> >>> Other approach: >>> - The each developer only should add the 'opp-suspend' property to the dts >>> file >>> without the additional behavior. >>> >>> In the cpufreq subsystem, >>> When support the suspend frequency of cpufreq, we just add 'opp-suspend' >>> property >>> without the additional behavior. 
>> >> I'm missing the use-case when using the devfreq_suspend_device() >> before entering the suspend mode. We should consider the case when >> devfreq device >> calls the devfreq_suspend_device() directly. Because devfreq_suspend_device() >> is exported function, each devfreq device call this function on the fly >> withou
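The optional suspend hook discussed in this thread could look roughly like the following user-space model. This is a sketch of the proposal only, not the real devfreq API: every type and function name carries a _sketch suffix to mark it as hypothetical, the target() signature is simplified, and suspend_freq stands for whatever frequency the OPP marked with opp-suspend provides.

```c
#include <stddef.h>

struct devfreq_sketch;

/* Simplified, hypothetical profile: the real one takes different arguments. */
struct devfreq_profile_sketch {
	int (*target)(struct devfreq_sketch *df, unsigned long *freq);
	int (*suspend)(struct devfreq_sketch *df);	/* optional, may be NULL */
};

struct devfreq_sketch {
	const struct devfreq_profile_sketch *profile;
	unsigned long cur_freq;
	unsigned long suspend_freq;	/* from the OPP marked opp-suspend, 0 if none */
};

/* Generic helper the framework would provide: request the suspend frequency. */
static int devfreq_generic_suspend_sketch(struct devfreq_sketch *df)
{
	unsigned long freq = df->suspend_freq;

	if (!freq)
		return 0;	/* no suspend OPP: nothing to do */
	return df->profile->target(df, &freq);
}

/* Framework side: call the hook if the driver opted in, otherwise do nothing. */
static int devfreq_suspend_sketch(struct devfreq_sketch *df)
{
	if (df->profile->suspend)
		return df->profile->suspend(df);
	return 0;
}

/* Example driver target(): just record the frequency it was asked to set. */
static int demo_target(struct devfreq_sketch *df, unsigned long *freq)
{
	df->cur_freq = *freq;
	return 0;
}
```

A driver would opt in by pointing .suspend at the generic helper, or at its own function if it needs device-specific handling before suspend.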
Re: [PATCH] block: loose check on sg gap
On 12/17/2016 03:49 AM, Ming Lei wrote: > If the last bvec of the 1st bio and the 1st bvec of the next > bio are contineous physically, and the latter can be merged > to last segment of the 1st bio, we should think they don't > violate sg gap(or virt boundary) limit. > > Both Vitaly and Dexuan reported lots of unmergeable small bios > are observed when running mkfs on Hyper-V virtual storage, and > performance becomes quite low, so this patch is figured out for > fixing the performance issue. > > The same issue should exist on NVMe too sine it sets virt boundary too. It looks pretty reasonable to me. I'll queue it up for some testing, changes like this always make me a little nervous. -- Jens Axboe
Re: [PATCH] net: use designated initializers
From: Kees Cook Date: Fri, 16 Dec 2016 16:58:58 -0800 > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. > > Signed-off-by: Kees Cook Applied, although "decnet: " would have been a much better subsystem prefix.
Re: [PATCH] isdn/gigaset: use designated initializers
From: Kees Cook Date: Fri, 16 Dec 2016 16:58:06 -0800 > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. > > Signed-off-by: Kees Cook Applied.
Re: [PATCH] ATM: use designated initializers
From: Kees Cook Date: Fri, 16 Dec 2016 16:58:43 -0800 > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. > > Signed-off-by: Kees Cook Applied.
Re: [PATCH] net/x25: use designated initializers
From: Kees Cook Date: Fri, 16 Dec 2016 17:03:39 -0800 > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. > > Signed-off-by: Kees Cook Applied.
Re: [PATCH] isdn: use designated initializers
From: Kees Cook Date: Fri, 16 Dec 2016 17:01:42 -0800 > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. > > Signed-off-by: Kees Cook Applied.
Re: [PATCH] WAN: use designated initializers
From: Kees Cook Date: Fri, 16 Dec 2016 16:59:18 -0800 > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. > > Signed-off-by: Kees Cook Applied.
Re: [PATCH] bna: use designated initializers
From: Kees Cook Date: Fri, 16 Dec 2016 17:00:54 -0800 > Prepare to mark sensitive kernel structures for randomization by making > sure they're using designated initializers. These were identified during > allyesconfig builds of x86, arm, and arm64, with most initializer fixes > extracted from grsecurity. > > Signed-off-by: Kees Cook Applied.
Re: OOM: Better, but still there on
On Sat, Dec 17, 2016 at 11:44:45PM +0900, Tetsuo Handa wrote: > On 2016/12/17 21:59, Nils Holland wrote: > > On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote: > >> mount -t tracefs none /debug/trace > >> echo 1 > /debug/trace/events/vmscan/enable > >> cat /debug/trace/trace_pipe > trace.log > >> > >> should help > >> [...] > > > > No problem! I enabled writing the trace data to a file and then tried > > to trigger another OOM situation. That worked, this time without a > > complete kernel panic, but with only my processes being killed and the > > system becoming unresponsive. > > [...] > > Under OOM situation, writing to a file on disk unlikely works. Maybe > logging via network ( "cat /debug/trace/trace_pipe > /dev/udp/$ip/$port" > if your are using bash) works better. (I wish we can do it from kernel > so that /bin/cat is not disturbed by delays due to page fault.) > > If you can configure netconsole for logging OOM killer messages and > UDP socket for logging trace_pipe messages, udplogger at > https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/ > might fit for logging both output with timestamp into a single file. Thanks for the hint, sounds very sane! I'll try to go that route for the next log / trace I produce. Of course, if Michal says that the trace file I've already posted, and which has been logged to file, is useless and would have been better if I had instead logged to a different machine via the network, I could also repeat the current experiment and produce a new file at any time. :-) Greetings Nils
Re: usb/core: warning in usb_create_ep_devs/sysfs_create_dir_ns
On Fri, Dec 16, 2016 at 7:01 PM, Alan Stern wrote: > On Mon, 12 Dec 2016, Andrey Konovalov wrote: > >> Hi! >> >> While running the syzkaller fuzzer I've got the following error report. >> >> On commit 3c49de52d5647cda8b42c4255cf8a29d1e22eff5 (Dev 2). >> >> WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0 >> gadgetfs: disconnected >> sysfs: cannot create duplicate filename >> '/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05' >> Kernel panic - not syncing: panic_on_warn set ... >> >> CPU: 2 PID: 865 Comm: kworker/2:1 Not tainted 4.9.0-rc7+ #34 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> Workqueue: usb_hub_wq hub_event >> 88006bee64c8 81f96b8a 0001 11000d7dcc2c >> ed000d7dcc24 0001 41b58ab3 8598b510 >> 81f968f8 850fee20 85cff020 dc00 >> Call Trace: >> [< inline >] __dump_stack lib/dump_stack.c:15 >> [] dump_stack+0x292/0x398 lib/dump_stack.c:51 >> [] panic+0x1cb/0x3a9 kernel/panic.c:179 >> [] __warn+0x1c4/0x1e0 kernel/panic.c:542 >> [] warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:565 >> [] sysfs_warn_dup+0x8a/0xa0 fs/sysfs/dir.c:30 >> [] sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:59 >> [< inline >] create_dir lib/kobject.c:71 >> [] kobject_add_internal+0x227/0xa60 lib/kobject.c:229 >> [< inline >] kobject_add_varg lib/kobject.c:366 >> [] kobject_add+0x139/0x220 lib/kobject.c:411 >> [] device_add+0x353/0x1660 drivers/base/core.c:1088 >> [] device_register+0x1d/0x20 drivers/base/core.c:1206 >> [] usb_create_ep_devs+0x163/0x260 >> drivers/usb/core/endpoint.c:195 >> [] create_intf_ep_devs+0x13b/0x200 >> drivers/usb/core/message.c:1030 >> [] usb_set_configuration+0x1083/0x18d0 >> drivers/usb/core/message.c:1937 > > Hi, Andrey: > > Please check whether the patch below fixes this problem. Hi Alan, Been testing with your patch for the last day, haven't seen any more reports or other issues. Tested-by: Andrey Konovalov Thanks! 
> > Alan Stern > > > > Index: usb-4.x/drivers/usb/core/config.c > === > --- usb-4.x.orig/drivers/usb/core/config.c > +++ usb-4.x/drivers/usb/core/config.c > @@ -234,6 +234,16 @@ static int usb_parse_endpoint(struct dev > if (ifp->desc.bNumEndpoints >= num_ep) > goto skip_to_next_endpoint_or_interface_descriptor; > > + /* Check for duplicate endpoint addresses */ > + for (i = 0; i < ifp->desc.bNumEndpoints; ++i) { > + if (ifp->endpoint[i].desc.bEndpointAddress == > + d->bEndpointAddress) { > + dev_warn(ddev, "config %d interface %d altsetting %d > has a duplicate endpoint with address 0x%X, skipping\n", > + cfgno, inum, asnum, d->bEndpointAddress); > + goto skip_to_next_endpoint_or_interface_descriptor; > + } > + } > + > endpoint = &ifp->endpoint[ifp->desc.bNumEndpoints]; > ++ifp->desc.bNumEndpoints; > >
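The duplicate check in the patch above amounts to a linear scan over the endpoint addresses already accepted for the altsetting. A stand-alone sketch of the same idea, with hypothetical names (struct ep_list, ep_list_add()) and an illustrative bound MAX_EP that is not taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EP 30	/* illustrative bound, not taken from the patch */

struct ep_list {
	uint8_t addr[MAX_EP];	/* bEndpointAddress values accepted so far */
	size_t count;
};

/* Returns 1 if the address was added, 0 if skipped (duplicate or list full). */
static int ep_list_add(struct ep_list *l, uint8_t bEndpointAddress)
{
	if (l->count >= MAX_EP)
		return 0;
	for (size_t i = 0; i < l->count; i++)
		if (l->addr[i] == bEndpointAddress)
			return 0;	/* duplicate: skip this descriptor */
	l->addr[l->count++] = bEndpointAddress;
	return 1;
}
```

Note that 0x05 and 0x85 are different addresses (bit 7 is the direction), so only an exact repeat such as a second 0x05 is rejected in this sketch.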
Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
On Thu 2016-12-15 23:33:22, Lino Sanfilippo wrote:
> On 15.12.2016 22:32, Lino Sanfilippo wrote:
> >
> > Ah ok. Then maybe priv->hw->dma->stop_tx() does not do the job correctly
> > (stop the tx path properly) and the HW is still active on the tx path
> > while the tx buffers are freed. OTOH stmmac_release() also stops the phy
> > before the tx (and rx) paths are stopped.
> > Did you try to stop the phy first in stmmac_tx_err_work(), too?
> >
> > Regards,
> > Lino
> >
>
> And this is the "sledgehammer" approach: Do a complete shutdown and restart
> of the hardware in case of tx error (against net-next and only
> compile tested).

Wow, thanks a lot. I'll try to get the driver back to the non-working
state, and try it.

I believe I have some idea what is wrong there. (Missing memory
barriers).

> +static void stmmac_tx_err_work(struct work_struct *work)
> +{
> +	struct stmmac_priv *priv = container_of(work, struct stmmac_priv,
> +						tx_err_work);
> +	/* restart netdev */
> +	rtnl_lock();
> +	stmmac_release(priv->dev);
> +	stmmac_open(priv->dev);
> +	rtnl_unlock();
> +}

Won't this up/down the interface, in a way userspace can observe?

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH v4 2/3] perf tool: add PERF_RECORD_NAMESPACES to include namespaces related info
On Fri, Dec 16, 2016 at 12:07:20AM +0530, Hari Bathini wrote:

SNIP

> +
> +int thread__set_namespaces(struct thread *thread, u64 timestamp,
> +			   struct namespaces_event *event)
> +{
> +	struct namespaces *new, *curr = thread__namespaces(thread);
> +
> +	new = namespaces__new(event);
> +	if (!new)
> +		return -ENOMEM;
> +
> +	list_add(&new->list, &thread->namespaces_list);
> +
> +	if (timestamp && curr) {
> +		/*
> +		 * setns syscall must have changed few or all the namespaces
> +		 * of this thread. Update end time for the namespaces
> +		 * previously used.
> +		 */
> +		curr = list_next_entry(new, list);
> +		curr->end_time = timestamp;

Hi,
couldn't you just use the curr you got from thread__namespaces()?
Why retrieve it again via the 'new' pointer?

thanks,
jirka
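The question above can be illustrated with a small userspace sketch. It assumes, and the quoted hunk does not show this, that thread__namespaces() returns the entry at the head of the list. Under that assumption the old head saved before the insert is the same entry that follows the new one afterwards, so no second lookup is needed. All names below (ns_entry, thread_ns, thread_ns_set) are hypothetical stand-ins, not the perf tool's types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one namespaces record of a thread. */
struct ns_entry {
	uint64_t end_time;
	struct ns_entry *next;
};

/* Hypothetical stand-in for the per-thread list; newest entry first. */
struct thread_ns {
	struct ns_entry *head;
};

/* Assumed behaviour of thread__namespaces(): return the newest entry. */
static struct ns_entry *thread_ns_current(struct thread_ns *t)
{
	return t->head;
}

/*
 * Push `new` at the head and close the previously current entry by
 * stamping its end time. `curr` is captured before the insert and
 * reused afterwards instead of being looked up again through `new`.
 */
static void thread_ns_set(struct thread_ns *t, struct ns_entry *new,
			  uint64_t timestamp)
{
	struct ns_entry *curr = thread_ns_current(t);

	new->end_time = 0;
	new->next = curr;
	t->head = new;

	if (timestamp && curr)
		curr->end_time = timestamp;
}
```

If thread__namespaces() could return something other than the list head, the two pointers would differ and the second lookup would matter, which is presumably what the reply is trying to find out.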
Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
In the actual implementation, the ether_addr_equal() function tests for
equality to 0 when returning. It seems that commit 0d74c4 overlooked
changing this operator in the documentation to reflect the actual function.

Signed-off-by: Cihangir Akturk
---
 Documentation/unaligned-memory-access.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/unaligned-memory-access.txt b/Documentation/unaligned-memory-access.txt
index a445da0..3f76c0c 100644
--- a/Documentation/unaligned-memory-access.txt
+++ b/Documentation/unaligned-memory-access.txt
@@ -151,7 +151,7 @@ bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
 #else
 	const u16 *a = (const u16 *)addr1;
 	const u16 *b = (const u16 *)addr2;
-	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
+	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
 #endif
 }

--
2.1.4
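To see why the operator matters, here is a minimal userspace sketch of the same XOR/OR comparison. It is not the kernel function: mac_equal is a hypothetical name, and memcpy() is used to load the 16-bit halves so that the example itself has no alignment requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two 6-byte addresses as three 16-bit words.
 * Each XOR is zero only when the two words match, so the OR of all
 * three is zero only when every word matches. Testing "== 0" therefore
 * means "equal"; testing "!= 0" would report equality for addresses
 * that differ.
 */
static bool mac_equal(const uint8_t *addr1, const uint8_t *addr2)
{
	uint16_t a[3], b[3];

	/* Copy instead of casting, to stay alignment-safe in this sketch. */
	memcpy(a, addr1, sizeof(a));
	memcpy(b, addr2, sizeof(b));

	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
}
```

With the "!= 0" form from the old documentation text, two identical addresses would compare as unequal, which is the inversion this patch corrects.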
[RFC 0/1] New PCI Switch Management Driver
Hi,

[Apologies: this is a resend for some people. Due to a configuration
error the original email was rejected by the mailing lists. I hope this
one makes it!]

We're looking to get some initial feedback on a new driver for a line of
PCIe switches produced and sold by Microsemi. The goal is to get the
process moving to get this code included in upstream hopefully for 4.11.
Facebook is currently gearing up to use this hardware in its Open Compute
Platform and is pushing to have this driver in the upstream kernel.

The following patch briefly describes the hardware and provides the
first draft of driver code. Currently, the driver works and has been
tested but is not feature complete. Thus, we are not looking to get it
merged immediately. However we would like some early review,
specifically on the interfaces and core concepts so that we don't do a
lot of work down a path the community would reject. Barring any
objections to this RFC, we will flesh out all the features and provide a
completed patch for inclusion in the coming weeks.

Work on a userspace tool that utilizes this driver is also being done
at [1]. The tool is currently also a bit of a skeleton and will be
fleshed out assuming there are no serious objections to our userspace
interface. In the end, the tool will be released with a GPL license.

The patch is based off of the v4.9 release.

Thanks for your review,

Logan

[1] https://github.com/sbates130272/switchtec-user

Logan Gunthorpe (1):
  MicroSemi Switchtec management interface driver

 Documentation/switchtec.txt    |  54 +++
 MAINTAINERS                    |   9 +
 drivers/pci/Kconfig            |   1 +
 drivers/pci/Makefile           |   1 +
 drivers/pci/switch/Kconfig     |  13 +
 drivers/pci/switch/Makefile    |   1 +
 drivers/pci/switch/switchtec.c | 824 +
 drivers/pci/switch/switchtec.h | 119 ++
 8 files changed, 1022 insertions(+)
 create mode 100644 Documentation/switchtec.txt
 create mode 100644 drivers/pci/switch/Kconfig
 create mode 100644 drivers/pci/switch/Makefile
 create mode 100644 drivers/pci/switch/switchtec.c
 create mode 100644 drivers/pci/switch/switchtec.h

--
2.1.4
[RFC 1/1] MicroSemi Switchtec management interface driver
Microsemi's "Switchtec" line of PCI switch devices is already
supported by the kernel with standard PCI switch drivers. However, the
Switchtec device advertises a special management endpoint which
enables some additional functionality. This includes:

 * Packet and Byte Counters
 * Firmware Upgrades
 * Event and Error logs
 * Querying port link status
 * Custom user firmware commands

This patch introduces the switchtec kernel module which provides a PCI
driver that exposes a char device. The char device provides userspace
access to this interface through read, write and (optionally) poll calls.
Currently no ioctls have been implemented but a couple may be added in a
later revision.

A short text file is provided which documents the switchtec driver and
outlines the semantics of using the char device.

A WIP userspace tool which utilizes this interface is available at [1].
This tool takes inspiration (and borrows some code) from nvme-cli [2].

[1] https://github.com/sbates130272/switchtec-user
[2] https://github.com/linux-nvme/nvme-cli

Signed-off-by: Logan Gunthorpe
Signed-off-by: Stephen Bates
---
 Documentation/switchtec.txt    |  54 +++
 MAINTAINERS                    |   9 +
 drivers/pci/Kconfig            |   1 +
 drivers/pci/Makefile           |   1 +
 drivers/pci/switch/Kconfig     |  13 +
 drivers/pci/switch/Makefile    |   1 +
 drivers/pci/switch/switchtec.c | 824 +
 drivers/pci/switch/switchtec.h | 119 ++
 8 files changed, 1022 insertions(+)
 create mode 100644 Documentation/switchtec.txt
 create mode 100644 drivers/pci/switch/Kconfig
 create mode 100644 drivers/pci/switch/Makefile
 create mode 100644 drivers/pci/switch/switchtec.c
 create mode 100644 drivers/pci/switch/switchtec.h

diff --git a/Documentation/switchtec.txt b/Documentation/switchtec.txt
new file mode 100644
index 000..04657ce
--- /dev/null
+++ b/Documentation/switchtec.txt
@@ -0,0 +1,54 @@
+=======================
+Linux Switchtec Support
+=======================
+
+Microsemi's "Switchtec" line of PCI switch devices is already
+supported by the kernel with standard PCI switch drivers. However, the
+Switchtec device advertises a special management endpoint which
+enables some additional functionality. This includes:
+
+ * Packet and Byte Counters
+ * Firmware Upgrades
+ * Event and Error logs
+ * Querying port link status
+ * Custom user firmware commands
+
+The switchtec kernel module implements this functionality.
+
+
+
+Interface
+=========
+
+The primary means of communicating with the Switchtec management firmware is
+through the Memory-mapped Remote Procedure Call (MRPC) interface.
+Commands are submitted to the interface with a 4-byte command
+identifier and up to 1KB of command specific data. The firmware will
+respond with a 4-byte return code and up to 1KB of command specific
+data. The interface only processes a single command at a time.
+
+
+Userspace Interface
+===================
+
+The MRPC interface will be exposed to userspace through a simple char
+device: /dev/switchtec#, one for each management endpoint in the system.
+
+The char device has the following semantics:
+
+ * A write must consist of at least 4 bytes and no more than 1028 bytes.
+   The first four bytes will be interpreted as the command to run and
+   the remainder will be used as the input data. A write will send the
+   command to the firmware to begin processing.
+
+ * Each write must be followed by exactly one read. Any double write will
+   produce an error and any read that doesn't follow a write will
+   produce an error.
+
+ * A read will block until the firmware completes the command and return
+   the four bytes of status plus up to 1024 bytes of output data. (The
+   length will be specified by the size parameter of the read call --
+   reading less than 4 bytes will produce an error.)
+
+ * The poll call will also be supported for userspace applications that
+   need to do other things while waiting for the command to complete.
diff --git a/MAINTAINERS b/MAINTAINERS index 63cefa6..1e21505 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9288,6 +9288,15 @@ S: Maintained F: Documentation/devicetree/bindings/pci/aardvark-pci.txt F: drivers/pci/host/pci-aardvark.c +PCI DRIVER FOR MICROSEMI SWITCHTEC +M: Kurt Schwemmer +M: Stephen Bates +M: Logan Gunthorpe +L: linux-...@vger.kernel.org +S: Maintained +F: Documentation/switchtec.txt +F: drivers/pci/switch/switchtec* + PCI DRIVER FOR NVIDIA TEGRA M: Thierry Reding L: linux-te...@vger.kernel.org diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index 6555eb7..f72e8c5 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -133,3 +133,4 @@ config PCI_HYPERV source "drivers/pci/hotplug/Kconfig" source "drivers/pci/host/Kconfig" +source "drivers/pci/switch/Kconfig" diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile index 8db5079..15b46dd 100644 --- a/drivers/pci/Makefile +++ b/drivers/pci/Makef
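The char device semantics described in Documentation/switchtec.txt in this patch can be sketched from the userspace side as follows. This is only an illustration of the framing rules stated there (4 command bytes plus up to 1024 bytes of data per write, at least 4 bytes per read); the helper names and constants are hypothetical, and the sketch assumes the command identifier is passed in host byte order, which the documentation does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MRPC_CMD_LEN     4	/* size of the command identifier */
#define MRPC_MAX_PAYLOAD 1024	/* "up to 1KB of command specific data" */

/*
 * Lay out one write() buffer: the command identifier followed by the
 * input data. Returns the number of bytes to pass to write(), or 0 if
 * the payload is too large or the buffer too small.
 */
static size_t mrpc_pack_write(uint8_t *buf, size_t buf_len, uint32_t cmd,
			      const void *data, size_t data_len)
{
	if (data_len > MRPC_MAX_PAYLOAD || buf_len < MRPC_CMD_LEN + data_len)
		return 0;

	/* Assumption: command bytes are taken in host byte order. */
	memcpy(buf, &cmd, MRPC_CMD_LEN);
	if (data_len)
		memcpy(buf + MRPC_CMD_LEN, data, data_len);

	return MRPC_CMD_LEN + data_len;
}

/* A read must ask for at least the four status bytes. */
static int mrpc_read_len_ok(size_t len)
{
	return len >= MRPC_CMD_LEN;
}
```

A caller would pass the packed buffer to write() on /dev/switchtec#, optionally poll(), and then issue exactly one read() whose size satisfies mrpc_read_len_ok().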
What is the function of arch/x86/purgatory/purgatory.c?
While checking the rtlwifi family of drivers using Sparse, I got the
following warnings:

  CHECK arch/x86/purgatory/purgatory.c
arch/x86/purgatory/purgatory.c:21:15: warning: symbol 'backup_dest' was not declared. Should it be static?
arch/x86/purgatory/purgatory.c:22:15: warning: symbol 'backup_src' was not declared. Should it be static?
arch/x86/purgatory/purgatory.c:23:15: warning: symbol 'backup_sz' was not declared. Should it be static?
arch/x86/purgatory/purgatory.c:25:4: warning: symbol 'sha256_digest' was not declared. Should it be static?
arch/x86/purgatory/purgatory.c:27:19: warning: symbol 'sha_regions' was not declared. Should it be static?
arch/x86/purgatory/purgatory.c:42:5: warning: symbol 'verify_sha256_digest' was not declared. Should it be static?
arch/x86/purgatory/purgatory.c:61:6: warning: symbol 'purgatory' was not declared. Should it be static?

Upon examination of the routine, I can see that if purgatory() should be
static, then none of the code here will ever be accessed by any part of
the kernel. Is there some bit of magic that is above my understanding, or
is this a useless bit of code that has been forgotten and should be
removed? If the former, then I think there should be declarations so that
the clueless like me are not confused.

Thanks,

Larry
Potential issues (security and otherwise) with the current cgroup-bpf API
Hi all-

I apologize for being rather late with this. I didn't realize that
cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
it on the linux-api list, so I missed the discussion.

I think that the inet ingress, egress etc filters are a neat feature,
but I think the API has some issues that will bite us down the road
if it becomes stable in its current form.

Most of the problems I see are summarized in this transcript:

# mkdir cg2
# mount -t cgroup2 none cg2
# mkdir cg2/nosockets
# strace cgrp_socket_rule cg2/nosockets/ 0
...
open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3

You can modify a cgroup after opening it O_RDONLY?

bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
log_buf=0x6020c0, kern_version=0}, 48) = 4

This is fine. The bpf() syscall manipulates bpf objects.

bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0

This is not so good:

a) The bpf() syscall is supposed to manipulate bpf objects. This
is manipulating a cgroup. There's no reason that a socket creation
filter couldn't be written in a different language (new iptables
table? Simple list of address families?), but if that happened,
then using bpf() to install it would be entirely nonsensical.

b) This is starting to be an excessively ugly multiplexer. Among
other things, it's very unfriendly to seccomp.

# echo $$ >cg2/nosockets/cgroup.procs
# ping 127.0.0.1
ping: socket: Operation not permitted
# ls cg2/nosockets/
cgroup.controllers cgroup.events cgroup.procs cgroup.subtree_control
# cat cg2/nosockets/cgroup.controllers

Something in cgroupfs should give an indication that this cgroup
filters socket creation, but there's nothing there. You should also
be able to turn the filter off from cgroupfs.

# mkdir cg2/nosockets/sockets
# /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1

This succeeded, which means that, if this feature is enabled in 4.10,
then we're stuck with its semantics. If it returned -EINVAL instead,
there would be a chance to refine it.

# echo $$ >cg2/nosockets/sockets/cgroup.procs
# ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
^C
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms

Bash was inside a cgroup that disallowed socket creation, but socket
creation wasn't disallowed. This means that the obvious use of socket
creation filters in nestable containers fails insecurely.


There's also a subtle but nasty potential security problem here.
In 4.9 and before, cgroups has only one real effect in the kernel:
resource control. A process in a malicious cgroup could be DoSed,
but that was about the extent of the damage that a malicious cgroup
could do.

In 4.10 with CONFIG_CGROUP_BPF=y, a cgroup can have bpf
programs attached that can do things if various events occur. (Right
now, this means socket operations, but there are plans in the works
to do this for LSM hooks too.) These bpf programs can say yes or no,
but they can also read out various data (including socket payloads!)
and save them away where an attacker can find them. This sounds a
lot like seccomp with a narrower scope but a much stronger ability to
exfiltrate private information.

Unfortunately, while seccomp is very, very careful to prevent
injection of a privileged victim into a malicious sandbox, the
CGROUP_BPF mechanism appears to have no real security model. There
is nothing to prevent a program that's in a malicious cgroup from
running a setuid binary, and there is nothing to prevent a program
that has the ability to move itself or another program into a
malicious cgroup from doing so and then, if needed for exploitation,
exec a setuid binary.

This isn't much of a problem yet because you currently need
CAP_NET_ADMIN to create a malicious sandbox in the first place. I'm
sure that, in the near future, someone will want to make this stuff
work in containers with delegated cgroup hierarchies, and then there
may be a real problem here.


I've included a few security people on this thread. The current API
looks abusable, and it would be nice to find all the holes before
4.10 comes out.


(The cgrp_socket_rule source is attached. You can build it by sticking it
in samples/bpf and doing:

$ make headers_install
$ cd samples/bpf
$ gcc -o cgrp_socket_rule cgrp_socket_rule.c libbpf.c -I../../usr/include
)

--Andy

/* eBPF example program:
 *
 * - Loads eBPF program
 *
 *   The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
 *   sockets opened by processes in the cgroup.
 *
 * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
 */

#define _GNU_SOURCE

#include
#include
#include
#include
#include
#in
Re: [PATCH v1 & v6 1/2] PM/devfreq: add suspend frequency support
Hey Chanwoo, Chanwoo Choi wrote: > 2016-12-18 0:13 GMT+09:00 Tobias Jakobi : >> Hey guys, >> >> Chanwoo Choi wrote: >>> Hi Lin, >>> >>> 2016-11-24 18:54 GMT+09:00 Chanwoo Choi : Hi Lin, On 2016-11-24 18:28, Chanwoo Choi wrote: > Hi Lin, > > On 2016-11-24 17:34, hl wrote: >> Hi Chanwoo Choi, >> >> >> On 2016-11-24 16:16, Chanwoo Choi wrote: >>> Hi Lin, >>> >>> On 2016-11-24 16:34, hl wrote: Hi Chanwoo Choi, I think dev_pm_opp_get_suspend_opp() already implements most of the function; all we need is to define the node in the dts, like the following: &dmc_opp_table { opp06 { opp-suspend; }; }; >>> Two approaches use the 'opp-suspend' property. >>> >>> I think that the method to support suspend-opp has to >>> guarantee the following conditions: >>> - Support all of devfreq's governors. >> As MyungJoo Ham suggested, I will set the suspend frequency in >> devfreq_suspend_device(), >> which will ignore the governor. > > The other approach already supports all of the governors. > Before calling the mail, I discussed with Myungjoo Ham. > Myungjoo prefers to use the devfreq_suspend/devfreq_resume(). That is not the correct expression. We need to wait for the reply from Myungjoo to clarify this. > > To Myungjoo, > Please add your opinion on how to support the suspend frequency. > >>> - The devfreq framework has the responsibility to change the >>>frequency/voltage for suspend-opp. If we use the >>>new devfreq_suspend(), each devfreq device doesn't care >>>how to support the suspend-opp. The developer of each >>>devfreq device just needs to add the 'opp-suspend' property to the OPP entry in the DT >>> file. >> Why should the devfreq framework support changing the voltage? I think it >> should be handled in the >> specific driver. I think devfreq should only make sure it can get the right >> frequency, then pass it to > > No, the frequency should be handled by the governor or framework. > Each devfreq device has no responsibility for the next > frequency/voltage. 
> The governor and core of devfreq can decide the next frequency/voltage. > You can refer to the cpufreq subsystem. > >> specific driver, I think the voltage should be handled in the >> devfreq->profile->target(); > > The call of devfreq->profile->target() has to be handled by the devfreq > framework. > If the user wants to set the suspend frequency, the user can add the 'suspend-opp' > property. > I think this way is easy. > > But, > if each devfreq device wants to decide the next frequency/voltage only > for > the suspend state, we can check the cpufreq subsystem. > > If a specific devfreq device wants to handle the suspend frequency, > each devfreq will add its own suspend/resume functions as follows: > > struct devfreq_dev_profile { > int (*suspend)(struct devfreq *dev);// new function > pointer > int (*resume)(struct devfreq *dev); // new function > pointer > } a_profile; > > a_profile = devfreq_generic_suspend; > > The devfreq framework will provide the devfreq_generic_suspend() > function. > int devfreq_generic_suspend(struct devfreq *dev) { > ... > devfreq->profile->target(..., > devfreq->suspend_freq); > ... > } > > or > > a_profile = a_devfreq_suspend; // specific function of each devfreq > device > > The devfreq_suspend() will call the 'devfreq->profile->suspend()' > function > instead of devfreq->profile->target(); > > The devfreq core calls 'devfreq->profile->suspend()' > to support the suspend frequency. > > Regards, > Chanwoo Choi The key difference between the two approaches: Your approach: - Each developer should add the 'opp-suspend' property to the dts file. - Each devfreq should call devfreq_suspend_device() to support the suspend frequency. If each devfreq doesn't call the devfreq_suspend_device(), devfreq framework can support the suspend frequency. Other approach: - Each developer only needs to add the 'opp-suspend' property to the dts file without any additional behavior. 
In the cpufreq subsystem, when supporting the suspend frequency of cpufreq, we just add the 'opp-suspend' property without any additional behavior. >>> >>> I'm missing the use-case when using the devfreq_suspend_device() >>> before entering the suspend mode. We should consider the
Re: [PATCH V1] i2c: xgene: Fix missing code of DTB support
On Wed, Dec 14, 2016 at 02:17:26PM +0700, Tin Huynh wrote:
> In DTB case, i2c-core doesn't create slave device which is installed
> on i2c-xgene bus because of missing code in this driver.
> This patch fixes this issue.
>
> Signed-off-by: Tin Huynh

Applied to for-current, thanks!
Re: [PATCH V5] i2c: designware: fix wrong Tx/Rx FIFO for ACPI
On Wed, Dec 14, 2016 at 04:23:58PM +0700, Tin Huynh wrote:
> ACPI always sets Tx/Rx FIFO to 32. This configuration will
> cause problem if the IP core supports a FIFO size of less than 32.
> The driver should read the FIFO size from the IP and select the smaller
> one of the two.
>
> Signed-off-by: Tin Huynh
>

Applied to for-current, thanks!
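The selection rule in the quoted commit message ("select the smaller one of the two") can be sketched as below. This is not the driver's code: fifo_depth_select is a hypothetical helper, how the depth is read from the IP is not shown in this thread, and treating a configured value of 0 as "not set" is an assumption made only for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Choose the FIFO depth to use.
 * configured:  value supplied by firmware tables (32 in the ACPI case
 *              described above)
 * hw_reported: depth the IP core itself reports
 * Assumption: configured == 0 means "no value supplied", so the
 * hardware value is used as-is.
 */
static uint32_t fifo_depth_select(uint32_t configured, uint32_t hw_reported)
{
	if (configured == 0)
		return hw_reported;

	return configured < hw_reported ? configured : hw_reported;
}
```

With a firmware value of 32 and a core that only implements 16 entries, the helper returns 16, which avoids programming thresholds beyond what the hardware has.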
[GIT PULL] ARM: exynos: Late mach/soc for v4.10
Hi,

After our discussions about not breaking out-of-tree DTBs with the SCU
change in DeviceTree, I prepared an updated pull request without the
questioned changes.

Ten days ago I prepared a tag, pushed it... and apparently forgot to send
the pull request. At least, I don't have such an email in my outbox. Dunno.

So let's send it now, better late than never. With just a few commits
(without the DT SCU changes). These were sitting in next for a very
long time.

Best regards,
Krzysztof


The following changes since commit 1001354ca34179f3db924eb66672442a173147dc:

  Linux 4.9-rc1 (2016-10-15 12:17:50 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git tags/samsung-soc-4.10-2

for you to fetch changes up to da6b21e97e39d42f90ab490ce7b54a0fe2c3fe35:

  ARM: Drop fixed 200 Hz timer requirement from Samsung platforms (2016-12-07 18:42:11 +0200)

Samsung mach/soc update for v4.10:
1. Minor cleanup in smp_operations.
2. Another step in switching s3c24xx to new DMA API.
3. Drop fixed requirement for HZ=200 on Samsung platforms.

Krzysztof Kozlowski (1):
      ARM: Drop fixed 200 Hz timer requirement from Samsung platforms

Pankaj Dubey (1):
      ARM: EXYNOS: Remove smp_init_cpus hook from platsmp.c

Sylwester Nawrocki (1):
      ARM: S3C24XX: Add DMA slave maps for remaining s3c24xx SoCs

 arch/arm/Kconfig               |  3 +-
 arch/arm/mach-exynos/platsmp.c | 31 -
 arch/arm/mach-s3c24xx/common.c | 76 ++
 3 files changed, 77 insertions(+), 33 deletions(-)
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On 17/12/2016 19:18, Andy Lutomirski wrote:
> Hi all-
>
> I apologize for being rather late with this. I didn't realize that
> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
> it on the linux-api list, so I missed the discussion.
>
> I think that the inet ingress, egress etc filters are a neat feature,
> but I think the API has some issues that will bite us down the road
> if it becomes stable in its current form.
>
> Most of the problems I see are summarized in this transcript:
>
> # mkdir cg2
> # mount -t cgroup2 none cg2
> # mkdir cg2/nosockets
> # strace cgrp_socket_rule cg2/nosockets/ 0
> ...
> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3
>
> You can modify a cgroup after opening it O_RDONLY?

I sent a patch to check the cgroup.procs permission before attaching a
BPF program to it [1], but it was not merged because it is not part of
the current security model (which may not be crystal clear). The thing is
that the current socket/BPF/cgroup feature is only available to a process
with the *global CAP_NET_ADMIN*, and such a process can already modify the
network for every process, so it doesn't make much sense to check whether
it can modify the network for a subset of these processes.

[1] https://lkml.org/lkml/2016/9/19/854

However, requiring a process to open a cgroup *directory* in write mode
may not make sense because the process does not modify the content of the
cgroup but only uses it as a *reference* in the network stack. Forcing an
open with write mode may forbid using this kind of network-filtering
feature on a read-only file system whose *network configuration* is not
necessarily read-only.

Another point of view is that CAP_NET_ADMIN may be an unneeded
privilege if the cgroup migration uses a no_new_privs-like feature as
I proposed with Landlock [2] (with an extra ptrace_may_access() check).
The new capability proposal for cgroup may be interesting too.

[2] https://lkml.org/lkml/2016/9/14/82

>
> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
> log_buf=0x6020c0, kern_version=0}, 48) = 4
>
> This is fine. The bpf() syscall manipulates bpf objects.
>
> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0
>
> This is not so good:
>
> a) The bpf() syscall is supposed to manipulate bpf objects. This
> is manipulating a cgroup. There's no reason that a socket creation
> filter couldn't be written in a different language (new iptables
> table? Simple list of address families?), but if that happened,
> then using bpf() to install it would be entirely nonsensical.

Another point of view is to say that the BPF program (called by the
network stack) is using a reference to a set of processes thanks to a
cgroup.

>
> b) This is starting to be an excessively ugly multiplexer. Among
> other things, it's very unfriendly to seccomp.

FWIW, Landlock will have the capability to filter this kind of action.

>
> # echo $$ >cg2/nosockets/cgroup.procs
> # ping 127.0.0.1
> ping: socket: Operation not permitted
> # ls cg2/nosockets/
> cgroup.controllers cgroup.events cgroup.procs cgroup.subtree_control
> # cat cg2/nosockets/cgroup.controllers
>
> Something in cgroupfs should give an indication that this cgroup
> filters socket creation, but there's nothing there. You should also
> be able to turn the filter off from cgroupfs.

Right. Everybody was OK at LPC to add such information but it is not
there yet.

>
> # mkdir cg2/nosockets/sockets
> # /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1
>
> This succeeded, which means that, if this feature is enabled in 4.10,
> then we're stuck with its semantics. If it returned -EINVAL instead,
> there would be a chance to refine it.

This is indeed unfortunate.

>
> # echo $$ >cg2/nosockets/sockets/cgroup.procs
> # ping 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
> ^C
> --- 127.0.0.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms
>
> Bash was inside a cgroup that disallowed socket creation, but socket
> creation wasn't disallowed. This means that the obvious use of socket
> creation filters in nestable containers fails insecurely.
>
>
> There's also a subtle but nasty potential security problem here.
> In 4.9 and before, cgroups has only one real effect in the kernel:
> resource control. A process in a malicious cgroup could be DoSed,
> but that was about the extent of the damage that a malicious cgroup
> could do.
>
> In 4.10 with CONFIG_CGROUP_BPF=y, a cgroup can have bpf
> programs attached that can do things if various events occur. (Right
> now, this means socket operations, but there are plans in the works
> to do this f
[PATCH] staging: rtl8712: changed struct members to __le32
Fixed sparse warning "cast to restricted __le32".
The members of struct recv_stat and struct phy_stat are always
little-endian.

Signed-off-by: Jannik Becher
---
 drivers/staging/rtl8712/rtl8712_recv.h | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/rtl8712/rtl8712_recv.h b/drivers/staging/rtl8712/rtl8712_recv.h
index 0b0c273..0352e6f 100644
--- a/drivers/staging/rtl8712/rtl8712_recv.h
+++ b/drivers/staging/rtl8712/rtl8712_recv.h
@@ -50,12 +50,12 @@
 #define REORDER_WAIT_TIME 30 /* (ms)*/

 struct recv_stat {
-	unsigned int rxdw0;
-	unsigned int rxdw1;
-	unsigned int rxdw2;
-	unsigned int rxdw3;
-	unsigned int rxdw4;
-	unsigned int rxdw5;
+	__le32 rxdw0;
+	__le32 rxdw1;
+	__le32 rxdw2;
+	__le32 rxdw3;
+	__le32 rxdw4;
+	__le32 rxdw5;
 };

 struct phy_cck_rx_status {
@@ -69,14 +69,14 @@ struct phy_cck_rx_status {
 };

 struct phy_stat {
-	unsigned int phydw0;
-	unsigned int phydw1;
-	unsigned int phydw2;
-	unsigned int phydw3;
-	unsigned int phydw4;
-	unsigned int phydw5;
-	unsigned int phydw6;
-	unsigned int phydw7;
+	__le32 phydw0;
+	__le32 phydw1;
+	__le32 phydw2;
+	__le32 phydw3;
+	__le32 phydw4;
+	__le32 phydw5;
+	__le32 phydw6;
+	__le32 phydw7;
 };
 #define PHY_STAT_GAIN_TRSW_SHT 0
 #define PHY_STAT_PWDB_ALL_SHT 4
--
2.7.4
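For readers unfamiliar with the annotation: marking a field __le32 tells sparse that its bytes are in little-endian order and must go through a conversion such as le32_to_cpu() before being used as a CPU integer. The following is a userspace sketch of what that conversion amounts to, followed by a shift-and-mask field extraction of the kind the *_SHT defines in this header suggest. le32_decode and field_get are hypothetical names, not driver or kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode a 32-bit little-endian value from raw bytes. The result is
 * the same on little- and big-endian hosts, which is the guarantee a
 * le32_to_cpu() call on a __le32 field gives in the kernel.
 */
static uint32_t le32_decode(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/* Extract a bit field from a decoded word: shift down, then mask. */
static uint32_t field_get(uint32_t word, unsigned int shift, uint32_t mask)
{
	return (word >> shift) & mask;
}
```

Reading such a descriptor word without the conversion happens to work on little-endian CPUs and silently breaks on big-endian ones, which is the class of bug the sparse warning is meant to expose.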
Re: What is the function of arch/x86/purgatory/purgatory.c?
On Sat, Dec 17, 2016 at 11:52:05AM -0600, Larry Finger wrote:

> Upon examination of the routine, I can see that if purgatory() should be
> static, then none of the code here will ever be accessed by any part of the
> kernel. Is there some bit of magic that is above my understanding, or is
> this a useless bit of code that has been forgotten and should be removed?

I don't know what is and what is not above your understanding, but grepping
in that area (grep -w purgatory arch/x86/purgatory/*) does catch this:

arch/x86/purgatory/setup-x86_64.S:	call purgatory

which is hardly magic - looks like a function call. Looking into that
file shows

purgatory_start:
	.code64

	/* Load a gdt so I know what the segment registers are */
	lgdt	gdt(%rip)

	/* load the data segments */
	movl	$0x18, %eax	/* data segment */
	movl	%eax, %ds
	movl	%eax, %es
	movl	%eax, %ss
	movl	%eax, %fs
	movl	%eax, %gs

	/* Setup a stack */
	leaq	lstack_end(%rip), %rsp

	/* Call the C code */
	call purgatory
	jmp	entry64

which pretty much confirms that - it's called from purgatory_start().
probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo
Hi,

I recently posted several netfilter-related crashes but didn't get any
answers. One of them started to happen quite often on a loaded NAT box
(17Gbps), so after trying endless ways to make it stable, I found that
timers often show up in the backtrace. The bug probably appears on older
releases too; I've seen such a backtrace there, with a timer fired for
conntrack.
I disabled Intel turbo for the CPUs on this loaded NAT and, voila, the
panic has not reappeared for the 2nd day!
* by wrmsr -a 0x1a0 0x4000850089
I am not sure timers are the reason, but turbo is probably creating some
condition for the bug.

Here are examples of backtraces from the last reboots (kernel 4.8.14);
the same kernel worked perfectly without turbo.
The last one is a crash on 4.8.0 that looks painfully similar, on a totally
different workload, but with conntrack enabled. It happens there much less
often, so it is harder to trigger the crash and test by disabling turbo.

[28904.162607] BUG: unable to handle kernel NULL pointer dereference at 0008
[28904.163210] IP: [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.163745] PGD 0
[28904.164058] Oops: 0002 [#1] SMP
[28904.164323] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp llc bonding ixgbe dca
[28904.168132] CPU: 27 PID: 0 Comm: swapper/27 Not tainted 4.8.14-build-0124 #2
[28904.168398] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.1008.031920151331 03/19/2015
[28904.168853] task: 885fa42e8c40 task.stack: 885fa42f
[28904.169114] RIP: 0010:[] [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack]
[28904.169643] RSP: 0018:885fbccc3dd8 EFLAGS: 00010246
[28904.169901] RAX: RBX: 885fbccc RCX: 885fbccc0010
[28904.170169] RDX: 885f87a1c150 RSI: 0142 RDI: 885fbccc [28904.170437] RBP: 885fbccc3de8 R08: cbdee177 R09: 0100 [28904.170704] R10: 885fbccc3dd0 R11: 820050c0 R12: 885f87a1c140 [28904.170971] R13: 0005d948 R14: 000ea942 R15: 885f87a1c160 [28904.171237] FS: () GS:885fbccc() knlGS: [28904.171688] CS: 0010 DS: ES: CR0: 80050033 [28904.171964] CR2: 0008 CR3: 00607f006000 CR4: 001406e0 [28904.172231] Stack: [28904.172482] 885f87a1c140 820a1405 885fbccc3e28 a00abb30 [28904.173182] 0002820a1405 885f87a1c140 885f99a28201 [28904.173884] 820050c8 885fbccc3e58 a00abc62 [28904.174585] Call Trace: [28904.174835] [28904.174912] [] nf_ct_delete_from_lists+0xc9/0xf2 [nf_conntrack] [28904.175613] [] nf_ct_delete+0x109/0x12c [nf_conntrack] [28904.175894] [] ? nf_ct_delete+0x12c/0x12c [nf_conntrack] [28904.176169] [] death_by_timeout+0xd/0xf [nf_conntrack] [28904.176443] [] call_timer_fn.isra.5+0x17/0x6b [28904.176714] [] expire_timers+0x6f/0x7e [28904.176975] [] run_timer_softirq+0x69/0x8b [28904.177238] [] ? clockevents_program_event+0xd0/0xe8 [28904.177504] [] __do_softirq+0xbd/0x1aa [28904.177765] [] irq_exit+0x37/0x7c [28904.178026] [] smp_trace_apic_timer_interrupt+0x7b/0x88 [28904.178300] [] smp_apic_timer_interrupt+0x9/0xb [28904.178565] [] apic_timer_interrupt+0x7c/0x90 [28904.178835] [28904.178907] [] ? mwait_idle+0x64/0x7a [28904.179436] [] ? 
atomic_notifier_call_chain+0x13/0x15 [28904.179712] [] arch_cpu_idle+0xa/0xc [28904.179976] [] default_idle_call+0x27/0x29 [28904.180244] [] cpu_startup_entry+0x11d/0x1c7 [28904.180508] [] start_secondary+0xe8/0xeb [28904.180767] Code: 80 2f 0b 82 48 89 df e8 da 90 84 e1 48 8b 43 10 49 8d 54 24 10 48 8d 4b 10 49 89 4c 24 18 a8 01 49 89 44 24 10 48 89 53 10 75 04 89 50 08 c6 03 00 5b 41 5c 5d c3 48 8b 05 10 be 00 00 89 f6 [28904.185546] RIP [] nf_ct_add_to_dying_list+0x55/0x61 [nf_conntrack] [28904.186065] RSP [28904.186319] CR2: 0008 [28904.186593] ---[ end trace 35cbc6c885a5c2d8 ]--- [28904.186860] Kernel panic - not syncing: Fatal exception in interrupt [28904.187155] Kernel Offset: disabled [28904.187419] Rebooting in 5 seconds.. [28909.193662] ACPI MEMORY or I/O RESET_REG. [14125.227611] BUG: unable to handle kernel NULL pointer dereference at (null) [14125.228215] IP: [] nf_nat_setup_info+0x6d8/0x755 [nf_nat] [14125.228564] PGD 0 [14125.228882] Oops: [#1] SMP [14125.229146] Modules linked in: nf_nat_pptp nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net ip_set nfnetlink iptable_raw ipt
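
For readers wondering why the wrmsr value in the report above disables turbo: MSR 0x1a0 is IA32_MISC_ENABLE, and bit 38 of that register is the "Turbo Mode Disable" bit. The sketch below only decodes the value quoted in the report; the helper names are made up for illustration, and the remaining bits of 0x4000850089 are specific to the reporter's machine, so the full value should not be copied blindly onto other hardware.

```python
# Illustrative sketch only: decode the turbo-disable bit in a value
# written to IA32_MISC_ENABLE. Helper names are hypothetical.

IA32_MISC_ENABLE = 0x1A0   # MSR address used in the report
TURBO_DISABLE_BIT = 38     # "Turbo Mode Disable" bit in that MSR


def turbo_disabled(msr_value: int) -> bool:
    """Return True if the turbo-disable bit is set in msr_value."""
    return bool((msr_value >> TURBO_DISABLE_BIT) & 1)


def with_turbo_disabled(msr_value: int) -> int:
    """Return msr_value with only the turbo-disable bit additionally set."""
    return msr_value | (1 << TURBO_DISABLE_BIT)


if __name__ == "__main__":
    reported = 0x4000850089  # value from the report
    print(f"MSR {IA32_MISC_ENABLE:#x}: turbo disabled = {turbo_disabled(reported)}")
```

A more cautious way to apply the same change would be to read the current register value per CPU, set only bit 38, and write that back, rather than writing one fixed value to every CPU.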
Re: What is the function of arch/x86/purgatory/purgatory.c?
On 12/17/2016 01:46 PM, Al Viro wrote:
> On Sat, Dec 17, 2016 at 11:52:05AM -0600, Larry Finger wrote:
>
> > Upon examination of the routine, I can see that if purgatory() should be
> > static, then none of the code here will ever be accessed by any part of
> > the kernel. Is there some bit of magic that is above my understanding, or
> > is this a useless bit of code that has been forgotten and should be
> > removed?
>
> I don't know what is and what is not above your understanding, but
> grepping in that area (grep -w purgatory arch/x86/purgatory/*) does catch
> this:
>
> arch/x86/purgatory/setup-x86_64.S:      call purgatory
>
> which is hardly magic - looks like a function call. Looking into that
> file shows
>
> purgatory_start:
>         .code64
>
>         /* Load a gdt so I know what the segment registers are */
>         lgdt    gdt(%rip)
>
>         /* load the data segments */
>         movl    $0x18, %eax     /* data segment */
>         movl    %eax, %ds
>         movl    %eax, %es
>         movl    %eax, %ss
>         movl    %eax, %fs
>         movl    %eax, %gs
>
>         /* Setup a stack */
>         leaq    lstack_end(%rip), %rsp
>
>         /* Call the C code */
>         call purgatory
>         jmp     entry64
>
> which pretty much confirms that - it's called from purgatory_start().

Thanks for the explanation.

Larry