date:20170418

Re: [PATCH 2/3] ACPI: Remove platform devices from a bus on removal

2017-04-18 Thread joeyli

On Wed, Apr 19, 2017 at 02:50:17PM +0800, joeyli wrote:
> On Wed, Mar 22, 2017 at 06:33:24PM +0100, Joerg Roedel wrote:
> > From: Joerg Roedel 
> > 
> > The function acpi_bus_attach() creates platform_devices if
> > this is specified by the firmware. But in acpi_bus_trim()
> > these devices are not removed, leaving a dangling reference
> > to the parent device.
> > 
> > In the case of a PCI root-bus, this results in the
> > host_bridge device not being released on hot-remove.
> > 
> > Fix it by scanning the list of platform_devices for devices
> > to be removed with the bus.
> > 
> > Signed-off-by: Joerg Roedel 
> > ---
> >  drivers/acpi/scan.c | 26 ++
> >  1 file changed, 26 insertions(+)
> > 
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 1926918..b07518b 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -13,6 +13,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  
> > @@ -1928,6 +1929,25 @@ int acpi_bus_scan(acpi_handle handle)
> >  EXPORT_SYMBOL(acpi_bus_scan);
> >  
> >  /**
> > + * acpi_bus_trim_platform_device - Check and remove a platform device
> > + *from a bus
> > + * @dev: Platform device to check
> > + * @data: pointer to the acpi_device to check dev against
> > + *
> > + * Checks whether the platform_device dev belongs to the acpi_device
> > + * data and unregisters dev if it matches.
> > + */
> > +static int acpi_bus_trim_platform_device(struct device *dev, void *data)
> > +{
> > +   struct acpi_device *adev = data;
> > +
> > +   if (dev->fwnode == acpi_fwnode_handle(adev))
> > +   platform_device_unregister(to_platform_device(dev));
> > +
> > +   return 0;
> > +}
> > +
> > +/**
> >   * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects.
> >   * @adev: Root of the ACPI namespace scope to walk.
> >   *
> > @@ -1950,6 +1970,12 @@ void acpi_bus_trim(struct acpi_device *adev)
> > } else {
> > device_release_driver(&adev->dev);
> > }
> > +
> > +   /* Remove platform devices from the bus */
> > +   if (adev->pnp.type.platform_id)
> 
> This patch causes dock_notify() to break in find_dock_station()
> because the dock_station platform device was removed.
> 
> If anyone wants to apply this patch, then the dock device should be
> excluded:
> 
> + if (is_dock_device(adev) && adev->pnp.type.platform_id)

Sorry, it should be:
+   if (!is_dock_device(adev) && adev->pnp.type.platform_id)

> 
> > +   bus_for_each_dev(&platform_bus_type, NULL, adev,
> > +acpi_bus_trim_platform_device);
> > +
> > /*
> >  * Most likely, the device is going away, so put it into D3cold before
> >  * that.
>

Joey Lee

Re: [PATCH 2/3] ARM: sun8i: a83t: Drop leading zeroes from device node addresses

2017-04-18 Thread Maxime Ripard

On Tue, Apr 18, 2017 at 05:22:02PM +0800, Chen-Yu Tsai wrote:
> On Tue, Apr 18, 2017 at 5:03 PM, Maxime Ripard
>  wrote:
> > On Tue, Apr 18, 2017 at 12:22:04PM +0800, Chen-Yu Tsai wrote:
> >> Kbuild now complains about leading zeroes in the address portion of
> >> device node names.
> >>
> >> Get rid of them.
> >>
> >> Signed-off-by: Chen-Yu Tsai 
> >> ---
> >>  arch/arm/boot/dts/sun8i-a83t.dtsi | 10 +-
> >>  1 file changed, 5 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/arm/boot/dts/sun8i-a83t.dtsi b/arch/arm/boot/dts/sun8i-a83t.dtsi
> >> index 913aacafe8d5..82cb87f21b96 100644
> >> --- a/arch/arm/boot/dts/sun8i-a83t.dtsi
> >> +++ b/arch/arm/boot/dts/sun8i-a83t.dtsi
> >> @@ -162,7 +162,7 @@
> >>   #size-cells = <1>;
> >>   ranges;
> >>
> >> - pio: pinctrl@01c20800 {
> >> + pio: pinctrl@1c20800 {
> >
> > As far as I know this breaks Uboot's auto-addition of stdout-path
> 
> You're right. It breaks as Uboot has the path to the uarts hard-coded.
> That sucks. And from what I can tell, it's not easily solvable by just
> switching to serial alias based references. CONS_INDEX won't line up
> on the A23/A33 Q8 tablets.
> 
> Maybe we can just keep the uart device node the same for now, but fix
> all the other ones. We can come back and fix the uart later once we
> figure out how to fix Uboot.

Thinking more about this, I don't really know why we have that in
U-Boot actually. All our DTs for a very long time have had stdout-path
properly set (and if it's improperly set, this should be fixed). I'd
say we can simply remove that from U-Boot and be done with it.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



Re: [PATCH 3/3] ARM: sun8i: a83t: Rename pinmux setting names

2017-04-18 Thread Maxime Ripard

On Tue, Apr 18, 2017 at 05:16:53PM +0800, Chen-Yu Tsai wrote:
> On Tue, Apr 18, 2017 at 5:04 PM, Maxime Ripard
>  wrote:
> > On Tue, Apr 18, 2017 at 12:22:05PM +0800, Chen-Yu Tsai wrote:
> >> The pinmux setting nodes all have an address element in their node
> >> names, however the pinctrl node does not have #address-cells.
> >>
> >> Rename the existing pinmux setting nodes and labels in sun8i-a83t.dtsi,
> >> dropping identifiers for functions that only have one possible setting,
> >> and using the pingroup name if the function is identically available on
> >> different pingroups.
> >>
> >> Signed-off-by: Chen-Yu Tsai 
> >
> > Applied, and I really like the new names.
> >
> > Would you make the same patch for everyone?
> 
> I can. No guarantees on the schedule though.

Sure, I know this isn't really the most entertaining and fulfilling
patch to make ;)

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



Re: [patch V2 03/10] timers: Rework idle logic

2017-04-18 Thread Peter Zijlstra

On Tue, Apr 18, 2017 at 01:11:05PM +0200, Thomas Gleixner wrote:
> Storing next event and determining whether the base is idle can be done in
> __next_timer_interrupt(). 
> 
> Preparatory patch for new call sites which need this information as well.
> 
> Signed-off-by: Thomas Gleixner 
> ---
>  kernel/time/timer.c |   43 ---
>  1 file changed, 24 insertions(+), 19 deletions(-)
> 
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1358,8 +1358,11 @@ static int next_pending_bucket(struct ti
>  /*
>   * Search the first expiring timer in the various clock levels. Caller must
>   * hold base->lock.
> + *
> + * Stores the next expiry time in base. The return value indicates whether
> + * the base is empty or not.
>   */
> -static unsigned long __next_timer_interrupt(struct timer_base *base)
> +static bool __next_timer_interrupt(struct timer_base *base)

Can't say I'm a fan of this.. I sort of see where this is going, but the
fact remains that __next_timer_interrupt(), as a function, makes me
expect a return value of time/timer quantity.

Re: [PATCH 2/3] ACPI: Remove platform devices from a bus on removal

2017-04-18 Thread joeyli

On Wed, Mar 22, 2017 at 06:33:24PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> The function acpi_bus_attach() creates platform_devices if
> this is specified by the firmware. But in acpi_bus_trim()
> these devices are not removed, leaving a dangling reference
> to the parent device.
> 
> In the case of a PCI root-bus, this results in the
> host_bridge device not being released on hot-remove.
> 
> Fix it by scanning the list of platform_devices for devices
> to be removed with the bus.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/acpi/scan.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 1926918..b07518b 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -1928,6 +1929,25 @@ int acpi_bus_scan(acpi_handle handle)
>  EXPORT_SYMBOL(acpi_bus_scan);
>  
>  /**
> + * acpi_bus_trim_platform_device - Check and remove a platform device
> + *  from a bus
> + * @dev: Platform device to check
> + * @data: pointer to the acpi_device to check dev against
> + *
> + * Checks whether the platform_device dev belongs to the acpi_device
> + * data and unregisters dev if it matches.
> + */
> +static int acpi_bus_trim_platform_device(struct device *dev, void *data)
> +{
> + struct acpi_device *adev = data;
> +
> + if (dev->fwnode == acpi_fwnode_handle(adev))
> + platform_device_unregister(to_platform_device(dev));
> +
> + return 0;
> +}
> +
> +/**
>   * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects.
>   * @adev: Root of the ACPI namespace scope to walk.
>   *
> @@ -1950,6 +1970,12 @@ void acpi_bus_trim(struct acpi_device *adev)
>   } else {
>   device_release_driver(&adev->dev);
>   }
> +
> + /* Remove platform devices from the bus */
> + if (adev->pnp.type.platform_id)

This patch causes dock_notify() to break in find_dock_station()
because the dock_station platform device was removed.

If anyone wants to apply this patch, then the dock device should be
excluded:

+   if (is_dock_device(adev) && adev->pnp.type.platform_id)

> + bus_for_each_dev(&platform_bus_type, NULL, adev,
> +  acpi_bus_trim_platform_device);
> +
>   /*
>* Most likely, the device is going away, so put it into D3cold before
>* that.

Thanks a lot!
Joey Lee

Re: [PATCH v3 2/2] scsi: storvsc: Add support for FC rport.

2017-04-18 Thread Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig

Re: [PATCH v3 1/2] scsi: scsi_transport_fc: Add dummy initiator role to rport

2017-04-18 Thread Christoph Hellwig

Looks good,

Reviewed-by: Christoph Hellwig

Re: [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level

2017-04-18 Thread Anshuman Khandual

On 04/19/2017 11:50 AM, Aneesh Kumar K.V wrote:
> Anshuman Khandual  writes:
> 
>> Though migrating gigantic HugeTLB pages does not sound much like real
>> world use case, they can be affected by memory errors. Hence migration
>> at the PGD level HugeTLB pages should be supported just to enable soft
>> and hard offline use cases.
> 
> In that case, do we want to isolate the entire 16GB range? Should we
> just dequeue the page from the hugepage pool, convert it to regular 64K
> pages, and then isolate the 64K page that had the memory error?

That would be a better thing to do, assuming that we can actually dequeue
the huge page and push it to the buddy allocator as normal 64K pages
(I need to check on this, as the original allocation came from
memblock instead of the buddy allocator; I guess it should be possible
given that we do similar things during memory hotplug). In that case,
would we also have to consider the same for the PMD based HugeTLB pages,
or should it be only for these gigantic huge pages?

Re: [lkp-robot] [mm/madvise] 23a003bfd2: mce-test.ras.fail

2017-04-18 Thread Naoya Horiguchi

On Mon, Apr 17, 2017 at 01:59:48PM +0800, kernel test robot wrote:
> 
> FYI, we noticed the following commit:
> 
> commit: 23a003bfd23ea9ea0b7756b920e51f64b284b468 ("mm/madvise: pass return code of memory_failure() to userspace")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Yes, this patch makes the result of memory error isolation visible to
userspace, so it is no wonder that some testcases start to report failures.
I'm still digging into each failure, but I have already found a few real
kernel bugs detected by this. Hopefully I'll post patches in a few days.

Thanks,
Naoya Horiguchi

> 
> in testcase: mce-test
> with following parameters:
> 
>   disk: 1HDD
>   fs: ext4
>   test_case: HWPOISON-HARD
>   test_mode: single
> 
> 
> 
> on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> 
> 
> -wjin:BENCHMARK_ROOT=/lkp/benchmarks
> -wjin: come here 14
> -wjin: come here 90
> mount_points=/fs/sdb3
> 2017-04-14 22:24:00 cp -af ras /fs/sdb3
> <<>>
> Case ID: HWPOISON-HARD"
> 
> hwpoison-inject module is loaded.
> 
> ***
> Pay attention:
> 
> This test is hard mode of HWPoison functional test.
> ***
> 
> 
> 
> Running tsimpleinj (simple hard offline test)
> PASS: ./tkillpoison
> 
> Running tsimpleinj (simple hard offline test)
> dirty page 0x7f511c605000
> signal 7 code 4 addr 0x7f511c605000
> recovered
> mlocked page 0x7f511c604000
> signal 7 code 4 addr 0x7f511c604000
> recovered
> clean file page 0x7f511c603000
> signal 7 code 4 addr 0x7f511c603000
> recovered
> file dirty page 0x7f511c602000
> signal 7 code 4 addr 0x7f511c602000
> recovered
> no error on msync expect error
> no error on fsync expect error
> hole file dirty page 0x7f511c5fe000
> signal 7 code 4 addr 0x7f511c5fe000
> recovered
> no error on hole msync expect error
> no error on hole fsync expect error
> FAILURE -- 2 of 5 cases broken!
> FAIL: ./tsimpleinj returned with failure.
> 
> Running tinjpage (hard offline test on various types of pages)
> vm.memory_failure_early_kill = 0
>  testing dirty anonymous
>   dirty poisoning page 0x7f716c8d1000
>   writing 2
>   signal 7 code 4 addr 0x7f716c8d1000
>   recovered
>  testing dirty anonymous unmap
>   dirty poisoning page 0x7f716c8d
>   writing 2
>   signal 7 code 4 addr 0x7f716c8d
>   recovered
>  testing mlocked anonymous
>   mlocked poisoning page 0x7f716c8d
>   writing 2
>   signal 7 code 4 addr 0x7f716c8d
>   recovered
>  testing file clean
>   file clean poisoning page 0x7f716c8cf000
>   reading 2e
>reading 2e
>   file clean poisoning page 0x7f716c8cf000
>   writing 4
>  testing file dirty
>   file dirty initial poisoning page 0x7f716c8ce000
>   signal 7 code 4 addr 0x7f716c8ce000
>   recovered
>   expected error 5 on msync expect error
>   reading 0
>   reading 0
> LATER: expected likely incorrect no error on explicit read after poison
> LATER: expected likely incorrect no error on explicit write after poison
> LATER: expected likely incorrect no error on fsync expect error
>  testing file hole
>   hole file dirty poisoning page 0x7f716c8ce000
>   signal 7 code 4 addr 0x7f716c8ce000
>   recovered
>   expected optional error 5 on hole fsync expect error
> LATER: expected likely incorrect no error on hole msync expect error
>  testing file clean mlocked
>   file clean mlocked poisoning page 0x7f716c8ca000
>   reading 2e
>reading 2e
>   file clean mlocked poisoning page 0x7f716c8ca000
>   writing 4
>  testing file dirty mlocked
>   file dirty mlocked initial poisoning page 0x7f716c8c9000
>   signal 7 code 4 addr 0x7f716c8c9000
>   recovered
>   expected error 5 on msync expect error
>   reading 0
>   reading 0
> LATER: expected likely incorrect no error on explicit read after poison
> LATER: expected likely incorrect no error on explicit write after poison
> LATER: expected likely incorrect no error on fsync expect error
>  testing nonlinear
>   rfp file dirty poisoning page 0x7f716c8c
>   signal 7 code 4 addr 0x7f716c8c
>   recovered
>   expected error 5 on rfp fsync expect error
> LATER: expected likely incorrect no error on rfp msync expect error
>  testing mmap shared
>   ipv shared page poisoning p

Re: [PATCH 2/3] jump_label: Provide static_key_slow_inc_nohp()

2017-04-18 Thread Peter Zijlstra

On Tue, Apr 18, 2017 at 10:50:43PM +0200, Thomas Gleixner wrote:
> On Tue, 18 Apr 2017, Peter Zijlstra wrote:
> > On Tue, Apr 18, 2017 at 12:46:29PM -0400, Steven Rostedt wrote:
> > > On Tue, 18 Apr 2017 15:03:50 +0200
> > > Peter Zijlstra  wrote:
> > > 
> > > > +++ b/kernel/padata.c
> > > > @@ -1008,11 +1008,10 @@ static struct padata_instance *padata_al
> > > >   * parallel workers.
> > > >   *
> > > >   * @wq: workqueue to use for the allocated padata instance
> > > > - *
> > > > - * Must be called from a get_online_cpus() protected region
> > > 
> > > Find the comment redundant?
> > 
> > Once there's code that enforces it? Yes. Nobody reads comments
> > ;-)
> 
> Nobody enables lockdep either .

In the grand scheme of things, true. But there are more people running
with lockdep enabled than there are people writing code, of which there
are more than people reading relevant comments while writing code.
Therefore having the lockdep annotation is two orders better than a
comment ;-)

Also, I would argue that an "assert" at the start of a function is a
fairly readable 'comment' all by itself.

In any case, I don't care too much. But I typically remove such comments
when I stick a lockdep_assert_held() in.

Re: [RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level

2017-04-18 Thread Aneesh Kumar K.V

Anshuman Khandual  writes:

> Though migrating gigantic HugeTLB pages does not sound much like real
> world use case, they can be affected by memory errors. Hence migration
> at the PGD level HugeTLB pages should be supported just to enable soft
> and hard offline use cases.

In that case, do we want to isolate the entire 16GB range? Should we
just dequeue the page from the hugepage pool, convert it to regular 64K
pages, and then isolate the 64K page that had the memory error?

>
> While allocating the new gigantic HugeTLB page, it should not matter
> whether new page comes from the same node or not. There would be very
> few gigantic pages on the system afterall, we should not be bothered
> about node locality when trying to save a big page from crashing.
>
> This introduces a new HugeTLB allocator called alloc_gigantic_page()
> which will scan over all online nodes on the system and allocate a
> single HugeTLB page.
>


-aneesh

linux-next: Tree for Apr 19

2017-04-18 Thread Stephen Rothwell

Hi all,

Changes since 20170418:

The vfs tree gained a conflict against the sparc tree.

The rcu tree gained a build failure so I used the version from
next-20170418.

The staging tree lost its build failures.

The akpm tree lost its build failure and lost 2 patches that turned
up elsewhere.

Non-merge commits (relative to Linus' tree): 9774
 9410 files changed, 1118292 insertions(+), 198756 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 258 trees (counting Linus' and 37 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (005882e53d62 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging fixes/master (97da3854c526 Linux 4.11-rc3)
Merging kbuild-current/fixes (9be3213b14d4 gconfig: remove misleading parentheses around a condition)
Merging arc-current/for-curr (6492f09e8644 ARC: [plat-eznps] Fix build error)
Merging arm-current/fixes (3872fe83a2fb Merge branch 'kprobe-fixes' of https://git.linaro.org/people/tixy/kernel into fixes)
Merging m68k-current/for-linus (e3b1ebd67387 m68k: Wire up statx)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (be5c5e843c4a powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y)
Merging sparc/master (544f8f935863 sparc64: Fix hugepage page table free)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files)
Merging net/master (acf167f3f249 Merge branch 'bpf-fixes')
Merging ipsec/master (096f41d3a8fc af_key: Fix sadb_x_ipsecrequest parsing)
Merging netfilter/master (fe50543c194e netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage)
Merging ipvs/master (0b9aefea8600 tcp: minimize false-positives on TCP/GRO check)
Merging wireless-drivers/master (d77facb88448 brcmfmac: use local iftype avoiding use-after-free of virtual interface)
Merging mac80211/master (9e478066eae4 mac80211: fix MU-MIMO follow-MAC mode)
Merging sound-current/for-linus (dfb00a569351 ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type)
Merging pci-current/for-linus (b9c1153f7a9c PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam))
Merging driver-core.current/driver-core-linus (39da7c509acf Linux 4.11-rc6)
Merging tty.current/tty-linus (4f7d029b9bf0 Linux 4.11-rc7)
Merging usb.current/usb-linus (a71c9a1c779f Linux 4.11-rc5)
Merging usb-gadget-fixes/fixes (25cd9721c2b1 usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held)
Merging usb-serial-fixes/usb-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move the lock initialization to core file)
Merging phy/fixes (1a09b6a7c10e phy: qcom-usb-hs: Add depends on EXTCON)
Merging staging.current/staging-linus (39da7c509acf Linux 4.11-rc6)
Merging char-misc.current/char-misc-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging input-current/for-linus (704de489e0e3 Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled)
Merging crypto-current/master (e6534aebb26e crypto: algif_aead - Fix bogus request dereference in completion function)
Merging ide/master (96297aee8bce ide: palm_bk3710: add __initdata to palm_bk3710_port_info)
Merging vfio-fixes/for-linus (39da7c509acf Linux 4.11-rc6)
Merging kselftest-fixes/fixes (c1ae3cfa0e89 Linux 4.11-rc1)
Merg

[RFC] usb-phy-generic: Add support to SMSC USB3315

2017-04-18 Thread Peter Senna Tschudin

We need the SMSC USB3315 clock and regulator to always be initialized.
We also need the PHY driver to take the PHY out of reset. This patch
extends the existing USB generic nop phy driver to include a new
initialization path.

A new compatible string "smsc,usb3315" is used to decide which
initialization path to use.

CC: Peter Chen 
CC: Stephen Boyd 
CC: Fabien Lahoudere 
Signed-off-by: Peter Senna Tschudin 
---

This is a follow-up of previous discussion:
  https://www.spinics.net/lists/linux-usb/msg146680.html

 drivers/usb/phy/phy-generic.c | 33 +
 drivers/usb/phy/phy-generic.h |  1 +
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/phy/phy-generic.c b/drivers/usb/phy/phy-generic.c
index 89d6e7a..6ea9ce4 100644
--- a/drivers/usb/phy/phy-generic.c
+++ b/drivers/usb/phy/phy-generic.c
@@ -151,6 +151,9 @@ int usb_gen_phy_init(struct usb_phy *phy)
struct usb_phy_generic *nop = dev_get_drvdata(phy->dev);
int ret;
 
+   if (nop->init_done)
+   return 0;
+
if (!IS_ERR(nop->vcc)) {
if (regulator_enable(nop->vcc))
dev_err(phy->dev, "Failed to enable power\n");
@@ -164,6 +167,8 @@ int usb_gen_phy_init(struct usb_phy *phy)
 
nop_reset(nop);
 
+   nop->init_done = true;
+
return 0;
 }
 EXPORT_SYMBOL_GPL(usb_gen_phy_init);
@@ -216,18 +221,29 @@ static int nop_set_host(struct usb_otg *otg, struct usb_bus *host)
otg->host = host;
return 0;
 }
+int smsc_usb3315_init(struct usb_phy_generic *nop)
+{
+   /*
+* If the gpio for controlling reset state is not available, try again
+* later
+*/
+   if(!nop->gpiod_reset)
+   return -EPROBE_DEFER;
+
+   return usb_gen_phy_init(&nop->phy);
+}
 
 int usb_phy_gen_create_phy(struct device *dev, struct usb_phy_generic *nop,
struct usb_phy_generic_platform_data *pdata)
 {
+   struct device_node *node = NULL;
enum usb_phy_type type = USB_PHY_TYPE_USB2;
int err = 0;
-
u32 clk_rate = 0;
bool needs_vcc = false;
 
if (dev->of_node) {
-   struct device_node *node = dev->of_node;
+   node = dev->of_node;
 
if (of_property_read_u32(node, "clock-frequency", &clk_rate))
clk_rate = 0;
@@ -304,6 +320,12 @@ int usb_phy_gen_create_phy(struct device *dev, struct usb_phy_generic *nop,
nop->phy.otg->set_host  = nop_set_host;
nop->phy.otg->set_peripheral= nop_set_peripheral;
 
+   if(node && of_device_is_compatible(node, "smsc,usb3315")) {
+   err = smsc_usb3315_init(nop);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 EXPORT_SYMBOL_GPL(usb_phy_gen_create_phy);
@@ -318,6 +340,10 @@ static int usb_phy_generic_probe(struct platform_device *pdev)
if (!nop)
return -ENOMEM;
 
+   platform_set_drvdata(pdev, nop);
+
+   nop->init_done = false;
+
err = usb_phy_gen_create_phy(dev, nop, dev_get_platdata(&pdev->dev));
if (err)
return err;
@@ -346,8 +372,6 @@ static int usb_phy_generic_probe(struct platform_device *pdev)
return err;
}
 
-   platform_set_drvdata(pdev, nop);
-
return 0;
 }
 
@@ -362,6 +386,7 @@ static int usb_phy_generic_remove(struct platform_device *pdev)
 
 static const struct of_device_id nop_xceiv_dt_ids[] = {
{ .compatible = "usb-nop-xceiv" },
+   { .compatible = "smsc,usb3315" },
{ }
 };
 
diff --git a/drivers/usb/phy/phy-generic.h b/drivers/usb/phy/phy-generic.h
index 0d0eadd..db4ade6 100644
--- a/drivers/usb/phy/phy-generic.h
+++ b/drivers/usb/phy/phy-generic.h
@@ -14,6 +14,7 @@ struct usb_phy_generic {
struct gpio_desc *gpiod_vbus;
struct regulator *vbus_draw;
bool vbus_draw_enabled;
+   bool init_done;
unsigned long mA;
unsigned int vbus;
 };
-- 
2.9.3

Re: copy_page() on a kmalloc-ed page with DEBUG_SLAB enabled (was "zram: do not use copy_page with non-page alinged address")

2017-04-18 Thread Sergey Senozhatsky

On (04/18/17 13:06), Michal Hocko wrote:
[..]
> > > copy_page is a performance sensitive function and I believe that we do
> > > those tricks exactly for this purpose.
> > 
> > a wild thought,
> > 
> > use
> > #define copy_page(to,from)  memcpy((to), (from), PAGE_SIZE)
> > 
> > when DEBUG_SLAB is set? so arch copy_page() (if provided by arch)
> > won't be affected otherwise.
> 
> SLAB is not guaranteed to provide page size aligned object AFAIR.

oh, if there are no guarantees for page-sized allocations regardless of
the .config, then agreed, it won't help.

-ss

Re: [PATCH] mfd: menelaus: remove obsolete local_irq_disable() and local_irq_enable()

2017-04-18 Thread Martin Kepplinger



On 2017-04-12 22:15, Aaro Koskinen wrote:
> On Mon, Apr 10, 2017 at 11:37:18AM +0200, Martin Kepplinger wrote:
>> Since
>>
>> commit e6229bec25be ("rtc: make rtc_update_irq callable with irqs enabled")
>>
>> rtc_update_irq() is callable with irqs enabled, see the rtc drivers.
>> So update this accordingly.
>>
>> Signed-off-by: Martin Kepplinger 
>> Cc: Henri Roosen 
> 
> I think the patch looks OK, so:
> 
> Acked-by: Aaro Koskinen 
> 

Any other concern about this?

  martin

Re: copy_page() on a kmalloc-ed page with DEBUG_SLAB enabled (was "zram: do not use copy_page with non-page alinged address")

2017-04-18 Thread Minchan Kim

Hello Michal,

On Tue, Apr 18, 2017 at 09:33:07AM +0200, Michal Hocko wrote:
> On Tue 18-04-17 09:03:19, Minchan Kim wrote:
> > On Mon, Apr 17, 2017 at 10:20:42AM -0500, Christoph Lameter wrote:
> > > On Mon, 17 Apr 2017, Sergey Senozhatsky wrote:
> > > 
> > > > Minchan reported that doing copy_page() on a kmalloc(PAGE_SIZE) page
> > > > with DEBUG_SLAB enabled can cause a memory corruption (See below or
> > > > lkml.kernel.org/r/1492042622-12074-2-git-send-email-minc...@kernel.org )
> > > 
> > > Yes the alignment guarantees do not require alignment on a page boundary.
> > > 
> > > The alignment for kmalloc allocations is controlled by KMALLOC_MIN_ALIGN.
> > > Usually this is either double word aligned or cache line aligned.
> > > 
> > > > > that's an interesting problem. arm64 copy_page(), for instance,
> > > > > wants src and dst to be page aligned, which is reasonable, while
> > > > > generic copy_page(), on the contrary, simply does memcpy(). there
> > > > > are, probably, other callpaths that do copy_page() on kmalloc-ed
> > > > > pages and I'm wondering if there is some sort of a generic fix to
> > > > > the problem.
> > > 
> > > Simple solution is to not allocate pages via the slab allocator but use
> > > the page allocator for this. The page allocator provides proper alignment.
> > > 
> > > There is a reason it is called the page allocator because if you want a
> > > page you use the proper allocator for it.
> 
> Agreed. Using the slab allocator for page sized object is just wasting
> cycles and additional metadata.
> 
> > It would be better if the APIs works with struct page, not address but
> > I can imagine there are many cases where don't have struct page itself
> > and redundant for kmap/kunmap.
> 
> I do not follow. Why would you need kmap for something that is already
> in the kernel space?

Because it can work with highmem pages.

> 
> > Another approach is the API does normal thing for non-aligned prefix and
> > tail space and fast thing for aligned space.
> > Otherwise, it would be happy if the API has WARN_ON non-page SIZE aligned
> > address.
> 
> copy_page is a performance sensitive function and I believe that we do
> those tricks exactly for this purpose. Why would we want to add an
> overhead for the alignment check or WARN_ON when using unaligned
> pointers? I do see that debugging a subtle memory corruption is PITA
> but that doesn't imply we should clobber the hot path IMHO.

What I wanted is VM_WARN_ON, so there should be no overhead for those
who want a really fast kernel.

> 
> A big fat warning for copy_page would be definitely helpful though.

It's better than as-is, but not everyone reads the comments on such a
simple API (e.g., clear_page(void *mem)). And once it happens, it's
really subtle because, for example, you have not seen any bug without
slub debug. Based on that, you add a new feature and it crashes in
testing. To find the bug, you enable slub_debug. Bang. You encounter a
new bug that has lurked for a long time.
VM_WARN_ON would be valuable, but I'm okay with any option that is
better at catching the bug, if someone donates their time to fix
it up.

Thanks.

Re: linux-next: build failure after merge of the rcu tree

2017-04-18 Thread Stephen Rothwell

Hi Paul,

On Tue, 18 Apr 2017 21:06:20 -0700 "Paul E. McKenney" 
 wrote:
>
> Or at least broken in a more subtle and creative way.  ;-)

What I live for :-)

-- 
Cheers,
Stephen Rothwell

Re: Re: "mm: move pcp and lru-pcp draining into single wq" broke resume from s2ram

2017-04-18 Thread Tetsuo Handa

Geert Uytterhoeven wrote:
> 8 locks held by s2ram/1899:
>  #0:  (sb_writers#7){.+.+.+}, at: [] vfs_write+0xa8/0x15c
>  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0xf0/0x194
>  #2:  (s_active#48){.+.+.+}, at: [] kernfs_fop_write+0xf8/0x194
>  #3:  (pm_mutex){+.+.+.}, at: [] pm_suspend+0x16c/0xabc
>  #4:  (&dev->mutex){..}, at: [] device_resume+0x58/0x190
>  #5:  (cma_mutex){+.+...}, at: [] cma_alloc+0x150/0x374
>  #6:  (lock){+.+...}, at: [] lru_add_drain_all+0x4c/0x1b4
>  #7:  (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x3c/0x9c

I think this situation suggests that

int pm_suspend(suspend_state_t state) {
  error = enter_state(state) {
    if (!mutex_trylock(&pm_mutex)) /* #3 */
      return -EBUSY;
    error = suspend_devices_and_enter(state) {
      error = suspend_enter(state, &wakeup) {
        enable_nonboot_cpus() {
          cpu_maps_update_begin() {
            mutex_lock(&cpu_add_remove_lock);
          }
          pr_info("Enabling non-boot CPUs ...\n");
          for_each_cpu(cpu, frozen_cpus) {
            error = _cpu_up(cpu, 1, CPUHP_ONLINE) {
              cpu_hotplug_begin() {
                mutex_lock(&cpu_hotplug.lock);
              }

              cpu_hotplug_done() {
                mutex_unlock(&cpu_hotplug.lock);
              }
            }
            if (!error) {
              pr_info("CPU%d is up\n", cpu);
              continue;
            }
          }
          cpu_maps_update_done() {
            mutex_unlock(&cpu_add_remove_lock);
          }
        }
      }
      dpm_resume_end(PMSG_RESUME) {
        dpm_resume(state) {
          mutex_lock(&dpm_list_mtx);
          while (!list_empty(&dpm_suspended_list)) {
            mutex_unlock(&dpm_list_mtx);
            error = device_resume(dev, state, false) {
              dpm_wait_for_superior(dev, async);
              dpm_watchdog_set(&wd, dev);
              device_lock(dev) {
                mutex_lock(&dev->mutex); /* #4 */
              }
              error = dpm_run_callback(callback, dev, state, info) {
                cma_alloc() {
                  mutex_lock(&cma_mutex); /* #5 */
                  alloc_contig_range() {
                    lru_add_drain_all() {
                      mutex_lock(&lock); /* #6 */
                      get_online_cpus() {
                        mutex_lock(&cpu_hotplug.lock); /* #7 hang? */
                        mutex_unlock(&cpu_hotplug.lock);
                      }
                      put_online_cpus();
                      mutex_unlock(&lock); /* #6 */
                    }
                  }
                  mutex_unlock(&cma_mutex); /* #5 */
                }
              }
              device_unlock(dev) {
                mutex_unlock(&dev->mutex); /* #4 */
              }
            }
            mutex_lock(&dpm_list_mtx);
          }
          mutex_unlock(&dpm_list_mtx);
        }
        dpm_complete(state) {
          mutex_lock(&dpm_list_mtx);
          while (!list_empty(&dpm_prepared_list)) {
            mutex_unlock(&dpm_list_mtx);
            device_complete(dev, state) {
            }
            mutex_lock(&dpm_list_mtx);
          }
          mutex_unlock(&dpm_list_mtx);
        }
      }
    }
    mutex_unlock(&pm_mutex); /* #3 */
  }
}

Is somebody waiting forever with cpu_hotplug.lock held?
A full dmesg with SysRq-t output would be appreciated.
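
The numbered locks in the report follow the nesting order of the call
chain above. As a reading aid, here is a minimal user-space sketch of that
bookkeeping. It assumes nothing about lockdep internals: struct held_locks
and the helper names are invented here, and the sketch only records names
in acquisition order.

```c
#include <stdio.h>

#define MAX_HELD 16

/* Hypothetical per-task record of the locks currently held, in order. */
struct held_locks {
	const char *name[MAX_HELD];
	int depth;
};

/* Record an acquisition; returns the new depth, or -1 if the table is full. */
static int lock_acquire(struct held_locks *h, const char *name)
{
	if (h->depth >= MAX_HELD)
		return -1;
	h->name[h->depth] = name;
	return ++h->depth;
}

/* Record a release of the most recently taken lock; returns the new depth. */
static int lock_release(struct held_locks *h)
{
	if (h->depth > 0)
		h->depth--;
	return h->depth;
}

/* Print the list in the same "#N:  (name)" shape as the report. */
static void print_held(const struct held_locks *h)
{
	printf("%d locks held:\n", h->depth);
	for (int i = 0; i < h->depth; i++)
		printf(" #%d:  (%s)\n", i, h->name[i]);
}
```

Calling lock_acquire() for each mutex_lock() in the chain and
lock_release() for each mutex_unlock() leaves exactly the locks a stuck
task would still be holding, which is what the "locks held by" dump shows.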

Re: export pcie_flr and remove copies of it in drivers V2

2017-04-18 Thread Leon Romanovsky

On Tue, Apr 18, 2017 at 01:36:12PM -0500, Bjorn Helgaas wrote:
> On Fri, Apr 14, 2017 at 09:11:24PM +0200, Christoph Hellwig wrote:
> > Hi all,
> >
> > this exports the PCI layer pcie_flr helper, and removes various opencoded
> > copies of it.
> >
> > Changes since V1:
> >  - rebase on top of the pci/virtualization branch
> >  - fixed the probe case in __pci_dev_reset
> >  - added ACKs from Bjorn
>
> Applied the first three patches:
>
>   bc13871ef35a PCI: Export pcie_flr()
>   e641c375d414 PCI: Call pcie_flr() from reset_intel_82599_sfp_virtfn()
>   40e0901ea4bf PCI: Call pcie_flr() from reset_chelsio_generic_dev()
>

Bjorn,

How do you suggest we proceed with the other patches? They should be
applied to your tree as well, because they depend on "bc13871ef35a PCI:
Export pcie_flr()".

Thanks


> to pci/virtualization for v4.12, thanks!



[PATCH V3 02/17] thermal: cpu_cooling: rearrange globals

2017-04-18 Thread Viresh Kumar

Just to make it look better.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index be29489dd247..ce94aafed25d 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -105,8 +105,8 @@ struct cpufreq_cooling_device {
struct device *cpu_dev;
get_static_t plat_get_static_power;
 };
-static DEFINE_IDA(cpufreq_ida);
 
+static DEFINE_IDA(cpufreq_ida);
 static DEFINE_MUTEX(cooling_list_lock);
 static LIST_HEAD(cpufreq_dev_list);
 
-- 
2.12.0.432.g71c3a4f4ba37

[PATCH V3 01/17] thermal: cpu_cooling: Avoid accessing potentially freed structures

2017-04-18 Thread Viresh Kumar

After the lock is dropped, it is possible that the cpufreq_dev gets
freed before we call get_level(), and that can cause the kernel to crash.

Drop the lock after we are done using the structure.

Cc: 4.2+  # 4.2+
Fixes: 02373d7c69b4 ("thermal: cpu_cooling: fix lockdep problems in cpu_cooling")
Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 69d0f430b2d1..be29489dd247 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -153,8 +153,10 @@ unsigned long cpufreq_cooling_get_level(unsigned int cpu, 
unsigned int freq)
mutex_lock(&cooling_list_lock);
list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
if (cpumask_test_cpu(cpu, &cpufreq_dev->allowed_cpus)) {
+   unsigned long level = get_level(cpufreq_dev, freq);
+
mutex_unlock(&cooling_list_lock);
-   return get_level(cpufreq_dev, freq);
+   return level;
}
}
mutex_unlock(&cooling_list_lock);
-- 
2.12.0.432.g71c3a4f4ba37
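
The ordering change in patch 01/17 (read the value first, drop the lock
second) can be sketched in user space. This is a toy model under stated
assumptions: the lock is a plain flag rather than a mutex, and
struct cooling_dev, get_level_locked() and lookup_level() are hypothetical
names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for cooling_list_lock: 1 while "held". */
static int list_lock_held;

static void list_lock(void)   { assert(!list_lock_held); list_lock_held = 1; }
static void list_unlock(void) { assert(list_lock_held);  list_lock_held = 0; }

struct cooling_dev {
	unsigned int cpu;
	unsigned long level;
};

/* Must only run while the list lock protects @dev from being freed. */
static unsigned long get_level_locked(const struct cooling_dev *dev)
{
	assert(list_lock_held);
	return dev->level;
}

/* Look up @cpu; read the level before dropping the lock, as in the patch. */
static unsigned long lookup_level(const struct cooling_dev *devs, size_t n,
				  unsigned int cpu, unsigned long invalid)
{
	list_lock();
	for (size_t i = 0; i < n; i++) {
		if (devs[i].cpu == cpu) {
			unsigned long level = get_level_locked(&devs[i]);

			list_unlock();
			return level;
		}
	}
	list_unlock();
	return invalid;
}
```

With the pre-patch ordering, get_level_locked() would run after
list_unlock() and trip the assertion, which is the window in which the
real structure could be freed.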

[PATCH V3 09/17] thermal: cpu_cooling: store cpufreq policy

2017-04-18 Thread Viresh Kumar

The cpufreq policy can be used by the cpu_cooling driver, so let's store
it in the cpufreq_cooling_device structure.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 55ff45c1e917..7dddc7443f5d 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -67,6 +67,7 @@ struct power_table {
  * registered.
  * @cdev: thermal_cooling_device pointer to keep track of the
  * registered cooling device.
+ * @policy: cpufreq policy.
  * @cpufreq_state: integer value representing the current state of cpufreq
  * cooling devices.
  * @clipped_freq: integer value representing the absolute value of the clipped
@@ -91,6 +92,7 @@ struct power_table {
 struct cpufreq_cooling_device {
int id;
struct thermal_cooling_device *cdev;
+   struct cpufreq_policy *policy;
unsigned int cpufreq_state;
unsigned int clipped_freq;
unsigned int max_level;
@@ -760,6 +762,7 @@ __cpufreq_cooling_register(struct device_node *np,
if (!cpufreq_cdev)
return ERR_PTR(-ENOMEM);
 
+   cpufreq_cdev->policy = policy;
num_cpus = cpumask_weight(policy->related_cpus);
cpufreq_cdev->time_in_idle = kcalloc(num_cpus,
sizeof(*cpufreq_cdev->time_in_idle),
-- 
2.12.0.432.g71c3a4f4ba37

Re: [PATCH V3 02/16] block, bfq: add full hierarchical scheduling and cgroups support

2017-04-18 Thread Paolo Valente

> On 18 Apr 2017, at 09:04, Tejun Heo wrote:
> 
> Hello, Paolo.
> 
> On Wed, Apr 12, 2017 at 07:22:03AM +0200, Paolo Valente wrote:
>> could you elaborate a bit more on this?  I mean, cgroups support has
>> been in BFQ (and CFQ) for almost ten years, perfectly working as far
>> as I know.  Of course it is perfectly working in terms of I/O and not
>> of CPU bandwidth distribution; and, for the moment, it is effective
>> only for devices below 30-50KIOPS.  What's the point in throwing
>> (momentarily?) away such a fundamental feature?  What am I missing?
> 
> I've been trying to track down latency issues with the CPU controller
> which basically takes the same approach and I'm not sure nesting
> scheduler timelines is a good approach.  It intuitively feels elegant
> but seems to have some fundamental issues.  IIUC, bfq isn't quite the
> same in that it doesn't need load balancer across multiple queues and
> it could be that bfq is close enough to the basic model that the
> nested behavior maps to the correct scheduling behavior.
> 
> However, for example, in the CPU controller, the nested timelines
> break sleeper boost.  The boost is implemented by considering the
> thread to have woken up upto some duration prior to the current time;
> however, it only affects the timeline inside the cgroup and there's no
> good way to propagate it upwards.  The final result is two threads in
> a cgroup with the double weight can behave significantly worse in
> terms of latency compared to two threads with the weight of 1 in the
> root.
> 

Hi Tejun,
I don't know the details of the specific multiple-queue issues you
report, but bfq implements the upward propagation you mention: if a
process in a group is to be privileged, i.e., if the process basically
has to be given a higher weight (in addition to other important forms of
help), then this weight boost is propagated upward through the path
from the process to the root node in the group hierarchy.
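
As a rough illustration of what "propagated upward" can mean, here is a
sketch that walks parent pointers from a leaf to the root. It is not taken
from the bfq sources: struct sched_node, the boosted_below counter and the
multiplicative coefficient are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical scheduling entity: a process queue or a group. */
struct sched_node {
	struct sched_node *parent;	/* NULL for the root group */
	unsigned int weight;		/* configured weight */
	unsigned int boosted_below;	/* boosted descendants, counting itself */
};

/* Mark @leaf as boosted and record that on every ancestor up to the root. */
static void boost_start(struct sched_node *leaf)
{
	for (struct sched_node *n = leaf; n; n = n->parent)
		n->boosted_below++;
}

/* Undo boost_start() for @leaf. */
static void boost_end(struct sched_node *leaf)
{
	for (struct sched_node *n = leaf; n; n = n->parent)
		if (n->boosted_below)
			n->boosted_below--;
}

/* Effective weight: scaled by @coeff while anything below is boosted. */
static unsigned int effective_weight(const struct sched_node *n,
				     unsigned int coeff)
{
	return n->boosted_below ? n->weight * coeff : n->weight;
}
```

In this model a sibling group with no boosted process keeps its configured
weight, while every group on the path to the boosted process is scaled.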

> Given that the nested scheduling ends up pretty expensive, I'm not
> sure how good a model this nesting approach is.  Especially if there
> can be multiple queues, the weight distribution across cgroup
> instances across multiple queues has to be coordinated globally
> anyway,

To get perfect global service guarantees, yes.  But you can settle
for tradeoffs that, in my experience with storage and packet I/O, are
good enough to be probably indistinguishable from an ideal but too
costly solution.  I mean, with a well-done approximate scheduling
solution, the deviation from an ideal service can be of the same order
as the noise caused by the unavoidable latencies of sw and hw
components other than the scheduler.

> so the weight / cost adjustment part can't happen
> automatically anyway as in single queue case.  If we're going there,
> we might as well implement cgroup support by actively modulating the
> combined weights, which will make individual scheduling operations
> cheaper and it easier to think about and guarantee latency behaviors.
> 

Yes.  Anyway, I didn't quite understand what the alternative to
hierarchical scheduling is, or could be, for guaranteeing bandwidth
distribution of shared resources in a complex setting.  If you think I
could be of any help with this, just keep me in the loop.

> If you think that bfq will stay single queue and won't need timeline
> modifying heuristics (for responsiveness or whatever), the current
> approach could be fine, but I'm a bit awry about committing to the
> current approach if we're gonna encounter the same problems.
> 

As of now, bfq is targeted at devices that are not too fast
(< 30-50KIOPS), which happen to be single queue.  In particular, bfq is
currently agnostic w.r.t. the number of downstream queues.

Thanks,
Paolo

> Thanks.
> 
> -- 
> tejun

[PATCH V3 08/17] cpufreq: create cpufreq_table_count_valid_entries()

2017-04-18 Thread Viresh Kumar

We already need such a routine in two places, so let's create one.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq_stats.c | 13 -
 drivers/thermal/cpu_cooling.c   | 22 +-
 include/linux/cpufreq.h | 14 ++
 3 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index f570ead62454..9c3d319dc129 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -170,11 +170,10 @@ void cpufreq_stats_create_table(struct cpufreq_policy 
*policy)
unsigned int i = 0, count = 0, ret = -ENOMEM;
struct cpufreq_stats *stats;
unsigned int alloc_size;
-   struct cpufreq_frequency_table *pos, *table;
+   struct cpufreq_frequency_table *pos;
 
-   /* We need cpufreq table for creating stats table */
-   table = policy->freq_table;
-   if (unlikely(!table))
+   count = cpufreq_table_count_valid_entries(policy);
+   if (!count)
return;
 
/* stats already initialized */
@@ -185,10 +184,6 @@ void cpufreq_stats_create_table(struct cpufreq_policy 
*policy)
if (!stats)
return;
 
-   /* Find total allocation size */
-   cpufreq_for_each_valid_entry(pos, table)
-   count++;
-
alloc_size = count * sizeof(int) + count * sizeof(u64);
 
alloc_size += count * count * sizeof(int);
@@ -205,7 +200,7 @@ void cpufreq_stats_create_table(struct cpufreq_policy 
*policy)
stats->max_state = count;
 
/* Find valid-unique entries */
-   cpufreq_for_each_valid_entry(pos, table)
+   cpufreq_for_each_valid_entry(pos, policy->freq_table)
if (freq_table_get_index(stats, pos->frequency) == -1)
stats->freq_table[i++] = pos->frequency;
 
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 58e58065b650..55ff45c1e917 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -739,7 +739,6 @@ __cpufreq_cooling_register(struct device_node *np,
struct thermal_cooling_device *cdev;
struct cpufreq_cooling_device *cpufreq_cdev;
char dev_name[THERMAL_NAME_LENGTH];
-   struct cpufreq_frequency_table *pos, *table;
unsigned int freq, i, num_cpus;
int ret;
struct thermal_cooling_device_ops *cooling_ops;
@@ -750,9 +749,10 @@ __cpufreq_cooling_register(struct device_node *np,
return ERR_PTR(-EINVAL);
}
 
-   table = policy->freq_table;
-   if (!table) {
-   pr_debug("%s: CPUFreq table not found\n", __func__);
+   i = cpufreq_table_count_valid_entries(policy);
+   if (!i) {
+   pr_debug("%s: CPUFreq table not found or has no valid 
entries\n",
+__func__);
return ERR_PTR(-ENODEV);
}
 
@@ -777,20 +777,16 @@ __cpufreq_cooling_register(struct device_node *np,
goto free_time_in_idle;
}
 
-   /* Find max levels */
-   cpufreq_for_each_valid_entry(pos, table)
-   cpufreq_cdev->max_level++;
+   /* max_level is an index, not a counter */
+   cpufreq_cdev->max_level = i - 1;
 
-   cpufreq_cdev->freq_table = kmalloc(sizeof(*cpufreq_cdev->freq_table) *
- cpufreq_cdev->max_level, GFP_KERNEL);
+   cpufreq_cdev->freq_table = kmalloc(sizeof(*cpufreq_cdev->freq_table) * 
i,
+ GFP_KERNEL);
if (!cpufreq_cdev->freq_table) {
cdev = ERR_PTR(-ENOMEM);
goto free_time_in_idle_timestamp;
}
 
-   /* max_level is an index, not a counter */
-   cpufreq_cdev->max_level--;
-
cpumask_copy(&cpufreq_cdev->allowed_cpus, policy->related_cpus);
 
if (capacitance) {
@@ -816,7 +812,7 @@ __cpufreq_cooling_register(struct device_node *np,
 
/* Fill freq-table in descending order of frequencies */
for (i = 0, freq = -1; i <= cpufreq_cdev->max_level; i++) {
-   freq = find_next_max(table, freq);
+   freq = find_next_max(policy->freq_table, freq);
cpufreq_cdev->freq_table[i] = freq;
 
/* Warn for duplicate entries */
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 87165f06a307..affc13568af6 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -855,6 +855,20 @@ static inline int cpufreq_frequency_table_target(struct 
cpufreq_policy *policy,
return -EINVAL;
}
 }
+
+static inline int cpufreq_table_count_valid_entries(const struct 
cpufreq_policy *policy)
+{
+   struct cpufreq_frequency_table *pos;
+   int count = 0;
+
+   if (unlikely(!policy->freq_table))
+   return 0;
+
+   cpufreq_for_each_valid_entry(pos, policy->freq_table)
+   count++;
+
+   return count;
+}
 #else
 stat
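
The counting helper from patch 08/17 can be modelled in user space as
follows. This is a sketch, assuming a table terminated by an end marker
with some entries flagged invalid; ENTRY_INVALID, TABLE_END and
struct freq_entry are stand-ins defined here, not the kernel's
definitions.

```c
#include <stddef.h>

/* Sentinel values modelled on the cpufreq table markers (assumed here). */
#define ENTRY_INVALID (~0u)
#define TABLE_END     (~1u)

struct freq_entry {
	unsigned int frequency;	/* kHz, or one of the sentinels above */
};

/* Count usable entries; a NULL table counts as zero, as in the helper. */
static int count_valid_entries(const struct freq_entry *table)
{
	int count = 0;

	if (!table)
		return 0;

	for (const struct freq_entry *pos = table;
	     pos->frequency != TABLE_END; pos++)
		if (pos->frequency != ENTRY_INVALID)
			count++;

	return count;
}
```

Returning 0 for both a missing table and a table without valid entries
lets callers fold the two error cases into a single `if (!count)` check,
and `count - 1` gives the highest valid index.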

[PATCH V3 14/17] thermal: cpu_cooling: get_level() can't fail

2017-04-18 Thread Viresh Kumar

The frequency passed to get_level() is returned by cpu_power_to_freq()
and it is guaranteed that get_level() can't fail.

Get rid of error code.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 71d15448a293..762ddfc4e654 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -119,22 +119,19 @@ static LIST_HEAD(cpufreq_cdev_list);
  * @cpufreq_cdev: cpufreq_cdev for which the property is required
  * @freq: Frequency
  *
- * Return: level on success, THERMAL_CSTATE_INVALID on error.
+ * Return: level corresponding to the frequency.
  */
 static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
   unsigned int freq)
 {
+   struct freq_table *freq_table = cpufreq_cdev->freq_table;
unsigned long level;
 
-   for (level = 0; level <= cpufreq_cdev->max_level; level++) {
-   if (freq == cpufreq_cdev->freq_table[level].frequency)
-   return level;
-
-   if (freq > cpufreq_cdev->freq_table[level].frequency)
+   for (level = 1; level < cpufreq_cdev->max_level; level++)
+   if (freq > freq_table[level].frequency)
break;
-   }
 
-   return THERMAL_CSTATE_INVALID;
+   return level - 1;
 }
 
 /**
@@ -623,13 +620,6 @@ static int cpufreq_power2state(struct 
thermal_cooling_device *cdev,
target_freq = cpu_power_to_freq(cpufreq_cdev, normalised_power);
 
*state = get_level(cpufreq_cdev, target_freq);
-   if (*state == THERMAL_CSTATE_INVALID) {
-   dev_err_ratelimited(&cdev->device,
-   "Failed to convert %dKHz for cpu %d into a 
cdev state\n",
-   target_freq, policy->cpu);
-   return -EINVAL;
-   }
-
trace_thermal_power_cpu_limit(policy->related_cpus, target_freq, *state,
  power);
return 0;
-- 
2.12.0.432.g71c3a4f4ba37
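
The rewritten loop in patch 14/17 relies on the table being sorted by
descending frequency. Below is a stand-alone sketch with a simplified
parameter list (the table and max_level are passed directly, which is not
the driver's signature). One caveat about the sketch: it uses
`level <= max_level`, because tracing the `level < max_level` bound from
the diff by hand on a four-entry table maps the lowest frequency to
max_level - 1. Treat that as an observation about this model, not as a
review of the patch.

```c
#include <stddef.h>

struct freq_table {
	unsigned int frequency;	/* kHz */
	unsigned int power;	/* mW, unused in this sketch */
};

/*
 * Map @freq to a cooling level in a table sorted by descending frequency.
 * @max_level is the index of the last entry. A frequency that falls
 * between two entries maps to the level of the next higher entry.
 */
static unsigned long get_level(const struct freq_table *table,
			       unsigned long max_level, unsigned int freq)
{
	unsigned long level;

	/* Stop at the first entry strictly below @freq. */
	for (level = 1; level <= max_level; level++)
		if (freq > table[level].frequency)
			break;

	return level - 1;
}
```

Because every input lands on some index between 0 and max_level, the
function has no error return, which is the property the commit message
relies on.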

[PATCH V3 15/17] thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev

2017-04-18 Thread Viresh Kumar

'cpu_dev' is used by only one function, get_static_power(), and it
wouldn't be time consuming to get the cpu device structure within it.
This lets us remove cpu_dev from struct cpufreq_cooling_device.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 762ddfc4e654..c85b217d16c8 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -87,7 +87,6 @@ struct time_in_idle {
  * @node: list_head to link all cpufreq_cooling_device together.
  * @last_load: load measured by the latest call to 
cpufreq_get_requested_power()
  * @idle_time: idle time stats
- * @cpu_dev: the cpu_device of policy->cpu.
  * @plat_get_static_power: callback to calculate the static power
  *
  * This structure is required for keeping information of each registered
@@ -104,7 +103,6 @@ struct cpufreq_cooling_device {
struct list_head node;
u32 last_load;
struct time_in_idle *idle_time;
-   struct device *cpu_dev;
get_static_t plat_get_static_power;
 };
 
@@ -255,8 +253,6 @@ static int update_freq_table(struct cpufreq_cooling_device 
*cpufreq_cdev,
freq_table[i].power = power;
}
 
-   cpufreq_cdev->cpu_dev = dev;
-
return 0;
 }
 
@@ -338,19 +334,22 @@ static int get_static_power(struct cpufreq_cooling_device 
*cpufreq_cdev,
 {
struct dev_pm_opp *opp;
unsigned long voltage;
-   struct cpumask *cpumask = cpufreq_cdev->policy->related_cpus;
+   struct cpufreq_policy *policy = cpufreq_cdev->policy;
+   struct cpumask *cpumask = policy->related_cpus;
unsigned long freq_hz = freq * 1000;
+   struct device *dev;
 
-   if (!cpufreq_cdev->plat_get_static_power || !cpufreq_cdev->cpu_dev) {
+   if (!cpufreq_cdev->plat_get_static_power) {
*power = 0;
return 0;
}
 
-   opp = dev_pm_opp_find_freq_exact(cpufreq_cdev->cpu_dev, freq_hz,
-true);
+   dev = get_cpu_device(policy->cpu);
+   WARN_ON(!dev);
+
+   opp = dev_pm_opp_find_freq_exact(dev, freq_hz, true);
if (IS_ERR(opp)) {
-   dev_warn_ratelimited(cpufreq_cdev->cpu_dev,
-"Failed to find OPP for frequency %lu: 
%ld\n",
+   dev_warn_ratelimited(dev, "Failed to find OPP for frequency 
%lu: %ld\n",
 freq_hz, PTR_ERR(opp));
return -EINVAL;
}
@@ -359,8 +358,7 @@ static int get_static_power(struct cpufreq_cooling_device 
*cpufreq_cdev,
dev_pm_opp_put(opp);
 
if (voltage == 0) {
-   dev_err_ratelimited(cpufreq_cdev->cpu_dev,
-   "Failed to get voltage for frequency %lu\n",
+   dev_err_ratelimited(dev, "Failed to get voltage for frequency 
%lu\n",
freq_hz);
return -EINVAL;
}
-- 
2.12.0.432.g71c3a4f4ba37

Re: [PATCH v2] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas

2017-04-18 Thread Hoeun Ryu


> On Apr 18, 2017, at 3:59 PM, Michal Hocko  wrote:
> 
>> On Tue 18-04-17 14:48:39, Hoeun Ryu wrote:
>> vm_area_add_early/vm_area_register_early() are used to reserve vmalloc area
>> during boot process and those virtually mapped areas are never unmapped.
>> So `OR` VM_STATIC flag to the areas in vmalloc_init() when importing
>> existing vmlist entries and prevent those areas from being removed from the
>> rbtree by accident.
> 
> Has this been a problem in the past or currently so that it is worth
> handling?
> 
>> This flags can be also used by other vmalloc APIs to
>> specify that the area will never go away.
> 
> Do we have a user for that?
> 
>> This makes remove_vm_area() more robust against other kind of errors (eg.
>> programming errors).
> 
> Well, yes it will help to prevent from vfree(early_mem) but we have 4
> users of vm_area_register_early so I am really wondering whether this is
> worth additional code. It would really help to understand your
> motivation for the patch if we were explicit about the problem you are
> trying to solve.

I just think that it would be good to make it robust against various
kinds of errors.
You might think that's not reason enough to do so, though.

> 
> Thanks
> 
> -- 
> Michal Hocko
> SUSE Labs
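
For readers following the thread, the guard being proposed amounts to a
flag test before removal. Here is a user-space sketch of that idea only:
the bit value of VM_STATIC_SKETCH, struct vm_area and remove_area() are
hypothetical and do not reproduce the patch or the kernel's vmalloc code.

```c
#include <stddef.h>

#define VM_STATIC_SKETCH 0x1000u	/* hypothetical bit, for illustration */

struct vm_area {
	unsigned long flags;
	int mapped;
};

/* Refuse to tear down areas marked static; return the area on success. */
static struct vm_area *remove_area(struct vm_area *area)
{
	if (!area || (area->flags & VM_STATIC_SKETCH))
		return NULL;
	area->mapped = 0;
	return area;
}
```

An accidental removal request for an early-boot area then fails with NULL
and leaves the mapping in place, instead of dropping an area that is
expected to live forever.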

[PATCH V3 10/17] thermal: cpu_cooling: OPPs are registered for all CPUs

2017-04-18 Thread Viresh Kumar

The OPPs are registered for all CPUs of a cpufreq policy now and we
don't need to run the loop in build_dyn_power_table(). Just check for
the policy->cpu and we should be fine.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 7dddc7443f5d..ce387f62c93e 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -83,7 +83,7 @@ struct power_table {
  * @dyn_power_table: array of struct power_table for frequency to power
  * conversion, sorted in ascending order.
  * @dyn_power_table_entries: number of entries in the @dyn_power_table array
- * @cpu_dev: the first cpu_device from @allowed_cpus that has OPPs registered
+ * @cpu_dev: the cpu_device of policy->cpu.
  * @plat_get_static_power: callback to calculate the static power
  *
  * This structure is required for keeping information of each registered
@@ -207,24 +207,20 @@ static int build_dyn_power_table(struct 
cpufreq_cooling_device *cpufreq_cdev,
struct power_table *power_table;
struct dev_pm_opp *opp;
struct device *dev = NULL;
-   int num_opps = 0, cpu, i, ret = 0;
+   int num_opps = 0, cpu = cpufreq_cdev->policy->cpu, i, ret = 0;
unsigned long freq;
 
-   for_each_cpu(cpu, &cpufreq_cdev->allowed_cpus) {
-   dev = get_cpu_device(cpu);
-   if (!dev) {
-   dev_warn(&cpufreq_cdev->cdev->device,
-"No cpu device for cpu %d\n", cpu);
-   continue;
-   }
-
-   num_opps = dev_pm_opp_get_opp_count(dev);
-   if (num_opps > 0)
-   break;
-   else if (num_opps < 0)
-   return num_opps;
+   dev = get_cpu_device(cpu);
+   if (unlikely(!dev)) {
+   dev_warn(&cpufreq_cdev->cdev->device,
+"No cpu device for cpu %d\n", cpu);
+   return -ENODEV;
}
 
+   num_opps = dev_pm_opp_get_opp_count(dev);
+   if (num_opps < 0)
+   return num_opps;
+
if (num_opps == 0)
return -EINVAL;
 
-- 
2.12.0.432.g71c3a4f4ba37

[PATCH V3 12/17] thermal: cpu_cooling: merge frequency and power tables

2017-04-18 Thread Viresh Kumar

The cpu_cooling driver keeps two tables:

- freq_table: table of frequencies in descending order, built from
  policy->freq_table.

- power_table: table of frequencies and power in ascending order, built
  from OPP table.

If the OPPs are used for the CPU device, then both of these tables are
actually built using the OPP core and should have the same frequency
entries, so there is no need to keep separate tables.

Let's merge them.

Note that the new table is in descending order of frequencies, so the
'for' loops had to be fixed in a few places to make it work.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 153 ++
 1 file changed, 67 insertions(+), 86 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 1097162f7f8a..17d6d4635936 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -49,14 +49,14 @@
  */
 
 /**
- * struct power_table - frequency to power conversion
+ * struct freq_table - frequency table along with power entries
  * @frequency: frequency in KHz
  * @power: power in mW
  *
  * This structure is built when the cooling device registers and helps
- * in translating frequency to power and viceversa.
+ * in translating frequency to power and vice versa.
  */
-struct power_table {
+struct freq_table {
u32 frequency;
u32 power;
 };
@@ -79,9 +79,6 @@ struct power_table {
  * @time_in_idle: previous reading of the absolute time that this cpu was idle
  * @time_in_idle_timestamp: wall time of the last invocation of
  * get_cpu_idle_time_us()
- * @dyn_power_table: array of struct power_table for frequency to power
- * conversion, sorted in ascending order.
- * @dyn_power_table_entries: number of entries in the @dyn_power_table array
  * @cpu_dev: the cpu_device of policy->cpu.
  * @plat_get_static_power: callback to calculate the static power
  *
@@ -95,13 +92,11 @@ struct cpufreq_cooling_device {
unsigned int cpufreq_state;
unsigned int clipped_freq;
unsigned int max_level;
-   unsigned int *freq_table;   /* In descending order */
+   struct freq_table *freq_table;  /* In descending order */
struct list_head node;
u32 last_load;
u64 *time_in_idle;
u64 *time_in_idle_timestamp;
-   struct power_table *dyn_power_table;
-   int dyn_power_table_entries;
struct device *cpu_dev;
get_static_t plat_get_static_power;
 };
@@ -125,10 +120,10 @@ static unsigned long get_level(struct 
cpufreq_cooling_device *cpufreq_cdev,
unsigned long level;
 
for (level = 0; level <= cpufreq_cdev->max_level; level++) {
-   if (freq == cpufreq_cdev->freq_table[level])
+   if (freq == cpufreq_cdev->freq_table[level].frequency)
return level;
 
-   if (freq > cpufreq_cdev->freq_table[level])
+   if (freq > cpufreq_cdev->freq_table[level].frequency)
break;
}
 
@@ -185,28 +180,25 @@ static int cpufreq_thermal_notifier(struct notifier_block 
*nb,
 }
 
 /**
- * build_dyn_power_table() - create a dynamic power to frequency table
- * @cpufreq_cdev:  the cpufreq cooling device in which to store the table
+ * update_freq_table() - Update the freq table with power numbers
+ * @cpufreq_cdev:  the cpufreq cooling device in which to update the table
  * @capacitance: dynamic power coefficient for these cpus
  *
- * Build a dynamic power to frequency table for this cpu and store it
- * in @cpufreq_cdev.  This table will be used in cpu_power_to_freq() and
- * cpu_freq_to_power() to convert between power and frequency
- * efficiently.  Power is stored in mW, frequency in KHz.  The
- * resulting table is in ascending order.
+ * Update the freq table with power numbers.  This table will be used in
+ * cpu_power_to_freq() and cpu_freq_to_power() to convert between power and
+ * frequency efficiently.  Power is stored in mW, frequency in KHz.  The
+ * resulting table is in descending order.
  *
  * Return: 0 on success, -EINVAL if there are no OPPs for any CPUs,
- * -ENOMEM if we run out of memory or -EAGAIN if an OPP was
- * added/enabled while the function was executing.
+ * or -ENOMEM if we run out of memory.
  */
-static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_cdev,
-u32 capacitance)
+static int update_freq_table(struct cpufreq_cooling_device *cpufreq_cdev,
+u32 capacitance)
 {
-   struct power_table *power_table;
+   struct freq_table *freq_table = cpufreq_cdev->freq_table;
struct dev_pm_opp *opp;
struct device *dev = NULL;
-   int num_opps = 0, cpu = cpufreq_cdev->policy->cpu, i, ret = 0;
-   unsigned long freq;
+   int num_opps = 0, cpu = cpufreq_cdev->policy->cpu, i;
 
dev = get_cpu_device(cpu);
if (unlikely(!dev)) {

[PATCH V3 13/17] thermal: cpu_cooling: create structure for idle time stats

2017-04-18 Thread Viresh Kumar

We keep two arrays for idle time stats and allocate memory for them
separately. It would be much easier to follow if we create an array of
idle stats structure instead and allocate it once.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 53 ---
 1 file changed, 25 insertions(+), 28 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 17d6d4635936..71d15448a293 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -62,6 +62,16 @@ struct freq_table {
 };
 
 /**
+ * struct time_in_idle - Idle time stats
+ * @time: previous reading of the absolute time that this cpu was idle
+ * @timestamp: wall time of the last invocation of get_cpu_idle_time_us()
+ */
+struct time_in_idle {
+   u64 time;
+   u64 timestamp;
+};
+
+/**
  * struct cpufreq_cooling_device - data for cooling device with cpufreq
  * @id: unique integer value corresponding to each cpufreq_cooling_device
  * registered.
@@ -76,9 +86,7 @@ struct freq_table {
  * cpufreq frequencies.
  * @node: list_head to link all cpufreq_cooling_device together.
  * @last_load: load measured by the latest call to 
cpufreq_get_requested_power()
- * @time_in_idle: previous reading of the absolute time that this cpu was idle
- * @time_in_idle_timestamp: wall time of the last invocation of
- * get_cpu_idle_time_us()
+ * @idle_time: idle time stats
  * @cpu_dev: the cpu_device of policy->cpu.
  * @plat_get_static_power: callback to calculate the static power
  *
@@ -95,8 +103,7 @@ struct cpufreq_cooling_device {
struct freq_table *freq_table;  /* In descending order */
struct list_head node;
u32 last_load;
-   u64 *time_in_idle;
-   u64 *time_in_idle_timestamp;
+   struct time_in_idle *idle_time;
struct device *cpu_dev;
get_static_t plat_get_static_power;
 };
@@ -296,18 +303,19 @@ static u32 get_load(struct cpufreq_cooling_device 
*cpufreq_cdev, int cpu,
 {
u32 load;
u64 now, now_idle, delta_time, delta_idle;
+   struct time_in_idle *idle_time = &cpufreq_cdev->idle_time[cpu_idx];
 
now_idle = get_cpu_idle_time(cpu, &now, 0);
-   delta_idle = now_idle - cpufreq_cdev->time_in_idle[cpu_idx];
-   delta_time = now - cpufreq_cdev->time_in_idle_timestamp[cpu_idx];
+   delta_idle = now_idle - idle_time->time;
+   delta_time = now - idle_time->timestamp;
 
if (delta_time <= delta_idle)
load = 0;
else
load = div64_u64(100 * (delta_time - delta_idle), delta_time);
 
-   cpufreq_cdev->time_in_idle[cpu_idx] = now_idle;
-   cpufreq_cdev->time_in_idle_timestamp[cpu_idx] = now;
+   idle_time->time = now_idle;
+   idle_time->timestamp = now;
 
return load;
 }
@@ -711,22 +719,14 @@ __cpufreq_cooling_register(struct device_node *np,
 
cpufreq_cdev->policy = policy;
num_cpus = cpumask_weight(policy->related_cpus);
-   cpufreq_cdev->time_in_idle = kcalloc(num_cpus,
-   sizeof(*cpufreq_cdev->time_in_idle),
-   GFP_KERNEL);
-   if (!cpufreq_cdev->time_in_idle) {
+   cpufreq_cdev->idle_time = kcalloc(num_cpus,
+sizeof(*cpufreq_cdev->idle_time),
+GFP_KERNEL);
+   if (!cpufreq_cdev->idle_time) {
cdev = ERR_PTR(-ENOMEM);
goto free_cdev;
}
 
-   cpufreq_cdev->time_in_idle_timestamp =
-   kcalloc(num_cpus, sizeof(*cpufreq_cdev->time_in_idle_timestamp),
-   GFP_KERNEL);
-   if (!cpufreq_cdev->time_in_idle_timestamp) {
-   cdev = ERR_PTR(-ENOMEM);
-   goto free_time_in_idle;
-   }
-
/* max_level is an index, not a counter */
cpufreq_cdev->max_level = i - 1;
 
@@ -734,7 +734,7 @@ __cpufreq_cooling_register(struct device_node *np,
  GFP_KERNEL);
if (!cpufreq_cdev->freq_table) {
cdev = ERR_PTR(-ENOMEM);
-   goto free_time_in_idle_timestamp;
+   goto free_idle_time;
}
 
ret = ida_simple_get(&cpufreq_ida, 0, 0, GFP_KERNEL);
@@ -797,10 +797,8 @@ __cpufreq_cooling_register(struct device_node *np,
ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id);
 free_table:
kfree(cpufreq_cdev->freq_table);
-free_time_in_idle_timestamp:
-   kfree(cpufreq_cdev->time_in_idle_timestamp);
-free_time_in_idle:
-   kfree(cpufreq_cdev->time_in_idle);
+free_idle_time:
+   kfree(cpufreq_cdev->idle_time);
 free_cdev:
kfree(cpufreq_cdev);
return cdev;
@@ -943,8 +941,7 @@ void cpufreq_cooling_unregister(struct 
thermal_cooling_device *cdev)
 
thermal_cooling_device_unregister(cpufreq_cdev->cdev);
ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id);
-
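
The per-CPU idle statistics from patch 13/17 and the load formula in
get_load() can be sketched in user space. The version below assumes the
caller supplies the current idle time and wall time itself (the driver
reads them with get_cpu_idle_time()), and it uses plain 64-bit division in
place of div64_u64().

```c
#include <stdint.h>

struct time_in_idle {
	uint64_t time;		/* idle time at the previous sample */
	uint64_t timestamp;	/* wall time at the previous sample */
};

/* Return the load in percent since the previous sample and update @s. */
static uint32_t get_load(struct time_in_idle *s, uint64_t now_idle,
			 uint64_t now)
{
	uint64_t delta_idle = now_idle - s->time;
	uint64_t delta_time = now - s->timestamp;
	uint32_t load;

	if (delta_time <= delta_idle)
		load = 0;
	else
		load = (uint32_t)(100 * (delta_time - delta_idle) / delta_time);

	/* Remember this sample for the next call. */
	s->time = now_idle;
	s->timestamp = now;

	return load;
}
```

Keeping both values in one structure means a single zeroed array with one
element per CPU replaces the two parallel arrays, and each call touches
one element instead of two.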

[PATCH V3 17/17] thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device

2017-04-18 Thread Viresh Kumar

This shrinks the size of the structure on arm64 by 8 bytes by avoiding
padding of 4 bytes at two places.

Also add the missing doc comment for freq_table.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index dc73405b04f2..05073c33ba20 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -75,17 +75,18 @@ struct time_in_idle {
  * struct cpufreq_cooling_device - data for cooling device with cpufreq
  * @id: unique integer value corresponding to each cpufreq_cooling_device
  * registered.
- * @cdev: thermal_cooling_device pointer to keep track of the
- * registered cooling device.
- * @policy: cpufreq policy.
+ * @last_load: load measured by the latest call to cpufreq_get_requested_power()
  * @cpufreq_state: integer value representing the current state of cpufreq
  * cooling devices.
  * @clipped_freq: integer value representing the absolute value of the clipped
  * frequency.
  * @max_level: maximum cooling level. One less than total number of valid
  * cpufreq frequencies.
+ * @freq_table: Freq table in descending order of frequencies
+ * @cdev: thermal_cooling_device pointer to keep track of the
+ * registered cooling device.
+ * @policy: cpufreq policy.
  * @node: list_head to link all cpufreq_cooling_device together.
- * @last_load: load measured by the latest call to cpufreq_get_requested_power()
  * @idle_time: idle time stats
  * @plat_get_static_power: callback to calculate the static power
  *
@@ -94,14 +95,14 @@ struct time_in_idle {
  */
 struct cpufreq_cooling_device {
int id;
-   struct thermal_cooling_device *cdev;
-   struct cpufreq_policy *policy;
+   u32 last_load;
unsigned int cpufreq_state;
unsigned int clipped_freq;
unsigned int max_level;
struct freq_table *freq_table;  /* In descending order */
+   struct thermal_cooling_device *cdev;
+   struct cpufreq_policy *policy;
struct list_head node;
-   u32 last_load;
struct time_in_idle *idle_time;
get_static_t plat_get_static_power;
 };
-- 
2.12.0.432.g71c3a4f4ba37
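
The size reduction described in this patch comes from member ordering alone. As a rough userspace sketch (not kernel code), the two layouts below mirror the old and new member order using hypothetical stand-in types: `void *` replaces the kernel pointer types, a two-pointer array replaces `struct list_head`, and the names `static_power_fn`, `cdev_layout_before`, `cdev_layout_after` and `bytes_saved()` are invented for illustration. The 8-byte figure assumes an LP64 ABI such as arm64, where pointers are 8 bytes wide and 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the get_static_t callback type. */
typedef int (*static_power_fn)(void);

/* Old member order: 4-byte members are scattered between pointers. */
struct cdev_layout_before {
	int id;                      /* 4 bytes + 4 bytes padding on LP64 */
	void *cdev;
	void *policy;
	unsigned int cpufreq_state;
	unsigned int clipped_freq;
	unsigned int max_level;      /* + 4 bytes padding on LP64 */
	void *freq_table;
	void *node[2];               /* stands in for struct list_head */
	uint32_t last_load;          /* + 4 bytes padding on LP64 */
	void *idle_time;
	static_power_fn plat_get_static_power;
};

/* New member order: the five 4-byte members are grouped together. */
struct cdev_layout_after {
	int id;
	uint32_t last_load;
	unsigned int cpufreq_state;
	unsigned int clipped_freq;
	unsigned int max_level;      /* + 4 bytes padding on LP64 */
	void *freq_table;
	void *cdev;
	void *policy;
	void *node[2];               /* stands in for struct list_head */
	void *idle_time;
	static_power_fn plat_get_static_power;
};

/* Number of bytes the reordering saves on the ABI this is compiled for. */
size_t bytes_saved(void)
{
	/* Reordering should never make the structure larger. */
	assert(sizeof(struct cdev_layout_after) <=
	       sizeof(struct cdev_layout_before));
	return sizeof(struct cdev_layout_before) -
	       sizeof(struct cdev_layout_after);
}
```

Under those assumptions the old order leaves three 4-byte holes and the new order leaves one, which agrees with the two holes the commit message says are avoided. On a 32-bit ABI where pointers are 4 bytes, neither layout needs padding and the saving disappears.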

[PATCH V3 07/17] thermal: cpu_cooling: use cpufreq_policy to register cooling device

2017-04-18 Thread Viresh Kumar

The CPU cooling driver uses the cpufreq policy to get clip_cpus, the
frequency table, etc. Most callers of the CPU cooling driver's
registration routines already have the cpufreq policy, but they only
pass the policy->related_cpus cpumask. The __cpufreq_cooling_register()
routine then gets the policy by itself and uses it.

It would be much better if the callers passed the policy directly
instead. This also fixes a basic design flaw, where the policy can be
freed while the CPU cooling driver is still active.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/arm_big_little.c   |  2 +-
 drivers/cpufreq/cpufreq-dt.c   |  2 +-
 drivers/cpufreq/dbx500-cpufreq.c   |  2 +-
 drivers/cpufreq/mt8173-cpufreq.c   |  4 +-
 drivers/cpufreq/qoriq-cpufreq.c|  3 +-
 drivers/thermal/cpu_cooling.c  | 61 --
 drivers/thermal/imx_thermal.c  | 22 ++--
 drivers/thermal/ti-soc-thermal/ti-thermal-common.c | 22 +---
 include/linux/cpu_cooling.h| 26 -
 9 files changed, 74 insertions(+), 70 deletions(-)

diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index 418042201e6d..ea6d62547b10 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -540,7 +540,7 @@ static void bL_cpufreq_ready(struct cpufreq_policy *policy)
 &power_coefficient);
 
cdev[cur_cluster] = of_cpufreq_power_cooling_register(np,
-   policy->related_cpus, power_coefficient, NULL);
+   policy, power_coefficient, NULL);
if (IS_ERR(cdev[cur_cluster])) {
dev_err(cpu_dev,
"running cpufreq without cooling device: %ld\n",
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index c943787d761e..fef3c2160691 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -326,7 +326,7 @@ static void cpufreq_ready(struct cpufreq_policy *policy)
 &power_coefficient);
 
priv->cdev = of_cpufreq_power_cooling_register(np,
-   policy->related_cpus, power_coefficient, NULL);
+   policy, power_coefficient, NULL);
if (IS_ERR(priv->cdev)) {
dev_err(priv->cpu_dev,
"running cpufreq without cooling device: %ld\n",
diff --git a/drivers/cpufreq/dbx500-cpufreq.c b/drivers/cpufreq/dbx500-cpufreq.c
index 3575b82210ba..4ee0431579c1 100644
--- a/drivers/cpufreq/dbx500-cpufreq.c
+++ b/drivers/cpufreq/dbx500-cpufreq.c
@@ -43,7 +43,7 @@ static int dbx500_cpufreq_exit(struct cpufreq_policy *policy)
 
 static void dbx500_cpufreq_ready(struct cpufreq_policy *policy)
 {
-   cdev = cpufreq_cooling_register(policy->cpus);
+   cdev = cpufreq_cooling_register(policy);
if (IS_ERR(cdev))
pr_err("Failed to register cooling device %ld\n", PTR_ERR(cdev));
else
diff --git a/drivers/cpufreq/mt8173-cpufreq.c b/drivers/cpufreq/mt8173-cpufreq.c
index fd1886faf33a..f9f00fb4bc3a 100644
--- a/drivers/cpufreq/mt8173-cpufreq.c
+++ b/drivers/cpufreq/mt8173-cpufreq.c
@@ -320,9 +320,7 @@ static void mtk_cpufreq_ready(struct cpufreq_policy *policy)
of_property_read_u32(np, DYNAMIC_POWER, &capacitance);
 
info->cdev = of_cpufreq_power_cooling_register(np,
-   policy->related_cpus,
-   capacitance,
-   NULL);
+   policy, capacitance, NULL);
 
if (IS_ERR(info->cdev)) {
dev_err(info->cpu_dev,
diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index e2ea433a5f9c..4ada55b8856e 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -278,8 +278,7 @@ static void qoriq_cpufreq_ready(struct cpufreq_policy *policy)
struct device_node *np = of_get_cpu_node(policy->cpu, NULL);
 
if (of_find_property(np, "#cooling-cells", NULL)) {
-   cpud->cdev = of_cpufreq_cooling_register(np,
-policy->related_cpus);
+   cpud->cdev = of_cpufreq_cooling_register(np, policy);
 
if (IS_ERR(cpud->cdev) && PTR_ERR(cpud->cdev) != -ENOSYS) {
pr_err("cpu%d is not running as cooling device: %ld\n",
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 002b48dc6bea..58e58065b650 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -717,7 +717,7 @@ static unsigned int find_next_max(struct 
cp

[PATCH V3 16/17] thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()

2017-04-18 Thread Viresh Kumar

The frequency table shouldn't have any zero frequency entries and so
such a check isn't required. Though it would be better to make sure
'state' is within limits.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index c85b217d16c8..dc73405b04f2 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -560,12 +560,13 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev,
int ret;
struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
 
+   /* Request state should be less than max_level */
+   if (WARN_ON(state > cpufreq_cdev->max_level))
+   return -EINVAL;
+
num_cpus = cpumask_weight(cpufreq_cdev->policy->cpus);
 
freq = cpufreq_cdev->freq_table[state].frequency;
-   if (!freq)
-   return -EINVAL;
-
dynamic_power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus;
ret = get_static_power(cpufreq_cdev, tz, freq, &static_power);
if (ret)
-- 
2.12.0.432.g71c3a4f4ba37
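
The bounds check added by this patch can be sketched outside the kernel as follows. This is only an illustrative model: `struct freq_entry`, `struct cooling_model` and `state_to_freq()` are hypothetical names, and the kernel's `WARN_ON()` is replaced by a plain error return. Because max_level is an index rather than a count, a state equal to max_level is still valid; only larger values are rejected, which is what the `state > max_level` test in the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified table entry (frequency in kHz). */
struct freq_entry {
	unsigned int frequency;
};

/* Hypothetical, simplified view of the cooling device. */
struct cooling_model {
	unsigned int max_level;              /* highest valid index, not a count */
	const struct freq_entry *freq_table; /* max_level + 1 entries */
};

/*
 * Look up the frequency for a cooling state.
 * Returns 0 and stores the frequency in *freq on success,
 * or -EINVAL when the state lies beyond the end of the table.
 */
int state_to_freq(const struct cooling_model *m, unsigned long state,
		  unsigned int *freq)
{
	assert(m != NULL && m->freq_table != NULL && freq != NULL);

	/* Validate the index before touching the table. */
	if (state > m->max_level)
		return -EINVAL;

	*freq = m->freq_table[state].frequency;
	return 0;
}
```

Checking the index before the table access is the point of the change: a per-entry zero test cannot catch an out-of-range state, because the out-of-bounds read has already happened by the time the value is inspected.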

[PATCH V3 03/17] thermal: cpu_cooling: Name cpufreq cooling devices as cpufreq_cdev

2017-04-18 Thread Viresh Kumar

Objects of "struct cpufreq_cooling_device" are named a bit
inconsistently. Let's use cpufreq_cdev everywhere. Also note that the
list containing such devices is renamed similarly.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 248 +-
 1 file changed, 124 insertions(+), 124 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index ce94aafed25d..80a46a80817b 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -108,27 +108,27 @@ struct cpufreq_cooling_device {
 
 static DEFINE_IDA(cpufreq_ida);
 static DEFINE_MUTEX(cooling_list_lock);
-static LIST_HEAD(cpufreq_dev_list);
+static LIST_HEAD(cpufreq_cdev_list);
 
 /* Below code defines functions to be used for cpufreq as cooling device */
 
 /**
  * get_level: Find the level for a particular frequency
- * @cpufreq_dev: cpufreq_dev for which the property is required
+ * @cpufreq_cdev: cpufreq_cdev for which the property is required
  * @freq: Frequency
  *
  * Return: level on success, THERMAL_CSTATE_INVALID on error.
  */
-static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_dev,
+static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
   unsigned int freq)
 {
unsigned long level;
 
-   for (level = 0; level <= cpufreq_dev->max_level; level++) {
-   if (freq == cpufreq_dev->freq_table[level])
+   for (level = 0; level <= cpufreq_cdev->max_level; level++) {
+   if (freq == cpufreq_cdev->freq_table[level])
return level;
 
-   if (freq > cpufreq_dev->freq_table[level])
+   if (freq > cpufreq_cdev->freq_table[level])
break;
}
 
@@ -148,12 +148,12 @@ static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_dev,
  */
 unsigned long cpufreq_cooling_get_level(unsigned int cpu, unsigned int freq)
 {
-   struct cpufreq_cooling_device *cpufreq_dev;
+   struct cpufreq_cooling_device *cpufreq_cdev;
 
mutex_lock(&cooling_list_lock);
-   list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
-   if (cpumask_test_cpu(cpu, &cpufreq_dev->allowed_cpus)) {
-   unsigned long level = get_level(cpufreq_dev, freq);
+   list_for_each_entry(cpufreq_cdev, &cpufreq_cdev_list, node) {
+   if (cpumask_test_cpu(cpu, &cpufreq_cdev->allowed_cpus)) {
+   unsigned long level = get_level(cpufreq_cdev, freq);
 
mutex_unlock(&cooling_list_lock);
return level;
@@ -183,14 +183,14 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 {
struct cpufreq_policy *policy = data;
unsigned long clipped_freq;
-   struct cpufreq_cooling_device *cpufreq_dev;
+   struct cpufreq_cooling_device *cpufreq_cdev;
 
if (event != CPUFREQ_ADJUST)
return NOTIFY_DONE;
 
mutex_lock(&cooling_list_lock);
-   list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
-   if (!cpumask_test_cpu(policy->cpu, &cpufreq_dev->allowed_cpus))
+   list_for_each_entry(cpufreq_cdev, &cpufreq_cdev_list, node) {
+   if (!cpumask_test_cpu(policy->cpu, &cpufreq_cdev->allowed_cpus))
continue;
 
/*
@@ -204,7 +204,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 * But, if clipped_freq is greater than policy->max, we don't
 * need to do anything.
 */
-   clipped_freq = cpufreq_dev->clipped_freq;
+   clipped_freq = cpufreq_cdev->clipped_freq;
 
if (policy->max > clipped_freq)
cpufreq_verify_within_limits(policy, 0, clipped_freq);
@@ -217,11 +217,11 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 
 /**
  * build_dyn_power_table() - create a dynamic power to frequency table
- * @cpufreq_device:the cpufreq cooling device in which to store the table
+ * @cpufreq_cdev:  the cpufreq cooling device in which to store the table
  * @capacitance: dynamic power coefficient for these cpus
  *
  * Build a dynamic power to frequency table for this cpu and store it
- * in @cpufreq_device.  This table will be used in cpu_power_to_freq() and
+ * in @cpufreq_cdev.  This table will be used in cpu_power_to_freq() and
  * cpu_freq_to_power() to convert between power and frequency
  * efficiently.  Power is stored in mW, frequency in KHz.  The
  * resulting table is in ascending order.
@@ -230,7 +230,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
  * -ENOMEM if we run out of memory or -EAGAIN if an OPP was
  * added/enabled while the function was executing.
  */
-static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+static int build_dyn_power_table(struct cpufreq

[PATCH V3 06/17] thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()

2017-04-18 Thread Viresh Kumar

'cpu' is used at only one place and there is no need to keep a separate
variable for it.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 1f4b6a719d05..002b48dc6bea 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -456,7 +456,6 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 unsigned long state)
 {
struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
-   unsigned int cpu = cpumask_any(&cpufreq_cdev->allowed_cpus);
unsigned int clip_freq;
 
/* Request state should be less than max_level */
@@ -471,7 +470,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
cpufreq_cdev->cpufreq_state = state;
cpufreq_cdev->clipped_freq = clip_freq;
 
-   cpufreq_update_policy(cpu);
+   cpufreq_update_policy(cpumask_any(&cpufreq_cdev->allowed_cpus));
 
return 0;
 }
-- 
2.12.0.432.g71c3a4f4ba37

[PATCH V3 04/17] thermal: cpu_cooling: replace cool_dev with cdev

2017-04-18 Thread Viresh Kumar

Objects of "struct thermal_cooling_device" are named a bit
inconsistently. Let's use cdev everywhere.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 37 ++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 80a46a80817b..f1e784c22c5a 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -65,7 +65,7 @@ struct power_table {
  * struct cpufreq_cooling_device - data for cooling device with cpufreq
  * @id: unique integer value corresponding to each cpufreq_cooling_device
  * registered.
- * @cool_dev: thermal_cooling_device pointer to keep track of the
+ * @cdev: thermal_cooling_device pointer to keep track of the
  * registered cooling device.
  * @cpufreq_state: integer value representing the current state of cpufreq
  * cooling devices.
@@ -90,7 +90,7 @@ struct power_table {
  */
 struct cpufreq_cooling_device {
int id;
-   struct thermal_cooling_device *cool_dev;
+   struct thermal_cooling_device *cdev;
unsigned int cpufreq_state;
unsigned int clipped_freq;
unsigned int max_level;
@@ -242,7 +242,7 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_cdev,
for_each_cpu(cpu, &cpufreq_cdev->allowed_cpus) {
dev = get_cpu_device(cpu);
if (!dev) {
-   dev_warn(&cpufreq_cdev->cool_dev->device,
+   dev_warn(&cpufreq_cdev->cdev->device,
 "No cpu device for cpu %d\n", cpu);
continue;
}
@@ -769,7 +769,7 @@ __cpufreq_cooling_register(struct device_node *np,
get_static_t plat_static_func)
 {
struct cpufreq_policy *policy;
-   struct thermal_cooling_device *cool_dev;
+   struct thermal_cooling_device *cdev;
struct cpufreq_cooling_device *cpufreq_cdev;
char dev_name[THERMAL_NAME_LENGTH];
struct cpufreq_frequency_table *pos, *table;
@@ -786,20 +786,20 @@ __cpufreq_cooling_register(struct device_node *np,
policy = cpufreq_cpu_get(cpumask_first(temp_mask));
if (!policy) {
pr_debug("%s: CPUFreq policy not found\n", __func__);
-   cool_dev = ERR_PTR(-EPROBE_DEFER);
+   cdev = ERR_PTR(-EPROBE_DEFER);
goto free_cpumask;
}
 
table = policy->freq_table;
if (!table) {
pr_debug("%s: CPUFreq table not found\n", __func__);
-   cool_dev = ERR_PTR(-ENODEV);
+   cdev = ERR_PTR(-ENODEV);
goto put_policy;
}
 
cpufreq_cdev = kzalloc(sizeof(*cpufreq_cdev), GFP_KERNEL);
if (!cpufreq_cdev) {
-   cool_dev = ERR_PTR(-ENOMEM);
+   cdev = ERR_PTR(-ENOMEM);
goto put_policy;
}
 
@@ -808,7 +808,7 @@ __cpufreq_cooling_register(struct device_node *np,
sizeof(*cpufreq_cdev->time_in_idle),
GFP_KERNEL);
if (!cpufreq_cdev->time_in_idle) {
-   cool_dev = ERR_PTR(-ENOMEM);
+   cdev = ERR_PTR(-ENOMEM);
goto free_cdev;
}
 
@@ -816,7 +816,7 @@ __cpufreq_cooling_register(struct device_node *np,
kcalloc(num_cpus, sizeof(*cpufreq_cdev->time_in_idle_timestamp),
GFP_KERNEL);
if (!cpufreq_cdev->time_in_idle_timestamp) {
-   cool_dev = ERR_PTR(-ENOMEM);
+   cdev = ERR_PTR(-ENOMEM);
goto free_time_in_idle;
}
 
@@ -827,7 +827,7 @@ __cpufreq_cooling_register(struct device_node *np,
cpufreq_cdev->freq_table = kmalloc(sizeof(*cpufreq_cdev->freq_table) *
  cpufreq_cdev->max_level, GFP_KERNEL);
if (!cpufreq_cdev->freq_table) {
-   cool_dev = ERR_PTR(-ENOMEM);
+   cdev = ERR_PTR(-ENOMEM);
goto free_time_in_idle_timestamp;
}
 
@@ -841,7 +841,7 @@ __cpufreq_cooling_register(struct device_node *np,
 
ret = build_dyn_power_table(cpufreq_cdev, capacitance);
if (ret) {
-   cool_dev = ERR_PTR(ret);
+   cdev = ERR_PTR(ret);
goto free_table;
}
 
@@ -852,7 +852,7 @@ __cpufreq_cooling_register(struct device_node *np,
 
ret = ida_simple_get(&cpufreq_ida, 0, 0, GFP_KERNEL);
if (ret < 0) {
-   cool_dev = ERR_PTR(ret);
+   cdev = ERR_PTR(ret);
goto free_power_table;
}
cpufreq_cdev->id = ret;
@@ -872,14 +872,13 @@ __cpufreq_cooling_register(struct device_node *np,
snprintf(dev_name, sizeof(dev_name), "thermal-cpufreq-%d",
 cpufreq_cdev->id);
 
-   cool_dev = thermal_of_cooling_dev

[PATCH V3 11/17] thermal: cpu_cooling: get rid of 'allowed_cpus'

2017-04-18 Thread Viresh Kumar

'allowed_cpus' is a copy of policy->related_cpus and can be replaced by
it directly. At some places we are only concerned about online CPUs and
policy->cpus can be used there.

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 77 ---
 1 file changed, 21 insertions(+), 56 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index ce387f62c93e..1097162f7f8a 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -74,7 +74,6 @@ struct power_table {
  * frequency.
  * @max_level: maximum cooling level. One less than total number of valid
  * cpufreq frequencies.
- * @allowed_cpus: all the cpus involved for this cpufreq_cooling_device.
  * @node: list_head to link all cpufreq_cooling_device together.
  * @last_load: load measured by the latest call to cpufreq_get_requested_power()
  * @time_in_idle: previous reading of the absolute time that this cpu was idle
@@ -97,7 +96,6 @@ struct cpufreq_cooling_device {
unsigned int clipped_freq;
unsigned int max_level;
unsigned int *freq_table;   /* In descending order */
-   struct cpumask allowed_cpus;
struct list_head node;
u32 last_load;
u64 *time_in_idle;
@@ -161,7 +159,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 
mutex_lock(&cooling_list_lock);
list_for_each_entry(cpufreq_cdev, &cpufreq_cdev_list, node) {
-   if (!cpumask_test_cpu(policy->cpu, &cpufreq_cdev->allowed_cpus))
+   if (policy != cpufreq_cdev->policy)
continue;
 
/*
@@ -304,7 +302,7 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
  * get_load() - get load for a cpu since last updated
  * @cpufreq_cdev:  &struct cpufreq_cooling_device for this cpu
  * @cpu:   cpu number
- * @cpu_idx:   index of the cpu in cpufreq_cdev->allowed_cpus
+ * @cpu_idx:   index of the cpu in time_in_idle*
  *
  * Return: The average load of cpu @cpu in percentage since this
  * function was last called.
@@ -351,7 +349,7 @@ static int get_static_power(struct cpufreq_cooling_device *cpufreq_cdev,
 {
struct dev_pm_opp *opp;
unsigned long voltage;
-   struct cpumask *cpumask = &cpufreq_cdev->allowed_cpus;
+   struct cpumask *cpumask = cpufreq_cdev->policy->related_cpus;
unsigned long freq_hz = freq * 1000;
 
if (!cpufreq_cdev->plat_get_static_power || !cpufreq_cdev->cpu_dev) {
@@ -468,7 +466,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
cpufreq_cdev->cpufreq_state = state;
cpufreq_cdev->clipped_freq = clip_freq;
 
-   cpufreq_update_policy(cpumask_any(&cpufreq_cdev->allowed_cpus));
+   cpufreq_update_policy(cpufreq_cdev->policy->cpu);
 
return 0;
 }
@@ -504,28 +502,18 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
int i = 0, cpu, ret;
u32 static_power, dynamic_power, total_load = 0;
struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+   struct cpufreq_policy *policy = cpufreq_cdev->policy;
u32 *load_cpu = NULL;
 
-   cpu = cpumask_any_and(&cpufreq_cdev->allowed_cpus, cpu_online_mask);
-
-   /*
-* All the CPUs are offline, thus the requested power by
-* the cdev is 0
-*/
-   if (cpu >= nr_cpu_ids) {
-   *power = 0;
-   return 0;
-   }
-
-   freq = cpufreq_quick_get(cpu);
+   freq = cpufreq_quick_get(policy->cpu);
 
if (trace_thermal_power_cpu_get_power_enabled()) {
-   u32 ncpus = cpumask_weight(&cpufreq_cdev->allowed_cpus);
+   u32 ncpus = cpumask_weight(policy->related_cpus);
 
load_cpu = kcalloc(ncpus, sizeof(*load_cpu), GFP_KERNEL);
}
 
-   for_each_cpu(cpu, &cpufreq_cdev->allowed_cpus) {
+   for_each_cpu(cpu, policy->related_cpus) {
u32 load;
 
if (cpu_online(cpu))
@@ -550,9 +538,9 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
}
 
if (load_cpu) {
-   trace_thermal_power_cpu_get_power(
-   &cpufreq_cdev->allowed_cpus,
-   freq, load_cpu, i, dynamic_power, static_power);
+   trace_thermal_power_cpu_get_power(policy->related_cpus, freq,
+ load_cpu, i, dynamic_power,
+ static_power);
 
kfree(load_cpu);
}
@@ -581,38 +569,22 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev,
   unsigned long state, u32 *power)
 {
unsigned int freq, num_cpus;
-   cpumask_var_t cpumask;
u32 static_power, dynamic_power;
int ret;
struct cpufreq_cooling_device *cpufreq_cdev = cdev-

[PATCH V3 05/17] thermal: cpu_cooling: remove cpufreq_cooling_get_level()

2017-04-18 Thread Viresh Kumar

There is only one user of cpufreq_cooling_get_level() and that already
has a pointer to the cpufreq_cdev structure. It can directly call
get_level() instead and we can get rid of cpufreq_cooling_get_level().

Signed-off-by: Viresh Kumar 
---
 drivers/thermal/cpu_cooling.c | 33 +
 include/linux/cpu_cooling.h   |  6 --
 2 files changed, 1 insertion(+), 38 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index f1e784c22c5a..1f4b6a719d05 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -136,37 +136,6 @@ static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 }
 
 /**
- * cpufreq_cooling_get_level - for a given cpu, return the cooling level.
- * @cpu: cpu for which the level is required
- * @freq: the frequency of interest
- *
- * This function will match the cooling level corresponding to the
- * requested @freq and return it.
- *
- * Return: The matched cooling level on success or THERMAL_CSTATE_INVALID
- * otherwise.
- */
-unsigned long cpufreq_cooling_get_level(unsigned int cpu, unsigned int freq)
-{
-   struct cpufreq_cooling_device *cpufreq_cdev;
-
-   mutex_lock(&cooling_list_lock);
-   list_for_each_entry(cpufreq_cdev, &cpufreq_cdev_list, node) {
-   if (cpumask_test_cpu(cpu, &cpufreq_cdev->allowed_cpus)) {
-   unsigned long level = get_level(cpufreq_cdev, freq);
-
-   mutex_unlock(&cooling_list_lock);
-   return level;
-   }
-   }
-   mutex_unlock(&cooling_list_lock);
-
-   pr_err("%s: cpu:%d not part of any cooling device\n", __func__, cpu);
-   return THERMAL_CSTATE_INVALID;
-}
-EXPORT_SYMBOL_GPL(cpufreq_cooling_get_level);
-
-/**
  * cpufreq_thermal_notifier - notifier callback for cpufreq policy change.
  * @nb:struct notifier_block * with callback info.
  * @event: value showing cpufreq event for which this function invoked.
@@ -697,7 +666,7 @@ static int cpufreq_power2state(struct thermal_cooling_device *cdev,
normalised_power = (dyn_power * 100) / last_load;
target_freq = cpu_power_to_freq(cpufreq_cdev, normalised_power);
 
-   *state = cpufreq_cooling_get_level(cpu, target_freq);
+   *state = get_level(cpufreq_cdev, target_freq);
if (*state == THERMAL_CSTATE_INVALID) {
dev_err_ratelimited(&cdev->device,
"Failed to convert %dKHz for cpu %d into a 
cdev state\n",
diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h
index c156f5082758..96c5e4c2f9c8 100644
--- a/include/linux/cpu_cooling.h
+++ b/include/linux/cpu_cooling.h
@@ -82,7 +82,6 @@ of_cpufreq_power_cooling_register(struct device_node *np,
  */
 void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev);
 
-unsigned long cpufreq_cooling_get_level(unsigned int cpu, unsigned int freq);
 #else /* !CONFIG_CPU_THERMAL */
 static inline struct thermal_cooling_device *
 cpufreq_cooling_register(const struct cpumask *clip_cpus)
@@ -117,11 +116,6 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
 {
return;
 }
-static inline
-unsigned long cpufreq_cooling_get_level(unsigned int cpu, unsigned int freq)
-{
-   return THERMAL_CSTATE_INVALID;
-}
 #endif /* CONFIG_CPU_THERMAL */
 
 #endif /* __CPU_COOLING_H__ */
-- 
2.12.0.432.g71c3a4f4ba37
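
For reference, the lookup that the remaining caller now reaches directly can be modelled in userspace as below. This is a sketch under stated assumptions, not the driver code: `get_level_sketch()` and `LEVEL_INVALID` are hypothetical names (the latter standing in for THERMAL_CSTATE_INVALID), and the table is passed as a plain array of frequencies in descending order, as the driver's freq_table is at this point in the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's THERMAL_CSTATE_INVALID. */
#define LEVEL_INVALID (~0UL)

/*
 * Find the cooling level whose table entry equals freq.
 * freq_table holds max_level + 1 entries in descending order.
 * Returns the level on success, LEVEL_INVALID otherwise.
 */
unsigned long get_level_sketch(const unsigned int *freq_table,
			       unsigned long max_level, unsigned int freq)
{
	unsigned long level;

	assert(freq_table != NULL);

	for (level = 0; level <= max_level; level++) {
		if (freq == freq_table[level])
			return level;

		/* Descending table: every later entry is smaller still. */
		if (freq > freq_table[level])
			break;
	}

	return LEVEL_INVALID;
}
```

Since the caller already holds the cooling device, no list walk or lock is needed to find the table, which is why the exported wrapper could be dropped.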

[PATCH V3 00/17] thermal: cpu_cooling: improve interaction with cpufreq core

2017-04-18 Thread Viresh Kumar

Hi Guys,

The cpu_cooling driver is designed to use CPU frequency scaling to avoid
high thermal states for a platform. But it wasn't integrated very well with
the cpufreq core. For example, clipped-cpus is copied from the policy
structure, and it's much better to use the policy->cpus (or related_cpus)
fields directly as they may have been updated. Not that things were
broken before this series, but they can be optimized a bit more.

This series tries to improve interactions between cpufreq core and
cpu_cooling driver and does some fixes/cleanups to the cpu_cooling
driver.

I have tested it on ARM 32 (exynos) and 64 bit (hikey) boards and have
pushed them for 0-day build bot and kernel CI testing as well. We should
know if something is broken with these.

@Lukasz: It would be good if you can give them a test, especially because
of your work on the "power" specific bits in the driver. This series
already has the improvements you suggested.

Pushed here as well:

git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git thermal/cooling

V2->V3:
- Additional check to guarantee that policy is valid.
- Initialize freq-table and cpufreq_cdev->policy fields before they are
  used by the power-cooling functionality.
- Thanks Lukasz for testing out and suggesting these changes.

V1->V2:
- Name cpufreq cooling dev as cpufreq_cdev everywhere (Eduardo).

--
viresh

Viresh Kumar (17):
  thermal: cpu_cooling: Avoid accessing potentially freed structures
  thermal: cpu_cooling: rearrange globals
  thermal: cpu_cooling: Name cpufreq cooling devices as cpufreq_cdev
  thermal: cpu_cooling: replace cool_dev with cdev
  thermal: cpu_cooling: remove cpufreq_cooling_get_level()
  thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()
  thermal: cpu_cooling: use cpufreq_policy to register cooling device
  cpufreq: create cpufreq_table_count_valid_entries()
  thermal: cpu_cooling: store cpufreq policy
  thermal: cpu_cooling: OPPs are registered for all CPUs
  thermal: cpu_cooling: get rid of 'allowed_cpus'
  thermal: cpu_cooling: merge frequency and power tables
  thermal: cpu_cooling: create structure for idle time stats
  thermal: cpu_cooling: get_level() can't fail
  thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev
  thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()
  thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device

 drivers/cpufreq/arm_big_little.c   |   2 +-
 drivers/cpufreq/cpufreq-dt.c   |   2 +-
 drivers/cpufreq/cpufreq_stats.c|  13 +-
 drivers/cpufreq/dbx500-cpufreq.c   |   2 +-
 drivers/cpufreq/mt8173-cpufreq.c   |   4 +-
 drivers/cpufreq/qoriq-cpufreq.c|   3 +-
 drivers/thermal/cpu_cooling.c  | 602 +
 drivers/thermal/imx_thermal.c  |  22 +-
 drivers/thermal/ti-soc-thermal/ti-thermal-common.c |  22 +-
 include/linux/cpu_cooling.h|  32 +-
 include/linux/cpufreq.h|  14 +
 11 files changed, 311 insertions(+), 407 deletions(-)

-- 
2.12.0.432.g71c3a4f4ba37

Re: [PATCH] acpi: fix typo

2017-04-18 Thread Cao jin

Hi

On 04/19/2017 08:20 AM, Rafael J. Wysocki wrote:
> On Fri, Mar 31, 2017 at 11:46 AM, Cao jin  wrote:
>> Signed-off-by: Cao jin 
>> ---
>>  Documentation/acpi/linuxized-acpica.txt | 10 +-
> 
> Please send changes to this file separately.
> 
>>  include/acpi/actypes.h  |  4 ++--
> 
> This one belongs to ACPICA and there is a special process for
> modifying ACPICA files.
> 

I have read the process. So, does that mean we never send ACPICA patches
to the kernel mailing list, and they can only be sent to the ACPICA project?

-- 
Sincerely,
Cao jin

Re: [PATCH V2 00/17] thermal: cpu_cooling: improve interaction with cpufreq core

2017-04-18 Thread Viresh Kumar

On 18-04-17, 15:40, Lukasz Luba wrote:
> Hi Viresh,
> 
> I have checked out your branch at the newest commit:
> 908063832c268f8add94
> I have built it and run it on my Juno r2.
> I have some Python tests for IPA and I ran one of them.
> 
> I saw a few issues, so I have created a patch just
> to be able to run IPA.
> My next email will have the patch so you can see the changes.
> 
> IPA does not work with this patch set.
> I have tested two source codes from your repo:
> 1. your change 908063832c268f8add94
> 2. your base 8f506e0faf4e2a4a0bde9f9b1
> 
> In case 1. IPA does not work - temperature rises to 83degC
> in case 2. works - temperature is limited to 65degC.

Yeah, there were some power-specific cases that weren't covered in my
tests, and thanks a lot for testing it out. I have pushed my branch again and it
has all your fixes (a bit refined) in it.

> On Monday I can allocate more time for it.

My branch should just work now. Please see if you can allocate 10-15 min today
to give it a try, so that we can get it in earlier. I will send a V3 today and
your Tested-by would be very much appreciated.

Thanks Lukasz.

-- 
viresh

Re: [PATCH] make TIOCSTI ioctl require CAP_SYS_ADMIN

2017-04-18 Thread Kees Cook

On Tue, Apr 18, 2017 at 9:58 PM, Serge E. Hallyn  wrote:
> On Tue, Apr 18, 2017 at 11:45:26PM -0400, Matt Brown wrote:
>> This patch reproduces GRKERNSEC_HARDEN_TTY functionality from the grsecurity
>> project in-kernel.
>>
>> This will create the Kconfig SECURITY_TIOCSTI_RESTRICT and the corresponding
>> sysctl kernel.tiocsti_restrict that, when activated, restrict all TIOCSTI
>> ioctl calls from non CAP_SYS_ADMIN users.
>>
>> Possible effects on userland:
>>
>> There could be a few user programs that would be affected by this
>> change.
>> See: 
>> notable programs are: agetty, csh, xemacs and tcsh
>>
>> However, I still believe that this change is worth it given that the
>> Kconfig defaults to n. This will be a feature that is turned on for the
>
> It's not worthless, but note that for instance before this was fixed
> in lxc, this patch would not have helped with escapes from privileged
> containers.
>
>> same reason that people activate it when using grsecurity. Users of this
>> opt-in feature will realize that they are choosing security over some OS
>> features like unprivileged TIOCSTI ioctls, as should be clear in the
>> Kconfig help message.
>>
>> Threat Model/Patch Rationale:
>>
>> From grsecurity's config for GRKERNSEC_HARDEN_TTY.
>>
>>  | There are very few legitimate uses for this functionality and it
>>  | has made vulnerabilities in several 'su'-like programs possible in
>>  | the past.  Even without these vulnerabilities, it provides an
>>  | attacker with an easy mechanism to move laterally among other
>>  | processes within the same user's compromised session.
>>
>> So if one process within a tty session becomes compromised it can follow
>> that additional processes, that are thought to be in different security
>> boundaries, can be compromised as a result. When using a program like su
>> or sudo, these additional processes could be in a tty session where TTY file
>> descriptors are indeed shared over privilege boundaries.
>>
>> This is also an excellent writeup about the issue:
>> 
>>
>> Signed-off-by: Matt Brown 

Thanks for working on this! I think it'll be nice to have available.

>> ---
>>  drivers/tty/tty_io.c |  4 
>>  include/linux/tty.h  |  2 ++
>>  kernel/sysctl.c  | 12 
>>  security/Kconfig | 13 +
>>  4 files changed, 31 insertions(+)
>>
>> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
>> index e6d1a65..31894e8 100644
>> --- a/drivers/tty/tty_io.c
>> +++ b/drivers/tty/tty_io.c
>> @@ -2296,11 +2296,15 @@ static int tty_fasync(int fd, struct file *filp, int on)
>>   *   FIXME: may race normal receive processing
>>   */
>>
>> +int tiocsti_restrict = IS_ENABLED(CONFIG_SECURITY_TIOCSTI_RESTRICT);
>> +
>>  static int tiocsti(struct tty_struct *tty, char __user *p)
>>  {
>>   char ch, mbz = 0;
>>   struct tty_ldisc *ld;
>>
>> + if (tiocsti_restrict && !capable(CAP_SYS_ADMIN))
>> + return -EPERM;

I wonder if it might be worth adding a pr_warn_ratelimited() here to
help people identify either programs that want to use this feature or
actual attacks?
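
For anyone following along, the combined effect of the new check and the
existing one can be modelled in user space. This is only a sketch under
stated assumptions: tiocsti_permitted() and its boolean parameters are
hypothetical stand-ins for tiocsti_restrict, capable(CAP_SYS_ADMIN) and the
current->signal->tty comparison, and the warning suggested above is not
modelled.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * User-space model of the two permission tests in tiocsti() after the patch.
 * All names are hypothetical; the kernel uses tiocsti_restrict,
 * capable(CAP_SYS_ADMIN) and current->signal->tty instead of these flags.
 */
static int tiocsti_permitted(bool restrict_on, bool has_cap_sys_admin,
                             bool is_controlling_tty)
{
    /* New check: with the sysctl set, only CAP_SYS_ADMIN may inject. */
    if (restrict_on && !has_cap_sys_admin)
        return -EPERM;

    /* Existing check: injecting into another tty needs CAP_SYS_ADMIN. */
    if (!is_controlling_tty && !has_cap_sys_admin)
        return -EPERM;

    return 0;
}
```

With the sysctl off, an unprivileged caller is still allowed on its own
controlling tty; with it on, that last unprivileged path is closed.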

>>   if ((current->signal->tty != tty) && !capable(CAP_SYS_ADMIN))
>>   return -EPERM;
>>   if (get_user(ch, p))
>> diff --git a/include/linux/tty.h b/include/linux/tty.h
>> index 1017e904..7011102 100644
>> --- a/include/linux/tty.h
>> +++ b/include/linux/tty.h
>> @@ -342,6 +342,8 @@ struct tty_file_private {
>>   struct list_head list;
>>  };
>>
>> +extern int tiocsti_restrict;
>> +
>>  /* tty magic number */
>>  #define TTY_MAGIC0x5401
>>
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index acf0a5a..68d1363 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -67,6 +67,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include 
>>  #include 
>> @@ -833,6 +834,17 @@ static struct ctl_table kern_table[] = {
>>   .extra2 = &two,
>>   },
>>  #endif
>> +#if defined CONFIG_TTY
>> + {
>> + .procname   = "tiocsti_restrict",
>> + .data   = &tiocsti_restrict,

Since this is a new sysctl, it'll need to get documented in
Documentation/sysctl/kernel.txt as part of this patch.

>> + .maxlen = sizeof(int),
>> + .mode   = 0644,
>> + .proc_handler   = proc_dointvec_minmax_sysadmin,
>> + .extra1 = &zero,
>> + .extra2 = &one,
>> + },
>> +#endif
>>   {
>>   .procname   = "ngroups_max",
>>   .data   = &ngroups_max,
>> diff --git a/security/Kconfig b/security/Kconfig
>> index 3ff1bf9..7d13331 100644
>> --- a/security/Kconfig
>> +++ b/security/Kconfig
>> @@ -18,6 +18,19 @@ config SECURITY_DMESG_RESTRICT
>>
>> If you are unsure how to answer this question, answer N.
>>
>> +config SECURI

Re: [PATCH v2] usb: dwc3: add disable u2mac linestate check quirk

2017-04-18 Thread Guenter Roeck

On Tue, Apr 18, 2017 at 8:59 PM, wlf  wrote:
> Dear Guenter,
>
>
>
> On 2017-04-18 21:18, Guenter Roeck wrote:
>>
>> On Mon, Apr 17, 2017 at 10:17 PM, William Wu 
>> wrote:
>>>
>>> This patch adds a quirk to disable the USB 2.0 MAC linestate check
>>> during HS transmit. According to the dwc3 databook, it can be used on
>>> some special platforms if the linestate does not reflect the expected
>>> line state (J) during transmission.
>>>
>>> When this quirk is used, the controller implements a fixed 40-bit
>>> TxEndDelay after the packet is given on UTMI and ignores the
>>> linestate during the transmit of a token (during token-to-token
>>> and token-to-data IPGAP).
>>>
>>> On some Rockchip platforms (e.g. rk3399), the u2mac linestate check
>>> must be disabled to decrease the SSPLIT token to SETUP token
>>> inter-packet delay from 566ns to 466ns, and to fix the issue that
>>> FS/LS devices are not recognized if inserted through a USB 3.0 hub.
>>>
>>> Signed-off-by: William Wu 
>>> ---
>>> Changes in v2:
>>> - fix coding style
>>>
>>>   Documentation/devicetree/bindings/usb/dwc3.txt |  2 ++
>>>   drivers/usb/dwc3/core.c| 14 ++
>>>   drivers/usb/dwc3/core.h|  4 
>>>   3 files changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt
>>> b/Documentation/devicetree/bindings/usb/dwc3.txt
>>> index f658f39..6a89f0c 100644
>>> --- a/Documentation/devicetree/bindings/usb/dwc3.txt
>>> +++ b/Documentation/devicetree/bindings/usb/dwc3.txt
>>> @@ -45,6 +45,8 @@ Optional properties:
>>>  a free-running PHY clock.
>>>- snps,dis-del-phy-power-chg-quirk: when set core will change PHY
>>> power
>>>  from P0 to P1/P2/P3 without delay.
>>> + - snps,tx-ipgap-linecheck-dis-quirk: when set, disable u2mac linestate
>>> check
>>> +   during HS transmit.
>>
>> All other disable-something quirks are named
>> "snps,dis-something-quirk". Maybe use the same naming convention ?
>
> Yes, good idea! I will fix it with "snps,dis-tx-ipgap-linecheck-quirk" in
> the next patch version.
> Thanks :-)
>>
>>
>>>- snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
>>>  utmi_l1_suspend_n, false when asserts
>>> utmi_sleep_n
>>>- snps,hird-threshold: HIRD threshold
>>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
>>> index 455d89a..03429c5 100644
>>> --- a/drivers/usb/dwc3/core.c
>>> +++ b/drivers/usb/dwc3/core.c
>>> @@ -796,15 +796,19 @@ static int dwc3_core_init(struct dwc3 *dwc)
>>>  dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
>>>  }
>>>
>>> +   reg = dwc3_readl(dwc->regs, DWC3_GUCTL1);
>>> +
>>
>> My understanding is that the register was only introduced with dwc3
>> revision 2.50a. Is it ok to read and write it unconditionally ?
>
> Yes, according to the dwc3 databook, DWC3_GUCTL1 was introduced in 2.50a.
> Maybe it's better to read and write it only when we know our controller
> version.
>
> Would it be good to fix it like the following patch?
> One problem with this patch is that we need to read and write the register
> twice if our controller version is >= 2.90a and needs this quirk.
>
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -806,6 +806,12 @@ static int dwc3_core_init(struct dwc3 *dwc)
> dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
> }
>
> +   if (dwc->dis_tx_ipgap_linecheck_quirk) {
> +   reg = dwc3_readl(dwc->regs, DWC3_GUCTL1);
> +   reg |= DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS;
> +   dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
> +   }
> +
>

How about this ?

if (dwc->revision >= DWC3_REVISION_250A) {
        reg = dwc3_readl(dwc->regs, DWC3_GUCTL1);
        if (dwc->revision >= DWC3_REVISION_290A)
                reg |= DWC3_GUCTL1_DEV_L1_EXIT_BY_HW;
        if (dwc->dis_tx_ipgap_linecheck_quirk)
                reg |= DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS;
        dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
}
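
The point of the snippet is a single read-modify-write, skipped entirely on
cores older than 2.50a. A rough user-space model of that behaviour follows.
It is not driver code: struct fake_dwc3, fake_readl(), fake_writel() and
guctl1_setup() are hypothetical names, and the revision and bit values are
assumptions for illustration (the driver's core.h is the authority).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; check the driver's core.h for the real ones. */
#define REV_250A                      0x5533250aU
#define REV_290A                      0x5533290aU
#define GUCTL1_DEV_L1_EXIT_BY_HW      (1U << 24)
#define GUCTL1_TX_IPGAP_LINECHECK_DIS (1U << 28)

/* Hypothetical stand-in for struct dwc3 plus one memory-mapped register. */
struct fake_dwc3 {
    uint32_t revision;
    uint32_t guctl1;                    /* models the GUCTL1 register */
    bool dis_tx_ipgap_linecheck_quirk;
    unsigned int reads, writes;         /* counts register accesses */
};

static uint32_t fake_readl(struct fake_dwc3 *d)
{
    d->reads++;
    return d->guctl1;
}

static void fake_writel(struct fake_dwc3 *d, uint32_t val)
{
    d->writes++;
    d->guctl1 = val;
}

/* One read and one write at most; no access at all before 2.50a. */
static void guctl1_setup(struct fake_dwc3 *d)
{
    uint32_t reg;

    if (d->revision < REV_250A)
        return;

    reg = fake_readl(d);
    if (d->revision >= REV_290A)
        reg |= GUCTL1_DEV_L1_EXIT_BY_HW;
    if (d->dis_tx_ipgap_linecheck_quirk)
        reg |= GUCTL1_TX_IPGAP_LINECHECK_DIS;
    fake_writel(d, reg);
}
```

The access counters are only there to make the "touch the register once"
property checkable; a real driver would not carry them.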

Thanks,
Guenter

> Hi John & Felipe,
>Could you give me some suggestions?
>Thank you!
>
>>>  /*
>>>   * Enable hardware control of sending remote wakeup in HS when
>>>   * the device is in the L1 state.
>>>   */
>>> -   if (dwc->revision >= DWC3_REVISION_290A) {
>>> -   reg = dwc3_readl(dwc->regs, DWC3_GUCTL1);
>>> +   if (dwc->revision >= DWC3_REVISION_290A)
>>>  reg |= DWC3_GUCTL1_DEV_L1_EXIT_BY_HW;
>>> -   dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
>>> -   }
>>> +
>>> +   if (dwc->tx_ipgap_linecheck_dis_quirk)
>>> +   reg |= DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS;
>>> +
>>> +   dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
>>>
>>>  return 0;
>>>
>>> @@ -1023,6 +1027,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>>>  "snps,dis-u2-freeclk-exists-quirk");
>>>

[PATCH 4/4] ARM: sun8i: h3: bananapi-m2-plus: Enable USB OTG

2017-04-18 Thread Chen-Yu Tsai

The Bananapi M2 Plus has a USB OTG port that can be used in both
powered host mode and peripheral mode. When in peripheral mode,
the port does not power the board. There is no VBUS sensing on
the port.

This patch adds the regulator controlling VBUS on the OTG port,
the GPIO for the ID detect pin, and enables the USB OTG and host
controllers.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts 
b/arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts
index 52acbe111cad..17c7c088cdea 100644
--- a/arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts
+++ b/arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts
@@ -92,6 +92,10 @@
};
 };
 
+&ehci0 {
+   status = "okay";
+};
+
 &ehci1 {
status = "okay";
 };
@@ -145,6 +149,10 @@
status = "okay";
 };
 
+&ohci0 {
+   status = "okay";
+};
+
 &ohci1 {
status = "okay";
 };
@@ -170,6 +178,11 @@
};
 };
 
+&reg_usb0_vbus {
+   gpio = <&pio 3 11 GPIO_ACTIVE_HIGH>; /* PD11 */
+   status = "okay";
+};
+
 &uart0 {
pinctrl-names = "default";
pinctrl-0 = <&uart0_pins_a>;
@@ -182,7 +195,14 @@
status = "okay";
 };
 
+&usb_otg {
+   dr_mode = "otg";
+   status = "okay";
+};
+
 &usbphy {
-   /* USB VBUS is on as long as VCC-IO is on */
+   usb0_id_det-gpios = <&r_pio 0 6 GPIO_ACTIVE_HIGH>; /* PL6 */
+   usb0_vbus-supply = <&reg_usb0_vbus>;
+   /* USB host VBUS is on as long as VCC-IO is on */
status = "okay";
 };
-- 
2.11.0

[PATCH 0/4] ARM: sunxi: device tree pinctrl clean up and H3 OTG

2017-04-18 Thread Chen-Yu Tsai

Hi Maxime,

This series has 2 parts. The parts are largely unrelated, though the
second part should be applied after the first part, so we don't
accidentally mux pins that we shouldn't. Hence I'm sending them
together.

The first 2 patches clean up the sunxi device tree files, removing
pinmux settings for common GPIO pins. These include the enable pins
for the common regulators, and the mmc0 card detect pin from the
reference designs.

The second part, the latter 2 patches, enable USB OTG on the Orangepi
PC, PC Plus, Plus 2E, and the Bananapi M2+. The first 3 boards are
bunched together, due to how the PC Plus and Plus 2E device trees include
the device tree of the Opi PC.

Regards
ChenYu

Chen-Yu Tsai (4):
  ARM: sunxi: common-regulators: Drop pinmux settings for GPIO pins
  ARM: sunxi: Drop mmc0_cd_pin_reference_design pinmux setting
  ARM: sun8i: h3: orangepi-pc: Enable USB OTG
  ARM: sun8i: h3: bananapi-m2-plus: Enable USB OTG

 arch/arm/boot/dts/sun4i-a10-a1000.dts  |  2 +-
 arch/arm/boot/dts/sun4i-a10-ba10-tvbox.dts |  2 +-
 arch/arm/boot/dts/sun4i-a10-chuwi-v7-cw0825.dts|  2 +-
 arch/arm/boot/dts/sun4i-a10-cubieboard.dts |  2 +-
 arch/arm/boot/dts/sun4i-a10-dserve-dsrv9703c.dts   |  2 +-
 arch/arm/boot/dts/sun4i-a10-gemei-g9.dts   |  2 +-
 arch/arm/boot/dts/sun4i-a10-hackberry.dts  |  2 +-
 arch/arm/boot/dts/sun4i-a10-hyundai-a7hd.dts   |  6 +
 arch/arm/boot/dts/sun4i-a10-inet1.dts  |  2 +-
 arch/arm/boot/dts/sun4i-a10-inet97fv2.dts  |  2 +-
 arch/arm/boot/dts/sun4i-a10-inet9f-rev03.dts   |  2 +-
 .../boot/dts/sun4i-a10-itead-iteaduino-plus.dts|  2 +-
 arch/arm/boot/dts/sun4i-a10-jesurun-q5.dts |  2 +-
 arch/arm/boot/dts/sun4i-a10-marsboard.dts  |  2 +-
 arch/arm/boot/dts/sun4i-a10-mini-xplus.dts |  2 +-
 arch/arm/boot/dts/sun4i-a10-mk802.dts  |  2 +-
 arch/arm/boot/dts/sun4i-a10-mk802ii.dts|  2 +-
 arch/arm/boot/dts/sun4i-a10-olinuxino-lime.dts |  2 +-
 arch/arm/boot/dts/sun4i-a10-pcduino.dts|  2 +-
 arch/arm/boot/dts/sun4i-a10-pov-protab2-ips9.dts   |  2 +-
 arch/arm/boot/dts/sun4i-a10.dtsi   |  6 -
 arch/arm/boot/dts/sun5i-a10s-auxtek-t003.dts   |  8 --
 arch/arm/boot/dts/sun5i-a10s-auxtek-t004.dts   |  4 ---
 arch/arm/boot/dts/sun5i-a10s-olinuxino-micro.dts   |  4 ---
 arch/arm/boot/dts/sun5i-a10s-wobo-i5.dts   |  4 ---
 .../boot/dts/sun5i-a13-empire-electronix-d709.dts  |  4 ---
 arch/arm/boot/dts/sun5i-a13-hsg-h702.dts   |  5 
 arch/arm/boot/dts/sun5i-a13-olinuxino.dts  |  4 ---
 arch/arm/boot/dts/sun6i-a31-hummingbird.dts|  5 
 arch/arm/boot/dts/sun7i-a20-cubieboard2.dts|  2 +-
 arch/arm/boot/dts/sun7i-a20-cubietruck.dts |  2 +-
 arch/arm/boot/dts/sun7i-a20-hummingbird.dts|  2 +-
 arch/arm/boot/dts/sun7i-a20-i12-tvbox.dts  |  2 +-
 arch/arm/boot/dts/sun7i-a20-icnova-swac.dts|  2 +-
 arch/arm/boot/dts/sun7i-a20-itead-ibox.dts |  2 +-
 arch/arm/boot/dts/sun7i-a20-lamobo-r1.dts  |  8 --
 arch/arm/boot/dts/sun7i-a20-m3.dts |  2 +-
 arch/arm/boot/dts/sun7i-a20-mk808c.dts |  2 +-
 arch/arm/boot/dts/sun7i-a20-olimex-som-evb.dts |  2 +-
 arch/arm/boot/dts/sun7i-a20-olinuxino-lime.dts |  2 +-
 arch/arm/boot/dts/sun7i-a20-olinuxino-lime2.dts|  2 +-
 arch/arm/boot/dts/sun7i-a20-olinuxino-micro.dts|  2 +-
 arch/arm/boot/dts/sun7i-a20-pcduino3-nano.dts  |  2 +-
 arch/arm/boot/dts/sun7i-a20-pcduino3.dts   |  6 +
 arch/arm/boot/dts/sun7i-a20-wexler-tab7200.dts |  2 +-
 arch/arm/boot/dts/sun7i-a20-wits-pro-a20-dkt.dts   |  2 +-
 arch/arm/boot/dts/sun7i-a20.dtsi   |  6 -
 arch/arm/boot/dts/sun8i-h3-bananapi-m2-plus.dts| 22 +++-
 arch/arm/boot/dts/sun8i-h3-orangepi-2.dts  |  4 ---
 arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts | 22 +++-
 arch/arm/boot/dts/sunxi-common-regulators.dtsi | 30 --
 51 files changed, 78 insertions(+), 138 deletions(-)

-- 
2.11.0

[PATCH 3/4] ARM: sun8i: h3: orangepi-pc: Enable USB OTG

2017-04-18 Thread Chen-Yu Tsai

The Orange Pi PC, PC Plus, and Plus 2E all have a USB OTG port
that can be used in both powered host mode and peripheral mode.
When in peripheral mode, the port does not power the board.
There is no VBUS sensing on the port. All three boards have all
related pins routed the same way.

The device tree file for the Orange Pi Plus 2E is based on the
Orange Pi PC Plus, which itself is based on the Orange Pi PC.
Changes to the base Orange Pi PC device tree file affect all 3
boards.

This patch adds the regulator controlling VBUS on the OTG port,
the GPIO for the ID detect pin, and enables the USB OTG and host
controllers.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts 
b/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
index f148111c326d..1a044b17d6c6 100644
--- a/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
+++ b/arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts
@@ -97,6 +97,10 @@
status = "okay";
 };
 
+&ehci0 {
+   status = "okay";
+};
+
 &ehci1 {
status = "okay";
 };
@@ -125,6 +129,10 @@
status = "okay";
 };
 
+&ohci0 {
+   status = "okay";
+};
+
 &ohci1 {
status = "okay";
 };
@@ -156,6 +164,11 @@
};
 };
 
+&reg_usb0_vbus {
+   gpio = <&r_pio 0 2 GPIO_ACTIVE_HIGH>; /* PL2 */
+   status = "okay";
+};
+
 &uart0 {
pinctrl-names = "default";
pinctrl-0 = <&uart0_pins_a>;
@@ -180,7 +193,14 @@
status = "disabled";
 };
 
+&usb_otg {
+   dr_mode = "otg";
+   status = "okay";
+};
+
 &usbphy {
-   /* USB VBUS is always on */
+   usb0_id_det-gpios = <&pio 6 12 GPIO_ACTIVE_HIGH>; /* PG12 */
+   usb0_vbus-supply = <&reg_usb0_vbus>;
+   /* VBUS on USB host ports are always on */
status = "okay";
 };
-- 
2.11.0

[PATCH 2/4] ARM: sunxi: Drop mmc0_cd_pin_reference_design pinmux setting

2017-04-18 Thread Chen-Yu Tsai

As part of our effort to move pinctrl/GPIO interlocking into the
driver where it belongs, this patch drops the definition and usage
of the mmc0_cd_pin_reference_design pinmux setting for the default
mmc0 card detect GPIO pin.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun4i-a10-a1000.dts| 2 +-
 arch/arm/boot/dts/sun4i-a10-ba10-tvbox.dts   | 2 +-
 arch/arm/boot/dts/sun4i-a10-chuwi-v7-cw0825.dts  | 2 +-
 arch/arm/boot/dts/sun4i-a10-cubieboard.dts   | 2 +-
 arch/arm/boot/dts/sun4i-a10-dserve-dsrv9703c.dts | 2 +-
 arch/arm/boot/dts/sun4i-a10-gemei-g9.dts | 2 +-
 arch/arm/boot/dts/sun4i-a10-hackberry.dts| 2 +-
 arch/arm/boot/dts/sun4i-a10-hyundai-a7hd.dts | 2 +-
 arch/arm/boot/dts/sun4i-a10-inet1.dts| 2 +-
 arch/arm/boot/dts/sun4i-a10-inet97fv2.dts| 2 +-
 arch/arm/boot/dts/sun4i-a10-inet9f-rev03.dts | 2 +-
 arch/arm/boot/dts/sun4i-a10-itead-iteaduino-plus.dts | 2 +-
 arch/arm/boot/dts/sun4i-a10-jesurun-q5.dts   | 2 +-
 arch/arm/boot/dts/sun4i-a10-marsboard.dts| 2 +-
 arch/arm/boot/dts/sun4i-a10-mini-xplus.dts   | 2 +-
 arch/arm/boot/dts/sun4i-a10-mk802.dts| 2 +-
 arch/arm/boot/dts/sun4i-a10-mk802ii.dts  | 2 +-
 arch/arm/boot/dts/sun4i-a10-olinuxino-lime.dts   | 2 +-
 arch/arm/boot/dts/sun4i-a10-pcduino.dts  | 2 +-
 arch/arm/boot/dts/sun4i-a10-pov-protab2-ips9.dts | 2 +-
 arch/arm/boot/dts/sun4i-a10.dtsi | 6 --
 arch/arm/boot/dts/sun7i-a20-cubieboard2.dts  | 2 +-
 arch/arm/boot/dts/sun7i-a20-cubietruck.dts   | 2 +-
 arch/arm/boot/dts/sun7i-a20-hummingbird.dts  | 2 +-
 arch/arm/boot/dts/sun7i-a20-i12-tvbox.dts| 2 +-
 arch/arm/boot/dts/sun7i-a20-icnova-swac.dts  | 2 +-
 arch/arm/boot/dts/sun7i-a20-itead-ibox.dts   | 2 +-
 arch/arm/boot/dts/sun7i-a20-m3.dts   | 2 +-
 arch/arm/boot/dts/sun7i-a20-mk808c.dts   | 2 +-
 arch/arm/boot/dts/sun7i-a20-olimex-som-evb.dts   | 2 +-
 arch/arm/boot/dts/sun7i-a20-olinuxino-lime.dts   | 2 +-
 arch/arm/boot/dts/sun7i-a20-olinuxino-lime2.dts  | 2 +-
 arch/arm/boot/dts/sun7i-a20-olinuxino-micro.dts  | 2 +-
 arch/arm/boot/dts/sun7i-a20-pcduino3-nano.dts| 2 +-
 arch/arm/boot/dts/sun7i-a20-pcduino3.dts | 2 +-
 arch/arm/boot/dts/sun7i-a20-wexler-tab7200.dts   | 2 +-
 arch/arm/boot/dts/sun7i-a20-wits-pro-a20-dkt.dts | 2 +-
 arch/arm/boot/dts/sun7i-a20.dtsi | 6 --
 38 files changed, 36 insertions(+), 48 deletions(-)

diff --git a/arch/arm/boot/dts/sun4i-a10-a1000.dts 
b/arch/arm/boot/dts/sun4i-a10-a1000.dts
index f2a01fe2bebc..f80d37ddc4c6 100644
--- a/arch/arm/boot/dts/sun4i-a10-a1000.dts
+++ b/arch/arm/boot/dts/sun4i-a10-a1000.dts
@@ -171,7 +171,7 @@
 
 &mmc0 {
pinctrl-names = "default";
-   pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin_reference_design>;
+   pinctrl-0 = <&mmc0_pins_a>;
vmmc-supply = <&reg_vcc3v3>;
bus-width = <4>;
cd-gpios = <&pio 7 1 GPIO_ACTIVE_HIGH>; /* PH1 */
diff --git a/arch/arm/boot/dts/sun4i-a10-ba10-tvbox.dts 
b/arch/arm/boot/dts/sun4i-a10-ba10-tvbox.dts
index 942d739a4384..6b02de592a02 100644
--- a/arch/arm/boot/dts/sun4i-a10-ba10-tvbox.dts
+++ b/arch/arm/boot/dts/sun4i-a10-ba10-tvbox.dts
@@ -109,7 +109,7 @@
 
 &mmc0 {
pinctrl-names = "default";
-   pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin_reference_design>;
+   pinctrl-0 = <&mmc0_pins_a>;
vmmc-supply = <&reg_vcc3v3>;
bus-width = <4>;
cd-gpios = <&pio 7 1 GPIO_ACTIVE_HIGH>; /* PH1 */
diff --git a/arch/arm/boot/dts/sun4i-a10-chuwi-v7-cw0825.dts 
b/arch/arm/boot/dts/sun4i-a10-chuwi-v7-cw0825.dts
index 17f8c5ec011c..a7d61994b8fd 100644
--- a/arch/arm/boot/dts/sun4i-a10-chuwi-v7-cw0825.dts
+++ b/arch/arm/boot/dts/sun4i-a10-chuwi-v7-cw0825.dts
@@ -128,7 +128,7 @@
 
 &mmc0 {
pinctrl-names = "default";
-   pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin_reference_design>;
+   pinctrl-0 = <&mmc0_pins_a>;
vmmc-supply = <&reg_vcc3v3>;
bus-width = <4>;
cd-gpios = <&pio 7 1 GPIO_ACTIVE_HIGH>; /* PH1 */
diff --git a/arch/arm/boot/dts/sun4i-a10-cubieboard.dts 
b/arch/arm/boot/dts/sun4i-a10-cubieboard.dts
index d844938e2aa7..a698a994e5ff 100644
--- a/arch/arm/boot/dts/sun4i-a10-cubieboard.dts
+++ b/arch/arm/boot/dts/sun4i-a10-cubieboard.dts
@@ -142,7 +142,7 @@
 
 &mmc0 {
pinctrl-names = "default";
-   pinctrl-0 = <&mmc0_pins_a>, <&mmc0_cd_pin_reference_design>;
+   pinctrl-0 = <&mmc0_pins_a>;
vmmc-supply = <&reg_vcc3v3>;
bus-width = <4>;
cd-gpios = <&pio 7 1 GPIO_ACTIVE_HIGH>; /* PH1 */
diff --git a/arch/arm/boot/dts/sun4i-a10-dserve-dsrv9703c.dts 
b/arch/arm/boot/dts/sun4i-a10-dserve-dsrv9703c.dts
index aad3bec1cb39..e0777ae808c7 100644
--- a/arch/arm/boot/dts/sun4i-a10-dserve-dsrv9703c.dts
+++ b/arch/arm/boot/dts/sun4i-a10-dser

[PATCH 1/4] ARM: sunxi: common-regulators: Drop pinmux settings for GPIO pins

2017-04-18 Thread Chen-Yu Tsai

As part of our effort to move pinctrl/GPIO interlocking into the
driver where it belongs, this patch drops the definition and usage
of the pinmux settings for the common regulators defined in
sunxi-common-regulators.dtsi.

Signed-off-by: Chen-Yu Tsai 
---
 arch/arm/boot/dts/sun4i-a10-hyundai-a7hd.dts   |  4 ---
 arch/arm/boot/dts/sun5i-a10s-auxtek-t003.dts   |  8 --
 arch/arm/boot/dts/sun5i-a10s-auxtek-t004.dts   |  4 ---
 arch/arm/boot/dts/sun5i-a10s-olinuxino-micro.dts   |  4 ---
 arch/arm/boot/dts/sun5i-a10s-wobo-i5.dts   |  4 ---
 .../boot/dts/sun5i-a13-empire-electronix-d709.dts  |  4 ---
 arch/arm/boot/dts/sun5i-a13-hsg-h702.dts   |  5 
 arch/arm/boot/dts/sun5i-a13-olinuxino.dts  |  4 ---
 arch/arm/boot/dts/sun6i-a31-hummingbird.dts|  5 
 arch/arm/boot/dts/sun7i-a20-lamobo-r1.dts  |  8 --
 arch/arm/boot/dts/sun7i-a20-pcduino3.dts   |  4 ---
 arch/arm/boot/dts/sun8i-h3-orangepi-2.dts  |  4 ---
 arch/arm/boot/dts/sunxi-common-regulators.dtsi | 30 --
 13 files changed, 88 deletions(-)

diff --git a/arch/arm/boot/dts/sun4i-a10-hyundai-a7hd.dts 
b/arch/arm/boot/dts/sun4i-a10-hyundai-a7hd.dts
index 85dcf81ab64e..bc4351bb851f 100644
--- a/arch/arm/boot/dts/sun4i-a10-hyundai-a7hd.dts
+++ b/arch/arm/boot/dts/sun4i-a10-hyundai-a7hd.dts
@@ -120,10 +120,6 @@
status = "okay";
 };
 
-&usb2_vbus_pin_a {
-   pins = "PH6";
-};
-
 &usb_otg {
dr_mode = "otg";
status = "okay";
diff --git a/arch/arm/boot/dts/sun5i-a10s-auxtek-t003.dts 
b/arch/arm/boot/dts/sun5i-a10s-auxtek-t003.dts
index c6f742a7e69f..d2dee8d434bf 100644
--- a/arch/arm/boot/dts/sun5i-a10s-auxtek-t003.dts
+++ b/arch/arm/boot/dts/sun5i-a10s-auxtek-t003.dts
@@ -136,14 +136,6 @@
status = "okay";
 };
 
-&usb0_vbus_pin_a {
-   pins = "PG13";
-};
-
-&usb1_vbus_pin_a {
-   pins = "PB10";
-};
-
 &usb_otg {
dr_mode = "host";
status = "okay";
diff --git a/arch/arm/boot/dts/sun5i-a10s-auxtek-t004.dts 
b/arch/arm/boot/dts/sun5i-a10s-auxtek-t004.dts
index a27c3fa58736..16f839df4227 100644
--- a/arch/arm/boot/dts/sun5i-a10s-auxtek-t004.dts
+++ b/arch/arm/boot/dts/sun5i-a10s-auxtek-t004.dts
@@ -168,10 +168,6 @@
status = "okay";
 };
 
-&usb1_vbus_pin_a {
-   pins = "PG13";
-};
-
 &usbphy {
pinctrl-names = "default";
pinctrl-0 = <&usb0_id_detect_pin>;
diff --git a/arch/arm/boot/dts/sun5i-a10s-olinuxino-micro.dts 
b/arch/arm/boot/dts/sun5i-a10s-olinuxino-micro.dts
index 894f874a5beb..eff36fe1aaa3 100644
--- a/arch/arm/boot/dts/sun5i-a10s-olinuxino-micro.dts
+++ b/arch/arm/boot/dts/sun5i-a10s-olinuxino-micro.dts
@@ -271,10 +271,6 @@
status = "okay";
 };
 
-&usb0_vbus_pin_a {
-   pins = "PG11";
-};
-
 &usbphy {
pinctrl-names = "default";
pinctrl-0 = <&usb0_id_detect_pin>;
diff --git a/arch/arm/boot/dts/sun5i-a10s-wobo-i5.dts 
b/arch/arm/boot/dts/sun5i-a10s-wobo-i5.dts
index ea3e5655a61b..5482be174e12 100644
--- a/arch/arm/boot/dts/sun5i-a10s-wobo-i5.dts
+++ b/arch/arm/boot/dts/sun5i-a10s-wobo-i5.dts
@@ -216,10 +216,6 @@
status = "okay";
 };
 
-&usb1_vbus_pin_a {
-   pins = "PG12";
-};
-
 &usbphy {
usb1_vbus-supply = <&reg_usb1_vbus>;
status = "okay";
diff --git a/arch/arm/boot/dts/sun5i-a13-empire-electronix-d709.dts 
b/arch/arm/boot/dts/sun5i-a13-empire-electronix-d709.dts
index 34411d27aadf..3dbb0d7c2f8c 100644
--- a/arch/arm/boot/dts/sun5i-a13-empire-electronix-d709.dts
+++ b/arch/arm/boot/dts/sun5i-a13-empire-electronix-d709.dts
@@ -207,10 +207,6 @@
status = "okay";
 };
 
-&usb0_vbus_pin_a {
-   pins = "PG12";
-};
-
 &usbphy {
pinctrl-names = "default";
pinctrl-0 = <&usb0_id_detect_pin>, <&usb0_vbus_detect_pin>;
diff --git a/arch/arm/boot/dts/sun5i-a13-hsg-h702.dts 
b/arch/arm/boot/dts/sun5i-a13-hsg-h702.dts
index 2489c16f7efa..584fa579ded2 100644
--- a/arch/arm/boot/dts/sun5i-a13-hsg-h702.dts
+++ b/arch/arm/boot/dts/sun5i-a13-hsg-h702.dts
@@ -186,7 +186,6 @@
 };
 
 &reg_usb0_vbus {
-   pinctrl-0 = <&usb0_vbus_pin_a>;
gpio = <&pio 6 12 GPIO_ACTIVE_HIGH>; /* PG12 */
status = "okay";
 };
@@ -202,10 +201,6 @@
status = "okay";
 };
 
-&usb0_vbus_pin_a {
-   pins = "PG12";
-};
-
 &usbphy {
pinctrl-names = "default";
pinctrl-0 = <&usb0_id_detect_pin>, <&usb0_vbus_detect_pin>;
diff --git a/arch/arm/boot/dts/sun5i-a13-olinuxino.dts 
b/arch/arm/boot/dts/sun5i-a13-olinuxino.dts
index 95f591bb8ced..38072c7e10e2 100644
--- a/arch/arm/boot/dts/sun5i-a13-olinuxino.dts
+++ b/arch/arm/boot/dts/sun5i-a13-olinuxino.dts
@@ -269,10 +269,6 @@
status = "okay";
 };
 
-&usb0_vbus_pin_a {
-   pins = "PG12";
-};
-
 &usbphy {
pinctrl-names = "default";
pinctrl-0 = <&usb0_id_detect_pin>, <&usb0_vbus_detect_pin>;
diff --git a/arch/arm/boot/dts/sun6i-a31-hummingbird.dts 
b/arch/arm/boot/dts/sun6i-a31-hummingbird.dts
index d4f74f476f25..b4c87a23e3f8 100644
--

Re: bfq-mq performance comparison to cfq

2017-04-18 Thread Bart Van Assche

On 04/11/17 00:29, Paolo Valente wrote:
>
>> On 10 Apr 2017, at 17:15, Bart Van Assche wrote:
>>
>> On Mon, 2017-04-10 at 11:55 +0200, Paolo Valente wrote:
>>> That said, if you do always want maximum throughput, even at the
>>> expense of latency, then just switch off low-latency heuristics, i.e.,
>>> set low_latency to 0.  Depending on the device, setting slice_idle to
>>> 0 may help a lot too (as it does with CFQ).  If the throughput is
>>> still low even after forcing BFQ into a throughput-only mode, then you
>>> have hit some bug, and I'll have a little more work to do ...
>>
>> Has it been considered to make applications tell the I/O scheduler
>> whether to optimize for latency or for throughput? It shouldn't be that
>> hard for window managers and shells to figure out whether or not a new
>> application that is being started is interactive or not. This would
>> require a mechanism that allows applications to provide such information
>> to the I/O scheduler. Wouldn't that be a better approach than the I/O
>> scheduler trying to guess whether or not an application is an interactive
>> application?
>
> IMO that would be an (or maybe the) optimal solution, in terms of both
> throughput and latency.  We have even developed a prototype doing what
> you propose, for Android.  Unfortunately, I have not yet succeeded in
> getting support, to turn it into candidate production code, or to make
> a similar solution for lsb-compliant systems.

Hello Paolo,

What API was used by the Android application to tell the I/O scheduler 
to optimize for latency? Do you think that it would be sufficient if the 
application uses the ioprio_set() system call to set the I/O priority to 
IOPRIO_CLASS_RT?
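
For reference, a minimal sketch of what such a call could look like from
user space. glibc provides no wrapper for ioprio_set(), so the sketch goes
through syscall(2). The SKETCH_* constants are defined locally to mirror the
kernel's encoding (class in the upper bits, shifted by 13), and
ioprio_value() and set_rt_ioprio() are hypothetical helper names. Whether
this alone would be a sufficient hint for the scheduler is exactly the open
question.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of the kernel's ioprio encoding, named to avoid clashes. */
#define SKETCH_IOPRIO_CLASS_SHIFT 13
#define SKETCH_IOPRIO_CLASS_RT    1
#define SKETCH_IOPRIO_WHO_PROCESS 1

/* Pack a scheduling class and a priority level (0..7) into one value. */
static int ioprio_value(int io_class, int level)
{
    return (io_class << SKETCH_IOPRIO_CLASS_SHIFT) | level;
}

/*
 * Ask for the real-time I/O class for the calling process (pid 0 = self).
 * Returns 0 on success, -1 with errno set otherwise; the RT class needs
 * privilege, so an unprivileged caller should expect EPERM.
 */
static long set_rt_ioprio(int level)
{
    return syscall(SYS_ioprio_set, SKETCH_IOPRIO_WHO_PROCESS, 0,
                   ioprio_value(SKETCH_IOPRIO_CLASS_RT, level));
}
```

A launcher could call set_rt_ioprio(4) before exec'ing an interactive
program, since the I/O priority is inherited across exec.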

Thanks,

Bart.

Re: Doubt on first access for PCIe device

2017-04-18 Thread Jon Masters

On 04/11/2017 10:15 AM, abhijit wrote:

> Here I am assuming the completer ID will be the device number and function
> number that will eventually be programmed into the device. In that case, my
> question is: without the first write, how is the read request (VENDOR ID
> read) serviced/routed?

You'll want to read about PCIe enumeration at boot time and how the BIOS
walks the topology to assign these (which an OS may later re-number). In
particular, read about ECAM for configuration. This uses memory mapped
config space read/write accessors that target a memory address space
under the control of the root complex.
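
As a small illustration of the ECAM layout: each function gets a 4 KiB
window, and the bus, device and function numbers are encoded in the address
itself, which is how a vendor ID read can be routed before anything has been
written to the device. The sketch below assumes an already-mapped ECAM
window passed in as a plain pointer; ecam_offset() and read_vendor_id() are
hypothetical helper names, not kernel APIs.

```c
#include <stdint.h>

/*
 * Byte offset of a config register inside an ECAM window:
 * bus in bits 27..20, device in 19..15, function in 14..12,
 * register offset in 11..0.
 */
static uint64_t ecam_offset(unsigned int bus, unsigned int dev,
                            unsigned int fn, unsigned int reg)
{
    return ((uint64_t)(bus & 0xff) << 20) |
           ((uint64_t)(dev & 0x1f) << 15) |
           ((uint64_t)(fn & 0x07) << 12) |
           (uint64_t)(reg & 0xfff);
}

/*
 * Read the 16-bit vendor ID (config offset 0) of bus/dev/fn from a mapped
 * ECAM window. Bytes are combined by hand so the result does not depend on
 * host endianness. A value of 0xffff means no function responded.
 */
static uint16_t read_vendor_id(const volatile uint8_t *ecam_base,
                               unsigned int bus, unsigned int dev,
                               unsigned int fn)
{
    const volatile uint8_t *p = ecam_base + ecam_offset(bus, dev, fn, 0);

    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Enumeration is then a walk over bus/device/function numbers calling
something like read_vendor_id() and skipping the slots that return 0xffff.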

Jon.

Re: [PATCH 2/3] drm/vc4: Don't try to initialize FBDEV if we're only bound to V3D.

2017-04-18 Thread Daniel Vetter

On Tue, Apr 18, 2017 at 9:11 PM, Eric Anholt  wrote:
> The FBDEV initialization would throw an error in dmesg, when we just
> want to silently not initialize fbdev on a V3D-only VC4 instance.
>
> Signed-off-by: Eric Anholt 

Hm, this shouldn't be an error really, you might want to hotplug more
connectors later on. What exactly complains?
-Daniel

> ---
>  drivers/gpu/drm/vc4/vc4_kms.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
> index ad7925a9e0ea..237a504f11f0 100644
> --- a/drivers/gpu/drm/vc4/vc4_kms.c
> +++ b/drivers/gpu/drm/vc4/vc4_kms.c
> @@ -230,10 +230,12 @@ int vc4_kms_load(struct drm_device *dev)
>
> drm_mode_config_reset(dev);
>
> -   vc4->fbdev = drm_fbdev_cma_init(dev, 32,
> -   dev->mode_config.num_connector);
> -   if (IS_ERR(vc4->fbdev))
> -   vc4->fbdev = NULL;
> +   if (dev->mode_config.num_connector) {
> +   vc4->fbdev = drm_fbdev_cma_init(dev, 32,
> +   
> dev->mode_config.num_connector);
> +   if (IS_ERR(vc4->fbdev))
> +   vc4->fbdev = NULL;
> +   }
>
> drm_kms_helper_poll_init(dev);
>
> --
> 2.11.0
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: [PATCH] make TIOCSTI ioctl require CAP_SYS_ADMIN

2017-04-18 Thread Serge E. Hallyn

On Tue, Apr 18, 2017 at 11:45:26PM -0400, Matt Brown wrote:
> This patch reproduces GRKERNSEC_HARDEN_TTY functionality from the grsecurity
> project in-kernel.
> 
> This will create the Kconfig SECURITY_TIOCSTI_RESTRICT and the corresponding
> sysctl kernel.tiocsti_restrict that, when activated, restricts all TIOCSTI
> ioctl calls from non-CAP_SYS_ADMIN users.
> 
> Possible effects on userland:
> 
> There could be a few user programs that would be affected by this
> change.
> See: 
> notable programs are: agetty, csh, xemacs and tcsh
> 
> However, I still believe that this change is worth it given that the
> Kconfig defaults to n. This will be a feature that is turned on for the

It's not worthless, but note that for instance before this was fixed
in lxc, this patch would not have helped with escapes from privileged
containers.

> same reason that people activate it when using grsecurity. Users of this
> opt-in feature will realize that they are choosing security over some OS
> features like unprivileged TIOCSTI ioctls, as should be clear in the
> Kconfig help message.
> 
> Threat Model/Patch Rationale:
> 
> From grsecurity's config for GRKERNSEC_HARDEN_TTY.
> 
>  | There are very few legitimate uses for this functionality and it
>  | has made vulnerabilities in several 'su'-like programs possible in
>  | the past.  Even without these vulnerabilities, it provides an
>  | attacker with an easy mechanism to move laterally among other
>  | processes within the same user's compromised session.
> 
> So if one process within a tty session becomes compromised, it follows
> that additional processes, thought to be in different security
> boundaries, can be compromised as a result. When using a program like su
> or sudo, these additional processes could be in a tty session where TTY file
> descriptors are indeed shared over privilege boundaries.
> 
> This is also an excellent writeup about the issue:
> 
> 
> Signed-off-by: Matt Brown 
> ---
>  drivers/tty/tty_io.c |  4 
>  include/linux/tty.h  |  2 ++
>  kernel/sysctl.c  | 12 
>  security/Kconfig | 13 +
>  4 files changed, 31 insertions(+)
> 
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index e6d1a65..31894e8 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -2296,11 +2296,15 @@ static int tty_fasync(int fd, struct file *filp, int 
> on)
>   *   FIXME: may race normal receive processing
>   */
>  
> +int tiocsti_restrict = IS_ENABLED(CONFIG_SECURITY_TIOCSTI_RESTRICT);
> +
>  static int tiocsti(struct tty_struct *tty, char __user *p)
>  {
>   char ch, mbz = 0;
>   struct tty_ldisc *ld;
>  
> + if (tiocsti_restrict && !capable(CAP_SYS_ADMIN))
> + return -EPERM;
>   if ((current->signal->tty != tty) && !capable(CAP_SYS_ADMIN))
>   return -EPERM;
>   if (get_user(ch, p))
> diff --git a/include/linux/tty.h b/include/linux/tty.h
> index 1017e904..7011102 100644
> --- a/include/linux/tty.h
> +++ b/include/linux/tty.h
> @@ -342,6 +342,8 @@ struct tty_file_private {
>   struct list_head list;
>  };
>  
> +extern int tiocsti_restrict;
> +
>  /* tty magic number */
>  #define TTY_MAGIC0x5401
>  
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index acf0a5a..68d1363 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -67,6 +67,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -833,6 +834,17 @@ static struct ctl_table kern_table[] = {
>   .extra2 = &two,
>   },
>  #endif
> +#if defined CONFIG_TTY
> + {
> + .procname   = "tiocsti_restrict",
> + .data   = &tiocsti_restrict,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = proc_dointvec_minmax_sysadmin,
> + .extra1 = &zero,
> + .extra2 = &one,
> + },
> +#endif
>   {
>   .procname   = "ngroups_max",
>   .data   = &ngroups_max,
> diff --git a/security/Kconfig b/security/Kconfig
> index 3ff1bf9..7d13331 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -18,6 +18,19 @@ config SECURITY_DMESG_RESTRICT
>  
> If you are unsure how to answer this question, answer N.
>  
> +config SECURITY_TIOCSTI_RESTRICT

This is an odd way to name this.  Shouldn't the name reflect that it
is setting the default, rather than enabling the feature?

Besides that, I'm ok with the patch.

> + bool "Restrict unprivileged use of tiocsti command injection"
> + default n
> + help
> +   This enforces restrictions on unprivileged users injecting commands
> +   into other processes which share a tty session using the TIOCSTI
> +   ioctl. This option makes TIOCSTI use require CAP_SYS_ADMI

Re: [PATCH v2 2/2] drm: dw-hdmi: gate audio clock from the I2S enablement callbacks

2017-04-18 Thread Archit Taneja




On 04/14/2017 02:01 PM, Romain Perier wrote:

Currently, the audio sampler clock is enabled from dw_hdmi_setup() at
step E. and is kept enabled for later use. This clock should be enabled
and disabled along with the actual audio stream and not always on (that
is bad for PM). Futhermore, as described by the datasheet, the I2S


s/Futhermore/Furthermore


variant need to gate/ungate the clock when the stream is


s/need/needs


enabled/disabled.

This commit adds a parameter to hdmi_audio_enable_clk() that controls
when the audio sample clock must be enabled or disabled. Then, it adds
the call to this function from dw_hdmi_i2s_audio_enable() and
dw_hdmi_i2s_audio_disable().

Signed-off-by: Romain Perier 
---
 drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index 5b328c0..a6da634 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -544,6 +544,12 @@ void dw_hdmi_set_sample_rate(struct dw_hdmi *hdmi, unsigned int rate)
 }
 EXPORT_SYMBOL_GPL(dw_hdmi_set_sample_rate);

+static void hdmi_enable_audio_clk(struct dw_hdmi *hdmi, bool enable)
+{
+   hdmi_modb(hdmi, enable ? 0 : HDMI_MC_CLKDIS_AUDCLK_DISABLE,
+ HDMI_MC_CLKDIS_AUDCLK_DISABLE, HDMI_MC_CLKDIS);
+}
+
 void dw_hdmi_ahb_audio_enable(struct dw_hdmi *hdmi)
 {
hdmi_set_cts_n(hdmi, hdmi->audio_cts, hdmi->audio_n);
@@ -557,6 +563,12 @@ void dw_hdmi_ahb_audio_disable(struct dw_hdmi *hdmi)
 void dw_hdmi_i2s_audio_enable(struct dw_hdmi *hdmi)
 {
hdmi_set_cts_n(hdmi, hdmi->audio_cts, hdmi->audio_n);
+   hdmi_enable_audio_clk(hdmi, true);
+}
+
+void dw_hdmi_i2s_audio_disable(struct dw_hdmi *hdmi)
+{
+   hdmi_enable_audio_clk(hdmi, false);
 }


This should be static too.

If you're okay with the suggestions, I can fix these myself and push. Let
me know if that's okay.

Thanks,
Archit



 void dw_hdmi_audio_enable(struct dw_hdmi *hdmi)
@@ -1592,11 +1604,6 @@ static void dw_hdmi_enable_video_path(struct dw_hdmi *hdmi)
HDMI_MC_FLOWCTRL);
 }

-static void hdmi_enable_audio_clk(struct dw_hdmi *hdmi)
-{
-   hdmi_modb(hdmi, 0, HDMI_MC_CLKDIS_AUDCLK_DISABLE, HDMI_MC_CLKDIS);
-}
-
 /* Workaround to clear the overflow condition */
 static void dw_hdmi_clear_overflow(struct dw_hdmi *hdmi)
 {
@@ -1710,7 +1717,7 @@ static int dw_hdmi_setup(struct dw_hdmi *hdmi, struct drm_display_mode *mode)

/* HDMI Initialization Step E - Configure audio */
hdmi_clk_regenerator_update_pixel_clock(hdmi);
-   hdmi_enable_audio_clk(hdmi);
+   hdmi_enable_audio_clk(hdmi, true);
}

/* not for DVI mode */
@@ -2438,6 +2445,7 @@ __dw_hdmi_probe(struct platform_device *pdev,
audio.write = hdmi_writeb;
audio.read  = hdmi_readb;
hdmi->enable_audio = dw_hdmi_i2s_audio_enable;
+   hdmi->disable_audio = dw_hdmi_i2s_audio_disable;

pdevinfo.name = "dw-hdmi-i2s-audio";
pdevinfo.data = &audio;



--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
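
The new hdmi_enable_audio_clk(hdmi, enable) helper in the patch above gates the clock through a masked register update: writing the disable bit gates the clock, clearing it ungates the clock. The following is only a userspace sketch of that idea, assuming hdmi_modb() changes nothing but the masked bits. The function names modb() and enable_audio_clk() and the value of AUDCLK_DISABLE_BIT are illustrative assumptions, not taken from the driver headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for HDMI_MC_CLKDIS_AUDCLK_DISABLE; the real value
 * is defined in the driver's register header, which the patch does not show. */
#define AUDCLK_DISABLE_BIT 0x08u

/* Model of a masked update: only the bits set in mask are taken from data,
 * every other bit of the register value is preserved. */
static uint8_t modb(uint8_t reg, uint8_t data, uint8_t mask)
{
	return (uint8_t)((reg & ~mask) | (data & mask));
}

/* Mirrors the shape of the patched helper: enable == true clears the
 * disable bit (clock runs), enable == false sets it (clock gated). */
uint8_t enable_audio_clk(uint8_t clkdis, bool enable)
{
	return modb(clkdis, enable ? 0 : AUDCLK_DISABLE_BIT, AUDCLK_DISABLE_BIT);
}
```

In this model, calling enable_audio_clk() twice with the same argument leaves the register value unchanged, which is why the I2S enable and disable callbacks can call it without tracking the previous state.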

Re: [PATCH v3 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src

2017-04-18 Thread Michael Ellerman

Peter Zijlstra  writes:

> On Tue, Apr 11, 2017 at 07:21:05AM +0530, Madhavan Srinivasan wrote:
>> From: Sukadev Bhattiprolu 
>> 
>> perf_mem_data_src is an union that is initialized via the ->val field
>> and accessed via the bitmap fields. For this to work on big endian
>> platforms (Which is broken now), we also need a big-endian represenation
>> of perf_mem_data_src. i.e, in a big endian system, if user request
>> PERF_SAMPLE_DATA_SRC (perf report -d), will get the default value from
>> perf_sample_data_init(), which is PERF_MEM_NA. Value for PERF_MEM_NA
>> is constructed using shifts:
>> 
>>   /* TLB access */
>>   #define PERF_MEM_TLB_NA0x01 /* not available */
>>   ...
>>   #define PERF_MEM_TLB_SHIFT 26
>> 
>>   #define PERF_MEM_S(a, s) \
>>  (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
>> 
>>   #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
>>  PERF_MEM_S(LVL, NA)   |\
>>  PERF_MEM_S(SNOOP, NA) |\
>>  PERF_MEM_S(LOCK, NA)  |\
>>  PERF_MEM_S(TLB, NA))
>> 
>> Which works out as:
>> 
>>   ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))
>> 
>> Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
>> in CPU endian.
>> 
>> But then in the perf tool, the code uses the bitfields to inspect the
>> value, and currently the bitfields are defined using little endian
>> ordering.
>> 
>> So eg. in perf_mem__tlb_scnprintf() we see:
>>   data_src->val = 0x5080021
>>  op = 0x0
>> lvl = 0x0
>>   snoop = 0x0
>>lock = 0x0
>>dtlb = 0x0
>>rsvd = 0x5080021
>> 
>> Patch does a minimal fix of adding big endian definition of the bitfields
>> to match the values that are already exported by the kernel on big endian.
>> And it makes no change on little endian.
>
> I think it is important to note that there are no current big-endian
> users. So 'fixing' this will not break anybody and will ensure future
> users (next patch) will work correctly.

Actually that's only partly true. As I describe above the PERF_MEM_NA
value is currently exported on BE platforms when a user requests it.

So I added this text after the output from perf_mem__tlb_scnprintf():

  Because of the way the perf tool code is written this is still displayed to
  the user as "N/A", so there is no bug visible at the UI level.
  
  Currently there are no big endian architectures which export a meaningful
  value (ie. other than PERF_MEM_NA), so the extent of the bug on big endian
  platforms is that the PERF_MEM_NA value is exported incorrectly as described
  above. Subsequent patches will add support on big endian powerpc for
  populating the data source value.


Hope that is clear.

It also occurred to me that we don't actually have to redefine the whole
union, it's only the bitfields that matter, so we could reduce the diff
to:

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..97152c79df6b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -894,12 +894,23 @@ enum perf_callchain_context {
 union perf_mem_data_src {
__u64 val;
struct {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
__u64   mem_op:5,   /* type of opcode */
mem_lvl:14, /* memory hierarchy level */
mem_snoop:5,/* snoop mode */
mem_lock:2, /* lock instr */
mem_dtlb:7, /* tlb access */
mem_rsvd:31;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+   __u64   mem_rsvd:31,
+   mem_dtlb:7, /* tlb access */
+   mem_lock:2, /* lock instr */
+   mem_snoop:5,/* snoop mode */
+   mem_lvl:14, /* memory hierarchy level */
+   mem_op:5;   /* type of opcode */
+#else
+#error "Unknown endianness"
+#endif
};
 };
 

That looks better to me, thoughts?

cheers
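
For readers following the arithmetic in the quoted commit message, here is a small sketch that rebuilds the PERF_MEM_NA value from the quoted shifts and decodes it with explicit shifts and masks, which give the same answer regardless of how the compiler lays out bitfields. This is not what the patch does (the patch keeps the bitfields and adds a big-endian layout); the struct and function names below are hypothetical, and the field widths are the ones in the union shown above.

```c
#include <stdint.h>

/* Shift values as quoted in the commit message; every *_NA value is 0x01. */
#define MEM_OP_SHIFT     0
#define MEM_LVL_SHIFT    5
#define MEM_SNOOP_SHIFT  19
#define MEM_LOCK_SHIFT   24
#define MEM_TLB_SHIFT    26

/* Hypothetical plain struct used only to return the decoded fields. */
struct mem_fields {
	unsigned int op, lvl, snoop, lock, dtlb;
};

/* Rebuild the "not available" default exactly as the commit message does. */
uint64_t mem_na_value(void)
{
	return ((uint64_t)0x01 << MEM_OP_SHIFT) |
	       ((uint64_t)0x01 << MEM_LVL_SHIFT) |
	       ((uint64_t)0x01 << MEM_SNOOP_SHIFT) |
	       ((uint64_t)0x01 << MEM_LOCK_SHIFT) |
	       ((uint64_t)0x01 << MEM_TLB_SHIFT);
}

/* Decode with shifts and masks; widths are 5, 14, 5, 2 and 7 bits. */
struct mem_fields decode_mem_data_src(uint64_t val)
{
	struct mem_fields f;

	f.op    = (unsigned int)((val >> MEM_OP_SHIFT) & 0x1f);
	f.lvl   = (unsigned int)((val >> MEM_LVL_SHIFT) & 0x3fff);
	f.snoop = (unsigned int)((val >> MEM_SNOOP_SHIFT) & 0x1f);
	f.lock  = (unsigned int)((val >> MEM_LOCK_SHIFT) & 0x3);
	f.dtlb  = (unsigned int)((val >> MEM_TLB_SHIFT) & 0x7f);
	return f;
}
```

Decoding 0x5080021 this way yields 0x1 in every field, which is the result the big-endian bitfield layout is meant to reproduce.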

Re: [PATCH v2 1/2] drm: dw-hdmi: add specific I2S and AHB functions for stream handling

2017-04-18 Thread Archit Taneja




On 04/14/2017 02:01 PM, Romain Perier wrote:

Currently, CTS+N is forced to zero as a workaround of the IP block for
i.MX platforms. This is requested in the datasheet of the corresponding
IP for AHB mode only. However, we have seen that it introduces glitches
or delays when playing a sound on HDMI for I2S mode. This proves that we
cannot keep the current functions for handling the audio stream as-is if
these contain workarounds that are specific to a mode.

This commit introduces two callbacks, one for each variant.
dw_hdmi_setup defines the right function depending on the detected
variant. Then, the exported functions dw_hdmi_audio_enable and
dw_hdmi_audio_disable call the corresponding callbacks.

Signed-off-by: Romain Perier 
---
 drivers/gpu/drm/bridge/synopsys/dw-hdmi.c | 26 --
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
index 4b6f216..5b328c0 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi.c
@@ -173,6 +173,8 @@ struct dw_hdmi {

unsigned int reg_shift;
struct regmap *regm;
+   void (*enable_audio)(struct dw_hdmi *hdmi);
+   void (*disable_audio)(struct dw_hdmi *hdmi);
 };

 #define HDMI_IH_PHY_STAT0_RX_SENSE \
@@ -542,13 +544,29 @@ void dw_hdmi_set_sample_rate(struct dw_hdmi *hdmi, unsigned int rate)
 }
 EXPORT_SYMBOL_GPL(dw_hdmi_set_sample_rate);

+void dw_hdmi_ahb_audio_enable(struct dw_hdmi *hdmi)
+{
+   hdmi_set_cts_n(hdmi, hdmi->audio_cts, hdmi->audio_n);
+}
+
+void dw_hdmi_ahb_audio_disable(struct dw_hdmi *hdmi)
+{
+   hdmi_set_cts_n(hdmi, hdmi->audio_cts, 0);
+}
+
+void dw_hdmi_i2s_audio_enable(struct dw_hdmi *hdmi)
+{
+   hdmi_set_cts_n(hdmi, hdmi->audio_cts, hdmi->audio_n);
+}
+


I get some sparse warnings asking for the above 3 to be static.

Thanks,
Archit


 void dw_hdmi_audio_enable(struct dw_hdmi *hdmi)
 {
unsigned long flags;

spin_lock_irqsave(&hdmi->audio_lock, flags);
hdmi->audio_enable = true;
-   hdmi_set_cts_n(hdmi, hdmi->audio_cts, hdmi->audio_n);
+   if (hdmi->enable_audio)
+   hdmi->enable_audio(hdmi);
spin_unlock_irqrestore(&hdmi->audio_lock, flags);
 }
 EXPORT_SYMBOL_GPL(dw_hdmi_audio_enable);
@@ -559,7 +577,8 @@ void dw_hdmi_audio_disable(struct dw_hdmi *hdmi)

spin_lock_irqsave(&hdmi->audio_lock, flags);
hdmi->audio_enable = false;
-   hdmi_set_cts_n(hdmi, hdmi->audio_cts, 0);
+   if (hdmi->disable_audio)
+   hdmi->disable_audio(hdmi);
spin_unlock_irqrestore(&hdmi->audio_lock, flags);
 }
 EXPORT_SYMBOL_GPL(dw_hdmi_audio_disable);
@@ -2404,6 +2423,8 @@ __dw_hdmi_probe(struct platform_device *pdev,
audio.irq = irq;
audio.hdmi = hdmi;
audio.eld = hdmi->connector.eld;
+   hdmi->enable_audio = dw_hdmi_ahb_audio_enable;
+   hdmi->disable_audio = dw_hdmi_ahb_audio_disable;

pdevinfo.name = "dw-hdmi-ahb-audio";
pdevinfo.data = &audio;
@@ -2416,6 +2437,7 @@ __dw_hdmi_probe(struct platform_device *pdev,
audio.hdmi  = hdmi;
audio.write = hdmi_writeb;
audio.read  = hdmi_readb;
+   hdmi->enable_audio = dw_hdmi_i2s_audio_enable;

pdevinfo.name = "dw-hdmi-i2s-audio";
pdevinfo.data = &audio;



--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
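
The per-variant dispatch introduced by this patch can be sketched in isolation as follows. This is a simplified userspace model, not the driver code: struct hdmi_model, the model_* functions and the programmed_n field are hypothetical names, the audio_lock spinlock is left out, and "programming the hardware" is reduced to storing a number. It shows why the NULL checks matter: in this version of the patch the I2S variant registers no disable callback, so disabling audio leaves N untouched.

```c
#include <stddef.h>

/* Hypothetical model of the state touched by the callbacks. */
struct hdmi_model {
	unsigned int audio_n;      /* N value requested by the audio driver */
	unsigned int programmed_n; /* N value last written to the "hardware" */
	void (*enable_audio)(struct hdmi_model *hdmi);
	void (*disable_audio)(struct hdmi_model *hdmi);
};

static void ahb_audio_enable(struct hdmi_model *hdmi)
{
	hdmi->programmed_n = hdmi->audio_n;
}

/* AHB-only workaround: force N to zero while audio is disabled. */
static void ahb_audio_disable(struct hdmi_model *hdmi)
{
	hdmi->programmed_n = 0;
}

static void i2s_audio_enable(struct hdmi_model *hdmi)
{
	hdmi->programmed_n = hdmi->audio_n;
}

/* Probe-time selection, one function per detected variant. */
void model_probe_ahb(struct hdmi_model *hdmi)
{
	hdmi->enable_audio = ahb_audio_enable;
	hdmi->disable_audio = ahb_audio_disable;
}

void model_probe_i2s(struct hdmi_model *hdmi)
{
	hdmi->enable_audio = i2s_audio_enable;
	hdmi->disable_audio = NULL; /* no I2S disable hook in this patch */
}

/* Exported entry points: call the hook only if the variant provides one. */
void model_audio_enable(struct hdmi_model *hdmi)
{
	if (hdmi->enable_audio)
		hdmi->enable_audio(hdmi);
}

void model_audio_disable(struct hdmi_model *hdmi)
{
	if (hdmi->disable_audio)
		hdmi->disable_audio(hdmi);
}
```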

Re: [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode

2017-04-18 Thread Jin, Yao




On 4/19/2017 8:53 AM, Jin, Yao wrote:



On 4/19/2017 2:53 AM, Jiri Olsa wrote:

On Wed, Apr 12, 2017 at 06:21:05AM +0800, Jin Yao wrote:

SNIP


+const char *branch_type_name(int type)
+{
+const char *branch_names[PERF_BR_MAX] = {
+"N/A",
+"JCC",
+"JMP",
+"IND_JMP",
+"CALL",
+"IND_CALL",
+"RET",
+"SYSCALL",
+"SYSRET",
+"IRQ",
+"INT",
+"IRET",
+"FAR_BRANCH",
+};
+
+if ((type >= 0) && (type < PERF_BR_MAX))
+return branch_names[type];
+
+return NULL;

looks like we should add util/branch.c with above functions
and merge it with util/parse-branch-options.c

we create new file even for less code ;-)

thanks,
jirka


Could we directly add branch_type_name() in util/parse-branch-options.c?

I just feel it's a bit of a waste to create a new file for so little code. :)

Thanks
Jin Yao


On reflection, yes, creating util/branch.c would be better. I will do that.


Thanks
Jin Yao
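
For reference, the bounds-checked name lookup quoted above can be written as a standalone unit along these lines. This is a sketch only: the enum below is a hypothetical stand-in for the PERF_BR_* constants (only the order of the names is taken from the quoted patch), and the table is made static const with designated initializers so that each name is tied to its enum value rather than to its position in the list.

```c
#include <stddef.h>

/* Hypothetical enum mirroring the order of the names in the quoted patch. */
enum branch_type {
	BR_NONE, BR_JCC, BR_JMP, BR_IND_JMP, BR_CALL, BR_IND_CALL, BR_RET,
	BR_SYSCALL, BR_SYSRET, BR_IRQ, BR_INT, BR_IRET, BR_FAR_BRANCH,
	BR_MAX
};

/* Returns a printable name, or NULL when type is outside the table. */
const char *branch_type_name(int type)
{
	static const char * const names[BR_MAX] = {
		[BR_NONE]       = "N/A",
		[BR_JCC]        = "JCC",
		[BR_JMP]        = "JMP",
		[BR_IND_JMP]    = "IND_JMP",
		[BR_CALL]       = "CALL",
		[BR_IND_CALL]   = "IND_CALL",
		[BR_RET]        = "RET",
		[BR_SYSCALL]    = "SYSCALL",
		[BR_SYSRET]     = "SYSRET",
		[BR_IRQ]        = "IRQ",
		[BR_INT]        = "INT",
		[BR_IRET]       = "IRET",
		[BR_FAR_BRANCH] = "FAR_BRANCH",
	};

	if (type >= 0 && type < BR_MAX)
		return names[type];

	return NULL;
}
```

Callers still have to handle the NULL return for out-of-range values, exactly as with the version in the patch.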

Re: linux-next: build failure after merge of the rcu tree

2017-04-18 Thread Paul E. McKenney

On Wed, Apr 19, 2017 at 01:50:16PM +1000, Stephen Rothwell wrote:
> Hi Paul,
> 
> After merging the rcu tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
> 
> kernel/rcu/rcutorture.c: In function 'rcu_torture_stats_print':
> kernel/rcu/rcutorture.c:1369:3: error: implicit declaration of function 'srcutorture_get_gp_data' [-Werror=implicit-function-declaration]
>srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
>^
> 
> Caused by commit
> 
>   b4d55cac0a93 ("srcu: Make rcutorture writer stalls print SRCU GP state")
> 
> This config has CONFIG_CLASSIC_SRCU=y and CONFIG_RCU_TORTURE_TEST=m, so
> CONFIG_RCU_TORTURE_TEST is not defined - CONFIG_RCU_TORTURE_TEST_MODULE
> is defined.  You probably want to protect srcutorture_get_gp_data() with
> IS_ENABLED(CONFIG_RCU_TORTURE_TEST) instead.
> 
> I have used the rcu tree from next-20170418 for today.

Please accept my apologies!  I forgot about the state of -rcu while
chasing another bug, and only a few minutes ago made the transition
from "Why doesn't this code work?" to "Why didn't my brain work?".  :-/

Will be fixed for tomorrow's -next.  Or at least broken in a more subtle
and creative way.  ;-)

Thanx, Paul

Re: [PATCH v2] usb: dwc3: add disable u2mac linestate check quirk

2017-04-18 Thread wlf


Dear Guenter,


On 2017-04-18 21:18, Guenter Roeck wrote:

On Mon, Apr 17, 2017 at 10:17 PM, William Wu  wrote:

This patch adds a quirk to disable the USB 2.0 MAC linestate check
during HS transmit. According to the dwc3 databook, it can be used on
some special platforms if the linestate does not reflect the expected
line state (J) during transmission.

When this quirk is used, the controller implements a fixed 40-bit
TxEndDelay after the packet is given on UTMI and ignores the
linestate during the transmit of a token (during token-to-token
and token-to-data IPGAP).

On some Rockchip platforms (e.g. rk3399), the u2mac linestate check
must be disabled to decrease the SSPLIT token to SETUP token
inter-packet delay from 566ns to 466ns, which fixes the issue that
FS/LS devices are not recognized when inserted through a USB 3.0 hub.

Signed-off-by: William Wu 
---
Changes in v2:
- fix coding style

  Documentation/devicetree/bindings/usb/dwc3.txt |  2 ++
  drivers/usb/dwc3/core.c| 14 ++
  drivers/usb/dwc3/core.h|  4 
  3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
index f658f39..6a89f0c 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -45,6 +45,8 @@ Optional properties:
 a free-running PHY clock.
   - snps,dis-del-phy-power-chg-quirk: when set core will change PHY power
 from P0 to P1/P2/P3 without delay.
+ - snps,tx-ipgap-linecheck-dis-quirk: when set, disable u2mac linestate check
+   during HS transmit.

All other disable-something quirks are named
"snps,dis-something-quirk". Maybe use the same naming convention ?
Yes, good idea! I will change it to "snps,dis-tx-ipgap-linecheck-quirk"
in the next patch version.

Thanks:-)



   - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
 utmi_l1_suspend_n, false when asserts utmi_sleep_n
   - snps,hird-threshold: HIRD threshold
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 455d89a..03429c5 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -796,15 +796,19 @@ static int dwc3_core_init(struct dwc3 *dwc)
 dwc3_writel(dwc->regs, DWC3_GUCTL2, reg);
 }

+   reg = dwc3_readl(dwc->regs, DWC3_GUCTL1);
+

My understanding is that the register was only introduced with dwc3
revision 2.50a. Is it ok to read and write it unconditionally ?
Yes, according to the dwc3 databook, DWC3_GUCTL1 was introduced in 2.50a.
Maybe it's better to read and write it only when we know our controller
version.

Would it be OK to fix it like the following patch?
The drawback of this patch is that we need to read and write the register
twice if our controller version is >= 2.90a and needs this quirk.

--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -806,6 +806,12 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
}

+   if (dwc->dis_tx_ipgap_linecheck_quirk) {
+   reg = dwc3_readl(dwc->regs, DWC3_GUCTL1);
+   reg |= DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS;
+   dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
+   }
+

Hi John & Felipe,
   Could you give me some suggestions?
   Thank you!

 /*
  * Enable hardware control of sending remote wakeup in HS when
  * the device is in the L1 state.
  */
-   if (dwc->revision >= DWC3_REVISION_290A) {
-   reg = dwc3_readl(dwc->regs, DWC3_GUCTL1);
+   if (dwc->revision >= DWC3_REVISION_290A)
 reg |= DWC3_GUCTL1_DEV_L1_EXIT_BY_HW;
-   dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
-   }
+
+   if (dwc->tx_ipgap_linecheck_dis_quirk)
+   reg |= DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS;
+
+   dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);

 return 0;

@@ -1023,6 +1027,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
 "snps,dis-u2-freeclk-exists-quirk");
 dwc->dis_del_phy_power_chg_quirk = device_property_read_bool(dev,
 "snps,dis-del-phy-power-chg-quirk");
+   dwc->tx_ipgap_linecheck_dis_quirk = device_property_read_bool(dev,
+   "snps,tx-ipgap-linecheck-dis-quirk");

 dwc->tx_de_emphasis_quirk = device_property_read_bool(dev,
 "snps,tx_de_emphasis_quirk");
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 981c77f..3c2537b 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -204,6 +204,7 @@
  #define DWC3_GCTL_DSBLCLKGTNG  BIT(0)

  /* Global User Control 1 Register */
+#define DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS BIT(28)
  #define DWC3_GUCTL1_DEV_L1_EXIT_BY_HW  BIT(24)

  /* Global USB2 P
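
One possible way to reconcile the two concerns raised in this thread (do not touch GUCTL1 on cores older than 2.50a, and do not read and write it twice on 2.90a and later) is to work out first which bits need setting and then perform at most one read-modify-write. The sketch below is not the resolution reached on the list; it only models the decision as a pure function. The two bit positions are the ones in the patch, while REV_240A, REV_250A, REV_290A and the function names are hypothetical placeholders, not the driver's real revision encodings.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Bit positions as defined in the patch above. */
#define GUCTL1_TX_IPGAP_LINECHECK_DIS BIT(28)
#define GUCTL1_DEV_L1_EXIT_BY_HW      BIT(24)

/* Hypothetical revision codes; only their ordering matters here. */
#define REV_240A 0x240au
#define REV_250A 0x250au
#define REV_290A 0x290au

/* Collect every GUCTL1 bit that should be set for this core.
 * A result of 0 means the register does not need to be touched at all. */
uint32_t guctl1_bits_to_set(uint32_t revision, bool dis_linecheck_quirk)
{
	uint32_t bits = 0;

	/* Register assumed absent before 2.50a, per the discussion above. */
	if (revision < REV_250A)
		return 0;

	if (revision >= REV_290A)
		bits |= GUCTL1_DEV_L1_EXIT_BY_HW;

	if (dis_linecheck_quirk)
		bits |= GUCTL1_TX_IPGAP_LINECHECK_DIS;

	return bits;
}

/* Model of the single read-modify-write a caller would then perform. */
uint32_t guctl1_update(uint32_t current, uint32_t revision,
		       bool dis_linecheck_quirk)
{
	return current | guctl1_bits_to_set(revision, dis_linecheck_quirk);
}
```

A caller would read the register only when guctl1_bits_to_set() returns a nonzero mask, OR the mask in, and write it back once.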

linux-next: build failure after merge of the rcu tree

2017-04-18 Thread Stephen Rothwell

Hi Paul,

After merging the rcu tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

kernel/rcu/rcutorture.c: In function 'rcu_torture_stats_print':
kernel/rcu/rcutorture.c:1369:3: error: implicit declaration of function 'srcutorture_get_gp_data' [-Werror=implicit-function-declaration]
   srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
   ^

Caused by commit

  b4d55cac0a93 ("srcu: Make rcutorture writer stalls print SRCU GP state")

This config has CONFIG_CLASSIC_SRCU=y and CONFIG_RCU_TORTURE_TEST=m, so
CONFIG_RCU_TORTURE_TEST is not defined - CONFIG_RCU_TORTURE_TEST_MODULE
is defined.  You probably want to protect srcutorture_get_gp_data() with
IS_ENABLED(CONFIG_RCU_TORTURE_TEST) instead.

I have used the rcu tree from next-20170418 for today.

-- 
Cheers,
Stephen Rothwell
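
The difference between testing a tristate option with #ifdef and with IS_ENABLED() can be reproduced outside the kernel. The sketch below simulates CONFIG_RCU_TORTURE_TEST=m with a plain #define and uses a simplified re-implementation of the preprocessor trick behind the kernel's IS_ENABLED(); the helper macro names are my own, and the code is meant as an illustration of the mechanism rather than a copy of include/linux/kconfig.h.

```c
/* Simulate CONFIG_RCU_TORTURE_TEST=m: for a modular option Kconfig defines
 * only the _MODULE symbol, not the plain one. */
#define CONFIG_RCU_TORTURE_TEST_MODULE 1

/* If the option expands to 1, the pasted name becomes ARG_PLACEHOLDER_1,
 * which expands to "0," and shifts a 1 into the second argument slot.
 * Otherwise the pasted name is junk and the second argument is 0. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED(x) IS_DEFINED_(x)
#define IS_DEFINED_(val) IS_DEFINED__(ARG_PLACEHOLDER_##val)
#define IS_DEFINED__(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)

#define IS_BUILTIN(option) IS_DEFINED(option)
#define IS_MODULE(option)  IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* What a guard written as "#ifdef CONFIG_RCU_TORTURE_TEST" sees. */
int seen_by_ifdef(void)
{
#ifdef CONFIG_RCU_TORTURE_TEST
	return 1;
#else
	return 0;
#endif
}

/* What a guard written with IS_ENABLED() sees. */
int seen_by_is_enabled(void)
{
	return IS_ENABLED(CONFIG_RCU_TORTURE_TEST);
}
```

With the option set to =m, the #ifdef guard hides the declaration while the module build still references it, which matches the failure mode reported above.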

[PATCH] make TIOCSTI ioctl require CAP_SYS_ADMIN

2017-04-18 Thread Matt Brown

This patch reproduces GRKERNSEC_HARDEN_TTY functionality from the grsecurity
project in-kernel.

This will create the Kconfig SECURITY_TIOCSTI_RESTRICT and the corresponding
sysctl kernel.tiocsti_restrict which, when activated, restrict all TIOCSTI
ioctl calls from users without CAP_SYS_ADMIN.

Possible effects on userland:

There could be a few user programs that would be affected by this
change.
See: 
Notable programs are: agetty, csh, xemacs and tcsh.

However, I still believe that this change is worth it given that the
Kconfig defaults to n. This will be a feature that is turned on for the
same reason that people activate it when using grsecurity. Users of this
opt-in feature will realize that they are choosing security over some OS
features like unprivileged TIOCSTI ioctls, as should be clear in the
Kconfig help message.

Threat Model/Patch Rationale:

From grsecurity's config for GRKERNSEC_HARDEN_TTY:

 | There are very few legitimate uses for this functionality and it
 | has made vulnerabilities in several 'su'-like programs possible in
 | the past.  Even without these vulnerabilities, it provides an
 | attacker with an easy mechanism to move laterally among other
 | processes within the same user's compromised session.

So if one process within a tty session becomes compromised, it can follow
that additional processes, thought to be in different security boundaries,
are compromised as a result. When using a program like su or sudo, these
additional processes could be in a tty session where TTY file descriptors
are indeed shared across privilege boundaries.

This is also an excellent writeup about the issue:


Signed-off-by: Matt Brown 
---
 drivers/tty/tty_io.c |  4 
 include/linux/tty.h  |  2 ++
 kernel/sysctl.c  | 12 
 security/Kconfig | 13 +
 4 files changed, 31 insertions(+)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index e6d1a65..31894e8 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -2296,11 +2296,15 @@ static int tty_fasync(int fd, struct file *filp, int on)
  * FIXME: may race normal receive processing
  */
 
+int tiocsti_restrict = IS_ENABLED(CONFIG_SECURITY_TIOCSTI_RESTRICT);
+
 static int tiocsti(struct tty_struct *tty, char __user *p)
 {
char ch, mbz = 0;
struct tty_ldisc *ld;
 
+   if (tiocsti_restrict && !capable(CAP_SYS_ADMIN))
+   return -EPERM;
if ((current->signal->tty != tty) && !capable(CAP_SYS_ADMIN))
return -EPERM;
if (get_user(ch, p))
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 1017e904..7011102 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -342,6 +342,8 @@ struct tty_file_private {
struct list_head list;
 };
 
+extern int tiocsti_restrict;
+
 /* tty magic number */
 #define TTY_MAGIC  0x5401
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index acf0a5a..68d1363 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -833,6 +834,17 @@ static struct ctl_table kern_table[] = {
.extra2 = &two,
},
 #endif
+#if defined CONFIG_TTY
+   {
+   .procname   = "tiocsti_restrict",
+   .data   = &tiocsti_restrict,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax_sysadmin,
+   .extra1 = &zero,
+   .extra2 = &one,
+   },
+#endif
{
.procname   = "ngroups_max",
.data   = &ngroups_max,
diff --git a/security/Kconfig b/security/Kconfig
index 3ff1bf9..7d13331 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -18,6 +18,19 @@ config SECURITY_DMESG_RESTRICT
 
  If you are unsure how to answer this question, answer N.
 
+config SECURITY_TIOCSTI_RESTRICT
+   bool "Restrict unprivileged use of tiocsti command injection"
+   default n
+   help
+ This enforces restrictions on unprivileged users injecting commands
+ into other processes which share a tty session using the TIOCSTI
+ ioctl. This option makes TIOCSTI use require CAP_SYS_ADMIN.
+
+ If this option is not selected, no restrictions will be enforced
+ unless the tiocsti_restrict sysctl is explicitly set to (1).
+
+ If you are unsure how to answer this question, answer N.
+
 config SECURITY
bool "Enable different security models"
depends on SYSFS
-- 
2.10.2
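
From userspace, the effect of the new check would show up as the error returned by the TIOCSTI ioctl. The helper below is a minimal sketch of such a caller; tiocsti_inject() is a hypothetical name, and the only real interfaces used are ioctl(2), TIOCSTI and errno. With the proposed tiocsti_restrict sysctl set to 1, the patch would make this call fail with EPERM for a process lacking CAP_SYS_ADMIN, even on its own controlling terminal.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Try to push one character into the input queue of the terminal behind fd.
 * Returns 0 on success or a negative errno value on failure. */
int tiocsti_inject(int fd, char c)
{
	if (ioctl(fd, TIOCSTI, &c) < 0)
		return -errno;

	return 0;
}
```

A program that depends on TIOCSTI, such as the ones listed in the patch description, could use the return value to tell a policy denial (-EPERM) apart from other failures such as a descriptor that is not a terminal.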

Re: [PATCH v2 8/9] staging: fsl-dpaa2/eth: Add TODO file

2017-04-18 Thread Stuart Yoder

On Wed, Apr 12, 2017 at 11:25 AM, Ioana Radulescu
 wrote:
> Add a list of TODO items for the Ethernet driver
>
> Signed-off-by: Ioana Radulescu 
> ---
> v2: Add note
>
>  drivers/staging/fsl-dpaa2/ethernet/TODO | 14 ++
>  1 file changed, 14 insertions(+)
>  create mode 100644 drivers/staging/fsl-dpaa2/ethernet/TODO
>
> diff --git a/drivers/staging/fsl-dpaa2/ethernet/TODO b/drivers/staging/fsl-dpaa2/ethernet/TODO
> new file mode 100644
> index 000000000000..110e66d44b42
> --- /dev/null
> +++ b/drivers/staging/fsl-dpaa2/ethernet/TODO
> @@ -0,0 +1,14 @@
> +* Add a DPAA2 MAC kernel driver in order to allow PHY management; currently
> +  the DPMAC objects and their link to DPNIs are handled by MC internally
> +  and all PHYs are seen as fixed-link
> +* add more debug support: decide how to expose detailed debug statistics,
> +  add ingress error queue support
> +* MC firmware uprev; the DPAA2 objects used by the Ethernet driver need to
> +  be kept in sync with binary interface changes in MC
> +* refine README file
> +* cleanup
> +
> +NOTE: None of the above is must-have before getting the DPAA2 Ethernet driver
> +out of staging. The main requirement for that is to have the drivers it
> +depends on, fsl-mc bus and DPIO driver, moved to drivers/bus and drivers/soc
> +respectively.

The TODO file should have contact info (I think)... look at other
drivers/staging TODO files for examples.

Stuart

WARNING: kernel stack frame pointer has bad value

2017-04-18 Thread Steven Rostedt

Josh,

I'm starting to get a bunch of these warnings, and I'm thinking they
are false positives. The stack frame error is recorded at a call from
entry_SYSCALL_64_fastpath, where I would expect the bp to not be valid.

To trigger this, I only need to go into /sys/kernel/debug/tracing and
echo function > current_tracer then cat trace. Maybe the function tracer's
stack frames are messing it up somehow, but it always fails at the
entry call.

Here's the dump:

 WARNING: kernel stack frame pointer at 8800bda0ff30 in sshd:1090 has bad value 55b32abf1fa8
 unwind stack type:0 next_sp:  (null) mask:6 graph_idx:0
 8800bda0fd28: 81cf502a (entry_SYSCALL_64_fastpath+0x18/0xad)
 8800bda0fd30: 810dc940 (sigprocmask+0x150/0x150)
 8800bda0fd38: 81cf502a (entry_SYSCALL_64_fastpath+0x18/0xad)
 8800bda0fd40: 8800c7e60040 (0x8800c7e60040)
 8800bda0fd48: 8800bda0fe08 (0x8800bda0fe08)
 8800bda0fd50: 825393c0 (ftrace_trace_arrays+0x40/0x40)
 8800bda0fd58: 8800c7e60040 (0x8800c7e60040)
 8800bda0fd60: 0008 (0x8)
 8800bda0fd68: 001a0800 (0x1a0800)
 8800bda0fd70:  ...
 8800bda0fd78: fbfff04a727c (0xfbfff04a727c)
 8800bda0fd80: 8122c8bb (trace_function+0x2b/0x120)
 8800bda0fd88: dc00 (0xdc00)
 8800bda0fd90: 810dc940 (sigprocmask+0x150/0x150)
 8800bda0fd98: 825393e0 (global_trace+0x20/0x1680)
 8800bda0fda0: ff7d (0xff7d)
 8800bda0fda8: 8122c8bb (trace_function+0x2b/0x120)
 8800bda0fdb0: 0010 (0x10)
 8800bda0fdb8: 0246 (0x246)
 8800bda0fdc0: 8800bda0fdd0 (0x8800bda0fdd0)
 8800bda0fdc8: 0018 (0x18)
 8800bda0fdd0: a02e0077 (0xa02e0077)
 8800bda0fdd8: 0246 (0x246)
 8800bda0fde0: 8800c7e60040 (0x8800c7e60040)
 8800bda0fde8: 8800c7e60040 (0x8800c7e60040)
 8800bda0fdf0: 0007 (0x7)
 8800bda0fdf8: 810dc940 (sigprocmask+0x150/0x150)
 8800bda0fe00: 81cf502a (entry_SYSCALL_64_fastpath+0x18/0xad)
 8800bda0fe08: 8800bda0fe68 (0x8800bda0fe68)
 8800bda0fe10: 81238168 (function_trace_call+0x208/0x260)
 8800bda0fe18: 00026f10 (0x26f10)
 8800bda0fe20: 8800c7e621f0 (0x8800c7e621f0)
 8800bda0fe28: 00026f10 (0x26f10)
 8800bda0fe30: 8800d3ea6f10 (0x8800d3ea6f10)
 8800bda0fe38: 8010 (0x8010)
 8800bda0fe40: 7d1f4e80 (0x7d1f4e80)
 8800bda0fe48: 7d1f4e00 (0x7d1f4e00)
 8800bda0fe50:  ...
 8800bda0fe58: 7d1f4f8f (0x7d1f4f8f)
 8800bda0fe60: 55b32a9a2a51 (0x55b32a9a2a51)
 8800bda0fe68: 8800bda0ff20 (0x8800bda0ff20)
 8800bda0fe70: a02e0077 (0xa02e0077)
 8800bda0fe78: 55b32bdc57c0 (0x55b32bdc57c0)
 8800bda0fe80: 41b58ab3 (0x41b58ab3)
 8800bda0fe88: 8233e3f0 (ONEf+0x16e40/0x5840d)
 8800bda0fe90: 8800bda0fed0 (0x8800bda0fed0)
 8800bda0fe98: 55b32abf1fa8 (0x55b32abf1fa8)
 8800bda0fea0: 8800bda0fee0 (0x8800bda0fee0)
 8800bda0fea8: 8800c7e60040 (0x8800c7e60040)
 8800bda0feb0: 81cf5017 (entry_SYSCALL_64_fastpath+0x5/0xad)
 8800bda0feb8: 001a0800 (0x1a0800)
 8800bda0fec0:  ...
 8800bda0fec8: 000e (0xe)
 8800bda0fed0: 0008 (0x8)
 8800bda0fed8: 7d1f4e00 (0x7d1f4e00)
 8800bda0fee0: 7d1f4e80 (0x7d1f4e80)
 8800bda0fee8:  ...
 8800bda0fef0: 8800bda0ff48 (0x8800bda0ff48)
 8800bda0fef8: 810dc945 (SyS_rt_sigprocmask+0x5/0x1a0)
 8800bda0ff00: 8800c7e60040 (0x8800c7e60040)
 8800bda0ff08: 0008 (0x8)
 8800bda0ff10: 001a0800 (0x1a0800)
 8800bda0ff18:  ...
 8800bda0ff20: 8800bda0ff30 (0x8800bda0ff30)
 8800bda0ff28: 810dc945 (SyS_rt_sigprocmask+0x5/0x1a0)
 8800bda0ff30: 55b32abf1fa8 (0x55b32abf1fa8)
 8800bda0ff38: 81cf502a (entry_SYSCALL_64_fastpath+0x18/0xad)
 8800bda0ff40: 55b32abf1fa8 (0x55b32abf1fa8)
 8800bda0ff48: 810dc945 (SyS_rt_sigprocmask+0x5/0x1a0)
 8800bda0ff50: 81cf502a (entry_SYSCALL_64_fastpath+0x18/0xad)
 8800bda0ff58: 258c9a9a (0x258c9a9a)
 8800bda0ff60: 9a954c2d (0x9a954c2d)
 8800bda0ff68: fc397de1 (0xfc397de1)
 8800bda0ff70: 2badc874 (0x2badc874)
 8800bda0ff78: 8800bda0ff98 (0x8800bda0ff98)
 8800bda0ff80: 81149040 (trace_hardirqs_off_caller+0xc0/0x110)
 8800bda0ff88: 0246 (0x246)
 8800bda0ff90: 0008 (0x8)
 8800bda0ff98: 001a0800 (0x1a0800)
 8800bda0ffa0:  ...
 8800bda0ff

[RFC] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level

2017-04-18 Thread Anshuman Khandual

Though migrating gigantic HugeTLB pages does not sound much like a real
world use case, they can be affected by memory errors. Hence migration
of PGD level HugeTLB pages should be supported, just to enable the soft
and hard offline use cases.

While allocating the new gigantic HugeTLB page, it should not matter
whether the new page comes from the same node or not. There would be very
few gigantic pages on the system after all, so we should not be bothered
about node locality when trying to save a big page from crashing.

This introduces a new HugeTLB allocator called alloc_gigantic_page()
which will scan over all online nodes on the system and allocate a
single HugeTLB page.

Signed-off-by: Anshuman Khandual 
---
Tested on a POWER8 machine with 16GB pages along with Aneesh's
recent HugeTLB enablement patch series on powerpc which can
be found here.

https://lkml.org/lkml/2017/4/17/225

Here, we directly call alloc_gigantic_page() which ignores node
locality. But we can also first call normal alloc_huge_page()
with the node number and if that fails to allocate then call
alloc_gigantic_page() as a fallback option.

 include/linux/hugetlb.h |  8 +++-
 mm/hugetlb.c| 17 +
 mm/memory-failure.c |  8 ++--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 04b73a9c8b4b..ee75197e6ed8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -347,6 +347,7 @@ struct huge_bootmem_page {
 
 struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
+struct page *alloc_gigantic_page(struct hstate *h);
 struct page *alloc_huge_page_node(struct hstate *h, int nid);
 struct page *alloc_huge_page_noerr(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
@@ -473,7 +474,11 @@ extern int dissolve_free_huge_pages(unsigned long start_pfn,
 static inline bool hugepage_migration_supported(struct hstate *h)
 {
 #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
-   return huge_page_shift(h) == PMD_SHIFT;
+   if ((huge_page_shift(h) == PMD_SHIFT) ||
+   (huge_page_shift(h) == PGDIR_SHIFT))
+   return true;
+   else
+   return false;
 #else
return false;
 #endif
@@ -511,6 +516,7 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm)
 #else  /* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 #define alloc_huge_page(v, a, r) NULL
+#define alloc_gigantic_page(h) NULL
 #define alloc_huge_page_node(h, nid) NULL
 #define alloc_huge_page_noerr(v, a, r) NULL
 #define alloc_bootmem_huge_page(h) NULL
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 97a44db06850..f2b31dddb1bc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1669,6 +1669,23 @@ struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h,
return __alloc_buddy_huge_page(h, vma, addr, NUMA_NO_NODE);
 }
 
+struct page *alloc_gigantic_page(struct hstate *h)
+{
+   struct page *page = NULL;
+   int nid = 0;
+
+   spin_lock(&hugetlb_lock);
+   if (h->free_huge_pages - h->resv_huge_pages > 0) {
+   for_each_online_node(nid) {
+   page = dequeue_huge_page_node(h, nid);
+   if (page)
+   break;
+   }
+   }
+   spin_unlock(&hugetlb_lock);
+   return page;
+}
+
 /*
  * This allocation function is useful in the context where vma is irrelevant.
  * E.g. soft-offlining uses this function because it only cares physical
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fe64d7729a8e..619650969fe5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1481,11 +1481,15 @@ EXPORT_SYMBOL(unpoison_memory);
 static struct page *new_page(struct page *p, unsigned long private, int **x)
 {
int nid = page_to_nid(p);
-   if (PageHuge(p))
+   if (PageHuge(p)) {
+   if (hstate_is_gigantic(page_hstate(compound_head(p))))
+   return alloc_gigantic_page(page_hstate(compound_head(p)));
+
return alloc_huge_page_node(page_hstate(compound_head(p)),
   nid);
-   else
+   } else {
return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
+   }
 }
 
 /*
-- 
2.12.0
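
The node scan performed by the proposed alloc_gigantic_page() can be modelled in userspace as below. This is a sketch under simplifying assumptions: struct hstate_model, MAX_NODES and take_gigantic_page() are hypothetical names, the hugetlb_lock is omitted, every node is treated as online, and dequeuing a page is reduced to decrementing counters. It keeps the two properties described in the RFC: nothing is handed out when all free pages are already reserved, and the first node with a free page wins regardless of locality.

```c
#include <stddef.h>

#define MAX_NODES 4 /* hypothetical node count for this model */

/* Hypothetical, reduced stand-in for the kernel's struct hstate. */
struct hstate_model {
	long free_huge_pages;          /* free pages summed over all nodes */
	long resv_huge_pages;          /* pages already promised to reservations */
	long free_per_node[MAX_NODES]; /* free pages on each node */
};

/* Returns the node a page was taken from, or -1 if none is available. */
int take_gigantic_page(struct hstate_model *h)
{
	int nid;

	/* Same guard as the patch: leave reserved pages alone. */
	if (h->free_huge_pages - h->resv_huge_pages <= 0)
		return -1;

	/* Scan every node and take the first free page found. */
	for (nid = 0; nid < MAX_NODES; nid++) {
		if (h->free_per_node[nid] > 0) {
			h->free_per_node[nid]--;
			h->free_huge_pages--;
			return nid;
		}
	}

	return -1;
}
```

The fallback mentioned in the notes above (try the local node first, then scan the rest) would only change the order in which the nodes are visited.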

Re: [RfC PATCH] drm: fourcc byteorder: brings header file comments in line with reality.

2017-04-18 Thread Ilia Mirkin

On Tue, Apr 18, 2017 at 11:19 PM, Ilia Mirkin  wrote:
> On Tue, Apr 18, 2017 at 9:01 PM, Michel Dänzer  wrote:
>> On 18/04/17 07:14 PM, Gerd Hoffmann wrote:
>>>   Hi,
>>>
>>>>> Quite true that this proves nothing. However one should note that
>>>>> fbcon -> fbdev works,
>>>>
>>>> BTW, this supports Gerd's patch, since the KMS fbdev emulation code uses
>>>> e.g. DRM_FORMAT_XRGB8888 for depth/bpp 24/32, and the fbdev API uses
>>>> native endian packed colour values.
>>>
>>> Same is true for DRM_IOCTL_MODE_ADDFB, with depth/bpp 24/32 you'll get
>>> DRM_FORMAT_XRGB8888 (only DRM_IOCTL_MODE_ADDFB2 allows userspace specify
>>> fourcc formats directly).
>>
>> Right, and since all major Xorg drivers use DRM_IOCTL_MODE_ADDFB,
>> they're effectively using DRM_FORMAT_XRGB8888 as native endianness as well.
>
> In the meanwhile, it has been pointed out to me that pre-nv50 display
> code actually doesn't use DRM_FORMAT_* at all -- it uses some helpers
> which end up advertising XR24 / AR24. However from what I can tell,
> that's not a well-reasoned selection. Either way, I'm going to test
> Gerd's patch, hopefully during the week, or weekend at the latest. My
> current suspicion is that it will have no effect on nouveau either
> way. We'll find out.

(And as Michel points out, the patch doesn't actually touch anything,
just comments. I originally thought it changed format -> fourcc
mapping.)

Re: [RfC PATCH] drm: fourcc byteorder: brings header file comments in line with reality.

2017-04-18 Thread Ilia Mirkin

On Tue, Apr 18, 2017 at 9:01 PM, Michel Dänzer  wrote:
> On 18/04/17 07:14 PM, Gerd Hoffmann wrote:
>>   Hi,
>>
>>>> Quite true that this proves nothing. However one should note that
>>>> fbcon -> fbdev works,
>>>
>>> BTW, this supports Gerd's patch, since the KMS fbdev emulation code uses
>>> e.g. DRM_FORMAT_XRGB for depth/bpp 24/32, and the fbdev API uses
>>> native endian packed colour values.
>>
>> Same is true for DRM_IOCTL_MODE_ADDFB, with depth/bpp 24/32 you'll get
>> DRM_FORMAT_XRGB (only DRM_IOCTL_MODE_ADDFB2 allows userspace specify
>> fourcc formats directly).
>
> Right, and since all major Xorg drivers use DRM_IOCTL_MODE_ADDFB,
> they're effectively using DRM_FORMAT_XRGB as native endianness as well.

In the meanwhile, it has been pointed out to me that pre-nv50 display
code actually doesn't use DRM_FORMAT_* at all -- it uses some helpers
which end up advertising XR24 / AR24. However from what I can tell,
that's not a well-reasoned selection. Either way, I'm going to test
Gerd's patch, hopefully during the week, or weekend at the latest. My
current suspicion is that it will have no effect on nouveau either
way. We'll find out.

  -ilia

Re: [PATCH] of: introduce event tracepoints for dynamic device_node lifecyle

2017-04-18 Thread Frank Rowand

On 04/18/17 19:46, Steven Rostedt wrote:
> On Tue, 18 Apr 2017 17:07:17 -0700
> Frank Rowand  wrote:
> 
> 
>> As far as I know, there is no easy way to combine trace data and printk()
>> style data to create a single chronology of events.  If some of the
>> information needed to debug an issue is trace data and some is printk()
>> style data then it becomes more difficult to understand the overall
>> situation.
> 
> You mean like:
> 
>  # echo 1 > /sys/kernel/debug/tracing/events/printk/console/enable
> 
> Makes all printks also go into the ftrace ring buffer.

Thanks!  I was hoping there was going to be an easy answer like this.


> -- Steve
> 
>>
>> If Rob wants to convert printk() style data to trace data (and I can't
>> convince him otherwise) then I will have further comments on this specific
>> patch.
>>
> .
>

Re: [Intel-gfx] [PATCH] dma-buf: Rename dma-ops to prevent conflict with kunmap_atomic macro

2017-04-18 Thread kbuild test robot

Hi Logan,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.11-rc7 next-20170418]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Logan-Gunthorpe/dma-buf-Rename-dma-ops-to-prevent-conflict-with-kunmap_atomic-macro/20170419-082521
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/tegra/gem.c:115:2: error: unknown field 'map' specified in initializer
 .map = tegra_bo_kmap,
 ^
>> drivers/gpu/drm/tegra/gem.c:116:2: error: unknown field 'unmap' specified in initializer
 .unmap = tegra_bo_kunmap,
 ^
>> drivers/gpu/drm/tegra/gem.c:622:2: error: unknown field 'kmap_atomic' specified in initializer
 .kmap_atomic = tegra_gem_prime_kmap_atomic,
 ^
>> drivers/gpu/drm/tegra/gem.c:622:17: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
 .kmap_atomic = tegra_gem_prime_kmap_atomic,
^~~
   drivers/gpu/drm/tegra/gem.c:622:17: note: (near initialization for 'tegra_gem_prime_dmabuf_ops.begin_cpu_access')
>> drivers/gpu/drm/tegra/gem.c:623:2: error: unknown field 'kunmap_atomic' specified in initializer
 .kunmap_atomic = tegra_gem_prime_kunmap_atomic,
 ^
   drivers/gpu/drm/tegra/gem.c:623:19: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
 .kunmap_atomic = tegra_gem_prime_kunmap_atomic,
  ^
   drivers/gpu/drm/tegra/gem.c:623:19: note: (near initialization for 'tegra_gem_prime_dmabuf_ops.end_cpu_access')
>> drivers/gpu/drm/tegra/gem.c:624:2: error: unknown field 'kmap' specified in initializer
 .kmap = tegra_gem_prime_kmap,
 ^
>> drivers/gpu/drm/tegra/gem.c:625:2: error: unknown field 'kunmap' specified in initializer
 .kunmap = tegra_gem_prime_kunmap,
 ^
   cc1: some warnings being treated as errors

vim +/map +115 drivers/gpu/drm/tegra/gem.c

   109  .get = tegra_bo_get,
   110  .put = tegra_bo_put,
   111  .pin = tegra_bo_pin,
   112  .unpin = tegra_bo_unpin,
   113  .mmap = tegra_bo_mmap,
   114  .munmap = tegra_bo_munmap,
 > 115  .map = tegra_bo_kmap,
 > 116  .unmap = tegra_bo_kunmap,
   117  };
   118  
   119  static int tegra_bo_iommu_map(struct tegra_drm *tegra, struct tegra_bo *bo)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip

padata & workqueue list corruption

2017-04-18 Thread Samuel Holland


Hello Steffen & Workqueue People,

As Jason wrote about here a few weeks ago, we've been having issues
with padata. After spending considerable time working to rule out
the possibility that our code was doing something wrong, I've begun
to debug padata and the workqueue subsystems. I've gotten some
potentially useful backtraces and was hoping somebody here might
read them and have an "ah ha!" moment.

We've been using the padata library for some high-throughput
encryption/decryption workloads, and on a relatively weak CPU (Celeron
J1800), we have run into list corruption that results in all workqueues
getting stalled, and occasional panics. I can reproduce this fairly
easily, albeit after several hours of load.

Representative backtraces follow (the warnings come in sets). I have
kernel .configs and extended netconsole output from several occurrences
available upon request.

WARNING: CPU: 1 PID: 0 at lib/list_debug.c:33 __list_add+0x89/0xb0
list_add corruption. prev->next should be next (99f135016a90), but
was d34affc03b10. (prev=d34affc03b10).
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           O    4.9.20+ #1
Call Trace:
 
 dump_stack+0x67/0x92
 __warn+0xc6/0xe0
 warn_slowpath_fmt+0x5a/0x80
 __list_add+0x89/0xb0
 insert_work+0x3c/0xc0
 __queue_work+0x18a/0x600
 queue_work_on+0x33/0x70
 padata_do_parallel+0x14f/0x240
 ? padata_index_to_cpu+0x50/0x50
 ? packet_receive+0x140/0x140 [wireguard]
 packet_consume_data+0x1b9/0x2b0 [wireguard]
 ? packet_create_data+0x6b0/0x6b0 [wireguard]
 ? get_partial_node.isra.72+0x47/0x250
 ? skb_prepare_header+0xd5/0x3f0 [wireguard]
 ? packet_receive+0x140/0x140 [wireguard]
 packet_receive+0x79/0x140 [wireguard]
 ? __udp4_lib_lookup+0x147/0x2d0
 receive+0x1a/0x30 [wireguard]
 udp_queue_rcv_skb+0x34a/0x5b0
 __udp4_lib_rcv+0x468/0xb40
 ? ip_local_deliver_finish+0x21/0x370
 udp_rcv+0x15/0x20
 ip_local_deliver_finish+0xb7/0x370
 ? ip_local_deliver_finish+0x21/0x370
 ip_local_deliver+0x1e6/0x230
 ? ip_local_deliver+0x62/0x230
 ? ip_rcv_finish+0x670/0x670
 ip_rcv_finish+0x1ae/0x670
 ip_rcv+0x366/0x4d0
 ? ip_rcv+0x26a/0x4d0
 ? inet_del_offload+0x40/0x40
 __netif_receive_skb_core+0xae1/0xc80
 ? inet_del_offload+0x40/0x40
 ? netif_receive_skb_internal+0x29/0x200
 __netif_receive_skb+0x18/0x60
 netif_receive_skb_internal+0x7b/0x200
 ? netif_receive_skb_internal+0x29/0x200
 netif_receive_skb+0xcd/0x130
 br_pass_frame_up+0x2b1/0x2c0
 ? br_pass_frame_up+0xad/0x2c0
 ? br_allowed_ingress+0x38a/0x5d0
 ? br_allowed_ingress+0x1f5/0x5d0
 br_handle_frame_finish+0x28f/0x5a0
 ? br_handle_frame+0x1c1/0x5e0
 br_handle_frame+0x2c5/0x5e0
 ? br_handle_frame+0x1c1/0x5e0
 ? vlan_do_receive+0x37/0x380
 ? br_handle_frame_finish+0x5a0/0x5a0
 __netif_receive_skb_core+0x1e6/0xc80
 ? netif_receive_skb_internal+0x29/0x200
 __netif_receive_skb+0x18/0x60
 netif_receive_skb_internal+0x7b/0x200
 ? netif_receive_skb_internal+0x29/0x200
 napi_gro_receive+0x148/0x200
 igb_poll+0x67b/0xdb0
 ? net_rx_action+0xe5/0x450
 net_rx_action+0x224/0x450
 __do_softirq+0x1a9/0x4a0
 irq_exit+0xbe/0xd0
 do_IRQ+0x65/0x110
 common_interrupt+0x89/0x89
 

This add looks to be racing with a deletion of the last item in the
list, because in the actual list prev->next = prev.
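
For readers following along, the two sanity checks that fire in these traces can be modelled in user space. The block below is only a sketch: `struct list_head` is re-declared locally, the helper names (`list_add_check`, `list_add_between`) and the status enum are made up for illustration, and the order of the checks is an assumption rather than a copy of lib/list_debug.c. It shows that a `prev` entry that has been re-initialised to point at itself trips the "prev->next should be next" check, and that inserting an entry right after itself trips the "double add" check.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical status codes, one per sanity check. */
enum list_add_status {
	LIST_ADD_OK = 0,
	LIST_ADD_BAD_NEXT_PREV,	/* next->prev != prev */
	LIST_ADD_BAD_PREV_NEXT,	/* prev->next != next */
	LIST_ADD_DOUBLE_ADD	/* new entry is already prev or next */
};

/* Make an entry point at itself, i.e. an empty list. */
static void list_init(struct list_head *h)
{
	assert(h != NULL);
	h->next = h;
	h->prev = h;
}

/* Report the first consistency check that fails, without touching the list. */
static enum list_add_status list_add_check(const struct list_head *new_entry,
					   const struct list_head *prev,
					   const struct list_head *next)
{
	assert(new_entry != NULL && prev != NULL && next != NULL);

	if (next->prev != prev)
		return LIST_ADD_BAD_NEXT_PREV;
	if (prev->next != next)
		return LIST_ADD_BAD_PREV_NEXT;
	if (new_entry == prev || new_entry == next)
		return LIST_ADD_DOUBLE_ADD;
	return LIST_ADD_OK;
}

/* Link new_entry between prev and next only if all checks pass. */
static enum list_add_status list_add_between(struct list_head *new_entry,
					     struct list_head *prev,
					     struct list_head *next)
{
	enum list_add_status st = list_add_check(new_entry, prev, next);

	if (st != LIST_ADD_OK)
		return st;

	next->prev = new_entry;
	new_entry->next = next;
	new_entry->prev = prev;
	prev->next = new_entry;
	return LIST_ADD_OK;
}
```

The model is single-threaded on purpose: the inconsistent state has to be constructed by hand, which is exactly what a racing, unlocked deletion would produce in the real list.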

WARNING: CPU: 1 PID: 0 at lib/list_debug.c:36 __list_add+0xac/0xb0
list_add double add: new=d34affc03b10, prev=d34affc03b10,
next=99f135016a90.
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W  O    4.9.20+ #1
Call Trace:
 
 dump_stack+0x67/0x92
 __warn+0xc6/0xe0
 warn_slowpath_fmt+0x5a/0x80
 __list_add+0xac/0xb0
 insert_work+0x3c/0xc0
 __queue_work+0x18a/0x600
 queue_work_on+0x33/0x70
 padata_do_parallel+0x14f/0x240
 ? padata_index_to_cpu+0x50/0x50
 ? packet_receive+0x140/0x140 [wireguard]
 packet_consume_data+0x1b9/0x2b0 [wireguard]
 ? packet_create_data+0x6b0/0x6b0 [wireguard]
 ? get_partial_node.isra.72+0x47/0x250
 ? skb_prepare_header+0xd5/0x3f0 [wireguard]
 ? packet_receive+0x140/0x140 [wireguard]
 packet_receive+0x79/0x140 [wireguard]
 ? __udp4_lib_lookup+0x147/0x2d0
 receive+0x1a/0x30 [wireguard]
 udp_queue_rcv_skb+0x34a/0x5b0
 __udp4_lib_rcv+0x468/0xb40
 ? ip_local_deliver_finish+0x21/0x370
 udp_rcv+0x15/0x20
 ip_local_deliver_finish+0xb7/0x370
 ? ip_local_deliver_finish+0x21/0x370
 ip_local_deliver+0x1e6/0x230
 ? ip_local_deliver+0x62/0x230
 ? ip_rcv_finish+0x670/0x670
 ip_rcv_finish+0x1ae/0x670
 ip_rcv+0x366/0x4d0
 ? ip_rcv+0x26a/0x4d0
 ? inet_del_offload+0x40/0x40
 __netif_receive_skb_core+0xae1/0xc80
 ? inet_del_offload+0x40/0x40
 ? netif_receive_skb_internal+0x29/0x200
 __netif_receive_skb+0x18/0x60
 netif_receive_skb_internal+0x7b/0x200
 ? netif_receive_skb_internal+0x29/0x200
 netif_receive_skb+0xcd/0x130
 br_pass_frame_up+0x2b1/0x2c0
 ? br_pass_frame_up+0xad/0x2c0
 ? br_allowed_ingress+0x38a/0x5d0
 ? br_allowed_ingress+0x1f5/0x5d0
 br_handle_frame_finish+0x28f/0x5a0
 ? br_handle_frame+0x1c1/0x5e0
 br_handle_frame+0x2c5/0x5e0
 ? br_handle_frame+0x1c1/0x5e0
 ? vlan_do_receive+0x37/0x380
 ? br_handle_frame_finish+0x5a0/0x5a0
 __

Potential bug in path handling

2017-04-18 Thread iceboy

I found this while writing a simple sandbox. Script to reproduce: 
https://gist.github.com/iceb0y/93e77e6945019d8a863b452e18a18079

In the `bugbox`:

bugbox-4.3$ ls bin
(you get the files in /bin)

however

bugbox-4.3$ ls ../bin
(nothing)

Tried with the latest 4.11 kernel. The problem occurs when you bind mount `/` to
itself, and then remount it. It looks like the mount namespace, bind mount, or
pivot_root code is mishandling the root barrier, causing `../bin` to refer to the
`bin` directory instead of the bind mount. This could be a security problem.

Any idea what the problem is, or how to debug this?

* Dependencies of `bugbox`:
python 2 or 3
the `butter` package for syscall (sorry)
/bin /lib and /lib64 on your system are real, not symlinks

Re: [PATCH v2 3/3] mtd: dataflash: Make use of "extened device information"

2017-04-18 Thread Andrey Smirnov

On Tue, Apr 18, 2017 at 11:31 AM, Marek Vasut  wrote:
> On 04/18/2017 04:21 PM, Andrey Smirnov wrote:
>> In anticipation of supporting chips that need it, extend the size of
>> struct flash_info's 'jedec_id' field to make room for 2 bytes of extended
>> device information, and add code to fetch this data during
>> jedec_probe().
>>
>> Cc: cphe...@gmail.com
>> Cc: David Woodhouse 
>> Cc: Brian Norris 
>> Cc: Boris Brezillon 
>> Cc: Marek Vasut 
>> Cc: Richard Weinberger 
>> Cc: Cyrille Pitchen 
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Andrey Smirnov 
>> ---
>>
>> Changes since [v1]:
>>
>>   - Formatting
>>
>> [v1] http://lkml.kernel.org/r/20170411161722.11164-1-andrew.smir...@gmail.com
>>
>>
>>  drivers/mtd/devices/mtd_dataflash.c | 113 +---
>>  1 file changed, 65 insertions(+), 48 deletions(-)
>>
>> diff --git a/drivers/mtd/devices/mtd_dataflash.c b/drivers/mtd/devices/mtd_dataflash.c
>> index 5b7a8c3..5d694a4 100644
>> --- a/drivers/mtd/devices/mtd_dataflash.c
>> +++ b/drivers/mtd/devices/mtd_dataflash.c
>> @@ -690,7 +690,7 @@ struct flash_info {
>>   /* JEDEC id has a high byte of zero plus three data bytes:
>>* the manufacturer id, then a two byte device id.
>>*/
>> - u32 jedec_id;
>> + u64 jedec_id;
>>
>>   /* The size listed here is what works with OP_ERASE_PAGE. */
>>   unsigned nr_pages;
>> @@ -713,63 +713,34 @@ static struct flash_info dataflash_data[] = {
>>* These newer chips also support 128-byte security registers (with
>>* 64 bytes one-time-programmable) and software write-protection.
>>*/
>> - { "AT45DB011B",  0x1f2200, 512, 264, 9, SUP_POW2PS},
>> - { "at45db011d",  0x1f2200, 512, 256, 8, SUP_POW2PS | IS_POW2PS},
>> + { "AT45DB011B",  0x1f2200, 512, 264, 9, SUP_POW2PS},
>> + { "at45db011d",  0x1f2200, 512, 256, 8, SUP_POW2PS | IS_POW2PS},
>>
>> - { "AT45DB021B",  0x1f2300, 1024, 264, 9, SUP_POW2PS},
>> - { "at45db021d",  0x1f2300, 1024, 256, 8, SUP_POW2PS | IS_POW2PS},
>> + { "AT45DB021B",  0x1f2300, 1024, 264, 9, SUP_POW2PS},
>> + { "at45db021d",  0x1f2300, 1024, 256, 8, SUP_POW2PS | IS_POW2PS},
>>
>> - { "AT45DB041x",  0x1f2400, 2048, 264, 9, SUP_POW2PS},
>> - { "at45db041d",  0x1f2400, 2048, 256, 8, SUP_POW2PS | IS_POW2PS},
>> + { "AT45DB041x",  0x1f2400, 2048, 264, 9, SUP_POW2PS},
>> + { "at45db041d",  0x1f2400, 2048, 256, 8, SUP_POW2PS | IS_POW2PS},
>>
>> - { "AT45DB081B",  0x1f2500, 4096, 264, 9, SUP_POW2PS},
>> - { "at45db081d",  0x1f2500, 4096, 256, 8, SUP_POW2PS | IS_POW2PS},
>> + { "AT45DB081B",  0x1f2500, 4096, 264, 9, SUP_POW2PS},
>> + { "at45db081d",  0x1f2500, 4096, 256, 8, SUP_POW2PS | IS_POW2PS},
>>
>> - { "AT45DB161x",  0x1f2600, 4096, 528, 10, SUP_POW2PS},
>> - { "at45db161d",  0x1f2600, 4096, 512, 9, SUP_POW2PS | IS_POW2PS},
>> + { "AT45DB161x",  0x1f2600, 4096, 528, 10, SUP_POW2PS},
>> + { "at45db161d",  0x1f2600, 4096, 512, 9, SUP_POW2PS | IS_POW2PS},
>>
>> - { "AT45DB321x",  0x1f2700, 8192, 528, 10, 0},   /* rev C */
>> + { "AT45DB321x",  0x1f2700, 8192, 528, 10, 0},   /* rev C */
>>
>> - { "AT45DB321x",  0x1f2701, 8192, 528, 10, SUP_POW2PS},
>> - { "at45db321d",  0x1f2701, 8192, 512, 9, SUP_POW2PS | IS_POW2PS},
>> + { "AT45DB321x",  0x1f2701, 8192, 528, 10, SUP_POW2PS},
>> + { "at45db321d",  0x1f2701, 8192, 512, 9, SUP_POW2PS | IS_POW2PS},
>>
>> - { "AT45DB642x",  0x1f2800, 8192, 1056, 11, SUP_POW2PS},
>> - { "at45db642d",  0x1f2800, 8192, 1024, 10, SUP_POW2PS | IS_POW2PS},
>> + { "AT45DB642x",  0x1f2800, 8192, 1056, 11, SUP_POW2PS},
>> + { "at45db642d",  0x1f2800, 8192, 1024, 10, SUP_POW2PS | IS_POW2PS},
>>  };
>>
>> -static struct flash_info *jedec_probe(struct spi_device *spi)
>> +static struct flash_info *jedec_lookup(struct spi_device *spi, u64 jedec)
>>  {
>> - int ret, i;
>> - u8 code = OP_READ_ID;
>> - u8 id[3];
>> - u32 jedec;
>> + int status, i;
>>   struct flash_info *info;
>> - int status;
>> -
>> - /*
>> -  * JEDEC also defines an optional "extended device information"
>> -  * string for after vendor-specific data, after the three bytes
>> -  * we use here.  Supporting some chips might require using it.
>> -  *
>> -  * If the vendor ID isn't Atmel's (0x1f), assume this call failed.
>> -  * That's not an error; only rev C and newer chips handle it, and
>> -  * only Atmel sells these chips.
>> -  */
>> - ret = spi_write_then_read(spi, &code, 1, id, 3);
>> - if (ret < 0) {
>> - pr_debug("%s: error %d reading JEDEC ID\n",
>> - dev_name(&spi->dev), ret);
>> - return ERR_PTR(ret);
>> - }
>> -
>> - if (id[0] != CFI_MFR_ATMEL)
>> - return NULL;
>> -
>> - jedec = id[0];
>> - jedec = jedec
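
As an aside on why the commit message above widens 'jedec_id' from u32 to u64: three ID bytes fit in 32 bits, but three ID bytes plus two bytes of extended device information need 40. The user-space sketch below illustrates that arithmetic only. The helper name `jedec_pack`, the most-significant-byte-first layout, and the extended bytes used in the example are assumptions for illustration, not taken from the patch or from a datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold the ID bytes read from the chip into one
 * integer, first byte in the most significant position.
 */
static uint64_t jedec_pack(const uint8_t *id, size_t len)
{
	uint64_t jedec = 0;
	size_t i;

	assert(id != NULL);
	assert(len <= sizeof(jedec));	/* at most 8 bytes fit in a u64 */

	for (i = 0; i < len; i++)
		jedec = (jedec << 8) | id[i];

	return jedec;
}
```

With the three bytes 0x1f, 0x22, 0x00 this yields 0x1f2200, matching the table entries in the patch; appending two made-up extended bytes 0x01, 0x00 yields 0x1f22000100, which no longer fits in a u32.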

Re: [PATCH] of: introduce event tracepoints for dynamic device_node lifecyle

2017-04-18 Thread Steven Rostedt

On Tue, 18 Apr 2017 18:42:32 -0700
Frank Rowand  wrote:

> And of course the other issue with using tracepoints is the extra space
> required to hold the tracepoint info.  With the pr_debug() approach, the
> space usage can be easily removed for a production kernel via a config
> option.

Now if you are saying you want to be able to enable debugging without
the tracing infrastructure I would agree. As the tracing infrastructure
is large. But I'm working on shrinking it more.

> 
> Tracepoints are wonderful technology, but not always the proper tool to
> use for debug info.

But if you are going to have tracing enabled regardless, adding a few
more tracepoints isn't going to make the difference.

-- Steve

> 
> > If Rob wants to convert printk() style data to trace data (and I can't
> > convince him otherwise) then I will have further comments on this specific
> > patch.
> >

Re: [PATCH] of: introduce event tracepoints for dynamic device_node lifecyle

2017-04-18 Thread Steven Rostedt

On Tue, 18 Apr 2017 17:07:17 -0700
Frank Rowand  wrote:


> As far as I know, there is no easy way to combine trace data and printk()
> style data to create a single chronology of events.  If some of the
> information needed to debug an issue is trace data and some is printk()
> style data then it becomes more difficult to understand the overall
> situation.

You mean like:

 # echo 1 > /sys/kernel/debug/tracing/events/printk/console/enable

Makes all printks also go into the ftrace ring buffer.

-- Steve

> 
> If Rob wants to convert printk() style data to trace data (and I can't
> convince him otherwise) then I will have further comments on this specific
> patch.
>
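
The `echo 1 > .../enable` tip above can also be applied from a test program. The following is a minimal user-space sketch, assuming the caller passes the path itself (for example the printk/console enable file quoted above, which normally needs root and a mounted tracing filesystem). The helper name `write_trace_flag` is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" or "0" to a control file such as a tracing event's
 * "enable" file. Returns 0 on success and -1 on any I/O failure.
 */
static int write_trace_flag(const char *path, int enable)
{
	FILE *f;
	int ok;

	assert(path != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)	/* write errors can surface at close time */
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling `write_trace_flag("/sys/kernel/debug/tracing/events/printk/console/enable", 1)` would be the equivalent of the shell command above; taking the path as a parameter keeps the helper usable on systems where the tracing directory is mounted elsewhere.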

Re: [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

2017-04-18 Thread Xiongfeng Wang

Hi James,

Thanks for your reply.

On 2017/4/18 18:51, James Morse wrote:
> Hi Wang Xiongfeng,
> 
> On 18/04/17 02:09, Xiongfeng Wang wrote:
>> I have some confusion about the RAS feature when VHE is enabled. Does the RAS
>> spec support the situation when VHE is enabled? When VHE is disabled, the
>> hypervisor delegates the error exception to EL1 by setting HCR_EL2.VSE to 1,
>> and this will inject a virtual SEI into the OS.
> 
> (The ARM-ARM also requires the HCR_EL2.AMO to be set so that physical SError
>  Interrupts are taken to EL2, meaning EL1 can never receive a physical SError)
> 
> 
>> My understanding is that HCR_EL2.VSE is only used to inject a virtual SEI
>> into EL1.
> 
> ... mine too ...
> 
>> But when VHE is enabled, the host OS will run at EL2. We can't inject a
>> virtual SEI into host OS. I don't know if RAS spec can handle this
>> situation.
> 
> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
> describe a way to inject these as they are generated by the CPU.
> 
> Am I right in thinking you want this to use SError Interrupts as an APEI
> notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)

Yes, using SEI as an APEI notification is one part of my consideration.
Another use is for ESB. RAS spec 6.5.3, 'Example software sequences: Variant:
asynchronous External Abort with ESB', describes the SEI recovery process
when ESB is implemented.

In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs
in EL0 and is not taken immediately, and an ESB instruction is then executed
at SVC entry, the SEI is taken to EL3. The ESB at SVC entry is used to prevent
the error from propagating from user space to kernel space. The EL3 SEI
handler collects the errors, fills in the APEI table, and then jumps to the
EL2 SEI handler. The EL2 SEI handler injects a vSEI into EL1 by setting
HCR_EL2.VSE = 1, so that an SEI is pending when control returns to the OS.
Then ESB is executed again, and DISR_EL1.A is set by hardware (2.4.4 'ESB and
virtual errors'), so that the following process can be executed.

So we want to inject a vSEI into the OS when control is returned from EL3/EL2
to the OS, no matter whether it is a host OS or a guest OS. I don't know if
my understanding is right here.
> 
> This is straightforward for the hypervisor to implement using Virtual SError.
> I don't think it's always feasible for the host, as Physical SError is routed
> to EL3 by SCR_EL3.EA, meaning there is no hardware generated SError that can
> reach EL2. Another APEI notification mechanism may be more appropriate.

> 
> EL3 may be able to 'fake' an SError by returning into the appropriate EL2
> vector if the exception came from EL{0,1}, or from EL2 and PSTATE.A is clear.
> If the SError came from EL2 and the ESR_EL3.IESB bit is set, we can write an
> appropriate ESR into DISR.

Yes, this can work. When VHE is enabled, we can set DISR.A by software, and
'fake' an SError by returning into the EL2 SEI vector.

> You can't use SError to cover all the possible RAS exceptions. We already have
> this problem using SEI if PSTATE.A was set and the exception was an imprecise
> abort from EL2. We can't return to the interrupted context and we can't
> deliver an SError to EL2 either.

So the SEI came from EL2 and PSTATE.A is set. Is that the situation where VHE
is enabled and the CPU is running in kernel space? If an SEI occurs in kernel
space, can we just panic or shut down?
> 
> Setting SCR_EL3.EA allows firmware to handle these ugly corner cases.
> Notifying the OS is a separate problem where APEI's SEI may not always be the
> best choice.
> 
> 
> Thanks,
> 
> James
> 
> .
> 
Thanks,
Wang Xiongfeng

Re: [PATCH] of: introduce event tracepoints for dynamic device_node lifecyle

2017-04-18 Thread Frank Rowand

On 04/18/17 18:31, Michael Ellerman wrote:
> Frank Rowand  writes:
> 
>> On 04/17/17 17:32, Tyrel Datwyler wrote:
>>> This patch introduces event tracepoints for tracking a device_node's
>>> reference cycle as well as reconfig notifications generated in response
>>> to node/property manipulations.
>>>
>>> With the recent upstreaming of the refcount API several device_node
>>> underflows and leaks have come to my attention in the pseries (DLPAR)
>>> dynamic logical partitioning code (i.e. POWER speak for hotplugging virtual
>>> and physical resources at runtime such as cpus or IOAs). These tracepoints
>>> provide an easy and quick mechanism for validating the reference counting of
>>> device_nodes during their lifetime.
>>>
>>> Further, when pseries lpars are migrated to a different machine we
>>> perform a live update of our device tree to bring it into alignment with the
>>> configuration of the new machine. The of_reconfig_notify trace point
>>> provides a mechanism that can be turned on for debugging the device tree
>>> modifications without having to build a custom kernel to get at the
>>> DEBUG code introduced by commit 00aa3720.
>>
>> I do not like changing individual (or small groups of) printk() style
>> debugging information to tracepoint style.
> 
> I'm not quite sure which printks() you're referring to.
> 
> The only printks that are removed in this series are under #ifdef DEBUG,
> and so are essentially not there unless you build a custom kernel.

Yes, I am talking about pr_debug(), pr_info(), pr_err(), etc.


> 
> They also only cover the reconfig case, which is actually less
> interesting than the much more common and bug-prone get/put logic.

When I was looking at the get/put issue I used pr_debug().


>> As far as I know, there is no easy way to combine trace data and printk()
>> style data to create a single chronology of events.  If some of the
>> information needed to debug an issue is trace data and some is printk()
>> style data then it becomes more difficult to understand the overall
>> situation.
> 
> If you enable CONFIG_PRINTK_TIME then you should be able to just sort
> the trace and the printk output by the timestamp. If you're really
> trying to correlate the two then you should probably just be using
> trace_printk().

Except the existing debug code that uses pr_debug() does not use
trace_printk().

And "just sort" does not apply to multi-line output like:

cpuhp/23-147   [023]    128.324827:
of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023]    128.324829:
of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023]    128.324829:
of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023]    128.324831:
of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009]    128.439000:
of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009]    128.439002:
of_reconfig_notify: action=DETACH_NODE, 
dn->full_name=/cpus/PowerPC,POWER8@10,
prop->name=null, old_prop->name=null
   drmgr-7284  [009]    128.439015:
of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009]    128.439016:
of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4

I was kinda hoping that maybe someone had already created a tool to deal
with this issue.  But not too optimistic.


> But IMO this level of detail, tracing every get/put, does not belong in
> printk. Trace points are absolutely the right solution for this type of
> debugging.
> 
> cheers
> .
>

Re: [PATCH] of: introduce event tracepoints for dynamic device_node lifecyle

2017-04-18 Thread Oliver O'Halloran

On Wed, Apr 19, 2017 at 2:46 AM, Rob Herring  wrote:
> On Mon, Apr 17, 2017 at 7:32 PM, Tyrel Datwyler
>  wrote:
>> This patch introduces event tracepoints for tracking a device_node's
>> reference cycle as well as reconfig notifications generated in response
>> to node/property manipulations.
>>
>> With the recent upstreaming of the refcount API several device_node
>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>> logical partitioning code (i.e. POWER speak for hotplugging virtual and
>> physical resources at runtime such as cpus or IOAs). These tracepoints
>> provide an easy and quick mechanism for validating the reference counting of
>> device_nodes during their lifetime.
>
> Not really relevant for this patch, but since you are looking at
> pseries and refcounting, the refcounting largely exists for pseries.
> It's also hard to get right as this type of fix is fairly common. It's
> now used for overlays, but we really probably only need to refcount
> the overlays or changesets as a whole, not at a node level. If you
> have any thoughts on how a different model of refcounting could work
> for pseries, I'd like to discuss it.

One idea I've been kicking around is differentiating short and long
term references to a node. I figure most leaks are due to a missing
of_node_put() within a stack frame so it might be possible to use the
ftrace infrastructure to detect and emit warnings if a short term
reference is leaked. Long term references are slightly harder to deal
with, but they're less common so we can add more detailed reference
tracking there (devm_of_get_node?).

Oliver

Re: [PATCH v13 03/10] mux: minimal mux subsystem and gpio-based mux controller

2017-04-18 Thread Joe Perches

On Tue, 2017-04-18 at 23:53 +0200, Peter Rosin wrote:
> On 2017-04-18 13:44, Greg Kroah-Hartman wrote:
> > On Tue, Apr 18, 2017 at 12:59:50PM +0200, Peter Rosin wrote:
[]
> > > > > + ret = device_add(&mux_chip->dev);
> > > > > + if (ret < 0)
> > > > > > + dev_err(&mux_chip->dev,
> > > > > > + "device_add failed in mux_chip_register: %d\n", ret);
> > > > > 
> > > > > Did you run checkpatch.pl in strict mode on this new file?  Please do so :)
> > > 
> > > I did, and did it again just to be sure, and I do not get any complaints.
> > > So, what's wrong?
> > 
> > You list the function name in the printk string, it should complain
> > that __func__ should be used.  Oh well, it's just a perl script, it
> > doesn't always catch everything.
> > isn't always correct :)
> 
> Ah, ok.

Also, please use the checkpatch in -next as it has a
slightly better mechanism to identify functions and
uses in strings.

$ ./scripts/checkpatch.pl ~/1.patch
WARNING: Prefer using '"%s...", __func__' to using 'mux_chip_register', this function's name, in a string
#302: FILE: drivers/mux/mux-core.c:134:
+   "device_add failed in mux_chip_register: %d\n", ret);
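
For illustration, the form that warning asks for looks roughly like the user-space sketch below. It uses snprintf() in place of dev_err() so that it can be built outside the kernel, and the function name `report_add_failure` is made up. The point is only that `__func__` keeps the message in sync if the function is ever renamed.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a failure message that names the reporting function via
 * __func__ instead of hard-coding the name in the format string.
 * Returns what snprintf() returns.
 */
static int report_add_failure(char *buf, size_t len, int ret)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "%s: device_add failed: %d\n", __func__, ret);
}
```

In the driver itself the same format string and arguments would presumably be handed to dev_err() with the device pointer as the first argument.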

[PATCH] cgroup: avoid attaching a cgroup root to two different superblocks, take 2

2017-04-18 Thread Zefan Li

Commit bfb0b80db5f9 is broken. Now we try to fix the race by delaying
the initialization of cgroup root refcnt until a superblock has been
allocated.

Cc: sta...@vger.kernel.org # 3.16+
Reported-by: Dmitry Vyukov 
Reported-by: Andrei Vagin 
Tested-by: Andrei Vagin 
Signed-off-by: Zefan Li 
---

I agree we should apply this for 4.12.

Again I didn't receive your reply. I believe the problem is on my side.
I'll see if there's some network administrator who can look into this
issue, but I don't have much hope...

---
 kernel/cgroup/cgroup-internal.h |  2 +-
 kernel/cgroup/cgroup-v1.c   | 16 +++-
 kernel/cgroup/cgroup.c  |  8 
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 9203bfb..e470268 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -163,7 +163,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
 
 void cgroup_free_root(struct cgroup_root *root);
 void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts);
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask);
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags);
 int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
 struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
   struct cgroup_root *root, unsigned long magic,
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 1dc22f6..6ca9b12 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1072,6 +1072,7 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
struct cgroup_subsys *ss;
struct dentry *dentry;
int i, ret;
+   bool new_root = false;
 
cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
@@ -1181,10 +1182,11 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
ret = -ENOMEM;
goto out_unlock;
}
+   new_root = true;
 
init_cgroup_root(root, &opts);
 
-   ret = cgroup_setup_root(root, opts.subsys_mask);
+   ret = cgroup_setup_root(root, opts.subsys_mask, PERCPU_REF_INIT_DEAD);
if (ret)
cgroup_free_root(root);
 
@@ -1201,6 +1203,18 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 CGROUP_SUPER_MAGIC, ns);
 
/*
+* There's a race window after we release cgroup_mutex and before
+* allocating a superblock. Make sure a concurrent process won't
+* be able to re-use the root during this window by delaying the
+* initialization of root refcnt.
+*/
+   if (new_root) {
+   mutex_lock(&cgroup_mutex);
+   percpu_ref_reinit(&root->cgrp.self.refcnt);
+   mutex_unlock(&cgroup_mutex);
+   }
+
+   /*
 * If @pinned_sb, we're reusing an existing root and holding an
 * extra ref on its sb.  Mount is complete.  Put the extra ref.
 */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 4885132..0f98010 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1640,7 +1640,7 @@ void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts)
set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
 }
 
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
 {
LIST_HEAD(tmp_links);
struct cgroup *root_cgrp = &root->cgrp;
@@ -1656,8 +1656,8 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
root_cgrp->id = ret;
root_cgrp->ancestor_ids[0] = ret;
 
-   ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release, 0,
- GFP_KERNEL);
+   ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
+ ref_flags, GFP_KERNEL);
if (ret)
goto out;
 
@@ -4512,7 +4512,7 @@ int __init cgroup_init(void)
hash_add(css_set_table, &init_css_set.hlist,
 css_set_hash(init_css_set.subsys));
 
-   BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
+   BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0));
 
mutex_unlock(&cgroup_mutex);
 
-- 
1.8.3.1
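
The idea in the patch above can be sketched in user space: create the
root with its reference count in a "dead" state so that a concurrent
lookup cannot take a reference, and revive it only once setup has
finished. This is a minimal sketch under stated assumptions. The names
demo_ref, demo_ref_init, demo_ref_tryget_live and demo_ref_reinit are
hypothetical stand-ins and not the kernel's percpu_ref API, a plain
struct replaces the per-CPU machinery, and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcount that can start out dead. */
struct demo_ref {
	long count;
	bool dead;
};

/* Initialize with one base reference; optionally start in the dead state. */
static void demo_ref_init(struct demo_ref *ref, bool start_dead)
{
	ref->count = 1;
	ref->dead = start_dead;
}

/* Take a reference only if the object is live. */
static bool demo_ref_tryget_live(struct demo_ref *ref)
{
	if (ref->dead)
		return false;
	ref->count++;
	return true;
}

/* Mark the object live once it is safe for others to find and reuse it. */
static void demo_ref_reinit(struct demo_ref *ref)
{
	/* Reviving a ref that is already live would be a bug in this sketch. */
	assert(ref->dead);
	ref->dead = false;
}
```

A caller that finds the root during the window between init and reinit
sees the tryget fail and therefore cannot reuse the half-constructed
object, which is the behaviour the patch description asks for.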

[PATCH] doc/linuxized-acpica: fix typo

2017-04-18 Thread Cao jin

Signed-off-by: Cao jin 
---
 Documentation/acpi/linuxized-acpica.txt | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/acpi/linuxized-acpica.txt b/Documentation/acpi/linuxized-acpica.txt
index defe2ee..3ad7b0d 100644
--- a/Documentation/acpi/linuxized-acpica.txt
+++ b/Documentation/acpi/linuxized-acpica.txt
@@ -24,7 +24,7 @@ upstream.
The homepage of ACPICA project is: www.acpica.org, it is maintained and
supported by Intel Corporation.
 
-   The following figure depicts the Linux ACPI subystem where the ACPICA
+   The following figure depicts the Linux ACPI subsystem where the ACPICA
adaptation is included:
 
   +-+
@@ -110,7 +110,7 @@ upstream.
Linux patches.  The patches generated by this process are referred to as
"linuxized ACPICA patches".  The release process is carried out on a local
copy the ACPICA git repository.  Each commit in the monthly release is
-   converted into a linuxized ACPICA patch.  Together, they form the montly
+   converted into a linuxized ACPICA patch.  Together, they form the monthly
ACPICA release patchset for the Linux ACPI community.  This process is
illustrated in the following figure:
 
@@ -165,7 +165,7 @@ upstream.
.
 
Before the linuxized ACPICA patches are sent to the Linux ACPI community
-   for review, there is a quality ensurance build test process to reduce
+   for review, there is a quality assurance build test process to reduce
porting issues.  Currently this build process only takes care of the
following kernel configuration options:
CONFIG_ACPI/CONFIG_ACPI_DEBUG/CONFIG_ACPI_DEBUGGER
@@ -195,12 +195,12 @@ upstream.
   release utilities (please refer to Section 4 below for the details).
3. Linux specific features - Sometimes it's impossible to use the
   current ACPICA APIs to implement features required by the Linux kernel,
-  so Linux developers occasionaly have to change ACPICA code directly.
+  so Linux developers occasionally have to change ACPICA code directly.
   Those changes may not be acceptable by ACPICA upstream and in such cases
   they are left as committed ACPICA divergences unless the ACPICA side can
   implement new mechanisms as replacements for them.
4. ACPICA release fixups - ACPICA only tests commits using a set of the
-  user space simulation utilies, thus the linuxized ACPICA patches may
+  user space simulation utilities, thus the linuxized ACPICA patches may
   break the Linux kernel, leaving us build/boot failures.  In order to
   avoid breaking Linux bisection, fixes are applied directly to the
   linuxized ACPICA patches during the release process.  When the release
-- 
2.1.0

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt

On Tue, 2017-04-18 at 16:24 -0600, Jason Gunthorpe wrote:
> Basically, all this list processing is a huge overhead compared to
> just putting a helper call in the existing sg iteration loop of the
> actual op.  Particularly if the actual op is a no-op like no-mmu x86
> would use.

Yes, I'm leaning toward that approach too.

The helper itself could hang off the devmap though.

> Since dma mapping is a performance path we must be careful not to
> create intrinsic inefficiencies with otherwise nice layering :)
> 
> Jason

[PATCH] Documentation: DocBook: kgdb: update CONFIG_STRICT_KERNEL_RWX info

2017-04-18 Thread Li Qiang

CONFIG_STRICT_KERNEL_RWX is no longer selectable on most architectures.
Update the documentation to say so and to mention 'rodata=off' as the way
to turn it off.

Signed-off-by: Li Qiang 
---
 Documentation/DocBook/kgdb.tmpl | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/DocBook/kgdb.tmpl b/Documentation/DocBook/kgdb.tmpl
index 856ac20..ef0b67b 100644
--- a/Documentation/DocBook/kgdb.tmpl
+++ b/Documentation/DocBook/kgdb.tmpl
@@ -121,7 +121,9 @@
 If kgdb supports it for the architecture you are using, you can
 use hardware breakpoints if you desire to run with the
 CONFIG_STRICT_KERNEL_RWX option turned on, else you need to turn off
-this option.
+this option. In most architectures, this option is not selectable.
+For this situation, it can be turned off by adding a runtime parameter
+'rodata=off'.
 
 
 Next you should choose one of more I/O drivers to interconnect
-- 
2.7.4

Re: [PATCH] scsi: fc: remove redundant check of an unsigned long being less than zero

2017-04-18 Thread Martin K. Petersen

Colin King  writes:

> The check for an unsigned long being less than zero is always false so
> it is a redundant check and can be removed.

Applied to 4.12/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] scsi: ibmvfc: don't check for failure from mempool_alloc()

2017-04-18 Thread Martin K. Petersen

NeilBrown  writes:

> mempool_alloc() cannot fail when passed GFP_NOIO or any other gfp
> setting that is permitted to sleep.  So remove this pointless code.

Applied to 4.12/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

[PATCH] module: Unify the return value type of try_module_get

2017-04-18 Thread gfree . wind

From: Gao Feng 

The prototype of try_module_get() differs depending on the configuration.
With modules and module unloading enabled it returns bool, but the other
variants return int. Unify the return type as bool.

Signed-off-by: Gao Feng 
---
 include/linux/module.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 0297c5c..6b79eb7 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -582,7 +582,7 @@ extern void __noreturn __module_put_and_exit(struct module *mod,
 extern void module_put(struct module *module);
 
 #else /*!CONFIG_MODULE_UNLOAD*/
-static inline int try_module_get(struct module *module)
+static inline bool try_module_get(struct module *module)
 {
return !module || module_is_live(module);
 }
@@ -674,9 +674,9 @@ static inline void __module_get(struct module *module)
 {
 }
 
-static inline int try_module_get(struct module *module)
+static inline bool try_module_get(struct module *module)
 {
-   return 1;
+   return true;
 }
 
 static inline void module_put(struct module *module)
-- 
1.9.1

[PATCH 1/3] f2fs: add ioctl to flush data from faster device to cold area

2017-04-18 Thread Jaegeuk Kim

This patch adds an ioctl to flush data from a faster device to the cold area.
The user gives the device number and the number of segments to move. Nothing
is moved if there is only one device.

The parameter looks like:

struct f2fs_flush_device {
    u32 dev_num;    /* device number to flush */
    u32 segments;   /* # of segments to flush */
};

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h| 12 --
 fs/f2fs/file.c| 67 +--
 fs/f2fs/gc.c  | 19 +++-
 fs/f2fs/segment.c | 14 
 fs/f2fs/segment.h |  4 +++-
 5 files changed, 102 insertions(+), 14 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 562db8989a4e..c28e8e7d6a5f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -280,6 +280,8 @@ static inline bool __has_cursum_space(struct f2fs_journal *journal,
 #define F2FS_IOC_DEFRAGMENT  _IO(F2FS_IOCTL_MAGIC, 8)
 #define F2FS_IOC_MOVE_RANGE  _IOWR(F2FS_IOCTL_MAGIC, 9,  \
                struct f2fs_move_range)
+#define F2FS_IOC_FLUSH_DEVICE  _IOW(F2FS_IOCTL_MAGIC, 10,  \
+               struct f2fs_flush_device)
 
 #define F2FS_IOC_SET_ENCRYPTION_POLICY FS_IOC_SET_ENCRYPTION_POLICY
 #define F2FS_IOC_GET_ENCRYPTION_POLICY FS_IOC_GET_ENCRYPTION_POLICY
@@ -316,6 +318,11 @@ struct f2fs_move_range {
    u64 len;        /* size to move */
 };
 
+struct f2fs_flush_device {
+   u32 dev_num;    /* device number to flush */
+   u32 segments;   /* # of segments to flush */
+};
+
 /*
  * For INODE and NODE manager
  */
@@ -941,7 +948,7 @@ struct f2fs_sb_info {
int bg_gc;  /* background gc calls */
unsigned int ndirty_inode[NR_INODE_TYPE];   /* # of dirty inodes */
 #endif
-   unsigned int last_victim[2];/* last victim segment # */
+   unsigned int last_victim[4];/* last victim segment # */
spinlock_t stat_lock;   /* lock for stat operations */
 
/* For sysfs suppport */
@@ -2323,7 +2330,8 @@ int f2fs_migrate_page(struct address_space *mapping, struct page *newpage,
 int start_gc_thread(struct f2fs_sb_info *sbi);
 void stop_gc_thread(struct f2fs_sb_info *sbi);
 block_t start_bidx_of_node(unsigned int node_ofs, struct inode *inode);
-int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background);
+int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background,
+   unsigned int segno);
 void build_gc_manager(struct f2fs_sb_info *sbi);
 
 /*
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 0ac833dd2634..561ecb46007b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1855,7 +1855,7 @@ static int f2fs_ioc_gc(struct file *filp, unsigned long arg)
mutex_lock(&sbi->gc_mutex);
}
 
-   ret = f2fs_gc(sbi, sync, true);
+   ret = f2fs_gc(sbi, sync, true, NULL_SEGNO);
 out:
mnt_drop_write_file(filp);
return ret;
@@ -2211,6 +2211,67 @@ static int f2fs_ioc_move_range(struct file *filp, unsigned long arg)
return err;
 }
 
+static int f2fs_ioc_flush_device(struct file *filp, unsigned long arg)
+{
+   struct inode *inode = file_inode(filp);
+   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+   unsigned int start_segno = 0, end_segno = 0;
+   unsigned int dev_start_segno = 0, dev_end_segno = 0;
+   struct f2fs_flush_device range;
+   int ret;
+
+   if (!capable(CAP_SYS_ADMIN))
+       return -EPERM;
+
+   if (f2fs_readonly(sbi->sb))
+       return -EROFS;
+
+   if (copy_from_user(&range, (struct f2fs_flush_device __user *)arg,
+                           sizeof(range)))
+       return -EFAULT;
+
+   if (sbi->s_ndevs <= 1 || sbi->s_ndevs - 1 <= range.dev_num) {
+       f2fs_msg(sbi->sb, KERN_WARNING, "Can't flush %u in %d\n",
+               range.dev_num, sbi->s_ndevs);
+       return -EINVAL;
+   }
+
+   ret = mnt_want_write_file(filp);
+   if (ret)
+       return ret;
+
+   if (range.dev_num != 0)
+       dev_start_segno = GET_SEGNO(sbi, FDEV(range.dev_num).start_blk);
+   dev_end_segno = GET_SEGNO(sbi, FDEV(range.dev_num).end_blk);
+
+   start_segno = sbi->last_victim[FLUSH_DEVICE];
+   if (start_segno < dev_start_segno || start_segno >= dev_end_segno)
+       start_segno = dev_start_segno;
+   end_segno = min(start_segno + range.segments, dev_end_segno);
+
+   while (start_segno < end_segno) {
+       if (!mutex_trylock(&sbi->gc_mutex)) {
+           ret = -EBUSY;
+           goto out;
+       }
+       sbi->last_victim[GC_CB] = end_segno + 1;
+       sbi->last_victim[GC_GREEDY] = end_segno + 1;
+       sbi->last_victim[ALLOC_NEXT] = end_segno + 1;
+
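
The way the quoted function picks the segment range to flush can be
sketched as a stand-alone helper: resume from the last position if it
still lies inside the device, otherwise restart at the device's first
segment, and never run past the device's end. This is only a sketch.
The names demo_seg_range and demo_flush_range are hypothetical and do
not come from the f2fs sources, and unlike the quoted code the clamp
here is written so that it does not depend on unsigned wraparound.

```c
#include <assert.h>

/* Hypothetical helper type; not part of f2fs. */
struct demo_seg_range {
	unsigned int start;	/* first segment to flush */
	unsigned int end;	/* one past the last segment to flush */
};

static struct demo_seg_range demo_flush_range(unsigned int last_victim,
					      unsigned int dev_start,
					      unsigned int dev_end,
					      unsigned int nr_segments)
{
	struct demo_seg_range r;

	assert(dev_start <= dev_end);

	/* Resume from the previous position only if it lies inside the device. */
	r.start = last_victim;
	if (r.start < dev_start || r.start >= dev_end)
		r.start = dev_start;

	/* Clamp to the end of the device without overflowing the addition. */
	if (nr_segments > dev_end - r.start)
		r.end = dev_end;
	else
		r.end = r.start + nr_segments;

	return r;
}
```

For a device covering segments 100 to 200, a request for 50 segments
starting from a stale position of 0 yields the range 100 to 150, while
a request that resumes at 180 stops at 200.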

[PATCH 3/3] f2fs: assign allocation hint for warm/cold data

2017-04-18 Thread Jaegeuk Kim

This patch hands the slower device region to the warm/cold data area more
eagerly.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/gc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index d988c1aaf132..eb1846a80da3 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1028,4 +1028,9 @@ void build_gc_manager(struct f2fs_sb_info *sbi)
 
sbi->fggc_threshold = div64_u64((main_count - ovp_count) *
BLKS_PER_SEC(sbi), (main_count - resv_count));
+
+   /* give warm/cold data area from slower device */
+   if (sbi->s_ndevs)
+       sbi->last_victim[ALLOC_NEXT] =
+               GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
 }
-- 
2.11.0

[PATCH 2/3] f2fs: fix _IOW usage

2017-04-18 Thread Jaegeuk Kim

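This patch fixes ioctl definitions that used _IO although they pass data:
F2FS_IOC_GARBAGE_COLLECT becomes _IOW and F2FS_IOC_DEFRAGMENT becomes _IOWR.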

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c28e8e7d6a5f..6655061f6d3f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -275,9 +275,10 @@ static inline bool __has_cursum_space(struct f2fs_journal *journal,
 #define F2FS_IOC_START_VOLATILE_WRITE  _IO(F2FS_IOCTL_MAGIC, 3)
 #define F2FS_IOC_RELEASE_VOLATILE_WRITE  _IO(F2FS_IOCTL_MAGIC, 4)
 #define F2FS_IOC_ABORT_VOLATILE_WRITE  _IO(F2FS_IOCTL_MAGIC, 5)
-#define F2FS_IOC_GARBAGE_COLLECT  _IO(F2FS_IOCTL_MAGIC, 6)
+#define F2FS_IOC_GARBAGE_COLLECT  _IOW(F2FS_IOCTL_MAGIC, 6, __u32)
 #define F2FS_IOC_WRITE_CHECKPOINT  _IO(F2FS_IOCTL_MAGIC, 7)
-#define F2FS_IOC_DEFRAGMENT  _IO(F2FS_IOCTL_MAGIC, 8)
+#define F2FS_IOC_DEFRAGMENT  _IOWR(F2FS_IOCTL_MAGIC, 8,  \
+               struct f2fs_defragment)
 #define F2FS_IOC_MOVE_RANGE  _IOWR(F2FS_IOCTL_MAGIC, 9,  \
                struct f2fs_move_range)
 #define F2FS_IOC_FLUSH_DEVICE  _IOW(F2FS_IOCTL_MAGIC, 10,  \
-- 
2.11.0
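
Why the choice between _IO, _IOW and _IOWR matters can be shown with a
small user-space sketch: the macros encode the transfer direction and
the argument size into the command number, so a command defined with
_IO claims to carry no data at all. The DEMO_ names, the magic value
and the struct layout below are assumptions made for illustration and
are not copied from fs/f2fs/f2fs.h. Only the macros from
<linux/ioctl.h> are real.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Assumed values for illustration; see fs/f2fs/f2fs.h for the real ones. */
#define DEMO_IOCTL_MAGIC 0xf5

struct demo_defragment {
	uint64_t start;
	uint64_t len;
};

#define DEMO_IOC_WRITE_CHECKPOINT	_IO(DEMO_IOCTL_MAGIC, 7)
#define DEMO_IOC_GARBAGE_COLLECT	_IOW(DEMO_IOCTL_MAGIC, 6, uint32_t)
#define DEMO_IOC_DEFRAGMENT		_IOWR(DEMO_IOCTL_MAGIC, 8, struct demo_defragment)

/* Nonzero if the command number says user space passes data in. */
static int demo_cmd_writes(unsigned int cmd)
{
	return (_IOC_DIR(cmd) & _IOC_WRITE) != 0;
}

/* Nonzero if the command number says the kernel passes data back. */
static int demo_cmd_reads(unsigned int cmd)
{
	return (_IOC_DIR(cmd) & _IOC_READ) != 0;
}

/* Self-check of what each macro encodes into the command number. */
static void demo_check_encoding(void)
{
	assert(_IOC_DIR(DEMO_IOC_WRITE_CHECKPOINT) == _IOC_NONE);
	assert(_IOC_SIZE(DEMO_IOC_WRITE_CHECKPOINT) == 0);
	assert(_IOC_SIZE(DEMO_IOC_GARBAGE_COLLECT) == sizeof(uint32_t));
	assert(_IOC_SIZE(DEMO_IOC_DEFRAGMENT) == sizeof(struct demo_defragment));
	assert(_IOC_NR(DEMO_IOC_DEFRAGMENT) == 8);
	assert(_IOC_TYPE(DEMO_IOC_DEFRAGMENT) == DEMO_IOCTL_MAGIC);
}
```

Tools that decode ioctl numbers rely on these fields, which is one
reason to declare the direction and argument type accurately.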

Re: [PATCH] of: introduce event tracepoints for dynamic device_node lifecyle

2017-04-18 Thread Frank Rowand

On 04/18/17 17:07, Frank Rowand wrote:
> On 04/17/17 17:32, Tyrel Datwyler wrote:
>> This patch introduces event tracepoints for tracking a device_nodes
>> reference cycle as well as reconfig notifications generated in response
>> to node/property manipulations.
>>
>> With the recent upstreaming of the refcount API several device_node
>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>> logical partitioning code (i.e. POWER speak for hotplugging virtual and
>> physical resources at runtime such as cpus or IOAs). These tracepoints
>> provide an easy and quick mechanism for validating the reference counting of
>> device_nodes during their lifetime.
>>
>> Further, when pseries lpars are migrated to a different machine we
>> perform a live update of our device tree to bring it into alignment with the
>> configuration of the new machine. The of_reconfig_notify trace point
>> provides a mechanism that can be turned on for debugging the device tree
>> modifications without having to build a custom kernel to get at the
>> DEBUG code introduced by commit 00aa3720.
> 
> I do not like changing individual (or small groups of) printk() style
> debugging information to tracepoint style.
> 
> As far as I know, there is no easy way to combine trace data and printk()
> style data to create a single chronology of events.  If some of the
> information needed to debug an issue is trace data and some is printk()
> style data then it becomes more difficult to understand the overall
> situation.

And of course the other issue with using tracepoints is the extra space
required to hold the tracepoint info.  With the pr_debug() approach, the
space usage can be easily removed for a production kernel via a config
option.

Tracepoints are wonderful technology, but not always the proper tool to
use for debug info.

> If Rob wants to convert printk() style data to trace data (and I can't
> convince him otherwise) then I will have further comments on this specific
> patch.
> 
> -Frank
> 
>>
>> The following trace events are provided: of_node_get, of_node_put,
>> of_node_release, and of_reconfig_notify. These trace points require a kernel
>> built with ftrace support to be enabled. In a typical environment where
>> debugfs is mounted at /sys/kernel/debug the entire set of tracepoints
>> can be set with the following:
>>
>>   echo "of:*" > /sys/kernel/debug/tracing/set_event
>>
>> or
>>
>>   echo 1 > /sys/kernel/debug/tracing/of/enable
>>
>> The following shows the trace point data from a DLPAR remove of a cpu
>> from a pseries lpar:
>>
>> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
>>
>> cpuhp/23-147   [023]    128.324827:
>>  of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023]    128.324829:
>>  of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023]    128.324829:
>>  of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023]    128.324831:
>>  of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>>drmgr-7284  [009]    128.439000:
>>  of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>>drmgr-7284  [009]    128.439002:
>>  of_reconfig_notify: action=DETACH_NODE, 
>> dn->full_name=/cpus/PowerPC,POWER8@10,
>>  prop->name=null, old_prop->name=null
>>drmgr-7284  [009]    128.439015:
>>  of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>>drmgr-7284  [009]    128.439016:
>>  of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
>>
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  drivers/of/dynamic.c  | 30 ++-
>>  include/trace/events/of.h | 93 
>> +++
>>  2 files changed, 105 insertions(+), 18 deletions(-)
>>  create mode 100644 include/trace/events/of.h
>>
> 
> < snip >
> 
>

RE: [PATCH] ACPICA: Export mutex functions

2017-04-18 Thread Zheng, Lv

Hi,

> From: Devel [mailto:devel-boun...@acpica.org] On Behalf Of Zheng, Lv
> Subject: Re: [Devel] [PATCH] ACPICA: Export mutex functions
> 
> Hi,
> 
> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > Subject: Re: [PATCH] ACPICA: Export mutex functions
> >
> > On 04/18/2017 12:14 AM, Zheng, Lv wrote:
> > > Hi,
> > >
> > >> From: Zheng, Lv
> > >> Subject: RE: [PATCH] ACPICA: Export mutex functions
> > >>
> > >> Hi,
> > >>
> > >>> From: Guenter Roeck [mailto:li...@roeck-us.net]
> > >>> Subject: Re: [PATCH] ACPICA: Export mutex functions
> > >>>
> > >>> On 04/17/2017 04:53 PM, Zheng, Lv wrote:
> >  Hi,
> > 
> > > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > > Subject: Re: [PATCH] ACPICA: Export mutex functions
> > >
> > > On Mon, Apr 17, 2017 at 11:29:38PM +0200, Rafael J. Wysocki wrote:
> > >> On Mon, Apr 17, 2017 at 11:03 PM, Guenter Roeck  
> > >> wrote:
> > >>> On Mon, Apr 17, 2017 at 08:40:38PM +, Moore, Robert wrote:
> > 
> > 
> > > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > > Subject: Re: [PATCH] ACPICA: Export mutex functions
> > >
> > > On Mon, Apr 17, 2017 at 07:27:37PM +, Moore, Robert wrote:
> > >>
> > >>> From: Moore, Robert
> > >>> Subject: RE: [PATCH] ACPICA: Export mutex functions
> > >>>
> > >>> There is a model for the drivers to directly acquire an AML 
> > >>> mutex
> > >>> object. That is why the acquire/release public interfaces were 
> > >>> added
> > >>> to ACPICA.
> > >>>
> > >>> I forget all of the details, but the model was developed with 
> > >>> MS and
> > >>> others during the ACPI 6.0 timeframe.
> > >>>
> > >>>
> > >> [Moore, Robert]
> > >>
> > >>
> > >> Here is the case where the OS may need to directly acquire an AML
> > > mutex:
> > >>
> > >> From the ACPI spec:
> > >>
> > >> 19.6.2 Acquire (Acquire a Mutex)
> > >>
> > >> Note: For Mutex objects referenced by a _DLM object, the host OS 
> > >> may
> > > also contend for ownership.
> > >>
> > > From the context in the dsdt, and from description of expected 
> > > use cases
> > > for _DLM objects I can find, this is what the mutex is used for 
> > > (to
> > > serialize access to a resource on a low pin count serial 
> > > interconnect,
> > > aka LPC).
> > >
> > > What does that mean in practice ? That I am not supposed to use it
> > > because it doesn't follow standard ACPI mutex declaration rules ?
> > >
> > > Thanks,
> > > Guenter
> > >
> > >>
> >  [Moore, Robert]
> > 
> >  I'm not an expert on the _DLM method, but I would point you to the 
> >  description section in
> the
> > > ACPI spec, 5.7.5 _DLM (DeviceLock Mutex).
> > 
> > >>>
> > >>> I did. However, not being an ACPI expert, that doesn't tell me 
> > >>> anything.
> > >>
> > >> Basically, if the kernel and AML need to access a device 
> > >> concurrently,
> > >> there should be a _DLM object under that device in the ACPI tables.
> > >> In that case it is expected to return a list of (AML) mutexes that 
> > >> can
> > >> be acquired by the kernel in order to synchronize device access with
> > >> respect to AML (and for each mutex it may also return a description 
> > >> of
> > >> the specific resources to be protected by it).
> > >>
> > >> Bottom line: without _DLM, the kernel cannot synchronize things with
> > >> respect to AML properly, because it has no information how to do that
> > >> then.
> > >
> > > That is all quite interesting. I do see the mutex in question used on 
> > > various
> > > motherboards from various vendors (I checked boards from Gigabyte, 
> > > MSI, and
> > > Intel). Interestingly, the naming seems to be consistent - it is 
> > > always named
> > > "MUT0". For the most part, it seems to be available on more recent
> > > motherboards; older motherboards tend to use the resource without 
> > > locking.
> > > However, I don't see any mention of "_DLM" in any of the DSDTs.
> > >
> > 
> >  OK, then you might be having problems in your opregion driver.
> > 
> > > At the same time, access to ports 0x2e/0x2f is widely used in the 
> > > kernel.
> > > As mentioned before, it is used in watchdog, hardware monitoring, and 
> > > gpio
> > > drivers, but also in parallel port and infrared driver code. 
> > > Effectively
> > > that means that all this code is inherently unsafe on systems with 
> > > ACPI
> > > support.
> > >
> > > I had thought about implementing a set of utility functions

[rcu:rcu/next 29/29] include/linux/srcutiny.h:90:15: error: implicit declaration of function 'rcu_seq_ctr'

2017-04-18 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
rcu/next
head:   b4d55cac0a93834e7e79143111a0b8ecea49a630
commit: b4d55cac0a93834e7e79143111a0b8ecea49a630 [29/29] srcu: Make rcutorture 
writer stalls print SRCU GP state
config: i386-randconfig-x005-201716 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout b4d55cac0a93834e7e79143111a0b8ecea49a630
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from include/linux/srcu.h:60:0,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:749,
from include/linux/gfp.h:5,
from include/linux/slab.h:14,
from include/linux/crypto.h:24,
from arch/x86/kernel/asm-offsets.c:8:
   include/linux/srcutiny.h: In function 'srcutorture_get_gp_data':
>> include/linux/srcutiny.h:90:15: error: implicit declaration of function 
>> 'rcu_seq_ctr' [-Werror=implicit-function-declaration]
 *completed = rcu_seq_ctr(sp->srcu_gp_seq);
  ^~~
   Cyclomatic Complexity 1 arch/x86/kernel/asm-offsets_32.c:foo
   Cyclomatic Complexity 1 arch/x86/kernel/asm-offsets.c:common
   Cyclomatic Complexity 1 
arch/x86/kernel/asm-offsets.c:_GLOBAL__sub_I_65535_0_foo
   cc1: some warnings being treated as errors
   make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +/rcu_seq_ctr +90 include/linux/srcutiny.h

84 unsigned long *gpnum,
85 unsigned long *completed)
86  {
87  if (test_type != SRCU_FLAVOR)
88  return;
89  *flags = 0;
  > 90  *completed = rcu_seq_ctr(sp->srcu_gp_seq);
91  *gpnum = *completed;
92  if (rcu_segcblist_ready_cbs(&sp->srcu_cblist))
93  (*gpnum)++;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [rcu:rcu/next 29/29] kernel/rcu/rcutorture.c:1369:3: error: implicit declaration of function 'srcutorture_get_gp_data'

2017-04-18 Thread Paul E. McKenney

On Wed, Apr 19, 2017 at 09:26:55AM +0800, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
> rcu/next
> head:   b4d55cac0a93834e7e79143111a0b8ecea49a630
> commit: b4d55cac0a93834e7e79143111a0b8ecea49a630 [29/29] srcu: Make 
> rcutorture writer stalls print SRCU GP state
> config: x86_64-randconfig-x012-201716 (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> git checkout b4d55cac0a93834e7e79143111a0b8ecea49a630
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>kernel/rcu/rcutorture.c: In function 'rcu_torture_stats_print':
> >> kernel/rcu/rcutorture.c:1369:3: error: implicit declaration of function 
> >> 'srcutorture_get_gp_data' [-Werror=implicit-function-declaration]
>   srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
>   ^~~
>cc1: some warnings being treated as errors

Yes, I should have used "experimental" as branch name.  Apologies for
the noise!

Thanx, Paul

> vim +/srcutorture_get_gp_data +1369 kernel/rcu/rcutorture.c
> 
>   1363int __maybe_unused flags = 0;
>   1364unsigned long __maybe_unused gpnum = 0;
>   1365unsigned long __maybe_unused completed = 0;
>   1366
>   1367rcutorture_get_gp_data(cur_ops->ttype,
>   1368   &flags, &gpnum, 
> &completed);
> > 1369srcutorture_get_gp_data(cur_ops->ttype, 
> > srcu_ctlp,
>   1370&flags, &gpnum, 
> &completed);
>   1371wtp = READ_ONCE(writer_task);
>   1372pr_alert("??? Writer stall state %s(%d) g%lu 
> c%lu f%#x ->state %#lx\n",
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation

Re: [PATCH] of: introduce event tracepoints for dynamic device_node lifecyle

2017-04-18 Thread Michael Ellerman

Frank Rowand  writes:

> On 04/17/17 17:32, Tyrel Datwyler wrote:
>> This patch introduces event tracepoints for tracking a device_nodes
>> reference cycle as well as reconfig notifications generated in response
>> to node/property manipulations.
>> 
>> With the recent upstreaming of the refcount API several device_node
>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>> logical partitioning code (i.e. POWER speak for hotplugging virtual and
>> physical resources at runtime such as cpus or IOAs). These tracepoints
>> provide an easy and quick mechanism for validating the reference counting of
>> device_nodes during their lifetime.
>> 
>> Further, when pseries lpars are migrated to a different machine we
>> perform a live update of our device tree to bring it into alignment with the
>> configuration of the new machine. The of_reconfig_notify trace point
>> provides a mechanism that can be turned on for debugging the device tree
>> modifications without having to build a custom kernel to get at the
>> DEBUG code introduced by commit 00aa3720.
>
> I do not like changing individual (or small groups of) printk() style
> debugging information to tracepoint style.

I'm not quite sure which printks() you're referring to.

The only printks that are removed in this series are under #ifdef DEBUG,
and so are essentially not there unless you build a custom kernel.

They also only cover the reconfig case, which is actually less
interesting than the much more common and bug-prone get/put logic.

> As far as I know, there is no easy way to combine trace data and printk()
> style data to create a single chronology of events.  If some of the
> information needed to debug an issue is trace data and some is printk()
> style data then it becomes more difficult to understand the overall
> situation.

If you enable CONFIG_PRINTK_TIME then you should be able to just sort
the trace and the printk output by the timestamp. If you're really
trying to correlate the two then you should probably just be using
trace_printk().

But IMO this level of detail, tracing every get/put, does not belong in
printk. Trace points are absolutely the right solution for this type of
debugging.

cheers
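
The suggestion above to sort trace and printk output by timestamp
amounts to merging two streams that are each already in time order. A
minimal sketch follows, assuming both sets of timestamps come from the
same clock, which may not hold for every trace clock setting. The
demo_event type and demo_merge_events function are hypothetical and
not an existing kernel or tooling interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: a timestamp in microseconds plus the log line. */
struct demo_event {
	unsigned long long ts_us;
	const char *text;
};

/*
 * Merge two arrays that are each sorted by timestamp into out[], which
 * must have room for na + nb entries. On equal timestamps the entry
 * from a[] comes first. Returns the number of entries written.
 */
static size_t demo_merge_events(const struct demo_event *a, size_t na,
				const struct demo_event *b, size_t nb,
				struct demo_event *out)
{
	size_t i = 0, j = 0, k = 0;

	assert(out != NULL);

	while (i < na && j < nb) {
		if (b[j].ts_us < a[i].ts_us)
			out[k++] = b[j++];
		else
			out[k++] = a[i++];
	}
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];

	return k;
}
```

Parsing the two logs into such records is left out here, since the
exact line formats depend on the kernel configuration.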

[rcu:rcu/next 29/29] kernel/rcu/rcutorture.c:1369:3: error: implicit declaration of function 'srcutorture_get_gp_data'

2017-04-18 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git 
rcu/next
head:   b4d55cac0a93834e7e79143111a0b8ecea49a630
commit: b4d55cac0a93834e7e79143111a0b8ecea49a630 [29/29] srcu: Make rcutorture 
writer stalls print SRCU GP state
config: x86_64-randconfig-x012-201716 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout b4d55cac0a93834e7e79143111a0b8ecea49a630
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   kernel/rcu/rcutorture.c: In function 'rcu_torture_stats_print':
>> kernel/rcu/rcutorture.c:1369:3: error: implicit declaration of function 
>> 'srcutorture_get_gp_data' [-Werror=implicit-function-declaration]
  srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
  ^~~
   cc1: some warnings being treated as errors

vim +/srcutorture_get_gp_data +1369 kernel/rcu/rcutorture.c

  1363  int __maybe_unused flags = 0;
  1364  unsigned long __maybe_unused gpnum = 0;
  1365  unsigned long __maybe_unused completed = 0;
  1366  
  1367  rcutorture_get_gp_data(cur_ops->ttype,
  1368 &flags, &gpnum, &completed);
> 1369  srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp,
  1370  &flags, &gpnum, &completed);
  1371  wtp = READ_ONCE(writer_task);
  1372  pr_alert("??? Writer stall state %s(%d) g%lu c%lu f%#x 
->state %#lx\n",

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt

On Tue, 2017-04-18 at 17:21 -0600, Jason Gunthorpe wrote:
> Splitting the sgl is different from iommu batching.
> 
> As an example, an O_DIRECT write of 1 MB with a single 4K P2P page in
> the middle.
> 
> The optimum behavior is to allocate a 1MB-4K iommu range and fill it
> with the CPU memory. Then return a SGL with three entries, two
> pointing into the range and one to the p2p.
> 
> It is creating each range which tends to be expensive, so creating
> two ranges (or worse, if every SGL created a range it would be 255) is
> very undesired.

I think the easiest way to get us started is to just use a helper and
stick it in the existing sglist processing loop of the architecture.

As we noticed, stacking dma_ops is actually non-trivial and opens quite
the can of worms.

As Jerome mentioned, you can end up with IO ops containing an sglist
that is a collection of memory and GPU pages for example.

Cheers,
Ben.
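
The quoted 1 MB example can be worked through with a small sketch that
counts scatterlist entries and the size of the single IOMMU range. It
is a simplification under stated assumptions: all CPU pages are taken
to share one range, every run of same-type pages becomes one entry,
and the names demo_map_plan and demo_plan_mapping are hypothetical
rather than part of any DMA API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical summary of how a buffer would be mapped. */
struct demo_map_plan {
	size_t sgl_entries;	/* entries in the resulting scatterlist */
	size_t iommu_bytes;	/* size of the one IOMMU range for CPU pages */
};

/*
 * is_p2p[i] says whether page i is peer-to-peer memory. CPU pages are
 * accumulated into a single IOMMU range; an SGL entry starts at page 0
 * and wherever the page type changes.
 */
static struct demo_map_plan demo_plan_mapping(const bool *is_p2p, size_t npages)
{
	struct demo_map_plan plan = { 0, 0 };
	size_t i;

	assert(is_p2p != NULL || npages == 0);

	for (i = 0; i < npages; i++) {
		if (!is_p2p[i])
			plan.iommu_bytes += DEMO_PAGE_SIZE;
		if (i == 0 || is_p2p[i] != is_p2p[i - 1])
			plan.sgl_entries++;
	}
	return plan;
}
```

With 256 pages of 4K and one P2P page in the middle, the plan has three
entries and an IOMMU range of 1 MB minus 4K, matching the numbers in
the quoted mail.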

RE: [PATCH] ACPICA: Export mutex functions

2017-04-18 Thread Zheng, Lv

Hi,

> From: Guenter Roeck [mailto:li...@roeck-us.net]
> Subject: Re: [PATCH] ACPICA: Export mutex functions
> 
> On 04/18/2017 12:14 AM, Zheng, Lv wrote:
> > Hi,
> >
> >> From: Zheng, Lv
> >> Subject: RE: [PATCH] ACPICA: Export mutex functions
> >>
> >> Hi,
> >>
> >>> From: Guenter Roeck [mailto:li...@roeck-us.net]
> >>> Subject: Re: [PATCH] ACPICA: Export mutex functions
> >>>
> >>> On 04/17/2017 04:53 PM, Zheng, Lv wrote:
>  Hi,
> 
> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > Subject: Re: [PATCH] ACPICA: Export mutex functions
> >
> > On Mon, Apr 17, 2017 at 11:29:38PM +0200, Rafael J. Wysocki wrote:
> >> On Mon, Apr 17, 2017 at 11:03 PM, Guenter Roeck  
> >> wrote:
> >>> On Mon, Apr 17, 2017 at 08:40:38PM +, Moore, Robert wrote:
> 
> 
> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > Subject: Re: [PATCH] ACPICA: Export mutex functions
> >
> > On Mon, Apr 17, 2017 at 07:27:37PM +, Moore, Robert wrote:
> >>
> >>> From: Moore, Robert
> >>> Subject: RE: [PATCH] ACPICA: Export mutex functions
> >>>
> >>> There is a model for the drivers to directly acquire an AML mutex
> >>> object. That is why the acquire/release public interfaces were 
> >>> added
> >>> to ACPICA.
> >>>
> >>> I forget all of the details, but the model was developed with MS 
> >>> and
> >>> others during the ACPI 6.0 timeframe.
> >>>
> >>>
> >> [Moore, Robert]
> >>
> >>
> >> Here is the case where the OS may need to directly acquire an AML mutex:
> >>
> >> From the ACPI spec:
> >>
> >> 19.6.2 Acquire (Acquire a Mutex)
> >>
> >> Note: For Mutex objects referenced by a _DLM object, the host OS may
> >> also contend for ownership.
> >>
> > From the context in the DSDT, and from the descriptions of expected use
> > cases for _DLM objects I can find, this is what the mutex is used for (to
> > serialize access to a resource on a low pin count serial interconnect,
> > aka LPC).
> >
> > What does that mean in practice ? That I am not supposed to use it
> > because it doesn't follow standard ACPI mutex declaration rules ?
> >
> > Thanks,
> > Guenter
> >
> >>
>  [Moore, Robert]
> 
>  I'm not an expert on the _DLM method, but I would point you to the
>  description section in the ACPI spec, 5.7.5 _DLM (DeviceLock Mutex).
> 
> >>>
> >>> I did. However, not being an ACPI expert, that doesn't tell me anything.
> >>
> >> Basically, if the kernel and AML need to access a device concurrently,
> >> there should be a _DLM object under that device in the ACPI tables.
> >> In that case it is expected to return a list of (AML) mutexes that can
> >> be acquired by the kernel in order to synchronize device access with
> >> respect to AML (and for each mutex it may also return a description of
> >> the specific resources to be protected by it).
> >>
> >> Bottom line: without _DLM, the kernel cannot synchronize things with
> >> respect to AML properly, because it has no information how to do that
> >> then.
> >
> > That is all quite interesting. I do see the mutex in question used on
> > various motherboards from various vendors (I checked boards from Gigabyte,
> > MSI, and Intel). Interestingly, the naming seems to be consistent - it is
> > always named "MUT0". For the most part, it seems to be available on more
> > recent motherboards; older motherboards tend to use the resource without
> > locking. However, I don't see any mention of "_DLM" in any of the DSDTs.
> >
> 
>  OK, then you might be having problems in your opregion driver.
> 
> > At the same time, access to ports 0x2e/0x2f is widely used in the kernel.
> > As mentioned before, it is used in watchdog, hardware monitoring, and gpio
> > drivers, but also in parallel port and infrared driver code. Effectively
> > that means that all this code is inherently unsafe on systems with ACPI
> > support.
> >
> > I had thought about implementing a set of utility functions to make the
> > kernel code safer to use if the mutex is found to exist.
> 
>  As you've mentioned, there are already lots of parallel accesses in the
>  kernel without enabling ACPI.
>  Are these accesses mutually exclusive (safe)?
> >>>
> >>> In-kernel, yes (using request_muxed_region). Against ACPI, no.
> >>>
>  If so, why do you need to invent a new synchronization mechanism?
> 
> >>>
> >>> Because I am seeing a pro

Re: [PATCH 0/4] ftrace: Add 'function-fork' trace option (v2)

2017-04-18 Thread Steven Rostedt

On Wed, 19 Apr 2017 09:27:28 +0900
Namhyung Kim  wrote:

> Hi Steve,
> 
> Sorry for being a little late,
> 
> On Tue, Apr 18, 2017 at 4:18 AM, Steven Rostedt  wrote:
> > On Mon, 17 Apr 2017 11:44:26 +0900
> > Namhyung Kim  wrote:
> >  
> >> Hello,
> >>
> >> This patchset adds a 'function-fork' option to the function tracer, which
> >> makes the pid filter be inherited like 'event-fork' does.  During
> >> testing, I found a bug in the pid filter on an instance directory.  Patch
> >> 1 fixes it and maybe it should go to the stable tree.  
> >
> > Hmm, are the other patches dependent on it?  
> 
> Nope, but there will be a small clash on trace.h for the declaration.

Yep, I push up a merge with mainline with my linux-next branch to cover
the conflicts. I'll let Linus know about it too when I do my pull
request in the merge window.

> 
> >
> > I think I may just push it separately to Linus now, but the other
> > patches will be on my devel branch which will not be based off of this
> > fix. Will that break too much? I just cherry-picked a patch from my
> > urgent branch as it was required to be on my devel branch and go to Linus.  
> 
> I don't think it breaks much.

Except that your test triggers the bug it uncovered ;-)

-- Steve

> 
> >
> > Hmm, I may be able to make a separate branch with this. I have to see
> > how much it conflicts with my current development.

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt

On Tue, 2017-04-18 at 15:22 -0600, Jason Gunthorpe wrote:
> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote:
> > > I think this opens an even bigger can of worms..
> > 
> > No, I don't think it does. You'd only shim when the target page is
> > backed by a device, not host memory, and you can figure this out by
> > an is_zone_device_page()-style lookup.
> 
> The bigger can of worms is how do you meaningfully stack dma_ops.
> 
> What does the p2p provider do when it detects a p2p page?

Yeah I think we don't really want to stack dma_ops... thinking more
about it.

As I just wrote, it looks like we might need a more specialised hook
in the devmap to be used by the main dma_op, on a per-page basis.

Ben.

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt

On Tue, 2017-04-18 at 15:03 -0600, Jason Gunthorpe wrote:
> I don't follow, when does get_dma_ops() return a p2p aware provider?
> It has no way to know if the DMA is going to involve p2p, get_dma_ops
> is called with the device initiating the DMA.
> 
> So you'd always return the P2P shim on a system that has registered
> P2P memory?
> 
> Even so, how does this shim work? dma_ops are not really intended to
> be stacked. How would we make unmap work, for instance? What happens
> when the underlying iommu dma ops actually natively understands p2p
> and doesn't want the shim?

Good point. We only know on a per-page basis ... ugh.

So we really need to change the arch main dma_ops. I'm not opposed to
that. What we then need to do is have that main arch dma_map_sg,
when it encounters a "device" page, call into a helper attached to
the devmap to handle *that page*, providing sufficient context.

That helper wouldn't perform the actual iommu mapping. It would simply
return something along the lines of:

 - "use that alternate bus address and don't map in the iommu"
 - "use that alternate bus address and do map in the iommu"
 - "proceed as normal"
 - "fail"
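
To make the four outcomes above concrete, here is a minimal userspace sketch of how a main dma_map_sg loop could consult such a helper per page. Everything in it is an assumption for illustration: `enum p2p_verdict`, `struct toy_sg`, `toy_iommu_map()` and `toy_dma_map_sg()` are hypothetical names, not kernel APIs, and the "iommu" is just a bit flip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts a devmap helper could return for one page. */
enum p2p_verdict {
	P2P_ALT_ADDR_NO_IOMMU,	/* use alternate bus address, don't map in the iommu */
	P2P_ALT_ADDR_IOMMU,	/* use alternate bus address, do map in the iommu */
	P2P_PROCEED,		/* proceed as normal */
	P2P_FAIL,		/* fail the mapping */
};

/* Toy stand-in for a scatterlist entry; not the kernel's struct scatterlist. */
struct toy_sg {
	uint64_t phys;		/* CPU physical address of the page */
	int is_device_page;	/* stand-in for an is_zone_device_page()-style test */
	uint64_t dma_addr;	/* filled in by the mapping loop */
};

/* Hypothetical helper type that would be attached to the devmap. */
typedef enum p2p_verdict (*p2p_helper_t)(const struct toy_sg *sg, uint64_t *alt);

/* Toy iommu: "mapping" an address just sets a high bit. */
static uint64_t toy_iommu_map(uint64_t addr)
{
	return addr | (1ULL << 40);
}

/*
 * Toy version of the arch dma_map_sg loop. Returns nents on success and
 * -1 on failure. A real implementation would also have to undo the
 * entries already mapped before failing; that is omitted here.
 */
static int toy_dma_map_sg(struct toy_sg *sgl, int nents, p2p_helper_t helper)
{
	assert(sgl != NULL && nents >= 0);

	for (int i = 0; i < nents; i++) {
		uint64_t alt = 0;
		enum p2p_verdict v = P2P_PROCEED;

		/* Only "device" pages are handed to the helper. */
		if (sgl[i].is_device_page)
			v = helper ? helper(&sgl[i], &alt) : P2P_FAIL;

		switch (v) {
		case P2P_ALT_ADDR_NO_IOMMU:
			sgl[i].dma_addr = alt;
			break;
		case P2P_ALT_ADDR_IOMMU:
			sgl[i].dma_addr = toy_iommu_map(alt);
			break;
		case P2P_PROCEED:
			sgl[i].dma_addr = toy_iommu_map(sgl[i].phys);
			break;
		case P2P_FAIL:
			return -1;
		}
	}
	return nents;
}

/* Example helper: device pages get a bus address and bypass the iommu. */
static enum p2p_verdict toy_helper(const struct toy_sg *sg, uint64_t *alt)
{
	*alt = sg->phys + 0x1000;	/* made-up bus address translation */
	return P2P_ALT_ADDR_NO_IOMMU;
}
```

The point of the shape is that the helper only decides; the iommu mapping itself stays in the main loop, so there is no second set of dma_ops to stack.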

What do you think ?

Cheers,
Ben.

Re: [Patch v3 1/2] lustre: Parantheses added for Macro argument to avoid precedence issues

2017-04-18 Thread Dilger, Andreas

On Apr 18, 2017, at 09:50, g...@kroah.com wrote:
> 
> On Sat, Apr 15, 2017 at 01:50:42PM +, Rishiraj Manwatkar wrote:
>> Subject: [Patch v3 1/2] lustre: Parantheses added for Macro argument to 
>> avoid precedence issues

(typo) s/Parantheses/parentheses/ s/Macro/macro/

The Subject line (excluding the [PATCH] part) should be under 60 characters.

>> Parantheses are added for Macro argument, to avoid precedence issues.


Should be something like:

Subject: [PATCH v4 1/2] staging/lustre: add parentheses to macro arguments

Add parentheses to cl_io_for_each() macro to avoid potential issues with
unexpected argument expansion in CPP.

>> Signed-off-by: Rishiraj Manwatkar 
>> ---
>> v1 -> v2: Added mailing list in cc.
>> v2 -> v3: Changed From: to be same as Signed-off-by:.
>> drivers/staging/lustre/lustre/obdclass/cl_io.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/staging/lustre/lustre/obdclass/cl_io.c b/drivers/staging/lustre/lustre/obdclass/cl_io.c
>> index ee7d677..0997254 100755
>> --- a/drivers/staging/lustre/lustre/obdclass/cl_io.c
>> +++ b/drivers/staging/lustre/lustre/obdclass/cl_io.c
>> @@ -52,9 +52,9 @@
>>  */
>> 
>> #define cl_io_for_each(slice, io) \
>> -list_for_each_entry((slice), &io->ci_layers, cis_linkage)
>> +list_for_each_entry((slice), &(io)->ci_layers, cis_linkage)
> 
> Really?  There are no precedence issues that I can see here, sorry.
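
For reference, a small standalone sketch of when the parentheses around `io` would matter. The struct and macro names (`toy_io`, `LAYERS_RAW`, `LAYERS_SAFE`) are hypothetical stand-ins, not the lustre definitions. With a plain identifier as the argument both forms expand identically, which matches the observation above; the two only differ if a caller passes an expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure that holds ci_layers. */
struct toy_io {
	int ci_layers;
};

/* Argument pasted in raw, as in the "-" line of the diff. */
#define LAYERS_RAW(io)	(&io->ci_layers)
/* Argument parenthesized, as in the "+" line of the diff. */
#define LAYERS_SAFE(io)	(&(io)->ci_layers)

/* Plain identifier as argument: both macros expand to the same thing. */
static int *first_layers(struct toy_io *ios)
{
	assert(ios != NULL);
	return LAYERS_RAW(ios);
}

/*
 * Expression as argument: LAYERS_RAW(ios + 1) would expand to
 * (&ios + 1->ci_layers), which does not compile because "->" binds
 * tighter than "+". The parenthesized form expands to
 * (&(ios + 1)->ci_layers) and works.
 */
static int *second_layers(struct toy_io *ios)
{
	assert(ios != NULL);
	return LAYERS_SAFE(ios + 1);
}
```

So as long as every caller passes a simple variable, the change is defensive rather than a bug fix.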


Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Intel Corporation

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt

On Tue, 2017-04-18 at 14:48 -0600, Logan Gunthorpe wrote:
> > ...and that dma_map goes through get_dma_ops(), so I don't see the conflict?
> 
> The main conflict is in dma_map_sg which only does get_dma_ops once but
> the sg may contain memory of different types.

We can handle that in our "overridden" dma ops.

It's a bit tricky but it *could* break it down into segments and
forward portions back to the original dma ops.
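
A rough userspace sketch of that "break it down into segments" idea, under loud assumptions: `struct toy_ent`, `toy_map_fn` and `toy_override_map_sg()` are made-up names, and the two map callbacks stand in for the original dma ops and a p2p-aware path. It only shows the run-splitting, not real mapping or error unwinding.

```c
#include <assert.h>
#include <stddef.h>

/* Toy sg entry: is_p2p marks device-backed memory, mapped_by records who mapped it. */
struct toy_ent {
	int is_p2p;
	int mapped_by;	/* 0 = unmapped, 1 = original ops, 2 = p2p path */
};

/* Hypothetical map callback: maps n entries, returns the number mapped. */
typedef int (*toy_map_fn)(struct toy_ent *ents, int n);

/* Stand-in for the original dma ops' map_sg. */
static int toy_orig_map(struct toy_ent *ents, int n)
{
	for (int i = 0; i < n; i++)
		ents[i].mapped_by = 1;
	return n;
}

/* Stand-in for a p2p-aware mapping path. */
static int toy_p2p_map(struct toy_ent *ents, int n)
{
	for (int i = 0; i < n; i++)
		ents[i].mapped_by = 2;
	return n;
}

/*
 * Overriding map_sg: walk the list, find maximal runs of entries of the
 * same memory type, and forward each run to the matching callback.
 * Returns nents on success, -1 if any run fails (no unwinding here).
 */
static int toy_override_map_sg(struct toy_ent *ents, int nents,
			       toy_map_fn orig, toy_map_fn p2p)
{
	int start = 0;

	assert(ents != NULL && nents >= 0 && orig != NULL && p2p != NULL);

	while (start < nents) {
		int end = start + 1;
		toy_map_fn fn = ents[start].is_p2p ? p2p : orig;

		/* Extend the run while the memory type stays the same. */
		while (end < nents && ents[end].is_p2p == ents[start].is_p2p)
			end++;

		if (fn(&ents[start], end - start) != end - start)
			return -1;
		start = end;
	}
	return nents;
}
```

The tricky parts the sketch leaves out are exactly the ones raised elsewhere in the thread: unmap has to know which path mapped each run, and the original ops may merge entries when mapping.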

Ben.

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-18 Thread Benjamin Herrenschmidt

On Tue, 2017-04-18 at 12:00 -0600, Jason Gunthorpe wrote:
> - All platforms can succeed if the PCI devices are under the same
>   'segment', but where segments begin is somewhat platform specific
>   knowledge. (this is 'same switch' idea Logan has talked about)

We also need to be careful whether P2P is enabled in the switch
or not.

Cheers,
Ben.
