date:20171114

[PATCH] ipmi: Stop timers before cleaning up the module

2017-11-14 Thread Masamitsu Yamazaki

The system may crash after the ipmi_si.ko module is unloaded,
because a timer may remain armed and fire after the module has cleaned up
its resources.

cleanup_one_si() contains the following processing.

/*
 * Make sure that interrupts, the timer and the thread are
 * stopped and will not run again.
 */
if (to_clean->irq_cleanup)
	to_clean->irq_cleanup(to_clean);
wait_for_timer_and_thread(to_clean);

/*
 * Timeouts are stopped, now make sure the interrupts are off
 * in the BMC.  Note that timers and CPU interrupts are off,
 * so no need for locks.
 */
while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
	poll(to_clean);
	schedule_timeout_uninterruptible(1);
}

si_state changes as follows in the while loop calling poll(to_clean).

  SI_GETTING_MESSAGES
=> SI_CHECKING_ENABLES
 => SI_SETTING_ENABLES
  => SI_GETTING_EVENTS
   => SI_NORMAL

As written in the code comments above,
timers are expected to stop before the polling loop and not to run again.
But the timer is set again in the following process
when si_state becomes SI_SETTING_ENABLES.

  => poll
     => smi_event_handler
        => handle_transaction_done
           // smi_info->si_state == SI_SETTING_ENABLES
           => start_getting_events
              => start_new_msg
                 => smi_mod_timer
                    => mod_timer

As a result, before the timer set in start_new_msg() expires,
the polling loop may see si_state become SI_NORMAL
and the module clean-up may finish.

For example, a hard LOCKUP and panic occurred as follows.
smi_timeout ran after the clean-up, called smi_event_handler and
kcs_event, and hung at port_inb()
trying to access an I/O port after it had been released.

#11 [88069fdc5ef0] end_repeat_nmi at 816ac8d3
[exception RIP: port_inb+19]
RIP: c0473053  RSP: 88069fdc3d80  RFLAGS: 0006
RAX: 8806800f8e00  RBX: 880682bd9400  RCX: 
RDX: 0ca3  RSI: 0ca3  RDI: 8806800f8e40
RBP: 88069fdc3d80   R8: 81d86dfc   R9: 81e36426
R10: 000509f0  R11: 0010  R12: 00]:00
R13:   R14: 0246  R15: 8806800f8e00
ORIG_RAX:   CS: 0010  SS: 
---  ---
#12 [88069fdc3d80] port_inb at c0473053 [ipmi_si]
#13 [88069fdc3d88] kcs_event at c0477952 [ipmi_si]
#14 [88069fdc3db0] smi_event_handler at c047465d [ipmi_si]
#15 [88069fdc3df0] smi_timeout at c0474f9e [ipmi_si]

To fix the problem I defined a flag, timer_can_start,
as a member of struct smi_info.
The flag is enabled immediately after initializing the timer
and disabled immediately before waiting for timer deletion.
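
The idea can be sketched in userspace C roughly as follows. This is only
an illustrative model, not the driver code: struct fake_smi and the
fake_*() helpers are made-up stand-ins for struct smi_info,
smi_mod_timer(), the timer setup and the wait for timer deletion, and the
locking that the real driver needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct smi_info; only the fields used here. */
struct fake_smi {
	bool timer_can_start;	/* may the timer be (re)armed? */
	bool timer_running;	/* has the timer been armed? */
	unsigned long expires;	/* stand-in for the timer's expiry value */
};

/* Models the guard in smi_mod_timer(): refuse to arm once disabled. */
static void fake_mod_timer(struct fake_smi *smi, unsigned long new_val)
{
	assert(smi != NULL);
	if (!smi->timer_can_start)
		return;
	smi->expires = new_val;
	smi->timer_running = true;
}

/* Models setup: the flag is enabled right after the timer is initialized. */
static void fake_setup(struct fake_smi *smi)
{
	smi->timer_running = false;
	smi->expires = 0;
	smi->timer_can_start = true;
}

/* Models teardown: disable the flag first, then stop the timer. */
static void fake_stop(struct fake_smi *smi)
{
	smi->timer_can_start = false;
	smi->timer_running = false;	/* stands in for waiting on timer deletion */
}
```

With this ordering, a late call to fake_mod_timer() made from the polling
loop after fake_stop() is a no-op, so nothing is left armed when the
clean-up finishes.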

Fixes: 0cfec916e86d ("ipmi: Start the timer and thread on internal msgs")
Signed-off-by: Yamazaki Masamitsu 



diff -Nurp a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
--- a/drivers/char/ipmi/ipmi_si_intf.c  2017-11-09 15:00:31.436926440 +0900
+++ b/drivers/char/ipmi/ipmi_si_intf.c  2017-11-13 14:14:02.399051610 +0900
@@ -242,6 +242,9 @@ struct smi_info {
/* The timer for this si. */
struct timer_list   si_timer;
 
+   /* This flag is set, if the timer can be set */
+   bool timer_can_start;
+
/* This flag is set, if the timer is running (timer_pending() isn't 
enough) */
bool timer_running;
 
@@ -417,6 +420,8 @@ out:
 
 static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val)
 {
+   if (!smi_info->timer_can_start)
+   return;
smi_info->last_timeout_jiffies = jiffies;
mod_timer(&smi_info->si_timer, new_val);
smi_info->timer_running = true;
@@ -436,21 +441,18 @@ static void start_new_msg(struct smi_inf
smi_info->handlers->start_transaction(smi_info->si_sm, msg, size);
 }
 
-static void start_check_enables(struct smi_info *smi_info, bool start_timer)
+static void start_check_enables(struct smi_info *smi_info)
 {
unsigned char msg[2];
 
msg[0] = (IPMI_NETFN_APP_REQUEST << 2);
msg[1] = IPMI_GET_BMC_GLOBAL_ENABLES_CMD;
 
-   if (start_timer)
-   start_new_msg(smi_info, msg, 2);
-   else
-   smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2);
+   start_new_msg(smi_info, msg, 2);
smi_info->si_state = SI_CHECKING_ENABLES;
 }
 
-static void start_clear_flags(struct smi_info *smi_info, bool start_timer)
+static void start_clear_flags(struct smi_info *smi_info)
 {
unsigned char msg[3];
 
@@ -459,10 +461,7 @@ static void start_clear_flags(struct smi
msg[1] = IPMI_CLEAR_MSG_FLAGS_CMD;
msg[2] = WDT_PRE_TIMEOUT_INT;
 
-   if (start_timer)
-   start_new_msg(smi_info, msg, 3);
-   else
-   smi_info->handlers->start_transaction(smi_info-

[PATCH] refcount_t: documentation for memory ordering differences

2017-11-14 Thread Elena Reshetova

Some functions from refcount_t API provide different
memory ordering guarantees than their atomic counterparts.
This adds a document outlining these differences.
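
As a rough userspace analogy of the ordering difference, the sketch below
uses C11 atomics: the increment is relaxed (fully unordered), while the
decrement uses RELEASE ordering. It is an assumption-laden illustration,
not the kernel implementation: struct obj, obj_init(), obj_get() and
obj_put() are hypothetical names, and the acquire fence on the final put
is the usual C11 idiom rather than something this patch describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refs;
	int payload;
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->refs, 1);	/* the creator holds the first reference */
	o->payload = 0;
}

/* Increment: only atomicity is needed, no ordering (relaxed). */
static void obj_get(struct obj *o)
{
	atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/*
 * Decrement with RELEASE ordering: earlier loads and stores to the object
 * are ordered before the decrement. Returns true when the last reference
 * was dropped and the caller may free the object.
 */
static bool obj_put(struct obj *o)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release);
	assert(old > 0);	/* dropping a reference nobody holds is a bug */
	if (old != 1)
		return false;
	/* C11 idiom (an assumption here): pair with the releases of other puts. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

Code that relied on the full ordering of an atomic decrement may need to
be checked against the weaker guarantee modelled by obj_put().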

Signed-off-by: Elena Reshetova 
---
 Documentation/refcount-vs-atomic.txt | 124 +++
 1 file changed, 124 insertions(+)
 create mode 100644 Documentation/refcount-vs-atomic.txt

diff --git a/Documentation/refcount-vs-atomic.txt 
b/Documentation/refcount-vs-atomic.txt
new file mode 100644
index 000..e703039
--- /dev/null
+++ b/Documentation/refcount-vs-atomic.txt
@@ -0,0 +1,124 @@
+===================================
+refcount_t API compared to atomic_t
+===================================
+
+The goal of refcount_t API is to provide a minimal API for implementing
+object's reference counters. While a generic architecture-independent
+implementation from lib/refcount.c uses atomic operations underneath,
+there are a number of differences between some of the refcount_*() and
+atomic_*() functions with regards to the memory ordering guarantees.
+This document outlines the differences and provides respective examples
+in order to help maintainers validate their code against the change in
+these memory ordering guarantees.
+
+memory-barriers.txt and atomic_t.txt provide more background to the
+memory ordering in general and for atomic operations specifically.
+
+Notation
+
+
+An absence of memory ordering guarantees (i.e. fully unordered)
+in case of atomics & refcounters only provides atomicity and
+program order (po) relation (on the same CPU). It guarantees that
+each atomic_*() and refcount_*() operation is atomic and instructions
+are executed in program order on a single CPU.
+Implemented using READ_ONCE()/WRITE_ONCE() and
+compare-and-swap primitives.
+
+A strong (full) memory ordering guarantees that all prior loads and
+stores (all po-earlier instructions) on the same CPU are completed
+before any po-later instruction is executed on the same CPU.
+It also guarantees that all po-earlier stores on the same CPU
+and all propagated stores from other CPUs must propagate to all
+other CPUs before any po-later instruction is executed on the original
+CPU (A-cumulative property). Implemented using smp_mb().
+
+A RELEASE memory ordering guarantees that all prior loads and
+stores (all po-earlier instructions) on the same CPU are completed
+before the operation. It also guarantees that all po-earlier
+stores on the same CPU and all propagated stores from other CPUs
+must propagate to all other CPUs before the release operation
+(A-cumulative property). Implemented using smp_store_release().
+
+A control dependency (on success) for refcounters guarantees that
+if a reference for an object was successfully obtained (reference
+counter increment or addition happened, function returned true),
+then further stores are ordered against this operation.
+Control dependency on stores are not implemented using any explicit
+barriers, but rely on CPU not to speculate on stores. This is only
+a single CPU relation and provides no guarantees for other CPUs.
+
+
+Comparison of functions
+=======================
+
+case 1) - non-RMW ops
+---------------------
+
+Function changes:
+atomic_set() --> refcount_set()
+atomic_read() --> refcount_read()
+
+Memory ordering guarantee changes:
+fully unordered --> fully unordered
+
+case 2) - increment-based ops that return no value
+--------------------------------------------------
+
+Function changes:
+atomic_inc() --> refcount_inc()
+atomic_add() --> refcount_add()
+
+Memory ordering guarantee changes:
+fully unordered --> fully unordered
+
+
+case 3) - decrement-based RMW ops that return no value
+------------------------------------------------------
+Function changes:
+atomic_dec() --> refcount_dec()
+
+Memory ordering guarantee changes:
+fully unordered --> RELEASE ordering
+
+
+case 4) - increment-based RMW ops that return a value
+-----------------------------------------------------
+
+Function changes:
+atomic_inc_not_zero() --> refcount_inc_not_zero()
+no atomic counterpart --> refcount_add_not_zero()
+
+Memory ordering guarantees changes:
+fully ordered --> control dependency on success for stores
+
+*Note*: we really assume here that necessary ordering is provided as a result
+of obtaining pointer to the object!
+
+
+case 5) - decrement-based RMW ops that return a value
+-----------------------------------------------------
+
+Function changes:
+atomic_dec_and_test() --> refcount_dec_and_test()
+atomic_sub_and_test() --> refcount_sub_and_test()
+no atomic counterpart --> refcount_dec_if_one()
+atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(&var)
+
+Memory ordering guarantees changes:
+fully ordered --> RELEASE o

Re: [PATCH v6 2/2] watchdog: Add Spreadtrum watchdog driver

2017-11-14 Thread Eric Long

Hi,

Thanks to Guenter for the review and detailed comments.
Please help to apply this patch if there are no other comments.

Best regards,
Eric Long

On Fri, Nov 10, 2017 at 01:00:32PM -0800, Guenter Roeck wrote:
> On Mon, Nov 06, 2017 at 10:46:28AM +0800, Eric Long wrote:
> > This patch adds the watchdog driver for Spreadtrum SC9860 platform.
> > 
> > Signed-off-by: Eric Long 
> 
> Reviewed-by: Guenter Roeck 
> 
> > ---
> > Changes since v5:
> >  - Modify the "irq" type as int type.
> >  - Delete unused api sprd_wdt_is_running().
> > 
> > Changes since v4:
> >  - Remove sprd_wdt_remove().
> >  - Add devm_add_action() for sprd_wdt_disable().
> > 
> > Changes since v3:
> >  - Update Kconfig SPRD_WATCHDOG help messages.
> >  - Correct the misspelled words.
> >  - Rename SPRD_WDT_CNT_HIGH_VALUE as SPRD_WDT_CNT_HIGH_SHIFT.
> >  - Remove unused macro.
> >  - Update sprd_wdt_set_pretimeout() api.
> >  - Add wdt->wdd.timeout default value.
> >  - Use devm_watchdog_register_device() to register wdt device.
> >  - If module does not support NOWAYOUT, disable wdt when remove this driver.
> >  - Call sprd_wdt_disable() every wdt suspend.
> > 
> > Changes since v2:
> >  - Rename all the macros, add SPRD tag at the head of the macro names.
> >  - Rename SPRD_WDT_CLK as SPRD_WTC_CNT_STEP.
> >  - Remove the code which check timeout value at the wrong place.
> >  - Add min/max timeout value limit.
> >  - Remove set WDOG_HW_RUNNING status at sprd_wdt_enable().
> >  - Add timeout/pretimeout judgment when set them.
> >  - Support WATCHDOG_NOWAYOUT status.
> > 
> > Changes since v1:
> >  - Use pretimeout instead of own implementation.
> >  - Fix timeout loop when loading timeout values.
> >  - use the infrastructure to read and set "timeout-sec" property.
> >  - Add conditions when start or stop watchdog.
> >  - Change the position of enabling watchdog.
> >  - Other optimization.
> > ---
> >  drivers/watchdog/Kconfig|   8 +
> >  drivers/watchdog/Makefile   |   1 +
> >  drivers/watchdog/sprd_wdt.c | 399 
> > 
> >  3 files changed, 408 insertions(+)
> >  create mode 100644 drivers/watchdog/sprd_wdt.c
> > 
> > diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
> > index c722cbf..3367a8c 100644
> > --- a/drivers/watchdog/Kconfig
> > +++ b/drivers/watchdog/Kconfig
> > @@ -787,6 +787,14 @@ config UNIPHIER_WATCHDOG
> >   To compile this driver as a module, choose M here: the
> >   module will be called uniphier_wdt.
> >  
> > +config SPRD_WATCHDOG
> > +   tristate "Spreadtrum watchdog support"
> > +   depends on ARCH_SPRD || COMPILE_TEST
> > +   select WATCHDOG_CORE
> > +   help
> > + Say Y here to include watchdog timer supported
> > + by Spreadtrum system.
> > +
> >  # AVR32 Architecture
> >  
> >  config AT32AP700X_WDT
> > diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
> > index 56adf9f..187cca2 100644
> > --- a/drivers/watchdog/Makefile
> > +++ b/drivers/watchdog/Makefile
> > @@ -87,6 +87,7 @@ obj-$(CONFIG_ASPEED_WATCHDOG) += aspeed_wdt.o
> >  obj-$(CONFIG_ZX2967_WATCHDOG) += zx2967_wdt.o
> >  obj-$(CONFIG_STM32_WATCHDOG) += stm32_iwdg.o
> >  obj-$(CONFIG_UNIPHIER_WATCHDOG) += uniphier_wdt.o
> > +obj-$(CONFIG_SPRD_WATCHDOG) += sprd_wdt.o
> >  
> >  # AVR32 Architecture
> >  obj-$(CONFIG_AT32AP700X_WDT) += at32ap700x_wdt.o
> > diff --git a/drivers/watchdog/sprd_wdt.c b/drivers/watchdog/sprd_wdt.c
> > new file mode 100644
> > index 000..a8b280f
> > --- /dev/null
> > +++ b/drivers/watchdog/sprd_wdt.c
> > @@ -0,0 +1,399 @@
> > +/*
> > + * Spreadtrum watchdog driver
> > + * Copyright (C) 2017 Spreadtrum - http://www.spreadtrum.com
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * version 2 as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * General Public License for more details.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define SPRD_WDT_LOAD_LOW  0x0
> > +#define SPRD_WDT_LOAD_HIGH 0x4
> > +#define SPRD_WDT_CTRL  0x8
> > +#define SPRD_WDT_INT_CLR   0xc
> > +#define SPRD_WDT_INT_RAW   0x10
> > +#define SPRD_WDT_INT_MSK   0x14
> > +#define SPRD_WDT_CNT_LOW   0x18
> > +#define SPRD_WDT_CNT_HIGH  0x1c
> > +#define SPRD_WDT_LOCK  0x20
> > +#define SPRD_WDT_IRQ_LOAD_LOW  0x2c
> > +#define SPRD_WDT_IRQ_LOAD_HIGH 0x30
> > +
> > +/* WDT_CTRL */
> > +#define SPRD_WDT_INT_EN_BIT    BIT(0)
> > +#define SPRD_WDT_CNT_EN_BIT

[PATCH v2] refcount_t vs. atomic_t ordering differences

2017-11-14 Thread Elena Reshetova

Changes in v2:

 - typos and English are fixed based on Randy Dunlap's
   proofreading
 - structure of document improved: 
 * definitions now in the beginning
 * confusing examples removed
 * less redundancy overall and more to-the-point text
 - definitions try to follow LKMM defined in
   github.com/aparri/memory-model/blob/master/Documentation/explanation.txt


Elena Reshetova (1):
  refcount_t: documentation for memory ordering differences

 Documentation/refcount-vs-atomic.txt | 124 +++
 1 file changed, 124 insertions(+)
 create mode 100644 Documentation/refcount-vs-atomic.txt

-- 
2.7.4

Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again

2017-11-14 Thread Greg Kroah-Hartman

On Wed, Nov 15, 2017 at 08:43:58AM +0100, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki  wrote:
> 
> > On Wednesday, November 15, 2017 1:06:12 AM CET Linus Torvalds wrote:
> > > On Tue, Nov 14, 2017 at 4:04 PM, Linus Torvalds
> > >  wrote:
> > > > On Tue, Nov 14, 2017 at 3:53 PM, Thomas Gleixner  
> > > > wrote:
> > > >> Current head + Raphaels patch:
> > > >>
> > > >> real    0m0.029s
> > > >> user    0m0.000s
> > > >> sys     0m0.010s
> > > >>
> > > >> So that patch is actually slower.
> > > >
> > > > Oh it definitely is expected to be slower, because it does the IPI to
> > > > all the cores and actually gets their frequency right.
> > > >
> > > > It was the old one that we had to revert (because it did so
> > > > sequentially) that was really bad, and took something like 2+ seconds
> > > > on Ingo's 160-core thing, iirc.
> > > 
> > > Looked it up. Ingo's machine "only" had 120 cores, and he said
> > > 
> > > fomalhaut:~> time cat /proc/cpuinfo  >/dev/null
> > > > real    0m2.689s
> > > 
> > > for the bad serial case, so yeah, it looks "a bit" better than it was ;)
> > 
> > OK, so may I queue it up?
> > 
> > I don't think I can get that to work substantially faster anyway ...
> 
> The new version is OK I suppose:
> 
>   Acked-by: Ingo Molnar 
> 
> I also think that /proc/cpuinfo is a pretty bad interface for many uses - I 
> personally only very rarely need the cpuinfo of _all_ CPUs.
> 
> We should eventually have /proc/cpu/N/info or so, so that 99% of the times
> cpuinfo is needed to report bugs we can do:
> 
>   cat /proc/cpu/0/info
> 
> With maybe also the following variants:
> 
>   /proc/cpu/first/
>   /proc/cpu/last/
>   /proc/cpu/current/
> 
> ... to the first/last/current CPUs.

We started to move this info into /sys/devices/cpu/ in individual files,
but that got stalled due to a lack of review and general "freak out" by
the ARM maintainers :)

Hopefully that patch set will come back soon so people can review it
properly.

thanks,

greg k-h

Re: [PATCH 02/10] dmaengine: virt-dma: Support for race free transfer termination

2017-11-14 Thread Linus Walleij

On Tue, Nov 14, 2017 at 3:32 PM, Peter Ujfalusi  wrote:

> Even with the introduced vchan_synchronize() we can face race when
> terminating a cyclic transfer.
>
> If the terminate_all is called after the interrupt handler called
> vchan_cyclic_callback(), but before the vchan_complete tasklet is called:
> vc->cyclic is set to the cyclic descriptor, but the descriptor itself was
> freed up in the driver's terminate_all() callback.
> When the vchan_complete() is executed it will try to fetch the vc->cyclic
> vdesc, but the pointer is pointing now to uninitialized memory leading to
> (hard to reproduce) kernel crash.
>
> In order to fix this, drivers should:
> - call vchan_terminate_vdesc() from their terminate_all callback instead
> of calling their free_desc function to free up the descriptor.
> - implement device_synchronize callback and call vchan_synchronize().
>
> This way we can make sure that the descriptor is only going to be freed up
> after the vchan_callback was executed in a safe manner.
>
> Signed-off-by: Peter Ujfalusi 

Reviewed-by: Linus Walleij 

Yours,
Linus Walleij

Re: [GIT PULL] x86 updates for v4.15

2017-11-14 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Tue, Nov 14, 2017 at 1:48 AM, Borislav Petkov  wrote:
> >
> > Just did 2 suspend cycles (once to RAM and once to disk) on my x230
> > with your tree from right now and it looks ok so far. So it could be
> > machine- and config-specific...
> 
> .. and it's not repeatable for me. I rebooted pretty quickly, and
> didn't gather a lot of information (well, 'dmesg' would SIGBUS, so..)
> and it hasn't happened again.
> 
> Will ignore until I have more information.

Haven't seen such behavior or got such reports - although admittedly laptop 
suspend/resume testing is done only sporadically, as it isn't easily automated.

As per the symptoms one thing that _could_ produce SIGSEGVs are the
CONFIG_X86_INTEL_UMIP changes: the upcoming changes that make any UMIP action
more verbose should make it more apparent if that's the case.

Plus, of course, anything entry code related. We did a few harmless (looking...)
x86/mm changes as well, but none stands out at the moment.

Thanks,

Ingo

Re: [PATCH 07/10] dmaengine: amba-pl08x: Use vchan_terminate_vdesc() instead of desc_free

2017-11-14 Thread Linus Walleij

On Tue, Nov 14, 2017 at 3:32 PM, Peter Ujfalusi  wrote:

> To avoid race with vchan_complete, use the race free way to terminate
> running transfer.
>
> Implement the device_synchronize callback to make sure that the terminated
> descriptor is freed.
>
> CC: Linus Walleij 
> Signed-off-by: Peter Ujfalusi 

I had to read patch 1 before I understood how the descriptor
gets free:ed now, but now I see it :)
Reviewed-by: Linus Walleij 

(Good work with hunting down these corner cases, I'm
very happy you're doing this!)

Yours,
Linus Walleij

Re: [PATCH 01/10] dmaengine: virt-dma: Add helper to free/reuse a descriptor

2017-11-14 Thread Linus Walleij

On Tue, Nov 14, 2017 at 3:32 PM, Peter Ujfalusi  wrote:

> The vchan_vdesc_fini() can be used to free or reuse a given descriptor
> after it has been marked as completed.
>
> Signed-off-by: Peter Ujfalusi 

Reviewed-by: Linus Walleij 

Yours,
Linus Walleij

Re: [f2fs-dev] [PATCH] f2fs: expose quota information in debugfs

2017-11-14 Thread Chao Yu

Hi Jaegeuk,

On 2017/11/14 13:12, Jaegeuk Kim wrote:
> This patch shows # of dirty pages and # of hidden quota files.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/debug.c | 11 +++
>  fs/f2fs/f2fs.h  | 10 --
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> index f7eec506ceea..ecada8425268 100644
> --- a/fs/f2fs/debug.c
> +++ b/fs/f2fs/debug.c
> @@ -45,9 +45,18 @@ static void update_general_status(struct f2fs_sb_info *sbi)
>   si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
>   si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
>   si->ndirty_data = get_pages(sbi, F2FS_DIRTY_DATA);
> + si->ndirty_qdata = get_pages(sbi, F2FS_DIRTY_QDATA);
>   si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
>   si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
>   si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> +
> + si->nquota_files = 0;
> + if (f2fs_sb_has_quota_ino(sbi->sb)) {
> + for (i = 0; i < MAXQUOTAS; i++) {
> + if (f2fs_qf_ino(sbi->sb, i))
> + si->nquota_files++;
> + }
> + }
>   si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
>   si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES);
>   si->aw_cnt = atomic_read(&sbi->aw_cnt);
> @@ -369,6 +378,8 @@ static int stat_show(struct seq_file *s, void *v)
>  si->ndirty_dent, si->ndirty_dirs, si->ndirty_all);
>   seq_printf(s, "  - datas: %4d in files:%4d\n",
>  si->ndirty_data, si->ndirty_files);
> + seq_printf(s, "  - quota datas: %4d in quota files:%4d\n",
> +si->ndirty_qdata, si->nquota_files);
>   seq_printf(s, "  - meta: %4d in %4d\n",
>  si->ndirty_meta, si->meta_pages);
>   seq_printf(s, "  - imeta: %4d\n",
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 5c379a8ea075..44f874483ecf 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -865,6 +865,7 @@ struct f2fs_sm_info {
>  enum count_type {
>   F2FS_DIRTY_DENTS,
>   F2FS_DIRTY_DATA,
> + F2FS_DIRTY_QDATA,
>   F2FS_DIRTY_NODES,
>   F2FS_DIRTY_META,
>   F2FS_INMEM_PAGES,
> @@ -1642,6 +1643,8 @@ static inline void inode_inc_dirty_pages(struct inode *inode)
>   atomic_inc(&F2FS_I(inode)->dirty_pages);
>   inc_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ?
>   F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA);
> + if (IS_NOQUOTA(inode))

If we're trying to get quota sysfile information, how about using the sysfile
ino to distinguish it from a normal file?

Thanks,

> + inc_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA);
>  }
>  
>  static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type)
> @@ -1658,6 +1661,8 @@ static inline void inode_dec_dirty_pages(struct inode *inode)
>   atomic_dec(&F2FS_I(inode)->dirty_pages);
>   dec_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ?
>   F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA);
> + if (IS_NOQUOTA(inode))
> + dec_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA);
>  }
>  
>  static inline s64 get_pages(struct f2fs_sb_info *sbi, int count_type)
> @@ -2771,9 +2776,10 @@ struct f2fs_stat_info {
>   unsigned long long hit_largest, hit_cached, hit_rbtree;
>   unsigned long long hit_total, total_ext;
>   int ext_tree, zombie_tree, ext_node;
> - int ndirty_node, ndirty_dent, ndirty_meta, ndirty_data, ndirty_imeta;
> + int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
> + int ndirty_data, ndirty_qdata;
>   int inmem_pages;
> - unsigned int ndirty_dirs, ndirty_files, ndirty_all;
> + unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
>   int nats, dirty_nats, sits, dirty_sits;
>   int free_nids, avail_nids, alloc_nids;
>   int total_count, utilization;
>

Re: [PATCH] samples: replace FSF address with web source in license notices

2017-11-14 Thread Martin Kepplinger


On 15.11.2017 07:29, Greg KH wrote:
> On Tue, Nov 14, 2017 at 10:50:37AM +0100, Martin Kepplinger wrote:
> > A few years ago the FSF moved and "59 Temple Place" is wrong. Having this
> > still in our source files feels old and unmaintained.
> >
> > Let's take the license statement seriously and not confuse users.
> >
> > As https://www.gnu.org/licenses/gpl-howto.html suggests, we replace the
> > postal address with "" in the samples
> > directory.
>
> What would be best is to just put the SPDX single line at the top of the
> files, and then remove this license "boilerplate" entirely.  I've
> started to do that with some subsystems already (drivers/usb/ and
> drivers/tty/ are almost finished, see Linus's tree for details), and
> I've sent out a patch series for drivers/s390/ yesterday if you want to
> see an example of how to do it.
>
> Could you do that here instead of this patch as well?



Is there consensus about this? I'm not a lawyer, but is this clear enough for
users? And what speaks against only adding the new SPDX tag line at the top?

Other than that I don't like mixing // and /**/ comments, it indeed looks
quite clean. Is there consensus about the syntax too?

thanks

   martin

Re: [RFC PATCH v3 for 4.15 08/24] Provide cpu_opv system call

2017-11-14 Thread Michael Kerrisk (man-pages)

Hi Mathieu

On 14 November 2017 at 21:03, Mathieu Desnoyers
 wrote:
> This new cpu_opv system call executes a vector of operations on behalf
> of user-space on a specific CPU with preemption disabled. It is inspired
> from readv() and writev() system calls which take a "struct iovec" array
> as argument.

Do you have a man page for this syscall already?

Thanks,

Michael


> The operations available are: comparison, memcpy, add, or, and, xor,
> left shift, right shift, and mb. The system call receives a CPU number
> from user-space as argument, which is the CPU on which those operations
> need to be performed. All preparation steps such as loading pointers,
> and applying offsets to arrays, need to be performed by user-space
> before invoking the system call. The "comparison" operation can be used
> to check that the data used in the preparation step did not change
> between preparation of system call inputs and operation execution within
> the preempt-off critical section.
>
> The reason why we require all pointer offsets to be calculated by
> user-space beforehand is because we need to use get_user_pages_fast() to
> first pin all pages touched by each operation. This takes care of
> faulting-in the pages. Then, preemption is disabled, and the operations
> are performed atomically with respect to other thread execution on that
> CPU, without generating any page fault.
>
> A maximum limit of 16 operations per cpu_opv syscall invocation is
> enforced, so user-space cannot generate a too long preempt-off critical
> section. Each operation is also limited to a length of PAGE_SIZE bytes,
> meaning that an operation can touch a maximum of 4 pages (memcpy: 2
> pages for source, 2 pages for destination if addresses are not aligned
> on page boundaries). Moreover, a total limit of 4216 bytes is applied
> to operation lengths.
>
> If the thread is not running on the requested CPU, a new
> push_task_to_cpu() is invoked to migrate the task to the requested CPU.
> If the requested CPU is not part of the cpus allowed mask of the thread,
> the system call fails with EINVAL. After the migration has been
> performed, preemption is disabled, and the current CPU number is checked
> again and compared to the requested CPU number. If it still differs, it
> means the scheduler migrated us away from that CPU. Return EAGAIN to
> user-space in that case, and let user-space retry (either requesting the
> same CPU number, or a different one, depending on the user-space
> algorithm constraints).
>
> Signed-off-by: Mathieu Desnoyers 
> CC: "Paul E. McKenney" 
> CC: Peter Zijlstra 
> CC: Paul Turner 
> CC: Thomas Gleixner 
> CC: Andrew Hunter 
> CC: Andy Lutomirski 
> CC: Andi Kleen 
> CC: Dave Watson 
> CC: Chris Lameter 
> CC: Ingo Molnar 
> CC: "H. Peter Anvin" 
> CC: Ben Maurer 
> CC: Steven Rostedt 
> CC: Josh Triplett 
> CC: Linus Torvalds 
> CC: Andrew Morton 
> CC: Russell King 
> CC: Catalin Marinas 
> CC: Will Deacon 
> CC: Michael Kerrisk 
> CC: Boqun Feng 
> CC: linux-...@vger.kernel.org
> ---
>
> Changes since v1:
> - handle CPU hotplug,
> - cleanup implementation using function pointers: We can use function
>   pointers to implement the operations rather than duplicating all the
>   user-access code.
> - refuse device pages: Performing cpu_opv operations on io map'd pages
>   with preemption disabled could generate long preempt-off critical
>   sections, which leads to unwanted scheduler latency. Return EFAULT if
>   a device page is received as parameter
> - restrict op vector to 4216 bytes length sum: Restrict the operation
>   vector to length sum of:
>   - 4096 bytes (typical page size on most architectures, should be
> enough for a string, or structures)
>   - 15 * 8 bytes (typical operations on integers or pointers).
>   The goal here is to keep the duration of preempt off critical section
>   short, so we don't add significant scheduler latency.
> - Add INIT_ONSTACK macro: Introduce the
>   CPU_OP_FIELD_u32_u64_INIT_ONSTACK() macros to ensure that users
>   correctly initialize the upper bits of CPU_OP_FIELD_u32_u64() on their
>   stack to 0 on 32-bit architectures.
> - Add CPU_MB_OP operation:
>   Use-cases with:
>   - two consecutive stores,
>   - a memcpy followed by a store,
>   require a memory barrier before the final store operation. A typical
>   use-case is a store-release on the final store. Given that this is a
>   slow path, just providing an explicit full barrier instruction should
>   be sufficient.
> - Add expect fault field:
>   The use-case of list_pop brings interesting challenges. With rseq, we
>   can use rseq_cmpnev_storeoffp_load(), and therefore load a pointer,
>   compare it against NULL, add an offset, and load the target "next"
>   pointer from the object, all within a single rseq critical section.
>
>   Life is not so easy for cpu_opv in this use-case, mainly because we
>   need to pin all pages we are going to touch in the preempt-off
>   critical section beforehand. So we need to kno

Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again

2017-11-14 Thread Ingo Molnar


* Rafael J. Wysocki  wrote:

> On Wednesday, November 15, 2017 1:06:12 AM CET Linus Torvalds wrote:
> > On Tue, Nov 14, 2017 at 4:04 PM, Linus Torvalds
> >  wrote:
> > > On Tue, Nov 14, 2017 at 3:53 PM, Thomas Gleixner  
> > > wrote:
> > >> Current head + Raphaels patch:
> > >>
> > >> real    0m0.029s
> > >> user    0m0.000s
> > >> sys     0m0.010s
> > >>
> > >> So that patch is actually slower.
> > >
> > > Oh it definitely is expected to be slower, because it does the IPI to
> > > all the cores and actually gets their frequency right.
> > >
> > > It was the old one that we had to revert (because it did so
> > > sequentially) that was really bad, and took something like 2+ seconds
> > > on Ingo's 160-core thing, iirc.
> > 
> > Looked it up. Ingo's machine "only" had 120 cores, and he said
> > 
> > fomalhaut:~> time cat /proc/cpuinfo  >/dev/null
> > > real    0m2.689s
> > 
> > for the bad serial case, so yeah, it looks "a bit" better than it was ;)
> 
> OK, so may I queue it up?
> 
> I don't think I can get that to work substantially faster anyway ...

The new version is OK I suppose:

  Acked-by: Ingo Molnar 

I also think that /proc/cpuinfo is a pretty bad interface for many uses - I 
personally only very rarely need the cpuinfo of _all_ CPUs.

We should eventually have /proc/cpu/N/info or so, so that 99% of the times
cpuinfo is needed to report bugs we can do:

cat /proc/cpu/0/info

With maybe also the following variants:

/proc/cpu/first/
/proc/cpu/last/
/proc/cpu/current/

... to the first/last/current CPUs.

Thanks,

Ingo

Re: [PATCH] gpio: always include linux/gpio/consumer.h in linux/gpio.h

2017-11-14 Thread Linus Walleij

On Tue, Nov 14, 2017 at 12:39 PM, Arnd Bergmann  wrote:

> linux/gpio/consumer.h is a bit odd, it contains definitions for a number
> of the advanced gpio interfaces, in variants for both gpiolib-based
> platforms and those not using gpiolib.
>
> The file gets included implicitly by linux/gpio.h, but only if gpiolib
> is enabled. Driver writers regularly fail to notice this and include
> the top-level linux/gpio.h but use the newer interfaces.
>
> The latest such driver is a new touchscreen driver that produced this
> build failure on an x86 randconfig build:
>
> drivers/input/touchscreen/hideep.c: In function 'hideep_power_on':
> drivers/input/touchscreen/hideep.c:670:3: error: implicit declaration of 
> function 'gpiod_set_value_cansleep'; did you mean 'gpio_set_value_cansleep'? 
> [-Werror=implicit-function-declaration]
>gpiod_set_value_cansleep(ts->reset_gpio, 0);
>
> I don't want to manually add linux/gpio/consumer.h inclusions to each
> such file any more, so let's just include this in linux/gpio.h for everyone.

Consumers should really just use <linux/gpio/consumer.h>
and stop including <linux/gpio.h> at all.

<linux/gpio.h> does not have the producer/consumer split
that the new API has, and the latter was inspired by
 and 
etc.

I.e. the right fix is not just to add #include <linux/gpio/consumer.h>
but also to *delete* #include <linux/gpio.h>.

The only time a driver needs both includes is when it
uses the legacy GPIO API and the new consumer API
at the same time, or when it both produces and consumes
GPIOs (as some GPIO drivers do).

I don't know what to do besides documenting it, and it is
documented clearly in:
Documentation/gpio/consumer.txt

Apparently people write their GPIO drivers without reading
this documentation, and just include random headers or
copy-paste.

I am trying to make more drivers good examples, one at a
time, starting with the most important and used ones.
drivers/i2c/busses/i2c-gpio.c is the most recent cleanup.

We can't delete the inclusion of <linux/gpio/consumer.h> from
<linux/gpio.h> however much we would like to, because it breaks
a ton of legacy code. Instead we move one step at a time.

What we *could* do is try to emit a build warning for drivers
that use the implicit include of <linux/gpio/consumer.h>
from <linux/gpio.h>.

Or add some code to checkpatch to scream about it.

Ideas?
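
One rough way to picture such a check: flag a source file that calls gpiod_*() functions and includes <linux/gpio.h>, but never includes <linux/gpio/consumer.h> itself. checkpatch is a Perl script, so the C function below is not a proposed patch, only a sketch of the heuristic. The function name is invented for illustration, and plain substring matching will misfire on comments and strings.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true when `src` uses the descriptor API (any "gpiod_" symbol) and
 * pulls in <linux/gpio.h>, but never includes <linux/gpio/consumer.h>
 * itself. Plain substring matching: a rough heuristic, not a C parser.
 * Hypothetical helper, not part of checkpatch or the kernel.
 */
bool relies_on_implicit_consumer_include(const char *src)
{
	bool uses_gpiod = strstr(src, "gpiod_") != NULL;
	bool has_legacy = strstr(src, "#include <linux/gpio.h>") != NULL;
	bool has_consumer =
		strstr(src, "#include <linux/gpio/consumer.h>") != NULL;

	return uses_gpiod && has_legacy && !has_consumer;
}
```

A file like the hideep driver from the build failure above would be flagged. A file that only uses the legacy gpio_set_value_cansleep() would not, because that name does not contain "gpiod_".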

Yours,
Linus Walleij

Re: [PATCH v3 4/6] PM / core: Add helpers for subsystem callback selection

2017-11-14 Thread Ulf Hansson

On 12 November 2017 at 01:42, Rafael J. Wysocki  wrote:
> From: Rafael J. Wysocki 
>
> Add helper routines to find and return a suitable subsystem callback
> during the "noirq" phases of system suspend/resume (or analogous)
> transitions as well as during the "late" phase of system suspend and
> the "early" phase of system resume (or analogous) transitions.
>
> The helpers will be called from additional sites going forward.
>
> Signed-off-by: Rafael J. Wysocki 

With a minor nitpick, see below, feel free to add:

Reviewed-by: Ulf Hansson 

> ---
>
> v2 -> v3: No changes.
>
> ---
>  drivers/base/power/main.c |  196 
> +++---
>  1 file changed, 136 insertions(+), 60 deletions(-)
>
> Index: linux-pm/drivers/base/power/main.c
> ===
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -525,6 +525,14 @@ static void dpm_watchdog_clear(struct dp
>  #define dpm_watchdog_clear(x)
>  #endif
>
> +static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
> +pm_message_t state,
> +const char **info_p);
> +
> +static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
> +   pm_message_t state,
> +   const char **info_p);
> +

There is no need to declare these functions.

Perhaps a following patch in the series needs them, but then that
change should add them or, even better (in my opinion), just move the
implementations and avoid the declarations altogether.
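
To illustrate the point about ordering, here is a small userspace mock-up, not the code from the patch. All types and names (mock_device, mock_pm_ops, the demo callbacks) are invented, and the selection order of power domain, then type, then class, then bus is an assumption about how such a helper picks a callback. Because the helper is defined above its only caller, no forward declaration is needed.

```c
#include <stddef.h>

typedef int (*pm_callback_t)(void *dev);

/* Mock stand-ins for the kernel structures; names are hypothetical. */
struct mock_pm_ops {
	pm_callback_t suspend_noirq;
};

struct mock_device {
	const struct mock_pm_ops *pm_domain;
	const struct mock_pm_ops *type;
	const struct mock_pm_ops *class_;
	const struct mock_pm_ops *bus;
};

/* Demo callbacks so the sketch can be exercised. */
int demo_bus_suspend_noirq(void *dev) { (void)dev; return 1; }
int demo_domain_suspend_noirq(void *dev) { (void)dev; return 2; }

/*
 * Pick the first subsystem level that provides PM ops (assumed order:
 * power domain, type, class, bus). Defined before its caller, so no
 * separate declaration is required.
 */
pm_callback_t subsys_suspend_noirq_cb(const struct mock_device *dev,
				      const char **info_p)
{
	const struct mock_pm_ops *ops;
	const char *info;

	if (dev->pm_domain) {
		ops = dev->pm_domain;
		info = "noirq power domain ";
	} else if (dev->type) {
		ops = dev->type;
		info = "noirq type ";
	} else if (dev->class_) {
		ops = dev->class_;
		info = "noirq class ";
	} else if (dev->bus) {
		ops = dev->bus;
		info = "noirq bus ";
	} else {
		return NULL;
	}

	if (info_p)
		*info_p = info;
	return ops->suspend_noirq;
}

/* The caller comes after the helper and simply uses it. */
int device_suspend_noirq(struct mock_device *dev)
{
	const char *info = NULL;
	pm_callback_t cb = subsys_suspend_noirq_cb(dev, &info);

	return cb ? cb(dev) : 0;
}
```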

[...]

Kind regards
Uffe

[PATCH v2] arm64: perf: remove unsupported events for Cortex-A73

2017-11-14 Thread Xu YiPing

bus access read/write events are not supported in A73, based on the
Cortex-A73 TRM r0p2, section 11.9 Events (pages 11-457 to 11-460).

Fixes: 5561b6c5e981 "arm64: perf: add support for Cortex-A73"
Signed-off-by: Xu YiPing 
---
 arch/arm64/kernel/perf_event.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 9eaef51..3affca3 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -262,12 +262,6 @@ static const unsigned 
armv8_a73_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 
[C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]  = 
ARMV8_IMPDEF_PERFCTR_L1D_CACHE_RD,
[C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)] = 
ARMV8_IMPDEF_PERFCTR_L1D_CACHE_WR,
-
-   [C(NODE)][C(OP_READ)][C(RESULT_ACCESS)] = 
ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_RD,
-   [C(NODE)][C(OP_WRITE)][C(RESULT_ACCESS)] = 
ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_WR,
-
-   [C(NODE)][C(OP_READ)][C(RESULT_ACCESS)] = 
ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_RD,
-   [C(NODE)][C(OP_WRITE)][C(RESULT_ACCESS)] = 
ARMV8_IMPDEF_PERFCTR_BUS_ACCESS_WR,
 };
 
 static const unsigned armv8_thunder_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
-- 
2.7.4

Re: [PATCH v3 1/3] leds: core: Introduce generic pattern interface

2017-11-14 Thread Greg KH

On Tue, Nov 14, 2017 at 11:13:43PM -0800, Bjorn Andersson wrote:
> Some LED controllers have support for autonomously controlling
> brightness over time, according to some preprogrammed pattern or
> function.
> 
> This adds a new optional operator that LED class drivers can implement
> if they support such functionality as well as a new device attribute to
> configure the pattern for a given LED.
> 
> Signed-off-by: Bjorn Andersson 
> ---
> 
> Changes since v2:
> - None
> 
> Changes since v1:
> - New patch, based on discussions following v1
> 
>  Documentation/ABI/testing/sysfs-class-led |  20 
>  drivers/leds/led-class.c  | 150 
> ++
>  include/linux/leds.h  |  21 +
>  3 files changed, 191 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-led 
> b/Documentation/ABI/testing/sysfs-class-led
> index 5f67f7ab277b..74a7f5b1f89b 100644
> --- a/Documentation/ABI/testing/sysfs-class-led
> +++ b/Documentation/ABI/testing/sysfs-class-led
> @@ -61,3 +61,23 @@ Description:
>   gpio and backlight triggers. In case of the backlight trigger,
>   it is useful when driving a LED which is intended to indicate
>   a device in a standby like state.
> +
> +What:/sys/class/leds//pattern
> +Date:July 2017

That was many months ago :)

> +KernelVersion:   4.14

And that kernel version is long since released :)

thanks,

greg k-h

Re: [PATCH 1/2] bpf: add a bpf_override_function helper

2017-11-14 Thread Ingo Molnar


* Josef Bacik  wrote:

> > > Then 'not crashing kernel' requirement will be preserved.
> > > btrfs or whatever else we will be testing with override_return
> > > will be functioning in 'stress test' mode and if bpf program
> > > is not careful and returns error all the time then one particular
> > > subsystem (like btrfs) will not be functional, but the kernel
> > > will not be crashing.
> > > Thoughts?
> > 
> > Yeah, that approach sounds much better to me: it should fundamentally be
> > opt-in, and it should be documented that it should not be possible to crash
> > the kernel via changing the return value.
> >
> > I'd make it a bit clearer in the naming what the purpose of the annotation
> > is: for example would BPF_ALLOW_ERROR_INJECTION() work for you guys? I.e. I
> > think it should generally be used to change actual integer error values - or
> > at most user pointers, but not kernel pointers. Not enforced in a type safe
> > manner, but the naming should give enough hints?
> >
> > Such return-injection BPF programs can still totally confuse user-space
> > obviously: for example returning an IO error could corrupt application data -
> > but that's the nature of such facilities and similar results could already be
> > achieved via ptrace as well. But the result of a BPF program should never be
> > _worse_ than ptrace, in terms of kernel integrity.
> >
> > Note that with such a safety mechanism in place no kernel message has to be
> > generated either I suspect.
> >
> > In any case, my NAK would be lifted with such an approach.
>
> I'm going to want to annotate kmalloc, so it's still going to be possible to
> make things go horribly wrong, is this still going to be ok with you?
> Obviously I want to use this for btrfs, but really what I used this for
> originally was an NBD problem where I had to do special handling for getting
> EINTR back from kernel_sendmsg, which was a pain to trigger properly without
> this patch.  Opt-in is going to make it so we're just flagging important
> function calls anyway because those are the ones that fail rarely and that we
> want to test, which puts us back in the same situation you are worried about,
> so it doesn't make much sense to me to do it this way.  Thanks,

I suppose - let's see how it goes? The important factor is the opt-in aspect I 
believe.

Technically the kernel should never crash on a kmalloc() failure either, although
obviously things can go horribly wrong from user-space's perspective.
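
As a rough userspace model of what "opt-in" means here: a return value can only be overridden for functions that were explicitly listed, and every other request is refused. This is only a sketch. The table, the function names and the string-based lookup are all assumptions for illustration; how an annotation such as BPF_ALLOW_ERROR_INJECTION() would actually be implemented is not decided in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical opt-in table. In the scheme discussed in this thread a
 * function would be added by an annotation next to its definition; here
 * the names are simply listed by hand.
 */
static const char *const error_injection_allowed[] = {
	"kernel_sendmsg",
	"kmalloc",
};

/* True only for functions that were explicitly opted in. */
bool error_injection_permitted(const char *func)
{
	size_t n = sizeof(error_injection_allowed) /
		   sizeof(error_injection_allowed[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(error_injection_allowed[i], func) == 0)
			return true;
	return false;
}

/*
 * Request that `func` return `errval`. The request is refused (-1) unless
 * the function opted in; on success (0) the value is stored in *ret.
 */
int override_return(const char *func, long errval, long *ret)
{
	if (!error_injection_permitted(func))
		return -1;
	*ret = errval;
	return 0;
}
```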

Thanks,

Ingo

Re: [GIT PULL] sound updates for 4.15-rc1

2017-11-14 Thread Takashi Iwai

[Adding more people and alsa-devel to Cc]

On Wed, 15 Nov 2017 03:40:09 +0100,
Linus Torvalds wrote:
> 
> On Tue, Nov 14, 2017 at 6:51 AM, Takashi Iwai  wrote:
> >
> > please pull sound updates for v4.15-rc1 from:
> 
> Hmm. Making "oldconfig" on my laptop with this, my
> SND_SOC_INTEL_SKYLAKE went away.
> 
> And the reason seems to be that new SND_SOC_INTEL_SST_TOPLEVEL config option.
> 
> Which has no help associated with it.
> 
> This is not a friendly thing to do to people. It basically breaks
> existing setups for no documented reason, and with no explanation.
> 
> Please fix the config situation. At the very least, add documentation.

Sorry about that.  I saw Vinod already submitted a patch to add the
help text to CONFIG_SND_SOC_INTEL_SST_TOPLEVEL, so at least that fix
should go in soon.

But now looking at these changes, I noticed a few things, too:

- With the introduction of SND_SOC_INTEL_SST_TOPLEVEL, keeping
  SND_SOC_INTEL_COMMON and SND_SOC_INTEL_MACH individually doesn't
  make much sense.  They can be dropped and replaced with
  SND_SOC_INTEL_SST_TOPLEVEL as a further cleanup.

- ... or, make SND_SOC_INTEL_SST_TOPLEVEL=y as default, if this is
  considered to be a top-level filter config (like the network vendor
  kconfig items).  In that case, the reverse-selection of
  SND_SOC_INTEL_COMMON and SND_SOC_INTEL_MACH should be avoided, but
  they should be selected from the actual drivers instead.

And I believe there are a few more possible cleanups / fixes in the
messy Intel ASoC Kconfigs.  For example, SND_SOC_INTEL_SST is almost
always set.  The only exception is via SND_SST_ATOM_HIFI2_PLATFORM.
But all machine drivers using Atom Hifi2 do set SND_SST_IPC_ACPI,
which also requires SND_SOC_INTEL_SST.

Further looking at this, we see that the only entry that does *not*
require SND_SOC_INTEL_SST is the case with SND_MFLD_MACHINE in
sound/soc/intel/boards.  And now the more interesting part -- there is no
corresponding entry in the Makefile.  That is, this kconfig is effectively
dead!  The source code mfld_machine.c exists, but it's just a
placeholder now.  The code was supposed to be integrated into the atom
directory by the commit b97169da0699, but it seems the update was
forgotten.

Hmm...

Takashi

[RFC PATCH 5/5] ARM: dts: rockchip: add isp node for rk3288

2017-11-14 Thread Jacob Chen

From: Jacob Chen 

rk3288 has an embedded 13M ISP and a MIPI-CSI2 interface.

Signed-off-by: Jacob Chen 
---
 arch/arm/boot/dts/rk3288.dtsi | 24 
 1 file changed, 24 insertions(+)

diff --git a/arch/arm/boot/dts/rk3288.dtsi b/arch/arm/boot/dts/rk3288.dtsi
index 60658c5c9a48..f9a81137146d 100644
--- a/arch/arm/boot/dts/rk3288.dtsi
+++ b/arch/arm/boot/dts/rk3288.dtsi
@@ -962,6 +962,30 @@
status = "disabled";
};
 
+   isp: isp@ff91 {
+   compatible = "rockchip,rk3288-cif-isp";
+   reg = <0x0 0xff91 0x0 0x4000>;
+   interrupts = ;
+   clocks = <&cru SCLK_ISP>, <&cru ACLK_ISP>,
+<&cru HCLK_ISP>, <&cru PCLK_ISP_IN>,
+<&cru SCLK_ISP_JPE>;
+   clock-names = "clk_isp", "aclk_isp",
+ "hclk_isp", "pclk_isp_in",
+ "sclk_isp_jpe";
+   assigned-clocks = <&cru SCLK_ISP>;
+   assigned-clock-rates = <4>;
+   power-domains = <&power RK3288_PD_VIO>;
+   iommus = <&isp_mmu>;
+   status = "disabled";
+   isp_mipi_phy_rx0: isp-mipi-phy-rx0 {
+   compatible = "rockchip,rk3288-mipi-dphy";
+   rockchip,grf = <&grf>;
+   clocks = <&cru SCLK_MIPIDSI_24M>, <&cru PCLK_MIPI_CSI>;
+   clock-names = "dphy-ref", "pclk";
+   status = "disabled";
+   };
+   };
+
isp_mmu: iommu@ff914000 {
compatible = "rockchip,iommu";
reg = <0x0 0xff914000 0x0 0x100>, <0x0 0xff915000 0x0 0x100>;
-- 
2.14.2

[RFC PATCH 4/5] arm64: dts: rockchip: add isp0 node for rk3399

2017-11-14 Thread Jacob Chen

From: Shunqian Zheng 

rk3399 has two ISPs, but we haven't tested isp1, so just add isp0 for now.

Signed-off-by: Shunqian Zheng 
Signed-off-by: Jacob Chen 
---
 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index ab7629c5b856..f696e62d09dd 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -1577,6 +1577,32 @@
status = "disabled";
};
 
+   isp0: isp0@ff91 {
+   compatible = "rockchip,rk3399-cif-isp";
+   reg = <0x0 0xff91 0x0 0x4000>;
+   interrupts = ;
+   clocks = <&cru SCLK_ISP0>,
+<&cru ACLK_ISP0>, <&cru ACLK_ISP0_WRAPPER>,
+<&cru HCLK_ISP0>, <&cru HCLK_ISP0_WRAPPER>;
+   clock-names = "clk_isp",
+ "aclk_isp", "aclk_isp_wrap",
+ "hclk_isp", "hclk_isp_wrap";
+   power-domains = <&power RK3399_PD_ISP0>;
+   iommus = <&isp0_mmu>;
+   status = "disabled";
+
+   isp_mipi_dphy_rx0: isp-mipi-dphy-rx0 {
+   compatible = "rockchip,rk3399-mipi-dphy";
+   rockchip,grf = <&grf>;
+   clocks = <&cru SCLK_MIPIDPHY_REF>,
+<&cru SCLK_DPHY_RX0_CFG>,
+<&cru PCLK_VIO_GRF>;
+   clock-names = "dphy-ref", "dphy-cfg", "grf";
+   power-domains = <&power RK3399_PD_VIO>;
+   status = "disabled";
+   };
+   };
+
isp0_mmu: iommu@ff914000 {
compatible = "rockchip,iommu";
reg = <0x0 0xff914000 0x0 0x100>, <0x0 0xff915000 0x0 0x100>;
-- 
2.14.2

[RFC PATCH 2/5] media: rkisp1: Add user space ABI definitions

2017-11-14 Thread Jacob Chen

From: Jeffy Chen 

Add the header for userspace

Signed-off-by: Jeffy Chen 
Signed-off-by: Jacob Chen 
---
 include/uapi/linux/rkisp1-config.h | 554 +
 1 file changed, 554 insertions(+)
 create mode 100644 include/uapi/linux/rkisp1-config.h

diff --git a/include/uapi/linux/rkisp1-config.h 
b/include/uapi/linux/rkisp1-config.h
new file mode 100644
index ..a801fbc9ef47
--- /dev/null
+++ b/include/uapi/linux/rkisp1-config.h
@@ -0,0 +1,554 @@
+/*
+ * Rockchip isp1 driver
+ *
+ * Copyright (C) 2017 Rockchip Electronics Co., Ltd.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UAPI_RKISP1_CONFIG_H
+#define _UAPI_RKISP1_CONFIG_H
+
+#include 
+#include 
+
+#define CIFISP_MODULE_DPCC  (1 << 0)
+#define CIFISP_MODULE_BLS   (1 << 1)
+#define CIFISP_MODULE_SDG   (1 << 2)
+#define CIFISP_MODULE_HST   (1 << 3)
+#define CIFISP_MODULE_LSC   (1 << 4)
+#define CIFISP_MODULE_AWB_GAIN  (1 << 5)
+#define CIFISP_MODULE_FLT   (1 << 6)
+#define CIFISP_MODULE_BDM   (1 << 7)
+#define CIFISP_MODULE_CTK   (1 << 8)
+#define CIFISP_MODULE_GOC   (1 << 9)
+#define CIFISP_MODULE_CPROC (1 << 10)
+#define CIFISP_MODULE_AFC   (1 << 11)
+#define CIFISP_MODULE_AWB   (1 << 12)
+#define CIFISP_MODULE_IE    (1 << 13)
+#define CIFISP_MODULE_AEC   (1 << 14)
+#define CIFISP_MODULE_WDR   (1 << 15)
+#define CIFISP_MODULE_DPF   (1 << 16)
+#define CIFISP_MODULE_DPF_STRENGTH  (1 << 17)
+
+#define CIFISP_CTK_COEFF_MAX    0x100
+#define CIFISP_CTK_OFFSET_MAX   0x800
+
+#define CIFISP_AE_MEAN_MAX  25
+#define CIFISP_HIST_BIN_N_MAX   16
+#define CIFISP_AFM_MAX_WINDOWS  3
+#define CIFISP_DEGAMMA_CURVE_SIZE   17
+
+#define CIFISP_BDM_MAX_TH   0xFF
+
+/* maximum value for horizontal start address */
+#define CIFISP_BLS_START_H_MAX 0x0FFF
+/* maximum value for horizontal stop address */
+#define CIFISP_BLS_STOP_H_MAX  0x0FFF
+/* maximum value for vertical start address */
+#define CIFISP_BLS_START_V_MAX 0x0FFF
+/* maximum value for vertical stop address */
+#define CIFISP_BLS_STOP_V_MAX  0x0FFF
+/* maximum is 2^18 = 262144*/
+#define CIFISP_BLS_SAMPLES_MAX 0x0012
+/* maximum value for fixed black level */
+#define CIFISP_BLS_FIX_SUB_MAX 0x0FFF
+/* minimum value for fixed black level */
+#define CIFISP_BLS_FIX_SUB_MIN 0xF000
+/* 13 bit range (signed)*/
+#define CIFISP_BLS_FIX_MASK 0x1FFF
+/* AWB */
+#define CIFISP_AWB_MAX_GRID 1
+#define CIFISP_AWB_MAX_FRAMES  7
+
+/* Gamma out*/
+/* Maximum number of color samples supported */
+#define CIFISP_GAMMA_OUT_MAX_SAMPLES   17
+
+/* LSC */
+#define CIFISP_LSC_GRAD_TBL_SIZE   8
+#define CIFISP_LSC_SIZE_TBL_SIZE   8
+/*
+ * The following matches the tuning process,
+ * not the max capabilities of the chip.
+ * Last value unused.
+ */
+#define CIFISP_LSC_DATA_TBL_SIZE   290
+/* HIST */
+/* Last 3 values unused. */
+#define CIFISP_HISTOGRAM_WEIGHT_GRIDS_SIZE 28
+
+/* DPCC */
+#define CIFISP_DPCC_METHODS_MAX   3
+
+/* DPF */
+#define CIFISP_DPF_MAX_NLF_COEFFS  17
+#define CIFISP_DPF_MAX_SPATIAL_COEFFS  6
+
+/* measurement types */
+#define CIFISP_STAT_AWB   (1 << 0)
+#define CIFISP_STAT_AUTOEXP   (1 << 1)
+#define CIFISP_STAT_AFM_FIN   (1 << 2)
+#define CIFISP_STAT_HIST

[RFC PATCH 1/5] media: videodev2.h, v4l2-ioctl: add rkisp1 meta buffer format

2017-11-14 Thread Jacob Chen

From: Shunqian Zheng 

Add the Rockchip ISP1 specific processing parameter format
V4L2_META_FMT_RK_ISP1_PARAMS and metadata format
V4L2_META_FMT_RK_ISP1_STAT_3A for 3A.

Signed-off-by: Shunqian Zheng 
Signed-off-by: Jacob Chen 
---
 drivers/media/v4l2-core/v4l2-ioctl.c | 2 ++
 include/uapi/linux/videodev2.h   | 4 
 2 files changed, 6 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
b/drivers/media/v4l2-core/v4l2-ioctl.c
index d6587b3ec33e..0604ae9ea444 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1252,6 +1252,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
case V4L2_TCH_FMT_TU08: descr = "8-bit unsigned touch data"; 
break;
case V4L2_META_FMT_VSP1_HGO:descr = "R-Car VSP1 1-D Histogram"; 
break;
case V4L2_META_FMT_VSP1_HGT:descr = "R-Car VSP1 2-D Histogram"; 
break;
+   case V4L2_META_FMT_RK_ISP1_PARAMS:  descr = "Rockchip ISP1 3A 
params"; break;
+   case V4L2_META_FMT_RK_ISP1_STAT_3A: descr = "Rockchip ISP1 3A 
statistics"; break;
 
default:
/* Compressed formats */
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index e507b29ba1e0..14efa6513126 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -690,6 +690,10 @@ struct v4l2_pix_format {
 #define V4L2_META_FMT_VSP1_HGOv4l2_fourcc('V', 'S', 'P', 'H') /* R-Car 
VSP1 1-D Histogram */
 #define V4L2_META_FMT_VSP1_HGTv4l2_fourcc('V', 'S', 'P', 'T') /* R-Car 
VSP1 2-D Histogram */
 
+/* Vendor specific - used for RK_ISP1 camera sub-system */
+#define V4L2_META_FMT_RK_ISP1_PARAMS   v4l2_fourcc('R', 'K', '1', 'P') /* 
Rockchip ISP1 params */
+#define V4L2_META_FMT_RK_ISP1_STAT_3A  v4l2_fourcc('R', 'K', '1', 'S') /* 
Rockchip ISP1 3A statistics */
+
 /* priv field value to indicates that subsequent fields are valid. */
 #define V4L2_PIX_FMT_PRIV_MAGIC0xfeedcafe
 
-- 
2.14.2

[PATCH] rtc: Add tracepoints for RTC system

2017-11-14 Thread Baolin Wang

Tracepoints that track RTC actions are helpful when debugging an RTC
driver. In the sample below we set and read the RTC time, then set 2 alarms,
and get the following trace logs:

set/read RTC time:
kworker/1:1-586   [001]  21.826112: rtc_set_time: 2017-11-10 08:13:00 UTC (1510301580)
kworker/1:1-586   [001]  21.826174: rtc_read_time: 2017-11-10 08:13:00 UTC (1510301580)

set the first alarm timer:
kworker/1:1-586   [001]  21.841098: rtc_timer_enqueue: RTC timer:(ffc15ad913c8) 2017-11-10 08:15:00 UTC (1510301700)
kworker/1:1-586   [001]  22.009424: rtc_set_alarm: 2017-11-10 08:15:00 UTC (1510301700)

set the second alarm timer:
kworker/1:1-586   [001]  22.181304: rtc_timer_enqueue: RTC timer:(ff80088e6430) 2017-11-10 08:17:00 UTC (1510301820)

the first alarm timer expired:
kworker/0:1-67    [000]  145.156226: rtc_timer_dequeue: RTC timer:(ffc15ad913c8) 2017-11-10 08:15:00 UTC (1510301700)
kworker/0:1-67    [000]  145.156235: rtc_timer_fired: RTC timer:(ffc15ad913c8) 2017-11-10 08:15:00 UTC (1510301700)
kworker/0:1-67    [000]  145.173137: rtc_set_alarm: 2017-11-10 08:17:00 UTC (1510301820)

the second alarm timer expired:
kworker/0:1-67    [000]  269.102985: rtc_timer_dequeue: RTC timer:(ff80088e6430) 2017-11-10 08:17:00 UTC (1510301820)
kworker/0:1-67    [000]  269.102992: rtc_timer_fired: RTC timer:(ff80088e6430) 2017-11-10 08:17:00 UTC (1510301820)

disable alarm irq:
kworker/0:1-67    [000]  269.103098: rtc_alarm_irq_enable: disable RTC alarm IRQ
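
Each trace line prints the broken-down UTC time next to the seconds since the Unix epoch, so the two can be cross-checked. The snippet below is an independent userspace sketch of that conversion (the kernel has its own helper, rtc_tm_to_time64(), for this). The function names here are made up for illustration, and the day-count arithmetic is the well-known civil-calendar formula rather than anything taken from this patch.

```c
#include <stdint.h>

/* Days between 1970-01-01 and the given proleptic Gregorian date. */
int64_t days_from_civil(int64_t year, int month, int day)
{
	/* Treat January and February as months 11 and 12 of the prior year. */
	int64_t y = year - (month <= 2);
	int64_t era = (y >= 0 ? y : y - 399) / 400;
	int64_t yoe = y - era * 400;			/* [0, 399] */
	int mp = month > 2 ? month - 3 : month + 9;	/* March == 0 */
	int64_t doy = (153 * mp + 2) / 5 + day - 1;	/* [0, 365] */
	int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

	return era * 146097 + doe - 719468;
}

/* Seconds since the Unix epoch for a UTC date and time. */
int64_t utc_to_epoch(int64_t year, int month, int day,
		     int hour, int min, int sec)
{
	return days_from_civil(year, month, day) * 86400 +
	       hour * 3600 + min * 60 + sec;
}
```

With these helpers, utc_to_epoch(2017, 11, 10, 8, 13, 0) gives 1510301580, matching the value printed by rtc_set_time above, and the two alarm times are 120 and 240 seconds later.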

Signed-off-by: Baolin Wang 
---
 drivers/rtc/interface.c|   46 ++
 include/trace/events/rtc.h |  215 
 2 files changed, 261 insertions(+)
 create mode 100644 include/trace/events/rtc.h

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 8cec9a0..cdd3ac8 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -17,6 +17,9 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include 
+
 static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer);
 static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer);
 
@@ -53,6 +56,9 @@ int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm)
 
err = __rtc_read_time(rtc, tm);
mutex_unlock(&rtc->ops_lock);
+
+   if (!err)
+   trace_rtc_read_time(tm);
return err;
 }
 EXPORT_SYMBOL_GPL(rtc_read_time);
@@ -87,6 +93,9 @@ int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm)
mutex_unlock(&rtc->ops_lock);
/* A timer might have just expired */
schedule_work(&rtc->irqwork);
+
+   if (!err)
+   trace_rtc_set_time(tm);
return err;
 }
 EXPORT_SYMBOL_GPL(rtc_set_time);
@@ -119,6 +128,9 @@ static int rtc_read_alarm_internal(struct rtc_device *rtc, 
struct rtc_wkalrm *al
}
 
mutex_unlock(&rtc->ops_lock);
+
+   if (!err)
+   trace_rtc_read_alarm(&alarm->time);
return err;
 }
 
@@ -316,6 +328,8 @@ int rtc_read_alarm(struct rtc_device *rtc, struct 
rtc_wkalrm *alarm)
}
mutex_unlock(&rtc->ops_lock);
 
+   if (!err)
+   trace_rtc_read_alarm(&alarm->time);
return err;
 }
 EXPORT_SYMBOL_GPL(rtc_read_alarm);
@@ -352,6 +366,8 @@ static int __rtc_set_alarm(struct rtc_device *rtc, struct 
rtc_wkalrm *alarm)
else
err = rtc->ops->set_alarm(rtc->dev.parent, alarm);
 
+   if (!err)
+   trace_rtc_set_alarm(&alarm->time);
return err;
 }
 
@@ -406,6 +422,8 @@ int rtc_initialize_alarm(struct rtc_device *rtc, struct 
rtc_wkalrm *alarm)
 
rtc->aie_timer.enabled = 1;
timerqueue_add(&rtc->timerqueue, &rtc->aie_timer.node);
+   trace_rtc_timer_enqueue(&rtc->aie_timer,
+   rtc_ktime_to_tm(rtc->aie_timer.node.expires));
}
mutex_unlock(&rtc->ops_lock);
return err;
@@ -435,6 +453,9 @@ int rtc_alarm_irq_enable(struct rtc_device *rtc, unsigned 
int enabled)
err = rtc->ops->alarm_irq_enable(rtc->dev.parent, enabled);
 
mutex_unlock(&rtc->ops_lock);
+
+   if (!err)
+   trace_rtc_alarm_irq_enable(enabled);
return err;
 }
 EXPORT_SYMBOL_GPL(rtc_alarm_irq_enable);
@@ -709,6 +730,9 @@ int rtc_irq_set_state(struct rtc_device *rtc, struct 
rtc_task *task, int enabled
rtc->pie_enabled = enabled;
}
spin_unlock_irqrestore(&rtc->irq_task_lock, flags);
+
+   if (!err)
+   trace_rtc_irq_set_state(enabled);
return err;
 }
 EXPORT_SYMBOL_GPL(rtc_irq_set_state);
@@ -745,6 +769,9 @@ int rtc_irq_set_freq(struct rtc_device *rtc, struct 
rtc_task *task, int freq)
}
}
spin_unlock_irqrestore(&rtc->irq_task_lock, flags);
+
+   if (!err)
+   trace_rtc_irq_set_freq(freq);
return err;
 }
 E

[RFC PATCH 0/5] Rockchip ISP1 Driver

2017-11-14 Thread Jacob Chen

This patch series adds an ISP (camera) v4l2 driver for the Rockchip rk3288/rk3399 SoCs.

TODO:
  - Thomas is rewriting the binding code between isp, phy and sensors; I hope we
    could get suggestions.

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/768633/2
rules:
  - There are many mipi interfaces ("rx0", "dxrx0") in the SoC (they could
    actually also be parallel interfaces), and the isp can decide which one
    will be used.
  - Sometimes there will be more than one sensor on a mipi phy; the software
    should decide which one is used (media link).
  - rk3399 has two isps.
  - Add a dummy buffer (dma_alloc_coherent) so the driver won't hold a buffer.
  - Finish all TODO comments (mostly about hardware) in the driver.

To help with a quick review, I have pushed the source code to my GitHub.
  
https://github.com/wzyy2/linux/tree/rkisp1/drivers/media/platform/rockchip/isp1

Below is some information about the driver/hardware:

Rockchip ISP1 has many hardware blocks (simplified):

  MIPI  --> ISP --> DCrop(Mainpath) --> RSZ(Mainpath) --> DMA(Mainpath)
  DMA-Input --> --> DCrop(Selfpath) --> RSZ(Selfpath) --> DMA(Selfpath);)

(Actually the TRM (rk3288, isp) can be found online, and it contains
more detailed block diagrams ;-P)

The functions of each hardware block:

  Mainpath : up to 4k resolution, supports raw/yuv formats
  Selfpath : up to 1080p, supports rotation, supports rgb/yuv formats
  RSZ: scaling 
  DCrop: crop
  ISP: 3A, Color processing, Crop
  MIPI: MIPI Camera interface

Media pipelines:

  Mainpath, Selfpath <-- ISP subdev <-- MIPI  <-- Sensor
  3A stats   <--<-- 3A params

Code structure:

  capture.c : Mainpath, Selfpath, RSZ, DCROP : capture device.
  rkisp1.c : ISP : v4l2 sub-device.
  isp_params.c : 3A params : output device.
  isp_stats.c : 3A stats : capture device.
  mipi_dphy_sy.c : MIPI : separate v4l2 sub-device.

Usage:
  ChromiumOS:
use the v4l2-ctl command below to capture frames.

  v4l2-ctl --verbose -d /dev/video4 --stream-mmap=2
  --stream-to=/tmp/stream.out --stream-count=60 --stream-poll

use the command below to play back the video on your PC.

  mplayer /tmp/stream.out -loop 0 --demuxer=rawvideo
  --rawvideo=w=800:h=600:size=$((800*600*2)):format=yuy2
or
  mplayer ./stream.out -loop 0 -demuxer rawvideo -rawvideo
  w=800:h=600:size=$((800*600*2)):format=yuy2

  Linux:
use the rkcamsrc gstreamer plugin (just a modified v4l2src) to preview.

  gst-launch-1.0 rkcamsrc device=/dev/video0 io-mode=4 disable-3A=true !
  videoconvert ! video/x-raw,format=NV12,width=640,height=480 ! kmssink
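
The size=$((800*600*2)) argument in the mplayer commands above is the byte size of one packed YUY2 frame, at two bytes per pixel. The sketch below redoes that arithmetic, assuming the capture really is 800x600 YUY2 as the mplayer arguments imply. The helper names are invented for illustration.

```c
#include <stddef.h>

/* Bytes in one packed YUY2 (YUYV 4:2:2) frame: two bytes per pixel. */
size_t yuy2_frame_bytes(size_t width, size_t height)
{
	return width * height * 2;
}

/* Number of whole frames contained in a raw dump of `file_bytes` bytes. */
size_t yuy2_frame_count(size_t file_bytes, size_t width, size_t height)
{
	size_t frame = yuy2_frame_bytes(width, height);

	return frame ? file_bytes / frame : 0;
}
```

One 800x600 frame is 960000 bytes, so a dump made with --stream-count=60 should be 57600000 bytes if every frame was written in full. A file size that is not a multiple of the frame size would suggest a different format or resolution than assumed.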

Jacob Chen (2):
  media: rkisp1: add rockchip isp1 driver
  ARM: dts: rockchip: add isp node for rk3288

Jeffy Chen (1):
  media: rkisp1: Add user space ABI definitions

Shunqian Zheng (2):
  media: videodev2.h, v4l2-ioctl: add rkisp1 meta buffer format
  arm64: dts: rockchip: add isp0 node for rk3399

 arch/arm/boot/dts/rk3288.dtsi  |   24 +
 arch/arm64/boot/dts/rockchip/rk3399.dtsi   |   26 +
 drivers/media/platform/Kconfig |   10 +
 drivers/media/platform/Makefile|1 +
 drivers/media/platform/rockchip/isp1/Makefile  |9 +
 drivers/media/platform/rockchip/isp1/capture.c | 1678 
 drivers/media/platform/rockchip/isp1/capture.h |   46 +
 drivers/media/platform/rockchip/isp1/common.h  |  327 
 drivers/media/platform/rockchip/isp1/dev.c |  728 +
 drivers/media/platform/rockchip/isp1/isp_params.c  | 1556 ++
 drivers/media/platform/rockchip/isp1/isp_params.h  |   81 +
 drivers/media/platform/rockchip/isp1/isp_stats.c   |  537 +++
 drivers/media/platform/rockchip/isp1/isp_stats.h   |   81 +
 .../media/platform/rockchip/isp1/mipi_dphy_sy.c|  619 
 .../media/platform/rockchip/isp1/mipi_dphy_sy.h|   42 +
 drivers/media/platform/rockchip/isp1/regs.c|  251 +++
 drivers/media/platform/rockchip/isp1/regs.h| 1578 ++
 drivers/media/platform/rockchip/isp1/rkisp1.c  | 1132 +
 drivers/media/platform/rockchip/isp1/rkisp1.h  |  130 ++
 drivers/media/v4l2-core/v4l2-ioctl.c   |2 +
 include/uapi/linux/rkisp1-config.h |  554 +++
 include/uapi/linux/videodev2.h |4 +
 22 files changed, 9416 insertions(+)
 create mode 100644 drivers/media/platform/rockchip/isp1/Makefile
 create mode 100644 drivers/media/platform/rockchip/isp1/capture.c
 create mode 100644 drivers/media/platform/rockchip/isp1/capture.h
 create mode 100644 drivers/media/platform/rockchip/isp1/common.h
 create mode 100644 drivers/media/platform/rockchip/isp1/dev.c
 create mode 100644 drivers/media/platform/rockchip/isp1/isp_params.c
 create mode 100644 drivers/media/platform/rockchip/isp1/isp_params.h
 create mode 100644 drivers/media/platform/rockchip/isp1/isp_stats.c
 create mode 100644 drivers/media/platform/rockchip/isp1/isp_stats.

Re: [PATCH v3 2/3] usb: xhci: Add DbC support in xHCI driver

2017-11-14 Thread Lu Baolu

Hi,

On 11/14/2017 03:28 PM, Felipe Balbi wrote:
> Hi,
>
> Mathias Nyman  writes:
>>> +static int dbc_buf_alloc(struct dbc_buf *db, unsigned int size)
>>> +{
>>> +   db->buf_buf = kzalloc(size, GFP_KERNEL);
>>> +   if (!db->buf_buf)
>>> +   return -ENOMEM;
>>> +
>>> +   db->buf_size = size;
>>> +   db->buf_put = db->buf_buf;
>>> +   db->buf_get = db->buf_buf;
>>> +
>>> +   return 0;
>>> +}
> you may wanna have a look at kfifo.
>

Yeah! kfifo gives me exactly what I want here.

I will replace it with kfifo. Thank you.
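
For readers following along, the quoted dbc_buf code tracks put/get
pointers into a byte buffer by hand. The sketch below is a userspace
illustration of the same idea as a power-of-two ring buffer with
free-running in/out counters. It is not the kernel kfifo API; the names
ring_init(), ring_put(), ring_get() and ring_free() are hypothetical and
exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace ring buffer; not the kernel kfifo API. */
struct ring {
	unsigned char *data;
	unsigned int size;	/* capacity in bytes, must be a power of two */
	unsigned int in;	/* bytes written so far (free-running) */
	unsigned int out;	/* bytes read so far (free-running) */
};

static int ring_init(struct ring *r, unsigned int size)
{
	assert(size && (size & (size - 1)) == 0);
	r->data = calloc(size, 1);
	if (!r->data)
		return -1;
	r->size = size;
	r->in = r->out = 0;
	return 0;
}

static unsigned int ring_used(const struct ring *r)
{
	return r->in - r->out;	/* unsigned wraparound keeps this correct */
}

/* Copy up to len bytes in; returns the number of bytes actually stored. */
static unsigned int ring_put(struct ring *r, const void *buf, unsigned int len)
{
	unsigned int avail = r->size - ring_used(r);
	unsigned int off, first;

	if (len > avail)
		len = avail;
	off = r->in & (r->size - 1);
	first = len < r->size - off ? len : r->size - off;
	memcpy(r->data + off, buf, first);
	memcpy(r->data, (const unsigned char *)buf + first, len - first);
	r->in += len;
	return len;
}

/* Copy up to len bytes out; returns the number of bytes actually read. */
static unsigned int ring_get(struct ring *r, void *buf, unsigned int len)
{
	unsigned int used = ring_used(r);
	unsigned int off, first;

	if (len > used)
		len = used;
	off = r->out & (r->size - 1);
	first = len < r->size - off ? len : r->size - off;
	memcpy(buf, r->data + off, first);
	memcpy((unsigned char *)buf + first, r->data, len - first);
	r->out += len;
	return len;
}

static void ring_free(struct ring *r)
{
	free(r->data);
	r->data = NULL;
}
```

Because the counters are never reduced modulo the size, "full" and
"empty" are distinguishable without wasting a slot, which is one reason a
generic FIFO helper is attractive over open-coded pointer arithmetic.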

Best regards,
Lu Baolu

Re: [PATCH] remoteproc: qcom: Fix error handling paths in order to avoid memory leaks

2017-11-14 Thread Bjorn Andersson

On Tue 14 Nov 22:58 PST 2017, Christophe JAILLET wrote:

> In case of error returned by 'q6v5_xfer_mem_ownership', we must free
> some resources before returning.
> 
> In 'q6v5_mpss_init_image()', add a new label to undo a previous
> 'dma_alloc_attrs()'.
> In 'q6v5_mpss_load()', re-use the already existing error handling code to
> undo a previous 'request_firmware()', as already done in the other error
> handling paths of the function.
> 
> Signed-off-by: Christophe JAILLET 

Thanks!

Regards,
Bjorn

Re: [PATCH] PM / runtime: Drop children check from __pm_runtime_set_status()

2017-11-14 Thread Ulf Hansson

[...]

>>
>> When pm_runtime_set_suspended(dev) is called, dev's child device may
>> still be runtime PM enabled and active.
>> I was suggesting to add a check for this scenario, to see if dev's
>> child device is runtime PM enabled, as an additional constraint
>> before deciding to return an error code.
>
> Well, that's sort of difficult to do, however, because the code would need to
> walk all of the children of the device and the child power lock cannot be
> acquired under the one of the parent, so it would be fragile and ugly.

Yeah, you have a point.

>
>> The idea was to get a consistent behavior, from the
>> pm_runtime_set_active|suspended() APIs point of view, and not from the
>> runtime PM core point of view.
>
> Yes, but the cost is high and the benefit is shallow.
>
> The enable-time WARN() should cover the really broken cases without that
> much complexity.

Fair enough!

Feel free to add:
Reviewed-by: Ulf Hansson 

Kind regards
Uffe

[PATCH v3 0/3] Qualcomm Light Pulse Generator

2017-11-14 Thread Bjorn Andersson

This series introduces a generic pattern interface in the LED class and
a driver for the Qualcomm Light Pulse Generator.

Bjorn Andersson (3):
  leds: core: Introduce generic pattern interface
  leds: Add driver for Qualcomm LPG
  DT: leds: Add Qualcomm Light Pulse Generator binding

 Documentation/ABI/testing/sysfs-class-led  |   20 +
 .../devicetree/bindings/leds/leds-qcom-lpg.txt |   66 ++
 drivers/leds/Kconfig   |7 +
 drivers/leds/Makefile  |1 +
 drivers/leds/led-class.c   |  150 +++
 drivers/leds/leds-qcom-lpg.c   | 1232 
 include/linux/leds.h   |   21 +
 7 files changed, 1497 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/leds/leds-qcom-lpg.txt
 create mode 100644 drivers/leds/leds-qcom-lpg.c

-- 
2.15.0

[PATCH v3 3/3] DT: leds: Add Qualcomm Light Pulse Generator binding

2017-11-14 Thread Bjorn Andersson

This adds the binding document describing the three hardware blocks
related to the Light Pulse Generator found in a wide range of Qualcomm
PMICs.

Signed-off-by: Bjorn Andersson 
---

Changes since v2:
- Squashed all things into one node
- Removed quirks from the binding, compatible implies number of channels, their
  configuration etc.
- Binding describes LEDs connected as child nodes
- Support describing multi-channel LEDs
- Change style of the binding document, to match other LED bindings

Changes since v1:
- Dropped custom pattern properties
- Renamed cell-index to qcom,lpg-channel to clarify its purpose

 .../devicetree/bindings/leds/leds-qcom-lpg.txt | 66 ++
 1 file changed, 66 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/leds/leds-qcom-lpg.txt

diff --git a/Documentation/devicetree/bindings/leds/leds-qcom-lpg.txt b/Documentation/devicetree/bindings/leds/leds-qcom-lpg.txt
new file mode 100644
index ..9cee6f9f543c
--- /dev/null
+++ b/Documentation/devicetree/bindings/leds/leds-qcom-lpg.txt
@@ -0,0 +1,66 @@
+Binding for Qualcomm Light Pulse Generator
+
+The Qualcomm Light Pulse Generator consists of three different hardware blocks;
+a ramp generator with lookup table, the light pulse generator and a three
+channel current sink. These blocks are found in a wide range of Qualcomm PMICs.
+
+Required properties:
+- compatible: one of:
+ "qcom,pm8916-pwm",
+ "qcom,pm8941-lpg",
+ "qcom,pm8994-lpg",
+ "qcom,pmi8994-lpg",
+ "qcom,pmi8998-lpg",
+
+Optional properties:
+- qcom,power-source: power-source used to drive the output, as defined in the
+datasheet. Should be specified if the TRILED block is
+present
+- qcom,dtest: configures the output into an internal test line of the
+ pmic. Specified by a list of u32 pairs, one pair per channel,
+ where each pair denotes the test line to drive and the second
+ configures how the value should be output, as defined in the
+ datasheet
+- #pwm-cells: should be 2, see ../pwm/pwm.txt
+
+LED subnodes:
+A set of subnodes can be used to specify LEDs connected to the LPG. Channels
+not associated with a LED are available as pwm channels, see ../pwm/pwm.txt.
+
+Required properties:
+- led-sources: list of channels associated with this LED, starting at 1 for the
+  first LPG channel
+
+Optional properties:
+- label: see Documentation/devicetree/bindings/leds/common.txt
+- default-state: see Documentation/devicetree/bindings/leds/common.txt
+- linux,default-trigger: see Documentation/devicetree/bindings/leds/common.txt
+
+Example:
+The following example defines a RGB LED attached to the PM8941.
+
+&spmi_bus {
+   pm8941@1 {
+   lpg {
+   compatible = "qcom,pm8941-lpg";
+   qcom,power-source = <1>;
+
+   rgb {
+   led-sources = <7 6 5>;
+   };
+   };
+   };
+};
+
+The following example defines the single PWM channel of the PM8916, which can
+be muxed by the MPP4 as a current sink.
+
+&spmi_bus {
+   pm8916@1 {
+   pm8916_pwm: pwm {
+   compatible = "qcom,pm8916-pwm";
+
+   #pwm-cells = <2>;
+   };
+   };
+};
-- 
2.15.0

[PATCH v3 2/3] leds: Add driver for Qualcomm LPG

2017-11-14 Thread Bjorn Andersson

The Light Pulse Generator (LPG) is a PWM-block found in a wide range of
PMICs from Qualcomm. It can operate on fixed parameters or based on a
lookup-table, altering the duty cycle over time - which provides the
means for e.g. hardware assisted transitions of LED brightness.

Signed-off-by: Bjorn Andersson 
---
Changes since v2:
- Squash all components into one driver
- Track PWM channels and "logical" LEDs separately
- Support multiple channels to be bound to a single LED
- Per-PMIC compatible, to deal with minor differences (e.g. value to enable
  9bit resolution for PWM)
- TRILED enablement is done atomically for all channels associated with a LED
- LUT sequencer start is done atomically for all channels associated with a LED
- Support PM8916 (PWM only), PM8941, PM8994 and PMI8998 introduced (PMI8994
  still works...)

The multiple channels per LED is currently implemented by assigning the same
pattern and same brightness to all channels. This allows the RGB LED to show
various brightness levels of white and do patterns in shades of white. But it's
implemented in a way that as we figure out how to expose multi-color LEDs
through the LED framework this new information could easily be applied to the
right channel, and we would have the ability to control the channels
individually.

Changes since v1:
- Remove custom DT properties for patterns
- Extract pattern interface into the LED core

 drivers/leds/Kconfig |7 +
 drivers/leds/Makefile|1 +
 drivers/leds/leds-qcom-lpg.c | 1232 ++
 3 files changed, 1240 insertions(+)
 create mode 100644 drivers/leds/leds-qcom-lpg.c

diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index 52ea34e337cd..ccc3aa4b2474 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -651,6 +651,13 @@ config LEDS_POWERNV
  To compile this driver as a module, choose 'm' here: the module
  will be called leds-powernv.
 
+config LEDS_QCOM_LPG
+   tristate "LED support for Qualcomm LPG"
+   depends on LEDS_CLASS
+   help
+ This option enables support for the Light Pulse Generator found in a
+ wide variety of Qualcomm PMICs.
+
 config LEDS_SYSCON
bool "LED support for LEDs on system controllers"
depends on LEDS_CLASS=y
diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
index 35980450db9b..2d5149ca429d 100644
--- a/drivers/leds/Makefile
+++ b/drivers/leds/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_LEDS_MAX77693)   += leds-max77693.o
 obj-$(CONFIG_LEDS_MAX8997) += leds-max8997.o
 obj-$(CONFIG_LEDS_LM355x)  += leds-lm355x.o
 obj-$(CONFIG_LEDS_BLINKM)  += leds-blinkm.o
+obj-$(CONFIG_LEDS_QCOM_LPG)+= leds-qcom-lpg.o
 obj-$(CONFIG_LEDS_SYSCON)  += leds-syscon.o
 obj-$(CONFIG_LEDS_MENF21BMC)   += leds-menf21bmc.o
 obj-$(CONFIG_LEDS_KTD2692) += leds-ktd2692.o
diff --git a/drivers/leds/leds-qcom-lpg.c b/drivers/leds/leds-qcom-lpg.c
new file mode 100644
index ..481e940d7e04
--- /dev/null
+++ b/drivers/leds/leds-qcom-lpg.c
@@ -0,0 +1,1232 @@
+/*
+ * Copyright (c) 2017 Linaro Ltd
+ * Copyright (c) 2010-2012, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define LPG_PATTERN_CONFIG_REG 0x40
+#define LPG_SIZE_CLK_REG   0x41
+#define LPG_PREDIV_CLK_REG 0x42
+#define PWM_TYPE_CONFIG_REG0x43
+#define PWM_VALUE_REG  0x44
+#define PWM_ENABLE_CONTROL_REG 0x46
+#define PWM_SYNC_REG   0x47
+#define LPG_RAMP_DURATION_REG  0x50
+#define LPG_HI_PAUSE_REG   0x52
+#define LPG_LO_PAUSE_REG   0x54
+#define LPG_HI_IDX_REG 0x56
+#define LPG_LO_IDX_REG 0x57
+#define PWM_SEC_ACCESS_REG 0xd0
+#define PWM_DTEST_REG(x)   (0xe2 + (x) - 1)
+
+#define TRI_LED_SRC_SEL0x45
+#define TRI_LED_EN_CTL 0x46
+#define TRI_LED_ATC_CTL0x47
+
+#define LPG_LUT_REG(x) (0x40 + (x) * 2)
+#define RAMP_CONTROL_REG   0xc8
+
+struct lpg_channel;
+struct lpg_data;
+
+/**
+ * struct lpg - LPG device context
+ * @dev:   struct device for LPG device
+ * @map:   regmap for register access
+ * @pwm:   PWM-chip object, if operating in PWM mode
+ * @pwm_9bit_mask: bitmask for enabling 9bit pwm
+ * @lut_base:  base address of the LUT block (optional)
+ * @lut_size:  number of entries in the LUT block
+ * @lut_bitmap:allocation bitmap for LUT entries
+

[PATCH v3 1/3] leds: core: Introduce generic pattern interface

2017-11-14 Thread Bjorn Andersson

Some LED controllers have support for autonomously controlling
brightness over time, according to some preprogrammed pattern or
function.

This adds a new optional operator that LED class drivers can implement
if they support such functionality as well as a new device attribute to
configure the pattern for a given LED.
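
As an illustration of the text format documented in this patch (pairs of
brightness and duration in ms, optionally followed by the ":|" repeat
marker), here is a minimal userspace sketch of a parser. It is not the
pattern_store() implementation from the patch; parse_pattern() and
struct pattern_step are hypothetical names, and the strictness rules
(reject a dangling brightness, reject anything after the marker) are
assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for this example only. */
struct pattern_step {
	unsigned long brightness;
	unsigned long delta_t;	/* duration in ms */
};

/*
 * Parse "b0 t0 b1 t1 ... [:|]" into steps[]. Returns the number of
 * steps parsed, or -1 on malformed input (non-numeric token, odd number
 * of values, text after the repeat marker, or more than max_steps).
 */
static int parse_pattern(const char *s, struct pattern_step *steps,
			 int max_steps, bool *repeat)
{
	unsigned long val[2];
	int have = 0;	/* values collected for the current pair */
	int n = 0;
	char *end;

	assert(s && steps && repeat);
	*repeat = false;

	for (;;) {
		while (*s == ' ' || *s == '\n')
			s++;
		if (*s == '\0')
			break;

		if (strncmp(s, ":|", 2) == 0) {
			*repeat = true;
			s += 2;
			while (*s == ' ' || *s == '\n')
				s++;
			if (*s != '\0')
				return -1;
			break;
		}

		if (!isdigit((unsigned char)*s))
			return -1;
		val[have] = strtoul(s, &end, 10);
		if (*end != ' ' && *end != '\n' && *end != '\0')
			return -1;
		s = end;

		if (++have == 2) {
			if (n == max_steps)
				return -1;
			steps[n].brightness = val[0];
			steps[n].delta_t = val[1];
			n++;
			have = 0;
		}
	}

	return have ? -1 : n;	/* a dangling brightness is an error */
}
```

With this sketch, the string "255 500 0 500 :|" would yield two steps
(full brightness for 500 ms, off for 500 ms) with the repeat flag set.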

Signed-off-by: Bjorn Andersson 
---

Changes since v2:
- None

Changes since v1:
- New patch, based on discussions following v1

 Documentation/ABI/testing/sysfs-class-led |  20 
 drivers/leds/led-class.c  | 150 ++
 include/linux/leds.h  |  21 +
 3 files changed, 191 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-led b/Documentation/ABI/testing/sysfs-class-led
index 5f67f7ab277b..74a7f5b1f89b 100644
--- a/Documentation/ABI/testing/sysfs-class-led
+++ b/Documentation/ABI/testing/sysfs-class-led
@@ -61,3 +61,23 @@ Description:
gpio and backlight triggers. In case of the backlight trigger,
it is useful when driving a LED which is intended to indicate
a device in a standby like state.
+
+What:  /sys/class/leds//pattern
+Date:  July 2017
+KernelVersion: 4.14
+Description:
+   Specify a pattern for the LED, for LED hardware that support
+   altering the brightness as a function of time.
+
+   The pattern is given by a series of tuples, of brightness and
+   duration (ms). The LED is expected to traverse the series and
+   hold each brightness value for the specified duration.
+
+   Additionally a repeat marker ":|" can be appended to the
+   series, which should cause the pattern to be repeated
+   endlessly.
+
+   As LED hardware might have different capabilities and precision
+   the requested pattern might be slightly adjusted by the driver
+   and the resulting pattern of such operation should be returned
+   when this file is read.
diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
index b0e2d55acbd6..bd630e2ae967 100644
--- a/drivers/leds/led-class.c
+++ b/drivers/leds/led-class.c
@@ -74,6 +74,154 @@ static ssize_t max_brightness_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(max_brightness);
 
+static ssize_t pattern_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct led_classdev *led_cdev = dev_get_drvdata(dev);
+   struct led_pattern *pattern;
+   size_t offset = 0;
+   size_t count;
+   bool repeat;
+   size_t i;
+   int n;
+
+   if (!led_cdev->pattern_get)
+   return -EOPNOTSUPP;
+
+   pattern = led_cdev->pattern_get(led_cdev, &count, &repeat);
+   if (IS_ERR_OR_NULL(pattern))
+   return PTR_ERR(pattern);
+
+   for (i = 0; i < count; i++) {
+   n = snprintf(buf + offset, PAGE_SIZE - offset, "%d %d",
+pattern[i].brightness, pattern[i].delta_t);
+
+   if (offset + n >= PAGE_SIZE)
+   goto err_nospc;
+
+   offset += n;
+
+   if (i < count - 1)
+   buf[offset++] = ' ';
+   }
+
+   if (repeat) {
+   if (offset + 4 >= PAGE_SIZE)
+   goto err_nospc;
+
+   memcpy(buf + offset, " :|", 3);
+   offset += 3;
+   }
+
+   if (offset + 1 >= PAGE_SIZE)
+   goto err_nospc;
+
+   buf[offset++] = '\n';
+
+   kfree(pattern);
+   return offset;
+
+err_nospc:
+   kfree(pattern);
+   return -ENOSPC;
+}
+
+static ssize_t pattern_store(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t size)
+{
+   struct led_classdev *led_cdev = dev_get_drvdata(dev);
+   struct led_pattern *pattern = NULL;
+   unsigned long val;
+   char *sbegin;
+   char *elem;
+   char *s;
+   int len = 0;
+   int ret = 0;
+   bool odd = true;
+   bool repeat = false;
+
+   s = sbegin = kstrndup(buf, size, GFP_KERNEL);
+   if (!s)
+   return -ENOMEM;
+
+   /* Trim trailing newline */
+   s[strcspn(s, "\n")] = '\0';
+
+   /* If the remaining string is empty, clear the pattern */
+   if (!s[0]) {
+   ret = led_cdev->pattern_clear(led_cdev);
+   goto out;
+   }
+
+   pattern = kcalloc(size, sizeof(*pattern), GFP_KERNEL);
+   if (!pattern) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   /* Parse out the brightness & delta_t tuples and check for repeat */
+   while ((elem = strsep(&s, " ")) != NULL) {
+   if (!strcmp(elem, ":|")) {
+   repeat = true;
+   break;
+   }
+
+   ret = kstrtoul(elem, 10, &v

Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)

2017-11-14 Thread Dave Hansen

On 11/14/2017 07:44 PM, Matthew Wilcox wrote:
> On Mon, Nov 13, 2017 at 02:46:25PM -0800, Dave Hansen wrote:
>> On 11/13/2017 02:20 PM, Dave Hansen wrote:
>>> On 11/09/2017 05:09 PM, Tycho Andersen wrote:
>>>> which I guess is from the additional flags in grow_dev_page() somewhere down
>>>> the stack. Anyway... it seems this is a kernel allocation that's using
>>>> MIGRATE_MOVABLE, so perhaps we need some more fine tuned heuristic than just
>>>> all MOVABLE allocations are un-mapped via xpfo, and all the others are mapped.
>>>>
>>>> Do you have any ideas?
>>>
>>> It still has to do a kmap() or kmap_atomic() to be able to access it.  I
>>> thought you hooked into that.  Why isn't that path getting hit for these?
>>
>> Oh, this looks to be accessing data mapped by a buffer_head.  It
>> (rudely) accesses data via:
>>
>> void set_bh_page(struct buffer_head *bh,
>> ...
>>  bh->b_data = page_address(page) + offset;
> 
> We don't need to kmap in order to access MOVABLE allocations.  kmap is
> only needed for HIGHMEM allocations.  So there's nothing wrong with ext4
> or set_bh_page().

Yeah, it's definitely not _buggy_.

Although, I do wonder what we should do about these for XPFO.  Should we
just stick a kmap() in there and comment it?  What we really need is a
mechanism to say "use this as a kernel page" and "stop using this as a
kernel page".  kmap() does that... kinda.  It's not a perfect fit, but
it's pretty close.

[PATCH] remoteproc: qcom: Fix error handling paths in order to avoid memory leaks

2017-11-14 Thread Christophe JAILLET

In case of error returned by 'q6v5_xfer_mem_ownership', we must free
some resources before returning.

In 'q6v5_mpss_init_image()', add a new label to undo a previous
'dma_alloc_attrs()'.
In 'q6v5_mpss_load()', re-use the already existing error handling code to
undo a previous 'request_firmware()', as already done in the other error
handling paths of the function.
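
The pattern this fix applies, replacing an early return with a jump to a
cleanup label, can be shown in a small userspace sketch. Everything here
is hypothetical (buf_alloc(), buf_free(), xfer_ownership() and
init_image() are stand-ins invented for the example, not the remoteproc
functions); a counter of live buffers makes the leak observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_buffers;	/* buffers allocated and not yet freed */

static void *buf_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_buffers++;
	return p;
}

static void buf_free(void *p)
{
	assert(live_buffers > 0);
	free(p);
	live_buffers--;
}

/* Stand-in for a step that can fail after the allocation succeeded. */
static int xfer_ownership(int should_fail)
{
	return should_fail ? -EPERM : 0;
}

static int init_image(int should_fail)
{
	void *buf;
	int ret;

	buf = buf_alloc(64);
	if (!buf)
		return -ENOMEM;

	ret = xfer_ownership(should_fail);
	if (ret) {
		/* A plain "return -EAGAIN;" here would leak buf. */
		ret = -EAGAIN;
		goto free_buf;
	}

	/* ... the buffer would be used here ... */
	ret = 0;

free_buf:
	buf_free(buf);
	return ret;
}
```

Both the success path and the failure path fall through the same label,
so the buffer is released exactly once in either case.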

Signed-off-by: Christophe JAILLET 
---
We could certainly also propagate the error code returned by
'q6v5_xfer_mem_ownership()' instead of returning an unconditional -EAGAIN.
Not sure of the potential impacts, so I've left it as-is.
---
 drivers/remoteproc/qcom_q6v5_pil.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/qcom_q6v5_pil.c b/drivers/remoteproc/qcom_q6v5_pil.c
index a019796c363a..8a3fa2bcc9f6 100644
--- a/drivers/remoteproc/qcom_q6v5_pil.c
+++ b/drivers/remoteproc/qcom_q6v5_pil.c
@@ -580,7 +580,8 @@ static int q6v5_mpss_init_image(struct q6v5 *qproc, const struct firmware *fw)
if (ret) {
dev_err(qproc->dev,
"assigning Q6 access to metadata failed: %d\n", ret);
-   return -EAGAIN;
+   ret = -EAGAIN;
+   goto free_dma_attrs;
}
 
writel(phys, qproc->rmb_base + RMB_PMI_META_DATA_REG);
@@ -599,6 +600,7 @@ static int q6v5_mpss_init_image(struct q6v5 *qproc, const struct firmware *fw)
dev_warn(qproc->dev,
 "mdt buffer not reclaimed system may become unstable\n");
 
+free_dma_attrs:
dma_free_attrs(qproc->dev, fw->size, ptr, phys, dma_attrs);
 
return ret < 0 ? ret : 0;
@@ -712,7 +714,8 @@ static int q6v5_mpss_load(struct q6v5 *qproc)
if (ret) {
dev_err(qproc->dev,
"assigning Q6 access to mpss memory failed: %d\n", ret);
-   return -EAGAIN;
+   ret = -EAGAIN;
+   goto release_firmware;
}
 
boot_addr = relocate ? qproc->mpss_phys : min_addr;
-- 
2.14.1

Re: [PATCH 2/3] X86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-11-14 Thread Dave Young

On 11/15/17 at 01:47pm, Baoquan He wrote:
> Hi Dave,
> 
> Thanks for your effort to push this into upstream. While I have one
> concern, please see the inline comments.
> 
> On 10/24/17 at 01:31pm, Dave Young wrote:
> > Now crashkernel=X will fail if there's not enough memory at low region
> > (below 896M) when trying to reserve large memory size.  One can use
> > crashkernel=xM,high to reserve it at high region (>4G) but it is more
> > convenient to improve crashkernel=X to:
> > 
> >  - First try to reserve X below 896M (for being compatible with old
> >kexec-tools).
> >  - If fails, try to reserve X below 4G (swiotlb need to stay below 4G).
> >  - If fails, try to reserve X from MAXMEM top down.
> > 
> > It's more transparent and user-friendly.
> > 
> > If crashkernel is large and the reserved is beyond 896M, old kexec-tools
> > is not compatible with new kernel because old kexec-tools can not load
> > kernel at high memory region, there was an old discussion below:
> > https://lkml.org/lkml/2013/10/15/601
> > 
> > But actually the behavior is consistent during my test. Suppose
> > old kernel fail to reserve memory at low areas, kdump does not
> > work because no memory is reserved. With this patch, suppose new kernel
> > successfully reserved memory at high areas, old kexec-tools still fail
> > to load kdump kernel (tested 2.0.2), so it is acceptable, no need to
> > worry about the compatibility.
> > 
> > Here is the test result (kexec-tools 2.0.2, no high memory load
> > support):
> > Crashkernel over 4G:
> > # cat /proc/iomem|grep Crash
> >   be00-cdff : Crash kernel
> >   21300-21eff : Crash kernel
> > # ./kexec  -p /boot/vmlinuz-`uname -r`
> > Memory for crashkernel is not reserved
> > Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
> > Then try loading kdump kernel
> > 
> > crashkernel: 896M-4G:
> > # cat /proc/iomem|grep Crash
> >   9600-cdef : Crash kernel
> > # ./kexec -p /boot/vmlinuz-4.14.0-rc4+
> > ELF core (kcore) parse failed
> > Cannot load /boot/vmlinuz-4.14.0-rc4+
> > 
> > Signed-off-by: Dave Young 
> > ---
> >  arch/x86/kernel/setup.c |   16 
> >  1 file changed, 16 insertions(+)
> > 
> > --- linux-x86.orig/arch/x86/kernel/setup.c
> > +++ linux-x86/arch/x86/kernel/setup.c
> > @@ -568,6 +568,22 @@ static void __init reserve_crashkernel(v
> > high ? CRASH_ADDR_HIGH_MAX
> >  : CRASH_ADDR_LOW_MAX,
> > crash_size, CRASH_ALIGN);
> > +#ifdef CONFIG_X86_64
> > +   /*
> > +* crashkernel=X reserve below 896M fails? Try below 4G
> > +*/
> > +   if (!high && !crash_base)
> > +   crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +   (1ULL << 32),
> > +   crash_size, CRASH_ALIGN);
> > +   /*
> > +* crashkernel=X reserve below 4G fails? Try MAXMEM
> > +*/
> > +   if (!high && !crash_base)
> > +   crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +   CRASH_ADDR_HIGH_MAX,
> > +   crash_size, CRASH_ALIGN);
> 
> For kdump, most of systems are x86 64. If both Yinghai and Vivek have no
> objection to search an available region of crash_size above 896M
> naturally, why don't we search it with function
> __memblock_find_range_bottom_up(). It can search from below 896M to
> above 4G, almost the same as the change you have made currently. Mainly
> the code will be much simpler.
> 
> The several times of searching looks not good and a little confusing.
> 
> What do you think?

Bao, thanks for the comment. It might be a good idea; I will explore this
approach and see whether there are risks in going with your suggestion.
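
For reference, the three-step fallback from the patch description (below
896M, then below 4G, then up to a maximum address) can be sketched in
userspace as follows. This is only a toy model under stated assumptions:
find_below() is a hypothetical stand-in for a range search and is not
memblock_find_in_range(), the free-region table is invented, a base of 0
means "not found", and the 16 MiB alignment is an arbitrary choice for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

struct region {
	uint64_t start, end;	/* free memory, [start, end) */
};

/* Highest aligned base of a size-byte block below limit, or 0 if none. */
static uint64_t find_below(const struct region *free_mem, size_t n,
			   uint64_t limit, uint64_t size, uint64_t align)
{
	uint64_t best = 0;
	size_t i;

	assert(align && (align & (align - 1)) == 0);

	for (i = 0; i < n; i++) {
		uint64_t end = free_mem[i].end < limit ? free_mem[i].end : limit;
		uint64_t base;

		if (end < size || end - size < free_mem[i].start)
			continue;
		base = (end - size) & ~(align - 1);	/* round down */
		if (base >= free_mem[i].start && base > best)
			best = base;
	}
	return best;
}

/* Try each limit in turn and stop at the first one that fits. */
static uint64_t reserve_crash(const struct region *free_mem, size_t n,
			      uint64_t size, uint64_t max_addr)
{
	const uint64_t limits[] = { 896 * MiB, 4 * GiB, max_addr };
	size_t i;

	for (i = 0; i < sizeof(limits) / sizeof(limits[0]); i++) {
		uint64_t base = find_below(free_mem, n, limits[i], size,
					   16 * MiB);
		if (base)
			return base;
	}
	return 0;
}
```

A single search covering the whole range, as suggested in the quoted
comment, would collapse the loop over limits[] into one call; the sketch
keeps the three passes to mirror the patch as posted.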

> 
> Thanks
> Baoquan
> 
> > +#endif
> > if (!crash_base) {
> > pr_info("crashkernel reservation failed - No suitable area found.\n");
> > return;
> > 
> >

Re: [PATCH 1/2] x86,kvm: move qemu/guest FPU switching out to vcpu_run

2017-11-14 Thread quan.x...@gmail.com




On 2017/11/15 05:54, r...@redhat.com wrote:

From: Rik van Riel 

Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.


Rik, be careful with VM migration. With your patch, I don't think you
could load the fpu/xstate context accurately after VM migration.


Quan
Alibaba Cloud

This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.

This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.

No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy.

There may be other tests where performance changes are noticeable.

Signed-off-by: Rik van Riel 
Suggested-by: Christian Borntraeger 
---
  arch/x86/include/asm/kvm_host.h | 13 +
  arch/x86/kvm/x86.c  | 34 +-
  include/linux/kvm_host.h|  2 +-
  3 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9d7d856b2d89..ffe54958491f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
  
+	/*

+* QEMU userspace and the guest each have their own FPU state.
+* In vcpu_run, we switch between the user and guest FPU contexts.
+* While running a VCPU, the VCPU thread will have the guest FPU
+* context.
+*
+* Note that while the PKRU state lives inside the fpu registers,
+* it is switched out separately at VMENTER and VMEXIT time. The
+* "guest_fpu" state here contains the guest FPU context, with the
+* host PKRU bits.
+*/
+   struct fpu user_fpu;
struct fpu guest_fpu;
+
u64 xcr0;
u64 guest_supported_xcr0;
u32 guest_xstate_size;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 03869eb7fcd6..aad5181ed4e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2917,7 +2917,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
srcu_read_unlock(&vcpu->kvm->srcu, idx);
pagefault_enable();
kvm_x86_ops->vcpu_put(vcpu);
-   kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
  }
  
@@ -5228,13 +5227,10 @@ static void emulator_halt(struct x86_emulate_ctxt *ctxt)
  
  static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)

  {
-   preempt_disable();
-   kvm_load_guest_fpu(emul_to_vcpu(ctxt));
  }
  
  static void emulator_put_fpu(struct x86_emulate_ctxt *ctxt)

  {
-   preempt_enable();
  }
  
  static int emulator_intercept(struct x86_emulate_ctxt *ctxt,

@@ -6908,7 +6904,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
preempt_disable();
  
  	kvm_x86_ops->prepare_guest_switch(vcpu);

-   kvm_load_guest_fpu(vcpu);
  
  	/*

 * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
@@ -7255,12 +7250,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
}
  
+	kvm_load_guest_fpu(vcpu);

+
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
-   goto out;
+   goto out_fpu;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
  
@@ -7269,6 +7266,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)

else
r = vcpu_run(vcpu);
  
+out_fpu:

+   kvm_put_guest_fpu(vcpu);
  out:
post_kvm_run_save(vcpu);
if (vcpu->sigset_active)
@@ -7663,32 +7662,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 |= X86_CR0_ET;
  }
  
+/* Swap (qemu) user FPU context for the guest FPU context. */

  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
  {
-   if (vcpu->guest_fpu_loaded)
-   return;
-
-   /*
-* Restore all possible states in the guest,
-* and assume host would use all available bits.
-* Guest xcr0 would be loaded later.
-*/
-

Re: Adding rseq tree to -next

2017-11-14 Thread Mathieu Desnoyers

- On Nov 14, 2017, at 11:38 PM, Stephen Rothwell s...@canb.auug.org.au wrote:

> Hi Mathieu,
> 
> On Wed, 15 Nov 2017 01:22:04 + (UTC) Mathieu Desnoyers
>  wrote:
>>
>> - On Nov 14, 2017, at 7:15 PM, Stephen Rothwell s...@canb.auug.org.au wrote:
>> 
>> > On Tue, 14 Nov 2017 23:54:06 + (UTC) Mathieu Desnoyers
>> >  wrote:
>> >>
>> >> Would it be possible to add the "rseq" tree to -next for testing ?
>> >> 
>> >> I prepared a branch at:
>> >> 
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git
>> >> branch: rseq/for-next
>> > 
>> > I try not to add new trees during the merge window (the only exceptions
>> > are for trees that will remain empty until after the merge window
>> > closes or trees only containing material for the current merge window -
>> > and in that case it is a bit late and a pain if it interacts with
>> > anything else).
>> > 
>> > I will add it after -rc1 is released, though.  Please remind me if I
>> > forget.
>> 
>> No worries, sorry for the short notice. I'll try to do a merge
>> attempt into -next on my end before sending to Linus then.
> 
> OK, since you intend to ask Linus to merge it during this merge window,
> I have added it from today (I hope I don't regret it too much :-)).

Thanks! I did attempt to do the merge with -next myself, and the
conflicts were pretty much trivial to handle. One I have not seen
in your messages so far is the comment added to mmdrop() on x86. The
function moves from a static inline in a header to a standard function
(in a C file), so the comment should move accordingly.

Thank you,

Mathieu

> 
> Thanks for adding your subsystem tree as a participant of linux-next.  As
> you may know, this is not a judgement of your code.  The purpose of
> linux-next is for integration testing and to lower the impact of
> conflicts between subsystems in the next merge window.
> 
> You will need to ensure that the patches/commits in your tree/series have
> been:
> * submitted under GPL v2 (or later) and include the Contributor's
>Signed-off-by,
> * posted to the relevant mailing list,
> * reviewed by you (or another maintainer of your subsystem tree),
> * successfully unit tested, and
> * destined for the current or next Linux merge window.
> 
> Basically, this should be just what you would send to Linus (or ask him
> to fetch).  It is allowed to be rebased if you deem it necessary.
> 
> --
> Cheers,
> Stephen Rothwell
> s...@canb.auug.org.au

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Re: [PATCH 00/24] staging: ccree: more cleanup patches

2017-11-14 Thread Gilad Ben-Yossef

On Tue, Nov 14, 2017 at 11:48 AM, Dan Carpenter
 wrote:
> On Tue, Nov 14, 2017 at 11:33:20AM +0200, Gilad Ben-Yossef wrote:
>> On Mon, Nov 13, 2017 at 8:33 PM, Dan Carpenter wrote:
>> > These cleanups look nice.  Thanks.
>> >
>> > I hope you do a mass remove of likely/unlikely in a patch soon.
>> > Whenever, I see one of those in a + line I always have to remind myself
>> > that you're planning to do it in a later patch.
>> >
>>
>> So, a question about that - there indeed seems to be an inflation of
>> likely/unlikely in the ccree driver, but
>> what stopped me from removing them was that I found out I don't have a
>> clue about when it's a good idea
>> to use them and when it isn't (obviously in places where you know the
>> probable code flow of course).
>>
>> Any hints?
>
> They should only be included if benchmarking shows that it makes a
> difference.  I think they need to be about 100 right predictions to 1
> wrong prediction on a fast path.  So remove them all and add them back
> one at a time.
>

OK, that makes a lot of sense.
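
For context, the hints under discussion are thin wrappers around a
compiler builtin. The sketch below uses the same shape as the kernel's
plain (non-profiling) definitions and builds with GCC or Clang in
userspace; sum_bytes() is a hypothetical function invented only to show
a rare error path being annotated.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Branch-prediction hints; they affect code layout, not results. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Sum a byte buffer; a NULL buffer is treated as the rare error path. */
static long sum_bytes(const unsigned char *buf, size_t len)
{
	long sum = 0;
	size_t i;

	/* Precondition: the sum of len bytes cannot overflow a long. */
	assert(len <= (size_t)(LONG_MAX / UCHAR_MAX));

	if (unlikely(!buf))
		return -1;

	for (i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}
```

Since the macros only normalize the condition to 0 or 1 and pass it
through, removing them all and re-adding them one at a time, as
suggested above, cannot change behavior; only measured performance
should decide which ones come back.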

Thanks,
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru

[PATCH] Input: ALPS - fix DualPoint flag for 74 03 28 devices

2017-11-14 Thread Aaron Ma

Commit 4a646580f793 ("Input: ALPS - fix two-finger scroll breakage")
introduced a regression: the ALPS device fails with the log:

psmouse serio1: alps: Rejected trackstick packet from non DualPoint device

ALPS devices with id "74 03 28" report OTP[0] data 0xCE after
commit 4a646580f793. After restoring the OTP reading order,
OTP[0] becomes 0x10 as before and the right flag is reported.

Fixes: 4a646580f793 ("Input: ALPS - fix two-finger scroll breakage")
Cc: 
Signed-off-by: Aaron Ma 
---
 drivers/input/mouse/alps.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c
index 579b899add26..c59b8f7ca2fc 100644
--- a/drivers/input/mouse/alps.c
+++ b/drivers/input/mouse/alps.c
@@ -2562,8 +2562,8 @@ static int alps_set_defaults_ss4_v2(struct psmouse *psmouse,
 
memset(otp, 0, sizeof(otp));
 
-   if (alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]) ||
-   alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]))
+   if (alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]) ||
+   alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]))
return -1;
 
alps_update_device_area_ss4_v2(otp, priv);
-- 
2.13.6

Re: [RFC PATCH for 4.15 00/24] Restartable sequences and CPU op vector v11

2017-11-14 Thread Mathieu Desnoyers

- On Nov 14, 2017, at 11:12 PM, Andy Lutomirski l...@amacapital.net wrote:

>> On Nov 14, 2017, at 1:32 PM, Mathieu Desnoyers 
>> 
>> wrote:
>>
>> - On Nov 14, 2017, at 4:15 PM, Andy Lutomirski l...@amacapital.net wrote:
>>
>>
>> One thing I kept, however, that diverges from your recommendation is the
>> "sign" parameter to the rseq syscall. I prefer this flexible
>> approach to a hardcoded signature value. We never know when we may
>> need to randomize or change this in the future.
>>
>> Regarding the abort target signature vs x86 disassemblers, I used a
>> 5-byte no-op on x86 32/64:
>>
>>  x86-32: nopl 
>>  x86-64: nopl (%rip)
> 
> I still don't see how this can possibly work well with libraries.  If
> glibc or whatever issues the syscall and registers some signature,
> that signature *must* match the expectation of all libraries used in
> that thread or it's not going to work.

Here is how I envision this signature can eventually be randomized:

A librseq.so provided by glibc manages rseq thread registration. That
library could generate a random uint32_t value as signature for each
process within a constructor, as well as lazily upon first call to
signature query function (whichever comes first).

The constructors of every program/library using rseq would invoke
a signature getter function to query the random value, and iterate over
a section of pointers to signatures, and update those as part of the
constructors (temporarily mprotecting the pages as writeable).

Given that this would prevent page sharing across processes due to
CoW, I would not advise going for this randomized signature solution
unless necessary, but I think it's good to keep the door open to this
by keeping a uint32_t sig argument to sys_rseq.
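
The constructor-time patching described above can be sketched roughly as follows. This is only an illustration under stated assumptions: rseq_patch_signatures() and the slot table are hypothetical names, not part of any existing librseq, and a real implementation would have to make the text pages writable (mprotect) around the loop.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: "slots" stands for the section of pointers to
 * signatures, one entry per abort landing site. Each slot is overwritten
 * with the per-process signature value obtained from the getter.
 */
static void rseq_patch_signatures(uint32_t **slots, size_t n, uint32_t sig)
{
	size_t i;

	for (i = 0; i < n; i++)
		*slots[i] = sig;
}
```

Each program or library would call this once from its constructor with the value returned by the signature getter.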

> I can see two reasonable ways
> to handle it:
> 
> 1. The signature is just a well-known constant.  If you have an rseq
> abort landing site, you end up with something like:
> 
> nopl $11223344(%rip)
> landing_site:
> 
> or whatever the constant is.

If librseq.so passes a hardcoded constant to sys_rseq, then my solution
is very similar to this one, except that mine can allow randomized
signatures in the future from a kernel ABI perspective.

> 
> 2. The signature varies depending on the rseq_cs in use.  So you get:
> 
> static struct rseq_cs this_cs = {
>  .signature = 0x55667788,
>  ...
> };
> 
> and then the abort landing site has:
> 
> nopl $11223344(%rip)
> nopl $55667788(%rax)
> landing_site:

AFAIU, this solution defeats the purpose of having code signatures in the
first place. An attacker simply has to:

1) Craft a dummy struct rseq_cs on the stack, with:

struct rseq_cs {
  .signature = ,
  .start_ip = 0x0,
  .len = -1UL,
  .abort_ip = ,
}

2) Store the address of this dummy struct rseq_cs into __rseq_abi.rseq_cs.

3) Profit.

You should _never_ compare the signature in the code with an integer
value which can end up being controlled by the attacker.

Passing the signature to the system call upon registration leaves to the
kernel the job of keeping that signature around. An attacker would need
to first invoke sys_rseq to unregister the current __rseq_abi and re-register
with another signature in order to make this work. If an attacker has that
much access to control program execution and issue system calls at will,
then the game is already lost: they already control the execution flow,
so what's the point in trying to prevent branching to a specific address ?
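
To make the comparison concrete, here is a userspace model of the check argued for above: the 4 bytes in front of the abort address are compared against the value kept from registration, never against a field read from the attacker-reachable struct rseq_cs. The function name and the buffer-based layout are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the 4 bytes immediately before offset abort_off in "text"
 * equal the signature stored at registration time, 0 otherwise.
 * "text" models executable memory; registered_sig models the value the
 * kernel keeps, which user-space cannot change without re-registering.
 */
static int abort_sig_ok(const uint8_t *text, size_t text_len,
			size_t abort_off, uint32_t registered_sig)
{
	uint32_t found;

	if (abort_off < sizeof(found) || abort_off > text_len)
		return 0;
	memcpy(&found, text + abort_off - sizeof(found), sizeof(found));
	return found == registered_sig;
}
```

With this shape, crafting a dummy struct rseq_cs does not help: the reference value never comes from that structure.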

> 
> The former is a bit easier to deal with.  The latter has the nice
> property that you can't subvert one rseq_cs to land somewhere else,
> but it's not clear to me how what actual attack this prevents, so I
> think I prefer #1.  I just think that your variant is asking for
> trouble down the road with incompatible userspace.

As described above, user-space can easily make the signature randomization
work by having all users patch code within constructors.

Thanks,

Mathieu

> 
> --Andy

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Re: [PATCH] samples: replace FSF address with web source in license notices

2017-11-14 Thread Greg KH

On Tue, Nov 14, 2017 at 10:50:37AM +0100, Martin Kepplinger wrote:
> A few years ago the FSF moved and "59 Temple Place" is wrong. Having this
> still in our source files feels old and unmaintained.
> 
> Let's take the license statement seriously and not confuse users.
> 
> As https://www.gnu.org/licenses/gpl-howto.html suggests, we replace the
> postal address with "" in the samples
> directory.

What would be best is to just put the SPDX single line at the top of the
files, and then remove this license "boilerplate" entirely.  I've
started to do that with some subsystems already (drivers/usb/ and
drivers/tty/ are almost finished, see Linus's tree for details), and
I've sent out a patch series for drivers/s390/ yesterday if you want to
see an example of how to do it.

Could you do that here instead of this patch as well?

thanks,

greg k-h

Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.

2017-11-14 Thread Shakeel Butt

On Tue, Nov 14, 2017 at 4:56 PM, Minchan Kim  wrote:
> On Tue, Nov 14, 2017 at 06:37:42AM +0900, Tetsuo Handa wrote:
>> When shrinker_rwsem was introduced, it was assumed that
>> register_shrinker()/unregister_shrinker() are really unlikely paths
>> which are called during initialization and tear down. But nowadays,
>> register_shrinker()/unregister_shrinker() might be called regularly.
>> This patch prepares for allowing parallel registration/unregistration
>> of shrinkers.
>>
>> Since do_shrink_slab() can reschedule, we cannot protect shrinker_list
>> using one RCU section. But using atomic_inc()/atomic_dec() for each
>> do_shrink_slab() call will not impact so much.
>>
>> This patch uses polling loop with short sleep for unregister_shrinker()
>> rather than wait_on_atomic_t(), for we can save reader's cost (plain
>> atomic_dec() compared to atomic_dec_and_test()), we can expect that
>> do_shrink_slab() of unregistering shrinker likely returns shortly, and
>> we can avoid khungtaskd warnings when do_shrink_slab() of unregistering
>> shrinker unexpectedly took so long.
>>
>> Signed-off-by: Tetsuo Handa 
>
> Before reviewing this patch, can't we solve the problem with more
> simple way? Like this.
>
> Shakeel, What do you think?
>

Seems simple enough. I will run my test (running fork bomb in one
memcg and separately time a mount operation) and update if numbers
differ significantly.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 13d711dd8776..cbb624cb9baa 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -498,6 +498,14 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> sc.nid = 0;
>
> freed += do_shrink_slab(&sc, shrinker, nr_scanned, 
> nr_eligible);
> +   /*
> +* bail out if someone want to register a new shrinker to 
> prevent
> +* long time stall by parallel ongoing shrinking.
> +*/
> +   if (rwsem_is_contended(&shrinker_rwsem)) {
> +   freed = 1;

freed = freed ?: 1;

> +   break;
> +   }
> }
>
> up_read(&shrinker_rwsem);
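
For readers unfamiliar with the "?:" form suggested above: it is a GNU C extension in which "a ?: b" means "a ? a : b" with a evaluated once. A minimal sketch, assuming GCC or Clang; freed_or_one() is a name made up for this illustration.

```c
/*
 * Keep the number of objects already freed, but report at least 1 so the
 * caller does not mistake an early bail-out for "nothing reclaimable".
 */
static unsigned long freed_or_one(unsigned long freed)
{
	return freed ?: 1;
}
```

Compared with the plain "freed = 1" in the quoted hunk, this does not throw away the progress made before the bail-out.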

Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit

2017-11-14 Thread Eric W. Biederman

Kirill Tkhai  writes:

> Currently a mutex is used to protect pernet operations list. It makes
> cleanup_net() to execute ->exit methods of the same operations set,
> which was used on the time of ->init, even after net namespace is
> unlinked from net_namespace_list.
>
> But the problem is that synchronize_rcu() is needed after net is removed
> from net_namespace_list:
>
> Destroy net_ns:
> cleanup_net()
>   mutex_lock(&net_mutex)
>   list_del_rcu(&net->list)
>   synchronize_rcu()  <--- Sleep there for ages
>   list_for_each_entry_reverse(ops, &pernet_list, list)
> ops_exit_list(ops, &net_exit_list)
>   list_for_each_entry_reverse(ops, &pernet_list, list)
> ops_free_list(ops, &net_exit_list)
>   mutex_unlock(&net_mutex)
>
> This primitive is not fast, especially on the systems with many processors
> and/or when preemptible RCU is enabled in config. So, all the time, while
> cleanup_net() is waiting for RCU grace period, creation of new net namespaces
> is not possible; the tasks attempting it sleep on the same mutex:
>
> Create net_ns:
> copy_net_ns()
>   mutex_lock_killable(&net_mutex)<--- Sleep there for ages
>
> The solution is to convert net_mutex to the rw_semaphore. Then,
> pernet_operations::init/::exit methods, modifying the net-related data,
> will require down_read() locking only, while down_write() will be used
> for changing pernet_list.
>
> This gives a significant performance increase, as you may see below. Measured
> here is sequential net namespace creation in a loop, in a single
> thread, without other tasks (single user mode):
>
> 1)int main(int argc, char *argv[])
> {
> unsigned nr;
> if (argc < 2) {
> fprintf(stderr, "Provide nr iterations arg\n");
> return 1;
> }
> nr = atoi(argv[1]);
> while (nr-- > 0) {
> if (unshare(CLONE_NEWNET)) {
> perror("Can't unshare");
> return 1;
> }
> }
> return 0;
> }
>
> Origin, 10 unshare():
> 0.03user 23.14system 1:39.85elapsed 23%CPU
>
> Patched, 10 unshare():
> 0.03user 67.49system 1:08.34elapsed 98%CPU
>
> 2)for i in {1..1}; do unshare -n bash -c exit; done
>
> Origin:
> real 1m24,190s
> user 0m6,225s
> sys 0m15,132s
>
> Patched:
> real 0m18,235s   (4.6 times faster)
> user 0m4,544s
> sys 0m13,796s
>
> This patch requires commit 76f8507f7a64 "locking/rwsem: Add down_read_killable()"
> from Linus tree (not in net-next yet).

Using a rwsem to protect the list of operations makes sense.

That should allow removing the sing

I am not wild about taking the rwsem down_write in
rtnl_link_unregister, and net_ns_barrier.  I think that works but it
goes from being a mild hack to being a pretty bad hack and something
else that can kill the parallelism you are seeking to add.

There are about 204 instances of struct pernet_operations.  That is a
lot of code to have carefully audited to ensure it can run in parallel all
at once.  The existence of the exit_batch method, net_ns_barrier,
for_each_net and taking of net_mutex in rtnl_link_unregister all testify
to the fact that there are data structures accessed by multiple network
namespaces.

My preference would be to:

- Add the net_sem in addition to net_mutex with down_write only held in
  register and unregister, and maybe net_ns_barrier and
  rtnl_link_unregister.

- Factor struct pernet_ops out of struct pernet_operations, with
  struct pernet_ops not having the exit_batch method and pernet_ops
  being embedded as an anonymous member of the old struct pernet_operations.

- Add [un]register_pernet_{sys,dev} functions that take a struct
  pernet_ops, that don't take net_mutex.  Have them order the
  pernet_list as:

  pernet_sys
  pernet_subsys
  pernet_device
  pernet_dev

  With the chunk in the middle taking the net_mutex.

  I think I would enforce the ordering with a failure to register
  if a subsystem or a device tried to register out of order.  

- Disable use of the single threaded workqueue if nothing needs the
  net_mutex.

- Add a test mode that deliberately spawns threads on multiple
  processors and deliberately creates multiple network namespaces
  at the same time.

- Add a test mode that deliberately spawns threads on multiple
  processors and deliberately destroys multiple network namespaces
  at the same time.
  
- Convert the code to unlocked operation one pernet_operations at a
  time, being careful with the loopback device, as its order in the list
  strongly matters.

- Finally remove the unnecessary code.


At the end of the day because all of the operations for one network
namespace will run in parallel with all of the operations for another
network namespace all of the sophistication that goes into batching the
cleanup of multiple network namespaces can be removed.  As different
tasks (not sharing a lock) can wait in synchronize_rcu at the sa

[PATCH] usb: dwc3: Enable the USB snooping

2017-11-14 Thread Ran Wang

Add support for USB3 snooping by asserting bits
in register DWC3_GSBUSCFG0 for data and descriptor.

Signed-off-by: Changming Huang 
Signed-off-by: Rajesh Bhagat 
Signed-off-by: Ran Wang 
---
 drivers/usb/dwc3/core.c | 24 
 drivers/usb/dwc3/core.h | 10 ++
 2 files changed, 34 insertions(+)

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 07832509584f..ffc078ab4a1c 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -236,6 +236,26 @@ static int dwc3_core_soft_reset(struct dwc3 *dwc)
return -ETIMEDOUT;
 }
 
+/*
+ * dwc3_enable_snooping - Enable snooping feature
+ * @dwc3: Pointer to our controller context structure
+ */
+static void dwc3_enable_snooping(struct dwc3 *dwc)
+{
+   u32 cfg;
+
+   cfg = dwc3_readl(dwc->regs, DWC3_GSBUSCFG0);
+   if (dwc->dma_coherent) {
+   cfg &= ~DWC3_GSBUSCFG0_SNP_MASK;
+   cfg |= (AXI3_CACHE_TYPE_SNP << DWC3_GSBUSCFG0_DATARD_SHIFT) |
+   (AXI3_CACHE_TYPE_SNP << DWC3_GSBUSCFG0_DESCRD_SHIFT) |
+   (AXI3_CACHE_TYPE_SNP << DWC3_GSBUSCFG0_DATAWR_SHIFT) |
+   (AXI3_CACHE_TYPE_SNP << DWC3_GSBUSCFG0_DESCWR_SHIFT);
+   }
+
+   dwc3_writel(dwc->regs, DWC3_GSBUSCFG0, cfg);
+}
+
 /*
  * dwc3_frame_length_adjustment - Adjusts frame length if required
  * @dwc3: Pointer to our controller context structure
@@ -776,6 +796,8 @@ static int dwc3_core_init(struct dwc3 *dwc)
/* Adjust Frame Length */
dwc3_frame_length_adjustment(dwc);
 
+   dwc3_enable_snooping(dwc);
+
usb_phy_set_suspend(dwc->usb2_phy, 0);
usb_phy_set_suspend(dwc->usb3_phy, 0);
ret = phy_power_on(dwc->usb2_generic_phy);
@@ -1021,6 +1043,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
&hird_threshold);
dwc->usb3_lpm_capable = device_property_read_bool(dev,
"snps,usb3_lpm_capable");
+   dwc->dma_coherent = device_property_read_bool(dev,
+   "dma-coherent");
 
dwc->disable_scramble_quirk = device_property_read_bool(dev,
"snps,disable_scramble_quirk");
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 4a4a4c98508c..6e6a66650e53 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -153,6 +153,14 @@
 
 /* Bit fields */
 
+/* Global SoC Bus Configuration Register 0 */
+#define AXI3_CACHE_TYPE_SNP0x2 /* cacheable */
+#define DWC3_GSBUSCFG0_DATARD_SHIFT28
+#define DWC3_GSBUSCFG0_DESCRD_SHIFT24
+#define DWC3_GSBUSCFG0_DATAWR_SHIFT20
+#define DWC3_GSBUSCFG0_DESCWR_SHIFT16
+#define DWC3_GSBUSCFG0_SNP_MASK0x
+
 /* Global Debug Queue/FIFO Space Available Register */
 #define DWC3_GDBGFIFOSPACE_NUM(n)  ((n) & 0x1f)
 #define DWC3_GDBGFIFOSPACE_TYPE(n) (((n) << 5) & 0x1e0)
@@ -859,6 +867,7 @@ struct dwc3_scratchpad_array {
  * 3   - Reserved
  * @imod_interval: set the interrupt moderation interval in 250ns
  * increments or 0 to disable.
+ * @dma_coherent: set if enable dma-coherent.
  */
 struct dwc3 {
struct work_struct  drd_work;
@@ -990,6 +999,7 @@ struct dwc3 {
unsignedsetup_packet_pending:1;
unsignedthree_stage_setup:1;
unsignedusb3_lpm_capable:1;
+   unsigneddma_coherent:1;
 
unsigneddisable_scramble_quirk:1;
unsignedu2exit_lfps_quirk:1;
-- 
2.14.1
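
As a worked example of the register update in this patch, the sketch below computes the GSBUSCFG0 value in plain userspace C. The shifts and the cache type come from the patch; the mask value is an assumption (the four 4-bit fields at bits 16..31), because the mask constant is cut off in the posted text, and snoop_cfg() is a hypothetical name.

```c
#include <stdint.h>

#define AXI3_CACHE_TYPE_SNP	0x2u	/* from the patch: cacheable */
#define GSBUSCFG0_DATARD_SHIFT	28
#define GSBUSCFG0_DESCRD_SHIFT	24
#define GSBUSCFG0_DATAWR_SHIFT	20
#define GSBUSCFG0_DESCWR_SHIFT	16
/* Assumption: the mask spans the four request-type fields, bits 16..31. */
#define GSBUSCFG0_SNP_MASK_ASSUMED	0xffff0000u

/* Clear the four cache-type fields, then set each one to "snoop". */
static uint32_t snoop_cfg(uint32_t cfg)
{
	cfg &= ~GSBUSCFG0_SNP_MASK_ASSUMED;
	cfg |= (AXI3_CACHE_TYPE_SNP << GSBUSCFG0_DATARD_SHIFT) |
	       (AXI3_CACHE_TYPE_SNP << GSBUSCFG0_DESCRD_SHIFT) |
	       (AXI3_CACHE_TYPE_SNP << GSBUSCFG0_DATAWR_SHIFT) |
	       (AXI3_CACHE_TYPE_SNP << GSBUSCFG0_DESCWR_SHIFT);
	return cfg;
}
```

Under that assumed mask, the low 16 bits of the register are preserved and the upper half becomes 0x2222.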

Re: Coccinelle: badzero.cocci failure

2017-11-14 Thread Julia Lawall



On Tue, 14 Nov 2017, Masahiro Yamada wrote:

> Hi Julia,
>
>
> 2017-11-14 18:07 GMT+09:00 Julia Lawall :
> >> coccicheck failed
> >> $ cat cocci-debug.txt
> >> /home/masahiro/bin/spatch -D report --no-show-diff --very-quiet
> >> --cocci-file scripts/coccinelle/null/badzero.cocci --dir . -I
> >> ./arch/x86/include -I ./arch/x86/include/generated -I ./include -I
> >> ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi -I
> >> ./include/uapi -I ./include/generated/uapi --include
> >> ./include/linux/kconfig.h --jobs 8 --chunksize 1
> >> Fatal error: exception
> >> Yes_prepare_ocamlcocci.LinkFailure("/tmp/ocaml_cocci_18c9f9.cmxs")
> >
> > Does your Coccinelle support OCaml?  I'm not sure what is the proper way to
> > check for this, but in my coccinelle/config.log file I have
> >
> > FEATURE_OCAML='1'
>
>
> Yes.  I also see this line in my config.log
>
>
> > spatch --version gives:
> >
> > spatch version 1.0.6-00147-g19f9421 compiled with OCaml version 4.02.3
> > Flags passed to the configure script: [none]
> > Python scripting support: yes
> > Syntax of regular expresssions: Str
>
> My version output looks like follows:
>
> $ spatch --version
> spatch version 1.0.6-00345-g2ca0bef compiled with OCaml version 4.02.3
> Flags passed to the configure script: --prefix=/home/masahiro
> Python scripting support: yes
> Syntax of regular expresssions: PCRE
>
>
> > I'm not sure why it doesn't give feedback on whether OCaml scripting is
> > supported.  I will check on this.

Can you try the following semantic patch (called eg nothing.cocci):

@script:ocaml@
@@

()

on any .c file, ie

spatch --sp-file nothing.cocci test.c

julia

Re: 4.14 kernel and acpi INT3400:00: Unsupported event [0x86]

2017-11-14 Thread Zhang Rui

Hi, Brian,

Thanks for your quick fix. As it is the merge window right now, I will
queue it for the next -rc2.

thanks,
rui

On Tue, 2017-11-14 at 10:50 -0700, Brian Bian wrote:
> I have submitted a patch to suppress such messages. The INT3400
> driver
> currently handles 0x83 thermal-relationship-table-change event
> only, and all other ACPI notification codes are unknown/irrelevant
> to the INT3400 driver.
> 
> Thanks,
> -Brian
> 
> On Mon, 13 Nov 2017, Arkadiusz Miskiewicz wrote:
> 
> > 
> > On Monday 13 of November 2017, Zhang Rui wrote:
> > > 
> > > On Sun, 2017-11-12 at 23:25 +0100, Arkadiusz Miskiewicz wrote:
> > > > 
> > > > Hello.
> > > > 
> > > > On Dell XPS 9530 and 4.14 kernel dmesg is flooded with:
> > > > 
> > > > [  292.580807] acpi INT3400:00: Unsupported event [0x86]
> > > > [  299.284648] acpi INT3400:00: Unsupported event [0x86]
> > > > [  305.648079] acpi INT3400:00: Unsupported event [0x86]
> > > > [  315.444799] acpi INT3400:00: Unsupported event [0x86]
> > > > [  317.432412] acpi INT3400:00: Unsupported event [0x86]
> > > > [  319.420239] acpi INT3400:00: Unsupported event [0x86]
> > > > [  321.408476] acpi INT3400:00: Unsupported event [0x86]
> > > > [  323.400304] acpi INT3400:00: Unsupported event [0x86]
> > > > [  325.388358] acpi INT3400:00: Unsupported event [0x86]
> > > > 
> > > > What 0x86 might mean?
> > > please attach the acpidump output.
> > Attached.
> > 
> > > 
> > > 
> > > thanks,
> > > rui
> > 
> > 
> > -- 
> > Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

Re: rpmsg: qcom_glink_native: no module license, taints kernel

2017-11-14 Thread Bjorn Andersson

On Sun 12 Nov 09:17 PST 2017, Randy Dunlap wrote:

> [44098.635339] qcom_glink_native: module license 'unspecified' taints kernel.

Thanks for reporting this.

Regards,
Bjorn

Re: [PATCH] lost path_put in perf_fill_ns_link_info

2017-11-14 Thread Vasily Averin

On 2017-11-08 16:04, Vasily Averin wrote:
> On 2017-11-08 15:09, Alexander Shishkin wrote:
>> On Mon, Nov 06, 2017 at 09:22:18AM +0300, Vasily Averin wrote:
>>> Fixes: commit e422267322cd ("perf: Add PERF_RECORD_NAMESPACES to include 
>>> namespaces related info")
>>> Signed-off-by: Vasily Averin 
>>
>> The change description is missing. One needs to open the source code and
>> look for proof of correctness for this patch.
> 
> perf_fill_ns_link_info() calls ns_get_path()
> it returns ns_path with increased mnt and dentry counters.
> 
> Problem is that nobody decrements these counters.
> 
> You can run ./perf record --namespaces unshare -m
> and watch the mount counter on nsfs_mnt grow.

The situation is even worse: the leaked dentry prevents the related
namespaces from being freed.

[root@localhost ~]# uname -a
Linux localhost.localdomain 4.14.0+ #10 SMP Wed Nov 15 00:31:34 MSK 2017 x86_64 x86_64 x86_64 GNU/Linux

VvS:  without --namespace perf works correctly

[root@localhost ~]# grep namespace /proc/slabinfo
pid_namespace  0  0   2568   128 : tunables000 : 
slabdata  0  0  0
user_namespace 0  0824   398 : tunables000 : 
slabdata  0  0  0
net_namespace  0  5   627258 : tunables000 : 
slabdata  1  1  0
[root@localhost ~]# perf record  -q  unshare -n -U  -p --fork true 
[root@localhost ~]# grep namespace /proc/slabinfo
pid_namespace  0 12   2568   128 : tunables000 : 
slabdata  1  1  0
user_namespace 0 39824   398 : tunables000 : 
slabdata  1  1  0
net_namespace  0  5   627258 : tunables000 : 
slabdata  1  1  0

VvS: with --namespace perf leaks namespaces

[root@localhost ~]# perf record  -q  --namespace unshare -n -U  -p --fork true
[root@localhost ~]# grep namespace /proc/slabinfo
pid_namespace  1 12   2568   128 : tunables000 : 
slabdata  1  1  0
user_namespace 1 39824   398 : tunables000 : 
slabdata  1  1  0
net_namespace  1  5   627258 : tunables000 : 
slabdata  1  1  0

VvS: ... and once again, to be sure

[root@localhost ~]# perf record  -q  --namespace unshare -n -U  -p --fork true
[root@localhost ~]# grep namespace /proc/slabinfo
pid_namespace  2 12   2568   128 : tunables000 : 
slabdata  1  1  0
user_namespace 2 39824   398 : tunables000 : 
slabdata  1  1  0
net_namespace  2  5   627258 : tunables000 : 
slabdata  1  1  0


kmemleak also detects the leaked dentry, inode, related structures and namespaces:

unreferenced object 0x998008e76738 (size 192): <<< dentry
  comm "unshare", pid 1436, jiffies 4294786539 (age 509.114s)
[] __ns_get_path+0xf5/0x160
[] ns_get_path+0x28/0x50
[] perf_fill_ns_link_info+0x20/0x80
[] perf_event_namespaces.part.101+0xd7/0x120
[] copy_process.part.34+0x171d/0x1ae0
[] _do_fork+0xcc/0x390
[] do_syscall_64+0x61/0x170
[] return_from_SYSCALL_64+0x0/0x65
[] 0x

unreferenced object 0x998008e7c928 (size 600): <<< inode
  comm "unshare", pid 1436, jiffies 4294786539 (age 509.114s)
[] new_inode_pseudo+0xe/0x60
[] __ns_get_path+0x42/0x160
[] ns_get_path+0x28/0x50
 ...

unreferenced object 0x99800cacd320 (size 40): <<< inode_security_struct
  comm "unshare", pid 1436, jiffies 4294786539 (age 509.114s)
[] security_inode_alloc+0x36/0x50
[] inode_init_always+0xf5/0x1d0
[] alloc_inode+0x2b/0x80
[] new_inode_pseudo+0xe/0x60
[] __ns_get_path+0x42/0x160
 ...

unreferenced object 0x99800cb88a10 (size 2232): <<< pid_namespace
  comm "unshare", pid 1436, jiffies 4294786439 (age 509.214s)
[] create_new_namespaces+0xd4/0x1b0
[] unshare_nsproxy_namespaces+0x59/0xb0
[] SyS_unshare+0x1e5/0x370
[] entry_SYSCALL_64_fastpath+0x1a/0x7d
[] 0x

I've resent the patch as  "[PATCH] memory leaks triggered by perf --namespace"

Thank you,
Vasily Averin

Re: [PATCH 2/3] X86/kdump: crashkernel=X try to reserve below 896M first then below 4G and MAXMEM

2017-11-14 Thread Baoquan He

Hi Dave,

Thanks for your effort to push this into upstream. While I have one
concern, please see the inline comments.

On 10/24/17 at 01:31pm, Dave Young wrote:
> Now crashkernel=X will fail if there's not enough memory at low region
> (below 896M) when trying to reserve large memory size.  One can use
> crashkernel=xM,high to reserve it at high region (>4G) but it is more
> convenient to improve crashkernel=X to: 
> 
>  - First try to reserve X below 896M (for being compatible with old
>kexec-tools).
>  - If fails, try to reserve X below 4G (swiotlb need to stay below 4G).
>  - If fails, try to reserve X from MAXMEM top down.
> 
> It's more transparent and user-friendly.
> 
> If crashkernel is large and the reserved is beyond 896M, old kexec-tools
> is not compatible with new kernel because old kexec-tools can not load
> kernel at high memory region, there was an old discussion below:
> https://lkml.org/lkml/2013/10/15/601
> 
> But actually the behavior is consistent during my test. Suppose
> old kernel fail to reserve memory at low areas, kdump does not
> work because no memory is reserved. With this patch, suppose new kernel
> successfully reserved memory at high areas, old kexec-tools still fail
> to load kdump kernel (tested 2.0.2), so it is acceptable, no need to
> worry about the compatibility.
> 
> Here is the test result (kexec-tools 2.0.2, no high memory load
> support):
> Crashkernel over 4G:
> # cat /proc/iomem|grep Crash
>   be00-cdff : Crash kernel
>   21300-21eff : Crash kernel
> # ./kexec  -p /boot/vmlinuz-`uname -r`
> Memory for crashkernel is not reserved
> Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
> Then try loading kdump kernel
> 
> crashkernel: 896M-4G:
> # cat /proc/iomem|grep Crash
>   9600-cdef : Crash kernel
> # ./kexec -p /boot/vmlinuz-4.14.0-rc4+
> ELF core (kcore) parse failed
> Cannot load /boot/vmlinuz-4.14.0-rc4+
> 
> Signed-off-by: Dave Young 
> ---
>  arch/x86/kernel/setup.c |   16 
>  1 file changed, 16 insertions(+)
> 
> --- linux-x86.orig/arch/x86/kernel/setup.c
> +++ linux-x86/arch/x86/kernel/setup.c
> @@ -568,6 +568,22 @@ static void __init reserve_crashkernel(v
>   high ? CRASH_ADDR_HIGH_MAX
>: CRASH_ADDR_LOW_MAX,
>   crash_size, CRASH_ALIGN);
> +#ifdef CONFIG_X86_64
> + /*
> +  * crashkernel=X reserve below 896M fails? Try below 4G
> +  */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + (1ULL << 32),
> + crash_size, CRASH_ALIGN);
> + /*
> +  * crashkernel=X reserve below 4G fails? Try MAXMEM
> +  */
> + if (!high && !crash_base)
> + crash_base = memblock_find_in_range(CRASH_ALIGN,
> + CRASH_ADDR_HIGH_MAX,
> + crash_size, CRASH_ALIGN);

For kdump, most systems are x86_64. If neither Yinghai nor Vivek objects
to naturally searching for an available region of crash_size above 896M,
why don't we search for it with the function
__memblock_find_range_bottom_up()? It can search from below 896M to
above 4G, almost the same as the change you have made here, and the
code would be much simpler.

Searching several times does not look good and is a little confusing.

What do you think?

Thanks
Baoquan

> +#endif
>   if (!crash_base) {
>   pr_info("crashkernel reservation failed - No suitable area found.\n");
>   return;
> 
>
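
The fallback order under discussion (below 896M, then below 4G, then up to MAXMEM) can be modelled in userspace as shown below. This is a toy sketch, not the memblock API: find_in_range() is a simplified first-fit stand-in that ignores alignment and the kernel's search direction, and all names and the free-range table are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct free_range {
	uint64_t start;	/* inclusive, assumed non-zero */
	uint64_t end;	/* exclusive */
};

/* First fit below "limit"; returns the base address, or 0 on failure. */
static uint64_t find_in_range(const struct free_range *r, size_t n,
			      uint64_t limit, uint64_t size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t end = r[i].end < limit ? r[i].end : limit;

		if (end > r[i].start && end - r[i].start >= size)
			return r[i].start;
	}
	return 0;
}

/* Try each ceiling in turn, stopping at the first successful search. */
static uint64_t reserve_crash(const struct free_range *r, size_t n,
			      uint64_t size, uint64_t maxmem)
{
	const uint64_t limits[] = { 896ULL << 20, 1ULL << 32, maxmem };
	size_t i;

	for (i = 0; i < sizeof(limits) / sizeof(limits[0]); i++) {
		uint64_t base = find_in_range(r, n, limits[i], size);

		if (base)
			return base;
	}
	return 0;
}
```

A single bottom-up search, as suggested in the reply above, would replace the three-entry limits table with one call covering the whole range.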

linux-next: Tree for Nov 15

2017-11-14 Thread Stephen Rothwell

Hi all,

Please do not add any v4.16 material to your linux-next included trees
until v4.15-rc1 has been released.

Changes since 20171114:

New tree: rseq

Removed tree: vfs-jk at maintainer's request

The rseq tree gained conflicts against Linus' and the powerpc trees.

Non-merge commits (relative to Linus' tree): 10361
 8500 files changed, 500697 insertions(+), 230634 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 272 trees (counting Linus' and 42 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (894025f24bd0 Merge tag 'usb-4.15-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb)
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (bb3f38c3c5b7 kbuild: clang: fix build failures 
with sparse check)
Merging arc-current/for-curr (f3156851616b ARCv2: boot log: updates for HS48: 
dual-issue, ECC, Loop Buffer)
Merging arm-current/fixes (b9dd05c7002e ARM: 8720/1: ensure dump_instr() checks 
addr_limit)
Merging m68k-current/for-linus (5e387199c17c m68k/defconfig: Update defconfigs 
for v4.14-rc7)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (7ecb37f62fe5 powerpc/perf: Fix core-imc hotplug 
callback failure during imc initialization)
Merging sparc/master (23198ddffb6c sparc32: Add cmpxchg64().)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (b39545684a90 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging ipsec/master (b39545684a90 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging netfilter/master (7400bb4b5800 netfilter: nf_reject_ipv4: Fix 
use-after-free in send_reset)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (a6127b4440d1 Merge tag 
'iwlwifi-for-kalle-2017-10-06' of 
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (b39545684a90 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging sound-current/for-linus (7087cb8fad5e Documentation: sound: hd-audio: 
notes.rst)
CONFLICT (modify/delete): sound/oss/uart6850.c deleted in 
sound-current/for-linus and modified in HEAD. Version HEAD of 
sound/oss/uart6850.c left in tree.
CONFLICT (modify/delete): sound/oss/sys_timer.c deleted in 
sound-current/for-linus and modified in HEAD. Version HEAD of 
sound/oss/sys_timer.c left in tree.
CONFLICT (modify/delete): sound/oss/soundcard.c deleted in 
sound-current/for-linus and modified in HEAD. Version HEAD of 
sound/oss/soundcard.c left in tree.
CONFLICT (modify/delete): sound/oss/midibuf.c deleted in 
sound-current/for-linus and modified in HEAD. Version HEAD of 
sound/oss/midibuf.c left in tree.
$ git rm -f sound/oss/midibuf.c sound/oss/soundcard.c sound/oss/sys_timer.c 
sound/oss/uart6850.c
Merging pci-current/for-linus (6b7be529634b MAINTAINERS: Add Lorenzo Pieralisi 
for PCI host bridge drivers)
Merging driver-core.current/driver-core-linus (39dae59d66ac Linux 4.14-rc8)
Merging tty.current/tty-linus (894025f24bd0 Merge tag '

[PATCH] memory leaks triggered by perf --namespace

2017-11-14 Thread Vasily Averin

perf with --namespace key leaks various memory objects including namespaces

# uname -r
4.14.0+
# perf record  -q  --namespace unshare -n -U  -p --fork true
# grep namespace /proc/slabinfo
pid_namespace  1 12   2568   128 
user_namespace 1 39824   398 
net_namespace  1  5   627258 

This happens because of struct path ns_path in perf_fill_ns_link_info():
during initialization ns_get_path() increments the counters on the related
mnt and dentry, but without the missing path_put() nobody decrements them.
The leaked dentry is the name of the related namespace,
and its leak prevents the unused namespace from being freed.

Fixes: commit e422267322cd ("perf: Add PERF_RECORD_NAMESPACES to include namespaces related info")
Signed-off-by: Vasily Averin 
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 10cdb9c..ab5ac84 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6756,6 +6756,7 @@ static void perf_fill_ns_link_info(struct perf_ns_link_info *ns_link_info,
ns_inode = ns_path.dentry->d_inode;
ns_link_info->dev = new_encode_dev(ns_inode->i_sb->s_dev);
ns_link_info->ino = ns_inode->i_ino;
+   path_put(&ns_path);
}
 }
 
-- 
2.7.4
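
The get/put imbalance fixed by this patch can be illustrated with a toy reference counter. Nothing below is kernel code: struct toy_path and its helpers are hypothetical stand-ins for struct path, ns_get_path() and path_put(), kept only to show that every lookup which takes a reference must drop it once the data has been copied out.

```c
/* Toy stand-in for a refcounted object such as a dentry/mnt pair. */
struct toy_path {
	int refs;
	unsigned long ino;
};

static void toy_path_get(struct toy_path *p)
{
	p->refs++;
}

static void toy_path_put(struct toy_path *p)
{
	p->refs--;
}

/* Mirrors the shape of the fixed function: get, copy out, put. */
static unsigned long toy_fill_info(struct toy_path *p)
{
	unsigned long ino;

	toy_path_get(p);	/* stands in for ns_get_path() */
	ino = p->ino;
	toy_path_put(p);	/* stands in for the added path_put() */
	return ino;
}
```

Dropping the toy_path_put() line makes refs grow by one per call, which is the same pattern the slabinfo counters in the report show.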

Re: [PATCH] sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_t

2017-11-14 Thread David Miller

From: Elena Reshetova 
Date: Fri, 20 Oct 2017 10:57:57 +0300

> atomic_t variables are currently used to implement reference
> counters with the following properties:
>  - counter is initialized to 1 using atomic_set()
>  - a resource is freed upon counter reaching zero
>  - once counter reaches zero, its further
>increments aren't allowed
>  - counter schema uses basic atomic operations
>(set, inc, inc_not_zero, dec_and_test, etc.)
> 
> Such atomic variables should be converted to a newly provided
> refcount_t type and API that prevents accidental counter overflows
> and underflows. This is important since overflows and underflows
> can lead to use-after-free situation and be exploitable.
> 
> The variable mdesc_handle.refcnt is used as pure reference counter.
> Convert it to refcount_t and fix up the operations.
> 
> Suggested-by: Kees Cook 
> Reviewed-by: David Windsor 
> Reviewed-by: Hans Liljestrand 
> Signed-off-by: Elena Reshetova 

Applied.
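
The properties listed in the quoted commit message (no increment once the
counter has reached zero, protection against overflow and underflow) can be
illustrated with a small sketch. This is a single-threaded userspace model
under stated assumptions, not the kernel's refcount_t implementation; the
sketch_* names are hypothetical and saturation at UINT_MAX is chosen here only
for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, non-atomic model of a saturating reference counter. */
struct sketch_refcount {
	unsigned int refs;
};

static void sketch_refcount_set(struct sketch_refcount *r, unsigned int n)
{
	r->refs = n;
}

/*
 * Take a reference. Refuses when the count is already zero (the
 * resource is gone) and saturates instead of wrapping around.
 */
static bool sketch_refcount_inc_not_zero(struct sketch_refcount *r)
{
	if (r->refs == 0)
		return false;
	if (r->refs != UINT_MAX)
		r->refs++;
	return true;
}

/*
 * Drop a reference. Returns true only when this call brought the
 * count to zero. A zero or saturated counter is left untouched, so
 * it can neither underflow nor be freed by accident.
 */
static bool sketch_refcount_dec_and_test(struct sketch_refcount *r)
{
	if (r->refs == 0 || r->refs == UINT_MAX)
		return false;
	return --r->refs == 0;
}
```

The point of the design is that a counter that has gone wrong stays stuck
rather than wrapping to a small value that could later trigger a free.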

Re: [PATCH] sparc/led: Convert timers to use timer_setup()

2017-11-14 Thread David Miller

From: Kees Cook 
Date: Tue, 10 Oct 2017 15:07:27 -0700

> In preparation for unconditionally passing the struct timer_list pointer to
> all timer callbacks, switch to using the new timer_setup() and from_timer()
> to pass the timer pointer explicitly. Adds a static variable to hold timeout
> value.
> 
> Cc: "David S. Miller" 
> Cc: Geliang Tang 
> Cc: Ingo Molnar 
> Cc: sparcli...@vger.kernel.org
> Signed-off-by: Kees Cook 

Applied.

Re: [PATCH v4 2/3] powerpc/modules: Don't try to restore r2 after a sibling call

2017-11-14 Thread Kamalesh Babulal


On Tuesday 14 November 2017 09:23 PM, Josh Poimboeuf wrote:

On Tue, Nov 14, 2017 at 03:59:21PM +0530, Naveen N. Rao wrote:

Kamalesh Babulal wrote:

From: Josh Poimboeuf 

When attempting to load a livepatch module, I got the following error:

  module_64: patch_module: Expect noop after relocate, got 3c82

The error was triggered by the following code in
unregister_netdevice_queue():

  14c:   00 00 00 48 b   14c 
 14c: R_PPC64_REL24  net_set_todo
  150:   00 00 82 3c addis   r4,r2,0

GCC didn't insert a nop after the branch to net_set_todo() because it's
a sibling call, so it never returns.  The nop isn't needed after the
branch in that case.

Signed-off-by: Josh Poimboeuf 
Signed-off-by: Kamalesh Babulal 
---
 arch/powerpc/kernel/module_64.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 39b01fd..9e5391f 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -489,6 +489,10 @@ static int restore_r2(u32 *instruction, struct module *me)
if (is_early_mcount_callsite(instruction - 1))
return 1;

+   /* Sibling calls don't return, so they don't need to restore r2 */
+   if (instruction[-1] == PPC_INST_BRANCH)
+   return 1;
+


This looks quite fragile, unless we know for sure that gcc will _always_
emit this instruction form for sibling calls with relocations.

As an alternative, does it make sense to do the following check instead?
if ((instr_is_branch_iform(insn) || instr_is_branch_bform(insn))
&& !(insn & 0x1))


Yes, good point.  How about something like this?

(completely untested because I don't have access to a box at the moment)


Reviewed-and-tested-by: Kamalesh Babulal 




diff --git a/arch/powerpc/include/asm/code-patching.h 
b/arch/powerpc/include/asm/code-patching.h
index abef812de7f8..302e4368debc 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -33,6 +33,7 @@ int patch_branch(unsigned int *addr, unsigned long target, 
int flags);
 int patch_instruction(unsigned int *addr, unsigned int instr);

 int instr_is_relative_branch(unsigned int instr);
+int instr_is_link_branch(unsigned int instr);
 int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
 unsigned long branch_target(const unsigned int *instr);
 unsigned int translate_branch(const unsigned int *dest,
diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 9cb007bc7075..b5148a206b4d 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -487,11 +487,13 @@ static bool is_early_mcount_callsite(u32 *instruction)
restore r2. */
 static int restore_r2(u32 *instruction, struct module *me)
 {
-   if (is_early_mcount_callsite(instruction - 1))
+   u32 *prev_insn = instruction - 1;
+
+   if (is_early_mcount_callsite(prev_insn))
return 1;

/* Sibling calls don't return, so they don't need to restore r2 */
-   if (instruction[-1] == PPC_INST_BRANCH)
+   if (!instr_is_link_branch(*prev_insn))
return 1;

if (*instruction != PPC_INST_NOP) {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index c9de03e0c1f1..4727fafd37e4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -304,6 +304,12 @@ int instr_is_relative_branch(unsigned int instr)
return instr_is_branch_iform(instr) || instr_is_branch_bform(instr);
 }

+int instr_is_link_branch(unsigned int instr)
+{
+   return (instr_is_branch_iform(instr) || instr_is_branch_bform(instr)) &&
+  (instr & BRANCH_SET_LINK);
+}
+
 static unsigned long branch_iform_target(const unsigned int *instr)
 {
signed long imm;




--
cheers,
Kamalesh.
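
The check discussed in this thread can be tried outside the kernel. The sketch
below is a standalone userspace model, not the kernel helpers themselves: the
sketch_* names are hypothetical, and it assumes the usual PowerPC encoding
where the primary opcode is the top six bits, I-form branches use opcode 18,
B-form branches use opcode 16, and the LK bit is the lowest bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* LK bit: set for branches that save a return address (bl, bcl). */
#define SKETCH_BRANCH_SET_LINK 0x1u

/* The primary opcode is the top six bits of the instruction word. */
static unsigned int sketch_primary_opcode(uint32_t instr)
{
	return (instr >> 26) & 0x3f;
}

/* I-form: unconditional branch (b, bl, ba, bla). */
static bool sketch_is_branch_iform(uint32_t instr)
{
	return sketch_primary_opcode(instr) == 18;
}

/* B-form: conditional branch (bc and friends). */
static bool sketch_is_branch_bform(uint32_t instr)
{
	return sketch_primary_opcode(instr) == 16;
}

/* True only for relative branches that also set the link register. */
static bool sketch_is_link_branch(uint32_t instr)
{
	return (sketch_is_branch_iform(instr) ||
		sketch_is_branch_bform(instr)) &&
	       (instr & SKETCH_BRANCH_SET_LINK);
}
```

For the disassembly quoted above, the sibling call at 14c is the word
0x48000000 (a plain "b"), so the sketch reports it as not being a link branch,
while 0x48000001 ("bl") is one.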

Re: [PATCH v4 0/5] sparc64: Optimize fls and __fls

2017-11-14 Thread David Miller

From: Vijay Kumar 
Date: Wed, 11 Oct 2017 12:50:01 -0600

> SPARC provides lzcnt instruction (with VIS3) which can be used to
> optimize fls, __fls and fls64 functions. For systems that support the
> lzcnt instruction, we now do boot time patching to use sparc
> optimized fls, __fls and fls64 functions.
> 
> v3->v4:
>  -  Fixed a typo.
> v2->v3:
>  -  Using ENTRY(), ENDPROC() for assembler functions.
>  -  Removed BITS_PER_LONG from __fls.
>  -  Using generic fls64().
>  -  Replaced lzcnt instruction with .word directive.
> v1->v2:
>  - Fixed delay slot issue.

Series applied, thank you.
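
The reason a leading-zero count helps here is a simple identity: the position
of the most significant set bit is the word width minus the number of leading
zeros. The sketch below shows that relation in portable C. It is not the
sparc64 patch; the sketch_* names are hypothetical and a software loop stands
in for the hardware lzcnt instruction.

```c
#include <stdint.h>

/*
 * Software count of leading zero bits in a 64-bit word; stands in
 * for a hardware lzcnt. Returns 64 when x is 0.
 */
static unsigned int sketch_lzcnt64(uint64_t x)
{
	unsigned int n = 0;

	if (x == 0)
		return 64;
	while (!(x & (UINT64_C(1) << 63))) {
		x <<= 1;
		n++;
	}
	return n;
}

/* fls64 semantics: 1-based index of the highest set bit, 0 if none. */
static unsigned int sketch_fls64(uint64_t x)
{
	return 64 - sketch_lzcnt64(x);
}

/* fls semantics for a 32-bit value, via zero extension to 64 bits. */
static unsigned int sketch_fls(uint32_t x)
{
	return sketch_fls64(x);
}

/*
 * __fls semantics: 0-based index of the highest set bit.
 * The caller must pass a nonzero value.
 */
static unsigned int sketch_fls_index(uint64_t x)
{
	return 63 - sketch_lzcnt64(x);
}
```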

Re: [PATCH] openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start

2017-11-14 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Tue, 14 Nov 2017 14:26:16 -0600

> It seems that the intention of the code is to null check the value
> returned by function genlmsg_put. But the current code is null
> checking the address of the pointer that holds the value returned
> by genlmsg_put.
> 
> Fix this by properly null checking the value returned by function
> genlmsg_put in order to avoid a potential null pointer dereference.
> 
> Addresses-Coverity-ID: 1461561 ("Dereference before null check")
> Addresses-Coverity-ID: 1461562 ("Dereference null return value")
> Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
> Signed-off-by: Gustavo A. R. Silva 

Applied.
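
The class of bug described in the quoted commit message is easy to reproduce
in isolation. The sketch below uses hypothetical names (sketch_put,
reply_start_buggy, reply_start_fixed) and is not the openvswitch code; it only
contrasts testing the out-parameter itself with testing the value stored
through it.

```c
#include <stddef.h>

/* Hypothetical producer: returns NULL to simulate a failure. */
static void *sketch_put(int fail)
{
	static int header_storage;

	return fail ? NULL : &header_storage;
}

/*
 * Buggy variant: checks the address of the caller's pointer, which
 * is valid here, so a failure of sketch_put() goes unnoticed.
 */
static int reply_start_buggy(void **header, int fail)
{
	*header = sketch_put(fail);
	if (!header)
		return -1;
	return 0;
}

/* Fixed variant: checks the value that was actually returned. */
static int reply_start_fixed(void **header, int fail)
{
	*header = sketch_put(fail);
	if (!*header)
		return -1;
	return 0;
}
```

In the buggy variant a caller sees success and goes on to dereference a NULL
header; the fixed variant reports the failure.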

linux-next: build warning after merge of the net-next tree

2017-11-14 Thread Stephen Rothwell

Hi all,

After merging the net-next tree, today's linux-next build (powerpc
allyesconfig) produced this warning:

In file included from drivers/net/ethernet/ibm/ibmvnic.c:52:0:
drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_open':
include/linux/dma-mapping.h:571:2: warning: 'dma_addr' may be used 
uninitialized in this function [-Wmaybe-uninitialized]
  debug_dma_mapping_error(dev, dma_addr);
  ^
drivers/net/ethernet/ibm/ibmvnic.c:852:13: note: 'dma_addr' was declared here
  dma_addr_t dma_addr;
 ^

Introduced by commit

  4e6759be28e4 ("ibmvnic: Feature implementation of Vital Product Data (VPD) 
for the ibmvnic driver")

-- 
Cheers,
Stephen Rothwell

Re: Adding rseq tree to -next

2017-11-14 Thread Stephen Rothwell

Hi Mathieu,

On Wed, 15 Nov 2017 01:22:04 + (UTC) Mathieu Desnoyers 
 wrote:
>
> - On Nov 14, 2017, at 7:15 PM, Stephen Rothwell s...@canb.auug.org.au 
> wrote:
> 
> > On Tue, 14 Nov 2017 23:54:06 + (UTC) Mathieu Desnoyers
> >  wrote:  
> >>
> >> Would it be possible to add the "rseq" tree to -next for testing ?
> >> 
> >> I prepared a branch at:
> >> 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/rseq/linux-rseq.git
> >> branch: rseq/for-next  
> > 
> > I try not to add new trees during the merge window (the only exceptions
> > are for trees that will remain empty until after the merge window
> > closes or trees only containing material for the current merge window -
> > and in that case it is a bit late and a pain if it interacts with
> > anything else).
> > 
> > I will add it after -rc1 is released, though.  Please remind me if I
> > forget.  
> 
> No worries, sorry for the short notice. I'll try to do a merge
> attempt into -next on my end before sending to Linus then.

OK, since you intend to ask Linus to merge it during this merge window,
I have added it from today (I hope I don't regret it too much :-)).

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgement of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
 * submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
 * posted to the relevant mailing list,
 * reviewed by you (or another maintainer of your subsystem tree),
 * successfully unit tested, and 
 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
s...@canb.auug.org.au

linux-next: manual merge of the rseq tree with Linus' tree

2017-11-14 Thread Stephen Rothwell

Hi Mathieu,

Today's linux-next merge of the rseq tree got a conflict in:

  kernel/sched/core.c

between commit:

  7863406143d8 ("sched/isolation: Move housekeeping related code to its own 
file")

from Linus' tree and commit:

  533bd7403b04 ("x86: Introduce sync_core_before_usermode (v2)")

from the rseq tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc kernel/sched/core.c
index c85dfb746f8c,43099a091742..
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@@ -27,7 -26,7 +27,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  
  #include 
  #include

linux-next: manual merge of the rseq tree with the powerpc tree

2017-11-14 Thread Stephen Rothwell

Hi Mathieu,

Today's linux-next merge of the rseq tree got a conflict in:

  arch/powerpc/Kconfig

between commit:

  32ce3862af3c ("powerpc/lib: Implement PMEM API")

from the powerpc tree and commit:

  ba4b152f3f2f ("membarrier: powerpc: Skip memory barrier in switch_mm() (v7)")

from the rseq tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/Kconfig
index c51e6ce42e7a,e54a822e5fb9..
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@@ -139,7 -139,7 +139,8 @@@ config PP
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
+   select ARCH_HAS_MEMBARRIER_HOOKS
 +  select ARCH_HAS_PMEM_API        if PPC64
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST

linux-next: manual merge of the rseq tree with Linus' tree

2017-11-14 Thread Stephen Rothwell

Hi Mathieu,

[I may regret adding the rseq tree ...]

Today's linux-next merge of the rseq tree got a conflict in:

  arch/x86/entry/entry_64.S

between commits:

  9da78ba6b47b ("x86/entry/64: Remove the restore_c_regs_and_iret label")
  26c4ef9c49d8 ("x86/entry/64: Split the IRET-to-user and IRET-to-kernel paths")
  e53178328c9b ("x86/entry/64: Shrink paranoid_exit_restore and make labels 
local")

from Linus' tree and commit:

  60a77bfd24d5 ("membarrier: x86: Provide core serializing command (v2)")

from the rseq tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/entry/entry_64.S
index a2b30ec69497,4859f04e1695..
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@@ -617,21 -612,8 +617,25 @@@ GLOBAL(retint_user
mov %rsp,%rdi
call    prepare_exit_to_usermode
TRACE_IRQS_IRETQ
 +
 +GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 +#ifdef CONFIG_DEBUG_ENTRY
 +  /* Assert that pt_regs indicates user mode. */
 +  testb   $3, CS(%rsp)
 +  jnz 1f
 +  ud2
 +1:
 +#endif
SWAPGS
 -  jmp restore_regs_and_iret
 +  POP_EXTRA_REGS
 +  POP_C_REGS
 +  addq    $8, %rsp        /* skip regs->orig_ax */
++  /*
++   * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on iret core serialization
++   * when returning from IPI handler.
++   */
 +  INTERRUPT_RETURN
 +
  
  /* Returning to kernel space */
  retint_kernel:
@@@ -651,17 -633,19 +655,21 @@@
 */
TRACE_IRQS_IRETQ
  
 -/*
 - * At this label, code paths which return to kernel and to user,
 - * which come from interrupts/exception and from syscalls, merge.
 - */
 -GLOBAL(restore_regs_and_iret)
 -  RESTORE_EXTRA_REGS
 -restore_c_regs_and_iret:
 -  RESTORE_C_REGS
 -  REMOVE_PT_GPREGS_FROM_STACK 8
 +GLOBAL(restore_regs_and_return_to_kernel)
 +#ifdef CONFIG_DEBUG_ENTRY
 +  /* Assert that pt_regs indicates kernel mode. */
 +  testb   $3, CS(%rsp)
 +  jz  1f
 +  ud2
 +1:
 +#endif
 +  POP_EXTRA_REGS
 +  POP_C_REGS
 +  addq    $8, %rsp        /* skip regs->orig_ax */
+   /*
+* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on iret core serialization
+* when returning from IPI handler.
+*/
INTERRUPT_RETURN
  
  ENTRY(native_iret)

Re: [PATCH 1/2] x86,kvm: move qemu/guest FPU switching out to vcpu_run

2017-11-14 Thread Wanpeng Li

2017-11-15 11:03 GMT+08:00 Rik van Riel :
> On Wed, 2017-11-15 at 08:47 +0800, Wanpeng Li wrote:
>> 2017-11-15 5:54 GMT+08:00  :
>> > From: Rik van Riel 
>> >
>> > Currently, every time a VCPU is scheduled out, the host kernel will
>> > first save the guest FPU/xstate context, then load the qemu
>> > userspace
>> > FPU context, only to then immediately save the qemu userspace FPU
>> > context back to memory. When scheduling in a VCPU, the same
>> > extraneous
>> > FPU loads and saves are done.
>> >
>> > This could be avoided by moving from a model where the guest FPU is
>> > loaded and stored with preemption disabled, to a model where the
>> > qemu userspace FPU is swapped out for the guest FPU context for
>> > the duration of the KVM_RUN ioctl.
>>
>> What will happen if CONFIG_PREEMPT is enabled?
>
> The scheduler will save the guest FPU context when a
> VCPU thread is preempted, and restore it when it is
> scheduled back in.

I mean that all the involved processes will use the FPU. Before the patch, if
kernel preemption occurs:

context_switch
  -> prepare_task_switch
    -> fire_sched_out_preempt_notifiers
      -> kvm_sched_out
        -> kvm_arch_vcpu_put
          -> kvm_put_guest_fpu
            -> copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu)
                 save xsave area to guest fpu buffer
            -> __kernel_fpu_end
              -> copy_kernel_to_fpregs(&current->thread.fpu.state)
                   restore prev vCPU qemu userspace FPU to the xsave area
  -> switch_to
    -> __switch_to
      -> switch_fpu_prepare
        -> copy_fpregs_to_fpstate => save xsave area to prev vCPU qemu userspace FPU
      -> switch_fpu_finish
        -> copy_kernel_to_fpregs => restore next task FPU to xsave area


After the patch:

context_switch
  -> prepare_task_switch
    -> fire_sched_out_preempt_notifiers
      -> kvm_sched_out
  -> switch_to
    -> __switch_to
      -> switch_fpu_prepare
        -> copy_fpregs_to_fpstate => Oops

The last step saves the xsave area to the prev vCPU qemu userspace FPU, but
the guest FPU state is what is actually loaded in the xsave area, so you copy
the guest FPU state into the prev vCPU qemu userspace FPU.

Regards,
Wanpeng Li

Re: Linux 3.10.108 (EOL)

2017-11-14 Thread Willy Tarreau

On Tue, Nov 14, 2017 at 11:40:31PM +0100, Sebastian Gottschall wrote:
> > And anyway the end of life has been indicated on kernel.org for 18 months
> > and in every announce in 2017, so it cannot be a surprize anymore :-) At
> > least nobody seemed to complain for all this time!
> 
> it's no surprise for sure, but that also means i have to stay on the old
> kernel for these special devices and your argument about disable certain
> parts which simply turned bigger over time is no option
> 
> since it would remove features which existed before. its not that i enable
> all features of the kernel. i use every kernel with the same options (some
> are adjusted since they are renamed or moved)

Then I have a few questions :
  - how did you choose this kernel ? Or did you choose the hardware based
on the kernel size ?
  - what would have you done if 3.10 had not been LTS ?
  - have you at least tried other kernels before claiming they are much
larger ? Following your principle, 3.2 should be smaller and 3.16 not
much larger. The former offers you about 6 extra months of maintenance,
the latter 3.5 years (https://www.kernel.org/category/releases.html).

> but even then the kernel is turning into a ram and space eating monster if i
> look on devices with 16 mb ram and 4 mb flash. this is mainly for
> maintaining older hardware with latest updates.

So why didn't you ask if it was possible to pursue the maintenance a bit a
long time ago ? LTS maintenance is a collective effort and is done based on
usage. If enough people have good reasons for going further it can be enough
a justification to push the deadline. Now it's too late.

> the more recent hardware is getting better here
> 
> you dont seem to know how it is to work on wireless routers :-)

Yes I do, I've been distributing a full blown load balancer distro on a
10 MB image (running on 3.10 as well). But I also know that sometimes
you make some nice space savings on new kernels (xz/zstd compression,
ability to remove certain useless stuff in these environments such as
FS ACLs or mandatory locks, etc). Sure, upgrading to a new kernel on
existing hardware is always a challenge. But it's also an interesting
one.

Also just to give you an idea, I've just compared the size of these
kernels configured with "make allnoconfig" (and I verified that all
of them were compressed using gzip) :

  3.10.108 : 875 kB
  4.4.97   : 522 kB
  4.9.61   : 561 kB
  4.14 : 566 kB

So the argument that migrating away from 3.10 is hard due to the size
doesn't stand much here :-)

Willy

[URGENT PATCH] arm64: dts: uniphier: route on-board device IRQ to GPIO controller for PXs3

2017-11-14 Thread Masahiro Yamada

Commit 429f203eb712 ("arm64: dts: uniphier: route on-board device IRQ
to GPIO controller") failed to update this DTS.  It became a real
problem when the arm and arm64 trees were merged together.

Signed-off-by: Masahiro Yamada 
---

Arnd, Olof,

I think you are sending pull-requests for v4.15-rc1 shortly.
Can you apply this on top your arm64 tree?

With DT tree merged in Linus' tree today,
I realized my mistake that was hidden before.

arch/arm64/boot/dts/socionext/uniphier-pxs3-ref.dtb: Warning 
(interrupts_property):
interrupts size is (12), expected multiple of 8 in 
/soc@0/system-bus@58c0/support-card@1,1f0/ethernet@0

We do not have much time, so I am sending this patch
instead of a pull-request.

Please pick up.


 arch/arm64/boot/dts/socionext/uniphier-pxs3-ref.dts | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/socionext/uniphier-pxs3-ref.dts 
b/arch/arm64/boot/dts/socionext/uniphier-pxs3-ref.dts
index dad4743..864feeb 100644
--- a/arch/arm64/boot/dts/socionext/uniphier-pxs3-ref.dts
+++ b/arch/arm64/boot/dts/socionext/uniphier-pxs3-ref.dts
@@ -38,7 +38,8 @@
 };
 
 &ethsc {
-   interrupts = <0 52 4>;
+   interrupt-parent = <&gpio>;
+   interrupts = <0 8>;
 };
 
 &serial0 {
-- 
2.7.4
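
The dtc warning quoted in this patch is plain arithmetic on cell sizes. The
old property <0 52 4> has three 32-bit cells, so it is 12 bytes long, while
the warning expects a multiple of 8, which corresponds to two cells per
interrupt specifier. The sketch below reproduces that check. The sketch_*
names are hypothetical, and the two-cell specifier of the new parent is
inferred from the warning text rather than taken from the binding.

```c
#include <stdbool.h>
#include <stddef.h>

/* A devicetree cell is a 32-bit value. */
#define SKETCH_CELL_BYTES 4u

/* Size in bytes of a property made of ncells cells. */
static size_t sketch_prop_bytes(size_t ncells)
{
	return ncells * SKETCH_CELL_BYTES;
}

/*
 * An interrupts property is well-formed only if its length is a
 * whole number of interrupt specifiers, each interrupt_cells long.
 */
static bool sketch_interrupts_size_ok(size_t prop_bytes,
				      unsigned int interrupt_cells)
{
	size_t spec_bytes = (size_t)interrupt_cells * SKETCH_CELL_BYTES;

	return spec_bytes != 0 && prop_bytes % spec_bytes == 0;
}
```

With a two-cell parent, the three-cell property fails the check and the
two-cell replacement <0 8> passes it.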

Re: [RFC PATCH for 4.15 00/24] Restartable sequences and CPU op vector v11

2017-11-14 Thread Andy Lutomirski

> On Nov 14, 2017, at 1:32 PM, Mathieu Desnoyers 
>  wrote:
>
> - On Nov 14, 2017, at 4:15 PM, Andy Lutomirski l...@amacapital.net wrote:
>
>
> One thing I kept, however, that diverges from your recommendation is the
> "sign" parameter to the rseq syscall. I prefer this flexible
> approach to a hardcoded signature value. We never know when we may
> need to randomize or change this in the future.
>
> Regarding the abort target signature vs. x86 disassemblers, I used a
> 5-byte no-op on x86 32/64:
>
>  x86-32: nopl 
>  x86-64: nopl (%rip)

I still don't see how this can possibly work well with libraries.  If
glibc or whatever issues the syscall and registers some signature,
that signature *must* match the expectation of all libraries used in
that thread or it's not going to work.  I can see two reasonable ways
to handle it:

1. The signature is just a well-known constant.  If you have an rseq
abort landing site, you end up with something like:

nopl $11223344(%rip)
landing_site:

or whatever the constant is.

2. The signature varies depending on the rseq_cs in use.  So you get:

static struct rseq_cs this_cs = {
  .signature = 0x55667788,
  ...
};

and then the abort landing site has:

nopl $11223344(%rip)
nopl $55667788(%rax)
landing_site:

The former is a bit easier to deal with.  The latter has the nice
property that you can't subvert one rseq_cs to land somewhere else,
but it's not clear to me what actual attack this prevents, so I
think I prefer #1.  I just think that your variant is asking for
trouble down the road with incompatible userspace.

--Andy
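
Option 1 above amounts to comparing a fixed 32-bit value against the bytes
that sit directly in front of the abort landing site. The sketch below models
that comparison on a plain byte buffer. It is an illustration only:
sketch_abort_sig_ok is a hypothetical name, the assumption that the signature
occupies the four bytes immediately before the landing site is taken from the
examples in this message, and the comparison uses host byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Returns true if the four bytes ending at abort_off (the landing
 * site offset inside code) equal the expected signature.
 */
static bool sketch_abort_sig_ok(const uint8_t *code, size_t len,
				size_t abort_off, uint32_t sig)
{
	uint32_t found;

	/* The signature must fit entirely before the landing site. */
	if (abort_off < sizeof(found) || abort_off > len)
		return false;

	memcpy(&found, code + abort_off - sizeof(found), sizeof(found));
	return found == sig;
}
```

A landing site whose preceding bytes carry a different value, or that has no
room for a signature in front of it, is rejected.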

Re: Allocation failure of ring buffer for trace

2017-11-14 Thread YASUAKI ISHIMATSU

Hi Mel,

Your patch works well.

Here are the results of your patch.

- boot up without trace_buf_size boot option

When the system boots up without the trace_buf_size boot option,
deferred_init_memmap() runs after booting the SMP configuration. There is no
change in the boot sequence between 4.14.0 and 4.14.0 with your patch.

[0.256285] x86: Booting SMP configuration:
...
[5.313195] node 0 initialised, 15530251 pages in 653ms
[5.330691] node 1 initialised, 15988494 pages in 670ms
[5.331746] node 2 initialised, 15988493 pages in 671ms
[5.332166] node 6 initialised, 15982779 pages in 670ms
[5.332673] node 3 initialised, 15988494 pages in 671ms
[5.332618] node 4 initialised, 15988494 pages in 672ms
[5.334187] node 7 initialised, 15987304 pages in 672ms
[5.334976] node 5 initialised, 15988494 pages in 673ms

- boot up with trace_buf_size boot option

When the system boots up with the trace_buf_size boot option,
deferred_init_memmap() runs before booting the SMP configuration. So all memory
on every node is initialised before the trace buffer is allocated, and the
system can boot up even if we set the trace_buf_size boot option.

[0.932114] node 0 initialised, 15530251 pages in 684ms
[1.604918] node 1 initialised, 15988494 pages in 671ms
[2.278933] node 2 initialised, 15988494 pages in 673ms
[2.965076] node 3 initialised, 15988494 pages in 686ms
[3.669064] node 4 initialised, 15988494 pages in 703ms
[4.354983] node 5 initialised, 15988493 pages in 684ms
[5.028681] node 6 initialised, 15982779 pages in 673ms
[5.716102] node 7 initialised, 15987304 pages in 687ms
[5.727855] smp: Bringing up secondary CPUs ...
[5.745937] x86: Booting SMP configuration:

Thanks,
Yasuaki Ishimatsu

On 11/14/2017 06:46 AM, Mel Gorman wrote:
> On Mon, Nov 13, 2017 at 12:48:36PM -0500, YASUAKI ISHIMATSU wrote:
>> When using trace_buf_size= boot option, memory allocation of ring buffer
>> for trace fails as follows:
>>
>> [ ] x86: Booting SMP configuration:
>> 
>>
>> In my server, there are 384 CPUs, 512 GB memory and 8 nodes. And
>> "trace_buf_size=100M" is set.
>>
>> When using trace_buf_size=100M, the kernel allocates 100 MB of memory
>> per CPU before calling free_area_init_core(). The kernel tries to
>> allocate 38.4 GB (100 MB * 384 CPUs) of memory. But available memory
>> at this time is about 16 GB (2 GB * 8 nodes) due to the following commit:
>>
>>   3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages
>>  if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
>>
> 
> 1. What is the use case for such a large trace buffer being allocated at
>boot time?
> 2. Is disabling CONFIG_DEFERRED_STRUCT_PAGE_INIT at compile time an
>option for you given that it's a custom-built kernel and not a
>distribution kernel?
> 
> Basically, as the allocation context is within smp_init(), there are no
> opportunities to do the deferred meminit early. Furthermore, the partial
> initialisation of memory occurs before the size of the trace buffers is
> set so there is no opportunity to adjust the amount of memory that is
> pre-initialised. We could potentially catch when memory is low during
> system boot and adjust the amount that is initialised serially but the
> complexity would be high. Given that deferred meminit is basically a minor
> optimisation that only affects very large machines and trace_buf_size being
> used is somewhat specialised, I think the most straight-forward option is
> to go back to serialised meminit if trace_buf_size is specified like this;
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 710143741eb5..6ef0ab13f774 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -558,6 +558,19 @@ void drain_local_pages(struct zone *zone);
>  
>  void page_alloc_init_late(void);
>  
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +extern void __init disable_deferred_meminit(void);
> +extern void page_alloc_init_late_prepare(void);
> +#else
> +static inline void disable_deferred_meminit(void)
> +{
> +}
> +
> +static inline void page_alloc_init_late_prepare(void)
> +{
> +}
> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
> +
>  /*
>   * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict 
> what
>   * GFP flags are used before interrupts are enabled. Once interrupts are
> diff --git a/init/main.c b/init/main.c
> index 0ee9c6866ada..0248b8b5bc3a 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1058,6 +1058,8 @@ static noinline void __init kernel_init_freeable(void)
>   do_pre_smp_initcalls();
>   lockup_detector_init();
>  
> + page_alloc_init_late_prepare();
> +
>   smp_init();
>   sched_init_smp();
>  
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 752e5daf0896..cfa7175ff093 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1115,6 +1115,13 @@ static int __init set_buf_size(char *str)
>   if (buf_size == 0)
>   return 0;
>   trace_buf_size = buf_size;
> +
> + /

Re: [PATCH 09/10] dmaengine: k3dma: Use vchan_terminate_vdesc() instead of desc_free

2017-11-14 Thread zhangfei




On 2017年11月14日 22:32, Peter Ujfalusi wrote:

To avoid race with vchan_complete, use the race free way to terminate
running transfer.

Implement the device_synchronize callback to make sure that the terminated
descriptor is freed.

CC: Zhangfei Gao 
Signed-off-by: Peter Ujfalusi 


Thanks Peter.

Acked-by: Zhangfei Gao

RE: [PATCH 4.4 05/56] ARM: dts: imx53-qsb-common: fix FEC pinmux config

2017-11-14 Thread Patrick Brünn

>From: Ben Hutchings [mailto:ben.hutchi...@codethink.co.uk]
>Sent: Dienstag, 14. November 2017 16:45
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> --
>>
>> From: Patrick Bruenn 
>>
>>
>> [ Upstream commit 8b649e426336d7d4800ff9c82858328f4215ba01 ]
>>
>> The pinmux configuration in device tree was different from manual
>> muxing in /board/freescale/mx53loco/mx53loco.c
>> All pins were configured as NO_PAD_CTL(1 << 31), which was fine as the
>> bootloader already did the correct pinmuxing for us.
>> But recently u-boot is migrating to reuse device tree files from the
>> kernel tree, so it seems to be better to have the correct pinmuxing in
>> our files, too.
>[...]
>
>Presumably u-boot will reuse the *upstream* device tree files, which
>implies this doesn't need to be fixed on stable branches.
>
>Ben.
I think Ben made a good point. It might be dangerous to change the pinmux
configuration for an Ethernet controller in  stable kernels, just in case
anyone out there uses a hardware and bootloader with different muxing.
Shawn might be able to tell, if that is even possible.
In case there is at least a chance someone is using a different configuration,
I wouldn't backport this commit to any stable branch.

Regards,
Patrick

Beckhoff Automation GmbH & Co. KG | Managing Director: Dipl. Phys. Hans Beckhoff
Registered office: Verl, Germany | Register court: Guetersloh HRA 7075

Re: [PATCH v8 2/6] time: sync read_boot_clock64() with persistent clock

2017-11-14 Thread Pavel Tatashin

> IMO, using the extern keyword on function prototypes in *.h files
> is superfluous, but it doesn't matter for functionality: *extern*
> is the default for function declarations.
>
> AFAIK, it's a code style problem. In x86 arch, we prefer to
> keep *extern* explicitly, so, let's keep it like before for
> code consistency.

Sounds good, I will restore externs.

Thank you,
Pavel
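
The point about extern being the default for function prototypes can be
checked with any C compiler: the two declarations below are equivalent and
refer to the same function. sketch_read_clock is a hypothetical name used only
for this illustration, and the return value is arbitrary.

```c
/* With and without extern: both declare the same external function. */
extern int sketch_read_clock(void);
int sketch_read_clock(void);

/* A single definition satisfies both declarations. */
int sketch_read_clock(void)
{
	return 42;
}
```

So keeping or dropping the keyword is purely a matter of style and
consistency with the surrounding headers.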

Re: [PATCH v8 6/6] x86/tsc: use tsc early

2017-11-14 Thread Dou Liyang

Hi Pavel,

At 11/15/2017 05:46 AM, Pavel Tatashin wrote:

Hi Dou,

Great comments, my replies below:

 static inline unsigned long long paravirt_sched_clock(void)
 {
-   return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
+   return PVOP_CALL0(unsigned long long,
pv_time_ops.active_sched_clock);
 }

Should in the 5th patch

Actually, it has to be in patch 6, because otherwise patch 5 without
patch 6 would cause native_sched_clock() to be used even when a
platform specific clock is set, thus may cause performance regressions
where it is not optimal to use tsc for clock.

Indeed.

>> +  tsc_early_disable();
>> +  __sched_clock_offset = sched_clock_early() - sched_clock();
>> +  pr_info("sched clock early is finished, offset[%lld.%09lds]\n", t, r);
>> +}

s/sched clock early/early sched clock/

>> +#endif /* CONFIG_X86_TSC */

Add definitions for the situation of X86_TSC = no :

#else /* CONFIG_X86_TSC */
static inline void tsc_early_init(unsigned int khz) { }
static inline void tsc_early_fini(void) { }

Excellent point, I totally forgot about  X86_TSC = no, however, a
better fix is to simply remove #ifdef CONFIG_X86_TSC from my
functions. Apparently, even with X86_TSC=no we can use TSC unless

Agree with removing #ifdef CONFIG_X86_TSC

notsc kernel parameter is passed. This will be in the next patchset.

According to tsc_early_delay_calibrate(), if (!boot_cpu_has(X86_FEATURE_TSC)
|| !tsc_khz), tsc_early_init(tsc_khz)
will never be called, so this is redundant.

return;
}

@@ -1302,6 +1385,7 @@ void __init tsc_init(void)
if (!tsc_khz) {
mark_tsc_unstable("could not calculate TSC khz");
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
+   tsc_early_fini();

ditto

Right, in both case we still want to call tsc_early_fini(). Because,
it calls tsc_early_disable() even when tsc_early_init() was never
called.  tsc_early_disable()  either sets static branch
__tsc_early_static to false or changes active_sched_clock to be
platform specific, depending on CONFIG_PARAVIRT.

Yes, the function names confused me. The actual purpose of
tsc_early_fini() is to switch the sched clock to the final one, right?

How about replacing *return;* with *goto final_sched_clock;*

Thanks,
dou.

BTW, seems you forgot to cc Peter Zijlstra  in both V7
and V8 patchsets.

Thank you for noticing this! I will include Peter when I send out
patchset version 9.

Thank you,
Pavel

Re: [PATCH v17 6/6] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2017-11-14 Thread Wei Wang


On 11/15/2017 05:21 AM, Michael S. Tsirkin wrote:

On Tue, Nov 14, 2017 at 08:02:03PM +0800, Wei Wang wrote:

On 11/14/2017 01:32 AM, Michael S. Tsirkin wrote:

- guest2host_cmd: written by the guest to ACK to the host about the
commands that have been received. The host will clear the corresponding
bits on the host2guest_cmd register. The guest also uses this register
to send commands to the host (e.g. when it finishes free page reporting).

I am not sure what is the role of guest2host_cmd. Reporting of
the correct cmd id seems sufficient indication that guest
received the start command. Not getting any more seems sufficient
to detect stop.


I think the issue is that when the host is waiting for the guest to report
pages, it does not know whether the guest is going to report more or the
report is already done. That's why we need a way to let the guest tell the
host "the report is done, don't wait for more", so that the host continues to
the next step - sending the non-free pages to the destination. The following
method summarizes the other comments, with some new thoughts. Please check
whether it looks good.

config won't work well for this IMHO.
Writes to config register are hard to synchronize with the VQ.
For example, guest sends free pages, host says stop, meanwhile
guest sends stop for 1st set of pages.


I still don't see an issue with this. Please see below:
(Before jumping into the discussion, let me make sure I've explained
this point clearly: host-to-guest commands are now done via config, and
guest-to-host commands are done via the free page vq.)


Case: Host starts to request the reporting with cmd_id=1. Some time
later, Host writes "stop" to config; meanwhile the guest happens to finish
the reporting and plans to actively send a "stop" command from the
free_page_vq().
  Essentially, this is like a sync between two threads - if we
view the config interrupt handler as one thread, the other is the free
page reporting worker thread.


- what the config handler does is simply:
  1.1:  WRITE_ONCE(vb->reporting_stop, true);

- what the reporting thread will do is
  2.1:  WRITE_ONCE(vb->reporting_stop, true);
  2.2:  send_stop_to_host_via_vq();

From the guest point of view, whether 1.1 or 2.1 is executed first
makes no difference to the end result -
vb->reporting_stop is set.


From the host point of view, it knows that cmd_id=1 has truly stopped 
the reporting when it receives a "stop" sign via the vq.
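
The argument above can be sketched in plain userspace C. This is only an illustration under assumptions: struct balloon_sketch and the stops_sent counter are invented here, C11 atomics stand in for WRITE_ONCE(), and the counter stands in for send_stop_to_host_via_vq().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the balloon state discussed above. */
struct balloon_sketch {
	atomic_bool reporting_stop;	/* models vb->reporting_stop */
	int stops_sent;			/* "stop" buffers queued to the vq */
};

/* Step 1.1: config interrupt handler, run when the host writes "stop". */
static void config_handler(struct balloon_sketch *vb)
{
	atomic_store(&vb->reporting_stop, true);
}

/* Steps 2.1 and 2.2: reporting worker has no more pages to report. */
static void reporting_done(struct balloon_sketch *vb)
{
	atomic_store(&vb->reporting_stop, true);	/* 2.1 */
	vb->stops_sent++;				/* 2.2, stands in for the vq send */
}
```

Whichever of the two runs first, the flag ends up set and exactly one "stop" goes out on the vq, which is what the host waits for.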




How about adding a buffer with "stop" in the VQ instead?
Wastes a VQ entry which you will need to reserve for this
but is it a big deal?


The free page vq is guest-to-host direction. Using it for host-to-guest 
requests will make it bidirectional, which will result in the same issue 
described before: https://lkml.org/lkml/2017/10/11/1009 (the first response)


On the other hand, I think adding another new vq for host-to-guest 
requesting doesn't make a difference in essence, compared to using 
config (same 1.1, 2.1, 2.2 above), but will be more complicated.




Two new configuration registers in total:
- cmd_reg: the command register, combined from the previous host2guest and
guest2host. I think we can use the same register for host requesting and
guest ACKing, since the guest writing will trap to QEMU, that is, all the
writes to the register are performed in QEMU, and we can keep things working
correctly there.
- cmd_id_reg: the sequence id of the free page report command.

-- free page report:
 - host requests the guest to start reporting by "cmd_reg |
REPORT_START";
 - guest ACKs to the host about receiving the start reporting request by
"cmd_reg | REPORT_START", host will clear the flag bit once receiving the
ACK.
 - host requests the guest to stop reporting by "cmd_reg | REPORT_STOP";
 - guest ACKs to the host about receiving the stop reporting request by
"cmd_reg | REPORT_STOP", host will clear the flag once receiving the ACK.
 - guest tells the host about the start of the reporting by writing "cmd
id" into an outbuf, which is added to the free page vq.
 - guest tells the host about the end of the reporting by writing "0"
into an outbuf, which is added to the free page vq. (we reserve "id=0" as
the stop sign)

-- ballooning:
 - host requests the guest to start ballooning by "cmd_reg | BALLOONING";
 - guest ACKs to the host about receiving the request by "cmd_reg |
BALLOONING", host will clear the flag once receiving the ACK.
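
A rough sketch of the request/ACK handshake on cmd_reg follows. The bit values and the helper names host_request(), guest_ack() and request_pending() are assumptions made for illustration; the proposal does not define them. The model relies on the point made above that guest writes trap to QEMU, so the host side does the actual clearing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; the proposal does not fix the values. */
#define REPORT_START	(1u << 0)
#define REPORT_STOP	(1u << 1)
#define BALLOONING	(1u << 2)

/* Models the shared command register, whose state lives on the host side. */
static uint32_t cmd_reg;

/* Host raises a request bit for the guest to see. */
static void host_request(uint32_t cmd)
{
	cmd_reg |= cmd;
}

/* Guest ACK: the write traps to the host, which clears the ACKed bit. */
static void guest_ack(uint32_t cmd)
{
	cmd_reg &= ~cmd;
}

/* True while the host is still waiting for the guest to ACK this request. */
static bool request_pending(uint32_t cmd)
{
	return (cmd_reg & cmd) != 0;
}
```

Because each feature owns its own bit, an outstanding ballooning request is unaffected when the guest ACKs a free page report request.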


Some more explanations:
-- Why not let the host request the guest to start the free page reporting
simply by writing a new cmd id to the cmd_id_reg?
The configuration interrupt is shared among all the features - ballooning,
free page reporting, and future feature extensions which need host-to-guest
requests. Some features may need to add other feature specific configuration
registers, like free page reporting need the cmd_id_reg, which is not used
by ballooning. The r

Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)

2017-11-14 Thread Matthew Wilcox

On Mon, Nov 13, 2017 at 02:46:25PM -0800, Dave Hansen wrote:
> On 11/13/2017 02:20 PM, Dave Hansen wrote:
> > On 11/09/2017 05:09 PM, Tycho Andersen wrote:
> >> which I guess is from the additional flags in grow_dev_page() somewhere 
> >> down
> >> the stack. Anyway... it seems this is a kernel allocation that's using
> >> MIGRATE_MOVABLE, so perhaps we need some more fine tuned heuristic than 
> >> just
> >> all MOVABLE allocations are un-mapped via xpfo, and all the others are 
> >> mapped.
> >>
> >> Do you have any ideas?
> > 
> > It still has to do a kmap() or kmap_atomic() to be able to access it.  I
> > thought you hooked into that.  Why isn't that path getting hit for these?
> 
> Oh, this looks to be accessing data mapped by a buffer_head.  It
> (rudely) accesses data via:
> 
> void set_bh_page(struct buffer_head *bh,
> ...
>   bh->b_data = page_address(page) + offset;

We don't need to kmap in order to access MOVABLE allocations.  kmap is
only needed for HIGHMEM allocations.  So there's nothing wrong with ext4
or set_bh_page().

Re: [PATCH 2/3] arm64: dts: hisilicon: add pinctrl nodes for hi3798cv200-poplar board

2017-11-14 Thread Shawn Guo

On Wed, Oct 18, 2017 at 07:22:07AM -0400, Jiancheng Xue wrote:
> From: Younian Wang 
> 
> Add pinctrl nodes for hi3798cv200-poplar board
> 
> Signed-off-by: Younian Wang 
> Signed-off-by: Jiancheng Xue 
> ---
>  .../boot/dts/hisilicon/hi3798cv200-poplar.dts  |   1 +
>  arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi |  71 +++
>  arch/arm64/boot/dts/hisilicon/poplar-pinctrl.dtsi  | 651 
> +
>  3 files changed, 723 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/hisilicon/poplar-pinctrl.dtsi
> 
> diff --git a/arch/arm64/boot/dts/hisilicon/hi3798cv200-poplar.dts 
> b/arch/arm64/boot/dts/hisilicon/hi3798cv200-poplar.dts
> index b914287..6a0b7e9 100644
> --- a/arch/arm64/boot/dts/hisilicon/hi3798cv200-poplar.dts
> +++ b/arch/arm64/boot/dts/hisilicon/hi3798cv200-poplar.dts
> @@ -11,6 +11,7 @@
>  
>  #include 
>  #include "hi3798cv200.dtsi"
> +#include "poplar-pinctrl.dtsi"
>  
>  / {
>   model = "HiSilicon Poplar Development Board";
> diff --git a/arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi 
> b/arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi
> index 0d11dc7..5a73c68 100644
> --- a/arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi
> +++ b/arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi
> @@ -106,6 +106,54 @@
>   #reset-cells = <2>;
>   };
>  
> + pmx0: pinconf@8a21000 {
> + compatible = "pinconf-single";
> + reg = <0x8a21000 0x180>;
> + pinctrl-single,register-width = <32>;
> + pinctrl-single,function-mask = <7>;
> + pinctrl-single,gpio-range = <
> + &range 0  8 2  /* GPIO 0 */
> + &range 8  1 0  /* GPIO 1 */
> + &range 9  4 2
> + &range 13 1 0
> + &range 14 1 1
> + &range 15 1 0
> + &range 16 5 0  /* GPIO 2 */
> + &range 21 3 1
> + &range 24 4 1  /* GPIO 3 */
> + &range 28 2 2
> + &range 86 1 1
> + &range 87 1 0
> + &range 30 4 2  /* GPIO 4 */
> + &range 34 3 0
> + &range 37 1 2
> + &range 38 3 2  /* GPIO 6 */
> + &range 41 5 0
> + &range 46 8 1  /* GPIO 7 */
> + &range 54 8 1  /* GPIO 8 */
> + &range 64 7 1  /* GPIO 9 */
> + &range 71 1 0
> + &range 72 6 1  /* GPIO 10 */
> + &range 78 1 0
> + &range 79 1 1
> + &range 80 6 1  /* GPIO 11 */
> + &range 70 2 1
> + &range 88 8 0  /* GPIO 12 */
> + >;
> +
> + range: gpio-range {
> + #pinctrl-single,gpio-range-cells = <3>;
> + };
> + };
> +
> + pmx1: pinconf@844 {
> + compatible = "pinctrl-single";
> + reg = <0x844 4>;
> + pinctrl-single,register-width = <32>;
> + pinctrl-single,function-mask = <1>;
> + pinctrl-single,bit-per-mux;
> + };
> +
>   uart0: serial@8b0 {
>   compatible = "arm,pl011", "arm,primecell";
>   reg = <0x8b0 0x1000>;
> @@ -209,6 +257,7 @@
>   #gpio-cells = <2>;
>   interrupt-controller;
>   #interrupt-cells = <2>;
> + gpio-ranges = <&pmx0 0 0 8>;
>   clocks = <&crg HISTB_APB_CLK>;
>   clock-names = "apb_pclk";
>   status = "disabled";
> @@ -222,6 +271,13 @@
>   #gpio-cells = <2>;
>   interrupt-controller;
>   #interrupt-cells = <2>;
> + gpio-ranges = <
> + &pmx0 0 8 1
> + &pmx0 1 9 4
> + &pmx0 5 13 1
> + &pmx0 6 14 1
> + &pmx0 7 15 1
> + >;
>   clocks = <&crg HISTB_APB_CLK>;
>   clock-names = "apb_pclk";
>   status = "disabled";
> @@ -235,6 +291,7 @@
>   #gpio-cells = <2>;
>   interrupt-controller;
>   #interrupt-cells = <2>;
> + gpio-ranges = <&pmx0 0 16 5 &pmx0 5 21 3>;
>   clocks = <&crg HISTB_APB_CLK>;
>

Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit

2017-11-14 Thread Eric W. Biederman

Kirill Tkhai  writes:

> On 14.11.2017 21:39, Cong Wang wrote:
>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai  wrote:
>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>
>>> get_user_ns(user_ns);
>>>
>>> -   rv = mutex_lock_killable(&net_mutex);
>>> +   rv = down_read_killable(&net_sem);
>>> if (rv < 0) {
>>> net_free(net);
>>> dec_net_namespaces(ucounts);
>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>> list_add_tail_rcu(&net->list, &net_namespace_list);
>>> rtnl_unlock();
>>> }
>>> -   mutex_unlock(&net_mutex);
>>> +   up_read(&net_sem);
>>> if (rv < 0) {
>>> dec_net_namespaces(ucounts);
>>> put_user_ns(user_ns);
>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>> list_replace_init(&cleanup_list, &net_kill_list);
>>> spin_unlock_irq(&cleanup_list_lock);
>>>
>>> -   mutex_lock(&net_mutex);
>>> +   down_read(&net_sem);
>>>
>>> /* Don't let anyone else find us. */
>>> rtnl_lock();
>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>> list_for_each_entry_reverse(ops, &pernet_list, list)
>>> ops_free_list(ops, &net_exit_list);
>>>
>>> -   mutex_unlock(&net_mutex);
>>> +   up_read(&net_sem);
>> 
>> After your patch setup_net() could run concurrently with cleanup_net(),
>> given that ops_exit_list() is called on error path of setup_net() too,
>> it means ops->exit() now could run concurrently if it doesn't have its
>> own lock. Not sure if this breaks any existing user.
>
> Yes, ops->init() for one net namespace may now run concurrently with
> ops->exit() for another one. I haven't found pernet operations which
> have a problem with that. If they exist, they are hidden and not clearly seen.
> The pernet operations in general do not touch someone else's memory.
> If suddenly there is one, KASAN should show it after a while.

Certainly the use of hash tables shared between multiple network
namespaces would count.  I don't remember how many of these we have but
there used to be quite a few.

Eric

[PATCH v2] ksm : use checksum and memcmp for rb_tree

2017-11-14 Thread Kyeongdon Kim

KSM currently uses memcmp() to insert into and search the 'rb_tree',
which is computationally very expensive.
To reduce the time spent in this operation,
we add a checksum that is compared first during traversal.

Nearly all 'rb_node' insertions in stable_tree_insert() can be
ordered by checksum alone, and so can most of those
in unstable_tree_search_insert().
In stable_tree_search(), the checksum comparison may be additional work,
but the time it takes is extremely small.
Relative to the whole cmp_and_merge_page() function,
it costs very little on average.
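
The ordering idea can be sketched as a standalone comparison function. This is an illustration under assumptions, not the patch itself: struct page_key, SKETCH_PAGE_SIZE and page_key_cmp() are invented names, and the checksum is taken as a precomputed 32-bit value.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the sketch */

/* Hypothetical tree key: a precomputed checksum plus the page contents. */
struct page_key {
	uint32_t checksum;
	const unsigned char *data;	/* SKETCH_PAGE_SIZE bytes */
};

/*
 * Order keys by checksum first; fall back to the full memcmp() only when
 * the checksums are equal. Returns <0, 0 or >0 like memcmp().
 */
static int page_key_cmp(const struct page_key *a, const struct page_key *b)
{
	if (a->checksum < b->checksum)
		return -1;
	if (a->checksum > b->checksum)
		return 1;
	return memcmp(a->data, b->data, SKETCH_PAGE_SIZE);
}
```

Most tree steps are then decided by one integer comparison, and the full page comparison is paid only on checksum ties.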

With this patch applied, we measured the duration of ksm_do_scan()
by adding kernel trace points at the start and end of the operation.
(ARM 32-bit Android target device,
averaged over more than 1000 start/end time gaps)

On original KSM scan avg duration = 0.0166893 sec
14991.975619 : ksm_do_scan_start: START: ksm_do_scan
14991.990975 : ksm_do_scan_end: END: ksm_do_scan
14992.008989 : ksm_do_scan_start: START: ksm_do_scan
14992.016839 : ksm_do_scan_end: END: ksm_do_scan
...

On patch KSM scan avg duration = 0.0041157 sec
41081.46131 : ksm_do_scan_start : START: ksm_do_scan
41081.46636 : ksm_do_scan_end : END: ksm_do_scan
41081.48476 : ksm_do_scan_start : START: ksm_do_scan
41081.48795 : ksm_do_scan_end : END: ksm_do_scan
...

We have run many randomized tests for stability
and have not seen any abnormal behavior so far.
We also found that this patch gives a power consumption
advantage over KSM as enabled by default.

v1 -> v2
- add comment for oldchecksum value
- move the oldchecksum value out of union
- remove check code regarding checksum 0 in stable_tree_search()

link to v1 : https://lkml.org/lkml/2017/10/30/251

Signed-off-by: Kyeongdon Kim 
---
 mm/ksm.c | 48 
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index be8f457..9280569 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -134,6 +134,7 @@ struct ksm_scan {
  * @kpfn: page frame number of this ksm page (perhaps temporarily on wrong nid)
  * @chain_prune_time: time of the last full garbage collection
  * @rmap_hlist_len: number of rmap_item entries in hlist or STABLE_NODE_CHAIN
+ * @oldchecksum: previous checksum of the page about a stable_node
  * @nid: NUMA node id of stable tree in which linked (may not match kpfn)
  */
 struct stable_node {
@@ -159,6 +160,7 @@ struct stable_node {
 */
 #define STABLE_NODE_CHAIN -1024
int rmap_hlist_len;
+   u32 oldchecksum;
 #ifdef CONFIG_NUMA
int nid;
 #endif
@@ -1522,7 +1524,7 @@ static __always_inline struct page *chain(struct 
stable_node **s_n_d,
  * This function returns the stable tree node of identical content if found,
  * NULL otherwise.
  */
-static struct page *stable_tree_search(struct page *page)
+static struct page *stable_tree_search(struct page *page, u32 checksum)
 {
int nid;
struct rb_root *root;
@@ -1550,6 +1552,18 @@ static struct page *stable_tree_search(struct page *page)
 
cond_resched();
stable_node = rb_entry(*new, struct stable_node, node);
+
+   /* first make rb_tree by checksum */
+   if (checksum < stable_node->oldchecksum) {
+   parent = *new;
+   new = &parent->rb_left;
+   continue;
+   } else if (checksum > stable_node->oldchecksum) {
+   parent = *new;
+   new = &parent->rb_right;
+   continue;
+   }
+
stable_node_any = NULL;
tree_page = chain_prune(&stable_node_dup, &stable_node, root);
/*
@@ -1768,7 +1782,7 @@ static struct page *stable_tree_search(struct page *page)
  * This function returns the stable tree node just allocated on success,
  * NULL otherwise.
  */
-static struct stable_node *stable_tree_insert(struct page *kpage)
+static struct stable_node *stable_tree_insert(struct page *kpage, u32 checksum)
 {
int nid;
unsigned long kpfn;
@@ -1792,6 +1806,18 @@ static struct stable_node *stable_tree_insert(struct 
page *kpage)
cond_resched();
stable_node = rb_entry(*new, struct stable_node, node);
stable_node_any = NULL;
+
+   /* first make rb_tree by checksum */
+   if (checksum < stable_node->oldchecksum) {
+   parent = *new;
+   new = &parent->rb_left;
+   continue;
+   } else if (checksum > stable_node->oldchecksum) {
+   parent = *new;
+   new = &parent->rb_right;
+   continue;
+   }
+
tree_page = chain(&stable_node_dup, stable_node, root);
if (!stable_node_dup) {
/*
@@ -1850,6 +1876,7 @@ static struct stable_node *stable_tree_insert(struct page 
*kpage)

Re: [PATCH RFC v3 4/6] Documentation: Add three sysctls for smart idle poll

2017-11-14 Thread quan.x...@gmail.com




On 2017-11-14 15:44, Ingo Molnar wrote:

* Quan Xu  wrote:



On 2017/11/13 23:08, Ingo Molnar wrote:

* Quan Xu  wrote:


From: Quan Xu 

To reduce the cost of poll, we introduce three sysctl to control the
poll time when running as a virtual machine with paravirt.

Signed-off-by: Yang Zhang 
Signed-off-by: Quan Xu 
---
   Documentation/sysctl/kernel.txt |   35 +++
   arch/x86/kernel/paravirt.c  |4 
   include/linux/kernel.h  |6 ++
   kernel/sysctl.c |   34 ++
   4 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c..30c25fb 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -714,6 +714,41 @@ kernel tries to allocate a number starting from this one.
   ==
+paravirt_poll_grow: (X86 only)
+
+Multiplied value to increase the poll time. This is expected to take
+effect only when running as a virtual machine with CONFIG_PARAVIRT
+enabled. This can't bring any benefit on bare metal even with
+CONFIG_PARAVIRT enabled.
+
+By default this value is 2. Possible values to set are in range {2..16}.
+
+==
+
+paravirt_poll_shrink: (X86 only)
+
+Divided value to reduce the poll time. This is expected to take effect
+only when running as a virtual machine with CONFIG_PARAVIRT enabled.
+This can't bring any benefit on bare metal even with CONFIG_PARAVIRT
+enabled.
+
+By default this value is 2. Possible values to set are in range {2..16}.
+
+==
+
+paravirt_poll_threshold_ns: (X86 only)
+
+Controls the maximum poll time before entering real idle path. This is
+expected to take effect only when running as a virtual machine with
+CONFIG_PARAVIRT enabled. This can't bring any benefit on bare metal
+even with CONFIG_PARAVIRT enabled.
+
+By default, this value is 0, which means not to poll. Possible values to set
+are in range {0..50}. Change the value to non-zero if running
+latency-bound workloads in a virtual machine.

I absolutely hate it how this hybrid idle loop polling mechanism is not
self-tuning!

Ingo, actually it is self-tuning..

Then why the hell does it touch the syscall ABI?



Just to gather more data about performance and CPU utilization with
different maximum poll times.

There are 3 parameters, paravirt_poll_{grow|shrink|threshold_ns}.
We haven't touched paravirt_poll_{grow|shrink} since we sent out v1.

We tested it with the contextswitch and netperf benchmarks using different
paravirt_poll_threshold_ns values.

Here is the data we get when running benchmark contextswitch to measure
the latency(lower is better):
  halt_poll_threshold=0  -- 3402.9 ns/ctxsw -- 199.8 %CPU
  halt_poll_threshold=1  -- 1151.4 ns/ctxsw -- 200.1 %CPU
  halt_poll_threshold=2  -- 1149.7 ns/ctxsw -- 199.9 %CPU
  halt_poll_threshold=3  -- 1151.0 ns/ctxsw -- 199.9 %CPU
  halt_poll_threshold=4  -- 1155.4 ns/ctxsw -- 199.3 %CPU
  halt_poll_threshold=5  -- 1161.0 ns/ctxsw -- 200.0 %CPU
  halt_poll_threshold=10 -- 1163.8 ns/ctxsw -- 200.4 %CPU
  halt_poll_threshold=20 -- 1163.8 ns/ctxsw -- 201.4 %CPU
  halt_poll_threshold=30 -- 1159.4 ns/ctxsw -- 201.9 %CPU
  halt_poll_threshold=50 -- 1163.5 ns/ctxsw -- 205.5 %CPU


Here is the data we get when running benchmark netperf:
  halt_poll_threshold=0  -- 29031.6 bit/s -- 76.1  %CPU
  halt_poll_threshold=1  -- 29021.7 bit/s -- 105.1 %CPU
  halt_poll_threshold=2  -- 33463.5 bit/s -- 128.2 %CPU
  halt_poll_threshold=3  -- 34436.4 bit/s -- 127.8 %CPU
  halt_poll_threshold=4  -- 35563.3 bit/s -- 129.6 %CPU
  halt_poll_threshold=5  -- 35787.7 bit/s -- 129.4 %CPU
  halt_poll_threshold=10 -- 35477.7 bit/s -- 130.0 %CPU
  halt_poll_threshold=20 -- 35877.7 bit/s -- 131.0 %CPU
  halt_poll_threshold=30 -- 35730.0 bit/s -- 132.4 %CPU
  halt_poll_threshold=50 -- 34978.4 bit/s -- 134.2 %CPU


Considering the default value (20, for x86) of KVM dynamic poll,
I'll set it to the same value as KVM dynamic poll.

I also tested an idle VM with different halt_poll_threshold values, which
did not make CPU utilization fluctuate.



Could I leave only the paravirt_poll_threshold_ns parameter (the maximum poll
time), which is similar to the "adaptive halt-polling" Wanpeng mentioned? Then
the user can turn it off, or find an appropriate threshold for some odd scenario.

That way lies utter madness. Maybe add it as a debugfs knob, but exposing it to
userspace: NAK.


.. so, I will set these 3 parameters to fixed default values in the next
version (v4):
 paravirt_poll_threshold_ns = 20
 paravirt_poll_shrink = 2
 paravirt_poll_grow = 2

This will neither touch the syscall ABI nor expose them to userspace again.
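
For readers following along, here is a rough sketch of how a self-tuning poll window driven by grow, shrink and a maximum threshold could behave. The heuristic below is an assumption loosely modeled on the style of KVM halt polling; it is not taken from this patch series, and struct poll_tuning, start_ns and next_poll_ns() are invented names. Units are left abstract.

```c
#include <stdint.h>

/* Hypothetical tuning knobs, named after the three parameters above. */
struct poll_tuning {
	uint64_t threshold;	/* maximum poll time, 0 means never poll */
	unsigned int grow;	/* multiplier used to widen the window */
	unsigned int shrink;	/* divisor used to narrow the window */
	uint64_t start;		/* assumed initial window when growing from 0 */
};

/*
 * Return the next poll window, given the current window and how long the
 * CPU actually stayed idle before the last wakeup.
 */
static uint64_t next_poll_ns(const struct poll_tuning *t, uint64_t cur,
			     uint64_t idle)
{
	uint64_t next;

	if (t->threshold == 0)
		return 0;		/* polling disabled */
	if (idle <= cur)
		return cur;		/* the poll caught the wakeup: keep it */
	if (idle > t->threshold)	/* long idle: polling is wasted work */
		return t->shrink ? cur / t->shrink : 0;

	/* A somewhat longer poll would have caught this wakeup: grow. */
	next = cur ? cur * t->grow : t->start;
	return next > t->threshold ? t->threshold : next;
}
```

The window never exceeds the threshold, so the threshold alone bounds the extra CPU time spent polling.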


Quan

Re: [PATCH v3 0/9] memfd: add sealing to hugetlb-backed memory

2017-11-14 Thread Mike Kravetz

+Cc: Andrew, Michal, David

Are there any other comments on this patch series from Marc-André?  Is anything
else needed to move forward?

I have reviewed the patches in the series.  David Herrmann (the original
memfd_create/file sealing author) has also taken a look at the patches.

One outstanding issue is sorting out the config option dependencies.  Although,
IMO this is not a strict requirement for this series.  I have addressed this
issue in a follow on series:
http://lkml.kernel.org/r/20171109014109.21077-1-mike.krav...@oracle.com

-- 
Mike Kravetz


On 11/07/2017 04:27 AM, Marc-André Lureau wrote:
> Hi,
> 
> Recently, Mike Kravetz added hugetlbfs support to memfd. However, he
> didn't add sealing support. One of the reasons to use memfd is to have
> shared memory sealing when doing IPC or sharing memory with another
> process with some extra safety. qemu uses shared memory & hugetables
> with vhost-user (used by dpdk), so it is reasonable to use memfd
> now instead for convenience and security reasons.
> 
> Thanks!
> 
> v3:
> - do remaining MFD_DEF_SIZE/mfd_def_size substitutions
> - fix missing unistd.h include in common.c
> - tweaked a bit commit message prefixes
> - added reviewed-by tags
> 
> v2:
> - add "memfd-hugetlb:" prefix in memfd-test
> - run fuse test on hugetlb backend memory
> - rename function memfd_file_get_seals() -> memfd_file_seals_ptr()
> - update commit messages
> - added reviewed-by tags
> 
> RFC->v1:
> - split rfc patch, after early review feedback
> - added patch for memfd-test changes
> - fix build with hugetlbfs disabled
> - small code and commit messages improvements
> 
> Marc-André Lureau (9):
>   shmem: unexport shmem_add_seals()/shmem_get_seals()
>   shmem: rename functions that are memfd-related
>   hugetlb: expose hugetlbfs_inode_info in header
>   hugetlb: implement memfd sealing
>   shmem: add sealing support to hugetlb-backed memfd
>   memfd-test: test hugetlbfs sealing
>   memfd-test: add 'memfd-hugetlb:' prefix when testing hugetlbfs
>   memfd-test: move common code to a shared unit
>   memfd-test: run fuse test on hugetlb backend memory
> 
>  fs/fcntl.c |   2 +-
>  fs/hugetlbfs/inode.c   |  39 +++--
>  include/linux/hugetlb.h|  11 ++
>  include/linux/shmem_fs.h   |   6 +-
>  mm/shmem.c |  59 ---
>  tools/testing/selftests/memfd/Makefile |   5 +
>  tools/testing/selftests/memfd/common.c |  46 ++
>  tools/testing/selftests/memfd/common.h |   9 ++
>  tools/testing/selftests/memfd/fuse_test.c  |  44 +++--
>  tools/testing/selftests/memfd/memfd_test.c | 212 
> -
>  tools/testing/selftests/memfd/run_fuse_test.sh |   2 +-
>  tools/testing/selftests/memfd/run_tests.sh |   1 +
>  12 files changed, 200 insertions(+), 236 deletions(-)
>  create mode 100644 tools/testing/selftests/memfd/common.c
>  create mode 100644 tools/testing/selftests/memfd/common.h
>

[PATCH v3 2/4] MIPS: Loongson64: lemote-2f move ec_kb3310b.h to include dir and clean up

2017-11-14 Thread Jiaxun Yang

To operate the EC from a platform driver, this header file needs to be
includable from anywhere. This patch just moves ec_kb3310b.h to the include
dir and cleans it up.

Signed-off-by: Jiaxun Yang 
---
 arch/mips/include/asm/mach-loongson64/ec_kb3310b.h | 170 +++
 arch/mips/loongson64/lemote-2f/ec_kb3310b.c|   2 +-
 arch/mips/loongson64/lemote-2f/ec_kb3310b.h| 188 -
 arch/mips/loongson64/lemote-2f/pm.c|   4 +-
 arch/mips/loongson64/lemote-2f/reset.c |   4 +-
 5 files changed, 175 insertions(+), 193 deletions(-)
 create mode 100644 arch/mips/include/asm/mach-loongson64/ec_kb3310b.h
 delete mode 100644 arch/mips/loongson64/lemote-2f/ec_kb3310b.h

diff --git a/arch/mips/include/asm/mach-loongson64/ec_kb3310b.h 
b/arch/mips/include/asm/mach-loongson64/ec_kb3310b.h
new file mode 100644
index ..2e8690532ea5
--- /dev/null
+++ b/arch/mips/include/asm/mach-loongson64/ec_kb3310b.h
@@ -0,0 +1,170 @@
+/*
+ * KB3310B Embedded Controller
+ *
+ *  Copyright (C) 2008 Lemote Inc.
+ *  Author: liujl , 2008-03-14
+ *  Copyright (C) 2009 Lemote Inc.
+ *  Author: Wu Zhangjin 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _EC_KB3310B_H
+#define _EC_KB3310B_H
+
+extern unsigned char ec_read(unsigned short addr);
+extern void ec_write(unsigned short addr, unsigned char val);
+extern int ec_query_seq(unsigned char cmd);
+extern int ec_query_event_num(void);
+extern int ec_get_event_num(void);
+
+typedef int (*sci_handler) (int status);
+extern sci_handler yeeloong_report_lid_status;
+
+#define ON 1
+#define OFF 0
+
+#define SCI_IRQ_NUM 0x0A
+
+/*
+ * The following registers are determined by the EC index configuration.
+ * 1, fill the PORT_HIGH as EC register high part.
+ * 2, fill the PORT_LOW as EC register low part.
+ * 3, fill the PORT_DATA as EC register write data or get the data from it.
+ */
+#define EC_IO_PORT_HIGH 0x0381
+#define EC_IO_PORT_LOW  0x0382
+#define EC_IO_PORT_DATA 0x0383
+
+/*
+ * EC delay time is 500us for register and status access
+ */
+#define EC_REG_DELAY    500 /* unit : us */
+#define EC_CMD_TIMEOUT  0x1000
+
+/*
+ * EC access port for SCI communication
+ */
+#define EC_CMD_PORT         0x66
+#define EC_STS_PORT         0x66
+#define EC_DAT_PORT         0x62
+#define CMD_INIT_IDLE_MODE  0xdd
+#define CMD_EXIT_IDLE_MODE  0xdf
+#define CMD_INIT_RESET_MODE 0xd8
+#define CMD_REBOOT_SYSTEM   0x8c
+#define CMD_GET_EVENT_NUM   0x84
+#define CMD_PROGRAM_PIECE   0xda
+
+/* Temperature & Fan registers */
+#define REG_TEMPERATURE_VALUE   0xF458
+#define REG_FAN_AUTO_MAN_SWITCH 0xF459
+#define BIT_FAN_AUTO            0
+#define BIT_FAN_MANUAL          1
+#define REG_FAN_CONTROL         0xF4D2
+#define REG_FAN_STATUS          0xF4DA
+#define REG_FAN_SPEED_HIGH      0xFE22
+#define REG_FAN_SPEED_LOW       0xFE23
+#define REG_FAN_SPEED_LEVEL     0xF4CC
+/* Fan speed divider */
+#define FAN_SPEED_DIVIDER       48  /* (60*1000*1000/62.5/2)*/
+
+/* Battery registers */
+#define REG_BAT_DESIGN_CAP_HIGH   0xF77D
+#define REG_BAT_DESIGN_CAP_LOW    0xF77E
+#define REG_BAT_FULLCHG_CAP_HIGH  0xF780
+#define REG_BAT_FULLCHG_CAP_LOW   0xF781
+#define REG_BAT_DESIGN_VOL_HIGH   0xF782
+#define REG_BAT_DESIGN_VOL_LOW    0xF783
+#define REG_BAT_CURRENT_HIGH      0xF784
+#define REG_BAT_CURRENT_LOW       0xF785
+#define REG_BAT_VOLTAGE_HIGH      0xF786
+#define REG_BAT_VOLTAGE_LOW       0xF787
+#define REG_BAT_TEMPERATURE_HIGH  0xF788
+#define REG_BAT_TEMPERATURE_LOW   0xF789
+#define REG_BAT_RELATIVE_CAP_HIGH 0xF492
+#define REG_BAT_RELATIVE_CAP_LOW  0xF493
+#define REG_BAT_VENDOR            0xF4C4
+#define FLAG_BAT_VENDOR_SANYO     0x01
+#define FLAG_BAT_VENDOR_SIMPLO    0x02
+#define REG_BAT_CELL_COUNT        0xF4C6
+#define FLAG_BAT_CELL_3S1P        0x03
+#define FLAG_BAT_CELL_3S2P        0x06
+#define REG_BAT_CHARGE            0xF4A2
+#define FLAG_BAT_CHARGE_DISCHARGE 0x01
+#define FLAG_BAT_CHARGE_CHARGE    0x02
+#define FLAG_BAT_CHARGE_ACPOWER   0x00
+#define REG_BAT_STATUS            0xF4B0
+#define BIT_BAT_STATUS_LOW        (1 << 5)
+#define BIT_BAT_STATUS_DESTROY    (1 << 2)
+#define BIT_BAT_STATUS_FULL       (1 << 1)
+#define BIT_

[PATCH v3 4/4] MIPS: Loongson64: Load platform device during boot

2017-11-14 Thread Jiaxun Yang

This patch just adds a pdev during boot to load the platform driver.

Signed-off-by: Jiaxun Yang 
---
 arch/mips/loongson64/lemote-2f/Makefile   |  2 +-
 arch/mips/loongson64/lemote-2f/platform.c | 27 +++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 arch/mips/loongson64/lemote-2f/platform.c

diff --git a/arch/mips/loongson64/lemote-2f/Makefile 
b/arch/mips/loongson64/lemote-2f/Makefile
index 08b8abcbfef5..31c90737b98c 100644
--- a/arch/mips/loongson64/lemote-2f/Makefile
+++ b/arch/mips/loongson64/lemote-2f/Makefile
@@ -2,7 +2,7 @@
 # Makefile for lemote loongson2f family machines
 #
 
-obj-y += clock.o machtype.o irq.o reset.o ec_kb3310b.o
+obj-y += clock.o machtype.o irq.o reset.o ec_kb3310b.o platform.o
 
 #
 # Suspend Support
diff --git a/arch/mips/loongson64/lemote-2f/platform.c 
b/arch/mips/loongson64/lemote-2f/platform.c
new file mode 100644
index ..46922f730a64
--- /dev/null
+++ b/arch/mips/loongson64/lemote-2f/platform.c
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2017 Jiaxun Yang.
+ * Author: Jiaxun Yang, jiaxun.y...@flygoat.com
+
+ * Copyright (C) 2009 Lemote Inc.
+ * Author: Wu Zhangjin, wuzhang...@gmail.com
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include 
+#include 
+
+#include 
+
+static int __init lemote2f_platform_init(void)
+{
+   if (mips_machtype != MACH_LEMOTE_YL2F89)
+   return -ENODEV;
+
+   return platform_device_register_simple("yeeloong_laptop", -1, NULL, 0);
+}
+
+arch_initcall(lemote2f_platform_init);
\ No newline at end of file
-- 
2.14.1

[PATCH v3 3/4] MIPS: Loongson64: Yeeloong add platform driver

2017-11-14 Thread Jiaxun Yang

Yeeloong is a laptop with a MIPS Loongson 2F processor, AMD CS5536
chipset, and KB3310B controller.

This yeeloong_laptop module enables access to sensors, battery,
video camera switch, external video connector event, and some
additional buttons.

This driver was originally from linux-loongson-community. I just did
some cleanup and ported it to the mainline kernel tree.

Signed-off-by: Jiaxun Yang 
---
 drivers/platform/mips/Kconfig   |   19 +
 drivers/platform/mips/Makefile  |3 +
 drivers/platform/mips/yeeloong_laptop.c | 1142 +++
 3 files changed, 1164 insertions(+)
 create mode 100644 drivers/platform/mips/yeeloong_laptop.c

diff --git a/drivers/platform/mips/Kconfig b/drivers/platform/mips/Kconfig
index b3ae30a4c67b..acd27e36710b 100644
--- a/drivers/platform/mips/Kconfig
+++ b/drivers/platform/mips/Kconfig
@@ -23,4 +23,23 @@ config CPU_HWMON
help
  Loongson-3A/3B CPU Hwmon (temperature sensor) driver.
 
+config LEMOTE_YEELOONG2F
+   tristate "Lemote YeeLoong Laptop"
+   depends on LEMOTE_MACH2F
+   select BACKLIGHT_LCD_SUPPORT
+   select LCD_CLASS_DEVICE
+   select BACKLIGHT_CLASS_DEVICE
+   select POWER_SUPPLY
+   select HWMON
+   select INPUT
+   select INPUT_MISC
+   select INPUT_SPARSEKMAP
+   select INPUT_EVDEV
+   default m
+   help
+ YeeLoong netbook is a mini laptop made by Lemote, which is basically
+ compatible to FuLoong2F mini PC, but it has an extra Embedded
+ Controller(kb3310b) for battery, hotkey, backlight, temperature and
+ fan management.
+
 endif # MIPS_PLATFORM_DEVICES
diff --git a/drivers/platform/mips/Makefile b/drivers/platform/mips/Makefile
index 8dfd03924c37..b3172b081a2f 100644
--- a/drivers/platform/mips/Makefile
+++ b/drivers/platform/mips/Makefile
@@ -1 +1,4 @@
 obj-$(CONFIG_CPU_HWMON) += cpu_hwmon.o
+
+obj-$(CONFIG_LEMOTE_YEELOONG2F)+= yeeloong_laptop.o
+CFLAGS_yeeloong_laptop.o = -I$(srctree)/arch/mips/loongson/lemote-2f
diff --git a/drivers/platform/mips/yeeloong_laptop.c 
b/drivers/platform/mips/yeeloong_laptop.c
new file mode 100644
index ..61353c3e894c
--- /dev/null
+++ b/drivers/platform/mips/yeeloong_laptop.c
@@ -0,0 +1,1142 @@
+/*
+ * Driver for YeeLoong laptop extras
+ *
+ *  Copyright (C) 2017 Jiaxun Yang.
+ *  Author: Jiaxun Yang 
+ *
+ *  Copyright (C) 2009 Lemote Inc.
+ *  Author: Wu Zhangjin , Liu Junliang 
+ *
+ *  Fixes: Petr Pisar , 2012, 2013, 2014, 2015.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 as
+ *  published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include/* for backlight subdriver */
+#include 
+#include/* for hwmon subdriver */
+#include 
+#include/* for clamp_val() */
+#include/* for hotkey subdriver */
+#include 
+#include 
+#include 
+#include /* for AC & Battery subdriver */
+#include/* For MODULE_DEVICE_TABLE() */
+
+#include 
+
+#include 
+
+#include   /* for loongson_cmdline */
+#include 
+
+/* common function */
+#define EC_VER_LEN 64
+
+static int ec_version_before(char *version)
+{
+   char *p, ec_ver[EC_VER_LEN];
+
+   p = strstr(loongson_cmdline, "EC_VER=");
+   if (!p)
+   memset(ec_ver, 0, EC_VER_LEN);
+   else {
+   strncpy(ec_ver, p, EC_VER_LEN);
+   p = strstr(ec_ver, " ");
+   if (p)
+   *p = '\0';
+   }
+
+   return (strncasecmp(ec_ver, version, 64) < 0);
+}
+
+/* backlight subdriver */
+#define MAX_BRIGHTNESS 8
+
+static int yeeloong_set_brightness(struct backlight_device *bd)
+{
+   unsigned int level, current_level;
+   static unsigned int old_level;
+
+   level = (bd->props.fb_blank == FB_BLANK_UNBLANK &&
+bd->props.power == FB_BLANK_UNBLANK) ?
+   bd->props.brightness : 0;
+
+   level = clamp_val(level, 0, MAX_BRIGHTNESS);
+
+   /* Avoid to modify the brightness when EC is tuning it */
+   if (old_level != level) {
+   current_level = ec_read(REG_DISPLAY_BRIGHTNESS);
+   if (old_level == current_level)
+   ec_write(REG_DISPLAY_BRIGHTNESS, level);
+   old_level = level;
+   }
+
+   return 0;
+}
+
+static int yeeloong_get_brightness(struct backlight_device *bd)
+{
+   return ec_read(REG_DISPLAY_BRIGHTNESS);
+}
+
+const struct backlight_ops backlight_ops = {
+   .get_brightness = yeeloong_get_brightness,
+   .update_status = yeeloong_set_brightness,
+};
+
+static struct backlight_device *yeeloong_backlight_dev;
+
+static int yeeloong_backlight_init(void)
+{
+   int ret;
+   struct backlight_properties props;
+
+   memset(&props, 0, sizeof(struct backlight_properties));
+   props.type = BACKLIGHT_RAW;
+   props.max_brightness = MAX_BRIGHTNESS;
+   yeeloong_backlight_d

[PATCH v3 1/4] MIPS: Loongson64: Copy kernel command line from arcs_cmdline

2017-11-14 Thread Jiaxun Yang

Since lemote-2f/machtype.c needs to get the command line via loongson.h,
this patch simply copies the kernel command line from arcs_cmdline
to fix that issue.

Signed-off-by: Jiaxun Yang 
---
 arch/mips/include/asm/mach-loongson64/loongson.h | 6 ++
 arch/mips/loongson64/common/cmdline.c| 7 +++
 2 files changed, 13 insertions(+)

diff --git a/arch/mips/include/asm/mach-loongson64/loongson.h b/arch/mips/include/asm/mach-loongson64/loongson.h
index c68c0cc879c6..1edf3a484e6a 100644
--- a/arch/mips/include/asm/mach-loongson64/loongson.h
+++ b/arch/mips/include/asm/mach-loongson64/loongson.h
@@ -45,6 +45,12 @@ static inline void prom_init_uart_base(void)
 #endif
 }
 
+/*
+ * Copy kernel command line from arcs_cmdline
+ */
+#include 
+extern char loongson_cmdline[COMMAND_LINE_SIZE];
+
 /* irq operation functions */
 extern void bonito_irqdispatch(void);
 extern void __init bonito_irq_init(void);
diff --git a/arch/mips/loongson64/common/cmdline.c b/arch/mips/loongson64/common/cmdline.c
index 01fbed137028..49e172184e15 100644
--- a/arch/mips/loongson64/common/cmdline.c
+++ b/arch/mips/loongson64/common/cmdline.c
@@ -21,6 +21,11 @@
 
 #include 
 
+/* the kernel command line copied from arcs_cmdline */
+#include 
+char loongson_cmdline[COMMAND_LINE_SIZE];
+EXPORT_SYMBOL(loongson_cmdline);
+
 void __init prom_init_cmdline(void)
 {
int prom_argc;
@@ -45,4 +50,6 @@ void __init prom_init_cmdline(void)
}
 
prom_init_machtype();
+   /* copy arcs_cmdline into loongson_cmdline */
+   strncpy(loongson_cmdline, arcs_cmdline, COMMAND_LINE_SIZE);
 }
-- 
2.14.1
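
A side note on the copy added to prom_init_cmdline() above: strncpy() does not
write a terminating NUL when the source is at least as long as the destination.
The following is a minimal userspace sketch of a bounded copy that always
terminates. The helper name copy_cmdline() is hypothetical and is not part of
the patch; in-kernel code would more likely use an existing helper such as
strscpy().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst (dst_size bytes) and always
 * NUL-terminate. Returns the number of characters copied, excluding
 * the terminator.
 */
static size_t copy_cmdline(char *dst, const char *src, size_t dst_size)
{
	size_t n;

	if (dst_size == 0)
		return 0;

	n = strlen(src);
	if (n >= dst_size)
		n = dst_size - 1;	/* truncate, leaving room for '\0' */

	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

With an 8-byte buffer, copying "console=ttyS0" keeps the first seven characters
and still leaves the buffer usable as a C string.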

[PATCH AUTOSEL for 4.9 07/56] clk: sunxi-ng: A31: Fix spdif clock register

2017-11-14 Thread alexander . levin

From: Marcus Cooper 

[ Upstream commit 70421257c068b91476e70cade15fca68045d0693 ]

As the SPDIF was rarely documented on the earlier Allwinner SoCs
it was assumed that it had a similar clock register to the one
described in the H3 User Manual.

However this is not the case and it looks to share the same setup
as the I2S clock registers.

Signed-off-by: Marcus Cooper 
Signed-off-by: Maxime Ripard 
Signed-off-by: Sasha Levin 
---
 drivers/clk/sunxi-ng/ccu-sun6i-a31.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/sunxi-ng/ccu-sun6i-a31.c b/drivers/clk/sunxi-ng/ccu-sun6i-a31.c
index 0cca3601d99e..df97e25aec76 100644
--- a/drivers/clk/sunxi-ng/ccu-sun6i-a31.c
+++ b/drivers/clk/sunxi-ng/ccu-sun6i-a31.c
@@ -468,8 +468,8 @@ static SUNXI_CCU_MUX_WITH_GATE(daudio0_clk, "daudio0", daudio_parents,
 static SUNXI_CCU_MUX_WITH_GATE(daudio1_clk, "daudio1", daudio_parents,
   0x0b4, 16, 2, BIT(31), CLK_SET_RATE_PARENT);
 
-static SUNXI_CCU_M_WITH_GATE(spdif_clk, "spdif", "pll-audio",
-0x0c0, 0, 4, BIT(31), CLK_SET_RATE_PARENT);
+static SUNXI_CCU_MUX_WITH_GATE(spdif_clk, "spdif", daudio_parents,
+  0x0c0, 16, 2, BIT(31), CLK_SET_RATE_PARENT);
 
 static SUNXI_CCU_GATE(usb_phy0_clk,"usb-phy0", "osc24M",
  0x0cc, BIT(8), 0);
-- 
2.11.0
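
To illustrate the register layout the fixed definition describes, here is a
small sketch that decodes a raw register value. It assumes the macro arguments
16, 2 and BIT(31) are the mux shift, mux width and gate bit respectively; the
SPDIF_* names and both helper functions are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: parent select in bits 17:16, clock gate in bit 31. */
#define SPDIF_MUX_SHIFT	16
#define SPDIF_MUX_WIDTH	2
#define SPDIF_GATE_BIT	(UINT32_C(1) << 31)

/* Index into the parent table selected by the mux field. */
static unsigned int spdif_parent_index(uint32_t reg)
{
	return (reg >> SPDIF_MUX_SHIFT) & ((1u << SPDIF_MUX_WIDTH) - 1);
}

/* True when the gate bit is set, i.e. the clock is enabled. */
static bool spdif_gate_enabled(uint32_t reg)
{
	return (reg & SPDIF_GATE_BIT) != 0;
}
```

A value of 0x80030000, for instance, decodes to parent index 3 with the gate
enabled.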

[PATCH AUTOSEL for 4.9 09/56] dmaengine: zx: set DMA_CYCLIC cap_mask bit

2017-11-14 Thread alexander . levin

From: Shawn Guo 

[ Upstream commit fc318d64f3d91e15babac00e08354b1beb650b57 ]

The zx_dma driver supports cyclic transfer mode.  Let's set DMA_CYCLIC
cap_mask bit to make that clear, and avoid unnecessary failure when
clients request channel via dma_request_chan_by_mask() with DMA_CYCLIC
bit set in mask.

Signed-off-by: Shawn Guo 
Reviewed-by: Jun Nie 
Signed-off-by: Vinod Koul 
Signed-off-by: Sasha Levin 
---
 drivers/dma/zx296702_dma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/dma/zx296702_dma.c b/drivers/dma/zx296702_dma.c
index 245d759d5ffc..6059d81e701a 100644
--- a/drivers/dma/zx296702_dma.c
+++ b/drivers/dma/zx296702_dma.c
@@ -813,6 +813,7 @@ static int zx_dma_probe(struct platform_device *op)
INIT_LIST_HEAD(&d->slave.channels);
dma_cap_set(DMA_SLAVE, d->slave.cap_mask);
dma_cap_set(DMA_MEMCPY, d->slave.cap_mask);
+   dma_cap_set(DMA_CYCLIC, d->slave.cap_mask);
dma_cap_set(DMA_PRIVATE, d->slave.cap_mask);
d->slave.dev = &op->dev;
d->slave.device_free_chan_resources = zx_dma_free_chan_resources;
-- 
2.11.0

[PATCH AUTOSEL for 4.9 11/56] fscrypt: use ENOTDIR when setting encryption policy on nondirectory

2017-11-14 Thread alexander . levin

From: Eric Biggers 

[ Upstream commit dffd0cfa06d4ed83bb3ae8eb067989ceec5d18e1 ]

As part of an effort to clean up fscrypt-related error codes, make
FS_IOC_SET_ENCRYPTION_POLICY fail with ENOTDIR when the file descriptor
does not refer to a directory.  This is more descriptive than EINVAL,
which was ambiguous with some of the other error cases.

I am not aware of any users who might be relying on the previous error
code of EINVAL, which was never documented anywhere, and in some buggy
kernels did not exist at all as the S_ISDIR() check was missing.

This failure case will be exercised by an xfstest.

Signed-off-by: Eric Biggers 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Sasha Levin 
---
 fs/crypto/policy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index bb4e209bd809..c160d2d0e18d 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -113,7 +113,7 @@ int fscrypt_process_policy(struct file *filp,
 
if (!inode_has_encryption_context(inode)) {
if (!S_ISDIR(inode->i_mode))
-   ret = -EINVAL;
+   ret = -ENOTDIR;
else if (!inode->i_sb->s_cop->empty_dir)
ret = -EOPNOTSUPP;
else if (!inode->i_sb->s_cop->empty_dir(inode))
-- 
2.11.0

[PATCH AUTOSEL for 4.9 13/56] net: 3com: typhoon: typhoon_init_one: make return values more specific

2017-11-14 Thread alexander . levin

From: Thomas Preisner 

[ Upstream commit 6b6bbb5922a4b1d4b58125a572da91010295fba3 ]

In some cases the return value of a failing function is not being used
and the function typhoon_init_one() returns another negative error code
instead.

Signed-off-by: Thomas Preisner 
Signed-off-by: Milan Stephan 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/3com/typhoon.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/3com/typhoon.c b/drivers/net/ethernet/3com/typhoon.c
index 8f8418d2ac4a..6d0f0eb2ee2f 100644
--- a/drivers/net/ethernet/3com/typhoon.c
+++ b/drivers/net/ethernet/3com/typhoon.c
@@ -2366,9 +2366,9 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 * 4) Get the hardware address.
 * 5) Put the card to sleep.
 */
-   if (typhoon_reset(ioaddr, WaitSleep) < 0) {
+   err = typhoon_reset(ioaddr, WaitSleep);
+   if (err < 0) {
err_msg = "could not reset 3XP";
-   err = -EIO;
goto error_out_dma;
}
 
@@ -2382,16 +2382,16 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
typhoon_init_interface(tp);
typhoon_init_rings(tp);
 
-   if(typhoon_boot_3XP(tp, TYPHOON_STATUS_WAITING_FOR_HOST) < 0) {
+   err = typhoon_boot_3XP(tp, TYPHOON_STATUS_WAITING_FOR_HOST);
+   if (err < 0) {
err_msg = "cannot boot 3XP sleep image";
-   err = -EIO;
goto error_out_reset;
}
 
INIT_COMMAND_WITH_RESPONSE(&xp_cmd, TYPHOON_CMD_READ_MAC_ADDRESS);
-   if(typhoon_issue_command(tp, 1, &xp_cmd, 1, xp_resp) < 0) {
+   err = typhoon_issue_command(tp, 1, &xp_cmd, 1, xp_resp);
+   if (err < 0) {
err_msg = "cannot read MAC address";
-   err = -EIO;
goto error_out_reset;
}
 
@@ -2424,9 +2424,9 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if(xp_resp[0].numDesc != 0)
tp->capabilities |= TYPHOON_WAKEUP_NEEDS_RESET;
 
-   if(typhoon_sleep(tp, PCI_D3hot, 0) < 0) {
+   err = typhoon_sleep(tp, PCI_D3hot, 0);
+   if (err < 0) {
err_msg = "cannot put adapter to sleep";
-   err = -EIO;
goto error_out_reset;
}
 
-- 
2.11.0
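
The change above follows a common pattern: store the callee's return value and
pass it up instead of replacing it with a fixed -EIO. A minimal userspace
sketch of the difference, where fake_reset(), init_old() and init_new() are
hypothetical stand-ins and not driver functions:

```c
#include <errno.h>

/* Hypothetical stand-in for a helper such as typhoon_reset(). */
static int fake_reset(int fail_with)
{
	return fail_with ? -fail_with : 0;
}

/* Before: every failure is reported as -EIO. */
static int init_old(int fail_with)
{
	if (fake_reset(fail_with) < 0)
		return -EIO;
	return 0;
}

/* After: the callee's error code is passed through unchanged. */
static int init_new(int fail_with)
{
	int err;

	err = fake_reset(fail_with);
	if (err < 0)
		return err;
	return 0;
}
```

If the helper fails with -ETIMEDOUT, init_old() hides that behind -EIO while
init_new() reports -ETIMEDOUT to the caller.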

[PATCH AUTOSEL for 4.9 16/56] rt2800: set minimum MPDU and PSDU lengths to sane values

2017-11-14 Thread alexander . levin

From: Stanislaw Gruszka 

[ Upstream commit a51b89698ccc93c7e274eb71377fae49c4593ab2 ]

Signed-off-by: Stanislaw Gruszka 
Signed-off-by: Kalle Valo 
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/ralink/rt2x00/rt2800lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800lib.c b/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
index bf3f0a39908c..9fc6f1615343 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
@@ -4707,8 +4707,8 @@ static int rt2800_init_registers(struct rt2x00_dev *rt2x00dev)
rt2x00_set_field32(&reg, MAX_LEN_CFG_MAX_PSDU, 2);
else
rt2x00_set_field32(&reg, MAX_LEN_CFG_MAX_PSDU, 1);
-   rt2x00_set_field32(&reg, MAX_LEN_CFG_MIN_PSDU, 0);
-   rt2x00_set_field32(&reg, MAX_LEN_CFG_MIN_MPDU, 0);
+   rt2x00_set_field32(&reg, MAX_LEN_CFG_MIN_PSDU, 10);
+   rt2x00_set_field32(&reg, MAX_LEN_CFG_MIN_MPDU, 10);
rt2800_register_write(rt2x00dev, MAX_LEN_CFG, reg);
 
rt2800_register_read(rt2x00dev, LED_CFG, &reg);
-- 
2.11.0

[PATCH AUTOSEL for 4.9 14/56] net: 3com: typhoon: typhoon_init_one: fix incorrect return values

2017-11-14 Thread alexander . levin

From: Thomas Preisner 

[ Upstream commit 107fded7bf616ad6f46823d98b8ed6405d7adf2d ]

In a few cases the err-variable is not set to a negative error code if a
function call in typhoon_init_one() fails and thus 0 is returned
instead.
It may be better to set err to the appropriate negative error
code before returning.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188841

Reported-by: Pan Bian 
Signed-off-by: Thomas Preisner 
Signed-off-by: Milan Stephan 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/ethernet/3com/typhoon.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/3com/typhoon.c b/drivers/net/ethernet/3com/typhoon.c
index 6d0f0eb2ee2f..a0012c3cb4f6 100644
--- a/drivers/net/ethernet/3com/typhoon.c
+++ b/drivers/net/ethernet/3com/typhoon.c
@@ -2398,8 +2398,9 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
*(__be16 *)&dev->dev_addr[0] = htons(le16_to_cpu(xp_resp[0].parm1));
*(__be32 *)&dev->dev_addr[2] = htonl(le32_to_cpu(xp_resp[0].parm2));
 
-   if(!is_valid_ether_addr(dev->dev_addr)) {
+   if (!is_valid_ether_addr(dev->dev_addr)) {
err_msg = "Could not obtain valid ethernet address, aborting";
+   err = -EIO;
goto error_out_reset;
}
 
@@ -2407,7 +2408,8 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 * later when we print out the version reported.
 */
INIT_COMMAND_WITH_RESPONSE(&xp_cmd, TYPHOON_CMD_READ_VERSIONS);
-   if(typhoon_issue_command(tp, 1, &xp_cmd, 3, xp_resp) < 0) {
+   err = typhoon_issue_command(tp, 1, &xp_cmd, 3, xp_resp);
+   if (err < 0) {
err_msg = "Could not get Sleep Image version";
goto error_out_reset;
}
@@ -2449,7 +2451,8 @@ typhoon_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dev->features = dev->hw_features |
NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_RXCSUM;
 
-   if(register_netdev(dev) < 0) {
+   err = register_netdev(dev);
+   if (err < 0) {
err_msg = "unable to register netdev";
goto error_out_reset;
}
-- 
2.11.0

[PATCH AUTOSEL for 4.9 15/56] drm/armada: Fix compile fail

2017-11-14 Thread alexander . levin

From: Daniel Vetter 

[ Upstream commit 7357f89954b6d005df6ab8929759e78d7d9a80f9 ]

I reported the include issue for tracepoints a while ago, but nothing
seems to have happened. Now it bit us, since the drm_mm_print
conversion was broken for armada. Fix it, so I can re-enable armada
in the drm-misc build configs.

v2: Rebase just the compile fix on top of Chris' build fix.

Cc: Russell King 
Cc: Chris Wilson 
Acked: Chris Wilson 
Signed-off-by: Daniel Vetter 
Link: http://patchwork.freedesktop.org/patch/msgid/1483115932-19584-1-git-send-email-daniel.vet...@ffwll.ch
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/armada/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/armada/Makefile b/drivers/gpu/drm/armada/Makefile
index ffd673615772..26412d2f8c98 100644
--- a/drivers/gpu/drm/armada/Makefile
+++ b/drivers/gpu/drm/armada/Makefile
@@ -4,3 +4,5 @@ armada-y+= armada_510.o
 armada-$(CONFIG_DEBUG_FS) += armada_debugfs.o
 
 obj-$(CONFIG_DRM_ARMADA) := armada.o
+
+CFLAGS_armada_trace.o := -I$(src)
-- 
2.11.0

[PATCH AUTOSEL for 4.9 12/56] net: Allow IP_MULTICAST_IF to set index to L3 slave

2017-11-14 Thread alexander . levin

From: David Ahern 

[ Upstream commit 7bb387c5ab12aeac3d5eea28686489ff46b53ca9 ]

IP_MULTICAST_IF fails if sk_bound_dev_if is already set and the new index
does not match it. e.g.,

ntpd[15381]: setsockopt IP_MULTICAST_IF 192.168.1.23 fails: Invalid argument

Relax the check in setsockopt to allow setting mc_index to an L3 slave if
sk_bound_dev_if points to an L3 master.

Make a similar change for IPv6. In this case change the device lookup to
take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for
the lookup of a potential L3 master device.

This really only silences a setsockopt failure since uses of mc_index are
secondary to sk_bound_dev_if if it is set. In both cases, if either index
is an L3 slave or master, lookups are directed to the same FIB table so
relaxing the check at setsockopt time causes no harm.

Patch is based on a suggested change by Darwin for a problem noted in
their code base.

Suggested-by: Darwin Dingel 
Signed-off-by: David Ahern 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 net/ipv4/ip_sockglue.c   |  7 ++-
 net/ipv6/ipv6_sockglue.c | 16 
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 4d37bdcbc2d5..551dd393ceec 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -819,6 +819,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
{
struct ip_mreqn mreq;
struct net_device *dev = NULL;
+   int midx;
 
if (sk->sk_type == SOCK_STREAM)
goto e_inval;
@@ -863,11 +864,15 @@ static int do_ip_setsockopt(struct sock *sk, int level,
err = -EADDRNOTAVAIL;
if (!dev)
break;
+
+   midx = l3mdev_master_ifindex(dev);
+
dev_put(dev);
 
err = -EINVAL;
if (sk->sk_bound_dev_if &&
-   mreq.imr_ifindex != sk->sk_bound_dev_if)
+   mreq.imr_ifindex != sk->sk_bound_dev_if &&
+   (!midx || midx != sk->sk_bound_dev_if))
break;
 
inet->mc_index = mreq.imr_ifindex;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 636ec56f5f50..38bee173dc2b 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -585,16 +585,24 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
if (val) {
struct net_device *dev;
+   int midx;
 
-   if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != val)
-   goto e_inval;
+   rcu_read_lock();
 
-   dev = dev_get_by_index(net, val);
+   dev = dev_get_by_index_rcu(net, val);
if (!dev) {
+   rcu_read_unlock();
retv = -ENODEV;
break;
}
-   dev_put(dev);
+   midx = l3mdev_master_ifindex_rcu(dev);
+
+   rcu_read_unlock();
+
+   if (sk->sk_bound_dev_if &&
+   sk->sk_bound_dev_if != val &&
+   (!midx || midx != sk->sk_bound_dev_if))
+   goto e_inval;
}
np->mcast_oif = val;
retv = 0;
-- 
2.11.0
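
The relaxed check can be read as a single predicate over three interface
indexes. The sketch below restates the condition from the two hunks above in
positive form; mc_index_allowed() is a hypothetical name used only for
illustration.

```c
#include <stdbool.h>

/*
 * bound: sk_bound_dev_if (0 if the socket is not bound to a device)
 * idx:   interface index requested for multicast
 * midx:  ifindex of idx's L3 master device, or 0 if it has none
 */
static bool mc_index_allowed(int bound, int idx, int midx)
{
	if (!bound)
		return true;		/* unbound socket: any index is fine */
	if (idx == bound)
		return true;		/* same device as before the patch */
	return midx != 0 && midx == bound;	/* new: slave of the bound master */
}
```

A socket bound to ifindex 7 may now select ifindex 5 if device 5 is enslaved to
master 7, and is still rejected if device 5 has no master or a different one.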

[PATCH AUTOSEL for 4.9 18/56] mwifiex: sdio: fix use after free issue for save_adapter

2017-11-14 Thread alexander . levin

From: Amitkumar Karwar 

[ Upstream commit 74c8719b8ee0922593a5cbec0bd6127d86d8a2f4 ]

If we have sdio work requests received when sdio card reset is
happening, we may end up accessing older save_adapter pointer
later which is already freed during card reset.
This patch solves the problem by cancelling those pending requests.

Signed-off-by: Amitkumar Karwar 
Signed-off-by: Kalle Valo 
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/marvell/mwifiex/sdio.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index 8718950004f3..8d601dcf2948 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -2296,6 +2296,12 @@ static void mwifiex_recreate_adapter(struct sdio_mmc_card *card)
mmc_hw_reset(func->card->host);
sdio_release_host(func);
 
+   /* Previous save_adapter won't be valid after this. We will cancel
+* pending work requests.
+*/
+   clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &iface_work_flags);
+   clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &iface_work_flags);
+
mwifiex_sdio_probe(func, device_id);
 }
 
-- 
2.11.0

[PATCH AUTOSEL for 4.9 17/56] adm80211: return an error if adm8211_alloc_rings() fails

2017-11-14 Thread alexander . levin

From: Dan Carpenter 

[ Upstream commit c705a6b3aa7804d7bc6660183f51e510c61dc807 ]

We accidentally return success when adm8211_alloc_rings() fails but we
should preserve the error code.

Fixes: cc0b88cf5ecf ("[PATCH] Add adm8211 802.11b wireless driver")
Signed-off-by: Dan Carpenter 
Signed-off-by: Kalle Valo 
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/admtek/adm8211.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/admtek/adm8211.c b/drivers/net/wireless/admtek/adm8211.c
index 70ecd82d674d..0321f85c388c 100644
--- a/drivers/net/wireless/admtek/adm8211.c
+++ b/drivers/net/wireless/admtek/adm8211.c
@@ -1843,7 +1843,8 @@ static int adm8211_probe(struct pci_dev *pdev,
priv->rx_ring_size = rx_ring_size;
priv->tx_ring_size = tx_ring_size;
 
-   if (adm8211_alloc_rings(dev)) {
+   err = adm8211_alloc_rings(dev);
+   if (err) {
printk(KERN_ERR "%s (adm8211): Cannot allocate TX/RX ring\n",
   pci_name(pdev));
goto err_iounmap;
-- 
2.11.0

[PATCH AUTOSEL for 4.9 02/56] RDS: make message size limit compliant with spec

2017-11-14 Thread alexander . levin

From: Avinash Repaka 

[ Upstream commit f9fb69adb6c7acca60977a4db5a5f95b8e66c041 ]

RDS supports a maximum message size of 1M, but the code doesn't check this
in all cases. This patch fixes it for RDMA and non-RDMA messages and for the
RDS MR size, and the limit is enforced irrespective of the underlying transport.

Signed-off-by: Avinash Repaka 
Signed-off-by: Santosh Shilimkar 
Signed-off-by: Sasha Levin 
---
 net/rds/rdma.c |  9 -
 net/rds/rds.h  |  3 +++
 net/rds/send.c | 31 +++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 8d3a851a3476..60e90f761838 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -40,7 +40,6 @@
 /*
  * XXX
  *  - build with sparse
- *  - should we limit the size of a mr region?  let transport return failure?
  *  - should we detect duplicate keys on a socket?  hmm.
  *  - an rdma is an mlock, apply rlimit?
  */
@@ -200,6 +199,14 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args,
goto out;
}
 
+   /* Restrict the size of mr irrespective of underlying transport
+* To account for unaligned mr regions, subtract one from nr_pages
+*/
+   if ((nr_pages - 1) > (RDS_MAX_MSG_SIZE >> PAGE_SHIFT)) {
+   ret = -EMSGSIZE;
+   goto out;
+   }
+
rdsdebug("RDS: get_mr addr %llx len %llu nr_pages %u\n",
args->vec.addr, args->vec.bytes, nr_pages);
 
diff --git a/net/rds/rds.h b/net/rds/rds.h
index f107a968ddff..30a51fec0f63 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -50,6 +50,9 @@ void rdsdebug(char *fmt, ...)
 #define RDS_FRAG_SHIFT 12
 #define RDS_FRAG_SIZE  ((unsigned int)(1 << RDS_FRAG_SHIFT))
 
+/* Used to limit both RDMA and non-RDMA RDS message to 1MB */
+#define RDS_MAX_MSG_SIZE   ((unsigned int)(1 << 20))
+
 #define RDS_CONG_MAP_BYTES (65536 / 8)
 #define RDS_CONG_MAP_PAGES (PAGE_ALIGN(RDS_CONG_MAP_BYTES) / PAGE_SIZE)
 #define RDS_CONG_MAP_PAGE_BITS (PAGE_SIZE * 8)
diff --git a/net/rds/send.c b/net/rds/send.c
index f28651b6ae83..310d57928405 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -988,6 +988,26 @@ static int rds_send_mprds_hash(struct rds_sock *rs, struct rds_connection *conn)
return hash;
 }
 
+static int rds_rdma_bytes(struct msghdr *msg, size_t *rdma_bytes)
+{
+   struct rds_rdma_args *args;
+   struct cmsghdr *cmsg;
+
+   for_each_cmsghdr(cmsg, msg) {
+   if (!CMSG_OK(msg, cmsg))
+   return -EINVAL;
+
+   if (cmsg->cmsg_level != SOL_RDS)
+   continue;
+
+   if (cmsg->cmsg_type == RDS_CMSG_RDMA_ARGS) {
+   args = CMSG_DATA(cmsg);
+   *rdma_bytes += args->remote_vec.bytes;
+   }
+   }
+   return 0;
+}
+
 int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
 {
struct sock *sk = sock->sk;
@@ -1002,6 +1022,7 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
int nonblock = msg->msg_flags & MSG_DONTWAIT;
long timeo = sock_sndtimeo(sk, nonblock);
struct rds_conn_path *cpath;
+   size_t total_payload_len = payload_len, rdma_payload_len = 0;
 
/* Mirror Linux UDP mirror of BSD error message compatibility */
/* XXX: Perhaps MSG_MORE someday */
@@ -1034,6 +1055,16 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
}
release_sock(sk);
 
+   ret = rds_rdma_bytes(msg, &rdma_payload_len);
+   if (ret)
+   goto out;
+
+   total_payload_len += rdma_payload_len;
+   if (max_t(size_t, payload_len, rdma_payload_len) > RDS_MAX_MSG_SIZE) {
+   ret = -EMSGSIZE;
+   goto out;
+   }
+
if (payload_len > rds_sk_sndbuf(rs)) {
ret = -EMSGSIZE;
goto out;
-- 
2.11.0
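
The two size checks in this patch can be restated as small predicates. The
sketch below is a userspace illustration, assuming 4 KiB pages (PAGE_SHIFT of
12) and nr_pages of at least 1; the function names are hypothetical. Note that
the send-path check compares the larger of the two lengths against the limit,
not their sum.

```c
#include <stdbool.h>
#include <stddef.h>

#define RDS_MAX_MSG_SIZE	((size_t)1 << 20)	/* 1 MB, as in rds.h above */
#define EXAMPLE_PAGE_SHIFT	12			/* assumption: 4 KiB pages */

/* Mirrors max_t(size_t, payload_len, rdma_payload_len) > RDS_MAX_MSG_SIZE. */
static bool rds_msg_too_big(size_t payload_len, size_t rdma_payload_len)
{
	size_t larger = payload_len > rdma_payload_len ?
			payload_len : rdma_payload_len;

	return larger > RDS_MAX_MSG_SIZE;
}

/*
 * Mirrors (nr_pages - 1) > (RDS_MAX_MSG_SIZE >> PAGE_SHIFT).
 * One extra page is tolerated to account for unaligned regions.
 * Assumes nr_pages >= 1.
 */
static bool rds_mr_too_big(unsigned int nr_pages)
{
	return (nr_pages - 1) > (RDS_MAX_MSG_SIZE >> EXAMPLE_PAGE_SHIFT);
}
```

Under these assumptions a memory region of 257 pages is accepted and one of
258 pages is rejected.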

[PATCH AUTOSEL for 4.9 19/56] ath10k: fix incorrect txpower set by P2P_DEVICE interface

2017-11-14 Thread alexander . levin

From: Ryan Hsu 

[ Upstream commit 88407beb1b1462f706a1950a355fd086e1c450b6 ]

Ath10k reports the phy capability that supports P2P_DEVICE interface.

When we use the P2P supported wpa_supplicant to start connection, it'll
create two interfaces, one is wlan0 (vdev_id=0) and one is P2P_DEVICE
p2p-dev-wlan0 which is for p2p control channel (vdev_id=1).

ath10k_pci mac vdev create 0 (add interface) type 2 subtype 0
ath10k_add_interface: vdev_id: 0, txpower: 0, bss_power: 0
...
ath10k_pci mac vdev create 1 (add interface) type 2 subtype 1
ath10k_add_interface: vdev_id: 1, txpower: 0, bss_power: 0

And the txpower in per vif bss_conf will only be set to valid tx power when
the interface is assigned with channel_ctx.

But this P2P_DEVICE interface will never be used for any connection, so
the uninitialized bss_conf.txpower=0 is assigned to arvif->txpower when
the interface is created.

The txpower configuration in firmware is per physical interface, so the
smallest txpower of all vifs is the one that limits the tx power of the
physical device, which causes the low txpower issue on the other active
interfaces.

wlan0: Limiting TX power to 21 (24 - 3) dBm
ath10k_pci mac vdev_id 0 txpower 21
ath10k_mac_txpower_recalc: vdev_id: 1, txpower: 0
ath10k_mac_txpower_recalc: vdev_id: 0, txpower: 21
ath10k_pci mac txpower 0

This issue only happens when we use the wpa_supplicant that supports
P2P or if we use the iw tool to create the control P2P_DEVICE interface.

Signed-off-by: Ryan Hsu 
Signed-off-by: Kalle Valo 
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/ath/ath10k/mac.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 30e98afa2e68..d3ac7e0745e2 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4668,7 +4668,8 @@ static int ath10k_mac_txpower_recalc(struct ath10k *ar)
lockdep_assert_held(&ar->conf_mutex);
 
list_for_each_entry(arvif, &ar->arvifs, list) {
-   WARN_ON(arvif->txpower < 0);
+   if (arvif->txpower <= 0)
+   continue;
 
if (txpower == -1)
txpower = arvif->txpower;
@@ -4676,8 +4677,8 @@ static int ath10k_mac_txpower_recalc(struct ath10k *ar)
txpower = min(txpower, arvif->txpower);
}
 
-   if (WARN_ON(txpower == -1))
-   return -EINVAL;
+   if (txpower == -1)
+   return 0;
 
ret = ath10k_mac_txpower_setup(ar, txpower);
if (ret) {
-- 
2.11.0
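
The recalculation after this patch amounts to taking the minimum over all
interfaces that have a positive txpower. A minimal sketch over a plain array,
where txpower_recalc() and its parameters are hypothetical and stand in for
the list walk in the driver:

```c
#include <stddef.h>

/*
 * Returns the smallest positive txpower among the vifs, or -1 if no
 * vif has a valid (positive) value yet.
 */
static int txpower_recalc(const int *vif_txpower, size_t n)
{
	int txpower = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (vif_txpower[i] <= 0)
			continue;	/* unset, e.g. a P2P_DEVICE vif */
		if (txpower == -1 || vif_txpower[i] < txpower)
			txpower = vif_txpower[i];
	}
	return txpower;
}
```

With the values from the log above (21 for vdev 0 and 0 for vdev 1) the result
is 21 rather than 0. When the result is -1 the patched driver returns 0 without
programming the firmware.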

[PATCH AUTOSEL for 4.9 21/56] ath10k: fix potential memory leak in ath10k_wmi_tlv_op_pull_fw_stats()

2017-11-14 Thread alexander . levin

From: Christian Lamparter 

[ Upstream commit 097e46d2ae90265d1afe141ba6208ba598b79e01 ]

ath10k_wmi_tlv_op_pull_fw_stats() uses tb = ath10k_wmi_tlv_parse_alloc(...)
function, which allocates memory. If any of the three error-paths are
taken, this tb needs to be freed.

Signed-off-by: Christian Lamparter 
Signed-off-by: Kalle Valo 
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/ath/ath10k/wmi-tlv.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
index e64f59300a7c..0e4d49a0 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -1105,8 +1105,10 @@ static int ath10k_wmi_tlv_op_pull_fw_stats(struct ath10k *ar,
struct ath10k_fw_stats_pdev *dst;
 
src = data;
-   if (data_len < sizeof(*src))
+   if (data_len < sizeof(*src)) {
+   kfree(tb);
return -EPROTO;
+   }
 
data += sizeof(*src);
data_len -= sizeof(*src);
@@ -1126,8 +1128,10 @@ static int ath10k_wmi_tlv_op_pull_fw_stats(struct ath10k *ar,
struct ath10k_fw_stats_vdev *dst;
 
src = data;
-   if (data_len < sizeof(*src))
+   if (data_len < sizeof(*src)) {
+   kfree(tb);
return -EPROTO;
+   }
 
data += sizeof(*src);
data_len -= sizeof(*src);
@@ -1145,8 +1149,10 @@ static int ath10k_wmi_tlv_op_pull_fw_stats(struct ath10k *ar,
struct ath10k_fw_stats_peer *dst;
 
src = data;
-   if (data_len < sizeof(*src))
+   if (data_len < sizeof(*src)) {
+   kfree(tb);
return -EPROTO;
+   }
 
data += sizeof(*src);
data_len -= sizeof(*src);
-- 
2.11.0

[PATCH AUTOSEL for 4.9 20/56] ath10k: ignore configuring the incorrect board_id

2017-11-14 Thread alexander . levin

From: Ryan Hsu 

[ Upstream commit d2e202c06ca42d353d95df12437740921a6d05b5 ]

When the command to get the board_id from OTP returns the following:

  boot get otp board id result 0x board_id 0 chip_id 0
  boot using board name 'bus=pci,bmi-chip-id=0,bmi-board-id=0"
  ...
  failed to fetch board data for bus=pci,bmi-chip-id=0,bmi-board-id=0 from
  ath10k/QCA6174/hw3.0/board-2.bin

The invalid board_id=0 will be used as index to search in the board-2.bin.

Ignore the case with board_id=0, as it means the otp is not carrying
the board id information.

Signed-off-by: Ryan Hsu 
Signed-off-by: Kalle Valo 
Signed-off-by: Sasha Levin 
---
 drivers/net/wireless/ath/ath10k/core.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 366d3dcb8e9d..7b3017f55e3d 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -691,8 +691,11 @@ static int ath10k_core_get_board_id_from_otp(struct ath10k *ar)
   "boot get otp board id result 0x%08x board_id %d chip_id %d\n",
   result, board_id, chip_id);
 
-   if ((result & ATH10K_BMI_BOARD_ID_STATUS_MASK) != 0)
+   if ((result & ATH10K_BMI_BOARD_ID_STATUS_MASK) != 0 ||
+   (board_id == 0)) {
+   ath10k_warn(ar, "board id is not exist in otp, ignore it\n");
return -EOPNOTSUPP;
+   }
 
ar->id.bmi_ids_valid = true;
ar->id.bmi_board_id = board_id;
-- 
2.11.0

RE: [PATCH v6 3/3] ACPI / PMIC: Add TI PMIC TPS68470 operation region driver

2017-11-14 Thread Mani, Rajmohan

Hi Rafael,

> -Original Message-
> From: Rafael J. Wysocki [mailto:r...@rjwysocki.net]
> Sent: Tuesday, October 03, 2017 4:44 AM
> To: Sakari Ailus 
> Cc: Mani, Rajmohan ; linux-
> ker...@vger.kernel.org; linux-g...@vger.kernel.org; linux-
> a...@vger.kernel.org; Lee Jones ; Linus Walleij
> ; Alexandre Courbot ; Len
> Brown ; Andy Shevchenko 
> Subject: Re: [PATCH v6 3/3] ACPI / PMIC: Add TI PMIC TPS68470 operation
> region driver
> 
> On Monday, October 2, 2017 6:54:03 PM CEST Sakari Ailus wrote:
> > Hi Rafael,
> >
> > On Mon, Aug 14, 2017 at 11:39:00PM +0300, Sakari Ailus wrote:
> > > On Fri, Jul 28, 2017 at 05:30:26PM -0700, Rajmohan Mani wrote:
> > > > The Kabylake platform coreboot (Chrome OS equivalent of
> > > > BIOS) has defined 4 operation regions for the TI TPS68470 PMIC.
> > > > These operation regions are to enable/disable voltage regulators,
> > > > configure voltage regulators, enable/disable clocks and to
> > > > configure clocks.
> > > >
> > > > This config adds ACPI operation region support for TI TPS68470 PMIC.
> > > > TPS68470 device is an advanced power management unit that powers a
> > > > Compact Camera Module (CCM), generates clocks for image sensors,
> > > > drives a dual LED for flash and incorporates two LED drivers for
> > > > general purpose indicators.
> > > > This driver enables ACPI operation region support to control
> > > > voltage regulators and clocks for the TPS68470 PMIC.
> > > >
> > > > Signed-off-by: Rajmohan Mani 
> > >
> > > Acked-by: Sakari Ailus 
> >
> > The other two patches from the set are in v4.14-rc1.
> 
> Thanks!
> 
> I'll queue it up for 4.15 then.
> 

Thanks. This patch landed in the mainline kernel a couple of days ago.

Raj

[PATCH AUTOSEL for 4.9 40/56] drm/i915: Assert no external observers when unwind a failed request alloc

2017-11-14 Thread alexander . levin

From: Chris Wilson 

[ Upstream commit 1618bdb89b5d8b47edf42d9c6ea96ecf001ad625 ]

Before we return the request back to the kmem_cache after a failed
i915_gem_request_alloc(), we should assert that it has not been added to
any global state tracking.

Signed-off-by: Chris Wilson 
Reviewed-by: Joonas Lahtinen 
Link: http://patchwork.freedesktop.org/patch/msgid/20161125131718.20978-2-ch...@chris-wilson.co.uk
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/i915/i915_gem_request.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 8832f8ec1583..71e284d7c640 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -455,6 +455,11 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	return req;
 
 err_ctx:
+	/* Make sure we didn't add ourselves to external state before freeing */
+	GEM_BUG_ON(!list_empty(&req->active_list));
+	GEM_BUG_ON(!list_empty(&req->priotree.signalers_list));
+	GEM_BUG_ON(!list_empty(&req->priotree.waiters_list));
+
 	i915_gem_context_put(ctx);
 err:
 	kmem_cache_free(dev_priv->requests, req);
-- 
2.11.0
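
For readers outside the i915 code, the pattern in this patch (assert that an object is not linked into any shared list before handing it back to the allocator) can be sketched in plain user-space C. This is only an illustration under assumptions: `struct list_node`, `struct request`, `request_is_unobserved()` and `request_free_unobserved()` are hypothetical names, `assert()` stands in for GEM_BUG_ON(), and malloc()/free() stand in for the kmem_cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* A node that points at itself is not linked to anything. */
static void list_node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_node_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Link n right after head. */
static void list_node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink n and return it to the self-pointing state. */
static void list_node_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_node_init(n);
}

/* Hypothetical request with three links, loosely mirroring the patch. */
struct request {
	struct list_node active_link;
	struct list_node signalers;
	struct list_node waiters;
};

static struct request *request_alloc(void)
{
	struct request *rq = malloc(sizeof(*rq));

	if (!rq)
		return NULL;
	list_node_init(&rq->active_link);
	list_node_init(&rq->signalers);
	list_node_init(&rq->waiters);
	return rq;
}

/* True when nothing outside the request can still reach it via a list. */
static bool request_is_unobserved(const struct request *rq)
{
	return list_node_empty(&rq->active_link) &&
	       list_node_empty(&rq->signalers) &&
	       list_node_empty(&rq->waiters);
}

/* Error-path free: refuse to free a request that is still linked. */
static void request_free_unobserved(struct request *rq)
{
	assert(request_is_unobserved(rq));
	free(rq);
}
```

The check costs only a few pointer comparisons, so placing it on the error path documents the invariant without affecting the normal allocation path.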

Re: [PATCH v4 1/1] xdp: Sample xdp program implementing ip forward

2017-11-14 Thread Christina Jacob

On Thu, Nov 9, 2017 at 7:08 AM, Jesper Dangaard Brouer wrote:
> On Wed, 08 Nov 2017 10:40:24 +0900 (KST)
> David Miller  wrote:
>
>> From: Christina Jacob 
>> Date: Sun,  5 Nov 2017 08:52:30 +0530
>>
>> > From: Christina Jacob 
>> >
>> > Implements port to port forwarding with route table and arp table
>> > lookup for ipv4 packets using bpf_redirect helper function and
>> > lpm_trie map.
>> >
>> > Signed-off-by: Christina Jacob 
>>
>> Applied to net-next, thank you.
>
> I've not had time to properly test (and review) this V4 patch, but I
> guess I'll have to do so when I get home from Seoul...
>
> I especially want to measure the effect of using bpf_redirect_map().
> To Christina: what performance improvement did you see on your
> board/arch when switching from bpf_redirect() to bpf_redirect_map()?

ndo_xdp_flush is not yet implemented in our driver, so I don't see any
difference when moving from bpf_redirect() to bpf_redirect_map().

>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

[PATCH AUTOSEL for 4.9 34/56] mac80211: Suppress NEW_PEER_CANDIDATE event if no room

2017-11-14 Thread alexander . levin

From: Masashi Honma 

[ Upstream commit 11197d006bcfabf0173a7820a163fcaac420d10e ]

Previously, the kernel sent the NEW_PEER_CANDIDATE event to userland even
if the found peer had no room to accept another peer. This caused
continuous connection attempts.

Signed-off-by: Masashi Honma 
Signed-off-by: Johannes Berg 
Signed-off-by: Sasha Levin 
---
 net/mac80211/mesh_plink.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c
index 7fcdcf622655..fcba70e57073 100644
--- a/net/mac80211/mesh_plink.c
+++ b/net/mac80211/mesh_plink.c
@@ -505,12 +505,14 @@ mesh_sta_info_alloc(struct ieee80211_sub_if_data *sdata, u8 *addr,
 
 	/* Userspace handles station allocation */
 	if (sdata->u.mesh.user_mpm ||
-	    sdata->u.mesh.security & IEEE80211_MESH_SEC_AUTHED)
-		cfg80211_notify_new_peer_candidate(sdata->dev, addr,
-						   elems->ie_start,
-						   elems->total_len,
-						   GFP_KERNEL);
-	else
+	    sdata->u.mesh.security & IEEE80211_MESH_SEC_AUTHED) {
+		if (mesh_peer_accepts_plinks(elems) &&
+		    mesh_plink_availables(sdata))
+			cfg80211_notify_new_peer_candidate(sdata->dev, addr,
+							   elems->ie_start,
+							   elems->total_len,
+							   GFP_KERNEL);
+	} else
 		sta = __mesh_sta_info_alloc(sdata, addr);
 
 	return sta;
-- 
2.11.0
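
The control flow after this change can be summarised as a small decision function. The sketch below is a user-space model only, not mac80211 code: `enum mesh_action`, `struct mesh_inputs` and `mesh_decide()` are hypothetical names, and the two boolean fields `peer_accepts_plinks` and `plinks_available` simply stand for the results of mesh_peer_accepts_plinks() and mesh_plink_availables() from the diff, whose exact semantics are assumed here from their names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the branch changed by the patch. */
enum mesh_action {
	MESH_ALLOC_IN_KERNEL,	/* else branch: __mesh_sta_info_alloc() */
	MESH_NOTIFY_USERSPACE,	/* NEW_PEER_CANDIDATE event is sent */
	MESH_DO_NOTHING		/* event suppressed, nothing allocated here */
};

/* Hypothetical snapshot of the conditions the diff tests. */
struct mesh_inputs {
	bool user_mpm;			/* sdata->u.mesh.user_mpm */
	bool sec_authed;		/* security & IEEE80211_MESH_SEC_AUTHED */
	bool peer_accepts_plinks;	/* result of mesh_peer_accepts_plinks() */
	bool plinks_available;		/* result of mesh_plink_availables() */
};

static enum mesh_action mesh_decide(const struct mesh_inputs *in)
{
	assert(in != NULL);

	/* Userspace handles station allocation in these modes. */
	if (in->user_mpm || in->sec_authed) {
		/* New in the patch: only notify when both checks pass. */
		if (in->peer_accepts_plinks && in->plinks_available)
			return MESH_NOTIFY_USERSPACE;
		return MESH_DO_NOTHING;
	}

	/* Otherwise the kernel allocates the station itself. */
	return MESH_ALLOC_IN_KERNEL;
}
```

Before the patch, the inner check did not exist, so the first branch always produced the notification; the model makes it easy to see that only the userspace-managed path changes behaviour.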
