Re: [PATCH v2 6/6] staging: lustre: mdc: use large xattr buffers for old servers

2018-05-31 Thread NeilBrown
On Thu, May 31 2018, Dilger, Andreas wrote:

> On May 31, 2018, at 18:54, Greg Kroah-Hartman  
> wrote:
>> 
>> On Tue, May 29, 2018 at 10:21:45AM -0400, James Simmons wrote:
>>> From: "John L. Hammond" 
>>> 
>>> Pre 2.10.1 MDTs will crash when they receive a listxattr (MDS_GETXATTR
>>> with OBD_MD_FLXATTRLS) RPC for an orphan or dead object. So for
>>> clients connected to these older MDTs, try to avoid sending listxattr
>>> RPCs by making the bulk getxattr (MDS_GETXATTR with OBD_MD_FLXATTRALL)
>>> more likely to succeed and thereby reducing the chances of falling
>>> back to listxattr.
>>> 
>>> +#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
>> 
>> Why are you adding pointless version checks to mainline?  Please don't
>> add new ones of these, you need to be working on removing the existing
>> ones.
>
> These are not Linux kernel version checks, but rather Lustre release version
> checks.  This allows us to remove workarounds like this in the future when
> they are no longer needed, rather than accumulating cruft forever.  It's like
> the separation of NFSv2 vs NFSv3 vs NFSv4.

It looks very different to the separation of NFSv{2,3,4}.  Those are
conditionally compiled on a whole-file basis.
If we ever want to remove this code it should be hard to search for
occurances of OBD_OCD_VERSION() in the code, we don't need the C
preprocessor to be able to remove them for us.
In this particular example:

+   if (exp->exp_connect_data.ocd_version < OBD_OCD_VERSION(2, 10, 1, 0))
+   min_buf_size = exp->exp_connect_data.ocd_max_easize;

if you want to be able to compile without that one test, you could arrange
that OBD_OSC_VERSION(2, 10, 1, 0) evaluates to 0.
As ocd_version is unsigned, the comparison will always be false, and
the compiler will optimize the code away.

As a general rule, you need a very good reason to have #if or #ifdef in
.c files.  They are usually OK in .h files.

Thanks,
NeilBrown



signature.asc
Description: PGP signature


Re: [PATCH v2 6/6] staging: lustre: mdc: use large xattr buffers for old servers

2018-05-31 Thread NeilBrown
On Thu, May 31 2018, Dilger, Andreas wrote:

> On May 31, 2018, at 18:54, Greg Kroah-Hartman  
> wrote:
>> 
>> On Tue, May 29, 2018 at 10:21:45AM -0400, James Simmons wrote:
>>> From: "John L. Hammond" 
>>> 
>>> Pre 2.10.1 MDTs will crash when they receive a listxattr (MDS_GETXATTR
>>> with OBD_MD_FLXATTRLS) RPC for an orphan or dead object. So for
>>> clients connected to these older MDTs, try to avoid sending listxattr
>>> RPCs by making the bulk getxattr (MDS_GETXATTR with OBD_MD_FLXATTRALL)
>>> more likely to succeed and thereby reducing the chances of falling
>>> back to listxattr.
>>> 
>>> +#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
>> 
>> Why are you adding pointless version checks to mainline?  Please don't
>> add new ones of these, you need to be working on removing the existing
>> ones.
>
> These are not Linux kernel version checks, but rather Lustre release version
> checks.  This allows us to remove workarounds like this in the future when
> they are no longer needed, rather than accumulating cruft forever.  It's like
> the separation of NFSv2 vs NFSv3 vs NFSv4.

It looks very different to the separation of NFSv{2,3,4}.  Those are
conditionally compiled on a whole-file basis.
If we ever want to remove this code it should be hard to search for
occurances of OBD_OCD_VERSION() in the code, we don't need the C
preprocessor to be able to remove them for us.
In this particular example:

+   if (exp->exp_connect_data.ocd_version < OBD_OCD_VERSION(2, 10, 1, 0))
+   min_buf_size = exp->exp_connect_data.ocd_max_easize;

if you want to be able to compile without that one test, you could arrange
that OBD_OSC_VERSION(2, 10, 1, 0) evaluates to 0.
As ocd_version is unsigned, the comparison will always be false, and
the compiler will optimize the code away.

As a general rule, you need a very good reason to have #if or #ifdef in
.c files.  They are usually OK in .h files.

Thanks,
NeilBrown



signature.asc
Description: PGP signature


How to obtain file's inode in ufshcd driver

2018-05-31 Thread Roman Storozhenko
Hello everybody.

I am modifying ufshcd driver:
https://elixir.bootlin.com/linux/v3.8/source/drivers/scsi/ufs/ufshcd.c

I do a read of a file on ext4 filesystem and want to obtain the file's
inode. But it seems that there are no structures contatin any
references to inode at this level of abstraction.
I looked into scsi_cmd structure and all the structures it references
to. Also I used ftrace to find on which level of abstraction inode
disappear but without any luck. Could you please say to me where I
could find the lowest level where inode is still present and the best
way how to pass it throughout linux io stack to the ufshcd driver?
Frankly speaking I need not struct inode as a whole but just its
number, so using reserved or unused fields of existing structures is
appropriate.

-- 
Kind regards,
Roman


How to obtain file's inode in ufshcd driver

2018-05-31 Thread Roman Storozhenko
Hello everybody.

I am modifying ufshcd driver:
https://elixir.bootlin.com/linux/v3.8/source/drivers/scsi/ufs/ufshcd.c

I do a read of a file on ext4 filesystem and want to obtain the file's
inode. But it seems that there are no structures contatin any
references to inode at this level of abstraction.
I looked into scsi_cmd structure and all the structures it references
to. Also I used ftrace to find on which level of abstraction inode
disappear but without any luck. Could you please say to me where I
could find the lowest level where inode is still present and the best
way how to pass it throughout linux io stack to the ufshcd driver?
Frankly speaking I need not struct inode as a whole but just its
number, so using reserved or unused fields of existing structures is
appropriate.

-- 
Kind regards,
Roman


[PATCH] ASoC: AMD: Configure channel 1 or channel 0 for capture

2018-05-31 Thread Akshu Agrawal
ST/CZ SoC have 2 channels for capture in the I2SSP path.
The DMA though these channels is done using the same dma
descriptors.
We configure the channel and enable it on the basis of
channel selected by machine driver. Machine driver knows
which codec sits on which channel and thus sends the information
to dma driver.

Signed-off-by: Akshu Agrawal 
---
This patch is dependent on ASoC: AMD: Change codec to channel link as per 
hardware redesign
https://patchwork.kernel.org/patch/10388099/

 sound/soc/amd/acp-da7219-max98357a.c | 43 +---
 sound/soc/amd/acp-pcm-dma.c  | 79 +++-
 sound/soc/amd/acp.h  |  4 ++
 3 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/sound/soc/amd/acp-da7219-max98357a.c 
b/sound/soc/amd/acp-da7219-max98357a.c
index 566bd26..f42606e 100644
--- a/sound/soc/amd/acp-da7219-max98357a.c
+++ b/sound/soc/amd/acp-da7219-max98357a.c
@@ -149,6 +149,7 @@ static int cz_da7219_startup(struct snd_pcm_substream 
*substream)
   _rates);
 
machine->i2s_instance = I2S_SP_INSTANCE;
+   machine->capture_channel = CAP_CHANNEL1;
return da7219_clk_enable(substream);
 }
 
@@ -172,7 +173,7 @@ static void cz_max_shutdown(struct snd_pcm_substream 
*substream)
da7219_clk_disable();
 }
 
-static int cz_dmic_startup(struct snd_pcm_substream *substream)
+static int cz_dmic0_startup(struct snd_pcm_substream *substream)
 {
struct snd_soc_pcm_runtime *rtd = substream->private_data;
struct snd_soc_card *card = rtd->card;
@@ -182,6 +183,17 @@ static int cz_dmic_startup(struct snd_pcm_substream 
*substream)
return da7219_clk_enable(substream);
 }
 
+static int cz_dmic1_startup(struct snd_pcm_substream *substream)
+{
+   struct snd_soc_pcm_runtime *rtd = substream->private_data;
+   struct snd_soc_card *card = rtd->card;
+   struct acp_platform_info *machine = snd_soc_card_get_drvdata(card);
+
+   machine->i2s_instance = I2S_SP_INSTANCE;
+   machine->capture_channel = CAP_CHANNEL0;
+   return da7219_clk_enable(substream);
+}
+
 static void cz_dmic_shutdown(struct snd_pcm_substream *substream)
 {
da7219_clk_disable();
@@ -197,8 +209,13 @@ static void cz_dmic_shutdown(struct snd_pcm_substream 
*substream)
.shutdown = cz_max_shutdown,
 };
 
-static const struct snd_soc_ops cz_dmic_cap_ops = {
-   .startup = cz_dmic_startup,
+static const struct snd_soc_ops cz_dmic0_cap_ops = {
+   .startup = cz_dmic0_startup,
+   .shutdown = cz_dmic_shutdown,
+};
+
+static const struct snd_soc_ops cz_dmic1_cap_ops = {
+   .startup = cz_dmic1_startup,
.shutdown = cz_dmic_shutdown,
 };
 
@@ -241,8 +258,9 @@ static void cz_dmic_shutdown(struct snd_pcm_substream 
*substream)
.ops = _max_play_ops,
},
{
-   .name = "dmic",
-   .stream_name = "DMIC Capture",
+   /* C panel DMIC */
+   .name = "dmic0",
+   .stream_name = "DMIC0 Capture",
.platform_name = "acp_audio_dma.0.auto",
.cpu_dai_name = "designware-i2s.3.auto",
.codec_dai_name = "adau7002-hifi",
@@ -250,7 +268,20 @@ static void cz_dmic_shutdown(struct snd_pcm_substream 
*substream)
.dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF
| SND_SOC_DAIFMT_CBM_CFM,
.dpcm_capture = 1,
-   .ops = _dmic_cap_ops,
+   .ops = _dmic0_cap_ops,
+   },
+   {
+   /* A/B panel DMIC */
+   .name = "dmic1",
+   .stream_name = "DMIC1 Capture",
+   .platform_name = "acp_audio_dma.0.auto",
+   .cpu_dai_name = "designware-i2s.2.auto",
+   .codec_dai_name = "adau7002-hifi",
+   .codec_name = "ADAU7002:00",
+   .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF
+   | SND_SOC_DAIFMT_CBM_CFM,
+   .dpcm_capture = 1,
+   .ops = _dmic1_cap_ops,
},
 };
 
diff --git a/sound/soc/amd/acp-pcm-dma.c b/sound/soc/amd/acp-pcm-dma.c
index 7720384..ad85354 100644
--- a/sound/soc/amd/acp-pcm-dma.c
+++ b/sound/soc/amd/acp-pcm-dma.c
@@ -336,6 +336,68 @@ static void config_acp_dma(void __iomem *acp_mmio,
   rtd->dma_dscr_idx_2, asic_type);
 }
 
+static void acp_dma_cap_channel(void __iomem *acp_mmio, u16 cap_channel,
+   bool enable)
+{
+   u32 val;
+
+   if (enable) {
+   switch (cap_channel) {
+   case CAP_CHANNEL1:
+   val = acp_reg_read(acp_mmio,
+  mmACP_I2S_16BIT_RESOLUTION_EN);
+   if (val & ACP_I2S_MIC_16BIT_RESOLUTION_EN) {
+   acp_reg_write(0x0, acp_mmio,
+ 

[PATCH] ASoC: AMD: Configure channel 1 or channel 0 for capture

2018-05-31 Thread Akshu Agrawal
ST/CZ SoC have 2 channels for capture in the I2SSP path.
The DMA though these channels is done using the same dma
descriptors.
We configure the channel and enable it on the basis of
channel selected by machine driver. Machine driver knows
which codec sits on which channel and thus sends the information
to dma driver.

Signed-off-by: Akshu Agrawal 
---
This patch is dependent on ASoC: AMD: Change codec to channel link as per 
hardware redesign
https://patchwork.kernel.org/patch/10388099/

 sound/soc/amd/acp-da7219-max98357a.c | 43 +---
 sound/soc/amd/acp-pcm-dma.c  | 79 +++-
 sound/soc/amd/acp.h  |  4 ++
 3 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/sound/soc/amd/acp-da7219-max98357a.c 
b/sound/soc/amd/acp-da7219-max98357a.c
index 566bd26..f42606e 100644
--- a/sound/soc/amd/acp-da7219-max98357a.c
+++ b/sound/soc/amd/acp-da7219-max98357a.c
@@ -149,6 +149,7 @@ static int cz_da7219_startup(struct snd_pcm_substream 
*substream)
   _rates);
 
machine->i2s_instance = I2S_SP_INSTANCE;
+   machine->capture_channel = CAP_CHANNEL1;
return da7219_clk_enable(substream);
 }
 
@@ -172,7 +173,7 @@ static void cz_max_shutdown(struct snd_pcm_substream 
*substream)
da7219_clk_disable();
 }
 
-static int cz_dmic_startup(struct snd_pcm_substream *substream)
+static int cz_dmic0_startup(struct snd_pcm_substream *substream)
 {
struct snd_soc_pcm_runtime *rtd = substream->private_data;
struct snd_soc_card *card = rtd->card;
@@ -182,6 +183,17 @@ static int cz_dmic_startup(struct snd_pcm_substream 
*substream)
return da7219_clk_enable(substream);
 }
 
+static int cz_dmic1_startup(struct snd_pcm_substream *substream)
+{
+   struct snd_soc_pcm_runtime *rtd = substream->private_data;
+   struct snd_soc_card *card = rtd->card;
+   struct acp_platform_info *machine = snd_soc_card_get_drvdata(card);
+
+   machine->i2s_instance = I2S_SP_INSTANCE;
+   machine->capture_channel = CAP_CHANNEL0;
+   return da7219_clk_enable(substream);
+}
+
 static void cz_dmic_shutdown(struct snd_pcm_substream *substream)
 {
da7219_clk_disable();
@@ -197,8 +209,13 @@ static void cz_dmic_shutdown(struct snd_pcm_substream 
*substream)
.shutdown = cz_max_shutdown,
 };
 
-static const struct snd_soc_ops cz_dmic_cap_ops = {
-   .startup = cz_dmic_startup,
+static const struct snd_soc_ops cz_dmic0_cap_ops = {
+   .startup = cz_dmic0_startup,
+   .shutdown = cz_dmic_shutdown,
+};
+
+static const struct snd_soc_ops cz_dmic1_cap_ops = {
+   .startup = cz_dmic1_startup,
.shutdown = cz_dmic_shutdown,
 };
 
@@ -241,8 +258,9 @@ static void cz_dmic_shutdown(struct snd_pcm_substream 
*substream)
.ops = _max_play_ops,
},
{
-   .name = "dmic",
-   .stream_name = "DMIC Capture",
+   /* C panel DMIC */
+   .name = "dmic0",
+   .stream_name = "DMIC0 Capture",
.platform_name = "acp_audio_dma.0.auto",
.cpu_dai_name = "designware-i2s.3.auto",
.codec_dai_name = "adau7002-hifi",
@@ -250,7 +268,20 @@ static void cz_dmic_shutdown(struct snd_pcm_substream 
*substream)
.dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF
| SND_SOC_DAIFMT_CBM_CFM,
.dpcm_capture = 1,
-   .ops = _dmic_cap_ops,
+   .ops = _dmic0_cap_ops,
+   },
+   {
+   /* A/B panel DMIC */
+   .name = "dmic1",
+   .stream_name = "DMIC1 Capture",
+   .platform_name = "acp_audio_dma.0.auto",
+   .cpu_dai_name = "designware-i2s.2.auto",
+   .codec_dai_name = "adau7002-hifi",
+   .codec_name = "ADAU7002:00",
+   .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF
+   | SND_SOC_DAIFMT_CBM_CFM,
+   .dpcm_capture = 1,
+   .ops = _dmic1_cap_ops,
},
 };
 
diff --git a/sound/soc/amd/acp-pcm-dma.c b/sound/soc/amd/acp-pcm-dma.c
index 7720384..ad85354 100644
--- a/sound/soc/amd/acp-pcm-dma.c
+++ b/sound/soc/amd/acp-pcm-dma.c
@@ -336,6 +336,68 @@ static void config_acp_dma(void __iomem *acp_mmio,
   rtd->dma_dscr_idx_2, asic_type);
 }
 
+static void acp_dma_cap_channel(void __iomem *acp_mmio, u16 cap_channel,
+   bool enable)
+{
+   u32 val;
+
+   if (enable) {
+   switch (cap_channel) {
+   case CAP_CHANNEL1:
+   val = acp_reg_read(acp_mmio,
+  mmACP_I2S_16BIT_RESOLUTION_EN);
+   if (val & ACP_I2S_MIC_16BIT_RESOLUTION_EN) {
+   acp_reg_write(0x0, acp_mmio,
+ 

RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-05-31 Thread Hoeun Ryu


> -Original Message-
> From: Petr Mladek [mailto:pmla...@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky 
> Cc: Hoeun Ryu ; Sergey Senozhatsky
> ; Steven Rostedt ;
> Hoeun Ryu ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > >  Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For
> example the
> > > function is called in watchdog_overflow_callback() if the flag of
> hardlockup
> > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace()
> eventually tries
> > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> already locked in
> > > preempted contexts (task or irq in this case) or by other CPUs and it
> may cause
> 
> The sentence "logbuf_lock can be already locked in preempted contexts"
> does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.
> 

I'd like to say that the preempting context is NMI,
so the preempted contexts could be task/irq/bh contexts on the same CPU.

> I would change it to something like:
> 
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> tries
> to lock logbuf_lock in vprintk_emit() that might be already be part
> of a soft- or hard-lockup on another CPU."
> 

It looks more clear.
But I'd modify "be part of a soft- or hard-lockup on another CPU." to
"be part of another non-nmi context on the same CPU or a soft- or
hard-lockup on another CPU."

How about it?

> 
> > > deadlocks.
> > >  By making printk_safe_flush() safe in NMI context, the backtrace
> triggering CPU
> > > just skips flushing if the lock is not avaiable in NMI context. The
> messages in
> > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the
> CPU is in
> > > hard lockup (because irq is disabled here) but if panic() is not
> called. The
> > > flushing can be delayed by the next irq work in normal cases.
> 
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of loosing the messages.
> 
> I see it the following way:
> 
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
> 
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just loose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
> 
> 
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
> 
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
> 
> The patch itself looks good to me.
> 
> Best Regards,
> Petr



RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-05-31 Thread Hoeun Ryu


> -Original Message-
> From: Petr Mladek [mailto:pmla...@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky 
> Cc: Hoeun Ryu ; Sergey Senozhatsky
> ; Steven Rostedt ;
> Hoeun Ryu ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > >  Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For
> example the
> > > function is called in watchdog_overflow_callback() if the flag of
> hardlockup
> > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace()
> eventually tries
> > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> already locked in
> > > preempted contexts (task or irq in this case) or by other CPUs and it
> may cause
> 
> The sentence "logbuf_lock can be already locked in preempted contexts"
> does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.
> 

I'd like to say that the preempting context is NMI,
so the preempted contexts could be task/irq/bh contexts on the same CPU.

> I would change it to something like:
> 
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> tries
> to lock logbuf_lock in vprintk_emit() that might be already be part
> of a soft- or hard-lockup on another CPU."
> 

It looks more clear.
But I'd modify "be part of a soft- or hard-lockup on another CPU." to
"be part of another non-nmi context on the same CPU or a soft- or
hard-lockup on another CPU."

How about it?

> 
> > > deadlocks.
> > >  By making printk_safe_flush() safe in NMI context, the backtrace
> triggering CPU
> > > just skips flushing if the lock is not avaiable in NMI context. The
> messages in
> > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the
> CPU is in
> > > hard lockup (because irq is disabled here) but if panic() is not
> called. The
> > > flushing can be delayed by the next irq work in normal cases.
> 
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of loosing the messages.
> 
> I see it the following way:
> 
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
> 
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just loose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
> 
> 
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
> 
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
> 
> The patch itself looks good to me.
> 
> Best Regards,
> Petr



Re: [PATCH 20/32] vfs: Make close() unmount the attached mount if so flagged [ver #8]

2018-05-31 Thread Al Viro
On Fri, Jun 01, 2018 at 04:18:29AM +0100, Al Viro wrote:

> +void umount_on_fput(struct vfsmount *mnt)
>  {

 which needs to do mntput() in case if it's attached.


Re: [PATCH 20/32] vfs: Make close() unmount the attached mount if so flagged [ver #8]

2018-05-31 Thread Al Viro
On Fri, Jun 01, 2018 at 04:18:29AM +0100, Al Viro wrote:

> +void umount_on_fput(struct vfsmount *mnt)
>  {

 which needs to do mntput() in case if it's attached.


Re: [PATCH v2 0/2] slimbus: Add QCOM SLIMBus NGD driver

2018-05-31 Thread Vinod
On 25-05-18, 12:15, Srinivas Kandagatla wrote:
> This patchset adds support to basic version of Qualcomm NGD SLIMBus
> controller driver found SoCs from B family.
> 
> This controller is light-weight SLIMBus controller driver responsible for
> communicating with slave HW directly over the bus using messaging
> interface, and communicating with master component residing on ADSP
> for bandwidth and data-channel management.
> 
> Tested this patchset on DB820c with WCD9335 codec.
> I have pushed my working branch to [1] incase someone want to try.
> 
> This patch has dependency on https://lkml.org/lkml/2018/5/17/251

Looks good to me know, FWIW:

Reviewed-by: Vinod Koul 

-- 
~Vinod


Re: [PATCH v2 0/2] slimbus: Add QCOM SLIMBus NGD driver

2018-05-31 Thread Vinod
On 25-05-18, 12:15, Srinivas Kandagatla wrote:
> This patchset adds support to basic version of Qualcomm NGD SLIMBus
> controller driver found SoCs from B family.
> 
> This controller is light-weight SLIMBus controller driver responsible for
> communicating with slave HW directly over the bus using messaging
> interface, and communicating with master component residing on ADSP
> for bandwidth and data-channel management.
> 
> Tested this patchset on DB820c with WCD9335 codec.
> I have pushed my working branch to [1] incase someone want to try.
> 
> This patch has dependency on https://lkml.org/lkml/2018/5/17/251

Looks good to me know, FWIW:

Reviewed-by: Vinod Koul 

-- 
~Vinod


[PATCH V6 29/38] x86/intel_rdt: Pseudo-lock region creation/removal core

2018-05-31 Thread Reinette Chatre
The user requests a pseudo-locked region by providing a schemata to a
resource group that is in the pseudo-locksetup mode. This is the
functionality that consumes the parsed user data and creates the
pseudo-locked region.

First, required information is deduced from user provided data.
This includes, how much memory does the requested bitmask represent,
which CPU the requested region is associated with, and what is the
cache line size of that cache (to learn the stride needed for locking).
Second, a contiguous block of memory matching the requested bitmask is
allocated.

Finally, pseudo-locking is performed. The resource group already has the
allocation that reflects the requested bitmask. With this class of service
active and interference minimized, the allocated memory is loaded into the
cache.

Signed-off-by: Reinette Chatre 
---
V6: Obtain the cache line size information from cache information
associated with CPU associated with pseudo-locked region instead
of cpu 0. While the result is expected to always be the same between
the cpus is now sure and reflects the intention of the code.

 arch/x86/kernel/cpu/intel_rdt.h |  17 ++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 336 
 2 files changed, 353 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 119645c83e27..886cd28b305f 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -129,11 +129,26 @@ struct mongroup {
  * @d: RDT domain to which this pseudo-locked region
  * belongs
  * @cbm:   bitmask of the pseudo-locked region
+ * @lock_thread_wq:waitqueue used to wait on the pseudo-locking thread
+ * completion
+ * @thread_done:   variable used by waitqueue to test if pseudo-locking
+ * thread completed
+ * @cpu:   core associated with the cache on which the setup code
+ * will be run
+ * @line_size: size of the cache lines
+ * @size:  size of pseudo-locked region in bytes
+ * @kmem:  the kernel memory associated with pseudo-locked region
  */
 struct pseudo_lock_region {
struct rdt_resource *r;
struct rdt_domain   *d;
u32 cbm;
+   wait_queue_head_t   lock_thread_wq;
+   int thread_done;
+   int cpu;
+   unsigned intline_size;
+   unsigned intsize;
+   void*kmem;
 };
 
 /**
@@ -505,6 +520,8 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
 int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
 bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm);
 bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
+void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
 void closid_free(int closid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c 
b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 310c67b12a63..27c06695e7c9 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -11,8 +11,14 @@
 
 #define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
 #include 
+#include 
 #include "intel_rdt.h"
 
 /*
@@ -80,6 +86,53 @@ static u64 get_prefetch_disable_bits(void)
 }
 
 /**
+ * pseudo_lock_region_init - Initialize pseudo-lock region information
+ * @plr: pseudo-lock region
+ *
+ * Called after user provided a schemata to be pseudo-locked. From the
+ * schemata the  pseudo_lock_region is on entry already initialized
+ * with the resource, domain, and capacity bitmask. Here the information
+ * required for pseudo-locking is deduced from this data and 
+ * pseudo_lock_region initialized further. This information includes:
+ * - size in bytes of the region to be pseudo-locked
+ * - cache line size to know the stride with which data needs to be accessed
+ *   to be pseudo-locked
+ * - a cpu associated with the cache instance on which the pseudo-locking
+ *   flow can be executed
+ *
+ * Return: 0 on success, <0 on failure. Descriptive error will be written
+ * to last_cmd_status buffer.
+ */
+static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
+{
+   struct cpu_cacheinfo *ci;
+   int i;
+
+   /* Pick the first cpu we find that is associated with the cache. */
+   plr->cpu = cpumask_first(>d->cpu_mask);
+
+   if (!cpu_online(plr->cpu)) {
+   rdt_last_cmd_printf("cpu %u associated with cache not online\n",
+   plr->cpu);
+   return -ENODEV;
+   }
+
+   ci = 

[PATCH V6 29/38] x86/intel_rdt: Pseudo-lock region creation/removal core

2018-05-31 Thread Reinette Chatre
The user requests a pseudo-locked region by providing a schemata to a
resource group that is in the pseudo-locksetup mode. This is the
functionality that consumes the parsed user data and creates the
pseudo-locked region.

First, required information is deduced from user provided data.
This includes, how much memory does the requested bitmask represent,
which CPU the requested region is associated with, and what is the
cache line size of that cache (to learn the stride needed for locking).
Second, a contiguous block of memory matching the requested bitmask is
allocated.

Finally, pseudo-locking is performed. The resource group already has the
allocation that reflects the requested bitmask. With this class of service
active and interference minimized, the allocated memory is loaded into the
cache.

Signed-off-by: Reinette Chatre 
---
V6: Obtain the cache line size information from cache information
associated with CPU associated with pseudo-locked region instead
of cpu 0. While the result is expected to always be the same between
the cpus is now sure and reflects the intention of the code.

 arch/x86/kernel/cpu/intel_rdt.h |  17 ++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 336 
 2 files changed, 353 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 119645c83e27..886cd28b305f 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -129,11 +129,26 @@ struct mongroup {
  * @d: RDT domain to which this pseudo-locked region
  * belongs
  * @cbm:   bitmask of the pseudo-locked region
+ * @lock_thread_wq:waitqueue used to wait on the pseudo-locking thread
+ * completion
+ * @thread_done:   variable used by waitqueue to test if pseudo-locking
+ * thread completed
+ * @cpu:   core associated with the cache on which the setup code
+ * will be run
+ * @line_size: size of the cache lines
+ * @size:  size of pseudo-locked region in bytes
+ * @kmem:  the kernel memory associated with pseudo-locked region
  */
 struct pseudo_lock_region {
struct rdt_resource *r;
struct rdt_domain   *d;
u32 cbm;
+   wait_queue_head_t   lock_thread_wq;
+   int thread_done;
+   int cpu;
+   unsigned intline_size;
+   unsigned intsize;
+   void*kmem;
 };
 
 /**
@@ -505,6 +520,8 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
 int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
 bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm);
 bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
+void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
 void closid_free(int closid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c 
b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 310c67b12a63..27c06695e7c9 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -11,8 +11,14 @@
 
 #define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
 #include 
+#include 
 #include "intel_rdt.h"
 
 /*
@@ -80,6 +86,53 @@ static u64 get_prefetch_disable_bits(void)
 }
 
 /**
+ * pseudo_lock_region_init - Initialize pseudo-lock region information
+ * @plr: pseudo-lock region
+ *
+ * Called after user provided a schemata to be pseudo-locked. From the
+ * schemata the  pseudo_lock_region is on entry already initialized
+ * with the resource, domain, and capacity bitmask. Here the information
+ * required for pseudo-locking is deduced from this data and 
+ * pseudo_lock_region initialized further. This information includes:
+ * - size in bytes of the region to be pseudo-locked
+ * - cache line size to know the stride with which data needs to be accessed
+ *   to be pseudo-locked
+ * - a cpu associated with the cache instance on which the pseudo-locking
+ *   flow can be executed
+ *
+ * Return: 0 on success, <0 on failure. Descriptive error will be written
+ * to last_cmd_status buffer.
+ */
+static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
+{
+   struct cpu_cacheinfo *ci;
+   int i;
+
+   /* Pick the first cpu we find that is associated with the cache. */
+   plr->cpu = cpumask_first(>d->cpu_mask);
+
+   if (!cpu_online(plr->cpu)) {
+   rdt_last_cmd_printf("cpu %u associated with cache not online\n",
+   plr->cpu);
+   return -ENODEV;
+   }
+
+   ci = 

Re: [PATCH v2 19/21] bcache: use match_string() helper

2018-05-31 Thread Coly Li
On 2018/6/1 12:32 PM, Yisheng Xie wrote:
> Hi Coly,
> 
> On 2018/6/1 11:45, Coly Li wrote:
>> On 2018/5/31 7:11 PM, Yisheng Xie wrote:
>>> match_string() returns the index of an array for a matching string,
>>> which can be used instead of open coded variant.
>>>
>>> Cc: Kent Overstreet 
>>> Cc: linux-bca...@vger.kernel.org 
>>> Signed-off-by: Yisheng Xie 
>>
>> Hi Yishenng,
>>
>> Andy Shevchenko  submitted a patch to
>> replace the whole bch_read_string_list() with __sysfs_match_string().
>> And this patch is applied in Jens' block tree, will go into mainline
>> kernel in v4.18.
>>
>> If you search bcache mailing list, you may find a patch named with
>> "bcache: Replace bch_read_string_list() by __sysfs_match_string()".
>>
>> That means this patch will conflict with existing changes.
> 
> Get it, and thanks for this information.
> 
> Sorry Andy, for doing this once more.
> 
Hi Yisheng,

Thank you for the effort, hope to see more bcache patches from you :-)

Coly Li


Re: [PATCH v2 19/21] bcache: use match_string() helper

2018-05-31 Thread Coly Li
On 2018/6/1 12:32 PM, Yisheng Xie wrote:
> Hi Coly,
> 
> On 2018/6/1 11:45, Coly Li wrote:
>> On 2018/5/31 7:11 PM, Yisheng Xie wrote:
>>> match_string() returns the index of an array for a matching string,
>>> which can be used instead of open coded variant.
>>>
>>> Cc: Kent Overstreet 
>>> Cc: linux-bca...@vger.kernel.org 
>>> Signed-off-by: Yisheng Xie 
>>
>> Hi Yishenng,
>>
>> Andy Shevchenko  submitted a patch to
>> replace the whole bch_read_string_list() with __sysfs_match_string().
>> And this patch is applied in Jens' block tree, will go into mainline
>> kernel in v4.18.
>>
>> If you search bcache mailing list, you may find a patch named with
>> "bcache: Replace bch_read_string_list() by __sysfs_match_string()".
>>
>> That means this patch will conflict with existing changes.
> 
> Get it, and thanks for this information.
> 
> Sorry Andy, for doing this once more.
> 
Hi Yisheng,

Thank you for the effort, hope to see more bcache patches from you :-)

Coly Li


Re: [PATCH] softirq: reorder trace_softirqs_on to prevent lockdep splat

2018-05-31 Thread Joel Fernandes
On Mon, May 07, 2018 at 02:21:17PM -0400, Steven Rostedt wrote:
> Peter, Ingo or Thomas,
> 
> Can you queue this up? Or would you prefer that I take it?

This patch is a bug fix, could it be queued for v4.17 -rc cycle or for
linux-next?

I have also included it below again:

thanks,

 - Joel

---8<---

From: "Joel Fernandes (Google)" 
Date: Tue, 1 May 2018 18:44:39 -0700
Subject: [PATCH] softirq: reorder trace_softirqs_on to prevent lockdep splat

I'm able to reproduce a lockdep splat with config options:
CONFIG_PROVE_LOCKING=y,
CONFIG_DEBUG_LOCK_ALLOC=y and
CONFIG_PREEMPTIRQ_EVENTS=y

$ echo 1 > /d/tracing/events/preemptirq/preempt_enable/enable

[   26.112609] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[   26.112636] WARNING: CPU: 0 PID: 118 at kernel/locking/lockdep.c:3854
[...]
[   26.144229] Call Trace:
[   26.144926]  
[   26.145506]  lock_acquire+0x55/0x1b0
[   26.146499]  ? __do_softirq+0x46f/0x4d9
[   26.147571]  ? __do_softirq+0x46f/0x4d9
[   26.148646]  trace_preempt_on+0x8f/0x240
[   26.149744]  ? trace_preempt_on+0x4d/0x240
[   26.150862]  ? __do_softirq+0x46f/0x4d9
[   26.151930]  preempt_count_sub+0x18a/0x1a0
[   26.152985]  __do_softirq+0x46f/0x4d9
[   26.153937]  irq_exit+0x68/0xe0
[   26.154755]  smp_apic_timer_interrupt+0x271/0x280
[   26.156056]  apic_timer_interrupt+0xf/0x20
[   26.157105]  

The issue was this:

preempt_count = 1 << SOFTIRQ_SHIFT

__local_bh_enable(cnt = 1 << SOFTIRQ_SHIFT) {
if (softirq_count() == (cnt && SOFTIRQ_MASK)) {
trace_softirqs_on() {
current->softirqs_enabled = 1;
}
}
preempt_count_sub(cnt) {
trace_preempt_on() {
tracepoint() {
rcu_read_lock_sched() {
// jumps into lockdep

Where preempt_count still has softirqs disabled, but
current->softirqs_enabled is true, and we get a splat.

Cc: sta...@vger.kernel.org
Reviewed-by: Steven Rostedt (VMware) 
Fixes: d59158162e032 ("tracing: Add support for preempt and irq enable/disable 
events")
Signed-off-by: Joel Fernandes (Google) 
---
 kernel/softirq.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 177de3640c78..8a040bcaa033 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -139,9 +139,13 @@ static void __local_bh_enable(unsigned int cnt)
 {
lockdep_assert_irqs_disabled();
 
+   if (preempt_count() == cnt)
+   trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+
if (softirq_count() == (cnt & SOFTIRQ_MASK))
trace_softirqs_on(_RET_IP_);
-   preempt_count_sub(cnt);
+
+   __preempt_count_sub(cnt);
 }
 
 /*
-- 
2.17.0.921.gf22659ad46-goog



Re: [PATCH] softirq: reorder trace_softirqs_on to prevent lockdep splat

2018-05-31 Thread Joel Fernandes
On Mon, May 07, 2018 at 02:21:17PM -0400, Steven Rostedt wrote:
> Peter, Ingo or Thomas,
> 
> Can you queue this up? Or would you prefer that I take it?

This patch is a bug fix, could it be queued for v4.17 -rc cycle or for
linux-next?

I have also included it below again:

thanks,

 - Joel

---8<---

From: "Joel Fernandes (Google)" 
Date: Tue, 1 May 2018 18:44:39 -0700
Subject: [PATCH] softirq: reorder trace_softirqs_on to prevent lockdep splat

I'm able to reproduce a lockdep splat with config options:
CONFIG_PROVE_LOCKING=y,
CONFIG_DEBUG_LOCK_ALLOC=y and
CONFIG_PREEMPTIRQ_EVENTS=y

$ echo 1 > /d/tracing/events/preemptirq/preempt_enable/enable

[   26.112609] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[   26.112636] WARNING: CPU: 0 PID: 118 at kernel/locking/lockdep.c:3854
[...]
[   26.144229] Call Trace:
[   26.144926]  
[   26.145506]  lock_acquire+0x55/0x1b0
[   26.146499]  ? __do_softirq+0x46f/0x4d9
[   26.147571]  ? __do_softirq+0x46f/0x4d9
[   26.148646]  trace_preempt_on+0x8f/0x240
[   26.149744]  ? trace_preempt_on+0x4d/0x240
[   26.150862]  ? __do_softirq+0x46f/0x4d9
[   26.151930]  preempt_count_sub+0x18a/0x1a0
[   26.152985]  __do_softirq+0x46f/0x4d9
[   26.153937]  irq_exit+0x68/0xe0
[   26.154755]  smp_apic_timer_interrupt+0x271/0x280
[   26.156056]  apic_timer_interrupt+0xf/0x20
[   26.157105]  

The issue was this:

preempt_count = 1 << SOFTIRQ_SHIFT

__local_bh_enable(cnt = 1 << SOFTIRQ_SHIFT) {
if (softirq_count() == (cnt && SOFTIRQ_MASK)) {
trace_softirqs_on() {
current->softirqs_enabled = 1;
}
}
preempt_count_sub(cnt) {
trace_preempt_on() {
tracepoint() {
rcu_read_lock_sched() {
// jumps into lockdep

Where preempt_count still has softirqs disabled, but
current->softirqs_enabled is true, and we get a splat.

Cc: sta...@vger.kernel.org
Reviewed-by: Steven Rostedt (VMware) 
Fixes: d59158162e032 ("tracing: Add support for preempt and irq enable/disable 
events")
Signed-off-by: Joel Fernandes (Google) 
---
 kernel/softirq.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 177de3640c78..8a040bcaa033 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -139,9 +139,13 @@ static void __local_bh_enable(unsigned int cnt)
 {
lockdep_assert_irqs_disabled();
 
+   if (preempt_count() == cnt)
+   trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+
if (softirq_count() == (cnt & SOFTIRQ_MASK))
trace_softirqs_on(_RET_IP_);
-   preempt_count_sub(cnt);
+
+   __preempt_count_sub(cnt);
 }
 
 /*
-- 
2.17.0.921.gf22659ad46-goog



[PATCH] x86/microcode/intel: Fix memleak in save_microcode_patch

2018-05-31 Thread Zhenzhong Duan
Free useless ucode_patch entry when it's replaced.

Signed-off-by: Zhenzhong Duan 
---
 arch/x86/kernel/cpu/microcode/intel.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c 
b/arch/x86/kernel/cpu/microcode/intel.c
index 1c2cfa0..461e315 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -150,6 +150,12 @@ static bool microcode_matches(struct 
microcode_header_intel *mc_header,
return false;
 }
 
+static void memfree_patch(struct ucode_patch *p)
+{
+   kfree(p->data);
+   kfree(p);
+}
+
 static struct ucode_patch *memdup_patch(void *data, unsigned int size)
 {
struct ucode_patch *p;
@@ -190,8 +196,10 @@ static void save_microcode_patch(void *data, unsigned int 
size)
p = memdup_patch(data, size);
if (!p)
pr_err("Error allocating buffer %p\n", data);
-   else
+   else {
list_replace(>plist, >plist);
+   memfree_patch(iter);
+   }
}
}
 
-- 
1.7.1


[PATCH] gcc-plugins: disable GCC_PLUGIN_STRUCTLEAK_BYREF_ALL for COMPILE_TEST

2018-05-31 Thread Masahiro Yamada
We have enabled GCC_PLUGINS for COMPILE_TEST, but allmodconfig now
produces new warnings.

  CC [M]  drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.o
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
‘wlc_phy_workarounds_nphy_rev7’:
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16563:1: warning: 
the frame size of 3128 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
‘wlc_phy_workarounds_nphy_rev3’:
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16905:1: warning: 
the frame size of 2800 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
‘wlc_phy_cal_txiqlo_nphy’:
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:26033:1: warning: 
the frame size of 2488 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^

It looks like GCC_PLUGIN_STRUCTLEAK_BYREF_ALL is causing this.
Add "depends on !COMPILE_TEST" to not dirturb the compile test.

Reported-by: Stephen Rothwell 
Suggested-by: Kees Cook 
Signed-off-by: Masahiro Yamada 
---

 arch/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index e9475d0..e5ff804 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -489,6 +489,7 @@ config GCC_PLUGIN_STRUCTLEAK
 config GCC_PLUGIN_STRUCTLEAK_BYREF_ALL
bool "Force initialize all struct type variables passed by reference"
depends on GCC_PLUGIN_STRUCTLEAK
+   depends on !COMPILE_TEST
help
  Zero initialize any struct type local variable that may be passed by
  reference without having been initialized.
-- 
2.7.4



[PATCH] x86/microcode/intel: Fix memleak in save_microcode_patch

2018-05-31 Thread Zhenzhong Duan
Free useless ucode_patch entry when it's replaced.

Signed-off-by: Zhenzhong Duan 
---
 arch/x86/kernel/cpu/microcode/intel.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c 
b/arch/x86/kernel/cpu/microcode/intel.c
index 1c2cfa0..461e315 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -150,6 +150,12 @@ static bool microcode_matches(struct 
microcode_header_intel *mc_header,
return false;
 }
 
+static void memfree_patch(struct ucode_patch *p)
+{
+   kfree(p->data);
+   kfree(p);
+}
+
 static struct ucode_patch *memdup_patch(void *data, unsigned int size)
 {
struct ucode_patch *p;
@@ -190,8 +196,10 @@ static void save_microcode_patch(void *data, unsigned int 
size)
p = memdup_patch(data, size);
if (!p)
pr_err("Error allocating buffer %p\n", data);
-   else
+   else {
list_replace(>plist, >plist);
+   memfree_patch(iter);
+   }
}
}
 
-- 
1.7.1


[PATCH] gcc-plugins: disable GCC_PLUGIN_STRUCTLEAK_BYREF_ALL for COMPILE_TEST

2018-05-31 Thread Masahiro Yamada
We have enabled GCC_PLUGINS for COMPILE_TEST, but allmodconfig now
produces new warnings.

  CC [M]  drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.o
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
‘wlc_phy_workarounds_nphy_rev7’:
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16563:1: warning: 
the frame size of 3128 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
‘wlc_phy_workarounds_nphy_rev3’:
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16905:1: warning: 
the frame size of 2800 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
‘wlc_phy_cal_txiqlo_nphy’:
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:26033:1: warning: 
the frame size of 2488 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^

It looks like GCC_PLUGIN_STRUCTLEAK_BYREF_ALL is causing this.
Add "depends on !COMPILE_TEST" to not dirturb the compile test.

Reported-by: Stephen Rothwell 
Suggested-by: Kees Cook 
Signed-off-by: Masahiro Yamada 
---

 arch/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index e9475d0..e5ff804 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -489,6 +489,7 @@ config GCC_PLUGIN_STRUCTLEAK
 config GCC_PLUGIN_STRUCTLEAK_BYREF_ALL
bool "Force initialize all struct type variables passed by reference"
depends on GCC_PLUGIN_STRUCTLEAK
+   depends on !COMPILE_TEST
help
  Zero initialize any struct type local variable that may be passed by
  reference without having been initialized.
-- 
2.7.4



Re: [PATCH 2/2] printk: make sure to print log on console.

2018-05-31 Thread Sergey Senozhatsky
On (05/31/18 14:21), Petr Mladek wrote:
> > 
> > Upstream printk has no printing kthread. And we also run
> > printk()->console_unlock() with disabled preemption.
> 
> Yes, the comment was wrong

Yes, that was the only thing I meant.
I really didn't have any time to look at the patch yesterday, just
commented on the most obvious thing.

> but the problem is real.

Yep, could be. But not exactly the way it is described in the commit
messages and the patch does not fully address the problem.

The patch assumes that all those events happen sequentially. While
in reality they can happen in parallel on different CPUs.

Example:

CPU0CPU1

set console verbose

dump_backtrace()
{
// for (;;)  print frames
printk("%pS\n", frame0);
printk("%pS\n", frame1);
printk("%pS\n", frame2);
printk("%pS\n", frame3);
... console_loglevel = 
CONSOLE_LOGLEVEL_SILENT;
printk("%pS\n", frame12);
printk("%pS\n", frame13);
}

Part of backtrace or the entire backtrace will be missed, because
we read the global console_loglevel. The problem is still there.

> The console handling is asynchronous even without the kthread.
> The current console_lock owner is responsible for handling messages
> also from other CPUs.

A nitpick: that's another thing to improve in the commit message, because
   I don't see that console_silent() race in the upstream kernel. We even
   don't have any users of console_silent() function. The only thing that
   ever sets console_loglevel to CONSOLE_LOGLEVEL_SILENT is
   kernel/debug/kdb/kdb_io.c


Now, there is a bunch of places that manually set console_loglevel.
At a glance, __handle_sysrq() *seems* to be interesting:

---

void __handle_sysrq(int key, bool check_mask)
{
struct sysrq_key_op *op_p;
int orig_log_level;
int i;

rcu_sysrq_start();
rcu_read_lock();
/*
~* Raise the apparent loglevel to maximum so that the sysrq header
~* is shown to provide the user with positive feedback.  We do not
~* simply emit this at KERN_EMERG as that would change message
~* routing in the consumers of /proc/kmsg.
~*/
~   orig_log_level = console_loglevel;
~   console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
pr_info("SysRq : ");

op_p = __sysrq_get_key_op(key);
if (op_p) {
/*
 * Should we check for enabled operations (/proc/sysrq-trigger
 * should not) and is the invoked operation enabled?
 */
if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
pr_cont("%s\n", op_p->action_msg);
console_loglevel = orig_log_level;
op_p->handler(key);
} else {
pr_cont("This sysrq operation is disabled.\n");
}
} else {
pr_cont("HELP : ");
/* Only print the help msg once per handler */
for (i = 0; i < ARRAY_SIZE(sysrq_key_table); i++) {
if (sysrq_key_table[i]) {
int j;

for (j = 0; sysrq_key_table[i] !=
sysrq_key_table[j]; j++)
;
if (j != i)
continue;
pr_cont("%s ", sysrq_key_table[i]->help_msg);
}
}
pr_cont("\n");
~   console_loglevel = orig_log_level;
}
rcu_read_unlock();
rcu_sysrq_end();
}

---

But only at a glance... In fact, it raises the loglevel to maximum only
to printk() "This sysrq operation is disabled" or "HELP: " and help_msg
messages. The sysrq handler itself is executed under orig_log_level
loglevel. So it doesn't look like we are loosing anything super-critical,
after all.


This case is very interesting:

static void uv_nmi_dump_state(int cpu, struct pt_regs *regs, int master)
{
if (master) {
int tcpu;
int ignored = 0;
int saved_console_loglevel = console_loglevel;

pr_alert("UV: tracing %s for %d CPUs from CPU %d\n",
uv_nmi_action_is("ips") ? "IPs" : "processes",
atomic_read(_nmi_cpus_in_nmi), cpu);

console_loglevel = uv_nmi_loglevel;
atomic_set(_nmi_slave_continue, SLAVE_EXIT);
for_each_online_cpu(tcpu) {
if (cpumask_test_cpu(tcpu, uv_nmi_cpu_mask))
ignored++;
else if (tcpu == cpu)

Re: [PATCH 2/2] printk: make sure to print log on console.

2018-05-31 Thread Sergey Senozhatsky
On (05/31/18 14:21), Petr Mladek wrote:
> > 
> > Upstream printk has no printing kthread. And we also run
> > printk()->console_unlock() with disabled preemption.
> 
> Yes, the comment was wrong

Yes, that was the only thing I meant.
I really didn't have any time to look at the patch yesterday, just
commented on the most obvious thing.

> but the problem is real.

Yep, could be. But not exactly the way it is described in the commit
messages and the patch does not fully address the problem.

The patch assumes that all those events happen sequentially. While
in reality they can happen in parallel on different CPUs.

Example:

CPU0CPU1

set console verbose

dump_backtrace()
{
// for (;;)  print frames
printk("%pS\n", frame0);
printk("%pS\n", frame1);
printk("%pS\n", frame2);
printk("%pS\n", frame3);
... console_loglevel = 
CONSOLE_LOGLEVEL_SILENT;
printk("%pS\n", frame12);
printk("%pS\n", frame13);
}

Part of backtrace or the entire backtrace will be missed, because
we read the global console_loglevel. The problem is still there.

> The console handling is asynchronous even without the kthread.
> The current console_lock owner is responsible for handling messages
> also from other CPUs.

A nitpick: that's another thing to improve in the commit message, because
   I don't see that console_silent() race in the upstream kernel. We even
   don't have any users of console_silent() function. The only thing that
   ever sets console_loglevel to CONSOLE_LOGLEVEL_SILENT is
   kernel/debug/kdb/kdb_io.c


Now, there is a bunch of places that manually set console_loglevel.
At a glance, __handle_sysrq() *seems* to be interesting:

---

void __handle_sysrq(int key, bool check_mask)
{
struct sysrq_key_op *op_p;
int orig_log_level;
int i;

rcu_sysrq_start();
rcu_read_lock();
/*
~* Raise the apparent loglevel to maximum so that the sysrq header
~* is shown to provide the user with positive feedback.  We do not
~* simply emit this at KERN_EMERG as that would change message
~* routing in the consumers of /proc/kmsg.
~*/
~   orig_log_level = console_loglevel;
~   console_loglevel = CONSOLE_LOGLEVEL_DEFAULT;
pr_info("SysRq : ");

op_p = __sysrq_get_key_op(key);
if (op_p) {
/*
 * Should we check for enabled operations (/proc/sysrq-trigger
 * should not) and is the invoked operation enabled?
 */
if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
pr_cont("%s\n", op_p->action_msg);
console_loglevel = orig_log_level;
op_p->handler(key);
} else {
pr_cont("This sysrq operation is disabled.\n");
}
} else {
pr_cont("HELP : ");
/* Only print the help msg once per handler */
for (i = 0; i < ARRAY_SIZE(sysrq_key_table); i++) {
if (sysrq_key_table[i]) {
int j;

for (j = 0; sysrq_key_table[i] !=
sysrq_key_table[j]; j++)
;
if (j != i)
continue;
pr_cont("%s ", sysrq_key_table[i]->help_msg);
}
}
pr_cont("\n");
~   console_loglevel = orig_log_level;
}
rcu_read_unlock();
rcu_sysrq_end();
}

---

But only at a glance... In fact, it raises the loglevel to maximum only
to printk() "This sysrq operation is disabled" or "HELP: " and help_msg
messages. The sysrq handler itself is executed under orig_log_level
loglevel. So it doesn't look like we are loosing anything super-critical,
after all.


This case is very interesting:

static void uv_nmi_dump_state(int cpu, struct pt_regs *regs, int master)
{
if (master) {
int tcpu;
int ignored = 0;
int saved_console_loglevel = console_loglevel;

pr_alert("UV: tracing %s for %d CPUs from CPU %d\n",
uv_nmi_action_is("ips") ? "IPs" : "processes",
atomic_read(_nmi_cpus_in_nmi), cpu);

console_loglevel = uv_nmi_loglevel;
atomic_set(_nmi_slave_continue, SLAVE_EXIT);
for_each_online_cpu(tcpu) {
if (cpumask_test_cpu(tcpu, uv_nmi_cpu_mask))
ignored++;
else if (tcpu == cpu)

Re: [PATCH v2 19/21] bcache: use match_string() helper

2018-05-31 Thread Yisheng Xie
Hi Coly,

On 2018/6/1 11:45, Coly Li wrote:
> On 2018/5/31 7:11 PM, Yisheng Xie wrote:
>> match_string() returns the index of an array for a matching string,
>> which can be used instead of open coded variant.
>>
>> Cc: Kent Overstreet 
>> Cc: linux-bca...@vger.kernel.org 
>> Signed-off-by: Yisheng Xie 
> 
> Hi Yishenng,
> 
> Andy Shevchenko  submitted a patch to
> replace the whole bch_read_string_list() with __sysfs_match_string().
> And this patch is applied in Jens' block tree, will go into mainline
> kernel in v4.18.
> 
> If you search bcache mailing list, you may find a patch named with
> "bcache: Replace bch_read_string_list() by __sysfs_match_string()".
> 
> That means this patch will conflict with existing changes.

Get it, and thanks for this information.

Sorry Andy, for doing this once more.

Thanks
Yisheng
> 
> Thanks.
> 
> Coly Li
> 
>> ---
>>  drivers/md/bcache/util.c | 9 ++---
>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
>> index 74febd5..cd1f4fd 100644
>> --- a/drivers/md/bcache/util.c
>> +++ b/drivers/md/bcache/util.c
>> @@ -136,22 +136,17 @@ ssize_t bch_snprint_string_list(char *buf, size_t 
>> size, const char * const list[
>>  
>>  ssize_t bch_read_string_list(const char *buf, const char * const list[])
>>  {
>> -size_t i;
>> +ssize_t i;
>>  char *s, *d = kstrndup(buf, PAGE_SIZE - 1, GFP_KERNEL);
>>  if (!d)
>>  return -ENOMEM;
>>  
>>  s = strim(d);
>>  
>> -for (i = 0; list[i]; i++)
>> -if (!strcmp(list[i], s))
>> -break;
>> +i = match_string(list, -1, s);
>>  
>>  kfree(d);
>>  
>> -if (!list[i])
>> -return -EINVAL;
>> -
>>  return i;
>>  }
>>  
>>
> 
> 
> 



Re: [PATCH v2 19/21] bcache: use match_string() helper

2018-05-31 Thread Yisheng Xie
Hi Coly,

On 2018/6/1 11:45, Coly Li wrote:
> On 2018/5/31 7:11 PM, Yisheng Xie wrote:
>> match_string() returns the index of an array for a matching string,
>> which can be used instead of open coded variant.
>>
>> Cc: Kent Overstreet 
>> Cc: linux-bca...@vger.kernel.org 
>> Signed-off-by: Yisheng Xie 
> 
> Hi Yishenng,
> 
> Andy Shevchenko  submitted a patch to
> replace the whole bch_read_string_list() with __sysfs_match_string().
> And this patch is applied in Jens' block tree, will go into mainline
> kernel in v4.18.
> 
> If you search bcache mailing list, you may find a patch named with
> "bcache: Replace bch_read_string_list() by __sysfs_match_string()".
> 
> That means this patch will conflict with existing changes.

Get it, and thanks for this information.

Sorry Andy, for doing this once more.

Thanks
Yisheng
> 
> Thanks.
> 
> Coly Li
> 
>> ---
>>  drivers/md/bcache/util.c | 9 ++---
>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
>> index 74febd5..cd1f4fd 100644
>> --- a/drivers/md/bcache/util.c
>> +++ b/drivers/md/bcache/util.c
>> @@ -136,22 +136,17 @@ ssize_t bch_snprint_string_list(char *buf, size_t 
>> size, const char * const list[
>>  
>>  ssize_t bch_read_string_list(const char *buf, const char * const list[])
>>  {
>> -size_t i;
>> +ssize_t i;
>>  char *s, *d = kstrndup(buf, PAGE_SIZE - 1, GFP_KERNEL);
>>  if (!d)
>>  return -ENOMEM;
>>  
>>  s = strim(d);
>>  
>> -for (i = 0; list[i]; i++)
>> -if (!strcmp(list[i], s))
>> -break;
>> +i = match_string(list, -1, s);
>>  
>>  kfree(d);
>>  
>> -if (!list[i])
>> -return -EINVAL;
>> -
>>  return i;
>>  }
>>  
>>
> 
> 
> 



Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-05-31 Thread Mike Snitzer
On Thu, May 31 2018 at 10:40pm -0400,
Martin K. Petersen  wrote:

> 
> Mike,
> 
> > 1) container A is tasked with managing some dedicated NVMe technology
> > that absolutely needs native NVMe multipath.
> 
> > 2) container B is tasked with offering some canned layered product
> > that was developed ontop of dm-multipath with its own multipath-tools
> > oriented APIs, etc. And it is to manage some other NVMe technology on
> > the same host as container A.
> 
> This assumes there is something to manage. And that the administrative
> model currently employed by DM multipath will be easily applicable to
> ANA devices. I don't believe that's the case. The configuration happens
> on the storage side, not on the host.

Fair point.

> With ALUA (and the proprietary implementations that predated the spec),
> it was very fuzzy whether it was the host or the target that owned
> responsibility for this or that. Part of the reason was that ALUA was
> deliberately vague to accommodate everybody's existing, non-standards
> compliant multipath storage implementations.
> 
> With ANA the heavy burden falls entirely on the storage. Most of the
> things you would currently configure in multipath.conf have no meaning
> in the context of ANA. Things that are currently the domain of
> dm-multipath or multipathd are inextricably living either in the storage
> device or in the NVMe ANA "device handler". And I think you are
> significantly underestimating the effort required to expose that
> information up the stack and to make use of it. That's not just a
> multipath personality toggle switch.

I'm aware that most everything in multipath.conf is SCSI/FC specific.
That isn't the point.  dm-multipath and multipathd are an existing
framework for managing multipath storage.

It could be made to work with NVMe.  But yes it would not be easy.
Especially not with the native NVMe multipath crew being so damn
hostile.

> If you want to make multipath -ll show something meaningful for ANA
> devices, then by all means go ahead. I don't have any problem with
> that.

Thanks so much for your permission ;)  But I'm actually not very
involved with multipathd development anyway.  It is likely a better use
of time in the near-term though.  Making the multipath tools and
libraries be able to understand native NVMe multipath in all its glory
might be a means to an end from a compatibility with existing monitoring
applications perspective.

Though NVMe just doesn't have per-device accounting at all.  Also not
yet aware how nvme cli conveys paths being down vs up, etc.

Glad that isn't my problem ;)

> But I don't think the burden of allowing multipathd/DM to inject
> themselves into the path transition state machine has any benefit
> whatsoever to the user. It's only complicating things and therefore we'd
> be doing people a disservice rather than a favor.

This notion that only native NVMe multipath can be successful is utter
bullshit.  And the mere fact that I've gotten such a reaction from a
select few speaks to some serious control issues.

Imagine if XFS developers just one day imposed that it is the _only_
filesystem that can be used on persistent memory.

Just please dial it back.. seriously tiresome.


Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-05-31 Thread Mike Snitzer
On Thu, May 31 2018 at 10:40pm -0400,
Martin K. Petersen  wrote:

> 
> Mike,
> 
> > 1) container A is tasked with managing some dedicated NVMe technology
> > that absolutely needs native NVMe multipath.
> 
> > 2) container B is tasked with offering some canned layered product
> > that was developed ontop of dm-multipath with its own multipath-tools
> > oriented APIs, etc. And it is to manage some other NVMe technology on
> > the same host as container A.
> 
> This assumes there is something to manage. And that the administrative
> model currently employed by DM multipath will be easily applicable to
> ANA devices. I don't believe that's the case. The configuration happens
> on the storage side, not on the host.

Fair point.

> With ALUA (and the proprietary implementations that predated the spec),
> it was very fuzzy whether it was the host or the target that owned
> responsibility for this or that. Part of the reason was that ALUA was
> deliberately vague to accommodate everybody's existing, non-standards
> compliant multipath storage implementations.
> 
> With ANA the heavy burden falls entirely on the storage. Most of the
> things you would currently configure in multipath.conf have no meaning
> in the context of ANA. Things that are currently the domain of
> dm-multipath or multipathd are inextricably living either in the storage
> device or in the NVMe ANA "device handler". And I think you are
> significantly underestimating the effort required to expose that
> information up the stack and to make use of it. That's not just a
> multipath personality toggle switch.

I'm aware that most everything in multipath.conf is SCSI/FC specific.
That isn't the point.  dm-multipath and multipathd are an existing
framework for managing multipath storage.

It could be made to work with NVMe.  But yes it would not be easy.
Especially not with the native NVMe multipath crew being so damn
hostile.

> If you want to make multipath -ll show something meaningful for ANA
> devices, then by all means go ahead. I don't have any problem with
> that.

Thanks so much for your permission ;)  But I'm actually not very
involved with multipathd development anyway.  It is likely a better use
of time in the near-term though.  Making the multipath tools and
libraries be able to understand native NVMe multipath in all its glory
might be a means to an end from a compatibility with existing monitoring
applications perspective.

Though NVMe just doesn't have per-device accounting at all.  Also not
yet aware how nvme cli conveys paths being down vs up, etc.

Glad that isn't my problem ;)

> But I don't think the burden of allowing multipathd/DM to inject
> themselves into the path transition state machine has any benefit
> whatsoever to the user. It's only complicating things and therefore we'd
> be doing people a disservice rather than a favor.

This notion that only native NVMe multipath can be successful is utter
bullshit.  And the mere fact that I've gotten such a reaction from a
select few speaks to some serious control issues.

Imagine if XFS developers just one day imposed that it is the _only_
filesystem that can be used on persistent memory.

Just please dial it back.. seriously tiresome.


Re: [PATCH v3 00/16] Provide saturating helpers for allocation

2018-05-31 Thread Kees Cook
On Thu, May 31, 2018 at 5:54 PM, Linus Torvalds
 wrote:
> On Thu, May 31, 2018 at 7:43 PM Kees Cook  wrote:
>>
>> So, while nothing does:
>> kmalloc_array(a, b, ...) -> kmalloc(array_size(a, b), ...)
>> the treewide changes DO perform changes like this:
>> kmalloc(a * b, ...) -> kmalloc(array_size(a, b), ...)
>
> Ugh. I really really still absolutely despise this.

Heh. Yeah, I called this out specifically because I wasn't sure if
this was going to be okay. :P

> Why can't you just have a separate set of coccinelle scripts that do
> the simple and clean cases?
>
> So *before* doing any array_size() conversions, just do
>
> kzalloc(a*b, ...) -> kcalloc(a, b, ...)
> kmalloc(a*b,..) -> kmalloc_array(a,b, ...)
>
> and the obvious variations on that (devm_xyz() has all the same helpers).

Yup. I'll get started on it. I did have a version of a python script
that generated coccinelle scripts, but I started losing my mind. I'll
double-check if I can find a way to do some internal-to-Coccinelle
python to handle some of the variation directly, etc.

For those interested in the details: the complexity for me is in how
Coccinelle handles expressions (or my understanding of it's handling).
There's nothing in between "expression" and "identifier", so
"thing->field" is an expression not an identifier ("thing" is an
identifier), but "foo * bar" is _also_ an expression, so I have to
slowly peel away the "easy" stuff (sizeof, constants, etc) before
expressions to avoid collapsing factors into the wrong arguments (e.g.
kzalloc(a * b * c, ...) -> kcalloc(a * b, c, ...) is not desirable),
so there end up being a LOT of rules... I was able to compress
allocation families into a a regex, but without that, I'll end up with
the sizeof/const/etc rules times the family times the kalloc and
_array rules.

> Only after doing the ones that don't have the nice obvious helpers, do
> the remaining ones with array_size(), ie
>
> *alloc(a*b, ..) -> *alloc(array_size(a,b), ...)
>
> because that really makes for much less legible code.
>
> Hmm?

Sounds good. Thanks!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH v3 00/16] Provide saturating helpers for allocation

2018-05-31 Thread Kees Cook
On Thu, May 31, 2018 at 5:54 PM, Linus Torvalds
 wrote:
> On Thu, May 31, 2018 at 7:43 PM Kees Cook  wrote:
>>
>> So, while nothing does:
>> kmalloc_array(a, b, ...) -> kmalloc(array_size(a, b), ...)
>> the treewide changes DO perform changes like this:
>> kmalloc(a * b, ...) -> kmalloc(array_size(a, b), ...)
>
> Ugh. I really really still absolutely despise this.

Heh. Yeah, I called this out specifically because I wasn't sure if
this was going to be okay. :P

> Why can't you just have a separate set of coccinelle scripts that do
> the simple and clean cases?
>
> So *before* doing any array_size() conversions, just do
>
> kzalloc(a*b, ...) -> kcalloc(a, b, ...)
> kmalloc(a*b,..) -> kmalloc_array(a,b, ...)
>
> and the obvious variations on that (devm_xyz() has all the same helpers).

Yup. I'll get started on it. I did have a version of a python script
that generated coccinelle scripts, but I started losing my mind. I'll
double-check if I can find a way to do some internal-to-Coccinelle
python to handle some of the variation directly, etc.

For those interested in the details: the complexity for me is in how
Coccinelle handles expressions (or my understanding of it's handling).
There's nothing in between "expression" and "identifier", so
"thing->field" is an expression not an identifier ("thing" is an
identifier), but "foo * bar" is _also_ an expression, so I have to
slowly peel away the "easy" stuff (sizeof, constants, etc) before
expressions to avoid collapsing factors into the wrong arguments (e.g.
kzalloc(a * b * c, ...) -> kcalloc(a * b, c, ...) is not desirable),
so there end up being a LOT of rules... I was able to compress
allocation families into a a regex, but without that, I'll end up with
the sizeof/const/etc rules times the family times the kalloc and
_array rules.

> Only after doing the ones that don't have the nice obvious helpers, do
> the remaining ones with array_size(), ie
>
> *alloc(a*b, ..) -> *alloc(array_size(a,b), ...)
>
> because that really makes for much less legible code.
>
> Hmm?

Sounds good. Thanks!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-05-31 Thread Mike Snitzer
On Thu, May 31 2018 at 12:34pm -0400,
Christoph Hellwig  wrote:

> On Thu, May 31, 2018 at 08:37:39AM -0400, Mike Snitzer wrote:
> > I saw your reply to the 1/3 patch.. I do agree it is broken for not
> > checking if any handles are active.  But that is easily fixed no?
> 
> Doing a switch at runtime simply is a really bad idea.  If for some
> reason we end up with a good per-controller switch it would have
> to be something set at probe time, and to get it on a controller
> you'd need to reset it first.

Yes, I see that now.  And the implementation would need to be something
yourself or other more seasoned NVMe developers pursued.  NVMe code is
pretty unforgiving.

I took a crack at aspects of this, my head hurts.  While testing I hit
some "interesting" lack of self-awareness about NVMe resources that are
in use.  So lots of associations are able to be torn down rather than
graceful failure.  Could be nvme_fcloop specific, but it is pretty easy
to do the following using mptest's lib/unittests/nvme_4port_create.sh
followed by: modprobe -r nvme_fcloop

Results in an infinite spew of:
[14245.345759] nvme_fcloop: fcloop_exit: Failed deleting remote port
[14245.351851] nvme_fcloop: fcloop_exit: Failed deleting target port
[14245.357944] nvme_fcloop: fcloop_exit: Failed deleting remote port
[14245.364038] nvme_fcloop: fcloop_exit: Failed deleting target port

Another fun one is to lib/unittests/nvme_4port_delete.sh while the
native NVMe multipath device (created from nvme_4port_create.sh) was
still in use by an xfs mount, so:
./nvme_4port_create.sh
mount /dev/nvme1n1 /mnt
./nvme_4port_delete.sh
umount /mnt

Those were clear screwups on my part but I wouldn't have expected them
to cause nvme to blow through so many stop signs.

Anyway, I put enough time to trying to make the previously thought
"simple" mpath_personality switch safe -- in the face of active handles
(issue Sagi pointed out) -- that it is clear NVMe just doesn't have
enough state to do it in a clean way.  Would require a deeper
understanding of the code that I don't have.  Most every NVMe function
returns void so there is basically no potential for error handling (in
the face of a resource being in use).

The following is my WIP patch (built ontop of the 3 patches from
this thread's series) that has cured me of wanting to continue pursuit
of a robust implementation of the runtime 'mpath_personality' switch:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1e018d0..80103b3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2146,10 +2146,8 @@ static ssize_t 
__nvme_subsys_store_mpath_personality(struct nvme_subsystem *subs
goto out;
}
 
-   if (subsys->native_mpath != native_mpath) {
-   subsys->native_mpath = native_mpath;
-   ret = nvme_mpath_change_personality(subsys);
-   }
+   if (subsys->native_mpath != native_mpath)
+   ret = nvme_mpath_change_personality(subsys, native_mpath);
 out:
return ret ? ret : count;
 }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 53d2610..017c924 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -247,26 +247,57 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
put_disk(head->disk);
 }
 
-int nvme_mpath_change_personality(struct nvme_subsystem *subsys)
+static bool __nvme_subsys_in_use(struct nvme_subsystem *subsys)
 {
struct nvme_ctrl *ctrl;
-   int ret = 0;
+   struct nvme_ns *ns, *next;
 
-restart:
-   mutex_lock(>lock);
list_for_each_entry(ctrl, >ctrls, subsys_entry) {
-   if (!list_empty(>namespaces)) {
-   mutex_unlock(>lock);
-   nvme_remove_namespaces(ctrl);
-   goto restart;
+   down_write(>namespaces_rwsem);
+   list_for_each_entry_safe(ns, next, >namespaces, list) {
+   if ((kref_read(>kref) > 1) ||
+   // FIXME: need to compare with N paths
+   (ns->head && (kref_read(>head->ref) > 1))) {
+   printk("ns->kref = %d", kref_read(>kref));
+   printk("ns->head->ref = %d", 
kref_read(>head->ref));
+   up_write(>namespaces_rwsem);
+   mutex_unlock(>lock);
+   return true;
+   }
}
+   up_write(>namespaces_rwsem);
}
-   mutex_unlock(>lock);
+
+   return false;
+}
+
+int nvme_mpath_change_personality(struct nvme_subsystem *subsys, bool native)
+{
+   struct nvme_ctrl *ctrl;
 
mutex_lock(>lock);
-   list_for_each_entry(ctrl, >ctrls, subsys_entry)
-   nvme_queue_scan(ctrl);
+
+   if (__nvme_subsys_in_use(subsys)) {
+   mutex_unlock(>lock);
+   return -EBUSY;
+   }

Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-05-31 Thread Mike Snitzer
On Thu, May 31 2018 at 12:34pm -0400,
Christoph Hellwig  wrote:

> On Thu, May 31, 2018 at 08:37:39AM -0400, Mike Snitzer wrote:
> > I saw your reply to the 1/3 patch.. I do agree it is broken for not
> > checking if any handles are active.  But that is easily fixed no?
> 
> Doing a switch at runtime simply is a really bad idea.  If for some
> reason we end up with a good per-controller switch it would have
> to be something set at probe time, and to get it on a controller
> you'd need to reset it first.

Yes, I see that now.  And the implementation would need to be something
yourself or other more seasoned NVMe developers pursued.  NVMe code is
pretty unforgiving.

I took a crack at aspects of this, my head hurts.  While testing I hit
some "interesting" lack of self-awareness about NVMe resources that are
in use.  So lots of associations are able to be torn down rather than
graceful failure.  Could be nvme_fcloop specific, but it is pretty easy
to do the following using mptest's lib/unittests/nvme_4port_create.sh
followed by: modprobe -r nvme_fcloop

Results in an infinite spew of:
[14245.345759] nvme_fcloop: fcloop_exit: Failed deleting remote port
[14245.351851] nvme_fcloop: fcloop_exit: Failed deleting target port
[14245.357944] nvme_fcloop: fcloop_exit: Failed deleting remote port
[14245.364038] nvme_fcloop: fcloop_exit: Failed deleting target port

Another fun one is to lib/unittests/nvme_4port_delete.sh while the
native NVMe multipath device (created from nvme_4port_create.sh) was
still in use by an xfs mount, so:
./nvme_4port_create.sh
mount /dev/nvme1n1 /mnt
./nvme_4port_delete.sh
umount /mnt

Those were clear screwups on my part but I wouldn't have expected them
to cause nvme to blow through so many stop signs.

Anyway, I put enough time to trying to make the previously thought
"simple" mpath_personality switch safe -- in the face of active handles
(issue Sagi pointed out) -- that it is clear NVMe just doesn't have
enough state to do it in a clean way.  Would require a deeper
understanding of the code that I don't have.  Most every NVMe function
returns void so there is basically no potential for error handling (in
the face of a resource being in use).

The following is my WIP patch (built ontop of the 3 patches from
this thread's series) that has cured me of wanting to continue pursuit
of a robust implementation of the runtime 'mpath_personality' switch:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1e018d0..80103b3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2146,10 +2146,8 @@ static ssize_t 
__nvme_subsys_store_mpath_personality(struct nvme_subsystem *subs
goto out;
}
 
-   if (subsys->native_mpath != native_mpath) {
-   subsys->native_mpath = native_mpath;
-   ret = nvme_mpath_change_personality(subsys);
-   }
+   if (subsys->native_mpath != native_mpath)
+   ret = nvme_mpath_change_personality(subsys, native_mpath);
 out:
return ret ? ret : count;
 }
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 53d2610..017c924 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -247,26 +247,57 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
put_disk(head->disk);
 }
 
-int nvme_mpath_change_personality(struct nvme_subsystem *subsys)
+static bool __nvme_subsys_in_use(struct nvme_subsystem *subsys)
 {
struct nvme_ctrl *ctrl;
-   int ret = 0;
+   struct nvme_ns *ns, *next;
 
-restart:
-   mutex_lock(>lock);
list_for_each_entry(ctrl, >ctrls, subsys_entry) {
-   if (!list_empty(>namespaces)) {
-   mutex_unlock(>lock);
-   nvme_remove_namespaces(ctrl);
-   goto restart;
+   down_write(>namespaces_rwsem);
+   list_for_each_entry_safe(ns, next, >namespaces, list) {
+   if ((kref_read(>kref) > 1) ||
+   // FIXME: need to compare with N paths
+   (ns->head && (kref_read(>head->ref) > 1))) {
+   printk("ns->kref = %d", kref_read(>kref));
+   printk("ns->head->ref = %d", 
kref_read(>head->ref));
+   up_write(>namespaces_rwsem);
+   mutex_unlock(>lock);
+   return true;
+   }
}
+   up_write(>namespaces_rwsem);
}
-   mutex_unlock(>lock);
+
+   return false;
+}
+
+int nvme_mpath_change_personality(struct nvme_subsystem *subsys, bool native)
+{
+   struct nvme_ctrl *ctrl;
 
mutex_lock(>lock);
-   list_for_each_entry(ctrl, >ctrls, subsys_entry)
-   nvme_queue_scan(ctrl);
+
+   if (__nvme_subsys_in_use(subsys)) {
+   mutex_unlock(>lock);
+   return -EBUSY;
+   }

Re: linux-next: build warnings after merge of the kbuild tree

2018-05-31 Thread Kees Cook
On Thu, May 31, 2018 at 6:56 PM, Masahiro Yamada
 wrote:
> 2018-05-31 12:53 GMT+09:00 Kees Cook :
>> On Wed, May 30, 2018 at 6:26 PM, Kees Cook  wrote:
>>> On Wed, May 30, 2018 at 6:12 PM, Masahiro Yamada
>>>  wrote:
 Hi.
 (+CC Kees)

 2018-05-31 7:40 GMT+09:00 Stephen Rothwell :
> Hi Masahiro,
>
> After merging the kbuild tree, today's linux-next build (x86_64
> allmodconfig) produced these warnings:
>
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy_rev7':
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16563:1: 
> warning: the frame size of 3136 bytes is larger than 2048 bytes 
> [-Wframe-larger-than=]
>  }
>  ^
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy_rev3':
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16905:1: 
> warning: the frame size of 2872 bytes is larger than 2048 bytes 
> [-Wframe-larger-than=]
>  }
>  ^
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_cal_txiqlo_nphy':
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:26033:1: 
> warning: the frame size of 2432 bytes is larger than 2048 bytes 
> [-Wframe-larger-than=]
>  }
>  ^
>
> I have no idea what caused these warnings to appear ... nothing in those
> functions looks too bad.


 This has been triggered by the following commit:


 commit 0e461945f3504e09b8ecf947b6398adce1287a28
 Author: Masahiro Yamada 
 Date:   Mon May 28 18:22:07 2018 +0900

 gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST



 CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL was previously disabled
 for COMPILE_TEST, which is now enabled.

For the moment, can you add "depends on !COMPILE_TEST" to
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL in your tree and I'll continue
to figure out what's happening?

Thanks!

-Kees

-- 
Kees Cook
Pixel Security


Re: linux-next: build warnings after merge of the kbuild tree

2018-05-31 Thread Kees Cook
On Thu, May 31, 2018 at 6:56 PM, Masahiro Yamada
 wrote:
> 2018-05-31 12:53 GMT+09:00 Kees Cook :
>> On Wed, May 30, 2018 at 6:26 PM, Kees Cook  wrote:
>>> On Wed, May 30, 2018 at 6:12 PM, Masahiro Yamada
>>>  wrote:
 Hi.
 (+CC Kees)

 2018-05-31 7:40 GMT+09:00 Stephen Rothwell :
> Hi Masahiro,
>
> After merging the kbuild tree, today's linux-next build (x86_64
> allmodconfig) produced these warnings:
>
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy_rev7':
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16563:1: 
> warning: the frame size of 3136 bytes is larger than 2048 bytes 
> [-Wframe-larger-than=]
>  }
>  ^
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_workarounds_nphy_rev3':
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16905:1: 
> warning: the frame size of 2872 bytes is larger than 2048 bytes 
> [-Wframe-larger-than=]
>  }
>  ^
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
> 'wlc_phy_cal_txiqlo_nphy':
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:26033:1: 
> warning: the frame size of 2432 bytes is larger than 2048 bytes 
> [-Wframe-larger-than=]
>  }
>  ^
>
> I have no idea what caused these warnings to appear ... nothing in those
> functions looks too bad.


 This has been triggered by the following commit:


 commit 0e461945f3504e09b8ecf947b6398adce1287a28
 Author: Masahiro Yamada 
 Date:   Mon May 28 18:22:07 2018 +0900

 gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST



 CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL was previously disabled
 for COMPILE_TEST, which is now enabled.

For the moment, can you add "depends on !COMPILE_TEST" to
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL in your tree and I'll continue
to figure out what's happening?

Thanks!

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH v2 19/21] bcache: use match_string() helper

2018-05-31 Thread Coly Li
On 2018/5/31 7:11 PM, Yisheng Xie wrote:
> match_string() returns the index of an array for a matching string,
> which can be used instead of open coded variant.
> 
> Cc: Kent Overstreet 
> Cc: linux-bca...@vger.kernel.org 
> Signed-off-by: Yisheng Xie 

Hi Yishenng,

Andy Shevchenko  submitted a patch to
replace the whole bch_read_string_list() with __sysfs_match_string().
And this patch is applied in Jens' block tree, will go into mainline
kernel in v4.18.

If you search bcache mailing list, you may find a patch named with
"bcache: Replace bch_read_string_list() by __sysfs_match_string()".

That means this patch will conflict with existing changes.

Thanks.

Coly Li

> ---
>  drivers/md/bcache/util.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> index 74febd5..cd1f4fd 100644
> --- a/drivers/md/bcache/util.c
> +++ b/drivers/md/bcache/util.c
> @@ -136,22 +136,17 @@ ssize_t bch_snprint_string_list(char *buf, size_t size, 
> const char * const list[
>  
>  ssize_t bch_read_string_list(const char *buf, const char * const list[])
>  {
> - size_t i;
> + ssize_t i;
>   char *s, *d = kstrndup(buf, PAGE_SIZE - 1, GFP_KERNEL);
>   if (!d)
>   return -ENOMEM;
>  
>   s = strim(d);
>  
> - for (i = 0; list[i]; i++)
> - if (!strcmp(list[i], s))
> - break;
> + i = match_string(list, -1, s);
>  
>   kfree(d);
>  
> - if (!list[i])
> - return -EINVAL;
> -
>   return i;
>  }
>  
> 



Re: [PATCH v2 19/21] bcache: use match_string() helper

2018-05-31 Thread Coly Li
On 2018/5/31 7:11 PM, Yisheng Xie wrote:
> match_string() returns the index of an array for a matching string,
> which can be used instead of open coded variant.
> 
> Cc: Kent Overstreet 
> Cc: linux-bca...@vger.kernel.org 
> Signed-off-by: Yisheng Xie 

Hi Yishenng,

Andy Shevchenko  submitted a patch to
replace the whole bch_read_string_list() with __sysfs_match_string().
And this patch is applied in Jens' block tree, will go into mainline
kernel in v4.18.

If you search bcache mailing list, you may find a patch named with
"bcache: Replace bch_read_string_list() by __sysfs_match_string()".

That means this patch will conflict with existing changes.

Thanks.

Coly Li

> ---
>  drivers/md/bcache/util.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> index 74febd5..cd1f4fd 100644
> --- a/drivers/md/bcache/util.c
> +++ b/drivers/md/bcache/util.c
> @@ -136,22 +136,17 @@ ssize_t bch_snprint_string_list(char *buf, size_t size, 
> const char * const list[
>  
>  ssize_t bch_read_string_list(const char *buf, const char * const list[])
>  {
> - size_t i;
> + ssize_t i;
>   char *s, *d = kstrndup(buf, PAGE_SIZE - 1, GFP_KERNEL);
>   if (!d)
>   return -ENOMEM;
>  
>   s = strim(d);
>  
> - for (i = 0; list[i]; i++)
> - if (!strcmp(list[i], s))
> - break;
> + i = match_string(list, -1, s);
>  
>   kfree(d);
>  
> - if (!list[i])
> - return -EINVAL;
> -
>   return i;
>  }
>  
> 



Re: [PATCH 6/6] arm64: dts: socionext: Add missing cooling device properties for CPUs

2018-05-31 Thread Masahiro Yamada
2018-05-25 14:40 GMT+09:00 Viresh Kumar :
> The cooling device properties, like "#cooling-cells" and
> "dynamic-power-coefficient", should either be present for all the CPUs
> of a cluster or none. If these are present only for a subset of CPUs of
> a cluster then things will start falling apart as soon as the CPUs are
> brought online in a different order. For example, this will happen
> because the operating system looks for such properties in the CPU node
> it is trying to bring up, so that it can register a cooling device.
>
> Add such missing properties.
>
> Signed-off-by: Viresh Kumar 


Applied to linux-uniphier.

I had already sent a PR for v4.18-rc1 before I received this patch.
Please wait for v4.19-rc1.

Thanks.

> ---
>  arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi 
> b/arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi
> index 3a5ed789c056..10ffb5019013 100644
> --- a/arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi
> +++ b/arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi
> @@ -58,6 +58,7 @@
> clocks = <_clk 32>;
> enable-method = "psci";
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
>
> cpu2: cpu@100 {
> @@ -77,6 +78,7 @@
> clocks = <_clk 33>;
> enable-method = "psci";
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
> };
>
> --
> 2.15.0.194.g9af6a3dea062
>



-- 
Best Regards
Masahiro Yamada


Re: [PATCH 6/6] arm64: dts: socionext: Add missing cooling device properties for CPUs

2018-05-31 Thread Masahiro Yamada
2018-05-25 14:40 GMT+09:00 Viresh Kumar :
> The cooling device properties, like "#cooling-cells" and
> "dynamic-power-coefficient", should either be present for all the CPUs
> of a cluster or none. If these are present only for a subset of CPUs of
> a cluster then things will start falling apart as soon as the CPUs are
> brought online in a different order. For example, this will happen
> because the operating system looks for such properties in the CPU node
> it is trying to bring up, so that it can register a cooling device.
>
> Add such missing properties.
>
> Signed-off-by: Viresh Kumar 


Applied to linux-uniphier.

I had already sent a PR for v4.18-rc1 before I received this patch.
Please wait for v4.19-rc1.

Thanks.

> ---
>  arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi 
> b/arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi
> index 3a5ed789c056..10ffb5019013 100644
> --- a/arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi
> +++ b/arch/arm64/boot/dts/socionext/uniphier-ld20.dtsi
> @@ -58,6 +58,7 @@
> clocks = <_clk 32>;
> enable-method = "psci";
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
>
> cpu2: cpu@100 {
> @@ -77,6 +78,7 @@
> clocks = <_clk 33>;
> enable-method = "psci";
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
> };
>
> --
> 2.15.0.194.g9af6a3dea062
>



-- 
Best Regards
Masahiro Yamada


Re: [PATCH 05/15] arm: dts: uniphier: Add missing cooling device properties for CPUs

2018-05-31 Thread Masahiro Yamada
2018-05-25 19:31 GMT+09:00 Viresh Kumar :
> The cooling device properties, like "#cooling-cells" and
> "dynamic-power-coefficient", should either be present for all the CPUs
> of a cluster or none. If these are present only for a subset of CPUs of
> a cluster then things will start falling apart as soon as the CPUs are
> brought online in a different order. For example, this will happen
> because the operating system looks for such properties in the CPU node
> it is trying to bring up, so that it can register a cooling device.
>
> Add such missing properties.
>
> Signed-off-by: Viresh Kumar 


Applied to linux-uniphier.

I had already sent a PR for v4.18-rc1 before I received this patch.
Please wait for v4.19-rc1.

Thanks.


> ---
>  arch/arm/boot/dts/uniphier-pxs2.dtsi | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm/boot/dts/uniphier-pxs2.dtsi 
> b/arch/arm/boot/dts/uniphier-pxs2.dtsi
> index debcbd15c24b..40ed15465095 100644
> --- a/arch/arm/boot/dts/uniphier-pxs2.dtsi
> +++ b/arch/arm/boot/dts/uniphier-pxs2.dtsi
> @@ -36,6 +36,7 @@
> enable-method = "psci";
> next-level-cache = <>;
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
>
> cpu2: cpu@2 {
> @@ -46,6 +47,7 @@
> enable-method = "psci";
> next-level-cache = <>;
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
>
> cpu3: cpu@3 {
> @@ -56,6 +58,7 @@
> enable-method = "psci";
> next-level-cache = <>;
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
> };
>
> --
> 2.15.0.194.g9af6a3dea062
>
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada


Re: [PATCH 05/15] arm: dts: uniphier: Add missing cooling device properties for CPUs

2018-05-31 Thread Masahiro Yamada
2018-05-25 19:31 GMT+09:00 Viresh Kumar :
> The cooling device properties, like "#cooling-cells" and
> "dynamic-power-coefficient", should either be present for all the CPUs
> of a cluster or none. If these are present only for a subset of CPUs of
> a cluster then things will start falling apart as soon as the CPUs are
> brought online in a different order. For example, this will happen
> because the operating system looks for such properties in the CPU node
> it is trying to bring up, so that it can register a cooling device.
>
> Add such missing properties.
>
> Signed-off-by: Viresh Kumar 


Applied to linux-uniphier.

I had already sent a PR for v4.18-rc1 before I received this patch.
Please wait for v4.19-rc1.

Thanks.


> ---
>  arch/arm/boot/dts/uniphier-pxs2.dtsi | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm/boot/dts/uniphier-pxs2.dtsi 
> b/arch/arm/boot/dts/uniphier-pxs2.dtsi
> index debcbd15c24b..40ed15465095 100644
> --- a/arch/arm/boot/dts/uniphier-pxs2.dtsi
> +++ b/arch/arm/boot/dts/uniphier-pxs2.dtsi
> @@ -36,6 +36,7 @@
> enable-method = "psci";
> next-level-cache = <>;
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
>
> cpu2: cpu@2 {
> @@ -46,6 +47,7 @@
> enable-method = "psci";
> next-level-cache = <>;
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
>
> cpu3: cpu@3 {
> @@ -56,6 +58,7 @@
> enable-method = "psci";
> next-level-cache = <>;
> operating-points-v2 = <_opp>;
> +   #cooling-cells = <2>;
> };
> };
>
> --
> 2.15.0.194.g9af6a3dea062
>
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada


[RESEND] [PATCH] platform/x86: dell-wmi: Ignore new rfkill and fn-lock events

2018-05-31 Thread Kai-Heng Feng
There are two new events generated by dell-wmi, rfkill and fn-lock, from
Dell Systems.

When Fn-lock hotkey gets pressed to switch to function mode:
[85951.591542] dell_wmi: Unknown key with type 0x0010 and code 0xe035
pressed
[85951.591546] dell_wmi: Unknown key with type 0x0010 and code 0x
pressed

When Fn-lock hotkey gets pressed to switch to multimedia mode:
[85956.667686] dell_wmi: Unknown key with type 0x0010 and code 0xe035
pressed
[85956.667690] dell_wmi: Unknown key with type 0x0010 and code 0x0001
pressed

When radio hotkey gets pressed:
[85974.430220] dell_wmi: Unknown key with type 0x0010 and code 0xe008
pressed

These events are for notification purpose, so we can ignore them.

This patch is tested on XPS 9370.

Reviewed-by: Pali Rohár 
Reviewed-by: Mario Limonciello 
Signed-off-by: Kai-Heng Feng 
---
 drivers/platform/x86/dell-wmi.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 8d102195a392..ba8e6725d7ac 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -261,6 +261,12 @@ static const u16 bios_to_linux_keycode[256] = {
  * override them.
  */
 static const struct key_entry dell_wmi_keymap_type_0010[] = {
+   /* Fn-lock switched to function keys */
+   { KE_IGNORE, 0x0, { KEY_RESERVED } },
+
+   /* Fn-lock switched to multimedia keys */
+   { KE_IGNORE, 0x1, { KEY_RESERVED } },
+
/* Mic mute */
{ KE_KEY, 0x150, { KEY_MICMUTE } },
 
@@ -296,6 +302,14 @@ static const struct key_entry dell_wmi_keymap_type_0010[] 
= {
{ KE_KEY,0x851, { KEY_PROG2 } },
{ KE_KEY,0x852, { KEY_PROG3 } },
 
+   /*
+* Radio disable (notify only -- there is no model for which the
+* WMI event is supposed to trigger an action).
+*/
+   { KE_IGNORE, 0xe008, { KEY_RFKILL } },
+
+   /* Fn-lock */
+   { KE_IGNORE, 0xe035, { KEY_RESERVED } },
 };
 
 /*
-- 
2.17.0



[RESEND] [PATCH] platform/x86: dell-wmi: Ignore new rfkill and fn-lock events

2018-05-31 Thread Kai-Heng Feng
There are two new events generated by dell-wmi, rfkill and fn-lock, from
Dell Systems.

When Fn-lock hotkey gets pressed to switch to function mode:
[85951.591542] dell_wmi: Unknown key with type 0x0010 and code 0xe035
pressed
[85951.591546] dell_wmi: Unknown key with type 0x0010 and code 0x
pressed

When Fn-lock hotkey gets pressed to switch to multimedia mode:
[85956.667686] dell_wmi: Unknown key with type 0x0010 and code 0xe035
pressed
[85956.667690] dell_wmi: Unknown key with type 0x0010 and code 0x0001
pressed

When radio hotkey gets pressed:
[85974.430220] dell_wmi: Unknown key with type 0x0010 and code 0xe008
pressed

These events are for notification purpose, so we can ignore them.

This patch is tested on XPS 9370.

Reviewed-by: Pali Rohár 
Reviewed-by: Mario Limonciello 
Signed-off-by: Kai-Heng Feng 
---
 drivers/platform/x86/dell-wmi.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/platform/x86/dell-wmi.c b/drivers/platform/x86/dell-wmi.c
index 8d102195a392..ba8e6725d7ac 100644
--- a/drivers/platform/x86/dell-wmi.c
+++ b/drivers/platform/x86/dell-wmi.c
@@ -261,6 +261,12 @@ static const u16 bios_to_linux_keycode[256] = {
  * override them.
  */
 static const struct key_entry dell_wmi_keymap_type_0010[] = {
+   /* Fn-lock switched to function keys */
+   { KE_IGNORE, 0x0, { KEY_RESERVED } },
+
+   /* Fn-lock switched to multimedia keys */
+   { KE_IGNORE, 0x1, { KEY_RESERVED } },
+
/* Mic mute */
{ KE_KEY, 0x150, { KEY_MICMUTE } },
 
@@ -296,6 +302,14 @@ static const struct key_entry dell_wmi_keymap_type_0010[] 
= {
{ KE_KEY,0x851, { KEY_PROG2 } },
{ KE_KEY,0x852, { KEY_PROG3 } },
 
+   /*
+* Radio disable (notify only -- there is no model for which the
+* WMI event is supposed to trigger an action).
+*/
+   { KE_IGNORE, 0xe008, { KEY_RFKILL } },
+
+   /* Fn-lock */
+   { KE_IGNORE, 0xe035, { KEY_RESERVED } },
 };
 
 /*
-- 
2.17.0



Re: [PATCH 20/32] vfs: Make close() unmount the attached mount if so flagged [ver #8]

2018-05-31 Thread Al Viro
FWIW, this on top of your current branch (I'll fold and reorder) gets
your move_mount(2) test work, AFAICS:

diff --git a/fs/file_table.c b/fs/file_table.c
index 06e979e1347e..6dd9760b3ddc 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -200,9 +200,6 @@ static void __fput(struct file *file)
eventpoll_release(file);
locks_remove_file(file);
 
-   if (unlikely(file->f_mode & FMODE_NEED_UNMOUNT))
-   drop_collected_mounts(mnt);
-
ima_file_free(file);
if (unlikely(file->f_flags & FASYNC)) {
if (file->f_op->fasync)
@@ -228,7 +225,10 @@ static void __fput(struct file *file)
file->f_inode = NULL;
file_free(file);
dput(dentry);
-   mntput(mnt);
+   if (unlikely(file->f_mode & FMODE_NEED_UNMOUNT))
+   umount_on_fput(mnt);
+   else
+   mntput(mnt);
 }
 
 static LLIST_HEAD(delayed_fput_list);
diff --git a/fs/internal.h b/fs/internal.h
index 6d0538af32d9..7ec629916f6d 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -75,7 +75,7 @@ extern struct vfsmount *lookup_mnt(const struct path *);
 extern int finish_automount(struct vfsmount *, struct path *);
 
 extern int sb_prepare_remount_readonly(struct super_block *);
-extern int copy_mount_for_o_path(struct path *, struct path *, bool);
+extern int copy_mount_for_o_path(struct path *, bool);
 
 extern void __init mnt_init(void);
 
@@ -86,6 +86,8 @@ extern void __mnt_drop_write(struct vfsmount *);
 extern void __mnt_drop_write_file(struct file *);
 extern void mnt_drop_write_file_path(struct file *);
 
+extern void umount_on_fput(struct vfsmount *);
+
 /*
  * fs_struct.c
  */
diff --git a/fs/namei.c b/fs/namei.c
index 7430e000c1d2..b1c451b6b508 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3454,28 +3454,20 @@ static int do_tmpfile(struct nameidata *nd, unsigned 
flags,
 
 static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
 {
-   struct path path, tmp;
-   int error;
-
-   error = path_lookupat(nd, flags, );
+   struct path path;
+   int error = path_lookupat(nd, flags, );
if (error)
return error;
 
if (file->f_flags & O_CLONE_MOUNT) {
-   error = copy_mount_for_o_path(
-   , , !(file->f_flags & O_NON_RECURSIVE));
-   path_put();
+   error = copy_mount_for_o_path(,
+   !(file->f_flags & O_NON_RECURSIVE));
if (error < 0)
return error;
-   path = tmp;
}
 
audit_inode(nd->name, path.dentry, 0);
error = vfs_open(, file, current_cred());
-   if (error < 0 &&
-   (flags & O_CLONE_MOUNT) &&
-   !(file->f_mode & FMODE_NEED_UNMOUNT))
-   __detach_mounts(path.dentry);
path_put();
return error;
 }
diff --git a/fs/namespace.c b/fs/namespace.c
index dd9cf81c2aea..74c68a23d088 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1793,7 +1793,7 @@ struct vfsmount *collect_mounts(const struct path *path)
return >mnt;
 }
 
-void drop_collected_mounts(struct vfsmount *mnt)
+void umount_on_fput(struct vfsmount *mnt)
 {
namespace_lock();
lock_mount_hash();
@@ -1803,6 +1803,15 @@ void drop_collected_mounts(struct vfsmount *mnt)
namespace_unlock();
 }
 
+void drop_collected_mounts(struct vfsmount *mnt)
+{
+   namespace_lock();
+   lock_mount_hash();
+   umount_tree(real_mount(mnt), UMOUNT_SYNC);
+   unlock_mount_hash();
+   namespace_unlock();
+}
+
 /**
  * clone_private_mount - create a private clone of a path
  *
@@ -2153,6 +2162,30 @@ static bool has_locked_children(struct mount *mnt, 
struct dentry *dentry)
return false;
 }
 
+static struct mount *__do_loopback(struct path *old_path, int recurse)
+{
+   struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt);
+
+   if (IS_MNT_UNBINDABLE(old))
+   return mnt;
+
+   if (!check_mnt(old) && old_path->dentry->d_op != _dentry_operations)
+   return mnt;
+
+   if (!recurse && has_locked_children(old, old_path->dentry))
+   return mnt;
+
+   if (recurse)
+   mnt = copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE);
+   else
+   mnt = clone_mnt(old, old_path->dentry, 0);
+
+   if (!IS_ERR(mnt))
+   mnt->mnt.mnt_flags &= ~MNT_LOCKED;
+
+   return mnt;
+}
+
 /*
  * do loopback mount.
  */
@@ -2160,7 +2193,7 @@ static int do_loopback(struct path *path, const char 
*old_name,
int recurse)
 {
struct path old_path;
-   struct mount *mnt = NULL, *old, *parent;
+   struct mount *mnt = NULL, *parent;
struct mountpoint *mp;
int err;
if (!old_name || !*old_name)
@@ -2174,38 +2207,21 @@ static int do_loopback(struct path *path, const char 
*old_name,
goto out;
 
mp = 

Re: [PATCH 20/32] vfs: Make close() unmount the attached mount if so flagged [ver #8]

2018-05-31 Thread Al Viro
FWIW, this on top of your current branch (I'll fold and reorder) gets
your move_mount(2) test work, AFAICS:

diff --git a/fs/file_table.c b/fs/file_table.c
index 06e979e1347e..6dd9760b3ddc 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -200,9 +200,6 @@ static void __fput(struct file *file)
eventpoll_release(file);
locks_remove_file(file);
 
-   if (unlikely(file->f_mode & FMODE_NEED_UNMOUNT))
-   drop_collected_mounts(mnt);
-
ima_file_free(file);
if (unlikely(file->f_flags & FASYNC)) {
if (file->f_op->fasync)
@@ -228,7 +225,10 @@ static void __fput(struct file *file)
file->f_inode = NULL;
file_free(file);
dput(dentry);
-   mntput(mnt);
+   if (unlikely(file->f_mode & FMODE_NEED_UNMOUNT))
+   umount_on_fput(mnt);
+   else
+   mntput(mnt);
 }
 
 static LLIST_HEAD(delayed_fput_list);
diff --git a/fs/internal.h b/fs/internal.h
index 6d0538af32d9..7ec629916f6d 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -75,7 +75,7 @@ extern struct vfsmount *lookup_mnt(const struct path *);
 extern int finish_automount(struct vfsmount *, struct path *);
 
 extern int sb_prepare_remount_readonly(struct super_block *);
-extern int copy_mount_for_o_path(struct path *, struct path *, bool);
+extern int copy_mount_for_o_path(struct path *, bool);
 
 extern void __init mnt_init(void);
 
@@ -86,6 +86,8 @@ extern void __mnt_drop_write(struct vfsmount *);
 extern void __mnt_drop_write_file(struct file *);
 extern void mnt_drop_write_file_path(struct file *);
 
+extern void umount_on_fput(struct vfsmount *);
+
 /*
  * fs_struct.c
  */
diff --git a/fs/namei.c b/fs/namei.c
index 7430e000c1d2..b1c451b6b508 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3454,28 +3454,20 @@ static int do_tmpfile(struct nameidata *nd, unsigned 
flags,
 
 static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
 {
-   struct path path, tmp;
-   int error;
-
-   error = path_lookupat(nd, flags, );
+   struct path path;
+   int error = path_lookupat(nd, flags, );
if (error)
return error;
 
if (file->f_flags & O_CLONE_MOUNT) {
-   error = copy_mount_for_o_path(
-   , , !(file->f_flags & O_NON_RECURSIVE));
-   path_put();
+   error = copy_mount_for_o_path(,
+   !(file->f_flags & O_NON_RECURSIVE));
if (error < 0)
return error;
-   path = tmp;
}
 
audit_inode(nd->name, path.dentry, 0);
error = vfs_open(, file, current_cred());
-   if (error < 0 &&
-   (flags & O_CLONE_MOUNT) &&
-   !(file->f_mode & FMODE_NEED_UNMOUNT))
-   __detach_mounts(path.dentry);
path_put();
return error;
 }
diff --git a/fs/namespace.c b/fs/namespace.c
index dd9cf81c2aea..74c68a23d088 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1793,7 +1793,7 @@ struct vfsmount *collect_mounts(const struct path *path)
return >mnt;
 }
 
-void drop_collected_mounts(struct vfsmount *mnt)
+void umount_on_fput(struct vfsmount *mnt)
 {
namespace_lock();
lock_mount_hash();
@@ -1803,6 +1803,15 @@ void drop_collected_mounts(struct vfsmount *mnt)
namespace_unlock();
 }
 
+void drop_collected_mounts(struct vfsmount *mnt)
+{
+   namespace_lock();
+   lock_mount_hash();
+   umount_tree(real_mount(mnt), UMOUNT_SYNC);
+   unlock_mount_hash();
+   namespace_unlock();
+}
+
 /**
  * clone_private_mount - create a private clone of a path
  *
@@ -2153,6 +2162,30 @@ static bool has_locked_children(struct mount *mnt, 
struct dentry *dentry)
return false;
 }
 
+static struct mount *__do_loopback(struct path *old_path, int recurse)
+{
+   struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt);
+
+   if (IS_MNT_UNBINDABLE(old))
+   return mnt;
+
+   if (!check_mnt(old) && old_path->dentry->d_op != _dentry_operations)
+   return mnt;
+
+   if (!recurse && has_locked_children(old, old_path->dentry))
+   return mnt;
+
+   if (recurse)
+   mnt = copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE);
+   else
+   mnt = clone_mnt(old, old_path->dentry, 0);
+
+   if (!IS_ERR(mnt))
+   mnt->mnt.mnt_flags &= ~MNT_LOCKED;
+
+   return mnt;
+}
+
 /*
  * do loopback mount.
  */
@@ -2160,7 +2193,7 @@ static int do_loopback(struct path *path, const char 
*old_name,
int recurse)
 {
struct path old_path;
-   struct mount *mnt = NULL, *old, *parent;
+   struct mount *mnt = NULL, *parent;
struct mountpoint *mp;
int err;
if (!old_name || !*old_name)
@@ -2174,38 +2207,21 @@ static int do_loopback(struct path *path, const char 
*old_name,
goto out;
 
mp = 

Re: [RESEND PATCH v3 1/2] sched/deadline: Add cpudl_maximum_dl() for clean-up

2018-05-31 Thread Byungchul Park




On 2018-05-25 14:13, Byungchul Park wrote:



On 2018-05-09 15:33, Byungchul Park wrote:

On Thu, Jan 11, 2018 at 10:07:16AM +0100, Peter Zijlstra wrote:



Sorry for the huge delay on this, but I'll have to postpone further.
Still busy with meltdown/spectre stuff.


Please consider this. Even though it's not a big bug, anyway leading
mis-behavior in certain situaions.


Could you see this patches, it's been too long since the start tho?


Please, any opinion.

--
Thanks,
Byungchul


Re: [RESEND PATCH v3 1/2] sched/deadline: Add cpudl_maximum_dl() for clean-up

2018-05-31 Thread Byungchul Park




On 2018-05-25 14:13, Byungchul Park wrote:



On 2018-05-09 15:33, Byungchul Park wrote:

On Thu, Jan 11, 2018 at 10:07:16AM +0100, Peter Zijlstra wrote:



Sorry for the huge delay on this, but I'll have to postpone further.
Still busy with meltdown/spectre stuff.


Please consider this. Even though it's not a big bug, anyway leading
mis-behavior in certain situaions.


Could you see this patches, it's been too long since the start tho?


Please, any opinion.

--
Thanks,
Byungchul


[PATCH v3] PCI: mediatek: Add system pm support for MT2712

2018-05-31 Thread honghui.zhang
From: Honghui Zhang 

The MTCMOS of PCIe Host for MT2712 will be off when system suspend, and all
the internal control register will be reset after system resume. The PCIe
link should be re-established and the related control register values
should be re-set after system resume.

Signed-off-by: Honghui Zhang 
CC: Ryder Lee 
---
 drivers/pci/host/pcie-mediatek.c | 60 
 1 file changed, 60 insertions(+)

diff --git a/drivers/pci/host/pcie-mediatek.c b/drivers/pci/host/pcie-mediatek.c
index dabf1086..5363cc7 100644
--- a/drivers/pci/host/pcie-mediatek.c
+++ b/drivers/pci/host/pcie-mediatek.c
@@ -132,12 +132,14 @@ struct mtk_pcie_port;
 /**
  * struct mtk_pcie_soc - differentiate between host generations
  * @need_fix_class_id: whether this host's class ID needed to be fixed or not
+ * @pm_support: whether the host's MTCMOS will be off when suspend
  * @ops: pointer to configuration access functions
  * @startup: pointer to controller setting functions
  * @setup_irq: pointer to initialize IRQ functions
  */
 struct mtk_pcie_soc {
bool need_fix_class_id;
+   bool pm_support;
struct pci_ops *ops;
int (*startup)(struct mtk_pcie_port *port);
int (*setup_irq)(struct mtk_pcie_port *port, struct device_node *node);
@@ -1179,12 +1181,69 @@ static int mtk_pcie_probe(struct platform_device *pdev)
return err;
 }
 
+#ifdef CONFIG_PM_SLEEP
+static int mtk_pcie_suspend_noirq(struct device *dev)
+{
+   struct mtk_pcie *pcie = dev_get_drvdata(dev);
+   const struct mtk_pcie_soc *soc = pcie->soc;
+   struct mtk_pcie_port *port;
+
+   if (!soc->pm_support)
+   return 0;
+
+   list_for_each_entry(port, >ports, list) {
+   clk_disable_unprepare(port->ahb_ck);
+   clk_disable_unprepare(port->sys_ck);
+   phy_power_off(port->phy);
+   }
+
+   return 0;
+}
+
+static int mtk_pcie_resume_noirq(struct device *dev)
+{
+   struct mtk_pcie *pcie = dev_get_drvdata(dev);
+   const struct mtk_pcie_soc *soc = pcie->soc;
+   struct mtk_pcie_port *port;
+   int ret;
+
+   if (!soc->pm_support)
+   return 0;
+
+   list_for_each_entry(port, >ports, list) {
+   phy_power_on(port->phy);
+   clk_prepare_enable(port->sys_ck);
+   clk_prepare_enable(port->ahb_ck);
+
+   ret = soc->startup(port);
+   if (ret) {
+   dev_err(dev, "Port%d link down\n", port->slot);
+   phy_power_off(port->phy);
+   clk_disable_unprepare(port->sys_ck);
+   clk_disable_unprepare(port->ahb_ck);
+   return ret;
+   }
+
+   if (IS_ENABLED(CONFIG_PCI_MSI))
+   mtk_pcie_enable_msi(port);
+   }
+
+   return 0;
+}
+#endif
+
+static const struct dev_pm_ops mtk_pcie_pm_ops = {
+   SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(mtk_pcie_suspend_noirq,
+ mtk_pcie_resume_noirq)
+};
+
 static const struct mtk_pcie_soc mtk_pcie_soc_v1 = {
.ops = _pcie_ops,
.startup = mtk_pcie_startup_port,
 };
 
 static const struct mtk_pcie_soc mtk_pcie_soc_mt2712 = {
+   .pm_support = true,
.ops = _pcie_ops_v2,
.startup = mtk_pcie_startup_port_v2,
.setup_irq = mtk_pcie_setup_irq,
@@ -1211,6 +1270,7 @@ static struct platform_driver mtk_pcie_driver = {
.name = "mtk-pcie",
.of_match_table = mtk_pcie_ids,
.suppress_bind_attrs = true,
+   .pm = _pcie_pm_ops,
},
 };
 builtin_platform_driver(mtk_pcie_driver);
-- 
2.6.4



[PATCH v3] PCI: mediatek: Add system pm support for MT2712

2018-05-31 Thread honghui.zhang
From: Honghui Zhang 

The MTCMOS of PCIe Host for MT2712 will be off when system suspend, and all
the internal control register will be reset after system resume. The PCIe
link should be re-established and the related control register values
should be re-set after system resume.

Signed-off-by: Honghui Zhang 
CC: Ryder Lee 
---
 drivers/pci/host/pcie-mediatek.c | 60 
 1 file changed, 60 insertions(+)

diff --git a/drivers/pci/host/pcie-mediatek.c b/drivers/pci/host/pcie-mediatek.c
index dabf1086..5363cc7 100644
--- a/drivers/pci/host/pcie-mediatek.c
+++ b/drivers/pci/host/pcie-mediatek.c
@@ -132,12 +132,14 @@ struct mtk_pcie_port;
 /**
  * struct mtk_pcie_soc - differentiate between host generations
  * @need_fix_class_id: whether this host's class ID needed to be fixed or not
+ * @pm_support: whether the host's MTCMOS will be off when suspend
  * @ops: pointer to configuration access functions
  * @startup: pointer to controller setting functions
  * @setup_irq: pointer to initialize IRQ functions
  */
 struct mtk_pcie_soc {
bool need_fix_class_id;
+   bool pm_support;
struct pci_ops *ops;
int (*startup)(struct mtk_pcie_port *port);
int (*setup_irq)(struct mtk_pcie_port *port, struct device_node *node);
@@ -1179,12 +1181,69 @@ static int mtk_pcie_probe(struct platform_device *pdev)
return err;
 }
 
+#ifdef CONFIG_PM_SLEEP
+static int mtk_pcie_suspend_noirq(struct device *dev)
+{
+   struct mtk_pcie *pcie = dev_get_drvdata(dev);
+   const struct mtk_pcie_soc *soc = pcie->soc;
+   struct mtk_pcie_port *port;
+
+   if (!soc->pm_support)
+   return 0;
+
+   list_for_each_entry(port, >ports, list) {
+   clk_disable_unprepare(port->ahb_ck);
+   clk_disable_unprepare(port->sys_ck);
+   phy_power_off(port->phy);
+   }
+
+   return 0;
+}
+
+static int mtk_pcie_resume_noirq(struct device *dev)
+{
+   struct mtk_pcie *pcie = dev_get_drvdata(dev);
+   const struct mtk_pcie_soc *soc = pcie->soc;
+   struct mtk_pcie_port *port;
+   int ret;
+
+   if (!soc->pm_support)
+   return 0;
+
+   list_for_each_entry(port, >ports, list) {
+   phy_power_on(port->phy);
+   clk_prepare_enable(port->sys_ck);
+   clk_prepare_enable(port->ahb_ck);
+
+   ret = soc->startup(port);
+   if (ret) {
+   dev_err(dev, "Port%d link down\n", port->slot);
+   phy_power_off(port->phy);
+   clk_disable_unprepare(port->sys_ck);
+   clk_disable_unprepare(port->ahb_ck);
+   return ret;
+   }
+
+   if (IS_ENABLED(CONFIG_PCI_MSI))
+   mtk_pcie_enable_msi(port);
+   }
+
+   return 0;
+}
+#endif
+
+static const struct dev_pm_ops mtk_pcie_pm_ops = {
+   SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(mtk_pcie_suspend_noirq,
+ mtk_pcie_resume_noirq)
+};
+
 static const struct mtk_pcie_soc mtk_pcie_soc_v1 = {
.ops = _pcie_ops,
.startup = mtk_pcie_startup_port,
 };
 
 static const struct mtk_pcie_soc mtk_pcie_soc_mt2712 = {
+   .pm_support = true,
.ops = _pcie_ops_v2,
.startup = mtk_pcie_startup_port_v2,
.setup_irq = mtk_pcie_setup_irq,
@@ -1211,6 +1270,7 @@ static struct platform_driver mtk_pcie_driver = {
.name = "mtk-pcie",
.of_match_table = mtk_pcie_ids,
.suppress_bind_attrs = true,
+   .pm = _pcie_pm_ops,
},
 };
 builtin_platform_driver(mtk_pcie_driver);
-- 
2.6.4



Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-05-31 Thread Martin K. Petersen


Mike,

> 1) container A is tasked with managing some dedicated NVMe technology
> that absolutely needs native NVMe multipath.

> 2) container B is tasked with offering some canned layered product
> that was developed ontop of dm-multipath with its own multipath-tools
> oriented APIs, etc. And it is to manage some other NVMe technology on
> the same host as container A.

This assumes there is something to manage. And that the administrative
model currently employed by DM multipath will be easily applicable to
ANA devices. I don't believe that's the case. The configuration happens
on the storage side, not on the host.

With ALUA (and the proprietary implementations that predated the spec),
it was very fuzzy whether it was the host or the target that owned
responsibility for this or that. Part of the reason was that ALUA was
deliberately vague to accommodate everybody's existing, non-standards
compliant multipath storage implementations.

With ANA the heavy burden falls entirely on the storage. Most of the
things you would currently configure in multipath.conf have no meaning
in the context of ANA. Things that are currently the domain of
dm-multipath or multipathd are inextricably living either in the storage
device or in the NVMe ANA "device handler". And I think you are
significantly underestimating the effort required to expose that
information up the stack and to make use of it. That's not just a
multipath personality toggle switch.

If you want to make multipath -ll show something meaningful for ANA
devices, then by all means go ahead. I don't have any problem with
that. But I don't think the burden of allowing multipathd/DM to inject
themselves into the path transition state machine has any benefit
whatsoever to the user. It's only complicating things and therefore we'd
be doing people a disservice rather than a favor.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 0/3] Provide more fine grained control over multipathing

2018-05-31 Thread Martin K. Petersen


Mike,

> 1) container A is tasked with managing some dedicated NVMe technology
> that absolutely needs native NVMe multipath.

> 2) container B is tasked with offering some canned layered product
> that was developed ontop of dm-multipath with its own multipath-tools
> oriented APIs, etc. And it is to manage some other NVMe technology on
> the same host as container A.

This assumes there is something to manage. And that the administrative
model currently employed by DM multipath will be easily applicable to
ANA devices. I don't believe that's the case. The configuration happens
on the storage side, not on the host.

With ALUA (and the proprietary implementations that predated the spec),
it was very fuzzy whether it was the host or the target that owned
responsibility for this or that. Part of the reason was that ALUA was
deliberately vague to accommodate everybody's existing, non-standards
compliant multipath storage implementations.

With ANA the heavy burden falls entirely on the storage. Most of the
things you would currently configure in multipath.conf have no meaning
in the context of ANA. Things that are currently the domain of
dm-multipath or multipathd are inextricably living either in the storage
device or in the NVMe ANA "device handler". And I think you are
significantly underestimating the effort required to expose that
information up the stack and to make use of it. That's not just a
multipath personality toggle switch.

If you want to make multipath -ll show something meaningful for ANA
devices, then by all means go ahead. I don't have any problem with
that. But I don't think the burden of allowing multipathd/DM to inject
themselves into the path transition state machine has any benefit
whatsoever to the user. It's only complicating things and therefore we'd
be doing people a disservice rather than a favor.

-- 
Martin K. Petersen  Oracle Linux Engineering


4.14.44: BUG_ON(!list_empty(>wait_list));

2018-05-31 Thread Daniel J Blueman
Plugging in a USB-C power source on my Dell XPS 9550 trips an ACPI
BUG_ON [1], reproducible with mainline 4.14.44, suggesting other
threads are waiting for semaphore acquisition due to
"BUG_ON(!list_empty(>wait_list))".

This is the current 1.7.0 BIOS with Ubuntu 18.04 userspace, plugging
in an LG 27UD88 (also with the current firmware) monitor USB-C
connection which apparently advertises 60W charging (x1,
PowerDelivery, DisplayPort alternative mode, data). The same issues
reproduce on a Dell Precision 5510 with Ubuntu 16.04, the shipped
kernel and 4.14.44.

I can enable ACPI debugging if useful? Perhaps ACPI_DB_MUTEX or other
levels would be appropriate?

Thanks,
  Daniel

-- [1]

kernel BUG at /home/kernel/COD/linux/drivers/acpi/osl.c:1201
invalid opcode:  [#1] SMP PTI
Modules linked in: [...]
CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.14.44-041444-generic
#201805251612
Hardware name: Dell Inc. XPS 15 9550/, BIOS 1.7.0 02/23/2018
task: 9bc2ab6b9740 task.stack: bOca80034000
RIP: 0010:acpi_os_delete_semaphore+0x6d/0x70
RSP: 0018:bOca80037be8 EFLAGS: 00010283
RAX: bOca83f8fc40 RBX: 9bc238b5dbe0 RCX: 
RDX: 9bc238b5dbe8 RSI:  RDI: 9bc238b5dbe0
RBP: 9bc2adlc0990 ROB: 9bc2bdc25f20 R09: 9bc29ee56300
R10: e03bd2796440 R11: 9bc2ad183fa0 R12: 9bc22f1321e0
R13: 0001 R14: 0001 R15: 9bc22f132eb0
FS: 7fc03886f940() GS:9bc2bdc0() knlGS:
CS: 0010 DS:  ES:  CRO: 80050033
CR2: 7ffc645e70f8 CR3: 00049e120001 CR4: 003606f0
Call Trace:
acpi_ex_system_reset_event+0x3f/0x65
acpi_ex_opcode_1A_OT_0R+0x70/0xfa
acpi_ds_exec_end_op+0x15d/0x71b
acpi_ps_parse_loop+0x929/0x9d6
? acpi_ds_result_push+0x82/0x1d2
acpi_ps_parse_aml+0x1a2/0x4af
acpi_ps_execute_method+0x1ef/0x2ab
acpi_ns_evaluate+0x2e4/0x41d
acpi_evaluate_object+0x1cb/0x38e
acpi_enter_sleep_state_prep+0xae/0x13a
acpi_sleep_prepare.part.2+0x2e/0x40
acpi_power_off_prepare+0xf/0x20
[38871.1925361 kernel_power_off+0x42/0x70
SYSC_reboot+0x12f/0x210
? handle_mm_fault+0xea/0x1e0
[38871.1925861 ? do_writev+0x5e/0xf0
? do_writev+0x5e/0xf0
do_syscall_64+0x6e/0x120
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fc03839b373
RSP: 002b:7ffc645e70f8 EFLAGS: 0202 ORIG_RAX: 00a9
RAX: ffda RBX: 4321fedc RCX: 7fc03839b373
ROX: 4321fedc RSI: 28121969 RDI: fee1dead
RBP: 7ffc645e7160 R08:  R09: 
R10: 0002 R11: 0202 R12: 7ffc645e7168
R13:  R14: 001b0004 R15: 7ffc645e7458
Code: b8 00 04 00 00 48 c7 c1 c3 91 28 ab 48 c7 c2 20 91 28 ab be of
04 00 00 bf 00 00 00 01 03 41 85 04 00 58 eb b0 b8 01 10 00 00 c3 
Ob 90 Of if 44 00 00 80 3d 74 CO 97 01 00 41 54 55 53 Of 84
RIP: acpi_os_delete_semaphore+0x6d/0x70 RSP: b0ca80037be8
-- 
Daniel J Blueman


4.14.44: BUG_ON(!list_empty(>wait_list));

2018-05-31 Thread Daniel J Blueman
Plugging in a USB-C power source on my Dell XPS 9550 trips an ACPI
BUG_ON [1], reproducible with mainline 4.14.44, suggesting other
threads are waiting for semaphore acquisition due to
"BUG_ON(!list_empty(>wait_list))".

This is the current 1.7.0 BIOS with Ubuntu 18.04 userspace, plugging
in an LG 27UD88 (also with the current firmware) monitor USB-C
connection which apparently advertises 60W charging (x1,
PowerDelivery, DisplayPort alternative mode, data). The same issues
reproduce on a Dell Precision 5510 with Ubuntu 16.04, the shipped
kernel and 4.14.44.

I can enable ACPI debugging if useful? Perhaps ACPI_DB_MUTEX or other
levels would be appropriate?

Thanks,
  Daniel

-- [1]

kernel BUG at /home/kernel/COD/linux/drivers/acpi/osl.c:1201
invalid opcode:  [#1] SMP PTI
Modules linked in: [...]
CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.14.44-041444-generic
#201805251612
Hardware name: Dell Inc. XPS 15 9550/, BIOS 1.7.0 02/23/2018
task: 9bc2ab6b9740 task.stack: bOca80034000
RIP: 0010:acpi_os_delete_semaphore+0x6d/0x70
RSP: 0018:bOca80037be8 EFLAGS: 00010283
RAX: bOca83f8fc40 RBX: 9bc238b5dbe0 RCX: 
RDX: 9bc238b5dbe8 RSI:  RDI: 9bc238b5dbe0
RBP: 9bc2adlc0990 ROB: 9bc2bdc25f20 R09: 9bc29ee56300
R10: e03bd2796440 R11: 9bc2ad183fa0 R12: 9bc22f1321e0
R13: 0001 R14: 0001 R15: 9bc22f132eb0
FS: 7fc03886f940() GS:9bc2bdc0() knlGS:
CS: 0010 DS:  ES:  CRO: 80050033
CR2: 7ffc645e70f8 CR3: 00049e120001 CR4: 003606f0
Call Trace:
acpi_ex_system_reset_event+0x3f/0x65
acpi_ex_opcode_1A_OT_0R+0x70/0xfa
acpi_ds_exec_end_op+0x15d/0x71b
acpi_ps_parse_loop+0x929/0x9d6
? acpi_ds_result_push+0x82/0x1d2
acpi_ps_parse_aml+0x1a2/0x4af
acpi_ps_execute_method+0x1ef/0x2ab
acpi_ns_evaluate+0x2e4/0x41d
acpi_evaluate_object+0x1cb/0x38e
acpi_enter_sleep_state_prep+0xae/0x13a
acpi_sleep_prepare.part.2+0x2e/0x40
acpi_power_off_prepare+0xf/0x20
[38871.1925361 kernel_power_off+0x42/0x70
SYSC_reboot+0x12f/0x210
? handle_mm_fault+0xea/0x1e0
[38871.1925861 ? do_writev+0x5e/0xf0
? do_writev+0x5e/0xf0
do_syscall_64+0x6e/0x120
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fc03839b373
RSP: 002b:7ffc645e70f8 EFLAGS: 0202 ORIG_RAX: 00a9
RAX: ffda RBX: 4321fedc RCX: 7fc03839b373
ROX: 4321fedc RSI: 28121969 RDI: fee1dead
RBP: 7ffc645e7160 R08:  R09: 
R10: 0002 R11: 0202 R12: 7ffc645e7168
R13:  R14: 001b0004 R15: 7ffc645e7458
Code: b8 00 04 00 00 48 c7 c1 c3 91 28 ab 48 c7 c2 20 91 28 ab be of
04 00 00 bf 00 00 00 01 03 41 85 04 00 58 eb b0 b8 01 10 00 00 c3 
Ob 90 Of if 44 00 00 80 3d 74 CO 97 01 00 41 54 55 53 Of 84
RIP: acpi_os_delete_semaphore+0x6d/0x70 RSP: b0ca80037be8
-- 
Daniel J Blueman


Re: [PATCH v3 2/5] gpio: syscon: rockchip: add GPIO_MUTE support for rk3328

2018-05-31 Thread Levin

Hi Rob,

On 2018-05-31 10:45 PM, Rob Herring wrote:

On Wed, May 30, 2018 at 10:27 PM,   wrote:

From: Levin Du 

In Rockchip RK3328, the output only GPIO_MUTE pin, originally for codec
mute control, can also be used for general purpose. It is manipulated by
the GRF_SOC_CON10 register.

Signed-off-by: Levin Du 

---

Changes in v3:
- Change from general gpio-syscon to specific rk3328-gpio-mute

Changes in v2:
- Rename gpio_syscon10 to gpio_mute in doc

Changes in v1:
- Refactured for general gpio-syscon usage for Rockchip SoCs.
- Add doc rockchip,gpio-syscon.txt

  .../bindings/gpio/rockchip,rk3328-gpio-mute.txt| 28 +++
  drivers/gpio/gpio-syscon.c | 31 ++
  2 files changed, 59 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/gpio/rockchip,rk3328-gpio-mute.txt

diff --git 
a/Documentation/devicetree/bindings/gpio/rockchip,rk3328-gpio-mute.txt 
b/Documentation/devicetree/bindings/gpio/rockchip,rk3328-gpio-mute.txt
new file mode 100644
index 000..10bc632
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/rockchip,rk3328-gpio-mute.txt
@@ -0,0 +1,28 @@
+Rockchip RK3328 GPIO controller dedicated for the GPIO_MUTE pin.
+
+In Rockchip RK3328, the output only GPIO_MUTE pin, originally for codec mute
+control, can also be used for general purpose. It is manipulated by the
+GRF_SOC_CON10 register.
+
+Required properties:
+- compatible: Should contain "rockchip,rk3328-gpio-mute".
+- gpio-controller: Marks the device node as a gpio controller.
+- #gpio-cells: Should be 2. The first cell is the pin number and
+  the second cell is used to specify the gpio polarity:
+0 = Active high,
+1 = Active low.
+
+Example:
+
+   grf: syscon@ff10 {
+   compatible = "rockchip,rk3328-grf", "syscon", "simple-mfd";
+
+   gpio_mute: gpio-mute {

Node names should be generic:

gpio {

This also means you can't add another GPIO node in the future and
you'll have to live with "rockchip,rk3328-gpio-mute" covering more
than 1 GPIO if you do need to add more GPIOs.


As the first line describes, this GPIO controller is dedicated for the 
GPIO_MUTE pin.
There's only one GPIO pin in the GRF_SOC_CON10 register. Therefore the 
gpio_mute

name is proper IMHO.


+   compatible = "rockchip,rk3328-gpio-mute";
+   gpio-controller;
+   #gpio-cells = <2>;
+   };
+   };
+
+Note: The gpio_mute node should be declared as the child of the GRF (General
+Register File) node. The GPIO_MUTE pin is referred to as <_mute 0>.

This is wrong because you should have 2 cells. The phandle doesn't
count as a cell.

Rob


Thanks for pointing that out. So it should be:

   The GPIO_MUTE pin is referred to as <_mute 0 POLARITY>.


Thanks,
Levin




Re: [PATCH v3 2/5] gpio: syscon: rockchip: add GPIO_MUTE support for rk3328

2018-05-31 Thread Levin

Hi Rob,

On 2018-05-31 10:45 PM, Rob Herring wrote:

On Wed, May 30, 2018 at 10:27 PM,   wrote:

From: Levin Du 

In Rockchip RK3328, the output only GPIO_MUTE pin, originally for codec
mute control, can also be used for general purpose. It is manipulated by
the GRF_SOC_CON10 register.

Signed-off-by: Levin Du 

---

Changes in v3:
- Change from general gpio-syscon to specific rk3328-gpio-mute

Changes in v2:
- Rename gpio_syscon10 to gpio_mute in doc

Changes in v1:
- Refactured for general gpio-syscon usage for Rockchip SoCs.
- Add doc rockchip,gpio-syscon.txt

  .../bindings/gpio/rockchip,rk3328-gpio-mute.txt| 28 +++
  drivers/gpio/gpio-syscon.c | 31 ++
  2 files changed, 59 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/gpio/rockchip,rk3328-gpio-mute.txt

diff --git 
a/Documentation/devicetree/bindings/gpio/rockchip,rk3328-gpio-mute.txt 
b/Documentation/devicetree/bindings/gpio/rockchip,rk3328-gpio-mute.txt
new file mode 100644
index 000..10bc632
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/rockchip,rk3328-gpio-mute.txt
@@ -0,0 +1,28 @@
+Rockchip RK3328 GPIO controller dedicated for the GPIO_MUTE pin.
+
+In Rockchip RK3328, the output only GPIO_MUTE pin, originally for codec mute
+control, can also be used for general purpose. It is manipulated by the
+GRF_SOC_CON10 register.
+
+Required properties:
+- compatible: Should contain "rockchip,rk3328-gpio-mute".
+- gpio-controller: Marks the device node as a gpio controller.
+- #gpio-cells: Should be 2. The first cell is the pin number and
+  the second cell is used to specify the gpio polarity:
+0 = Active high,
+1 = Active low.
+
+Example:
+
+   grf: syscon@ff10 {
+   compatible = "rockchip,rk3328-grf", "syscon", "simple-mfd";
+
+   gpio_mute: gpio-mute {

Node names should be generic:

gpio {

This also means you can't add another GPIO node in the future and
you'll have to live with "rockchip,rk3328-gpio-mute" covering more
than 1 GPIO if you do need to add more GPIOs.


As the first line describes, this GPIO controller is dedicated for the 
GPIO_MUTE pin.
There's only one GPIO pin in the GRF_SOC_CON10 register. Therefore the 
gpio_mute

name is proper IMHO.


+   compatible = "rockchip,rk3328-gpio-mute";
+   gpio-controller;
+   #gpio-cells = <2>;
+   };
+   };
+
+Note: The gpio_mute node should be declared as the child of the GRF (General
+Register File) node. The GPIO_MUTE pin is referred to as <_mute 0>.

This is wrong because you should have 2 cells. The phandle doesn't
count as a cell.

Rob


Thanks for pointing that out. So it should be:

   The GPIO_MUTE pin is referred to as <_mute 0 POLARITY>.


Thanks,
Levin




[PATCH] rcu: Check the range of jiffies_till_{first,next}_fqs when setting them

2018-05-31 Thread Byungchul Park
Currently, the range of jiffies_till_{first,next}_fqs are checked and
adjusted on and on in the loop of rcu_gp_kthread on runtime.

However, it's enough to check them only when setting them, not every
time in the loop. So make them handled on a setting time via sysfs.

Signed-off-by: Byungchul Park 
---
 kernel/rcu/tree.c | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4e96761..eb54d7d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -518,8 +518,38 @@ void rcu_all_qs(void)
 static ulong jiffies_till_next_fqs = ULONG_MAX;
 static bool rcu_kick_kthreads;
 
-module_param(jiffies_till_first_fqs, ulong, 0644);
-module_param(jiffies_till_next_fqs, ulong, 0644);
+static int param_set_first_fqs_jiffies(const char *val, const struct 
kernel_param *kp)
+{
+   ulong j;
+   int ret = kstrtoul(val, 0, );
+
+   if (!ret)
+   WRITE_ONCE(*(ulong *)kp->arg, (j > HZ) ? HZ : j);
+   return ret;
+}
+
+static int param_set_next_fqs_jiffies(const char *val, const struct 
kernel_param *kp)
+{
+   ulong j;
+   int ret = kstrtoul(val, 0, );
+
+   if (!ret)
+   WRITE_ONCE(*(ulong *)kp->arg, (j > HZ) ? HZ : (j ?: 1));
+   return ret;
+}
+
+static struct kernel_param_ops first_fqs_jiffies_ops = {
+   .set = param_set_first_fqs_jiffies,
+   .get = param_get_ulong,
+};
+
+static struct kernel_param_ops next_fqs_jiffies_ops = {
+   .set = param_set_next_fqs_jiffies,
+   .get = param_get_ulong,
+};
+
+module_param_cb(jiffies_till_first_fqs, _fqs_jiffies_ops, 
_till_first_fqs, 0644);
+module_param_cb(jiffies_till_next_fqs, _fqs_jiffies_ops, 
_till_next_fqs, 0644);
 module_param(rcu_kick_kthreads, bool, 0644);
 
 /*
@@ -2129,10 +2159,6 @@ static int __noreturn rcu_gp_kthread(void *arg)
/* Handle quiescent-state forcing. */
first_gp_fqs = true;
j = jiffies_till_first_fqs;
-   if (j > HZ) {
-   j = HZ;
-   jiffies_till_first_fqs = HZ;
-   }
ret = 0;
for (;;) {
if (!ret) {
@@ -2167,13 +2193,6 @@ static int __noreturn rcu_gp_kthread(void *arg)
WRITE_ONCE(rsp->gp_activity, jiffies);
ret = 0; /* Force full wait till next FQS. */
j = jiffies_till_next_fqs;
-   if (j > HZ) {
-   j = HZ;
-   jiffies_till_next_fqs = HZ;
-   } else if (j < 1) {
-   j = 1;
-   jiffies_till_next_fqs = 1;
-   }
} else {
/* Deal with stray signal. */
cond_resched_tasks_rcu_qs();
-- 
1.9.1



Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks

2018-05-31 Thread Qing Huang




On 5/31/2018 2:10 AM, Michal Hocko wrote:

On Thu 31-05-18 10:55:32, Michal Hocko wrote:

On Thu 31-05-18 04:35:31, Eric Dumazet wrote:

[...]

I merely copied/pasted from alloc_skb_with_frags() :/

I will have a look at it. Thanks!

OK, so this is an example of an incremental development ;).

__GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
high order allocations") to prevent from OOM killer. Yet this was
not enough because fb05e7a89f50 ("net: don't wait for order-3 page
allocation") didn't want an excessive reclaim for non-costly orders
so it made it completely NOWAIT while it preserved __GFP_NORETRY in
place which is now redundant. Should I send a patch?



Just curious, how about GFP_ATOMIC flag? Would it work in a similar 
fashion? We experimented
with it a bit in the past but it seemed to cause other issue in our 
tests. :-)


By the way, we didn't encounter any OOM killer events. It seemed that 
the mlx4_alloc_icm() triggered slowpath.

We still had about 2GB free memory while it was highly fragmented.

 #0 [8801f308b380] remove_migration_pte at 811f0e0b
 #1 [8801f308b3e0] rmap_walk_file at 811cb890
 #2 [8801f308b440] rmap_walk at 811cbaf2
 #3 [8801f308b450] remove_migration_ptes at 811f0db0
 #4 [8801f308b490] __unmap_and_move at 811f2ea6
 #5 [8801f308b4e0] unmap_and_move at 811f2fc5
 #6 [8801f308b540] migrate_pages at 811f3219
 #7 [8801f308b5c0] compact_zone at 811b707e
 #8 [8801f308b650] compact_zone_order at 811b735d
 #9 [8801f308b6e0] try_to_compact_pages at 811b7485
#10 [8801f308b770] __alloc_pages_direct_compact at 81195f96
#11 [8801f308b7b0] __alloc_pages_slowpath at 811978a1
#12 [8801f308b890] __alloc_pages_nodemask at 81197ec1
#13 [8801f308b970] alloc_pages_current at 811e261f
#14 [8801f308b9e0] mlx4_alloc_icm at a01f39b2 [mlx4_core]

Thanks!


[PATCH] rcu: Check the range of jiffies_till_{first,next}_fqs when setting them

2018-05-31 Thread Byungchul Park
Currently, the range of jiffies_till_{first,next}_fqs are checked and
adjusted on and on in the loop of rcu_gp_kthread on runtime.

However, it's enough to check them only when setting them, not every
time in the loop. So make them handled on a setting time via sysfs.

Signed-off-by: Byungchul Park 
---
 kernel/rcu/tree.c | 45 -
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4e96761..eb54d7d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -518,8 +518,38 @@ void rcu_all_qs(void)
 static ulong jiffies_till_next_fqs = ULONG_MAX;
 static bool rcu_kick_kthreads;
 
-module_param(jiffies_till_first_fqs, ulong, 0644);
-module_param(jiffies_till_next_fqs, ulong, 0644);
+static int param_set_first_fqs_jiffies(const char *val, const struct 
kernel_param *kp)
+{
+   ulong j;
+   int ret = kstrtoul(val, 0, );
+
+   if (!ret)
+   WRITE_ONCE(*(ulong *)kp->arg, (j > HZ) ? HZ : j);
+   return ret;
+}
+
+static int param_set_next_fqs_jiffies(const char *val, const struct 
kernel_param *kp)
+{
+   ulong j;
+   int ret = kstrtoul(val, 0, );
+
+   if (!ret)
+   WRITE_ONCE(*(ulong *)kp->arg, (j > HZ) ? HZ : (j ?: 1));
+   return ret;
+}
+
+static struct kernel_param_ops first_fqs_jiffies_ops = {
+   .set = param_set_first_fqs_jiffies,
+   .get = param_get_ulong,
+};
+
+static struct kernel_param_ops next_fqs_jiffies_ops = {
+   .set = param_set_next_fqs_jiffies,
+   .get = param_get_ulong,
+};
+
+module_param_cb(jiffies_till_first_fqs, _fqs_jiffies_ops, 
_till_first_fqs, 0644);
+module_param_cb(jiffies_till_next_fqs, _fqs_jiffies_ops, 
_till_next_fqs, 0644);
 module_param(rcu_kick_kthreads, bool, 0644);
 
 /*
@@ -2129,10 +2159,6 @@ static int __noreturn rcu_gp_kthread(void *arg)
/* Handle quiescent-state forcing. */
first_gp_fqs = true;
j = jiffies_till_first_fqs;
-   if (j > HZ) {
-   j = HZ;
-   jiffies_till_first_fqs = HZ;
-   }
ret = 0;
for (;;) {
if (!ret) {
@@ -2167,13 +2193,6 @@ static int __noreturn rcu_gp_kthread(void *arg)
WRITE_ONCE(rsp->gp_activity, jiffies);
ret = 0; /* Force full wait till next FQS. */
j = jiffies_till_next_fqs;
-   if (j > HZ) {
-   j = HZ;
-   jiffies_till_next_fqs = HZ;
-   } else if (j < 1) {
-   j = 1;
-   jiffies_till_next_fqs = 1;
-   }
} else {
/* Deal with stray signal. */
cond_resched_tasks_rcu_qs();
-- 
1.9.1



Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks

2018-05-31 Thread Qing Huang




On 5/31/2018 2:10 AM, Michal Hocko wrote:

On Thu 31-05-18 10:55:32, Michal Hocko wrote:

On Thu 31-05-18 04:35:31, Eric Dumazet wrote:

[...]

I merely copied/pasted from alloc_skb_with_frags() :/

I will have a look at it. Thanks!

OK, so this is an example of an incremental development ;).

__GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
high order allocations") to prevent from OOM killer. Yet this was
not enough because fb05e7a89f50 ("net: don't wait for order-3 page
allocation") didn't want an excessive reclaim for non-costly orders
so it made it completely NOWAIT while it preserved __GFP_NORETRY in
place which is now redundant. Should I send a patch?



Just curious, how about GFP_ATOMIC flag? Would it work in a similar 
fashion? We experimented
with it a bit in the past but it seemed to cause other issue in our 
tests. :-)


By the way, we didn't encounter any OOM killer events. It seemed that 
the mlx4_alloc_icm() triggered slowpath.

We still had about 2GB free memory while it was highly fragmented.

 #0 [8801f308b380] remove_migration_pte at 811f0e0b
 #1 [8801f308b3e0] rmap_walk_file at 811cb890
 #2 [8801f308b440] rmap_walk at 811cbaf2
 #3 [8801f308b450] remove_migration_ptes at 811f0db0
 #4 [8801f308b490] __unmap_and_move at 811f2ea6
 #5 [8801f308b4e0] unmap_and_move at 811f2fc5
 #6 [8801f308b540] migrate_pages at 811f3219
 #7 [8801f308b5c0] compact_zone at 811b707e
 #8 [8801f308b650] compact_zone_order at 811b735d
 #9 [8801f308b6e0] try_to_compact_pages at 811b7485
#10 [8801f308b770] __alloc_pages_direct_compact at 81195f96
#11 [8801f308b7b0] __alloc_pages_slowpath at 811978a1
#12 [8801f308b890] __alloc_pages_nodemask at 81197ec1
#13 [8801f308b970] alloc_pages_current at 811e261f
#14 [8801f308b9e0] mlx4_alloc_icm at a01f39b2 [mlx4_core]

Thanks!


[PATCH] clk: vc5: Avoid divide by zero when rounding or setting rates

2018-05-31 Thread Steve Longerbeam
Add checks in the .round_rate and .set_rate ops for zero requested
rate or zero parent rate. If either are zero in .round_rate, just
return zero. If either are zero in .set_rate, return -EINVAL.

Signed-off-by: Steve Longerbeam 
---
 drivers/clk/clk-versaclock5.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/clk/clk-versaclock5.c b/drivers/clk/clk-versaclock5.c
index decffb3..5524e5d 100644
--- a/drivers/clk/clk-versaclock5.c
+++ b/drivers/clk/clk-versaclock5.c
@@ -304,6 +304,9 @@ static int vc5_dbl_set_rate(struct clk_hw *hw, unsigned 
long rate,
container_of(hw, struct vc5_driver_data, clk_mul);
u32 mask;
 
+   if (parent_rate == 0 || rate == 0)
+   return -EINVAL;
+
if ((parent_rate * 2) == rate)
mask = VC5_PRIM_SRC_SHDN_EN_DOUBLE_XTAL_FREQ;
else
@@ -349,6 +352,10 @@ static long vc5_pfd_round_rate(struct clk_hw *hw, unsigned 
long rate,
 {
unsigned long idiv;
 
+   /* avoid division by zero */
+   if (*parent_rate == 0 || rate == 0)
+   return 0;
+
/* PLL cannot operate with input clock above 50 MHz. */
if (rate > 5000)
return -EINVAL;
@@ -372,6 +379,10 @@ static int vc5_pfd_set_rate(struct clk_hw *hw, unsigned 
long rate,
unsigned long idiv;
u8 div;
 
+   /* avoid division by zero */
+   if (parent_rate == 0 || rate == 0)
+   return -EINVAL;
+
/* CLKIN within range of PLL input, feed directly to PLL. */
if (parent_rate <= 5000) {
regmap_update_bits(vc5->regmap, VC5_VCO_CTRL_AND_PREDIV,
@@ -429,6 +440,10 @@ static long vc5_pll_round_rate(struct clk_hw *hw, unsigned 
long rate,
u32 div_int;
u64 div_frc;
 
+   /* avoid division by zero */
+   if (*parent_rate == 0 || rate == 0)
+   return 0;
+
if (rate < VC5_PLL_VCO_MIN)
rate = VC5_PLL_VCO_MIN;
if (rate > VC5_PLL_VCO_MAX)
@@ -457,6 +472,9 @@ static int vc5_pll_set_rate(struct clk_hw *hw, unsigned 
long rate,
struct vc5_driver_data *vc5 = hwdata->vc5;
u8 fb[5];
 
+   if (parent_rate == 0 || rate == 0)
+   return -EINVAL;
+
fb[0] = hwdata->div_int >> 4;
fb[1] = hwdata->div_int << 4;
fb[2] = hwdata->div_frc >> 16;
@@ -509,6 +527,10 @@ static long vc5_fod_round_rate(struct clk_hw *hw, unsigned 
long rate,
u32 div_int;
u64 div_frc;
 
+   /* avoid division by zero */
+   if (*parent_rate == 0 || rate == 0)
+   return 0;
+
/* Determine integer part, which is 12 bit wide */
div_int = f_in / rate;
/*
@@ -546,6 +568,9 @@ static int vc5_fod_set_rate(struct clk_hw *hw, unsigned 
long rate,
0
};
 
+   if (parent_rate == 0 || rate == 0)
+   return -EINVAL;
+
regmap_bulk_write(vc5->regmap, VC5_OUT_DIV_FRAC(hwdata->num, 0),
  data, 14);
 
-- 
2.7.4



[PATCH] clk: vc5: Avoid divide by zero when rounding or setting rates

2018-05-31 Thread Steve Longerbeam
Add checks in the .round_rate and .set_rate ops for zero requested
rate or zero parent rate. If either are zero in .round_rate, just
return zero. If either are zero in .set_rate, return -EINVAL.

Signed-off-by: Steve Longerbeam 
---
 drivers/clk/clk-versaclock5.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/clk/clk-versaclock5.c b/drivers/clk/clk-versaclock5.c
index decffb3..5524e5d 100644
--- a/drivers/clk/clk-versaclock5.c
+++ b/drivers/clk/clk-versaclock5.c
@@ -304,6 +304,9 @@ static int vc5_dbl_set_rate(struct clk_hw *hw, unsigned 
long rate,
container_of(hw, struct vc5_driver_data, clk_mul);
u32 mask;
 
+   if (parent_rate == 0 || rate == 0)
+   return -EINVAL;
+
if ((parent_rate * 2) == rate)
mask = VC5_PRIM_SRC_SHDN_EN_DOUBLE_XTAL_FREQ;
else
@@ -349,6 +352,10 @@ static long vc5_pfd_round_rate(struct clk_hw *hw, unsigned 
long rate,
 {
unsigned long idiv;
 
+   /* avoid division by zero */
+   if (*parent_rate == 0 || rate == 0)
+   return 0;
+
/* PLL cannot operate with input clock above 50 MHz. */
if (rate > 5000)
return -EINVAL;
@@ -372,6 +379,10 @@ static int vc5_pfd_set_rate(struct clk_hw *hw, unsigned 
long rate,
unsigned long idiv;
u8 div;
 
+   /* avoid division by zero */
+   if (parent_rate == 0 || rate == 0)
+   return -EINVAL;
+
/* CLKIN within range of PLL input, feed directly to PLL. */
if (parent_rate <= 5000) {
regmap_update_bits(vc5->regmap, VC5_VCO_CTRL_AND_PREDIV,
@@ -429,6 +440,10 @@ static long vc5_pll_round_rate(struct clk_hw *hw, unsigned 
long rate,
u32 div_int;
u64 div_frc;
 
+   /* avoid division by zero */
+   if (*parent_rate == 0 || rate == 0)
+   return 0;
+
if (rate < VC5_PLL_VCO_MIN)
rate = VC5_PLL_VCO_MIN;
if (rate > VC5_PLL_VCO_MAX)
@@ -457,6 +472,9 @@ static int vc5_pll_set_rate(struct clk_hw *hw, unsigned 
long rate,
struct vc5_driver_data *vc5 = hwdata->vc5;
u8 fb[5];
 
+   if (parent_rate == 0 || rate == 0)
+   return -EINVAL;
+
fb[0] = hwdata->div_int >> 4;
fb[1] = hwdata->div_int << 4;
fb[2] = hwdata->div_frc >> 16;
@@ -509,6 +527,10 @@ static long vc5_fod_round_rate(struct clk_hw *hw, unsigned 
long rate,
u32 div_int;
u64 div_frc;
 
+   /* avoid division by zero */
+   if (*parent_rate == 0 || rate == 0)
+   return 0;
+
/* Determine integer part, which is 12 bit wide */
div_int = f_in / rate;
/*
@@ -546,6 +568,9 @@ static int vc5_fod_set_rate(struct clk_hw *hw, unsigned 
long rate,
0
};
 
+   if (parent_rate == 0 || rate == 0)
+   return -EINVAL;
+
regmap_bulk_write(vc5->regmap, VC5_OUT_DIV_FRAC(hwdata->num, 0),
  data, 14);
 
-- 
2.7.4



[PATCH] memcg: Replace mm->owner with mm->memcg

2018-05-31 Thread Eric W. Biederman


Recently Kiril Tkhai reported that mm_update_next_ownerwas executing
it's expensive for_each_process fallback.  Reading the code reveals
that all that is needed to trigger this is a multi-threaded process
where the thread group leader exits.  AKA something anyone can easily
trigger an any time.

Walking all of the processes in the system is quite expensive so
the current mm_update_next_owner needs to be killed.

To deal with this replace mm->owner with mm->memcg.  This removes
an unnecessary hop when dealing with mm and the memory control group.
As much as possible I have maintained the current semantics.  There
are two siginificant exceptions.  During fork the memcg of
the process calling fork is charged rather than init_css_set.  During
memory cgroup migration the charges are migrated not if the process
is the owner of the mm, but of the process being migrated has
the same memory cgroup as the mm.

I believe it was a bug if init_css_set is charged for memory activity
during fork, and the old behavior was simply a consequence of the new
task not having tsk->cgroup not initialized to it's proper cgroup.

On a previous reading of mem_cgroup_can_attach I thought it limited
task migration between memory cgroups, but the only limit it applies
is are if the target memory cgroup can not be precharged with the
charges of the mm being moved.  Which means that today the memory
cgroup code is supporting processes that share an mm with CLONE_VM
being in different memory cgroups as well as threads of a
multi-threaded process being in different memory cgroups.

There is a minor difference with charge migration.  Instead of moving
the charges exactly when the mm->owner is moved, charges are now moved
when any thread group leader that uses the mm and shares the mm memcg
is moved.  It requires playing games with vfork or CLONE_VM to setup a
situation with multiple thread group leaders sharing the same mm.  As
vfork only lasts a moment and the old LinuxThreads library appears out
of use, I do not expect there to be many occasions for there even to
be a chance for the difference in charge migration to be visible.

To ensure that the mm->memcg is updated appropriate I have implemented
the cgroup "attach" and "fork" methods and a special
mm_sync_memcg_from_task function.  These methods and the new function
ensure that at task migration, after fork, and after exec the task
has the appropriate memory cgroup.

The need to call something in exec is unfortunate and likely indicates
a defect in the way control groups and exec interact.  As migration
during exec is expected to be rare this interaction can wait to be
improved.

For simplicity instead of introducing a new mm lock I simply use
exchange on the pointer where the mm->memcg is updated to get atomic
updates.

The way the memory cgroup of an mm is reassigned has changed
significantly.  Instead of mm_update_next_owner assiging a new owner
and potential changing the memcg of the mm when a task exits, the
memcg of an mm remains the same unless a task using the memcg migrates
or the memcg is brought offline.  If the memcg is brought offline (AKA
rmdir on the memory cgroups directory in cgroupfs) then the next call
of mm_cgroup_from_mm will reassign mm->memcg to the first ancestor
memory cgroup that is still online, and then return that memory cgroup.

By walking returning the first ancestor memory cgroup that is online
this ensures a process that has been delegated a memory cgroup subtree
can not escape from the delegation point.  As every living thread of a
process will remain in some memory cgroup at or below the delegation
point, it is guaranteed at least the delegation point will remain
online if the mm is in use.  Thus this does not provide a memory
cgroup escape.

Looking at the history effectively this change is a revert.  The
reason given for adding mm->owner is so that multiple cgroups can be
attached to the same mm.  In the last 8 years a second user of
mm->owner has not appeared.  A feature that has never used, makes the
code more complicated and has horrible worst case performance should
go.

Fixes: cf475ad28ac3 ("cgroups: add an owner to the mm_struct")
Reported-by: Kirill Tkhai 
Signed-off-by: "Eric W. Biederman" 
---

I believe this is a correct patch but folks please look it over and see
if I am missing something.

 fs/exec.c  |   3 +-
 include/linux/memcontrol.h |  16 --
 include/linux/mm_types.h   |  12 +
 include/linux/sched/mm.h   |   8 ---
 kernel/exit.c  |  89 ---
 kernel/fork.c  |  14 +++--
 mm/debug.c |   4 +-
 mm/memcontrol.c| 130 +
 8 files changed, 126 insertions(+), 150 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 183059c427b9..32461a1543fc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1040,11 +1040,12 @@ static int exec_mmap(struct mm_struct *mm)
up_read(_mm->mmap_sem);
  

[PATCH] memcg: Replace mm->owner with mm->memcg

2018-05-31 Thread Eric W. Biederman


Recently Kiril Tkhai reported that mm_update_next_ownerwas executing
it's expensive for_each_process fallback.  Reading the code reveals
that all that is needed to trigger this is a multi-threaded process
where the thread group leader exits.  AKA something anyone can easily
trigger an any time.

Walking all of the processes in the system is quite expensive so
the current mm_update_next_owner needs to be killed.

To deal with this replace mm->owner with mm->memcg.  This removes
an unnecessary hop when dealing with mm and the memory control group.
As much as possible I have maintained the current semantics.  There
are two siginificant exceptions.  During fork the memcg of
the process calling fork is charged rather than init_css_set.  During
memory cgroup migration the charges are migrated not if the process
is the owner of the mm, but of the process being migrated has
the same memory cgroup as the mm.

I believe it was a bug if init_css_set is charged for memory activity
during fork, and the old behavior was simply a consequence of the new
task not having tsk->cgroup not initialized to it's proper cgroup.

On a previous reading of mem_cgroup_can_attach I thought it limited
task migration between memory cgroups, but the only limit it applies
is are if the target memory cgroup can not be precharged with the
charges of the mm being moved.  Which means that today the memory
cgroup code is supporting processes that share an mm with CLONE_VM
being in different memory cgroups as well as threads of a
multi-threaded process being in different memory cgroups.

There is a minor difference with charge migration.  Instead of moving
the charges exactly when the mm->owner is moved, charges are now moved
when any thread group leader that uses the mm and shares the mm memcg
is moved.  It requires playing games with vfork or CLONE_VM to setup a
situation with multiple thread group leaders sharing the same mm.  As
vfork only lasts a moment and the old LinuxThreads library appears out
of use, I do not expect there to be many occasions for there even to
be a chance for the difference in charge migration to be visible.

To ensure that the mm->memcg is updated appropriate I have implemented
the cgroup "attach" and "fork" methods and a special
mm_sync_memcg_from_task function.  These methods and the new function
ensure that at task migration, after fork, and after exec the task
has the appropriate memory cgroup.

The need to call something in exec is unfortunate and likely indicates
a defect in the way control groups and exec interact.  As migration
during exec is expected to be rare this interaction can wait to be
improved.

For simplicity instead of introducing a new mm lock I simply use
exchange on the pointer where the mm->memcg is updated to get atomic
updates.

The way the memory cgroup of an mm is reassigned has changed
significantly.  Instead of mm_update_next_owner assiging a new owner
and potential changing the memcg of the mm when a task exits, the
memcg of an mm remains the same unless a task using the memcg migrates
or the memcg is brought offline.  If the memcg is brought offline (AKA
rmdir on the memory cgroups directory in cgroupfs) then the next call
of mm_cgroup_from_mm will reassign mm->memcg to the first ancestor
memory cgroup that is still online, and then return that memory cgroup.

By walking returning the first ancestor memory cgroup that is online
this ensures a process that has been delegated a memory cgroup subtree
can not escape from the delegation point.  As every living thread of a
process will remain in some memory cgroup at or below the delegation
point, it is guaranteed at least the delegation point will remain
online if the mm is in use.  Thus this does not provide a memory
cgroup escape.

Looking at the history effectively this change is a revert.  The
reason given for adding mm->owner is so that multiple cgroups can be
attached to the same mm.  In the last 8 years a second user of
mm->owner has not appeared.  A feature that has never used, makes the
code more complicated and has horrible worst case performance should
go.

Fixes: cf475ad28ac3 ("cgroups: add an owner to the mm_struct")
Reported-by: Kirill Tkhai 
Signed-off-by: "Eric W. Biederman" 
---

I believe this is a correct patch but folks please look it over and see
if I am missing something.

 fs/exec.c  |   3 +-
 include/linux/memcontrol.h |  16 --
 include/linux/mm_types.h   |  12 +
 include/linux/sched/mm.h   |   8 ---
 kernel/exit.c  |  89 ---
 kernel/fork.c  |  14 +++--
 mm/debug.c |   4 +-
 mm/memcontrol.c| 130 +
 8 files changed, 126 insertions(+), 150 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 183059c427b9..32461a1543fc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1040,11 +1040,12 @@ static int exec_mmap(struct mm_struct *mm)
up_read(_mm->mmap_sem);
  

Re: linux-next: build warnings after merge of the kbuild tree

2018-05-31 Thread Masahiro Yamada
2018-05-31 12:53 GMT+09:00 Kees Cook :
> On Wed, May 30, 2018 at 6:26 PM, Kees Cook  wrote:
>> On Wed, May 30, 2018 at 6:12 PM, Masahiro Yamada
>>  wrote:
>>> Hi.
>>> (+CC Kees)
>>>
>>> 2018-05-31 7:40 GMT+09:00 Stephen Rothwell :
 Hi Masahiro,

 After merging the kbuild tree, today's linux-next build (x86_64
 allmodconfig) produced these warnings:

 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
 'wlc_phy_workarounds_nphy_rev7':
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16563:1: 
 warning: the frame size of 3136 bytes is larger than 2048 bytes 
 [-Wframe-larger-than=]
  }
  ^
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
 'wlc_phy_workarounds_nphy_rev3':
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16905:1: 
 warning: the frame size of 2872 bytes is larger than 2048 bytes 
 [-Wframe-larger-than=]
  }
  ^
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
 'wlc_phy_cal_txiqlo_nphy':
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:26033:1: 
 warning: the frame size of 2432 bytes is larger than 2048 bytes 
 [-Wframe-larger-than=]
  }
  ^

 I have no idea what caused these warnings to appear ... nothing in those
 functions looks too bad.
>>>
>>>
>>> This has been triggered by the following commit:
>>>
>>>
>>> commit 0e461945f3504e09b8ecf947b6398adce1287a28
>>> Author: Masahiro Yamada 
>>> Date:   Mon May 28 18:22:07 2018 +0900
>>>
>>> gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST
>>>
>>>
>>>
>>> CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL was previously disabled
>>> for COMPILE_TEST, which is now enabled.
>>
>> Weird -- I do build tests with plugins enabled pretty regularly. I
>> hadn't seen this before. I'll see if I can figure out what the
>> combination is...
>
> Weirdly, I only see this after merging kbuild/for-next into
> next-20180530. (I don't get the warning if I just force the plugins
> on.)


I see this warning on Linus' tree as well
if the plugins are enabled.


Just remove "depends on !COMPILE_TEST", and try allmodconfig.






masahiro@grover:~/ref/linux$ git show --pretty=short
commit 0512e0134582ef85dee77d51aae77dcd1edec495
Merge: dd52cb8 829bc787
Author: Linus Torvalds 

Merge tag 'xfs-4.17-fixes-3' of
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

masahiro@grover:~/ref/linux$ git diff
diff --git a/arch/Kconfig b/arch/Kconfig
index 75dd23a..7d44cfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -410,7 +410,6 @@ config HAVE_GCC_PLUGINS
 menuconfig GCC_PLUGINS
bool "GCC plugins"
depends on HAVE_GCC_PLUGINS
-   depends on !COMPILE_TEST
help
  GCC plugins are loadable modules that provide extra features to the
  compiler. They are useful for runtime instrumentation and
static analysis.
masahiro@grover:~/ref/linux$ make allmodconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  YACCscripts/kconfig/zconf.tab.c
  LEX scripts/kconfig/zconf.lex.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --allmodconfig Kconfig
#
# configuration written to .config
#
masahiro@grover:~/ref/linux$ make
drivers/net/wireless/broadcom/brcm80211/brcmsmac/
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
  HYPERCALLS arch/x86/include/generated/asm/xen-hypercalls.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  HOSTCC  scripts/basic/bin2c
  HOSTCC  arch/x86/tools/relocs_32.o
  HOSTCC  arch/x86/tools/relocs_64.o
  HOSTCC  arch/x86/tools/relocs_common.o
  HOSTLD  arch/x86/tools/relocs
  CHK include/config/kernel.release
  UPD include/config/kernel.release
  WRAParch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAParch/x86/include/generated/uapi/asm/poll.h
  WRAParch/x86/include/generated/asm/dma-contiguous.h
  WRAParch/x86/include/generated/asm/early_ioremap.h
  WRAParch/x86/include/generated/asm/mcs_spinlock.h
  WRAParch/x86/include/generated/asm/mm-arch-hooks.h
  CHK include/generated/uapi/linux/version.h
  UPD include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  UPD include/generated/utsrelease.h
  CC  arch/x86/purgatory/purgatory.o
  AS  arch/x86/purgatory/stack.o
  AS  arch/x86/purgatory/setup-x86_64.o
  CC  arch/x86/purgatory/sha256.o
  AS  arch/x86/purgatory/entry64.o
  CC  arch/x86/purgatory/string.o
  LD  arch/x86/purgatory/purgatory.ro
  BIN2C   arch/x86/purgatory/kexec-purgatory.c
  HOSTCXX 

Re: linux-next: build warnings after merge of the kbuild tree

2018-05-31 Thread Masahiro Yamada
2018-05-31 12:53 GMT+09:00 Kees Cook :
> On Wed, May 30, 2018 at 6:26 PM, Kees Cook  wrote:
>> On Wed, May 30, 2018 at 6:12 PM, Masahiro Yamada
>>  wrote:
>>> Hi.
>>> (+CC Kees)
>>>
>>> 2018-05-31 7:40 GMT+09:00 Stephen Rothwell :
 Hi Masahiro,

 After merging the kbuild tree, today's linux-next build (x86_64
 allmodconfig) produced these warnings:

 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
 'wlc_phy_workarounds_nphy_rev7':
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16563:1: 
 warning: the frame size of 3136 bytes is larger than 2048 bytes 
 [-Wframe-larger-than=]
  }
  ^
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
 'wlc_phy_workarounds_nphy_rev3':
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16905:1: 
 warning: the frame size of 2872 bytes is larger than 2048 bytes 
 [-Wframe-larger-than=]
  }
  ^
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
 'wlc_phy_cal_txiqlo_nphy':
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:26033:1: 
 warning: the frame size of 2432 bytes is larger than 2048 bytes 
 [-Wframe-larger-than=]
  }
  ^

 I have no idea what caused these warnings to appear ... nothing in those
 functions looks too bad.
>>>
>>>
>>> This has been triggered by the following commit:
>>>
>>>
>>> commit 0e461945f3504e09b8ecf947b6398adce1287a28
>>> Author: Masahiro Yamada 
>>> Date:   Mon May 28 18:22:07 2018 +0900
>>>
>>> gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST
>>>
>>>
>>>
>>> CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL was previously disabled
>>> for COMPILE_TEST, which is now enabled.
>>
>> Weird -- I do build tests with plugins enabled pretty regularly. I
>> hadn't seen this before. I'll see if I can figure out what the
>> combination is...
>
> Weirdly, I only see this after merging kbuild/for-next into
> next-20180530. (I don't get the warning if I just force the plugins
> on.)


I see this warning on Linus' tree as well
if the plugins are enabled.


Just remove "depends on !COMPILE_TEST", and try allmodconfig.






masahiro@grover:~/ref/linux$ git show --pretty=short
commit 0512e0134582ef85dee77d51aae77dcd1edec495
Merge: dd52cb8 829bc787
Author: Linus Torvalds 

Merge tag 'xfs-4.17-fixes-3' of
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

masahiro@grover:~/ref/linux$ git diff
diff --git a/arch/Kconfig b/arch/Kconfig
index 75dd23a..7d44cfe 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -410,7 +410,6 @@ config HAVE_GCC_PLUGINS
 menuconfig GCC_PLUGINS
bool "GCC plugins"
depends on HAVE_GCC_PLUGINS
-   depends on !COMPILE_TEST
help
  GCC plugins are loadable modules that provide extra features to the
  compiler. They are useful for runtime instrumentation and
static analysis.
masahiro@grover:~/ref/linux$ make allmodconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  YACCscripts/kconfig/zconf.tab.c
  LEX scripts/kconfig/zconf.lex.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --allmodconfig Kconfig
#
# configuration written to .config
#
masahiro@grover:~/ref/linux$ make
drivers/net/wireless/broadcom/brcm80211/brcmsmac/
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_64_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_64.h
  HYPERCALLS arch/x86/include/generated/asm/xen-hypercalls.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  HOSTCC  scripts/basic/bin2c
  HOSTCC  arch/x86/tools/relocs_32.o
  HOSTCC  arch/x86/tools/relocs_64.o
  HOSTCC  arch/x86/tools/relocs_common.o
  HOSTLD  arch/x86/tools/relocs
  CHK include/config/kernel.release
  UPD include/config/kernel.release
  WRAParch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAParch/x86/include/generated/uapi/asm/poll.h
  WRAParch/x86/include/generated/asm/dma-contiguous.h
  WRAParch/x86/include/generated/asm/early_ioremap.h
  WRAParch/x86/include/generated/asm/mcs_spinlock.h
  WRAParch/x86/include/generated/asm/mm-arch-hooks.h
  CHK include/generated/uapi/linux/version.h
  UPD include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  UPD include/generated/utsrelease.h
  CC  arch/x86/purgatory/purgatory.o
  AS  arch/x86/purgatory/stack.o
  AS  arch/x86/purgatory/setup-x86_64.o
  CC  arch/x86/purgatory/sha256.o
  AS  arch/x86/purgatory/entry64.o
  CC  arch/x86/purgatory/string.o
  LD  arch/x86/purgatory/purgatory.ro
  BIN2C   arch/x86/purgatory/kexec-purgatory.c
  HOSTCXX 

linux-next: manual merge of the vfs tree with the ext3 tree

2018-05-31 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the vfs tree got a conflict in:

  include/linux/fs.h

between commit:

  29aca8b3f7cd ("fsnotify: introduce prototype struct fsnotify_obj")

from the ext3 tree and commit:

  d9a08a9e616b ("fs: Add aio iopriority support")

from the vfs tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/fs.h
index d1f3d572349f,482563fe549c..
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@@ -36,7 -36,7 +36,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  
  #include 
  #include 
@@@ -1717,9 -1713,10 +1719,11 @@@ struct file_operations 
int (*iterate) (struct file *, struct dir_context *);
int (*iterate_shared) (struct file *, struct dir_context *);
__poll_t (*poll) (struct file *, struct poll_table_struct *);
+   struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+   __poll_t (*poll_mask) (struct file *, __poll_t);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 +  int (*pre_mmap) (struct file *, unsigned long, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
unsigned long mmap_supported_flags;
int (*open) (struct inode *, struct file *);


pgpp0qioXa778.pgp
Description: OpenPGP digital signature


linux-next: manual merge of the vfs tree with the ext3 tree

2018-05-31 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the vfs tree got a conflict in:

  include/linux/fs.h

between commit:

  29aca8b3f7cd ("fsnotify: introduce prototype struct fsnotify_obj")

from the ext3 tree and commit:

  d9a08a9e616b ("fs: Add aio iopriority support")

from the vfs tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/fs.h
index d1f3d572349f,482563fe549c..
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@@ -36,7 -36,7 +36,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  
  #include 
  #include 
@@@ -1717,9 -1713,10 +1719,11 @@@ struct file_operations 
int (*iterate) (struct file *, struct dir_context *);
int (*iterate_shared) (struct file *, struct dir_context *);
__poll_t (*poll) (struct file *, struct poll_table_struct *);
+   struct wait_queue_head * (*get_poll_head)(struct file *, __poll_t);
+   __poll_t (*poll_mask) (struct file *, __poll_t);
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 +  int (*pre_mmap) (struct file *, unsigned long, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
unsigned long mmap_supported_flags;
int (*open) (struct inode *, struct file *);


pgpp0qioXa778.pgp
Description: OpenPGP digital signature


Re: [PATCH v2 0/3] perf script python: Add more PMU fields

2018-05-31 Thread Andi Kleen
On Fri, Jun 01, 2018 at 05:01:00PM +0800, Jin Yao wrote:
> When doing pmu sampling and then running a script with
> perf script -s script.py, the process_event function gets
> dictionary with some fields from the perf ring buffer
> (like ip, sym, callchain etc).
> 
> But we miss quite a few fields we report now, for example,
> LBRs,data source,weight,transaction,iregs,uregs,and etc.
> 
> This patch adds these fields for perf script python.

Patches look good to me.

Reviewed-by: Andi Kleen 

-Andi


Re: [PATCH v2 0/3] perf script python: Add more PMU fields

2018-05-31 Thread Andi Kleen
On Fri, Jun 01, 2018 at 05:01:00PM +0800, Jin Yao wrote:
> When doing pmu sampling and then running a script with
> perf script -s script.py, the process_event function gets
> dictionary with some fields from the perf ring buffer
> (like ip, sym, callchain etc).
> 
> But we miss quite a few fields we report now, for example,
> LBRs,data source,weight,transaction,iregs,uregs,and etc.
> 
> This patch adds these fields for perf script python.

Patches look good to me.

Reviewed-by: Andi Kleen 

-Andi


RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-05-31 Thread Hoeun Ryu
I appreciate the detailed correction.
I will reflect the corrections in the next patch.
plus, the explanation in the code will be fixed.

> -Original Message-
> From: Petr Mladek [mailto:pmla...@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky 
> Cc: Hoeun Ryu ; Sergey Senozhatsky
> ; Steven Rostedt ;
> Hoeun Ryu ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > >  Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For
> example the
> > > function is called in watchdog_overflow_callback() if the flag of
> hardlockup
> > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace()
> eventually tries
> > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> already locked in
> > > preempted contexts (task or irq in this case) or by other CPUs and it
> may cause
> 
> The sentence "logbuf_lock can be already locked in preempted contexts"
> does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.
> 
> I would change it to something like:
> 
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> tries
> to lock logbuf_lock in vprintk_emit() that might be already be part
> of a soft- or hard-lockup on another CPU."
> 
> 
> > > deadlocks.
> > >  By making printk_safe_flush() safe in NMI context, the backtrace
> triggering CPU
> > > just skips flushing if the lock is not avaiable in NMI context. The
> messages in
> > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the
> CPU is in
> > > hard lockup (because irq is disabled here) but if panic() is not
> called. The
> > > flushing can be delayed by the next irq work in normal cases.
> 
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of loosing the messages.
> 
> I see it the following way:
> 
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
> 
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just loose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
> 
> 
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
> 
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
> 
> The patch itself looks good to me.
> 
> Best Regards,
> Petr



RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-05-31 Thread Hoeun Ryu
I appreciate the detailed correction.
I will reflect the corrections in the next patch.
plus, the explanation in the code will be fixed.

> -Original Message-
> From: Petr Mladek [mailto:pmla...@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky 
> Cc: Hoeun Ryu ; Sergey Senozhatsky
> ; Steven Rostedt ;
> Hoeun Ryu ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > >  Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For
> example the
> > > function is called in watchdog_overflow_callback() if the flag of
> hardlockup
> > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace()
> eventually tries
> > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> already locked in
> > > preempted contexts (task or irq in this case) or by other CPUs and it
> may cause
> 
> The sentence "logbuf_lock can be already locked in preempted contexts"
> does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.
> 
> I would change it to something like:
> 
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> tries
> to lock logbuf_lock in vprintk_emit() that might be already be part
> of a soft- or hard-lockup on another CPU."
> 
> 
> > > deadlocks.
> > >  By making printk_safe_flush() safe in NMI context, the backtrace
> triggering CPU
> > > just skips flushing if the lock is not avaiable in NMI context. The
> messages in
> > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the
> CPU is in
> > > hard lockup (because irq is disabled here) but if panic() is not
> called. The
> > > flushing can be delayed by the next irq work in normal cases.
> 
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of loosing the messages.
> 
> I see it the following way:
> 
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
> 
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just loose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
> 
> 
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
> 
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
> 
> The patch itself looks good to me.
> 
> Best Regards,
> Petr



Re: [PATCH 20/32] vfs: Make close() unmount the attached mount if so flagged [ver #8]

2018-05-31 Thread Al Viro
On Thu, May 31, 2018 at 08:19:55PM +0100, Al Viro wrote:
> On Fri, May 25, 2018 at 01:07:34AM +0100, David Howells wrote:
> > +   if (unlikely(file->f_mode & FMODE_NEED_UNMOUNT))
> > +   __detach_mounts(dentry);
> > +
> 
> This is completely wrong.  First of all, you want to dissolve the mount tree
> on file->f_path.mount, not every tree rooted at dentry equal to 
> file->f_path.dentry.
> This is easily done - it would be a simple call of drop_collected_mounts(mnt)
> if not for one detail.  You want it to happen only if the sucker isn't 
> attached
> anywhere by that point.  IOW,
> namespace_lock();
> lock_mount_hash();
>   if (!real_mount(mnt)->mnt_ns)
>   umount_tree(real_mount(mnt), UMOUNT_SYNC);
> unlock_mount_hash();
> namespace_unlock();
> and that's it.  You don't need that magical mystery turd in move_mount() later
> in the series and all the infrastructure you grow for it.
> 
> FWIW, I would've suggested this
>  void drop_collected_mounts(struct vfsmount *mnt)
>  {
>   namespace_lock();
>   lock_mount_hash();
> + if (!real_mount(mnt)->mnt_ns)
> + umount_tree(real_mount(mnt), UMOUNT_SYNC);
> - umount_tree(real_mount(mnt), UMOUNT_SYNC);
>   unlock_mount_hash();
>   namespace_unlock();
>  }
> 
> and in __fput()
>   if (unlikely(file->f_mode & FMODE_NEED_UNMOUNT))
>   drop_collected_mounts(mnt);
> 
> All there is to it, AFAICS...

... except that it should be a separate primitive - drop_collected_mounts() is
used put_mnt_ns(), where the root definitely has non-NULL ->mnt_ns.

Another thing: the same issue (misuse of __detach_mounts()) exists in cleanup
path of do_o_path().  What's more, doing it there is pointless - if
do_dentry_open() has set FMODE_NEED_UNMOUNT, it either succeeds or calls fput()
itself.  Either way, the caller should *not* do the cleanups done by fput().

Another thing: copy_mount_for_o_path() is bogus.  Horrible calling conventions
aside, what the hell is that lock_mount() for?  In do_loopback() we lock the
*mountpoint*; here the source gets locked, for no visible reason.  What we
should do is something like this:

1) common helper -

static struct mount *__do_loopback(struct path *from, bool recurse)
{
struct mount *mnt = ERR_PTR(-EINVAL), *f = real_mount(from->mnt); 

if (IS_MNT_UNBINDABLE(f))
return mnt;

if (!check_mnt(f) && from->dentry->d_op != _dentry_operations)
return mnt;

if (!recurse && has_locked_children(f, from->dentry))
return mnt;

if (recurse)
mnt = copy_tree(f, from->dentry, CL_COPY_MNT_NS_FILE);
else
mnt = clone_mnt(f, from->dentry, 0);
if (!IS_ERR(mnt))
mnt->mnt.mnt_flags &= ~MNT_LOCKED;
return mnt;
}

2) in do_loopback() we are left with

static int do_loopback(struct path *path, const char *old_name,
int recurse)
{
struct path old_path;
struct mount *mnt, *parent;
struct mountpoint *mp;
int err;
if (!old_name || !*old_name)
return -EINVAL;
err = kern_path(old_name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, _path);
if (err)
return err;

err = -EINVAL;
if (mnt_ns_loop(old_path.dentry))
goto out;

mp = lock_mount(path);
if (IS_ERR(mp)) {
err = PTR_ERR(mp);
goto out;
}

parent = real_mount(path->mnt);
if (!check_mnt(parent))
goto out2;

mnt = __do_loopback(_path, recurse);
if (IS_ERR(mnt)) {
err = PTR_ERR(mnt);
goto out2;
}

err = graft_tree(mnt, parent, mp);
if (err) {
lock_mount_hash();
umount_tree(mnt, UMOUNT_SYNC);
unlock_mount_hash();
}
out2:
unlock_mount(mp);
out:
path_put(_path);
return err;
}

3) copy_mount_for_o_path() with saner calling conventions:

int copy_mount_for_o_path(struct path *path, bool recurse)
{
struct mount *mnt = __do_loopback(path, recurse);
if (IS_ERR(mnt)) {
path_put(path);
return PTR_ERR(mnt);
}
mntput(path->mnt);
path->mnt = >mnt;
return 0;
}

4) in do_o_path():
static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
{
struct path path;
int error = path_lookupat(nd, flags, );
if (error)
return error;

if (file->f_flags & O_CLONE_MOUNT) {
error = copy_mount_for_o_path(,
!(file->f_flags & O_NON_RECURSIVE));
if (error < 0)
return error;
}

audit_inode(nd->name, path.dentry, 0);
error = vfs_open(, file, current_cred());
path_put();
 

Re: [PATCH 20/32] vfs: Make close() unmount the attached mount if so flagged [ver #8]

2018-05-31 Thread Al Viro
On Thu, May 31, 2018 at 08:19:55PM +0100, Al Viro wrote:
> On Fri, May 25, 2018 at 01:07:34AM +0100, David Howells wrote:
> > +   if (unlikely(file->f_mode & FMODE_NEED_UNMOUNT))
> > +   __detach_mounts(dentry);
> > +
> 
> This is completely wrong.  First of all, you want to dissolve the mount tree
> on file->f_path.mount, not every tree rooted at dentry equal to 
> file->f_path.dentry.
> This is easily done - it would be a simple call of drop_collected_mounts(mnt)
> if not for one detail.  You want it to happen only if the sucker isn't 
> attached
> anywhere by that point.  IOW,
> namespace_lock();
> lock_mount_hash();
>   if (!real_mount(mnt)->mnt_ns)
>   umount_tree(real_mount(mnt), UMOUNT_SYNC);
> unlock_mount_hash();
> namespace_unlock();
> and that's it.  You don't need that magical mystery turd in move_mount() later
> in the series and all the infrastructure you grow for it.
> 
> FWIW, I would've suggested this
>  void drop_collected_mounts(struct vfsmount *mnt)
>  {
>   namespace_lock();
>   lock_mount_hash();
> + if (!real_mount(mnt)->mnt_ns)
> + umount_tree(real_mount(mnt), UMOUNT_SYNC);
> - umount_tree(real_mount(mnt), UMOUNT_SYNC);
>   unlock_mount_hash();
>   namespace_unlock();
>  }
> 
> and in __fput()
>   if (unlikely(file->f_mode & FMODE_NEED_UNMOUNT))
>   drop_collected_mounts(mnt);
> 
> All there is to it, AFAICS...

... except that it should be a separate primitive - drop_collected_mounts() is
used put_mnt_ns(), where the root definitely has non-NULL ->mnt_ns.

Another thing: the same issue (misuse of __detach_mounts()) exists in cleanup
path of do_o_path().  What's more, doing it there is pointless - if
do_dentry_open() has set FMODE_NEED_UNMOUNT, it either succeeds or calls fput()
itself.  Either way, the caller should *not* do the cleanups done by fput().

Another thing: copy_mount_for_o_path() is bogus.  Horrible calling conventions
aside, what the hell is that lock_mount() for?  In do_loopback() we lock the
*mountpoint*; here the source gets locked, for no visible reason.  What we
should do is something like this:

1) common helper -

static struct mount *__do_loopback(struct path *from, bool recurse)
{
struct mount *mnt = ERR_PTR(-EINVAL), *f = real_mount(from->mnt); 

if (IS_MNT_UNBINDABLE(f))
return mnt;

if (!check_mnt(f) && from->dentry->d_op != _dentry_operations)
return mnt;

if (!recurse && has_locked_children(f, from->dentry))
return mnt;

if (recurse)
mnt = copy_tree(f, from->dentry, CL_COPY_MNT_NS_FILE);
else
mnt = clone_mnt(f, from->dentry, 0);
if (!IS_ERR(mnt))
mnt->mnt.mnt_flags &= ~MNT_LOCKED;
return mnt;
}

2) in do_loopback() we are left with

static int do_loopback(struct path *path, const char *old_name,
int recurse)
{
struct path old_path;
struct mount *mnt, *parent;
struct mountpoint *mp;
int err;
if (!old_name || !*old_name)
return -EINVAL;
err = kern_path(old_name, LOOKUP_FOLLOW|LOOKUP_AUTOMOUNT, _path);
if (err)
return err;

err = -EINVAL;
if (mnt_ns_loop(old_path.dentry))
goto out;

mp = lock_mount(path);
if (IS_ERR(mp)) {
err = PTR_ERR(mp);
goto out;
}

parent = real_mount(path->mnt);
if (!check_mnt(parent))
goto out2;

mnt = __do_loopback(_path, recurse);
if (IS_ERR(mnt)) {
err = PTR_ERR(mnt);
goto out2;
}

err = graft_tree(mnt, parent, mp);
if (err) {
lock_mount_hash();
umount_tree(mnt, UMOUNT_SYNC);
unlock_mount_hash();
}
out2:
unlock_mount(mp);
out:
path_put(_path);
return err;
}

3) copy_mount_for_o_path() with saner calling conventions:

int copy_mount_for_o_path(struct path *path, bool recurse)
{
struct mount *mnt = __do_loopback(path, recurse);
if (IS_ERR(mnt)) {
path_put(path);
return PTR_ERR(mnt);
}
mntput(path->mnt);
path->mnt = >mnt;
return 0;
}

4) in do_o_path():
static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
{
struct path path;
int error = path_lookupat(nd, flags, );
if (error)
return error;

if (file->f_flags & O_CLONE_MOUNT) {
error = copy_mount_for_o_path(,
!(file->f_flags & O_NON_RECURSIVE));
if (error < 0)
return error;
}

audit_inode(nd->name, path.dentry, 0);
error = vfs_open(, file, current_cred());
path_put();
 

linux-next: manual merge of the vfs tree with Linus' tree

2018-05-31 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the vfs tree got a conflict in:

  fs/afs/fsclient.c

between commit:

  684b0f68cf1c ("afs: Fix AFSFetchStatus decoder to provide OpenAFS 
compatibility")

from Linus' tree and commit:

  c875c76a061d ("afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()")

from the vfs tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/afs/fsclient.c
index b273e1d60478,b695d9e16c65..
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@@ -138,37 -137,14 +138,31 @@@ static int xdr_decode_AFSFetchStatus(st
u64 data_version, size;
u32 type, abort_code;
u8 flags = 0;
-   int ret;
- 
-   if (vnode)
-   write_seqlock(>cb_lock);
  
 +  abort_code = ntohl(xdr->abort_code);
 +
if (xdr->if_version != htonl(AFS_FSTATUS_VERSION)) {
 +  if (xdr->if_version == htonl(0) &&
 +  abort_code != 0 &&
 +  inline_error) {
 +  /* The OpenAFS fileserver has a bug in 
FS.InlineBulkStatus
 +   * whereby it doesn't set the interface version in the 
error
 +   * case.
 +   */
 +  status->abort_code = abort_code;
-   ret = 0;
-   goto out;
++  return 0;
 +  }
 +
pr_warn("Unknown AFSFetchStatus version %u\n", 
ntohl(xdr->if_version));
goto bad;
}
  
 +  if (abort_code != 0 && inline_error) {
 +  status->abort_code = abort_code;
-   ret = 0;
-   goto out;
++  return 0;
 +  }
 +
type = ntohl(xdr->type);
 -  abort_code = ntohl(xdr->abort_code);
switch (type) {
case AFS_FTYPE_FILE:
case AFS_FTYPE_DIR:


pgp6oAILUkEmO.pgp
Description: OpenPGP digital signature


linux-next: manual merge of the vfs tree with Linus' tree

2018-05-31 Thread Stephen Rothwell
Hi Al,

Today's linux-next merge of the vfs tree got a conflict in:

  fs/afs/fsclient.c

between commit:

  684b0f68cf1c ("afs: Fix AFSFetchStatus decoder to provide OpenAFS 
compatibility")

from Linus' tree and commit:

  c875c76a061d ("afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()")

from the vfs tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/afs/fsclient.c
index b273e1d60478,b695d9e16c65..
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@@ -138,37 -137,14 +138,31 @@@ static int xdr_decode_AFSFetchStatus(st
u64 data_version, size;
u32 type, abort_code;
u8 flags = 0;
-   int ret;
- 
-   if (vnode)
-   write_seqlock(>cb_lock);
  
 +  abort_code = ntohl(xdr->abort_code);
 +
if (xdr->if_version != htonl(AFS_FSTATUS_VERSION)) {
 +  if (xdr->if_version == htonl(0) &&
 +  abort_code != 0 &&
 +  inline_error) {
 +  /* The OpenAFS fileserver has a bug in 
FS.InlineBulkStatus
 +   * whereby it doesn't set the interface version in the 
error
 +   * case.
 +   */
 +  status->abort_code = abort_code;
-   ret = 0;
-   goto out;
++  return 0;
 +  }
 +
pr_warn("Unknown AFSFetchStatus version %u\n", 
ntohl(xdr->if_version));
goto bad;
}
  
 +  if (abort_code != 0 && inline_error) {
 +  status->abort_code = abort_code;
-   ret = 0;
-   goto out;
++  return 0;
 +  }
 +
type = ntohl(xdr->type);
 -  abort_code = ntohl(xdr->abort_code);
switch (type) {
case AFS_FTYPE_FILE:
case AFS_FTYPE_DIR:


pgp6oAILUkEmO.pgp
Description: OpenPGP digital signature


Re: [PATCH] clk: vc5: Fix div-by-0 when rounding a rate of zero

2018-05-31 Thread Steve Longerbeam




On 05/31/2018 12:37 PM, Marek Vasut wrote:

On 05/31/2018 08:52 PM, Steve Longerbeam wrote:


On 05/31/2018 11:35 AM, Marek Vasut wrote:

On 05/31/2018 08:32 PM, Steve Longerbeam wrote:

Hi Marek,


On 05/31/2018 11:25 AM, Marek Vasut wrote:

On 05/31/2018 08:23 PM, Steve Longerbeam wrote:

Just return zero for a rounded rate if requested rate is zero.

This was caught by CONFIG_UBSAN:

[  192.266748] UBSAN: Undefined behaviour in
drivers/clk/clk-versaclock5.c:513:17
[  192.274050] division by zero
[  192.276976] CPU: 0 PID: 2579 Comm: vsp-unit-test-0 Tainted: G
B   WC  4.14.17-02752-g13fb96f #1
[  192.286378] Hardware name: Renesas Salvator-X board based on
r8a7795 ES2.0+ (DT)
[  192.293852] Call trace:
[  192.296343] [] dump_backtrace+0x0/0x390
[  192.301807] [] show_stack+0x14/0x1c
[  192.306920] [] dump_stack+0x134/0x1a8
[  192.312213] [] ubsan_epilogue+0x14/0x60
[  192.317677] []
__ubsan_handle_divrem_overflow+0x11c/0x170
[  192.324720] [] vc5_fod_round_rate+0x68/0x148
[  192.330620] [] clk_calc_new_rates+0x238/0x3fc
[  192.336607] [] clk_calc_new_rates+0x29c/0x3fc
[  192.342595] []
clk_core_set_rate_nolock+0x48/0x11c
[  192.349019] [] clk_set_rate+0x34/0x4c
[  192.354307] [] rcar_du_pm_suspend+0x274/0x2f4
[  192.360297] [] platform_pm_suspend+0x78/0xb8
[  192.366198] [] dpm_run_callback+0x584/0xa18
[  192.372010] [] __device_suspend+0x1a8/0x534
[  192.377822] [] dpm_suspend+0x130/0xea0
[  192.383197] [] dpm_suspend_start+0x130/0x138
[  192.389099] []
suspend_devices_and_enter+0xf0/0x1778
[  192.395700] [] pm_suspend+0x2408/0x245c
[  192.401162] [] state_store+0xf0/0x130
[  192.406451] [] kobj_attr_store+0x5c/0x6c
[  192.412002] [] sysfs_kf_write+0xe8/0xfc
[  192.417466] [] kernfs_fop_write+0x22c/0x2e4
[  192.423281] [] __vfs_write+0x104/0x34c
[  192.428656] [] vfs_write+0x134/0x2d8
[  192.433857] [] SyS_write+0xbc/0x12c
[  192.438967] Exception stack(0x8006cd1cfec0 to
0x8006cd1d)
[  192.445480] fec0: 0001 1e303f00
0004 959a5000
[  192.453397] fee0:  000155510004
0003 006d
[  192.461314] ff00: 0040 
cc304800 0020
[  192.469230] ff20:  
0001 0008
[  192.477148] ff40: 004eb3b8 958bb840
003d 0001
[  192.485065] ff60: 1e303f00 959a1508
0004 1e303f00
[  192.492982] ff80: 0004 004d4c68
0001 
[  192.500899] ffa0: 1e30d5c0 cc304820
958bec64 cc304820
[  192.508816] ffc0: 95912898 2000
0001 0040
[  192.516733] ffe0:  
 
[  192.524650] [] el0_svc_naked+0x24/0x28

Signed-off-by: Steve Longerbeam 
---
    drivers/clk/clk-versaclock5.c | 4 
    1 file changed, 4 insertions(+)

diff --git a/drivers/clk/clk-versaclock5.c
b/drivers/clk/clk-versaclock5.c
index decffb3..113523d 100644
--- a/drivers/clk/clk-versaclock5.c
+++ b/drivers/clk/clk-versaclock5.c
@@ -509,6 +509,10 @@ static long vc5_fod_round_rate(struct clk_hw
*hw, unsigned long rate,
    u32 div_int;
    u64 div_frc;
    +    /* prevent div-by-0 */
+    if (rate == 0)
+    return 0;
+
    /* Determine integer part, which is 12 bit wide */
    div_int = f_in / rate;
    /*


Can this actually happen ?

We caught this using the Renesas 3.6.0 BSP release, when performing
a suspend of rcar-du driver. The rcar_du_pm_suspend() in 3.6.0 BSP is
modified
from mainline version, including calling clk_set_rate() on the crtc
clocks with a
rate of zero. So this is not actually reproducible (yet) in mainline.

So it sets clock to 0 ?

Yep, see

https://kernel.googlesource.com/pub/scm/linux/kernel/git/horms/renesas-bsp/+/rcar-3.6.0/drivers/gpu/drm/rcar-du/rcar_du_drv.c#359



   Anyway, this looks sane, although maybe the
whole driver could use a once-over to see if there could be more of this.

Actually I do see more potential divide-by-zeros due to a passed rate
of zero, including vc5_pfd_round_rate() and vc5_pfd_set_rate().

I can resubmit this patch fixing all cases in clk-versaclock5.c if you
like (and probably remove the misleading backtrace in the commit
message since it is a Renesas 3.6.0 kernel backtrace not a mainline
backtrace).

Or perhaps just treat this as a heads-up, I'll leave it up to you.

It'd be nice if you resubmitted it fixing all the cases.



Ok. Please disregard this patch since there won't be a v2, the
new patch will have a different title.

Steve



Re: [PATCH] clk: vc5: Fix div-by-0 when rounding a rate of zero

2018-05-31 Thread Steve Longerbeam




On 05/31/2018 12:37 PM, Marek Vasut wrote:

On 05/31/2018 08:52 PM, Steve Longerbeam wrote:


On 05/31/2018 11:35 AM, Marek Vasut wrote:

On 05/31/2018 08:32 PM, Steve Longerbeam wrote:

Hi Marek,


On 05/31/2018 11:25 AM, Marek Vasut wrote:

On 05/31/2018 08:23 PM, Steve Longerbeam wrote:

Just return zero for a rounded rate if requested rate is zero.

This was caught by CONFIG_UBSAN:

[  192.266748] UBSAN: Undefined behaviour in
drivers/clk/clk-versaclock5.c:513:17
[  192.274050] division by zero
[  192.276976] CPU: 0 PID: 2579 Comm: vsp-unit-test-0 Tainted: G
B   WC  4.14.17-02752-g13fb96f #1
[  192.286378] Hardware name: Renesas Salvator-X board based on
r8a7795 ES2.0+ (DT)
[  192.293852] Call trace:
[  192.296343] [] dump_backtrace+0x0/0x390
[  192.301807] [] show_stack+0x14/0x1c
[  192.306920] [] dump_stack+0x134/0x1a8
[  192.312213] [] ubsan_epilogue+0x14/0x60
[  192.317677] []
__ubsan_handle_divrem_overflow+0x11c/0x170
[  192.324720] [] vc5_fod_round_rate+0x68/0x148
[  192.330620] [] clk_calc_new_rates+0x238/0x3fc
[  192.336607] [] clk_calc_new_rates+0x29c/0x3fc
[  192.342595] []
clk_core_set_rate_nolock+0x48/0x11c
[  192.349019] [] clk_set_rate+0x34/0x4c
[  192.354307] [] rcar_du_pm_suspend+0x274/0x2f4
[  192.360297] [] platform_pm_suspend+0x78/0xb8
[  192.366198] [] dpm_run_callback+0x584/0xa18
[  192.372010] [] __device_suspend+0x1a8/0x534
[  192.377822] [] dpm_suspend+0x130/0xea0
[  192.383197] [] dpm_suspend_start+0x130/0x138
[  192.389099] []
suspend_devices_and_enter+0xf0/0x1778
[  192.395700] [] pm_suspend+0x2408/0x245c
[  192.401162] [] state_store+0xf0/0x130
[  192.406451] [] kobj_attr_store+0x5c/0x6c
[  192.412002] [] sysfs_kf_write+0xe8/0xfc
[  192.417466] [] kernfs_fop_write+0x22c/0x2e4
[  192.423281] [] __vfs_write+0x104/0x34c
[  192.428656] [] vfs_write+0x134/0x2d8
[  192.433857] [] SyS_write+0xbc/0x12c
[  192.438967] Exception stack(0x8006cd1cfec0 to
0x8006cd1d)
[  192.445480] fec0: 0001 1e303f00
0004 959a5000
[  192.453397] fee0:  000155510004
0003 006d
[  192.461314] ff00: 0040 
cc304800 0020
[  192.469230] ff20:  
0001 0008
[  192.477148] ff40: 004eb3b8 958bb840
003d 0001
[  192.485065] ff60: 1e303f00 959a1508
0004 1e303f00
[  192.492982] ff80: 0004 004d4c68
0001 
[  192.500899] ffa0: 1e30d5c0 cc304820
958bec64 cc304820
[  192.508816] ffc0: 95912898 2000
0001 0040
[  192.516733] ffe0:  
 
[  192.524650] [] el0_svc_naked+0x24/0x28

Signed-off-by: Steve Longerbeam 
---
    drivers/clk/clk-versaclock5.c | 4 
    1 file changed, 4 insertions(+)

diff --git a/drivers/clk/clk-versaclock5.c
b/drivers/clk/clk-versaclock5.c
index decffb3..113523d 100644
--- a/drivers/clk/clk-versaclock5.c
+++ b/drivers/clk/clk-versaclock5.c
@@ -509,6 +509,10 @@ static long vc5_fod_round_rate(struct clk_hw
*hw, unsigned long rate,
    u32 div_int;
    u64 div_frc;
    +    /* prevent div-by-0 */
+    if (rate == 0)
+    return 0;
+
    /* Determine integer part, which is 12 bit wide */
    div_int = f_in / rate;
    /*


Can this actually happen ?

We caught this using the Renesas 3.6.0 BSP release, when performing
a suspend of rcar-du driver. The rcar_du_pm_suspend() in 3.6.0 BSP is
modified
from mainline version, including calling clk_set_rate() on the crtc
clocks with a
rate of zero. So this is not actually reproducible (yet) in mainline.

So it sets clock to 0 ?

Yep, see

https://kernel.googlesource.com/pub/scm/linux/kernel/git/horms/renesas-bsp/+/rcar-3.6.0/drivers/gpu/drm/rcar-du/rcar_du_drv.c#359



   Anyway, this looks sane, although maybe the
whole driver could use a once-over to see if there could be more of this.

Actually I do see more potential divide-by-zeros due to a passed rate
of zero, including vc5_pfd_round_rate() and vc5_pfd_set_rate().

I can resubmit this patch fixing all cases in clk-versaclock5.c if you
like (and probably remove the misleading backtrace in the commit
message since it is a Renesas 3.6.0 kernel backtrace not a mainline
backtrace).

Or perhaps just treat this as a heads-up, I'll leave it up to you.

It'd be nice if you resubmitted it fixing all the cases.



Ok. Please disregard this patch since there won't be a v2, the
new patch will have a different title.

Steve



Re: [RFC] rcu: Check the range of jiffies_till_xxx_fqs on setting them

2018-05-31 Thread Byungchul Park




On 2018-05-31 20:17, Paul E. McKenney wrote:

On Thu, May 31, 2018 at 11:51:40AM +0900, Byungchul Park wrote:

On 2018-05-31 11:18, Byungchul Park wrote:

On 2018-05-29 21:01, Paul E. McKenney wrote:


One approach would be to embed the kernel_params_ops structure inside
another structure containing the limits, then just have two structures.
Perhaps something like this already exists?  I don't see it right off,
but then again, I am not exactly an expert on module_param.

Thoughts?


Unfortunately, I couldn't find it. There might be no way to verify
range of a variable except the way I did. Could you give your opinion
about whether I should go on it?


Like..


This looks reasonable to me.  Although you could make something that took
ranges, that would be more code than what you have below, so what you


Exactly.


have below is good.  Could you please resend as a patch with Signed-off-by
and commit log?


Sure, I will. Thanks a lot Paul.

--
Thanks,
Byungchul


Re: [RFC] rcu: Check the range of jiffies_till_xxx_fqs on setting them

2018-05-31 Thread Byungchul Park




On 2018-05-31 20:17, Paul E. McKenney wrote:

On Thu, May 31, 2018 at 11:51:40AM +0900, Byungchul Park wrote:

On 2018-05-31 11:18, Byungchul Park wrote:

On 2018-05-29 21:01, Paul E. McKenney wrote:


One approach would be to embed the kernel_params_ops structure inside
another structure containing the limits, then just have two structures.
Perhaps something like this already exists?  I don't see it right off,
but then again, I am not exactly an expert on module_param.

Thoughts?


Unfortunately, I couldn't find it. There might be no way to verify
range of a variable except the way I did. Could you give your opinion
about whether I should go on it?


Like..


This looks reasonable to me.  Although you could make something that took
ranges, that would be more code than what you have below, so what you


Exactly.


have below is good.  Could you please resend as a patch with Signed-off-by
and commit log?


Sure, I will. Thanks a lot Paul.

--
Thanks,
Byungchul


[PATCH] clocksource/drivers/imx: add input capture support

2018-05-31 Thread Fabio Estevam
From: Steve Longerbeam 

This patch adds support for the input capture function in the
i.MX GPT. Output compare and input capture functions are mixed
in the same register block, so we need to modify the irq ack/enable/
disable primitives to not stomp on the other function.

The input capture API is modelled after request/free irq:

typedef void (*mxc_icap_handler_t)(int, void *, struct timespec *);

int mxc_request_input_capture(unsigned int chan,
  mxc_icap_handler_t handler,
  unsigned long capflags, void *dev_id);

- chan: the channel number being requested (0 or 1).

- handler: a callback when there is an input capture event. The
  handler is given the channel number, the dev_id, and a timespec
  marking the input capture event.

- capflags: IRQF_TRIGGER_RISING and/or IRQF_TRIGGER_FALLING. If
  both are specified, events will be triggered on both rising and
  falling edges of the input capture signal.

- dev_id: a context pointer given back to the handler.

void mxc_free_input_capture(unsigned int chan, void *dev_id);

This disables the given input capture channel in the GPT.

Signed-off-by: Steve Longerbeam 
Signed-off-by: Fabio Estevam 
---
 drivers/clocksource/timer-imx-gpt.c | 479 
 include/linux/mxc_icap.h|  14 ++
 2 files changed, 443 insertions(+), 50 deletions(-)
 create mode 100644 include/linux/mxc_icap.h

diff --git a/drivers/clocksource/timer-imx-gpt.c 
b/drivers/clocksource/timer-imx-gpt.c
index 6ec6d79..64f3b61 100644
--- a/drivers/clocksource/timer-imx-gpt.c
+++ b/drivers/clocksource/timer-imx-gpt.c
@@ -21,9 +21,11 @@
  * MA 02110-1301, USA.
  */
 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,6 +34,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 /*
@@ -65,16 +69,52 @@
 #define V2_TCTL_CLK_PER(2 << 6)
 #define V2_TCTL_CLK_OSC_DIV8   (5 << 6)
 #define V2_TCTL_FRR(1 << 9)
+#define V2_TCTL_IM1_BIT16
+#define V2_TCTL_IM2_BIT18
+#define V2_IM_DISABLE  0
+#define V2_IM_RISING   1
+#define V2_IM_FALLING  2
+#define V2_IM_BOTH 3
 #define V2_TCTL_24MEN  (1 << 10)
 #define V2_TPRER_PRE24M12
 #define V2_IR  0x0c
+#define V2_IR_OF1  (1 << 0)
+#define V2_IR_IF1  (1 << 3)
+#define V2_IR_IF2  (1 << 4)
 #define V2_TSTAT   0x08
 #define V2_TSTAT_OF1   (1 << 0)
+#define V2_TSTAT_IF1   (1 << 3)
+#define V2_TSTAT_IF2   (1 << 4)
 #define V2_TCN 0x24
 #define V2_TCMP0x10
+#define V2_TCAP1   0x1c
+#define V2_TCAP2   0x20
 
 #define V2_TIMER_RATE_OSC_DIV8 300
 
+struct imx_timer;
+
+struct icap_channel {
+   struct imx_timer *imxtm;
+
+   int chan;
+
+   u32 cnt_reg;
+   u32 irqen_bit;
+   u32 status_bit;
+   u32 mode_bit;
+
+   mxc_icap_handler_t handler;
+   void *dev_id;
+
+   struct cyclecounter cc;
+   struct timecounter tc;
+};
+
+/* FIXME, for now can't find icap unless it's statically allocated */
+static struct icap_channel icap_channel[2];
+static DEFINE_SPINLOCK(icap_lock);
+
 struct imx_timer {
enum imx_gpt_type type;
void __iomem *base;
@@ -90,12 +130,20 @@ struct imx_gpt_data {
int reg_tstat;
int reg_tcn;
int reg_tcmp;
-   void (*gpt_setup_tctl)(struct imx_timer *imxtm);
-   void (*gpt_irq_enable)(struct imx_timer *imxtm);
-   void (*gpt_irq_disable)(struct imx_timer *imxtm);
-   void (*gpt_irq_acknowledge)(struct imx_timer *imxtm);
+   void (*gpt_oc_setup_tctl)(struct imx_timer *imxtm);
+   void (*gpt_oc_irq_enable)(struct imx_timer *imxtm);
+   void (*gpt_oc_irq_disable)(struct imx_timer *imxtm);
+   void (*gpt_oc_irq_acknowledge)(struct imx_timer *imxtm);
+   bool (*gpt_is_oc_irq)(struct imx_timer *imxtm, unsigned int tstat);
int (*set_next_event)(unsigned long evt,
  struct clock_event_device *ced);
+
+   void (*gpt_ic_irq_enable)(struct icap_channel *ic);
+   void (*gpt_ic_irq_disable)(struct icap_channel *ic);
+   void (*gpt_ic_irq_acknowledge)(struct icap_channel *ic);
+   bool (*gpt_is_ic_irq)(struct icap_channel *ic, unsigned int tstat);
+   void (*gpt_ic_enable)(struct icap_channel *ic, unsigned int mode);
+   void (*gpt_ic_disable)(struct icap_channel *ic);
 };
 
 static inline struct imx_timer *to_imx_timer(struct clock_event_device *ced)
@@ -103,52 +151,144 @@ static inline struct imx_timer *to_imx_timer(struct 
clock_event_device *ced)
return container_of(ced, struct imx_timer, ced);
 }
 
-static void imx1_gpt_irq_disable(struct imx_timer *imxtm)
+static void imx1_gpt_oc_irq_disable(struct imx_timer *imxtm)
 {

[PATCH] clocksource/drivers/imx: add input capture support

2018-05-31 Thread Fabio Estevam
From: Steve Longerbeam 

This patch adds support for the input capture function in the
i.MX GPT. Output compare and input capture functions are mixed
in the same register block, so we need to modify the irq ack/enable/
disable primitives to not stomp on the other function.

The input capture API is modelled after request/free irq:

typedef void (*mxc_icap_handler_t)(int, void *, struct timespec *);

int mxc_request_input_capture(unsigned int chan,
  mxc_icap_handler_t handler,
  unsigned long capflags, void *dev_id);

- chan: the channel number being requested (0 or 1).

- handler: a callback when there is an input capture event. The
  handler is given the channel number, the dev_id, and a timespec
  marking the input capture event.

- capflags: IRQF_TRIGGER_RISING and/or IRQF_TRIGGER_FALLING. If
  both are specified, events will be triggered on both rising and
  falling edges of the input capture signal.

- dev_id: a context pointer given back to the handler.

void mxc_free_input_capture(unsigned int chan, void *dev_id);

This disables the given input capture channel in the GPT.

Signed-off-by: Steve Longerbeam 
Signed-off-by: Fabio Estevam 
---
 drivers/clocksource/timer-imx-gpt.c | 479 
 include/linux/mxc_icap.h|  14 ++
 2 files changed, 443 insertions(+), 50 deletions(-)
 create mode 100644 include/linux/mxc_icap.h

diff --git a/drivers/clocksource/timer-imx-gpt.c 
b/drivers/clocksource/timer-imx-gpt.c
index 6ec6d79..64f3b61 100644
--- a/drivers/clocksource/timer-imx-gpt.c
+++ b/drivers/clocksource/timer-imx-gpt.c
@@ -21,9 +21,11 @@
  * MA 02110-1301, USA.
  */
 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,6 +34,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 /*
@@ -65,16 +69,52 @@
 #define V2_TCTL_CLK_PER(2 << 6)
 #define V2_TCTL_CLK_OSC_DIV8   (5 << 6)
 #define V2_TCTL_FRR(1 << 9)
+#define V2_TCTL_IM1_BIT16
+#define V2_TCTL_IM2_BIT18
+#define V2_IM_DISABLE  0
+#define V2_IM_RISING   1
+#define V2_IM_FALLING  2
+#define V2_IM_BOTH 3
 #define V2_TCTL_24MEN  (1 << 10)
 #define V2_TPRER_PRE24M12
 #define V2_IR  0x0c
+#define V2_IR_OF1  (1 << 0)
+#define V2_IR_IF1  (1 << 3)
+#define V2_IR_IF2  (1 << 4)
 #define V2_TSTAT   0x08
 #define V2_TSTAT_OF1   (1 << 0)
+#define V2_TSTAT_IF1   (1 << 3)
+#define V2_TSTAT_IF2   (1 << 4)
 #define V2_TCN 0x24
 #define V2_TCMP0x10
+#define V2_TCAP1   0x1c
+#define V2_TCAP2   0x20
 
 #define V2_TIMER_RATE_OSC_DIV8 300
 
+struct imx_timer;
+
+struct icap_channel {
+   struct imx_timer *imxtm;
+
+   int chan;
+
+   u32 cnt_reg;
+   u32 irqen_bit;
+   u32 status_bit;
+   u32 mode_bit;
+
+   mxc_icap_handler_t handler;
+   void *dev_id;
+
+   struct cyclecounter cc;
+   struct timecounter tc;
+};
+
+/* FIXME, for now can't find icap unless it's statically allocated */
+static struct icap_channel icap_channel[2];
+static DEFINE_SPINLOCK(icap_lock);
+
 struct imx_timer {
enum imx_gpt_type type;
void __iomem *base;
@@ -90,12 +130,20 @@ struct imx_gpt_data {
int reg_tstat;
int reg_tcn;
int reg_tcmp;
-   void (*gpt_setup_tctl)(struct imx_timer *imxtm);
-   void (*gpt_irq_enable)(struct imx_timer *imxtm);
-   void (*gpt_irq_disable)(struct imx_timer *imxtm);
-   void (*gpt_irq_acknowledge)(struct imx_timer *imxtm);
+   void (*gpt_oc_setup_tctl)(struct imx_timer *imxtm);
+   void (*gpt_oc_irq_enable)(struct imx_timer *imxtm);
+   void (*gpt_oc_irq_disable)(struct imx_timer *imxtm);
+   void (*gpt_oc_irq_acknowledge)(struct imx_timer *imxtm);
+   bool (*gpt_is_oc_irq)(struct imx_timer *imxtm, unsigned int tstat);
int (*set_next_event)(unsigned long evt,
  struct clock_event_device *ced);
+
+   void (*gpt_ic_irq_enable)(struct icap_channel *ic);
+   void (*gpt_ic_irq_disable)(struct icap_channel *ic);
+   void (*gpt_ic_irq_acknowledge)(struct icap_channel *ic);
+   bool (*gpt_is_ic_irq)(struct icap_channel *ic, unsigned int tstat);
+   void (*gpt_ic_enable)(struct icap_channel *ic, unsigned int mode);
+   void (*gpt_ic_disable)(struct icap_channel *ic);
 };
 
 static inline struct imx_timer *to_imx_timer(struct clock_event_device *ced)
@@ -103,52 +151,144 @@ static inline struct imx_timer *to_imx_timer(struct 
clock_event_device *ced)
return container_of(ced, struct imx_timer, ced);
 }
 
-static void imx1_gpt_irq_disable(struct imx_timer *imxtm)
+static void imx1_gpt_oc_irq_disable(struct imx_timer *imxtm)
 {

Re: [PATCH] autofs: make autofs4 and autofs mutually exclusive

2018-05-31 Thread Ian Kent
On Thu, 2018-05-31 at 17:13 -0700, Andrew Morton wrote:
> On Wed, 30 May 2018 17:18:55 +0800 Ian Kent  wrote:
> 
> > > I actually had an alternative approach that I tried out successfully
> > > but discarded as being too different from the original code. Just for
> > > reference, this one would work as well, and allow both to be
> > > compiled together. The version you posted is probably better.
> > 
> > It's an attractive option but the problem is both implement the
> > autofs file system.
> > 
> > I've always thought you can't register the same file system at the
> > same time from two distinct sources.
> > 
> > If you're careful and compile each only as a module you could do it.
> > 
> > But many configurations have autofs compiled built-in because of the
> > auto-loading problems that arose back when there was an autofs fs
> > module as well as an autofs fs module present in the autofs4 directory.
> > 
> > Maybe it would actually work with one winning over the other but
> > I'd prefer not to go that way.
> > 
> > It will be gone in two subsequent releases if it gets merged and no
> > changes to the retained code will be needed with this approach.
> 
> I'm losing the plot here.  Can you please confirm that this is the
> patch we want?

Understandable.

This wasn't quite what I did and at the risk of confusing matters
further I'll try and explain what I did and why.

I folded the change into the patch which created fs/autofs/Kconfig
(autofs-create-autofs-kconfig-and-makefile.patch).

However doing what you're doing here should have the same effect as
long as Kbuild is smart enough to work out that
"depends on AUTOFS_FS = n" doesn't apply when fs/autofs/Kconfig hasn't
yet been created (by autofs-create-autofs-kconfig-and-makefile.patch).

The problem is that AUTOFS_FS=y might be still be present in .config
(surviving after many years) causing the bisection problem.

That's why I thought it best to add the depends in fs/autofs4/Kconfig
at the time fs/autofs/Kconfig is created rather than before that in
autofs-update-fs-autofs4-makefile.patch, as is done here.

Let me check if Kbuild will do the right thing and get back to you.

> 
> 
> From: Ian Kent 
> Subject: autofs: update fs/autofs4/Kconfig
> 
> Update Kconfig and add a depricated warning.
> 
> [ra...@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
>   Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@p
> luto.themaw.net
> Link: http://lkml.kernel.org/r/152626706133.28589.11994171621899212952.stgit@p
> luto.themaw.net
> Signed-off-by: Ian Kent 
> Tested-by: Randy Dunlap 
> Cc: Al Viro 
> Cc: Arnd Bergmann 
> Signed-off-by: Andrew Morton 
> ---
> 
>  fs/autofs4/Kconfig |   42 +++---
>  1 file changed, 31 insertions(+), 11 deletions(-)
> 
> diff -puN fs/autofs4/Kconfig~autofs-update-fs-autofs4-kconfig
> fs/autofs4/Kconfig
> --- a/fs/autofs4/Kconfig~autofs-update-fs-autofs4-kconfig
> +++ a/fs/autofs4/Kconfig
> @@ -1,5 +1,7 @@
>  config AUTOFS4_FS
> - tristate "Kernel automounter version 4 support (also supports v3)"
> + tristate "Kernel automounter version 4 support (also supports v3 and
> v5)"
> + default n
> + depends on AUTOFS_FS = n
>   help
> The automounter is a tool to automatically mount remote file
> systems
> on demand. This implementation is partially kernel-based to reduce
> @@ -7,14 +9,32 @@ config AUTOFS4_FS
> automounter (amd), which is a pure user space daemon.
>  
> To use the automounter you need the user-space tools from
> -   ; you also
> -   want to answer Y to "NFS file system support", below.
> +   ; you also want
> +   to answer Y to "NFS file system support", below.
>  
> -   To compile this support as a module, choose M here: the module will
> be
> -   called autofs4.  You will need to add "alias autofs autofs4" to
> your
> -   modules configuration file.
> -
> -   If you are not a part of a fairly large, distributed network or
> -   don't have a laptop which needs to dynamically reconfigure to the
> -   local network, you probably do not need an automounter, and can say
> -   N here.
> +   This module is in the process of being renamed from autofs4 to
> +   autofs. Since autofs is now the only module that provides the
> +   autofs file system the module is not version 4 specific.
> +
> +   The autofs4 module is now built from the source located in
> +   fs/autofs. The autofs4 directory and its configuration entry
> +   will be removed two kernel versions from the inclusion of this
> +   change.
> +
> +   Changes that will need to be made should be limited to:
> +   - source include statments should be changed from autofs_fs4.h to
> + autofs_fs.h since these two header files have been merged.
> +   - user space scripts 

Re: [PATCH] autofs: make autofs4 and autofs mutually exclusive

2018-05-31 Thread Ian Kent
On Thu, 2018-05-31 at 17:13 -0700, Andrew Morton wrote:
> On Wed, 30 May 2018 17:18:55 +0800 Ian Kent  wrote:
> 
> > > I actually had an alternative approach that I tried out successfully
> > > but discarded as being too different from the original code. Just for
> > > reference, this one would work as well, and allow both to be
> > > compiled together. The version you posted is probably better.
> > 
> > It's an attractive option but the problem is both implement the
> > autofs file system.
> > 
> > I've always thought you can't register the same file system at the
> > same time from two distinct sources.
> > 
> > If you're careful and compile each only as a module you could do it.
> > 
> > But many configurations have autofs compiled built-in because of the
> > auto-loading problems that arose back when there was an autofs fs
> > module as well as an autofs fs module present in the autofs4 directory.
> > 
> > Maybe it would actually work with one winning over the other but
> > I'd prefer not to go that way.
> > 
> > It will be gone in two subsequent releases if it gets merged and no
> > changes to the retained code will be needed with this approach.
> 
> I'm losing the plot here.  Can you please confirm that this is the
> patch we want?

Understandable.

This wasn't quite what I did and at the risk of confusing matters
further I'll try and explain what I did and why.

I folded the change into the patch which created fs/autofs/Kconfig
(autofs-create-autofs-kconfig-and-makefile.patch).

However doing what you're doing here should have the same effect as
long as Kbuild is smart enough to work out that
"depends on AUTOFS_FS = n" doesn't apply when fs/autofs/Kconfig hasn't
yet been created (by autofs-create-autofs-kconfig-and-makefile.patch).

The problem is that AUTOFS_FS=y might be still be present in .config
(surviving after many years) causing the bisection problem.

That's why I thought it best to add the depends in fs/autofs4/Kconfig
at the time fs/autofs/Kconfig is created rather than before that in
autofs-update-fs-autofs4-makefile.patch, as is done here.

Let me check if Kbuild will do the right thing and get back to you.

> 
> 
> From: Ian Kent 
> Subject: autofs: update fs/autofs4/Kconfig
> 
> Update Kconfig and add a depricated warning.
> 
> [ra...@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
>   Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@p
> luto.themaw.net
> Link: http://lkml.kernel.org/r/152626706133.28589.11994171621899212952.stgit@p
> luto.themaw.net
> Signed-off-by: Ian Kent 
> Tested-by: Randy Dunlap 
> Cc: Al Viro 
> Cc: Arnd Bergmann 
> Signed-off-by: Andrew Morton 
> ---
> 
>  fs/autofs4/Kconfig |   42 +++---
>  1 file changed, 31 insertions(+), 11 deletions(-)
> 
> diff -puN fs/autofs4/Kconfig~autofs-update-fs-autofs4-kconfig
> fs/autofs4/Kconfig
> --- a/fs/autofs4/Kconfig~autofs-update-fs-autofs4-kconfig
> +++ a/fs/autofs4/Kconfig
> @@ -1,5 +1,7 @@
>  config AUTOFS4_FS
> - tristate "Kernel automounter version 4 support (also supports v3)"
> + tristate "Kernel automounter version 4 support (also supports v3 and
> v5)"
> + default n
> + depends on AUTOFS_FS = n
>   help
> The automounter is a tool to automatically mount remote file
> systems
> on demand. This implementation is partially kernel-based to reduce
> @@ -7,14 +9,32 @@ config AUTOFS4_FS
> automounter (amd), which is a pure user space daemon.
>  
> To use the automounter you need the user-space tools from
> -   ; you also
> -   want to answer Y to "NFS file system support", below.
> +   ; you also want
> +   to answer Y to "NFS file system support", below.
>  
> -   To compile this support as a module, choose M here: the module will
> be
> -   called autofs4.  You will need to add "alias autofs autofs4" to
> your
> -   modules configuration file.
> -
> -   If you are not a part of a fairly large, distributed network or
> -   don't have a laptop which needs to dynamically reconfigure to the
> -   local network, you probably do not need an automounter, and can say
> -   N here.
> +   This module is in the process of being renamed from autofs4 to
> +   autofs. Since autofs is now the only module that provides the
> +   autofs file system the module is not version 4 specific.
> +
> +   The autofs4 module is now built from the source located in
> +   fs/autofs. The autofs4 directory and its configuration entry
> +   will be removed two kernel versions from the inclusion of this
> +   change.
> +
> +   Changes that will need to be made should be limited to:
> +   - source include statments should be changed from autofs_fs4.h to
> + autofs_fs.h since these two header files have been merged.
> +   - user space scripts 

Re: Can kfree() sleep at runtime?

2018-05-31 Thread Nadav Amit
Jia-Ju Bai  wrote:

> 
> 
> On 2018/5/31 22:30, Christopher Lameter wrote:
>> On Thu, 31 May 2018, Matthew Wilcox wrote:
>> 
 Freeing a page in the page allocator also was traditionally not sleeping.
 That has changed?
>>> No.  "Your bug" being "The bug in your static analysis tool".  It probably
>>> isn't following the data flow correctly (or deeply enough).
>> Well ok this is not going to trigger for kfree(), this is x86 specific and
>> requires CONFIG_DEBUG_PAGEALLOC and a free of a page in a huge page.
>> 
>> Ok that is a very contorted situation but how would a static checker deal
>> with that?
> 
> I admit that my tool does not follow the data flow well, and I need to 
> improve it.
> In this case of kfree(), I want know how the data flow leads to my mistake.

Note that is only enabled in debug mode:

static inline void
kernel_map_pages(struct page *page, int numpages, int enable)
{
if (!debug_pagealloc_enabled())
return;

__kernel_map_pages(page, numpages, enable);
}


signature.asc
Description: Message signed with OpenPGP


Re: Can kfree() sleep at runtime?

2018-05-31 Thread Nadav Amit
Jia-Ju Bai  wrote:

> 
> 
> On 2018/5/31 22:30, Christopher Lameter wrote:
>> On Thu, 31 May 2018, Matthew Wilcox wrote:
>> 
 Freeing a page in the page allocator also was traditionally not sleeping.
 That has changed?
>>> No.  "Your bug" being "The bug in your static analysis tool".  It probably
>>> isn't following the data flow correctly (or deeply enough).
>> Well ok this is not going to trigger for kfree(), this is x86 specific and
>> requires CONFIG_DEBUG_PAGEALLOC and a free of a page in a huge page.
>> 
>> Ok that is a very contorted situation but how would a static checker deal
>> with that?
> 
> I admit that my tool does not follow the data flow well, and I need to 
> improve it.
> In this case of kfree(), I want know how the data flow leads to my mistake.

Note that is only enabled in debug mode:

static inline void
kernel_map_pages(struct page *page, int numpages, int enable)
{
if (!debug_pagealloc_enabled())
return;

__kernel_map_pages(page, numpages, enable);
}


signature.asc
Description: Message signed with OpenPGP


Re: Can kfree() sleep at runtime?

2018-05-31 Thread Jia-Ju Bai




On 2018/5/31 22:30, Christopher Lameter wrote:

On Thu, 31 May 2018, Matthew Wilcox wrote:


Freeing a page in the page allocator also was traditionally not sleeping.
That has changed?

No.  "Your bug" being "The bug in your static analysis tool".  It probably
isn't following the data flow correctly (or deeply enough).

Well ok this is not going to trigger for kfree(), this is x86 specific and
requires CONFIG_DEBUG_PAGEALLOC and a free of a page in a huge page.

Ok that is a very contorted situation but how would a static checker deal
with that?


I admit that my tool does not follow the data flow well, and I need to 
improve it.

In this case of kfree(), I want know how the data flow leads to my mistake.


Best wishes,
Jia-Ju Bai


Re: Can kfree() sleep at runtime?

2018-05-31 Thread Jia-Ju Bai




On 2018/5/31 22:30, Christopher Lameter wrote:

On Thu, 31 May 2018, Matthew Wilcox wrote:


Freeing a page in the page allocator also was traditionally not sleeping.
That has changed?

No.  "Your bug" being "The bug in your static analysis tool".  It probably
isn't following the data flow correctly (or deeply enough).

Well ok this is not going to trigger for kfree(), this is x86 specific and
requires CONFIG_DEBUG_PAGEALLOC and a free of a page in a huge page.

Ok that is a very contorted situation but how would a static checker deal
with that?


I admit that my tool does not follow the data flow well, and I need to 
improve it.

In this case of kfree(), I want know how the data flow leads to my mistake.


Best wishes,
Jia-Ju Bai


Re: [PATCH 1/2] power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats

2018-05-31 Thread Guenter Roeck

Hi Brian,

On 05/31/2018 05:32 PM, Brian Norris wrote:

This driver was originally submitted for the TI BQ20Z75 battery IC
(commit a7640bfa10c5 ("power_supply: Add driver for TI BQ20Z75 gas gauge
IC")) and later renamed to express generic SBS support. While it's
mostly true that this driver implemented a standard SBS command set, it
takes liberties with the REG_MANUFACTURER_DATA register. This register
is specified in the SBS spec, but it doesn't make any mention of what
its actual contents are.

We've sort of noticed this optionality previously, with commit
17c6d3979e5b ("sbs-battery: make writes to ManufacturerAccess
optional"), where we found that some batteries NAK writes to this
register.

What this really means is that so far, we've just been lucky that most
batteries have either been compatible with the TI chip, or else at least
haven't reported highly-unexpected values.

For instance, one battery I have here seems to report either 0x or
0x0100 to the MANUFACTURER_ACCESS_STATUS command -- while this seems to
match either Wake Up (bits[11:8] = b) or Normal Discharge
(bits[11:8] = 0001b) status for the TI part [1], they don't seem to
actually correspond to real states (for instance, I never see 0101b =
Charge, even when charging).

On other batteries, I'm getting apparently random data in return, which
means that occasionally, we interpret this as "battery not present" or
"battery is not healthy".

All in all, it seems to be a really bad idea to make assumptions about
REG_MANUFACTURER_DATA, unless we already know what battery we're using.
Therefore, this patch reimplements the "present" and "health" checks to
the following on most SBS batteries:

1. HEALTH: report "unknown" -- I couldn't find a standard SBS command
that gives us much useful here
2. PRESENT: just send a REG_STATUS command; if it succeeds, then the
battery is present

Also, we stop sending MANUFACTURER_ACCESS_SLEEP to non-TI parts. I have
no proof that this is useful and supported.

If someone explicitly provided a 'ti,bq20z75' compatible property, then
we retain the existing TI command behaviors.

[1] http://www.ti.com/lit/er/sluu265a/sluu265a.pdf



Excellent idea. Couple of comments below.

Thanks,
Guenter


Signed-off-by: Brian Norris 
---
  drivers/power/supply/sbs-battery.c | 61 +-
  1 file changed, 52 insertions(+), 9 deletions(-)

diff --git a/drivers/power/supply/sbs-battery.c 
b/drivers/power/supply/sbs-battery.c
index 83d7b4115857..e15d0ca4729d 100644
--- a/drivers/power/supply/sbs-battery.c
+++ b/drivers/power/supply/sbs-battery.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -84,8 +85,9 @@ static const struct chip_data {
int min_value;
int max_value;
  } sbs_data[] = {
+   /* Manuf. data isn't directly useful as a POWER_SUPPLY_PROP */
[REG_MANUFACTURER_DATA] =
-   SBS_DATA(POWER_SUPPLY_PROP_PRESENT, 0x00, 0, 65535),
+   SBS_DATA(-1, 0x00, 0, 65535),


I don't think this change is necessary. For example, 
POWER_SUPPLY_PROP_SERIAL_NUMBER
is also not directly accessed, yet the REG_SERIAL_NUMBER array entry includes
POWER_SUPPLY_PROP_SERIAL_NUMBER.


[REG_TEMPERATURE] =
SBS_DATA(POWER_SUPPLY_PROP_TEMP, 0x08, 0, 65535),
[REG_VOLTAGE] =
@@ -156,6 +158,9 @@ static enum power_supply_property sbs_properties[] = {
POWER_SUPPLY_PROP_MODEL_NAME
  };
  
+/* Supports special manufacturer commands from TI BQ20Z75 IC. */

+#define SBS_FLAGS_TI_BQ20Z75   BIT(0)
+
  struct sbs_info {
struct i2c_client   *client;
struct power_supply *power_supply;
@@ -168,6 +173,7 @@ struct sbs_info {
u32 poll_retry_count;
struct delayed_work work;
struct mutexmode_lock;
+   u32 flags;
  };
  
  static char model_name[I2C_SMBUS_BLOCK_MAX + 1];

@@ -315,6 +321,31 @@ static int sbs_status_correct(struct i2c_client *client, 
int *intval)
  static int sbs_get_battery_presence_and_health(
struct i2c_client *client, enum power_supply_property psp,
union power_supply_propval *val)
+{
+   int ret;
+
+   switch (psp) {
+   case POWER_SUPPLY_PROP_PRESENT:
+   /* Dummy command; if it succeeds, battery is present. */
+   ret = sbs_read_word_data(client, sbs_data[REG_STATUS].addr);
+   if (ret < 0)
+   val->intval = 0; /* battery disconnected */


I don't know the relevance, but the original code returns the error
in this situation.


+   else
+   val->intval = 1; /* battery present */
+   return 0;
+   case POWER_SUPPLY_PROP_HEALTH:
+   /* SBS spec doesn't have a general health command. */
+   val->intval = POWER_SUPPLY_HEALTH_UNKNOWN;
+   return 0;
+   

Re: [PATCH 1/2] power: supply: sbs-battery: don't assume MANUFACTURER_DATA formats

2018-05-31 Thread Guenter Roeck

Hi Brian,

On 05/31/2018 05:32 PM, Brian Norris wrote:

This driver was originally submitted for the TI BQ20Z75 battery IC
(commit a7640bfa10c5 ("power_supply: Add driver for TI BQ20Z75 gas gauge
IC")) and later renamed to express generic SBS support. While it's
mostly true that this driver implemented a standard SBS command set, it
takes liberties with the REG_MANUFACTURER_DATA register. This register
is specified in the SBS spec, but it doesn't make any mention of what
its actual contents are.

We've sort of noticed this optionality previously, with commit
17c6d3979e5b ("sbs-battery: make writes to ManufacturerAccess
optional"), where we found that some batteries NAK writes to this
register.

What this really means is that so far, we've just been lucky that most
batteries have either been compatible with the TI chip, or else at least
haven't reported highly-unexpected values.

For instance, one battery I have here seems to report either 0x or
0x0100 to the MANUFACTURER_ACCESS_STATUS command -- while this seems to
match either Wake Up (bits[11:8] = b) or Normal Discharge
(bits[11:8] = 0001b) status for the TI part [1], they don't seem to
actually correspond to real states (for instance, I never see 0101b =
Charge, even when charging).

On other batteries, I'm getting apparently random data in return, which
means that occasionally, we interpret this as "battery not present" or
"battery is not healthy".

All in all, it seems to be a really bad idea to make assumptions about
REG_MANUFACTURER_DATA, unless we already know what battery we're using.
Therefore, this patch reimplements the "present" and "health" checks to
the following on most SBS batteries:

1. HEALTH: report "unknown" -- I couldn't find a standard SBS command
that gives us much useful here
2. PRESENT: just send a REG_STATUS command; if it succeeds, then the
battery is present

Also, we stop sending MANUFACTURER_ACCESS_SLEEP to non-TI parts. I have
no proof that this is useful and supported.

If someone explicitly provided a 'ti,bq20z75' compatible property, then
we retain the existing TI command behaviors.

[1] http://www.ti.com/lit/er/sluu265a/sluu265a.pdf



Excellent idea. Couple of comments below.

Thanks,
Guenter


Signed-off-by: Brian Norris 
---
  drivers/power/supply/sbs-battery.c | 61 +-
  1 file changed, 52 insertions(+), 9 deletions(-)

diff --git a/drivers/power/supply/sbs-battery.c 
b/drivers/power/supply/sbs-battery.c
index 83d7b4115857..e15d0ca4729d 100644
--- a/drivers/power/supply/sbs-battery.c
+++ b/drivers/power/supply/sbs-battery.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -84,8 +85,9 @@ static const struct chip_data {
int min_value;
int max_value;
  } sbs_data[] = {
+   /* Manuf. data isn't directly useful as a POWER_SUPPLY_PROP */
[REG_MANUFACTURER_DATA] =
-   SBS_DATA(POWER_SUPPLY_PROP_PRESENT, 0x00, 0, 65535),
+   SBS_DATA(-1, 0x00, 0, 65535),


I don't think this change is necessary. For example, 
POWER_SUPPLY_PROP_SERIAL_NUMBER
is also not directly accessed, yet the REG_SERIAL_NUMBER array entry includes
POWER_SUPPLY_PROP_SERIAL_NUMBER.


[REG_TEMPERATURE] =
SBS_DATA(POWER_SUPPLY_PROP_TEMP, 0x08, 0, 65535),
[REG_VOLTAGE] =
@@ -156,6 +158,9 @@ static enum power_supply_property sbs_properties[] = {
POWER_SUPPLY_PROP_MODEL_NAME
  };
  
+/* Supports special manufacturer commands from TI BQ20Z75 IC. */

+#define SBS_FLAGS_TI_BQ20Z75   BIT(0)
+
  struct sbs_info {
struct i2c_client   *client;
struct power_supply *power_supply;
@@ -168,6 +173,7 @@ struct sbs_info {
u32 poll_retry_count;
struct delayed_work work;
struct mutexmode_lock;
+   u32 flags;
  };
  
  static char model_name[I2C_SMBUS_BLOCK_MAX + 1];

@@ -315,6 +321,31 @@ static int sbs_status_correct(struct i2c_client *client, 
int *intval)
  static int sbs_get_battery_presence_and_health(
struct i2c_client *client, enum power_supply_property psp,
union power_supply_propval *val)
+{
+   int ret;
+
+   switch (psp) {
+   case POWER_SUPPLY_PROP_PRESENT:
+   /* Dummy command; if it succeeds, battery is present. */
+   ret = sbs_read_word_data(client, sbs_data[REG_STATUS].addr);
+   if (ret < 0)
+   val->intval = 0; /* battery disconnected */


I don't know the relevance, but the original code returns the error
in this situation.


+   else
+   val->intval = 1; /* battery present */
+   return 0;
+   case POWER_SUPPLY_PROP_HEALTH:
+   /* SBS spec doesn't have a general health command. */
+   val->intval = POWER_SUPPLY_HEALTH_UNKNOWN;
+   return 0;
+   

Re: Can kfree() sleep at runtime?

2018-05-31 Thread Jia-Ju Bai




On 2018/5/31 22:09, Christopher Lameter wrote:

On Thu, 31 May 2018, Jia-Ju Bai wrote:


I write a static analysis tool (DSAC), and it finds that kfree() can sleep.

That should not happen.


Here is the call path for kfree().
Please look at it *from the bottom up*.

[FUNC] alloc_pages(GFP_KERNEL)
arch/x86/mm/pageattr.c, 756: alloc_pages in split_large_page
arch/x86/mm/pageattr.c, 1283: split_large_page in __change_page_attr
arch/x86/mm/pageattr.c, 1391: __change_page_attr in __change_page_attr_set_clr
arch/x86/mm/pageattr.c, 2014: __change_page_attr_set_clr in __set_pages_np
arch/x86/mm/pageattr.c, 2034: __set_pages_np in __kernel_map_pages
./include/linux/mm.h, 2488: __kernel_map_pages in kernel_map_pages
mm/page_alloc.c, 1074: kernel_map_pages in free_pages_prepare

mapping pages in the page allocator can cause allocations?? How did that
get in there?


Thanks for reply :)
I am also confused about it.

I get in here according to the definition of free_pages_prepare():
1022. static bool free_pages_prepare(...) {
 ...
1072.arch_free_page(page, order);
1073.kernel_poison_pages(page, 1 << order, 0);
1074.kernel_map_pages(page, 1 << order, 0); // *Here*
1075.kasan_free_pages(page, order);
1076.
1077.return true;
1078. }


Best wishes,
Jia-Ju Bai


Re: Can kfree() sleep at runtime?

2018-05-31 Thread Jia-Ju Bai




On 2018/5/31 22:09, Christopher Lameter wrote:

On Thu, 31 May 2018, Jia-Ju Bai wrote:


I write a static analysis tool (DSAC), and it finds that kfree() can sleep.

That should not happen.


Here is the call path for kfree().
Please look at it *from the bottom up*.

[FUNC] alloc_pages(GFP_KERNEL)
arch/x86/mm/pageattr.c, 756: alloc_pages in split_large_page
arch/x86/mm/pageattr.c, 1283: split_large_page in __change_page_attr
arch/x86/mm/pageattr.c, 1391: __change_page_attr in __change_page_attr_set_clr
arch/x86/mm/pageattr.c, 2014: __change_page_attr_set_clr in __set_pages_np
arch/x86/mm/pageattr.c, 2034: __set_pages_np in __kernel_map_pages
./include/linux/mm.h, 2488: __kernel_map_pages in kernel_map_pages
mm/page_alloc.c, 1074: kernel_map_pages in free_pages_prepare

mapping pages in the page allocator can cause allocations?? How did that
get in there?


Thanks for reply :)
I am also confused about it.

I get in here according to the definition of free_pages_prepare():
1022. static bool free_pages_prepare(...) {
 ...
1072.arch_free_page(page, order);
1073.kernel_poison_pages(page, 1 << order, 0);
1074.kernel_map_pages(page, 1 << order, 0); // *Here*
1075.kasan_free_pages(page, order);
1076.
1077.return true;
1078. }


Best wishes,
Jia-Ju Bai


Re: Can kfree() sleep at runtime?

2018-05-31 Thread Jia-Ju Bai




On 2018/5/31 22:08, Matthew Wilcox wrote:

On Thu, May 31, 2018 at 09:10:07PM +0800, Jia-Ju Bai wrote:

I write a static analysis tool (DSAC), and it finds that kfree() can sleep.

Here is the call path for kfree().
Please look at it *from the bottom up*.

[FUNC] alloc_pages(GFP_KERNEL)
arch/x86/mm/pageattr.c, 756: alloc_pages in split_large_page
arch/x86/mm/pageattr.c, 1283: split_large_page in __change_page_attr

Here's your bug.  Coming from kfree(), we can't end up in the
split_large_page() path.  __change_page_attr may be called in several
different circumstances in which it would have to split a large page,
but the path from kfree() is not one of them.

I think the path from kfree() will lead to the 'level == PG_LEVEL_4K'
path, but I'm not really familiar with this x86 code.


Thanks for reply :)
But from the code in my call path, I cannot find why kfree() will only lead to 
the 'level == PG_LEVEL_4K' path.
Could you please explain it in more detail?


Best wishes,
Jia-Ju Bai
 



Re: Can kfree() sleep at runtime?

2018-05-31 Thread Jia-Ju Bai




On 2018/5/31 22:08, Matthew Wilcox wrote:

On Thu, May 31, 2018 at 09:10:07PM +0800, Jia-Ju Bai wrote:

I write a static analysis tool (DSAC), and it finds that kfree() can sleep.

Here is the call path for kfree().
Please look at it *from the bottom up*.

[FUNC] alloc_pages(GFP_KERNEL)
arch/x86/mm/pageattr.c, 756: alloc_pages in split_large_page
arch/x86/mm/pageattr.c, 1283: split_large_page in __change_page_attr

Here's your bug.  Coming from kfree(), we can't end up in the
split_large_page() path.  __change_page_attr may be called in several
different circumstances in which it would have to split a large page,
but the path from kfree() is not one of them.

I think the path from kfree() will lead to the 'level == PG_LEVEL_4K'
path, but I'm not really familiar with this x86 code.


Thanks for reply :)
But from the code in my call path, I cannot find why kfree() will only lead to 
the 'level == PG_LEVEL_4K' path.
Could you please explain it in more detail?


Best wishes,
Jia-Ju Bai
 



include/linux/syscalls.h:211:18: error: 'sys_mmap2' alias between functions of incompatible types 'long int(long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigne

2018-05-31 Thread kbuild test robot
Hi Al,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   0512e0134582ef85dee77d51aae77dcd1edec495
commit: ee076e81fc14ca79334d02970cea66604f183a14 sparc: trivial conversions to 
{COMPAT_,}SYSCALL_DEFINE()
date:   2 months ago
config: sparc-defconfig (attached as .config)
compiler: sparc-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout ee076e81fc14ca79334d02970cea66604f183a14
# save the attached .config to linux build tree
make.cross ARCH=sparc 

All errors (new ones prefixed by >>):

   In file included from arch/sparc/kernel/sys_sparc_32.c:21:
>> include/linux/syscalls.h:211:18: error: 'sys_mmap2' alias between functions 
>> of incompatible types 'long int(long unsigned int,  long unsigned int,  long 
>> unsigned int,  long unsigned int,  long unsigned int,  long unsigned int)' 
>> and 'long int(long int,  long int,  long int,  long int,  long int,  long 
>> int)' [-Werror=attribute-alias]
 asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:201:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:101:1: note: in expansion of macro 
'SYSCALL_DEFINE6'
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
^~~
   include/linux/syscalls.h:215:18: note: aliased declaration here
 asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:201:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:101:1: note: in expansion of macro 
'SYSCALL_DEFINE6'
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
^~~
>> include/linux/syscalls.h:211:18: error: 'sys_getdomainname' alias between 
>> functions of incompatible types 'long int(char *, int)' and 'long int(long 
>> int,  long int)' [-Werror=attribute-alias]
 asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:197:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:205:1: note: in expansion of macro 
'SYSCALL_DEFINE2'
SYSCALL_DEFINE2(getdomainname, char __user *, name, int, len)
^~~
   include/linux/syscalls.h:215:18: note: aliased declaration here
 asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:197:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:205:1: note: in expansion of macro 
'SYSCALL_DEFINE2'
SYSCALL_DEFINE2(getdomainname, char __user *, name, int, len)
^~~
   include/linux/syscalls.h:211:18: error: 'sys_rt_sigaction' alias between 
functions of incompatible types 'long int(int,  const struct sigaction *, 
struct sigaction *, void *, size_t)' {aka 'long int(int,  const struct 
sigaction *, struct sigaction *, void *, unsigned int)'} and 'long int(long 
int,  long int,  long int,  long int,  long int)' [-Werror=attribute-alias]
 asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:200:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:176:1: note: in expansion of macro 
'SYSCALL_DEFINE5'
SYSCALL_DEFINE5(rt_sigaction, 

include/linux/syscalls.h:211:18: error: 'sys_mmap2' alias between functions of incompatible types 'long int(long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigne

2018-05-31 Thread kbuild test robot
Hi Al,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   0512e0134582ef85dee77d51aae77dcd1edec495
commit: ee076e81fc14ca79334d02970cea66604f183a14 sparc: trivial conversions to 
{COMPAT_,}SYSCALL_DEFINE()
date:   2 months ago
config: sparc-defconfig (attached as .config)
compiler: sparc-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout ee076e81fc14ca79334d02970cea66604f183a14
# save the attached .config to linux build tree
make.cross ARCH=sparc 

All errors (new ones prefixed by >>):

   In file included from arch/sparc/kernel/sys_sparc_32.c:21:
>> include/linux/syscalls.h:211:18: error: 'sys_mmap2' alias between functions 
>> of incompatible types 'long int(long unsigned int,  long unsigned int,  long 
>> unsigned int,  long unsigned int,  long unsigned int,  long unsigned int)' 
>> and 'long int(long int,  long int,  long int,  long int,  long int,  long 
>> int)' [-Werror=attribute-alias]
 asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:201:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:101:1: note: in expansion of macro 
'SYSCALL_DEFINE6'
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
^~~
   include/linux/syscalls.h:215:18: note: aliased declaration here
 asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:201:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:101:1: note: in expansion of macro 
'SYSCALL_DEFINE6'
SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
^~~
>> include/linux/syscalls.h:211:18: error: 'sys_getdomainname' alias between 
>> functions of incompatible types 'long int(char *, int)' and 'long int(long 
>> int,  long int)' [-Werror=attribute-alias]
 asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:197:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:205:1: note: in expansion of macro 
'SYSCALL_DEFINE2'
SYSCALL_DEFINE2(getdomainname, char __user *, name, int, len)
^~~
   include/linux/syscalls.h:215:18: note: aliased declaration here
 asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:197:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:205:1: note: in expansion of macro 
'SYSCALL_DEFINE2'
SYSCALL_DEFINE2(getdomainname, char __user *, name, int, len)
^~~
   include/linux/syscalls.h:211:18: error: 'sys_rt_sigaction' alias between 
functions of incompatible types 'long int(int,  const struct sigaction *, 
struct sigaction *, void *, size_t)' {aka 'long int(int,  const struct 
sigaction *, struct sigaction *, void *, unsigned int)'} and 'long int(long 
int,  long int,  long int,  long int,  long int)' [-Werror=attribute-alias]
 asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
 ^~~
   include/linux/syscalls.h:207:2: note: in expansion of macro 
'__SYSCALL_DEFINEx'
 __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
 ^
   include/linux/syscalls.h:200:36: note: in expansion of macro 
'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
   ^~~
   arch/sparc/kernel/sys_sparc_32.c:176:1: note: in expansion of macro 
'SYSCALL_DEFINE5'
SYSCALL_DEFINE5(rt_sigaction, 

Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-05-31 Thread Eric W. Biederman
Michal Hocko  writes:

> On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> This function searches for a new mm owner in children and siblings,
>> and then iterates over all processes in the system in unlikely case.
>> Despite the case is unlikely, its probability growths with the number
>> of processes in the system. The time, spent on iterations, also growths.
>> I regulary observe mm_update_next_owner() in crash dumps (not related
>> to this function) of the nodes with many processes (20K+), so it looks
>> like it's not so unlikely case.
>
> Did you manage to find the pattern that forces mm_update_next_owner to
> slow paths? This really shouldn't trigger very often. If we can fallback
> easily then I suspect that we should be better off reconsidering
> mm->owner and try to come up with something more clever. I've had a
> patch to remove owner few years back. It needed some work to finish but
> maybe that would be a better than try to make non-scalable thing suck
> less.

Reading through the code I just found a trivial pattern that triggers
this.  Create a multi-threaded process.  Have the thread group leader
(the first thread) exit.

This has the potential to be a significant DOS attack if anyone cares.

Eric



Re: [PATCH 0/4] exit: Make unlikely case in mm_update_next_owner() more scalable

2018-05-31 Thread Eric W. Biederman
Michal Hocko  writes:

> On Thu 26-04-18 14:00:19, Kirill Tkhai wrote:
>> This function searches for a new mm owner in children and siblings,
>> and then iterates over all processes in the system in unlikely case.
>> Despite the case is unlikely, its probability growths with the number
>> of processes in the system. The time, spent on iterations, also growths.
>> I regulary observe mm_update_next_owner() in crash dumps (not related
>> to this function) of the nodes with many processes (20K+), so it looks
>> like it's not so unlikely case.
>
> Did you manage to find the pattern that forces mm_update_next_owner to
> slow paths? This really shouldn't trigger very often. If we can fallback
> easily then I suspect that we should be better off reconsidering
> mm->owner and try to come up with something more clever. I've had a
> patch to remove owner few years back. It needed some work to finish but
> maybe that would be a better than try to make non-scalable thing suck
> less.

Reading through the code I just found a trivial pattern that triggers
this.  Create a multi-threaded process.  Have the thread group leader
(the first thread) exit.

This has the potential to be a significant DOS attack if anyone cares.

Eric



[PATCH v2 0/3] perf script python: Add more PMU fields

2018-05-31 Thread Jin Yao
When doing pmu sampling and then running a script with
perf script -s script.py, the process_event function gets
dictionary with some fields from the perf ring buffer
(like ip, sym, callchain etc).

But we miss quite a few fields we report now, for example,
LBRs,data source,weight,transaction,iregs,uregs,and etc.

This patch adds these fields for perf script python.

Jin Yao (3):
  perf script python: Move dsoname code to a new function
  perf script python: Add more PMU fields
  perf script python: Add fields introduction to Documentation

 tools/perf/Documentation/perf-script-python.txt|  26 +++
 .../util/scripting-engines/trace-event-python.c| 250 -
 2 files changed, 267 insertions(+), 9 deletions(-)

-- 
2.7.4



[PATCH v2 1/3] perf script python: Move dsoname code to a new function

2018-05-31 Thread Jin Yao
This patch creates a new function get_dsoname() and move the
code which gets the dsoname string to this function.

That's because in next patch, when we process LBR data, we will
also need get_dsoname() to return dsoname for branch from/to.

Signed-off-by: Jin Yao 
---
 .../util/scripting-engines/trace-event-python.c| 23 ++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c 
b/tools/perf/util/scripting-engines/trace-event-python.c
index 7f8afac..f863e96 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -372,6 +372,19 @@ static PyObject *get_field_numeric_entry(struct 
event_format *event,
return obj;
 }
 
+static const char *get_dsoname(struct map *map)
+{
+   const char *dsoname = "[unknown]";
+
+   if (map && map->dso) {
+   if (symbol_conf.show_kernel_path && map->dso->long_name)
+   dsoname = map->dso->long_name;
+   else
+   dsoname = map->dso->name;
+   }
+
+   return dsoname;
+}
 
 static PyObject *python_process_callchain(struct perf_sample *sample,
 struct perf_evsel *evsel,
@@ -427,14 +440,8 @@ static PyObject *python_process_callchain(struct 
perf_sample *sample,
}
 
if (node->map) {
-   struct map *map = node->map;
-   const char *dsoname = "[unknown]";
-   if (map && map->dso) {
-   if (symbol_conf.show_kernel_path && 
map->dso->long_name)
-   dsoname = map->dso->long_name;
-   else
-   dsoname = map->dso->name;
-   }
+   const char *dsoname = get_dsoname(node->map);
+
pydict_set_item_string_decref(pyelem, "dso",
_PyUnicode_FromString(dsoname));
}
-- 
2.7.4



[PATCH v2 2/3] perf script python: Add more PMU fields

2018-05-31 Thread Jin Yao
When doing pmu sampling and then running a script with
perf script -s script.py, the process_event function gets
dictionary with some fields from the perf ring buffer
(like ip, sym, callchain etc).

But we miss quite a few fields we report now, for example,
LBRs,data source,weight,transaction,iregs,uregs,and etc.

This patch reports these fields for perf script python
processing.

New created keys/items:
---
key  : brstack
items: from, to, from_dsoname, to_dsoname, mispred,
   predicted, in_tx, abort, cycles.

key  : brstacksym
items: from, to, pred, in_tx, abort (converted string)

key  : datasrc
key  : datasrc_decode (decoded string)
key  : iregs
key  : uregs
key  : weight
key  : transaction

v2:
---
Add new fields for dso.
Use PyBool_FromLong() for mispred/predicted/in_tx/abort

Signed-off-by: Jin Yao 
---
 .../util/scripting-engines/trace-event-python.c| 227 -
 1 file changed, 226 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/scripting-engines/trace-event-python.c 
b/tools/perf/util/scripting-engines/trace-event-python.c
index f863e96..9c29cfa3 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -48,6 +48,7 @@
 #include "cpumap.h"
 #include "print_binary.h"
 #include "stat.h"
+#include "mem-events.h"
 
 #if PY_MAJOR_VERSION < 3
 #define _PyUnicode_FromString(arg) \
@@ -455,6 +456,166 @@ static PyObject *python_process_callchain(struct 
perf_sample *sample,
return pylist;
 }
 
+static PyObject *python_process_brstack(struct perf_sample *sample,
+   struct thread *thread)
+{
+   struct branch_stack *br = sample->branch_stack;
+   PyObject *pylist;
+   u64 i;
+
+   pylist = PyList_New(0);
+   if (!pylist)
+   Py_FatalError("couldn't create Python list");
+
+   if (!(br && br->nr))
+   goto exit;
+
+   for (i = 0; i < br->nr; i++) {
+   PyObject *pyelem;
+   struct addr_location al;
+   const char *dsoname;
+
+   pyelem = PyDict_New();
+   if (!pyelem)
+   Py_FatalError("couldn't create Python dictionary");
+
+   pydict_set_item_string_decref(pyelem, "from",
+   PyLong_FromUnsignedLongLong(br->entries[i].from));
+   pydict_set_item_string_decref(pyelem, "to",
+   PyLong_FromUnsignedLongLong(br->entries[i].to));
+   pydict_set_item_string_decref(pyelem, "mispred",
+   PyBool_FromLong(br->entries[i].flags.mispred));
+   pydict_set_item_string_decref(pyelem, "predicted",
+   PyBool_FromLong(br->entries[i].flags.predicted));
+   pydict_set_item_string_decref(pyelem, "in_tx",
+   PyBool_FromLong(br->entries[i].flags.in_tx));
+   pydict_set_item_string_decref(pyelem, "abort",
+   PyBool_FromLong(br->entries[i].flags.abort));
+   pydict_set_item_string_decref(pyelem, "cycles",
+   PyLong_FromUnsignedLongLong(br->entries[i].flags.cycles));
+
+   thread__find_map(thread, sample->cpumode,
+br->entries[i].from, );
+   dsoname = get_dsoname(al.map);
+   pydict_set_item_string_decref(pyelem, "from_dsoname",
+ _PyUnicode_FromString(dsoname));
+
+   thread__find_map(thread, sample->cpumode,
+br->entries[i].to, );
+   dsoname = get_dsoname(al.map);
+   pydict_set_item_string_decref(pyelem, "to_dsoname",
+ _PyUnicode_FromString(dsoname));
+
+   PyList_Append(pylist, pyelem);
+   Py_DECREF(pyelem);
+   }
+
+exit:
+   return pylist;
+}
+
+static unsigned long get_offset(struct symbol *sym, struct addr_location *al)
+{
+   unsigned long offset;
+
+   if (al->addr < sym->end)
+   offset = al->addr - sym->start;
+   else
+   offset = al->addr - al->map->start - sym->start;
+
+   return offset;
+}
+
+static int get_symoff(struct symbol *sym, struct addr_location *al,
+ bool print_off, char *bf, int size)
+{
+   unsigned long offset;
+
+   if (!sym || !sym->name)
+   return scnprintf(bf, size, "%s", "[unknown]");
+
+   if (!print_off)
+   return scnprintf(bf, size, "%s", sym->name);
+
+   offset = get_offset(sym, al);
+
+   return scnprintf(bf, size, "%s+0x%x", sym->name, offset);
+}
+
+static int get_br_mspred(struct branch_flags *flags, char *bf, int size)
+{
+   if (!flags->mispred  && !flags->predicted)
+   return scnprintf(bf, size, "%s", "-");
+
+   if (flags->mispred)
+   return scnprintf(bf, size, "%s", "M");
+
+   return 

[PATCH v2 0/3] perf script python: Add more PMU fields

2018-05-31 Thread Jin Yao
When doing pmu sampling and then running a script with
perf script -s script.py, the process_event function gets
dictionary with some fields from the perf ring buffer
(like ip, sym, callchain etc).

But we miss quite a few fields we report now, for example,
LBRs,data source,weight,transaction,iregs,uregs,and etc.

This patch adds these fields for perf script python.

Jin Yao (3):
  perf script python: Move dsoname code to a new function
  perf script python: Add more PMU fields
  perf script python: Add fields introduction to Documentation

 tools/perf/Documentation/perf-script-python.txt|  26 +++
 .../util/scripting-engines/trace-event-python.c| 250 -
 2 files changed, 267 insertions(+), 9 deletions(-)

-- 
2.7.4



  1   2   3   4   5   6   7   8   9   10   >