Re: RTL8723BE performance regression

2018-05-01 Thread Pkshih
On Wed, 2018-05-02 at 05:44 +, Pkshih wrote:
> 
> > -Original Message-
> > From: João Paulo Rechi Vita [mailto:jprv...@gmail.com]
> > Sent: Wednesday, May 02, 2018 6:41 AM
> > To: Larry Finger
> > Cc: Steve deRosier; 莊彥宣; Pkshih; Birming Chiu; Shaofu; Steven Ting; 
> > Chaoming_Li; Kalle Valo;
> > linux-wireless; Network Development; LKML; Daniel Drake; João Paulo Rechi 
> > Vita; linux@endlessm.com
> > Subject: Re: RTL8723BE performance regression
> > 
> > On Tue, Apr 3, 2018 at 7:51 PM, Larry Finger  
> > wrote:
> > > On 04/03/2018 09:37 PM, João Paulo Rechi Vita wrote:
> > >>
> > >> On Tue, Apr 3, 2018 at 7:28 PM, Larry Finger 
> > >> wrote:
> > >>
> > >> (...)
> > >>
> > >>> As the antenna selection code changes affected your first bisection, do
> > >>> you
> > >>> have one of those HP laptops with only one antenna and the incorrect
> > >>> coding
> > >>> in the FUSE?
> > >>
> > >>
> > >> Yes, that is why I've been passing ant_sel=1 during my tests -- this
> > >> was needed to achieve a good performance in the past, before this
> > >> regression. I've also opened the laptop chassis and confirmed the
> > >> antenna cable is plugged to the connector labeled with "1" on the
> > >> card.
> > >>
> > >>> If so, please make sure that you still have the same signal
> > >>> strength for good and bad cases. I have tried to keep the driver and the
> > >>> btcoex code in sync, but there may be some combinations of antenna
> > >>> configuration and FUSE contents that cause the code to fail.
> > >>>
> > >>
> > >> What is the recommended way to monitor the signal strength?
> > >
> > >
> > > The btcoex code is developed for multiple platforms by a different group
> > > than the Linux driver. I think they made a change that caused ant_sel to
> > > switch from 1 to 2. At least numerous comments at
> > > github.com/lwfinger/rtlwifi_new claimed they needed to make that change.
> > >
> > > My recommended method is to verify the wifi device name with "iw dev". 
> > > Then
> > > using that device
> > >
> > > sudo iw dev  scan | egrep "SSID|signal"
> > >
> > 
> > I have confirmed that the performance regression is indeed tied to
> > signal strength: in the good cases the signal was between -16 and -8 dBm,
> > whereas in the bad cases it was always between -50 and -40 dBm. I've
> > also switched to testing bandwidth in a controlled LAN environment using
> > iperf3, as suggested by Steve deRosier, with the DUT being the only
> > machine connected to the 2.4 GHz radio and the machine running the
> > iperf3 server connected via ethernet.
> > 
> 
> We have new experimental results in commit af8a41cccf8f46 ("rtlwifi: cleanup
> 8723be ant_sel definition"). You can use the above commit, do the same
> experiments (with ant_sel=0, 1 and 2) on your side, and then share your
> results.
> Since performance is tied to signal strength, sharing only the signal
> strength is enough.
> 

Please remember to cold reboot once ant_sel is changed.
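
To spell out the re-test procedure being asked for (a sketch only: the modprobe.d file name and the wlan0 device name are placeholders; just the ant_sel option and the iw command come from this thread):

```shell
# Persist the antenna selection for the rtl8723be module
# (any *.conf file under /etc/modprobe.d/ works; the name is arbitrary).
echo "options rtl8723be ant_sel=1" | sudo tee /etc/modprobe.d/rtl8723be.conf

# Cold reboot (full power-off, not a warm restart) so the new ant_sel
# takes effect, then compare the reported signal strength:
sudo iw dev wlan0 scan | egrep "SSID|signal"
```

Repeating this for ant_sel=0, 1 and 2 and recording the dBm values matches the experiment requested above.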



[RFC PATCH] Add /proc//numa_vamaps for numa node information

2018-05-01 Thread Prakash Sangappa
For analysis purposes it is useful to have numa node information
corresponding to the mapped address ranges of a process. Currently,
/proc//numa_maps provides a list of numa nodes from which pages are
allocated per VMA of the process. This is not useful if a user needs to
determine which numa node the mapped pages are allocated from for a
particular address range. It would help if the numa node information
presented in /proc//numa_maps were broken down by VA ranges, showing the
exact numa node from which the pages have been allocated.

The format of the /proc//numa_maps file content depends on the
/proc//maps file content, as mentioned in the manpage, i.e. one line
entry for every VMA corresponding to entries in the /proc//maps file.
Therefore, changing the output of /proc//numa_maps may not be possible.

Hence, this patch proposes adding a file, /proc//numa_vamaps, which will
provide a proper breakdown of VA ranges by the numa node id from which the
mapped pages are allocated. For address ranges not having any pages mapped,
a '-' is printed instead of a numa node id. In addition, this file will
include most of the other information currently presented in
/proc//numa_maps. The additional information is included for convenience.
If this is not preferred, the patch could be modified to provide just the
VA range to numa node information, as the rest of the information is
already available through the /proc//numa_maps file.

Since the VA range to numa node information does not include the page's PFN,
reading this file will not be restricted (i.e. it will not require
CAP_SYS_ADMIN).

Here is the snippet from the new file content showing the format.

0040-00401000 N0=1 kernelpagesize_kB=4 mapped=1 file=/tmp/hmap2
0060-00601000 N0=1 kernelpagesize_kB=4 anon=1 dirty=1 file=/tmp/hmap2
00601000-00602000 N0=1 kernelpagesize_kB=4 anon=1 dirty=1 file=/tmp/hmap2
7f021560-7f021580 N0=1 kernelpagesize_kB=2048 dirty=1 file=/mnt/f1
7f021580-7f0215c0 -  file=/mnt/f1
7f0215c0-7f0215e0 N0=1 kernelpagesize_kB=2048 dirty=1 file=/mnt/f1
7f0215e0-7f021620 -  file=/mnt/f1
..
7f0217ecb000-7f0217f2 N0=85 kernelpagesize_kB=4 mapped=85 mapmax=51
   file=/usr/lib64/libc-2.17.so
7f0217f2-7f0217f3 -  file=/usr/lib64/libc-2.17.so
7f0217f3-7f0217f9 N0=96 kernelpagesize_kB=4 mapped=96 mapmax=51
   file=/usr/lib64/libc-2.17.so
7f0217f9-7f0217fb -  file=/usr/lib64/libc-2.17.so
..

The 'pmap' command can be enhanced to include an option to show numa node
information, which it can read from this new proc file. This will be a
follow-on proposal.

There have been a couple of previous patch proposals to provide numa node
information based on the PFN or physical address. They seem to have not
made progress. Also, it would appear that reading numa node information
based on the PFN or physical address would require privileges
(CAP_SYS_ADMIN), similar to reading PFN info from /proc//pagemap.

See
https://marc.info/?t=13963093821=1=2

https://marc.info/?t=13971872441=1=2

Signed-off-by: Prakash Sangappa 
---
 fs/proc/base.c |   2 +
 fs/proc/internal.h |   3 +
 fs/proc/task_mmu.c | 299 -
 3 files changed, 303 insertions(+), 1 deletion(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1b2ede6..8fd7cc5 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2960,6 +2960,7 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("maps",   S_IRUGO, proc_pid_maps_operations),
 #ifdef CONFIG_NUMA
REG("numa_maps",  S_IRUGO, proc_pid_numa_maps_operations),
+   REG("numa_vamaps",  S_IRUGO, proc_pid_numa_vamaps_operations),
 #endif
REG("mem",S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd",proc_cwd_link),
@@ -3352,6 +3353,7 @@ static const struct pid_entry tid_base_stuff[] = {
 #endif
 #ifdef CONFIG_NUMA
REG("numa_maps", S_IRUGO, proc_tid_numa_maps_operations),
+   REG("numa_vamaps",  S_IRUGO, proc_tid_numa_vamaps_operations),
 #endif
REG("mem",   S_IRUSR|S_IWUSR, proc_mem_operations),
LNK("cwd",   proc_cwd_link),
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 0f1692e..9a3ff80 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -273,6 +273,7 @@ struct proc_maps_private {
 #ifdef CONFIG_NUMA
struct mempolicy *task_mempolicy;
 #endif
+   u64 vma_off;
 } __randomize_layout;
 
 struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
@@ -280,7 +281,9 @@ struct mm_struct *proc_mem_open(struct inode *inode, 
unsigned int mode);
 extern const struct file_operations proc_pid_maps_operations;
 extern const struct file_operations proc_tid_maps_operations;
 extern const struct file_operations proc_pid_numa_maps_operations;
+extern const struct file_operations proc_pid_numa_vamaps_operations;
 extern const struct file_operations proc_tid_numa_maps_operations;
+extern const struct file_operations proc_tid_numa_vamaps_operations;
 extern const 

Re: linux-next: manual merge of the bpf-next tree with the bpf tree

2018-05-01 Thread Stephen Rothwell
Hi Song,

On Wed, 2 May 2018 04:40:20 + Song Liu  wrote:
>
> > -   CHECK(build_id_matches < 1, "build id match",
> > - "Didn't find expected build ID from the map\n");
> > +   if (CHECK(build_id_matches < 1, "build id match",
> > - "Didn't find expected build ID from the map"))
> > ++"Didn't find expected build ID from the map\n"))  
> 
> ^^^  Is there a "+" at the beginning of this line? 

No, this is a merge resolution diff, so the ++ means that the line did
not appear in either parent commit.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH 1/2] i2c: sprd: Prevent i2c accesses after suspend is called

2018-05-01 Thread Baolin Wang
On 2 May 2018 at 13:23, Wolfram Sang  wrote:
>
>> > We should maybe handle this in the core somewhen, though. Or?
>>
>> Thanks. Yes, it would be more helpful if we could handle this in the i2c core.
>
> To understand the issue better: which kind of devices in your system
> still send I2C transfers when the system is going to suspend? Do you
> know?

We have found that the touch screen device triggers I2C transfers while
the system is going to suspend.

-- 
Baolin.wang
Best Regards



Re: [PATCH] staging: wilc1000: fix infinite loop and out-of-bounds access

2018-05-01 Thread Ajay Singh
On Mon, 30 Apr 2018 18:23:21 +0300
Dan Carpenter  wrote:

> On Mon, Apr 30, 2018 at 07:59:16PM +0530, Ajay Singh wrote:
> > Reviewed-by: Ajay Singh 
> > 
> > On Mon, 30 Apr 2018 07:50:40 -0500
> > "Gustavo A. R. Silva"  wrote:
> >   
> > > If i < slot_id is initially true then it will remain true. Also,
> > > as i is being decremented it will end up accessing memory out of
> > > bounds.
> > > 
> > > Fix this by incrementing *i* instead of decrementing it.  
> > 
> > Nice catch!
> > Thanks for submitting the changes.
> >   
> > > 
> > > Addresses-Coverity-ID: 1468454 ("Infinite loop")
> > > Fixes: faa657641081 ("staging: wilc1000: refactor scan() to free
> > > kmalloc memory on failure cases")
> > > Signed-off-by: Gustavo A. R. Silva 
> > > ---
> > > 
> > > BTW... at first sight it seems to me that variables slot_id
> > > and i should be of type unsigned instead of signed.  
> > 
> > Yes, 'slot_id' & 'i' can be changed to unsigned int.
> >   
> 
> A lot of people think making things unsigned improves the code but I
> hate "unsigned int i"...  I understand that in certain cases we do
> loop more than INT_MAX but those are a tiny tiny minority of loops.
> 
> Simple types like "int" are more readable than complicated types
> like "unsigned int".  Unsigned int just draws attention to itself
> and distracts people from things that might potentially matter.  We
> have real life subtle loops like in xtea_encrypt() where we need to
> use unsigned types.
> 
> And look at the function we're talking about here:
> 
> drivers/staging/wilc1000/wilc_wfi_cfgoperations.c
>577  static inline int
>578  wilc_wfi_cfg_alloc_fill_ssid(struct cfg80211_scan_request
> *request, 579   struct hidden_network
> *ntwk) 580  {
>581  int i;
>582  int slot_id = 0;
>583  
>584  ntwk->net_info = kcalloc(request->n_ssids,
>585   sizeof(struct
> hidden_network), GFP_KERNEL); 586  if (!ntwk->net_info)
>587  goto out;
>588  
>589  ntwk->n_ssids = request->n_ssids;
>590  
>591  for (i = 0; i < request->n_ssids; i++) {
> 
> request->n_ssids is an int.  It can't possibly be INT_MAX because
> the kcalloc() will fail.  Ideally the static analysis tool should
> be able to tell you that if you seed it with the knowledge of how
> much memory it's possible to kmalloc().  So it's just laziness that
> makes the static checker complain here; it should see there is no
> issue.  Smatch fails here as well, but I'll see if I can fix it.
> 
> Anyway let's say it could be negative then 0 is greater than
> negative values so the loop would be a no-op.  I've seen code where
> the user could set the loop bounds to s32min-4 but because it was
> "int i" instead of "unsigned int i" then it meant the loop was a
> no-op instead of being a security problem.  In other words,
> unsigned can be less secure.
> 
> I honestly have never seen a bug in the kernel where we intended to
> loop more than INT_MAX times, but there was a signedness bug
> preventing it. That's the only reason I can see to change the
> signedness.  Am I missing something?
> 

Hi Dan,

Thanks for providing the detailed explanation.

I understand your point, but what is your opinion about the variable
'slot_id'? It was added to hold only positive values (i.e. from 0 to
the maximum number of filled elements), so for readability should we
keep it as it is?

BTW, in my opinion 'request->n_ssids' should have been unsigned,
because it is supposed to hold only positive values, and of a
smaller data type (maybe u8 or u16, as that range might be enough to
hold the value of n_ssids). That way we would gain a size reduction,
not just an increase in value range.

Suppose we get a negative value for "request->n_ssids" and the kmalloc
passes; then we would have to add some extra code to free the data and
return an error code in wilc_wfi_cfg_alloc_fill_ssid(). In your opinion,
should this condition be handled, given that the received
'request->n_ssids' is of type 'int'?


Regards,
Ajay




RE: RTL8723BE performance regression

2018-05-01 Thread Pkshih


> -Original Message-
> From: João Paulo Rechi Vita [mailto:jprv...@gmail.com]
> Sent: Wednesday, May 02, 2018 6:41 AM
> To: Larry Finger
> Cc: Steve deRosier; 莊彥宣; Pkshih; Birming Chiu; Shaofu; Steven Ting; 
> Chaoming_Li; Kalle Valo;
> linux-wireless; Network Development; LKML; Daniel Drake; João Paulo Rechi 
> Vita; li...@endlessm.com
> Subject: Re: RTL8723BE performance regression
> 
> On Tue, Apr 3, 2018 at 7:51 PM, Larry Finger  
> wrote:
> > On 04/03/2018 09:37 PM, João Paulo Rechi Vita wrote:
> >>
> >> On Tue, Apr 3, 2018 at 7:28 PM, Larry Finger 
> >> wrote:
> >>
> >> (...)
> >>
> >>> As the antenna selection code changes affected your first bisection, do
> >>> you
> >>> have one of those HP laptops with only one antenna and the incorrect
> >>> coding
> >>> in the FUSE?
> >>
> >>
> >> Yes, that is why I've been passing ant_sel=1 during my tests -- this
> >> was needed to achieve a good performance in the past, before this
> >> regression. I've also opened the laptop chassis and confirmed the
> >> antenna cable is plugged to the connector labeled with "1" on the
> >> card.
> >>
> >>> If so, please make sure that you still have the same signal
> >>> strength for good and bad cases. I have tried to keep the driver and the
> >>> btcoex code in sync, but there may be some combinations of antenna
> >>> configuration and FUSE contents that cause the code to fail.
> >>>
> >>
> >> What is the recommended way to monitor the signal strength?
> >
> >
> > The btcoex code is developed for multiple platforms by a different group
> > than the Linux driver. I think they made a change that caused ant_sel to
> > switch from 1 to 2. At least numerous comments at
> > github.com/lwfinger/rtlwifi_new claimed they needed to make that change.
> >
> > My recommended method is to verify the wifi device name with "iw dev". Then
> > using that device
> >
> > sudo iw dev  scan | egrep "SSID|signal"
> >
> 
> I have confirmed that the performance regression is indeed tied to
> signal strength: in the good cases the signal was between -16 and -8 dBm,
> whereas in the bad cases it was always between -50 and -40 dBm. I've
> also switched to testing bandwidth in a controlled LAN environment using
> iperf3, as suggested by Steve deRosier, with the DUT being the only
> machine connected to the 2.4 GHz radio and the machine running the
> iperf3 server connected via ethernet.
> 

We have new experimental results in commit af8a41cccf8f46 ("rtlwifi: cleanup
8723be ant_sel definition"). You can use the above commit, do the same
experiments (with ant_sel=0, 1 and 2) on your side, and then share your results.
Since performance is tied to signal strength, sharing only the signal
strength is enough.

Regards
PK



Re: [PATCH] hwtracing: intel_th: Change return type to vm_fault_t

2018-05-01 Thread Souptick Joarder
Any comments on this patch?

On Tue, Apr 17, 2018 at 7:45 PM, Souptick Joarder  wrote:
> Use new return type vm_fault_t for fault handler. For
> now, this is just documenting that the function returns
> a VM_FAULT value rather than an errno. Once all instances
> are converted, vm_fault_t will become a distinct type.
>
> Reference id -> 1c8f422059ae ("mm: change return type to
> vm_fault_t")
>
> Signed-off-by: Souptick Joarder 
> ---
>  drivers/hwtracing/intel_th/msu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/intel_th/msu.c 
> b/drivers/hwtracing/intel_th/msu.c
> index dfb57ea..0ac84eb 100644
> --- a/drivers/hwtracing/intel_th/msu.c
> +++ b/drivers/hwtracing/intel_th/msu.c
> @@ -1190,7 +1190,7 @@ static void msc_mmap_close(struct vm_area_struct *vma)
> mutex_unlock(&msc->buf_mutex);
>  }
>
> -static int msc_mmap_fault(struct vm_fault *vmf)
> +static vm_fault_t msc_mmap_fault(struct vm_fault *vmf)
>  {
> struct msc_iter *iter = vmf->vma->vm_file->private_data;
> struct msc *msc = iter->msc;
> --
> 1.9.1
>
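The distinction the commit message describes can be sketched in plain C. This is illustrative only: the typedef and VM_FAULT_* values below mirror the kernel's, but the function is made up for the example:

```c
/* A distinct typedef for fault results, as opposed to a plain int
 * that could also carry a negative errno. */
typedef unsigned int vm_fault_t;

#define VM_FAULT_SIGBUS	0x0002
#define VM_FAULT_NOPAGE	0x0100

/* Returns a VM_FAULT_* code, never a negative errno.  Once
 * vm_fault_t becomes a distinct type, accidentally returning
 * -EINVAL here can be caught by static checkers rather than
 * by review. */
static vm_fault_t toy_fault(int have_page)
{
	return have_page ? VM_FAULT_NOPAGE : VM_FAULT_SIGBUS;
}
```

The patch itself only changes the declared return type; the documentation value comes first, the type-safety once all handlers are converted.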


Re: [greybus-dev] [PATCH v2] staging: greybus: Use gpio_is_valid()

2018-05-01 Thread Viresh Kumar
On 28-04-18, 10:05, Arvind Yadav wrote:
> Replace the manual validity checks for the GPIOs with
> gpio_is_valid().
> 
> Signed-off-by: Arvind Yadav 
> ---
> Change in v2:
>  Return the invalid gpio value as the error instead of -ENODEV.
> 
>  drivers/staging/greybus/arche-platform.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/staging/greybus/arche-platform.c 
> b/drivers/staging/greybus/arche-platform.c
> index 83254a7..c3a7da5 100644
> --- a/drivers/staging/greybus/arche-platform.c
> +++ b/drivers/staging/greybus/arche-platform.c
> @@ -448,7 +448,7 @@ static int arche_platform_probe(struct platform_device 
> *pdev)
>   arche_pdata->svc_reset_gpio = of_get_named_gpio(np,
>   "svc,reset-gpio",
>   0);
> - if (arche_pdata->svc_reset_gpio < 0) {
> + if (!gpio_is_valid(arche_pdata->svc_reset_gpio)) {
>   dev_err(dev, "failed to get reset-gpio\n");
>   return arche_pdata->svc_reset_gpio;
>   }
> @@ -468,7 +468,7 @@ static int arche_platform_probe(struct platform_device 
> *pdev)
>   arche_pdata->svc_sysboot_gpio = of_get_named_gpio(np,
> "svc,sysboot-gpio",
> 0);
> - if (arche_pdata->svc_sysboot_gpio < 0) {
> + if (!gpio_is_valid(arche_pdata->svc_sysboot_gpio)) {
>   dev_err(dev, "failed to get sysboot gpio\n");
>   return arche_pdata->svc_sysboot_gpio;
>   }
> @@ -487,7 +487,7 @@ static int arche_platform_probe(struct platform_device 
> *pdev)
>   arche_pdata->svc_refclk_req = of_get_named_gpio(np,
>   "svc,refclk-req-gpio",
>   0);
> - if (arche_pdata->svc_refclk_req < 0) {
> + if (!gpio_is_valid(arche_pdata->svc_refclk_req)) {
>   dev_err(dev, "failed to get svc clock-req gpio\n");
>   return arche_pdata->svc_refclk_req;
>   }

Acked-by: Viresh Kumar 

-- 
viresh
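The helper being substituted in the patch above reduces to a simple range check. A sketch of its usual semantics follows; the shape mirrors the generic kernel definition, but ARCH_NR_GPIOS is picked arbitrarily here:

```c
#include <stdbool.h>

#define ARCH_NR_GPIOS	512	/* arbitrary bound for this sketch */

/* A GPIO number is valid if it is non-negative and below the
 * platform's GPIO count.  of_get_named_gpio() returns a negative
 * errno on failure, so both the old "< 0" test and this helper
 * reject the error case; gpio_is_valid() additionally rejects
 * out-of-range positive numbers. */
static inline bool gpio_is_valid(int number)
{
	return number >= 0 && number < ARCH_NR_GPIOS;
}
```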


Re: [PATCH 3/6] PCI: Introduce devm_of_pci_get_host_bridge_resources

2018-05-01 Thread Jan Kiszka
On 2018-04-30 20:43, Sinan Kaya wrote:
> On 4/30/2018 2:40 PM, Bjorn Helgaas wrote:
>> On Sat, Apr 28, 2018 at 09:28:47AM +0200, Jan Kiszka wrote:
>>> On 2018-04-28 00:24, Bjorn Helgaas wrote:
 On Tue, Apr 24, 2018 at 05:13:39PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka 
>
> of_pci_get_host_bridge_resources allocates the resource structures it
> fills dynamically, but none of its callers care to release them so far.
> Rather than requiring everyone to do this explicitly, introduce a
> managed version of that service. This differs API-wise only in taking a
> reference to the associated device, rather than to the device tree node.
>
> As of_pci_get_host_bridge_resources is an exported interface, we cannot
> simply drop it at this point. After converting all in-tree users to the
> new API, we could phase out the unmanaged one over some grace period.

 It looks like it might be possible to split this into three or four
 patches:

   1) Factor __of_pci_get_host_bridge_resources() out of
  of_pci_get_host_bridge_resources()

   2) Add struct device * argument

   3) Convert pr_info() to dev_info()

   4) Add devm_of_pci_get_host_bridge_resources()
>>>
>>> Will do. I'm even considering
>>>
>>> 5) mark of_pci_get_host_bridge_resources() __deprecated, due to the leak
>>>and no remaining in-tree user - what do you think?
>>
>> Sounds good.
>>
>> It'd be nice if we had some guideline about deprecation -- whether we
>> actually need to mark things __deprecated, and then how long to wait
>> before actually removing them, but I don't see anything in
>> Documentation/.
> 
> I'm under the impression that we don't quite care about out-of-tree drivers.
> I have seen out-of-tree drivers break many times due to API changes,
> renames, or even changes in a parameter's meaning.
> 
> If the plan is to remove the API, just remove the API today.

I've already sent a __deprecated patch on Monday [1], in v2 of this
series [2]. Please vote against that there, and I will replace it with a
complete removal of that API.

Thanks,
Jan

[1] https://lkml.org/lkml/2018/4/30/22
[2] https://lkml.org/lkml/2018/4/30/24

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
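The managed-resource pattern the commit message relies on can be modeled outside the kernel. A toy sketch with invented names: allocations are recorded against their owner and released in one pass when the owner goes away, so individual error paths need no explicit frees:

```c
#include <stdlib.h>

struct toy_device {
	void *res[16];		/* fixed-size table for the sketch */
	int nres;
};

/* Allocate zeroed memory owned by the device; the caller never
 * frees it directly. */
static void *toy_devm_zalloc(struct toy_device *dev, size_t size)
{
	void *p = calloc(1, size);

	if (!p)
		return NULL;
	if (dev->nres == 16) {
		free(p);
		return NULL;
	}
	dev->res[dev->nres++] = p;
	return p;
}

/* Called once when the device is released: frees everything
 * recorded against it, in reverse order of allocation. */
static void toy_device_release(struct toy_device *dev)
{
	while (dev->nres > 0)
		free(dev->res[--dev->nres]);
}
```

The real devm_* core chains resources off struct device and runs the release pass on driver detach; the point here is only why callers of a devm_ variant can stop caring about cleanup.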


Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()

2018-05-01 Thread Dan Williams
On Tue, May 1, 2018 at 9:14 PM, Linus Torvalds
 wrote:
> On Tue, May 1, 2018 at 9:00 PM Dan Williams 
> wrote:
>> >
> >> > I have some dim memory of "rep movs doesn't work well for pmem", but
> does
>> > it *seriously* need unrolling to cacheline boundaries? And if it does,
> who
>> > designed it, and why is anybody using it?
>> >
>
>> I think this is an FAQ from the original submission, in fact some guy
>> named "Linus Torvalds" asked [1]:
>
> Oh, I already mentioned that I remembered that "rep movs" didn't work well.
>
> But there's a big gap between "just use 'rep movs'" and "do some cacheline
> unrolling".
>
> Why isn't it just doing a simple word-at-a-time loop and letting the CPU do
> the unrolling that it will already do on its own?
>
> I may have gotten that answered too, but there's no comment in the code
> about why it's such a disgusting mess, so I've long since forgotten _why_
> it's such a disgusting mess.
>
> That loop unrolling _used_ to be "hey, it's simple".
>
> Now it's "Hey, that's truly disgusting", with the separate fault handling
> for every single case in the unrolled loop.
>
> Just look at the nasty _ASM_EXTABLE_FAULT() uses and those E_cache_x error
> labels, and getting the number of bytes copied right.
>
> And then ask yourself "what if we didn't unroll that thing 8 times, AND WE
> COULD GET RID OF ALL OF THOSE?"
>
> Maybe you already did ask yourself.  But I'm asking because it sure isn't
> explained in the code.

Ah, sorry. Yeah, I don't see a good reason to keep the unrolling. It
would definitely clean up the fault handling, I'll respin.


Re: [PATCH] cpufreq: s3c2440: fix spelling mistake: "divsiors" -> "divisors"

2018-05-01 Thread Viresh Kumar
On 30-04-18, 15:48, Colin King wrote:
> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in s3c_freq_dbg debug message text.
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/cpufreq/s3c2440-cpufreq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/s3c2440-cpufreq.c 
> b/drivers/cpufreq/s3c2440-cpufreq.c
> index d0d75b65ddd6..d2f67b7a20dd 100644
> --- a/drivers/cpufreq/s3c2440-cpufreq.c
> +++ b/drivers/cpufreq/s3c2440-cpufreq.c
> @@ -143,7 +143,7 @@ static void s3c2440_cpufreq_setdivs(struct 
> s3c_cpufreq_config *cfg)
>  {
>   unsigned long clkdiv, camdiv;
>  
> - s3c_freq_dbg("%s: divsiors: h=%d, p=%d\n", __func__,
> + s3c_freq_dbg("%s: divisors: h=%d, p=%d\n", __func__,
>cfg->divs.h_divisor, cfg->divs.p_divisor);
>  
>   clkdiv = __raw_readl(S3C2410_CLKDIVN);

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH] coresight: Remove %px for printing pcsr value

2018-05-01 Thread Kees Cook
On Tue, May 1, 2018 at 10:00 PM, Leo Yan  wrote:
> The driver prints pcsr twice: the first time it uses the specifier %px to
> print the hexadecimal pcsr value, and the second time it uses %pS to
> output the kernel symbol.
>
> As suggested by Kees, %pS alone should be sufficient and %px isn't
> necessary: if pcsr is a kernel-space address, the %pS format lets us
> easily find the code line; on the other hand, if the pcsr value doesn't
> fall into the kernel-space range (e.g. if the CPU is stuck in firmware),
> %pS also prints the hexadecimal pcsr value.
>
> So this commit removes the useless %px and updates the "Output format"
> section of the document to keep the code and the document aligned.
>
> Suggested-by: Kees Cook 
> Cc: Mathieu Poirier 
> Signed-off-by: Leo Yan 

Thanks!

Reviewed-by: Kees Cook 

-Kees

> ---
>  Documentation/trace/coresight-cpu-debug.txt   | 4 ++--
>  drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/trace/coresight-cpu-debug.txt 
> b/Documentation/trace/coresight-cpu-debug.txt
> index 2b9b51c..89ab09e 100644
> --- a/Documentation/trace/coresight-cpu-debug.txt
> +++ b/Documentation/trace/coresight-cpu-debug.txt
> @@ -177,11 +177,11 @@ Here is an example of the debugging output format:
>  ARM external debug module:
>  coresight-cpu-debug 85.debug: CPU[0]:
>  coresight-cpu-debug 85.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
> -coresight-cpu-debug 85.debug:  EDPCSR:  [] 
> handle_IPI+0x174/0x1d8
> +coresight-cpu-debug 85.debug:  EDPCSR:  handle_IPI+0x174/0x1d8
>  coresight-cpu-debug 85.debug:  EDCIDSR: 
>  coresight-cpu-debug 85.debug:  EDVIDSR: 9000 (State:Non-secure 
> Mode:EL1/0 Width:64bits VMID:0)
>  coresight-cpu-debug 852000.debug: CPU[1]:
>  coresight-cpu-debug 852000.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
> -coresight-cpu-debug 852000.debug:  EDPCSR:  [] 
> debug_notifier_call+0x23c/0x358
> +coresight-cpu-debug 852000.debug:  EDPCSR:  debug_notifier_call+0x23c/0x358
>  coresight-cpu-debug 852000.debug:  EDCIDSR: 
>  coresight-cpu-debug 852000.debug:  EDVIDSR: 9000 (State:Non-secure 
> Mode:EL1/0 Width:64bits VMID:0)
> diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
> b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> index 9cdb3fb..78a054e 100644
> --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
> +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> @@ -315,7 +315,7 @@ static void debug_dump_regs(struct debug_drvdata *drvdata)
> }
>
> pc = debug_adjust_pc(drvdata);
> -   dev_emerg(dev, " EDPCSR:  [<%px>] %pS\n", (void *)pc, (void *)pc);
> +   dev_emerg(dev, " EDPCSR:  %pS\n", (void *)pc);
>
> if (drvdata->edcidsr_present)
> dev_emerg(dev, " EDCIDSR: %08x\n", drvdata->edcidsr);
> --
> 2.7.4
>



-- 
Kees Cook
Pixel Security


Re: [PATCH 1/2] i2c: sprd: Prevent i2c accesses after suspend is called

2018-05-01 Thread Wolfram Sang

> > We should maybe handle this in the core somewhen, though. Or?
> 
> Thanks. Yes, it would be more helpful if we could handle this in the I2C core.

To understand the issue better: which kind of devices in your system
still send I2C transfers when the system is going to suspend? Do you
know?





[PATCH] staging: lustre: o2iblnd: fix race at kiblnd_connect_peer

2018-05-01 Thread Doug Oucharek
From: Doug Oucharek 

The cmid will be destroyed by OFED if kiblnd_cm_callback returns an error.
If the error happens before the end of kiblnd_connect_peer, it will touch
the destroyed cmid and fail with:
(o2iblnd_cb.c:1315:kiblnd_connect_peer())
ASSERTION( cmid->device != ((void *)0) ) failed:

Signed-off-by: Alexander Boyko 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10015
Reviewed-by: Alexey Lyashkov 
Reviewed-by: Doug Oucharek 
Reviewed-by: John L. Hammond 
Signed-off-by: Doug Oucharek 
---
 drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index b4a182d..a76c1f2 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1290,11 +1290,6 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
goto failed2;
}
 
-   LASSERT(cmid->device);
-   CDEBUG(D_NET, "%s: connection bound to %s:%pI4h:%s\n",
-  libcfs_nid2str(peer->ibp_nid), dev->ibd_ifname,
-  &dev->ibd_ifip, cmid->device->name);
-
return;
 
  failed2:
@@ -2996,8 +2991,19 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
} else {
rc = rdma_resolve_route(
cmid, *kiblnd_tunables.kib_timeout * 1000);
-   if (!rc)
+   if (!rc) {
+   struct kib_net *net = peer->ibp_ni->ni_data;
+   struct kib_dev *dev = net->ibn_dev;
+
+   CDEBUG(D_NET, "%s: connection bound to "\
+  "%s:%pI4h:%s\n",
+  libcfs_nid2str(peer->ibp_nid),
+  dev->ibd_ifname,
+  &dev->ibd_ifip, cmid->device->name);
+
return 0;
+   }
+
/* Can't initiate route resolution */
CERROR("Can't resolve route for %s: %d\n",
   libcfs_nid2str(peer->ibp_nid), rc);
-- 
1.8.3.1



Re: [Xen-devel] [PATCH 2/6] xen-netfront: copy response out of shared buffer before accessing it

2018-05-01 Thread Oleksandr Andrushchenko

On 05/01/2018 12:01 AM, Marek Marczykowski-Górecki wrote:

Make a local copy of the response; otherwise the backend might modify it
while the frontend is already processing it, leading to a time-of-check /
time-of-use issue.

This is complementary to XSA155.

Cc: sta...@vger.kernel.org
Signed-off-by: Marek Marczykowski-Górecki 
---
  drivers/net/xen-netfront.c | 51 +++
  1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 4dd0668..dc99763 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -387,13 +387,13 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
rmb(); /* Ensure we see responses up to 'rp'. */
  
  		for (cons = queue->tx.rsp_cons; cons != prod; cons++) {

Side comment: the original concern was raised about the counters above;
will those be addressed in a dedicated series?

-   struct xen_netif_tx_response *txrsp;
+   struct xen_netif_tx_response txrsp;
  
-			txrsp = RING_GET_RESPONSE(&queue->tx, cons);

-   if (txrsp->status == XEN_NETIF_RSP_NULL)
+   RING_COPY_RESPONSE(&queue->tx, cons, &txrsp);
+   if (txrsp.status == XEN_NETIF_RSP_NULL)
continue;
  
IMO, there is still no guarantee that you access consistent data after this
change.

What if part of the response was OK when you started copying and then,
in the middle, the backend poisons the end of the response? This seems
to merely minimize the chances of working with inconsistent data rather
than removing the possibility completely.
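The shape of the change, from RING_GET_RESPONSE returning a pointer into the shared ring to RING_COPY_RESPONSE snapshotting the slot, can be shown generically. Names here are invented; this is a sketch, not the Xen ring macros:

```c
#include <stdbool.h>
#include <string.h>

struct toy_resp {
	int id;
	int status;
};

/* Snapshot the shared slot once, then validate and use only the
 * snapshot: every later check sees the same bytes even if the
 * backend rewrites the slot afterwards.  (As the review above
 * points out, a plain memcpy can still observe a torn slot, so
 * this narrows the race window rather than eliminating it.) */
static bool toy_read_response(const struct toy_resp *shared,
			      struct toy_resp *out)
{
	memcpy(out, shared, sizeof(*out));	/* single read of the slot */
	return out->id >= 0;			/* checks run on the copy */
}
```

The pointer-returning form, by contrast, re-reads the shared slot at every field access, which is the time-of-check/time-of-use gap the patch closes.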

-   id  = txrsp->id;
+   id  = txrsp.id;
skb = queue->tx_skbs[id].skb;
if (unlikely(gnttab_query_foreign_access(
queue->grant_tx_ref[id]) != 0)) {
@@ -741,7 +741,7 @@ static int xennet_get_extras(struct netfront_queue *queue,
 RING_IDX rp)
  
  {

-   struct xen_netif_extra_info *extra;
+   struct xen_netif_extra_info extra;
	struct device *dev = &queue->info->netdev->dev;
RING_IDX cons = queue->rx.rsp_cons;
int err = 0;
@@ -757,24 +757,23 @@ static int xennet_get_extras(struct netfront_queue *queue,
break;
}
  
-		extra = (struct xen_netif_extra_info *)

-   RING_GET_RESPONSE(&queue->rx, ++cons);
+   RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);
  
-		if (unlikely(!extra->type ||

-extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
+   if (unlikely(!extra.type ||
+extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
if (net_ratelimit())
dev_warn(dev, "Invalid extra type: %d\n",
-   extra->type);
+   extra.type);
err = -EINVAL;
} else {
-   memcpy(&extras[extra->type - 1], extra,
-  sizeof(*extra));
+   memcpy(&extras[extra.type - 1], &extra,
+  sizeof(extra));
}
  
  		skb = xennet_get_rx_skb(queue, cons);

ref = xennet_get_rx_ref(queue, cons);
xennet_move_rx_slot(queue, skb, ref);
-   } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
+   } while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);
  
  	queue->rx.rsp_cons = cons;

return err;
@@ -784,28 +783,28 @@ static int xennet_get_responses(struct netfront_queue 
*queue,
struct netfront_rx_info *rinfo, RING_IDX rp,
struct sk_buff_head *list)
  {
-   struct xen_netif_rx_response *rx = &rinfo->rx;
+   struct xen_netif_rx_response rx = rinfo->rx;
struct xen_netif_extra_info *extras = rinfo->extras;
	struct device *dev = &queue->info->netdev->dev;
RING_IDX cons = queue->rx.rsp_cons;
struct sk_buff *skb = xennet_get_rx_skb(queue, cons);
grant_ref_t ref = xennet_get_rx_ref(queue, cons);
-   int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
+   int max = MAX_SKB_FRAGS + (rx.status <= RX_COPY_THRESHOLD);
int slots = 1;
int err = 0;
unsigned long ret;
  
-	if (rx->flags & XEN_NETRXF_extra_info) {

+   if (rx.flags & XEN_NETRXF_extra_info) {
err = xennet_get_extras(queue, extras, rp);
cons = queue->rx.rsp_cons;
}
  
  	for (;;) {

-   if (unlikely(rx->status < 0 ||
-rx->offset + rx->status > XEN_PAGE_SIZE)) {
+   if (unlikely(rx.status < 0 ||
+rx.offset + rx.status > XEN_PAGE_SIZE)) {
if (net_ratelimit())
 

Re: [Xen-devel] [PATCH 2/6] xen-netfront: copy response out of shared buffer before accessing it

2018-05-01 Thread Oleksandr Andrushchenko

On 05/01/2018 12:01 AM, Marek Marczykowski-Górecki wrote:

Make local copy of the response, otherwise backend might modify it while
frontend is already processing it - leading to time of check / time of
use issue.

This is complementary to XSA155.

Cc: sta...@vger.kernel.org
Signed-off-by: Marek Marczykowski-Górecki 
---
  drivers/net/xen-netfront.c | 51 +++
  1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 4dd0668..dc99763 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -387,13 +387,13 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
rmb(); /* Ensure we see responses up to 'rp'. */
  
  		for (cons = queue->tx.rsp_cons; cons != prod; cons++) {

Side comment: the original concern was expressed on the above counters,
will those be addressed as a dedicated series?

-   struct xen_netif_tx_response *txrsp;
+   struct xen_netif_tx_response txrsp;
  
-			txrsp = RING_GET_RESPONSE(>tx, cons);

-   if (txrsp->status == XEN_NETIF_RSP_NULL)
+   RING_COPY_RESPONSE(>tx, cons, );
+   if (txrsp.status == XEN_NETIF_RSP_NULL)
continue;
  
IMO, there is still no guarantee you access consistent data after this 
change.

What if part of the response was ok when you started copying and
then, in the middle, backend poisons the end of the response?
This seems to be just like minimizing(?) chances to work with inconsistent
data rather than removing the possibility of such completely

-   id  = txrsp->id;
+   id  = txrsp.id;
skb = queue->tx_skbs[id].skb;
if (unlikely(gnttab_query_foreign_access(
queue->grant_tx_ref[id]) != 0)) {
@@ -741,7 +741,7 @@ static int xennet_get_extras(struct netfront_queue *queue,
 RING_IDX rp)
  
  {

-   struct xen_netif_extra_info *extra;
+   struct xen_netif_extra_info extra;
struct device *dev = &queue->info->netdev->dev;
RING_IDX cons = queue->rx.rsp_cons;
int err = 0;
@@ -757,24 +757,23 @@ static int xennet_get_extras(struct netfront_queue *queue,
break;
}
  
-		extra = (struct xen_netif_extra_info *)

-   RING_GET_RESPONSE(&queue->rx, ++cons);
+   RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);
  
-		if (unlikely(!extra->type ||

-extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
+   if (unlikely(!extra.type ||
+extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
if (net_ratelimit())
dev_warn(dev, "Invalid extra type: %d\n",
-   extra->type);
+   extra.type);
err = -EINVAL;
} else {
-   memcpy(&extras[extra->type - 1], extra,
-  sizeof(*extra));
+   memcpy(&extras[extra.type - 1], &extra,
+  sizeof(extra));
}
  
  		skb = xennet_get_rx_skb(queue, cons);

ref = xennet_get_rx_ref(queue, cons);
xennet_move_rx_slot(queue, skb, ref);
-   } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
+   } while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);
  
  	queue->rx.rsp_cons = cons;

return err;
@@ -784,28 +783,28 @@ static int xennet_get_responses(struct netfront_queue 
*queue,
struct netfront_rx_info *rinfo, RING_IDX rp,
struct sk_buff_head *list)
  {
-   struct xen_netif_rx_response *rx = &rinfo->rx;
+   struct xen_netif_rx_response rx = rinfo->rx;
struct xen_netif_extra_info *extras = rinfo->extras;
struct device *dev = &queue->info->netdev->dev;
RING_IDX cons = queue->rx.rsp_cons;
struct sk_buff *skb = xennet_get_rx_skb(queue, cons);
grant_ref_t ref = xennet_get_rx_ref(queue, cons);
-   int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
+   int max = MAX_SKB_FRAGS + (rx.status <= RX_COPY_THRESHOLD);
int slots = 1;
int err = 0;
unsigned long ret;
  
-	if (rx->flags & XEN_NETRXF_extra_info) {

+   if (rx.flags & XEN_NETRXF_extra_info) {
err = xennet_get_extras(queue, extras, rp);
cons = queue->rx.rsp_cons;
}
  
  	for (;;) {

-   if (unlikely(rx->status < 0 ||
-rx->offset + rx->status > XEN_PAGE_SIZE)) {
+   if (unlikely(rx.status < 0 ||
+rx.offset + rx.status > XEN_PAGE_SIZE)) {
if (net_ratelimit())
dev_warn(dev, 

Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup

2018-05-01 Thread Kohli, Gaurav



On 5/1/2018 6:49 PM, Peter Zijlstra wrote:

On 5/1/2018 5:01 PM, Peter Zijlstra wrote:



Let me ponder what the best solution is, it's a bit of a mess.


So there's:

  - TASK_PARKED, which we can make a special state; this solves the
problem because then wait_task_inactive() is guaranteed to see
TASK_PARKED and DTRT.


Yes, this should work, as it takes the pi-lock and blocks in kthread_parkme.


  - complete(&self->parked), which we can do inside schedule(); this
solves the problem because then kthread_park() will not return early
and the task really is blocked.


I think complete() will not help, as the problem is like below:

Control Thread                          CPUHP thread

  cpuhp_thread_fun
  Wake control thread
  complete(&st->done);

takedown_cpu
kthread_park
set_bit(KTHREAD_SHOULD_PARK

 Here cpuhp is looping,
//success case
 Generally when issue is not
 coming
 it schedule out by below :
   ht->thread_should_run(td->cpu
  scheduler
//failure case
before schedule
loop check
(kthread_should_park()
 enter here as PARKED set

wake_up_process(k)
__kthread_parkme
 complete(&self->parked);
SETS RUNNING
 schedule   
wait_for_completion(&kthread->parked);


So even if we protect complete(), kthread_park() may still see TASK_RUNNING
as the final state. That's why we took the pi-lock: wake_up_process() either
sees TASK_RUNNING and returns, or it exits early before the parkme call.

Please correct me if I misunderstood it. With the pi-lock approach we are 
not seeing the issue. So your first solution (putting TASK_PARKED in a 
special state) should fix this.


Regards
Gaurav




and I'm fairly sure I thought of a 3rd way to cure things, but now that
I'm writing things down I cannot seem to remember :/ -- could be we muck
with wait_task_inactive().

In any case, I hate all of them, but I think the completion one is the
least horrible because it gives the strongest guarantees and cleans up
most. But promoting TASK_PARKED to special certainly is the easier
patch.

The below boots, but that's about all I did with it. Opinions?

---

--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -62,6 +62,7 @@ void *kthread_probe_data(struct task_str
  int kthread_park(struct task_struct *k);
  void kthread_unpark(struct task_struct *k);
  void kthread_parkme(void);
+void kthread_park_complete(struct task_struct *k);
  
  int kthreadd(void *unused);

  extern struct task_struct *kthreadd_task;
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -55,7 +55,6 @@ enum KTHREAD_BITS {
KTHREAD_IS_PER_CPU = 0,
KTHREAD_SHOULD_STOP,
KTHREAD_SHOULD_PARK,
-   KTHREAD_IS_PARKED,
  };
  
  static inline void set_kthread_struct(void *kthread)

@@ -177,14 +176,12 @@ void *kthread_probe_data(struct task_str
  
  static void __kthread_parkme(struct kthread *self)

  {
-   __set_current_state(TASK_PARKED);
-   while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
-   if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
-   complete(&self->parked);
+   for (;;) {
+   set_current_state(TASK_PARKED);
+   if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
+   break;
schedule();
-   __set_current_state(TASK_PARKED);
}
-   clear_bit(KTHREAD_IS_PARKED, &self->flags);
__set_current_state(TASK_RUNNING);
  }
  
@@ -194,6 +191,11 @@ void kthread_parkme(void)

  }
  EXPORT_SYMBOL_GPL(kthread_parkme);
  
+void kthread_park_complete(struct task_struct *k)

+{
+   complete(&to_kthread(k)->parked);
+}
+
  static int kthread(void *_create)
  {
/* Copy data: it's on kthread's stack */
@@ -450,22 +452,15 @@ void kthread_unpark(struct task_struct *
  {
struct kthread *kthread = to_kthread(k);
  
-	clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);

/*
-* We clear the IS_PARKED bit here as we don't wait
-* until the task has left the park code. So if we'd
-* park before that happens we'd see the IS_PARKED bit
-* which might be about to be cleared.
+* Newly created kthread was parked when the CPU was offline.
+* The binding was lost and we need to set it again.
 */
-   

Re: [v2 PATCH 1/1] tg3: fix meaningless hw_stats reading after tg3_halt memset 0 hw_stats

2018-05-01 Thread Michael Chan
On Tue, May 1, 2018 at 5:42 PM, Zumeng Chen  wrote:

> diff --git a/drivers/net/ethernet/broadcom/tg3.h 
> b/drivers/net/ethernet/broadcom/tg3.h
> index 3b5e98e..c61d83c 100644
> --- a/drivers/net/ethernet/broadcom/tg3.h
> +++ b/drivers/net/ethernet/broadcom/tg3.h
> @@ -3102,6 +3102,7 @@ enum TG3_FLAGS {
> TG3_FLAG_ROBOSWITCH,
> TG3_FLAG_ONE_DMA_AT_ONCE,
> TG3_FLAG_RGMII_MODE,
> +   TG3_FLAG_HALT,

I think you should be able to use the existing INIT_COMPLETE flag and
not have to add a new flag.

>
> /* Add new flags before this comment and TG3_FLAG_NUMBER_OF_FLAGS */
> TG3_FLAG_NUMBER_OF_FLAGS,   /* Last entry in enum TG3_FLAGS */
> --
> 2.9.3
>


Re: [PATCH v9 24/27] dt-bindings: timer: new bindings for TI DaVinci timer

2018-05-01 Thread Sekhar Nori
On Wednesday 02 May 2018 07:22 AM, David Lechner wrote:
> Sekhar,
> 
> On 04/27/2018 09:05 AM, Rob Herring wrote:
>> On Thu, Apr 26, 2018 at 07:17:42PM -0500, David Lechner wrote:
>>> This adds new device tree bindings for the timer IP block of TI
>>> DaVinci-like SoCs.
>>>
>>> Signed-off-by: David Lechner 
>>> ---
>>>
>>> v9 changes:
>>> - new patch in v9
>>>
>>>
>>>   .../bindings/timer/ti,davinci-timer.txt   | 24 +++
>>>   1 file changed, 24 insertions(+)
>>>   create mode 100644
>>> Documentation/devicetree/bindings/timer/ti,davinci-timer.txt
>>>
>>> diff --git
>>> a/Documentation/devicetree/bindings/timer/ti,davinci-timer.txt
>>> b/Documentation/devicetree/bindings/timer/ti,davinci-timer.txt
>>> new file mode 100644
>>> index ..2091eca46981
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/timer/ti,davinci-timer.txt
>>> @@ -0,0 +1,24 @@
>>> +* Device tree bindings for Texas Instruments DaVinci timer
>>> +
>>> +This document provides bindings for the 64-bit timer in the DaVinci
>>> +architecture devices. The timer can be configured as a
>>> general-purpose 64-bit
>>> +timer, dual general-purpose 32-bit timers. When configured as dual
>>> 32-bit
>>> +timers, each half can operate in conjunction (chain mode) or
>>> independently
>>> +(unchained mode) of each other.
>>> +
>>> +It is global timer is a free running up-counter and can generate
>>> interrupt
>>
>> Doesn't make sense, too many 'is'.
>>
>> There's no interrupt property listed.
>>
>>> +when the counter reaches preset counter values.
>>> +
>>> +Required properties:
>>> +
>>> +- compatible : should be "ti,davinci-timer".
>>> +- reg : specifies base physical address and count of the registers.
>>> +- clocks : the clock feeding the timer clock.
>>> +
>>> +Example:
>>> +
>>> +    clocksource: timer@2 {
>>> +    compatible = "ti,davinci-timer";
>>> +    reg = <0x2 0x1000>;
>>> +    clocks = <_auxclk>;
>>> +    };
>>> -- 
>>> 2.17.0
>>>
> 
> What do you think about trying to reuse the keystone timer here instead of
> introducing our own binding? I assume it is basically the same since the
> watchdog timer is shared already between davinci and keystone.

When we move the timer support to drivers/clocksource, we should certainly
look at reusing the keystone timer. But even then, I think having
a separate binding document for DaVinci is not illegal. Since we are not
yet at a stage where we re-use the Keystone driver, I prefer the binding
to be separate as well.

The timer IP on DA830/DA850 is not the same as that available on
Keystone and older DaVinci devices. There are additional compare
registers (starting at 0x60) on DA830/DA850 devices which allow the same
timer to be used for both clocksource and clockevent. We use that
facility on DA830 today because there was a shortage of timers on that
device and majority of timers had to be dedicated for DSP use.

So, I think it's safer to introduce a "ti,da830-timer" compatible for
DA830 and DA850 timers.

On the interrupts question: yes, we don't use the interrupt number from DT
today, but defining an interrupt property that looks correct is fine, I think.
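For illustration, a sketch of what a node could look like with the suggested "ti,da830-timer" compatible plus an interrupts property (the register address, interrupt numbers, and interrupt names here are assumptions for the example, not taken from the TRM or an existing binding):

```dts
timer0: timer@1c20000 {
	compatible = "ti,da830-timer";
	reg = <0x01c20000 0x1000>;
	interrupts = <21>, <22>;
	interrupt-names = "tint12", "tint34";
	clocks = <&pll0_auxclk>;
};
```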

Thanks,
Sekhar


Re: [PATCH v2 2/2] usb: dwc2: fix isoc split in transfer with no data

2018-05-01 Thread Doug Anderson
Hi,

On Tue, May 1, 2018 at 8:04 PM, William Wu  wrote:
> If an isoc split-in transfer has no data (the length of the DATA0
> packet is zero), we can't simply return immediately, because the
> DATA0 can be either the first or the second transaction of the
> isoc split-in sequence. If the zero-length DATA0 packet is in the
> first transaction, we can return immediately. But if it is in the
> second transaction of the sequence, we need to increase
> qtd->isoc_frame_index and give the urb back to the device driver
> if needed; otherwise, the MDATA packet will be lost.
>
> A typical test case: connect the dwc2 controller to a USB HS hub
> (GL852G-12) and plug a USB FS audio device (Plantronics headset)
> into the hub's downstream port. Then use the USB mic to record;
> noise can be heard on playback.
>
> In the case, the isoc split in transaction sequence like this:
>
> - SSPLIT IN transaction
> - CSPLIT IN transaction
>   - MDATA packet (176 bytes)
> - CSPLIT IN transaction
>   - DATA0 packet (0 byte)
>
> This patch uses both the length of DATA0 and qtd->isoc_split_offset
> to check whether the DATA0 is in the second transaction.
>
> Signed-off-by: William Wu 
> ---
> Changes in v2:
> - Modify the commit message
>
>  drivers/usb/dwc2/hcd_intr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
> index 5e2378f..479f628 100644
> --- a/drivers/usb/dwc2/hcd_intr.c
> +++ b/drivers/usb/dwc2/hcd_intr.c
> @@ -930,7 +930,7 @@ static int dwc2_xfercomp_isoc_split_in(struct dwc2_hsotg 
> *hsotg,
> frame_desc = >urb->iso_descs[qtd->isoc_frame_index];
> len = dwc2_get_actual_xfer_length(hsotg, chan, chnum, qtd,
>   DWC2_HC_XFER_COMPLETE, NULL);
> -   if (!len) {
> +   if (!len && !qtd->isoc_split_offset) {
> qtd->complete_split = 0;
> qtd->isoc_split_offset = 0;
> return 0;

I don't think my USB-fu is strong enough to do a real review of this
patch, but one small nitpick is that you can remove
"qtd->isoc_split_offset = 0" in the if test now.  AKA:

if (!len && !qtd->isoc_split_offset) {
  qtd->complete_split = 0;
  return 0;
}

...since you only enter the "if" test now when isoc_split_offset is
already 0...  Hopefully John Youn will have some time to comment on
this patch with more real USB knowledge...


-Doug


[PATCH] coresight: Remove %px for printing pcsr value

2018-05-01 Thread Leo Yan
The driver prints pcsr twice: the first time it uses the specifier %px to
print the hexadecimal pcsr value, and the second time it uses %pS to
print the kernel symbol.

As suggested by Kees, using %pS should be sufficient and %px isn't
necessary: if pcsr holds a kernel space address, we can easily get to the
code line from the %pS format; on the other hand, if the pcsr value
doesn't fall into the kernel space range (e.g. if the CPU is stuck in
firmware), %pS prints the hexadecimal value anyway.

So this commit removes the useless %px and updates the "Output format"
section in the document to keep the code and the documentation aligned.

Suggested-by: Kees Cook 
Cc: Mathieu Poirier 
Signed-off-by: Leo Yan 
---
 Documentation/trace/coresight-cpu-debug.txt   | 4 ++--
 drivers/hwtracing/coresight/coresight-cpu-debug.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/coresight-cpu-debug.txt 
b/Documentation/trace/coresight-cpu-debug.txt
index 2b9b51c..89ab09e 100644
--- a/Documentation/trace/coresight-cpu-debug.txt
+++ b/Documentation/trace/coresight-cpu-debug.txt
@@ -177,11 +177,11 @@ Here is an example of the debugging output format:
 ARM external debug module:
 coresight-cpu-debug 85.debug: CPU[0]:
 coresight-cpu-debug 85.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
-coresight-cpu-debug 85.debug:  EDPCSR:  [] 
handle_IPI+0x174/0x1d8
+coresight-cpu-debug 85.debug:  EDPCSR:  handle_IPI+0x174/0x1d8
 coresight-cpu-debug 85.debug:  EDCIDSR: 
 coresight-cpu-debug 85.debug:  EDVIDSR: 9000 (State:Non-secure 
Mode:EL1/0 Width:64bits VMID:0)
 coresight-cpu-debug 852000.debug: CPU[1]:
 coresight-cpu-debug 852000.debug:  EDPRSR:  0001 (Power:On DLK:Unlock)
-coresight-cpu-debug 852000.debug:  EDPCSR:  [] 
debug_notifier_call+0x23c/0x358
+coresight-cpu-debug 852000.debug:  EDPCSR:  debug_notifier_call+0x23c/0x358
 coresight-cpu-debug 852000.debug:  EDCIDSR: 
 coresight-cpu-debug 852000.debug:  EDVIDSR: 9000 (State:Non-secure 
Mode:EL1/0 Width:64bits VMID:0)
diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c 
b/drivers/hwtracing/coresight/coresight-cpu-debug.c
index 9cdb3fb..78a054e 100644
--- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
+++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
@@ -315,7 +315,7 @@ static void debug_dump_regs(struct debug_drvdata *drvdata)
}
 
pc = debug_adjust_pc(drvdata);
-   dev_emerg(dev, " EDPCSR:  [<%px>] %pS\n", (void *)pc, (void *)pc);
+   dev_emerg(dev, " EDPCSR:  %pS\n", (void *)pc);
 
if (drvdata->edcidsr_present)
dev_emerg(dev, " EDCIDSR: %08x\n", drvdata->edcidsr);
-- 
2.7.4



[PATCH] staging: lustre: o2iblnd: Fix FastReg map/unmap for MLX5

2018-05-01 Thread Doug Oucharek
The FastReg support in ko2iblnd was not unmapping pool items,
causing the items to leak.  In addition, the mapping code
was not growing the pool as we do with FMR.

This patch makes sure we unmap FastReg pool elements when we
are done with them.  It also makes sure the pool grows when
we deplete it.

Signed-off-by: Doug Oucharek 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9472
Reviewed-on: https://review.whamcloud.com/27015
Reviewed-by: Andrew Perepechko 
Reviewed-by: Dmitry Eremin 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: Doug Oucharek 
---
 drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c|  2 +-
 drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 12 
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 959e119..cace9ba 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1702,7 +1702,7 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, 
struct kib_tx *tx,
return 0;
}
spin_unlock(&fps->fps_lock);
-   rc = -EBUSY;
+   rc = -EAGAIN;
}
 
spin_lock(&fps->fps_lock);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index b4a182d..f245481 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -48,7 +48,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct 
kib_tx *tx, int type,
__u64 dstcookie);
 static void kiblnd_queue_tx_locked(struct kib_tx *tx, struct kib_conn *conn);
 static void kiblnd_queue_tx(struct kib_tx *tx, struct kib_conn *conn);
-static void kiblnd_unmap_tx(struct lnet_ni *ni, struct kib_tx *tx);
+static void kiblnd_unmap_tx(struct kib_tx *tx);
 static void kiblnd_check_sends_locked(struct kib_conn *conn);
 
 static void
@@ -66,7 +66,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct 
kib_tx *tx, int type,
LASSERT(!tx->tx_waiting); /* mustn't be awaiting peer 
response */
LASSERT(tx->tx_pool);
 
-   kiblnd_unmap_tx(ni, tx);
+   kiblnd_unmap_tx(tx);
 
/* tx may have up to 2 lnet msgs to finalise */
lntmsg[0] = tx->tx_lntmsg[0]; tx->tx_lntmsg[0] = NULL;
@@ -591,13 +591,9 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct 
kib_tx *tx, int type,
return 0;
 }
 
-static void kiblnd_unmap_tx(struct lnet_ni *ni, struct kib_tx *tx)
+static void kiblnd_unmap_tx(struct kib_tx *tx)
 {
-   struct kib_net *net = ni->ni_data;
-
-   LASSERT(net);
-
-   if (net->ibn_fmr_ps)
+   if (tx->fmr.fmr_pfmr || tx->fmr.fmr_frd)
kiblnd_fmr_pool_unmap(&tx->fmr, tx->tx_status);
 
if (tx->tx_nfrags) {
-- 
1.8.3.1



[PATCH] staging: lustre: o2iblnd: Fix FastReg map/unmap for MLX5

2018-05-01 Thread Doug Oucharek
The FastReg support in ko2iblnd was not unmapping pool items
causing the items to leak.  In addition, the mapping code
is not growing the pool like we do with FMR.

This patch makes sure we are unmapping FastReg pool elements
when we are done with them.  It also makes sure the pool
will grow when we depleat the pool.

Signed-off-by: Doug Oucharek 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9472
Reviewed-on: https://review.whamcloud.com/27015
Reviewed-by: Andrew Perepechko 
Reviewed-by: Dmitry Eremin 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: Doug Oucharek 
---
 drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c|  2 +-
 drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 12 
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 959e119..cace9ba 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1702,7 +1702,7 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, 
struct kib_tx *tx,
return 0;
}
spin_unlock(&fps->fps_lock);
-   rc = -EBUSY;
+   rc = -EAGAIN;
}
 
spin_lock(&fps->fps_lock);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index b4a182d..f245481 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -48,7 +48,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct 
kib_tx *tx, int type,
__u64 dstcookie);
 static void kiblnd_queue_tx_locked(struct kib_tx *tx, struct kib_conn *conn);
 static void kiblnd_queue_tx(struct kib_tx *tx, struct kib_conn *conn);
-static void kiblnd_unmap_tx(struct lnet_ni *ni, struct kib_tx *tx);
+static void kiblnd_unmap_tx(struct kib_tx *tx);
 static void kiblnd_check_sends_locked(struct kib_conn *conn);
 
 static void
@@ -66,7 +66,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct 
kib_tx *tx, int type,
LASSERT(!tx->tx_waiting); /* mustn't be awaiting peer 
response */
LASSERT(tx->tx_pool);
 
-   kiblnd_unmap_tx(ni, tx);
+   kiblnd_unmap_tx(tx);
 
/* tx may have up to 2 lnet msgs to finalise */
lntmsg[0] = tx->tx_lntmsg[0]; tx->tx_lntmsg[0] = NULL;
@@ -591,13 +591,9 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct 
kib_tx *tx, int type,
return 0;
 }
 
-static void kiblnd_unmap_tx(struct lnet_ni *ni, struct kib_tx *tx)
+static void kiblnd_unmap_tx(struct kib_tx *tx)
 {
-   struct kib_net *net = ni->ni_data;
-
-   LASSERT(net);
-
-   if (net->ibn_fmr_ps)
+   if (tx->fmr.fmr_pfmr || tx->fmr.fmr_frd)
kiblnd_fmr_pool_unmap(&tx->fmr, tx->tx_status);
 
if (tx->tx_nfrags) {
-- 
1.8.3.1



Re: [PATCH 9/9] iommu/vt-d: Clean up PASID table management for SVM

2018-05-01 Thread Lu Baolu
Hi,

On 05/01/2018 05:24 PM, Liu, Yi L wrote:
>> From: Lu Baolu [mailto:baolu...@linux.intel.com]
>> Sent: Tuesday, April 17, 2018 11:03 AM
>>
>> The previous per iommu pasid table alloc/free interfaces
>> are no longer used. Clean up the driver by removing it.
> I think this patch mainly cleans up intel_svm_alloc_pasid_tables
> and intel_svm_free_pasid_tables.

Yes.

>  Actually, only the PASID State
> table allocation remains in these two functions.

Together with GB pages and 5-level page table support checks.

>
> Since the PASID Table has been made per-iommu domain, how
> about the PASID State Table? Should it also be per-iommu domain?

Yes. This is in plan. I will do this in another patch series.

>
> Thanks,
> Yi Liu

Best regards,
Lu Baolu

>> Cc: Ashok Raj 
>> Cc: Jacob Pan 
>> Cc: Kevin Tian 
>> Cc: Liu Yi L 
>> Signed-off-by: Lu Baolu 
>> ---
>>  drivers/iommu/intel-iommu.c |  6 +++---
>>  drivers/iommu/intel-svm.c   | 17 ++---
>>  include/linux/intel-iommu.h |  5 ++---
>>  3 files changed, 7 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 5fe7f91..5acb90d 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -1736,7 +1736,7 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>>  if (pasid_enabled(iommu)) {
>>  if (ecap_prs(iommu->ecap))
>>  intel_svm_finish_prq(iommu);
>> -intel_svm_free_pasid_tables(iommu);
>> +intel_svm_exit(iommu);
>>  }
>>  #endif
>>  }
>> @@ -3291,7 +3291,7 @@ static int __init init_dmars(void)
>>  hw_pass_through = 0;
>>  #ifdef CONFIG_INTEL_IOMMU_SVM
>>  if (pasid_enabled(iommu))
>> -intel_svm_alloc_pasid_tables(iommu);
>> +intel_svm_init(iommu);
>>  #endif
>>  }
>>
>> @@ -4268,7 +4268,7 @@ static int intel_iommu_add(struct dmar_drhd_unit 
>> *dmaru)
>>
>>  #ifdef CONFIG_INTEL_IOMMU_SVM
>>  if (pasid_enabled(iommu))
>> -intel_svm_alloc_pasid_tables(iommu);
>> +intel_svm_init(iommu);
>>  #endif
>>
>>  if (dmaru->ignored) {
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 3b14819..38cae65 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -34,7 +34,7 @@
>>
>>  static irqreturn_t prq_event_thread(int irq, void *d);
>>
>> -int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu)
>> +int intel_svm_init(struct intel_iommu *iommu)
>>  {
>>  struct page *pages;
>>  int order;
>> @@ -59,15 +59,6 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu 
>> *iommu)
>>  iommu->pasid_max = 0x2;
>>
>>  order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
>> -pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
>> -if (!pages) {
>> -pr_warn("IOMMU: %s: Failed to allocate PASID table\n",
>> -iommu->name);
>> -return -ENOMEM;
>> -}
>> -iommu->pasid_table = page_address(pages);
>> -pr_info("%s: Allocated order %d PASID table.\n", iommu->name, order);
>> -
>>  if (ecap_dis(iommu->ecap)) {
>>  /* Just making it explicit... */
>>  BUILD_BUG_ON(sizeof(struct pasid_entry) != sizeof(struct
>> pasid_state_entry));
>> @@ -82,14 +73,10 @@ int intel_svm_alloc_pasid_tables(struct intel_iommu
>> *iommu)
>>  return 0;
>>  }
>>
>> -int intel_svm_free_pasid_tables(struct intel_iommu *iommu)
>> +int intel_svm_exit(struct intel_iommu *iommu)
>>  {
>>  int order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
>>
>> -if (iommu->pasid_table) {
>> -free_pages((unsigned long)iommu->pasid_table, order);
>> -iommu->pasid_table = NULL;
>> -}
>>  if (iommu->pasid_state_table) {
>>  free_pages((unsigned long)iommu->pasid_state_table, order);
>>  iommu->pasid_state_table = NULL;
>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>> index 08e5811..44c7613 100644
>> --- a/include/linux/intel-iommu.h
>> +++ b/include/linux/intel-iommu.h
>> @@ -470,7 +470,6 @@ struct intel_iommu {
>>   * devices away to userspace processes (e.g. for DPDK) and don't
>>   * want to trust that userspace will use *only* the PASID it was
>>   * told to. But while it's all driver-arbitrated, we're fine. */
>> -struct pasid_entry *pasid_table;
>>  struct pasid_state_entry *pasid_state_table;
>>  struct page_req_dsc *prq;
>>  unsigned char prq_name[16];/* Name for PRQ interrupt */
>> @@ -539,8 +538,8 @@ void free_pgtable_page(void *vaddr);
>>  struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
>>
>>  #ifdef CONFIG_INTEL_IOMMU_SVM
>> -extern int intel_svm_alloc_pasid_tables(struct intel_iommu *iommu);
>> -extern int intel_svm_free_pasid_tables(struct intel_iommu *iommu);
>> +int intel_svm_init(struct intel_iommu *iommu);

Re: [PATCH] memcg, hugetlb: pages allocated for hugetlb's overcommit will be charged to memcg

2018-05-01 Thread Mike Kravetz
On 05/01/2018 06:19 PM, TSUKADA Koutaro wrote:
> If nr_overcommit_hugepages is assumed to be infinite, pages allocated for
> hugetlb's overcommit from the buddy pool are unlimited even when limited
> by memcg. The point of this patch is that if we allocate a hugetlb page
> from the buddy pool, we should charge it to memcg.
> 
> A straightforward way for user applications to use hugetlb pages is to
> create the pool (nr_hugepages), but root privileges are required. In the
> HPC field, for example, giving root privileges to general users is
> difficult. Creating the pool also means we need to estimate exactly how
> much hugetlb will be used, which feels a bit troublesome.
> 
> In such a case, hugetlb's overcommit feature can let the user use hugetlb
> pages with overcommit alone, without creating any pool. However, as
> mentioned earlier, pages can be allocated limitlessly under overcommit in
> the current implementation. Therefore, by introducing memcg charging, I
> wanted to be able to manage the memory resources used by the user
> application with memcg's limits alone.
> 
> This patch targets RHELSA(kernel-alt-4.11.0-45.6.1.el7a.src.rpm).

It would be very helpful to rebase this patch on a recent mainline kernel.
The code to allocate surplus huge pages has been significantly changed in
recent kernels.

I have no doubt that this is a real issue and we are not correctly charging
surplus to a memcg.  But your patch will be hard to evaluate when based on
an older distro kernel.

>   The patch
> does the following things.
> 
> When allocating a page from buddy at __hugetlb_alloc_buddy_huge_page,
> the HUGETLB_OVERCOMMIT flag is set in that page[1].private. When a page
> with HUGETLB_OVERCOMMIT set is actually used (at hugepage_add_new_anon_rmap
> or huge_add_to_page_cache), it tries to charge it to memcg. If the charge
> succeeds, the HUGETLB_OVERCOMMIT_CHARGED flag is added to that page[1].

What is the reason for not charging pages at allocation/reserve time?  I am
not an expert in memcg accounting, but I would think the pages should be
charged at allocation time.  Otherwise, a task could allocate a large number
of (reserved) pages that are not charged to a memcg.  memcg charges in other
code paths seem to happen at huge page allocation time.

-- 
Mike Kravetz

> 
> The page charged to memcg will finally be uncharged at free_huge_page.
> 
> The modification of memcontrol.c updates statistical information when
> the task moves between cgroups. hpage_nr_pages works correctly for thp,
> but it is not suitable for hugetlb pages that use the aarch64 contiguous
> bit, so it needs to be modified.
> 
> Signed-off-by: TSUKADA Koutaro 
> ---
>  include/linux/hugetlb.h |   45 ++
>  mm/hugetlb.c|   80 +++
>  mm/memcontrol.c |   98 
> ++--
>  3 files changed, 219 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d67675e..67991cb 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -511,6 +511,51 @@ static inline void set_huge_swap_pte_at(struct mm_struct 
> *mm, unsigned long addr
>   set_huge_pte_at(mm, addr, ptep, pte);
>  }
>  #endif
> +
> +#define HUGETLB_OVERCOMMIT   1UL
> +#define HUGETLB_OVERCOMMIT_CHARGED   2UL
> +
> +static inline void add_hugetlb_compound_private(struct page *page,
> + unsigned long val)
> +{
> + page[1].private |= val;
> +}
> +
> +static inline unsigned long get_hugetlb_compound_private(struct page *page)
> +{
> + return page_private(&page[1]);
> +}
> +
> +static inline void add_hugetlb_overcommit(struct page *page)
> +{
> + add_hugetlb_compound_private(page, HUGETLB_OVERCOMMIT);
> +}
> +
> +static inline void del_hugetlb_overcommit(struct page *page)
> +{
> + add_hugetlb_compound_private(page, ~(HUGETLB_OVERCOMMIT));
> +}
> +
> +static inline int is_hugetlb_overcommit(struct page *page)
> +{
> + return (get_hugetlb_compound_private(page) & HUGETLB_OVERCOMMIT);
> +}
> +
> +static inline void add_hugetlb_overcommit_charged(struct page *page)
> +{
> + add_hugetlb_compound_private(page, HUGETLB_OVERCOMMIT_CHARGED);
> +}
> +
> +static inline void del_hugetlb_overcommit_charged(struct page *page)
> +{
> + add_hugetlb_compound_private(page, ~(HUGETLB_OVERCOMMIT_CHARGED));
> +}
> +
> +static inline int is_hugetlb_overcommit_charged(struct page *page)
> +{
> + return (get_hugetlb_compound_private(page) &
> + HUGETLB_OVERCOMMIT_CHARGED);
> +}
>  #else/* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6191fb9..2cd91d9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -24,6 +24,7 

Re: linux-next: manual merge of the bpf-next tree with the bpf tree

2018-05-01 Thread Song Liu


> On May 1, 2018, at 7:09 PM, Stephen Rothwell  wrote:
> 
> Hi all,
> 
> Today's linux-next merge of the bpf-next tree got a conflict in:
> 
>  tools/testing/selftests/bpf/test_progs.c
> 
> between commit:
> 
>  a4e21ff8d9a3 ("bpf: minor fix to selftest test_stacktrace_build_id()")
> 
> from the bpf tree and commit:
> 
>  79b453501310 ("tools/bpf: add a test for bpf_get_stack with tracepoint prog")
> 
> from the bpf-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc tools/testing/selftests/bpf/test_progs.c
> index fac581f1c57f,aa336f0abebc..
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@@ -1137,9 -1193,14 +1193,14 @@@ static void test_stacktrace_build_id(vo
> err, errno))
>   goto disable_pmu;
> 
> + stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
> + if (CHECK(stack_amap_fd < 0, "bpf_find_map stack_amap",
> +   "err %d errno %d\n", err, errno))
> + goto disable_pmu;
> + 
>   assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
>  == 0);
> - assert(system("./urandom_read if=/dev/urandom of=/dev/zero count=4 2> 
> /dev/null") == 0);
> + assert(system("./urandom_read") == 0);
>   /* disable stack trace collection */
>   key = 0;
>   val = 1;
> @@@ -1188,8 -1249,15 +1249,15 @@@
>   previous_key = key;
>   } while (bpf_map_get_next_key(stackmap_fd, &previous_key, &key) == 0);
> 
> - CHECK(build_id_matches < 1, "build id match",
> -   "Didn't find expected build ID from the map\n");
> + if (CHECK(build_id_matches < 1, "build id match",
> -   "Didn't find expected build ID from the map"))
> ++  "Didn't find expected build ID from the map\n"))

^^^  Is there a "+" at the beginning of this line? 

Thanks,
Song

> + goto disable_pmu;
> + 
> + stack_trace_len = PERF_MAX_STACK_DEPTH
> + * sizeof(struct bpf_stack_build_id);
> + err = compare_stack_ips(stackmap_fd, stack_amap_fd, stack_trace_len);
> + CHECK(err, "compare_stack_ips stackmap vs. stack_amap",
> +   "err %d errno %d\n", err, errno);
> 
>  disable_pmu:
>   ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);



Re: [PATCH 8/9] iommu/vt-d: Use per-domain pasid table

2018-05-01 Thread Lu Baolu
Hi,

On 05/01/2018 05:23 PM, Liu, Yi L wrote:
>> From: Lu Baolu [mailto:baolu...@linux.intel.com]
>> Sent: Tuesday, April 17, 2018 11:03 AM
>>
>> This patch replaces current per iommu pasid table with
>> the new added per domain pasid table. Each svm-capable
>> PCI device will have its own pasid table.
> This is not accurate; the pasid table is per-iommu domain. It may be
> more accurate to say "Each svm-capable PCI device will be configured
> with a pasid table that is shared with the other svm-capable devices
> within its iommu domain".

Yes. Should be refined.

>
> Can include my reviewed by after refining the description.
>
> Reviewed-by: Liu, Yi L 

Sure.

>
> Thanks,
> Yi Liu

Best regards,
Lu Baolu

>> Cc: Ashok Raj 
>> Cc: Jacob Pan 
>> Cc: Kevin Tian 
>> Cc: Liu Yi L 
>> Signed-off-by: Lu Baolu 
>> ---
>>  drivers/iommu/intel-iommu.c |  6 +++---
>>  drivers/iommu/intel-svm.c   | 37 +
>>  2 files changed, 28 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index d4f9cea..5fe7f91 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5191,7 +5191,7 @@ int intel_iommu_enable_pasid(struct intel_iommu
>> *iommu, struct intel_svm_dev *sd
>>  if (!(ctx_lo & CONTEXT_PASIDE)) {
>>  if (iommu->pasid_state_table)
>>  context[1].hi = (u64)virt_to_phys(iommu->pasid_state_table);
>> -context[1].lo = (u64)virt_to_phys(iommu->pasid_table) |
>> +context[1].lo = (u64)virt_to_phys(domain->pasid_table) |
>>  intel_iommu_get_pts(domain);
>>
>>  wmb();
>> @@ -5259,8 +5259,8 @@ struct intel_iommu *intel_svm_device_to_iommu(struct
>> device *dev)
>>  return NULL;
>>  }
>>
>> -if (!iommu->pasid_table) {
>> -dev_err(dev, "PASID not enabled on IOMMU; cannot enable
>> SVM\n");
>> +if (!intel_pasid_get_table(dev)) {
>> +dev_err(dev, "No PASID table for device; cannot enable SVM\n");
>>  return NULL;
>>  }
>>
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 3abc94f..3b14819 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -256,6 +256,7 @@ static void intel_flush_pasid_dev(struct intel_svm *svm,
>> struct intel_svm_dev *s
>>  static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>>  {
>>  struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
>> +struct pasid_entry *pasid_table;
>>  struct intel_svm_dev *sdev;
>>
>>  /* This might end up being called from exit_mmap(), *before* the page
>> @@ -270,11 +271,16 @@ static void intel_mm_release(struct mmu_notifier *mn,
>> struct mm_struct *mm)
>>   * page) so that we end up taking a fault that the hardware really
>>   * *has* to handle gracefully without affecting other processes.
>>   */
>> -svm->iommu->pasid_table[svm->pasid].val = 0;
>> -wmb();
>> -
>>  rcu_read_lock();
>>  list_for_each_entry_rcu(sdev, &svm->devs, list) {
>> +pasid_table = intel_pasid_get_table(sdev->dev);
>> +if (!pasid_table)
>> +continue;
>> +
>> +pasid_table[svm->pasid].val = 0;
>> +/* Make sure the entry update is visible before translation. */
>> +wmb();
>> +
>>  intel_flush_pasid_dev(svm, sdev, svm->pasid);
>>  intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
>>  }
>> @@ -295,6 +301,7 @@ static LIST_HEAD(global_svm_list);
>>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct 
>> svm_dev_ops
>> *ops)
>>  {
>>  struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>> +struct pasid_entry *pasid_table;
>>  struct intel_svm_dev *sdev;
>>  struct intel_svm *svm = NULL;
>>  struct mm_struct *mm = NULL;
>> @@ -302,7 +309,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
>> flags, struct svm_dev_
>>  int pasid_max;
>>  int ret;
>>
>> -if (WARN_ON(!iommu || !iommu->pasid_table))
>> +pasid_table = intel_pasid_get_table(dev);
>> +if (WARN_ON(!iommu || !pasid_table))
>>  return -EINVAL;
>>
>>  if (dev_is_pci(dev)) {
>> @@ -380,8 +388,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
>> flags, struct svm_dev_
>>  }
>>  svm->iommu = iommu;
>>
>> -if (pasid_max > iommu->pasid_max)
>> -pasid_max = iommu->pasid_max;
>> +if (pasid_max > intel_pasid_max_id)
>> +pasid_max = intel_pasid_max_id;
>>
>>  /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
>>  ret = intel_pasid_alloc_id(svm,
>> @@ -414,7 +422,7 @@ int intel_svm_bind_mm(struct device *dev, 

Re: [PATCH 8/9] iommu/vt-d: Use per-domain pasid table

2018-05-01 Thread Lu Baolu
Hi,

On 05/01/2018 05:23 PM, Liu, Yi L wrote:
>> From: Lu Baolu [mailto:baolu...@linux.intel.com]
>> Sent: Tuesday, April 17, 2018 11:03 AM
>>
>> This patch replaces current per iommu pasid table with
>> the new added per domain pasid table. Each svm-capable
>> PCI device will have its own pasid table.
> This is not accurate. pasid table is per-iommu domain. May
> more accurate "Each svm-capable PCI device will be configed
> with a pasid table which shares with other svm-capable device
> within its iommu domain"

Yes. Should be refined.

>
> Can include my reviewed by after refining the description.
>
> Reviewed-by: Liu, Yi L 

Sure.

>
> Thanks,
> Yi Liu

Best regards,
Lu Baolu

>> Cc: Ashok Raj 
>> Cc: Jacob Pan 
>> Cc: Kevin Tian 
>> Cc: Liu Yi L 
>> Signed-off-by: Lu Baolu 
>> ---
>>  drivers/iommu/intel-iommu.c |  6 +++---
>>  drivers/iommu/intel-svm.c   | 37 +
>>  2 files changed, 28 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index d4f9cea..5fe7f91 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5191,7 +5191,7 @@ int intel_iommu_enable_pasid(struct intel_iommu
>> *iommu, struct intel_svm_dev *sd
>>  if (!(ctx_lo & CONTEXT_PASIDE)) {
>>  if (iommu->pasid_state_table)
>>  context[1].hi = (u64)virt_to_phys(iommu->pasid_state_table);
>> -context[1].lo = (u64)virt_to_phys(iommu->pasid_table) |
>> +context[1].lo = (u64)virt_to_phys(domain->pasid_table) |
>>  intel_iommu_get_pts(domain);
>>
>>  wmb();
>> @@ -5259,8 +5259,8 @@ struct intel_iommu *intel_svm_device_to_iommu(struct
>> device *dev)
>>  return NULL;
>>  }
>>
>> -if (!iommu->pasid_table) {
>> -dev_err(dev, "PASID not enabled on IOMMU; cannot enable
>> SVM\n");
>> +if (!intel_pasid_get_table(dev)) {
>> +dev_err(dev, "No PASID table for device; cannot enable SVM\n");
>>  return NULL;
>>  }
>>
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 3abc94f..3b14819 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -256,6 +256,7 @@ static void intel_flush_pasid_dev(struct intel_svm *svm,
>> struct intel_svm_dev *s
>>  static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>>  {
>>  struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
>> +struct pasid_entry *pasid_table;
>>  struct intel_svm_dev *sdev;
>>
>>  /* This might end up being called from exit_mmap(), *before* the page
>> @@ -270,11 +271,16 @@ static void intel_mm_release(struct mmu_notifier *mn,
>> struct mm_struct *mm)
>>   * page) so that we end up taking a fault that the hardware really
>>   * *has* to handle gracefully without affecting other processes.
>>   */
>> -svm->iommu->pasid_table[svm->pasid].val = 0;
>> -wmb();
>> -
>>  rcu_read_lock();
>>  list_for_each_entry_rcu(sdev, &svm->devs, list) {
>> +pasid_table = intel_pasid_get_table(sdev->dev);
>> +if (!pasid_table)
>> +continue;
>> +
>> +pasid_table[svm->pasid].val = 0;
>> +/* Make sure the entry update is visible before translation. */
>> +wmb();
>> +
>>  intel_flush_pasid_dev(svm, sdev, svm->pasid);
>>  intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
>>  }
>> @@ -295,6 +301,7 @@ static LIST_HEAD(global_svm_list);
>>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct 
>> svm_dev_ops
>> *ops)
>>  {
>>  struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>> +struct pasid_entry *pasid_table;
>>  struct intel_svm_dev *sdev;
>>  struct intel_svm *svm = NULL;
>>  struct mm_struct *mm = NULL;
>> @@ -302,7 +309,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
>> flags, struct svm_dev_
>>  int pasid_max;
>>  int ret;
>>
>> -if (WARN_ON(!iommu || !iommu->pasid_table))
>> +pasid_table = intel_pasid_get_table(dev);
>> +if (WARN_ON(!iommu || !pasid_table))
>>  return -EINVAL;
>>
>>  if (dev_is_pci(dev)) {
>> @@ -380,8 +388,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
>> flags, struct svm_dev_
>>  }
>>  svm->iommu = iommu;
>>
>> -if (pasid_max > iommu->pasid_max)
>> -pasid_max = iommu->pasid_max;
>> +if (pasid_max > intel_pasid_max_id)
>> +pasid_max = intel_pasid_max_id;
>>
>>  /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
>>  ret = intel_pasid_alloc_id(svm,
>> @@ -414,7 +422,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int
>> flags, struct svm_dev_
>>  if (cpu_feature_enabled(X86_FEATURE_LA57))
>>  pasid_entry_val 

Re: [PATCH v2 1/2] usb: dwc2: alloc dma aligned buffer for isoc split in

2018-05-01 Thread Doug Anderson
Hi,

On Tue, May 1, 2018 at 8:04 PM, William Wu  wrote:
> The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in
> a more supported way") rips out a lot of code to simplify the
> allocation of aligned DMA. However, it also introduces a new
> issue when using isoc split-in transfers.
>
> In my test case, I connect the dwc2 controller to a USB HS
> hub (GL852G-12), and plug a USB FS audio device (Plantronics
> headset) into the downstream port of the hub. When we then use
> the USB mic to record, we can hear noise on playback.
>
> It's because the USB hub uses an MDATA for the first
> transaction and a DATA0 for the second transaction of the isoc
> split-in transaction. A typical isoc split-in transaction
> sequence looks like this:
>
> - SSPLIT IN transaction
> - CSPLIT IN transaction
>   - MDATA packet
> - CSPLIT IN transaction
>   - DATA0 packet
>
> The DMA address of MDATA (urb->dma) is always DWORD-aligned, but
> the DMA address of DATA0 (urb->dma + qtd->isoc_split_offset) may
> not be DWORD-aligned; it depends on qtd->isoc_split_offset (the
> length of MDATA). In my test case, the length of MDATA is usually
> unaligned, which causes DATA0 packet transmission errors.
>
> This patch goes back to the old way of aligned DMA allocation in
> the dwc2 driver to get an aligned DMA buffer for isoc split-in.
>
> Signed-off-by: William Wu 
> ---
> Changes in v2:
> - None
>
>  drivers/usb/dwc2/hcd.c   | 63 
> +---
>  drivers/usb/dwc2/hcd.h   | 10 +++
>  drivers/usb/dwc2/hcd_intr.c  |  8 ++
>  drivers/usb/dwc2/hcd_queue.c |  8 +-
>  4 files changed, 85 insertions(+), 4 deletions(-)

A word of warning that I'm pretty rusty on dwc2 and even when I was
making lots of patches I still considered myself a bit clueless.
...so if something seems wrong, please call me on it...


> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
> index 190f959..8c2b35f 100644
> --- a/drivers/usb/dwc2/hcd.c
> +++ b/drivers/usb/dwc2/hcd.c
> @@ -1562,11 +1562,20 @@ static void dwc2_hc_start_transfer(struct dwc2_hsotg 
> *hsotg,
> }
>
> if (hsotg->params.host_dma) {
> -   dwc2_writel((u32)chan->xfer_dma,
> -   hsotg->regs + HCDMA(chan->hc_num));
> +   dma_addr_t dma_addr;
> +
> +   if (chan->align_buf) {
> +   if (dbg_hc(chan))
> +   dev_vdbg(hsotg->dev, "align_buf\n");
> +   dma_addr = chan->align_buf;
> +   } else {
> +   dma_addr = chan->xfer_dma;
> +   }
> +   dwc2_writel((u32)dma_addr, hsotg->regs + HCDMA(chan->hc_num));
> +
> if (dbg_hc(chan))
> dev_vdbg(hsotg->dev, "Wrote %08lx to HCDMA(%d)\n",
> -(unsigned long)chan->xfer_dma, chan->hc_num);
> +(unsigned long)dma_addr, chan->hc_num);
> }
>
> /* Start the split */
> @@ -2620,6 +2629,33 @@ static void dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
> }
>  }
>
> +static int dwc2_alloc_split_dma_aligned_buf(struct dwc2_hsotg *hsotg,
> +   struct dwc2_qh *qh,
> +   struct dwc2_host_chan *chan)
> +{
> +   if (!qh->dw_align_buf) {
> +   qh->dw_align_buf = kmalloc(chan->max_packet,

So you're potentially doing a bounce buffer atop a bounce buffer now,
right?  That seems pretty non-optimal.  You're also back to doing a
kmalloc at interrupt time which I found was pretty bad for
performance.

Is there really no way you can do your memory allocation in
dwc2_alloc_dma_aligned_buffer() instead of here?  For input packets,
if you could just allocate an extra 3 bytes in the original bounce
buffer you could just re-use the original bounce buffer together with
a memmove?  AKA:

transfersize = 13 + 64;
buf = alloc(16 + 64);

// Do the first transfer, no problems.
dma_into(buf, 13);

// 2nd transfer isn't aligned, so align.
// we allocated a little extra to account for this
dma_into(buf + 16, 64);

// move back where it belongs.
memmove(buf + 13, buf + 16, 64);


To make the above work you'd need to still allocate a bounce buffer
even if the original "urb->transfer_buffer" was already aligned.
Ideally you'd be able to know in dwc2_alloc_dma_aligned_buffer()
that this is one of the special cases where you need a slightly
oversized bounce buffer.

---

If you somehow need to do something for output, you'd do the opposite:
you'd copy the already-transferred data backwards to align.


> +  GFP_ATOMIC | GFP_DMA);
> +   if (!qh->dw_align_buf)
> +   return -ENOMEM;
> +
> +   qh->dw_align_buf_size = chan->max_packet;
> +   }
> +
> +   qh->dw_align_buf_dma = dma_map_single(hsotg->dev, qh->dw_align_buf,

Re: [PATCH v3] dmaengine: axi-dmac: Request IRQ with IRQF_SHARED

2018-05-01 Thread Vinod Koul
On Sat, Apr 28, 2018 at 12:57:54PM -0700, Moritz Fischer wrote:
> Request IRQ with IRQF_SHARED flag to enable setups with multiple
> instances of the core sharing a single IRQ line.
> This works out since the IRQ handler already checks if there is
> an actual IRQ pending and returns IRQ_NONE otherwise.

Applied, thanks

-- 
~Vinod



Re: bug-introducing patches

2018-05-01 Thread Willy Tarreau
On Tue, May 01, 2018 at 10:02:30PM +, Sasha Levin wrote:
> On Tue, May 01, 2018 at 04:54:48PM -0400, Theodore Y. Ts'o wrote:
> >Post -rc3 or -rc4, in my opinion bug fixes should wait until the next
> >merge window before they get merged at all.  (And certainly feature
> >bugs should be Right Out.)  And sure, bug fixes should certainly get
> >more testing.  So I guess my main objection is your making a blanket
> >statement about all fixes, instead of breaking out regression fixes
> >versus bug fixes.  Since in my opinion they are very different animals...
> 
> I understand your point: you want to make fixes available to testers as
> soon as possible. This might make sense, as you've mentioned, in < -rc3.
> 
> So yes, maybe the solution isn't to force -next, but rather add more
> "quiet time" at the end of the cycle? Make special rules for -rc7/8? Or
> even add a "test"/"beta" release at the end of the cycle?

I disagree with the proposals above, and for multiple reasons:
  - leaving a known bug on purpose automatically degrades the quality of
each release. Given that less than 100% of the fixes introduce
regressions, by not merging any of these fixes, we'll end up with
more bugs. That's a very bad idea.

  - this will give a worse image of dot-0 releases, and users will be
even less interested in testing them, preferring to wait for the
first stable version. In this case what's the point of dot-0 if it
is known broken and nobody uses it?

  - letting fixes rot longer on the developer side will send a very bad
signal to developers : "we don't care about your bugs". Companies
relying on contractors will have a harder time including fixes in
the contract, as it will only cover what's needed to get the feature
merged. Again this will put the focus on extremely fast and dirty
development, given that fixes will not be considered during the same
window.

I'd rather do the exact opposite except for those who introduce too many
regressions: set up a delay penalty for developers who create too many
regressions and make this penalty easy to check. I think it will generally
not affect subsystem maintainers, unless they pull and push lots of crap
without checking, of course. But it could prove very useful for those
developing under contract, because companies employing them will want to
ensure that their work will not be delayed due to a penalty. Thus it will
be important for these developers to be more careful.

After all, the person proposing a fix always knows better than anyone
else if this fix was done seriously or not. Developers who do lots of
testing before sending should not be penalized, and should get their
fix merged immediately. Those who just send untested patches should be
trusted much less.

> From what I see, the same number of bugs-per-line-of-code applies for
> commits across all -rc releases, so while it makes sense to get a fix
> in quickly at -rc1 to allow testing to continue, the same must not
> happen during -rc8, but unfortunately it does now.

That's where I strongly disagree, since it would mean releasing with even
more bugs than today.

Willy



Re: [PATCH] Revert "dmaengine: pl330: add DMA_PAUSE feature"

2018-05-01 Thread Vinod Koul
On Sat, Apr 28, 2018 at 05:50:58PM -0400, Frank Mori Hess wrote:
> This reverts commit 88987d2c7534a0269f567fb101e6d71a08f0f01d.
> 
> The pl330.c pause implementation violates the dmaengine requirement
> for no data loss, since it relies on the DMAKILL
> instruction.  However, DMAKILL discards in-flight data from the
> dma controller's fifo.  This is documented in the dma-330 manual
> and I have observed it with hardware doing device-to-memory burst
> transfers.  The discarded data may or may not show up in the
> residue count, depending on timing (resulting in data corruption
> effectively).

I am dropping the original patch from the queue, so no need for the revert patch.

-- 
~Vinod


Re: [PATCH][V2] dmaengine: stm32-mdma: fix spelling mistake: "avalaible" -> "available"

2018-05-01 Thread Vinod Koul
On Sat, Apr 28, 2018 at 11:03:52PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> Trivial fix to a spelling mistake in the dev_err error message text,
> and make "channel" plural.

Applied, thanks

-- 
~Vinod


Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 9:00 PM Dan Williams wrote:
> >
> > I have some dim memory of "rep movs doesn't work well for pmem", but does
> > it *seriously* need unrolling to cacheline boundaries? And if it does, who
> > designed it, and why is anybody using it?
> >

> I think this is an FAQ from the original submission, in fact some guy
> named "Linus Torvalds" asked [1]:

Oh, I already mentioned that  I remembered that "rep movs" didn't work well.

But there's a big gap between "just use 'rep movs'" and "do some
cacheline unrolling".

Why isn't it just doing a simple word-at-a-time loop and letting the CPU do
the unrolling that it will already do on its own?

I may have gotten that answered too, but there's no comment in the code
about why it's such a disgusting mess, so I've long since forgotten _why_
it's such a disgusting mess.

That loop unrolling _used_ to be "hey, it's simple".

Now it's "Hey, that's truly disgusting", with the separate fault handling
for every single case in the unrolled loop.

Just look at the nasty _ASM_EXTABLE_FAULT() uses and those E_cache_x error
labels, and getting the number of bytes copied right.

And then ask yourself "what if we didn't unroll that thing 8 times, AND WE
COULD GET RID OF ALL OF THOSE?"

Maybe you already did ask yourself.  But I'm asking because it sure isn't
explained in the code.

 Linus


[PATCH] staging: lustre: o2iblnd: Enable Multiple OPA Endpoints between Nodes

2018-05-01 Thread Doug Oucharek
OPA driver optimizations are based on the MPI model where it is
expected to have multiple endpoints between two given nodes. To
enable this optimization for Lustre, we need to make it possible,
via an LND-specific tuneable, to create multiple endpoints and to
balance the traffic over them.

Both sides of a connection must have this patch for it to work.
Only the active side of the connection (usually the client)
needs to have the new tuneable set > 1.

Signed-off-by: Doug Oucharek 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8943
Reviewed-on: https://review.whamcloud.com/25168
Reviewed-by: Amir Shehata 
Reviewed-by: Dmitry Eremin 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: Doug Oucharek 
---
 .../lustre/include/uapi/linux/lnet/lnet-dlc.h  |  3 ++-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h| 17 ---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 25 +++---
 .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c  |  9 
 4 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h 
b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
index e45d828..c1619f4 100644
--- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
+++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h
@@ -53,7 +53,8 @@ struct lnet_ioctl_config_o2iblnd_tunables {
__u32 lnd_fmr_pool_size;
__u32 lnd_fmr_flush_trigger;
__u32 lnd_fmr_cache;
-   __u32 pad;
+   __u16 lnd_conns_per_peer;
+   __u16 pad;
 };
 
 struct lnet_ioctl_config_lnd_tunables {
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
index ca6e09d..da8929d 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
@@ -568,6 +568,8 @@ struct kib_peer {
lnet_nid_t   ibp_nid; /* who's on the other end(s) */
struct lnet_ni  *ibp_ni; /* LNet interface */
struct list_head ibp_conns;   /* all active connections */
+   struct kib_conn *ibp_next_conn;  /* next connection to send on for
+   round robin */
struct list_head ibp_tx_queue;/* msgs waiting for a conn */
__u64ibp_incarnation; /* incarnation of peer */
/* when (in jiffies) I was last alive */
@@ -581,7 +583,7 @@ struct kib_peer {
/* current active connection attempts */
unsigned short  ibp_connecting;
/* reconnect this peer later */
-   unsigned short  ibp_reconnecting:1;
+   unsigned char   ibp_reconnecting;
/* counter of how many times we triggered a conn race */
unsigned char   ibp_races;
/* # consecutive reconnection attempts to this peer */
@@ -744,10 +746,19 @@ struct kib_peer {
 static inline struct kib_conn *
 kiblnd_get_conn_locked(struct kib_peer *peer)
 {
+   struct list_head *next;
+
LASSERT(!list_empty(&peer->ibp_conns));
 
-   /* just return the first connection */
-   return list_entry(peer->ibp_conns.next, struct kib_conn, ibc_list);
+   /* Advance to next connection, be sure to skip the head node */
+   if (!peer->ibp_next_conn ||
+   peer->ibp_next_conn->ibc_list.next == &peer->ibp_conns)
+   next = peer->ibp_conns.next;
+   else
+   next = peer->ibp_next_conn->ibc_list.next;
+   peer->ibp_next_conn = list_entry(next, struct kib_conn, ibc_list);
+
+   return peer->ibp_next_conn;
 }
 
 static inline int
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c 
b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index b4a182d..303e4e1 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1250,7 +1250,6 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
 
LASSERT(net);
LASSERT(peer->ibp_connecting > 0);
-   LASSERT(!peer->ibp_reconnecting);
 
cmid = kiblnd_rdma_create_id(kiblnd_cm_callback, peer, RDMA_PS_TCP,
 IB_QPT_RC);
@@ -1332,7 +1331,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
 
LASSERT(!peer->ibp_accepting && !peer->ibp_connecting &&
list_empty(&peer->ibp_conns));
-   peer->ibp_reconnecting = 0;
+   peer->ibp_reconnecting--;
 
if (!kiblnd_peer_active(peer)) {
list_splice_init(>ibp_tx_queue, );
@@ -1365,6 +1364,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
rwlock_t *g_lock = &kiblnd_data.kib_global_lock;
unsigned long flags;
int rc;
+   int i;
+   struct lnet_ioctl_config_o2iblnd_tunables *tunables;
 
/*
 * If I get here, I've committed to send, so I complete the tx with
@@ -1461,7 +1462,8 @@ static int 

[PATCH v3] soc: mediatek: add a fixed wait for SRAM stable

2018-05-01 Thread sean.wang
From: Sean Wang 

MT7622_POWER_DOMAIN_WB doesn't send an ACK when its managed SRAM becomes
stable, unlike the other power domains. Therefore, such a power domain
needs a fixed, well-defined wait until its managed SRAM can be accessed
by all the functions running on top of it.

v1 -> v2:
 - use MTK_SCPD_FWAIT_SRAM flag as an indication requiring force waiting.

v2 -> v3:
 - change the order of the condition judgment: check for the presence of
   the MTK_SCPD_FWAIT_SRAM flag first, to avoid any oversight in the
   negative condition.

Signed-off-by: Sean Wang 
Cc: Matthias Brugger 
Cc: Ulf Hansson 
Cc: Weiyi Lu 
---
 drivers/soc/mediatek/mtk-scpsys.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/soc/mediatek/mtk-scpsys.c 
b/drivers/soc/mediatek/mtk-scpsys.c
index b1b45e4..2080c8f 100644
--- a/drivers/soc/mediatek/mtk-scpsys.c
+++ b/drivers/soc/mediatek/mtk-scpsys.c
@@ -32,6 +32,7 @@
 #define MTK_POLL_TIMEOUT   (jiffies_to_usecs(HZ))
 
 #define MTK_SCPD_ACTIVE_WAKEUP BIT(0)
+#define MTK_SCPD_FWAIT_SRAM BIT(1)
 #define MTK_SCPD_CAPS(_scpd, _x)   ((_scpd)->data->caps & (_x))
 
 #define SPM_VDE_PWR_CON0x0210
@@ -237,11 +238,21 @@ static int scpsys_power_on(struct generic_pm_domain 
*genpd)
val &= ~scpd->data->sram_pdn_bits;
writel(val, ctl_addr);
 
-   /* wait until SRAM_PDN_ACK all 0 */
-   ret = readl_poll_timeout(ctl_addr, tmp, (tmp & pdn_ack) == 0,
-MTK_POLL_DELAY_US, MTK_POLL_TIMEOUT);
-   if (ret < 0)
-   goto err_pwr_ack;
+   /* Either wait until SRAM_PDN_ACK all 0 or have a force wait */
+   if (MTK_SCPD_CAPS(scpd, MTK_SCPD_FWAIT_SRAM)) {
+   /*
+* Currently, MTK_SCPD_FWAIT_SRAM is necessary only for
+* MT7622_POWER_DOMAIN_WB and thus just a trivial setup is
+* applied here.
+*/
+   usleep_range(12000, 12100);
+
+   } else {
+   ret = readl_poll_timeout(ctl_addr, tmp, (tmp & pdn_ack) == 0,
+MTK_POLL_DELAY_US, MTK_POLL_TIMEOUT);
+   if (ret < 0)
+   goto err_pwr_ack;
+   }
 
if (scpd->data->bus_prot_mask) {
ret = mtk_infracfg_clear_bus_protection(scp->infracfg,
@@ -785,7 +796,7 @@ static const struct scp_domain_data 
scp_domain_data_mt7622[] = {
.sram_pdn_ack_bits = 0,
.clk_id = {CLK_NONE},
.bus_prot_mask = MT7622_TOP_AXI_PROT_EN_WB,
-   .caps = MTK_SCPD_ACTIVE_WAKEUP,
+   .caps = MTK_SCPD_ACTIVE_WAKEUP | MTK_SCPD_FWAIT_SRAM,
},
 };
 
-- 
2.7.4




Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()

2018-05-01 Thread Dan Williams
On Tue, May 1, 2018 at 8:33 PM, Linus Torvalds
 wrote:
> On Tue, May 1, 2018 at 8:22 PM Dan Williams 
> wrote:
>
>> All that to say that having a typical RAM page covering poisoned pmem
>> would complicate the 'clear badblocks' implementation.
>
> Ugh, ok.
>
> I guess the good news is that your patches aren't so big, and don't really
> affect anything else.
>
> But can we at least take this to be the impetus for just getting rid of
> that disgusting unrolled memcpy? About half of the lines in the patch set
> comes from that thing.
>
> Is anybody seriously going to use pmem with some in-order chip that can't
> even get something as simple as a memory copy loop right? "git blame"
> fingers Tony Luck, I think he may have been influenced by the fumes from
> Itanium.
>
> I  have some dim memory of "rep movs doesn't work well for pmem", but does
> it *seriously* need unrolling to cacheline boundaries? And if it does, who
> designed it, and why is anybody using it?
>

I think this is an FAQ from the original submission, in fact some guy
named "Linus Torvalds" asked [1]:

---

>  - why does this use the complex - and slower, on modern machines -
> unrolled manual memory copy, when you might as well just use a single
>
>  rep ; movsb
>
> which not only makes it smaller, but makes the exception fixup trivial.

Because current generation cpus don't give a recoverable machine
check if we consume with a "rep ; movsb" :-(
When we have that we can pick the best copy function based
on the capabilities of the cpu we are running on.

---

[1]: https://lkml.org/lkml/2016/2/18/608




Re: [PATCH v2 21/27] coresight: Convert driver messages to dev_dbg

2018-05-01 Thread Kim Phillips
On Tue, 1 May 2018 10:10:51 +0100
Suzuki K Poulose  wrote:

> Convert component enable/disable messages from dev_info to dev_dbg.
> This is required to prevent LOCKDEP splats when operating in perf
> mode where we could be called with locks held to enable a coresight

Can we see the splats?  Doesn't lockdep turn itself off if it starts
triggering too many splats?

> path. If someone wants to really see the messages, they can always
> enable it at runtime via dynamic_debug.

Won't the splats still occur when the messages are enabled with
dynamic_debug?

So in effect this patch only tries to mitigate the splats, all the
while making things harder for regular users that now have to recompile
their kernels, in exchange for a very small convenience for kernel
developers that happen to see a splat or two with DEBUG_LOCKDEP set?

Not the greatest choice... How about moving the dev_infos outside of the
locks instead?

Thanks,

Kim




Re: [RFC PATCH for 4.18 00/14] Restartable Sequences

2018-05-01 Thread Daniel Colascione
Hi Mathieu: this work looks very cool. See inline.

On Mon, Apr 30, 2018 at 3:48 PM Mathieu Desnoyers <
mathieu.desnoy...@efficios.com> wrote:

> Hi,

> Here is an updated RFC round of the Restartable Sequences patchset
> based on kernel 4.17-rc3. Based on feedback from Linus, I'm introducing
> only the rseq system call, keeping the rest for later.

> This already enables speeding up the Facebook jemalloc and arm64 PMC
> read from user-space use-cases, as well as speedup of use-cases relying
> on getting the current cpu number from user-space. We'll have to wait
> until a more complete solution is introduced before the LTTng-UST
> tracer can replace its ring buffer atomic instructions with rseq
> though. But let's proceed one step at a time.

I like the general theme of the kernel using its "superpowers" (in this
case, knowledge of preemption) to help userspace do a better job without
userspace code needing to enter the kernel to benefit. The per-CPU data
structures this patch enables help in a lot of use cases, but I think
there's another use case that you might not have considered, one that can
benefit from an extension to your proposed API.

Consider mutexes: in the kernel, for mutual exclusion, we can use a
spinlock, which in the kernel ends up being simpler and (in a lot of
scenarios) more efficient than a mutex: a core that takes a spinlock
promises to keep the lock held for only a very short time, and it disables
interrupts to delay asynchronous work that might unexpectedly lengthen the
critical section. A different core that wants to grab that spinlock can
just spin on the lock word, confident that its spin will be short because
any core owning the lock is guaranteed to release it very quickly. (Long
spins would be very bad for power.) The overall result is a lock that's
much lighter than a mutex. (A spinlock can also be used in places we can't
sleep, but this ability isn't relevant to the discussion below.)

Userspace doesn't have a good equivalent to a lightweight spinlock. While
you can build a spinlock in userspace, the result ends up having serious
problems because of preemption: first, a thread owning such a lock can be
preempted in its critical section, lengthening the critical section
arbitrarily. Second, a thread spinning on a lock will keep spinning even
when the owning thread isn't scheduled anywhere.

Userspace can just implement a mutex as a try-acquire and a FUTEX_WAIT on
failure. This approach works fine when there's no contention, but a system
call is a pretty heavy operation. Why pay for a system call on occasional
light contention with a short critical section? Can we do better?

The usual approach to "better" is an "adaptive mutex". Such a thing, when
it attempts to acquire a lock another thread owns, spins for some number of
iterations, then falls back to futex. I guess that's a little better than
immediately jumping to futex, but it's not optimal. We can still spin when
the lock owner isn't scheduled, and the spin count is usually some guess
(either specified manually or estimated statistically) that's not
guaranteed to produce decent results. Even if we do pick a good spin count,
we run a very good chance of under- or over-spinning on any given
lock-acquire. We always want to sleep when spinning would be pointless.

One important case of the spin-while-not-scheduled problem is operation on
a uniprocessor system: on such a system, only a single task can be
scheduled at a time, making all spins maximally pointless. The usual
approach to avoiding wasted spins for adaptive mutexes is for the adaptive
mutex library to ask upon initialization "How many cores are in this
system?", and if the answer comes back as "1", disable spinning. This
approach is inadequate: CPU affinity can change at arbitrary times, and CPU
affinity can produce the same spin pessimization that a uniprocessor system
does.

I think a small enhancement to rseq would let us build a perfect userspace
mutex, one that spins on lock-acquire only when the lock owner is running
and that sleeps otherwise, freeing userspace from both specifying ad-hoc
spin counts and from trying to detect situations in which spinning is
generally pointless.

It'd work like this: in the per-thread rseq data structure, we'd include a
description of a futex operation for the kernel to perform (in the
context of the preempted thread) upon preemption, immediately before
schedule(). If the futex operation itself sleeps, that's no problem: we
will have still accomplished our goal of running some other thread instead
of the preempted thread.

Suppose we make a userspace mutex implemented with a lock word having three
bits: acquired, sleep_mode, and wait_pending, with the rest of the word not
being relevant at the moment.

We'd implement lock-acquire the usual way, CASing the acquired bit into the
lock and deeming the lock taken if we're successful. Except that before
attempting the CAS, we'd configure the current thread's rseq area to


Re: [PATCH 2/2] arm64: dts: mt7622: add audio related device nodes

2018-05-01 Thread Sean Wang
Hi, Matthias

On Wed, 2018-05-02 at 11:41 +0800, sean.w...@mediatek.com wrote:
> From: Ryder Lee 
> 
> Add audio device nodes and its proper setup for all used pins
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sean Wang 
> ---
>  arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts | 11 +++-
>  arch/arm64/boot/dts/mediatek/mt7622.dtsi | 89 
> 
>  2 files changed, 98 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts 
> b/arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts
> index 45d8655..b783764 100644
> --- a/arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts
> +++ b/arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts
> @@ -18,7 +18,7 @@
>   compatible = "mediatek,mt7622-rfb1", "mediatek,mt7622";
>  
>   chosen {
> - bootargs = "console=ttyS0,115200n1";
> + bootargs = "console=ttyS0,115200n1 swiotlb=512";
>   };
>  
>   cpus {
> @@ -163,10 +163,17 @@
>   i2s1_pins: i2s1-pins {
>   mux {
>   function = "i2s";
> - groups =  "i2s_out_bclk_ws_mclk",
> + groups =  "i2s_out_mclk_bclk_ws",

The old name ("i2s_out_bclk_ws_mclk") was a tentative value from when I
initially worked on the pinctrl driver.

It has to be corrected to "i2s_out_mclk_bclk_ws" to comply with the
final dt-bindings.

> "i2s1_in_data",
> "i2s1_out_data";
>   };
> +
> + conf {
> + pins = "I2S1_IN", "I2S1_OUT", "I2S_BCLK",
> +"I2S_WS", "I2S_MCLK";
> + drive-strength = <12>;
> + bias-pull-down;
> + };
>   };
>  
>   irrx_pins: irrx-pins {
> diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi 
> b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> index 6bbabb6..9213c96 100644
> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> @@ -527,6 +527,95 @@
>   status = "disabled";
>   };
>  
> + audsys: clock-controller@1122 {
> + compatible = "mediatek,mt7622-audsys", "syscon";
> + reg = <0 0x1122 0 0x2000>;
> + #clock-cells = <1>;

The binding went through broonie/sound.git around a week ago.

> +
> + afe: audio-controller {
> + compatible = "mediatek,mt7622-audio";
> + interrupts =  ,
> +   ;
> + interrupt-names = "afe", "asys";
> +
> + clocks = < CLK_INFRA_AUDIO_PD>,
> +  < CLK_TOP_AUD1_SEL>,
> +  < CLK_TOP_AUD2_SEL>,
> +  < CLK_TOP_A1SYS_HP_DIV_PD>,
> +  < CLK_TOP_A2SYS_HP_DIV_PD>,
> +  < CLK_TOP_I2S0_MCK_SEL>,
> +  < CLK_TOP_I2S1_MCK_SEL>,
> +  < CLK_TOP_I2S2_MCK_SEL>,
> +  < CLK_TOP_I2S3_MCK_SEL>,
> +  < CLK_TOP_I2S0_MCK_DIV>,
> +  < CLK_TOP_I2S1_MCK_DIV>,
> +  < CLK_TOP_I2S2_MCK_DIV>,
> +  < CLK_TOP_I2S3_MCK_DIV>,
> +  < CLK_TOP_I2S0_MCK_DIV_PD>,
> +  < CLK_TOP_I2S1_MCK_DIV_PD>,
> +  < CLK_TOP_I2S2_MCK_DIV_PD>,
> +  < CLK_TOP_I2S3_MCK_DIV_PD>,
> +  < CLK_AUDIO_I2SO1>,
> +  < CLK_AUDIO_I2SO2>,
> +  < CLK_AUDIO_I2SO3>,
> +  < CLK_AUDIO_I2SO4>,
> +  < CLK_AUDIO_I2SIN1>,
> +  < CLK_AUDIO_I2SIN2>,
> +  < CLK_AUDIO_I2SIN3>,
> +  < CLK_AUDIO_I2SIN4>,
> +  < CLK_AUDIO_ASRCO1>,
> +  < CLK_AUDIO_ASRCO2>,
> +  < CLK_AUDIO_ASRCO3>,
> +  < CLK_AUDIO_ASRCO4>,
> +  < CLK_AUDIO_AFE>,
> +  < CLK_AUDIO_AFE_CONN>,
> +  < CLK_AUDIO_A1SYS>,
> +  < CLK_AUDIO_A2SYS>;
> +
> + clock-names = "infra_sys_audio_clk",
> +   "top_audio_mux1_sel",
> +   "top_audio_mux2_sel",
> +   "top_audio_a1sys_hp",
> +   "top_audio_a2sys_hp",
> +   "i2s0_src_sel",
> +   "i2s1_src_sel",
> +   "i2s2_src_sel",
> +  

Re: [RFC v3 0/5] virtio: support packed ring

2018-05-01 Thread Jason Wang



On April 25, 2018 13:15, Tiwei Bie wrote:

Hello everyone,

This RFC implements packed ring support in virtio driver.

Some simple functional tests have been done with Jason's
packed ring implementation in vhost:

https://lkml.org/lkml/2018/4/23/12

Both of ping and netperf worked as expected (with EVENT_IDX
disabled). But there are below known issues:

1. Reloading the guest driver will break the Tx/Rx;


It looks like the reason is that we don't sync wrap counter information 
between host and qemu through VHOST_SET/GET_VRING_BASE. Both vhost 
and qemu need to do this by encoding wrap counters into the higher bits 
of vhost_vring_state.num.


Thanks




[PATCH 2/2] arm64: dts: mt7622: add audio related device nodes

2018-05-01 Thread sean.wang
From: Ryder Lee 

Add audio device nodes and the proper setup for all used pins

Signed-off-by: Ryder Lee 
Signed-off-by: Sean Wang 
---
 arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts | 11 +++-
 arch/arm64/boot/dts/mediatek/mt7622.dtsi | 89 
 2 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts 
b/arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts
index 45d8655..b783764 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts
+++ b/arch/arm64/boot/dts/mediatek/mt7622-rfb1.dts
@@ -18,7 +18,7 @@
compatible = "mediatek,mt7622-rfb1", "mediatek,mt7622";
 
chosen {
-   bootargs = "console=ttyS0,115200n1";
+   bootargs = "console=ttyS0,115200n1 swiotlb=512";
};
 
cpus {
@@ -163,10 +163,17 @@
i2s1_pins: i2s1-pins {
mux {
function = "i2s";
-   groups =  "i2s_out_bclk_ws_mclk",
+   groups =  "i2s_out_mclk_bclk_ws",
  "i2s1_in_data",
  "i2s1_out_data";
};
+
+   conf {
+   pins = "I2S1_IN", "I2S1_OUT", "I2S_BCLK",
+  "I2S_WS", "I2S_MCLK";
+   drive-strength = <12>;
+   bias-pull-down;
+   };
};
 
irrx_pins: irrx-pins {
diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi 
b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
index 6bbabb6..9213c96 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
@@ -527,6 +527,95 @@
status = "disabled";
};
 
+   audsys: clock-controller@1122 {
+   compatible = "mediatek,mt7622-audsys", "syscon";
+   reg = <0 0x1122 0 0x2000>;
+   #clock-cells = <1>;
+
+   afe: audio-controller {
+   compatible = "mediatek,mt7622-audio";
+   interrupts =  ,
+ ;
+   interrupt-names = "afe", "asys";
+
+   clocks = < CLK_INFRA_AUDIO_PD>,
+< CLK_TOP_AUD1_SEL>,
+< CLK_TOP_AUD2_SEL>,
+< CLK_TOP_A1SYS_HP_DIV_PD>,
+< CLK_TOP_A2SYS_HP_DIV_PD>,
+< CLK_TOP_I2S0_MCK_SEL>,
+< CLK_TOP_I2S1_MCK_SEL>,
+< CLK_TOP_I2S2_MCK_SEL>,
+< CLK_TOP_I2S3_MCK_SEL>,
+< CLK_TOP_I2S0_MCK_DIV>,
+< CLK_TOP_I2S1_MCK_DIV>,
+< CLK_TOP_I2S2_MCK_DIV>,
+< CLK_TOP_I2S3_MCK_DIV>,
+< CLK_TOP_I2S0_MCK_DIV_PD>,
+< CLK_TOP_I2S1_MCK_DIV_PD>,
+< CLK_TOP_I2S2_MCK_DIV_PD>,
+< CLK_TOP_I2S3_MCK_DIV_PD>,
+< CLK_AUDIO_I2SO1>,
+< CLK_AUDIO_I2SO2>,
+< CLK_AUDIO_I2SO3>,
+< CLK_AUDIO_I2SO4>,
+< CLK_AUDIO_I2SIN1>,
+< CLK_AUDIO_I2SIN2>,
+< CLK_AUDIO_I2SIN3>,
+< CLK_AUDIO_I2SIN4>,
+< CLK_AUDIO_ASRCO1>,
+< CLK_AUDIO_ASRCO2>,
+< CLK_AUDIO_ASRCO3>,
+< CLK_AUDIO_ASRCO4>,
+< CLK_AUDIO_AFE>,
+< CLK_AUDIO_AFE_CONN>,
+< CLK_AUDIO_A1SYS>,
+< CLK_AUDIO_A2SYS>;
+
+   clock-names = "infra_sys_audio_clk",
+ "top_audio_mux1_sel",
+ "top_audio_mux2_sel",
+ "top_audio_a1sys_hp",
+ "top_audio_a2sys_hp",
+ "i2s0_src_sel",
+ "i2s1_src_sel",
+ "i2s2_src_sel",
+ "i2s3_src_sel",
+ "i2s0_src_div",
+ "i2s1_src_div",
+ "i2s2_src_div",
+ "i2s3_src_div",
+ "i2s0_mclk_en",
+ "i2s1_mclk_en",
+ "i2s2_mclk_en",
+ "i2s3_mclk_en",
+   

[PATCH 1/2] arm64: dts: mt7622: add High-Speed DMA device nodes

2018-05-01 Thread sean.wang
From: Sean Wang 

Add High-Speed DMA (HSDMA) device nodes

Signed-off-by: Sean Wang 
---
 arch/arm64/boot/dts/mediatek/mt7622.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi 
b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
index e9d5130..6bbabb6 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
@@ -735,6 +735,16 @@
#reset-cells = <1>;
};
 
+   hsdma: dma-controller@1b007000 {
+   compatible = "mediatek,mt7622-hsdma";
+   reg = <0 0x1b007000 0 0x1000>;
+   interrupts = ;
+   clocks = < CLK_ETH_HSDMA_EN>;
+   clock-names = "hsdma";
+   power-domains = < MT7622_POWER_DOMAIN_ETHSYS>;
+   #dma-cells = <1>;
+   };
+
eth: ethernet@1b10 {
compatible = "mediatek,mt7622-eth",
 "mediatek,mt2701-eth",
-- 
2.7.4
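
Since the node declares `#dma-cells = <1>`, a client references it with a
phandle plus one channel cell. A hypothetical consumer node (the device,
compatible string, and channel number below are illustrative, not from
this patch set) would look roughly like:

```dts
demo_dev: demo@1b008000 {
	compatible = "vendor,demo";
	reg = <0 0x1b008000 0 0x1000>;
	dmas = <&hsdma 0>;
	dma-names = "tx-rx";
};
```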





Re: [PATCH] scsi: megaraid_sas: fix spelling mistake: "disbale" -> "disable"

2018-05-01 Thread Martin K. Petersen

Colin,

> Trivial fix to spelling mistake in module parameter description text

Mangled patch. Applied by hand.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: mpt3sas: fix spelling mistake: "disbale" -> "disable"

2018-05-01 Thread Martin K. Petersen

Colin,

> Trivial fix to spelling mistake in module parameter description text

Ditto. Also hand-applied to 4.18/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering



Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 8:22 PM Dan Williams 
wrote:

> All that to say that having a typical RAM page covering poisoned pmem
> would complicate the 'clear badblocks' implementation.

Ugh, ok.

I guess the good news is that your patches aren't so big, and don't really
affect anything else.

But can we at least take this to be the impetus for just getting rid of
that disgusting unrolled memcpy? About half of the lines in the patch set
come from that thing.

Is anybody seriously going to use pmem with some in-order chip that can't
even get something as simple as a memory copy loop right? "git blame"
fingers Tony Luck, I think he may have been influenced by the fumes from
Itanium.

I have some dim memory of "rep movs doesn't work well for pmem", but does
it *seriously* need unrolling to cacheline boundaries? And if it does, who
designed it, and why is anybody using it?

  Linus



Re: [PATCH] scsi: esas2r: fix spelling mistake: "asynchromous" -> "asynchronous"

2018-05-01 Thread Martin K. Petersen

Colin,

> Trivial fix to spelling mistake in module description text

Applied to 4.18/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering



Re: [PATCH] scsi: isci: remove redundant check on in_connection_align_insertion_frequency

2018-05-01 Thread Martin K. Petersen

Colin,

> The sanity check on u->in_connection_align_insertion_frequency is
> being performed twice and hence the first check can be removed since
> it is redundant. Cleans up cppcheck warning:
>
> drivers/scsi/ibmvscsi/ibmvscsi.c:1711: (warning) Identical inner 'if'
> condition is always true.

Applied to 4.18/scsi-queue. Thanks.

-- 
Martin K. Petersen  Oracle Linux Engineering



Re: [PATCH 1/2] i2c: sprd: Prevent i2c accesses after suspend is called

2018-05-01 Thread Baolin Wang
Hi Wolfram,

On 27 April 2018 at 20:14, Wolfram Sang  wrote:
> On Mon, Apr 09, 2018 at 02:40:54PM +0800, Baolin Wang wrote:
>> Add one flag to indicate if the i2c controller has been in suspend state,
>> which can prevent i2c accesses after i2c controller is suspended following
>> system suspend.
>>
>> Signed-off-by: Baolin Wang 
>
> Applied to for-current, thanks!
>
> We should maybe handle this in the core somewhen, though. Or?

Thanks. Yes, it would be more helpful if we could handle this in the i2c core.

-- 
Baolin.wang
Best Regards



Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()

2018-05-01 Thread Dan Williams
On Tue, May 1, 2018 at 8:20 PM, Dan Williams  wrote:
> On Tue, May 1, 2018 at 8:13 PM, Linus Torvalds
>  wrote:
>> On Tue, May 1, 2018 at 8:03 PM Dan Williams 
>> wrote:
>>
>>> Because dax. There's no page cache indirection games we can play here
>>> to poison a page and map in another page. The mapped page is 1:1
>>> associated with the filesystem block and physical memory address.
>>
>> I'm not talking page cache indirection.
>>
>> I'm talking literally mapping a different page into the kernel virtual
>> address space that the failing read was done for.
>>
>> But you seem to be right that we don't actually support that. I'm guessing
>> the hwpoison code has never had to run in that kind of situation and will
>> just give up.
>>
>> That would seem to be sad. It really feels like the obvious solution to any
>> MCE's - just map a dummy page at the address that causes problems.
>>
>> That can have bad effects for real memory (because who knows what internal
>> kernel data structure might be in there), but would seem to be the
>> _optimal_ solution for some  random pmem access. And makes it absolutely
>> trivial to just return to the execution that got  the error exception.
>
> The other property of pmem that we need to contend with that makes it
> a snowflake relative to typical memory is that errors can be repaired
> by sending a slow-path command to the DIMM device. We trap block-layer
> writes in the pmem driver that target known 'badblocks' and send the
> sideband command to clear the error along with the new data.

All that to say that having a typical RAM page covering poisoned pmem
would complicate the 'clear badblocks' implementation.



Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()

2018-05-01 Thread Dan Williams
On Tue, May 1, 2018 at 8:13 PM, Linus Torvalds
 wrote:
> On Tue, May 1, 2018 at 8:03 PM Dan Williams 
> wrote:
>
>> Because dax. There's no page cache indirection games we can play here
>> to poison a page and map in another page. The mapped page is 1:1
>> associated with the filesystem block and physical memory address.
>
> I'm not talking page cache indirection.
>
> I'm talking literally mapping a different page into the kernel virtual
> address space that the failing read was done for.
>
> But you seem to be right that we don't actually support that. I'm guessing
> the hwpoison code has never had to run in that kind of situation and will
> just give up.
>
> That would seem to be sad. It really feels like the obvious solution to any
> MCE's - just map a dummy page at the address that causes problems.
>
> That can have bad effects for real memory (because who knows what internal
> kernel data structure might be in there), but would seem to be the
> _optimal_ solution for some  random pmem access. And makes it absolutely
> trivial to just return to the execution that got  the error exception.

The other property of pmem that we need to contend with that makes it
a snowflake relative to typical memory is that errors can be repaired
by sending a slow-path command to the DIMM device. We trap block-layer
writes in the pmem driver that target known 'badblocks' and send the
sideband command to clear the error along with the new data.



Re: [PATCH 0/6] use memcpy_mcsafe() for copy_to_iter()

2018-05-01 Thread Linus Torvalds
On Tue, May 1, 2018 at 8:03 PM Dan Williams 
wrote:

> Because dax. There's no page cache indirection games we can play here
> to poison a page and map in another page. The mapped page is 1:1
> associated with the filesystem block and physical memory address.

I'm not talking page cache indirection.

I'm talking literally mapping a different page into the kernel virtual
address space that the failing read was done for.

But you seem to be right that we don't actually support that. I'm guessing
the hwpoison code has never had to run in that kind of situation and will
just give up.

That would seem to be sad. It really feels like the obvious solution to any
MCE's - just map a dummy page at the address that causes problems.

That can have bad effects for real memory (because who knows what internal
kernel data structure might be in there), but would seem to be the
_optimal_ solution for some random pmem access. And makes it absolutely
trivial to just return to the execution that got the error exception.

Linus



[PATCH v2 2/2] usb: dwc2: fix isoc split in transfer with no data

2018-05-01 Thread William Wu
If an isoc split-in transfer has no data (the length of the DATA0
packet is zero), we can't simply return immediately, because the DATA0
can be either the first or the second transaction of the isoc split-in
sequence. If the DATA0 packet with no data is the first transaction, we
can return immediately. But if it is the second transaction, we need to
increase qtd->isoc_frame_index and give the urb back to the device
driver if needed; otherwise, the MDATA packet will be lost.

A typical test case: connect the dwc2 controller to a USB HS hub
(GL852G-12) and plug a USB FS audio device (Plantronics headset) into
the hub's downstream port. Then record with the USB mic; noise can be
heard on playback.

In this case, the isoc split-in transaction sequence is like this:

- SSPLIT IN transaction
- CSPLIT IN transaction
  - MDATA packet (176 bytes)
- CSPLIT IN transaction
  - DATA0 packet (0 byte)

This patch uses both the length of the DATA0 packet and
qtd->isoc_split_offset to check whether the DATA0 is in the second
transaction.

Signed-off-by: William Wu 
---
Changes in v2:
- Modify the commit message

 drivers/usb/dwc2/hcd_intr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index 5e2378f..479f628 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -930,7 +930,7 @@ static int dwc2_xfercomp_isoc_split_in(struct dwc2_hsotg 
*hsotg,
frame_desc = >urb->iso_descs[qtd->isoc_frame_index];
len = dwc2_get_actual_xfer_length(hsotg, chan, chnum, qtd,
  DWC2_HC_XFER_COMPLETE, NULL);
-   if (!len) {
+   if (!len && !qtd->isoc_split_offset) {
qtd->complete_split = 0;
qtd->isoc_split_offset = 0;
return 0;
-- 
2.0.0



