Re: [PATCH] memory tier: rename destroy_memory_type() to put_memory_type()

2023-07-06 Thread Xiao Yang

On 2023/7/6 14:39, Miaohe Lin wrote:

It appears that destroy_memory_type() isn't a very good name because
we usually will not free the memory_type here. So rename it to a more
appropriate name i.e. put_memory_type().

Suggested-by: Huang, Ying 
Signed-off-by: Miaohe Lin 
---
  drivers/dax/kmem.c   | 4 ++--
  include/linux/memory-tiers.h | 4 ++--
  mm/memory-tiers.c| 6 +++---
  3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 898ca9505754..c57acb73e3db 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -264,7 +264,7 @@ static int __init dax_kmem_init(void)
return rc;
  
  error_dax_driver:

-   destroy_memory_type(dax_slowmem_type);
+   put_memory_type(dax_slowmem_type);
  err_dax_slowmem_type:
kfree_const(kmem_name);
return rc;
@@ -275,7 +275,7 @@ static void __exit dax_kmem_exit(void)
dax_driver_unregister(_dax_kmem_driver);
if (!any_hotremove_failed)
kfree_const(kmem_name);
-   destroy_memory_type(dax_slowmem_type);
+   put_memory_type(dax_slowmem_type);
  }
  
  MODULE_AUTHOR("Intel Corporation");

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index fc9647b1b4f9..437441cdf78f 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -33,7 +33,7 @@ struct memory_dev_type {
  #ifdef CONFIG_NUMA
  extern bool numa_demotion_enabled;
  struct memory_dev_type *alloc_memory_type(int adistance);
-void destroy_memory_type(struct memory_dev_type *memtype);
+void put_memory_type(struct memory_dev_type *memtype);
  void init_node_memory_type(int node, struct memory_dev_type *default_type);
  void clear_node_memory_type(int node, struct memory_dev_type *memtype);
  #ifdef CONFIG_MIGRATION
@@ -68,7 +68,7 @@ static inline struct memory_dev_type *alloc_memory_type(int 
adistance)
return NULL;
  }
  
-static inline void destroy_memory_type(struct memory_dev_type *memtype)

+static inline void put_memory_type(struct memory_dev_type *memtype)
  {
  
  }

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 1719fa3bcf02..c49ab03f49b1 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -560,11 +560,11 @@ struct memory_dev_type *alloc_memory_type(int adistance)
  }
  EXPORT_SYMBOL_GPL(alloc_memory_type);
  
-void destroy_memory_type(struct memory_dev_type *memtype)

+void put_memory_type(struct memory_dev_type *memtype)
  {
kref_put(>kref, release_memtype);
  }
-EXPORT_SYMBOL_GPL(destroy_memory_type);
+EXPORT_SYMBOL_GPL(put_memory_type);
  
  void init_node_memory_type(int node, struct memory_dev_type *memtype)

  {
@@ -586,7 +586,7 @@ void clear_node_memory_type(int node, struct 
memory_dev_type *memtype)
 */
if (!node_memory_types[node].map_count) {
node_memory_types[node].memtype = NULL;
-   destroy_memory_type(memtype);
+   put_memory_type(memtype);

Hi Maohe,

I didn't find that destroy_memory_type(memtype) is called here on 
mainline kernel. Did I miss something?


Other than that, it looks good to me.
Reviewed-by: Xiao Yang 

Best Regards,
Xiao Yang


}
mutex_unlock(_tier_lock);
  }




Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-09-19 Thread Yang , Xiao/杨 晓

Hi Darrick, Brian and Christoph

Ping. I hope to get your feedback.

1) I have confirmed that the following patch set did not change the test 
result of generic/470 with thin-volume. Besides, I didn't see any 
failure when running generic/470 based on normal PMEM device instaed of 
thin-volume.

https://lore.kernel.org/linux-xfs/20211129102203.2243509-1-...@lst.de/

2) I can reproduce the failure of generic/482 without thin-volume.

3) Is it necessary to make thin-volume support DAX. Is there any use 
case for the requirement?


Best Regards,
Xiao Yang

On 2022/9/16 10:04, Yang, Xiao/杨 晓 wrote:

On 2022/9/15 18:14, Yang, Xiao/杨 晓 wrote:

On 2022/9/15 0:28, Darrick J. Wong wrote:

On Wed, Sep 14, 2022 at 08:34:26AM -0400, Brian Foster wrote:

On Wed, Sep 14, 2022 at 05:38:02PM +0800, Yang, Xiao/杨 晓 wrote:

On 2022/9/14 14:44, Yang, Xiao/杨 晓 wrote:

On 2022/9/9 21:01, Brian Foster wrote:
Yes.. I don't recall all the internals of the tools and test, but 
IIRC
it relied on discard to perform zeroing between checkpoints or 
some such

and avoid spurious failures. The purpose of running on dm-thin was
merely to provide reliable discard zeroing behavior on the target 
device

and thus to allow the test to run reliably.

Hi Brian,

As far as I know, generic/470 was original designed to verify
mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the
reason, we need to ensure that all underlying devices under
dm-log-writes device support DAX. However dm-thin device never 
supports

DAX so
running generic/470 with dm-thin device always returns "not run".

Please see the difference between old and new logic:

    old logic  new logic
---
log-writes device(DAX) log-writes device(DAX)
  |   |
PMEM0(DAX) + PMEM1(DAX)   Thin device(non-DAX) + PMEM1(DAX)
    |
  PMEM0(DAX)
---

We think dm-thin device is not a good solution for generic/470, is 
there

any other solution to support both discard zero and DAX?


Hi Brian,

I have sent a patch[1] to revert your fix because I think it's not 
good for

generic/470 to use thin volume as my revert patch[1] describes:
[1] 
https://lore.kernel.org/fstests/20220914090625.32207-1-yangx...@fujitsu.com/T/#u 





I think the history here is that generic/482 was changed over first in
commit 65cc9a235919 ("generic/482: use thin volume as data device"), 
and
then sometime later we realized generic/455,457,470 had the same 
general
flaw and were switched over. The dm/dax compatibility thing was 
probably
just an oversight, but I am a little curious about that because it 
should


It's not an oversight -- it used to work (albeit with EXPERIMENTAL
tags), and now we've broken it on fsdax as the pmem/blockdev divorce
progresses.

Hi

Do you mean that the following patch set changed the test result of 
generic/470 with thin-volume? (pass => not run/failure)

https://lore.kernel.org/linux-xfs/20211129102203.2243509-1-...@lst.de/




have been obvious that the change caused the test to no longer run. Did
something change after that to trigger that change in behavior?

With the revert, generic/470 can always run successfully on my 
environment

so I wonder how to reproduce the out-of-order replay issue on XFS v5
filesystem?



I don't quite recall the characteristics of the failures beyond that we
were seeing spurious test failures with generic/482 that were due to
essentially putting the fs/log back in time in a way that wasn't quite
accurate due to the clearing by the logwrites tool not taking place. If
you wanted to reproduce in order to revisit that, perhaps start with
generic/482 and let it run in a loop for a while and see if it
eventually triggers a failure/corruption..?

PS: I want to reproduce the issue and try to find a better solution 
to fix

it.



It's been a while since I looked at any of this tooling to semi-grok 
how

it works.


I /think/ this was the crux of the problem, back in 2019?
https://lore.kernel.org/fstests/20190227061529.GF16436@dastard/


Agreed.




Perhaps it could learn to rely on something more explicit like
zero range (instead of discard?) or fall back to manual zeroing?


AFAICT src/log-writes/ actually /can/ do zeroing, but (a) it probably
ought to be adapted to call BLKZEROOUT and (b) in the worst case it
writes zeroes to the entire device, which is/can be slow.

For a (crass) example, one of my cloudy test VMs uses 34GB partitions,
and for cost optimization purposes we're only "paying" for the cheapest
tier.  Weirdly that maps to an upper limit of 6500 write iops and
48MB/s(!) but that would take about 20 minutes to zero the entire
device if the dm-thin hack wasn't in place.  Frustratingly, it doesn't
support disca

Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-09-15 Thread Yang , Xiao/杨 晓

On 2022/9/15 18:14, Yang, Xiao/杨 晓 wrote:

On 2022/9/15 0:28, Darrick J. Wong wrote:

On Wed, Sep 14, 2022 at 08:34:26AM -0400, Brian Foster wrote:

On Wed, Sep 14, 2022 at 05:38:02PM +0800, Yang, Xiao/杨 晓 wrote:

On 2022/9/14 14:44, Yang, Xiao/杨 晓 wrote:

On 2022/9/9 21:01, Brian Foster wrote:
Yes.. I don't recall all the internals of the tools and test, but 
IIRC
it relied on discard to perform zeroing between checkpoints or 
some such

and avoid spurious failures. The purpose of running on dm-thin was
merely to provide reliable discard zeroing behavior on the target 
device

and thus to allow the test to run reliably.

Hi Brian,

As far as I know, generic/470 was original designed to verify
mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the
reason, we need to ensure that all underlying devices under
dm-log-writes device support DAX. However dm-thin device never 
supports

DAX so
running generic/470 with dm-thin device always returns "not run".

Please see the difference between old and new logic:

    old logic  new logic
---
log-writes device(DAX) log-writes device(DAX)
  |   |
PMEM0(DAX) + PMEM1(DAX)   Thin device(non-DAX) + PMEM1(DAX)
    |
  PMEM0(DAX)
---

We think dm-thin device is not a good solution for generic/470, is 
there

any other solution to support both discard zero and DAX?


Hi Brian,

I have sent a patch[1] to revert your fix because I think it's not 
good for

generic/470 to use thin volume as my revert patch[1] describes:
[1] 
https://lore.kernel.org/fstests/20220914090625.32207-1-yangx...@fujitsu.com/T/#u 





I think the history here is that generic/482 was changed over first in
commit 65cc9a235919 ("generic/482: use thin volume as data device"), and
then sometime later we realized generic/455,457,470 had the same general
flaw and were switched over. The dm/dax compatibility thing was probably
just an oversight, but I am a little curious about that because it 
should


It's not an oversight -- it used to work (albeit with EXPERIMENTAL
tags), and now we've broken it on fsdax as the pmem/blockdev divorce
progresses.

Hi

Do you mean that the following patch set changed the test result of 
generic/470 with thin-volume? (pass => not run/failure)

https://lore.kernel.org/linux-xfs/20211129102203.2243509-1-...@lst.de/




have been obvious that the change caused the test to no longer run. Did
something change after that to trigger that change in behavior?

With the revert, generic/470 can always run successfully on my 
environment

so I wonder how to reproduce the out-of-order replay issue on XFS v5
filesystem?



I don't quite recall the characteristics of the failures beyond that we
were seeing spurious test failures with generic/482 that were due to
essentially putting the fs/log back in time in a way that wasn't quite
accurate due to the clearing by the logwrites tool not taking place. If
you wanted to reproduce in order to revisit that, perhaps start with
generic/482 and let it run in a loop for a while and see if it
eventually triggers a failure/corruption..?

PS: I want to reproduce the issue and try to find a better solution 
to fix

it.



It's been a while since I looked at any of this tooling to semi-grok how
it works.


I /think/ this was the crux of the problem, back in 2019?
https://lore.kernel.org/fstests/20190227061529.GF16436@dastard/


Agreed.




Perhaps it could learn to rely on something more explicit like
zero range (instead of discard?) or fall back to manual zeroing?


AFAICT src/log-writes/ actually /can/ do zeroing, but (a) it probably
ought to be adapted to call BLKZEROOUT and (b) in the worst case it
writes zeroes to the entire device, which is/can be slow.

For a (crass) example, one of my cloudy test VMs uses 34GB partitions,
and for cost optimization purposes we're only "paying" for the cheapest
tier.  Weirdly that maps to an upper limit of 6500 write iops and
48MB/s(!) but that would take about 20 minutes to zero the entire
device if the dm-thin hack wasn't in place.  Frustratingly, it doesn't
support discard or write-zeroes.


Do you mean that discard zero(BLKDISCARD) is faster than both fill 
zero(BLKZEROOUT) and write zero on user space?


Hi Darrick, Brian and Christoph

According to the discussion about generic/470. I wonder if it is 
necessary to make thin-pool support DAX. Is there any use case for the 
requirement?


Best Regards,
Xiao Yang


Best Regards,
Xiao Yang



If the
eventual solution is simple and low enough overhead, it might make some
sense to replace the dmthin hack across the set of tests mentioned
above.


That said, for a *pmem* test you'd expect it to be faster than that...

--D

Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-09-15 Thread Yang , Xiao/杨 晓

On 2022/9/15 0:28, Darrick J. Wong wrote:

On Wed, Sep 14, 2022 at 08:34:26AM -0400, Brian Foster wrote:

On Wed, Sep 14, 2022 at 05:38:02PM +0800, Yang, Xiao/杨 晓 wrote:

On 2022/9/14 14:44, Yang, Xiao/杨 晓 wrote:

On 2022/9/9 21:01, Brian Foster wrote:

Yes.. I don't recall all the internals of the tools and test, but IIRC
it relied on discard to perform zeroing between checkpoints or some such
and avoid spurious failures. The purpose of running on dm-thin was
merely to provide reliable discard zeroing behavior on the target device
and thus to allow the test to run reliably.

Hi Brian,

As far as I know, generic/470 was original designed to verify
mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the
reason, we need to ensure that all underlying devices under
dm-log-writes device support DAX. However dm-thin device never supports
DAX so
running generic/470 with dm-thin device always returns "not run".

Please see the difference between old and new logic:

old logic  new logic
---
log-writes device(DAX) log-writes device(DAX)
  |   |
PMEM0(DAX) + PMEM1(DAX)   Thin device(non-DAX) + PMEM1(DAX)
|
  PMEM0(DAX)
---

We think dm-thin device is not a good solution for generic/470, is there
any other solution to support both discard zero and DAX?


Hi Brian,

I have sent a patch[1] to revert your fix because I think it's not good for
generic/470 to use thin volume as my revert patch[1] describes:
[1] 
https://lore.kernel.org/fstests/20220914090625.32207-1-yangx...@fujitsu.com/T/#u



I think the history here is that generic/482 was changed over first in
commit 65cc9a235919 ("generic/482: use thin volume as data device"), and
then sometime later we realized generic/455,457,470 had the same general
flaw and were switched over. The dm/dax compatibility thing was probably
just an oversight, but I am a little curious about that because it should


It's not an oversight -- it used to work (albeit with EXPERIMENTAL
tags), and now we've broken it on fsdax as the pmem/blockdev divorce
progresses.

Hi

Do you mean that the following patch set changed the test result of 
generic/470 with thin-volume? (pass => not run/failure)

https://lore.kernel.org/linux-xfs/20211129102203.2243509-1-...@lst.de/




have been obvious that the change caused the test to no longer run. Did
something change after that to trigger that change in behavior?


With the revert, generic/470 can always run successfully on my environment
so I wonder how to reproduce the out-of-order replay issue on XFS v5
filesystem?



I don't quite recall the characteristics of the failures beyond that we
were seeing spurious test failures with generic/482 that were due to
essentially putting the fs/log back in time in a way that wasn't quite
accurate due to the clearing by the logwrites tool not taking place. If
you wanted to reproduce in order to revisit that, perhaps start with
generic/482 and let it run in a loop for a while and see if it
eventually triggers a failure/corruption..?


PS: I want to reproduce the issue and try to find a better solution to fix
it.



It's been a while since I looked at any of this tooling to semi-grok how
it works.


I /think/ this was the crux of the problem, back in 2019?
https://lore.kernel.org/fstests/20190227061529.GF16436@dastard/


Agreed.




Perhaps it could learn to rely on something more explicit like
zero range (instead of discard?) or fall back to manual zeroing?


AFAICT src/log-writes/ actually /can/ do zeroing, but (a) it probably
ought to be adapted to call BLKZEROOUT and (b) in the worst case it
writes zeroes to the entire device, which is/can be slow.

For a (crass) example, one of my cloudy test VMs uses 34GB partitions,
and for cost optimization purposes we're only "paying" for the cheapest
tier.  Weirdly that maps to an upper limit of 6500 write iops and
48MB/s(!) but that would take about 20 minutes to zero the entire
device if the dm-thin hack wasn't in place.  Frustratingly, it doesn't
support discard or write-zeroes.


Do you mean that discard zero(BLKDISCARD) is faster than both fill 
zero(BLKZEROOUT) and write zero on user space?


Best Regards,
Xiao Yang



If the
eventual solution is simple and low enough overhead, it might make some
sense to replace the dmthin hack across the set of tests mentioned
above.


That said, for a *pmem* test you'd expect it to be faster than that...

--D


Brian


Best Regards,
Xiao Yang



BTW, only log-writes, stripe and linear support DAX for now.








Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-09-14 Thread Yang , Xiao/杨 晓

On 2022/9/14 14:44, Yang, Xiao/杨 晓 wrote:

On 2022/9/9 21:01, Brian Foster wrote:

Yes.. I don't recall all the internals of the tools and test, but IIRC
it relied on discard to perform zeroing between checkpoints or some such
and avoid spurious failures. The purpose of running on dm-thin was
merely to provide reliable discard zeroing behavior on the target device
and thus to allow the test to run reliably.

Hi Brian,

As far as I know, generic/470 was original designed to verify
mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the
reason, we need to ensure that all underlying devices under
dm-log-writes device support DAX. However dm-thin device never supports
DAX so
running generic/470 with dm-thin device always returns "not run".

Please see the difference between old and new logic:

   old logic  new logic
---
log-writes device(DAX) log-writes device(DAX)
 |   |
PMEM0(DAX) + PMEM1(DAX)   Thin device(non-DAX) + PMEM1(DAX)
   |
 PMEM0(DAX)
---

We think dm-thin device is not a good solution for generic/470, is there
any other solution to support both discard zero and DAX?


Hi Brian,

I have sent a patch[1] to revert your fix because I think it's not good 
for generic/470 to use thin volume as my revert patch[1] describes:
[1] 
https://lore.kernel.org/fstests/20220914090625.32207-1-yangx...@fujitsu.com/T/#u


With the revert, generic/470 can always run successfully on my 
environment so I wonder how to reproduce the out-of-order replay issue 
on XFS v5 filesystem?


PS: I want to reproduce the issue and try to find a better solution to 
fix it.


Best Regards,
Xiao Yang



BTW, only log-writes, stripe and linear support DAX for now.




Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-09-14 Thread Yang , Xiao/杨 晓

On 2022/9/9 21:01, Brian Foster wrote:

Yes.. I don't recall all the internals of the tools and test, but IIRC
it relied on discard to perform zeroing between checkpoints or some such
and avoid spurious failures. The purpose of running on dm-thin was
merely to provide reliable discard zeroing behavior on the target device
and thus to allow the test to run reliably.


Hi Brian,

As far as I know, generic/470 was original designed to verify 
mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the 
reason, we need to ensure that all underlying devices under 
dm-log-writes device support DAX. However dm-thin device never supports 
DAX so

running generic/470 with dm-thin device always returns "not run".

Please see the difference between old and new logic:

 old logic  new logic
---
log-writes device(DAX) log-writes device(DAX)
   |   |
PMEM0(DAX) + PMEM1(DAX)   Thin device(non-DAX) + PMEM1(DAX)
 |
   PMEM0(DAX)
---

We think dm-thin device is not a good solution for generic/470, is there 
any other solution to support both discard zero and DAX?


BTW, only log-writes, stripe and linear support DAX for now.

Best Regards,
Xiao Yang



Re: PROBLEM: Recent raid10 block discard patchset causes filesystem corruption on fstrim

2021-02-02 Thread Xiao Ni




On 02/02/2021 11:42 AM, Matthew Ruffell wrote:

Hi Xiao,

On 24/12/20 11:18 pm, Xiao Ni wrote:> The root cause is found. Now we use a 
similar way with raid0 to handle discard request

for raid10. Because the discard region is very big, we can calculate the 
start/end address
for each disk. Then we can submit the discard request to each disk. But for 
raid10, it has
copies. For near layout, if the discard request doesn't align with chunk size, 
we calculate
a start_disk_offset. Now we only use start_disk_offset for the first disk, but 
it should be
used for the near copies disks too.

Thanks for finding the root cause and making a patch that corrects the offset
addresses for multiple disks!


[  789.709501] discard bio start : 70968, size : 191176
[  789.709507] first stripe index 69, start disk index 0, start disk offset 
70968
[  789.709509] last stripe index 256, end disk index 0, end disk offset 262144
[  789.709511] disk 0, dev start : 70968, dev end : 262144
[  789.709515] disk 1, dev start : 70656, dev end : 262144

For example, in this test case, it has 2 near copies. The start_disk_offset for 
the first disk is 70968.
It should use the same offset address for second disk. But it uses the start 
address of this chunk.
It discard more region. The patch in the attachment can fix this problem. It 
split the region that
doesn't align with chunk size.

Just wondering, what is the current status of the patchset? Is there anything
that I can do to help?


There is another problem. The stripe size should be calculated differently for 
near layout and far layout.


I can help review the patch and help test the patches anytime. Do you need help
with making a patch to calculate the stripe size for near and far layouts?

Let me know how you are going with this patchset, and if there is anything I
can do for you.

Thanks,
Matthew


Hi Matthew

I'm doing the test for the new patch set. I'll send the patch soon 
again. Thanks for the help.


Regards
Xiao



Re: PROBLEM: Recent raid10 block discard patchset causes filesystem corruption on fstrim

2020-12-24 Thread Xiao Ni



On 12/09/2020 12:17 PM, Song Liu wrote:

Hi Matthew,


On Dec 8, 2020, at 7:46 PM, Matthew Ruffell  
wrote:

Hello,

I recently backported the following patches into the Ubuntu stable kernels:

md: add md_submit_discard_bio() for submitting discard bio
md/raid10: extend r10bio devs to raid disks
md/raid10: pull codes that wait for blocked dev into one function
md/raid10: improve raid10 discard request
md/raid10: improve discard request for far layout
dm raid: fix discard limits for raid1 and raid10
dm raid: remove unnecessary discard limits for raid10

Thanks for the report!

Hi Xiao,

Could you please take a look at this and let me know soon? We need to fix
this before 5.10 official release.

Thanks,
Song


Hi all

The root cause is found. Now we use a similar way with raid0 to handle 
discard request
for raid10. Because the discard region is very big, we can calculate the 
start/end address
for each disk. Then we can submit the discard request to each disk. But 
for raid10, it has
copies. For near layout, if the discard request doesn't align with chunk 
size, we calculate
a start_disk_offset. Now we only use start_disk_offset for the first 
disk, but it should be

used for the near copies disks too.

[  789.709501] discard bio start : 70968, size : 191176
[  789.709507] first stripe index 69, start disk index 0, start disk 
offset 70968
[  789.709509] last stripe index 256, end disk index 0, end disk offset 
262144

[  789.709511] disk 0, dev start : 70968, dev end : 262144
[  789.709515] disk 1, dev start : 70656, dev end : 262144

For example, in this test case, it has 2 near copies. The 
start_disk_offset for the first disk is 70968.
It should use the same offset address for second disk. But it uses the 
start address of this chunk.
It discard more region. The patch in the attachment can fix this 
problem. It split the region that

doesn't align with chunk size.

There is another problem. The stripe size should be calculated 
differently for near layout and far layout.


@Song, do you want me to use a separate patch for this fix, or fix this 
in the original patch?


Merry Christmas
Xiao

commit 0d74ac66ed0ec5af70296545e26044723a14657c
Author: Xiao Ni 
Date:   Thu Dec 24 17:58:43 2020 +0800

fix

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3153183b7772..92182cf40d22 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1604,6 +1604,7 @@ static int raid10_handle_discard(struct mddev *mddev, 
struct bio *bio)
sector_t chunk;
unsigned int stripe_size;
sector_t split_size;
+   sector_t chunk_size = 1 << geo->chunk_shift;
 
sector_t bio_start, bio_end;
sector_t first_stripe_index, last_stripe_index;
@@ -1624,7 +1625,8 @@ static int raid10_handle_discard(struct mddev *mddev, 
struct bio *bio)
if (test_bit(MD_RECOVERY_RESHAPE, >recovery))
goto out;
 
-   stripe_size = geo->raid_disks << geo->chunk_shift;
+   stripe_size = geo->near_copies ? geo->near_copies << geo->chunk_shift:
+   geo->raid_disks << geo->chunk_shift;
bio_start = bio->bi_iter.bi_sector;
bio_end = bio_end_sector(bio);
 
@@ -1637,6 +1639,18 @@ static int raid10_handle_discard(struct mddev *mddev, 
struct bio *bio)
if (bio_sectors(bio) < stripe_size*2)
goto out;
 
+   /* Keep the discard start/end address aligned with chunk size */
+   if (bio_start & geo->chunk_mask) {
+   split_size = (chunk_size - (bio_start & geo->chunk_mask));
+   bio = raid10_split_bio(conf, bio, split_size, false);
+   }
+   if (bio_end & geo->chunk_mask) {
+   split_size = bio_end & geo->chunk_mask;
+   bio = raid10_split_bio(conf, bio, split_size, true);
+   }
+   bio_start = bio->bi_iter.bi_sector;
+   bio_end = bio_end_sector(bio);
+
/* For far and far offset layout, if bio is not aligned with stripe 
size,
 * it splits the part that is not aligned with strip size.
 */
@@ -1664,8 +1678,8 @@ static int raid10_handle_discard(struct mddev *mddev, 
struct bio *bio)
start_disk_index = sector_div(first_stripe_index, geo->raid_disks);
if (geo->far_offset)
first_stripe_index *= geo->far_copies;
-   start_disk_offset = (bio_start & geo->chunk_mask) +
-   (first_stripe_index << geo->chunk_shift);
+   /* Now the bio is aligned with chunk size */
+   start_disk_offset = first_stripe_index << geo->chunk_shift;
 
chunk = bio_end >> geo->chunk_shift;
chunk *= geo->near_copies;
@@ -1673,8 +1687,7 @@ static int raid10_handle_discard(struct mddev *mddev, 
struct bio *bio)
end_disk_index = sector_div(last_stripe_index, geo->raid_disks)

Re: PROBLEM: Recent raid10 block discard patchset causes filesystem corruption on fstrim

2020-12-09 Thread Xiao Ni




On 12/09/2020 12:17 PM, Song Liu wrote:

Hi Matthew,


On Dec 8, 2020, at 7:46 PM, Matthew Ruffell  
wrote:

Hello,

I recently backported the following patches into the Ubuntu stable kernels:

md: add md_submit_discard_bio() for submitting discard bio
md/raid10: extend r10bio devs to raid disks
md/raid10: pull codes that wait for blocked dev into one function
md/raid10: improve raid10 discard request
md/raid10: improve discard request for far layout
dm raid: fix discard limits for raid1 and raid10
dm raid: remove unnecessary discard limits for raid10

Thanks for the report!

Hi Xiao,

Could you please take a look at this and let me know soon? We need to fix
this before 5.10 official release.

Thanks,
Song


Hi all

Sorry for the trouble. But I'm in pto with no test machines. I'll have a 
look at this problem

next week.



and this morning, a user reported the following downstream bug:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/

Their weekly cronjob that runs fstrim had run, and their raid10 array has
extensive data corruption.

The issue is reproducible on the latest 5.10-rc7 mainline kernel, steps are
below.

I used a m5d.4xlarge instance on AWS to ultilise 2x 300GB SSDs that support
block discard. You will want to select small disks to lower the time needed
to reproduce.

$ uname -rv
5.10.0-rc7+ #1 SMP Wed Dec 9 01:15:27 UTC 2020

Create a raid10 array, with LVM:

$ lsblk
NAMEMAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1 259:00 8G  0 disk
└─nvme0n1p1 259:10 8G  0 part /
nvme1n1 259:20 279.4G  0 disk
nvme2n1 259:30 279.4G  0 disk

$ sudo -s
# mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme1n1 /dev/nvme2n1
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: size set to 292836352K
mdadm: automatically enabling write-intent bitmap on large array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
# pvcreate -ff -y /dev/md0
  Physical volume "/dev/md0" successfully created.
# vgcreate -f -y VolGroup /dev/md0
  Volume group "VolGroup" successfully created
# lvcreate -n root -L 100G -ay -y VolGroup
  Logical volume "root" created.
# mkfs.ext4 /dev/VolGroup/root
mke2fs 1.44.1 (24-Mar-2018)
Discarding device blocks: done
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: d7be2e14-fa4d-4489-884b-3bef63b1e1db
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 2048, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done
# mount /dev/VolGroup/root /mnt

Next, wait for the disk check to complete, 25 minutes on m5d.4xlarge instance.

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
[raid10]
md0 : active raid10 nvme2n1[1] nvme1n1[0]
  292836352 blocks super 1.2 2 near-copies [2/2] [UU]
  [==>..]  resync = 12.0% (35211392/292836352) 
finish=21.4min speed=200340K/sec
  bitmap: 3/3 pages [12KB], 65536KB chunk

unused devices: 
# cat /sys/block/md0/md/mismatch_cnt
76918016

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
[raid10]
md0 : active raid10 nvme2n1[1] nvme1n1[0]
  292836352 blocks super 1.2 2 near-copies [2/2] [UU]
  bitmap: 0/3 pages [0KB], 65536KB chunk

unused devices: 
# cat /sys/block/md0/md/mismatch_cnt
582330240

Now that the check is complete, create a file, sync and delete it:

# dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M
1048576+0 records in
1048576+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.95974 s, 1.1 GB/s
# sync
# rm /mnt/data.raw

Perform a check:

# echo check > /sys/block/md0/md/sync_action

Again, wait 25 minutes for it to complete:

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
[raid10]
md0 : active raid10 nvme1n1[1] nvme2n1[0]
  292836352 blocks super 1.2 2 near-copies [2/2] [UU]
  [==>..]  check = 13.7% (40356224/292836352) 
finish=20.8min speed=201707K/sec
  bitmap: 0/3 pages [0KB], 65536KB chunk

unused devices: 
# cat /sys/block/md0/md/mismatch_cnt
1469696

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] 
[raid10]
md0 : active raid10 nvme1n1[1] nvme2n1[0]
  292836352 blocks super 1.2 2 near-copies [2/2] [UU]
  bitmap: 0/3 pages [0KB], 65536KB chunk

unused devices: 
# cat /sys/block/md0/md/mismatch_cnt
1469696

Now, perform the fstrim:

# fstrim /mnt --verbose
/mnt: 97.9 GiB (105089236992 bytes) trimmed

Go for another check:

# echo check >/sys/block/md0/md/sync_action
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4

Re: [PATCH v2 0/3] md superblock write alignment on 512e devices

2020-11-04 Thread Xiao Ni




On 11/04/2020 04:12 AM, Chris Unkel wrote:

Hi Xiao,

Thanks for the excellent feedback.  Since bitmap_offset appears to be
a free-form field, it wasn't apparent to me that the bitmap never
starts within 4K of the bitmap.

I don't think it's worth worrying about a logical block size that's
more than 4K here--from what I can see logical block size larger than
the usual 4K page isn't going to happen.

I do think that it makes sense to handle the case where the physical
block size is more than 4K.  I think what you propose works, but I
think in the physical block > MAX_SB_SIZE case it makes more sense to
align the superblock writes to the physical block size (as now) rather
Is it a typo error? You want to say if physical block > MAX_SB_SIZE, it 
should align the
superblock writes to logical block size? Because I see the comments 
below, your solution

is to align to logical block size when physical block > MAX_SB_SIZE.

than rejecting the create/assemble.  Mounting with the possible
performance hit seems like a better outcome for the user in that case
than refusing to assemble.
It's the same check that would have to be written to reject the
assembly in that case and so the code shouldn't really be any more
complex.

So basically what I propose is:  if the physical block size is no
larger than MAX_SB_SIZE, pad to that; otherwise pad to to
logical_block_size, that is, replace queue_logical_block_size()
with something equivalent to:

 queue_physical_block_size(...) > MAX_SB_SIZE ?
queue_logical_block_size(...) : queue_physical_block_size(...)

which is simple, safe in all cases, doesn't reject any feasible
assembly, and generates aligned sb writes on all common current
devices (512n,4kn,512e.)

What do you think?

Yes, It's a nice solution :)

Regards
Xiao



Re: [PATCH 0/3] mdraid sb and bitmap write alignment on 512e drives

2020-11-03 Thread Xiao Ni




On 11/03/2020 02:59 AM, Chris Unkel wrote:

Hi Xiao,

That particular array is super1.2.  The block trace was captured on
the disk underlying the partition device on which the md array member
resides, not on the partition device itself.  The partition starts
2048 sectors into the disk (1MB).  So the 2048 sectors offset to the
beginning of the partition, plus the 8 sector superblock offset for
super1.2 ends up at 2056.

Sorry for the confusion there.

Regards,

  --Chris


Thanks for the explanation. I still have some questions in other emails 
about

your patch. Could you have a look when you are free.

Regards
Xiao



Re: [PATCH 2/3] md: align superblock writes to physical blocks

2020-11-02 Thread Xiao Ni




On 10/30/2020 04:13 AM, Christopher Unkel wrote:

Writes of the md superblock are aligned to the logical blocks of the
containing device, but no attempt is made to align them to physical
block boundaries.  This means that on a "512e" device (4k physical, 512
logical) every superblock update hits the 512-byte emulation and the
possible associated performance penalty.

Respect the physical block alignment when possible, that is, when the
write padded out to the physical block doesn't run into the data or
bitmap.

Signed-off-by: Christopher Unkel 
---
This series replaces the first patch of the previous series
(https://lkml.org/lkml/2020/10/22/1058), with the following changes:

1. Creates a helper function super_1_sb_length_ok().
2. Fixes operator placement style violation.
3. Covers case in super_1_sync().
4. Refactors duplicate logic.
5. Covers a case in existing code where aligned superblock could
run into bitmap.

  drivers/md/md.c | 45 +
  1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d6a55ca1d52e..802a9a256fe5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1646,15 +1646,52 @@ static __le32 calc_sb_1_csum(struct mdp_superblock_1 
*sb)
return cpu_to_le32(csum);
  }
  
+static int

+super_1_sb_length_ok(struct md_rdev *rdev, int minor_version, int sb_len)
+{
+   int sectors = sb_len / 512;
+   struct mdp_superblock_1 *sb;
+
+   /* superblock is stored in memory as a single page */
+   if (sb_len > PAGE_SIZE)
+   return 0;
+
+   /* check if sb runs into data */
+   if (minor_version) {
+   if (rdev->sb_start + sectors > rdev->data_offset
+   || rdev->sb_start + sectors > rdev->new_data_offset)
+   return 0;
+   } else if (sb_len > 4096)
+   return 0;
+
+   /* check if sb runs into bitmap */
+   sb = page_address(rdev->sb_page);
+   if (le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) {
+   __s32 bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
+   if (bitmap_offset > 0 && sectors > bitmap_offset)
+   return 0;
+   }
+
+   return 1;
+}
+
For super1.0 it doesn't need to consider this. Because the data and 
bitmap is before superblock.
For super1.1 and 1.2 it only needs to check whether it runs into bitmap. 
The data is behind the

bitmap.

Regards
Xiao

  /*
   * set rdev->sb_size to that required for number of devices in array
   * with appropriate padding to underlying sectors
   */
  static void
-super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev)
+super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev, int minor_version)
  {
int sb_size = max_dev * 2 + 256;
-   rdev->sb_size = round_up(sb_size, bdev_logical_block_size(rdev->bdev));
+   int pb_aligned_size = round_up(sb_size,
+  bdev_physical_block_size(rdev->bdev));
+
+   /* generate physical-block aligned writes if legal */
+   if (super_1_sb_length_ok(rdev, minor_version, pb_aligned_size))
+   rdev->sb_size = pb_aligned_size;
+   else
+   rdev->sb_size = round_up(sb_size,
+bdev_logical_block_size(rdev->bdev));
  }
  
  static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_version)

@@ -1730,7 +1767,7 @@ static int super_1_load(struct md_rdev *rdev, struct 
md_rdev *refdev, int minor_
rdev->new_data_offset += (s32)le32_to_cpu(sb->new_offset);
atomic_set(>corrected_errors, 
le32_to_cpu(sb->cnt_corrected_read));
  
-	super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev));

+   super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev), minor_version);
  
  	if (minor_version

&& rdev->data_offset < sb_start + (rdev->sb_size/512))
@@ -2140,7 +2177,7 @@ static void super_1_sync(struct mddev *mddev, struct 
md_rdev *rdev)
  
  	if (max_dev > le32_to_cpu(sb->max_dev)) {

sb->max_dev = cpu_to_le32(max_dev);
-   super_1_set_rdev_sb_size(rdev, max_dev);
+   super_1_set_rdev_sb_size(rdev, max_dev, mddev->minor_version);
} else
max_dev = le32_to_cpu(sb->max_dev);
  




Re: [PATCH v2 0/3] md superblock write alignment on 512e devices

2020-11-01 Thread Xiao Ni




On 10/30/2020 04:13 AM, Christopher Unkel wrote:

Hello,

Thanks for the feedback on the previous patch series.

A updated patch series with the same function as the first patch
(https://lkml.org/lkml/2020/10/22/1058 "md: align superblock writes to
physical blocks") follows.

As suggested, it introduces a helper function, which can be used to
reduce some code duplication.  It handles the case in super_1_sync()
where the superblock is extended by the addition of new component
devices.

I think it also fixes a bug where the existing code in super_1_load()
ought to be rejecting the array with EINVAL: if the superblock padded
out to the *logical* block length runs into the bitmap.  For example, if
the bitmap offset is 2 (bitmap 1K after superblock) and the logical
block size is 4K, the superblock padded out to 4K runs into the bitmap.
This case may be unusual (perhaps only happens if the array is created
on a 512n device and then raw contents are copied onto a 4kn device) but
I think it is possible.

Hi Chris
For super1.1 and super1.2 bitmap offset is 8. It's a fixed value. So it 
should

not have the risk?

But for future maybe it has this problem. If the disk logical or 
physical block size

is larger than 4K in future, it has data corruption risk.


With respect to the option of simply replacing
queue_logical_block_size() with queue_physical_block_size(), I think
this can result in the code rejecting devices that can be loaded, but

In mdadm it defines the max super size of super1 is 4096
#define MAX_SB_SIZE 4096
/* bitmap super size is 256, but we round up to a sector for alignment */
#define BM_SUPER_SIZE 512
#define MAX_DEVS ((int)(MAX_SB_SIZE - sizeof(struct mdp_superblock_1)) / 2)
#define SUPER1_SIZE (MAX_SB_SIZE + BM_SUPER_SIZE \
 + sizeof(struct misc_dev_info))

It should be ok to replace queue_logical_block_size with 
queue_physical_block_size?
Now it doesn't check physical block size and super block size. For 
super1, we can add
a check that if physical block size is larger than MAX_SB_SIZE, then we 
reject to create/assmble

the raid device.

for which the physical block alignment can't be respected--the longer
padded size would trigger the EINVAL cases testing against
data_offset/new_data_offset.  I think it's better to proceed in such
cases, just with unaligned superblock writes as would presently happen.
Also if I'm right about the above bug, then I think this subsitution
would be more likely to trigger it.

Thanks,

   --Chris


Christopher Unkel (3):
   md: factor out repeated sb alignment logic
   md: align superblock writes to physical blocks
   md: reuse sb length-checking logic

  drivers/md/md.c | 69 +
  1 file changed, 52 insertions(+), 17 deletions(-)





Re: [PATCH 0/3] mdraid sb and bitmap write alignment on 512e drives

2020-11-01 Thread Xiao Ni




On 10/23/2020 11:31 AM, Christopher Unkel wrote:

Hello all,

While investigating some performance issues on mdraid 10 volumes
formed with "512e" disks (4k native/physical sector size but with 512
byte sector emulation), I've found two cases where mdraid will
needlessly issue writes that start on 4k byte boundary, but are are
shorter than 4k:

1. writes of the raid superblock; and
2. writes of the last page of the write-intent bitmap.

The following is an excerpt of a blocktrace of one of the component
members of a mdraid 10 volume during a 4k write near the end of the
array:

   8,32  112 0.01687   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  115 0.001454119   711  D  WS 2056 + 1 [kworker/11:1H]
* 8,32  118 0.002847204   711  D  WS 2080 + 7 [kworker/11:1H]
   8,32  11   11 0.003700545  3094  D  WS 11721043920 + 8 [md127_raid1]
   8,32  11   14 0.308785692   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  11   17 0.310201697   711  D  WS 2056 + 1 [kworker/11:1H]
   8,32  11   20 5.500799245   711  D  WS 2064 + 8 [kworker/11:1H]
* 8,32  11   2315.740923558   711  D  WS 2080 + 7 [kworker/11:1H]

Note the starred transactions, which each start on a 4k boundary, but
are less than 4k in length, and so will use the 512-byte emulation.
Sector 2056 holds the superblock, and is written as a single 512-byte
write.  Sector 2086 holds the bitmap bit relevant to the written
sector.  When it is written the active bits of the last page of the
bitmap are written, starting at sector 2080, padded out to the end of
the 512-byte logical sector as required.  This results in a 3.5kb
write, again using the 512-byte emulation.


Hi Christopher

Which superblock version do you use? If it's super1.1, superblock starts 
at 0 sector.
If it's super1.2, superblock starts at 8 sector. If it's super1.0, 
superblock starts at the
end of device and bitmap is before superblock. As mentioned above, 
bitmap is behind
the superblock, so it should not be super1.0. So I have a question why 
does 2056 hold

the superblock?

Regards
Xiao



Note that in some arrays the last page of the bitmap may be
sufficiently full that they are not affected by the issue with the
bitmap write.

As there can be a substantial penalty to using the 512-byte sector
emulation (turning writes into read-modify writes if the relevant
sector is not in the drive's cache) I believe it makes sense to pad
these writes out to a 4k boundary.  The writes are already padded out
for "4k native" drives, where the short access is illegal.

The following patch set changes the superblock and bitmap writes to
respect the physical block size (e.g. 4k for today's 512e drives) when
possible.  In each case there is already logic for padding out to the
underlying logical sector size.  I reuse or repeat the logic for
padding out to the physical sector size, but treat the padding out as
optional rather than mandatory.

The corresponding block trace with these patches is:

8,32   12 0.03410   694  D  WS 2064 + 8 [kworker/1:1H]
8,32   15 0.001368788   694  D  WS 2056 + 8 [kworker/1:1H]
8,32   18 0.002727981   694  D  WS 2080 + 8 [kworker/1:1H]
8,32   1   11 0.003533831  3063  D  WS 11721043920 + 8 [md127_raid1]
8,32   1   14 0.253952321   694  D  WS 2064 + 8 [kworker/1:1H]
8,32   1   17 0.255354215   694  D  WS 2056 + 8 [kworker/1:1H]
8,32   1   20 5.337938486   694  D  WS 2064 + 8 [kworker/1:1H]
8,32   1   2315.577963062   694  D  WS 2080 + 8 [kworker/1:1H]

I do notice that the code for bitmap writes has a more sophisticated
and thorough check for overlap than the code for superblock writes.
(Compare write_sb_page in md-bitmap.c vs. super_1_load in md.c.) From
what I know since the various structures starts have always been 4k
aligned anyway, it is always safe to pad the superblock write out to
4k (as occurs on 4k native drives) but not necessarily futher.

Feedback appreciated.

   --Chris


Christopher Unkel (3):
   md: align superblock writes to physical blocks
   md: factor sb write alignment check into function
   md: pad writes to end of bitmap to physical blocks

  drivers/md/md-bitmap.c | 80 +-
  drivers/md/md.c| 15 
  2 files changed, 63 insertions(+), 32 deletions(-)





Re: [PATCH] clk: rockchip: Fix overflow rate during fractional approximation

2020-09-14 Thread Finley Xiao
ommit:

commit <88a5404a2277> ("clk: rockchip: fix up the
rockchip_fractional_approximation")

commit <4186a0e4239b> ("clk: rockchip: Add supprot to limit input rate
for fractional divider")

Signed-off-by: Jagan Teki 
Signed-off-by: Finley Xiao 
---
  drivers/clk/rockchip/clk-px30.c | 12 ++--
  drivers/clk/rockchip/clk.c  |  9 +
  2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/clk/rockchip/clk-px30.c b/drivers/clk/rockchip/clk-px30.c
index 6fb9c98b7d24..06d3ff39d12f 100644
--- a/drivers/clk/rockchip/clk-px30.c
+++ b/drivers/clk/rockchip/clk-px30.c
@@ -660,7 +660,7 @@ static struct rockchip_clk_branch px30_clk_branches[] 
__initdata = {
COMPOSITE(SCLK_UART1_SRC, "clk_uart1_src", mux_uart_src_p, 
CLK_SET_RATE_NO_REPARENT,
PX30_CLKSEL_CON(34), 14, 2, MFLAGS, 0, 5, DFLAGS,
PX30_CLKGATE_CON(10), 12, GFLAGS),
-   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart1_np5", "clk_uart1_src", 0,
+   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart1_np5", "clk_uart1_src", 
CLK_SET_RATE_PARENT,
PX30_CLKSEL_CON(35), 0, 5, DFLAGS,
PX30_CLKGATE_CON(10), 13, GFLAGS),
COMPOSITE_FRACMUX(0, "clk_uart1_frac", "clk_uart1_src", 
CLK_SET_RATE_PARENT,
@@ -673,7 +673,7 @@ static struct rockchip_clk_branch px30_clk_branches[] 
__initdata = {
COMPOSITE(SCLK_UART2_SRC, "clk_uart2_src", mux_uart_src_p, 0,
PX30_CLKSEL_CON(37), 14, 2, MFLAGS, 0, 5, DFLAGS,
PX30_CLKGATE_CON(11), 0, GFLAGS),
-   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart2_np5", "clk_uart2_src", 0,
+   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart2_np5", "clk_uart2_src", 
CLK_SET_RATE_PARENT,
PX30_CLKSEL_CON(38), 0, 5, DFLAGS,
PX30_CLKGATE_CON(11), 1, GFLAGS),
COMPOSITE_FRACMUX(0, "clk_uart2_frac", "clk_uart2_src", 
CLK_SET_RATE_PARENT,
@@ -686,7 +686,7 @@ static struct rockchip_clk_branch px30_clk_branches[] 
__initdata = {
COMPOSITE(0, "clk_uart3_src", mux_uart_src_p, 0,
PX30_CLKSEL_CON(40), 14, 2, MFLAGS, 0, 5, DFLAGS,
PX30_CLKGATE_CON(11), 4, GFLAGS),
-   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart3_np5", "clk_uart3_src", 0,
+   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart3_np5", "clk_uart3_src", 
CLK_SET_RATE_PARENT,
PX30_CLKSEL_CON(41), 0, 5, DFLAGS,
PX30_CLKGATE_CON(11), 5, GFLAGS),
COMPOSITE_FRACMUX(0, "clk_uart3_frac", "clk_uart3_src", 
CLK_SET_RATE_PARENT,
@@ -699,7 +699,7 @@ static struct rockchip_clk_branch px30_clk_branches[] 
__initdata = {
COMPOSITE(0, "clk_uart4_src", mux_uart_src_p, 0,
PX30_CLKSEL_CON(43), 14, 2, MFLAGS, 0, 5, DFLAGS,
PX30_CLKGATE_CON(11), 8, GFLAGS),
-   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart4_np5", "clk_uart4_src", 0,
+   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart4_np5", "clk_uart4_src", 
CLK_SET_RATE_PARENT,
PX30_CLKSEL_CON(44), 0, 5, DFLAGS,
PX30_CLKGATE_CON(11), 9, GFLAGS),
COMPOSITE_FRACMUX(0, "clk_uart4_frac", "clk_uart4_src", 
CLK_SET_RATE_PARENT,
@@ -712,7 +712,7 @@ static struct rockchip_clk_branch px30_clk_branches[] 
__initdata = {
COMPOSITE(0, "clk_uart5_src", mux_uart_src_p, 0,
PX30_CLKSEL_CON(46), 14, 2, MFLAGS, 0, 5, DFLAGS,
PX30_CLKGATE_CON(11), 12, GFLAGS),
-   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart5_np5", "clk_uart5_src", 0,
+   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart5_np5", "clk_uart5_src", 
CLK_SET_RATE_PARENT,
PX30_CLKSEL_CON(47), 0, 5, DFLAGS,
PX30_CLKGATE_CON(11), 13, GFLAGS),
COMPOSITE_FRACMUX(0, "clk_uart5_frac", "clk_uart5_src", 
CLK_SET_RATE_PARENT,
@@ -934,7 +934,7 @@ static struct rockchip_clk_branch px30_clk_pmu_branches[] 
__initdata = {
COMPOSITE(0, "clk_uart0_pmu_src", mux_uart_src_p, 0,
PX30_PMU_CLKSEL_CON(3), 14, 2, MFLAGS, 0, 5, DFLAGS,
PX30_PMU_CLKGATE_CON(1), 0, GFLAGS),
-   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart0_np5", "clk_uart0_pmu_src", 0,
+   COMPOSITE_NOMUX_HALFDIV(0, "clk_uart0_np5", "clk_uart0_pmu_src", 
CLK_SET_RATE_PARENT,
PX30_PMU_CLKSEL_CON(4), 0, 5, DFLAGS,
PX30_PMU_CLKGATE_CON(1), 1, GFLAGS),
COMPOSITE_FRACMUX(0, "clk_uart0_frac", "clk_uart0_pmu_src",

[PATCH v1] thermal/of: Introduce k-po, k-pu and k-i for a thermal zone

2020-08-11 Thread Finley Xiao
The default value for k_pu is:
2 * sustainable_power / (desired_temperature - switch_on_temp)
The default value for k_po is:
sustainable_power / (desired_temperature - switch_on_temp)
The default value for k_i is 10.

Even though these parameters of the PID controller can be changed
by the following sysfs files:
/sys/class/thermal/thermal_zoneX/k_pu
/sys/class/thermal/thermal_zoneX/k_po
/sys/class/thermal/thermal_zoneX/k_i

But it's still more convenient to change the default values by devicetree,
so introduce these three optional properties. If provided these properties,
they will be parsed and associated with the thermal zone via the thermal
zone parameters.

Signed-off-by: Finley Xiao 
---
 Documentation/devicetree/bindings/thermal/thermal.txt | 14 ++
 drivers/thermal/thermal_of.c  |  7 +++
 2 files changed, 21 insertions(+)

diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt 
b/Documentation/devicetree/bindings/thermal/thermal.txt
index f78bec19ca35..ebe936b57ded 100644
--- a/Documentation/devicetree/bindings/thermal/thermal.txt
+++ b/Documentation/devicetree/bindings/thermal/thermal.txt
@@ -165,6 +165,20 @@ Optional property:
2000mW, while on a 10'' tablet is around
4500mW.
 
+- k-po:Proportional parameter of the PID controller 
when
+   current temperature is above the target.
+  Type: signed
+  Size: one cell
+
+- k-pu:Proportional parameter of the PID controller 
when
+   current temperature is below the target.
+  Type: signed
+  Size: one cell
+
+- k-i: Integral parameter of the PID controller.
+  Type: signed
+  Size: one cell
+
 Note: The delay properties are bound to the maximum dT/dt (temperature
 derivative over time) in two situations for a thermal zone:
 (i)  - when passive cooling is activated (polling-delay-passive); and
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index ddf88dbe7ba2..b2a9f92cd8d2 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -1089,6 +1089,7 @@ int __init of_parse_thermal_zones(void)
struct thermal_zone_params *tzp;
int i, mask = 0;
u32 prop;
+   s32 sval;
 
tz = thermal_of_build_thermal_zone(child);
if (IS_ERR(tz)) {
@@ -1113,6 +1114,12 @@ int __init of_parse_thermal_zones(void)
 
if (!of_property_read_u32(child, "sustainable-power", ))
tzp->sustainable_power = prop;
+   if (!of_property_read_s32(child, "k-po", ))
+   tzp->k_po = sval;
+   if (!of_property_read_s32(child, "k-pu", ))
+   tzp->k_pu = sval;
+   if (!of_property_read_s32(child, "k-i", ))
+   tzp->k_i = sval;
 
for (i = 0; i < tz->ntrips; i++)
mask |= 1 << i;
-- 
2.11.0





[PATCH] thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power

2020-06-19 Thread Finley Xiao
The function cpu_power_to_freq is used to find a frequency and set the
cooling device to consume at most the power to be converted. For example,
if the power to be converted is 80mW, and the em table is as follow.
struct em_cap_state table[] = {
/* KHz mW */
{ 1008000, 36, 0 },
{ 120, 49, 0 },
{ 1296000, 59, 0 },
{ 1416000, 72, 0 },
{ 1512000, 86, 0 },
};
The target frequency should be 1416000KHz, not 1512000KHz.

Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables")
Cc:  # v4.13+
Signed-off-by: Finley Xiao 
Acked-by: Viresh Kumar 
---
 drivers/thermal/cpufreq_cooling.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c 
b/drivers/thermal/cpufreq_cooling.c
index 9e124020519f..6c0e1b053126 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -123,12 +123,12 @@ static u32 cpu_power_to_freq(struct 
cpufreq_cooling_device *cpufreq_cdev,
 {
int i;
 
-   for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-   if (power > cpufreq_cdev->em->table[i].power)
+   for (i = cpufreq_cdev->max_level; i >= 0; i--) {
+   if (power >= cpufreq_cdev->em->table[i].power)
break;
}
 
-   return cpufreq_cdev->em->table[i + 1].frequency;
+   return cpufreq_cdev->em->table[i].frequency;
 }
 
 /**
-- 
2.11.0





[PATCH] thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power

2020-06-18 Thread Finley Xiao
The function cpu_power_to_freq is used to find a frequency and set the
cooling device to consume at most the power to be converted. For example,
if the power to be converted is 80mW, and the em table is as follow.
struct em_cap_state table[] = {
/* KHz mW */
{ 1008000, 36, 0 },
{ 120, 49, 0 },
{ 1296000, 59, 0 },
{ 1416000, 72, 0 },
{ 1512000, 86, 0 },
};
The target frequency should be 1416000KHz, not 1512000KHz.

Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables")
Signed-off-by: Finley Xiao 
---
 drivers/thermal/cpufreq_cooling.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c 
b/drivers/thermal/cpufreq_cooling.c
index 9e124020519f..6c0e1b053126 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -123,12 +123,12 @@ static u32 cpu_power_to_freq(struct 
cpufreq_cooling_device *cpufreq_cdev,
 {
int i;
 
-   for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-   if (power > cpufreq_cdev->em->table[i].power)
+   for (i = cpufreq_cdev->max_level; i >= 0; i--) {
+   if (power >= cpufreq_cdev->em->table[i].power)
break;
}
 
-   return cpufreq_cdev->em->table[i + 1].frequency;
+   return cpufreq_cdev->em->table[i].frequency;
 }
 
 /**
-- 
2.11.0





Re: [PATCH] xfs/XXX: Add xfs/XXX

2020-06-02 Thread Xiao Yang

On 2020/6/3 2:14, Darrick J. Wong wrote:

On Tue, Jun 02, 2020 at 04:51:48PM +0800, Xiao Yang wrote:

On 2020/4/14 0:30, Darrick J. Wong wrote:

This might be a good time to introduce a few new helpers:

_require_scratch_dax ("Does $SCRATCH_DEV support DAX?")
_require_scratch_dax_mountopt ("Does the fs support the DAX mount options?")
_require_scratch_daX_iflag ("Does the fs support FS_XFLAG_DAX?")

Hi Darrick,

Now, I am trying to introduce these new helpers and have some questions:
1) There are five testcases related to old dax implementation, should we
only convert them to new dax implementation or make them compatible with old
and new dax implementation?


What is the 'old' DAX implementation?  ext2 XIP?

Hi Darrick,

Thanks for your quick feedback.

Right, the 'old' DAX implementation means old dax mount option(i.e. -o dax)

Compare new and old dax mount option on ext4 and xfs, is the following 
logic right?

-o dax=always == -o dax
-o dax=never == without dax
-o dax=inode == nothing

Of course, we should uses new option if ext4/xfs supports new dax mount 
option on distros.  But should we fallback to use old option if ext4/xfs 
doesn't support new dax mount option on some old distros?

btw:
it seems hard for testcases to use two different sets of mount 
options(i.e. old and new) so do you have any suggestion?





2) I think _require_xfs_io_command "chattr" "x" is enough to check if fs
supports FS_XFLAG_DAX.  Is it necessary to add _require_scratch_dax_iflag()?
like this:
_require_scratch_dax_iflag()
{
_require_xfs_io_command "chattr" "x"
}


I suggested that list based on the major control knobs that will be
visible to userspace programs.  Even if this is just a one-line helper,
its name is useful for recognizing which of those knobs we're looking
for.

Yes, you could probably save a trivial amount of time by skipping one
iteration of bash function calling, but now everyone has to remember
that the xfs_io chattr "x" flag means the dax inode flag, and not
confuse it for chmod +x or something else.


Got it, thanks for your detailed explanation.

Best Regards,
Xiao Yang


--D


Best Regards,
Xiao Yang





.







Re: [PATCH] xfs/XXX: Add xfs/XXX

2020-06-02 Thread Xiao Yang

On 2020/4/14 0:30, Darrick J. Wong wrote:

This might be a good time to introduce a few new helpers:

_require_scratch_dax ("Does $SCRATCH_DEV support DAX?")
_require_scratch_dax_mountopt ("Does the fs support the DAX mount options?")
_require_scratch_daX_iflag ("Does the fs support FS_XFLAG_DAX?")

Hi Darrick,

Now, I am trying to introduce these new helpers and have some questions:
1) There are five testcases related to old dax implementation, should we 
only convert them to new dax implementation or make them compatible with 
old and new dax implementation?


2) I think _require_xfs_io_command "chattr" "x" is enough to check if fs 
supports FS_XFLAG_DAX.  Is it necessary to add 
_require_scratch_dax_iflag()? like this:

_require_scratch_dax_iflag()
{
_require_xfs_io_command "chattr" "x"
}

Best Regards,
Xiao Yang




Re: [PATCH V4 6/8] fs/ext4: Make DAX mount option a tri-state

2020-06-01 Thread Xiao Yang

On 2020/5/28 17:41, Jan Kara wrote:

On Thu 28-05-20 16:56:51, Xiao Yang wrote:

On 2020/5/28 7:50, Ira Weiny wrote:

On Wed, May 27, 2020 at 01:54:54PM +0800, Xiao Yang wrote:

On 2020/5/22 3:13, ira.we...@intel.com wrote:

From: Ira Weiny

We add 'always', 'never', and 'inode' (default).  '-o dax' continues to
operate the same which is equivalent to 'always'.  This new
functionality is limited to ext4 only.

Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set
it and EXT4_MOUNT_DAX_ALWAYS appropriately for the mode.

We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX.

Finally, EXT4_MOUNT2_DAX_INODE is used solely to detect if the user
specified that option for printing.

Hi Ira,

I have two questions when reviewing this patch:
1) After doing mount with the same dax=inode option, ext4/xfs shows
differnt output(i.e. xfs doesn't print 'dax=inode'):
---
# mount -o dax=inode /dev/pmem0 /mnt/xfstests/test/
# mount | grep pmem0
/dev/pmem0 on /mnt/xfstests/test type ext4 (rw,relatime,seclabel,dax=inode)

# mount -odax=inode /dev/pmem1 /mnt/xfstests/scratch/
# mount | grep pmem1
/dev/pmem1 on /mnt/xfstests/scratch type xfs
(rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)

Is this expected output? why don't unify the output?


Correct. dax=inode is the default.  xfs treats that default the same whether
you specify it on the command line or not.

For ext4 Jan specifically asked that if the user specified dax=inode on the
command line that it be printed on the mount options.  If you don't specify
anything then dax=inode is in effect but ext4 will not print anything.

I had the behavior the same as XFS originally but Jan wanted it this way.  The
XFS behavior is IMO better and is what the new mount infrastructure gives by
default.


Could we unify the output?  It is strange for me to use differnt output on
ext4 and xfs.


If we'd unify the output with XFS, it would be inconsistent with all the
other ext4 mount options. So I disagree with that. I agree it is not ideal
to have different behavior between xfs and ext4 but such is the historical
behavior. If we want to change that, we need to change the handling for all
the ext4 mount options. I'm open for that discussion but it is a problem
unrelated to this patch set.

Hi Jan,

Thanks for your quick feedback.
Of course, this doubt should not block the patch set.

Best Regards,
Xiao Yang


Honza






Re: [PATCH V4 6/8] fs/ext4: Make DAX mount option a tri-state

2020-05-28 Thread Xiao Yang

On 2020/5/28 7:50, Ira Weiny wrote:

On Wed, May 27, 2020 at 01:54:54PM +0800, Xiao Yang wrote:

On 2020/5/22 3:13, ira.we...@intel.com wrote:

From: Ira Weiny

We add 'always', 'never', and 'inode' (default).  '-o dax' continues to
operate the same which is equivalent to 'always'.  This new
functionality is limited to ext4 only.

Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set
it and EXT4_MOUNT_DAX_ALWAYS appropriately for the mode.

We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX.

Finally, EXT4_MOUNT2_DAX_INODE is used solely to detect if the user
specified that option for printing.

Hi Ira,

I have two questions when reviewing this patch:
1) After doing mount with the same dax=inode option, ext4/xfs shows
differnt output(i.e. xfs doesn't print 'dax=inode'):
---
# mount -o dax=inode /dev/pmem0 /mnt/xfstests/test/
# mount | grep pmem0
/dev/pmem0 on /mnt/xfstests/test type ext4 (rw,relatime,seclabel,dax=inode)

# mount -odax=inode /dev/pmem1 /mnt/xfstests/scratch/
# mount | grep pmem1
/dev/pmem1 on /mnt/xfstests/scratch type xfs
(rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)

Is this expected output? why don't unify the output?


Correct. dax=inode is the default.  xfs treats that default the same whether
you specify it on the command line or not.

For ext4 Jan specifically asked that if the user specified dax=inode on the
command line that it be printed on the mount options.  If you don't specify
anything then dax=inode is in effect but ext4 will not print anything.

I had the behavior the same as XFS originally but Jan wanted it this way.  The
XFS behavior is IMO better and is what the new mount infrastructure gives by
default.

Hi Ira,

Could we unify the output?  It is strange for me to use differnt output 
on ext4 and xfs.






2) Do mount without dax and mount with -odax=inode have the same behavior?


Yes.


---
# mount /dev/pmem0 /mnt/xfstests/test/
# mount | grep pmem0
/dev/pmem0 on /mnt/xfstests/test type ext4 (rw,relatime,seclabel)
# umount /mnt/xfstests/test
# mount -odax=inode /dev/pmem0 /mnt/xfstests/test/
# mount | grep pmem0
/dev/pmem0 on /mnt/xfstests/test type ext4 (rw,relatime,seclabel,dax=inode
---


These are the same behavior.  But because you specified the option it was
echoed in the second output.


Got it, thanks for your explanation.





BTW: I focus on the support of per-file/directory DAX operations recently.



Reviewed-by: Jan Kara
Signed-off-by: Ira Weiny

---
Changes from V1:
Fix up mounting options to only show an option if specified
Fix remount to prevent dax changes
Isolate behavior to ext4 only

Changes from RFC:
Combine remount check for DAX_NEVER with DAX_ALWAYS
Update ext4_should_enable_dax()
---
   fs/ext4/ext4.h  |  2 ++
   fs/ext4/inode.c |  2 ++
   fs/ext4/super.c | 67 +
   3 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f5291693ce6e..65ffb831b2b9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1168,6 +1168,8 @@ struct ext4_inode_info {
  blocks */
   #define EXT4_MOUNT2_HURD_COMPAT  0x0004 /* Support 
HURD-castrated
  file systems */
+#define EXT4_MOUNT2_DAX_NEVER  0x0008 /* Do not allow Direct 
Access */
+#define EXT4_MOUNT2_DAX_INODE  0x0010 /* For printing options only 
*/

   #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM0x0008 /* User 
explicitly
specified journal checksum */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 01636cf5f322..68fac9289109 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4402,6 +4402,8 @@ static bool ext4_should_enable_dax(struct inode *inode)
   {
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);

+   if (test_opt2(inode->i_sb, DAX_NEVER))
+   return false;
if (!S_ISREG(inode->i_mode))
return false;
if (ext4_should_journal_data(inode))
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 80eb814c47eb..5e056aa20ce9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1512,7 +1512,8 @@ enum {
Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota,
Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
-   Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version, Opt_dax,
+   Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+   Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn

Re: [PATCH V4 6/8] fs/ext4: Make DAX mount option a tri-state

2020-05-26 Thread Xiao Yang
t_delalloc, "delalloc"},
>   {Opt_warn_on_error, "warn_on_error"},
> @@ -1726,6 +1730,7 @@ static int clear_qf_name(struct super_block *sb, int 
> qtype)
>   #define MOPT_NO_EXT30x0200
>   #define MOPT_EXT4_ONLY  (MOPT_NO_EXT2 | MOPT_NO_EXT3)
>   #define MOPT_STRING 0x0400
> +#define MOPT_SKIP0x0800
> 
>   static const struct mount_opts {
>   int token;
> @@ -1775,7 +1780,13 @@ static const struct mount_opts {
>   {Opt_min_batch_time, 0, MOPT_GTE0},
>   {Opt_inode_readahead_blks, 0, MOPT_GTE0},
>   {Opt_init_itable, 0, MOPT_GTE0},
> - {Opt_dax, EXT4_MOUNT_DAX_ALWAYS, MOPT_SET},
> + {Opt_dax, EXT4_MOUNT_DAX_ALWAYS, MOPT_SET | MOPT_SKIP},
> + {Opt_dax_always, EXT4_MOUNT_DAX_ALWAYS,
> + MOPT_EXT4_ONLY | MOPT_SET | MOPT_SKIP},
> + {Opt_dax_inode, EXT4_MOUNT2_DAX_INODE,
> + MOPT_EXT4_ONLY | MOPT_SET | MOPT_SKIP},
> + {Opt_dax_never, EXT4_MOUNT2_DAX_NEVER,
> + MOPT_EXT4_ONLY | MOPT_SET | MOPT_SKIP},
>   {Opt_stripe, 0, MOPT_GTE0},
>   {Opt_resuid, 0, MOPT_GTE0},
>   {Opt_resgid, 0, MOPT_GTE0},
> @@ -2084,13 +2095,32 @@ static int handle_mount_opt(struct super_block *sb, 
> char *opt, int token,
>   }
>   sbi->s_jquota_fmt = m->mount_opt;
>   #endif
> - } else if (token == Opt_dax) {
> + } else if (token == Opt_dax || token == Opt_dax_always ||
> +token == Opt_dax_inode || token == Opt_dax_never) {
>   #ifdef CONFIG_FS_DAX
> - ext4_msg(sb, KERN_WARNING,
> - "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> - sbi->s_mount_opt |= m->mount_opt;
> + switch (token) {
> + case Opt_dax:
> + case Opt_dax_always:
> + ext4_msg(sb, KERN_WARNING,
> + "DAX enabled. Warning: EXPERIMENTAL, use at 
> your own risk");
> + sbi->s_mount_opt |= EXT4_MOUNT_DAX_ALWAYS;
> + sbi->s_mount_opt2&= ~EXT4_MOUNT2_DAX_NEVER;
> + break;
> + case Opt_dax_never:
> + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
> + sbi->s_mount_opt&= ~EXT4_MOUNT_DAX_ALWAYS;
> + break;
> + case Opt_dax_inode:
> + sbi->s_mount_opt&= ~EXT4_MOUNT_DAX_ALWAYS;
> + sbi->s_mount_opt2&= ~EXT4_MOUNT2_DAX_NEVER;
> + /* Strictly for printing options */
> + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_INODE;
> + break;
> + }
>   #else
>   ext4_msg(sb, KERN_INFO, "dax option not supported");
> + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER;
> + sbi->s_mount_opt&= ~EXT4_MOUNT_DAX_ALWAYS;
>   return -1;
>   #endif

For s_mount_opt/s_mount_opt2, could we make the code more readable by
using set_opt()/set_opt2()/clear_opt()/clear_opt2() macros?

Thanks,
Xiao Yang
>   } else if (token == Opt_data_err_abort) {
> @@ -2254,7 +2284,7 @@ static int _ext4_show_options(struct seq_file *seq, 
> struct super_block *sb,
>   for (m = ext4_mount_opts; m->token != Opt_err; m++) {
>   int want_set = m->flags&  MOPT_SET;
>   if (((m->flags&  (MOPT_SET|MOPT_CLEAR)) == 0) ||
> - (m->flags&  MOPT_CLEAR_ERR))
> + (m->flags&  MOPT_CLEAR_ERR) || m->flags&  MOPT_SKIP)
>   continue;
>   if (!nodefs&&  !(m->mount_opt&  (sbi->s_mount_opt ^ 
> def_mount_opt)))
>   continue; /* skip if same as the default */
> @@ -2314,6 +2344,17 @@ static int _ext4_show_options(struct seq_file *seq, 
> struct super_block *sb,
>   if (DUMMY_ENCRYPTION_ENABLED(sbi))
>   SEQ_OPTS_PUTS("test_dummy_encryption");
> 
> + if (test_opt(sb, DAX_ALWAYS)) {
> + if (IS_EXT2_SB(sb))
> + SEQ_OPTS_PUTS("dax");
> + else
> + SEQ_OPTS_PUTS("dax=always");
> + } else if (test_opt2(sb, DAX_NEVER)) {
> + SEQ_OPTS_PUTS("dax=never");
> + } else if (test_opt2(sb, DAX_INODE)) {
> + SEQ_OPTS_PUTS("dax=inode");
> + }
> +
>   ext4_show_quota_options(seq, sb);
>   return 0;
>   }
> @@ -5436,10 +5477,16 @@ static int ext4_remount(struct super_block *sb, int 
> *flags, char *data)
>   goto restore_opts;
>   }
> 
&

Re: [PATCH v2] tools/perf/util/*.c: Use "%zu" output format for size_t type

2020-05-11 Thread Xiao Yang
Hi,

Ping!

Thanks,
Xiao Yang
On 2020/2/26 9:08, Xiao Yang wrote:
> Avoid the following errors when building perf on i386:
> ---
> util/session.c: In function 'perf_session__process_compressed_event':
> util/session.c:91:11: error: format '%ld' expects argument of type 'long 
> int', but argument 4 has type 'size_t {aka unsigned int}' [-Werror=format=]
>pr_debug("decomp (B): %ld to %ld\n", src_size, decomp_size);
> ...
> til/zstd.c: In function 'zstd_decompress_stream':
> util/zstd.c:102:11: error: format '%ld' expects argument of type 'long int', 
> but argument 4 has type 'size_t {aka unsigned int}' [-Werror=format=]
>  pr_err("failed to decompress (B): %ld ->  %ld, dst_size %ld : %s\n",
> ...
> ---
> 
> Reported-by: kernel test robot
> Signed-off-by: Xiao Yang
> ---
>   tools/perf/util/session.c | 2 +-
>   tools/perf/util/zstd.c| 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index d0d7d25b23e3..55c3d2fefd41 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -88,7 +88,7 @@ static int perf_session__process_compressed_event(struct 
> perf_session *session,
>   session->decomp_last = decomp;
>   }
> 
> - pr_debug("decomp (B): %ld to %ld\n", src_size, decomp_size);
> + pr_debug("decomp (B): %zu to %zu\n", src_size, decomp_size);
> 
>   return 0;
>   }
> diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
> index d2202392ffdb..a051237cacf2 100644
> --- a/tools/perf/util/zstd.c
> +++ b/tools/perf/util/zstd.c
> @@ -99,7 +99,7 @@ size_t zstd_decompress_stream(struct zstd_data *data, void 
> *src, size_t src_size
>   while (input.pos<  input.size) {
>   ret = ZSTD_decompressStream(data->dstream,,);
>   if (ZSTD_isError(ret)) {
> - pr_err("failed to decompress (B): %ld ->  %ld, dst_size 
> %ld : %s\n",
> + pr_err("failed to decompress (B): %zu ->  %zu, dst_size 
> %zu : %s\n",
>  src_size, output.size, dst_size, 
> ZSTD_getErrorName(ret));
>   break;
>   }





Re: [PATCH 3/3] selftests/ftrace: Use /bin/echo instead of built-in echo

2020-05-11 Thread Xiao Yang

On 2020/5/7 17:15, Masami Hiramatsu wrote:

On Thu, 7 May 2020 14:45:16 +0800
Xiao Yang  wrote:


On 2020/5/1 21:38, Masami Hiramatsu wrote:

Since the built-in echo has different behavior in POSIX shell
(dash) and bash, we forcibly use /bin/echo -E (not interpret
backslash escapes) by default.

This also fixes some test cases which expects built-in
echo command.

Reported-by: Liu Yiding
Signed-off-by: Masami Hiramatsu
---
   tools/testing/selftests/ftrace/test.d/functions|3 +++
   .../test.d/trigger/trigger-trace-marker-hist.tc|2 +-
   .../trigger-trace-marker-synthetic-kernel.tc   |4 
   .../trigger/trigger-trace-marker-synthetic.tc  |4 ++--
   4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/functions 
b/tools/testing/selftests/ftrace/test.d/functions
index 5d4550591ff9..ea59b6ea2c3e 100644
--- a/tools/testing/selftests/ftrace/test.d/functions
+++ b/tools/testing/selftests/ftrace/test.d/functions
@@ -1,3 +1,6 @@
+# Since the built-in echo has different behavior in POSIX shell (dash) and
+# bash, we forcibly use /bin/echo -E (not interpret backslash escapes).
+alias echo="/bin/echo -E"

Hi Masami, Steven

It seems that only kprobe_syntax_errors.tc is impacted by the issue
currently.  Is it necessary for all tests to use /bin/echo and could we
just make kprobe_syntax_errors.tc use /bin/echo?


Yes, I would like to unify the "echo"'s behavior among the testcases
instead of patching each failure in the future.
Or would you have any concern on it?

Hi Masami,

Very sorry for the late reply.

We may not avoid fixing related failures after your change:
1) We have to reuse built-in echo (do alias echo=echo) if we want to 
test common_pid for histogram.
2) We have to reuse built-in echo if some new tests want to interpret 
backslash escapes in future.


Is it simple to provide two implementations of echo?(built-in echo and 
echo command?) and then just apply echo command for kprobe_syntax_errors.tc?


BTW: My suggestion may not be correct.

Best Regards,
Xiao Yang


Thank you,



Best Regards,
Xiao Yang



   clear_trace() { # reset trace output
   echo>   trace
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc
index ab6bedb25736..b3f70f53ee69 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc
@@ -30,7 +30,7 @@ fi

   echo "Test histogram trace_marker tigger"

-echo 'hist:keys=common_pid'>   events/ftrace/print/trigger
+echo 'hist:keys=ip'>   events/ftrace/print/trigger
   for i in `seq 1 10` ; do echo "hello">   trace_marker; done
   grep 'hitcount: *10$' events/ftrace/print/hist>   /dev/null || \
   fail "hist trigger did not trigger correct times on trace_marker"
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc
index 18b4d1c2807e..c1625d945f4d 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc
@@ -44,6 +44,10 @@ echo 'latency u64 lat'>   synthetic_events
   echo 'hist:keys=pid:ts0=common_timestamp.usecs'>   
events/sched/sched_waking/trigger
   echo 
'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).latency($lat)'>
   events/ftrace/print/trigger
   echo 'hist:keys=common_pid,lat:sort=lat'>   events/synthetic/latency/trigger
+
+# We have to use the built-in echo here because waking up pid must be same
+# as echoing pid.
+alias echo=echo
   sleep 1
   echo "hello">   trace_marker

diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc
index dd262d6d0db6..23e52c8d71de 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc
@@ -36,8 +36,8 @@ fi
   echo "Test histogram trace_marker to trace_marker latency histogram trigger"

   echo 'latency u64 lat'>   synthetic_events
-echo 'hist:keys=common_pid:ts0=common_timestamp.usecs if buf == "start"'>   
events/ftrace/print/trigger
-echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print).latency($lat) 
if buf == "end"'>>   events/ftrace/print/trigger
+echo 'hist:keys=ip:ts0=common_timestamp.usecs if buf == "start"'>   
events/ftrace/print/trigger
+echo 'hist:keys=ip:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print

Re: [PATCH] tracing: Wait for preempt irq delay thread to finish

2020-05-07 Thread Xiao Yang

Hi Steven,

Thanks for your further investigation.

I used the following ways to test your fix patch on my slow vm and 
didn't see any issue:

1) Insert and remove preemptirq_delay_test in loops.
2) Insert preemptirq_delay_test, write to 
/sys/kernel/preemptirq_delay_test/trigger and remove 
preemptirq_delay_test in loops.

3) Ran irqsoff_tracer.tc in loops.

BTW: For irqsoff_tracer.tc, should we extend code to test the burst 
feature and the sysfs trigger?


Reviewed-by: Xiao Yang 

Thanks,
Xiao Yang
On 2020/5/6 22:30, Steven Rostedt wrote:

From: "Steven Rostedt (VMware)"

Running on a slower machine, it is possible that the preempt delay kernel
thread may still be executing if the module was immediately removed after
added, and this can cause the kernel to crash as the kernel thread might be
executing after its code has been removed.

There's no reason that the caller of the code shouldn't just wait for the
delay thread to finish, as the thread can also be created by a trigger in
the sysfs code, which also has the same issues.

Link: http://lore.kernel.org/r/5ea2b0c8.2080...@cn.fujitsu.com

Cc: sta...@vger.kernel.org
Fixes: 793937236d1ee ("lib: Add module for testing preemptoff/irqsoff latency 
tracers")
Reported-by: Xiao Yang
Signed-off-by: Steven Rostedt (VMware)
---
  kernel/trace/preemptirq_delay_test.c | 30 ++--
  1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/preemptirq_delay_test.c 
b/kernel/trace/preemptirq_delay_test.c
index 31c0fad4cb9e..c4c86de63cf9 100644
--- a/kernel/trace/preemptirq_delay_test.c
+++ b/kernel/trace/preemptirq_delay_test.c
@@ -113,22 +113,42 @@ static int preemptirq_delay_run(void *data)

for (i = 0; i<  s; i++)
(testfuncs[i])(i);
+
+   set_current_state(TASK_INTERRUPTIBLE);
+   while (!kthread_should_stop()) {
+   schedule();
+   set_current_state(TASK_INTERRUPTIBLE);
+   }
+
+   __set_current_state(TASK_RUNNING);
+
return 0;
  }

-static struct task_struct *preemptirq_start_test(void)
+static int preemptirq_run_test(void)
  {
+   struct task_struct *task;
+
char task_name[50];

snprintf(task_name, sizeof(task_name), "%s_test", test_mode);
-   return kthread_run(preemptirq_delay_run, NULL, task_name);
+   task =  kthread_run(preemptirq_delay_run, NULL, task_name);
+   if (IS_ERR(task))
+   return PTR_ERR(task);
+   if (task)
+   kthread_stop(task);
+   return 0;
  }


  static ssize_t trigger_store(struct kobject *kobj, struct kobj_attribute 
*attr,
 const char *buf, size_t count)
  {
-   preemptirq_start_test();
+   ssize_t ret;
+
+   ret = preemptirq_run_test();
+   if (ret)
+   return ret;
return count;
  }

@@ -148,11 +168,9 @@ static struct kobject *preemptirq_delay_kobj;

  static int __init preemptirq_delay_init(void)
  {
-   struct task_struct *test_task;
int retval;

-   test_task = preemptirq_start_test();
-   retval = PTR_ERR_OR_ZERO(test_task);
+   retval = preemptirq_run_test();
if (retval != 0)
return retval;







Re: [PATCH 3/3] selftests/ftrace: Use /bin/echo instead of built-in echo

2020-05-07 Thread Xiao Yang

On 2020/5/1 21:38, Masami Hiramatsu wrote:

Since the built-in echo has different behavior in POSIX shell
(dash) and bash, we forcibly use /bin/echo -E (not interpret
backslash escapes) by default.

This also fixes some test cases which expects built-in
echo command.

Reported-by: Liu Yiding
Signed-off-by: Masami Hiramatsu
---
  tools/testing/selftests/ftrace/test.d/functions|3 +++
  .../test.d/trigger/trigger-trace-marker-hist.tc|2 +-
  .../trigger-trace-marker-synthetic-kernel.tc   |4 
  .../trigger/trigger-trace-marker-synthetic.tc  |4 ++--
  4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/functions 
b/tools/testing/selftests/ftrace/test.d/functions
index 5d4550591ff9..ea59b6ea2c3e 100644
--- a/tools/testing/selftests/ftrace/test.d/functions
+++ b/tools/testing/selftests/ftrace/test.d/functions
@@ -1,3 +1,6 @@
+# Since the built-in echo has different behavior in POSIX shell (dash) and
+# bash, we forcibly use /bin/echo -E (not interpret backslash escapes).
+alias echo="/bin/echo -E"

Hi Masami, Steven

It seems that only kprobe_syntax_errors.tc is impacted by the issue 
currently.  Is it necessary for all tests to use /bin/echo and could we 
just make kprobe_syntax_errors.tc use /bin/echo?


Best Regards,
Xiao Yang



  clear_trace() { # reset trace output
  echo>  trace
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc
index ab6bedb25736..b3f70f53ee69 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc
@@ -30,7 +30,7 @@ fi

  echo "Test histogram trace_marker tigger"

-echo 'hist:keys=common_pid'>  events/ftrace/print/trigger
+echo 'hist:keys=ip'>  events/ftrace/print/trigger
  for i in `seq 1 10` ; do echo "hello">  trace_marker; done
  grep 'hitcount: *10$' events/ftrace/print/hist>  /dev/null || \
  fail "hist trigger did not trigger correct times on trace_marker"
diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc
index 18b4d1c2807e..c1625d945f4d 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc
@@ -44,6 +44,10 @@ echo 'latency u64 lat'>  synthetic_events
  echo 'hist:keys=pid:ts0=common_timestamp.usecs'>  
events/sched/sched_waking/trigger
  echo 
'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).latency($lat)'>
  events/ftrace/print/trigger
  echo 'hist:keys=common_pid,lat:sort=lat'>  events/synthetic/latency/trigger
+
+# We have to use the built-in echo here because waking up pid must be same
+# as echoing pid.
+alias echo=echo
  sleep 1
  echo "hello">  trace_marker

diff --git 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc
 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc
index dd262d6d0db6..23e52c8d71de 100644
--- 
a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc
+++ 
b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc
@@ -36,8 +36,8 @@ fi
  echo "Test histogram trace_marker to trace_marker latency histogram trigger"

  echo 'latency u64 lat'>  synthetic_events
-echo 'hist:keys=common_pid:ts0=common_timestamp.usecs if buf == "start"'>  
events/ftrace/print/trigger
-echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print).latency($lat) 
if buf == "end"'>>  events/ftrace/print/trigger
+echo 'hist:keys=ip:ts0=common_timestamp.usecs if buf == "start"'>  
events/ftrace/print/trigger
+echo 'hist:keys=ip:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print).latency($lat) if buf 
== "end"'>>  events/ftrace/print/trigger
  echo 'hist:keys=common_pid,lat:sort=lat'>  events/synthetic/latency/trigger
  echo -n "start">  trace_marker
  echo -n "end">  trace_marker



.







Re: [PATCH] kernel/trace: Stop and wait for kthread on preempt irq module unload

2020-04-29 Thread Xiao Yang

On 2020/4/28 22:45, Steven Rostedt wrote:

diff --git a/kernel/trace/preemptirq_delay_test.c 
b/kernel/trace/preemptirq_delay_test.c

 index 1c28ca20e30b..6d9131ae7e8c 100644
 --- a/kernel/trace/preemptirq_delay_test.c
 +++ b/kernel/trace/preemptirq_delay_test.c
 @@ -113,15 +113,27 @@ static int preemptirq_delay_run(void *data)

for (i = 0; i<  s; i++)
(testfuncs[i])(i);
 +
 +  while (!kthread_should_stop()) {
 +  schedule();
 +  set_current_state(TASK_INTERRUPTIBLE);
 +  }
 +
 +  __set_current_state(TASK_RUNNING);
 +
return 0;
   }


Hi Steven,

Thanks for your patch.

I also used the following steps to do test and didn't get any 
warning/panic after applying your patch.

-
for i in $(seq 1 100); do modprobe preemptirq_delay_test test_mode=irq 
delay=50; rmmod preemptirq_delay_test; done
for i in $(seq 1 100); do modprobe preemptirq_delay_test 
test_mode=preempt delay=50; rmmod preemptirq_delay_test; done

-

But I am not sure which fix(from you and Joel) is better.

Thanks,
Xiao Yang





Re: [PATCH] kernel/trace: Stop and wait for kthread on preempt irq module unload

2020-04-29 Thread Xiao Yang

On 2020/4/28 22:15, Joel Fernandes wrote:

I am wondering if it is because in your test, the kthread exits too quickly.
We have these comments in kthread_stop():
  * If threadfn() may call do_exit() itself, the caller must ensure
  * task_struct can't go away.

Does the below diff on top of the previous patch help?

---8<---

diff --git a/kernel/trace/preemptirq_delay_test.c 
b/kernel/trace/preemptirq_delay_test.c
index 1c28ca20e30b6..8051946a18989 100644
--- a/kernel/trace/preemptirq_delay_test.c
+++ b/kernel/trace/preemptirq_delay_test.c
@@ -152,6 +152,8 @@ static int __init preemptirq_delay_init(void)
int retval;

test_task = preemptirq_start_test();
+   get_task_struct(test_task);
+
retval = PTR_ERR_OR_ZERO(test_task);
if (retval != 0)
return retval;
@@ -172,8 +174,10 @@ static void __exit preemptirq_delay_exit(void)
  {
kobject_put(preemptirq_delay_kobj);

-   if (test_task)
+   if (test_task) {
kthread_stop(test_task);
+   put_task_struct(test_task);
+   }
  }


Hi Joel,

Thanks for your additional patch.

First, We have to avoid kbuild error by including 
---
kernel/trace/preemptirq_delay_test.c: In function ‘preemptirq_delay_init’:
kernel/trace/preemptirq_delay_test.c:155:2: error: implicit declaration 
of function ‘get_task_struct’; did you mean ‘set_task_cpu’? 
[-Werror=implicit-function-declaration]

  get_task_struct(test_task);
  ^~~
  set_task_cpu
kernel/trace/preemptirq_delay_test.c: In function ‘preemptirq_delay_exit’:
kernel/trace/preemptirq_delay_test.c:179:3: error: implicit declaration 
of function ‘put_task_struct’; did you mean ‘set_task_cpu’? 
[-Werror=implicit-function-declaration]

   put_task_struct(test_task);
   ^~~
   set_task_cpu
cc1: some warnings being treated as errors
---

Second, I used the following steps to do test and didn't get any 
warning/panic after applying your additional patch:

---
for i in $(seq 1 100); do modprobe preemptirq_delay_test test_mode=irq 
delay=50; rmmod preemptirq_delay_test; done
for i in $(seq 1 100); do modprobe preemptirq_delay_test 
test_mode=preempt delay=50; rmmod preemptirq_delay_test; done

---

Thanks,
Xiao Yang




Re: [PATCH] kernel/trace: Stop and wait for kthread on preempt irq module unload

2020-04-29 Thread Xiao Yang
[  178.641300] Internal error: Oops: 9604 [#1] PREEMPT SMP
[  178.641931] Modules linked in: sunrpc vfat fat ext4 mbcache jbd2
crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ip_tables xfs
libcrc32c virtio_net net_failover virtio_mmio failover virtio_blk [last
unloaded: preemptirq_delay_test]
[  178.643547] CPU: 1 PID: 2394 Comm: pmlogger_check Tainted: GW
5.6.0-rc7+ #18
[  178.644219] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0
02/06/2015
[  178.644782] pstate: 6005 (nZCv daif -PAN -UAO)
[  178.645219] pc : __kmalloc+0xc8/0x2e8
[  178.645538] lr : __kmalloc+0x9c/0x2e8
[  178.645841] sp : fe001312fc20
[  178.646127] x29: fe001312fc20 x28: fc0042a0af00
[  178.646532] x27: fc005a7ad000 x26: cc79f001
[  178.647037] x25: fc0042a0af00 x24: fc007d76fc00
[  178.647431] x23: fe00103e95e4 x22: 001b
[  178.647895] x21: 0cc0 x20: fc007d76fc00
[  178.648276] x19: fc0043ca x18: 
[  178.648673] x17:  x16: 
[  178.649077] x15:  x14: 0010
[  178.649608] x13:  x12: 
[  178.649963] x11:  x10: 
[  178.650337] x9 :  x8 : 1f18
[  178.650683] x7 : 1f18 x6 : fc005a7ad1f8
[  178.651099] x5 :  x4 : 
[  178.651557] x3 :  x2 : fe001021d1e0
[  178.652006] x1 :  x0 : 0001
[  178.652407] Call trace:
[  178.652694]  __kmalloc+0xc8/0x2e8
[  178.652972]  load_elf_binary+0xdc/0xdb8
[  178.653286]  search_binary_handler+0x98/0x268
[  178.653626]  __do_execve_file+0x4ec/0x7c8
[  178.653985]  __arm64_sys_execve+0x40/0x50
[  178.654310]  do_el0_svc+0xf8/0x1e0
[  178.654578]  el0_sync_handler+0x134/0x1bc
[  178.654962]  el0_sync+0x140/0x180
[  178.655509] Code: b4001073 b9402281 b9401380 11000400 (f8616a7b)
[  178.656543] ---[ end trace 437c7bc9df5e92df ]---
[  178.657100] Kernel panic - not syncing: Fatal exception
[  178.657630] SMP: stopping secondary CPUs
[  178.658237] Kernel Offset: disabled
[  178.658644] CPU features: 0x10002,20006082
[  178.658903] Memory Limit: none
[  178.659263] ---[ end Kernel panic - not syncing: Fatal exception ]---
---

Thanks,
Xiao Yang
On 2020/4/28 18:19, Xiao Yang wrote:
> Hi Joel,
> 
> Thanks for your quick fix.
> 
> Unfortunately, it fixes my original panic but introduces other
> issues(two wanings and one panic) on my arm64 vm, as below:
> 
> [ 3465.434942] [ cut here ]
> [ 3465.435481] refcount_t: addition on 0; use-after-free.
> [ 3465.437071] WARNING: CPU: 1 PID: 6708 at lib/refcount.c:25
> refcount_warn_saturate+0x9c/0x140
> [ 3465.437720] Modules linked in: preemptirq_delay_test(O-) sunrpc vfat
> fat ext4 mbcache jbd2 crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce
> ip_tables xfs libcrc32c virtio_net net_failover failover virtio_mmio
> virtio_blk
> [ 3465.439787] CPU: 1 PID: 6708 Comm: rmmod Tainted: G   O
> 5.6.0-rc7+ #18
> [ 3465.440316] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0
> 02/06/2015
> [ 3465.440967] pstate: 4005 (nZcv daif -PAN -UAO)
> [ 3465.441297] pc : refcount_warn_saturate+0x9c/0x140
> [ 3465.441592] lr : refcount_warn_saturate+0x9c/0x140
> [ 3465.441919] sp : fe001382fd70
> [ 3465.442160] x29: fe001382fd70 x28: fc004876d200
> [ 3465.442649] x27:  x26: 
> [ 3465.443071] x25:  x24: fe00115bbbf0
> [ 3465.443670] x23:  x22: 0200
> [ 3465.444194] x21: fe0011273988 x20: fc0063fdd228
> [ 3465.444576] x19: fc0063fdd200 x18: 0010
> [ 3465.444939] x17:  x16: 
> [ 3465.445329] x15:  x14: fe0011273988
> [ 3465.445698] x13: fe009382fa97 x12: fe001382fa9f
> [ 3465.446116] x11: fe00112b x10: fe001382fa20
> [ 3465.446498] x9 : ffd0 x8 : 6572662d72657466
> [ 3465.446941] x7 : 0149 x6 : fe001127cf50
> [ 3465.447375] x5 : fe001127c000 x4 : 
> [ 3465.447757] x3 : fe001127cf50 x2 : 
> [ 3465.448161] x1 : 2e36d2803fe6b700 x0 : 
> [ 3465.448702] Call trace:
> [ 3465.448979]  refcount_warn_saturate+0x9c/0x140
> [ 3465.449330]  kthread_stop+0x48/0x278
> [ 3465.450144]  preemptirq_delay_exit+0x28/0xfc8c [preemptirq_delay_test]
> [ 3465.450625]  __arm64_sys_delete_module+0x14c/0x298
> [ 3465.450998]  do_el0_svc+0xf8/0x1e0
> [ 3465.451372]  el0_sync_handler+0x134/0x1bc
> [ 3465.451701]  el0_sync+0x140/0x180
> [ 3465.452099] ---[ end trace 1a8ec2201af5e8c7 ]---
> [ 3465.478208] ---

Re: [PATCH] selftests/ftrace: treat module requirement unmet situation as unsupported

2020-04-29 Thread Xiao Yang
Hi Lin,

It looks fine to me.
Reviewed-by: Xiao Yang 

Thanks,
Xiao Yang
On 2020/4/29 17:50, Po-Hsu Lin wrote:
> When the required module for the test does not exist, use
> exit_unsupported instead of exit_unresolved to indicate this test is
> not supported.
> 
> By doing this we can make test behaviour in sync with the
> irqsoff_tracer.tc test in preemptirq, which is also treating module
> existence in this way. Moreover, the test won't exit with a non-zero
> return value if the module does not exist.
> 
> Fixes: 646f01ccdd59 ("ftrace/selftest: Add tests to test 
> register_ftrace_direct()")
> Fixes: 4d23e9b4fd2e ("selftests/ftrace: Add trace_printk sample module test")
> Fixes: 7bc026d6c032 ("selftests/ftrace: Add function filter on module 
> testcase")
> Fixes: af2a0750f374 ("selftests/ftrace: Improve kprobe on module testcase to 
> load/unload module")
> Signed-off-by: Po-Hsu Lin
> ---
>   tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc  | 2 +-
>   tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc  | 2 +-
>   tools/testing/selftests/ftrace/test.d/event/trace_printk.tc| 2 +-
>   tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc | 2 +-
>   tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc  | 2 +-
>   5 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc 
> b/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc
> index d75a869..3d6189e 100644
> --- a/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc
> +++ b/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc
> @@ -5,7 +5,7 @@
>   rmmod ftrace-direct ||:
>   if ! modprobe ftrace-direct ; then
> echo "No ftrace-direct sample module - please make 
> CONFIG_SAMPLE_FTRACE_DIRECT=m"
> -  exit_unresolved;
> +  exit_unsupported;
>   fi
> 
>   echo "Let the module run a little"
> diff --git a/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc 
> b/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc
> index 801ecb6..3d0e3ca 100644
> --- a/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc
> +++ b/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc
> @@ -5,7 +5,7 @@
>   rmmod ftrace-direct ||:
>   if ! modprobe ftrace-direct ; then
> echo "No ftrace-direct sample module - please build with 
> CONFIG_SAMPLE_FTRACE_DIRECT=m"
> -  exit_unresolved;
> +  exit_unsupported;
>   fi
> 
>   if [ ! -f kprobe_events ]; then
> diff --git a/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc 
> b/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc
> index b02550b..dd8b10d 100644
> --- a/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc
> +++ b/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc
> @@ -5,7 +5,7 @@
>   rmmod trace-printk ||:
>   if ! modprobe trace-printk ; then
> echo "No trace-printk sample module - please make 
> CONFIG_SAMPLE_TRACE_PRINTK=m"
> -  exit_unresolved;
> +  exit_unsupported;
>   fi
> 
>   echo "Waiting for irq work"
> diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc 
> b/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc
> index 1a4b4a4..26dc06a 100644
> --- a/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc
> +++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc
> @@ -13,7 +13,7 @@ echo '*:mod:trace_printk'>  set_ftrace_filter
>   if ! modprobe trace-printk ; then
> echo "No trace-printk sample module - please make 
> CONFIG_SAMPLE_TRACE_PRINTK=
>   m"
> -  exit_unresolved;
> +  exit_unsupported;
>   fi
> 
>   : "Wildcard should be resolved after loading module"
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc 
> b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc
> index d861bd7..4e07c69 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc
> @@ -8,7 +8,7 @@ rmmod trace-printk ||:
>   if ! modprobe trace-printk ; then
> echo "No trace-printk sample module - please make 
> CONFIG_SAMPLE_TRACE_PRINTK=
>   m"
> -  exit_unresolved;
> +  exit_unsupported;
>   fi
> 
>   MOD=trace_printk





Re: [PATCH] kernel/trace: Stop and wait for kthread on preempt irq module unload

2020-04-28 Thread Xiao Yang
: 9606 [#1] PREEMPT SMP
[ 3465.509527] Modules linked in: preemptirq_delay_test(O-) sunrpc vfat
fat ext4 mbcache jbd2 crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce
ip_tables xfs libcrc32c virtio_net net_failover failover virtio_mmio
virtio_blk
[ 3465.510964] CPU: 1 PID: 6708 Comm: rmmod Tainted: GW  O
5.6.0-rc7+ #18
[ 3465.511527] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0
02/06/2015
[ 3465.512031] pstate: 6085 (nZCv daIf -PAN -UAO)
[ 3465.512392] pc : __list_add_valid+0x18/0xa0
[ 3465.512740] lr : wait_for_completion+0xc8/0x178
[ 3465.513085] sp : fe001382fd00
[ 3465.513362] x29: fe001382fd00 x28: fc004876d200
[ 3465.513769] x27:  x26: 
[ 3465.514205] x25:  x24: fc0076e616c8
[ 3465.514638] x23: fe001382fd68 x22: 
[ 3465.515030] x21: fe0011273988 x20: fc0076e616c0
[ 3465.515444] x19: fc0076e616b8 x18: 0010
[ 3465.515826] x17:  x16: 
[ 3465.516184] x15:  x14: fe0011273988
[ 3465.516584] x13: fe009382fa97 x12: fe001382fa9f
[ 3465.516976] x11: fe00112b x10: fe001382fa20
[ 3465.517351] x9 : ffd0 x8 : 6572662d72657466
[ 3465.517750] x7 : 0149 x6 : fe001127cf50
[ 3465.518169] x5 : 0001 x4 : fc0076e616c8
[ 3465.518454] x3 : fe0010128b38 x2 : 
[ 3465.518711] x1 :  x0 : fe001382fd68
[ 3465.518985] Call trace:
[ 3465.519157]  __list_add_valid+0x18/0xa0
[ 3465.519351]  wait_for_completion+0xc8/0x178
[ 3465.519578]  kthread_stop+0x9c/0x278
[ 3465.519779]  preemptirq_delay_exit+0x28/0xfc8c [preemptirq_delay_test]
[ 3465.520109]  __arm64_sys_delete_module+0x14c/0x298
[ 3465.520342]  do_el0_svc+0xf8/0x1e0
[ 3465.520520]  el0_sync_handler+0x134/0x1bc
[ 3465.520718]  el0_sync+0x140/0x180
[ 3465.521177] Code: 910003fd f9400442 eb01005f 54000141 (f9400041)
[ 3465.522258] ---[ end trace 1a8ec2201af5e8c9 ]---
[ 3465.522746] Kernel panic - not syncing: Fatal exception
[ 3465.523242] SMP: stopping secondary CPUs
[ 3465.523898] Kernel Offset: disabled
[ 3465.524423] CPU features: 0x10002,20006082
[ 3465.524939] Memory Limit: none
[ 3465.525534] ---[ end Kernel panic - not syncing: Fatal exception ]---


I am looking into these issues.

Thanks,
Xiao Yang
On 2020/4/25 6:36, Joel Fernandes (Google) wrote:
> Kthread running the test needs to be stopped or it can continue
> executing code unloaded by module causing a crash.
> 
> Suggested-by: Steven Rostedt
> Reported-by: Xiao Yang
> Link: http://lore.kernel.org/r/5ea2b0c8.2080...@cn.fujitsu.com
> Signed-off-by: Joel Fernandes (Google)
> ---
>   kernel/trace/preemptirq_delay_test.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/preemptirq_delay_test.c 
> b/kernel/trace/preemptirq_delay_test.c
> index 31c0fad4cb9e1..1c28ca20e30b6 100644
> --- a/kernel/trace/preemptirq_delay_test.c
> +++ b/kernel/trace/preemptirq_delay_test.c
> @@ -145,10 +145,10 @@ static struct attribute_group attr_group = {
>   };
> 
>   static struct kobject *preemptirq_delay_kobj;
> +static struct task_struct *test_task;
> 
>   static int __init preemptirq_delay_init(void)
>   {
> - struct task_struct *test_task;
>   int retval;
> 
>   test_task = preemptirq_start_test();
> @@ -171,6 +171,9 @@ static int __init preemptirq_delay_init(void)
>   static void __exit preemptirq_delay_exit(void)
>   {
>   kobject_put(preemptirq_delay_kobj);
> +
> + if (test_task)
> + kthread_stop(test_task);
>   }
> 
>   module_init(preemptirq_delay_init)





[PATCH v1 3/3] clk: rockchip: Add clock controller for the RK3308

2019-09-03 Thread Finley Xiao
Add the clock tree definition for the new RK3308 SoC.

Signed-off-by: Finley Xiao 
---
 drivers/clk/rockchip/Makefile |   1 +
 drivers/clk/rockchip/clk-rk3308.c | 955 ++
 drivers/clk/rockchip/clk.h|  13 +
 3 files changed, 969 insertions(+)
 create mode 100644 drivers/clk/rockchip/clk-rk3308.c

diff --git a/drivers/clk/rockchip/Makefile b/drivers/clk/rockchip/Makefile
index ff35ab463a6f..7c5b5813a87c 100644
--- a/drivers/clk/rockchip/Makefile
+++ b/drivers/clk/rockchip/Makefile
@@ -20,6 +20,7 @@ obj-y += clk-rk3128.o
 obj-y  += clk-rk3188.o
 obj-y  += clk-rk3228.o
 obj-y  += clk-rk3288.o
+obj-y  += clk-rk3308.o
 obj-y  += clk-rk3328.o
 obj-y  += clk-rk3368.o
 obj-y  += clk-rk3399.o
diff --git a/drivers/clk/rockchip/clk-rk3308.c 
b/drivers/clk/rockchip/clk-rk3308.c
new file mode 100644
index ..b0baf87a283e
--- /dev/null
+++ b/drivers/clk/rockchip/clk-rk3308.c
@@ -0,0 +1,955 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2019 Rockchip Electronics Co. Ltd.
+ * Author: Finley Xiao 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "clk.h"
+
+#define RK3308_GRF_SOC_STATUS0 0x380
+
+enum rk3308_plls {
+   apll, dpll, vpll0, vpll1,
+};
+
+static struct rockchip_pll_rate_table rk3308_pll_rates[] = {
+   /* _mhz, _refdiv, _fbdiv, _postdiv1, _postdiv2, _dsmpd, _frac */
+   RK3036_PLL_RATE(160800, 1, 67, 1, 1, 1, 0),
+   RK3036_PLL_RATE(158400, 1, 66, 1, 1, 1, 0),
+   RK3036_PLL_RATE(156000, 1, 65, 1, 1, 1, 0),
+   RK3036_PLL_RATE(153600, 1, 64, 1, 1, 1, 0),
+   RK3036_PLL_RATE(151200, 1, 63, 1, 1, 1, 0),
+   RK3036_PLL_RATE(148800, 1, 62, 1, 1, 1, 0),
+   RK3036_PLL_RATE(146400, 1, 61, 1, 1, 1, 0),
+   RK3036_PLL_RATE(144000, 1, 60, 1, 1, 1, 0),
+   RK3036_PLL_RATE(141600, 1, 59, 1, 1, 1, 0),
+   RK3036_PLL_RATE(139200, 1, 58, 1, 1, 1, 0),
+   RK3036_PLL_RATE(136800, 1, 57, 1, 1, 1, 0),
+   RK3036_PLL_RATE(134400, 1, 56, 1, 1, 1, 0),
+   RK3036_PLL_RATE(132000, 1, 55, 1, 1, 1, 0),
+   RK3036_PLL_RATE(129600, 1, 54, 1, 1, 1, 0),
+   RK3036_PLL_RATE(127200, 1, 53, 1, 1, 1, 0),
+   RK3036_PLL_RATE(124800, 1, 52, 1, 1, 1, 0),
+   RK3036_PLL_RATE(12, 1, 50, 1, 1, 1, 0),
+   RK3036_PLL_RATE(118800, 2, 99, 1, 1, 1, 0),
+   RK3036_PLL_RATE(110400, 1, 46, 1, 1, 1, 0),
+   RK3036_PLL_RATE(11, 12, 550, 1, 1, 1, 0),
+   RK3036_PLL_RATE(100800, 1, 84, 2, 1, 1, 0),
+   RK3036_PLL_RATE(10, 6, 500, 2, 1, 1, 0),
+   RK3036_PLL_RATE(98400, 1, 82, 2, 1, 1, 0),
+   RK3036_PLL_RATE(96000, 1, 80, 2, 1, 1, 0),
+   RK3036_PLL_RATE(93600, 1, 78, 2, 1, 1, 0),
+   RK3036_PLL_RATE(91200, 1, 76, 2, 1, 1, 0),
+   RK3036_PLL_RATE(9, 4, 300, 2, 1, 1, 0),
+   RK3036_PLL_RATE(88800, 1, 74, 2, 1, 1, 0),
+   RK3036_PLL_RATE(86400, 1, 72, 2, 1, 1, 0),
+   RK3036_PLL_RATE(84000, 1, 70, 2, 1, 1, 0),
+   RK3036_PLL_RATE(81600, 1, 68, 2, 1, 1, 0),
+   RK3036_PLL_RATE(8, 6, 400, 2, 1, 1, 0),
+   RK3036_PLL_RATE(7, 6, 350, 2, 1, 1, 0),
+   RK3036_PLL_RATE(69600, 1, 58, 2, 1, 1, 0),
+   RK3036_PLL_RATE(62400, 1, 52, 2, 1, 1, 0),
+   RK3036_PLL_RATE(6, 1, 75, 3, 1, 1, 0),
+   RK3036_PLL_RATE(59400, 2, 99, 2, 1, 1, 0),
+   RK3036_PLL_RATE(50400, 1, 63, 3, 1, 1, 0),
+   RK3036_PLL_RATE(5, 6, 250, 2, 1, 1, 0),
+   RK3036_PLL_RATE(40800, 1, 68, 2, 2, 1, 0),
+   RK3036_PLL_RATE(31200, 1, 52, 2, 2, 1, 0),
+   RK3036_PLL_RATE(21600, 1, 72, 4, 2, 1, 0),
+   RK3036_PLL_RATE(9600, 1, 64, 4, 4, 1, 0),
+   { /* sentinel */ },
+};
+
+#define RK3308_DIV_ACLKM_MASK  0x7
+#define RK3308_DIV_ACLKM_SHIFT 12
+#define RK3308_DIV_PCLK_DBG_MASK   0xf
+#define RK3308_DIV_PCLK_DBG_SHIFT  8
+
+#define RK3308_CLKSEL0(_aclk_core, _pclk_dbg)  \
+{  \
+   .reg = RK3308_CLKSEL_CON(0),\
+   .val = HIWORD_UPDATE(_aclk_core, RK3308_DIV_ACLKM_MASK, \
+RK3308_DIV_ACLKM_SHIFT) |  \
+  HIWORD_UPDATE(_pclk_dbg, RK3308_DIV_PCLK_DBG_MASK,   \
+RK3308_DIV_PCLK_DBG_SHIFT),\
+}
+
+#define RK3308_CPUCLK_RATE(_prate, _aclk_core, _pclk_dbg)  \
+{  \
+   .prate = _prate,\
+   .divs = {   \
+   RK3308_CLKSEL0(_aclk_core, _pclk_dbg),  \
+   },  \
+}

[PATCH v1 1/3] dt-bindings: Add bindings for rk3308 clock controller

2019-09-03 Thread Finley Xiao
Add devicetree bindings for Rockchip cru which found on
Rockchip SoCs.

Signed-off-by: Finley Xiao 
---
 .../devicetree/bindings/clock/rockchip,rk3308.txt  | 58 ++
 1 file changed, 58 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/rockchip,rk3308.txt

diff --git a/Documentation/devicetree/bindings/clock/rockchip,rk3308.txt 
b/Documentation/devicetree/bindings/clock/rockchip,rk3308.txt
new file mode 100644
index ..5d46c9b7f937
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/rockchip,rk3308.txt
@@ -0,0 +1,58 @@
+* Rockchip RK3308 Clock and Reset Unit
+
+The RK3308 clock controller generates and supplies clock to various
+controllers within the SoC and also implements a reset controller for SoC
+peripherals.
+
+Required Properties:
+
+- compatible: CRU should be "rockchip,rk3308-cru"
+- reg: physical base address of the controller and length of memory mapped
+  region.
+- #clock-cells: should be 1.
+- #reset-cells: should be 1.
+
+Optional Properties:
+
+- rockchip,grf: phandle to the syscon managing the "general register files"
+  If missing, pll rates are not changeable, due to the missing pll lock status.
+
+Each clock is assigned an identifier and client nodes can use this identifier
+to specify the clock which they consume. All available clocks are defined as
+preprocessor macros in the dt-bindings/clock/rk3308-cru.h headers and can be
+used in device tree sources. Similar macros exist for the reset sources in
+these files.
+
+External clocks:
+
+There are several clocks that are generated outside the SoC. It is expected
+that they are defined using standard clock bindings with following
+clock-output-names:
+ - "xin24m" - crystal input - required,
+ - "xin32k" - rtc clock - optional,
+ - "mclk_i2sx_xch_in" - external I2S or SPDIF clock - optional,
+ - "mac_clkin" - external MAC clock - optional
+
+Example: Clock controller node:
+
+   cru: clock-controller@ff50 {
+   compatible = "rockchip,rk3308-cru";
+   reg = <0x0 0xff50 0x0 0x1000>;
+   rockchip,grf = <>;
+   #clock-cells = <1>;
+   #reset-cells = <1>;
+   };
+
+Example: UART controller node that consumes the clock generated by the clock
+  controller:
+
+   uart0: serial@ff0a {
+   compatible = "rockchip,rk3308-uart", "snps,dw-apb-uart";
+   reg = <0x0 0xff0a 0x0 0x100>;
+   interrupts = ;
+   clocks = < SCLK_UART0>, < PCLK_UART0>;
+   clock-names = "baudclk", "apb_pclk";
+   reg-shift = <2>;
+   reg-io-width = <4>;
+   status = "disabled";
+   };
-- 
2.11.0





[PATCH v1 0/3] clk: rockchip: support clock controller for rk3308 SoC

2019-09-03 Thread Finley Xiao
Finley Xiao (3):
  dt-bindings: Add bindings for rk3308 clock controller
  clk: rockchip: Add dt-binding header for rk3308
  clk: rockchip: Add clock controller for the RK3308

 .../devicetree/bindings/clock/rockchip,rk3308.txt  |  58 ++
 drivers/clk/rockchip/Makefile  |   1 +
 drivers/clk/rockchip/clk-rk3308.c  | 955 +
 drivers/clk/rockchip/clk.h |  13 +
 include/dt-bindings/clock/rk3308-cru.h | 387 +
 5 files changed, 1414 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/clock/rockchip,rk3308.txt
 create mode 100644 drivers/clk/rockchip/clk-rk3308.c
 create mode 100644 include/dt-bindings/clock/rk3308-cru.h

-- 
2.11.0





[PATCH v1 2/3] clk: rockchip: Add dt-binding header for rk3308

2019-09-03 Thread Finley Xiao
Add the dt-bindings header for the rk3308, that gets shared between
the clock controller and the clock references in the dts.

Signed-off-by: Finley Xiao 
---
 include/dt-bindings/clock/rk3308-cru.h | 387 +
 1 file changed, 387 insertions(+)
 create mode 100644 include/dt-bindings/clock/rk3308-cru.h

diff --git a/include/dt-bindings/clock/rk3308-cru.h 
b/include/dt-bindings/clock/rk3308-cru.h
new file mode 100644
index ..d97840f9ee2e
--- /dev/null
+++ b/include/dt-bindings/clock/rk3308-cru.h
@@ -0,0 +1,387 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019 Rockchip Electronics Co. Ltd.
+ * Author: Finley Xiao 
+ */
+
+#ifndef _DT_BINDINGS_CLK_ROCKCHIP_RK3308_H
+#define _DT_BINDINGS_CLK_ROCKCHIP_RK3308_H
+
+/* core clocks */
+#define PLL_APLL   1
+#define PLL_DPLL   2
+#define PLL_VPLL0  3
+#define PLL_VPLL1  4
+#define ARMCLK 5
+
+/* sclk (special clocks) */
+#define USB480M14
+#define SCLK_RTC32K15
+#define SCLK_PVTM_CORE 16
+#define SCLK_UART0 17
+#define SCLK_UART1 18
+#define SCLK_UART2 19
+#define SCLK_UART3 20
+#define SCLK_UART4 21
+#define SCLK_I2C0  22
+#define SCLK_I2C1  23
+#define SCLK_I2C2  24
+#define SCLK_I2C3  25
+#define SCLK_PWM0  26
+#define SCLK_SPI0  27
+#define SCLK_SPI1  28
+#define SCLK_SPI2  29
+#define SCLK_TIMER030
+#define SCLK_TIMER131
+#define SCLK_TIMER232
+#define SCLK_TIMER333
+#define SCLK_TIMER434
+#define SCLK_TIMER535
+#define SCLK_TSADC 36
+#define SCLK_SARADC37
+#define SCLK_OTP   38
+#define SCLK_OTP_USR   39
+#define SCLK_CPU_BOOST 40
+#define SCLK_CRYPTO41
+#define SCLK_CRYPTO_APK42
+#define SCLK_NANDC_DIV 43
+#define SCLK_NANDC_DIV50   44
+#define SCLK_NANDC 45
+#define SCLK_SDMMC_DIV 46
+#define SCLK_SDMMC_DIV50   47
+#define SCLK_SDMMC 48
+#define SCLK_SDMMC_DRV 49
+#define SCLK_SDMMC_SAMPLE  50
+#define SCLK_SDIO_DIV  51
+#define SCLK_SDIO_DIV5052
+#define SCLK_SDIO  53
+#define SCLK_SDIO_DRV  54
+#define SCLK_SDIO_SAMPLE   55
+#define SCLK_EMMC_DIV  56
+#define SCLK_EMMC_DIV5057
+#define SCLK_EMMC  58
+#define SCLK_EMMC_DRV  59
+#define SCLK_EMMC_SAMPLE   60
+#define SCLK_SFC   61
+#define SCLK_OTG_ADP   62
+#define SCLK_MAC_SRC   63
+#define SCLK_MAC   64
+#define SCLK_MAC_REF   65
+#define SCLK_MAC_RX_TX 66
+#define SCLK_MAC_RMII  67
+#define SCLK_DDR_MON_TIMER 68
+#define SCLK_DDR_MON   69
+#define SCLK_DDRCLK70
+#define SCLK_PMU   71
+#define SCLK_USBPHY_REF72
+#define SCLK_WIFI  73
+#define SCLK_PVTM_PMU  74
+#define SCLK_PDM   75
+#define SCLK_I2S0_8CH_TX   76
+#define SCLK_I2S0_8CH_TX_OUT   77
+#define SCLK_I2S0_8CH_RX   78
+#define SCLK_I2S0_8CH_RX_OUT   79
+#define SCLK_I2S1_8CH_TX   80
+#define SCLK_I2S1_8CH_TX_OUT   81
+#define SCLK_I2S1_8CH_RX   82
+#define SCLK_I2S1_8CH_RX_OUT   83
+#define SCLK_I2S2_8CH_TX   84
+#define SCLK_I2S2_8CH_TX_OUT   85
+#define SCLK_I2S2_8CH_RX   86
+#define SCLK_I2S2_8CH_RX_OUT   87
+#define SCLK_I2S3_8CH_TX   88
+#define SCLK_I2S3_8CH_TX_OUT   89
+#define SCLK_I2S3_8CH_RX   90
+#define SCLK_I2S3_8CH_RX_OUT   91
+#define SCLK_I2S0_2CH  92
+#define SCLK_I2S0_2CH_OUT  93
+#define SCLK_I2S1_2CH  94
+#define SCLK_I2S1_2CH_OUT  95
+#define SCLK_SPDIF_TX_DIV  96
+#define SCLK_SPDIF_TX_DIV5097
+#define SCLK_SPDIF_TX  98
+#define SCLK_SPDIF_RX_DIV  99
+#define SCLK_SPDIF_RX_DIV50100
+#define SCLK_SPDIF_RX  101
+#define SCLK_I2S0_8CH_TX_MUX   102
+#define SCLK_I2S0_8CH_RX_MUX   103
+#define SCLK_I2S1_8CH_TX_MUX   104
+#define SCLK_I2S1_8CH_RX_MUX   105
+#define SCLK_I2S2_8CH_TX_MUX   106
+#define SCLK_I2S2_8CH_RX_MUX   107
+#define SCLK_I2S3_8CH_TX_MUX   108
+#define SCLK_I2S3_8CH_RX_MUX   109
+#define SCLK_I2S0_8CH_TX_SRC   110
+#define SCLK_I2S0_8CH_RX_SRC   111
+#define SCLK_I2S1_8CH_TX_SRC   112
+#define SCLK_I2S1_8CH_RX_SRC   113
+#define SCLK_I2S2_8CH_TX_SRC   114
+#define SCLK_I2S2_8CH_RX_SRC   115
+#define SCLK_I2S3_8CH_TX_SRC   116
+#define SCLK_I2S3_8CH_RX_SRC   117
+#define SCLK_I2S0_2CH_SRC  118
+#define SCLK_I2S1_2CH_SRC  119
+#define SCLK_PWM1  120
+#define SCLK_PWM2  121
+#define SCLK_OWIRE 122
+
+/* dclk */
+#define DCLK_VOP   125
+
+/* aclk */
+#define ACLK_BUS_SRC   130
+#define ACLK_BUS

[PATCH] kbuild: Remove unused variable TMPO of ld-option

2019-08-06 Thread Xiao Yang
ld-option implementation has been simplified so variable TMPO
is no longer needed.

Fixes: Commit 0294e6f4a000 ("kbuild: simplify ld-option implementation")
Signed-off-by: Xiao Yang 
---
 scripts/Kbuild.include | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 4b0432e095ae..13a6f627351d 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -91,12 +91,11 @@ TMPOUT := $(if $(KBUILD_EXTMOD),$(firstword 
$(KBUILD_EXTMOD))/)
 # automatically cleaned up.
 try-run = $(shell set -e;  \
TMP="$(TMPOUT)..tmp";   \
-   TMPO="$(TMPOUT)..o";\
if ($(1)) >/dev/null 2>&1;  \
then echo "$(2)";   \
else echo "$(3)";   \
fi; \
-   rm -f "$$TMP" "$$TMPO")
+   rm -f "$$TMP")
 
 # as-option
 # Usage: cflags-y += $(call as-option,-Wa$(comma)-isa=foo,)
-- 
2.21.0




Re: [PATCH v1 1/3] net: hisilicon: make hip04_tx_reclaim non-reentrant

2019-08-05 Thread Jiangfeng Xiao


On 2019/8/6 8:46, Jakub Kicinski wrote:
> On Sat, 3 Aug 2019 20:31:39 +0800, Jiangfeng Xiao wrote:
>> If hip04_tx_reclaim is interrupted while it is running
>> and then __napi_schedule continues to execute
>> hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs
>> and oops is generated. So you need to mask the interrupt
>> during the hip04_tx_reclaim run.
> 
> Napi poll method for the same napi instance can't be run concurrently.
> Could you explain a little more what happens here?
> 
Because netif_napi_add(ndev, >napi, hip04_rx_poll, NAPI_POLL_WEIGHT);
So hip04_rx_poll is a napi instance.
I did not say that hip04_rx_poll has reentered.
I am talking about the reentrant of hip04_tx_reclaim.


Pre-modification code:
static int hip04_rx_poll(struct napi_struct *napi, int budget)
{
[...]
/* enable rx interrupt */
writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);

napi_complete_done(napi, rx);
done:
/* clean up tx descriptors and start a new timer if necessary */
tx_remaining = hip04_tx_reclaim(ndev, false);
[...]
}
hip04_tx_reclaim is executed after "enable rx interrupt" and napi_complete_done.

If hip04_tx_reclaim is interrupted while it is running, and then
__irq_svc->gic_handle_irq->hip04_mac_interrupt->__napi_schedule->hip04_rx_poll->hip04_tx_reclaim


Also looking at hip04_tx_reclaim

static int hip04_tx_reclaim(struct net_device *ndev, bool force)
{
[1] struct hip04_priv *priv = netdev_priv(ndev);
[2] unsigned tx_tail = priv->tx_tail;
[3] [...]
[4] bytes_compl += priv->tx_skb[tx_tail]->len;
[5] dev_kfree_skb(priv->tx_skb[tx_tail]);
[6] priv->tx_skb[tx_tail] = NULL;
[7] tx_tail = TX_NEXT(tx_tail);
[8] [...]
[9] priv->tx_tail = tx_tail;
}

An interrupt occurs if hip04_tx_reclaim just executes to the line 6,
priv->tx_skb[tx_tail] is NULL, and then
__irq_svc->gic_handle_irq->hip04_mac_interrupt->__napi_schedule->hip04_rx_poll->hip04_tx_reclaim

Then hip04_tx_reclaim will handle kernel NULL pointer dereference on line 4.
A reentrant occurs in hip04_tx_reclaim and oops is generated.





My commit is to execute hip04_tx_reclaim before "enable rx interrupt" and 
napi_complete_done.
Then hip04_tx_reclaim can also be protected by the napi poll method so that no 
reentry occurs.

thanks.



[PATCH v1 0/3] net: hisilicon: Fix a few problems with hip04_eth

2019-08-03 Thread Jiangfeng Xiao
During the use of the hip04_eth driver,
several problems were found,
which solved the hip04_tx_reclaim reentry problem,
fixed the problem that hip04_mac_start_xmit never
returns NETDEV_TX_BUSY
and the dma_map_single failed on the arm64 platform.

Jiangfeng Xiao (3):
  net: hisilicon: make hip04_tx_reclaim non-reentrant
  net: hisilicon: fix hip04-xmit never return TX_BUSY
  net: hisilicon: Fix dma_map_single failed on arm64

 drivers/net/ethernet/hisilicon/hip04_eth.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

-- 
1.8.5.6



[PATCH v1 2/3] net: hisilicon: fix hip04-xmit never return TX_BUSY

2019-08-03 Thread Jiangfeng Xiao
TX_DESC_NUM is 256, in tx_count, the maximum value of
mod(TX_DESC_NUM - 1) is 254, the variable "count" in
the hip04_mac_start_xmit function is never equal to
(TX_DESC_NUM - 1), so hip04_mac_start_xmit never
return NETDEV_TX_BUSY.

tx_count is modified to mod(TX_DESC_NUM) so that
the maximum value of tx_count can reach
(TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn
NETDEV_TX_BUSY.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 1e1b154..d775b98 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -248,7 +248,7 @@ struct hip04_priv {
 
 static inline unsigned int tx_count(unsigned int head, unsigned int tail)
 {
-   return (head - tail) % (TX_DESC_NUM - 1);
+   return (head - tail) % TX_DESC_NUM;
 }
 
 static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
-- 
1.8.5.6



[PATCH v1 3/3] net: hisilicon: Fix dma_map_single failed on arm64

2019-08-03 Thread Jiangfeng Xiao
On the arm64 platform, executing "ifconfig eth0 up" will fail,
returning "ifconfig: SIOCSIFFLAGS: Input/output error."

ndev->dev is not initialized, dma_map_single->get_dma_ops->
dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE
directly, so when we use dma_map_single, the first parameter
is to use the device of platform_device.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index d775b98..c841674 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -220,6 +220,7 @@ struct hip04_priv {
unsigned int reg_inten;
 
struct napi_struct napi;
+   struct device *dev;
struct net_device *ndev;
 
struct tx_desc *tx_desc;
@@ -465,7 +466,7 @@ static int hip04_tx_reclaim(struct net_device *ndev, bool 
force)
}
 
if (priv->tx_phys[tx_tail]) {
-   dma_unmap_single(>dev, priv->tx_phys[tx_tail],
+   dma_unmap_single(priv->dev, priv->tx_phys[tx_tail],
 priv->tx_skb[tx_tail]->len,
 DMA_TO_DEVICE);
priv->tx_phys[tx_tail] = 0;
@@ -516,8 +517,8 @@ static void hip04_start_tx_timer(struct hip04_priv *priv)
return NETDEV_TX_BUSY;
}
 
-   phys = dma_map_single(>dev, skb->data, skb->len, DMA_TO_DEVICE);
-   if (dma_mapping_error(>dev, phys)) {
+   phys = dma_map_single(priv->dev, skb->data, skb->len, DMA_TO_DEVICE);
+   if (dma_mapping_error(priv->dev, phys)) {
dev_kfree_skb(skb);
return NETDEV_TX_OK;
}
@@ -596,7 +597,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int 
budget)
goto refill;
}
 
-   dma_unmap_single(>dev, priv->rx_phys[priv->rx_head],
+   dma_unmap_single(priv->dev, priv->rx_phys[priv->rx_head],
 RX_BUF_SIZE, DMA_FROM_DEVICE);
priv->rx_phys[priv->rx_head] = 0;
 
@@ -625,9 +626,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int 
budget)
buf = netdev_alloc_frag(priv->rx_buf_size);
if (!buf)
goto done;
-   phys = dma_map_single(>dev, buf,
+   phys = dma_map_single(priv->dev, buf,
  RX_BUF_SIZE, DMA_FROM_DEVICE);
-   if (dma_mapping_error(>dev, phys))
+   if (dma_mapping_error(priv->dev, phys))
goto done;
priv->rx_buf[priv->rx_head] = buf;
priv->rx_phys[priv->rx_head] = phys;
@@ -730,9 +731,9 @@ static int hip04_mac_open(struct net_device *ndev)
for (i = 0; i < RX_DESC_NUM; i++) {
dma_addr_t phys;
 
-   phys = dma_map_single(>dev, priv->rx_buf[i],
+   phys = dma_map_single(priv->dev, priv->rx_buf[i],
  RX_BUF_SIZE, DMA_FROM_DEVICE);
-   if (dma_mapping_error(>dev, phys))
+   if (dma_mapping_error(priv->dev, phys))
return -EIO;
 
priv->rx_phys[i] = phys;
@@ -766,7 +767,7 @@ static int hip04_mac_stop(struct net_device *ndev)
 
for (i = 0; i < RX_DESC_NUM; i++) {
if (priv->rx_phys[i]) {
-   dma_unmap_single(>dev, priv->rx_phys[i],
+   dma_unmap_single(priv->dev, priv->rx_phys[i],
 RX_BUF_SIZE, DMA_FROM_DEVICE);
priv->rx_phys[i] = 0;
}
@@ -909,6 +910,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
return -ENOMEM;
 
priv = netdev_priv(ndev);
+   priv->dev = d;
priv->ndev = ndev;
platform_set_drvdata(pdev, ndev);
SET_NETDEV_DEV(ndev, >dev);
-- 
1.8.5.6



[PATCH v1 1/3] net: hisilicon: make hip04_tx_reclaim non-reentrant

2019-08-03 Thread Jiangfeng Xiao
If hip04_tx_reclaim is interrupted while it is running
and then __napi_schedule continues to execute
hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs
and oops is generated. So you need to mask the interrupt
during the hip04_tx_reclaim run.

The kernel oops exception stack is as follows:

Unable to handle kernel NULL pointer dereference
at virtual address 0050
pgd = c0003000
[0050] *pgd=8000a04003, *pmd=
Internal error: Oops: 206 [#1] SMP ARM
Modules linked in: hip04_eth mtdblock mtd_blkdevs mtd
ohci_platform ehci_platform ohci_hcd ehci_hcd
vfat fat sd_mod usb_storage scsi_mod usbcore usb_common
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   O4.4.185 #1
Hardware name: Hisilicon A15
task: c0a250e0 task.stack: c0a0
PC is at hip04_tx_reclaim+0xe0/0x17c [hip04_eth]
LR is at hip04_tx_reclaim+0x30/0x17c [hip04_eth]
pc : []lr : []psr: 600e0313
sp : c0a01d88  ip :   fp : c0601f9c
r10:   r9 : c3482380  r8 : 0001
r7 :   r6 : 00e1  r5 : c3482000  r4 : 000c
r3 : f2209800  r2 :   r1 :   r0 : 
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 32c5387d  Table: 03d28c80  DAC: 
Process swapper/0 (pid: 0, stack limit = 0xc0a00190)
Stack: (0xc0a01d88 to 0xc0a02000)
[] (hip04_tx_reclaim [hip04_eth]) from []
(hip04_rx_poll+0x88/0x368 
[hip04_eth])
[] (hip04_rx_poll [hip04_eth]) from [] 
(net_rx_action+0x114/0x34c)
[] (net_rx_action) from [] (__do_softirq+0x218/0x318)
[] (__do_softirq) from [] (irq_exit+0x88/0xac)
[] (irq_exit) from [] (msa_irq_exit+0x11c/0x1d4)
[] (msa_irq_exit) from [] (__handle_domain_irq+0x110/0x148)
[] (__handle_domain_irq) from [] (gic_handle_irq+0xd4/0x118)
[] (gic_handle_irq) from [] (__irq_svc+0x40/0x58)
Exception stack(0xc0a01f30 to 0xc0a01f78)
1f20: c0ae8b40   
1f40: 0002 e000 c0601f9c   c0a2257c c0a22440 c0831a38
1f60: c0a01ec4 c0a01f80 c0203714 c0203718 600e0213 
[] (__irq_svc) from [] (arch_cpu_idle+0x20/0x3c)
[] (arch_cpu_idle) from [] (cpu_startup_entry+0x244/0x29c)
[] (cpu_startup_entry) from [] (rest_init+0xc8/0x10c)
[] (rest_init) from [] (start_kernel+0x468/0x514)
Code: a40599e5 016086e2 018088e2 7660efe6 (503090e5)
---[ end trace 1db21d6d09c49d74 ]---
Kernel panic - not syncing: Fatal exception in interrupt
CPU3: stopping
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G  DO4.4.185 #1

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index d604528..1e1b154 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -585,6 +585,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int 
budget)
u16 len;
u32 err;
 
+   /* clean up tx descriptors */
+   tx_remaining = hip04_tx_reclaim(ndev, false);
+
while (cnt && !last) {
buf = priv->rx_buf[priv->rx_head];
skb = build_skb(buf, priv->rx_buf_size);
@@ -645,8 +648,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int 
budget)
}
napi_complete_done(napi, rx);
 done:
-   /* clean up tx descriptors and start a new timer if necessary */
-   tx_remaining = hip04_tx_reclaim(ndev, false);
+   /* start a new timer if necessary */
if (rx < budget && tx_remaining)
hip04_start_tx_timer(priv);
 
-- 
1.8.5.6



[PATCH] net: hisilicon: Use devm_platform_ioremap_resource

2019-07-12 Thread Jiangfeng Xiao
Use devm_platform_ioremap_resource instead of
devm_ioremap_resource. Make the code simpler.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c| 7 ++-
 drivers/net/ethernet/hisilicon/hisi_femac.c   | 7 ++-
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 7 ++-
 drivers/net/ethernet/hisilicon/hns_mdio.c | 4 +---
 4 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 6256357..d604528 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -899,7 +899,6 @@ static int hip04_mac_probe(struct platform_device *pdev)
struct of_phandle_args arg;
struct net_device *ndev;
struct hip04_priv *priv;
-   struct resource *res;
int irq;
int ret;
 
@@ -912,16 +911,14 @@ static int hip04_mac_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, ndev);
SET_NETDEV_DEV(ndev, >dev);
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   priv->base = devm_ioremap_resource(d, res);
+   priv->base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(priv->base)) {
ret = PTR_ERR(priv->base);
goto init_fail;
}
 
 #if defined(CONFIG_HI13X1_GMAC)
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
-   priv->sysctrl_base = devm_ioremap_resource(d, res);
+   priv->sysctrl_base = devm_platform_ioremap_resource(pdev, 1);
if (IS_ERR(priv->sysctrl_base)) {
ret = PTR_ERR(priv->sysctrl_base);
goto init_fail;
diff --git a/drivers/net/ethernet/hisilicon/hisi_femac.c 
b/drivers/net/ethernet/hisilicon/hisi_femac.c
index d2e019d..689f18e 100644
--- a/drivers/net/ethernet/hisilicon/hisi_femac.c
+++ b/drivers/net/ethernet/hisilicon/hisi_femac.c
@@ -781,7 +781,6 @@ static int hisi_femac_drv_probe(struct platform_device 
*pdev)
 {
struct device *dev = >dev;
struct device_node *node = dev->of_node;
-   struct resource *res;
struct net_device *ndev;
struct hisi_femac_priv *priv;
struct phy_device *phy;
@@ -799,15 +798,13 @@ static int hisi_femac_drv_probe(struct platform_device 
*pdev)
priv->dev = dev;
priv->ndev = ndev;
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   priv->port_base = devm_ioremap_resource(dev, res);
+   priv->port_base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(priv->port_base)) {
ret = PTR_ERR(priv->port_base);
goto out_free_netdev;
}
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
-   priv->glb_base = devm_ioremap_resource(dev, res);
+   priv->glb_base = devm_platform_ioremap_resource(pdev, 1);
if (IS_ERR(priv->glb_base)) {
ret = PTR_ERR(priv->glb_base);
goto out_free_netdev;
diff --git a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c 
b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
index 89ef764..3499705 100644
--- a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
@@ -1097,7 +1097,6 @@ static int hix5hd2_dev_probe(struct platform_device *pdev)
const struct of_device_id *of_id = NULL;
struct net_device *ndev;
struct hix5hd2_priv *priv;
-   struct resource *res;
struct mii_bus *bus;
const char *mac_addr;
int ret;
@@ -1119,15 +1118,13 @@ static int hix5hd2_dev_probe(struct platform_device 
*pdev)
}
priv->hw_cap = (unsigned long)of_id->data;
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   priv->base = devm_ioremap_resource(dev, res);
+   priv->base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(priv->base)) {
ret = PTR_ERR(priv->base);
goto out_free_netdev;
}
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
-   priv->ctrl_base = devm_ioremap_resource(dev, res);
+   priv->ctrl_base = devm_platform_ioremap_resource(pdev, 1);
if (IS_ERR(priv->ctrl_base)) {
ret = PTR_ERR(priv->ctrl_base);
goto out_free_netdev;
diff --git a/drivers/net/ethernet/hisilicon/hns_mdio.c 
b/drivers/net/ethernet/hisilicon/hns_mdio.c
index 918cab1..3e863a7 100644
--- a/drivers/net/ethernet/hisilicon/hns_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns_mdio.c
@@ -417,7 +417,6 @@ static int hns_mdio_probe(struct platform_device *pdev)
 {
struct hns_mdio_device *mdio_dev;
struct mii_bus *new_bus;
-   struct resource *res;
int ret = -ENODEV;
 
if (!pdev) {
@@ -442,8 +441,7 @@ static int hns_mdio_probe(struct platform_device *pdev)
new_bus->priv = mdio_dev;
new_bus-

Re: [PATCH v2 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first

2019-07-09 Thread Jiangfeng Xiao



On 2019/7/9 21:48, Jiangfeng Xiao wrote:
> 
> 
> On 2019/7/9 17:35, Sergei Shtylyov wrote:
>> Hello!
>>
>> On 09.07.2019 6:31, Jiangfeng Xiao wrote:
>>
[...]
>>> @@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device 
>>> *pdev)
>>>   goto init_fail;
>>>   }
>>>   +#if defined(CONFIG_HI13X1_GMAC)
>>> +res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>>> +priv->sysctrl_base = devm_ioremap_resource(d, res);
>>
>>There's devm_platform_ioremap_resource() now.
> 
> Thank you for your review, Great issue, which makes my code more concise.
> 
> I will fix it in v3. Or submit a patch to modify it separately, if maintainer
> applies this patch series.
> 
I decided to wait for this series of patches to sync to the mainline
and then fix this based on the mainline.

Because the mainline does not currently have this part of the code,
if I submit the changes, and the patch is accidentally merged into
another branch or another maintainer to handle, a conflict will occur.

As we all know, maintianer has to deal with many commits every day,
I don't want to increase the burden of maintainer.

So I decided to wait until the patch is synced to the mainline
and then modify it, which is more safe.



Re: [PATCH v2 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first

2019-07-09 Thread Jiangfeng Xiao



On 2019/7/9 17:35, Sergei Shtylyov wrote:
> Hello!
> 
> On 09.07.2019 6:31, Jiangfeng Xiao wrote:
> 
>> HI13X1_GMAC delete request for soft reset at first,
>> otherwise, the subsequent initialization will not
>> take effect.
>>
>> Signed-off-by: Jiangfeng Xiao 
>> ---
>>   drivers/net/ethernet/hisilicon/hip04_eth.c | 24 
>>   1 file changed, 24 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
>> b/drivers/net/ethernet/hisilicon/hip04_eth.c
>> index fe61b01..19d8cfd 100644
>> --- a/drivers/net/ethernet/hisilicon/hip04_eth.c
>> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
> [...]
>> @@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device *pdev)
>>   goto init_fail;
>>   }
>>   +#if defined(CONFIG_HI13X1_GMAC)
>> +res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>> +priv->sysctrl_base = devm_ioremap_resource(d, res);
> 
>There's devm_platform_ioremap_resource() now.

Thank you for your review, Great issue, which makes my code more concise.

I will fix it in v3. Or submit a patch to modify it separately, if maintainer
applies this patch series.



[PATCH v2 00/10] net: hisilicon: Add support for HI13X1 to hip04_eth

2019-07-08 Thread Jiangfeng Xiao
The main purpose of this patch series is to extend the
hip04_eth driver to support HI13X1_GMAC.

The offset and bitmap of some registers of HI13X1_GMAC
are different from hip04_eth common soc. In addition,
the definition of send descriptor and parsing descriptor
are different from hip04_eth common soc. So the macro
of the register offset is redefined to adapt the HI13X1_GMAC.

Clean up the sparse warning by the way.

Change since v1:
* Add a cover letter.

Jiangfeng Xiao (10):
  net: hisilicon: Add support for HI13X1 to hip04_eth
  net: hisilicon: Cleanup for got restricted __be32
  net: hisilicon: Cleanup for cast to restricted __be32
  net: hisilicon: HI13X1_GMAX skip write LOCAL_PAGE_REG
  net: hisilicon: HI13X1_GMAX need dreq reset at first
  net: hisilicon: dt-bindings: Add an field of port-handle
  net: hisilicon: Add group field to adapt HI13X1_GMAC
  net: hisilicon: Offset buf address to adapt HI13X1_GMAC
  net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
  net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC

 .../bindings/net/hisilicon-hip04-net.txt   |   7 +-
 drivers/net/ethernet/hisilicon/Kconfig |  10 ++
 drivers/net/ethernet/hisilicon/hip04_eth.c | 142 ++---
 3 files changed, 136 insertions(+), 23 deletions(-)

-- 
1.8.5.6



[PATCH v2 10/10] net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC

2019-07-08 Thread Jiangfeng Xiao
HI13X1 changed the offsets and bitmaps for tx_desc
registers in the same peripheral device on different
models of the hip04_eth.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 780fc46..6256357 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -76,8 +76,15 @@
 /* TX descriptor config */
 #define TX_FREE_MEMBIT(0)
 #define TX_READ_ALLOC_L3   BIT(1)
-#define TX_FINISH_CACHE_INVBIT(2)
+#if defined(CONFIG_HI13X1_GMAC)
+#define TX_CLEAR_WBBIT(7)
+#define TX_RELEASE_TO_PPE  BIT(4)
+#define TX_FINISH_CACHE_INVBIT(6)
+#define TX_POOL_SHIFT  16
+#else
 #define TX_CLEAR_WBBIT(4)
+#define TX_FINISH_CACHE_INVBIT(2)
+#endif
 #define TX_L3_CHECKSUM BIT(5)
 #define TX_LOOP_BACK   BIT(11)
 
@@ -124,6 +131,7 @@
 /* buf unit size is cache_line_size, which is 64, so the shift is 6 */
 #define PPE_BUF_SIZE_SHIFT 6
 #define PPE_TX_BUF_HOLDBIT(31)
+#define CACHE_LINE_MASK0x3F
 #else
 #define PPE_CFG_QOS_VMID_GRP_SHIFT 8
 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
@@ -163,11 +171,22 @@
 #define HIP04_MIN_TX_COALESCE_FRAMES   100
 
 struct tx_desc {
+#if defined(CONFIG_HI13X1_GMAC)
+   u32 reserved1[2];
+   u32 send_addr;
+   u16 send_size;
+   u16 data_offset;
+   u32 reserved2[7];
+   u32 cfg;
+   u32 wb_addr;
+   u32 reserved3[3];
+#else
u32 send_addr;
u32 send_size;
u32 next_addr;
u32 cfg;
u32 wb_addr;
+#endif
 } __aligned(64);
 
 struct rx_desc {
@@ -505,11 +524,20 @@ static void hip04_start_tx_timer(struct hip04_priv *priv)
 
priv->tx_skb[tx_head] = skb;
priv->tx_phys[tx_head] = phys;
-   desc->send_addr = (__force u32)cpu_to_be32(phys);
+
desc->send_size = (__force u32)cpu_to_be32(skb->len);
+#if defined(CONFIG_HI13X1_GMAC)
+   desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV
+   | TX_RELEASE_TO_PPE | priv->port << TX_POOL_SHIFT);
+   desc->data_offset = (__force u32)cpu_to_be32(phys & CACHE_LINE_MASK);
+   desc->send_addr =  (__force u32)cpu_to_be32(phys & ~CACHE_LINE_MASK);
+#else
desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+   desc->send_addr = (__force u32)cpu_to_be32(phys);
+#endif
phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
-   desc->wb_addr = (__force u32)cpu_to_be32(phys);
+   desc->wb_addr = (__force u32)cpu_to_be32(phys +
+   offsetof(struct tx_desc, send_addr));
skb_tx_timestamp(skb);
 
hip04_set_xmit_desc(priv, phys);
-- 
1.8.5.6



[PATCH v2 04/10] net: hisilicon: HI13X1_GMAX skip write LOCAL_PAGE_REG

2019-07-08 Thread Jiangfeng Xiao
HI13X1_GMAC changed the offsets and bitmaps for
GE_TX_LOCAL_PAGE_REG registers in the same peripheral
device on different models of the hip04_eth. With the
default configuration, HI13X1_GMAC can also work without
any writes to the GE_TX_LOCAL_PAGE_REG register.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index d8f0619..fe61b01 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -308,8 +308,10 @@ static void hip04_config_fifo(struct hip04_priv *priv)
val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
 
+#ifndef CONFIG_HI13X1_GMAC
val = GE_AUTO_NEG_CTL;
writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+#endif
 }
 
 static void hip04_mac_enable(struct net_device *ndev)
-- 
1.8.5.6



[PATCH v2 07/10] net: hisilicon: Add group field to adapt HI13X1_GMAC

2019-07-08 Thread Jiangfeng Xiao
In general, group is the same as the port, but some
boards specify a special group for better load
balancing of each processing unit.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 19d8cfd..5328219 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -178,6 +178,7 @@ struct hip04_priv {
int phy_mode;
int chan;
unsigned int port;
+   unsigned int group;
unsigned int speed;
unsigned int duplex;
unsigned int reg_inten;
@@ -278,10 +279,10 @@ static void hip04_config_fifo(struct hip04_priv *priv)
val |= PPE_CFG_STS_RX_PKT_CNT_RC;
writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
 
-   val = BIT(priv->port);
+   val = BIT(priv->group);
regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
 
-   val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
+   val = priv->group << PPE_CFG_QOS_VMID_GRP_SHIFT;
val |= PPE_CFG_QOS_VMID_MODE;
writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
 
@@ -876,7 +877,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
}
 #endif
 
-   ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, );
+   ret = of_parse_phandle_with_fixed_args(node, "port-handle", 3, 0, );
if (ret < 0) {
dev_warn(d, "no port-handle\n");
goto init_fail;
@@ -884,6 +885,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
 
priv->port = arg.args[0];
priv->chan = arg.args[1] * RX_DESC_NUM;
+   priv->group = arg.args[2];
 
hrtimer_init(>tx_coalesce_timer, CLOCK_MONOTONIC, 
HRTIMER_MODE_REL);
 
-- 
1.8.5.6



[PATCH v2 09/10] net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC

2019-07-08 Thread Jiangfeng Xiao
HI13X1 changed the offsets and bitmaps for rx_desc
registers in the same peripheral device on different
models of the hip04_eth.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index c578934..780fc46 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -171,11 +171,20 @@ struct tx_desc {
 } __aligned(64);
 
 struct rx_desc {
+#if defined(CONFIG_HI13X1_GMAC)
+   u32 reserved1[3];
+   u16 pkt_len;
+   u16 reserved_16;
+   u32 reserved2[6];
+   u32 pkt_err;
+   u32 reserved3[5];
+#else
u16 reserved_16;
u16 pkt_len;
u32 reserve1[3];
u32 pkt_err;
u32 reserve2[4];
+#endif
 };
 
 struct hip04_priv {
-- 
1.8.5.6



[PATCH v2 01/10] net: hisilicon: Add support for HI13X1 to hip04_eth

2019-07-08 Thread Jiangfeng Xiao
Extend the hip04_eth driver to support HI13X1_GMAC.
Enable it with CONFIG_HI13X1_GMAC option.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/Kconfig | 10 
 drivers/net/ethernet/hisilicon/hip04_eth.c | 37 --
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/Kconfig 
b/drivers/net/ethernet/hisilicon/Kconfig
index a0d780c..3892a20 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -46,6 +46,16 @@ config HIP04_ETH
  If you wish to compile a kernel for a hardware with hisilicon p04 SoC 
and
  want to use the internal ethernet then you should answer Y to this.
 
+config HI13X1_GMAC
+   bool "Hisilicon HI13X1 Network Device Support"
+   depends on HIP04_ETH
+   help
+ If you wish to compile a kernel for a hardware with hisilicon 
hi13x1_gamc
+ then you should answer Y to this. This makes this driver suitable for 
use
+ on certain boards such as the HI13X1.
+
+ If you are unsure, say N.
+
 config HNS_MDIO
tristate
select PHYLIB
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index e1f2978..2b5112b 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -33,10 +33,23 @@
 #define GE_MODE_CHANGE_REG 0x1b4
 #define GE_RECV_CONTROL_REG0x1e0
 #define GE_STATION_MAC_ADDRESS 0x210
-#define PPE_CFG_CPU_ADD_ADDR   0x580
-#define PPE_CFG_MAX_FRAME_LEN_REG  0x408
+
 #define PPE_CFG_BUS_CTRL_REG   0x424
 #define PPE_CFG_RX_CTRL_REG0x428
+
+#if defined(CONFIG_HI13X1_GMAC)
+#define PPE_CFG_CPU_ADD_ADDR   0x6D0
+#define PPE_CFG_MAX_FRAME_LEN_REG  0x500
+#define PPE_CFG_RX_PKT_MODE_REG0x504
+#define PPE_CFG_QOS_VMID_GEN   0x520
+#define PPE_CFG_RX_PKT_INT 0x740
+#define PPE_INTEN  0x700
+#define PPE_INTSTS 0x708
+#define PPE_RINT   0x704
+#define PPE_CFG_STS_MODE   0x880
+#else
+#define PPE_CFG_CPU_ADD_ADDR   0x580
+#define PPE_CFG_MAX_FRAME_LEN_REG  0x408
 #define PPE_CFG_RX_PKT_MODE_REG0x438
 #define PPE_CFG_QOS_VMID_GEN   0x500
 #define PPE_CFG_RX_PKT_INT 0x538
@@ -44,6 +57,8 @@
 #define PPE_INTSTS 0x608
 #define PPE_RINT   0x604
 #define PPE_CFG_STS_MODE   0x700
+#endif /* CONFIG_HI13X1_GMAC */
+
 #define PPE_HIS_RX_PKT_CNT 0x804
 
 /* REG_INTERRUPT */
@@ -93,18 +108,26 @@
 #define GE_RX_PORT_EN  BIT(1)
 #define GE_TX_PORT_EN  BIT(2)
 
-#define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(12)
-
 #define PPE_CFG_RX_PKT_ALIGN   BIT(18)
-#define PPE_CFG_QOS_VMID_MODE  BIT(14)
+
+#if defined(CONFIG_HI13X1_GMAC)
+#define PPE_CFG_QOS_VMID_GRP_SHIFT 4
+#define PPE_CFG_RX_CTRL_ALIGN_SHIFT7
+#define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(0)
+#define PPE_CFG_QOS_VMID_MODE  BIT(15)
+#define PPE_CFG_BUS_LOCAL_REL  (BIT(9) | BIT(15) | BIT(19) | BIT(23))
+#else
 #define PPE_CFG_QOS_VMID_GRP_SHIFT 8
+#define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
+#define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(12)
+#define PPE_CFG_QOS_VMID_MODE  BIT(14)
+#define PPE_CFG_BUS_LOCAL_REL  BIT(14)
+#endif /* CONFIG_HI13X1_GMAC */
 
 #define PPE_CFG_RX_FIFO_FSFU   BIT(11)
 #define PPE_CFG_RX_DEPTH_SHIFT 16
 #define PPE_CFG_RX_START_SHIFT 0
-#define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
 
-#define PPE_CFG_BUS_LOCAL_REL  BIT(14)
 #define PPE_CFG_BUS_BIG_ENDIEN BIT(0)
 
 #define RX_DESC_NUM128
-- 
1.8.5.6



[PATCH v2 08/10] net: hisilicon: Offset buf address to adapt HI13X1_GMAC

2019-07-08 Thread Jiangfeng Xiao
The buf unit size of HI13X1_GMAC is cache_line_size,
which is 64, so the address we write to the buf register
needs to be shifted right by 6 bits.

The 31st bit of the PPE_CFG_CPU_ADD_ADDR register
of HI13X1_GMAC indicates whether to release the buffer
of the message, and the low indicates that it is valid.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 5328219..c578934 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -120,12 +120,20 @@
 #define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(0)
 #define PPE_CFG_QOS_VMID_MODE  BIT(15)
 #define PPE_CFG_BUS_LOCAL_REL  (BIT(9) | BIT(15) | BIT(19) | BIT(23))
+
+/* buf unit size is cache_line_size, which is 64, so the shift is 6 */
+#define PPE_BUF_SIZE_SHIFT 6
+#define PPE_TX_BUF_HOLDBIT(31)
 #else
 #define PPE_CFG_QOS_VMID_GRP_SHIFT 8
 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
 #define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(12)
 #define PPE_CFG_QOS_VMID_MODE  BIT(14)
 #define PPE_CFG_BUS_LOCAL_REL  BIT(14)
+
+/* buf unit size is 1, so the shift is 6 */
+#define PPE_BUF_SIZE_SHIFT 0
+#define PPE_TX_BUF_HOLD0
 #endif /* CONFIG_HI13X1_GMAC */
 
 #define PPE_CFG_RX_FIFO_FSFU   BIT(11)
@@ -286,7 +294,7 @@ static void hip04_config_fifo(struct hip04_priv *priv)
val |= PPE_CFG_QOS_VMID_MODE;
writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
 
-   val = RX_BUF_SIZE;
+   val = RX_BUF_SIZE >> PPE_BUF_SIZE_SHIFT;
regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
 
val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
@@ -369,12 +377,18 @@ static void hip04_mac_disable(struct net_device *ndev)
 
 static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
 {
-   writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
+   u32 val;
+
+   val = phys >> PPE_BUF_SIZE_SHIFT | PPE_TX_BUF_HOLD;
+   writel(val, priv->base + PPE_CFG_CPU_ADD_ADDR);
 }
 
 static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
 {
-   regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+   u32 val;
+
+   val = phys >> PPE_BUF_SIZE_SHIFT;
+   regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, val);
 }
 
 static u32 hip04_recv_cnt(struct hip04_priv *priv)
-- 
1.8.5.6



[PATCH v2 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first

2019-07-08 Thread Jiangfeng Xiao
HI13X1_GMAC delete request for soft reset at first,
otherwise, the subsequent initialization will not
take effect.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index fe61b01..19d8cfd 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 
+#define SC_PPE_RESET_DREQ  0x026C
+
 #define PPE_CFG_RX_ADDR0x100
 #define PPE_CFG_POOL_GRP   0x300
 #define PPE_CFG_RX_BUF_SIZE0x400
@@ -61,6 +63,8 @@
 
 #define PPE_HIS_RX_PKT_CNT 0x804
 
+#define RESET_DREQ_ALL 0x
+
 /* REG_INTERRUPT */
 #define RCV_INTBIT(10)
 #define RCV_NOBUF  BIT(8)
@@ -168,6 +172,9 @@ struct rx_desc {
 
 struct hip04_priv {
void __iomem *base;
+#if defined(CONFIG_HI13X1_GMAC)
+   void __iomem *sysctrl_base;
+#endif
int phy_mode;
int chan;
unsigned int port;
@@ -244,6 +251,13 @@ static void hip04_config_port(struct net_device *ndev, u32 
speed, u32 duplex)
writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
 }
 
+static void hip04_reset_dreq(struct hip04_priv *priv)
+{
+#if defined(CONFIG_HI13X1_GMAC)
+   writel_relaxed(RESET_DREQ_ALL, priv->sysctrl_base + SC_PPE_RESET_DREQ);
+#endif
+}
+
 static void hip04_reset_ppe(struct hip04_priv *priv)
 {
u32 val, tmp, timeout = 0;
@@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device *pdev)
goto init_fail;
}
 
+#if defined(CONFIG_HI13X1_GMAC)
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+   priv->sysctrl_base = devm_ioremap_resource(d, res);
+   if (IS_ERR(priv->sysctrl_base)) {
+   ret = PTR_ERR(priv->sysctrl_base);
+   goto init_fail;
+   }
+#endif
+
ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, );
if (ret < 0) {
dev_warn(d, "no port-handle\n");
@@ -921,6 +944,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
ndev->irq = irq;
netif_napi_add(ndev, >napi, hip04_rx_poll, NAPI_POLL_WEIGHT);
 
+   hip04_reset_dreq(priv);
hip04_reset_ppe(priv);
if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
-- 
1.8.5.6



[PATCH v2 06/10] net: hisilicon: dt-bindings: Add an field of port-handle

2019-07-08 Thread Jiangfeng Xiao
In general, group is the same as the port, but some
boards specify a special group for better load
balancing of each processing unit.

Signed-off-by: Jiangfeng Xiao 
---
 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt 
b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
index d1df8a0..464c0da 100644
--- a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
+++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
@@ -10,6 +10,7 @@ Required properties:
phandle, specifies a reference to the syscon ppe node
port, port number connected to the controller
channel, recv channel start from channel * number (RX_DESC_NUM)
+   group, field in the pkg desc, in general, it is the same as the port.
 - phy-mode: see ethernet.txt [1].
 
 Optional properties:
@@ -66,7 +67,7 @@ Example:
reg = <0x28b 0x1>;
interrupts = <0 413 4>;
phy-mode = "mii";
-   port-handle = < 31 0>;
+   port-handle = < 31 0 31>;
};
 
ge0: ethernet@280 {
@@ -74,7 +75,7 @@ Example:
reg = <0x280 0x1>;
interrupts = <0 402 4>;
phy-mode = "sgmii";
-   port-handle = < 0 1>;
+   port-handle = < 0 1 0>;
phy-handle = <>;
};
 
@@ -83,6 +84,6 @@ Example:
reg = <0x288 0x1>;
interrupts = <0 410 4>;
phy-mode = "sgmii";
-   port-handle = < 8 2>;
+   port-handle = < 8 2 8>;
phy-handle = <>;
};
-- 
1.8.5.6



[PATCH v2 03/10] net: hisilicon: Cleanup for cast to restricted __be32

2019-07-08 Thread Jiangfeng Xiao
This patch fixes the following warning from sparse:
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 31f13cf..d8f0619 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -530,8 +530,8 @@ static int hip04_rx_poll(struct napi_struct *napi, int 
budget)
priv->rx_phys[priv->rx_head] = 0;
 
desc = (struct rx_desc *)skb->data;
-   len = be16_to_cpu(desc->pkt_len);
-   err = be32_to_cpu(desc->pkt_err);
+   len = be16_to_cpu((__force __be16)desc->pkt_len);
+   err = be32_to_cpu((__force __be32)desc->pkt_err);
 
if (0 == len) {
dev_kfree_skb_any(skb);
-- 
1.8.5.6



[PATCH v2 02/10] net: hisilicon: Cleanup for got restricted __be32

2019-07-08 Thread Jiangfeng Xiao
This patch fixes the following warning from sparse:
hip04_eth.c:468:25: warning: incorrect type in assignment
hip04_eth.c:468:25:expected unsigned int [usertype] send_addr
hip04_eth.c:468:25:got restricted __be32 [usertype]
hip04_eth.c:469:25: warning: incorrect type in assignment
hip04_eth.c:469:25:expected unsigned int [usertype] send_size
hip04_eth.c:469:25:got restricted __be32 [usertype]
hip04_eth.c:470:19: warning: incorrect type in assignment
hip04_eth.c:470:19:expected unsigned int [usertype] cfg
hip04_eth.c:470:19:got restricted __be32 [usertype]
hip04_eth.c:472:23: warning: incorrect type in assignment
hip04_eth.c:472:23:expected unsigned int [usertype] wb_addr
hip04_eth.c:472:23:got restricted __be32 [usertype]

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 2b5112b..31f13cf 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -465,11 +465,11 @@ static void hip04_start_tx_timer(struct hip04_priv *priv)
 
priv->tx_skb[tx_head] = skb;
priv->tx_phys[tx_head] = phys;
-   desc->send_addr = cpu_to_be32(phys);
-   desc->send_size = cpu_to_be32(skb->len);
-   desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+   desc->send_addr = (__force u32)cpu_to_be32(phys);
+   desc->send_size = (__force u32)cpu_to_be32(skb->len);
+   desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
-   desc->wb_addr = cpu_to_be32(phys);
+   desc->wb_addr = (__force u32)cpu_to_be32(phys);
skb_tx_timestamp(skb);
 
hip04_set_xmit_desc(priv, phys);
-- 
1.8.5.6



Re: [PATCH] net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC

2019-07-08 Thread Jiangfeng Xiao



On 2019/7/9 2:18, David Miller wrote:
> From: David Miller 
> Date: Sun, 07 Jul 2019 22:18:05 -0700 (PDT)
> 
>> From: Jiangfeng Xiao 
>> Date: Fri, 5 Jul 2019 14:10:03 +0800
>>
>>> HI13X1 changed the offsets and bitmaps for tx_desc
>>> registers in the same peripheral device on different
>>> models of the hip04_eth.
>>>
>>> Signed-off-by: Jiangfeng Xiao 
>>
>> Applied.
> 
> Actually I didn't apply this because I can't see that HI13X1_GMAC
> kconfig knob anywhere in the tree at all.
> 

Thank you for your guidance, I made a mistake, for which I am
sincerely sorry for wasting your time.

I will submit the correct one again.
I will not make this low-level mistake again.

Thanks,
Jiangfeng Xiao





[PATCH 09/10] net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC

2019-07-05 Thread Jiangfeng Xiao
HI13X1 changed the offsets and bitmaps for rx_desc
registers in the same peripheral device on different
models of the hip04_eth.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index c578934..780fc46 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -171,11 +171,20 @@ struct tx_desc {
 } __aligned(64);
 
 struct rx_desc {
+#if defined(CONFIG_HI13X1_GMAC)
+   u32 reserved1[3];
+   u16 pkt_len;
+   u16 reserved_16;
+   u32 reserved2[6];
+   u32 pkt_err;
+   u32 reserved3[5];
+#else
u16 reserved_16;
u16 pkt_len;
u32 reserve1[3];
u32 pkt_err;
u32 reserve2[4];
+#endif
 };
 
 struct hip04_priv {
-- 
1.8.5.6



[PATCH 10/10] net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC

2019-07-05 Thread Jiangfeng Xiao
HI13X1 changed the offsets and bitmaps for tx_desc
registers in the same peripheral device on different
models of the hip04_eth.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 780fc46..6256357 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -76,8 +76,15 @@
 /* TX descriptor config */
 #define TX_FREE_MEMBIT(0)
 #define TX_READ_ALLOC_L3   BIT(1)
-#define TX_FINISH_CACHE_INVBIT(2)
+#if defined(CONFIG_HI13X1_GMAC)
+#define TX_CLEAR_WBBIT(7)
+#define TX_RELEASE_TO_PPE  BIT(4)
+#define TX_FINISH_CACHE_INVBIT(6)
+#define TX_POOL_SHIFT  16
+#else
 #define TX_CLEAR_WBBIT(4)
+#define TX_FINISH_CACHE_INVBIT(2)
+#endif
 #define TX_L3_CHECKSUM BIT(5)
 #define TX_LOOP_BACK   BIT(11)
 
@@ -124,6 +131,7 @@
 /* buf unit size is cache_line_size, which is 64, so the shift is 6 */
 #define PPE_BUF_SIZE_SHIFT 6
 #define PPE_TX_BUF_HOLDBIT(31)
+#define CACHE_LINE_MASK0x3F
 #else
 #define PPE_CFG_QOS_VMID_GRP_SHIFT 8
 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
@@ -163,11 +171,22 @@
 #define HIP04_MIN_TX_COALESCE_FRAMES   100
 
 struct tx_desc {
+#if defined(CONFIG_HI13X1_GMAC)
+   u32 reserved1[2];
+   u32 send_addr;
+   u16 send_size;
+   u16 data_offset;
+   u32 reserved2[7];
+   u32 cfg;
+   u32 wb_addr;
+   u32 reserved3[3];
+#else
u32 send_addr;
u32 send_size;
u32 next_addr;
u32 cfg;
u32 wb_addr;
+#endif
 } __aligned(64);
 
 struct rx_desc {
@@ -505,11 +524,20 @@ static void hip04_start_tx_timer(struct hip04_priv *priv)
 
priv->tx_skb[tx_head] = skb;
priv->tx_phys[tx_head] = phys;
-   desc->send_addr = (__force u32)cpu_to_be32(phys);
+
desc->send_size = (__force u32)cpu_to_be32(skb->len);
+#if defined(CONFIG_HI13X1_GMAC)
+   desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV
+   | TX_RELEASE_TO_PPE | priv->port << TX_POOL_SHIFT);
+   desc->data_offset = (__force u32)cpu_to_be32(phys & CACHE_LINE_MASK);
+   desc->send_addr =  (__force u32)cpu_to_be32(phys & ~CACHE_LINE_MASK);
+#else
desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+   desc->send_addr = (__force u32)cpu_to_be32(phys);
+#endif
phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
-   desc->wb_addr = (__force u32)cpu_to_be32(phys);
+   desc->wb_addr = (__force u32)cpu_to_be32(phys +
+   offsetof(struct tx_desc, send_addr));
skb_tx_timestamp(skb);
 
hip04_set_xmit_desc(priv, phys);
-- 
1.8.5.6



[PATCH 08/10] net: hisilicon: Offset buf address to adapt HI13X1_GMAC

2019-07-05 Thread Jiangfeng Xiao
The buf unit size of HI13X1_GMAC is cache_line_size,
which is 64, so the address we write to the buf register
needs to be shifted right by 6 bits.

The 31st bit of the PPE_CFG_CPU_ADD_ADDR register
of HI13X1_GMAC indicates whether to release the buffer
of the message, and the low indicates that it is valid.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 5328219..c578934 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -120,12 +120,20 @@
 #define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(0)
 #define PPE_CFG_QOS_VMID_MODE  BIT(15)
 #define PPE_CFG_BUS_LOCAL_REL  (BIT(9) | BIT(15) | BIT(19) | BIT(23))
+
+/* buf unit size is cache_line_size, which is 64, so the shift is 6 */
+#define PPE_BUF_SIZE_SHIFT 6
+#define PPE_TX_BUF_HOLDBIT(31)
 #else
 #define PPE_CFG_QOS_VMID_GRP_SHIFT 8
 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
 #define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(12)
 #define PPE_CFG_QOS_VMID_MODE  BIT(14)
 #define PPE_CFG_BUS_LOCAL_REL  BIT(14)
+
+/* buf unit size is 1, so the shift is 6 */
+#define PPE_BUF_SIZE_SHIFT 0
+#define PPE_TX_BUF_HOLD0
 #endif /* CONFIG_HI13X1_GMAC */
 
 #define PPE_CFG_RX_FIFO_FSFU   BIT(11)
@@ -286,7 +294,7 @@ static void hip04_config_fifo(struct hip04_priv *priv)
val |= PPE_CFG_QOS_VMID_MODE;
writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
 
-   val = RX_BUF_SIZE;
+   val = RX_BUF_SIZE >> PPE_BUF_SIZE_SHIFT;
regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
 
val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
@@ -369,12 +377,18 @@ static void hip04_mac_disable(struct net_device *ndev)
 
 static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
 {
-   writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
+   u32 val;
+
+   val = phys >> PPE_BUF_SIZE_SHIFT | PPE_TX_BUF_HOLD;
+   writel(val, priv->base + PPE_CFG_CPU_ADD_ADDR);
 }
 
 static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
 {
-   regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+   u32 val;
+
+   val = phys >> PPE_BUF_SIZE_SHIFT;
+   regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, val);
 }
 
 static u32 hip04_recv_cnt(struct hip04_priv *priv)
-- 
1.8.5.6



[PATCH 06/10] net: hisilicon: dt-bindings: Add an field of port-handle

2019-07-05 Thread Jiangfeng Xiao
In general, group is the same as the port, but some
boards specify a special group for better load
balancing of each processing unit.

Signed-off-by: Jiangfeng Xiao 
---
 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt 
b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
index d1df8a0..464c0da 100644
--- a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
+++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
@@ -10,6 +10,7 @@ Required properties:
phandle, specifies a reference to the syscon ppe node
port, port number connected to the controller
channel, recv channel start from channel * number (RX_DESC_NUM)
+   group, field in the pkg desc, in general, it is the same as the port.
 - phy-mode: see ethernet.txt [1].
 
 Optional properties:
@@ -66,7 +67,7 @@ Example:
reg = <0x28b 0x1>;
interrupts = <0 413 4>;
phy-mode = "mii";
-   port-handle = < 31 0>;
+   port-handle = < 31 0 31>;
};
 
ge0: ethernet@280 {
@@ -74,7 +75,7 @@ Example:
reg = <0x280 0x1>;
interrupts = <0 402 4>;
phy-mode = "sgmii";
-   port-handle = < 0 1>;
+   port-handle = < 0 1 0>;
phy-handle = <>;
};
 
@@ -83,6 +84,6 @@ Example:
reg = <0x288 0x1>;
interrupts = <0 410 4>;
phy-mode = "sgmii";
-   port-handle = < 8 2>;
+   port-handle = < 8 2 8>;
phy-handle = <>;
};
-- 
1.8.5.6



[PATCH 07/10] net: hisilicon: Add group field to adapt HI13X1_GMAC

2019-07-05 Thread Jiangfeng Xiao
In general, group is the same as the port, but some
boards specify a special group for better load
balancing of each processing unit.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 19d8cfd..5328219 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -178,6 +178,7 @@ struct hip04_priv {
int phy_mode;
int chan;
unsigned int port;
+   unsigned int group;
unsigned int speed;
unsigned int duplex;
unsigned int reg_inten;
@@ -278,10 +279,10 @@ static void hip04_config_fifo(struct hip04_priv *priv)
val |= PPE_CFG_STS_RX_PKT_CNT_RC;
writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
 
-   val = BIT(priv->port);
+   val = BIT(priv->group);
regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
 
-   val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
+   val = priv->group << PPE_CFG_QOS_VMID_GRP_SHIFT;
val |= PPE_CFG_QOS_VMID_MODE;
writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
 
@@ -876,7 +877,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
}
 #endif
 
-   ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, );
+   ret = of_parse_phandle_with_fixed_args(node, "port-handle", 3, 0, );
if (ret < 0) {
dev_warn(d, "no port-handle\n");
goto init_fail;
@@ -884,6 +885,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
 
priv->port = arg.args[0];
priv->chan = arg.args[1] * RX_DESC_NUM;
+   priv->group = arg.args[2];
 
hrtimer_init(>tx_coalesce_timer, CLOCK_MONOTONIC, 
HRTIMER_MODE_REL);
 
-- 
1.8.5.6



[PATCH 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first

2019-07-05 Thread Jiangfeng Xiao
HI13X1_GMAC delete request for soft reset at first,
otherwise, the subsequent initialization will not
take effect.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index fe61b01..19d8cfd 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 
+#define SC_PPE_RESET_DREQ  0x026C
+
 #define PPE_CFG_RX_ADDR0x100
 #define PPE_CFG_POOL_GRP   0x300
 #define PPE_CFG_RX_BUF_SIZE0x400
@@ -61,6 +63,8 @@
 
 #define PPE_HIS_RX_PKT_CNT 0x804
 
+#define RESET_DREQ_ALL 0x
+
 /* REG_INTERRUPT */
 #define RCV_INTBIT(10)
 #define RCV_NOBUF  BIT(8)
@@ -168,6 +172,9 @@ struct rx_desc {
 
 struct hip04_priv {
void __iomem *base;
+#if defined(CONFIG_HI13X1_GMAC)
+   void __iomem *sysctrl_base;
+#endif
int phy_mode;
int chan;
unsigned int port;
@@ -244,6 +251,13 @@ static void hip04_config_port(struct net_device *ndev, u32 
speed, u32 duplex)
writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
 }
 
+static void hip04_reset_dreq(struct hip04_priv *priv)
+{
+#if defined(CONFIG_HI13X1_GMAC)
+   writel_relaxed(RESET_DREQ_ALL, priv->sysctrl_base + SC_PPE_RESET_DREQ);
+#endif
+}
+
 static void hip04_reset_ppe(struct hip04_priv *priv)
 {
u32 val, tmp, timeout = 0;
@@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device *pdev)
goto init_fail;
}
 
+#if defined(CONFIG_HI13X1_GMAC)
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+   priv->sysctrl_base = devm_ioremap_resource(d, res);
+   if (IS_ERR(priv->sysctrl_base)) {
+   ret = PTR_ERR(priv->sysctrl_base);
+   goto init_fail;
+   }
+#endif
+
ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, );
if (ret < 0) {
dev_warn(d, "no port-handle\n");
@@ -921,6 +944,7 @@ static int hip04_mac_probe(struct platform_device *pdev)
ndev->irq = irq;
netif_napi_add(ndev, >napi, hip04_rx_poll, NAPI_POLL_WEIGHT);
 
+   hip04_reset_dreq(priv);
hip04_reset_ppe(priv);
if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
-- 
1.8.5.6



[PATCH 04/10] net: hisilicon: HI13X1_GMAX skip write LOCAL_PAGE_REG

2019-07-05 Thread Jiangfeng Xiao
HI13X1_GMAC changed the offsets and bitmaps for
GE_TX_LOCAL_PAGE_REG registers in the same peripheral
device on different models of the hip04_eth. With the
default configuration, HI13X1_GMAC can also work without
any writes to the GE_TX_LOCAL_PAGE_REG register.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index d8f0619..fe61b01 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -308,8 +308,10 @@ static void hip04_config_fifo(struct hip04_priv *priv)
val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
 
+#ifndef CONFIG_HI13X1_GMAC
val = GE_AUTO_NEG_CTL;
writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+#endif
 }
 
 static void hip04_mac_enable(struct net_device *ndev)
-- 
1.8.5.6



[PATCH 01/10] net: hisilicon: Add support for HI13X1 to hip04_eth

2019-07-05 Thread Jiangfeng Xiao
Extend the hip04_eth driver to support HI13X1_GMAC.
Enable it with CONFIG_HI13X1_GMAC option.

HI13X1 changed the offsets and bitmaps for registers
in the same peripheral device on different models of
the hip04_eth.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/Kconfig | 10 
 drivers/net/ethernet/hisilicon/hip04_eth.c | 37 --
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/Kconfig 
b/drivers/net/ethernet/hisilicon/Kconfig
index a0d780c..3892a20 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -46,6 +46,16 @@ config HIP04_ETH
  If you wish to compile a kernel for a hardware with hisilicon p04 SoC 
and
  want to use the internal ethernet then you should answer Y to this.
 
+config HI13X1_GMAC
+   bool "Hisilicon HI13X1 Network Device Support"
+   depends on HIP04_ETH
+   help
+ If you wish to compile a kernel for a hardware with hisilicon 
hi13x1_gamc
+ then you should answer Y to this. This makes this driver suitable for 
use
+ on certain boards such as the HI13X1.
+
+ If you are unsure, say N.
+
 config HNS_MDIO
tristate
select PHYLIB
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index e1f2978..2b5112b 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -33,10 +33,23 @@
 #define GE_MODE_CHANGE_REG 0x1b4
 #define GE_RECV_CONTROL_REG0x1e0
 #define GE_STATION_MAC_ADDRESS 0x210
-#define PPE_CFG_CPU_ADD_ADDR   0x580
-#define PPE_CFG_MAX_FRAME_LEN_REG  0x408
+
 #define PPE_CFG_BUS_CTRL_REG   0x424
 #define PPE_CFG_RX_CTRL_REG0x428
+
+#if defined(CONFIG_HI13X1_GMAC)
+#define PPE_CFG_CPU_ADD_ADDR   0x6D0
+#define PPE_CFG_MAX_FRAME_LEN_REG  0x500
+#define PPE_CFG_RX_PKT_MODE_REG0x504
+#define PPE_CFG_QOS_VMID_GEN   0x520
+#define PPE_CFG_RX_PKT_INT 0x740
+#define PPE_INTEN  0x700
+#define PPE_INTSTS 0x708
+#define PPE_RINT   0x704
+#define PPE_CFG_STS_MODE   0x880
+#else
+#define PPE_CFG_CPU_ADD_ADDR   0x580
+#define PPE_CFG_MAX_FRAME_LEN_REG  0x408
 #define PPE_CFG_RX_PKT_MODE_REG0x438
 #define PPE_CFG_QOS_VMID_GEN   0x500
 #define PPE_CFG_RX_PKT_INT 0x538
@@ -44,6 +57,8 @@
 #define PPE_INTSTS 0x608
 #define PPE_RINT   0x604
 #define PPE_CFG_STS_MODE   0x700
+#endif /* CONFIG_HI13X1_GMAC */
+
 #define PPE_HIS_RX_PKT_CNT 0x804
 
 /* REG_INTERRUPT */
@@ -93,18 +108,26 @@
 #define GE_RX_PORT_EN  BIT(1)
 #define GE_TX_PORT_EN  BIT(2)
 
-#define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(12)
-
 #define PPE_CFG_RX_PKT_ALIGN   BIT(18)
-#define PPE_CFG_QOS_VMID_MODE  BIT(14)
+
+#if defined(CONFIG_HI13X1_GMAC)
+#define PPE_CFG_QOS_VMID_GRP_SHIFT 4
+#define PPE_CFG_RX_CTRL_ALIGN_SHIFT7
+#define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(0)
+#define PPE_CFG_QOS_VMID_MODE  BIT(15)
+#define PPE_CFG_BUS_LOCAL_REL  (BIT(9) | BIT(15) | BIT(19) | BIT(23))
+#else
 #define PPE_CFG_QOS_VMID_GRP_SHIFT 8
+#define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
+#define PPE_CFG_STS_RX_PKT_CNT_RC  BIT(12)
+#define PPE_CFG_QOS_VMID_MODE  BIT(14)
+#define PPE_CFG_BUS_LOCAL_REL  BIT(14)
+#endif /* CONFIG_HI13X1_GMAC */
 
 #define PPE_CFG_RX_FIFO_FSFU   BIT(11)
 #define PPE_CFG_RX_DEPTH_SHIFT 16
 #define PPE_CFG_RX_START_SHIFT 0
-#define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
 
-#define PPE_CFG_BUS_LOCAL_REL  BIT(14)
 #define PPE_CFG_BUS_BIG_ENDIEN BIT(0)
 
 #define RX_DESC_NUM128
-- 
1.8.5.6



[PATCH 02/10] net: hisilicon: Cleanup for got restricted __be32

2019-07-05 Thread Jiangfeng Xiao
This patch fixes the following warning from sparse:
hip04_eth.c:468:25: warning: incorrect type in assignment
hip04_eth.c:468:25:expected unsigned int [usertype] send_addr
hip04_eth.c:468:25:got restricted __be32 [usertype]
hip04_eth.c:469:25: warning: incorrect type in assignment
hip04_eth.c:469:25:expected unsigned int [usertype] send_size
hip04_eth.c:469:25:got restricted __be32 [usertype]
hip04_eth.c:470:19: warning: incorrect type in assignment
hip04_eth.c:470:19:expected unsigned int [usertype] cfg
hip04_eth.c:470:19:got restricted __be32 [usertype]
hip04_eth.c:472:23: warning: incorrect type in assignment
hip04_eth.c:472:23:expected unsigned int [usertype] wb_addr
hip04_eth.c:472:23:got restricted __be32 [usertype]

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 2b5112b..31f13cf 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -465,11 +465,11 @@ static void hip04_start_tx_timer(struct hip04_priv *priv)
 
priv->tx_skb[tx_head] = skb;
priv->tx_phys[tx_head] = phys;
-   desc->send_addr = cpu_to_be32(phys);
-   desc->send_size = cpu_to_be32(skb->len);
-   desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+   desc->send_addr = (__force u32)cpu_to_be32(phys);
+   desc->send_size = (__force u32)cpu_to_be32(skb->len);
+   desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
-   desc->wb_addr = cpu_to_be32(phys);
+   desc->wb_addr = (__force u32)cpu_to_be32(phys);
skb_tx_timestamp(skb);
 
hip04_set_xmit_desc(priv, phys);
-- 
1.8.5.6



[PATCH 03/10] net: hisilicon: Cleanup for cast to restricted __be32

2019-07-05 Thread Jiangfeng Xiao
This patch fixes the following warning from sparse:
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 31f13cf..d8f0619 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -530,8 +530,8 @@ static int hip04_rx_poll(struct napi_struct *napi, int 
budget)
priv->rx_phys[priv->rx_head] = 0;
 
desc = (struct rx_desc *)skb->data;
-   len = be16_to_cpu(desc->pkt_len);
-   err = be32_to_cpu(desc->pkt_err);
+   len = be16_to_cpu((__force __be16)desc->pkt_len);
+   err = be32_to_cpu((__force __be32)desc->pkt_err);
 
if (0 == len) {
dev_kfree_skb_any(skb);
-- 
1.8.5.6



[PATCH] net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC

2019-07-05 Thread Jiangfeng Xiao
HI13X1 changed the offsets and bitmaps for tx_desc
registers in the same peripheral device on different
models of the hip04_eth.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/net/ethernet/hisilicon/hip04_eth.c | 34 +++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c 
b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 780fc46..6256357 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -76,8 +76,15 @@
 /* TX descriptor config */
 #define TX_FREE_MEMBIT(0)
 #define TX_READ_ALLOC_L3   BIT(1)
-#define TX_FINISH_CACHE_INVBIT(2)
+#if defined(CONFIG_HI13X1_GMAC)
+#define TX_CLEAR_WBBIT(7)
+#define TX_RELEASE_TO_PPE  BIT(4)
+#define TX_FINISH_CACHE_INVBIT(6)
+#define TX_POOL_SHIFT  16
+#else
 #define TX_CLEAR_WBBIT(4)
+#define TX_FINISH_CACHE_INVBIT(2)
+#endif
 #define TX_L3_CHECKSUM BIT(5)
 #define TX_LOOP_BACK   BIT(11)
 
@@ -124,6 +131,7 @@
 /* buf unit size is cache_line_size, which is 64, so the shift is 6 */
 #define PPE_BUF_SIZE_SHIFT 6
 #define PPE_TX_BUF_HOLDBIT(31)
+#define CACHE_LINE_MASK0x3F
 #else
 #define PPE_CFG_QOS_VMID_GRP_SHIFT 8
 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11
@@ -163,11 +171,22 @@
 #define HIP04_MIN_TX_COALESCE_FRAMES   100
 
 struct tx_desc {
+#if defined(CONFIG_HI13X1_GMAC)
+   u32 reserved1[2];
+   u32 send_addr;
+   u16 send_size;
+   u16 data_offset;
+   u32 reserved2[7];
+   u32 cfg;
+   u32 wb_addr;
+   u32 reserved3[3];
+#else
u32 send_addr;
u32 send_size;
u32 next_addr;
u32 cfg;
u32 wb_addr;
+#endif
 } __aligned(64);
 
 struct rx_desc {
@@ -505,11 +524,20 @@ static void hip04_start_tx_timer(struct hip04_priv *priv)
 
priv->tx_skb[tx_head] = skb;
priv->tx_phys[tx_head] = phys;
-   desc->send_addr = (__force u32)cpu_to_be32(phys);
+
desc->send_size = (__force u32)cpu_to_be32(skb->len);
+#if defined(CONFIG_HI13X1_GMAC)
+   desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV
+   | TX_RELEASE_TO_PPE | priv->port << TX_POOL_SHIFT);
+   desc->data_offset = (__force u32)cpu_to_be32(phys & CACHE_LINE_MASK);
+   desc->send_addr =  (__force u32)cpu_to_be32(phys & ~CACHE_LINE_MASK);
+#else
desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+   desc->send_addr = (__force u32)cpu_to_be32(phys);
+#endif
phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
-   desc->wb_addr = (__force u32)cpu_to_be32(phys);
+   desc->wb_addr = (__force u32)cpu_to_be32(phys +
+   offsetof(struct tx_desc, send_addr));
skb_tx_timestamp(skb);
 
hip04_set_xmit_desc(priv, phys);
-- 
1.8.5.6



Re: [PATCH] irqchip/gic: Add dependency for ARM_GIC

2019-06-27 Thread Jiangfeng Xiao



On 2019/6/27 22:20, Marc Zyngier wrote:
> On 27/06/2019 15:11, Jiangfeng Xiao wrote:
>> Not every arch has ARM_GIC, it is strange
>> to see ARM_GIC_MAX_NR in the .config file
>> of the x86 and IA-64 architecture. so fix
>> build by adding necessary dependency.
>>
>> Signed-off-by: Jiangfeng Xiao 
>> ---
>>  drivers/irqchip/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
>> index 659c5e0..e338b471 100644
>> --- a/drivers/irqchip/Kconfig
>> +++ b/drivers/irqchip/Kconfig
>> @@ -19,6 +19,7 @@ config ARM_GIC_PM
>>  
>>  config ARM_GIC_MAX_NR
>>  int
>> +depends on ARM_GIC
>>  default 2 if ARCH_REALVIEW
>>  default 1
>>  
>>
> 
> Isn't that the patch[1] that has already been in -next for the past 10 
> days or so?
> 
> Thanks,
> 
>   M.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next=702655234dd1d07e76f9bff9575e438c969fc32c
> 

I am very sorry. I thought that this patch was not reviewed,
so I pushed it again. I asked my colleagues how to get information
about whether the patch is included.

Will not make this low-level mistake again.

Thanks.



[PATCH] irqchip/gic: Add dependency for ARM_GIC

2019-06-27 Thread Jiangfeng Xiao
Not every arch has ARM_GIC, it is strange
to see ARM_GIC_MAX_NR in the .config file
of the x86 and IA-64 architecture. so fix
build by adding necessary dependency.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/irqchip/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 659c5e0..e338b471 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -19,6 +19,7 @@ config ARM_GIC_PM
 
 config ARM_GIC_MAX_NR
int
+   depends on ARM_GIC
default 2 if ARCH_REALVIEW
default 1
 
-- 
1.8.5.6



[PATCH] irqchip/gic: Add dependency for ARM_GIC_MAX_NR

2019-06-14 Thread Jiangfeng Xiao
CONFIG_ARM_GIC_MAX_NR is enabled by default.
It is redundant in x86 and IA-64 where is
without GIC.

Signed-off-by: Jiangfeng Xiao 
---
 drivers/irqchip/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 659c5e0..e338b471 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -19,6 +19,7 @@ config ARM_GIC_PM
 
 config ARM_GIC_MAX_NR
int
+   depends on ARM_GIC
default 2 if ARCH_REALVIEW
default 1
 
-- 
1.8.5.6



[PATCH] PCI/hotplug: fix potential null pointer deference

2019-06-14 Thread Young Xiao
There is otherwise a risk of a null pointer dereference.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/pci/hotplug/cpqphp_ctrl.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/hotplug/cpqphp_ctrl.c 
b/drivers/pci/hotplug/cpqphp_ctrl.c
index b7f4e1f..3c8399f 100644
--- a/drivers/pci/hotplug/cpqphp_ctrl.c
+++ b/drivers/pci/hotplug/cpqphp_ctrl.c
@@ -598,10 +598,11 @@ static struct pci_resource *get_io_resource(struct 
pci_resource **head, u32 size
*head = node->next;
} else {
prevnode = *head;
-   while (prevnode->next != node)
+   while (prevnode && prevnode->next != node)
prevnode = prevnode->next;
 
-   prevnode->next = node->next;
+   if (prevnode)
+   prevnode->next = node->next;
}
node->next = NULL;
break;
@@ -788,10 +789,11 @@ static struct pci_resource *get_resource(struct 
pci_resource **head, u32 size)
*head = node->next;
} else {
prevnode = *head;
-   while (prevnode->next != node)
+   while (prevnode && prevnode->next != node)
prevnode = prevnode->next;
 
-   prevnode->next = node->next;
+   if (prevnode)
+   prevnode->next = node->next;
}
node->next = NULL;
break;
-- 
2.7.4



[PATCH] drm/amd: fix hotplug race at startup

2019-06-14 Thread Young Xiao
We should check mode_config_initialized flag in amdgpu_hotplug_work_func.

See commit 7f98ca454ad3 ("drm/radeon: fix hotplug race at startup") for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index af4c3b1..13186d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -85,6 +85,9 @@ static void amdgpu_hotplug_work_func(struct work_struct *work)
struct drm_mode_config *mode_config = >mode_config;
struct drm_connector *connector;
 
+   if (!adev->mode_info.mode_config_initialized)
+   return;
+
mutex_lock(_config->mutex);
list_for_each_entry(connector, _config->connector_list, head)
amdgpu_connector_hotplug(connector);
-- 
2.7.4



[PATCH] scsi: bnx2fc: Check if sense buffer has been allocated during completion

2019-06-14 Thread Young Xiao
sc_cmd->sense_buffer is not guaranteed to be allocated so we need to
sc_cmd->check if the pointer is NULL before trying to copy anything into it.

See commit 16a611154dc1 ("scsi: qedf: Check if sense buffer has been allocated
during completion") for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/scsi/bnx2fc/bnx2fc_io.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_io.c b/drivers/scsi/bnx2fc/bnx2fc_io.c
index 8def63c..44a5e59 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_io.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_io.c
@@ -1789,9 +1789,11 @@ static void bnx2fc_parse_fcp_rsp(struct bnx2fc_cmd 
*io_req,
fcp_sns_len = SCSI_SENSE_BUFFERSIZE;
}
 
-   memset(sc_cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
-   if (fcp_sns_len)
-   memcpy(sc_cmd->sense_buffer, rq_data, fcp_sns_len);
+   if (sc_cmd->sense_buffer) {
+   memset(sc_cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+   if (fcp_sns_len)
+   memcpy(sc_cmd->sense_buffer, rq_data, 
fcp_sns_len);
+   }
 
/* return RQ entries */
for (i = 0; i < num_rq; i++)
-- 
2.7.4



[PATCH] af_key: Fix memory leak in key_notify_policy.

2019-06-14 Thread Young Xiao
We leak the allocated out_skb in case pfkey_xfrm_policy2msg() fails.
Fix this by freeing it on error.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 net/key/af_key.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 4af1e1d..ec414f6 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -2443,6 +2443,7 @@ static int key_pol_get_resp(struct sock *sk, struct 
xfrm_policy *xp, const struc
}
err = pfkey_xfrm_policy2msg(out_skb, xp, dir);
if (err < 0)
+   kfree_skb(out_skb);
goto out;
 
out_hdr = (struct sadb_msg *) out_skb->data;
@@ -2695,6 +2696,7 @@ static int dump_sp(struct xfrm_policy *xp, int dir, int 
count, void *ptr)
 
err = pfkey_xfrm_policy2msg(out_skb, xp, dir);
if (err < 0)
+   kfree_skb(out_skb);
return err;
 
out_hdr = (struct sadb_msg *) out_skb->data;
-- 
2.7.4



[PATCH] nfc: Ensure presence of required attributes in the deactivate_target handler

2019-06-14 Thread Young Xiao
Check that the NFC_ATTR_TARGET_INDEX attributes (in addition to
NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to
accessing them. This prevents potential unhandled NULL pointer dereference
exceptions which can be triggered by malicious user-mode programs,
if they omit one or both of these attributes.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 net/nfc/netlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c
index 04a8e47..89d885d 100644
--- a/net/nfc/netlink.c
+++ b/net/nfc/netlink.c
@@ -923,7 +923,8 @@ static int nfc_genl_deactivate_target(struct sk_buff *skb,
u32 device_idx, target_idx;
int rc;
 
-   if (!info->attrs[NFC_ATTR_DEVICE_INDEX])
+   if (!info->attrs[NFC_ATTR_DEVICE_INDEX] ||
+   !info->attrs[NFC_ATTR_TARGET_INDEX])
return -EINVAL;
 
device_idx = nla_get_u32(info->attrs[NFC_ATTR_DEVICE_INDEX]);
-- 
2.7.4



Re: [PATCH v3 1/9] Documentation: Introduce EPT based Subpage Protection

2019-06-09 Thread Jidong Xiao
On Fri, Jun 7, 2019 at 7:12 AM Yang Weijiang  wrote:
>
> On Thu, Jun 06, 2019 at 09:57:00PM -0600, Jidong Xiao wrote:
> > Hi, Weijiang,
> >
> > Does this require some specific Intel processors or is it supported by
> > older processors as well?
> >
> > -Jidong
> Hi, Jidong,
> SPP is a feature on new platforms, so only available with new
> Intel processors.
Oh, I see. Thanks!

-Jidong

> >
> > On Thu, Jun 6, 2019 at 9:33 AM Yang Weijiang  
> > wrote:
> > >
> > > Signed-off-by: Yang Weijiang 
> > > ---
> > >  Documentation/virtual/kvm/spp_kvm.txt | 216 ++
> > >  1 file changed, 216 insertions(+)
> > >  create mode 100644 Documentation/virtual/kvm/spp_kvm.txt
> > >
> > > diff --git a/Documentation/virtual/kvm/spp_kvm.txt 
> > > b/Documentation/virtual/kvm/spp_kvm.txt
> > > new file mode 100644
> > > index ..4b5edcaf48b6
> > > --- /dev/null
> > > +++ b/Documentation/virtual/kvm/spp_kvm.txt
> > > @@ -0,0 +1,216 @@
> > > +EPT-Based Sub-Page Protection (SPP) for KVM
> > > +=
> > > +
> > > +1. Overview
> > > +
> > > +EPT-based Sub-Page Protection (SPP) capability to allow Virtual Machine
> > > +Monitors to specify write-protection for guest physical memory at a
> > > +sub-page (128 byte) granularity. When this capability is utilized, the
> > > +CPU enforces write-access permissions for sub-page regions inside 4K 
> > > pages
> > > +as specified by the VMI tools.
> > > +
> > > +2. Operation of SPP
> > > +
> > > +Sub-Page Protection Table (SPPT) is introduced to manage sub-page
> > > +write-access.
> > > +
> > > +SPPT is active when:
> > > +a) moddule parameter spp=on is configured for kvm-intel.ko
> > > +b) large paging is disabled on host
> > > +c) "sub-page write protection" VM-execution control bit is set
> > > +SPPT looks up the guest physical address to seek a 64-bit
> > > +bitmap indicating sub-page write permission in SPPT leaf entry.
> > > +
> > > +When the "sub-page write protection" VM-execution control is 1, the SPPT
> > > +is used to lookup write permission bits for the 128 byte sub-page regions
> > > +contained in the 4KB guest physical page. EPT specifies the 4KB page
> > > +write-protection privilege whereas SPPT defines the write permissions
> > > +at 128-byte granularity within one 4KB page. Write accesses
> > > +prevented due to sub-page permissions induces EPT violation VM exits.
> > > +Similar to EPT, a logical processor uses SPPT to lookup sub-page level
> > > +write permissions for guest-physical addresses only when those addresses
> > > +are used to access memory.
> > > +__
> > > +
> > > +How SPP hardware works:
> > > +__
> > > +
> > > +Guest write access --> GPA --> Walk EPT --> EPT leaf entry -|
> > > +|---|
> > > +|-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
> > > + |
> > > + |->  --> EPT legacy behavior
> > > + |
> > > + |
> > > + |->   --> if ept_leaf_entry.writable
> > > +  |
> > > +  |->   --> Ignore SPP
> > > +  |
> > > +  |->  --> GPA --> Walk SPP 4-level table--|
> > > +  |
> > > +|<--get-the-SPPT-point-from-VMCS-filed-<--|
> > > +|
> > > +Walk SPP L4E table
> > > +|
> > > +|---> if-entry-misconfiguration >---|---<-|
> > > + |  | |
> > > +else| |
> > > + |  | |
> > > + |   |--SPP VMexit<-| |
> > > + |   ||
> > > + |   |

Re: [PATCH v3 1/9] Documentation: Introduce EPT based Subpage Protection

2019-06-06 Thread Jidong Xiao
Hi, Weijiang,

Does this require some specific Intel processors or is it supported by
older processors as well?

-Jidong

On Thu, Jun 6, 2019 at 9:33 AM Yang Weijiang  wrote:
>
> Signed-off-by: Yang Weijiang 
> ---
>  Documentation/virtual/kvm/spp_kvm.txt | 216 ++
>  1 file changed, 216 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/spp_kvm.txt
>
> diff --git a/Documentation/virtual/kvm/spp_kvm.txt 
> b/Documentation/virtual/kvm/spp_kvm.txt
> new file mode 100644
> index ..4b5edcaf48b6
> --- /dev/null
> +++ b/Documentation/virtual/kvm/spp_kvm.txt
> @@ -0,0 +1,216 @@
> +EPT-Based Sub-Page Protection (SPP) for KVM
> +=
> +
> +1. Overview
> +
> +EPT-based Sub-Page Protection (SPP) capability to allow Virtual Machine
> +Monitors to specify write-protection for guest physical memory at a
> +sub-page (128 byte) granularity. When this capability is utilized, the
> +CPU enforces write-access permissions for sub-page regions inside 4K pages
> +as specified by the VMI tools.
> +
> +2. Operation of SPP
> +
> +Sub-Page Protection Table (SPPT) is introduced to manage sub-page
> +write-access.
> +
> +SPPT is active when:
> +a) moddule parameter spp=on is configured for kvm-intel.ko
> +b) large paging is disabled on host
> +c) "sub-page write protection" VM-execution control bit is set
> +SPPT looks up the guest physical address to seek a 64-bit
> +bitmap indicating sub-page write permission in SPPT leaf entry.
> +
> +When the "sub-page write protection" VM-execution control is 1, the SPPT
> +is used to lookup write permission bits for the 128 byte sub-page regions
> +contained in the 4KB guest physical page. EPT specifies the 4KB page
> +write-protection privilege whereas SPPT defines the write permissions
> +at 128-byte granularity within one 4KB page. Write accesses
> +prevented due to sub-page permissions induces EPT violation VM exits.
> +Similar to EPT, a logical processor uses SPPT to lookup sub-page level
> +write permissions for guest-physical addresses only when those addresses
> +are used to access memory.
> +__
> +
> +How SPP hardware works:
> +__
> +
> +Guest write access --> GPA --> Walk EPT --> EPT leaf entry -|
> +|---|
> +|-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
> + |
> + |->  --> EPT legacy behavior
> + |
> + |
> + |->   --> if ept_leaf_entry.writable
> +  |
> +  |->   --> Ignore SPP
> +  |
> +  |->  --> GPA --> Walk SPP 4-level table--|
> +  |
> +|<--get-the-SPPT-point-from-VMCS-filed-<--|
> +|
> +Walk SPP L4E table
> +|
> +|---> if-entry-misconfiguration >---|---<-|
> + |  | |
> +else| |
> + |  | |
> + |   |--SPP VMexit<-| |
> + |   ||
> + |   |-> exit_qualification & sppt_misconfig --> sppt misconfig   |
> + |   ||
> + |   |-> exit_qualification & sppt_miss --> sppt miss |
> + |---||
> + ||
> +walk SPPT L3E--|--> if-entry-misconfiguration>|
> +   |  |
> +  else|
> +   |  |
> +   |  |
> +walk SPPT L2E --|--> if-entry-misconfiguration>---|
> +| |
> +   else   |
> +| |
> +| |
> + walk SPPT L1E --|-> if-entry-misconfiguration--->|
> + |
> +   else
> + |
> + |-> if sub-page writable
> + |->   allow, write access
> + |->  disallow, EPT violation
> 

Re: [PATCH 3/3] rpmsg: virtio_rpmsg_bus: get buffer size from config space

2019-06-05 Thread xiang xiao
On Wed, Jun 5, 2019 at 4:02 PM Arnaud Pouliquen  wrote:
>
>
>
> On 6/5/19 4:40 AM, xiang xiao wrote:
> > On Tue, Jun 4, 2019 at 10:25 PM Arnaud Pouliquen
> >  wrote:
> >>
> >> Hello Xiang,
> >>
> >> On 5/9/19 3:00 PM, xiang xiao wrote:
> >>> On Thu, May 9, 2019 at 8:36 PM Arnaud Pouliquen  
> >>> wrote:
> >>>>
> >>>> Hello Xiang,
> >>>>
> >>>> Similar mechanism has been proposed by Loic 2 years ago (link to the
> >>>> series here https://lkml.org/lkml/2017/3/28/349).
> >>>>
> >>>> Did you see them? Regarding history, patches seem just on hold...
> >>>>
> >>>
> >>> Just saw this patchset, so it's common problem hit by many vendor,
> >>> rpmsg framework need to address it.:)
> >>>
> >>>> Main differences (except interesting RX/TX size split) seems that you
> >>>> - don't use the virtio_config_ops->get
> >>>
> >>> virtio_cread call virtio_config_ops->get internally, the ideal is same
> >>> for both patch, just the implementation detail is different.
> >>>
> >>>> - define a new feature VIRTIO_RPMSG_F_NS.
> >>>
> >>> I add this flag to keep the compatibility with old remote peer, and
> >>> also follow the common virito driver practice.
> >> I discussed with Loic, he is ok to go further with your patch and
> >> abandon his one. Please find some remarks below in-line
> >>>
> >>>>
> >>>> Regards
> >>>> Arnaud
> >>>>
> >>>>
> >>>> On 1/31/19 4:41 PM, Xiang Xiao wrote:
> >>>>> 512 bytes isn't always suitable for all case, let firmware
> >>>>> maker decide the best value from resource table.
> >>>>> enable by VIRTIO_RPMSG_F_BUFSZ feature bit.
> >>>>>
> >>>>> Signed-off-by: Xiang Xiao 
> >>>>> ---
> >>>>>  drivers/rpmsg/virtio_rpmsg_bus.c  | 50 
> >>>>> +--
> >>>>>  include/uapi/linux/virtio_rpmsg.h | 24 +++
> >>>>>  2 files changed, 56 insertions(+), 18 deletions(-)
> >>>>>  create mode 100644 include/uapi/linux/virtio_rpmsg.h
> >>>>>
> >>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c 
> >>>>> b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>> index 59c4554..049dd97 100644
> >>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>>>> @@ -16,6 +16,7 @@
> >>>>>  #include 
> >>>>>  #include 
> >>>>>  #include 
> >>>>> +#include 
> >>>>>  #include 
> >>>>>  #include 
> >>>>>  #include 
> >>>>> @@ -38,7 +39,8 @@
> >>>>>   * @sbufs:   kernel address of tx buffers
> >>>>>   * @num_rbufs:   total number of buffers for rx
> >>>>>   * @num_sbufs:   total number of buffers for tx
> >>>>> - * @buf_size:size of one rx or tx buffer
> >>>>> + * @rbuf_size:   size of one rx buffer
> >>>>> + * @sbuf_size:   size of one tx buffer
> >>>>>   * @last_sbuf:   index of last tx buffer used
> >>>>>   * @rbufs_dma:   dma base addr of rx buffers
> >>>>>   * @sbufs_dma:   dma base addr of tx buffers
> >>>>> @@ -61,7 +63,8 @@ struct virtproc_info {
> >>>>>   void *rbufs, *sbufs;
> >>>>>   unsigned int num_rbufs;
> >>>>>   unsigned int num_sbufs;
> >>>>> - unsigned int buf_size;
> >>>>> + unsigned int rbuf_size;
> >>>>> + unsigned int sbuf_size;
> >>>>>   int last_sbuf;
> >>>>>   dma_addr_t rbufs_dma;
> >>>>>   dma_addr_t sbufs_dma;
> >>>>> @@ -73,9 +76,6 @@ struct virtproc_info {
> >>>>>   struct rpmsg_endpoint *ns_ept;
> >>>>>  };
> >>>>>
> >>>>> -/* The feature bitmap for virtio rpmsg */
> >>>>> -#define VIRTIO_RPMSG_F_NS0 /* RP supports name service 
> >>>>> notifications */
> >>>>> -
> >>>>>  /**
> >>>>>   *

Re: [PATCH 0/3] Enhance virtio rpmsg bus driver buffer allocation

2019-06-05 Thread xiang xiao
On Wed, Jun 5, 2019 at 3:33 PM Arnaud Pouliquen  wrote:
>
> Hi Bjorn,
>
> On 6/5/19 6:34 AM, Bjorn Andersson wrote:
> > On Thu 31 Jan 07:41 PST 2019, Xiang Xiao wrote:
> >
> >> Hi,
> >> This series enhance the buffer allocation by:
> >> 1.Support the different buffer number in rx/tx direction
> >> 2.Get the individual rx/tx buffer size from config space
> >>
> >> Here is the related OpenAMP change:
> >> https://github.com/OpenAMP/open-amp/pull/155
> >>
> >
> > This looks pretty reasonable, but can you confirm that it's possible to
> > use new firmware with an old Linux kernel when introducing this?
> >
> >
> > But ever since we discussed Loic's similar proposal earlier I've been
> > questioning if the fixed buffer size isn't just an artifact of how we
> > preallocate our buffers. The virtqueue seems to support arbitrary sizes
> > of buffers and I see that the receive function in OpenAMP has been fixed
> > to put back the buffer of the size that was received, rather than 512
> > bytes. So it seems like Linux would be able to send whatever size
> > messages to OpenAMP it would handle it.
> >
> > The question is if we could do the same in the other direction, perhaps
> > by letting the OpenAMP side do it's message allocation when it's
> > sending, rather than Linux pushing inbufs to be filled by the remote.
>
> IMHO, both could be useful and could be not correlated.
> On-the fly buffer allocation seems more efficient but needs an
> allocator.This can be a generic allocator (with a va to da) for system
> where large amount of memories are accessible from both side.
>
> Now what about system with small shared memory? In this case you have to
> deal with a limited/optimized memory chunk. To avoid memory
> fragmentation the allocator should have a pre-reserved buffers pool(so
> similar to existing implementation). This serie seems useful to optimize
> the size of the pre-reserved pool.
>

Maybe we can reuse rxbuf_size/txbuf_size to trigger the different
allocation policy:
1.If buf_size equal 0xfff, turn on the dynamic allocator
2.If buf_size equall 0, turn on the fixed allocator with the default buffer size
3.otherwise, turn on the fixed allocator with the configed buffer size
So, both requirement could be satisfied without breaking the compatibility.

> >
> > This would remove the problem of always having suboptimal buffer sizes.
> >
> > Regards,
> > Bjorn
> >
> >> Xiang Xiao (3):
> >>   rpmsg: virtio_rpmsg_bus: allow the different vring size for send/recv
> >>   rpmsg: virtio_rpmsg_bus: allocate rx/tx buffer separately
> >>   rpmsg: virtio_rpmsg_bus: get buffer size from config space
> >>
> >>  drivers/rpmsg/virtio_rpmsg_bus.c  | 127 
> >> +++---
> >>  include/uapi/linux/virtio_rpmsg.h |  24 +++
> >>  2 files changed, 100 insertions(+), 51 deletions(-)
> >>  create mode 100644 include/uapi/linux/virtio_rpmsg.h
> >>
> >> --
> >> 2.7.4
> >>
>
> --
>
> Regards,
> Arnaud


Re: [PATCH 3/3] rpmsg: virtio_rpmsg_bus: get buffer size from config space

2019-06-04 Thread xiang xiao
On Tue, Jun 4, 2019 at 10:25 PM Arnaud Pouliquen
 wrote:
>
> Hello Xiang,
>
> On 5/9/19 3:00 PM, xiang xiao wrote:
> > On Thu, May 9, 2019 at 8:36 PM Arnaud Pouliquen  
> > wrote:
> >>
> >> Hello Xiang,
> >>
> >> Similar mechanism has been proposed by Loic 2 years ago (link to the
> >> series here https://lkml.org/lkml/2017/3/28/349).
> >>
> >> Did you see them? Regarding history, patches seem just on hold...
> >>
> >
> > Just saw this patchset, so it's common problem hit by many vendor,
> > rpmsg framework need to address it.:)
> >
> >> Main differences (except interesting RX/TX size split) seems that you
> >> - don't use the virtio_config_ops->get
> >
> > virtio_cread call virtio_config_ops->get internally, the ideal is same
> > for both patch, just the implementation detail is different.
> >
> >> - define a new feature VIRTIO_RPMSG_F_NS.
> >
> > I add this flag to keep the compatibility with old remote peer, and
> > also follow the common virito driver practice.
> I discussed with Loic, he is ok to go further with your patch and
> abandon his one. Please find some remarks below in-line
> >
> >>
> >> Regards
> >> Arnaud
> >>
> >>
> >> On 1/31/19 4:41 PM, Xiang Xiao wrote:
> >>> 512 bytes isn't always suitable for all case, let firmware
> >>> maker decide the best value from resource table.
> >>> enable by VIRTIO_RPMSG_F_BUFSZ feature bit.
> >>>
> >>> Signed-off-by: Xiang Xiao 
> >>> ---
> >>>  drivers/rpmsg/virtio_rpmsg_bus.c  | 50 
> >>> +--
> >>>  include/uapi/linux/virtio_rpmsg.h | 24 +++
> >>>  2 files changed, 56 insertions(+), 18 deletions(-)
> >>>  create mode 100644 include/uapi/linux/virtio_rpmsg.h
> >>>
> >>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c 
> >>> b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> index 59c4554..049dd97 100644
> >>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c
> >>> @@ -16,6 +16,7 @@
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>> +#include 
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>> @@ -38,7 +39,8 @@
> >>>   * @sbufs:   kernel address of tx buffers
> >>>   * @num_rbufs:   total number of buffers for rx
> >>>   * @num_sbufs:   total number of buffers for tx
> >>> - * @buf_size:size of one rx or tx buffer
> >>> + * @rbuf_size:   size of one rx buffer
> >>> + * @sbuf_size:   size of one tx buffer
> >>>   * @last_sbuf:   index of last tx buffer used
> >>>   * @rbufs_dma:   dma base addr of rx buffers
> >>>   * @sbufs_dma:   dma base addr of tx buffers
> >>> @@ -61,7 +63,8 @@ struct virtproc_info {
> >>>   void *rbufs, *sbufs;
> >>>   unsigned int num_rbufs;
> >>>   unsigned int num_sbufs;
> >>> - unsigned int buf_size;
> >>> + unsigned int rbuf_size;
> >>> + unsigned int sbuf_size;
> >>>   int last_sbuf;
> >>>   dma_addr_t rbufs_dma;
> >>>   dma_addr_t sbufs_dma;
> >>> @@ -73,9 +76,6 @@ struct virtproc_info {
> >>>   struct rpmsg_endpoint *ns_ept;
> >>>  };
> >>>
> >>> -/* The feature bitmap for virtio rpmsg */
> >>> -#define VIRTIO_RPMSG_F_NS0 /* RP supports name service notifications 
> >>> */
> >>> -
> >>>  /**
> >>>   * struct rpmsg_hdr - common header for all rpmsg messages
> >>>   * @src: source address
> >>> @@ -452,7 +452,7 @@ static void *get_a_tx_buf(struct virtproc_info *vrp)
> >>>
> >>>   /* either pick the next unused tx buffer */
> >>>   if (vrp->last_sbuf < vrp->num_sbufs)
> >>> - ret = vrp->sbufs + vrp->buf_size * vrp->last_sbuf++;
> >>> + ret = vrp->sbufs + vrp->sbuf_size * vrp->last_sbuf++;
> >>>   /* or recycle a used one */
> >>>   else
> >>>   ret = virtqueue_get_buf(vrp->svq, );
> >>> @@ -578,7 +578,7 @@ static int rpmsg_send_offchannel_raw(struct 
> >>> rpmsg_device *rpdev,
> >>>* messaging), or t

Re: [PATCH] net: compat: fix msg_controllen overflow in scm_detach_fds_compat()

2019-06-04 Thread Yang Xiao
On Tue, Jun 4, 2019 at 8:46 PM Daniel Borkmann  wrote:
>
> On 06/04/2019 02:31 PM, Young Xiao wrote:
> > There is a missing check between kmsg->msg_controllen and cmlen,
> > which can possibly lead to overflow.
> >
> > This bug is similar to vulnerability that was fixed in commit 6900317f5eff
> > ("net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds").
>
> Back then I mentioned in commit 6900317f5eff:
>
> In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
> issue in my tests as alignment seems always on 4 byte boundary. Same
> should be in case of native 32 bit, where we end up with 4 byte boundaries
> as well.
>
> Do you have an actual reproducer or is it based on code inspection?

based on inspection.
>
> > Signed-off-by: Young Xiao <92siuy...@gmail.com>
> > ---
> >  net/compat.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/compat.c b/net/compat.c
> > index a031bd3..8e74dfb 100644
> > --- a/net/compat.c
> > +++ b/net/compat.c
> > @@ -301,6 +301,8 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct 
> > scm_cookie *scm)
> >   err = put_user(cmlen, >cmsg_len);
> >   if (!err) {
> >   cmlen = CMSG_COMPAT_SPACE(i * sizeof(int));
> > + if (kmsg->msg_controllen < cmlen)
> > + cmlen = kmsg->msg_controllen;
> >   kmsg->msg_control += cmlen;
> >   kmsg->msg_controllen -= cmlen;
> >   }
> >
>


-- 
Best regards!

Young
---


[PATCH] iio:core: Fix bug in length of event info_mask and catch unhandled bits set in masks.

2019-06-04 Thread Young Xiao
The incorrect limit for the for_each_set_bit loop was noticed whilst fixing
this other case.  Note that as we only have 3 possible entries a the moment
and the value was set to 4, the bug would not have any effect currently.
It will bite fairly soon though, so best fix it now.

See commit ef4b4856593f ("iio:core: Fix bug in length of event info_mask and
catch unhandled bits set in masks.") for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/iio/industrialio-core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index f5a4581..dd8873a 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1107,6 +1107,8 @@ static int iio_device_add_info_mask_type_avail(struct 
iio_dev *indio_dev,
char *avail_postfix;
 
for_each_set_bit(i, infomask, sizeof(*infomask) * 8) {
+   if (i >= ARRAY_SIZE(iio_chan_info_postfix))
+   return -EINVAL;
avail_postfix = kasprintf(GFP_KERNEL,
  "%s_available",
  iio_chan_info_postfix[i]);
-- 
2.7.4



[PATCH] net: compat: fix msg_controllen overflow in scm_detach_fds_compat()

2019-06-04 Thread Young Xiao
There is a missing check between kmsg->msg_controllen and cmlen,
which can possibly lead to overflow.

This bug is similar to vulnerability that was fixed in commit 6900317f5eff
("net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds").

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 net/compat.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/compat.c b/net/compat.c
index a031bd3..8e74dfb 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -301,6 +301,8 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct 
scm_cookie *scm)
err = put_user(cmlen, >cmsg_len);
if (!err) {
cmlen = CMSG_COMPAT_SPACE(i * sizeof(int));
+   if (kmsg->msg_controllen < cmlen)
+   cmlen = kmsg->msg_controllen;
kmsg->msg_control += cmlen;
kmsg->msg_controllen -= cmlen;
}
-- 
2.7.4



[PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()

2019-06-04 Thread Young Xiao
If vpif_probe() fails on v4l2_device_register() and vpif_probe_complete(),
then memory allocated at initialize_vpif() for global vpif_obj.dev[i]
become unreleased.

The patch adds deallocation of vpif_obj.dev[i] on the error path.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/media/platform/davinci/vpif_capture.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/davinci/vpif_capture.c 
b/drivers/media/platform/davinci/vpif_capture.c
index b5aacb0..75c2c10 100644
--- a/drivers/media/platform/davinci/vpif_capture.c
+++ b/drivers/media/platform/davinci/vpif_capture.c
@@ -1385,6 +1385,14 @@ static int initialize_vpif(void)
return err;
 }
 
+static inline void free_vpif_objs(void)
+{
+   int i;
+
+   for (i = 0; i < VPIF_CAPTURE_MAX_DEVICES; i++)
+   kfree(vpif_obj.dev[i]);
+}
+
 static int vpif_async_bound(struct v4l2_async_notifier *notifier,
struct v4l2_subdev *subdev,
struct v4l2_async_subdev *asd)
@@ -1654,7 +1662,7 @@ static __init int vpif_probe(struct platform_device *pdev)
err = v4l2_device_register(vpif_dev, _obj.v4l2_dev);
if (err) {
v4l2_err(vpif_dev->driver, "Error registering v4l2 device\n");
-   goto cleanup;
+   goto vpif_free;
}
 
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, res_idx))) {
@@ -1701,7 +1709,9 @@ static __init int vpif_probe(struct platform_device *pdev)
  "registered sub device %s\n",
   subdevdata->name);
}
-   vpif_probe_complete();
+   err = vpif_probe_complete();
+   if (err)
+   goto probe_subdev_out;
} else {
vpif_obj.notifier.ops = _async_ops;
err = v4l2_async_notifier_register(_obj.v4l2_dev,
@@ -1720,6 +1730,8 @@ static __init int vpif_probe(struct platform_device *pdev)
kfree(vpif_obj.sd);
 vpif_unregister:
v4l2_device_unregister(_obj.v4l2_dev);
+vpif_free:
+   free_vpif_objs();
 cleanup:
v4l2_async_notifier_cleanup(_obj.notifier);
 
-- 
2.7.4



Re: [PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()

2019-06-04 Thread Yang Xiao
On Tue, Jun 4, 2019 at 4:15 PM Lad, Prabhakar
 wrote:
>
> Hi Young,
>
> Thanks for the patch.
>
> On Tue, Jun 4, 2019 at 8:49 AM Young Xiao <92siuy...@gmail.com> wrote:
> >
> > If vpif_probe() fails on v4l2_device_register() and vpif_probe_complete(),
> > then memory allocated at initialize_vpif() for global vpif_obj.dev[i]
> > become unreleased.
> >
> > The patch adds deallocation of vpif_obj.dev[i] on the error path.
> >
> > Signed-off-by: Young Xiao <92siuy...@gmail.com>
> > ---
> >  drivers/media/platform/davinci/vpif_capture.c | 19 ---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/media/platform/davinci/vpif_capture.c 
> > b/drivers/media/platform/davinci/vpif_capture.c
> > index b5aacb0..277d500 100644
> > --- a/drivers/media/platform/davinci/vpif_capture.c
> > +++ b/drivers/media/platform/davinci/vpif_capture.c
> > @@ -1385,6 +1385,14 @@ static int initialize_vpif(void)
> > return err;
> >  }
> >
> > +static void free_vpif_objs(void)
> > +{
> function could be made inline.
>
> > +   int i;
> > +
> > +   for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++)
>
> VPIF_DISPLAY_MAX_DEVICES ? this should be  VPIF_CAPTURE_MAX_DEVICES
>
> > +   kfree(vpif_obj.dev[i]);
> > +}
> > +
> >  static int vpif_async_bound(struct v4l2_async_notifier *notifier,
> > struct v4l2_subdev *subdev,
> > struct v4l2_async_subdev *asd)
> > @@ -1654,7 +1662,7 @@ static __init int vpif_probe(struct platform_device 
> > *pdev)
> > err = v4l2_device_register(vpif_dev, _obj.v4l2_dev);
> > if (err) {
> > v4l2_err(vpif_dev->driver, "Error registering v4l2 
> > device\n");
> > -   goto cleanup;
> > +   goto vpif_free;
> > }
> >
> > while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, 
> > res_idx))) {
> > @@ -1701,7 +1709,10 @@ static __init int vpif_probe(struct platform_device 
> > *pdev)
> >   "registered sub device %s\n",
> >subdevdata->name);
> > }
> > -   vpif_probe_complete();
> > +   err = vpif_probe_complete();
> > +   if (err) {
> > +   goto probe_subdev_out;
> > +   }
>
> No need for { and } as per kernel coding style.

Sorry, I can not get your point here.
There is no need to check the return value of vpif_probe_complete(), isn't it?
So, we just fix the memory leak in the error path of v4l2_device_register()?

>
> > } else {
> > vpif_obj.notifier.ops = _async_ops;
> > err = v4l2_async_notifier_register(_obj.v4l2_dev,
> > @@ -1720,6 +1731,8 @@ static __init int vpif_probe(struct platform_device 
> > *pdev)
> > kfree(vpif_obj.sd);
> >  vpif_unregister:
> > v4l2_device_unregister(_obj.v4l2_dev);
> > +vpif_free:
> > +   free_vpif_objs();
> >  cleanup:
> > v4l2_async_notifier_cleanup(_obj.notifier);
> >
> > @@ -1748,8 +1761,8 @@ static int vpif_remove(struct platform_device *device)
> > ch = vpif_obj.dev[i];
> > /* Unregister video device */
> > video_unregister_device(>video_dev);
> > -   kfree(vpif_obj.dev[i]);
> > }
> > +   free_vpif_objs();
>
> no need for this change, leave it as it is.
>
> Cheers,
> Prabhakar Lad



-- 
Best regards!

Young
---


[PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()

2019-06-04 Thread Young Xiao
If vpif_probe() fails on v4l2_device_register() and vpif_probe_complete(),
then memory allocated at initialize_vpif() for global vpif_obj.dev[i]
become unreleased.

The patch adds deallocation of vpif_obj.dev[i] on the error path.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/media/platform/davinci/vpif_capture.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/media/platform/davinci/vpif_capture.c 
b/drivers/media/platform/davinci/vpif_capture.c
index b5aacb0..277d500 100644
--- a/drivers/media/platform/davinci/vpif_capture.c
+++ b/drivers/media/platform/davinci/vpif_capture.c
@@ -1385,6 +1385,14 @@ static int initialize_vpif(void)
return err;
 }
 
+static void free_vpif_objs(void)
+{
+   int i;
+
+   for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++)
+   kfree(vpif_obj.dev[i]);
+}
+
 static int vpif_async_bound(struct v4l2_async_notifier *notifier,
struct v4l2_subdev *subdev,
struct v4l2_async_subdev *asd)
@@ -1654,7 +1662,7 @@ static __init int vpif_probe(struct platform_device *pdev)
err = v4l2_device_register(vpif_dev, _obj.v4l2_dev);
if (err) {
v4l2_err(vpif_dev->driver, "Error registering v4l2 device\n");
-   goto cleanup;
+   goto vpif_free;
}
 
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, res_idx))) {
@@ -1701,7 +1709,10 @@ static __init int vpif_probe(struct platform_device 
*pdev)
  "registered sub device %s\n",
   subdevdata->name);
}
-   vpif_probe_complete();
+   err = vpif_probe_complete();
+   if (err) {
+   goto probe_subdev_out;
+   }
} else {
vpif_obj.notifier.ops = _async_ops;
err = v4l2_async_notifier_register(_obj.v4l2_dev,
@@ -1720,6 +1731,8 @@ static __init int vpif_probe(struct platform_device *pdev)
kfree(vpif_obj.sd);
 vpif_unregister:
v4l2_device_unregister(_obj.v4l2_dev);
+vpif_free:
+   free_vpif_objs();
 cleanup:
v4l2_async_notifier_cleanup(_obj.notifier);
 
@@ -1748,8 +1761,8 @@ static int vpif_remove(struct platform_device *device)
ch = vpif_obj.dev[i];
/* Unregister video device */
video_unregister_device(>video_dev);
-   kfree(vpif_obj.dev[i]);
}
+   free_vpif_objs();
return 0;
 }
 
-- 
2.7.4



Re: [PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()

2019-06-04 Thread Yang Xiao
Yes, you are correct. I will fix the issue and resubmit the patch again.

On Mon, Jun 3, 2019 at 7:55 PM Hans Verkuil  wrote:
>
> On 5/29/19 3:09 PM, Young Xiao wrote:
> > If vpif_probe() fails on vpif_probe_complete(), then memory
> > allocated at initialize_vpif() for global vpif_obj.dev[i]
> > become unreleased.
> >
> > The patch adds deallocation of vpif_obj.dev[i] on the error path.
> >
> > Signed-off-by: Young Xiao <92siuy...@gmail.com>
> > ---
> >  drivers/media/platform/davinci/vpif_capture.c | 16 ++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/media/platform/davinci/vpif_capture.c 
> > b/drivers/media/platform/davinci/vpif_capture.c
> > index b5aacb0..63e6ec4 100644
> > --- a/drivers/media/platform/davinci/vpif_capture.c
> > +++ b/drivers/media/platform/davinci/vpif_capture.c
> > @@ -1621,6 +1621,14 @@ vpif_capture_get_pdata(struct platform_device *pdev)
> >   return NULL;
> >  }
> >
> > +static void free_vpif_objs(void)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++)
> > + kfree(vpif_obj.dev[i]);
> > +}
> > +
> >  /**
> >   * vpif_probe : This function probes the vpif capture driver
> >   * @pdev: platform device pointer
> > @@ -1701,7 +1709,10 @@ static __init int vpif_probe(struct platform_device 
> > *pdev)
> > "registered sub device %s\n",
> >  subdevdata->name);
> >   }
> > - vpif_probe_complete();
> > + err = vpif_probe_complete();
> > + if (err) {
> > + goto probe_subdev_out;
> > + }
>
> No need for { and } as per kernel coding style.
>
> >   } else {
> >   vpif_obj.notifier.ops = _async_ops;
> >   err = v4l2_async_notifier_register(_obj.v4l2_dev,
> > @@ -1722,6 +1733,7 @@ static __init int vpif_probe(struct platform_device 
> > *pdev)
> >   v4l2_device_unregister(_obj.v4l2_dev);
> >  cleanup:
> >   v4l2_async_notifier_cleanup(_obj.notifier);
> > + free_vpif_objs();
>
> This would break the case where initialize_vpif() returns an error, since
> initialize_vpif already frees these objects on error.
>
> In this case the easiest way of doing this is to just return if 
> initialize_vpif
> returns an error. No need to clean up anything.
>
> Regards,
>
> Hans
>
> >
> >   return err;
> >  }
> > @@ -1748,8 +1760,8 @@ static int vpif_remove(struct platform_device *device)
> >   ch = vpif_obj.dev[i];
> >   /* Unregister video device */
> >   video_unregister_device(>video_dev);
> > - kfree(vpif_obj.dev[i]);
> >   }
> > + free_vpif_objs()
> >   return 0;
> >  }
> >
> >
>


Re: [PATCH] ipv6: Prevent overrun when parsing v6 header options

2019-06-04 Thread Yang Xiao
Sorry, I don't get your point. Why is xfrm6_transport_output() buggy?
The point is that there would be out-of-bound access in
mip6_destopt_offset() and mip6_destopt_offset(), since there is no
sanity check for offset.

There is chance that offset + sizeof(struct ipv6_opt_hdr) > packet_len.

As described in CVE-2017-9074:  "The IPv6 fragmentation implementation
in the Linux kernel through 4.11.1 does not consider that the nexthdr
field may be associated with an invalid option, which allows local
users to cause a denial of service (out-of-bounds read and BUG)".

At the same time, there are bugs in  mip6_destopt_offset() and
mip6_destopt_offset(), which is similar to CVE-2017-7542.

On Sat, Jun 1, 2019 at 1:35 AM Eric Dumazet  wrote:
>
>
>
> On 5/30/19 8:04 PM, Yang Xiao wrote:
> > On Fri, May 31, 2019 at 1:17 AM Eric Dumazet  wrote:
> >>
> >>
> >>
> >> On 5/30/19 8:28 AM, Young Xiao wrote:
> >>> The fragmentation code tries to parse the header options in order
> >>> to figure out where to insert the fragment option.  Since nexthdr points
> >>> to an invalid option, the calculation of the size of the network header
> >>> can made to be much larger than the linear section of the skb and data
> >>> is read outside of it.
> >>>
> >>> This vulnerability is similar to CVE-2017-9074.
> >>>
> >>> Signed-off-by: Young Xiao <92siuy...@gmail.com>
> >>> ---
> >>>  net/ipv6/mip6.c | 24 ++--
> >>>  1 file changed, 14 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
> >>> index 64f0f7b..30ed1c5 100644
> >>> --- a/net/ipv6/mip6.c
> >>> +++ b/net/ipv6/mip6.c
> >>> @@ -263,8 +263,6 @@ static int mip6_destopt_offset(struct xfrm_state *x, 
> >>> struct sk_buff *skb,
> >>>  u8 **nexthdr)
> >>>  {
> >>>   u16 offset = sizeof(struct ipv6hdr);
> >>> - struct ipv6_opt_hdr *exthdr =
> >>> -(struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 
> >>> 1);
> >>>   const unsigned char *nh = skb_network_header(skb);
> >>>   unsigned int packet_len = skb_tail_pointer(skb) -
> >>>   skb_network_header(skb);
> >>> @@ -272,7 +270,8 @@ static int mip6_destopt_offset(struct xfrm_state *x, 
> >>> struct sk_buff *skb,
> >>>
> >>>   *nexthdr = _hdr(skb)->nexthdr;
> >>>
> >>> - while (offset + 1 <= packet_len) {
> >>> + while (offset <= packet_len) {
> >>> + struct ipv6_opt_hdr *exthdr;
> >>>
> >>>   switch (**nexthdr) {
> >>>   case NEXTHDR_HOP:
> >>> @@ -299,12 +298,15 @@ static int mip6_destopt_offset(struct xfrm_state 
> >>> *x, struct sk_buff *skb,
> >>>   return offset;
> >>>   }
> >>>
> >>> + if (offset + sizeof(struct ipv6_opt_hdr) > packet_len)
> >>> + return -EINVAL;
> >>> +
> >>> + exthdr = (struct ipv6_opt_hdr *)(nh + offset);
> >>>   offset += ipv6_optlen(exthdr);
> >>>   *nexthdr = >nexthdr;
> >>> - exthdr = (struct ipv6_opt_hdr *)(nh + offset);
> >>>   }
> >>>
> >>> - return offset;
> >>> + return -EINVAL;
> >>>  }
> >>>
> >>
> >>
> >> Ok, but have you checked that callers have been fixed ?
> >
> > I've checked the callers. There are two callers:
> > xfrm6_transport_output() and xfrm6_ro_output(). There are checks in
> > both function.
> >
> > --
> > hdr_len = x->type->hdr_offset(x, skb, );
> > if (hdr_len < 0)
> > return hdr_len;
> > --
> >>
> >> xfrm6_transport_output() seems buggy as well,
> >> unless the skbs are linearized before entering these functions ?
> > I can not understand what you mean about this comment.
> > Could you explain it in more detail.
>
>
> If we had a problem, then the memmove(ipv6_hdr(skb), iph, hdr_len);
>  in xfrm6_transport_output() would be buggy, since iph could also point to 
> freed memory.
>
>
>


-- 
Best regards!

Young
---


Re: [PATCH] ipvlan: Don't propagate IFF_ALLMULTI changes on down interfaces.

2019-06-03 Thread Yang Xiao
Based on your explanation, I found that the patch I submitted is useless.
Great thanks and sorry for the trouble.

On Mon, Jun 3, 2019 at 3:54 PM Xin Long  wrote:
>
> On Mon, Jun 3, 2019 at 11:22 AM Young Xiao <92siuy...@gmail.com> wrote:
> >
> > Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti
> > overflow on the underlying interface.
> >
> > Attempting the set IFF_ALLMULTI on the underlying interface would cause an
> > error and the log message:
> >
> > "allmulti touches root, set allmulti failed."
> s/root/roof
>
> I guess this patch was inspired by:
>
> commit bbeb0eadcf9fe74fb2b9b1a6fea82cd538b1e556
> Author: Peter Christensen 
> Date:   Thu May 8 11:15:37 2014 +0200
>
> macvlan: Don't propagate IFF_ALLMULTI changes on down interfaces.
>
> I could trigger this error on macvlan prior to this patch with:
>
>   # ip link add mymacvlan1 link eth2 type macvlan mode bridge
>   # ip link set mymacvlan1 up
>   # ip link set mymacvlan1 allmulticast on
>   # ip link set mymacvlan1 down
>   # ip link set mymacvlan1 allmulticast off
>   # ip link set mymacvlan1 allmulticast on
>
> but not on ipvlan, could you?
>
> macvlan had this problem, as lowerdev's allmulticast is cleared when doing
> dev_stop and set when doing dev_open, which doesn't happen on ipvlan.
>
> So I'd think this patch fixes nothing unless you want to add
> dev_set_allmulti(1/-1) in ipvlan_open/stop(), but that's another topic.
>
> did I miss something?
>
> >
> > Signed-off-by: Young Xiao <92siuy...@gmail.com>
> > ---
> >  drivers/net/ipvlan/ipvlan_main.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ipvlan/ipvlan_main.c 
> > b/drivers/net/ipvlan/ipvlan_main.c
> > index bbeb162..523bb83 100644
> > --- a/drivers/net/ipvlan/ipvlan_main.c
> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> > @@ -242,8 +242,10 @@ static void ipvlan_change_rx_flags(struct net_device 
> > *dev, int change)
> > struct ipvl_dev *ipvlan = netdev_priv(dev);
> > struct net_device *phy_dev = ipvlan->phy_dev;
> >
> > -   if (change & IFF_ALLMULTI)
> > -   dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : 
> > -1);
> > +   if (dev->flags & IFF_UP) {
> > +   if (change & IFF_ALLMULTI)
> > +   dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI 
> > ? 1 : -1);
> > +   }
> >  }
> >
> >  static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> > --
> > 2.7.4
> >


[PATCH] ipvlan: Don't propagate IFF_ALLMULTI changes on down interfaces.

2019-06-02 Thread Young Xiao
Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti
overflow on the underlying interface.

Attempting the set IFF_ALLMULTI on the underlying interface would cause an
error and the log message:

"allmulti touches root, set allmulti failed."

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index bbeb162..523bb83 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -242,8 +242,10 @@ static void ipvlan_change_rx_flags(struct net_device *dev, 
int change)
struct ipvl_dev *ipvlan = netdev_priv(dev);
struct net_device *phy_dev = ipvlan->phy_dev;
 
-   if (change & IFF_ALLMULTI)
-   dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
+   if (dev->flags & IFF_UP) {
+   if (change & IFF_ALLMULTI)
+   dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI ? 1 
: -1);
+   }
 }
 
 static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
-- 
2.7.4



[PATCH] unicore32: check stack pointer in get_wchan

2019-06-02 Thread Young Xiao
get_wchan() is lockless. Task may wakeup at any time and change its own 
stack, thus each next stack frame may be overwritten and filled with 
random stuff.

This patch fixes oops in unwind_frame() by adding stack pointer validation
on each step (as x86 code do), unwind_frame() already checks frame pointer.

See commit 1b15ec7a7427 ("ARM: 7912/1: check stack pointer in get_wchan")
for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 arch/unicore32/kernel/process.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 2bc10b8..1899ebc 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -277,6 +277,7 @@ EXPORT_SYMBOL(dump_fpu);
 unsigned long get_wchan(struct task_struct *p)
 {
struct stackframe frame;
+   unsigned long stack_page;
int count = 0;
if (!p || p == current || p->state == TASK_RUNNING)
return 0;
@@ -285,9 +286,11 @@ unsigned long get_wchan(struct task_struct *p)
frame.sp = thread_saved_sp(p);
frame.lr = 0;   /* recovered from the stack */
frame.pc = thread_saved_pc(p);
+   stack_page = (unsigned long)task_stack_page(p);
do {
-   int ret = unwind_frame();
-   if (ret < 0)
+   if (frame.sp < stack_page ||
+   frame.sp >= stack_page + THREAD_SIZE ||
+   unwind_frame() < 0)
return 0;
if (!in_sched_functions(frame.pc))
return frame.pc;
-- 
2.7.4



Re: [PATCH] ipv6: Prevent overrun when parsing v6 header options

2019-05-31 Thread Yang Xiao
On Fri, May 31, 2019 at 11:57 PM Eric Dumazet  wrote:
>
>
>
> On 5/31/19 7:54 AM, Herbert Xu wrote:
> > On Fri, May 31, 2019 at 07:50:06AM -0700, Eric Dumazet wrote:
> >>
> >> What do you mean by should ?
> >>
> >> Are they currently already linearized before the function is called,
> >> or is it missing and a bug needs to be fixed ?
> >
> > AFAICS this is the code-path for locally generated outbound packets.
> > Under what circumstances can the IPv6 header be not in the head?
> >
> >
>
> I guess this means we had yet another random submission from Young Xiao :/

Excuse me, what do you mean about random submission from Young?
A month ago, I submitted the patch, and I was told that the format
should be correct.
Then, I resubmitted again.


>
> Thanks.
>


[PATCH] pinctrl: pinctrl-mtk-common: fix a possible NULL pointer deference

2019-05-31 Thread Young Xiao
The function, external interrupt controller, is made as an optional to
pinctrl. But if we don't want pio behaves as an external interrupt
controller, it would lead to pctl->eint not be created properly and then
will cause 'kernel NULL pointer' issue when gpiochip try to call .to_irq
or .set_config. To fix it, check pctl->eint before accessing the member.

See commit 5f591543a937 ("pinctrl: mt7622: fix a kernel panic when pio
don't work as EINT controller") for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/pinctrl/mediatek/pinctrl-mtk-common.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c 
b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
index 0716238..b395f0b 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
+++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
@@ -836,6 +836,9 @@ static int mtk_gpio_to_irq(struct gpio_chip *chip, unsigned 
offset)
const struct mtk_desc_pin *pin;
unsigned long eint_n;
 
+   if (!pctl->eint)
+   return -ENOTSUPP;
+
pin = pctl->devdata->pins + offset;
if (pin->eint.eintnum == NO_EINT_SUPPORT)
return -EINVAL;
@@ -853,7 +856,8 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, 
unsigned offset,
unsigned long eint_n;
u32 debounce;
 
-   if (pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE)
+   if (!pctl->eint ||
+   pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE)
return -ENOTSUPP;
 
pin = pctl->devdata->pins + offset;
-- 
2.7.4



Re: [PATCH] ipv6: Prevent overrun when parsing v6 header options

2019-05-30 Thread Yang Xiao
On Fri, May 31, 2019 at 1:17 AM Eric Dumazet  wrote:
>
>
>
> On 5/30/19 8:28 AM, Young Xiao wrote:
> > The fragmentation code tries to parse the header options in order
> > to figure out where to insert the fragment option.  Since nexthdr points
> > to an invalid option, the calculation of the size of the network header
> > can made to be much larger than the linear section of the skb and data
> > is read outside of it.
> >
> > This vulnerability is similar to CVE-2017-9074.
> >
> > Signed-off-by: Young Xiao <92siuy...@gmail.com>
> > ---
> >  net/ipv6/mip6.c | 24 ++--
> >  1 file changed, 14 insertions(+), 10 deletions(-)
> >
> > diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
> > index 64f0f7b..30ed1c5 100644
> > --- a/net/ipv6/mip6.c
> > +++ b/net/ipv6/mip6.c
> > @@ -263,8 +263,6 @@ static int mip6_destopt_offset(struct xfrm_state *x, 
> > struct sk_buff *skb,
> >  u8 **nexthdr)
> >  {
> >   u16 offset = sizeof(struct ipv6hdr);
> > - struct ipv6_opt_hdr *exthdr =
> > -(struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 1);
> >   const unsigned char *nh = skb_network_header(skb);
> >   unsigned int packet_len = skb_tail_pointer(skb) -
> >   skb_network_header(skb);
> > @@ -272,7 +270,8 @@ static int mip6_destopt_offset(struct xfrm_state *x, 
> > struct sk_buff *skb,
> >
> >   *nexthdr = _hdr(skb)->nexthdr;
> >
> > - while (offset + 1 <= packet_len) {
> > + while (offset <= packet_len) {
> > + struct ipv6_opt_hdr *exthdr;
> >
> >   switch (**nexthdr) {
> >   case NEXTHDR_HOP:
> > @@ -299,12 +298,15 @@ static int mip6_destopt_offset(struct xfrm_state *x, 
> > struct sk_buff *skb,
> >   return offset;
> >   }
> >
> > + if (offset + sizeof(struct ipv6_opt_hdr) > packet_len)
> > + return -EINVAL;
> > +
> > + exthdr = (struct ipv6_opt_hdr *)(nh + offset);
> >   offset += ipv6_optlen(exthdr);
> >   *nexthdr = >nexthdr;
> > - exthdr = (struct ipv6_opt_hdr *)(nh + offset);
> >   }
> >
> > - return offset;
> > + return -EINVAL;
> >  }
> >
>
>
> Ok, but have you checked that callers have been fixed ?

I've checked the callers. There are two callers:
xfrm6_transport_output() and xfrm6_ro_output(). There are checks in
both function.

--
hdr_len = x->type->hdr_offset(x, skb, );
if (hdr_len < 0)
return hdr_len;
--
>
> xfrm6_transport_output() seems buggy as well,
> unless the skbs are linearized before entering these functions ?
I can not understand what you mean about this comment.
Could you explain it in more detail.

>
> Thanks.
>
>
>


[PATCH] ipv6: Prevent overrun when parsing v6 header options

2019-05-30 Thread Young Xiao
The fragmentation code tries to parse the header options in order
to figure out where to insert the fragment option.  Since nexthdr points
to an invalid option, the calculation of the size of the network header
can made to be much larger than the linear section of the skb and data
is read outside of it.

This vulnerability is similar to CVE-2017-9074.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 net/ipv6/mip6.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
index 64f0f7b..30ed1c5 100644
--- a/net/ipv6/mip6.c
+++ b/net/ipv6/mip6.c
@@ -263,8 +263,6 @@ static int mip6_destopt_offset(struct xfrm_state *x, struct 
sk_buff *skb,
   u8 **nexthdr)
 {
u16 offset = sizeof(struct ipv6hdr);
-   struct ipv6_opt_hdr *exthdr =
-  (struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 1);
const unsigned char *nh = skb_network_header(skb);
unsigned int packet_len = skb_tail_pointer(skb) -
skb_network_header(skb);
@@ -272,7 +270,8 @@ static int mip6_destopt_offset(struct xfrm_state *x, struct 
sk_buff *skb,
 
*nexthdr = _hdr(skb)->nexthdr;
 
-   while (offset + 1 <= packet_len) {
+   while (offset <= packet_len) {
+   struct ipv6_opt_hdr *exthdr;
 
switch (**nexthdr) {
case NEXTHDR_HOP:
@@ -299,12 +298,15 @@ static int mip6_destopt_offset(struct xfrm_state *x, 
struct sk_buff *skb,
return offset;
}
 
+   if (offset + sizeof(struct ipv6_opt_hdr) > packet_len)
+   return -EINVAL;
+
+   exthdr = (struct ipv6_opt_hdr *)(nh + offset);
offset += ipv6_optlen(exthdr);
*nexthdr = >nexthdr;
-   exthdr = (struct ipv6_opt_hdr *)(nh + offset);
}
 
-   return offset;
+   return -EINVAL;
 }
 
 static int mip6_destopt_init_state(struct xfrm_state *x)
@@ -399,8 +401,6 @@ static int mip6_rthdr_offset(struct xfrm_state *x, struct 
sk_buff *skb,
 u8 **nexthdr)
 {
u16 offset = sizeof(struct ipv6hdr);
-   struct ipv6_opt_hdr *exthdr =
-  (struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 1);
const unsigned char *nh = skb_network_header(skb);
unsigned int packet_len = skb_tail_pointer(skb) -
skb_network_header(skb);
@@ -408,7 +408,8 @@ static int mip6_rthdr_offset(struct xfrm_state *x, struct 
sk_buff *skb,
 
*nexthdr = _hdr(skb)->nexthdr;
 
-   while (offset + 1 <= packet_len) {
+   while (offset <= packet_len) {
+   struct ipv6_opt_hdr *exthdr;
 
switch (**nexthdr) {
case NEXTHDR_HOP:
@@ -434,12 +435,15 @@ static int mip6_rthdr_offset(struct xfrm_state *x, struct 
sk_buff *skb,
return offset;
}
 
+   if (offset + sizeof(struct ipv6_opt_hdr) > packet_len)
+   return -EINVAL;
+
+   exthdr = (struct ipv6_opt_hdr *)(nh + offset);
offset += ipv6_optlen(exthdr);
*nexthdr = >nexthdr;
-   exthdr = (struct ipv6_opt_hdr *)(nh + offset);
}
 
-   return offset;
+   return -EINVAL;
 }
 
 static int mip6_rthdr_init_state(struct xfrm_state *x)
-- 
2.7.4



Re: [PATCH] amd64-agp: fix arbitrary kernel memory writes

2019-05-29 Thread Yang Xiao
I am not so sure about taking off the cast, just to be in line with
patch in 194b3da873fd.
The comment can be deleted.

On Wed, May 29, 2019 at 4:35 PM Greg KH  wrote:
>
> On Wed, May 29, 2019 at 12:52:01PM +0800, Young Xiao wrote:
> > pg_start is copied from userspace on AGPIOC_BIND and AGPIOC_UNBIND ioctl
> > cmds of agp_ioctl() and passed to agpioc_bind_wrap().  As said in the
> > comment, (pg_start + mem->page_count) may wrap in case of AGPIOC_BIND,
> > and it is not checked at all in case of AGPIOC_UNBIND.  As a result, user
> > with sufficient privileges (usually "video" group) may generate either
> > local DoS or privilege escalation.
> >
> > See commit 194b3da873fd ("agp: fix arbitrary kernel memory writes")
> > for details.
> >
> > Signed-off-by: Young Xiao <92siuy...@gmail.com>
> > ---
> >  drivers/char/agp/amd64-agp.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c
> > index c69e39f..5daa0e3 100644
> > --- a/drivers/char/agp/amd64-agp.c
> > +++ b/drivers/char/agp/amd64-agp.c
> > @@ -60,7 +60,8 @@ static int amd64_insert_memory(struct agp_memory *mem, 
> > off_t pg_start, int type)
> >
> >   /* Make sure we can fit the range in the gatt table. */
> >   /* FIXME: could wrap */
> > - if (((unsigned long)pg_start + mem->page_count) > num_entries)
> > + if (((pg_start + mem->page_count) > num_entries) ||
> > + ((pg_start + mem->page_count) < pg_start))
>
> Why did you take off the cast for the first test?
>
> And if this really does fix this issue, should you remove the FIXME
> line?
>
> thanks,
>
> greg k-h



-- 
Best regards!

Young
---


Re: [PATCH] ipv4: tcp_input: fix stack out of bounds when parsing TCP options.

2019-05-29 Thread Yang Xiao
Indeed, condition opsize < 2 and opsize > length can deduce that length >= 2.
However, before the condition (if opsize < 2), there may be one-byte
out-of-bound access in line 12.
I'm not sure whether I have put it very clearly.

On Wed, May 29, 2019 at 10:20 PM Eric Dumazet  wrote:
>
> On Wed, May 29, 2019 at 1:10 AM Young Xiao <92siuy...@gmail.com> wrote:
> >
> > The TCP option parsing routines in tcp_parse_options function could
> > read one byte out of the buffer of the TCP options.
> >
> > 1 while (length > 0) {
> > 2 int opcode = *ptr++;
> > 3 int opsize;
> > 4
> > 5 switch (opcode) {
> > 6 case TCPOPT_EOL:
> > 7 return;
> > 8 case TCPOPT_NOP:/* Ref: RFC 793 section 3.1 */
> > 9 length--;
> > 10continue;
> > 11default:
> > 12opsize = *ptr++; //out of bound access
> >
> > If length = 1, then there is an access in line2.
> > And another access is occurred in line 12.
> > This would lead to out-of-bound access.
> >
> > Therefore, in the patch we check that the available data length is
> > larger enough to pase both TCP option code and size.
> >
> > Signed-off-by: Young Xiao <92siuy...@gmail.com>
> > ---
> >  net/ipv4/tcp_input.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 20f6fac..9775825 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -3791,6 +3791,8 @@ void tcp_parse_options(const struct net *net,
> > length--;
> > continue;
> > default:
> > +   if (length < 2)
> > +   return;
> > opsize = *ptr++;
> > if (opsize < 2) /* "silly options" */
> > return;
>
> In practice we are good, since we have at least 320 bytes of room there,
> and the test done later catches silly options.
>
> if (opsize < 2) /* "silly options" */
> return;
> if (opsize > length)   /* remember, opsize >= 2 here */
>  return; /* don't parse partial options */
>
> I guess adding yet another conditional will make this code obviously
> correct for all eyes
> and various tools.
>
> Thanks.
>
> Signed-off-by: Eric Dumazet 



-- 
Best regards!

Young
---


[PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()

2019-05-29 Thread Young Xiao
If vpif_probe() fails on vpif_probe_complete(), then memory
allocated at initialize_vpif() for global vpif_obj.dev[i]
become unreleased.

The patch adds deallocation of vpif_obj.dev[i] on the error path.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/media/platform/davinci/vpif_capture.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/davinci/vpif_capture.c 
b/drivers/media/platform/davinci/vpif_capture.c
index b5aacb0..63e6ec4 100644
--- a/drivers/media/platform/davinci/vpif_capture.c
+++ b/drivers/media/platform/davinci/vpif_capture.c
@@ -1621,6 +1621,14 @@ vpif_capture_get_pdata(struct platform_device *pdev)
return NULL;
 }
 
+static void free_vpif_objs(void)
+{
+   int i;
+
+   for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++)
+   kfree(vpif_obj.dev[i]);
+}
+
 /**
  * vpif_probe : This function probes the vpif capture driver
  * @pdev: platform device pointer
@@ -1701,7 +1709,10 @@ static __init int vpif_probe(struct platform_device 
*pdev)
  "registered sub device %s\n",
   subdevdata->name);
}
-   vpif_probe_complete();
+   err = vpif_probe_complete();
+   if (err) {
+   goto probe_subdev_out;
+   }
} else {
vpif_obj.notifier.ops = _async_ops;
err = v4l2_async_notifier_register(_obj.v4l2_dev,
@@ -1722,6 +1733,7 @@ static __init int vpif_probe(struct platform_device *pdev)
v4l2_device_unregister(_obj.v4l2_dev);
 cleanup:
v4l2_async_notifier_cleanup(_obj.notifier);
+   free_vpif_objs();
 
return err;
 }
@@ -1748,8 +1760,8 @@ static int vpif_remove(struct platform_device *device)
ch = vpif_obj.dev[i];
/* Unregister video device */
video_unregister_device(>video_dev);
-   kfree(vpif_obj.dev[i]);
}
+   free_vpif_objs()
return 0;
 }
 
-- 
2.7.4



[PATCH] isdn: hisax: hfc_2bds0: Fix a possible concurrency use-after-free bug in HFCD_l1hw()

2019-05-29 Thread Young Xiao
In drivers/isdn/hisax/hfc_2bds0.c, the function hfc2bds0_interrupt() and
HFCD_l1hw() may be concurrently executed.

HFCD_l1hw()
line 969: if (!cs->tx_skb)

hfc2bds0_interrupt()
line 875: dev_kfree_skb_irq(cs->tx_skb);

Thus, a possible concurrency use-after-free bug may occur in HFCD_l1hw().

To fix these bugs, the calls to spin_lock_irqsave() and
spin_unlock_irqrestore() are added in HFCD_l1hw(), to protect the
access to cs->tx_skb.

See commit 7418e6520f22 ("isdn: hisax: hfc_pci: Fix a possible concurrency
use-after-free bug in HFCPCI_l1hw()") for details.

Signed-off-by: Young Xiao <92siuy...@gmail.com>
---
 drivers/isdn/hisax/hfc_2bds0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/isdn/hisax/hfc_2bds0.c b/drivers/isdn/hisax/hfc_2bds0.c
index 3715fa0..ade12c0 100644
--- a/drivers/isdn/hisax/hfc_2bds0.c
+++ b/drivers/isdn/hisax/hfc_2bds0.c
@@ -966,11 +966,13 @@ HFCD_l1hw(struct PStack *st, int pr, void *arg)
if (cs->debug & L1_DEB_LAPD)
debugl1(cs, "-> PH_REQUEST_PULL");
 #endif
+   spin_lock_irqsave(>lock, flags);
if (!cs->tx_skb) {
test_and_clear_bit(FLG_L1_PULL_REQ, >l1.Flags);
st->l1.l1l2(st, PH_PULL | CONFIRM, NULL);
} else
test_and_set_bit(FLG_L1_PULL_REQ, >l1.Flags);
+   spin_unlock_irqrestore(>lock, flags);
break;
case (HW_RESET | REQUEST):
spin_lock_irqsave(>lock, flags);
-- 
2.7.4



  1   2   3   4   5   6   7   8   9   10   >