Re: [PATCH] memory tier: rename destroy_memory_type() to put_memory_type()
On 2023/7/6 14:39, Miaohe Lin wrote: It appears that destroy_memory_type() isn't a very good name because we usually will not free the memory_type here. So rename it to a more appropriate name i.e. put_memory_type(). Suggested-by: Huang, Ying Signed-off-by: Miaohe Lin --- drivers/dax/kmem.c | 4 ++-- include/linux/memory-tiers.h | 4 ++-- mm/memory-tiers.c| 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index 898ca9505754..c57acb73e3db 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -264,7 +264,7 @@ static int __init dax_kmem_init(void) return rc; error_dax_driver: - destroy_memory_type(dax_slowmem_type); + put_memory_type(dax_slowmem_type); err_dax_slowmem_type: kfree_const(kmem_name); return rc; @@ -275,7 +275,7 @@ static void __exit dax_kmem_exit(void) dax_driver_unregister(_dax_kmem_driver); if (!any_hotremove_failed) kfree_const(kmem_name); - destroy_memory_type(dax_slowmem_type); + put_memory_type(dax_slowmem_type); } MODULE_AUTHOR("Intel Corporation"); diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index fc9647b1b4f9..437441cdf78f 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -33,7 +33,7 @@ struct memory_dev_type { #ifdef CONFIG_NUMA extern bool numa_demotion_enabled; struct memory_dev_type *alloc_memory_type(int adistance); -void destroy_memory_type(struct memory_dev_type *memtype); +void put_memory_type(struct memory_dev_type *memtype); void init_node_memory_type(int node, struct memory_dev_type *default_type); void clear_node_memory_type(int node, struct memory_dev_type *memtype); #ifdef CONFIG_MIGRATION @@ -68,7 +68,7 @@ static inline struct memory_dev_type *alloc_memory_type(int adistance) return NULL; } -static inline void destroy_memory_type(struct memory_dev_type *memtype) +static inline void put_memory_type(struct memory_dev_type *memtype) { } diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index 1719fa3bcf02..c49ab03f49b1 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -560,11 +560,11 @@ struct memory_dev_type *alloc_memory_type(int adistance) } EXPORT_SYMBOL_GPL(alloc_memory_type); -void destroy_memory_type(struct memory_dev_type *memtype) +void put_memory_type(struct memory_dev_type *memtype) { kref_put(>kref, release_memtype); } -EXPORT_SYMBOL_GPL(destroy_memory_type); +EXPORT_SYMBOL_GPL(put_memory_type); void init_node_memory_type(int node, struct memory_dev_type *memtype) { @@ -586,7 +586,7 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype) */ if (!node_memory_types[node].map_count) { node_memory_types[node].memtype = NULL; - destroy_memory_type(memtype); + put_memory_type(memtype); Hi Maohe, I didn't find that destroy_memory_type(memtype) is called here on mainline kernel. Did I miss something? Other than that, it looks good to me. Reviewed-by: Xiao Yang Best Regards, Xiao Yang } mutex_unlock(_tier_lock); }
Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition
Hi Darrick, Brian and Christoph Ping. I hope to get your feedback. 1) I have confirmed that the following patch set did not change the test result of generic/470 with thin-volume. Besides, I didn't see any failure when running generic/470 based on normal PMEM device instaed of thin-volume. https://lore.kernel.org/linux-xfs/20211129102203.2243509-1-...@lst.de/ 2) I can reproduce the failure of generic/482 without thin-volume. 3) Is it necessary to make thin-volume support DAX. Is there any use case for the requirement? Best Regards, Xiao Yang On 2022/9/16 10:04, Yang, Xiao/杨 晓 wrote: On 2022/9/15 18:14, Yang, Xiao/杨 晓 wrote: On 2022/9/15 0:28, Darrick J. Wong wrote: On Wed, Sep 14, 2022 at 08:34:26AM -0400, Brian Foster wrote: On Wed, Sep 14, 2022 at 05:38:02PM +0800, Yang, Xiao/杨 晓 wrote: On 2022/9/14 14:44, Yang, Xiao/杨 晓 wrote: On 2022/9/9 21:01, Brian Foster wrote: Yes.. I don't recall all the internals of the tools and test, but IIRC it relied on discard to perform zeroing between checkpoints or some such and avoid spurious failures. The purpose of running on dm-thin was merely to provide reliable discard zeroing behavior on the target device and thus to allow the test to run reliably. Hi Brian, As far as I know, generic/470 was original designed to verify mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the reason, we need to ensure that all underlying devices under dm-log-writes device support DAX. However dm-thin device never supports DAX so running generic/470 with dm-thin device always returns "not run". Please see the difference between old and new logic: old logic new logic --- log-writes device(DAX) log-writes device(DAX) | | PMEM0(DAX) + PMEM1(DAX) Thin device(non-DAX) + PMEM1(DAX) | PMEM0(DAX) --- We think dm-thin device is not a good solution for generic/470, is there any other solution to support both discard zero and DAX? Hi Brian, I have sent a patch[1] to revert your fix because I think it's not good for generic/470 to use thin volume as my revert patch[1] describes: [1] https://lore.kernel.org/fstests/20220914090625.32207-1-yangx...@fujitsu.com/T/#u I think the history here is that generic/482 was changed over first in commit 65cc9a235919 ("generic/482: use thin volume as data device"), and then sometime later we realized generic/455,457,470 had the same general flaw and were switched over. The dm/dax compatibility thing was probably just an oversight, but I am a little curious about that because it should It's not an oversight -- it used to work (albeit with EXPERIMENTAL tags), and now we've broken it on fsdax as the pmem/blockdev divorce progresses. Hi Do you mean that the following patch set changed the test result of generic/470 with thin-volume? (pass => not run/failure) https://lore.kernel.org/linux-xfs/20211129102203.2243509-1-...@lst.de/ have been obvious that the change caused the test to no longer run. Did something change after that to trigger that change in behavior? With the revert, generic/470 can always run successfully on my environment so I wonder how to reproduce the out-of-order replay issue on XFS v5 filesystem? I don't quite recall the characteristics of the failures beyond that we were seeing spurious test failures with generic/482 that were due to essentially putting the fs/log back in time in a way that wasn't quite accurate due to the clearing by the logwrites tool not taking place. If you wanted to reproduce in order to revisit that, perhaps start with generic/482 and let it run in a loop for a while and see if it eventually triggers a failure/corruption..? PS: I want to reproduce the issue and try to find a better solution to fix it. It's been a while since I looked at any of this tooling to semi-grok how it works. I /think/ this was the crux of the problem, back in 2019? https://lore.kernel.org/fstests/20190227061529.GF16436@dastard/ Agreed. Perhaps it could learn to rely on something more explicit like zero range (instead of discard?) or fall back to manual zeroing? AFAICT src/log-writes/ actually /can/ do zeroing, but (a) it probably ought to be adapted to call BLKZEROOUT and (b) in the worst case it writes zeroes to the entire device, which is/can be slow. For a (crass) example, one of my cloudy test VMs uses 34GB partitions, and for cost optimization purposes we're only "paying" for the cheapest tier. Weirdly that maps to an upper limit of 6500 write iops and 48MB/s(!) but that would take about 20 minutes to zero the entire device if the dm-thin hack wasn't in place. Frustratingly, it doesn't support disca
Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition
On 2022/9/15 18:14, Yang, Xiao/杨 晓 wrote: On 2022/9/15 0:28, Darrick J. Wong wrote: On Wed, Sep 14, 2022 at 08:34:26AM -0400, Brian Foster wrote: On Wed, Sep 14, 2022 at 05:38:02PM +0800, Yang, Xiao/杨 晓 wrote: On 2022/9/14 14:44, Yang, Xiao/杨 晓 wrote: On 2022/9/9 21:01, Brian Foster wrote: Yes.. I don't recall all the internals of the tools and test, but IIRC it relied on discard to perform zeroing between checkpoints or some such and avoid spurious failures. The purpose of running on dm-thin was merely to provide reliable discard zeroing behavior on the target device and thus to allow the test to run reliably. Hi Brian, As far as I know, generic/470 was original designed to verify mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the reason, we need to ensure that all underlying devices under dm-log-writes device support DAX. However dm-thin device never supports DAX so running generic/470 with dm-thin device always returns "not run". Please see the difference between old and new logic: old logic new logic --- log-writes device(DAX) log-writes device(DAX) | | PMEM0(DAX) + PMEM1(DAX) Thin device(non-DAX) + PMEM1(DAX) | PMEM0(DAX) --- We think dm-thin device is not a good solution for generic/470, is there any other solution to support both discard zero and DAX? Hi Brian, I have sent a patch[1] to revert your fix because I think it's not good for generic/470 to use thin volume as my revert patch[1] describes: [1] https://lore.kernel.org/fstests/20220914090625.32207-1-yangx...@fujitsu.com/T/#u I think the history here is that generic/482 was changed over first in commit 65cc9a235919 ("generic/482: use thin volume as data device"), and then sometime later we realized generic/455,457,470 had the same general flaw and were switched over. The dm/dax compatibility thing was probably just an oversight, but I am a little curious about that because it should It's not an oversight -- it used to work (albeit with EXPERIMENTAL tags), and now we've broken it on fsdax as the pmem/blockdev divorce progresses. Hi Do you mean that the following patch set changed the test result of generic/470 with thin-volume? (pass => not run/failure) https://lore.kernel.org/linux-xfs/20211129102203.2243509-1-...@lst.de/ have been obvious that the change caused the test to no longer run. Did something change after that to trigger that change in behavior? With the revert, generic/470 can always run successfully on my environment so I wonder how to reproduce the out-of-order replay issue on XFS v5 filesystem? I don't quite recall the characteristics of the failures beyond that we were seeing spurious test failures with generic/482 that were due to essentially putting the fs/log back in time in a way that wasn't quite accurate due to the clearing by the logwrites tool not taking place. If you wanted to reproduce in order to revisit that, perhaps start with generic/482 and let it run in a loop for a while and see if it eventually triggers a failure/corruption..? PS: I want to reproduce the issue and try to find a better solution to fix it. It's been a while since I looked at any of this tooling to semi-grok how it works. I /think/ this was the crux of the problem, back in 2019? https://lore.kernel.org/fstests/20190227061529.GF16436@dastard/ Agreed. Perhaps it could learn to rely on something more explicit like zero range (instead of discard?) or fall back to manual zeroing? AFAICT src/log-writes/ actually /can/ do zeroing, but (a) it probably ought to be adapted to call BLKZEROOUT and (b) in the worst case it writes zeroes to the entire device, which is/can be slow. For a (crass) example, one of my cloudy test VMs uses 34GB partitions, and for cost optimization purposes we're only "paying" for the cheapest tier. Weirdly that maps to an upper limit of 6500 write iops and 48MB/s(!) but that would take about 20 minutes to zero the entire device if the dm-thin hack wasn't in place. Frustratingly, it doesn't support discard or write-zeroes. Do you mean that discard zero(BLKDISCARD) is faster than both fill zero(BLKZEROOUT) and write zero on user space? Hi Darrick, Brian and Christoph According to the discussion about generic/470. I wonder if it is necessary to make thin-pool support DAX. Is there any use case for the requirement? Best Regards, Xiao Yang Best Regards, Xiao Yang If the eventual solution is simple and low enough overhead, it might make some sense to replace the dmthin hack across the set of tests mentioned above. That said, for a *pmem* test you'd expect it to be faster than that... --D
Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition
On 2022/9/15 0:28, Darrick J. Wong wrote: On Wed, Sep 14, 2022 at 08:34:26AM -0400, Brian Foster wrote: On Wed, Sep 14, 2022 at 05:38:02PM +0800, Yang, Xiao/杨 晓 wrote: On 2022/9/14 14:44, Yang, Xiao/杨 晓 wrote: On 2022/9/9 21:01, Brian Foster wrote: Yes.. I don't recall all the internals of the tools and test, but IIRC it relied on discard to perform zeroing between checkpoints or some such and avoid spurious failures. The purpose of running on dm-thin was merely to provide reliable discard zeroing behavior on the target device and thus to allow the test to run reliably. Hi Brian, As far as I know, generic/470 was original designed to verify mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the reason, we need to ensure that all underlying devices under dm-log-writes device support DAX. However dm-thin device never supports DAX so running generic/470 with dm-thin device always returns "not run". Please see the difference between old and new logic: old logic new logic --- log-writes device(DAX) log-writes device(DAX) | | PMEM0(DAX) + PMEM1(DAX) Thin device(non-DAX) + PMEM1(DAX) | PMEM0(DAX) --- We think dm-thin device is not a good solution for generic/470, is there any other solution to support both discard zero and DAX? Hi Brian, I have sent a patch[1] to revert your fix because I think it's not good for generic/470 to use thin volume as my revert patch[1] describes: [1] https://lore.kernel.org/fstests/20220914090625.32207-1-yangx...@fujitsu.com/T/#u I think the history here is that generic/482 was changed over first in commit 65cc9a235919 ("generic/482: use thin volume as data device"), and then sometime later we realized generic/455,457,470 had the same general flaw and were switched over. The dm/dax compatibility thing was probably just an oversight, but I am a little curious about that because it should It's not an oversight -- it used to work (albeit with EXPERIMENTAL tags), and now we've broken it on fsdax as the pmem/blockdev divorce progresses. Hi Do you mean that the following patch set changed the test result of generic/470 with thin-volume? (pass => not run/failure) https://lore.kernel.org/linux-xfs/20211129102203.2243509-1-...@lst.de/ have been obvious that the change caused the test to no longer run. Did something change after that to trigger that change in behavior? With the revert, generic/470 can always run successfully on my environment so I wonder how to reproduce the out-of-order replay issue on XFS v5 filesystem? I don't quite recall the characteristics of the failures beyond that we were seeing spurious test failures with generic/482 that were due to essentially putting the fs/log back in time in a way that wasn't quite accurate due to the clearing by the logwrites tool not taking place. If you wanted to reproduce in order to revisit that, perhaps start with generic/482 and let it run in a loop for a while and see if it eventually triggers a failure/corruption..? PS: I want to reproduce the issue and try to find a better solution to fix it. It's been a while since I looked at any of this tooling to semi-grok how it works. I /think/ this was the crux of the problem, back in 2019? https://lore.kernel.org/fstests/20190227061529.GF16436@dastard/ Agreed. Perhaps it could learn to rely on something more explicit like zero range (instead of discard?) or fall back to manual zeroing? AFAICT src/log-writes/ actually /can/ do zeroing, but (a) it probably ought to be adapted to call BLKZEROOUT and (b) in the worst case it writes zeroes to the entire device, which is/can be slow. For a (crass) example, one of my cloudy test VMs uses 34GB partitions, and for cost optimization purposes we're only "paying" for the cheapest tier. Weirdly that maps to an upper limit of 6500 write iops and 48MB/s(!) but that would take about 20 minutes to zero the entire device if the dm-thin hack wasn't in place. Frustratingly, it doesn't support discard or write-zeroes. Do you mean that discard zero(BLKDISCARD) is faster than both fill zero(BLKZEROOUT) and write zero on user space? Best Regards, Xiao Yang If the eventual solution is simple and low enough overhead, it might make some sense to replace the dmthin hack across the set of tests mentioned above. That said, for a *pmem* test you'd expect it to be faster than that... --D Brian Best Regards, Xiao Yang BTW, only log-writes, stripe and linear support DAX for now.
Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition
On 2022/9/14 14:44, Yang, Xiao/杨 晓 wrote: On 2022/9/9 21:01, Brian Foster wrote: Yes.. I don't recall all the internals of the tools and test, but IIRC it relied on discard to perform zeroing between checkpoints or some such and avoid spurious failures. The purpose of running on dm-thin was merely to provide reliable discard zeroing behavior on the target device and thus to allow the test to run reliably. Hi Brian, As far as I know, generic/470 was original designed to verify mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the reason, we need to ensure that all underlying devices under dm-log-writes device support DAX. However dm-thin device never supports DAX so running generic/470 with dm-thin device always returns "not run". Please see the difference between old and new logic: old logic new logic --- log-writes device(DAX) log-writes device(DAX) | | PMEM0(DAX) + PMEM1(DAX) Thin device(non-DAX) + PMEM1(DAX) | PMEM0(DAX) --- We think dm-thin device is not a good solution for generic/470, is there any other solution to support both discard zero and DAX? Hi Brian, I have sent a patch[1] to revert your fix because I think it's not good for generic/470 to use thin volume as my revert patch[1] describes: [1] https://lore.kernel.org/fstests/20220914090625.32207-1-yangx...@fujitsu.com/T/#u With the revert, generic/470 can always run successfully on my environment so I wonder how to reproduce the out-of-order replay issue on XFS v5 filesystem? PS: I want to reproduce the issue and try to find a better solution to fix it. Best Regards, Xiao Yang BTW, only log-writes, stripe and linear support DAX for now.
Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition
On 2022/9/9 21:01, Brian Foster wrote: Yes.. I don't recall all the internals of the tools and test, but IIRC it relied on discard to perform zeroing between checkpoints or some such and avoid spurious failures. The purpose of running on dm-thin was merely to provide reliable discard zeroing behavior on the target device and thus to allow the test to run reliably. Hi Brian, As far as I know, generic/470 was original designed to verify mmap(MAP_SYNC) on the dm-log-writes device enabling DAX. Due to the reason, we need to ensure that all underlying devices under dm-log-writes device support DAX. However dm-thin device never supports DAX so running generic/470 with dm-thin device always returns "not run". Please see the difference between old and new logic: old logic new logic --- log-writes device(DAX) log-writes device(DAX) | | PMEM0(DAX) + PMEM1(DAX) Thin device(non-DAX) + PMEM1(DAX) | PMEM0(DAX) --- We think dm-thin device is not a good solution for generic/470, is there any other solution to support both discard zero and DAX? BTW, only log-writes, stripe and linear support DAX for now. Best Regards, Xiao Yang
Re: PROBLEM: Recent raid10 block discard patchset causes filesystem corruption on fstrim
On 02/02/2021 11:42 AM, Matthew Ruffell wrote: Hi Xiao, On 24/12/20 11:18 pm, Xiao Ni wrote:> The root cause is found. Now we use a similar way with raid0 to handle discard request for raid10. Because the discard region is very big, we can calculate the start/end address for each disk. Then we can submit the discard request to each disk. But for raid10, it has copies. For near layout, if the discard request doesn't align with chunk size, we calculate a start_disk_offset. Now we only use start_disk_offset for the first disk, but it should be used for the near copies disks too. Thanks for finding the root cause and making a patch that corrects the offset addresses for multiple disks! [ 789.709501] discard bio start : 70968, size : 191176 [ 789.709507] first stripe index 69, start disk index 0, start disk offset 70968 [ 789.709509] last stripe index 256, end disk index 0, end disk offset 262144 [ 789.709511] disk 0, dev start : 70968, dev end : 262144 [ 789.709515] disk 1, dev start : 70656, dev end : 262144 For example, in this test case, it has 2 near copies. The start_disk_offset for the first disk is 70968. It should use the same offset address for second disk. But it uses the start address of this chunk. It discard more region. The patch in the attachment can fix this problem. It split the region that doesn't align with chunk size. Just wondering, what is the current status of the patchset? Is there anything that I can do to help? There is another problem. The stripe size should be calculated differently for near layout and far layout. I can help review the patch and help test the patches anytime. Do you need help with making a patch to calculate the stripe size for near and far layouts? Let me know how you are going with this patchset, and if there is anything I can do for you. Thanks, Matthew Hi Matthew I'm doing the test for the new patch set. I'll send the patch soon again. Thanks for the help. Regards Xiao
Re: PROBLEM: Recent raid10 block discard patchset causes filesystem corruption on fstrim
On 12/09/2020 12:17 PM, Song Liu wrote: Hi Matthew, On Dec 8, 2020, at 7:46 PM, Matthew Ruffell wrote: Hello, I recently backported the following patches into the Ubuntu stable kernels: md: add md_submit_discard_bio() for submitting discard bio md/raid10: extend r10bio devs to raid disks md/raid10: pull codes that wait for blocked dev into one function md/raid10: improve raid10 discard request md/raid10: improve discard request for far layout dm raid: fix discard limits for raid1 and raid10 dm raid: remove unnecessary discard limits for raid10 Thanks for the report! Hi Xiao, Could you please take a look at this and let me know soon? We need to fix this before 5.10 official release. Thanks, Song Hi all The root cause is found. Now we use a similar way with raid0 to handle discard request for raid10. Because the discard region is very big, we can calculate the start/end address for each disk. Then we can submit the discard request to each disk. But for raid10, it has copies. For near layout, if the discard request doesn't align with chunk size, we calculate a start_disk_offset. Now we only use start_disk_offset for the first disk, but it should be used for the near copies disks too. [ 789.709501] discard bio start : 70968, size : 191176 [ 789.709507] first stripe index 69, start disk index 0, start disk offset 70968 [ 789.709509] last stripe index 256, end disk index 0, end disk offset 262144 [ 789.709511] disk 0, dev start : 70968, dev end : 262144 [ 789.709515] disk 1, dev start : 70656, dev end : 262144 For example, in this test case, it has 2 near copies. The start_disk_offset for the first disk is 70968. It should use the same offset address for second disk. But it uses the start address of this chunk. It discard more region. The patch in the attachment can fix this problem. It split the region that doesn't align with chunk size. There is another problem. The stripe size should be calculated differently for near layout and far layout. @Song, do you want me to use a separate patch for this fix, or fix this in the original patch? Merry Christmas Xiao commit 0d74ac66ed0ec5af70296545e26044723a14657c Author: Xiao Ni Date: Thu Dec 24 17:58:43 2020 +0800 fix diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 3153183b7772..92182cf40d22 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -1604,6 +1604,7 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio) sector_t chunk; unsigned int stripe_size; sector_t split_size; + sector_t chunk_size = 1 << geo->chunk_shift; sector_t bio_start, bio_end; sector_t first_stripe_index, last_stripe_index; @@ -1624,7 +1625,8 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio) if (test_bit(MD_RECOVERY_RESHAPE, >recovery)) goto out; - stripe_size = geo->raid_disks << geo->chunk_shift; + stripe_size = geo->near_copies ? geo->near_copies << geo->chunk_shift: + geo->raid_disks << geo->chunk_shift; bio_start = bio->bi_iter.bi_sector; bio_end = bio_end_sector(bio); @@ -1637,6 +1639,18 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio) if (bio_sectors(bio) < stripe_size*2) goto out; + /* Keep the discard start/end address aligned with chunk size */ + if (bio_start & geo->chunk_mask) { + split_size = (chunk_size - (bio_start & geo->chunk_mask)); + bio = raid10_split_bio(conf, bio, split_size, false); + } + if (bio_end & geo->chunk_mask) { + split_size = bio_end & geo->chunk_mask; + bio = raid10_split_bio(conf, bio, split_size, true); + } + bio_start = bio->bi_iter.bi_sector; + bio_end = bio_end_sector(bio); + /* For far and far offset layout, if bio is not aligned with stripe size, * it splits the part that is not aligned with strip size. */ @@ -1664,8 +1678,8 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio) start_disk_index = sector_div(first_stripe_index, geo->raid_disks); if (geo->far_offset) first_stripe_index *= geo->far_copies; - start_disk_offset = (bio_start & geo->chunk_mask) + - (first_stripe_index << geo->chunk_shift); + /* Now the bio is aligned with chunk size */ + start_disk_offset = first_stripe_index << geo->chunk_shift; chunk = bio_end >> geo->chunk_shift; chunk *= geo->near_copies; @@ -1673,8 +1687,7 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio) end_disk_index = sector_div(last_stripe_index, geo->raid_disks)
Re: PROBLEM: Recent raid10 block discard patchset causes filesystem corruption on fstrim
On 12/09/2020 12:17 PM, Song Liu wrote: Hi Matthew, On Dec 8, 2020, at 7:46 PM, Matthew Ruffell wrote: Hello, I recently backported the following patches into the Ubuntu stable kernels: md: add md_submit_discard_bio() for submitting discard bio md/raid10: extend r10bio devs to raid disks md/raid10: pull codes that wait for blocked dev into one function md/raid10: improve raid10 discard request md/raid10: improve discard request for far layout dm raid: fix discard limits for raid1 and raid10 dm raid: remove unnecessary discard limits for raid10 Thanks for the report! Hi Xiao, Could you please take a look at this and let me know soon? We need to fix this before 5.10 official release. Thanks, Song Hi all Sorry for the trouble. But I'm in pto with no test machines. I'll have a look at this problem next week. and this morning, a user reported the following downstream bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/ Their weekly cronjob that runs fstrim had run, and their raid10 array has extensive data corruption. The issue is reproducible on the latest 5.10-rc7 mainline kernel, steps are below. I used a m5d.4xlarge instance on AWS to ultilise 2x 300GB SSDs that support block discard. You will want to select small disks to lower the time needed to reproduce. $ uname -rv 5.10.0-rc7+ #1 SMP Wed Dec 9 01:15:27 UTC 2020 Create a raid10 array, with LVM: $ lsblk NAMEMAJ:MIN RM SIZE RO TYPE MOUNTPOINT nvme0n1 259:00 8G 0 disk └─nvme0n1p1 259:10 8G 0 part / nvme1n1 259:20 279.4G 0 disk nvme2n1 259:30 279.4G 0 disk $ sudo -s # mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme1n1 /dev/nvme2n1 mdadm: layout defaults to n2 mdadm: layout defaults to n2 mdadm: chunk size defaults to 512K mdadm: size set to 292836352K mdadm: automatically enabling write-intent bitmap on large array mdadm: Defaulting to version 1.2 metadata mdadm: array /dev/md0 started. # pvcreate -ff -y /dev/md0 Physical volume "/dev/md0" successfully created. # vgcreate -f -y VolGroup /dev/md0 Volume group "VolGroup" successfully created # lvcreate -n root -L 100G -ay -y VolGroup Logical volume "root" created. # mkfs.ext4 /dev/VolGroup/root mke2fs 1.44.1 (24-Mar-2018) Discarding device blocks: done Creating filesystem with 26214400 4k blocks and 6553600 inodes Filesystem UUID: d7be2e14-fa4d-4489-884b-3bef63b1e1db Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424, 2048, 23887872 Allocating group tables: done Writing inode tables: done Creating journal (131072 blocks): done Writing superblocks and filesystem accounting information: done # mount /dev/VolGroup/root /mnt Next, wait for the disk check to complete, 25 minutes on m5d.4xlarge instance. # cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid10 nvme2n1[1] nvme1n1[0] 292836352 blocks super 1.2 2 near-copies [2/2] [UU] [==>..] resync = 12.0% (35211392/292836352) finish=21.4min speed=200340K/sec bitmap: 3/3 pages [12KB], 65536KB chunk unused devices: # cat /sys/block/md0/md/mismatch_cnt 76918016 # cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid10 nvme2n1[1] nvme1n1[0] 292836352 blocks super 1.2 2 near-copies [2/2] [UU] bitmap: 0/3 pages [0KB], 65536KB chunk unused devices: # cat /sys/block/md0/md/mismatch_cnt 582330240 Now that the check is complete, create a file, sync and delete it: # dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M 1048576+0 records in 1048576+0 records out 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.95974 s, 1.1 GB/s # sync # rm /mnt/data.raw Perform a check: # echo check > /sys/block/md0/md/sync_action Again, wait 25 minutes for it to complete: # cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid10 nvme1n1[1] nvme2n1[0] 292836352 blocks super 1.2 2 near-copies [2/2] [UU] [==>..] check = 13.7% (40356224/292836352) finish=20.8min speed=201707K/sec bitmap: 0/3 pages [0KB], 65536KB chunk unused devices: # cat /sys/block/md0/md/mismatch_cnt 1469696 # cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid10 nvme1n1[1] nvme2n1[0] 292836352 blocks super 1.2 2 near-copies [2/2] [UU] bitmap: 0/3 pages [0KB], 65536KB chunk unused devices: # cat /sys/block/md0/md/mismatch_cnt 1469696 Now, perform the fstrim: # fstrim /mnt --verbose /mnt: 97.9 GiB (105089236992 bytes) trimmed Go for another check: # echo check >/sys/block/md0/md/sync_action # cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4
Re: [PATCH v2 0/3] md superblock write alignment on 512e devices
On 11/04/2020 04:12 AM, Chris Unkel wrote: Hi Xiao, Thanks for the excellent feedback. Since bitmap_offset appears to be a free-form field, it wasn't apparent to me that the bitmap never starts within 4K of the bitmap. I don't think it's worth worrying about a logical block size that's more than 4K here--from what I can see logical block size larger than the usual 4K page isn't going to happen. I do think that it makes sense to handle the case where the physical block size is more than 4K. I think what you propose works, but I think in the physical block > MAX_SB_SIZE case it makes more sense to align the superblock writes to the physical block size (as now) rather Is it a typo error? You want to say if physical block > MAX_SB_SIZE, it should align the superblock writes to logical block size? Because I see the comments below, your solution is to align to logical block size when physical block > MAX_SB_SIZE. than rejecting the create/assemble. Mounting with the possible performance hit seems like a better outcome for the user in that case than refusing to assemble. It's the same check that would have to be written to reject the assembly in that case and so the code shouldn't really be any more complex. So basically what I propose is: if the physical block size is no larger than MAX_SB_SIZE, pad to that; otherwise pad to to logical_block_size, that is, replace queue_logical_block_size() with something equivalent to: queue_physical_block_size(...) > MAX_SB_SIZE ? queue_logical_block_size(...) : queue_physical_block_size(...) which is simple, safe in all cases, doesn't reject any feasible assembly, and generates aligned sb writes on all common current devices (512n,4kn,512e.) What do you think? Yes, It's a nice solution :) Regards Xiao
Re: [PATCH 0/3] mdraid sb and bitmap write alignment on 512e drives
On 11/03/2020 02:59 AM, Chris Unkel wrote: Hi Xiao, That particular array is super1.2. The block trace was captured on the disk underlying the partition device on which the md array member resides, not on the partition device itself. The partition starts 2048 sectors into the disk (1MB). So the 2048 sectors offset to the beginning of the partition, plus the 8 sector superblock offset for super1.2 ends up at 2056. Sorry for the confusion there. Regards, --Chris Thanks for the explanation. I still have some questions in other emails about your patch. Could you have a look when you are free. Regards Xiao
Re: [PATCH 2/3] md: align superblock writes to physical blocks
On 10/30/2020 04:13 AM, Christopher Unkel wrote: Writes of the md superblock are aligned to the logical blocks of the containing device, but no attempt is made to align them to physical block boundaries. This means that on a "512e" device (4k physical, 512 logical) every superblock update hits the 512-byte emulation and the possible associated performance penalty. Respect the physical block alignment when possible, that is, when the write padded out to the physical block doesn't run into the data or bitmap. Signed-off-by: Christopher Unkel --- This series replaces the first patch of the previous series (https://lkml.org/lkml/2020/10/22/1058), with the following changes: 1. Creates a helper function super_1_sb_length_ok(). 2. Fixes operator placement style violation. 3. Covers case in super_1_sync(). 4. Refactors duplicate logic. 5. Covers a case in existing code where aligned superblock could run into bitmap. drivers/md/md.c | 45 + 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index d6a55ca1d52e..802a9a256fe5 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -1646,15 +1646,52 @@ static __le32 calc_sb_1_csum(struct mdp_superblock_1 *sb) return cpu_to_le32(csum); } +static int +super_1_sb_length_ok(struct md_rdev *rdev, int minor_version, int sb_len) +{ + int sectors = sb_len / 512; + struct mdp_superblock_1 *sb; + + /* superblock is stored in memory as a single page */ + if (sb_len > PAGE_SIZE) + return 0; + + /* check if sb runs into data */ + if (minor_version) { + if (rdev->sb_start + sectors > rdev->data_offset + || rdev->sb_start + sectors > rdev->new_data_offset) + return 0; + } else if (sb_len > 4096) + return 0; + + /* check if sb runs into bitmap */ + sb = page_address(rdev->sb_page); + if (le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) { + __s32 bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset); + if (bitmap_offset > 0 && sectors > bitmap_offset) + return 0; + } + + return 1; +} + For super1.0 it doesn't need to consider this. Because the data and bitmap is before superblock. For super1.1 and 1.2 it only needs to check whether it runs into bitmap. The data is behind the bitmap. Regards Xiao /* * set rdev->sb_size to that required for number of devices in array * with appropriate padding to underlying sectors */ static void -super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev) +super_1_set_rdev_sb_size(struct md_rdev *rdev, int max_dev, int minor_version) { int sb_size = max_dev * 2 + 256; - rdev->sb_size = round_up(sb_size, bdev_logical_block_size(rdev->bdev)); + int pb_aligned_size = round_up(sb_size, + bdev_physical_block_size(rdev->bdev)); + + /* generate physical-block aligned writes if legal */ + if (super_1_sb_length_ok(rdev, minor_version, pb_aligned_size)) + rdev->sb_size = pb_aligned_size; + else + rdev->sb_size = round_up(sb_size, +bdev_logical_block_size(rdev->bdev)); } static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_version) @@ -1730,7 +1767,7 @@ static int super_1_load(struct md_rdev *rdev, struct md_rdev *refdev, int minor_ rdev->new_data_offset += (s32)le32_to_cpu(sb->new_offset); atomic_set(>corrected_errors, le32_to_cpu(sb->cnt_corrected_read)); - super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev)); + super_1_set_rdev_sb_size(rdev, le32_to_cpu(sb->max_dev), minor_version); if (minor_version && rdev->data_offset < sb_start + (rdev->sb_size/512)) @@ -2140,7 +2177,7 @@ static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev) if (max_dev > le32_to_cpu(sb->max_dev)) { sb->max_dev = cpu_to_le32(max_dev); - super_1_set_rdev_sb_size(rdev, max_dev); + super_1_set_rdev_sb_size(rdev, max_dev, mddev->minor_version); } else max_dev = le32_to_cpu(sb->max_dev);
Re: [PATCH v2 0/3] md superblock write alignment on 512e devices
On 10/30/2020 04:13 AM, Christopher Unkel wrote: Hello, Thanks for the feedback on the previous patch series. A updated patch series with the same function as the first patch (https://lkml.org/lkml/2020/10/22/1058 "md: align superblock writes to physical blocks") follows. As suggested, it introduces a helper function, which can be used to reduce some code duplication. It handles the case in super_1_sync() where the superblock is extended by the addition of new component devices. I think it also fixes a bug where the existing code in super_1_load() ought to be rejecting the array with EINVAL: if the superblock padded out to the *logical* block length runs into the bitmap. For example, if the bitmap offset is 2 (bitmap 1K after superblock) and the logical block size is 4K, the superblock padded out to 4K runs into the bitmap. This case may be unusual (perhaps only happens if the array is created on a 512n device and then raw contents are copied onto a 4kn device) but I think it is possible. Hi Chris For super1.1 and super1.2 bitmap offset is 8. It's a fixed value. So it should not have the risk? But for future maybe it has this problem. If the disk logical or physical block size is larger than 4K in future, it has data corruption risk. With respect to the option of simply replacing queue_logical_block_size() with queue_physical_block_size(), I think this can result in the code rejecting devices that can be loaded, but In mdadm it defines the max super size of super1 is 4096 #define MAX_SB_SIZE 4096 /* bitmap super size is 256, but we round up to a sector for alignment */ #define BM_SUPER_SIZE 512 #define MAX_DEVS ((int)(MAX_SB_SIZE - sizeof(struct mdp_superblock_1)) / 2) #define SUPER1_SIZE (MAX_SB_SIZE + BM_SUPER_SIZE \ + sizeof(struct misc_dev_info)) It should be ok to replace queue_logical_block_size with queue_physical_block_size? Now it doesn't check physical block size and super block size. For super1, we can add a check that if physical block size is larger than MAX_SB_SIZE, then we reject to create/assmble the raid device. for which the physical block alignment can't be respected--the longer padded size would trigger the EINVAL cases testing against data_offset/new_data_offset. I think it's better to proceed in such cases, just with unaligned superblock writes as would presently happen. Also if I'm right about the above bug, then I think this subsitution would be more likely to trigger it. Thanks, --Chris Christopher Unkel (3): md: factor out repeated sb alignment logic md: align superblock writes to physical blocks md: reuse sb length-checking logic drivers/md/md.c | 69 + 1 file changed, 52 insertions(+), 17 deletions(-)
Re: [PATCH 0/3] mdraid sb and bitmap write alignment on 512e drives
On 10/23/2020 11:31 AM, Christopher Unkel wrote: Hello all, While investigating some performance issues on mdraid 10 volumes formed with "512e" disks (4k native/physical sector size but with 512 byte sector emulation), I've found two cases where mdraid will needlessly issue writes that start on 4k byte boundary, but are are shorter than 4k: 1. writes of the raid superblock; and 2. writes of the last page of the write-intent bitmap. The following is an excerpt of a blocktrace of one of the component members of a mdraid 10 volume during a 4k write near the end of the array: 8,32 112 0.01687 711 D WS 2064 + 8 [kworker/11:1H] * 8,32 115 0.001454119 711 D WS 2056 + 1 [kworker/11:1H] * 8,32 118 0.002847204 711 D WS 2080 + 7 [kworker/11:1H] 8,32 11 11 0.003700545 3094 D WS 11721043920 + 8 [md127_raid1] 8,32 11 14 0.308785692 711 D WS 2064 + 8 [kworker/11:1H] * 8,32 11 17 0.310201697 711 D WS 2056 + 1 [kworker/11:1H] 8,32 11 20 5.500799245 711 D WS 2064 + 8 [kworker/11:1H] * 8,32 11 2315.740923558 711 D WS 2080 + 7 [kworker/11:1H] Note the starred transactions, which each start on a 4k boundary, but are less than 4k in length, and so will use the 512-byte emulation. Sector 2056 holds the superblock, and is written as a single 512-byte write. Sector 2086 holds the bitmap bit relevant to the written sector. When it is written the active bits of the last page of the bitmap are written, starting at sector 2080, padded out to the end of the 512-byte logical sector as required. This results in a 3.5kb write, again using the 512-byte emulation. Hi Christopher Which superblock version do you use? If it's super1.1, superblock starts at 0 sector. If it's super1.2, superblock starts at 8 sector. If it's super1.0, superblock starts at the end of device and bitmap is before superblock. As mentioned above, bitmap is behind the superblock, so it should not be super1.0. So I have a question why does 2056 hold the superblock? Regards Xiao Note that in some arrays the last page of the bitmap may be sufficiently full that they are not affected by the issue with the bitmap write. As there can be a substantial penalty to using the 512-byte sector emulation (turning writes into read-modify writes if the relevant sector is not in the drive's cache) I believe it makes sense to pad these writes out to a 4k boundary. The writes are already padded out for "4k native" drives, where the short access is illegal. The following patch set changes the superblock and bitmap writes to respect the physical block size (e.g. 4k for today's 512e drives) when possible. In each case there is already logic for padding out to the underlying logical sector size. I reuse or repeat the logic for padding out to the physical sector size, but treat the padding out as optional rather than mandatory. The corresponding block trace with these patches is: 8,32 12 0.03410 694 D WS 2064 + 8 [kworker/1:1H] 8,32 15 0.001368788 694 D WS 2056 + 8 [kworker/1:1H] 8,32 18 0.002727981 694 D WS 2080 + 8 [kworker/1:1H] 8,32 1 11 0.003533831 3063 D WS 11721043920 + 8 [md127_raid1] 8,32 1 14 0.253952321 694 D WS 2064 + 8 [kworker/1:1H] 8,32 1 17 0.255354215 694 D WS 2056 + 8 [kworker/1:1H] 8,32 1 20 5.337938486 694 D WS 2064 + 8 [kworker/1:1H] 8,32 1 2315.577963062 694 D WS 2080 + 8 [kworker/1:1H] I do notice that the code for bitmap writes has a more sophisticated and thorough check for overlap than the code for superblock writes. (Compare write_sb_page in md-bitmap.c vs. super_1_load in md.c.) From what I know since the various structures starts have always been 4k aligned anyway, it is always safe to pad the superblock write out to 4k (as occurs on 4k native drives) but not necessarily futher. Feedback appreciated. --Chris Christopher Unkel (3): md: align superblock writes to physical blocks md: factor sb write alignment check into function md: pad writes to end of bitmap to physical blocks drivers/md/md-bitmap.c | 80 +- drivers/md/md.c| 15 2 files changed, 63 insertions(+), 32 deletions(-)
Re: [PATCH] clk: rockchip: Fix overflow rate during fractional approximation
ommit: commit <88a5404a2277> ("clk: rockchip: fix up the rockchip_fractional_approximation") commit <4186a0e4239b> ("clk: rockchip: Add supprot to limit input rate for fractional divider") Signed-off-by: Jagan Teki Signed-off-by: Finley Xiao --- drivers/clk/rockchip/clk-px30.c | 12 ++-- drivers/clk/rockchip/clk.c | 9 + 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/drivers/clk/rockchip/clk-px30.c b/drivers/clk/rockchip/clk-px30.c index 6fb9c98b7d24..06d3ff39d12f 100644 --- a/drivers/clk/rockchip/clk-px30.c +++ b/drivers/clk/rockchip/clk-px30.c @@ -660,7 +660,7 @@ static struct rockchip_clk_branch px30_clk_branches[] __initdata = { COMPOSITE(SCLK_UART1_SRC, "clk_uart1_src", mux_uart_src_p, CLK_SET_RATE_NO_REPARENT, PX30_CLKSEL_CON(34), 14, 2, MFLAGS, 0, 5, DFLAGS, PX30_CLKGATE_CON(10), 12, GFLAGS), - COMPOSITE_NOMUX_HALFDIV(0, "clk_uart1_np5", "clk_uart1_src", 0, + COMPOSITE_NOMUX_HALFDIV(0, "clk_uart1_np5", "clk_uart1_src", CLK_SET_RATE_PARENT, PX30_CLKSEL_CON(35), 0, 5, DFLAGS, PX30_CLKGATE_CON(10), 13, GFLAGS), COMPOSITE_FRACMUX(0, "clk_uart1_frac", "clk_uart1_src", CLK_SET_RATE_PARENT, @@ -673,7 +673,7 @@ static struct rockchip_clk_branch px30_clk_branches[] __initdata = { COMPOSITE(SCLK_UART2_SRC, "clk_uart2_src", mux_uart_src_p, 0, PX30_CLKSEL_CON(37), 14, 2, MFLAGS, 0, 5, DFLAGS, PX30_CLKGATE_CON(11), 0, GFLAGS), - COMPOSITE_NOMUX_HALFDIV(0, "clk_uart2_np5", "clk_uart2_src", 0, + COMPOSITE_NOMUX_HALFDIV(0, "clk_uart2_np5", "clk_uart2_src", CLK_SET_RATE_PARENT, PX30_CLKSEL_CON(38), 0, 5, DFLAGS, PX30_CLKGATE_CON(11), 1, GFLAGS), COMPOSITE_FRACMUX(0, "clk_uart2_frac", "clk_uart2_src", CLK_SET_RATE_PARENT, @@ -686,7 +686,7 @@ static struct rockchip_clk_branch px30_clk_branches[] __initdata = { COMPOSITE(0, "clk_uart3_src", mux_uart_src_p, 0, PX30_CLKSEL_CON(40), 14, 2, MFLAGS, 0, 5, DFLAGS, PX30_CLKGATE_CON(11), 4, GFLAGS), - COMPOSITE_NOMUX_HALFDIV(0, "clk_uart3_np5", "clk_uart3_src", 0, + COMPOSITE_NOMUX_HALFDIV(0, "clk_uart3_np5", "clk_uart3_src", CLK_SET_RATE_PARENT, PX30_CLKSEL_CON(41), 0, 5, DFLAGS, PX30_CLKGATE_CON(11), 5, GFLAGS), COMPOSITE_FRACMUX(0, "clk_uart3_frac", "clk_uart3_src", CLK_SET_RATE_PARENT, @@ -699,7 +699,7 @@ static struct rockchip_clk_branch px30_clk_branches[] __initdata = { COMPOSITE(0, "clk_uart4_src", mux_uart_src_p, 0, PX30_CLKSEL_CON(43), 14, 2, MFLAGS, 0, 5, DFLAGS, PX30_CLKGATE_CON(11), 8, GFLAGS), - COMPOSITE_NOMUX_HALFDIV(0, "clk_uart4_np5", "clk_uart4_src", 0, + COMPOSITE_NOMUX_HALFDIV(0, "clk_uart4_np5", "clk_uart4_src", CLK_SET_RATE_PARENT, PX30_CLKSEL_CON(44), 0, 5, DFLAGS, PX30_CLKGATE_CON(11), 9, GFLAGS), COMPOSITE_FRACMUX(0, "clk_uart4_frac", "clk_uart4_src", CLK_SET_RATE_PARENT, @@ -712,7 +712,7 @@ static struct rockchip_clk_branch px30_clk_branches[] __initdata = { COMPOSITE(0, "clk_uart5_src", mux_uart_src_p, 0, PX30_CLKSEL_CON(46), 14, 2, MFLAGS, 0, 5, DFLAGS, PX30_CLKGATE_CON(11), 12, GFLAGS), - COMPOSITE_NOMUX_HALFDIV(0, "clk_uart5_np5", "clk_uart5_src", 0, + COMPOSITE_NOMUX_HALFDIV(0, "clk_uart5_np5", "clk_uart5_src", CLK_SET_RATE_PARENT, PX30_CLKSEL_CON(47), 0, 5, DFLAGS, PX30_CLKGATE_CON(11), 13, GFLAGS), COMPOSITE_FRACMUX(0, "clk_uart5_frac", "clk_uart5_src", CLK_SET_RATE_PARENT, @@ -934,7 +934,7 @@ static struct rockchip_clk_branch px30_clk_pmu_branches[] __initdata = { COMPOSITE(0, "clk_uart0_pmu_src", mux_uart_src_p, 0, PX30_PMU_CLKSEL_CON(3), 14, 2, MFLAGS, 0, 5, DFLAGS, PX30_PMU_CLKGATE_CON(1), 0, GFLAGS), - COMPOSITE_NOMUX_HALFDIV(0, "clk_uart0_np5", "clk_uart0_pmu_src", 0, + COMPOSITE_NOMUX_HALFDIV(0, "clk_uart0_np5", "clk_uart0_pmu_src", CLK_SET_RATE_PARENT, PX30_PMU_CLKSEL_CON(4), 0, 5, DFLAGS, PX30_PMU_CLKGATE_CON(1), 1, GFLAGS), COMPOSITE_FRACMUX(0, "clk_uart0_frac", "clk_uart0_pmu_src",
[PATCH v1] thermal/of: Introduce k-po, k-pu and k-i for a thermal zone
The default value for k_pu is: 2 * sustainable_power / (desired_temperature - switch_on_temp) The default value for k_po is: sustainable_power / (desired_temperature - switch_on_temp) The default value for k_i is 10. Even though these parameters of the PID controller can be changed by the following sysfs files: /sys/class/thermal/thermal_zoneX/k_pu /sys/class/thermal/thermal_zoneX/k_po /sys/class/thermal/thermal_zoneX/k_i But it's still more convenient to change the default values by devicetree, so introduce these three optional properties. If provided these properties, they will be parsed and associated with the thermal zone via the thermal zone parameters. Signed-off-by: Finley Xiao --- Documentation/devicetree/bindings/thermal/thermal.txt | 14 ++ drivers/thermal/thermal_of.c | 7 +++ 2 files changed, 21 insertions(+) diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt index f78bec19ca35..ebe936b57ded 100644 --- a/Documentation/devicetree/bindings/thermal/thermal.txt +++ b/Documentation/devicetree/bindings/thermal/thermal.txt @@ -165,6 +165,20 @@ Optional property: 2000mW, while on a 10'' tablet is around 4500mW. +- k-po:Proportional parameter of the PID controller when + current temperature is above the target. + Type: signed + Size: one cell + +- k-pu:Proportional parameter of the PID controller when + current temperature is below the target. + Type: signed + Size: one cell + +- k-i: Integral parameter of the PID controller. + Type: signed + Size: one cell + Note: The delay properties are bound to the maximum dT/dt (temperature derivative over time) in two situations for a thermal zone: (i) - when passive cooling is activated (polling-delay-passive); and diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c index ddf88dbe7ba2..b2a9f92cd8d2 100644 --- a/drivers/thermal/thermal_of.c +++ b/drivers/thermal/thermal_of.c @@ -1089,6 +1089,7 @@ int __init of_parse_thermal_zones(void) struct thermal_zone_params *tzp; int i, mask = 0; u32 prop; + s32 sval; tz = thermal_of_build_thermal_zone(child); if (IS_ERR(tz)) { @@ -1113,6 +1114,12 @@ int __init of_parse_thermal_zones(void) if (!of_property_read_u32(child, "sustainable-power", )) tzp->sustainable_power = prop; + if (!of_property_read_s32(child, "k-po", )) + tzp->k_po = sval; + if (!of_property_read_s32(child, "k-pu", )) + tzp->k_pu = sval; + if (!of_property_read_s32(child, "k-i", )) + tzp->k_i = sval; for (i = 0; i < tz->ntrips; i++) mask |= 1 << i; -- 2.11.0
[PATCH] thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power
The function cpu_power_to_freq is used to find a frequency and set the cooling device to consume at most the power to be converted. For example, if the power to be converted is 80mW, and the em table is as follow. struct em_cap_state table[] = { /* KHz mW */ { 1008000, 36, 0 }, { 120, 49, 0 }, { 1296000, 59, 0 }, { 1416000, 72, 0 }, { 1512000, 86, 0 }, }; The target frequency should be 1416000KHz, not 1512000KHz. Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables") Cc: # v4.13+ Signed-off-by: Finley Xiao Acked-by: Viresh Kumar --- drivers/thermal/cpufreq_cooling.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c index 9e124020519f..6c0e1b053126 100644 --- a/drivers/thermal/cpufreq_cooling.c +++ b/drivers/thermal/cpufreq_cooling.c @@ -123,12 +123,12 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev, { int i; - for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) { - if (power > cpufreq_cdev->em->table[i].power) + for (i = cpufreq_cdev->max_level; i >= 0; i--) { + if (power >= cpufreq_cdev->em->table[i].power) break; } - return cpufreq_cdev->em->table[i + 1].frequency; + return cpufreq_cdev->em->table[i].frequency; } /** -- 2.11.0
[PATCH] thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power
The function cpu_power_to_freq is used to find a frequency and set the cooling device to consume at most the power to be converted. For example, if the power to be converted is 80mW, and the em table is as follow. struct em_cap_state table[] = { /* KHz mW */ { 1008000, 36, 0 }, { 120, 49, 0 }, { 1296000, 59, 0 }, { 1416000, 72, 0 }, { 1512000, 86, 0 }, }; The target frequency should be 1416000KHz, not 1512000KHz. Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables") Signed-off-by: Finley Xiao --- drivers/thermal/cpufreq_cooling.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c index 9e124020519f..6c0e1b053126 100644 --- a/drivers/thermal/cpufreq_cooling.c +++ b/drivers/thermal/cpufreq_cooling.c @@ -123,12 +123,12 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev, { int i; - for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) { - if (power > cpufreq_cdev->em->table[i].power) + for (i = cpufreq_cdev->max_level; i >= 0; i--) { + if (power >= cpufreq_cdev->em->table[i].power) break; } - return cpufreq_cdev->em->table[i + 1].frequency; + return cpufreq_cdev->em->table[i].frequency; } /** -- 2.11.0
Re: [PATCH] xfs/XXX: Add xfs/XXX
On 2020/6/3 2:14, Darrick J. Wong wrote: On Tue, Jun 02, 2020 at 04:51:48PM +0800, Xiao Yang wrote: On 2020/4/14 0:30, Darrick J. Wong wrote: This might be a good time to introduce a few new helpers: _require_scratch_dax ("Does $SCRATCH_DEV support DAX?") _require_scratch_dax_mountopt ("Does the fs support the DAX mount options?") _require_scratch_daX_iflag ("Does the fs support FS_XFLAG_DAX?") Hi Darrick, Now, I am trying to introduce these new helpers and have some questions: 1) There are five testcases related to old dax implementation, should we only convert them to new dax implementation or make them compatible with old and new dax implementation? What is the 'old' DAX implementation? ext2 XIP? Hi Darrick, Thanks for your quick feedback. Right, the 'old' DAX implementation means old dax mount option(i.e. -o dax) Compare new and old dax mount option on ext4 and xfs, is the following logic right? -o dax=always == -o dax -o dax=never == without dax -o dax=inode == nothing Of course, we should uses new option if ext4/xfs supports new dax mount option on distros. But should we fallback to use old option if ext4/xfs doesn't support new dax mount option on some old distros? btw: it seems hard for testcases to use two different sets of mount options(i.e. old and new) so do you have any suggestion? 2) I think _require_xfs_io_command "chattr" "x" is enough to check if fs supports FS_XFLAG_DAX. Is it necessary to add _require_scratch_dax_iflag()? like this: _require_scratch_dax_iflag() { _require_xfs_io_command "chattr" "x" } I suggested that list based on the major control knobs that will be visible to userspace programs. Even if this is just a one-line helper, its name is useful for recognizing which of those knobs we're looking for. Yes, you could probably save a trivial amount of time by skipping one iteration of bash function calling, but now everyone has to remember that the xfs_io chattr "x" flag means the dax inode flag, and not confuse it for chmod +x or something else. Got it, thanks for your detailed explanation. Best Regards, Xiao Yang --D Best Regards, Xiao Yang .
Re: [PATCH] xfs/XXX: Add xfs/XXX
On 2020/4/14 0:30, Darrick J. Wong wrote: This might be a good time to introduce a few new helpers: _require_scratch_dax ("Does $SCRATCH_DEV support DAX?") _require_scratch_dax_mountopt ("Does the fs support the DAX mount options?") _require_scratch_daX_iflag ("Does the fs support FS_XFLAG_DAX?") Hi Darrick, Now, I am trying to introduce these new helpers and have some questions: 1) There are five testcases related to old dax implementation, should we only convert them to new dax implementation or make them compatible with old and new dax implementation? 2) I think _require_xfs_io_command "chattr" "x" is enough to check if fs supports FS_XFLAG_DAX. Is it necessary to add _require_scratch_dax_iflag()? like this: _require_scratch_dax_iflag() { _require_xfs_io_command "chattr" "x" } Best Regards, Xiao Yang
Re: [PATCH V4 6/8] fs/ext4: Make DAX mount option a tri-state
On 2020/5/28 17:41, Jan Kara wrote: On Thu 28-05-20 16:56:51, Xiao Yang wrote: On 2020/5/28 7:50, Ira Weiny wrote: On Wed, May 27, 2020 at 01:54:54PM +0800, Xiao Yang wrote: On 2020/5/22 3:13, ira.we...@intel.com wrote: From: Ira Weiny We add 'always', 'never', and 'inode' (default). '-o dax' continues to operate the same which is equivalent to 'always'. This new functionality is limited to ext4 only. Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set it and EXT4_MOUNT_DAX_ALWAYS appropriately for the mode. We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX. Finally, EXT4_MOUNT2_DAX_INODE is used solely to detect if the user specified that option for printing. Hi Ira, I have two questions when reviewing this patch: 1) After doing mount with the same dax=inode option, ext4/xfs shows differnt output(i.e. xfs doesn't print 'dax=inode'): --- # mount -o dax=inode /dev/pmem0 /mnt/xfstests/test/ # mount | grep pmem0 /dev/pmem0 on /mnt/xfstests/test type ext4 (rw,relatime,seclabel,dax=inode) # mount -odax=inode /dev/pmem1 /mnt/xfstests/scratch/ # mount | grep pmem1 /dev/pmem1 on /mnt/xfstests/scratch type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota) Is this expected output? why don't unify the output? Correct. dax=inode is the default. xfs treats that default the same whether you specify it on the command line or not. For ext4 Jan specifically asked that if the user specified dax=inode on the command line that it be printed on the mount options. If you don't specify anything then dax=inode is in effect but ext4 will not print anything. I had the behavior the same as XFS originally but Jan wanted it this way. The XFS behavior is IMO better and is what the new mount infrastructure gives by default. Could we unify the output? It is strange for me to use differnt output on ext4 and xfs. If we'd unify the output with XFS, it would be inconsistent with all the other ext4 mount options. So I disagree with that. I agree it is not ideal to have different behavior between xfs and ext4 but such is the historical behavior. If we want to change that, we need to change the handling for all the ext4 mount options. I'm open for that discussion but it is a problem unrelated to this patch set. Hi Jan, Thanks for your quick feedback. Of course, this doubt should not block the patch set. Best Regards, Xiao Yang Honza
Re: [PATCH V4 6/8] fs/ext4: Make DAX mount option a tri-state
On 2020/5/28 7:50, Ira Weiny wrote: On Wed, May 27, 2020 at 01:54:54PM +0800, Xiao Yang wrote: On 2020/5/22 3:13, ira.we...@intel.com wrote: From: Ira Weiny We add 'always', 'never', and 'inode' (default). '-o dax' continues to operate the same which is equivalent to 'always'. This new functionality is limited to ext4 only. Specifically we introduce a 2nd DAX mount flag EXT4_MOUNT2_DAX_NEVER and set it and EXT4_MOUNT_DAX_ALWAYS appropriately for the mode. We also force EXT4_MOUNT2_DAX_NEVER if !CONFIG_FS_DAX. Finally, EXT4_MOUNT2_DAX_INODE is used solely to detect if the user specified that option for printing. Hi Ira, I have two questions when reviewing this patch: 1) After doing mount with the same dax=inode option, ext4/xfs shows differnt output(i.e. xfs doesn't print 'dax=inode'): --- # mount -o dax=inode /dev/pmem0 /mnt/xfstests/test/ # mount | grep pmem0 /dev/pmem0 on /mnt/xfstests/test type ext4 (rw,relatime,seclabel,dax=inode) # mount -odax=inode /dev/pmem1 /mnt/xfstests/scratch/ # mount | grep pmem1 /dev/pmem1 on /mnt/xfstests/scratch type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota) Is this expected output? why don't unify the output? Correct. dax=inode is the default. xfs treats that default the same whether you specify it on the command line or not. For ext4 Jan specifically asked that if the user specified dax=inode on the command line that it be printed on the mount options. If you don't specify anything then dax=inode is in effect but ext4 will not print anything. I had the behavior the same as XFS originally but Jan wanted it this way. The XFS behavior is IMO better and is what the new mount infrastructure gives by default. Hi Ira, Could we unify the output? It is strange for me to use differnt output on ext4 and xfs. 2) Do mount without dax and mount with -odax=inode have the same behavior? Yes. --- # mount /dev/pmem0 /mnt/xfstests/test/ # mount | grep pmem0 /dev/pmem0 on /mnt/xfstests/test type ext4 (rw,relatime,seclabel) # umount /mnt/xfstests/test # mount -odax=inode /dev/pmem0 /mnt/xfstests/test/ # mount | grep pmem0 /dev/pmem0 on /mnt/xfstests/test type ext4 (rw,relatime,seclabel,dax=inode --- These are the same behavior. But because you specified the option it was echoed in the second output. Got it, thanks for your explanation. BTW: I focus on the support of per-file/directory DAX operations recently. Reviewed-by: Jan Kara Signed-off-by: Ira Weiny --- Changes from V1: Fix up mounting options to only show an option if specified Fix remount to prevent dax changes Isolate behavior to ext4 only Changes from RFC: Combine remount check for DAX_NEVER with DAX_ALWAYS Update ext4_should_enable_dax() --- fs/ext4/ext4.h | 2 ++ fs/ext4/inode.c | 2 ++ fs/ext4/super.c | 67 + 3 files changed, 61 insertions(+), 10 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index f5291693ce6e..65ffb831b2b9 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1168,6 +1168,8 @@ struct ext4_inode_info { blocks */ #define EXT4_MOUNT2_HURD_COMPAT 0x0004 /* Support HURD-castrated file systems */ +#define EXT4_MOUNT2_DAX_NEVER 0x0008 /* Do not allow Direct Access */ +#define EXT4_MOUNT2_DAX_INODE 0x0010 /* For printing options only */ #define EXT4_MOUNT2_EXPLICIT_JOURNAL_CHECKSUM0x0008 /* User explicitly specified journal checksum */ diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 01636cf5f322..68fac9289109 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4402,6 +4402,8 @@ static bool ext4_should_enable_dax(struct inode *inode) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + if (test_opt2(inode->i_sb, DAX_NEVER)) + return false; if (!S_ISREG(inode->i_mode)) return false; if (ext4_should_journal_data(inode)) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 80eb814c47eb..5e056aa20ce9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1512,7 +1512,8 @@ enum { Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota, Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_jqfmt_vfsv1, Opt_quota, Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err, - Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version, Opt_dax, + Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version, + Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never, Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn
Re: [PATCH V4 6/8] fs/ext4: Make DAX mount option a tri-state
t_delalloc, "delalloc"}, > {Opt_warn_on_error, "warn_on_error"}, > @@ -1726,6 +1730,7 @@ static int clear_qf_name(struct super_block *sb, int > qtype) > #define MOPT_NO_EXT30x0200 > #define MOPT_EXT4_ONLY (MOPT_NO_EXT2 | MOPT_NO_EXT3) > #define MOPT_STRING 0x0400 > +#define MOPT_SKIP0x0800 > > static const struct mount_opts { > int token; > @@ -1775,7 +1780,13 @@ static const struct mount_opts { > {Opt_min_batch_time, 0, MOPT_GTE0}, > {Opt_inode_readahead_blks, 0, MOPT_GTE0}, > {Opt_init_itable, 0, MOPT_GTE0}, > - {Opt_dax, EXT4_MOUNT_DAX_ALWAYS, MOPT_SET}, > + {Opt_dax, EXT4_MOUNT_DAX_ALWAYS, MOPT_SET | MOPT_SKIP}, > + {Opt_dax_always, EXT4_MOUNT_DAX_ALWAYS, > + MOPT_EXT4_ONLY | MOPT_SET | MOPT_SKIP}, > + {Opt_dax_inode, EXT4_MOUNT2_DAX_INODE, > + MOPT_EXT4_ONLY | MOPT_SET | MOPT_SKIP}, > + {Opt_dax_never, EXT4_MOUNT2_DAX_NEVER, > + MOPT_EXT4_ONLY | MOPT_SET | MOPT_SKIP}, > {Opt_stripe, 0, MOPT_GTE0}, > {Opt_resuid, 0, MOPT_GTE0}, > {Opt_resgid, 0, MOPT_GTE0}, > @@ -2084,13 +2095,32 @@ static int handle_mount_opt(struct super_block *sb, > char *opt, int token, > } > sbi->s_jquota_fmt = m->mount_opt; > #endif > - } else if (token == Opt_dax) { > + } else if (token == Opt_dax || token == Opt_dax_always || > +token == Opt_dax_inode || token == Opt_dax_never) { > #ifdef CONFIG_FS_DAX > - ext4_msg(sb, KERN_WARNING, > - "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"); > - sbi->s_mount_opt |= m->mount_opt; > + switch (token) { > + case Opt_dax: > + case Opt_dax_always: > + ext4_msg(sb, KERN_WARNING, > + "DAX enabled. Warning: EXPERIMENTAL, use at > your own risk"); > + sbi->s_mount_opt |= EXT4_MOUNT_DAX_ALWAYS; > + sbi->s_mount_opt2&= ~EXT4_MOUNT2_DAX_NEVER; > + break; > + case Opt_dax_never: > + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER; > + sbi->s_mount_opt&= ~EXT4_MOUNT_DAX_ALWAYS; > + break; > + case Opt_dax_inode: > + sbi->s_mount_opt&= ~EXT4_MOUNT_DAX_ALWAYS; > + sbi->s_mount_opt2&= ~EXT4_MOUNT2_DAX_NEVER; > + /* Strictly for printing options */ > + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_INODE; > + break; > + } > #else > ext4_msg(sb, KERN_INFO, "dax option not supported"); > + sbi->s_mount_opt2 |= EXT4_MOUNT2_DAX_NEVER; > + sbi->s_mount_opt&= ~EXT4_MOUNT_DAX_ALWAYS; > return -1; > #endif For s_mount_opt/s_mount_opt2, could we make the code more readable by using set_opt()/set_opt2()/clear_opt()/clear_opt2() macros? Thanks, Xiao Yang > } else if (token == Opt_data_err_abort) { > @@ -2254,7 +2284,7 @@ static int _ext4_show_options(struct seq_file *seq, > struct super_block *sb, > for (m = ext4_mount_opts; m->token != Opt_err; m++) { > int want_set = m->flags& MOPT_SET; > if (((m->flags& (MOPT_SET|MOPT_CLEAR)) == 0) || > - (m->flags& MOPT_CLEAR_ERR)) > + (m->flags& MOPT_CLEAR_ERR) || m->flags& MOPT_SKIP) > continue; > if (!nodefs&& !(m->mount_opt& (sbi->s_mount_opt ^ > def_mount_opt))) > continue; /* skip if same as the default */ > @@ -2314,6 +2344,17 @@ static int _ext4_show_options(struct seq_file *seq, > struct super_block *sb, > if (DUMMY_ENCRYPTION_ENABLED(sbi)) > SEQ_OPTS_PUTS("test_dummy_encryption"); > > + if (test_opt(sb, DAX_ALWAYS)) { > + if (IS_EXT2_SB(sb)) > + SEQ_OPTS_PUTS("dax"); > + else > + SEQ_OPTS_PUTS("dax=always"); > + } else if (test_opt2(sb, DAX_NEVER)) { > + SEQ_OPTS_PUTS("dax=never"); > + } else if (test_opt2(sb, DAX_INODE)) { > + SEQ_OPTS_PUTS("dax=inode"); > + } > + > ext4_show_quota_options(seq, sb); > return 0; > } > @@ -5436,10 +5477,16 @@ static int ext4_remount(struct super_block *sb, int > *flags, char *data) > goto restore_opts; > } > &
Re: [PATCH v2] tools/perf/util/*.c: Use "%zu" output format for size_t type
Hi, Ping! Thanks, Xiao Yang On 2020/2/26 9:08, Xiao Yang wrote: > Avoid the following errors when building perf on i386: > --- > util/session.c: In function 'perf_session__process_compressed_event': > util/session.c:91:11: error: format '%ld' expects argument of type 'long > int', but argument 4 has type 'size_t {aka unsigned int}' [-Werror=format=] >pr_debug("decomp (B): %ld to %ld\n", src_size, decomp_size); > ... > til/zstd.c: In function 'zstd_decompress_stream': > util/zstd.c:102:11: error: format '%ld' expects argument of type 'long int', > but argument 4 has type 'size_t {aka unsigned int}' [-Werror=format=] > pr_err("failed to decompress (B): %ld -> %ld, dst_size %ld : %s\n", > ... > --- > > Reported-by: kernel test robot > Signed-off-by: Xiao Yang > --- > tools/perf/util/session.c | 2 +- > tools/perf/util/zstd.c| 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > index d0d7d25b23e3..55c3d2fefd41 100644 > --- a/tools/perf/util/session.c > +++ b/tools/perf/util/session.c > @@ -88,7 +88,7 @@ static int perf_session__process_compressed_event(struct > perf_session *session, > session->decomp_last = decomp; > } > > - pr_debug("decomp (B): %ld to %ld\n", src_size, decomp_size); > + pr_debug("decomp (B): %zu to %zu\n", src_size, decomp_size); > > return 0; > } > diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c > index d2202392ffdb..a051237cacf2 100644 > --- a/tools/perf/util/zstd.c > +++ b/tools/perf/util/zstd.c > @@ -99,7 +99,7 @@ size_t zstd_decompress_stream(struct zstd_data *data, void > *src, size_t src_size > while (input.pos< input.size) { > ret = ZSTD_decompressStream(data->dstream,,); > if (ZSTD_isError(ret)) { > - pr_err("failed to decompress (B): %ld -> %ld, dst_size > %ld : %s\n", > + pr_err("failed to decompress (B): %zu -> %zu, dst_size > %zu : %s\n", > src_size, output.size, dst_size, > ZSTD_getErrorName(ret)); > break; > }
Re: [PATCH 3/3] selftests/ftrace: Use /bin/echo instead of built-in echo
On 2020/5/7 17:15, Masami Hiramatsu wrote: On Thu, 7 May 2020 14:45:16 +0800 Xiao Yang wrote: On 2020/5/1 21:38, Masami Hiramatsu wrote: Since the built-in echo has different behavior in POSIX shell (dash) and bash, we forcibly use /bin/echo -E (not interpret backslash escapes) by default. This also fixes some test cases which expects built-in echo command. Reported-by: Liu Yiding Signed-off-by: Masami Hiramatsu --- tools/testing/selftests/ftrace/test.d/functions|3 +++ .../test.d/trigger/trigger-trace-marker-hist.tc|2 +- .../trigger-trace-marker-synthetic-kernel.tc |4 .../trigger/trigger-trace-marker-synthetic.tc |4 ++-- 4 files changed, 10 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/ftrace/test.d/functions b/tools/testing/selftests/ftrace/test.d/functions index 5d4550591ff9..ea59b6ea2c3e 100644 --- a/tools/testing/selftests/ftrace/test.d/functions +++ b/tools/testing/selftests/ftrace/test.d/functions @@ -1,3 +1,6 @@ +# Since the built-in echo has different behavior in POSIX shell (dash) and +# bash, we forcibly use /bin/echo -E (not interpret backslash escapes). +alias echo="/bin/echo -E" Hi Masami, Steven It seems that only kprobe_syntax_errors.tc is impacted by the issue currently. Is it necessary for all tests to use /bin/echo and could we just make kprobe_syntax_errors.tc use /bin/echo? Yes, I would like to unify the "echo"'s behavior among the testcases instead of patching each failure in the future. Or would you have any concern on it? Hi Masami, Very sorry for the late reply. We may not avoid fixing related failures after your change: 1) We have to reuse built-in echo (do alias echo=echo) if we want to test common_pid for histogram. 2) We have to reuse built-in echo if some new tests want to interpret backslash escapes in future. Is it simple to provide two implementations of echo?(built-in echo and echo command?) and then just apply echo command for kprobe_syntax_errors.tc? BTW: My suggestion may not be correct. Best Regards, Xiao Yang Thank you, Best Regards, Xiao Yang clear_trace() { # reset trace output echo> trace diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc index ab6bedb25736..b3f70f53ee69 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc @@ -30,7 +30,7 @@ fi echo "Test histogram trace_marker tigger" -echo 'hist:keys=common_pid'> events/ftrace/print/trigger +echo 'hist:keys=ip'> events/ftrace/print/trigger for i in `seq 1 10` ; do echo "hello"> trace_marker; done grep 'hitcount: *10$' events/ftrace/print/hist> /dev/null || \ fail "hist trigger did not trigger correct times on trace_marker" diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc index 18b4d1c2807e..c1625d945f4d 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc @@ -44,6 +44,10 @@ echo 'latency u64 lat'> synthetic_events echo 'hist:keys=pid:ts0=common_timestamp.usecs'> events/sched/sched_waking/trigger echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).latency($lat)'> events/ftrace/print/trigger echo 'hist:keys=common_pid,lat:sort=lat'> events/synthetic/latency/trigger + +# We have to use the built-in echo here because waking up pid must be same +# as echoing pid. +alias echo=echo sleep 1 echo "hello"> trace_marker diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc index dd262d6d0db6..23e52c8d71de 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc @@ -36,8 +36,8 @@ fi echo "Test histogram trace_marker to trace_marker latency histogram trigger" echo 'latency u64 lat'> synthetic_events -echo 'hist:keys=common_pid:ts0=common_timestamp.usecs if buf == "start"'> events/ftrace/print/trigger -echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print).latency($lat) if buf == "end"'>> events/ftrace/print/trigger +echo 'hist:keys=ip:ts0=common_timestamp.usecs if buf == "start"'> events/ftrace/print/trigger +echo 'hist:keys=ip:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print
Re: [PATCH] tracing: Wait for preempt irq delay thread to finish
Hi Steven, Thanks for your further investigation. I used the following ways to test your fix patch on my slow vm and didn't see any issue: 1) Insert and remove preemptirq_delay_test in loops. 2) Insert preemptirq_delay_test, write to /sys/kernel/preemptirq_delay_test/trigger and remove preemptirq_delay_test in loops. 3) Ran irqsoff_tracer.tc in loops. BTW: For irqsoff_tracer.tc, should we extend code to test the burst feature and the sysfs trigger? Reviewed-by: Xiao Yang Thanks, Xiao Yang On 2020/5/6 22:30, Steven Rostedt wrote: From: "Steven Rostedt (VMware)" Running on a slower machine, it is possible that the preempt delay kernel thread may still be executing if the module was immediately removed after added, and this can cause the kernel to crash as the kernel thread might be executing after its code has been removed. There's no reason that the caller of the code shouldn't just wait for the delay thread to finish, as the thread can also be created by a trigger in the sysfs code, which also has the same issues. Link: http://lore.kernel.org/r/5ea2b0c8.2080...@cn.fujitsu.com Cc: sta...@vger.kernel.org Fixes: 793937236d1ee ("lib: Add module for testing preemptoff/irqsoff latency tracers") Reported-by: Xiao Yang Signed-off-by: Steven Rostedt (VMware) --- kernel/trace/preemptirq_delay_test.c | 30 ++-- 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index 31c0fad4cb9e..c4c86de63cf9 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -113,22 +113,42 @@ static int preemptirq_delay_run(void *data) for (i = 0; i< s; i++) (testfuncs[i])(i); + + set_current_state(TASK_INTERRUPTIBLE); + while (!kthread_should_stop()) { + schedule(); + set_current_state(TASK_INTERRUPTIBLE); + } + + __set_current_state(TASK_RUNNING); + return 0; } -static struct task_struct *preemptirq_start_test(void) +static int preemptirq_run_test(void) { + struct task_struct *task; + char task_name[50]; snprintf(task_name, sizeof(task_name), "%s_test", test_mode); - return kthread_run(preemptirq_delay_run, NULL, task_name); + task = kthread_run(preemptirq_delay_run, NULL, task_name); + if (IS_ERR(task)) + return PTR_ERR(task); + if (task) + kthread_stop(task); + return 0; } static ssize_t trigger_store(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count) { - preemptirq_start_test(); + ssize_t ret; + + ret = preemptirq_run_test(); + if (ret) + return ret; return count; } @@ -148,11 +168,9 @@ static struct kobject *preemptirq_delay_kobj; static int __init preemptirq_delay_init(void) { - struct task_struct *test_task; int retval; - test_task = preemptirq_start_test(); - retval = PTR_ERR_OR_ZERO(test_task); + retval = preemptirq_run_test(); if (retval != 0) return retval;
Re: [PATCH 3/3] selftests/ftrace: Use /bin/echo instead of built-in echo
On 2020/5/1 21:38, Masami Hiramatsu wrote: Since the built-in echo has different behavior in POSIX shell (dash) and bash, we forcibly use /bin/echo -E (not interpret backslash escapes) by default. This also fixes some test cases which expects built-in echo command. Reported-by: Liu Yiding Signed-off-by: Masami Hiramatsu --- tools/testing/selftests/ftrace/test.d/functions|3 +++ .../test.d/trigger/trigger-trace-marker-hist.tc|2 +- .../trigger-trace-marker-synthetic-kernel.tc |4 .../trigger/trigger-trace-marker-synthetic.tc |4 ++-- 4 files changed, 10 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/ftrace/test.d/functions b/tools/testing/selftests/ftrace/test.d/functions index 5d4550591ff9..ea59b6ea2c3e 100644 --- a/tools/testing/selftests/ftrace/test.d/functions +++ b/tools/testing/selftests/ftrace/test.d/functions @@ -1,3 +1,6 @@ +# Since the built-in echo has different behavior in POSIX shell (dash) and +# bash, we forcibly use /bin/echo -E (not interpret backslash escapes). +alias echo="/bin/echo -E" Hi Masami, Steven It seems that only kprobe_syntax_errors.tc is impacted by the issue currently. Is it necessary for all tests to use /bin/echo and could we just make kprobe_syntax_errors.tc use /bin/echo? Best Regards, Xiao Yang clear_trace() { # reset trace output echo> trace diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc index ab6bedb25736..b3f70f53ee69 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-hist.tc @@ -30,7 +30,7 @@ fi echo "Test histogram trace_marker tigger" -echo 'hist:keys=common_pid'> events/ftrace/print/trigger +echo 'hist:keys=ip'> events/ftrace/print/trigger for i in `seq 1 10` ; do echo "hello"> trace_marker; done grep 'hitcount: *10$' events/ftrace/print/hist> /dev/null || \ fail "hist trigger did not trigger correct times on trace_marker" diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc index 18b4d1c2807e..c1625d945f4d 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic-kernel.tc @@ -44,6 +44,10 @@ echo 'latency u64 lat'> synthetic_events echo 'hist:keys=pid:ts0=common_timestamp.usecs'> events/sched/sched_waking/trigger echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).latency($lat)'> events/ftrace/print/trigger echo 'hist:keys=common_pid,lat:sort=lat'> events/synthetic/latency/trigger + +# We have to use the built-in echo here because waking up pid must be same +# as echoing pid. +alias echo=echo sleep 1 echo "hello"> trace_marker diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc index dd262d6d0db6..23e52c8d71de 100644 --- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc +++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-trace-marker-synthetic.tc @@ -36,8 +36,8 @@ fi echo "Test histogram trace_marker to trace_marker latency histogram trigger" echo 'latency u64 lat'> synthetic_events -echo 'hist:keys=common_pid:ts0=common_timestamp.usecs if buf == "start"'> events/ftrace/print/trigger -echo 'hist:keys=common_pid:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print).latency($lat) if buf == "end"'>> events/ftrace/print/trigger +echo 'hist:keys=ip:ts0=common_timestamp.usecs if buf == "start"'> events/ftrace/print/trigger +echo 'hist:keys=ip:lat=common_timestamp.usecs-$ts0:onmatch(ftrace.print).latency($lat) if buf == "end"'>> events/ftrace/print/trigger echo 'hist:keys=common_pid,lat:sort=lat'> events/synthetic/latency/trigger echo -n "start"> trace_marker echo -n "end"> trace_marker .
Re: [PATCH] kernel/trace: Stop and wait for kthread on preempt irq module unload
On 2020/4/28 22:45, Steven Rostedt wrote: diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index 1c28ca20e30b..6d9131ae7e8c 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -113,15 +113,27 @@ static int preemptirq_delay_run(void *data) for (i = 0; i< s; i++) (testfuncs[i])(i); + + while (!kthread_should_stop()) { + schedule(); + set_current_state(TASK_INTERRUPTIBLE); + } + + __set_current_state(TASK_RUNNING); + return 0; } Hi Steven, Thanks for your patch. I also used the following steps to do test and didn't get any warning/panic after applying your patch. - for i in $(seq 1 100); do modprobe preemptirq_delay_test test_mode=irq delay=50; rmmod preemptirq_delay_test; done for i in $(seq 1 100); do modprobe preemptirq_delay_test test_mode=preempt delay=50; rmmod preemptirq_delay_test; done - But I am not sure which fix(from you and Joel) is better. Thanks, Xiao Yang
Re: [PATCH] kernel/trace: Stop and wait for kthread on preempt irq module unload
On 2020/4/28 22:15, Joel Fernandes wrote: I am wondering if it is because in your test, the kthread exits too quickly. We have these comments in kthread_stop(): * If threadfn() may call do_exit() itself, the caller must ensure * task_struct can't go away. Does the below diff on top of the previous patch help? ---8<--- diff --git a/kernel/trace/preemptirq_delay_test.c b/kernel/trace/preemptirq_delay_test.c index 1c28ca20e30b6..8051946a18989 100644 --- a/kernel/trace/preemptirq_delay_test.c +++ b/kernel/trace/preemptirq_delay_test.c @@ -152,6 +152,8 @@ static int __init preemptirq_delay_init(void) int retval; test_task = preemptirq_start_test(); + get_task_struct(test_task); + retval = PTR_ERR_OR_ZERO(test_task); if (retval != 0) return retval; @@ -172,8 +174,10 @@ static void __exit preemptirq_delay_exit(void) { kobject_put(preemptirq_delay_kobj); - if (test_task) + if (test_task) { kthread_stop(test_task); + put_task_struct(test_task); + } } Hi Joel, Thanks for your additional patch. First, We have to avoid kbuild error by including --- kernel/trace/preemptirq_delay_test.c: In function ‘preemptirq_delay_init’: kernel/trace/preemptirq_delay_test.c:155:2: error: implicit declaration of function ‘get_task_struct’; did you mean ‘set_task_cpu’? [-Werror=implicit-function-declaration] get_task_struct(test_task); ^~~ set_task_cpu kernel/trace/preemptirq_delay_test.c: In function ‘preemptirq_delay_exit’: kernel/trace/preemptirq_delay_test.c:179:3: error: implicit declaration of function ‘put_task_struct’; did you mean ‘set_task_cpu’? [-Werror=implicit-function-declaration] put_task_struct(test_task); ^~~ set_task_cpu cc1: some warnings being treated as errors --- Second, I used the following steps to do test and didn't get any warning/panic after applying your additional patch: --- for i in $(seq 1 100); do modprobe preemptirq_delay_test test_mode=irq delay=50; rmmod preemptirq_delay_test; done for i in $(seq 1 100); do modprobe preemptirq_delay_test test_mode=preempt delay=50; rmmod preemptirq_delay_test; done --- Thanks, Xiao Yang
Re: [PATCH] kernel/trace: Stop and wait for kthread on preempt irq module unload
[ 178.641300] Internal error: Oops: 9604 [#1] PREEMPT SMP [ 178.641931] Modules linked in: sunrpc vfat fat ext4 mbcache jbd2 crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ip_tables xfs libcrc32c virtio_net net_failover virtio_mmio failover virtio_blk [last unloaded: preemptirq_delay_test] [ 178.643547] CPU: 1 PID: 2394 Comm: pmlogger_check Tainted: GW 5.6.0-rc7+ #18 [ 178.644219] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 178.644782] pstate: 6005 (nZCv daif -PAN -UAO) [ 178.645219] pc : __kmalloc+0xc8/0x2e8 [ 178.645538] lr : __kmalloc+0x9c/0x2e8 [ 178.645841] sp : fe001312fc20 [ 178.646127] x29: fe001312fc20 x28: fc0042a0af00 [ 178.646532] x27: fc005a7ad000 x26: cc79f001 [ 178.647037] x25: fc0042a0af00 x24: fc007d76fc00 [ 178.647431] x23: fe00103e95e4 x22: 001b [ 178.647895] x21: 0cc0 x20: fc007d76fc00 [ 178.648276] x19: fc0043ca x18: [ 178.648673] x17: x16: [ 178.649077] x15: x14: 0010 [ 178.649608] x13: x12: [ 178.649963] x11: x10: [ 178.650337] x9 : x8 : 1f18 [ 178.650683] x7 : 1f18 x6 : fc005a7ad1f8 [ 178.651099] x5 : x4 : [ 178.651557] x3 : x2 : fe001021d1e0 [ 178.652006] x1 : x0 : 0001 [ 178.652407] Call trace: [ 178.652694] __kmalloc+0xc8/0x2e8 [ 178.652972] load_elf_binary+0xdc/0xdb8 [ 178.653286] search_binary_handler+0x98/0x268 [ 178.653626] __do_execve_file+0x4ec/0x7c8 [ 178.653985] __arm64_sys_execve+0x40/0x50 [ 178.654310] do_el0_svc+0xf8/0x1e0 [ 178.654578] el0_sync_handler+0x134/0x1bc [ 178.654962] el0_sync+0x140/0x180 [ 178.655509] Code: b4001073 b9402281 b9401380 11000400 (f8616a7b) [ 178.656543] ---[ end trace 437c7bc9df5e92df ]--- [ 178.657100] Kernel panic - not syncing: Fatal exception [ 178.657630] SMP: stopping secondary CPUs [ 178.658237] Kernel Offset: disabled [ 178.658644] CPU features: 0x10002,20006082 [ 178.658903] Memory Limit: none [ 178.659263] ---[ end Kernel panic - not syncing: Fatal exception ]--- --- Thanks, Xiao Yang On 2020/4/28 18:19, Xiao Yang wrote: > Hi Joel, > > Thanks for your quick fix. > > Unfortunately, it fixes my original panic but introduces other > issues(two wanings and one panic) on my arm64 vm, as below: > > [ 3465.434942] [ cut here ] > [ 3465.435481] refcount_t: addition on 0; use-after-free. > [ 3465.437071] WARNING: CPU: 1 PID: 6708 at lib/refcount.c:25 > refcount_warn_saturate+0x9c/0x140 > [ 3465.437720] Modules linked in: preemptirq_delay_test(O-) sunrpc vfat > fat ext4 mbcache jbd2 crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce > ip_tables xfs libcrc32c virtio_net net_failover failover virtio_mmio > virtio_blk > [ 3465.439787] CPU: 1 PID: 6708 Comm: rmmod Tainted: G O > 5.6.0-rc7+ #18 > [ 3465.440316] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 > 02/06/2015 > [ 3465.440967] pstate: 4005 (nZcv daif -PAN -UAO) > [ 3465.441297] pc : refcount_warn_saturate+0x9c/0x140 > [ 3465.441592] lr : refcount_warn_saturate+0x9c/0x140 > [ 3465.441919] sp : fe001382fd70 > [ 3465.442160] x29: fe001382fd70 x28: fc004876d200 > [ 3465.442649] x27: x26: > [ 3465.443071] x25: x24: fe00115bbbf0 > [ 3465.443670] x23: x22: 0200 > [ 3465.444194] x21: fe0011273988 x20: fc0063fdd228 > [ 3465.444576] x19: fc0063fdd200 x18: 0010 > [ 3465.444939] x17: x16: > [ 3465.445329] x15: x14: fe0011273988 > [ 3465.445698] x13: fe009382fa97 x12: fe001382fa9f > [ 3465.446116] x11: fe00112b x10: fe001382fa20 > [ 3465.446498] x9 : ffd0 x8 : 6572662d72657466 > [ 3465.446941] x7 : 0149 x6 : fe001127cf50 > [ 3465.447375] x5 : fe001127c000 x4 : > [ 3465.447757] x3 : fe001127cf50 x2 : > [ 3465.448161] x1 : 2e36d2803fe6b700 x0 : > [ 3465.448702] Call trace: > [ 3465.448979] refcount_warn_saturate+0x9c/0x140 > [ 3465.449330] kthread_stop+0x48/0x278 > [ 3465.450144] preemptirq_delay_exit+0x28/0xfc8c [preemptirq_delay_test] > [ 3465.450625] __arm64_sys_delete_module+0x14c/0x298 > [ 3465.450998] do_el0_svc+0xf8/0x1e0 > [ 3465.451372] el0_sync_handler+0x134/0x1bc > [ 3465.451701] el0_sync+0x140/0x180 > [ 3465.452099] ---[ end trace 1a8ec2201af5e8c7 ]--- > [ 3465.478208] ---
Re: [PATCH] selftests/ftrace: treat module requirement unmet situation as unsupported
Hi Lin, It looks fine to me. Reviewed-by: Xiao Yang Thanks, Xiao Yang On 2020/4/29 17:50, Po-Hsu Lin wrote: > When the required module for the test does not exist, use > exit_unsupported instead of exit_unresolved to indicate this test is > not supported. > > By doing this we can make test behaviour in sync with the > irqsoff_tracer.tc test in preemptirq, which is also treating module > existence in this way. Moreover, the test won't exit with a non-zero > return value if the module does not exist. > > Fixes: 646f01ccdd59 ("ftrace/selftest: Add tests to test > register_ftrace_direct()") > Fixes: 4d23e9b4fd2e ("selftests/ftrace: Add trace_printk sample module test") > Fixes: 7bc026d6c032 ("selftests/ftrace: Add function filter on module > testcase") > Fixes: af2a0750f374 ("selftests/ftrace: Improve kprobe on module testcase to > load/unload module") > Signed-off-by: Po-Hsu Lin > --- > tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc | 2 +- > tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc | 2 +- > tools/testing/selftests/ftrace/test.d/event/trace_printk.tc| 2 +- > tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc | 2 +- > tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc | 2 +- > 5 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc > b/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc > index d75a869..3d6189e 100644 > --- a/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc > +++ b/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc > @@ -5,7 +5,7 @@ > rmmod ftrace-direct ||: > if ! modprobe ftrace-direct ; then > echo "No ftrace-direct sample module - please make > CONFIG_SAMPLE_FTRACE_DIRECT=m" > - exit_unresolved; > + exit_unsupported; > fi > > echo "Let the module run a little" > diff --git a/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc > b/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc > index 801ecb6..3d0e3ca 100644 > --- a/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc > +++ b/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc > @@ -5,7 +5,7 @@ > rmmod ftrace-direct ||: > if ! modprobe ftrace-direct ; then > echo "No ftrace-direct sample module - please build with > CONFIG_SAMPLE_FTRACE_DIRECT=m" > - exit_unresolved; > + exit_unsupported; > fi > > if [ ! -f kprobe_events ]; then > diff --git a/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc > b/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc > index b02550b..dd8b10d 100644 > --- a/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc > +++ b/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc > @@ -5,7 +5,7 @@ > rmmod trace-printk ||: > if ! modprobe trace-printk ; then > echo "No trace-printk sample module - please make > CONFIG_SAMPLE_TRACE_PRINTK=m" > - exit_unresolved; > + exit_unsupported; > fi > > echo "Waiting for irq work" > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc > b/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc > index 1a4b4a4..26dc06a 100644 > --- a/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc > @@ -13,7 +13,7 @@ echo '*:mod:trace_printk'> set_ftrace_filter > if ! modprobe trace-printk ; then > echo "No trace-printk sample module - please make > CONFIG_SAMPLE_TRACE_PRINTK= > m" > - exit_unresolved; > + exit_unsupported; > fi > > : "Wildcard should be resolved after loading module" > diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc > b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc > index d861bd7..4e07c69 100644 > --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc > @@ -8,7 +8,7 @@ rmmod trace-printk ||: > if ! modprobe trace-printk ; then > echo "No trace-printk sample module - please make > CONFIG_SAMPLE_TRACE_PRINTK= > m" > - exit_unresolved; > + exit_unsupported; > fi > > MOD=trace_printk
Re: [PATCH] kernel/trace: Stop and wait for kthread on preempt irq module unload
: 9606 [#1] PREEMPT SMP [ 3465.509527] Modules linked in: preemptirq_delay_test(O-) sunrpc vfat fat ext4 mbcache jbd2 crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ip_tables xfs libcrc32c virtio_net net_failover failover virtio_mmio virtio_blk [ 3465.510964] CPU: 1 PID: 6708 Comm: rmmod Tainted: GW O 5.6.0-rc7+ #18 [ 3465.511527] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 3465.512031] pstate: 6085 (nZCv daIf -PAN -UAO) [ 3465.512392] pc : __list_add_valid+0x18/0xa0 [ 3465.512740] lr : wait_for_completion+0xc8/0x178 [ 3465.513085] sp : fe001382fd00 [ 3465.513362] x29: fe001382fd00 x28: fc004876d200 [ 3465.513769] x27: x26: [ 3465.514205] x25: x24: fc0076e616c8 [ 3465.514638] x23: fe001382fd68 x22: [ 3465.515030] x21: fe0011273988 x20: fc0076e616c0 [ 3465.515444] x19: fc0076e616b8 x18: 0010 [ 3465.515826] x17: x16: [ 3465.516184] x15: x14: fe0011273988 [ 3465.516584] x13: fe009382fa97 x12: fe001382fa9f [ 3465.516976] x11: fe00112b x10: fe001382fa20 [ 3465.517351] x9 : ffd0 x8 : 6572662d72657466 [ 3465.517750] x7 : 0149 x6 : fe001127cf50 [ 3465.518169] x5 : 0001 x4 : fc0076e616c8 [ 3465.518454] x3 : fe0010128b38 x2 : [ 3465.518711] x1 : x0 : fe001382fd68 [ 3465.518985] Call trace: [ 3465.519157] __list_add_valid+0x18/0xa0 [ 3465.519351] wait_for_completion+0xc8/0x178 [ 3465.519578] kthread_stop+0x9c/0x278 [ 3465.519779] preemptirq_delay_exit+0x28/0xfc8c [preemptirq_delay_test] [ 3465.520109] __arm64_sys_delete_module+0x14c/0x298 [ 3465.520342] do_el0_svc+0xf8/0x1e0 [ 3465.520520] el0_sync_handler+0x134/0x1bc [ 3465.520718] el0_sync+0x140/0x180 [ 3465.521177] Code: 910003fd f9400442 eb01005f 54000141 (f9400041) [ 3465.522258] ---[ end trace 1a8ec2201af5e8c9 ]--- [ 3465.522746] Kernel panic - not syncing: Fatal exception [ 3465.523242] SMP: stopping secondary CPUs [ 3465.523898] Kernel Offset: disabled [ 3465.524423] CPU features: 0x10002,20006082 [ 3465.524939] Memory Limit: none [ 3465.525534] ---[ end Kernel panic - not syncing: Fatal exception ]--- I am looking into these issues. Thanks, Xiao Yang On 2020/4/25 6:36, Joel Fernandes (Google) wrote: > Kthread running the test needs to be stopped or it can continue > executing code unloaded by module causing a crash. > > Suggested-by: Steven Rostedt > Reported-by: Xiao Yang > Link: http://lore.kernel.org/r/5ea2b0c8.2080...@cn.fujitsu.com > Signed-off-by: Joel Fernandes (Google) > --- > kernel/trace/preemptirq_delay_test.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/kernel/trace/preemptirq_delay_test.c > b/kernel/trace/preemptirq_delay_test.c > index 31c0fad4cb9e1..1c28ca20e30b6 100644 > --- a/kernel/trace/preemptirq_delay_test.c > +++ b/kernel/trace/preemptirq_delay_test.c > @@ -145,10 +145,10 @@ static struct attribute_group attr_group = { > }; > > static struct kobject *preemptirq_delay_kobj; > +static struct task_struct *test_task; > > static int __init preemptirq_delay_init(void) > { > - struct task_struct *test_task; > int retval; > > test_task = preemptirq_start_test(); > @@ -171,6 +171,9 @@ static int __init preemptirq_delay_init(void) > static void __exit preemptirq_delay_exit(void) > { > kobject_put(preemptirq_delay_kobj); > + > + if (test_task) > + kthread_stop(test_task); > } > > module_init(preemptirq_delay_init)
[PATCH v1 3/3] clk: rockchip: Add clock controller for the RK3308
Add the clock tree definition for the new RK3308 SoC. Signed-off-by: Finley Xiao --- drivers/clk/rockchip/Makefile | 1 + drivers/clk/rockchip/clk-rk3308.c | 955 ++ drivers/clk/rockchip/clk.h| 13 + 3 files changed, 969 insertions(+) create mode 100644 drivers/clk/rockchip/clk-rk3308.c diff --git a/drivers/clk/rockchip/Makefile b/drivers/clk/rockchip/Makefile index ff35ab463a6f..7c5b5813a87c 100644 --- a/drivers/clk/rockchip/Makefile +++ b/drivers/clk/rockchip/Makefile @@ -20,6 +20,7 @@ obj-y += clk-rk3128.o obj-y += clk-rk3188.o obj-y += clk-rk3228.o obj-y += clk-rk3288.o +obj-y += clk-rk3308.o obj-y += clk-rk3328.o obj-y += clk-rk3368.o obj-y += clk-rk3399.o diff --git a/drivers/clk/rockchip/clk-rk3308.c b/drivers/clk/rockchip/clk-rk3308.c new file mode 100644 index ..b0baf87a283e --- /dev/null +++ b/drivers/clk/rockchip/clk-rk3308.c @@ -0,0 +1,955 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2019 Rockchip Electronics Co. Ltd. + * Author: Finley Xiao + */ + +#include +#include +#include +#include +#include +#include +#include "clk.h" + +#define RK3308_GRF_SOC_STATUS0 0x380 + +enum rk3308_plls { + apll, dpll, vpll0, vpll1, +}; + +static struct rockchip_pll_rate_table rk3308_pll_rates[] = { + /* _mhz, _refdiv, _fbdiv, _postdiv1, _postdiv2, _dsmpd, _frac */ + RK3036_PLL_RATE(160800, 1, 67, 1, 1, 1, 0), + RK3036_PLL_RATE(158400, 1, 66, 1, 1, 1, 0), + RK3036_PLL_RATE(156000, 1, 65, 1, 1, 1, 0), + RK3036_PLL_RATE(153600, 1, 64, 1, 1, 1, 0), + RK3036_PLL_RATE(151200, 1, 63, 1, 1, 1, 0), + RK3036_PLL_RATE(148800, 1, 62, 1, 1, 1, 0), + RK3036_PLL_RATE(146400, 1, 61, 1, 1, 1, 0), + RK3036_PLL_RATE(144000, 1, 60, 1, 1, 1, 0), + RK3036_PLL_RATE(141600, 1, 59, 1, 1, 1, 0), + RK3036_PLL_RATE(139200, 1, 58, 1, 1, 1, 0), + RK3036_PLL_RATE(136800, 1, 57, 1, 1, 1, 0), + RK3036_PLL_RATE(134400, 1, 56, 1, 1, 1, 0), + RK3036_PLL_RATE(132000, 1, 55, 1, 1, 1, 0), + RK3036_PLL_RATE(129600, 1, 54, 1, 1, 1, 0), + RK3036_PLL_RATE(127200, 1, 53, 1, 1, 1, 0), + RK3036_PLL_RATE(124800, 1, 52, 1, 1, 1, 0), + RK3036_PLL_RATE(12, 1, 50, 1, 1, 1, 0), + RK3036_PLL_RATE(118800, 2, 99, 1, 1, 1, 0), + RK3036_PLL_RATE(110400, 1, 46, 1, 1, 1, 0), + RK3036_PLL_RATE(11, 12, 550, 1, 1, 1, 0), + RK3036_PLL_RATE(100800, 1, 84, 2, 1, 1, 0), + RK3036_PLL_RATE(10, 6, 500, 2, 1, 1, 0), + RK3036_PLL_RATE(98400, 1, 82, 2, 1, 1, 0), + RK3036_PLL_RATE(96000, 1, 80, 2, 1, 1, 0), + RK3036_PLL_RATE(93600, 1, 78, 2, 1, 1, 0), + RK3036_PLL_RATE(91200, 1, 76, 2, 1, 1, 0), + RK3036_PLL_RATE(9, 4, 300, 2, 1, 1, 0), + RK3036_PLL_RATE(88800, 1, 74, 2, 1, 1, 0), + RK3036_PLL_RATE(86400, 1, 72, 2, 1, 1, 0), + RK3036_PLL_RATE(84000, 1, 70, 2, 1, 1, 0), + RK3036_PLL_RATE(81600, 1, 68, 2, 1, 1, 0), + RK3036_PLL_RATE(8, 6, 400, 2, 1, 1, 0), + RK3036_PLL_RATE(7, 6, 350, 2, 1, 1, 0), + RK3036_PLL_RATE(69600, 1, 58, 2, 1, 1, 0), + RK3036_PLL_RATE(62400, 1, 52, 2, 1, 1, 0), + RK3036_PLL_RATE(6, 1, 75, 3, 1, 1, 0), + RK3036_PLL_RATE(59400, 2, 99, 2, 1, 1, 0), + RK3036_PLL_RATE(50400, 1, 63, 3, 1, 1, 0), + RK3036_PLL_RATE(5, 6, 250, 2, 1, 1, 0), + RK3036_PLL_RATE(40800, 1, 68, 2, 2, 1, 0), + RK3036_PLL_RATE(31200, 1, 52, 2, 2, 1, 0), + RK3036_PLL_RATE(21600, 1, 72, 4, 2, 1, 0), + RK3036_PLL_RATE(9600, 1, 64, 4, 4, 1, 0), + { /* sentinel */ }, +}; + +#define RK3308_DIV_ACLKM_MASK 0x7 +#define RK3308_DIV_ACLKM_SHIFT 12 +#define RK3308_DIV_PCLK_DBG_MASK 0xf +#define RK3308_DIV_PCLK_DBG_SHIFT 8 + +#define RK3308_CLKSEL0(_aclk_core, _pclk_dbg) \ +{ \ + .reg = RK3308_CLKSEL_CON(0),\ + .val = HIWORD_UPDATE(_aclk_core, RK3308_DIV_ACLKM_MASK, \ +RK3308_DIV_ACLKM_SHIFT) | \ + HIWORD_UPDATE(_pclk_dbg, RK3308_DIV_PCLK_DBG_MASK, \ +RK3308_DIV_PCLK_DBG_SHIFT),\ +} + +#define RK3308_CPUCLK_RATE(_prate, _aclk_core, _pclk_dbg) \ +{ \ + .prate = _prate,\ + .divs = { \ + RK3308_CLKSEL0(_aclk_core, _pclk_dbg), \ + }, \ +}
[PATCH v1 1/3] dt-bindings: Add bindings for rk3308 clock controller
Add devicetree bindings for Rockchip cru which found on Rockchip SoCs. Signed-off-by: Finley Xiao --- .../devicetree/bindings/clock/rockchip,rk3308.txt | 58 ++ 1 file changed, 58 insertions(+) create mode 100644 Documentation/devicetree/bindings/clock/rockchip,rk3308.txt diff --git a/Documentation/devicetree/bindings/clock/rockchip,rk3308.txt b/Documentation/devicetree/bindings/clock/rockchip,rk3308.txt new file mode 100644 index ..5d46c9b7f937 --- /dev/null +++ b/Documentation/devicetree/bindings/clock/rockchip,rk3308.txt @@ -0,0 +1,58 @@ +* Rockchip RK3308 Clock and Reset Unit + +The RK3308 clock controller generates and supplies clock to various +controllers within the SoC and also implements a reset controller for SoC +peripherals. + +Required Properties: + +- compatible: CRU should be "rockchip,rk3308-cru" +- reg: physical base address of the controller and length of memory mapped + region. +- #clock-cells: should be 1. +- #reset-cells: should be 1. + +Optional Properties: + +- rockchip,grf: phandle to the syscon managing the "general register files" + If missing, pll rates are not changeable, due to the missing pll lock status. + +Each clock is assigned an identifier and client nodes can use this identifier +to specify the clock which they consume. All available clocks are defined as +preprocessor macros in the dt-bindings/clock/rk3308-cru.h headers and can be +used in device tree sources. Similar macros exist for the reset sources in +these files. + +External clocks: + +There are several clocks that are generated outside the SoC. It is expected +that they are defined using standard clock bindings with following +clock-output-names: + - "xin24m" - crystal input - required, + - "xin32k" - rtc clock - optional, + - "mclk_i2sx_xch_in" - external I2S or SPDIF clock - optional, + - "mac_clkin" - external MAC clock - optional + +Example: Clock controller node: + + cru: clock-controller@ff50 { + compatible = "rockchip,rk3308-cru"; + reg = <0x0 0xff50 0x0 0x1000>; + rockchip,grf = <>; + #clock-cells = <1>; + #reset-cells = <1>; + }; + +Example: UART controller node that consumes the clock generated by the clock + controller: + + uart0: serial@ff0a { + compatible = "rockchip,rk3308-uart", "snps,dw-apb-uart"; + reg = <0x0 0xff0a 0x0 0x100>; + interrupts = ; + clocks = < SCLK_UART0>, < PCLK_UART0>; + clock-names = "baudclk", "apb_pclk"; + reg-shift = <2>; + reg-io-width = <4>; + status = "disabled"; + }; -- 2.11.0
[PATCH v1 0/3] clk: rockchip: support clock controller for rk3308 SoC
Finley Xiao (3): dt-bindings: Add bindings for rk3308 clock controller clk: rockchip: Add dt-binding header for rk3308 clk: rockchip: Add clock controller for the RK3308 .../devicetree/bindings/clock/rockchip,rk3308.txt | 58 ++ drivers/clk/rockchip/Makefile | 1 + drivers/clk/rockchip/clk-rk3308.c | 955 + drivers/clk/rockchip/clk.h | 13 + include/dt-bindings/clock/rk3308-cru.h | 387 + 5 files changed, 1414 insertions(+) create mode 100644 Documentation/devicetree/bindings/clock/rockchip,rk3308.txt create mode 100644 drivers/clk/rockchip/clk-rk3308.c create mode 100644 include/dt-bindings/clock/rk3308-cru.h -- 2.11.0
[PATCH v1 2/3] clk: rockchip: Add dt-binding header for rk3308
Add the dt-bindings header for the rk3308, that gets shared between the clock controller and the clock references in the dts. Signed-off-by: Finley Xiao --- include/dt-bindings/clock/rk3308-cru.h | 387 + 1 file changed, 387 insertions(+) create mode 100644 include/dt-bindings/clock/rk3308-cru.h diff --git a/include/dt-bindings/clock/rk3308-cru.h b/include/dt-bindings/clock/rk3308-cru.h new file mode 100644 index ..d97840f9ee2e --- /dev/null +++ b/include/dt-bindings/clock/rk3308-cru.h @@ -0,0 +1,387 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2019 Rockchip Electronics Co. Ltd. + * Author: Finley Xiao + */ + +#ifndef _DT_BINDINGS_CLK_ROCKCHIP_RK3308_H +#define _DT_BINDINGS_CLK_ROCKCHIP_RK3308_H + +/* core clocks */ +#define PLL_APLL 1 +#define PLL_DPLL 2 +#define PLL_VPLL0 3 +#define PLL_VPLL1 4 +#define ARMCLK 5 + +/* sclk (special clocks) */ +#define USB480M14 +#define SCLK_RTC32K15 +#define SCLK_PVTM_CORE 16 +#define SCLK_UART0 17 +#define SCLK_UART1 18 +#define SCLK_UART2 19 +#define SCLK_UART3 20 +#define SCLK_UART4 21 +#define SCLK_I2C0 22 +#define SCLK_I2C1 23 +#define SCLK_I2C2 24 +#define SCLK_I2C3 25 +#define SCLK_PWM0 26 +#define SCLK_SPI0 27 +#define SCLK_SPI1 28 +#define SCLK_SPI2 29 +#define SCLK_TIMER030 +#define SCLK_TIMER131 +#define SCLK_TIMER232 +#define SCLK_TIMER333 +#define SCLK_TIMER434 +#define SCLK_TIMER535 +#define SCLK_TSADC 36 +#define SCLK_SARADC37 +#define SCLK_OTP 38 +#define SCLK_OTP_USR 39 +#define SCLK_CPU_BOOST 40 +#define SCLK_CRYPTO41 +#define SCLK_CRYPTO_APK42 +#define SCLK_NANDC_DIV 43 +#define SCLK_NANDC_DIV50 44 +#define SCLK_NANDC 45 +#define SCLK_SDMMC_DIV 46 +#define SCLK_SDMMC_DIV50 47 +#define SCLK_SDMMC 48 +#define SCLK_SDMMC_DRV 49 +#define SCLK_SDMMC_SAMPLE 50 +#define SCLK_SDIO_DIV 51 +#define SCLK_SDIO_DIV5052 +#define SCLK_SDIO 53 +#define SCLK_SDIO_DRV 54 +#define SCLK_SDIO_SAMPLE 55 +#define SCLK_EMMC_DIV 56 +#define SCLK_EMMC_DIV5057 +#define SCLK_EMMC 58 +#define SCLK_EMMC_DRV 59 +#define SCLK_EMMC_SAMPLE 60 +#define SCLK_SFC 61 +#define SCLK_OTG_ADP 62 +#define SCLK_MAC_SRC 63 +#define SCLK_MAC 64 +#define SCLK_MAC_REF 65 +#define SCLK_MAC_RX_TX 66 +#define SCLK_MAC_RMII 67 +#define SCLK_DDR_MON_TIMER 68 +#define SCLK_DDR_MON 69 +#define SCLK_DDRCLK70 +#define SCLK_PMU 71 +#define SCLK_USBPHY_REF72 +#define SCLK_WIFI 73 +#define SCLK_PVTM_PMU 74 +#define SCLK_PDM 75 +#define SCLK_I2S0_8CH_TX 76 +#define SCLK_I2S0_8CH_TX_OUT 77 +#define SCLK_I2S0_8CH_RX 78 +#define SCLK_I2S0_8CH_RX_OUT 79 +#define SCLK_I2S1_8CH_TX 80 +#define SCLK_I2S1_8CH_TX_OUT 81 +#define SCLK_I2S1_8CH_RX 82 +#define SCLK_I2S1_8CH_RX_OUT 83 +#define SCLK_I2S2_8CH_TX 84 +#define SCLK_I2S2_8CH_TX_OUT 85 +#define SCLK_I2S2_8CH_RX 86 +#define SCLK_I2S2_8CH_RX_OUT 87 +#define SCLK_I2S3_8CH_TX 88 +#define SCLK_I2S3_8CH_TX_OUT 89 +#define SCLK_I2S3_8CH_RX 90 +#define SCLK_I2S3_8CH_RX_OUT 91 +#define SCLK_I2S0_2CH 92 +#define SCLK_I2S0_2CH_OUT 93 +#define SCLK_I2S1_2CH 94 +#define SCLK_I2S1_2CH_OUT 95 +#define SCLK_SPDIF_TX_DIV 96 +#define SCLK_SPDIF_TX_DIV5097 +#define SCLK_SPDIF_TX 98 +#define SCLK_SPDIF_RX_DIV 99 +#define SCLK_SPDIF_RX_DIV50100 +#define SCLK_SPDIF_RX 101 +#define SCLK_I2S0_8CH_TX_MUX 102 +#define SCLK_I2S0_8CH_RX_MUX 103 +#define SCLK_I2S1_8CH_TX_MUX 104 +#define SCLK_I2S1_8CH_RX_MUX 105 +#define SCLK_I2S2_8CH_TX_MUX 106 +#define SCLK_I2S2_8CH_RX_MUX 107 +#define SCLK_I2S3_8CH_TX_MUX 108 +#define SCLK_I2S3_8CH_RX_MUX 109 +#define SCLK_I2S0_8CH_TX_SRC 110 +#define SCLK_I2S0_8CH_RX_SRC 111 +#define SCLK_I2S1_8CH_TX_SRC 112 +#define SCLK_I2S1_8CH_RX_SRC 113 +#define SCLK_I2S2_8CH_TX_SRC 114 +#define SCLK_I2S2_8CH_RX_SRC 115 +#define SCLK_I2S3_8CH_TX_SRC 116 +#define SCLK_I2S3_8CH_RX_SRC 117 +#define SCLK_I2S0_2CH_SRC 118 +#define SCLK_I2S1_2CH_SRC 119 +#define SCLK_PWM1 120 +#define SCLK_PWM2 121 +#define SCLK_OWIRE 122 + +/* dclk */ +#define DCLK_VOP 125 + +/* aclk */ +#define ACLK_BUS_SRC 130 +#define ACLK_BUS
[PATCH] kbuild: Remove unused variable TMPO of ld-option
ld-option implementation has been simplified so variable TMPO is no longer needed. Fixes: Commit 0294e6f4a000 ("kbuild: simplify ld-option implementation") Signed-off-by: Xiao Yang --- scripts/Kbuild.include | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include index 4b0432e095ae..13a6f627351d 100644 --- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -91,12 +91,11 @@ TMPOUT := $(if $(KBUILD_EXTMOD),$(firstword $(KBUILD_EXTMOD))/) # automatically cleaned up. try-run = $(shell set -e; \ TMP="$(TMPOUT)..tmp"; \ - TMPO="$(TMPOUT)..o";\ if ($(1)) >/dev/null 2>&1; \ then echo "$(2)"; \ else echo "$(3)"; \ fi; \ - rm -f "$$TMP" "$$TMPO") + rm -f "$$TMP") # as-option # Usage: cflags-y += $(call as-option,-Wa$(comma)-isa=foo,) -- 2.21.0
Re: [PATCH v1 1/3] net: hisilicon: make hip04_tx_reclaim non-reentrant
On 2019/8/6 8:46, Jakub Kicinski wrote: > On Sat, 3 Aug 2019 20:31:39 +0800, Jiangfeng Xiao wrote: >> If hip04_tx_reclaim is interrupted while it is running >> and then __napi_schedule continues to execute >> hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs >> and oops is generated. So you need to mask the interrupt >> during the hip04_tx_reclaim run. > > Napi poll method for the same napi instance can't be run concurrently. > Could you explain a little more what happens here? > Because netif_napi_add(ndev, >napi, hip04_rx_poll, NAPI_POLL_WEIGHT); So hip04_rx_poll is a napi instance. I did not say that hip04_rx_poll has reentered. I am talking about the reentrant of hip04_tx_reclaim. Pre-modification code: static int hip04_rx_poll(struct napi_struct *napi, int budget) { [...] /* enable rx interrupt */ writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN); napi_complete_done(napi, rx); done: /* clean up tx descriptors and start a new timer if necessary */ tx_remaining = hip04_tx_reclaim(ndev, false); [...] } hip04_tx_reclaim is executed after "enable rx interrupt" and napi_complete_done. If hip04_tx_reclaim is interrupted while it is running, and then __irq_svc->gic_handle_irq->hip04_mac_interrupt->__napi_schedule->hip04_rx_poll->hip04_tx_reclaim Also looking at hip04_tx_reclaim static int hip04_tx_reclaim(struct net_device *ndev, bool force) { [1] struct hip04_priv *priv = netdev_priv(ndev); [2] unsigned tx_tail = priv->tx_tail; [3] [...] [4] bytes_compl += priv->tx_skb[tx_tail]->len; [5] dev_kfree_skb(priv->tx_skb[tx_tail]); [6] priv->tx_skb[tx_tail] = NULL; [7] tx_tail = TX_NEXT(tx_tail); [8] [...] [9] priv->tx_tail = tx_tail; } An interrupt occurs if hip04_tx_reclaim just executes to the line 6, priv->tx_skb[tx_tail] is NULL, and then __irq_svc->gic_handle_irq->hip04_mac_interrupt->__napi_schedule->hip04_rx_poll->hip04_tx_reclaim Then hip04_tx_reclaim will handle kernel NULL pointer dereference on line 4. A reentrant occurs in hip04_tx_reclaim and oops is generated. My commit is to execute hip04_tx_reclaim before "enable rx interrupt" and napi_complete_done. Then hip04_tx_reclaim can also be protected by the napi poll method so that no reentry occurs. thanks.
[PATCH v1 0/3] net: hisilicon: Fix a few problems with hip04_eth
During the use of the hip04_eth driver, several problems were found, which solved the hip04_tx_reclaim reentry problem, fixed the problem that hip04_mac_start_xmit never returns NETDEV_TX_BUSY and the dma_map_single failed on the arm64 platform. Jiangfeng Xiao (3): net: hisilicon: make hip04_tx_reclaim non-reentrant net: hisilicon: fix hip04-xmit never return TX_BUSY net: hisilicon: Fix dma_map_single failed on arm64 drivers/net/ethernet/hisilicon/hip04_eth.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) -- 1.8.5.6
[PATCH v1 2/3] net: hisilicon: fix hip04-xmit never return TX_BUSY
TX_DESC_NUM is 256, in tx_count, the maximum value of mod(TX_DESC_NUM - 1) is 254, the variable "count" in the hip04_mac_start_xmit function is never equal to (TX_DESC_NUM - 1), so hip04_mac_start_xmit never return NETDEV_TX_BUSY. tx_count is modified to mod(TX_DESC_NUM) so that the maximum value of tx_count can reach (TX_DESC_NUM - 1), then hip04_mac_start_xmit can reurn NETDEV_TX_BUSY. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 1e1b154..d775b98 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -248,7 +248,7 @@ struct hip04_priv { static inline unsigned int tx_count(unsigned int head, unsigned int tail) { - return (head - tail) % (TX_DESC_NUM - 1); + return (head - tail) % TX_DESC_NUM; } static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex) -- 1.8.5.6
[PATCH v1 3/3] net: hisilicon: Fix dma_map_single failed on arm64
On the arm64 platform, executing "ifconfig eth0 up" will fail, returning "ifconfig: SIOCSIFFLAGS: Input/output error." ndev->dev is not initialized, dma_map_single->get_dma_ops-> dummy_dma_ops->__dummy_map_page will return DMA_ERROR_CODE directly, so when we use dma_map_single, the first parameter is to use the device of platform_device. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index d775b98..c841674 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -220,6 +220,7 @@ struct hip04_priv { unsigned int reg_inten; struct napi_struct napi; + struct device *dev; struct net_device *ndev; struct tx_desc *tx_desc; @@ -465,7 +466,7 @@ static int hip04_tx_reclaim(struct net_device *ndev, bool force) } if (priv->tx_phys[tx_tail]) { - dma_unmap_single(>dev, priv->tx_phys[tx_tail], + dma_unmap_single(priv->dev, priv->tx_phys[tx_tail], priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE); priv->tx_phys[tx_tail] = 0; @@ -516,8 +517,8 @@ static void hip04_start_tx_timer(struct hip04_priv *priv) return NETDEV_TX_BUSY; } - phys = dma_map_single(>dev, skb->data, skb->len, DMA_TO_DEVICE); - if (dma_mapping_error(>dev, phys)) { + phys = dma_map_single(priv->dev, skb->data, skb->len, DMA_TO_DEVICE); + if (dma_mapping_error(priv->dev, phys)) { dev_kfree_skb(skb); return NETDEV_TX_OK; } @@ -596,7 +597,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) goto refill; } - dma_unmap_single(>dev, priv->rx_phys[priv->rx_head], + dma_unmap_single(priv->dev, priv->rx_phys[priv->rx_head], RX_BUF_SIZE, DMA_FROM_DEVICE); priv->rx_phys[priv->rx_head] = 0; @@ -625,9 +626,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) buf = netdev_alloc_frag(priv->rx_buf_size); if (!buf) goto done; - phys = dma_map_single(>dev, buf, + phys = dma_map_single(priv->dev, buf, RX_BUF_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(>dev, phys)) + if (dma_mapping_error(priv->dev, phys)) goto done; priv->rx_buf[priv->rx_head] = buf; priv->rx_phys[priv->rx_head] = phys; @@ -730,9 +731,9 @@ static int hip04_mac_open(struct net_device *ndev) for (i = 0; i < RX_DESC_NUM; i++) { dma_addr_t phys; - phys = dma_map_single(>dev, priv->rx_buf[i], + phys = dma_map_single(priv->dev, priv->rx_buf[i], RX_BUF_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(>dev, phys)) + if (dma_mapping_error(priv->dev, phys)) return -EIO; priv->rx_phys[i] = phys; @@ -766,7 +767,7 @@ static int hip04_mac_stop(struct net_device *ndev) for (i = 0; i < RX_DESC_NUM; i++) { if (priv->rx_phys[i]) { - dma_unmap_single(>dev, priv->rx_phys[i], + dma_unmap_single(priv->dev, priv->rx_phys[i], RX_BUF_SIZE, DMA_FROM_DEVICE); priv->rx_phys[i] = 0; } @@ -909,6 +910,7 @@ static int hip04_mac_probe(struct platform_device *pdev) return -ENOMEM; priv = netdev_priv(ndev); + priv->dev = d; priv->ndev = ndev; platform_set_drvdata(pdev, ndev); SET_NETDEV_DEV(ndev, >dev); -- 1.8.5.6
[PATCH v1 1/3] net: hisilicon: make hip04_tx_reclaim non-reentrant
If hip04_tx_reclaim is interrupted while it is running and then __napi_schedule continues to execute hip04_rx_poll->hip04_tx_reclaim, reentrancy occurs and oops is generated. So you need to mask the interrupt during the hip04_tx_reclaim run. The kernel oops exception stack is as follows: Unable to handle kernel NULL pointer dereference at virtual address 0050 pgd = c0003000 [0050] *pgd=8000a04003, *pmd= Internal error: Oops: 206 [#1] SMP ARM Modules linked in: hip04_eth mtdblock mtd_blkdevs mtd ohci_platform ehci_platform ohci_hcd ehci_hcd vfat fat sd_mod usb_storage scsi_mod usbcore usb_common CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O4.4.185 #1 Hardware name: Hisilicon A15 task: c0a250e0 task.stack: c0a0 PC is at hip04_tx_reclaim+0xe0/0x17c [hip04_eth] LR is at hip04_tx_reclaim+0x30/0x17c [hip04_eth] pc : []lr : []psr: 600e0313 sp : c0a01d88 ip : fp : c0601f9c r10: r9 : c3482380 r8 : 0001 r7 : r6 : 00e1 r5 : c3482000 r4 : 000c r3 : f2209800 r2 : r1 : r0 : Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 32c5387d Table: 03d28c80 DAC: Process swapper/0 (pid: 0, stack limit = 0xc0a00190) Stack: (0xc0a01d88 to 0xc0a02000) [] (hip04_tx_reclaim [hip04_eth]) from [] (hip04_rx_poll+0x88/0x368 [hip04_eth]) [] (hip04_rx_poll [hip04_eth]) from [] (net_rx_action+0x114/0x34c) [] (net_rx_action) from [] (__do_softirq+0x218/0x318) [] (__do_softirq) from [] (irq_exit+0x88/0xac) [] (irq_exit) from [] (msa_irq_exit+0x11c/0x1d4) [] (msa_irq_exit) from [] (__handle_domain_irq+0x110/0x148) [] (__handle_domain_irq) from [] (gic_handle_irq+0xd4/0x118) [] (gic_handle_irq) from [] (__irq_svc+0x40/0x58) Exception stack(0xc0a01f30 to 0xc0a01f78) 1f20: c0ae8b40 1f40: 0002 e000 c0601f9c c0a2257c c0a22440 c0831a38 1f60: c0a01ec4 c0a01f80 c0203714 c0203718 600e0213 [] (__irq_svc) from [] (arch_cpu_idle+0x20/0x3c) [] (arch_cpu_idle) from [] (cpu_startup_entry+0x244/0x29c) [] (cpu_startup_entry) from [] (rest_init+0xc8/0x10c) [] (rest_init) from [] (start_kernel+0x468/0x514) Code: a40599e5 016086e2 018088e2 7660efe6 (503090e5) ---[ end trace 1db21d6d09c49d74 ]--- Kernel panic - not syncing: Fatal exception in interrupt CPU3: stopping CPU: 3 PID: 0 Comm: swapper/3 Tainted: G DO4.4.185 #1 Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index d604528..1e1b154 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -585,6 +585,9 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) u16 len; u32 err; + /* clean up tx descriptors */ + tx_remaining = hip04_tx_reclaim(ndev, false); + while (cnt && !last) { buf = priv->rx_buf[priv->rx_head]; skb = build_skb(buf, priv->rx_buf_size); @@ -645,8 +648,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) } napi_complete_done(napi, rx); done: - /* clean up tx descriptors and start a new timer if necessary */ - tx_remaining = hip04_tx_reclaim(ndev, false); + /* start a new timer if necessary */ if (rx < budget && tx_remaining) hip04_start_tx_timer(priv); -- 1.8.5.6
[PATCH] net: hisilicon: Use devm_platform_ioremap_resource
Use devm_platform_ioremap_resource instead of devm_ioremap_resource. Make the code simpler. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c| 7 ++- drivers/net/ethernet/hisilicon/hisi_femac.c | 7 ++- drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 7 ++- drivers/net/ethernet/hisilicon/hns_mdio.c | 4 +--- 4 files changed, 7 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 6256357..d604528 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -899,7 +899,6 @@ static int hip04_mac_probe(struct platform_device *pdev) struct of_phandle_args arg; struct net_device *ndev; struct hip04_priv *priv; - struct resource *res; int irq; int ret; @@ -912,16 +911,14 @@ static int hip04_mac_probe(struct platform_device *pdev) platform_set_drvdata(pdev, ndev); SET_NETDEV_DEV(ndev, >dev); - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - priv->base = devm_ioremap_resource(d, res); + priv->base = devm_platform_ioremap_resource(pdev, 0); if (IS_ERR(priv->base)) { ret = PTR_ERR(priv->base); goto init_fail; } #if defined(CONFIG_HI13X1_GMAC) - res = platform_get_resource(pdev, IORESOURCE_MEM, 1); - priv->sysctrl_base = devm_ioremap_resource(d, res); + priv->sysctrl_base = devm_platform_ioremap_resource(pdev, 1); if (IS_ERR(priv->sysctrl_base)) { ret = PTR_ERR(priv->sysctrl_base); goto init_fail; diff --git a/drivers/net/ethernet/hisilicon/hisi_femac.c b/drivers/net/ethernet/hisilicon/hisi_femac.c index d2e019d..689f18e 100644 --- a/drivers/net/ethernet/hisilicon/hisi_femac.c +++ b/drivers/net/ethernet/hisilicon/hisi_femac.c @@ -781,7 +781,6 @@ static int hisi_femac_drv_probe(struct platform_device *pdev) { struct device *dev = >dev; struct device_node *node = dev->of_node; - struct resource *res; struct net_device *ndev; struct hisi_femac_priv *priv; struct phy_device *phy; @@ -799,15 +798,13 @@ static int hisi_femac_drv_probe(struct platform_device *pdev) priv->dev = dev; priv->ndev = ndev; - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - priv->port_base = devm_ioremap_resource(dev, res); + priv->port_base = devm_platform_ioremap_resource(pdev, 0); if (IS_ERR(priv->port_base)) { ret = PTR_ERR(priv->port_base); goto out_free_netdev; } - res = platform_get_resource(pdev, IORESOURCE_MEM, 1); - priv->glb_base = devm_ioremap_resource(dev, res); + priv->glb_base = devm_platform_ioremap_resource(pdev, 1); if (IS_ERR(priv->glb_base)) { ret = PTR_ERR(priv->glb_base); goto out_free_netdev; diff --git a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c index 89ef764..3499705 100644 --- a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c +++ b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c @@ -1097,7 +1097,6 @@ static int hix5hd2_dev_probe(struct platform_device *pdev) const struct of_device_id *of_id = NULL; struct net_device *ndev; struct hix5hd2_priv *priv; - struct resource *res; struct mii_bus *bus; const char *mac_addr; int ret; @@ -1119,15 +1118,13 @@ static int hix5hd2_dev_probe(struct platform_device *pdev) } priv->hw_cap = (unsigned long)of_id->data; - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - priv->base = devm_ioremap_resource(dev, res); + priv->base = devm_platform_ioremap_resource(pdev, 0); if (IS_ERR(priv->base)) { ret = PTR_ERR(priv->base); goto out_free_netdev; } - res = platform_get_resource(pdev, IORESOURCE_MEM, 1); - priv->ctrl_base = devm_ioremap_resource(dev, res); + priv->ctrl_base = devm_platform_ioremap_resource(pdev, 1); if (IS_ERR(priv->ctrl_base)) { ret = PTR_ERR(priv->ctrl_base); goto out_free_netdev; diff --git a/drivers/net/ethernet/hisilicon/hns_mdio.c b/drivers/net/ethernet/hisilicon/hns_mdio.c index 918cab1..3e863a7 100644 --- a/drivers/net/ethernet/hisilicon/hns_mdio.c +++ b/drivers/net/ethernet/hisilicon/hns_mdio.c @@ -417,7 +417,6 @@ static int hns_mdio_probe(struct platform_device *pdev) { struct hns_mdio_device *mdio_dev; struct mii_bus *new_bus; - struct resource *res; int ret = -ENODEV; if (!pdev) { @@ -442,8 +441,7 @@ static int hns_mdio_probe(struct platform_device *pdev) new_bus->priv = mdio_dev; new_bus-
Re: [PATCH v2 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first
On 2019/7/9 21:48, Jiangfeng Xiao wrote: > > > On 2019/7/9 17:35, Sergei Shtylyov wrote: >> Hello! >> >> On 09.07.2019 6:31, Jiangfeng Xiao wrote: >> [...] >>> @@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device >>> *pdev) >>> goto init_fail; >>> } >>> +#if defined(CONFIG_HI13X1_GMAC) >>> +res = platform_get_resource(pdev, IORESOURCE_MEM, 1); >>> +priv->sysctrl_base = devm_ioremap_resource(d, res); >> >>There's devm_platform_ioremap_resource() now. > > Thank you for your review, Great issue, which makes my code more concise. > > I will fix it in v3. Or submit a patch to modify it separately, if maintainer > applies this patch series. > I decided to wait for this series of patches to sync to the mainline and then fix this based on the mainline. Because the mainline does not currently have this part of the code, if I submit the changes, and the patch is accidentally merged into another branch or another maintainer to handle, a conflict will occur. As we all know, maintianer has to deal with many commits every day, I don't want to increase the burden of maintainer. So I decided to wait until the patch is synced to the mainline and then modify it, which is more safe.
Re: [PATCH v2 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first
On 2019/7/9 17:35, Sergei Shtylyov wrote: > Hello! > > On 09.07.2019 6:31, Jiangfeng Xiao wrote: > >> HI13X1_GMAC delete request for soft reset at first, >> otherwise, the subsequent initialization will not >> take effect. >> >> Signed-off-by: Jiangfeng Xiao >> --- >> drivers/net/ethernet/hisilicon/hip04_eth.c | 24 >> 1 file changed, 24 insertions(+) >> >> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c >> b/drivers/net/ethernet/hisilicon/hip04_eth.c >> index fe61b01..19d8cfd 100644 >> --- a/drivers/net/ethernet/hisilicon/hip04_eth.c >> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c > [...] >> @@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device *pdev) >> goto init_fail; >> } >> +#if defined(CONFIG_HI13X1_GMAC) >> +res = platform_get_resource(pdev, IORESOURCE_MEM, 1); >> +priv->sysctrl_base = devm_ioremap_resource(d, res); > >There's devm_platform_ioremap_resource() now. Thank you for your review, Great issue, which makes my code more concise. I will fix it in v3. Or submit a patch to modify it separately, if maintainer applies this patch series.
[PATCH v2 00/10] net: hisilicon: Add support for HI13X1 to hip04_eth
The main purpose of this patch series is to extend the hip04_eth driver to support HI13X1_GMAC. The offset and bitmap of some registers of HI13X1_GMAC are different from hip04_eth common soc. In addition, the definition of send descriptor and parsing descriptor are different from hip04_eth common soc. So the macro of the register offset is redefined to adapt the HI13X1_GMAC. Clean up the sparse warning by the way. Change since v1: * Add a cover letter. Jiangfeng Xiao (10): net: hisilicon: Add support for HI13X1 to hip04_eth net: hisilicon: Cleanup for got restricted __be32 net: hisilicon: Cleanup for cast to restricted __be32 net: hisilicon: HI13X1_GMAX skip write LOCAL_PAGE_REG net: hisilicon: HI13X1_GMAX need dreq reset at first net: hisilicon: dt-bindings: Add an field of port-handle net: hisilicon: Add group field to adapt HI13X1_GMAC net: hisilicon: Offset buf address to adapt HI13X1_GMAC net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC .../bindings/net/hisilicon-hip04-net.txt | 7 +- drivers/net/ethernet/hisilicon/Kconfig | 10 ++ drivers/net/ethernet/hisilicon/hip04_eth.c | 142 ++--- 3 files changed, 136 insertions(+), 23 deletions(-) -- 1.8.5.6
[PATCH v2 10/10] net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
HI13X1 changed the offsets and bitmaps for tx_desc registers in the same peripheral device on different models of the hip04_eth. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 780fc46..6256357 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -76,8 +76,15 @@ /* TX descriptor config */ #define TX_FREE_MEMBIT(0) #define TX_READ_ALLOC_L3 BIT(1) -#define TX_FINISH_CACHE_INVBIT(2) +#if defined(CONFIG_HI13X1_GMAC) +#define TX_CLEAR_WBBIT(7) +#define TX_RELEASE_TO_PPE BIT(4) +#define TX_FINISH_CACHE_INVBIT(6) +#define TX_POOL_SHIFT 16 +#else #define TX_CLEAR_WBBIT(4) +#define TX_FINISH_CACHE_INVBIT(2) +#endif #define TX_L3_CHECKSUM BIT(5) #define TX_LOOP_BACK BIT(11) @@ -124,6 +131,7 @@ /* buf unit size is cache_line_size, which is 64, so the shift is 6 */ #define PPE_BUF_SIZE_SHIFT 6 #define PPE_TX_BUF_HOLDBIT(31) +#define CACHE_LINE_MASK0x3F #else #define PPE_CFG_QOS_VMID_GRP_SHIFT 8 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 @@ -163,11 +171,22 @@ #define HIP04_MIN_TX_COALESCE_FRAMES 100 struct tx_desc { +#if defined(CONFIG_HI13X1_GMAC) + u32 reserved1[2]; + u32 send_addr; + u16 send_size; + u16 data_offset; + u32 reserved2[7]; + u32 cfg; + u32 wb_addr; + u32 reserved3[3]; +#else u32 send_addr; u32 send_size; u32 next_addr; u32 cfg; u32 wb_addr; +#endif } __aligned(64); struct rx_desc { @@ -505,11 +524,20 @@ static void hip04_start_tx_timer(struct hip04_priv *priv) priv->tx_skb[tx_head] = skb; priv->tx_phys[tx_head] = phys; - desc->send_addr = (__force u32)cpu_to_be32(phys); + desc->send_size = (__force u32)cpu_to_be32(skb->len); +#if defined(CONFIG_HI13X1_GMAC) + desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV + | TX_RELEASE_TO_PPE | priv->port << TX_POOL_SHIFT); + desc->data_offset = (__force u32)cpu_to_be32(phys & CACHE_LINE_MASK); + desc->send_addr = (__force u32)cpu_to_be32(phys & ~CACHE_LINE_MASK); +#else desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV); + desc->send_addr = (__force u32)cpu_to_be32(phys); +#endif phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc); - desc->wb_addr = (__force u32)cpu_to_be32(phys); + desc->wb_addr = (__force u32)cpu_to_be32(phys + + offsetof(struct tx_desc, send_addr)); skb_tx_timestamp(skb); hip04_set_xmit_desc(priv, phys); -- 1.8.5.6
[PATCH v2 04/10] net: hisilicon: HI13X1_GMAX skip write LOCAL_PAGE_REG
HI13X1_GMAC changed the offsets and bitmaps for GE_TX_LOCAL_PAGE_REG registers in the same peripheral device on different models of the hip04_eth. With the default configuration, HI13X1_GMAC can also work without any writes to the GE_TX_LOCAL_PAGE_REG register. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index d8f0619..fe61b01 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -308,8 +308,10 @@ static void hip04_config_fifo(struct hip04_priv *priv) val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN; writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG); +#ifndef CONFIG_HI13X1_GMAC val = GE_AUTO_NEG_CTL; writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG); +#endif } static void hip04_mac_enable(struct net_device *ndev) -- 1.8.5.6
[PATCH v2 07/10] net: hisilicon: Add group field to adapt HI13X1_GMAC
In general, group is the same as the port, but some boards specify a special group for better load balancing of each processing unit. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 19d8cfd..5328219 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -178,6 +178,7 @@ struct hip04_priv { int phy_mode; int chan; unsigned int port; + unsigned int group; unsigned int speed; unsigned int duplex; unsigned int reg_inten; @@ -278,10 +279,10 @@ static void hip04_config_fifo(struct hip04_priv *priv) val |= PPE_CFG_STS_RX_PKT_CNT_RC; writel_relaxed(val, priv->base + PPE_CFG_STS_MODE); - val = BIT(priv->port); + val = BIT(priv->group); regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val); - val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT; + val = priv->group << PPE_CFG_QOS_VMID_GRP_SHIFT; val |= PPE_CFG_QOS_VMID_MODE; writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN); @@ -876,7 +877,7 @@ static int hip04_mac_probe(struct platform_device *pdev) } #endif - ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, ); + ret = of_parse_phandle_with_fixed_args(node, "port-handle", 3, 0, ); if (ret < 0) { dev_warn(d, "no port-handle\n"); goto init_fail; @@ -884,6 +885,7 @@ static int hip04_mac_probe(struct platform_device *pdev) priv->port = arg.args[0]; priv->chan = arg.args[1] * RX_DESC_NUM; + priv->group = arg.args[2]; hrtimer_init(>tx_coalesce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); -- 1.8.5.6
[PATCH v2 09/10] net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
HI13X1 changed the offsets and bitmaps for rx_desc registers in the same peripheral device on different models of the hip04_eth. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index c578934..780fc46 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -171,11 +171,20 @@ struct tx_desc { } __aligned(64); struct rx_desc { +#if defined(CONFIG_HI13X1_GMAC) + u32 reserved1[3]; + u16 pkt_len; + u16 reserved_16; + u32 reserved2[6]; + u32 pkt_err; + u32 reserved3[5]; +#else u16 reserved_16; u16 pkt_len; u32 reserve1[3]; u32 pkt_err; u32 reserve2[4]; +#endif }; struct hip04_priv { -- 1.8.5.6
[PATCH v2 01/10] net: hisilicon: Add support for HI13X1 to hip04_eth
Extend the hip04_eth driver to support HI13X1_GMAC. Enable it with CONFIG_HI13X1_GMAC option. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/Kconfig | 10 drivers/net/ethernet/hisilicon/hip04_eth.c | 37 -- 2 files changed, 40 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig index a0d780c..3892a20 100644 --- a/drivers/net/ethernet/hisilicon/Kconfig +++ b/drivers/net/ethernet/hisilicon/Kconfig @@ -46,6 +46,16 @@ config HIP04_ETH If you wish to compile a kernel for a hardware with hisilicon p04 SoC and want to use the internal ethernet then you should answer Y to this. +config HI13X1_GMAC + bool "Hisilicon HI13X1 Network Device Support" + depends on HIP04_ETH + help + If you wish to compile a kernel for a hardware with hisilicon hi13x1_gamc + then you should answer Y to this. This makes this driver suitable for use + on certain boards such as the HI13X1. + + If you are unsure, say N. + config HNS_MDIO tristate select PHYLIB diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index e1f2978..2b5112b 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -33,10 +33,23 @@ #define GE_MODE_CHANGE_REG 0x1b4 #define GE_RECV_CONTROL_REG0x1e0 #define GE_STATION_MAC_ADDRESS 0x210 -#define PPE_CFG_CPU_ADD_ADDR 0x580 -#define PPE_CFG_MAX_FRAME_LEN_REG 0x408 + #define PPE_CFG_BUS_CTRL_REG 0x424 #define PPE_CFG_RX_CTRL_REG0x428 + +#if defined(CONFIG_HI13X1_GMAC) +#define PPE_CFG_CPU_ADD_ADDR 0x6D0 +#define PPE_CFG_MAX_FRAME_LEN_REG 0x500 +#define PPE_CFG_RX_PKT_MODE_REG0x504 +#define PPE_CFG_QOS_VMID_GEN 0x520 +#define PPE_CFG_RX_PKT_INT 0x740 +#define PPE_INTEN 0x700 +#define PPE_INTSTS 0x708 +#define PPE_RINT 0x704 +#define PPE_CFG_STS_MODE 0x880 +#else +#define PPE_CFG_CPU_ADD_ADDR 0x580 +#define PPE_CFG_MAX_FRAME_LEN_REG 0x408 #define PPE_CFG_RX_PKT_MODE_REG0x438 #define PPE_CFG_QOS_VMID_GEN 0x500 #define PPE_CFG_RX_PKT_INT 0x538 @@ -44,6 +57,8 @@ #define PPE_INTSTS 0x608 #define PPE_RINT 0x604 #define PPE_CFG_STS_MODE 0x700 +#endif /* CONFIG_HI13X1_GMAC */ + #define PPE_HIS_RX_PKT_CNT 0x804 /* REG_INTERRUPT */ @@ -93,18 +108,26 @@ #define GE_RX_PORT_EN BIT(1) #define GE_TX_PORT_EN BIT(2) -#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12) - #define PPE_CFG_RX_PKT_ALIGN BIT(18) -#define PPE_CFG_QOS_VMID_MODE BIT(14) + +#if defined(CONFIG_HI13X1_GMAC) +#define PPE_CFG_QOS_VMID_GRP_SHIFT 4 +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT7 +#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(0) +#define PPE_CFG_QOS_VMID_MODE BIT(15) +#define PPE_CFG_BUS_LOCAL_REL (BIT(9) | BIT(15) | BIT(19) | BIT(23)) +#else #define PPE_CFG_QOS_VMID_GRP_SHIFT 8 +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 +#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12) +#define PPE_CFG_QOS_VMID_MODE BIT(14) +#define PPE_CFG_BUS_LOCAL_REL BIT(14) +#endif /* CONFIG_HI13X1_GMAC */ #define PPE_CFG_RX_FIFO_FSFU BIT(11) #define PPE_CFG_RX_DEPTH_SHIFT 16 #define PPE_CFG_RX_START_SHIFT 0 -#define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 -#define PPE_CFG_BUS_LOCAL_REL BIT(14) #define PPE_CFG_BUS_BIG_ENDIEN BIT(0) #define RX_DESC_NUM128 -- 1.8.5.6
[PATCH v2 08/10] net: hisilicon: Offset buf address to adapt HI13X1_GMAC
The buf unit size of HI13X1_GMAC is cache_line_size, which is 64, so the address we write to the buf register needs to be shifted right by 6 bits. The 31st bit of the PPE_CFG_CPU_ADD_ADDR register of HI13X1_GMAC indicates whether to release the buffer of the message, and the low indicates that it is valid. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 5328219..c578934 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -120,12 +120,20 @@ #define PPE_CFG_STS_RX_PKT_CNT_RC BIT(0) #define PPE_CFG_QOS_VMID_MODE BIT(15) #define PPE_CFG_BUS_LOCAL_REL (BIT(9) | BIT(15) | BIT(19) | BIT(23)) + +/* buf unit size is cache_line_size, which is 64, so the shift is 6 */ +#define PPE_BUF_SIZE_SHIFT 6 +#define PPE_TX_BUF_HOLDBIT(31) #else #define PPE_CFG_QOS_VMID_GRP_SHIFT 8 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 #define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12) #define PPE_CFG_QOS_VMID_MODE BIT(14) #define PPE_CFG_BUS_LOCAL_REL BIT(14) + +/* buf unit size is 1, so the shift is 6 */ +#define PPE_BUF_SIZE_SHIFT 0 +#define PPE_TX_BUF_HOLD0 #endif /* CONFIG_HI13X1_GMAC */ #define PPE_CFG_RX_FIFO_FSFU BIT(11) @@ -286,7 +294,7 @@ static void hip04_config_fifo(struct hip04_priv *priv) val |= PPE_CFG_QOS_VMID_MODE; writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN); - val = RX_BUF_SIZE; + val = RX_BUF_SIZE >> PPE_BUF_SIZE_SHIFT; regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val); val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT; @@ -369,12 +377,18 @@ static void hip04_mac_disable(struct net_device *ndev) static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys) { - writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR); + u32 val; + + val = phys >> PPE_BUF_SIZE_SHIFT | PPE_TX_BUF_HOLD; + writel(val, priv->base + PPE_CFG_CPU_ADD_ADDR); } static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys) { - regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys); + u32 val; + + val = phys >> PPE_BUF_SIZE_SHIFT; + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, val); } static u32 hip04_recv_cnt(struct hip04_priv *priv) -- 1.8.5.6
[PATCH v2 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first
HI13X1_GMAC delete request for soft reset at first, otherwise, the subsequent initialization will not take effect. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 24 1 file changed, 24 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index fe61b01..19d8cfd 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -16,6 +16,8 @@ #include #include +#define SC_PPE_RESET_DREQ 0x026C + #define PPE_CFG_RX_ADDR0x100 #define PPE_CFG_POOL_GRP 0x300 #define PPE_CFG_RX_BUF_SIZE0x400 @@ -61,6 +63,8 @@ #define PPE_HIS_RX_PKT_CNT 0x804 +#define RESET_DREQ_ALL 0x + /* REG_INTERRUPT */ #define RCV_INTBIT(10) #define RCV_NOBUF BIT(8) @@ -168,6 +172,9 @@ struct rx_desc { struct hip04_priv { void __iomem *base; +#if defined(CONFIG_HI13X1_GMAC) + void __iomem *sysctrl_base; +#endif int phy_mode; int chan; unsigned int port; @@ -244,6 +251,13 @@ static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex) writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG); } +static void hip04_reset_dreq(struct hip04_priv *priv) +{ +#if defined(CONFIG_HI13X1_GMAC) + writel_relaxed(RESET_DREQ_ALL, priv->sysctrl_base + SC_PPE_RESET_DREQ); +#endif +} + static void hip04_reset_ppe(struct hip04_priv *priv) { u32 val, tmp, timeout = 0; @@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device *pdev) goto init_fail; } +#if defined(CONFIG_HI13X1_GMAC) + res = platform_get_resource(pdev, IORESOURCE_MEM, 1); + priv->sysctrl_base = devm_ioremap_resource(d, res); + if (IS_ERR(priv->sysctrl_base)) { + ret = PTR_ERR(priv->sysctrl_base); + goto init_fail; + } +#endif + ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, ); if (ret < 0) { dev_warn(d, "no port-handle\n"); @@ -921,6 +944,7 @@ static int hip04_mac_probe(struct platform_device *pdev) ndev->irq = irq; netif_napi_add(ndev, >napi, hip04_rx_poll, NAPI_POLL_WEIGHT); + hip04_reset_dreq(priv); hip04_reset_ppe(priv); if (priv->phy_mode == PHY_INTERFACE_MODE_MII) hip04_config_port(ndev, SPEED_100, DUPLEX_FULL); -- 1.8.5.6
[PATCH v2 06/10] net: hisilicon: dt-bindings: Add an field of port-handle
In general, group is the same as the port, but some boards specify a special group for better load balancing of each processing unit. Signed-off-by: Jiangfeng Xiao --- Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt index d1df8a0..464c0da 100644 --- a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt +++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt @@ -10,6 +10,7 @@ Required properties: phandle, specifies a reference to the syscon ppe node port, port number connected to the controller channel, recv channel start from channel * number (RX_DESC_NUM) + group, field in the pkg desc, in general, it is the same as the port. - phy-mode: see ethernet.txt [1]. Optional properties: @@ -66,7 +67,7 @@ Example: reg = <0x28b 0x1>; interrupts = <0 413 4>; phy-mode = "mii"; - port-handle = < 31 0>; + port-handle = < 31 0 31>; }; ge0: ethernet@280 { @@ -74,7 +75,7 @@ Example: reg = <0x280 0x1>; interrupts = <0 402 4>; phy-mode = "sgmii"; - port-handle = < 0 1>; + port-handle = < 0 1 0>; phy-handle = <>; }; @@ -83,6 +84,6 @@ Example: reg = <0x288 0x1>; interrupts = <0 410 4>; phy-mode = "sgmii"; - port-handle = < 8 2>; + port-handle = < 8 2 8>; phy-handle = <>; }; -- 1.8.5.6
[PATCH v2 03/10] net: hisilicon: Cleanup for cast to restricted __be32
This patch fixes the following warning from sparse: hip04_eth.c:533:23: warning: cast to restricted __be16 hip04_eth.c:533:23: warning: cast to restricted __be16 hip04_eth.c:533:23: warning: cast to restricted __be16 hip04_eth.c:533:23: warning: cast to restricted __be16 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 31f13cf..d8f0619 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -530,8 +530,8 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) priv->rx_phys[priv->rx_head] = 0; desc = (struct rx_desc *)skb->data; - len = be16_to_cpu(desc->pkt_len); - err = be32_to_cpu(desc->pkt_err); + len = be16_to_cpu((__force __be16)desc->pkt_len); + err = be32_to_cpu((__force __be32)desc->pkt_err); if (0 == len) { dev_kfree_skb_any(skb); -- 1.8.5.6
[PATCH v2 02/10] net: hisilicon: Cleanup for got restricted __be32
This patch fixes the following warning from sparse: hip04_eth.c:468:25: warning: incorrect type in assignment hip04_eth.c:468:25:expected unsigned int [usertype] send_addr hip04_eth.c:468:25:got restricted __be32 [usertype] hip04_eth.c:469:25: warning: incorrect type in assignment hip04_eth.c:469:25:expected unsigned int [usertype] send_size hip04_eth.c:469:25:got restricted __be32 [usertype] hip04_eth.c:470:19: warning: incorrect type in assignment hip04_eth.c:470:19:expected unsigned int [usertype] cfg hip04_eth.c:470:19:got restricted __be32 [usertype] hip04_eth.c:472:23: warning: incorrect type in assignment hip04_eth.c:472:23:expected unsigned int [usertype] wb_addr hip04_eth.c:472:23:got restricted __be32 [usertype] Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 2b5112b..31f13cf 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -465,11 +465,11 @@ static void hip04_start_tx_timer(struct hip04_priv *priv) priv->tx_skb[tx_head] = skb; priv->tx_phys[tx_head] = phys; - desc->send_addr = cpu_to_be32(phys); - desc->send_size = cpu_to_be32(skb->len); - desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV); + desc->send_addr = (__force u32)cpu_to_be32(phys); + desc->send_size = (__force u32)cpu_to_be32(skb->len); + desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV); phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc); - desc->wb_addr = cpu_to_be32(phys); + desc->wb_addr = (__force u32)cpu_to_be32(phys); skb_tx_timestamp(skb); hip04_set_xmit_desc(priv, phys); -- 1.8.5.6
Re: [PATCH] net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
On 2019/7/9 2:18, David Miller wrote: > From: David Miller > Date: Sun, 07 Jul 2019 22:18:05 -0700 (PDT) > >> From: Jiangfeng Xiao >> Date: Fri, 5 Jul 2019 14:10:03 +0800 >> >>> HI13X1 changed the offsets and bitmaps for tx_desc >>> registers in the same peripheral device on different >>> models of the hip04_eth. >>> >>> Signed-off-by: Jiangfeng Xiao >> >> Applied. > > Actually I didn't apply this because I can't see that HI13X1_GMAC > kconfig knob anywhere in the tree at all. > Thank you for your guidance, I made a mistake, for which I am sincerely sorry for wasting your time. I will submit the correct one again. I will not make this low-level mistake again. Thanks, Jiangfeng Xiao
[PATCH 09/10] net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
HI13X1 changed the offsets and bitmaps for rx_desc registers in the same peripheral device on different models of the hip04_eth. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index c578934..780fc46 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -171,11 +171,20 @@ struct tx_desc { } __aligned(64); struct rx_desc { +#if defined(CONFIG_HI13X1_GMAC) + u32 reserved1[3]; + u16 pkt_len; + u16 reserved_16; + u32 reserved2[6]; + u32 pkt_err; + u32 reserved3[5]; +#else u16 reserved_16; u16 pkt_len; u32 reserve1[3]; u32 pkt_err; u32 reserve2[4]; +#endif }; struct hip04_priv { -- 1.8.5.6
[PATCH 10/10] net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
HI13X1 changed the offsets and bitmaps for tx_desc registers in the same peripheral device on different models of the hip04_eth. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 780fc46..6256357 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -76,8 +76,15 @@ /* TX descriptor config */ #define TX_FREE_MEMBIT(0) #define TX_READ_ALLOC_L3 BIT(1) -#define TX_FINISH_CACHE_INVBIT(2) +#if defined(CONFIG_HI13X1_GMAC) +#define TX_CLEAR_WBBIT(7) +#define TX_RELEASE_TO_PPE BIT(4) +#define TX_FINISH_CACHE_INVBIT(6) +#define TX_POOL_SHIFT 16 +#else #define TX_CLEAR_WBBIT(4) +#define TX_FINISH_CACHE_INVBIT(2) +#endif #define TX_L3_CHECKSUM BIT(5) #define TX_LOOP_BACK BIT(11) @@ -124,6 +131,7 @@ /* buf unit size is cache_line_size, which is 64, so the shift is 6 */ #define PPE_BUF_SIZE_SHIFT 6 #define PPE_TX_BUF_HOLDBIT(31) +#define CACHE_LINE_MASK0x3F #else #define PPE_CFG_QOS_VMID_GRP_SHIFT 8 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 @@ -163,11 +171,22 @@ #define HIP04_MIN_TX_COALESCE_FRAMES 100 struct tx_desc { +#if defined(CONFIG_HI13X1_GMAC) + u32 reserved1[2]; + u32 send_addr; + u16 send_size; + u16 data_offset; + u32 reserved2[7]; + u32 cfg; + u32 wb_addr; + u32 reserved3[3]; +#else u32 send_addr; u32 send_size; u32 next_addr; u32 cfg; u32 wb_addr; +#endif } __aligned(64); struct rx_desc { @@ -505,11 +524,20 @@ static void hip04_start_tx_timer(struct hip04_priv *priv) priv->tx_skb[tx_head] = skb; priv->tx_phys[tx_head] = phys; - desc->send_addr = (__force u32)cpu_to_be32(phys); + desc->send_size = (__force u32)cpu_to_be32(skb->len); +#if defined(CONFIG_HI13X1_GMAC) + desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV + | TX_RELEASE_TO_PPE | priv->port << TX_POOL_SHIFT); + desc->data_offset = (__force u32)cpu_to_be32(phys & CACHE_LINE_MASK); + desc->send_addr = (__force u32)cpu_to_be32(phys & ~CACHE_LINE_MASK); +#else desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV); + desc->send_addr = (__force u32)cpu_to_be32(phys); +#endif phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc); - desc->wb_addr = (__force u32)cpu_to_be32(phys); + desc->wb_addr = (__force u32)cpu_to_be32(phys + + offsetof(struct tx_desc, send_addr)); skb_tx_timestamp(skb); hip04_set_xmit_desc(priv, phys); -- 1.8.5.6
[PATCH 08/10] net: hisilicon: Offset buf address to adapt HI13X1_GMAC
The buf unit size of HI13X1_GMAC is cache_line_size, which is 64, so the address we write to the buf register needs to be shifted right by 6 bits. The 31st bit of the PPE_CFG_CPU_ADD_ADDR register of HI13X1_GMAC indicates whether to release the buffer of the message, and the low indicates that it is valid. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 5328219..c578934 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -120,12 +120,20 @@ #define PPE_CFG_STS_RX_PKT_CNT_RC BIT(0) #define PPE_CFG_QOS_VMID_MODE BIT(15) #define PPE_CFG_BUS_LOCAL_REL (BIT(9) | BIT(15) | BIT(19) | BIT(23)) + +/* buf unit size is cache_line_size, which is 64, so the shift is 6 */ +#define PPE_BUF_SIZE_SHIFT 6 +#define PPE_TX_BUF_HOLDBIT(31) #else #define PPE_CFG_QOS_VMID_GRP_SHIFT 8 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 #define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12) #define PPE_CFG_QOS_VMID_MODE BIT(14) #define PPE_CFG_BUS_LOCAL_REL BIT(14) + +/* buf unit size is 1, so the shift is 6 */ +#define PPE_BUF_SIZE_SHIFT 0 +#define PPE_TX_BUF_HOLD0 #endif /* CONFIG_HI13X1_GMAC */ #define PPE_CFG_RX_FIFO_FSFU BIT(11) @@ -286,7 +294,7 @@ static void hip04_config_fifo(struct hip04_priv *priv) val |= PPE_CFG_QOS_VMID_MODE; writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN); - val = RX_BUF_SIZE; + val = RX_BUF_SIZE >> PPE_BUF_SIZE_SHIFT; regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val); val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT; @@ -369,12 +377,18 @@ static void hip04_mac_disable(struct net_device *ndev) static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys) { - writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR); + u32 val; + + val = phys >> PPE_BUF_SIZE_SHIFT | PPE_TX_BUF_HOLD; + writel(val, priv->base + PPE_CFG_CPU_ADD_ADDR); } static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys) { - regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys); + u32 val; + + val = phys >> PPE_BUF_SIZE_SHIFT; + regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, val); } static u32 hip04_recv_cnt(struct hip04_priv *priv) -- 1.8.5.6
[PATCH 06/10] net: hisilicon: dt-bindings: Add an field of port-handle
In general, group is the same as the port, but some boards specify a special group for better load balancing of each processing unit. Signed-off-by: Jiangfeng Xiao --- Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt index d1df8a0..464c0da 100644 --- a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt +++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt @@ -10,6 +10,7 @@ Required properties: phandle, specifies a reference to the syscon ppe node port, port number connected to the controller channel, recv channel start from channel * number (RX_DESC_NUM) + group, field in the pkg desc, in general, it is the same as the port. - phy-mode: see ethernet.txt [1]. Optional properties: @@ -66,7 +67,7 @@ Example: reg = <0x28b 0x1>; interrupts = <0 413 4>; phy-mode = "mii"; - port-handle = < 31 0>; + port-handle = < 31 0 31>; }; ge0: ethernet@280 { @@ -74,7 +75,7 @@ Example: reg = <0x280 0x1>; interrupts = <0 402 4>; phy-mode = "sgmii"; - port-handle = < 0 1>; + port-handle = < 0 1 0>; phy-handle = <>; }; @@ -83,6 +84,6 @@ Example: reg = <0x288 0x1>; interrupts = <0 410 4>; phy-mode = "sgmii"; - port-handle = < 8 2>; + port-handle = < 8 2 8>; phy-handle = <>; }; -- 1.8.5.6
[PATCH 07/10] net: hisilicon: Add group field to adapt HI13X1_GMAC
In general, group is the same as the port, but some boards specify a special group for better load balancing of each processing unit. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 19d8cfd..5328219 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -178,6 +178,7 @@ struct hip04_priv { int phy_mode; int chan; unsigned int port; + unsigned int group; unsigned int speed; unsigned int duplex; unsigned int reg_inten; @@ -278,10 +279,10 @@ static void hip04_config_fifo(struct hip04_priv *priv) val |= PPE_CFG_STS_RX_PKT_CNT_RC; writel_relaxed(val, priv->base + PPE_CFG_STS_MODE); - val = BIT(priv->port); + val = BIT(priv->group); regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val); - val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT; + val = priv->group << PPE_CFG_QOS_VMID_GRP_SHIFT; val |= PPE_CFG_QOS_VMID_MODE; writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN); @@ -876,7 +877,7 @@ static int hip04_mac_probe(struct platform_device *pdev) } #endif - ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, ); + ret = of_parse_phandle_with_fixed_args(node, "port-handle", 3, 0, ); if (ret < 0) { dev_warn(d, "no port-handle\n"); goto init_fail; @@ -884,6 +885,7 @@ static int hip04_mac_probe(struct platform_device *pdev) priv->port = arg.args[0]; priv->chan = arg.args[1] * RX_DESC_NUM; + priv->group = arg.args[2]; hrtimer_init(>tx_coalesce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); -- 1.8.5.6
[PATCH 05/10] net: hisilicon: HI13X1_GMAX need dreq reset at first
HI13X1_GMAC delete request for soft reset at first, otherwise, the subsequent initialization will not take effect. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 24 1 file changed, 24 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index fe61b01..19d8cfd 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -16,6 +16,8 @@ #include #include +#define SC_PPE_RESET_DREQ 0x026C + #define PPE_CFG_RX_ADDR0x100 #define PPE_CFG_POOL_GRP 0x300 #define PPE_CFG_RX_BUF_SIZE0x400 @@ -61,6 +63,8 @@ #define PPE_HIS_RX_PKT_CNT 0x804 +#define RESET_DREQ_ALL 0x + /* REG_INTERRUPT */ #define RCV_INTBIT(10) #define RCV_NOBUF BIT(8) @@ -168,6 +172,9 @@ struct rx_desc { struct hip04_priv { void __iomem *base; +#if defined(CONFIG_HI13X1_GMAC) + void __iomem *sysctrl_base; +#endif int phy_mode; int chan; unsigned int port; @@ -244,6 +251,13 @@ static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex) writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG); } +static void hip04_reset_dreq(struct hip04_priv *priv) +{ +#if defined(CONFIG_HI13X1_GMAC) + writel_relaxed(RESET_DREQ_ALL, priv->sysctrl_base + SC_PPE_RESET_DREQ); +#endif +} + static void hip04_reset_ppe(struct hip04_priv *priv) { u32 val, tmp, timeout = 0; @@ -853,6 +867,15 @@ static int hip04_mac_probe(struct platform_device *pdev) goto init_fail; } +#if defined(CONFIG_HI13X1_GMAC) + res = platform_get_resource(pdev, IORESOURCE_MEM, 1); + priv->sysctrl_base = devm_ioremap_resource(d, res); + if (IS_ERR(priv->sysctrl_base)) { + ret = PTR_ERR(priv->sysctrl_base); + goto init_fail; + } +#endif + ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, ); if (ret < 0) { dev_warn(d, "no port-handle\n"); @@ -921,6 +944,7 @@ static int hip04_mac_probe(struct platform_device *pdev) ndev->irq = irq; netif_napi_add(ndev, >napi, hip04_rx_poll, NAPI_POLL_WEIGHT); + hip04_reset_dreq(priv); hip04_reset_ppe(priv); if (priv->phy_mode == PHY_INTERFACE_MODE_MII) hip04_config_port(ndev, SPEED_100, DUPLEX_FULL); -- 1.8.5.6
[PATCH 04/10] net: hisilicon: HI13X1_GMAX skip write LOCAL_PAGE_REG
HI13X1_GMAC changed the offsets and bitmaps for GE_TX_LOCAL_PAGE_REG registers in the same peripheral device on different models of the hip04_eth. With the default configuration, HI13X1_GMAC can also work without any writes to the GE_TX_LOCAL_PAGE_REG register. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index d8f0619..fe61b01 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -308,8 +308,10 @@ static void hip04_config_fifo(struct hip04_priv *priv) val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN; writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG); +#ifndef CONFIG_HI13X1_GMAC val = GE_AUTO_NEG_CTL; writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG); +#endif } static void hip04_mac_enable(struct net_device *ndev) -- 1.8.5.6
[PATCH 01/10] net: hisilicon: Add support for HI13X1 to hip04_eth
Extend the hip04_eth driver to support HI13X1_GMAC. Enable it with CONFIG_HI13X1_GMAC option. HI13X1 changed the offsets and bitmaps for registers in the same peripheral device on different models of the hip04_eth. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/Kconfig | 10 drivers/net/ethernet/hisilicon/hip04_eth.c | 37 -- 2 files changed, 40 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig index a0d780c..3892a20 100644 --- a/drivers/net/ethernet/hisilicon/Kconfig +++ b/drivers/net/ethernet/hisilicon/Kconfig @@ -46,6 +46,16 @@ config HIP04_ETH If you wish to compile a kernel for a hardware with hisilicon p04 SoC and want to use the internal ethernet then you should answer Y to this. +config HI13X1_GMAC + bool "Hisilicon HI13X1 Network Device Support" + depends on HIP04_ETH + help + If you wish to compile a kernel for a hardware with hisilicon hi13x1_gamc + then you should answer Y to this. This makes this driver suitable for use + on certain boards such as the HI13X1. + + If you are unsure, say N. + config HNS_MDIO tristate select PHYLIB diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index e1f2978..2b5112b 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -33,10 +33,23 @@ #define GE_MODE_CHANGE_REG 0x1b4 #define GE_RECV_CONTROL_REG0x1e0 #define GE_STATION_MAC_ADDRESS 0x210 -#define PPE_CFG_CPU_ADD_ADDR 0x580 -#define PPE_CFG_MAX_FRAME_LEN_REG 0x408 + #define PPE_CFG_BUS_CTRL_REG 0x424 #define PPE_CFG_RX_CTRL_REG0x428 + +#if defined(CONFIG_HI13X1_GMAC) +#define PPE_CFG_CPU_ADD_ADDR 0x6D0 +#define PPE_CFG_MAX_FRAME_LEN_REG 0x500 +#define PPE_CFG_RX_PKT_MODE_REG0x504 +#define PPE_CFG_QOS_VMID_GEN 0x520 +#define PPE_CFG_RX_PKT_INT 0x740 +#define PPE_INTEN 0x700 +#define PPE_INTSTS 0x708 +#define PPE_RINT 0x704 +#define PPE_CFG_STS_MODE 0x880 +#else +#define PPE_CFG_CPU_ADD_ADDR 0x580 +#define PPE_CFG_MAX_FRAME_LEN_REG 0x408 #define PPE_CFG_RX_PKT_MODE_REG0x438 #define PPE_CFG_QOS_VMID_GEN 0x500 #define PPE_CFG_RX_PKT_INT 0x538 @@ -44,6 +57,8 @@ #define PPE_INTSTS 0x608 #define PPE_RINT 0x604 #define PPE_CFG_STS_MODE 0x700 +#endif /* CONFIG_HI13X1_GMAC */ + #define PPE_HIS_RX_PKT_CNT 0x804 /* REG_INTERRUPT */ @@ -93,18 +108,26 @@ #define GE_RX_PORT_EN BIT(1) #define GE_TX_PORT_EN BIT(2) -#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12) - #define PPE_CFG_RX_PKT_ALIGN BIT(18) -#define PPE_CFG_QOS_VMID_MODE BIT(14) + +#if defined(CONFIG_HI13X1_GMAC) +#define PPE_CFG_QOS_VMID_GRP_SHIFT 4 +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT7 +#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(0) +#define PPE_CFG_QOS_VMID_MODE BIT(15) +#define PPE_CFG_BUS_LOCAL_REL (BIT(9) | BIT(15) | BIT(19) | BIT(23)) +#else #define PPE_CFG_QOS_VMID_GRP_SHIFT 8 +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 +#define PPE_CFG_STS_RX_PKT_CNT_RC BIT(12) +#define PPE_CFG_QOS_VMID_MODE BIT(14) +#define PPE_CFG_BUS_LOCAL_REL BIT(14) +#endif /* CONFIG_HI13X1_GMAC */ #define PPE_CFG_RX_FIFO_FSFU BIT(11) #define PPE_CFG_RX_DEPTH_SHIFT 16 #define PPE_CFG_RX_START_SHIFT 0 -#define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 -#define PPE_CFG_BUS_LOCAL_REL BIT(14) #define PPE_CFG_BUS_BIG_ENDIEN BIT(0) #define RX_DESC_NUM128 -- 1.8.5.6
[PATCH 02/10] net: hisilicon: Cleanup for got restricted __be32
This patch fixes the following warning from sparse: hip04_eth.c:468:25: warning: incorrect type in assignment hip04_eth.c:468:25:expected unsigned int [usertype] send_addr hip04_eth.c:468:25:got restricted __be32 [usertype] hip04_eth.c:469:25: warning: incorrect type in assignment hip04_eth.c:469:25:expected unsigned int [usertype] send_size hip04_eth.c:469:25:got restricted __be32 [usertype] hip04_eth.c:470:19: warning: incorrect type in assignment hip04_eth.c:470:19:expected unsigned int [usertype] cfg hip04_eth.c:470:19:got restricted __be32 [usertype] hip04_eth.c:472:23: warning: incorrect type in assignment hip04_eth.c:472:23:expected unsigned int [usertype] wb_addr hip04_eth.c:472:23:got restricted __be32 [usertype] Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 2b5112b..31f13cf 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -465,11 +465,11 @@ static void hip04_start_tx_timer(struct hip04_priv *priv) priv->tx_skb[tx_head] = skb; priv->tx_phys[tx_head] = phys; - desc->send_addr = cpu_to_be32(phys); - desc->send_size = cpu_to_be32(skb->len); - desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV); + desc->send_addr = (__force u32)cpu_to_be32(phys); + desc->send_size = (__force u32)cpu_to_be32(skb->len); + desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV); phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc); - desc->wb_addr = cpu_to_be32(phys); + desc->wb_addr = (__force u32)cpu_to_be32(phys); skb_tx_timestamp(skb); hip04_set_xmit_desc(priv, phys); -- 1.8.5.6
[PATCH 03/10] net: hisilicon: Cleanup for cast to restricted __be32
This patch fixes the following warning from sparse: hip04_eth.c:533:23: warning: cast to restricted __be16 hip04_eth.c:533:23: warning: cast to restricted __be16 hip04_eth.c:533:23: warning: cast to restricted __be16 hip04_eth.c:533:23: warning: cast to restricted __be16 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 hip04_eth.c:534:23: warning: cast to restricted __be32 Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 31f13cf..d8f0619 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -530,8 +530,8 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget) priv->rx_phys[priv->rx_head] = 0; desc = (struct rx_desc *)skb->data; - len = be16_to_cpu(desc->pkt_len); - err = be32_to_cpu(desc->pkt_err); + len = be16_to_cpu((__force __be16)desc->pkt_len); + err = be32_to_cpu((__force __be32)desc->pkt_err); if (0 == len) { dev_kfree_skb_any(skb); -- 1.8.5.6
[PATCH] net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
HI13X1 changed the offsets and bitmaps for tx_desc registers in the same peripheral device on different models of the hip04_eth. Signed-off-by: Jiangfeng Xiao --- drivers/net/ethernet/hisilicon/hip04_eth.c | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c index 780fc46..6256357 100644 --- a/drivers/net/ethernet/hisilicon/hip04_eth.c +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c @@ -76,8 +76,15 @@ /* TX descriptor config */ #define TX_FREE_MEMBIT(0) #define TX_READ_ALLOC_L3 BIT(1) -#define TX_FINISH_CACHE_INVBIT(2) +#if defined(CONFIG_HI13X1_GMAC) +#define TX_CLEAR_WBBIT(7) +#define TX_RELEASE_TO_PPE BIT(4) +#define TX_FINISH_CACHE_INVBIT(6) +#define TX_POOL_SHIFT 16 +#else #define TX_CLEAR_WBBIT(4) +#define TX_FINISH_CACHE_INVBIT(2) +#endif #define TX_L3_CHECKSUM BIT(5) #define TX_LOOP_BACK BIT(11) @@ -124,6 +131,7 @@ /* buf unit size is cache_line_size, which is 64, so the shift is 6 */ #define PPE_BUF_SIZE_SHIFT 6 #define PPE_TX_BUF_HOLDBIT(31) +#define CACHE_LINE_MASK0x3F #else #define PPE_CFG_QOS_VMID_GRP_SHIFT 8 #define PPE_CFG_RX_CTRL_ALIGN_SHIFT11 @@ -163,11 +171,22 @@ #define HIP04_MIN_TX_COALESCE_FRAMES 100 struct tx_desc { +#if defined(CONFIG_HI13X1_GMAC) + u32 reserved1[2]; + u32 send_addr; + u16 send_size; + u16 data_offset; + u32 reserved2[7]; + u32 cfg; + u32 wb_addr; + u32 reserved3[3]; +#else u32 send_addr; u32 send_size; u32 next_addr; u32 cfg; u32 wb_addr; +#endif } __aligned(64); struct rx_desc { @@ -505,11 +524,20 @@ static void hip04_start_tx_timer(struct hip04_priv *priv) priv->tx_skb[tx_head] = skb; priv->tx_phys[tx_head] = phys; - desc->send_addr = (__force u32)cpu_to_be32(phys); + desc->send_size = (__force u32)cpu_to_be32(skb->len); +#if defined(CONFIG_HI13X1_GMAC) + desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV + | TX_RELEASE_TO_PPE | priv->port << TX_POOL_SHIFT); + desc->data_offset = (__force u32)cpu_to_be32(phys & CACHE_LINE_MASK); + desc->send_addr = (__force u32)cpu_to_be32(phys & ~CACHE_LINE_MASK); +#else desc->cfg = (__force u32)cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV); + desc->send_addr = (__force u32)cpu_to_be32(phys); +#endif phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc); - desc->wb_addr = (__force u32)cpu_to_be32(phys); + desc->wb_addr = (__force u32)cpu_to_be32(phys + + offsetof(struct tx_desc, send_addr)); skb_tx_timestamp(skb); hip04_set_xmit_desc(priv, phys); -- 1.8.5.6
Re: [PATCH] irqchip/gic: Add dependency for ARM_GIC
On 2019/6/27 22:20, Marc Zyngier wrote: > On 27/06/2019 15:11, Jiangfeng Xiao wrote: >> Not every arch has ARM_GIC, it is strange >> to see ARM_GIC_MAX_NR in the .config file >> of the x86 and IA-64 architecture. so fix >> build by adding necessary dependency. >> >> Signed-off-by: Jiangfeng Xiao >> --- >> drivers/irqchip/Kconfig | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig >> index 659c5e0..e338b471 100644 >> --- a/drivers/irqchip/Kconfig >> +++ b/drivers/irqchip/Kconfig >> @@ -19,6 +19,7 @@ config ARM_GIC_PM >> >> config ARM_GIC_MAX_NR >> int >> +depends on ARM_GIC >> default 2 if ARCH_REALVIEW >> default 1 >> >> > > Isn't that the patch[1] that has already been in -next for the past 10 > days or so? > > Thanks, > > M. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next=702655234dd1d07e76f9bff9575e438c969fc32c > I am very sorry. I thought that this patch was not reviewed, so I pushed it again. I asked my colleagues how to get information about whether the patch is included. Will not make this low-level mistake again. Thanks.
[PATCH] irqchip/gic: Add dependency for ARM_GIC
Not every arch has ARM_GIC, it is strange to see ARM_GIC_MAX_NR in the .config file of the x86 and IA-64 architecture. so fix build by adding necessary dependency. Signed-off-by: Jiangfeng Xiao --- drivers/irqchip/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig index 659c5e0..e338b471 100644 --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -19,6 +19,7 @@ config ARM_GIC_PM config ARM_GIC_MAX_NR int + depends on ARM_GIC default 2 if ARCH_REALVIEW default 1 -- 1.8.5.6
[PATCH] irqchip/gic: Add dependency for ARM_GIC_MAX_NR
CONFIG_ARM_GIC_MAX_NR is enabled by default. It is redundant in x86 and IA-64 where is without GIC. Signed-off-by: Jiangfeng Xiao --- drivers/irqchip/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig index 659c5e0..e338b471 100644 --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -19,6 +19,7 @@ config ARM_GIC_PM config ARM_GIC_MAX_NR int + depends on ARM_GIC default 2 if ARCH_REALVIEW default 1 -- 1.8.5.6
[PATCH] PCI/hotplug: fix potential null pointer deference
There is otherwise a risk of a null pointer dereference. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/pci/hotplug/cpqphp_ctrl.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/pci/hotplug/cpqphp_ctrl.c b/drivers/pci/hotplug/cpqphp_ctrl.c index b7f4e1f..3c8399f 100644 --- a/drivers/pci/hotplug/cpqphp_ctrl.c +++ b/drivers/pci/hotplug/cpqphp_ctrl.c @@ -598,10 +598,11 @@ static struct pci_resource *get_io_resource(struct pci_resource **head, u32 size *head = node->next; } else { prevnode = *head; - while (prevnode->next != node) + while (prevnode && prevnode->next != node) prevnode = prevnode->next; - prevnode->next = node->next; + if (prevnode) + prevnode->next = node->next; } node->next = NULL; break; @@ -788,10 +789,11 @@ static struct pci_resource *get_resource(struct pci_resource **head, u32 size) *head = node->next; } else { prevnode = *head; - while (prevnode->next != node) + while (prevnode && prevnode->next != node) prevnode = prevnode->next; - prevnode->next = node->next; + if (prevnode) + prevnode->next = node->next; } node->next = NULL; break; -- 2.7.4
[PATCH] drm/amd: fix hotplug race at startup
We should check mode_config_initialized flag in amdgpu_hotplug_work_func. See commit 7f98ca454ad3 ("drm/radeon: fix hotplug race at startup") for details. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index af4c3b1..13186d6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -85,6 +85,9 @@ static void amdgpu_hotplug_work_func(struct work_struct *work) struct drm_mode_config *mode_config = >mode_config; struct drm_connector *connector; + if (!adev->mode_info.mode_config_initialized) + return; + mutex_lock(_config->mutex); list_for_each_entry(connector, _config->connector_list, head) amdgpu_connector_hotplug(connector); -- 2.7.4
[PATCH] scsi: bnx2fc: Check if sense buffer has been allocated during completion
sc_cmd->sense_buffer is not guaranteed to be allocated so we need to sc_cmd->check if the pointer is NULL before trying to copy anything into it. See commit 16a611154dc1 ("scsi: qedf: Check if sense buffer has been allocated during completion") for details. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/scsi/bnx2fc/bnx2fc_io.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/bnx2fc/bnx2fc_io.c b/drivers/scsi/bnx2fc/bnx2fc_io.c index 8def63c..44a5e59 100644 --- a/drivers/scsi/bnx2fc/bnx2fc_io.c +++ b/drivers/scsi/bnx2fc/bnx2fc_io.c @@ -1789,9 +1789,11 @@ static void bnx2fc_parse_fcp_rsp(struct bnx2fc_cmd *io_req, fcp_sns_len = SCSI_SENSE_BUFFERSIZE; } - memset(sc_cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE); - if (fcp_sns_len) - memcpy(sc_cmd->sense_buffer, rq_data, fcp_sns_len); + if (sc_cmd->sense_buffer) { + memset(sc_cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE); + if (fcp_sns_len) + memcpy(sc_cmd->sense_buffer, rq_data, fcp_sns_len); + } /* return RQ entries */ for (i = 0; i < num_rq; i++) -- 2.7.4
[PATCH] af_key: Fix memory leak in key_notify_policy.
We leak the allocated out_skb in case pfkey_xfrm_policy2msg() fails. Fix this by freeing it on error. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- net/key/af_key.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/key/af_key.c b/net/key/af_key.c index 4af1e1d..ec414f6 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -2443,6 +2443,7 @@ static int key_pol_get_resp(struct sock *sk, struct xfrm_policy *xp, const struc } err = pfkey_xfrm_policy2msg(out_skb, xp, dir); if (err < 0) + kfree_skb(out_skb); goto out; out_hdr = (struct sadb_msg *) out_skb->data; @@ -2695,6 +2696,7 @@ static int dump_sp(struct xfrm_policy *xp, int dir, int count, void *ptr) err = pfkey_xfrm_policy2msg(out_skb, xp, dir); if (err < 0) + kfree_skb(out_skb); return err; out_hdr = (struct sadb_msg *) out_skb->data; -- 2.7.4
[PATCH] nfc: Ensure presence of required attributes in the deactivate_target handler
Check that the NFC_ATTR_TARGET_INDEX attributes (in addition to NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to accessing them. This prevents potential unhandled NULL pointer dereference exceptions which can be triggered by malicious user-mode programs, if they omit one or both of these attributes. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- net/nfc/netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c index 04a8e47..89d885d 100644 --- a/net/nfc/netlink.c +++ b/net/nfc/netlink.c @@ -923,7 +923,8 @@ static int nfc_genl_deactivate_target(struct sk_buff *skb, u32 device_idx, target_idx; int rc; - if (!info->attrs[NFC_ATTR_DEVICE_INDEX]) + if (!info->attrs[NFC_ATTR_DEVICE_INDEX] || + !info->attrs[NFC_ATTR_TARGET_INDEX]) return -EINVAL; device_idx = nla_get_u32(info->attrs[NFC_ATTR_DEVICE_INDEX]); -- 2.7.4
Re: [PATCH v3 1/9] Documentation: Introduce EPT based Subpage Protection
On Fri, Jun 7, 2019 at 7:12 AM Yang Weijiang wrote: > > On Thu, Jun 06, 2019 at 09:57:00PM -0600, Jidong Xiao wrote: > > Hi, Weijiang, > > > > Does this require some specific Intel processors or is it supported by > > older processors as well? > > > > -Jidong > Hi, Jidong, > SPP is a feature on new platforms, so only available with new > Intel processors. Oh, I see. Thanks! -Jidong > > > > On Thu, Jun 6, 2019 at 9:33 AM Yang Weijiang > > wrote: > > > > > > Signed-off-by: Yang Weijiang > > > --- > > > Documentation/virtual/kvm/spp_kvm.txt | 216 ++ > > > 1 file changed, 216 insertions(+) > > > create mode 100644 Documentation/virtual/kvm/spp_kvm.txt > > > > > > diff --git a/Documentation/virtual/kvm/spp_kvm.txt > > > b/Documentation/virtual/kvm/spp_kvm.txt > > > new file mode 100644 > > > index ..4b5edcaf48b6 > > > --- /dev/null > > > +++ b/Documentation/virtual/kvm/spp_kvm.txt > > > @@ -0,0 +1,216 @@ > > > +EPT-Based Sub-Page Protection (SPP) for KVM > > > += > > > + > > > +1. Overview > > > + > > > +EPT-based Sub-Page Protection (SPP) capability to allow Virtual Machine > > > +Monitors to specify write-protection for guest physical memory at a > > > +sub-page (128 byte) granularity. When this capability is utilized, the > > > +CPU enforces write-access permissions for sub-page regions inside 4K > > > pages > > > +as specified by the VMI tools. > > > + > > > +2. Operation of SPP > > > + > > > +Sub-Page Protection Table (SPPT) is introduced to manage sub-page > > > +write-access. > > > + > > > +SPPT is active when: > > > +a) moddule parameter spp=on is configured for kvm-intel.ko > > > +b) large paging is disabled on host > > > +c) "sub-page write protection" VM-execution control bit is set > > > +SPPT looks up the guest physical address to seek a 64-bit > > > +bitmap indicating sub-page write permission in SPPT leaf entry. > > > + > > > +When the "sub-page write protection" VM-execution control is 1, the SPPT > > > +is used to lookup write permission bits for the 128 byte sub-page regions > > > +contained in the 4KB guest physical page. EPT specifies the 4KB page > > > +write-protection privilege whereas SPPT defines the write permissions > > > +at 128-byte granularity within one 4KB page. Write accesses > > > +prevented due to sub-page permissions induces EPT violation VM exits. > > > +Similar to EPT, a logical processor uses SPPT to lookup sub-page level > > > +write permissions for guest-physical addresses only when those addresses > > > +are used to access memory. > > > +__ > > > + > > > +How SPP hardware works: > > > +__ > > > + > > > +Guest write access --> GPA --> Walk EPT --> EPT leaf entry -| > > > +|---| > > > +|-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61) > > > + | > > > + |-> --> EPT legacy behavior > > > + | > > > + | > > > + |-> --> if ept_leaf_entry.writable > > > + | > > > + |-> --> Ignore SPP > > > + | > > > + |-> --> GPA --> Walk SPP 4-level table--| > > > + | > > > +|<--get-the-SPPT-point-from-VMCS-filed-<--| > > > +| > > > +Walk SPP L4E table > > > +| > > > +|---> if-entry-misconfiguration >---|---<-| > > > + | | | > > > +else| | > > > + | | | > > > + | |--SPP VMexit<-| | > > > + | || > > > + | |
Re: [PATCH v3 1/9] Documentation: Introduce EPT based Subpage Protection
Hi, Weijiang, Does this require some specific Intel processors or is it supported by older processors as well? -Jidong On Thu, Jun 6, 2019 at 9:33 AM Yang Weijiang wrote: > > Signed-off-by: Yang Weijiang > --- > Documentation/virtual/kvm/spp_kvm.txt | 216 ++ > 1 file changed, 216 insertions(+) > create mode 100644 Documentation/virtual/kvm/spp_kvm.txt > > diff --git a/Documentation/virtual/kvm/spp_kvm.txt > b/Documentation/virtual/kvm/spp_kvm.txt > new file mode 100644 > index ..4b5edcaf48b6 > --- /dev/null > +++ b/Documentation/virtual/kvm/spp_kvm.txt > @@ -0,0 +1,216 @@ > +EPT-Based Sub-Page Protection (SPP) for KVM > += > + > +1. Overview > + > +EPT-based Sub-Page Protection (SPP) capability to allow Virtual Machine > +Monitors to specify write-protection for guest physical memory at a > +sub-page (128 byte) granularity. When this capability is utilized, the > +CPU enforces write-access permissions for sub-page regions inside 4K pages > +as specified by the VMI tools. > + > +2. Operation of SPP > + > +Sub-Page Protection Table (SPPT) is introduced to manage sub-page > +write-access. > + > +SPPT is active when: > +a) moddule parameter spp=on is configured for kvm-intel.ko > +b) large paging is disabled on host > +c) "sub-page write protection" VM-execution control bit is set > +SPPT looks up the guest physical address to seek a 64-bit > +bitmap indicating sub-page write permission in SPPT leaf entry. > + > +When the "sub-page write protection" VM-execution control is 1, the SPPT > +is used to lookup write permission bits for the 128 byte sub-page regions > +contained in the 4KB guest physical page. EPT specifies the 4KB page > +write-protection privilege whereas SPPT defines the write permissions > +at 128-byte granularity within one 4KB page. Write accesses > +prevented due to sub-page permissions induces EPT violation VM exits. > +Similar to EPT, a logical processor uses SPPT to lookup sub-page level > +write permissions for guest-physical addresses only when those addresses > +are used to access memory. > +__ > + > +How SPP hardware works: > +__ > + > +Guest write access --> GPA --> Walk EPT --> EPT leaf entry -| > +|---| > +|-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61) > + | > + |-> --> EPT legacy behavior > + | > + | > + |-> --> if ept_leaf_entry.writable > + | > + |-> --> Ignore SPP > + | > + |-> --> GPA --> Walk SPP 4-level table--| > + | > +|<--get-the-SPPT-point-from-VMCS-filed-<--| > +| > +Walk SPP L4E table > +| > +|---> if-entry-misconfiguration >---|---<-| > + | | | > +else| | > + | | | > + | |--SPP VMexit<-| | > + | || > + | |-> exit_qualification & sppt_misconfig --> sppt misconfig | > + | || > + | |-> exit_qualification & sppt_miss --> sppt miss | > + |---|| > + || > +walk SPPT L3E--|--> if-entry-misconfiguration>| > + | | > + else| > + | | > + | | > +walk SPPT L2E --|--> if-entry-misconfiguration>---| > +| | > + else | > +| | > +| | > + walk SPPT L1E --|-> if-entry-misconfiguration--->| > + | > + else > + | > + |-> if sub-page writable > + |-> allow, write access > + |-> disallow, EPT violation >
Re: [PATCH 3/3] rpmsg: virtio_rpmsg_bus: get buffer size from config space
On Wed, Jun 5, 2019 at 4:02 PM Arnaud Pouliquen wrote: > > > > On 6/5/19 4:40 AM, xiang xiao wrote: > > On Tue, Jun 4, 2019 at 10:25 PM Arnaud Pouliquen > > wrote: > >> > >> Hello Xiang, > >> > >> On 5/9/19 3:00 PM, xiang xiao wrote: > >>> On Thu, May 9, 2019 at 8:36 PM Arnaud Pouliquen > >>> wrote: > >>>> > >>>> Hello Xiang, > >>>> > >>>> Similar mechanism has been proposed by Loic 2 years ago (link to the > >>>> series here https://lkml.org/lkml/2017/3/28/349). > >>>> > >>>> Did you see them? Regarding history, patches seem just on hold... > >>>> > >>> > >>> Just saw this patchset, so it's common problem hit by many vendor, > >>> rpmsg framework need to address it.:) > >>> > >>>> Main differences (except interesting RX/TX size split) seems that you > >>>> - don't use the virtio_config_ops->get > >>> > >>> virtio_cread call virtio_config_ops->get internally, the ideal is same > >>> for both patch, just the implementation detail is different. > >>> > >>>> - define a new feature VIRTIO_RPMSG_F_NS. > >>> > >>> I add this flag to keep the compatibility with old remote peer, and > >>> also follow the common virito driver practice. > >> I discussed with Loic, he is ok to go further with your patch and > >> abandon his one. Please find some remarks below in-line > >>> > >>>> > >>>> Regards > >>>> Arnaud > >>>> > >>>> > >>>> On 1/31/19 4:41 PM, Xiang Xiao wrote: > >>>>> 512 bytes isn't always suitable for all case, let firmware > >>>>> maker decide the best value from resource table. > >>>>> enable by VIRTIO_RPMSG_F_BUFSZ feature bit. > >>>>> > >>>>> Signed-off-by: Xiang Xiao > >>>>> --- > >>>>> drivers/rpmsg/virtio_rpmsg_bus.c | 50 > >>>>> +-- > >>>>> include/uapi/linux/virtio_rpmsg.h | 24 +++ > >>>>> 2 files changed, 56 insertions(+), 18 deletions(-) > >>>>> create mode 100644 include/uapi/linux/virtio_rpmsg.h > >>>>> > >>>>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c > >>>>> b/drivers/rpmsg/virtio_rpmsg_bus.c > >>>>> index 59c4554..049dd97 100644 > >>>>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c > >>>>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c > >>>>> @@ -16,6 +16,7 @@ > >>>>> #include > >>>>> #include > >>>>> #include > >>>>> +#include > >>>>> #include > >>>>> #include > >>>>> #include > >>>>> @@ -38,7 +39,8 @@ > >>>>> * @sbufs: kernel address of tx buffers > >>>>> * @num_rbufs: total number of buffers for rx > >>>>> * @num_sbufs: total number of buffers for tx > >>>>> - * @buf_size:size of one rx or tx buffer > >>>>> + * @rbuf_size: size of one rx buffer > >>>>> + * @sbuf_size: size of one tx buffer > >>>>> * @last_sbuf: index of last tx buffer used > >>>>> * @rbufs_dma: dma base addr of rx buffers > >>>>> * @sbufs_dma: dma base addr of tx buffers > >>>>> @@ -61,7 +63,8 @@ struct virtproc_info { > >>>>> void *rbufs, *sbufs; > >>>>> unsigned int num_rbufs; > >>>>> unsigned int num_sbufs; > >>>>> - unsigned int buf_size; > >>>>> + unsigned int rbuf_size; > >>>>> + unsigned int sbuf_size; > >>>>> int last_sbuf; > >>>>> dma_addr_t rbufs_dma; > >>>>> dma_addr_t sbufs_dma; > >>>>> @@ -73,9 +76,6 @@ struct virtproc_info { > >>>>> struct rpmsg_endpoint *ns_ept; > >>>>> }; > >>>>> > >>>>> -/* The feature bitmap for virtio rpmsg */ > >>>>> -#define VIRTIO_RPMSG_F_NS0 /* RP supports name service > >>>>> notifications */ > >>>>> - > >>>>> /** > >>>>> *
Re: [PATCH 0/3] Enhance virtio rpmsg bus driver buffer allocation
On Wed, Jun 5, 2019 at 3:33 PM Arnaud Pouliquen wrote: > > Hi Bjorn, > > On 6/5/19 6:34 AM, Bjorn Andersson wrote: > > On Thu 31 Jan 07:41 PST 2019, Xiang Xiao wrote: > > > >> Hi, > >> This series enhance the buffer allocation by: > >> 1.Support the different buffer number in rx/tx direction > >> 2.Get the individual rx/tx buffer size from config space > >> > >> Here is the related OpenAMP change: > >> https://github.com/OpenAMP/open-amp/pull/155 > >> > > > > This looks pretty reasonable, but can you confirm that it's possible to > > use new firmware with an old Linux kernel when introducing this? > > > > > > But ever since we discussed Loic's similar proposal earlier I've been > > questioning if the fixed buffer size isn't just an artifact of how we > > preallocate our buffers. The virtqueue seems to support arbitrary sizes > > of buffers and I see that the receive function in OpenAMP has been fixed > > to put back the buffer of the size that was received, rather than 512 > > bytes. So it seems like Linux would be able to send whatever size > > messages to OpenAMP it would handle it. > > > > The question is if we could do the same in the other direction, perhaps > > by letting the OpenAMP side do it's message allocation when it's > > sending, rather than Linux pushing inbufs to be filled by the remote. > > IMHO, both could be useful and could be not correlated. > On-the fly buffer allocation seems more efficient but needs an > allocator.This can be a generic allocator (with a va to da) for system > where large amount of memories are accessible from both side. > > Now what about system with small shared memory? In this case you have to > deal with a limited/optimized memory chunk. To avoid memory > fragmentation the allocator should have a pre-reserved buffers pool(so > similar to existing implementation). This serie seems useful to optimize > the size of the pre-reserved pool. > Maybe we can reuse rxbuf_size/txbuf_size to trigger the different allocation policy: 1.If buf_size equal 0xfff, turn on the dynamic allocator 2.If buf_size equall 0, turn on the fixed allocator with the default buffer size 3.otherwise, turn on the fixed allocator with the configed buffer size So, both requirement could be satisfied without breaking the compatibility. > > > > This would remove the problem of always having suboptimal buffer sizes. > > > > Regards, > > Bjorn > > > >> Xiang Xiao (3): > >> rpmsg: virtio_rpmsg_bus: allow the different vring size for send/recv > >> rpmsg: virtio_rpmsg_bus: allocate rx/tx buffer separately > >> rpmsg: virtio_rpmsg_bus: get buffer size from config space > >> > >> drivers/rpmsg/virtio_rpmsg_bus.c | 127 > >> +++--- > >> include/uapi/linux/virtio_rpmsg.h | 24 +++ > >> 2 files changed, 100 insertions(+), 51 deletions(-) > >> create mode 100644 include/uapi/linux/virtio_rpmsg.h > >> > >> -- > >> 2.7.4 > >> > > -- > > Regards, > Arnaud
Re: [PATCH 3/3] rpmsg: virtio_rpmsg_bus: get buffer size from config space
On Tue, Jun 4, 2019 at 10:25 PM Arnaud Pouliquen wrote: > > Hello Xiang, > > On 5/9/19 3:00 PM, xiang xiao wrote: > > On Thu, May 9, 2019 at 8:36 PM Arnaud Pouliquen > > wrote: > >> > >> Hello Xiang, > >> > >> Similar mechanism has been proposed by Loic 2 years ago (link to the > >> series here https://lkml.org/lkml/2017/3/28/349). > >> > >> Did you see them? Regarding history, patches seem just on hold... > >> > > > > Just saw this patchset, so it's common problem hit by many vendor, > > rpmsg framework need to address it.:) > > > >> Main differences (except interesting RX/TX size split) seems that you > >> - don't use the virtio_config_ops->get > > > > virtio_cread call virtio_config_ops->get internally, the ideal is same > > for both patch, just the implementation detail is different. > > > >> - define a new feature VIRTIO_RPMSG_F_NS. > > > > I add this flag to keep the compatibility with old remote peer, and > > also follow the common virito driver practice. > I discussed with Loic, he is ok to go further with your patch and > abandon his one. Please find some remarks below in-line > > > >> > >> Regards > >> Arnaud > >> > >> > >> On 1/31/19 4:41 PM, Xiang Xiao wrote: > >>> 512 bytes isn't always suitable for all case, let firmware > >>> maker decide the best value from resource table. > >>> enable by VIRTIO_RPMSG_F_BUFSZ feature bit. > >>> > >>> Signed-off-by: Xiang Xiao > >>> --- > >>> drivers/rpmsg/virtio_rpmsg_bus.c | 50 > >>> +-- > >>> include/uapi/linux/virtio_rpmsg.h | 24 +++ > >>> 2 files changed, 56 insertions(+), 18 deletions(-) > >>> create mode 100644 include/uapi/linux/virtio_rpmsg.h > >>> > >>> diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c > >>> b/drivers/rpmsg/virtio_rpmsg_bus.c > >>> index 59c4554..049dd97 100644 > >>> --- a/drivers/rpmsg/virtio_rpmsg_bus.c > >>> +++ b/drivers/rpmsg/virtio_rpmsg_bus.c > >>> @@ -16,6 +16,7 @@ > >>> #include > >>> #include > >>> #include > >>> +#include > >>> #include > >>> #include > >>> #include > >>> @@ -38,7 +39,8 @@ > >>> * @sbufs: kernel address of tx buffers > >>> * @num_rbufs: total number of buffers for rx > >>> * @num_sbufs: total number of buffers for tx > >>> - * @buf_size:size of one rx or tx buffer > >>> + * @rbuf_size: size of one rx buffer > >>> + * @sbuf_size: size of one tx buffer > >>> * @last_sbuf: index of last tx buffer used > >>> * @rbufs_dma: dma base addr of rx buffers > >>> * @sbufs_dma: dma base addr of tx buffers > >>> @@ -61,7 +63,8 @@ struct virtproc_info { > >>> void *rbufs, *sbufs; > >>> unsigned int num_rbufs; > >>> unsigned int num_sbufs; > >>> - unsigned int buf_size; > >>> + unsigned int rbuf_size; > >>> + unsigned int sbuf_size; > >>> int last_sbuf; > >>> dma_addr_t rbufs_dma; > >>> dma_addr_t sbufs_dma; > >>> @@ -73,9 +76,6 @@ struct virtproc_info { > >>> struct rpmsg_endpoint *ns_ept; > >>> }; > >>> > >>> -/* The feature bitmap for virtio rpmsg */ > >>> -#define VIRTIO_RPMSG_F_NS0 /* RP supports name service notifications > >>> */ > >>> - > >>> /** > >>> * struct rpmsg_hdr - common header for all rpmsg messages > >>> * @src: source address > >>> @@ -452,7 +452,7 @@ static void *get_a_tx_buf(struct virtproc_info *vrp) > >>> > >>> /* either pick the next unused tx buffer */ > >>> if (vrp->last_sbuf < vrp->num_sbufs) > >>> - ret = vrp->sbufs + vrp->buf_size * vrp->last_sbuf++; > >>> + ret = vrp->sbufs + vrp->sbuf_size * vrp->last_sbuf++; > >>> /* or recycle a used one */ > >>> else > >>> ret = virtqueue_get_buf(vrp->svq, ); > >>> @@ -578,7 +578,7 @@ static int rpmsg_send_offchannel_raw(struct > >>> rpmsg_device *rpdev, > >>>* messaging), or t
Re: [PATCH] net: compat: fix msg_controllen overflow in scm_detach_fds_compat()
On Tue, Jun 4, 2019 at 8:46 PM Daniel Borkmann wrote: > > On 06/04/2019 02:31 PM, Young Xiao wrote: > > There is a missing check between kmsg->msg_controllen and cmlen, > > which can possibly lead to overflow. > > > > This bug is similar to vulnerability that was fixed in commit 6900317f5eff > > ("net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds"). > > Back then I mentioned in commit 6900317f5eff: > > In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an > issue in my tests as alignment seems always on 4 byte boundary. Same > should be in case of native 32 bit, where we end up with 4 byte boundaries > as well. > > Do you have an actual reproducer or is it based on code inspection? based on inspection. > > > Signed-off-by: Young Xiao <92siuy...@gmail.com> > > --- > > net/compat.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/net/compat.c b/net/compat.c > > index a031bd3..8e74dfb 100644 > > --- a/net/compat.c > > +++ b/net/compat.c > > @@ -301,6 +301,8 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct > > scm_cookie *scm) > > err = put_user(cmlen, >cmsg_len); > > if (!err) { > > cmlen = CMSG_COMPAT_SPACE(i * sizeof(int)); > > + if (kmsg->msg_controllen < cmlen) > > + cmlen = kmsg->msg_controllen; > > kmsg->msg_control += cmlen; > > kmsg->msg_controllen -= cmlen; > > } > > > -- Best regards! Young ---
[PATCH] iio:core: Fix bug in length of event info_mask and catch unhandled bits set in masks.
The incorrect limit for the for_each_set_bit loop was noticed whilst fixing this other case. Note that as we only have 3 possible entries a the moment and the value was set to 4, the bug would not have any effect currently. It will bite fairly soon though, so best fix it now. See commit ef4b4856593f ("iio:core: Fix bug in length of event info_mask and catch unhandled bits set in masks.") for details. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/iio/industrialio-core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c index f5a4581..dd8873a 100644 --- a/drivers/iio/industrialio-core.c +++ b/drivers/iio/industrialio-core.c @@ -1107,6 +1107,8 @@ static int iio_device_add_info_mask_type_avail(struct iio_dev *indio_dev, char *avail_postfix; for_each_set_bit(i, infomask, sizeof(*infomask) * 8) { + if (i >= ARRAY_SIZE(iio_chan_info_postfix)) + return -EINVAL; avail_postfix = kasprintf(GFP_KERNEL, "%s_available", iio_chan_info_postfix[i]); -- 2.7.4
[PATCH] net: compat: fix msg_controllen overflow in scm_detach_fds_compat()
There is a missing check between kmsg->msg_controllen and cmlen, which can possibly lead to overflow. This bug is similar to vulnerability that was fixed in commit 6900317f5eff ("net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds"). Signed-off-by: Young Xiao <92siuy...@gmail.com> --- net/compat.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/compat.c b/net/compat.c index a031bd3..8e74dfb 100644 --- a/net/compat.c +++ b/net/compat.c @@ -301,6 +301,8 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm) err = put_user(cmlen, >cmsg_len); if (!err) { cmlen = CMSG_COMPAT_SPACE(i * sizeof(int)); + if (kmsg->msg_controllen < cmlen) + cmlen = kmsg->msg_controllen; kmsg->msg_control += cmlen; kmsg->msg_controllen -= cmlen; } -- 2.7.4
[PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()
If vpif_probe() fails on v4l2_device_register() and vpif_probe_complete(), then memory allocated at initialize_vpif() for global vpif_obj.dev[i] become unreleased. The patch adds deallocation of vpif_obj.dev[i] on the error path. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/media/platform/davinci/vpif_capture.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/media/platform/davinci/vpif_capture.c b/drivers/media/platform/davinci/vpif_capture.c index b5aacb0..75c2c10 100644 --- a/drivers/media/platform/davinci/vpif_capture.c +++ b/drivers/media/platform/davinci/vpif_capture.c @@ -1385,6 +1385,14 @@ static int initialize_vpif(void) return err; } +static inline void free_vpif_objs(void) +{ + int i; + + for (i = 0; i < VPIF_CAPTURE_MAX_DEVICES; i++) + kfree(vpif_obj.dev[i]); +} + static int vpif_async_bound(struct v4l2_async_notifier *notifier, struct v4l2_subdev *subdev, struct v4l2_async_subdev *asd) @@ -1654,7 +1662,7 @@ static __init int vpif_probe(struct platform_device *pdev) err = v4l2_device_register(vpif_dev, _obj.v4l2_dev); if (err) { v4l2_err(vpif_dev->driver, "Error registering v4l2 device\n"); - goto cleanup; + goto vpif_free; } while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, res_idx))) { @@ -1701,7 +1709,9 @@ static __init int vpif_probe(struct platform_device *pdev) "registered sub device %s\n", subdevdata->name); } - vpif_probe_complete(); + err = vpif_probe_complete(); + if (err) + goto probe_subdev_out; } else { vpif_obj.notifier.ops = _async_ops; err = v4l2_async_notifier_register(_obj.v4l2_dev, @@ -1720,6 +1730,8 @@ static __init int vpif_probe(struct platform_device *pdev) kfree(vpif_obj.sd); vpif_unregister: v4l2_device_unregister(_obj.v4l2_dev); +vpif_free: + free_vpif_objs(); cleanup: v4l2_async_notifier_cleanup(_obj.notifier); -- 2.7.4
Re: [PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()
On Tue, Jun 4, 2019 at 4:15 PM Lad, Prabhakar wrote: > > Hi Young, > > Thanks for the patch. > > On Tue, Jun 4, 2019 at 8:49 AM Young Xiao <92siuy...@gmail.com> wrote: > > > > If vpif_probe() fails on v4l2_device_register() and vpif_probe_complete(), > > then memory allocated at initialize_vpif() for global vpif_obj.dev[i] > > become unreleased. > > > > The patch adds deallocation of vpif_obj.dev[i] on the error path. > > > > Signed-off-by: Young Xiao <92siuy...@gmail.com> > > --- > > drivers/media/platform/davinci/vpif_capture.c | 19 --- > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/media/platform/davinci/vpif_capture.c > > b/drivers/media/platform/davinci/vpif_capture.c > > index b5aacb0..277d500 100644 > > --- a/drivers/media/platform/davinci/vpif_capture.c > > +++ b/drivers/media/platform/davinci/vpif_capture.c > > @@ -1385,6 +1385,14 @@ static int initialize_vpif(void) > > return err; > > } > > > > +static void free_vpif_objs(void) > > +{ > function could be made inline. > > > + int i; > > + > > + for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++) > > VPIF_DISPLAY_MAX_DEVICES ? this should be VPIF_CAPTURE_MAX_DEVICES > > > + kfree(vpif_obj.dev[i]); > > +} > > + > > static int vpif_async_bound(struct v4l2_async_notifier *notifier, > > struct v4l2_subdev *subdev, > > struct v4l2_async_subdev *asd) > > @@ -1654,7 +1662,7 @@ static __init int vpif_probe(struct platform_device > > *pdev) > > err = v4l2_device_register(vpif_dev, _obj.v4l2_dev); > > if (err) { > > v4l2_err(vpif_dev->driver, "Error registering v4l2 > > device\n"); > > - goto cleanup; > > + goto vpif_free; > > } > > > > while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, > > res_idx))) { > > @@ -1701,7 +1709,10 @@ static __init int vpif_probe(struct platform_device > > *pdev) > > "registered sub device %s\n", > >subdevdata->name); > > } > > - vpif_probe_complete(); > > + err = vpif_probe_complete(); > > + if (err) { > > + goto probe_subdev_out; > > + } > > No need for { and } as per kernel coding style. Sorry, I can not get your point here. There is no need to check the return value of vpif_probe_complete(), isn't it? So, we just fix the memory leak in the error path of v4l2_device_register()? > > > } else { > > vpif_obj.notifier.ops = _async_ops; > > err = v4l2_async_notifier_register(_obj.v4l2_dev, > > @@ -1720,6 +1731,8 @@ static __init int vpif_probe(struct platform_device > > *pdev) > > kfree(vpif_obj.sd); > > vpif_unregister: > > v4l2_device_unregister(_obj.v4l2_dev); > > +vpif_free: > > + free_vpif_objs(); > > cleanup: > > v4l2_async_notifier_cleanup(_obj.notifier); > > > > @@ -1748,8 +1761,8 @@ static int vpif_remove(struct platform_device *device) > > ch = vpif_obj.dev[i]; > > /* Unregister video device */ > > video_unregister_device(>video_dev); > > - kfree(vpif_obj.dev[i]); > > } > > + free_vpif_objs(); > > no need for this change, leave it as it is. > > Cheers, > Prabhakar Lad -- Best regards! Young ---
[PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()
If vpif_probe() fails on v4l2_device_register() and vpif_probe_complete(), then memory allocated at initialize_vpif() for global vpif_obj.dev[i] become unreleased. The patch adds deallocation of vpif_obj.dev[i] on the error path. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/media/platform/davinci/vpif_capture.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/media/platform/davinci/vpif_capture.c b/drivers/media/platform/davinci/vpif_capture.c index b5aacb0..277d500 100644 --- a/drivers/media/platform/davinci/vpif_capture.c +++ b/drivers/media/platform/davinci/vpif_capture.c @@ -1385,6 +1385,14 @@ static int initialize_vpif(void) return err; } +static void free_vpif_objs(void) +{ + int i; + + for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++) + kfree(vpif_obj.dev[i]); +} + static int vpif_async_bound(struct v4l2_async_notifier *notifier, struct v4l2_subdev *subdev, struct v4l2_async_subdev *asd) @@ -1654,7 +1662,7 @@ static __init int vpif_probe(struct platform_device *pdev) err = v4l2_device_register(vpif_dev, _obj.v4l2_dev); if (err) { v4l2_err(vpif_dev->driver, "Error registering v4l2 device\n"); - goto cleanup; + goto vpif_free; } while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, res_idx))) { @@ -1701,7 +1709,10 @@ static __init int vpif_probe(struct platform_device *pdev) "registered sub device %s\n", subdevdata->name); } - vpif_probe_complete(); + err = vpif_probe_complete(); + if (err) { + goto probe_subdev_out; + } } else { vpif_obj.notifier.ops = _async_ops; err = v4l2_async_notifier_register(_obj.v4l2_dev, @@ -1720,6 +1731,8 @@ static __init int vpif_probe(struct platform_device *pdev) kfree(vpif_obj.sd); vpif_unregister: v4l2_device_unregister(_obj.v4l2_dev); +vpif_free: + free_vpif_objs(); cleanup: v4l2_async_notifier_cleanup(_obj.notifier); @@ -1748,8 +1761,8 @@ static int vpif_remove(struct platform_device *device) ch = vpif_obj.dev[i]; /* Unregister video device */ video_unregister_device(>video_dev); - kfree(vpif_obj.dev[i]); } + free_vpif_objs(); return 0; } -- 2.7.4
Re: [PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()
Yes, you are correct. I will fix the issue and resubmit the patch again. On Mon, Jun 3, 2019 at 7:55 PM Hans Verkuil wrote: > > On 5/29/19 3:09 PM, Young Xiao wrote: > > If vpif_probe() fails on vpif_probe_complete(), then memory > > allocated at initialize_vpif() for global vpif_obj.dev[i] > > become unreleased. > > > > The patch adds deallocation of vpif_obj.dev[i] on the error path. > > > > Signed-off-by: Young Xiao <92siuy...@gmail.com> > > --- > > drivers/media/platform/davinci/vpif_capture.c | 16 ++-- > > 1 file changed, 14 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/media/platform/davinci/vpif_capture.c > > b/drivers/media/platform/davinci/vpif_capture.c > > index b5aacb0..63e6ec4 100644 > > --- a/drivers/media/platform/davinci/vpif_capture.c > > +++ b/drivers/media/platform/davinci/vpif_capture.c > > @@ -1621,6 +1621,14 @@ vpif_capture_get_pdata(struct platform_device *pdev) > > return NULL; > > } > > > > +static void free_vpif_objs(void) > > +{ > > + int i; > > + > > + for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++) > > + kfree(vpif_obj.dev[i]); > > +} > > + > > /** > > * vpif_probe : This function probes the vpif capture driver > > * @pdev: platform device pointer > > @@ -1701,7 +1709,10 @@ static __init int vpif_probe(struct platform_device > > *pdev) > > "registered sub device %s\n", > > subdevdata->name); > > } > > - vpif_probe_complete(); > > + err = vpif_probe_complete(); > > + if (err) { > > + goto probe_subdev_out; > > + } > > No need for { and } as per kernel coding style. > > > } else { > > vpif_obj.notifier.ops = _async_ops; > > err = v4l2_async_notifier_register(_obj.v4l2_dev, > > @@ -1722,6 +1733,7 @@ static __init int vpif_probe(struct platform_device > > *pdev) > > v4l2_device_unregister(_obj.v4l2_dev); > > cleanup: > > v4l2_async_notifier_cleanup(_obj.notifier); > > + free_vpif_objs(); > > This would break the case where initialize_vpif() returns an error, since > initialize_vpif already frees these objects on error. > > In this case the easiest way of doing this is to just return if > initialize_vpif > returns an error. No need to clean up anything. > > Regards, > > Hans > > > > > return err; > > } > > @@ -1748,8 +1760,8 @@ static int vpif_remove(struct platform_device *device) > > ch = vpif_obj.dev[i]; > > /* Unregister video device */ > > video_unregister_device(>video_dev); > > - kfree(vpif_obj.dev[i]); > > } > > + free_vpif_objs() > > return 0; > > } > > > > >
Re: [PATCH] ipv6: Prevent overrun when parsing v6 header options
Sorry, I don't get your point. Why is xfrm6_transport_output() buggy? The point is that there would be out-of-bound access in mip6_destopt_offset() and mip6_destopt_offset(), since there is no sanity check for offset. There is chance that offset + sizeof(struct ipv6_opt_hdr) > packet_len. As described in CVE-2017-9074: "The IPv6 fragmentation implementation in the Linux kernel through 4.11.1 does not consider that the nexthdr field may be associated with an invalid option, which allows local users to cause a denial of service (out-of-bounds read and BUG)". At the same time, there are bugs in mip6_destopt_offset() and mip6_destopt_offset(), which is similar to CVE-2017-7542. On Sat, Jun 1, 2019 at 1:35 AM Eric Dumazet wrote: > > > > On 5/30/19 8:04 PM, Yang Xiao wrote: > > On Fri, May 31, 2019 at 1:17 AM Eric Dumazet wrote: > >> > >> > >> > >> On 5/30/19 8:28 AM, Young Xiao wrote: > >>> The fragmentation code tries to parse the header options in order > >>> to figure out where to insert the fragment option. Since nexthdr points > >>> to an invalid option, the calculation of the size of the network header > >>> can made to be much larger than the linear section of the skb and data > >>> is read outside of it. > >>> > >>> This vulnerability is similar to CVE-2017-9074. > >>> > >>> Signed-off-by: Young Xiao <92siuy...@gmail.com> > >>> --- > >>> net/ipv6/mip6.c | 24 ++-- > >>> 1 file changed, 14 insertions(+), 10 deletions(-) > >>> > >>> diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c > >>> index 64f0f7b..30ed1c5 100644 > >>> --- a/net/ipv6/mip6.c > >>> +++ b/net/ipv6/mip6.c > >>> @@ -263,8 +263,6 @@ static int mip6_destopt_offset(struct xfrm_state *x, > >>> struct sk_buff *skb, > >>> u8 **nexthdr) > >>> { > >>> u16 offset = sizeof(struct ipv6hdr); > >>> - struct ipv6_opt_hdr *exthdr = > >>> -(struct ipv6_opt_hdr *)(ipv6_hdr(skb) + > >>> 1); > >>> const unsigned char *nh = skb_network_header(skb); > >>> unsigned int packet_len = skb_tail_pointer(skb) - > >>> skb_network_header(skb); > >>> @@ -272,7 +270,8 @@ static int mip6_destopt_offset(struct xfrm_state *x, > >>> struct sk_buff *skb, > >>> > >>> *nexthdr = _hdr(skb)->nexthdr; > >>> > >>> - while (offset + 1 <= packet_len) { > >>> + while (offset <= packet_len) { > >>> + struct ipv6_opt_hdr *exthdr; > >>> > >>> switch (**nexthdr) { > >>> case NEXTHDR_HOP: > >>> @@ -299,12 +298,15 @@ static int mip6_destopt_offset(struct xfrm_state > >>> *x, struct sk_buff *skb, > >>> return offset; > >>> } > >>> > >>> + if (offset + sizeof(struct ipv6_opt_hdr) > packet_len) > >>> + return -EINVAL; > >>> + > >>> + exthdr = (struct ipv6_opt_hdr *)(nh + offset); > >>> offset += ipv6_optlen(exthdr); > >>> *nexthdr = >nexthdr; > >>> - exthdr = (struct ipv6_opt_hdr *)(nh + offset); > >>> } > >>> > >>> - return offset; > >>> + return -EINVAL; > >>> } > >>> > >> > >> > >> Ok, but have you checked that callers have been fixed ? > > > > I've checked the callers. There are two callers: > > xfrm6_transport_output() and xfrm6_ro_output(). There are checks in > > both function. > > > > -- > > hdr_len = x->type->hdr_offset(x, skb, ); > > if (hdr_len < 0) > > return hdr_len; > > -- > >> > >> xfrm6_transport_output() seems buggy as well, > >> unless the skbs are linearized before entering these functions ? > > I can not understand what you mean about this comment. > > Could you explain it in more detail. > > > If we had a problem, then the memmove(ipv6_hdr(skb), iph, hdr_len); > in xfrm6_transport_output() would be buggy, since iph could also point to > freed memory. > > > -- Best regards! Young ---
Re: [PATCH] ipvlan: Don't propagate IFF_ALLMULTI changes on down interfaces.
Based on your explanation, I found that the patch I submitted is useless. Great thanks and sorry for the trouble. On Mon, Jun 3, 2019 at 3:54 PM Xin Long wrote: > > On Mon, Jun 3, 2019 at 11:22 AM Young Xiao <92siuy...@gmail.com> wrote: > > > > Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti > > overflow on the underlying interface. > > > > Attempting the set IFF_ALLMULTI on the underlying interface would cause an > > error and the log message: > > > > "allmulti touches root, set allmulti failed." > s/root/roof > > I guess this patch was inspired by: > > commit bbeb0eadcf9fe74fb2b9b1a6fea82cd538b1e556 > Author: Peter Christensen > Date: Thu May 8 11:15:37 2014 +0200 > > macvlan: Don't propagate IFF_ALLMULTI changes on down interfaces. > > I could trigger this error on macvlan prior to this patch with: > > # ip link add mymacvlan1 link eth2 type macvlan mode bridge > # ip link set mymacvlan1 up > # ip link set mymacvlan1 allmulticast on > # ip link set mymacvlan1 down > # ip link set mymacvlan1 allmulticast off > # ip link set mymacvlan1 allmulticast on > > but not on ipvlan, could you? > > macvlan had this problem, as lowerdev's allmulticast is cleared when doing > dev_stop and set when doing dev_open, which doesn't happen on ipvlan. > > So I'd think this patch fixes nothing unless you want to add > dev_set_allmulti(1/-1) in ipvlan_open/stop(), but that's another topic. > > did I miss something? > > > > > Signed-off-by: Young Xiao <92siuy...@gmail.com> > > --- > > drivers/net/ipvlan/ipvlan_main.c | 6 -- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/ipvlan/ipvlan_main.c > > b/drivers/net/ipvlan/ipvlan_main.c > > index bbeb162..523bb83 100644 > > --- a/drivers/net/ipvlan/ipvlan_main.c > > +++ b/drivers/net/ipvlan/ipvlan_main.c > > @@ -242,8 +242,10 @@ static void ipvlan_change_rx_flags(struct net_device > > *dev, int change) > > struct ipvl_dev *ipvlan = netdev_priv(dev); > > struct net_device *phy_dev = ipvlan->phy_dev; > > > > - if (change & IFF_ALLMULTI) > > - dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : > > -1); > > + if (dev->flags & IFF_UP) { > > + if (change & IFF_ALLMULTI) > > + dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI > > ? 1 : -1); > > + } > > } > > > > static void ipvlan_set_multicast_mac_filter(struct net_device *dev) > > -- > > 2.7.4 > >
[PATCH] ipvlan: Don't propagate IFF_ALLMULTI changes on down interfaces.
Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti overflow on the underlying interface. Attempting the set IFF_ALLMULTI on the underlying interface would cause an error and the log message: "allmulti touches root, set allmulti failed." Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/net/ipvlan/ipvlan_main.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index bbeb162..523bb83 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -242,8 +242,10 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change) struct ipvl_dev *ipvlan = netdev_priv(dev); struct net_device *phy_dev = ipvlan->phy_dev; - if (change & IFF_ALLMULTI) - dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1); + if (dev->flags & IFF_UP) { + if (change & IFF_ALLMULTI) + dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI ? 1 : -1); + } } static void ipvlan_set_multicast_mac_filter(struct net_device *dev) -- 2.7.4
[PATCH] unicore32: check stack pointer in get_wchan
get_wchan() is lockless. Task may wakeup at any time and change its own stack, thus each next stack frame may be overwritten and filled with random stuff. This patch fixes oops in unwind_frame() by adding stack pointer validation on each step (as x86 code do), unwind_frame() already checks frame pointer. See commit 1b15ec7a7427 ("ARM: 7912/1: check stack pointer in get_wchan") for details. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- arch/unicore32/kernel/process.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c index 2bc10b8..1899ebc 100644 --- a/arch/unicore32/kernel/process.c +++ b/arch/unicore32/kernel/process.c @@ -277,6 +277,7 @@ EXPORT_SYMBOL(dump_fpu); unsigned long get_wchan(struct task_struct *p) { struct stackframe frame; + unsigned long stack_page; int count = 0; if (!p || p == current || p->state == TASK_RUNNING) return 0; @@ -285,9 +286,11 @@ unsigned long get_wchan(struct task_struct *p) frame.sp = thread_saved_sp(p); frame.lr = 0; /* recovered from the stack */ frame.pc = thread_saved_pc(p); + stack_page = (unsigned long)task_stack_page(p); do { - int ret = unwind_frame(); - if (ret < 0) + if (frame.sp < stack_page || + frame.sp >= stack_page + THREAD_SIZE || + unwind_frame() < 0) return 0; if (!in_sched_functions(frame.pc)) return frame.pc; -- 2.7.4
Re: [PATCH] ipv6: Prevent overrun when parsing v6 header options
On Fri, May 31, 2019 at 11:57 PM Eric Dumazet wrote: > > > > On 5/31/19 7:54 AM, Herbert Xu wrote: > > On Fri, May 31, 2019 at 07:50:06AM -0700, Eric Dumazet wrote: > >> > >> What do you mean by should ? > >> > >> Are they currently already linearized before the function is called, > >> or is it missing and a bug needs to be fixed ? > > > > AFAICS this is the code-path for locally generated outbound packets. > > Under what circumstances can the IPv6 header be not in the head? > > > > > > I guess this means we had yet another random submission from Young Xiao :/ Excuse me, what do you mean about random submission from Young? A month ago, I submitted the patch, and I was told that the format should be correct. Then, I resubmitted again. > > Thanks. >
[PATCH] pinctrl: pinctrl-mtk-common: fix a possible NULL pointer deference
The function, external interrupt controller, is made as an optional to pinctrl. But if we don't want pio behaves as an external interrupt controller, it would lead to pctl->eint not be created properly and then will cause 'kernel NULL pointer' issue when gpiochip try to call .to_irq or .set_config. To fix it, check pctl->eint before accessing the member. See commit 5f591543a937 ("pinctrl: mt7622: fix a kernel panic when pio don't work as EINT controller") for details. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/pinctrl/mediatek/pinctrl-mtk-common.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c index 0716238..b395f0b 100644 --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c @@ -836,6 +836,9 @@ static int mtk_gpio_to_irq(struct gpio_chip *chip, unsigned offset) const struct mtk_desc_pin *pin; unsigned long eint_n; + if (!pctl->eint) + return -ENOTSUPP; + pin = pctl->devdata->pins + offset; if (pin->eint.eintnum == NO_EINT_SUPPORT) return -EINVAL; @@ -853,7 +856,8 @@ static int mtk_gpio_set_config(struct gpio_chip *chip, unsigned offset, unsigned long eint_n; u32 debounce; - if (pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE) + if (!pctl->eint || + pinconf_to_config_param(config) != PIN_CONFIG_INPUT_DEBOUNCE) return -ENOTSUPP; pin = pctl->devdata->pins + offset; -- 2.7.4
Re: [PATCH] ipv6: Prevent overrun when parsing v6 header options
On Fri, May 31, 2019 at 1:17 AM Eric Dumazet wrote: > > > > On 5/30/19 8:28 AM, Young Xiao wrote: > > The fragmentation code tries to parse the header options in order > > to figure out where to insert the fragment option. Since nexthdr points > > to an invalid option, the calculation of the size of the network header > > can made to be much larger than the linear section of the skb and data > > is read outside of it. > > > > This vulnerability is similar to CVE-2017-9074. > > > > Signed-off-by: Young Xiao <92siuy...@gmail.com> > > --- > > net/ipv6/mip6.c | 24 ++-- > > 1 file changed, 14 insertions(+), 10 deletions(-) > > > > diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c > > index 64f0f7b..30ed1c5 100644 > > --- a/net/ipv6/mip6.c > > +++ b/net/ipv6/mip6.c > > @@ -263,8 +263,6 @@ static int mip6_destopt_offset(struct xfrm_state *x, > > struct sk_buff *skb, > > u8 **nexthdr) > > { > > u16 offset = sizeof(struct ipv6hdr); > > - struct ipv6_opt_hdr *exthdr = > > -(struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 1); > > const unsigned char *nh = skb_network_header(skb); > > unsigned int packet_len = skb_tail_pointer(skb) - > > skb_network_header(skb); > > @@ -272,7 +270,8 @@ static int mip6_destopt_offset(struct xfrm_state *x, > > struct sk_buff *skb, > > > > *nexthdr = _hdr(skb)->nexthdr; > > > > - while (offset + 1 <= packet_len) { > > + while (offset <= packet_len) { > > + struct ipv6_opt_hdr *exthdr; > > > > switch (**nexthdr) { > > case NEXTHDR_HOP: > > @@ -299,12 +298,15 @@ static int mip6_destopt_offset(struct xfrm_state *x, > > struct sk_buff *skb, > > return offset; > > } > > > > + if (offset + sizeof(struct ipv6_opt_hdr) > packet_len) > > + return -EINVAL; > > + > > + exthdr = (struct ipv6_opt_hdr *)(nh + offset); > > offset += ipv6_optlen(exthdr); > > *nexthdr = >nexthdr; > > - exthdr = (struct ipv6_opt_hdr *)(nh + offset); > > } > > > > - return offset; > > + return -EINVAL; > > } > > > > > Ok, but have you checked that callers have been fixed ? I've checked the callers. There are two callers: xfrm6_transport_output() and xfrm6_ro_output(). There are checks in both function. -- hdr_len = x->type->hdr_offset(x, skb, ); if (hdr_len < 0) return hdr_len; -- > > xfrm6_transport_output() seems buggy as well, > unless the skbs are linearized before entering these functions ? I can not understand what you mean about this comment. Could you explain it in more detail. > > Thanks. > > >
[PATCH] ipv6: Prevent overrun when parsing v6 header options
The fragmentation code tries to parse the header options in order to figure out where to insert the fragment option. Since nexthdr points to an invalid option, the calculation of the size of the network header can made to be much larger than the linear section of the skb and data is read outside of it. This vulnerability is similar to CVE-2017-9074. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- net/ipv6/mip6.c | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c index 64f0f7b..30ed1c5 100644 --- a/net/ipv6/mip6.c +++ b/net/ipv6/mip6.c @@ -263,8 +263,6 @@ static int mip6_destopt_offset(struct xfrm_state *x, struct sk_buff *skb, u8 **nexthdr) { u16 offset = sizeof(struct ipv6hdr); - struct ipv6_opt_hdr *exthdr = - (struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 1); const unsigned char *nh = skb_network_header(skb); unsigned int packet_len = skb_tail_pointer(skb) - skb_network_header(skb); @@ -272,7 +270,8 @@ static int mip6_destopt_offset(struct xfrm_state *x, struct sk_buff *skb, *nexthdr = _hdr(skb)->nexthdr; - while (offset + 1 <= packet_len) { + while (offset <= packet_len) { + struct ipv6_opt_hdr *exthdr; switch (**nexthdr) { case NEXTHDR_HOP: @@ -299,12 +298,15 @@ static int mip6_destopt_offset(struct xfrm_state *x, struct sk_buff *skb, return offset; } + if (offset + sizeof(struct ipv6_opt_hdr) > packet_len) + return -EINVAL; + + exthdr = (struct ipv6_opt_hdr *)(nh + offset); offset += ipv6_optlen(exthdr); *nexthdr = >nexthdr; - exthdr = (struct ipv6_opt_hdr *)(nh + offset); } - return offset; + return -EINVAL; } static int mip6_destopt_init_state(struct xfrm_state *x) @@ -399,8 +401,6 @@ static int mip6_rthdr_offset(struct xfrm_state *x, struct sk_buff *skb, u8 **nexthdr) { u16 offset = sizeof(struct ipv6hdr); - struct ipv6_opt_hdr *exthdr = - (struct ipv6_opt_hdr *)(ipv6_hdr(skb) + 1); const unsigned char *nh = skb_network_header(skb); unsigned int packet_len = skb_tail_pointer(skb) - skb_network_header(skb); @@ -408,7 +408,8 @@ static int mip6_rthdr_offset(struct xfrm_state *x, struct sk_buff *skb, *nexthdr = _hdr(skb)->nexthdr; - while (offset + 1 <= packet_len) { + while (offset <= packet_len) { + struct ipv6_opt_hdr *exthdr; switch (**nexthdr) { case NEXTHDR_HOP: @@ -434,12 +435,15 @@ static int mip6_rthdr_offset(struct xfrm_state *x, struct sk_buff *skb, return offset; } + if (offset + sizeof(struct ipv6_opt_hdr) > packet_len) + return -EINVAL; + + exthdr = (struct ipv6_opt_hdr *)(nh + offset); offset += ipv6_optlen(exthdr); *nexthdr = >nexthdr; - exthdr = (struct ipv6_opt_hdr *)(nh + offset); } - return offset; + return -EINVAL; } static int mip6_rthdr_init_state(struct xfrm_state *x) -- 2.7.4
Re: [PATCH] amd64-agp: fix arbitrary kernel memory writes
I am not so sure about taking off the cast, just to be in line with patch in 194b3da873fd. The comment can be deleted. On Wed, May 29, 2019 at 4:35 PM Greg KH wrote: > > On Wed, May 29, 2019 at 12:52:01PM +0800, Young Xiao wrote: > > pg_start is copied from userspace on AGPIOC_BIND and AGPIOC_UNBIND ioctl > > cmds of agp_ioctl() and passed to agpioc_bind_wrap(). As said in the > > comment, (pg_start + mem->page_count) may wrap in case of AGPIOC_BIND, > > and it is not checked at all in case of AGPIOC_UNBIND. As a result, user > > with sufficient privileges (usually "video" group) may generate either > > local DoS or privilege escalation. > > > > See commit 194b3da873fd ("agp: fix arbitrary kernel memory writes") > > for details. > > > > Signed-off-by: Young Xiao <92siuy...@gmail.com> > > --- > > drivers/char/agp/amd64-agp.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c > > index c69e39f..5daa0e3 100644 > > --- a/drivers/char/agp/amd64-agp.c > > +++ b/drivers/char/agp/amd64-agp.c > > @@ -60,7 +60,8 @@ static int amd64_insert_memory(struct agp_memory *mem, > > off_t pg_start, int type) > > > > /* Make sure we can fit the range in the gatt table. */ > > /* FIXME: could wrap */ > > - if (((unsigned long)pg_start + mem->page_count) > num_entries) > > + if (((pg_start + mem->page_count) > num_entries) || > > + ((pg_start + mem->page_count) < pg_start)) > > Why did you take off the cast for the first test? > > And if this really does fix this issue, should you remove the FIXME > line? > > thanks, > > greg k-h -- Best regards! Young ---
Re: [PATCH] ipv4: tcp_input: fix stack out of bounds when parsing TCP options.
Indeed, condition opsize < 2 and opsize > length can deduce that length >= 2. However, before the condition (if opsize < 2), there may be one-byte out-of-bound access in line 12. I'm not sure whether I have put it very clearly. On Wed, May 29, 2019 at 10:20 PM Eric Dumazet wrote: > > On Wed, May 29, 2019 at 1:10 AM Young Xiao <92siuy...@gmail.com> wrote: > > > > The TCP option parsing routines in tcp_parse_options function could > > read one byte out of the buffer of the TCP options. > > > > 1 while (length > 0) { > > 2 int opcode = *ptr++; > > 3 int opsize; > > 4 > > 5 switch (opcode) { > > 6 case TCPOPT_EOL: > > 7 return; > > 8 case TCPOPT_NOP:/* Ref: RFC 793 section 3.1 */ > > 9 length--; > > 10continue; > > 11default: > > 12opsize = *ptr++; //out of bound access > > > > If length = 1, then there is an access in line2. > > And another access is occurred in line 12. > > This would lead to out-of-bound access. > > > > Therefore, in the patch we check that the available data length is > > larger enough to pase both TCP option code and size. > > > > Signed-off-by: Young Xiao <92siuy...@gmail.com> > > --- > > net/ipv4/tcp_input.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index 20f6fac..9775825 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -3791,6 +3791,8 @@ void tcp_parse_options(const struct net *net, > > length--; > > continue; > > default: > > + if (length < 2) > > + return; > > opsize = *ptr++; > > if (opsize < 2) /* "silly options" */ > > return; > > In practice we are good, since we have at least 320 bytes of room there, > and the test done later catches silly options. > > if (opsize < 2) /* "silly options" */ > return; > if (opsize > length) /* remember, opsize >= 2 here */ > return; /* don't parse partial options */ > > I guess adding yet another conditional will make this code obviously > correct for all eyes > and various tools. > > Thanks. > > Signed-off-by: Eric Dumazet -- Best regards! Young ---
[PATCH] media: davinci: vpif_capture: fix memory leak in vpif_probe()
If vpif_probe() fails on vpif_probe_complete(), then memory allocated at initialize_vpif() for global vpif_obj.dev[i] become unreleased. The patch adds deallocation of vpif_obj.dev[i] on the error path. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/media/platform/davinci/vpif_capture.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/media/platform/davinci/vpif_capture.c b/drivers/media/platform/davinci/vpif_capture.c index b5aacb0..63e6ec4 100644 --- a/drivers/media/platform/davinci/vpif_capture.c +++ b/drivers/media/platform/davinci/vpif_capture.c @@ -1621,6 +1621,14 @@ vpif_capture_get_pdata(struct platform_device *pdev) return NULL; } +static void free_vpif_objs(void) +{ + int i; + + for (i = 0; i < VPIF_DISPLAY_MAX_DEVICES; i++) + kfree(vpif_obj.dev[i]); +} + /** * vpif_probe : This function probes the vpif capture driver * @pdev: platform device pointer @@ -1701,7 +1709,10 @@ static __init int vpif_probe(struct platform_device *pdev) "registered sub device %s\n", subdevdata->name); } - vpif_probe_complete(); + err = vpif_probe_complete(); + if (err) { + goto probe_subdev_out; + } } else { vpif_obj.notifier.ops = _async_ops; err = v4l2_async_notifier_register(_obj.v4l2_dev, @@ -1722,6 +1733,7 @@ static __init int vpif_probe(struct platform_device *pdev) v4l2_device_unregister(_obj.v4l2_dev); cleanup: v4l2_async_notifier_cleanup(_obj.notifier); + free_vpif_objs(); return err; } @@ -1748,8 +1760,8 @@ static int vpif_remove(struct platform_device *device) ch = vpif_obj.dev[i]; /* Unregister video device */ video_unregister_device(>video_dev); - kfree(vpif_obj.dev[i]); } + free_vpif_objs() return 0; } -- 2.7.4
[PATCH] isdn: hisax: hfc_2bds0: Fix a possible concurrency use-after-free bug in HFCD_l1hw()
In drivers/isdn/hisax/hfc_2bds0.c, the function hfc2bds0_interrupt() and HFCD_l1hw() may be concurrently executed. HFCD_l1hw() line 969: if (!cs->tx_skb) hfc2bds0_interrupt() line 875: dev_kfree_skb_irq(cs->tx_skb); Thus, a possible concurrency use-after-free bug may occur in HFCD_l1hw(). To fix these bugs, the calls to spin_lock_irqsave() and spin_unlock_irqrestore() are added in HFCD_l1hw(), to protect the access to cs->tx_skb. See commit 7418e6520f22 ("isdn: hisax: hfc_pci: Fix a possible concurrency use-after-free bug in HFCPCI_l1hw()") for details. Signed-off-by: Young Xiao <92siuy...@gmail.com> --- drivers/isdn/hisax/hfc_2bds0.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/isdn/hisax/hfc_2bds0.c b/drivers/isdn/hisax/hfc_2bds0.c index 3715fa0..ade12c0 100644 --- a/drivers/isdn/hisax/hfc_2bds0.c +++ b/drivers/isdn/hisax/hfc_2bds0.c @@ -966,11 +966,13 @@ HFCD_l1hw(struct PStack *st, int pr, void *arg) if (cs->debug & L1_DEB_LAPD) debugl1(cs, "-> PH_REQUEST_PULL"); #endif + spin_lock_irqsave(>lock, flags); if (!cs->tx_skb) { test_and_clear_bit(FLG_L1_PULL_REQ, >l1.Flags); st->l1.l1l2(st, PH_PULL | CONFIRM, NULL); } else test_and_set_bit(FLG_L1_PULL_REQ, >l1.Flags); + spin_unlock_irqrestore(>lock, flags); break; case (HW_RESET | REQUEST): spin_lock_irqsave(>lock, flags); -- 2.7.4