from:"thomas"

Re: [RFC PATCH v2] ptp: Add vDSO-style vmclock support

2024-06-25 Thread Thomas Gleixner

On Tue, Jun 25 2024 at 20:01, David Woodhouse wrote:
> From: David Woodhouse 
>
> The vmclock "device" provides a shared memory region with precision clock
> information. By using shared memory, it is safe across Live Migration.
>
> Like the KVM PTP clock, this can convert TSC-based cross timestamps into
> KVM clock values. Unlike the KVM PTP clock, it does so only when such is
> actually helpful.
>
> The memory region of the device is also exposed to userspace so it can be
> read or memory mapped by application which need reliable notification of
> clock disruptions.

There is effort underway to expose PTP clocks to user space via
VDSO. Can we please not expose an ad hoc interface for that?

As you might have heard the sad news, I'm not feeling up to the task to
dig deeper into this right now. Give me a couple of days to look at this
with working brain.

Thanks,

tglx

Re: [PATCH 1/1] s390/virtio_ccw: fix config change notifications

2024-06-12 Thread Thomas Huth


On 11/06/2024 23.47, Halil Pasic wrote:

Commit e3e9bda38e6d ("s390/virtio_ccw: use DMA handle from DMA API")
broke configuration change notifications for virtio-ccw by putting the
DMA address of *indicatorp directly into ccw->cda disregarding the fact
that if !!(vcdev->is_thinint) then the function
virtio_ccw_register_adapter_ind() will overwrite that ccw->cda value
with the address of the virtio_thinint_area so it can actually set up
the adapter interrupts via CCW_CMD_SET_IND_ADAPTER.  Thus we end up
pointing to the wrong object for both CCW_CMD_SET_IND if setting up the
adapter interrupts fails, and for CCW_CMD_SET_CONF_IND regardless
whether it succeeds or fails.

To fix this, let us save away the dma address of *indicatorp in a local
variable, and copy it to ccw->cda after the "vcdev->is_thinint" branch.

Reported-by: Boqiao Fu 
Reported-by: Sebastian Mitterle 
Fixes: e3e9bda38e6d ("s390/virtio_ccw: use DMA handle from DMA API")
Signed-off-by: Halil Pasic 
---
I know that checkpatch.pl complains about a missing 'Closes' tag.
Unfortunately I don't have an appropriate URL at hand. @Sebastian,
@Boqiao: do you have any suggetions?


Closes: https://issues.redhat.com/browse/RHEL-39983
?

Anyway, I've tested the patch and it indeed fixes the problem with 
virtio-balloon and the link state for me:


Tested-by: Thomas Huth

[PATCH net-next 1/5] net/neighbour: constify ctl_table arguments of utility function

2024-05-27 Thread Thomas Weißschuh

The sysctl core is preparing to only expose instances of
struct ctl_table as "const".
This will also affect the ctl_table argument of sysctl handlers.

As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.

To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".

No functional change.

Signed-off-by: Thomas Weißschuh 
---
 net/core/neighbour.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 45fd88405b6b..277751375b0a 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3578,7 +3578,7 @@ static void neigh_copy_dflt_parms(struct net *net, struct 
neigh_parms *p,
rcu_read_unlock();
 }
 
-static void neigh_proc_update(struct ctl_table *ctl, int write)
+static void neigh_proc_update(const struct ctl_table *ctl, int write)
 {
struct net_device *dev = ctl->extra1;
struct neigh_parms *p = ctl->extra2;

-- 
2.45.1

[PATCH net-next 5/5] ipvs: constify ctl_table arguments of utility functions

2024-05-27 Thread Thomas Weißschuh

The sysctl core is preparing to only expose instances of
struct ctl_table as "const".
This will also affect the ctl_table argument of sysctl handlers.

As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.

To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".

No functional change.

Signed-off-by: Thomas Weißschuh 
---
 net/netfilter/ipvs/ip_vs_ctl.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index b6d0dcf3a5c3..78a1cc72dc38 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1924,7 +1924,8 @@ proc_do_sync_ports(struct ctl_table *table, int write,
return rc;
 }
 
-static int ipvs_proc_est_cpumask_set(struct ctl_table *table, void *buffer)
+static int ipvs_proc_est_cpumask_set(const struct ctl_table *table,
+void *buffer)
 {
struct netns_ipvs *ipvs = table->extra2;
cpumask_var_t *valp = table->data;
@@ -1962,8 +1963,8 @@ static int ipvs_proc_est_cpumask_set(struct ctl_table 
*table, void *buffer)
return ret;
 }
 
-static int ipvs_proc_est_cpumask_get(struct ctl_table *table, void *buffer,
-size_t size)
+static int ipvs_proc_est_cpumask_get(const struct ctl_table *table,
+void *buffer, size_t size)
 {
struct netns_ipvs *ipvs = table->extra2;
cpumask_var_t *valp = table->data;

-- 
2.45.1

[PATCH net-next 0/5] net: constify ctl_table arguments of utility functions

2024-05-27 Thread Thomas Weißschuh

The sysctl core is preparing to only expose instances of
struct ctl_table as "const".
This will also affect the ctl_table argument of sysctl handlers.

As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.

To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".

No functional change.

This patch(set) is meant to be applied through your subsystem tree.
Or at your preference through the sysctl tree.

Motivation
==

Moving structures containing function pointers into unmodifiable .rodata
prevents attackers or bugs from corrupting and diverting those pointers.

Also the "struct ctl_table" exposed by the sysctl core were never meant
to be mutated by users.

For this goal changes to both the sysctl core and "const" qualifiers for
various sysctl APIs are necessary.

Full Process


* Drop ctl_table modifications from the sysctl core ([0], in mainline)
* Constify arguments to ctl_table_root::{set_ownership,permissions}
  ([1], in mainline)
* Migrate users of "ctl_table_header::ctl_table_arg" to "const".
  (in mainline)
* Afterwards convert "ctl_table_header::ctl_table_arg" itself to const.
  (in mainline)
* Prepare helpers used to implement proc_handlers throughout the tree to
  use "const struct ctl_table *". ([2], in progress, this patch)
* Afterwards switch over all proc_handlers callbacks to use
  "const struct ctl_table *" in one commit. ([2], in progress)
  Only custom handlers will be affected, the big commit avoids a
  disruptive and messy transition phase.
* Switch over the internals of the sysctl core to "const struct ctl_table *" 
(to be done)
* Switch include/linux/sysctl.h to "const struct ctl_table *" (to be done)
* Transition instances of "struct ctl_table" through the tree to const (to be 
done)

A work-in-progress view containing all the outlined changes can be found at
https://git.sr.ht/~t-8ch/linux sysctl-constfy

[0] 
https://lore.kernel.org/lkml/20240322-sysctl-empty-dir-v2-0-e559cf8ec...@weissschuh.net/
[1] 
https://lore.kernel.org/lkml/20240315-sysctl-const-ownership-v3-0-b86680eae...@weissschuh.net/
[2] 
https://lore.kernel.org/lkml/20240423-sysctl-const-handler-v3-0-e0beccb83...@weissschuh.net/

---
Thomas Weißschuh (5):
  net/neighbour: constify ctl_table arguments of utility function
  net/ipv4/sysctl: constify ctl_table arguments of utility functions
  net/ipv6/addrconf: constify ctl_table arguments of utility functions
  net/ipv6/ndisc: constify ctl_table arguments of utility function
  ipvs: constify ctl_table arguments of utility functions

 net/core/neighbour.c   | 2 +-
 net/ipv4/sysctl_net_ipv4.c | 6 --
 net/ipv6/addrconf.c| 8 
 net/ipv6/ndisc.c   | 2 +-
 net/netfilter/ipvs/ip_vs_ctl.c | 7 ---
 5 files changed, 14 insertions(+), 11 deletions(-)
---
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
change-id: 20240523-sysctl-const-handler-net-824d4ad5a15a

Best regards,
-- 
Thomas Weißschuh

[PATCH net-next 4/5] net/ipv6/ndisc: constify ctl_table arguments of utility function

2024-05-27 Thread Thomas Weißschuh

The sysctl core is preparing to only expose instances of
struct ctl_table as "const".
This will also affect the ctl_table argument of sysctl handlers.

As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.

To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".

No functional change.

Signed-off-by: Thomas Weißschuh 
---
 net/ipv6/ndisc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index d914b23256ce..254b192c5705 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1936,7 +1936,7 @@ static struct notifier_block ndisc_netdev_notifier = {
 };
 
 #ifdef CONFIG_SYSCTL
-static void ndisc_warn_deprecated_sysctl(struct ctl_table *ctl,
+static void ndisc_warn_deprecated_sysctl(const struct ctl_table *ctl,
 const char *func, const char *dev_name)
 {
static char warncomm[TASK_COMM_LEN];

-- 
2.45.1

[PATCH net-next 3/5] net/ipv6/addrconf: constify ctl_table arguments of utility functions

2024-05-27 Thread Thomas Weißschuh

The sysctl core is preparing to only expose instances of
struct ctl_table as "const".
This will also affect the ctl_table argument of sysctl handlers.

As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.

To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".

No functional change.

Signed-off-by: Thomas Weißschuh 
---
 net/ipv6/addrconf.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 5c424a0e7232..1e69756d53d9 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -863,7 +863,7 @@ static void addrconf_forward_change(struct net *net, __s32 
newf)
}
 }
 
-static int addrconf_fixup_forwarding(struct ctl_table *table, int *p, int newf)
+static int addrconf_fixup_forwarding(const struct ctl_table *table, int *p, 
int newf)
 {
struct net *net;
int old;
@@ -931,7 +931,7 @@ static void addrconf_linkdown_change(struct net *net, __s32 
newf)
}
 }
 
-static int addrconf_fixup_linkdown(struct ctl_table *table, int *p, int newf)
+static int addrconf_fixup_linkdown(const struct ctl_table *table, int *p, int 
newf)
 {
struct net *net;
int old;
@@ -6378,7 +6378,7 @@ static void addrconf_disable_change(struct net *net, 
__s32 newf)
}
 }
 
-static int addrconf_disable_ipv6(struct ctl_table *table, int *p, int newf)
+static int addrconf_disable_ipv6(const struct ctl_table *table, int *p, int 
newf)
 {
struct net *net = (struct net *)table->extra2;
int old;
@@ -6669,7 +6669,7 @@ void addrconf_disable_policy_idev(struct inet6_dev *idev, 
int val)
 }
 
 static
-int addrconf_disable_policy(struct ctl_table *ctl, int *valp, int val)
+int addrconf_disable_policy(const struct ctl_table *ctl, int *valp, int val)
 {
struct net *net = (struct net *)ctl->extra2;
struct inet6_dev *idev;

-- 
2.45.1

[PATCH net-next 2/5] net/ipv4/sysctl: constify ctl_table arguments of utility functions

2024-05-27 Thread Thomas Weißschuh

The sysctl core is preparing to only expose instances of
struct ctl_table as "const".
This will also affect the ctl_table argument of sysctl handlers.

As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.

To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".

No functional change.

Signed-off-by: Thomas Weißschuh 
---
 net/ipv4/sysctl_net_ipv4.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 162a0a3b6ba5..d7892f34a15b 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -130,7 +130,8 @@ static int ipv4_privileged_ports(struct ctl_table *table, 
int write,
return ret;
 }
 
-static void inet_get_ping_group_range_table(struct ctl_table *table, kgid_t 
*low, kgid_t *high)
+static void inet_get_ping_group_range_table(const struct ctl_table *table,
+   kgid_t *low, kgid_t *high)
 {
kgid_t *data = table->data;
struct net *net =
@@ -145,7 +146,8 @@ static void inet_get_ping_group_range_table(struct 
ctl_table *table, kgid_t *low
 }
 
 /* Update system visible IP port range */
-static void set_ping_group_range(struct ctl_table *table, kgid_t low, kgid_t 
high)
+static void set_ping_group_range(const struct ctl_table *table,
+kgid_t low, kgid_t high)
 {
kgid_t *data = table->data;
struct net *net =

-- 
2.45.1

Re: [PATCH v2 1/1] x86/vector: Fix vector leak during CPU offline

2024-05-23 Thread Thomas Gleixner

On Wed, May 22 2024 at 15:02, Dongli Zhang wrote:
> The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
> interrupt affinity reconfiguration via procfs. Instead, the change is
> deferred until the next instance of the interrupt being triggered on the
> original CPU.
>
> When the interrupt next triggers on the original CPU, the new affinity is
> enforced within __irq_move_irq(). A vector is allocated from the new CPU,
> but if the old vector on the original CPU remains online, it is not
> immediately reclaimed. Instead, apicd->move_in_progress is flagged, and the
> reclaiming process is delayed until the next trigger of the interrupt on
> the new CPU.
>
> Upon the subsequent triggering of the interrupt on the new CPU,
> irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
> remains online. Subsequently, the timer on the old CPU iterates over its
> vector_cleanup list, reclaiming old vectors.
>
> However, a rare scenario arises if the old CPU is outgoing before the
> interrupt triggers again on the new CPU. The irq_force_complete_move() may
> not have the chance to be invoked on the outgoing CPU to reclaim the old
> apicd->prev_vector. This is because the interrupt isn't currently affine to
> the outgoing CPU, and irq_needs_fixup() returns false. Even though
> __vector_schedule_cleanup() is later called on the new CPU, it doesn't
> reclaim apicd->prev_vector; instead, it simply resets both
> apicd->move_in_progress and apicd->prev_vector to 0.
>
> As a result, the vector remains unreclaimed in vector_matrix, leading to a
> CPU vector leak.
>
> To address this issue, move the invocation of irq_force_complete_move()
> before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
> interrupt is currently or used to affine to the outgoing CPU. Additionally,
> reclaim the vector in __vector_schedule_cleanup() as well, following a
> warning message, although theoretically it should never see
> apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.

Nice change log!

Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline

2024-05-21 Thread Thomas Gleixner

On Wed, May 15 2024 at 12:51, Dongli Zhang wrote:
> On 5/13/24 3:46 PM, Thomas Gleixner wrote:
>> So yes, moving the invocation of irq_force_complete_move() before the
>> irq_needs_fixup() call makes sense, but it wants this to actually work
>> correctly:
>> @@ -1097,10 +1098,11 @@ void irq_force_complete_move(struct irq_
>>  goto unlock;
>>  
>>  /*
>> - * If prev_vector is empty, no action required.
>> + * If prev_vector is empty or the descriptor was previously
>> + * not on the outgoing CPU no action required.
>>   */
>>  vector = apicd->prev_vector;
>> -if (!vector)
>> +if (!vector || apicd->prev_cpu != smp_processor_id())
>>  goto unlock;
>>  
>
> The above may not work. migrate_one_irq() relies on irq_force_complete_move() 
> to
> always reclaim the apicd->prev_vector. Otherwise, the call of
> irq_do_set_affinity() later may return -EBUSY.

You're right. But that still can be handled in irq_force_complete_move()
with a single unconditional invocation in migrate_one_irq():

cpu = smp_processor_id();
if (!vector || (apicd->cur_cpu != cpu && apicd->prev_cpu != cpu))
goto unlock;

because there are only two cases when a cleanup is required:

   1) The outgoing CPU is the current target

   2) The outgoing CPU was the previous target

No?

Thanks,

tglx

Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline

2024-05-13 Thread Thomas Gleixner

On Mon, May 13 2024 at 10:43, Dongli Zhang wrote:
> On 5/13/24 5:44 AM, Thomas Gleixner wrote:
>> On Fri, May 10 2024 at 12:06, Dongli Zhang wrote:
>> Any interrupt which is affine to an outgoing CPU is migrated and
>> eventually pending moves are enforced:
>> 
>> cpu_down()
>>   ...
>>   cpu_disable_common()
>> fixup_irqs()
>>   irq_migrate_all_off_this_cpu()
>> migrate_one_irq()
>>   irq_force_complete_move()
>> free_moved_vector();
>> 
>> No?
>
> I noticed this and finally abandoned the solution to fix at migrate_one_irq(),
> because:
>
> 1. The objective of migrate_one_irq()-->irq_force_complete_move() looks to
> cleanup before irq_do_set_affinity().
>
> 2. The irq_needs_fixup() may return false so that irq_force_complete_move() 
> does
> not get the chance to trigger.
>
> 3. Even irq_force_complete_move() is triggered, it exits early if
> apicd->prev_vector==0.

But that's not the case, really.

> The apicd->prev_vector can be cleared by __vector_schedule_cleanup() because
> cpu_disable_common() releases the vector_lock after CPU is flagged offline.

Nothing can schedule vector cleanup at that point because _all_ other
CPUs spin in stop_machine() with interrupts disabled and therefore
cannot handle interrupts which might invoke it.

So it does not matter whether the vector lock is dropped or not in
cpu_disable_common().

> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -1035,8 +1035,6 @@ static void __vector_schedule_cleanup(struct
> apic_chip_data *apicd)
> cl->timer.expires = jiffies + 1;
> add_timer_on(>timer, cpu);
> }
> -   } else {
> -   apicd->prev_vector = 0; // or print a warning

This really wants to be a warning.

>> In fact irq_complete_move() should never see apicd->move_in_progress
>> with apicd->prev_cpu pointing to an offline CPU.
>
> I think it is possible. The fact that a CPU is offline doesn't indicate
> fixup_irqs() has already been triggered. The vector_lock is released after CPU
> is flagged offline.

No.

stop_machine()
_ALL_ CPUs rendevouz and spin with interrupts disabled

Outgoing CPU invokes cpu_disable_common()

So it does not matter at all whether vector lock is dropped before
fixup_irqs() is invoked. The new target CPU _cannot_ handle the
interrupt at that point and invoke irq_complete_move().

>> If you can trigger that case, then there is something fundamentally
>> wrong with the CPU hotplug interrupt migration code and that needs to be
>> investigated and fixed.
>> 
>
> I can easily reproduce the issue.

Good.

> I will fix in the interrupt migration code.

You need a proper explanation for the problem first otherwise you can't
fix it.

I understand the failure mode by now. What happens is:

1) Interrupt is affine to CPU11

2) Affinity is set to CPU10

3) Interrupt is raised and handled on CPU11

   irq_move_masked_irq()
 irq_do_set_affinity()
   apicd->prev_cpu = 11;
   apicd->move_in_progress = true;

4) CPU11 goes offline

   irq_needs_fixup() returns false because effective affinity points
   already to CPU 10, so irq_force_complete_move() is not invoked.

5) Interrupt is raised and handled on CPU10

   irq_complete_move()
__vector_schedule_cleanup()
   if (cpu_online(apicd->prev_cpu))<- observes offline

See? So this has nothing to do with vector lock being dropped.

> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 1ed2b1739363..5ecd072a34fe 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -69,6 +69,14 @@ static bool migrate_one_irq(struct irq_desc *desc)
> return false;
> }
>
> +   /*
> +* Complete an eventually pending irq move cleanup. If this
> +* interrupt was moved in hard irq context, then the vectors need
> +* to be cleaned up. It can't wait until this interrupt actually
> +* happens and this CPU was involved.
> +*/
> +   irq_force_complete_move(desc);

You cannot do that here because it is only valid when the interrupt is
affine to the outgoing CPU.

In the problem case the interrupt was affine to the outgoing CPU, but
the core code does not know that it has not been cleaned up yet. It does
not even know that the interrupt was affine to the outgoing CPU before.

So in principle we could just go and do:

} else {
-   apicd->prev_vector = 0;
+   free_moved_vector(apicd);
}
raw_spin_unlock(_lock);

but that would not give enough information and would depend on the
interrupt to actually arrive

Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline

2024-05-13 Thread Thomas Gleixner

On Fri, May 10 2024 at 12:06, Dongli Zhang wrote:
> The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
> interrupt affinity reconfiguration via procfs. Instead, the change is
> deferred until the next instance of the interrupt being triggered on the
> original CPU.
>
> When the interrupt next triggers on the original CPU, the new affinity is
> enforced within __irq_move_irq(). A vector is allocated from the new CPU,
> but if the old vector on the original CPU remains online, it is not
> immediately reclaimed. Instead, apicd->move_in_progress is flagged, and the
> reclaiming process is delayed until the next trigger of the interrupt on
> the new CPU.
>
> Upon the subsequent triggering of the interrupt on the new CPU,
> irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
> remains online. Subsequently, the timer on the old CPU iterates over its
> vector_cleanup list, reclaiming vectors.
>
> However, if the old CPU is offline before the interrupt triggers again on
> the new CPU, irq_complete_move() simply resets both apicd->move_in_progress
> and apicd->prev_vector to 0. Consequently, the vector remains unreclaimed
> in vector_matrix, resulting in a CPU vector leak.

I doubt that.

Any interrupt which is affine to an outgoing CPU is migrated and
eventually pending moves are enforced:

cpu_down()
  ...
  cpu_disable_common()
fixup_irqs()
  irq_migrate_all_off_this_cpu()
migrate_one_irq()
  irq_force_complete_move()
free_moved_vector();

No?

In fact irq_complete_move() should never see apicd->move_in_progress
with apicd->prev_cpu pointing to an offline CPU.

The CPU offline case in __vector_schedule_cleanup() should not even
exist or at least just emit a warning.

If you can trigger that case, then there is something fundamentally
wrong with the CPU hotplug interrupt migration code and that needs to be
investigated and fixed.

Thanks,

tglx

Re: [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure

2024-04-22 Thread Thomas Gleixner

On Mon, Apr 22 2024 at 16:09, Dongli Zhang wrote:
> On 4/22/24 13:58, Thomas Gleixner wrote:
>> On Thu, Apr 18 2024 at 18:33, Dongli Zhang wrote:
> Would you mind suggesting if the below commit message is fine to you?
>
>
> genirq/cpuhotplug: retry with cpu_online_mask when irq_do_set_affinity return
> -ENOSPC
>
> When a CPU goes offline, the interrupts pinned to that CPU are
> re-configured.
>
> Its managed interrupts undergo either migration to other CPUs or shutdown
> if all CPUs listed in the affinity are offline. This patch doesn't affect
> managed interrupts.
>
> For regular interrupts, they are migrated to other selected online CPUs.
> The target CPUs are chosen from either desc->pending_mask (suppose
> CONFIG_GENERIC_PENDING_IRQ) or d->common->affinity (suppose CONFIG_SMP).
> The cpu_online_mask is used as target CPUs only when CPUs in both
> desc->pending_mask and d->common->affinity are offline.
>
> However, there is a bad corner case, when desc->pending_mask or
> d->common->affinity is selected as the target cpumask, but none of their
> CPUs has any available vectors.

Up to here it's fine.

> As a result, an -ENOSPC error happens:
>
>   "IRQ151: set affinity failed(-28)."
>
> This is from the debugfs. The allocation fails although other online CPUs
> (except CPU=2) have many free vectors.

The debugfs output is not really providing more information than the
last sentence. It just occupies space :)

> The steps to reproduce the issue are in [1]. The core idea is:
>
> 1. Create a KVM guest with many virtio-net PCI devices, each configured
> with a very large number of queues/vectors.
>
> 2. Set the affinity of all virtio-net interrupts to "2,3".

That makes absolutely no sense at all. :)

But yes, I can see the non-real world problem with that.

> For regular interrupts, if irq_do_set_affinity() returns -ENOSPC, retry it
> with all online CPUs. The issue does not happen for managed interrupts
> because the vectors are always reserved (in cm->managed_map) before the CPU
> offline operation.
>
> [1]
> https://lore.kernel.org/all/20240419013322.58500-1-dongli.zh...@oracle.com/

The reproduction instructions are just additional information and not
necessarily change log material.

So I'd just say after the above:
> However, there is a bad corner case, when desc->pending_mask or
> d->common->affinity is selected as the target cpumask, but none of their
> CPUs has any available vectors.

   In this case the migration fails and the device interrupt becomes
   stale. This is not any different from the case where the affinity
   mask does not contain any online CPU, but there is no fallback
   operation for this.

   Instead of giving up retry the migration attempt with the online CPU
   mask if the interrupt is not managed as managed interrupts cannot be
   affected by this problem.

Hmm?

> I will change it to a single line.
>
> Would you mind suggesting which is preferred? !cpumask_equal(affinity,
> cpu_online_mask) or (affinity != cpu_online_mask)?

If at all you want !cpumask_subset(cpu_online_mask, affinity), but as
this is a corner case 'affinity != cpu_online_mask' should be good
enough.

Thanks,

tglx

Re: [PATCH 1/1] genirq/cpuhotplug: retry with online CPUs on irq_do_set_affinity failure

2024-04-22 Thread Thomas Gleixner

On Thu, Apr 18 2024 at 18:33, Dongli Zhang wrote:

> When a CPU is offline, its IRQs may migrate to other CPUs. For managed
> IRQs, they are migrated, or shutdown (if all CPUs of the managed IRQ
> affinity are offline). For regular IRQs, there will only be a
> migration.

Please write out interrupts. There is enough space for it and IRQ is
just not a regular word.

> The migrate_one_irq() first uses pending_mask or affinity_mask of the IRQ.
>
> 104 if (irq_fixup_move_pending(desc, true))
> 105 affinity = irq_desc_get_pending_mask(desc);
> 106 else
> 107 affinity = irq_data_get_affinity_mask(d);
>
> The migrate_one_irq() may use all online CPUs, if all CPUs in
> pending_mask/affinity_mask are already offline.
>
> 113 if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
> 114 /*
> 115  * If the interrupt is managed, then shut it down and 
> leave
> 116  * the affinity untouched.
> 117  */
> 118 if (irqd_affinity_is_managed(d)) {
> 119 irqd_set_managed_shutdown(d);
> 120 irq_shutdown_and_deactivate(desc);
> 121 return false;
> 122 }
> 123 affinity = cpu_online_mask;
> 124 brokeaff = true;
> 125 }

Please don't copy code into the change log. Describe the problem in
text.

> However, there is a corner case. Although some CPUs in
> pending_mask/affinity_mask are still online, they are lack of available
> vectors. If the kernel continues calling irq_do_set_affinity() with those 
> CPUs,
> there will be -ENOSPC error.
>
> This is not reasonable as other online CPUs still have many available
> vectors.

Reasonable is not the question here. It's either correct or not.

> name:   VECTOR
>  size:   0
>  mapped: 529
>  flags:  0x0103
> Online bitmaps:7
> Global available:884
> Global reserved:   6
> Total allocated: 539
> System: 36: 0-19,21,50,128,236,243-244,246-255
>  | CPU | avl | man | mac | act | vectors
>  0   147 0 0   55  32-49,51-87
>  1   147 0 0   55  32-49,51-87
>  2 0 0 0  202  32-49,51-127,129-235

Just ouf of curiousity. How did this end up with CPU2 completely
occupied?

>  4   147 0 0   55  32-49,51-87
>  5   147 0 0   55  32-49,51-87
>  6   148 0 0   54  32-49,51-86
>  7   148 0 0   54  32-49,51-86
>
> This issue should not happen for managed IRQs because the vectors are already
> reserved before CPU hotplug.

Should not? It either does or it does not.

> For regular IRQs, do a re-try with all online
> CPUs if the prior irq_do_set_affinity() is failed with -ENOSPC.
>
> Cc: Joe Jin 
> Signed-off-by: Dongli Zhang 
> ---
>  kernel/irq/cpuhotplug.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 1ed2b1739363..d1666a6b73f4 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -130,6 +130,19 @@ static bool migrate_one_irq(struct irq_desc *desc)
>* CPU.
>*/
>   err = irq_do_set_affinity(d, affinity, false);
> +
> + if (err == -ENOSPC &&
> + !irqd_affinity_is_managed(d) &&
> + affinity != cpu_online_mask) {

This really wants to be a single line conditional.

> + affinity = cpu_online_mask;
> + brokeaff = true;
> +
> + pr_debug("IRQ%u: set affinity failed for %*pbl, re-try with all 
> online CPUs\n",
> +  d->irq, cpumask_pr_args(affinity));

How is it useful to print cpu_online_mask here?

Thanks,

tglx

[PATCH] selftests/nolibc: libc-test: avoid -Wstringop-overflow warnings

2023-09-10 Thread Thomas Weißschuh

Newer versions of glibc annotate the poll() function with
__attribute__(access) which triggers a compiler warning inside the
testcase poll_fault.
Avoid this by using a plain NULL which is enough for the testcase.
To avoid potential future warnings also adapt the other EFAULT
testcases, except select_fault as NULL is a valid value for its
argument.

nolibc-test.c: In function ‘run_syscall’:
nolibc-test.c:338:62: warning: ‘poll’ writing 8 bytes into a region of size 0 
overflows the destination [-Wstringop-overflow=]
  338 | do { if (!(cond)) result(llen, SKIPPED); else ret += 
expect_syserr2(expr, expret, experr1, experr2, llen); } while (0)
  |  
^~~~
nolibc-test.c:341:9: note: in expansion of macro ‘EXPECT_SYSER2’
  341 | EXPECT_SYSER2(cond, expr, expret, experr, 0)
  | ^
nolibc-test.c:905:47: note: in expansion of macro ‘EXPECT_SYSER’
  905 | CASE_TEST(poll_fault);EXPECT_SYSER(1, 
poll((void *)1, 1, 0), -1, EFAULT); break;
  |   ^~~~
cc1: note: destination object is likely at address zero
In file included from /usr/include/poll.h:1,
 from nolibc-test.c:33:
/usr/include/sys/poll.h:54:12: note: in a call to function ‘poll’ declared with 
attribute ‘access (write_only, 1, 2)’
   54 | extern int poll (struct pollfd *__fds, nfds_t __nfds, int __timeout)
  |^~~~

Signed-off-by: Thomas Weißschuh 
---
 tools/testing/selftests/nolibc/nolibc-test.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/nolibc/nolibc-test.c 
b/tools/testing/selftests/nolibc/nolibc-test.c
index e2b70641a1e7..a0478f8eaee8 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -895,14 +895,14 @@ int run_syscall(int min, int max)
CASE_TEST(lseek_0);   EXPECT_SYSER(1, lseek(0, 0, 
SEEK_SET), -1, ESPIPE); break;
CASE_TEST(mkdir_root);EXPECT_SYSER(1, mkdir("/", 0755), 
-1, EEXIST); break;
CASE_TEST(mmap_bad);  EXPECT_PTRER(1, mmap(NULL, 0, 
PROT_READ, MAP_PRIVATE, 0, 0), MAP_FAILED, EINVAL); break;
-   CASE_TEST(munmap_bad);EXPECT_SYSER(1, munmap((void *)1, 
0), -1, EINVAL); break;
+   CASE_TEST(munmap_bad);EXPECT_SYSER(1, munmap(NULL, 0), 
-1, EINVAL); break;
CASE_TEST(mmap_munmap_good);  EXPECT_SYSZR(1, 
test_mmap_munmap()); break;
CASE_TEST(open_tty);  EXPECT_SYSNE(1, tmp = 
open("/dev/null", 0), -1); if (tmp != -1) close(tmp); break;
CASE_TEST(open_blah); EXPECT_SYSER(1, tmp = 
open("/proc/self/blah", 0), -1, ENOENT); if (tmp != -1) close(tmp); break;
CASE_TEST(pipe);  EXPECT_SYSZR(1, test_pipe()); 
break;
CASE_TEST(poll_null); EXPECT_SYSZR(1, poll(NULL, 0, 
0)); break;
CASE_TEST(poll_stdout);   EXPECT_SYSNE(1, ({ struct pollfd 
fds = { 1, POLLOUT, 0}; poll(, 1, 0); }), -1); break;
-   CASE_TEST(poll_fault);EXPECT_SYSER(1, poll((void *)1, 
1, 0), -1, EFAULT); break;
+   CASE_TEST(poll_fault);EXPECT_SYSER(1, poll(NULL, 1, 0), 
-1, EFAULT); break;
CASE_TEST(prctl); EXPECT_SYSER(1, 
prctl(PR_SET_NAME, (unsigned long)NULL, 0, 0, 0), -1, EFAULT); break;
CASE_TEST(read_badf); EXPECT_SYSER(1, read(-1, , 
1), -1, EBADF); break;
CASE_TEST(rmdir_blah);EXPECT_SYSER(1, rmdir("/blah"), 
-1, ENOENT); break;
@@ -911,7 +911,7 @@ int run_syscall(int min, int max)
CASE_TEST(select_stdout); EXPECT_SYSNE(1, ({ fd_set fds; 
FD_ZERO(); FD_SET(1, ); select(2, NULL, , NULL, NULL); }), -1); 
break;
CASE_TEST(select_fault);  EXPECT_SYSER(1, select(1, (void 
*)1, NULL, NULL, 0), -1, EFAULT); break;
CASE_TEST(stat_blah); EXPECT_SYSER(1, 
stat("/proc/self/blah", _buf), -1, ENOENT); break;
-   CASE_TEST(stat_fault);EXPECT_SYSER(1, stat((void *)1, 
_buf), -1, EFAULT); break;
+   CASE_TEST(stat_fault);EXPECT_SYSER(1, stat(NULL, 
_buf), -1, EFAULT); break;
CASE_TEST(stat_timestamps);   EXPECT_SYSZR(1, 
test_stat_timestamps()); break;
CASE_TEST(symlink_root);  EXPECT_SYSER(1, symlink("/", 
"/"), -1, EEXIST); break;
CASE_TEST(unlink_root);   EXPECT_SYSER(1, unlink("/"), -1, 
EISDIR); break;

---
base-commit: f7a6e4791e3d685eddca29b5d16d183ee0407caa
change-id: 20230910-nolibc-poll-fault-4152a6836ef8

Best regards,
-- 
Thomas Weißschuh

Re: [PATCH V7 03/18] x86/pks: Add additional PKEY helper macros

2021-12-08 Thread Thomas Gleixner

Ira,

On Tue, Dec 07 2021 at 16:51, Ira Weiny wrote:
> On Thu, Nov 25, 2021 at 03:25:09PM +0100, Thomas Gleixner wrote:
>
>  u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits)
>  {
> -  int shift = pkey * PKR_BITS_PER_PKEY;
> +  int shift = PKR_PKEY_SHIFT(pkey);
>
>if (WARN_ON_ONCE(accessbits & ~PKEY_ACCESS_MASK))
>accessbits &= PKEY_ACCESS_MASK;
>
> Better?

Let me postpone this question.

> As to the reason of why to put this patch after the other one.  Why would I
> improve the old pre-refactoring code only to throw it away when moving it to
> pkey_update_pkval()?  This reasoning is even stronger when pkey_update_pkval()
> is implemented.

Which refactoring? We seem to have fundamentally definitions of that
term. Let me illustrate why.

The original version of this was:

  u32 get_new_pkr(u32 old_pkr, int pkey, unsigned long init_val)
  {
int pkey_shift = (pkey * PKR_BITS_PER_PKEY);
u32 new_pkr_bits = 0;

/* Set the bits we need in the register:  */
if (init_val & PKEY_DISABLE_ACCESS)
new_pkr_bits |= PKR_AD_BIT;
if (init_val & PKEY_DISABLE_WRITE)
new_pkr_bits |= PKR_WD_BIT;

/* Shift the bits in to the correct place: */
new_pkr_bits <<= pkey_shift;

/* Mask off any old bits in place: */
old_pkr &= ~((PKR_AD_BIT | PKR_WD_BIT) << pkey_shift);

/* Return the old part along with the new part: */
return old_pkr | new_pkr_bits;
  }

IOW, mechanical Cut & Paste.

Then PeterZ came along and suggested to improve it this way:

  u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
  {
  int pkey_shift = pkey * PKR_BITS_PER_PKEY;

  /*  Mask out old bit values */
  pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);

  /*  Or in new values */
  if (flags & PKEY_DISABLE_ACCESS)
  pk_reg |= PKR_AD_BIT << pkey_shift;
  if (flags & PKEY_DISABLE_WRITE)
  pk_reg |= PKR_WD_BIT << pkey_shift;

  return pk_reg;
  }

which is already better. So you changed your approach from Cut & Paste
to Copy & Paste.

But neither Cut & Paste nor Copy & Paste match what refactoring is
really about. Just throwing the term refactoring at it does not make it
so.

Refactoring is about improving the code in design and implementation.
The keyword is: improving.

There are obviously cases where you can take the code as is and split it
out into a new helper function.

You really have to look at it and answer the question whether it's good
code or not, whether it could be written in better ways and with
improved functionality.

I could have given you this minimalistic one:

  u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits)
  {
  int shift = pkey * PKR_BITS_PER_PKEY;

  pkval &= ~(PKEY_ACCESS_MASK << shift);
  return pkval | (accessbit & PKEY_ACCESS_MASK) << shift;
  }

But I gave you this:

  u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits)
  {
  int shift = pkey * PKR_BITS_PER_PKEY;

  if (WARN_ON_ONCE(accessbits & ~PKEY_ACCESS_MASK))
  accessbits &= PKEY_ACCESS_MASK;

  pkval &= ~(PKEY_ACCESS_MASK << shift);
  return pkval | accessbit << shift;
  }

This is what refactoring is about. See?

> I agree with Dan regarding the macros though.  I think they make it easier to
> see what is going on without dealing with masks and shifts directly.  But I 
> can
> remove this patch if you feel that strongly about it.

I'm not against macros per se, but not everything is automatically
better when it is hidden behind a macro.

What I'm arguing against is the claim that macros are an improvement by
definition. Especially when they are just blindly thrown into code which
should not exist in the first place.

Also versus ordering. What's wrong with doing it this way:

  1) Define the macros first without changing the code

  2) Implement pkey_update_pkval() in a sensible way and use the macros
 where appropriate. Thereby replacing the existing version in the
 other function.

Which would end up in the obviously even simpler code:

  u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits)
  {
  if (WARN_ON_ONCE(accessbits & ~PKEY_ACCESS_MASK))
  accessbits &= PKEY_ACCESS_MASK;

  pkval &= ~PKR_PKEY_VALUE(pkey, PKEY_ACCESS_MASK);
  return pkval | PKR_PKEY_VALUE(pkey, accessbits);
  }

That fits the goal of that macro exercise to make it easy to read and
obvious what's going on, no?

Instead of:

>  u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits)
>  {
> -  int shift = pkey * PKR_BITS_PER_PKEY;
> +  int shift = PKR_PKEY_SHIF

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-12-07 Thread Thomas Gleixner

Ira,

On Mon, Dec 06 2021 at 17:54, Ira Weiny wrote:
> On Thu, Nov 25, 2021 at 03:12:47PM +0100, Thomas Gleixner wrote:
>> > +.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
>> > +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
>> > +  /* add space for extended_pt_regs */
>> > +  subq$EXTENDED_PT_REGS_SIZE, %rsp
>> > +#endif
>> > +  .if \annotate_retpoline_safe == 1
>> > +  ANNOTATE_RETPOLINE_SAFE
>> > +  .endif
>> 
>> This annotation is new and nowhere mentioned why it is part of this
>> patch.
>
> I don't understand.  ANNOTATE_RETPOLINE_SAFE has been around since:
>
> 9e0e3c5130e9 x86/speculation, objtool: Annotate indirect calls/jumps
> for objtool

Sorry, I misread that macro maze. It's conditional obviously.

> I can split it if you prefer.  How about a patch with just the x86 extended
> pt_regs stuff but that would leave a zero size for the extended stuff?  Then
> followed by the pks bits?

Whatever makes sense and does one thing per patch.

>> I really have to ask the question whether this #ifdeffery has any value
>> at all. 8 bytes extra stack usage is not going to be the end of the
>> world and distro kernels will enable that config anyway.
>
> My goal with this has always been 0 overhead if turned off.  So this seemed
> like a logical addition.  Furthermore, ARCH_ENABLE_SUPERVISOR_PKEYS is
> predicated on ARCH_HAS_SUPERVISOR_PKEYS which is only available with x86_64.
> This removes the space for x86 when not needed.

The question is not about 64 vs. 32bit. The question is whether the
conditional makes sense for 64bit in the first place. Whether this
matters for 32bit has to be determined. It makes some sense, but less
#ifdeffery and less obfuscation makes sense too.

>> If we really want to save the space then certainly not by sprinkling
>> CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS all over the place and hiding the
>> extra sized ptregs in the pkrs header.
>> 
>> You are changing generic architecture code so you better think about
>> making such a change generic and extensible.
>
> I agree.  And I tried to do so.  The generic entry code is modified only by 
> the
> addition of pkrs_[save|restore]_irq().  These are only defined if the arch
> defines ARCH_HAS_SUPERVISOR_PKEYS and furthermore, if something triggers
> enabling ARCH_ENABLE_SUPERVISOR_PKEYS.

I'm talking about generic _architecture_ code, i.e. the code in
arch/x86/ which affects all vendors and systems.

> ARCH_HAS_SUPERVISOR_PKEYS is restricted to x86_64 at the moment.  All other
> arch's including x86 should not see any changes in the generic code.

That was not the question and I'm well aware of that.

>> If the next feature comes around which needs to save something in that
>> extended area then we are going to change the world again, right?
>
> I'm not sure what you mean by 'change the world'.  I would anticipate the 
> entry
> code to be modified with something similar to pks_[save|restore]_irq() and let
> the arch deal with the specifics.

If on X86 the next X86 specific feature comes around which needs extra
reg space then someone has to change world in arch/x86 again, replace
all the ARCH_ENABLE_SUPERVISOR_PKEYS #ifdefs with something else, right?

Instead of adding a new field to pt_regs_aux and be done with it.

> Also in [1] I thought Peter and Andy agreed that placing additional generic
> state in the extended pt_regs was not needed and does not buy us anything.  I
> specifically asked if that was something we wanted to do in.[2]

This was about a generic representation which affects the common entry
code in kernel/entry/... Can you spot the difference?

What I suggested is _solely_ x86 specific and does not trickle into
anything outside of arch/x86.

>> See? No magic hardcoded constant, no build time error checking for that
>> constant. Nothing, it just works.
>
> Yes agreed definitely an improvement.

It's the right thing to do.

>> That's one part, but let me come back to this:
>> 
>> > +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
>> > +  /* add space for extended_pt_regs */
>> > +  subq$EXTENDED_PT_REGS_SIZE, %rsp
>> 
>> What guarantees that RSP points to pt_regs at this point?  Nothing at
>> all. It's just pure luck and a question of time until this explodes in
>> hard to diagnose ways.
>
> It took me a bit to wrap my head around what I think you mean.  My initial
> response was that rsp should be the stack pointer for __call_ext_ptregs() just
> like it was for call.  But I think I see that it is better to open code this
> since others may want to play the same trick without using this code and
> therefore we may not be getting the extended pt_regs structure on t

Re: [PATCH V7 05/18] x86/pks: Add PKS setup code

2021-11-26 Thread Thomas Gleixner

On Thu, Nov 25 2021 at 16:15, Thomas Gleixner wrote:
> On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> Aside of that, the function which set's up the init value is really
> bogus. As you explained in the cover letter a kernel user has to:
>
>1) Claim an index in the enum
>2) Add a default value to the array in that function
>
> Seriously? How is that any better than doing:
>
> #define PKS_KEY0_DEFAULT  PKR_RW_ENABLE
> #define PKS_KEY1_DEFAULT  PKR_ACCESS_DISABLE
> ...
> #define PKS_KEY15_DEFAULT PKR_ACCESS_DISABLE
>
> and let the compiler construct pkrs_init_value?
>
> TBH, it's not and this function has to be ripped out in case that you
> need a dynamic allocation of keys some day. So what is this buying us
> aside of horrible to read and utterly pointless code?

And as Taoyi confirmed its broken.

It surely takes a reviewer to spot that and an external engineer to run
rdmsr -a to validate that this is not working as expected, right?

Sigh...

Re: [PATCH V7 05/18] x86/pks: Add PKS setup code

2021-11-26 Thread Thomas Gleixner

On Fri, Nov 26 2021 at 11:11, taoyi ty wrote:
> On 11/25/21 11:15 PM, Thomas Gleixner wrote:
>>> +void setup_pks(void)
>>> +{
>>> +   if (!cpu_feature_enabled(X86_FEATURE_PKS))
>>> +   return;
>>> +
>>> +   write_pkrs(pkrs_init_value);
>> 
>> Is the init value set up _before_ this function is invoked for the first
>> time?
>> 
> Setting up for cpu0 is before create_initial_pkrs_value. therefore pkrs 
> value of cpu0 won't be set correctly.
>
> [root@AliYun ~]# rdmsr -a 0x06E1
> 0
> 5554
> 5554
> 5554
> 5554
> 5554
> 5554
> 5554
> 5554
> 5554
>
> Here are my test results after applying the patches

Thanks for confirming what I assumed from looking at the patches!

   tglx

Re: [PATCH V7 03/18] x86/pks: Add additional PKEY helper macros

2021-11-25 Thread Thomas Gleixner

On Thu, Nov 25 2021 at 15:25, Thomas Gleixner wrote:
> On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
>> @@ -200,16 +200,14 @@ __setup("init_pkru=", setup_init_pkru);
>>   */
>>  u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
>>  {
>> -int pkey_shift = pkey * PKR_BITS_PER_PKEY;
>> -
>>  /*  Mask out old bit values */
>> -pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
>> +pk_reg &= ~PKR_PKEY_MASK(pkey);
>>  
>>  /*  Or in new values */
>>  if (flags & PKEY_DISABLE_ACCESS)
>> -pk_reg |= PKR_AD_BIT << pkey_shift;
>> +pk_reg |= PKR_AD_KEY(pkey);
>>  if (flags & PKEY_DISABLE_WRITE)
>> -pk_reg |= PKR_WD_BIT << pkey_shift;
>> +pk_reg |= PKR_WD_KEY(pkey);
>
> I'm not seeing how this is improving that code. Quite the contrary.

Aside of that why are you ordering it the wrong way around, i.e.

   1) implement turd
   2) polish turd

instead of implementing the required helpers first if they are really
providing value.

Thanks,

tglx

Re: [PATCH V7 07/18] x86/pks: Preserve the PKRS MSR on context switch

2021-11-25 Thread Thomas Gleixner

On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> @@ -658,6 +659,8 @@ __switch_to(struct task_struct *prev_p, struct 
> task_struct *next_p)
>   /* Load the Intel cache allocation PQR MSR. */
>   resctrl_sched_in();
>  
> + pkrs_write_current();

This is invoked from switch_to() and does this extra get/put_cpu_ptr()
dance in the write function for no reason. Can you please stop sticking
overhead into the hotpath just because?

Thanks,

tglx

Re: [PATCH V7 05/18] x86/pks: Add PKS setup code

2021-11-25 Thread Thomas Gleixner

On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> +
> +void setup_pks(void);

pks_setup()

> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> +
> +static DEFINE_PER_CPU(u32, pkrs_cache);
> +u32 __read_mostly pkrs_init_value;
> +
> +/*
> + * write_pkrs() optimizes MSR writes by maintaining a per cpu cache which can
> + * be checked quickly.
> + *
> + * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not
> + * serializing but still maintains ordering properties similar to WRPKRU.
> + * The current SDM section on PKRS needs updating but should be the same as
> + * that of WRPKRU.  So to quote from the WRPKRU text:
> + *
> + * WRPKRU will never execute transiently. Memory accesses
> + * affected by PKRU register will not execute (even transiently)
> + * until all prior executions of WRPKRU have completed execution
> + * and updated the PKRU register.
> + */
> +void write_pkrs(u32 new_pkrs)

pkrs_write()

> +{
> + u32 *pkrs;
> +
> + if (!static_cpu_has(X86_FEATURE_PKS))
> + return;

  cpu_feature_enabled() if at all. Why is this function even invoked
  when PKS is off?

> +
> + pkrs = get_cpu_ptr(_cache);

As far as I've seen this is mostly called from non-preemptible
regions. So that get/put pair is pointless. Stick a lockdep assert into
the code and disable preemption at the maybe one callsite which needs
it.

> + if (*pkrs != new_pkrs) {
> + *pkrs = new_pkrs;
> + wrmsrl(MSR_IA32_PKRS, new_pkrs);
> + }
> + put_cpu_ptr(pkrs);
> +}
> +
> +/*
> + * Build a default PKRS value from the array specified by consumers
> + */
> +static int __init create_initial_pkrs_value(void)
> +{
> + /* All users get Access Disabled unless changed below */
> + u8 consumer_defaults[PKS_NUM_PKEYS] = {
> + [0 ... PKS_NUM_PKEYS-1] = PKR_AD_BIT
> + };
> + int i;
> +
> + consumer_defaults[PKS_KEY_DEFAULT] = PKR_RW_BIT;
> +
> + /* Ensure the number of consumers is less than the number of keys */
> + BUILD_BUG_ON(PKS_KEY_NR_CONSUMERS > PKS_NUM_PKEYS);
> +
> + pkrs_init_value = 0;

This needs to be cleared because the BSS might be non zero?

> + /* Fill the defaults for the consumers */
> + for (i = 0; i < PKS_NUM_PKEYS; i++)
> + pkrs_init_value |= PKR_VALUE(i, consumer_defaults[i]);

Also PKR_RW_BIT is a horrible invention:

> +#define PKR_RW_BIT 0x0
>  #define PKR_AD_BIT 0x1
>  #define PKR_WD_BIT 0x2
>  #define PKR_BITS_PER_PKEY 2

This makes my brain spin. How do you fit 3 bits into 2 bits per key?
That's really non-intuitive.

PKR_RW_ENABLE   0x0
PKR_ACCESS_DISABLE  0x1
PKR_WRITE_DISABLE   0x2

makes it obvious what this is about, no?

Aside of that, the function which set's up the init value is really
bogus. As you explained in the cover letter a kernel user has to:

   1) Claim an index in the enum
   2) Add a default value to the array in that function

Seriously? How is that any better than doing:

#define PKS_KEY0_DEFAULTPKR_RW_ENABLE
#define PKS_KEY1_DEFAULTPKR_ACCESS_DISABLE
...
#define PKS_KEY15_DEFAULT   PKR_ACCESS_DISABLE

and let the compiler construct pkrs_init_value?

TBH, it's not and this function has to be ripped out in case that you
need a dynamic allocation of keys some day. So what is this buying us
aside of horrible to read and utterly pointless code?

> + return 0;
> +}
> +early_initcall(create_initial_pkrs_value);
> +
> +/*
> + * PKS is independent of PKU and either or both may be supported on a CPU.
> + * Configure PKS if the CPU supports the feature.
> + */
> +void setup_pks(void)
> +{
> + if (!cpu_feature_enabled(X86_FEATURE_PKS))
> + return;
> +
> + write_pkrs(pkrs_init_value);

Is the init value set up _before_ this function is invoked for the first
time?

Thanks,

tglx

Re: [PATCH V7 03/18] x86/pks: Add additional PKEY helper macros

2021-11-25 Thread Thomas Gleixner

On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> @@ -200,16 +200,14 @@ __setup("init_pkru=", setup_init_pkru);
>   */
>  u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
>  {
> - int pkey_shift = pkey * PKR_BITS_PER_PKEY;
> -
>   /*  Mask out old bit values */
> - pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
> + pk_reg &= ~PKR_PKEY_MASK(pkey);
>  
>   /*  Or in new values */
>   if (flags & PKEY_DISABLE_ACCESS)
> - pk_reg |= PKR_AD_BIT << pkey_shift;
> + pk_reg |= PKR_AD_KEY(pkey);
>   if (flags & PKEY_DISABLE_WRITE)
> - pk_reg |= PKR_WD_BIT << pkey_shift;
> + pk_reg |= PKR_WD_KEY(pkey);

I'm not seeing how this is improving that code. Quite the contrary.

Thanks,

tglx

Re: [PATCH V7 02/18] x86/fpu: Refactor arch_set_user_pkey_access()

2021-11-25 Thread Thomas Gleixner

On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> +/*
> + * Replace disable bits for @pkey with values from @flags
> + *
> + * Kernel users use the same flags as user space:
> + * PKEY_DISABLE_ACCESS
> + * PKEY_DISABLE_WRITE
> + */
> +u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)

pkey_ please.

> +{
> + int pkey_shift = pkey * PKR_BITS_PER_PKEY;
> +
> + /*  Mask out old bit values */
> + pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
> +
> + /*  Or in new values */
> + if (flags & PKEY_DISABLE_ACCESS)
> + pk_reg |= PKR_AD_BIT << pkey_shift;
> + if (flags & PKEY_DISABLE_WRITE)
> + pk_reg |= PKR_WD_BIT << pkey_shift;
> +
> + return pk_reg;

Also this code is silly.

#define PKEY_ACCESS_MASK(PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)

u32 pkey_update_pkval(u32 pkval, int pkey, u32 accessbits)
{
int shift = pkey * PKR_BITS_PER_PKEY;

if (WARN_ON_ONCE(accessbits & ~PKEY_ACCESS_MASK))
accessbits &= PKEY_ACCESS_MASK;

pkval &= ~(PKEY_ACCESS_MASK << shift);
return pkval | accessbit << pkey_shift;
}

See? It does not even need comments because it's self explaining and
uses sensible argument names matching the function name.

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-11-25 Thread Thomas Gleixner

Ira,

On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> +/*
> + * __call_ext_ptregs - Helper macro to call into C with extended pt_regs
> + * @cfunc:   C function to be called
> + *
> + * This will ensure that extended_ptregs is added and removed as needed 
> during
> + * a call into C code.
> + */
> +.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* add space for extended_pt_regs */
> + subq$EXTENDED_PT_REGS_SIZE, %rsp
> +#endif
> + .if \annotate_retpoline_safe == 1
> + ANNOTATE_RETPOLINE_SAFE
> + .endif

This annotation is new and nowhere mentioned why it is part of this
patch.

Can you please do _ONE_ functional change per patch and not a
unreviewable pile of changes in one go? Same applies for the ASM and the
C code changes. The ASM change has to go first and then the C code can
build upon it.

> + call\cfunc
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* remove space for extended_pt_regs */
> + addq$EXTENDED_PT_REGS_SIZE, %rsp
> +#endif

I really have to ask the question whether this #ifdeffery has any value
at all. 8 bytes extra stack usage is not going to be the end of the
world and distro kernels will enable that config anyway.

If we really want to save the space then certainly not by sprinkling
CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS all over the place and hiding the
extra sized ptregs in the pkrs header.

You are changing generic architecture code so you better think about
making such a change generic and extensible. Can folks please start
thinking beyond the brim of their teacup and not pretend that the
feature they are working on is the unicorn which requires unique magic
brandnamed after the unicorn of the day.

If the next feature comes around which needs to save something in that
extended area then we are going to change the world again, right?
Certainly not.

This wants to go into asm/ptrace.h:

struct pt_regs_aux {
u32 pkrs;
};

struct pt_regs_extended {
struct pt_regs_aux  aux;
struct pt_regs  regs __attribute__((aligned(8)));
};

and then have in asm-offset:

   DEFINE(PT_REGS_AUX_SIZE, sizeof(struct pt_regs_extended) - sizeof(struct 
pt_regs));

which does the right thing whatever the size of pt_regs_aux is. So for
the above it will have:

 #define PT_REGS_AUX_SIZE 8 /* sizeof(struct pt_regs_extended) - sizeof(struct 
pt_regs) */

Even, if you do

struct pt_regs_aux {
#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
u32 pkrs;
#endif
};

and the config switch is disabled. It's still correct:

 #define PT_REGS_AUX_SIZE 0 /* sizeof(struct pt_regs_extended) - sizeof(struct 
pt_regs) */

See? No magic hardcoded constant, no build time error checking for that
constant. Nothing, it just works.

That's one part, but let me come back to this:

> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* add space for extended_pt_regs */
> + subq$EXTENDED_PT_REGS_SIZE, %rsp

What guarantees that RSP points to pt_regs at this point?  Nothing at
all. It's just pure luck and a question of time until this explodes in
hard to diagnose ways.

Because between

movq%rsp, %rdi
and
call

can legitimately be other code which causes the stack pointer to
change. It's not the case today, but nothing prevents this in the
future.

The correct thing to do is:

movq%rsp, %rdi
RSP_MAKE_PT_REGS_AUX_SPACE
call...
RSP_REMOVE_PT_REGS_AUX_SPACE

The few extra macro lines in the actual code are way better as they make
it completely obvious what's going on and any misuse can be spotted
easily.

> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> +/*
> + * PKRS is a per-logical-processor MSR which overlays additional protection 
> for
> + * pages which have been mapped with a protection key.
> + *
> + * Context switches save the MSR in the task struct thus taking that value to
> + * other processors if necessary.
> + *
> + * To protect against exceptions having access to this memory save the 
> current
> + * thread value and set the PKRS value to be used during the exception.
> + */
> +void pkrs_save_irq(struct pt_regs *regs)

That's a misnomer as this is invoked for _any_ exception not just
interrupts.

>  #ifdef CONFIG_XEN_PV
>  #ifndef CONFIG_PREEMPTION
>  /*
> @@ -309,6 +361,8 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct 
> pt_regs *regs)
>  
>   inhcall = get_and_clear_inhcall();
>   if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
> + /* Normally called by irqentry_exit, restore pkrs here */
> + pkrs_restore_irq(regs);
>   irqentry_exit_cond_resched();

Sigh. Consistency is overrated

> +
>  void setup_pks(void);
>  void pkrs_write_current(void);
>  void pks_init_task(struct task_struct *task);
> +void write_pkrs(u32 new_pkrs);

So we have pkrs_write_current() and write_pkrs() now. Can you please
stick to a common

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-11-25 Thread Thomas Gleixner

On Fri, Nov 12 2021 at 16:50, Ira Weiny wrote:
> On Tue, Aug 03, 2021 at 09:32:21PM -0700, 'Ira Weiny' wrote:
>> From: Ira Weiny 
>> 
>> The PKRS MSR is not managed by XSAVE.  It is preserved through a context
>> switch but this support leaves exception handling code open to memory
>> accesses during exceptions.
>> 
>> 2 possible places for preserving this state were considered,
>> irqentry_state_t or pt_regs.[1]  pt_regs was much more complicated and
>> was potentially fraught with unintended consequences.[2]  However, Andy
>> came up with a way to hide additional values on the stack which could be
>> accessed as "extended_pt_regs".[3]
>
> Andy,
>
> I'm preparing to send V8 of this PKS work.  But I have not seen any feed back
> since I originally implemented this in V4[1].
>
> Does this meets your expectations?  Are there any issues you can see with this
> code?
>
> I would appreciate your feedback.

Not Andy here, but I'll respond to the patch in a minute.

Thanks,

tglx

Re: [PATCH] ata: pata_rb532: Add OF support and make COMPILE_TESTable

2021-04-20 Thread Thomas Bogendoerfer

On Tue, Apr 20, 2021 at 07:09:26PM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 4/20/21 5:04 PM, Thomas Bogendoerfer wrote:
> 
> > Add OF support for switching RB532 do device tree possible.
> 
>I couldnb't parse that. :-)

no wonder ;-) I'll rephrase in v2.

> 
> > By removing
> > the not needed asm/mach-rc32434/rb.h include the driver could be
> > compile tested now.
> 
>   I think it's a separte issue worth its own patch.

Jens, do you want this in an extra patch ?

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

[PATCH v2 2/2] dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller

2021-04-20 Thread Thomas Bogendoerfer

Document DT bindings for IDT 79RC3243x Interrupt Controller.

Signed-off-by: Thomas Bogendoerfer 
---
 .../interrupt-controller/idt,3243x-pic.yaml   | 48 +++
 1 file changed, 48 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/interrupt-controller/idt,3243x-pic.yaml

diff --git 
a/Documentation/devicetree/bindings/interrupt-controller/idt,3243x-pic.yaml 
b/Documentation/devicetree/bindings/interrupt-controller/idt,3243x-pic.yaml
new file mode 100644
index ..6a1c5ac75f1a
--- /dev/null
+++ b/Documentation/devicetree/bindings/interrupt-controller/idt,3243x-pic.yaml
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/interrupt-controller/idt,3243x-pic.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: IDT 79RC3243x Interrupt Controller Device Tree Bindings
+
+maintainers:
+  - Thomas Bogendoerfer 
+
+allOf:
+  - $ref: /schemas/interrupt-controller.yaml#
+
+properties:
+  "#interrupt-cells":
+const: 1
+
+  compatible:
+const: idt,3243x-pci
+
+  reg:
+maxItems: 1
+
+  interrupt-controller: true
+
+required:
+  - "#interrupt-cells"
+  - compatible
+  - reg
+  - interrupt-controller
+
+additionalProperties: false
+
+examples:
+  - |
+idtpic3: interrupt-controller@3800c {
+compatible = "idt,rc3243x-pic";
+reg = <0x3800c 0x0c>;
+
+interrupt-controller;
+#interrupt-cells = <1>;
+
+interrupt-parent = <>;
+interrupts = <3>;
+};
+
+...
-- 
2.29.2

[PATCH v2 1/2] irqchip: Add support for IDT 79rc3243x interrupt controller

2021-04-20 Thread Thomas Bogendoerfer

IDT 79rc3243x SoCs have rather simple interrupt controllers connected
to the MIPS CPU interrupt lines. Each of them has room for up to
32 interrupts.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/irqchip/Kconfig|   5 ++
 drivers/irqchip/Makefile   |   1 +
 drivers/irqchip/irq-idt3243x.c | 124 +
 3 files changed, 130 insertions(+)
 create mode 100644 drivers/irqchip/irq-idt3243x.c

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index e74fa206240a..55562b36bf3c 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -586,4 +586,9 @@ config MST_IRQ
help
  Support MStar Interrupt Controller.
 
+config IRQ_IDT3243X
+   bool
+   select GENERIC_IRQ_CHIP
+   select IRQ_DOMAIN
+
 endmenu
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index c59b95a0532c..341891443eec 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -113,3 +113,4 @@ obj-$(CONFIG_LOONGSON_PCH_MSI)  += 
irq-loongson-pch-msi.o
 obj-$(CONFIG_MST_IRQ)  += irq-mst-intc.o
 obj-$(CONFIG_SL28CPLD_INTC)+= irq-sl28cpld.o
 obj-$(CONFIG_MACH_REALTEK_RTL) += irq-realtek-rtl.o
+obj-$(CONFIG_IRQ_IDT3243X) += irq-idt3243x.o
diff --git a/drivers/irqchip/irq-idt3243x.c b/drivers/irqchip/irq-idt3243x.c
new file mode 100644
index ..61caf21ef46c
--- /dev/null
+++ b/drivers/irqchip/irq-idt3243x.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for IDT/Renesas 79RC3243x Interrupt Controller.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define IDT_PIC_NR_IRQS32
+
+#define IDT_PIC_IRQ_PEND   0x00
+#define IDT_PIC_IRQ_MASK   0x08
+
+struct idt_pic_data {
+   void __iomem *base;
+   struct irq_domain *irq_domain;
+   struct irq_chip_generic *gc;
+};
+
+static void idt_irq_dispatch(struct irq_desc *desc)
+{
+   struct idt_pic_data *idtpic = irq_desc_get_handler_data(desc);
+   struct irq_chip *host_chip = irq_desc_get_chip(desc);
+   u32 pending, hwirq, virq;
+
+   chained_irq_enter(host_chip, desc);
+
+   pending = irq_reg_readl(idtpic->gc, IDT_PIC_IRQ_PEND);
+   pending &= ~idtpic->gc->mask_cache;
+   while (pending) {
+   hwirq = __fls(pending);
+   virq = irq_linear_revmap(idtpic->irq_domain, hwirq);
+   if (virq)
+   generic_handle_irq(virq);
+   pending &= ~(1 << hwirq);
+   }
+
+   chained_irq_exit(host_chip, desc);
+}
+
+static int idt_pic_init(struct device_node *of_node, struct device_node 
*parent)
+{
+   struct irq_domain *domain;
+   struct idt_pic_data *idtpic;
+   struct irq_chip_generic *gc;
+   struct irq_chip_type *ct;
+   unsigned int parent_irq;
+   int ret = 0;
+
+   idtpic = kzalloc(sizeof(*idtpic), GFP_KERNEL);
+   if (!idtpic) {
+   ret = -ENOMEM;
+   goto out_err;
+   }
+
+   parent_irq = irq_of_parse_and_map(of_node, 0);
+   if (!parent_irq) {
+   pr_err("Failed to map parent IRQ!\n");
+   ret = -EINVAL;
+   goto out_free;
+   }
+
+   idtpic->base = of_iomap(of_node, 0);
+   if (!idtpic->base) {
+   pr_err("Failed to map base address!\n");
+   ret = -ENOMEM;
+   goto out_unmap_irq;
+   }
+
+   domain = irq_domain_add_linear(of_node, IDT_PIC_NR_IRQS,
+  _generic_chip_ops, NULL);
+   if (!domain) {
+   pr_err("Failed to add irqdomain!\n");
+   ret = -ENOMEM;
+   goto out_iounmap;
+   }
+   idtpic->irq_domain = domain;
+
+   ret = irq_alloc_domain_generic_chips(domain, 32, 1, "IDTPIC",
+handle_level_irq, 0,
+IRQ_NOPROBE | IRQ_LEVEL, 0);
+   if (ret)
+   goto out_domain_remove;
+
+   gc = irq_get_domain_generic_chip(domain, 0);
+   gc->reg_base = idtpic->base;
+   gc->private = idtpic;
+
+   ct = gc->chip_types;
+   ct->regs.mask = IDT_PIC_IRQ_MASK;
+   ct->chip.irq_mask = irq_gc_mask_set_bit;
+   ct->chip.irq_unmask = irq_gc_mask_clr_bit;
+   idtpic->gc = gc;
+
+   /* Mask interrupts. */
+   writel(0x, idtpic->base + IDT_PIC_IRQ_MASK);
+   gc->mask_cache = 0x;
+
+   irq_set_chained_handler_and_data(parent_irq,
+idt_irq_dispatch, idtpic);
+
+   return 0;
+
+out_domain_remove:
+   irq_domain_remove(domain);
+out_iounmap:
+   iounmap(idtpic->base);
+out_unmap_irq:
+   irq_dispose_mapping(parent_irq);
+

Re: [PATCH] irqchip: Add support for IDT 79rc3243x interrupt controller

2021-04-20 Thread Thomas Bogendoerfer

On Tue, Apr 20, 2021 at 06:34:59PM +0100, Marc Zyngier wrote:
> On 2021-04-20 13:34, Thomas Bogendoerfer wrote:
> > IDT 79rc3243x SoCs have rather simple interrupt controllers connected
> > to the MIPS CPU interrupt lines. Each of them has room for up to
> > 32 interrupts.
> > 
> > Signed-off-by: Thomas Bogendoerfer 
> 
> Is there a DT binding for this irqchip? The code looks fine, but
> it'd be good if the binding was merged at the same time as the driver.

I'll write one and send a v2.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

Re: [signal] 4bad58ebc8: will-it-scale.per_thread_ops -3.3% regression

2021-04-20 Thread Thomas Gleixner

On Tue, Apr 20 2021 at 11:08, kernel test robot wrote:
> FYI, we noticed a -3.3% regression of will-it-scale.per_thread_ops due to 
> commit:
>
> commit: 4bad58ebc8bc4f20d89cff95417c9b4674769709 ("signal: Allow tasks to 
> cache one sigqueue struct")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
>
> in testcase: will-it-scale
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz 
> with 192G memory
> with following parameters:
>
>   nr_task: 100%
>   mode: thread
>   test: futex3
>   cpufreq_governor: performance
>   ucode: 0x5003006
>
> test-description: Will It Scale takes a testcase and runs it from 1 through 
> to n parallel copies to see if the testcase will scale. It builds both a 
> process and threads based test in order to see any differences between the 
> two.
> test-url: https://github.com/antonblanchard/will-it-scale
> commit: 
>   69995ebbb9 ("signal: Hand SIGQUEUE_PREALLOC flag to __sigqueue_alloc()")
>   4bad58ebc8 ("signal: Allow tasks to cache one sigqueue struct")
>
> 69995ebbb9d37173 4bad58ebc8bc4f20d89cff95417 
>  --- 
>  %stddev %change %stddev
>  \  |\  
>  1.273e+09-3.3%  1.231e+09will-it-scale.192.threads
>6630224-3.3%6409738will-it-scale.per_thread_ops
>  1.273e+09-3.3%  1.231e+09will-it-scale.workload
>   1638 ±  3%  -7.8%   1510 ±  5%  
> sched_debug.cfs_rq:/.runnable_avg.max
> 297.83 ± 68%   +1747.6%   5502 ±152%  
> interrupts.33:PCI-MSI.524291-edge.eth0-TxRx-2
> 297.83 ± 68%   +1747.6%   5502 ±152%  
> interrupts.CPU12.33:PCI-MSI.524291-edge.eth0-TxRx-2

This change is definitely not causing more network traffic

>   8200   -33.4%   5459 ± 35%  
> interrupts.CPU27.NMI:Non-maskable_interrupts
>   8200   -33.4%   5459 ± 35%  
> interrupts.CPU27.PMI:Performance_monitoring_interrupts
>   8199   -33.4%   5459 ± 35%  
> interrupts.CPU28.NMI:Non-maskable_interrupts
>   8199   -33.4%   5459 ± 35%  
> interrupts.CPU28.PMI:Performance_monitoring_interrupts
>   6148 ± 33% -11.2%   5459 ± 35%  
> interrupts.CPU29.NMI:Non-maskable_interrupts
>   6148 ± 33% -11.2%   5459 ± 35%  
> interrupts.CPU29.PMI:Performance_monitoring_interrupts
>   4287 ±  8% +33.6%   5730 ± 15%  
> interrupts.CPU49.CAL:Function_call_interrupts
>   6356 ± 19% +49.6%   9509 ± 19%  
> interrupts.CPU97.CAL:Function_call_interrupts

Neither does it increase the number of function calls

> 407730 ±  8% +37.5% 560565 ±  7%  perf-stat.i.dTLB-load-misses
> 415959 ±  8% +40.4% 583928 ±  7%  perf-stat.ps.dTLB-load-misses

And this massive increase does not make sense either.

Confused.

Thanks,

tglx

Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()

2021-04-20 Thread Thomas Gleixner

On Tue, Apr 20 2021 at 17:15, Lorenzo Colitti wrote:
> On Fri, Apr 16, 2021 at 1:47 AM Thomas Gleixner  wrote:
>> Enable tracing and enable the following tracepoints:
>> [...]
>
> Sorry for the delay. I had to learn a bit about how to use the tracing
> infrastructure. I don't know if I can post here, but to my untrained
> eye, one big difference between the old (fast) code and the new (slow)
> code is that the new code calls tick_program_event() much more. It
> looks like that makes most of the difference.
>
> With the old code, hrtimer_start_range_ns almost never calls
> tick_program_event at all, but the new code seems to call it twice on
> every timer update.

Yes, I figured out why that happens by now, but the old behaviour was
just incorrect. So now there are clearly two issues:

  1) hrtimer is contrary to timer_list not really suited for high
 frequency start/cancel/start/... cycles of a timer. It's optimized
 for the start and expire precisely case.

  2) The cost for reprogramming depends on the underlying hardware. With
 halfways sane timer hardware it's minimal and as I measured in a
 micro benchmark in the 1% range. Now when your system ends up
 having one of the real timer trainwrecks which can be found in
 drivers/clocksource/ which are even worse than the x86 HPET horror,
 then it's going to be painful and a performance issue.

 I assume that's an ARM64 system. ARM64 CPUs have an architected per
 CPU timer where the reprogramming is pretty fast as it's next to
 the CPU, but who knows what your system is using.

Now in the meantime I looked into __hrtimer_start_range_ns() whether
that double reprogram can be avoided without creating a total trainwreck
and imposing penalty on all sensible use cases. Completely untested
patch below should do the trick and it's not ugly enough that I hate it
with a passion.

Even if that makes your problem go away #1 is still beyond suboptimal
and #2 is something you really want to look into.

Thanks,

tglx
---
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1030,12 +1030,13 @@ static void __remove_hrtimer(struct hrti
  * remove hrtimer, called with base lock held
  */
 static inline int
-remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base, bool 
restart)
+remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base,
+  bool restart, bool keep_local)
 {
u8 state = timer->state;
 
if (state & HRTIMER_STATE_ENQUEUED) {
-   int reprogram;
+   bool reprogram;
 
/*
 * Remove the timer and force reprogramming when high
@@ -1048,8 +1049,16 @@ remove_hrtimer(struct hrtimer *timer, st
debug_deactivate(timer);
reprogram = base->cpu_base == this_cpu_ptr(_bases);
 
+   /*
+* If the timer is not restarted then reprogramming is
+* required if the timer is local. If it is local and about
+* to be restarted, avoid programming it twice (on removal
+* and a moment later when it's requeued).
+*/
if (!restart)
state = HRTIMER_STATE_INACTIVE;
+   else
+   reprogram &= !keep_local;
 
__remove_hrtimer(timer, base, state, reprogram);
return 1;
@@ -1103,9 +1112,31 @@ static int __hrtimer_start_range_ns(stru
struct hrtimer_clock_base *base)
 {
struct hrtimer_clock_base *new_base;
+   bool force_local, first;
 
-   /* Remove an active timer from the queue: */
-   remove_hrtimer(timer, base, true);
+   /*
+* If the timer is on the local cpu base and is the first expiring
+* timer then this might end up reprogramming the hardware twice
+* (on removal and on enqueue). To avoid that by prevent the
+* reprogram on removal, keep the timer local to the current CPU
+* and enforce reprogramming after it is queued no matter whether
+* it is the new first expiring timer again or not.
+*/
+   force_local = base->cpu_base == this_cpu_ptr(_bases);
+   force_local &= base->cpu_base->next_timer == timer;
+
+   /*
+* Remove an active timer from the queue. In case it is not queued
+* on the current CPU, make sure that remove_hrtimer() updates the
+* remote data correctly.
+*
+* If it's on the current CPU and the first expiring timer, then
+* skip reprogramming, keep the timer local and enforce
+* reprogramming later if it was the first expiring timer.  This
+* avoids programming the underlying clock event twice (once at
+* removal and once after enqueue).
+*/
+   remove_hrtimer(timer, base, true, force_local);
 
if (mode &

[PATCH] ata: pata_rb532: Add OF support and make COMPILE_TESTable

2021-04-20 Thread Thomas Bogendoerfer

Add OF support for switching RB532 do device tree possible. By removing
the not needed asm/mach-rc32434/rb.h include the driver could be
compile tested now.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/ata/Kconfig |  2 +-
 drivers/ata/pata_rb532_cf.c | 11 +--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index 030cb32da980..53f40f92e4eb 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -1139,7 +1139,7 @@ config PATA_QDI
 
 config PATA_RB532
tristate "RouterBoard 532 PATA CompactFlash support"
-   depends on MIKROTIK_RB532
+   depends on MIKROTIK_RB532 || COMPILE_TEST
help
  This option enables support for the RouterBoard 532
  PATA CompactFlash controller.
diff --git a/drivers/ata/pata_rb532_cf.c b/drivers/ata/pata_rb532_cf.c
index 479c4b29b856..93d839ab9654 100644
--- a/drivers/ata/pata_rb532_cf.c
+++ b/drivers/ata/pata_rb532_cf.c
@@ -28,8 +28,6 @@
 #include 
 #include 
 
-#include 
-
 #define DRV_NAME   "pata-rb532-cf"
 #define DRV_VERSION"0.1.0"
 #define DRV_DESC   "PATA driver for RouterBOARD 532 Compact Flash"
@@ -164,11 +162,20 @@ static int rb532_pata_driver_remove(struct 
platform_device *pdev)
return 0;
 }
 
+#ifdef CONFIG_OF
+static const struct of_device_id pata_rb532_match[] = {
+   { .compatible = "mikrotik,rb532-pata", },
+   { }
+};
+MODULE_DEVICE_TABLE(of, pata_rb532_match);
+#endif
+
 static struct platform_driver rb532_pata_platform_driver = {
.probe  = rb532_pata_driver_probe,
.remove = rb532_pata_driver_remove,
.driver  = {
.name   = DRV_NAME,
+   .of_match_table = of_match_ptr(pata_rb532_match),
},
 };
 
-- 
2.29.2

[PATCH 2/2] dt-bindings: gpio: Add devicetree binding for IDT 79RC3243x GPIO controller

2021-04-20 Thread Thomas Bogendoerfer

Add YAML devicetree binding for IDT 79RC3243x GPIO controller

Signed-off-by: Thomas Bogendoerfer 
---
 .../bindings/gpio/gpio-idt3243x.yaml  | 73 +++
 1 file changed, 73 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-idt3243x.yaml

diff --git a/Documentation/devicetree/bindings/gpio/gpio-idt3243x.yaml 
b/Documentation/devicetree/bindings/gpio/gpio-idt3243x.yaml
new file mode 100644
index ..346a57ef8298
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio-idt3243x.yaml
@@ -0,0 +1,73 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/gpio-idt3243x.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: IDT 79RC32434x GPIO controller
+
+maintainers:
+  - Thomas Bogendoerfer 
+
+properties:
+  compatible:
+const: idt,3243x-gpio
+
+  reg:
+maxItems: 2
+
+  reg-names:
+items:
+  - const: gpio
+  - const: pic
+
+  gpio-controller: true
+
+  "#gpio-cells":
+const: 2
+
+  ngpios:
+description:
+  Number of available gpios in a bank.
+minimum: 1
+maximum: 32
+
+  interrupt-controller: true
+
+  "#interrupt-cells":
+const: 2
+
+  interrupts:
+maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - gpio-controller
+  - "#gpio-cells"
+  - ngpios
+  - interrupt-controller
+  - "#interrupt-cells"
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+gpio0: interrupt-controller@5 {
+compatible = "idt,3243x-gpio";
+reg = <0x5 0x14>, <0x38030 0x0c>;
+reg-names = "gpio", "pic";
+
+interrupt-controller;
+#interrupt-cells = <2>;
+
+interrupt-parent = <>;
+interrupts = <6>;
+
+gpio-controller;
+#gpio-cells = <2>;
+
+ngpios = <14>;
+};
-- 
2.29.2

[PATCH 1/2] gpio: Add support for IDT 79RC3243x GPIO controller

2021-04-20 Thread Thomas Bogendoerfer

IDT 79RC3243x SoCs integrated a gpio controller, which handles up
to 32 gpios. All gpios could be used as interrupt source.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/gpio/Kconfig |  10 ++
 drivers/gpio/Makefile|   1 +
 drivers/gpio/gpio-idt3243x.c | 210 +++
 3 files changed, 221 insertions(+)
 create mode 100644 drivers/gpio/gpio-idt3243x.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e3607ec4c2e8..6847a06ffcfe 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -770,6 +770,16 @@ config GPIO_MSC313
  Say Y here to support the main GPIO block on MStar/SigmaStar
  ARMv7 based SoCs.
 
+config GPIO_IDT3243X
+   bool "IDT 79RC3243X GPIO support"
+   default y if MIKROTIK_RB532
+   depends on MIKROTIK_RB532 || COMPILE_TEST
+   select GPIO_GENERIC
+   select GPIOLIB_IRQCHIP
+   help
+ Select this option to enable GPIO driver for
+ IDT 79RC3243X SoC devices.
+
 endmenu
 
 menu "Port-mapped I/O GPIO drivers"
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index c58a90a3c3b1..75dd9c5665c5 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_GPIO_HISI) += gpio-hisi.o
 obj-$(CONFIG_GPIO_HLWD)+= gpio-hlwd.o
 obj-$(CONFIG_HTC_EGPIO)+= gpio-htc-egpio.o
 obj-$(CONFIG_GPIO_ICH) += gpio-ich.o
+obj-$(CONFIG_GPIO_IDT3243X)+= gpio-idt3243x.o
 obj-$(CONFIG_GPIO_IOP) += gpio-iop.o
 obj-$(CONFIG_GPIO_IT87)+= gpio-it87.o
 obj-$(CONFIG_GPIO_IXP4XX)  += gpio-ixp4xx.o
diff --git a/drivers/gpio/gpio-idt3243x.c b/drivers/gpio/gpio-idt3243x.c
new file mode 100644
index ..eaee46480268
--- /dev/null
+++ b/drivers/gpio/gpio-idt3243x.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for IDT/Renesas 79RC3243x Interrupt Controller.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define IDT_GPIO_NR_IRQS   32
+
+#define IDT_PIC_IRQ_PEND   0x00
+#define IDT_PIC_IRQ_MASK   0x08
+
+#define IDT_GPIO_DIR   0x04
+#define IDT_GPIO_DATA  0x08
+#define IDT_GPIO_ILEVEL0x0C
+#define IDT_GPIO_ISTAT 0x10
+
+struct idt_gpio_ctrl {
+   struct gpio_chip gc;
+   void __iomem *pic;
+   void __iomem *gpio;
+   u32 mask_cache;
+};
+
+static struct idt_gpio_ctrl *irq_data_to_idt_gpio(struct irq_data *d)
+{
+   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
+
+   return container_of(gc, struct idt_gpio_ctrl, gc);
+}
+
+static void idt_gpio_dispatch(struct irq_desc *desc)
+{
+   struct gpio_chip *gc = irq_desc_get_handler_data(desc);
+   struct idt_gpio_ctrl *ctrl = container_of(gc, struct idt_gpio_ctrl, gc);
+   struct irq_chip *host_chip = irq_desc_get_chip(desc);
+   u32 pending, hwirq, virq;
+
+   chained_irq_enter(host_chip, desc);
+
+   pending = readl(ctrl->pic + IDT_PIC_IRQ_PEND);
+   pending &= ~ctrl->mask_cache;
+   while (pending) {
+   hwirq = __fls(pending);
+   virq = irq_linear_revmap(gc->irq.domain, hwirq);
+   if (virq)
+   generic_handle_irq(virq);
+   pending &= ~(1 << hwirq);
+   }
+
+   chained_irq_exit(host_chip, desc);
+}
+
+static int idt_gpio_irq_set_type(struct irq_data *d, unsigned int flow_type)
+{
+   struct idt_gpio_ctrl *ctrl = irq_data_to_idt_gpio(d);
+   unsigned int sense = flow_type & IRQ_TYPE_SENSE_MASK;
+   u32 ilevel;
+
+   if (sense & ~(IRQ_TYPE_LEVEL_HIGH | IRQ_TYPE_LEVEL_LOW))
+   return -EINVAL;
+
+   ilevel = readl(ctrl->gpio + IDT_GPIO_ILEVEL);
+   if (sense & IRQ_TYPE_LEVEL_HIGH)
+   ilevel |= BIT(d->hwirq);
+   else if (sense & IRQ_TYPE_LEVEL_LOW)
+   ilevel &= ~BIT(d->hwirq);
+   else
+   return -EINVAL;
+
+   writel(ilevel, ctrl->gpio + IDT_GPIO_ILEVEL);
+   return 0;
+}
+
+static void idt_gpio_ack(struct irq_data *d)
+{
+   struct idt_gpio_ctrl *ctrl = irq_data_to_idt_gpio(d);
+
+   writel(~BIT(d->hwirq), ctrl->gpio + IDT_GPIO_ISTAT);
+}
+
+static void idt_gpio_mask(struct irq_data *d)
+{
+   struct idt_gpio_ctrl *ctrl = irq_data_to_idt_gpio(d);
+
+   ctrl->mask_cache |= BIT(d->hwirq);
+   writel(ctrl->mask_cache, ctrl->pic + IDT_PIC_IRQ_MASK);
+}
+
+static void idt_gpio_unmask(struct irq_data *d)
+{
+   struct idt_gpio_ctrl *ctrl = irq_data_to_idt_gpio(d);
+
+   ctrl->mask_cache &= ~BIT(d->hwirq);
+   writel(ctrl->mask_cache, ctrl->pic + IDT_PIC_IRQ_MASK);
+}
+
+static struct irq_chip idt_gpio_irqchip = {
+   .name

[PATCH] irqchip: Add support for IDT 79rc3243x interrupt controller

2021-04-20 Thread Thomas Bogendoerfer

IDT 79rc3243x SoCs have rather simple interrupt controllers connected
to the MIPS CPU interrupt lines. Each of them has room for up to
32 interrupts.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/irqchip/Kconfig|   5 ++
 drivers/irqchip/Makefile   |   1 +
 drivers/irqchip/irq-idt3243x.c | 124 +
 3 files changed, 130 insertions(+)
 create mode 100644 drivers/irqchip/irq-idt3243x.c

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index e74fa206240a..55562b36bf3c 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -586,4 +586,9 @@ config MST_IRQ
help
  Support MStar Interrupt Controller.
 
+config IRQ_IDT3243X
+   bool
+   select GENERIC_IRQ_CHIP
+   select IRQ_DOMAIN
+
 endmenu
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index c59b95a0532c..341891443eec 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -113,3 +113,4 @@ obj-$(CONFIG_LOONGSON_PCH_MSI)  += 
irq-loongson-pch-msi.o
 obj-$(CONFIG_MST_IRQ)  += irq-mst-intc.o
 obj-$(CONFIG_SL28CPLD_INTC)+= irq-sl28cpld.o
 obj-$(CONFIG_MACH_REALTEK_RTL) += irq-realtek-rtl.o
+obj-$(CONFIG_IRQ_IDT3243X) += irq-idt3243x.o
diff --git a/drivers/irqchip/irq-idt3243x.c b/drivers/irqchip/irq-idt3243x.c
new file mode 100644
index ..61caf21ef46c
--- /dev/null
+++ b/drivers/irqchip/irq-idt3243x.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for IDT/Renesas 79RC3243x Interrupt Controller.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define IDT_PIC_NR_IRQS32
+
+#define IDT_PIC_IRQ_PEND   0x00
+#define IDT_PIC_IRQ_MASK   0x08
+
+struct idt_pic_data {
+   void __iomem *base;
+   struct irq_domain *irq_domain;
+   struct irq_chip_generic *gc;
+};
+
+static void idt_irq_dispatch(struct irq_desc *desc)
+{
+   struct idt_pic_data *idtpic = irq_desc_get_handler_data(desc);
+   struct irq_chip *host_chip = irq_desc_get_chip(desc);
+   u32 pending, hwirq, virq;
+
+   chained_irq_enter(host_chip, desc);
+
+   pending = irq_reg_readl(idtpic->gc, IDT_PIC_IRQ_PEND);
+   pending &= ~idtpic->gc->mask_cache;
+   while (pending) {
+   hwirq = __fls(pending);
+   virq = irq_linear_revmap(idtpic->irq_domain, hwirq);
+   if (virq)
+   generic_handle_irq(virq);
+   pending &= ~(1 << hwirq);
+   }
+
+   chained_irq_exit(host_chip, desc);
+}
+
+static int idt_pic_init(struct device_node *of_node, struct device_node 
*parent)
+{
+   struct irq_domain *domain;
+   struct idt_pic_data *idtpic;
+   struct irq_chip_generic *gc;
+   struct irq_chip_type *ct;
+   unsigned int parent_irq;
+   int ret = 0;
+
+   idtpic = kzalloc(sizeof(*idtpic), GFP_KERNEL);
+   if (!idtpic) {
+   ret = -ENOMEM;
+   goto out_err;
+   }
+
+   parent_irq = irq_of_parse_and_map(of_node, 0);
+   if (!parent_irq) {
+   pr_err("Failed to map parent IRQ!\n");
+   ret = -EINVAL;
+   goto out_free;
+   }
+
+   idtpic->base = of_iomap(of_node, 0);
+   if (!idtpic->base) {
+   pr_err("Failed to map base address!\n");
+   ret = -ENOMEM;
+   goto out_unmap_irq;
+   }
+
+   domain = irq_domain_add_linear(of_node, IDT_PIC_NR_IRQS,
+  _generic_chip_ops, NULL);
+   if (!domain) {
+   pr_err("Failed to add irqdomain!\n");
+   ret = -ENOMEM;
+   goto out_iounmap;
+   }
+   idtpic->irq_domain = domain;
+
+   ret = irq_alloc_domain_generic_chips(domain, 32, 1, "IDTPIC",
+handle_level_irq, 0,
+IRQ_NOPROBE | IRQ_LEVEL, 0);
+   if (ret)
+   goto out_domain_remove;
+
+   gc = irq_get_domain_generic_chip(domain, 0);
+   gc->reg_base = idtpic->base;
+   gc->private = idtpic;
+
+   ct = gc->chip_types;
+   ct->regs.mask = IDT_PIC_IRQ_MASK;
+   ct->chip.irq_mask = irq_gc_mask_set_bit;
+   ct->chip.irq_unmask = irq_gc_mask_clr_bit;
+   idtpic->gc = gc;
+
+   /* Mask interrupts. */
+   writel(0x, idtpic->base + IDT_PIC_IRQ_MASK);
+   gc->mask_cache = 0x;
+
+   irq_set_chained_handler_and_data(parent_irq,
+idt_irq_dispatch, idtpic);
+
+   return 0;
+
+out_domain_remove:
+   irq_domain_remove(domain);
+out_iounmap:
+   iounmap(idtpic->base);
+out_unmap_irq:
+   irq_dispose_mapping(parent_irq);
+

Re: build failure of malta_qemu_32r6_defconfig

2021-04-20 Thread Thomas Bogendoerfer

On Sun, Apr 18, 2021 at 12:01:36AM +0100, Sudip Mukherjee wrote:
> Hi Thomas,
> 
> On Fri, Apr 9, 2021 at 1:17 PM Thomas Bogendoerfer
>  wrote:
> >
> > On Thu, Apr 08, 2021 at 09:42:11AM +0800, YunQiang Su wrote:
> > > Sudip Mukherjee  于2021年4月8日周四 上午2:26写道：
> > > >
> > > > Hi Thomas,
> > > >
> > > > I was building v5.10.28 with malta_qemu_32r6_defconfig and noticed that
> > > > it fails to build, so tried next-20210407 to see if it has been fixed.
> > > > But linux-next also has the issue with gcc-10.
> > > >
> > > > The error is:
> > > >
> > > > ./arch/mips/include/asm/vdso/gettimeofday.h: In function 
> > > > '__vdso_clock_gettime':
> > > > ./arch/mips/include/asm/vdso/gettimeofday.h:103:2: error: the register 
> > > > 'lo' cannot be clobbered in 'asm' for the current target
> > > >   103 |  asm volatile(
> > > >   |  ^~~
> > >
> > > this operation try to save lo and hi register, while they are not
> > > exisiting on r6.
> > > We are working on figure out a patch for it.
> >
> > looks like there is already a patch in patchwork, which just needs
> > a workup:
> >
> > https://patchwork.kernel.org/project/linux-mips/patch/20200801154401.4177009-1-romain.na...@gmail.com/
> 
> Looks like there has been no response to it since last 8 months. Do
> you want me to respin it and send a proper patch?

that would me fantastic and much appreciated :-)

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()

2021-04-20 Thread Thomas Gleixner

On Mon, Apr 19 2021 at 20:12, Maciej Żenczykowski wrote:
> On Thu, Apr 15, 2021 at 9:47 AM Thomas Gleixner  wrote:
>> Run the test on a kernels with and without that commit and collect trace
>> data for both.
>>
>> That should give me a pretty clear picture what's going on.
>
> Lorenzo is trying to get the traces you asked for, or rather he’s
> trying to get confirmation he can post them.
>
> Our initial observation of these results seems to suggest that
> updating the timer (hrtimer_start, which seems to also call
> hrtimer_cancel) takes twice as long as it used to.

Which contradicts my measurements. The change in complexity is marginal
and the overhead in cycles/instructions is close to noise. It's
measurable in a microbenchmark, but it's in the < 1% range which is far
away from the 60% you are seing.

> My gut feeling is that softirq_activated is usually false, and the old
> code in such a case calls just __hrtimer_get_next_event(,
> HRTIMER_ACTIVE_ALL).  While the new code will first call
> __hrtimer_get_next_event(, HRTIMER_ACTIVE_SOFT) and then
> __hrtimer_get_next_event(, HRTIMER_ACTIVE_HARD)
>
> Perhaps __hrtimer_get_next_event() should return both soft and hard
> event times in one function call?
> Or perhaps hrtimer_start should not call hrtimer_cancel?

Perhaps we do a proper analysis first :)

Thanks,

tglx

Re: [PATCH v6] hrtimer: avoid retrigger_next_event IPI

2021-04-19 Thread Thomas Gleixner

On Mon, Apr 19 2021 at 16:39, Marcelo Tosatti wrote:
>  
> +static void clock_was_set_force_reprogram_work(struct work_struct *work)
> +{
> + clock_was_set(true);
> +}
> +
> +static DECLARE_WORK(hrtimer_force_reprogram_work, 
> clock_was_set_force_reprogram_work);
> +
> +
>  static void clock_was_set_work(struct work_struct *work)
>  {
> - clock_was_set();
> + clock_was_set(false);
>  }
>  
>  static DECLARE_WORK(hrtimer_work, clock_was_set_work);
> @@ -769,9 +777,12 @@ static DECLARE_WORK(hrtimer_work, clock_was_set_work);
>   * Called from timekeeping and resume code to reprogram the hrtimer
>   * interrupt device on all cpus.
>   */
> -void clock_was_set_delayed(void)
> +void clock_was_set_delayed(bool force_reprogram)
>  {
> - schedule_work(_work);
> + if (force_reprogram)
> + schedule_work(_force_reprogram_work);
> + else
> + schedule_work(_work);
>  }
>  
>  #else
> @@ -871,6 +882,18 @@ static void hrtimer_reprogram(struct hrtimer *timer, 
> bool reprogram)
>   tick_program_event(expires, 1);
>  }
>  
> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) | \
> +  (1U << HRTIMER_BASE_REALTIME_SOFT) |   \
> +  (1U << HRTIMER_BASE_TAI) | \
> +  (1U << HRTIMER_BASE_TAI_SOFT) |\
> +  (1U << HRTIMER_BASE_BOOTTIME) |\
> +  (1U << HRTIMER_BASE_BOOTTIME_SOFT))
> +
> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> +{
> + return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
> +}
> +
>  /*
>   * Clock realtime was set
>   *
> @@ -882,11 +905,42 @@ static void hrtimer_reprogram(struct hrtimer *timer, 
> bool reprogram)
>   * resolution timer interrupts. On UP we just disable interrupts and
>   * call the high resolution interrupt code.
>   */
> -void clock_was_set(void)
> +void clock_was_set(bool force_reprogram)
>  {
>  #ifdef CONFIG_HIGH_RES_TIMERS
> - /* Retrigger the CPU local events everywhere */
> - on_each_cpu(retrigger_next_event, NULL, 1);
> + cpumask_var_t mask;
> + int cpu;
> +
> + if (force_reprogram == true) {
> + on_each_cpu(retrigger_next_event, NULL, 1);
> + goto set_timerfd;
> + }
> +
> + if (!zalloc_cpumask_var(, GFP_KERNEL)) {
> + on_each_cpu(retrigger_next_event, NULL, 1);
> + goto set_timerfd;
> + }

This is really horrible and the proper thing to do is to seperate the
s2idle/resume related functionality which allows to do some other
cleanups as well.

I already started to look at that over the weekend, but got stuck due to
a couple of correctness issues I found with the whole clock_was_set()
mechanism while looking. That stuff needs to be fixed first.

Back to the above. For the s2idle resume path there is no reason for an
IPI at all. It's just that way because it has been bolted on the
existing machinery.

So today it does:

   tick_unfreeze() {
 if (first_cpu_to_unfreeze()) {
timekeeping_resume();
  tick_resume();
tick_resume_local();
  clock_was_set_delayed();
 } else {
tick_resume_local();
 }
   }

Then after everything is unfrozen including workqueues the delayed
clock_was_set() runs and does the IPI dance. That's just silly.

We can be smarter than that:

   tick_unfreeze() {
 if (first_cpu_to_unfreeze())
timekeeping_resume();
  tick_resume();
tick_resume_local();
  hrtimer_resume_local();
 } else {
tick_resume_local();
  hrtimer_resume_local();
 }
   }

See? No IPI required at all and no magic force argument and as a bonus
we get also understandable code.

Splitting this properly allows to be smarter about the affected clocks
because most of the clock_was_set() events only care about
CLOCK_REALTIME and CLOCK_TAI and just the sleep time injection affects
CLOCK_BOOTTIME as well.

There are a few other things which can be done to avoid even more IPIs,
but let me address the correctness issues of clock_was_set() first.

Thanks,

tglx

Re: [syzbot] WARNING in kthread_is_per_cpu

2021-04-19 Thread Thomas Gleixner

On Mon, Apr 19 2021 at 03:36, syzbot wrote:

> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:1216f02e Add linux-next specific files for 20210415
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1032ba29d0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3491b04113499f81
> dashboard link: https://syzkaller.appspot.com/bug?extid=9362b31a2e0cad8b749d
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+9362b31a2e0cad8b7...@syzkaller.appspotmail.com
>
> [ cut here ]
> WARNING: CPU: 1 PID: 23550 at kernel/kthread.c:83 to_kthread 
> kernel/kthread.c:83 [inline]
> WARNING: CPU: 1 PID: 23550 at kernel/kthread.c:83 
> kthread_is_per_cpu+0xc4/0xf0 kernel/kthread.c:519
> Modules linked in:
> CPU: 1 PID: 23550 Comm: syz-executor.3 Not tainted 
> 5.12.0-rc7-next-20210415-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:to_kthread kernel/kthread.c:83 [inline]
> RIP: 0010:kthread_is_per_cpu+0xc4/0xf0 kernel/kthread.c:519
> Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 75 2e 4c 8b 23 41 83 e4 01 
> e8 89 d3 27 00 44 89 e0 5b 5d 41 5c c3 e8 7c d3 27 00 <0f> 0b eb 88 e8 33 90 
> 6c 00 e9 68 ff ff ff e8 39 90 6c 00 eb 9a 48
> RSP: 0018:c9dc0c08 EFLAGS: 00010046
> RAX:  RBX: 88802533d580 RCX: 0100
> RDX: 8880549bb900 RSI: 814ca4c4 RDI: 0003
> RBP:  R08:  R09: 88802533d580
> R10: 814ca44c R11: 018a3b90 R12: 0001
> R13: c9dc0d90 R14: 0001 R15: 88802533d580
> FS:  7f4be57d3700() GS:8880b9d0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 001b2cd24000 CR3: 24626000 CR4: 001506e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  
>  can_migrate_task+0x124/0x1630 kernel/sched/fair.c:7610

So this is:

if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p))

The warning in to_kthread() is:

WARN_ON(!(k->flags & PF_KTHREAD));

IOW, the p>flags lost PF_KTHREAD within at max. 50 instructions.

Magic, cosmic rays or memory corruption / stray pointer in some other
place?

Thanks,

tglx

[RESEND PATCH] drm/rockchip: dsi: move all lane config except LCDC mux to bind()

2021-04-18 Thread Thomas Hebb

When we first enable the DSI encoder, we currently program some per-chip
configuration that we look up in rk3399_chip_data based on the device
tree compatible we match. This data configures various parameters of the
MIPI lanes, including on RK3399 whether DSI1 is slaved to DSI0 in a
dual-mode configuration. It also selects which LCDC (i.e. VOP) to scan
out from.

This causes a problem in RK3399 dual-mode configurations, though: panel
prepare() callbacks run before the encoder gets enabled and expect to be
able to write commands to the DSI bus, but the bus isn't fully
functional until the lane and master/slave configuration have been
programmed. As a result, dual-mode panels (and possibly others too) fail
to turn on when the rockchipdrm driver is initially loaded.

Because the LCDC mux is the only thing we don't know until enable time
(and is the only thing that can ever change), we can actually move most
of the initialization to bind() and get it out of the way early. That's
what this change does. (Rockchip's 4.4 BSP kernel does it in mode_set(),
which also avoids the issue, but bind() seems like the more correct
place to me.)

Tested on a Google Scarlet board (Acer Chromebook Tab 10), which has a
Kingdisplay KD097D04 dual-mode panel. Prior to this change, the panel's
backlight would turn on but no image would appear when initially loading
rockchipdrm. If I kept rockchipdrm loaded and reloaded the panel driver,
it would come on. With this change, the panel successfully turns on
during initial rockchipdrm load as expected.

Fixes: 2d4f7bdafd70 ("drm/rockchip: dsi: migrate to use dw-mipi-dsi bridge 
driver")
Signed-off-by: Thomas Hebb 
---

 .../gpu/drm/rockchip/dw-mipi-dsi-rockchip.c   | 36 ++-
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index 8cc81d5b82f0..520a0a0cd2b5 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -691,13 +691,8 @@ static const struct dw_mipi_dsi_phy_ops 
dw_mipi_dsi_rockchip_phy_ops = {
.get_timing = dw_mipi_dsi_phy_get_timing,
 };
 
-static void dw_mipi_dsi_rockchip_config(struct dw_mipi_dsi_rockchip *dsi,
-   int mux)
+static void dw_mipi_dsi_rockchip_config(struct dw_mipi_dsi_rockchip *dsi)
 {
-   if (dsi->cdata->lcdsel_grf_reg)
-   regmap_write(dsi->grf_regmap, dsi->cdata->lcdsel_grf_reg,
-   mux ? dsi->cdata->lcdsel_lit : dsi->cdata->lcdsel_big);
-
if (dsi->cdata->lanecfg1_grf_reg)
regmap_write(dsi->grf_regmap, dsi->cdata->lanecfg1_grf_reg,
  dsi->cdata->lanecfg1);
@@ -711,6 +706,13 @@ static void dw_mipi_dsi_rockchip_config(struct 
dw_mipi_dsi_rockchip *dsi,
  dsi->cdata->enable);
 }
 
+static void dw_mipi_dsi_rockchip_set_lcdsel(struct dw_mipi_dsi_rockchip *dsi,
+   int mux)
+{
+   regmap_write(dsi->grf_regmap, dsi->cdata->lcdsel_grf_reg,
+   mux ? dsi->cdata->lcdsel_lit : dsi->cdata->lcdsel_big);
+}
+
 static int
 dw_mipi_dsi_encoder_atomic_check(struct drm_encoder *encoder,
 struct drm_crtc_state *crtc_state,
@@ -766,9 +768,9 @@ static void dw_mipi_dsi_encoder_enable(struct drm_encoder 
*encoder)
return;
}
 
-   dw_mipi_dsi_rockchip_config(dsi, mux);
+   dw_mipi_dsi_rockchip_set_lcdsel(dsi, mux);
if (dsi->slave)
-   dw_mipi_dsi_rockchip_config(dsi->slave, mux);
+   dw_mipi_dsi_rockchip_set_lcdsel(dsi->slave, mux);
 
clk_disable_unprepare(dsi->grf_clk);
 }
@@ -922,6 +924,24 @@ static int dw_mipi_dsi_rockchip_bind(struct device *dev,
return ret;
}
 
+   /*
+* With the GRF clock running, write lane and dual-mode configurations
+* that won't change immediately. If we waited until enable() to do
+* this, things like panel preparation would not be able to send
+* commands over DSI.
+*/
+   ret = clk_prepare_enable(dsi->grf_clk);
+   if (ret) {
+   DRM_DEV_ERROR(dsi->dev, "Failed to enable grf_clk: %d\n", ret);
+   return ret;
+   }
+
+   dw_mipi_dsi_rockchip_config(dsi);
+   if (dsi->slave)
+   dw_mipi_dsi_rockchip_config(dsi->slave);
+
+   clk_disable_unprepare(dsi->grf_clk);
+
ret = rockchip_dsi_drm_create_encoder(dsi, drm_dev);
if (ret) {
DRM_DEV_ERROR(dev, "Failed to create drm encoder\n");
-- 
2.30.0

[RESEND PATCH] drm/rockchip: dsi: remove extra component_del() call

2021-04-18 Thread Thomas Hebb

commit cf6d100dd238 ("drm/rockchip: dsi: add dual mipi support") added
this devcnt field and call to component_del(). However, these both
appear to be erroneous changes left over from an earlier version of the
patch. In the version merged, nothing ever modifies devcnt, meaning
component_del() runs unconditionally and in addition to the
component_del() calls in dw_mipi_dsi_rockchip_host_detach(). The second
call fails to delete anything and produces a warning in dmesg.

If we look at the previous version of the patch[1], however, we see that
it had logic to calculate devcnt and call component_add() in certain
situations. This was removed in v6, and the fact that the deletion code
was not appears to have been an oversight.

[1] 
https://patchwork.kernel.org/project/dri-devel/patch/20180821140515.22246-8-he...@sntech.de/

Fixes: cf6d100dd238 ("drm/rockchip: dsi: add dual mipi support")
Cc: sta...@vger.kernel.org
Signed-off-by: Thomas Hebb 
---

 drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
index 24a71091759c..8cc81d5b82f0 100644
--- a/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw-mipi-dsi-rockchip.c
@@ -243,7 +243,6 @@ struct dw_mipi_dsi_rockchip {
struct dw_mipi_dsi *dmd;
const struct rockchip_dw_dsi_chip_data *cdata;
struct dw_mipi_dsi_plat_data pdata;
-   int devcnt;
 };
 
 struct dphy_pll_parameter_map {
@@ -1121,9 +1120,6 @@ static int dw_mipi_dsi_rockchip_remove(struct 
platform_device *pdev)
 {
struct dw_mipi_dsi_rockchip *dsi = platform_get_drvdata(pdev);
 
-   if (dsi->devcnt == 0)
-   component_del(dsi->dev, _mipi_dsi_rockchip_ops);
-
dw_mipi_dsi_remove(dsi->dmd);
 
return 0;
-- 
2.30.0

Re: [PATCH] genirq/irqaction: Move dev_id and percpu_dev_id in a union

2021-04-18 Thread Thomas Gleixner

On Sat, Apr 10 2021 at 13:00, Marc Zyngier wrote:
> dev_id and percpu_dev_id are mutually exclusive in struct irqaction,
> as they conceptually represent the same thing, only in a per-cpu
> fashion.
>
> Move them into an anonymous union, saving a few bytes on the way.

The reason why they are not in an anomymous union is that any misuse of
interfaces will result in an instantaneous explosion while with your
variant it will cause hard to diagnose side effects.

I rather waste the extra 4/8 bytes unless there is a compelling reason
not to do so.

Thanks,

tglx

[PATCH v6 net-next 09/10] net: korina: Make driver COMPILE_TESTable

2021-04-18 Thread Thomas Bogendoerfer

Move structs/defines for ethernet/dma register into driver, since they
are only used for this driver and remove any MIPS specific includes.
This makes it possible to COMPILE_TEST the driver.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/Kconfig  |   2 +-
 drivers/net/ethernet/korina.c | 249 --
 2 files changed, 239 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c059b4bd3f23..453d202a28c1 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -97,7 +97,7 @@ config JME
 
 config KORINA
tristate "Korina (IDT RC32434) Ethernet support"
-   depends on MIKROTIK_RB532
+   depends on MIKROTIK_RB532 || COMPILE_TEST
select MII
help
  If you have a Mikrotik RouterBoard 500 or IDT RC32434
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 19f226428f89..4878e527e3c8 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -59,18 +59,244 @@
 #include 
 #include 
 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
 #define DRV_NAME   "korina"
 #define DRV_VERSION"0.20"
 #define DRV_RELDATE"15Sep2017"
 
+struct eth_regs {
+   u32 ethintfc;
+   u32 ethfifott;
+   u32 etharc;
+   u32 ethhash0;
+   u32 ethhash1;
+   u32 ethu0[4];   /* Reserved. */
+   u32 ethpfs;
+   u32 ethmcp;
+   u32 eth_u1[10]; /* Reserved. */
+   u32 ethspare;
+   u32 eth_u2[42]; /* Reserved. */
+   u32 ethsal0;
+   u32 ethsah0;
+   u32 ethsal1;
+   u32 ethsah1;
+   u32 ethsal2;
+   u32 ethsah2;
+   u32 ethsal3;
+   u32 ethsah3;
+   u32 ethrbc;
+   u32 ethrpc;
+   u32 ethrupc;
+   u32 ethrfc;
+   u32 ethtbc;
+   u32 ethgpf;
+   u32 eth_u9[50]; /* Reserved. */
+   u32 ethmac1;
+   u32 ethmac2;
+   u32 ethipgt;
+   u32 ethipgr;
+   u32 ethclrt;
+   u32 ethmaxf;
+   u32 eth_u10;/* Reserved. */
+   u32 ethmtest;
+   u32 miimcfg;
+   u32 miimcmd;
+   u32 miimaddr;
+   u32 miimwtd;
+   u32 miimrdd;
+   u32 miimind;
+   u32 eth_u11;/* Reserved. */
+   u32 eth_u12;/* Reserved. */
+   u32 ethcfsa0;
+   u32 ethcfsa1;
+   u32 ethcfsa2;
+};
+
+/* Ethernet interrupt registers */
+#define ETH_INT_FC_EN  BIT(0)
+#define ETH_INT_FC_ITS BIT(1)
+#define ETH_INT_FC_RIP BIT(2)
+#define ETH_INT_FC_JAM BIT(3)
+#define ETH_INT_FC_OVR BIT(4)
+#define ETH_INT_FC_UND BIT(5)
+#define ETH_INT_FC_IOC 0x00c0
+
+/* Ethernet FIFO registers */
+#define ETH_FIFI_TT_TTH_BIT0
+#define ETH_FIFO_TT_TTH0x007f
+
+/* Ethernet ARC/multicast registers */
+#define ETH_ARC_PROBIT(0)
+#define ETH_ARC_AM BIT(1)
+#define ETH_ARC_AFMBIT(2)
+#define ETH_ARC_AB BIT(3)
+
+/* Ethernet SAL registers */
+#define ETH_SAL_BYTE_5 0x00ff
+#define ETH_SAL_BYTE_4 0xff00
+#define ETH_SAL_BYTE_3 0x00ff
+#define ETH_SAL_BYTE_2 0xff00
+
+/* Ethernet SAH registers */
+#define ETH_SAH_BYTE1  0x00ff
+#define ETH_SAH_BYTE0  0xff00
+
+/* Ethernet GPF register */
+#define ETH_GPF_PTV0x
+
+/* Ethernet PFG register */
+#define ETH_PFS_PFDBIT(0)
+
+/* Ethernet CFSA[0-3] registers */
+#define ETH_CFSA0_CFSA40x00ff
+#define ETH_CFSA0_CFSA50xff00
+#define ETH_CFSA1_CFSA20x00ff
+#define ETH_CFSA1_CFSA30xff00
+#define ETH_CFSA1_CFSA00x00ff
+#define ETH_CFSA1_CFSA10xff00
+
+/* Ethernet MAC1 registers */
+#define ETH_MAC1_REBIT(0)
+#define ETH_MAC1_PAF   BIT(1)
+#define ETH_MAC1_RFC   BIT(2)
+#define ETH_MAC1_TFC   BIT(3)
+#define ETH_MAC1_LBBIT(4)
+#define ETH_MAC1_MRBIT(31)
+
+/* Ethernet MAC2 registers */
+#define ETH_MAC2_FDBIT(0)
+#define ETH_MAC2_FLC   BIT(1)
+#define ETH_MAC2_HFE   BIT(2)
+#define ETH_MAC2_DCBIT(3)
+#define ETH_MAC2_CEN   BIT(4)
+#define ETH_MAC2_PEBIT(5)
+#define ETH_MAC2_VPE   BIT(6)
+#define ETH_MAC2_APE   BIT(7)
+#define ETH_MAC2_PPE   BIT(8)
+#define ETH_MAC2_LPE   BIT(9)
+#define ETH_MAC2_NBBIT(12)
+#define ETH_MAC2_BPBIT(13)
+#define ETH_MAC2_EDBIT(14)
+
+/* Ethernet IPGT register */
+#define ETH_IPGT   0x007f
+
+/* Ethernet IPGR registers */
+#define ETH_IPGR_IPGR2 0x007f
+#define ETH_IPGR_IPGR1 0x7f00
+
+/* Ethernet CLRT registers */
+#define ETH_CLRT_MAX_RET   0x000f
+#

[PATCH v6 net-next 10/10] dt-bindings: net: korina: Add DT bindings for IDT 79RC3243x SoCs

2021-04-18 Thread Thomas Bogendoerfer

Add device tree bindings for ethernet controller integrated into
IDT 79RC3243x SoCs.

Signed-off-by: Thomas Bogendoerfer 
---
 .../bindings/net/idt,3243x-emac.yaml  | 73 +++
 1 file changed, 73 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/idt,3243x-emac.yaml

diff --git a/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml 
b/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml
new file mode 100644
index ..11ffc306dd54
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml
@@ -0,0 +1,73 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/idt,3243x-emac.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: IDT 79rc3243x Ethernet controller
+
+description: Ethernet controller integrated into IDT 79RC3243x family SoCs
+
+maintainers:
+  - Thomas Bogendoerfer 
+
+allOf:
+  - $ref: ethernet-controller.yaml#
+
+properties:
+  compatible:
+const: idt,3243x-emac
+
+  reg:
+maxItems: 3
+
+  reg-names:
+items:
+  - const: emac
+  - const: dma_rx
+  - const: dma_tx
+
+  interrupts:
+items:
+  - description: RX interrupt
+  - description: TX interrupt
+
+  interrupt-names:
+items:
+  - const: rx
+  - const: tx
+
+  clocks:
+maxItems: 1
+
+  clock-names:
+items:
+  - const: mdioclk
+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - interrupts
+  - interrupt-names
+
+additionalProperties: false
+
+examples:
+  - |
+
+ethernet@6 {
+compatible = "idt,3243x-emac";
+
+reg = <0x6 0x1>,
+  <0x4 0x14>,
+  <0x40014 0x14>;
+reg-names = "emac", "dma_rx", "dma_tx";
+
+interrupt-parent = <>;
+interrupts = <0>, <1>;
+interrupt-names = "rx", "tx";
+
+clocks = <>;
+clock-names = "mdioclk";
+};
-- 
2.29.2

[PATCH v6 net-next 08/10] net: korina: Get mdio input clock via common clock framework

2021-04-18 Thread Thomas Bogendoerfer

With device tree clock is provided via CCF. For non device tree
use a maximum clock value to not overclock the PHY. The non device
tree usage will go away after platform is converted to DT.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index a1f53d7753ae..19f226428f89 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -57,14 +57,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#include 
-#include 
 #include 
 #include 
 
@@ -146,10 +145,9 @@ struct korina_private {
struct work_struct restart_task;
struct net_device *dev;
struct device *dmadev;
+   int mii_clock_freq;
 };
 
-extern unsigned int idt_cpu_freq;
-
 static dma_addr_t korina_tx_dma(struct korina_private *lp, int idx)
 {
return lp->td_dma + (idx * sizeof(struct dma_desc));
@@ -899,8 +897,8 @@ static int korina_init(struct net_device *dev)
 
/* Management Clock Prescaler Divisor
 * Clock independent setting */
-   writel(((idt_cpu_freq) / MII_CLOCK + 1) & ~1,
-   >eth_regs->ethmcp);
+   writel(((lp->mii_clock_freq) / MII_CLOCK + 1) & ~1,
+  >eth_regs->ethmcp);
writel(0, >eth_regs->miimcfg);
 
/* don't transmit until fifo contains 48b */
@@ -1060,6 +1058,7 @@ static int korina_probe(struct platform_device *pdev)
u8 *mac_addr = dev_get_platdata(>dev);
struct korina_private *lp;
struct net_device *dev;
+   struct clk *clk;
void __iomem *p;
int rc;
 
@@ -1075,6 +1074,16 @@ static int korina_probe(struct platform_device *pdev)
else if (of_get_mac_address(pdev->dev.of_node, dev->dev_addr) < 0)
eth_hw_addr_random(dev);
 
+   clk = devm_clk_get_optional(>dev, "mdioclk");
+   if (IS_ERR(clk))
+   return PTR_ERR(clk);
+   if (clk) {
+   clk_prepare_enable(clk);
+   lp->mii_clock_freq = clk_get_rate(clk);
+   } else {
+   lp->mii_clock_freq = 2; /* max possible input clk */
+   }
+
lp->rx_irq = platform_get_irq_byname(pdev, "rx");
lp->tx_irq = platform_get_irq_byname(pdev, "tx");
 
-- 
2.29.2

[PATCH v6 net-next 06/10] net: korina: Only pass mac address via platform data

2021-04-18 Thread Thomas Bogendoerfer

Get rid of access to struct korina_device by just passing the mac
address via platform data and use drvdata for passing netdev to remove
function.

Signed-off-by: Thomas Bogendoerfer 
---
 arch/mips/rb532/devices.c |  5 +++--
 drivers/net/ethernet/korina.c | 11 ++-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/mips/rb532/devices.c b/arch/mips/rb532/devices.c
index dd34f1b32b79..5fc3c8ee4f31 100644
--- a/arch/mips/rb532/devices.c
+++ b/arch/mips/rb532/devices.c
@@ -105,6 +105,9 @@ static struct platform_device korina_dev0 = {
.name = "korina",
.resource = korina_dev0_res,
.num_resources = ARRAY_SIZE(korina_dev0_res),
+   .dev = {
+   .platform_data = _dev0_data.mac,
+   }
 };
 
 static struct resource cf_slot0_res[] = {
@@ -299,8 +302,6 @@ static int __init plat_setup_devices(void)
/* set the uart clock to the current cpu frequency */
rb532_uart_res[0].uartclk = idt_cpu_freq;
 
-   dev_set_drvdata(_dev0.dev, _dev0_data);
-
gpiod_add_lookup_table(_slot0_gpio_table);
return platform_add_devices(rb532_devs, ARRAY_SIZE(rb532_devs));
 }
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 44fad9e924ca..d6dbbdd43d7c 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -1055,7 +1055,7 @@ static const struct net_device_ops korina_netdev_ops = {
 
 static int korina_probe(struct platform_device *pdev)
 {
-   struct korina_device *bif = platform_get_drvdata(pdev);
+   u8 *mac_addr = dev_get_platdata(>dev);
struct korina_private *lp;
struct net_device *dev;
void __iomem *p;
@@ -1068,8 +1068,7 @@ static int korina_probe(struct platform_device *pdev)
SET_NETDEV_DEV(dev, >dev);
lp = netdev_priv(dev);
 
-   bif->dev = dev;
-   memcpy(dev->dev_addr, bif->mac, ETH_ALEN);
+   memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
 
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
@@ -1123,6 +1122,8 @@ static int korina_probe(struct platform_device *pdev)
lp->mii_if.phy_id_mask = 0x1f;
lp->mii_if.reg_num_mask = 0x1f;
 
+   platform_set_drvdata(pdev, dev);
+
rc = register_netdev(dev);
if (rc < 0) {
printk(KERN_ERR DRV_NAME
@@ -1140,9 +1141,9 @@ static int korina_probe(struct platform_device *pdev)
 
 static int korina_remove(struct platform_device *pdev)
 {
-   struct korina_device *bif = platform_get_drvdata(pdev);
+   struct net_device *dev = platform_get_drvdata(pdev);
 
-   unregister_netdev(bif->dev);
+   unregister_netdev(dev);
 
return 0;
 }
-- 
2.29.2

[PATCH v6 net-next 05/10] net: korina: Use DMA API

2021-04-18 Thread Thomas Bogendoerfer

Instead of messing with MIPS specific macros use DMA API for mapping
descriptors and skbs.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 158 +-
 1 file changed, 98 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 955fe3d3da06..44fad9e924ca 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -110,10 +110,15 @@ struct korina_private {
struct dma_reg __iomem *tx_dma_regs;
struct dma_desc *td_ring; /* transmit descriptor ring */
struct dma_desc *rd_ring; /* receive descriptor ring  */
+   dma_addr_t td_dma;
+   dma_addr_t rd_dma;
 
struct sk_buff *tx_skb[KORINA_NUM_TDS];
struct sk_buff *rx_skb[KORINA_NUM_RDS];
 
+   dma_addr_t rx_skb_dma[KORINA_NUM_RDS];
+   dma_addr_t tx_skb_dma[KORINA_NUM_TDS];
+
int rx_next_done;
int rx_chain_head;
int rx_chain_tail;
@@ -138,10 +143,21 @@ struct korina_private {
struct mii_if_info mii_if;
struct work_struct restart_task;
struct net_device *dev;
+   struct device *dmadev;
 };
 
 extern unsigned int idt_cpu_freq;
 
+static dma_addr_t korina_tx_dma(struct korina_private *lp, int idx)
+{
+   return lp->td_dma + (idx * sizeof(struct dma_desc));
+}
+
+static dma_addr_t korina_rx_dma(struct korina_private *lp, int idx)
+{
+   return lp->rd_dma + (idx * sizeof(struct dma_desc));
+}
+
 static inline void korina_abort_dma(struct net_device *dev,
struct dma_reg *ch)
 {
@@ -176,14 +192,17 @@ static void korina_abort_rx(struct net_device *dev)
 static int korina_send_packet(struct sk_buff *skb, struct net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
-   unsigned long flags;
-   u32 length;
u32 chain_prev, chain_next;
+   unsigned long flags;
struct dma_desc *td;
+   dma_addr_t ca;
+   u32 length;
+   int idx;
 
spin_lock_irqsave(>lock, flags);
 
-   td = >td_ring[lp->tx_chain_tail];
+   idx = lp->tx_chain_tail;
+   td = >td_ring[idx];
 
/* stop queue when full, drop pkts if queue already full */
if (lp->tx_count >= (KORINA_NUM_TDS - 2)) {
@@ -191,26 +210,26 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
 
if (lp->tx_count == (KORINA_NUM_TDS - 2))
netif_stop_queue(dev);
-   else {
-   dev->stats.tx_dropped++;
-   dev_kfree_skb_any(skb);
-   spin_unlock_irqrestore(>lock, flags);
-
-   return NETDEV_TX_OK;
-   }
+   else
+   goto drop_packet;
}
 
lp->tx_count++;
 
-   lp->tx_skb[lp->tx_chain_tail] = skb;
+   lp->tx_skb[idx] = skb;
 
length = skb->len;
-   dma_cache_wback((u32)skb->data, skb->len);
 
/* Setup the transmit descriptor. */
-   td->ca = CPHYSADDR(skb->data);
-   chain_prev = (lp->tx_chain_tail - 1) & KORINA_TDS_MASK;
-   chain_next = (lp->tx_chain_tail + 1) & KORINA_TDS_MASK;
+   ca = dma_map_single(lp->dmadev, skb->data, length, DMA_TO_DEVICE);
+   if (dma_mapping_error(lp->dmadev, ca))
+   goto drop_packet;
+
+   lp->tx_skb_dma[idx] = ca;
+   td->ca = ca;
+
+   chain_prev = (idx - 1) & KORINA_TDS_MASK;
+   chain_next = (idx + 1) & KORINA_TDS_MASK;
 
if (readl(&(lp->tx_dma_regs->dmandptr)) == 0) {
if (lp->tx_chain_status == desc_empty) {
@@ -220,8 +239,8 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
/* Move tail */
lp->tx_chain_tail = chain_next;
/* Write to NDPTR */
-   writel(CPHYSADDR(>td_ring[lp->tx_chain_head]),
-   >tx_dma_regs->dmandptr);
+   writel(korina_tx_dma(lp, lp->tx_chain_head),
+  >tx_dma_regs->dmandptr);
/* Move head to tail */
lp->tx_chain_head = lp->tx_chain_tail;
} else {
@@ -232,12 +251,12 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
lp->td_ring[chain_prev].control &=
~DMA_DESC_COF;
/* Link to prev */
-   lp->td_ring[chain_prev].link =  CPHYSADDR(td);
+   lp->td_ring[chain_prev].link = korina_tx_dma(lp, idx);
/* Move tail */
lp->tx_chain_tail = chain_next;

[PATCH v6 net-next 07/10] net: korina: Add support for device tree

2021-04-18 Thread Thomas Bogendoerfer

If there is no mac address passed via platform data try to get it via
device tree and fall back to a random mac address, if all fail.

Signed-off-by: Thomas Bogendoerfer 
---
 arch/mips/rb532/devices.c | 20 +---
 drivers/net/ethernet/korina.c | 32 +---
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/mips/rb532/devices.c b/arch/mips/rb532/devices.c
index 5fc3c8ee4f31..04684990e28e 100644
--- a/arch/mips/rb532/devices.c
+++ b/arch/mips/rb532/devices.c
@@ -58,37 +58,27 @@ EXPORT_SYMBOL(get_latch_u5);
 
 static struct resource korina_dev0_res[] = {
{
-   .name = "korina_regs",
+   .name = "emac",
.start = ETH0_BASE_ADDR,
.end = ETH0_BASE_ADDR + sizeof(struct eth_regs),
.flags = IORESOURCE_MEM,
 }, {
-   .name = "korina_rx",
+   .name = "rx",
.start = ETH0_DMA_RX_IRQ,
.end = ETH0_DMA_RX_IRQ,
.flags = IORESOURCE_IRQ
}, {
-   .name = "korina_tx",
+   .name = "tx",
.start = ETH0_DMA_TX_IRQ,
.end = ETH0_DMA_TX_IRQ,
.flags = IORESOURCE_IRQ
}, {
-   .name = "korina_ovr",
-   .start = ETH0_RX_OVR_IRQ,
-   .end = ETH0_RX_OVR_IRQ,
-   .flags = IORESOURCE_IRQ
-   }, {
-   .name = "korina_und",
-   .start = ETH0_TX_UND_IRQ,
-   .end = ETH0_TX_UND_IRQ,
-   .flags = IORESOURCE_IRQ
-   }, {
-   .name = "korina_dma_rx",
+   .name = "dma_rx",
.start = ETH0_RX_DMA_ADDR,
.end = ETH0_RX_DMA_ADDR + DMA_CHAN_OFFSET - 1,
.flags = IORESOURCE_MEM,
 }, {
-   .name = "korina_dma_tx",
+   .name = "dma_tx",
.start = ETH0_TX_DMA_ADDR,
.end = ETH0_TX_DMA_ADDR + DMA_CHAN_OFFSET - 1,
.flags = IORESOURCE_MEM,
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index d6dbbdd43d7c..a1f53d7753ae 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -43,6 +43,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -1068,26 +1070,29 @@ static int korina_probe(struct platform_device *pdev)
SET_NETDEV_DEV(dev, >dev);
lp = netdev_priv(dev);
 
-   memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
+   if (mac_addr)
+   ether_addr_copy(dev->dev_addr, mac_addr);
+   else if (of_get_mac_address(pdev->dev.of_node, dev->dev_addr) < 0)
+   eth_hw_addr_random(dev);
 
-   lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
-   lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
+   lp->rx_irq = platform_get_irq_byname(pdev, "rx");
+   lp->tx_irq = platform_get_irq_byname(pdev, "tx");
 
-   p = devm_platform_ioremap_resource_byname(pdev, "korina_regs");
+   p = devm_platform_ioremap_resource_byname(pdev, "emac");
if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap registers\n");
return -ENOMEM;
}
lp->eth_regs = p;
 
-   p = devm_platform_ioremap_resource_byname(pdev, "korina_dma_rx");
+   p = devm_platform_ioremap_resource_byname(pdev, "dma_rx");
if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap Rx DMA registers\n");
return -ENOMEM;
}
lp->rx_dma_regs = p;
 
-   p = devm_platform_ioremap_resource_byname(pdev, "korina_dma_tx");
+   p = devm_platform_ioremap_resource_byname(pdev, "dma_tx");
if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap Tx DMA registers\n");
return -ENOMEM;
@@ -1148,8 +1153,21 @@ static int korina_remove(struct platform_device *pdev)
return 0;
 }
 
+#ifdef CONFIG_OF
+static const struct of_device_id korina_match[] = {
+   {
+   .compatible = "idt,3243x-emac",
+   },
+   { }
+};
+MODULE_DEVICE_TABLE(of, korina_match);
+#endif
+
 static struct platform_driver korina_driver = {
-   .driver.name = "korina",
+   .driver = {
+   .name = "korina",
+   .of_match_table = of_match_ptr(korina_match),
+   },
.probe = korina_probe,
.remove = korina_remove,
 };
-- 
2.29.2

[PATCH v6 net-next 04/10] net: korina: Remove nested helpers

2021-04-18 Thread Thomas Bogendoerfer

Remove helpers, which are only used in one call site.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 28 +++-
 1 file changed, 3 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index c7abb4a8dd37..955fe3d3da06 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -142,12 +142,6 @@ struct korina_private {
 
 extern unsigned int idt_cpu_freq;
 
-static inline void korina_start_dma(struct dma_reg *ch, u32 dma_addr)
-{
-   writel(0, >dmandptr);
-   writel(dma_addr, >dmadptr);
-}
-
 static inline void korina_abort_dma(struct net_device *dev,
struct dma_reg *ch)
 {
@@ -164,11 +158,6 @@ static inline void korina_abort_dma(struct net_device *dev,
writel(0, >dmandptr);
 }
 
-static inline void korina_chain_dma(struct dma_reg *ch, u32 dma_addr)
-{
-   writel(dma_addr, >dmandptr);
-}
-
 static void korina_abort_tx(struct net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
@@ -183,18 +172,6 @@ static void korina_abort_rx(struct net_device *dev)
korina_abort_dma(dev, lp->rx_dma_regs);
 }
 
-static void korina_start_rx(struct korina_private *lp,
-   struct dma_desc *rd)
-{
-   korina_start_dma(lp->rx_dma_regs, CPHYSADDR(rd));
-}
-
-static void korina_chain_rx(struct korina_private *lp,
-   struct dma_desc *rd)
-{
-   korina_chain_dma(lp->rx_dma_regs, CPHYSADDR(rd));
-}
-
 /* transmit packet */
 static int korina_send_packet(struct sk_buff *skb, struct net_device *dev)
 {
@@ -463,7 +440,7 @@ static int korina_rx(struct net_device *dev, int limit)
rd->devcs = 0;
skb = lp->rx_skb[lp->rx_next_done];
rd->ca = CPHYSADDR(skb->data);
-   korina_chain_rx(lp, rd);
+   writel(CPHYSADDR(rd), >rx_dma_regs->dmandptr);
}
 
return count;
@@ -840,7 +817,8 @@ static int korina_init(struct net_device *dev)
 
writel(0, >rx_dma_regs->dmas);
/* Start Rx DMA */
-   korina_start_rx(lp, >rd_ring[0]);
+   writel(0, >rx_dma_regs->dmandptr);
+   writel(CPHYSADDR(>rd_ring[0]), >rx_dma_regs->dmadptr);
 
writel(readl(>tx_dma_regs->dmasm) &
~(DMA_STAT_FINI | DMA_STAT_ERR),
-- 
2.29.2

[PATCH v6 net-next 03/10] net: korina: Remove not needed cache flushes

2021-04-18 Thread Thomas Bogendoerfer

Descriptors are mapped uncached so there is no need to do any cache
handling for them.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index b56de01f6bb8..c7abb4a8dd37 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -231,7 +231,6 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
dma_cache_wback((u32)skb->data, skb->len);
 
/* Setup the transmit descriptor. */
-   dma_cache_inv((u32) td, sizeof(*td));
td->ca = CPHYSADDR(skb->data);
chain_prev = (lp->tx_chain_tail - 1) & KORINA_TDS_MASK;
chain_next = (lp->tx_chain_tail + 1) & KORINA_TDS_MASK;
@@ -284,7 +283,6 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
lp->tx_chain_tail = chain_next;
}
}
-   dma_cache_wback((u32) td, sizeof(*td));
 
netif_trans_update(dev);
spin_unlock_irqrestore(>lock, flags);
@@ -373,8 +371,6 @@ static int korina_rx(struct net_device *dev, int limit)
u32 devcs, pkt_len, dmas;
int count;
 
-   dma_cache_inv((u32)rd, sizeof(*rd));
-
for (count = 0; count < limit; count++) {
skb = lp->rx_skb[lp->rx_next_done];
skb_new = NULL;
@@ -453,7 +449,6 @@ static int korina_rx(struct net_device *dev, int limit)
~DMA_DESC_COD;
 
lp->rx_next_done = (lp->rx_next_done + 1) & KORINA_RDS_MASK;
-   dma_cache_wback((u32)rd, sizeof(*rd));
rd = >rd_ring[lp->rx_next_done];
writel(~DMA_STAT_DONE, >rx_dma_regs->dmas);
}
@@ -468,7 +463,6 @@ static int korina_rx(struct net_device *dev, int limit)
rd->devcs = 0;
skb = lp->rx_skb[lp->rx_next_done];
rd->ca = CPHYSADDR(skb->data);
-   dma_cache_wback((u32)rd, sizeof(*rd));
korina_chain_rx(lp, rd);
}
 
-- 
2.29.2

[PATCH v6 net-next 02/10] net: korina: Use devres functions

2021-04-18 Thread Thomas Bogendoerfer

Simplify probe/remove code by using devm_ functions.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 64 ---
 1 file changed, 21 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 1b7e1c75ed9e..b56de01f6bb8 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -105,9 +105,9 @@ enum chain_status {
 
 /* Information that need to be kept for each board. */
 struct korina_private {
-   struct eth_regs *eth_regs;
-   struct dma_reg *rx_dma_regs;
-   struct dma_reg *tx_dma_regs;
+   struct eth_regs __iomem *eth_regs;
+   struct dma_reg __iomem *rx_dma_regs;
+   struct dma_reg __iomem *tx_dma_regs;
struct dma_desc *td_ring; /* transmit descriptor ring */
struct dma_desc *rd_ring; /* receive descriptor ring  */
 
@@ -1044,10 +1044,10 @@ static int korina_probe(struct platform_device *pdev)
struct korina_device *bif = platform_get_drvdata(pdev);
struct korina_private *lp;
struct net_device *dev;
-   struct resource *r;
+   void __iomem *p;
int rc;
 
-   dev = alloc_etherdev(sizeof(struct korina_private));
+   dev = devm_alloc_etherdev(>dev, sizeof(struct korina_private));
if (!dev)
return -ENOMEM;
 
@@ -1060,36 +1060,30 @@ static int korina_probe(struct platform_device *pdev)
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_regs");
-   dev->base_addr = r->start;
-   lp->eth_regs = ioremap(r->start, resource_size(r));
-   if (!lp->eth_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_regs");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap registers\n");
-   rc = -ENXIO;
-   goto probe_err_out;
+   return -ENOMEM;
}
+   lp->eth_regs = p;
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_dma_rx");
-   lp->rx_dma_regs = ioremap(r->start, resource_size(r));
-   if (!lp->rx_dma_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_dma_rx");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap Rx DMA registers\n");
-   rc = -ENXIO;
-   goto probe_err_dma_rx;
+   return -ENOMEM;
}
+   lp->rx_dma_regs = p;
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_dma_tx");
-   lp->tx_dma_regs = ioremap(r->start, resource_size(r));
-   if (!lp->tx_dma_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_dma_tx");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap Tx DMA registers\n");
-   rc = -ENXIO;
-   goto probe_err_dma_tx;
+   return -ENOMEM;
}
+   lp->tx_dma_regs = p;
 
lp->td_ring = kmalloc(TD_RING_SIZE + RD_RING_SIZE, GFP_KERNEL);
-   if (!lp->td_ring) {
-   rc = -ENXIO;
-   goto probe_err_td_ring;
-   }
+   if (!lp->td_ring)
+   return -ENOMEM;
 
dma_cache_inv((unsigned long)(lp->td_ring),
TD_RING_SIZE + RD_RING_SIZE);
@@ -1119,7 +1113,8 @@ static int korina_probe(struct platform_device *pdev)
if (rc < 0) {
printk(KERN_ERR DRV_NAME
": cannot register net device: %d\n", rc);
-   goto probe_err_register;
+   kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
+   return rc;
}
timer_setup(>media_check_timer, korina_poll_media, 0);
 
@@ -1127,20 +1122,7 @@ static int korina_probe(struct platform_device *pdev)
 
printk(KERN_INFO "%s: " DRV_NAME "-" DRV_VERSION " " DRV_RELDATE "\n",
dev->name);
-out:
return rc;
-
-probe_err_register:
-   kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
-probe_err_td_ring:
-   iounmap(lp->tx_dma_regs);
-probe_err_dma_tx:
-   iounmap(lp->rx_dma_regs);
-probe_err_dma_rx:
-   iounmap(lp->eth_regs);
-probe_err_out:
-   free_netdev(dev);
-   goto out;
 }
 
 static int korina_remove(struct platform_device *pdev)
@@ -1148,13 +1130,9 @@ static int korina_remove(struct platform_device *pdev)
struct korina_device *bif = platform_get_drvdata(pdev);
struct korina_private *lp = netdev_priv(bif->dev);
 
-   iounmap(lp->eth_regs);
-   iounmap(lp->rx_dma_regs);
-   iounmap(lp->tx_dma_regs);
kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
 
unregister_netdev(bif->dev);
-   free_netdev(bif->dev);
 
return 0;
 }
-- 
2.29.2

[PATCH v6 net-next 01/10] net: korina: Fix MDIO functions

2021-04-18 Thread Thomas Bogendoerfer

Fixed MDIO functions to work reliable and not just by accident.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/Kconfig  |  1 +
 drivers/net/ethernet/korina.c | 56 +++
 2 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index ad04660b97b8..c059b4bd3f23 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -98,6 +98,7 @@ config JME
 config KORINA
tristate "Korina (IDT RC32434) Ethernet support"
depends on MIKROTIK_RB532
+   select MII
help
  If you have a Mikrotik RouterBoard 500 or IDT RC32434
  based system say Y. Otherwise say N.
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 925161959b9b..1b7e1c75ed9e 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -137,7 +138,6 @@ struct korina_private {
struct mii_if_info mii_if;
struct work_struct restart_task;
struct net_device *dev;
-   int phy_addr;
 };
 
 extern unsigned int idt_cpu_freq;
@@ -292,32 +292,48 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
-static int mdio_read(struct net_device *dev, int mii_id, int reg)
+static int korina_mdio_wait(struct korina_private *lp)
+{
+   u32 value;
+
+   return readl_poll_timeout_atomic(>eth_regs->miimind,
+value, value & ETH_MII_IND_BSY,
+1, 1000);
+}
+
+static int korina_mdio_read(struct net_device *dev, int phy, int reg)
 {
struct korina_private *lp = netdev_priv(dev);
int ret;
 
-   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+   ret = korina_mdio_wait(lp);
+   if (ret < 0)
+   return ret;
 
-   writel(0, >eth_regs->miimcfg);
-   writel(0, >eth_regs->miimcmd);
-   writel(mii_id | reg, >eth_regs->miimaddr);
-   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+   writel(phy << 8 | reg, >eth_regs->miimaddr);
+   writel(1, >eth_regs->miimcmd);
 
-   ret = (int)(readl(>eth_regs->miimrdd));
+   ret = korina_mdio_wait(lp);
+   if (ret < 0)
+   return ret;
+
+   if (readl(>eth_regs->miimind) & ETH_MII_IND_NV)
+   return -EINVAL;
+
+   ret = readl(>eth_regs->miimrdd);
+   writel(0, >eth_regs->miimcmd);
return ret;
 }
 
-static void mdio_write(struct net_device *dev, int mii_id, int reg, int val)
+static void korina_mdio_write(struct net_device *dev, int phy, int reg, int 
val)
 {
struct korina_private *lp = netdev_priv(dev);
 
-   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+   if (korina_mdio_wait(lp))
+   return;
 
-   writel(0, >eth_regs->miimcfg);
-   writel(1, >eth_regs->miimcmd);
-   writel(mii_id | reg, >eth_regs->miimaddr);
-   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+   writel(0, >eth_regs->miimcmd);
+   writel(phy << 8 | reg, >eth_regs->miimaddr);
writel(val, >eth_regs->miimwtd);
 }
 
@@ -643,7 +659,7 @@ static void korina_check_media(struct net_device *dev, 
unsigned int init_media)
 {
struct korina_private *lp = netdev_priv(dev);
 
-   mii_check_media(>mii_if, 0, init_media);
+   mii_check_media(>mii_if, 1, init_media);
 
if (lp->mii_if.full_duplex)
writel(readl(>eth_regs->ethmac2) | ETH_MAC2_FD,
@@ -869,12 +885,15 @@ static int korina_init(struct net_device *dev)
 * Clock independent setting */
writel(((idt_cpu_freq) / MII_CLOCK + 1) & ~1,
>eth_regs->ethmcp);
+   writel(0, >eth_regs->miimcfg);
 
/* don't transmit until fifo contains 48b */
writel(48, >eth_regs->ethfifott);
 
writel(ETH_MAC1_RE, >eth_regs->ethmac1);
 
+   korina_check_media(dev, 1);
+
napi_enable(>napi);
netif_start_queue(dev);
 
@@ -1089,11 +1108,10 @@ static int korina_probe(struct platform_device *pdev)
dev->watchdog_timeo = TX_TIMEOUT;
netif_napi_add(dev, >napi, korina_poll, NAPI_POLL_WEIGHT);
 
-   lp->phy_addr = (((lp->rx_irq == 0x2c? 1:0) << 8) | 0x05);
lp->mii_if.dev = dev;
-   lp->mii_if.mdio_read = mdio_read;
-   lp->mii_if.mdio_write = mdio_write;
-   lp->mii_if.phy_id = lp->phy_addr;
+   lp->mii_if.mdio_read = korina_mdio_read;
+   lp->mii_if.mdio_write = korina_mdio_write;
+   lp->mii_if.phy_id = 1;
lp->mii_if.phy_id_mask = 0x1f;
lp->mii_if.reg_num_mask = 0x1f;
 
-- 
2.29.2

[PATCH v6 net-next 00/10] net: Korina improvements

2021-04-18 Thread Thomas Bogendoerfer

While converting Mikrotik RB532 support to use device tree I stumbled
over the korina ethernet driver, which used way too many MIPS specific
hacks. This series cleans this all up and adds support for device tree.

Changes in v6:
 - remove korina from resource names and adapt DT binding to it
 - removed superfluous braces around of_get_mac_address

Changes in v5:
  - fixed email address in binding document, which prevented sending it

Changes in v4:
  - improve error returns in mdio_read further
  - added clock name and improved clk handling
  - fixed binding errors

Changes in v3:
  - use readl_poll_timeout_atomic in mdio_wait
  - return -ETIMEDOUT, if mdio_wait failed
  - added DT binding and changed name to idt,3243x-emac
  - fixed usage of of_get_mac_address for net-next

Changes in v2:
  - added device tree support to get rid of idt_cpu_freq
  - fixed compile test on 64bit archs
  - fixed descriptor current address handling by storing/using mapped
dma addresses (dma controller modifies current address)


Thomas Bogendoerfer (10):
  net: korina: Fix MDIO functions
  net: korina: Use devres functions
  net: korina: Remove not needed cache flushes
  net: korina: Remove nested helpers
  net: korina: Use DMA API
  net: korina: Only pass mac address via platform data
  net: korina: Add support for device tree
  net: korina: Get mdio input clock via common clock framework
  net: korina: Make driver COMPILE_TESTable
  dt-bindings: net: korina: Add DT bindings for IDT 79RC3243x SoCs

 .../bindings/net/idt,3243x-emac.yaml  |  73 +++
 arch/mips/rb532/devices.c |  25 +-
 drivers/net/ethernet/Kconfig  |   3 +-
 drivers/net/ethernet/korina.c | 603 +-
 4 files changed, 515 insertions(+), 189 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/idt,3243x-emac.yaml

-- 
2.29.2

Re: [PATCH 05/15] x86: Implement function_nocfi

2021-04-18 Thread Thomas Gleixner

On Sat, Apr 17 2021 at 17:11, Andy Lutomirski wrote:
> On Sat, Apr 17, 2021 at 4:53 PM Thomas Gleixner  wrote:
>> which works for
>>
>>   foo = function_nocfi(bar);
>
> I agree in general.  But right now, we have, in asm/proto.h:
>
> void entry_SYSCALL_64(void);
>
> and that's pure nonsense.  Depending on your point of view,
> entry_SYSCALL_64 is a symbol that resolves to an integer or it's an
> array of bytes containing instructions, but it is most definitely not
> a function void (void).  So, regardless of any CFI stuff, I propose
> that we standardize our handling of prototypes of symbols that are
> opaque to the C compiler.  Here are a couple of choices:
>
> Easy one:
>
> extern u8 entry_SYSCALL_64[];
>
> Slightly more complicated:
>
> struct opaque_symbol;
> extern struct opaque_symbol entry_SYSCALL_64;
>
> The opaque_symbol variant avoids any possible confusion over the weird
> status of arrays in C, and it's hard to misuse, since struct
> opaque_symbol is an incomplete type.

Makes sense.

Re: [PATCH 05/15] x86: Implement function_nocfi

2021-04-17 Thread Thomas Gleixner

On Sat, Apr 17 2021 at 16:19, Andy Lutomirski wrote:
> On Fri, Apr 16, 2021 at 4:40 PM Kees Cook  wrote:
>> Okay, you're saying you want __builtin_gimme_body_p() to be a constant
>> expression for the compiler, not inline asm?
>
> Yes.
>
> I admit that, in the trivial case where the asm code is *not* a
> C-ABI-compliant function, giving a type that doesn't fool the compiler
> into thinking that it might be is probably the best fix.  Maybe we
> should standardize something, e.g.:
>
> struct raw_symbol;  /* not defined anywhere */
> #define DECLARE_RAW_SYMBOL(x) struct raw_symbol x[]
>
> and then we write this:
>
> DECLARE_RAW_SYMBOL(entry_SYSCALL_64);
>
> wrmsrl(..., (unsigned long)entry_SYSCALL_64);
>
> It would be a bit nifty if we didn't need a forward declaration, but
> I'm not immediately seeing a way to do this without hacks that we'll
> probably regret;
>
> But this doesn't help the case in which the symbol is an actual
> C-callable function and we want to be able to call it, too.

The right way to solve this is that the compiler provides a builtin

 function_nocfi() +/- the naming bikeshed

which works for

  foo = function_nocfi(bar);

and

static unsigned long foo[] = {
   function_nocfi(bar1),
   function_nocfi(bar2),
};

which are pretty much the only possible 2 usecases. If the compiler
wants to have function_nocfi_in_code() and function_nocfi_const()
because it can't figure it out on it's own, then I can live with that,
but that's still several orders of magnitudes better than having to work
around it by whatever nasty macro/struct magic.

I don't care about the slightly more unreadable code, but if that
builtin has a descriptive name, then it's even useful for documentary
purposes. And it can be easily grepped for which makes it subject to
static code analysers which can help to detect abuse.

Anything which requires to come up with half baken constructs to work
around the shortcomings of the compiler are wrong to begin with.

Thanks,

tglx

Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected

2021-04-17 Thread Thomas Gleixner

On Sat, Apr 17 2021 at 15:54, Paul E. McKenney wrote:
> On Sat, Apr 17, 2021 at 02:24:23PM +0200, Thomas Gleixner wrote:
>> I so wish we could just delete all of this horror instead of making it
>> more horrible.
>
> Revisit deleting it in five years if there are no issues, whatever
> "issue" might mean in this context?  Five years should give time for
> this to propagate to the enterprise distros.

That does not help with the fact that the broken hardware will still be
around in 5 years from now. Which is what prevents me from deleting the
whole watchdog muck ...

Thanks,

tglx

Re: [PATCH] Fix set apic mode from x2apic enabled bit patch

2021-04-17 Thread Thomas Gleixner

Mike!

On Sun, Apr 18 2021 at 00:39, Thomas Gleixner wrote:
> If you can't come up with something sensible anytime soon before the
> merge window opens then I'm simply going to revert 41e2da9b5e67 and you
> can try again for the next cycle.

so I just figured out that Boris wasted his time once more to fix that
up and redo the commit in question.

Won't happen again.

Thanks,

tglx

Re: [PATCH] Fix set apic mode from x2apic enabled bit patch

2021-04-17 Thread Thomas Gleixner

Mike!

On Thu, Apr 15 2021 at 17:06, Mike Travis wrote:

I'm slowly getting tired of the fact that every patch coming from you
fails to comply with the minimal requirements of the documented
procedures.

$subject: [PATCH] Fix set apic mode from x2apic enabled bit patch

Documentation clearly states, that there has to be a subsystem
prefix. It's not rocket science to figure it out:

# git log arch/x86/kernel/apic/x2apic_uv_x.c

gives you a pretty good hint that the prefix should be:

 x86/platform/uv:

It's not that hard and it's not optional.

Also what the heck means:

 Fix set apic mode from x2apic enabled bit patch

Again documentation is very clear about what the subject line, aka git
shortlog of a patch should look like.

This sentence is just word salad and does not give any clue of what this
is about. Even you wont be able to decode this 3 month from now. Simply
because it's word salad.

I don't care what kind of standards you have @hpe, but I very much care
about the standards of the kernel.

> Do not set uv_system_type for hubless UV systems as it tricks the
> is_uv_system function into thinking it's a UV hubbed system and includes
> a UV HUB RTC.  This causes UV RTC init to panic on UV hubless systems.

And yet more word salad. Can you please describe the precise technical
problem and not use metaphors of functions which are thinking?

Aside of that this does not describe the change at all. The change is:

> - early_set_apic_mode();

but your changelog blurbs about "thinking it's an UV hubbed system".

How the heck can this make sense for someone who is not part of the @hpe
universe mindset?

> Fixes: 41e2da9b5e67 ("x86/platform/uv: Use x2apic enabled bit as set by BIOS 
> to indicate APIC mode")
>
> [41e2da9b5e67 was accepted into tip x86/platform branch but not yet
> pulled into the linus tree.]

Truly useful. The only value of that information is that the maintainer
has to manualy remove it because it's irrelevant for the final commit
message of the change which ends up on top of that commit in that
branch. You can stick this into the section below '---' if you think
it's helpful, but then maintainers and tools can just ignore it.

TBH, it's not helpful at all because tooling already tells us that
41e2da9b5e67 is not in Linus tree and only queued in tip x86/platform.

Commit SHAs are supposed to be unique for a reason...

> Signed-off-by: Mike Travis 
> Reviewed-by: Steve Wahl 
> Reviewed-by: Dimitri Sivanich 

The value of these reviewed-by tags is close to zero because they are
just documenting that the change is ok at the @hpe universe level...

> ---
>  arch/x86/kernel/apic/x2apic_uv_x.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c 
> b/arch/x86/kernel/apic/x2apic_uv_x.c
> index 2e99605f9a05..68ef9abc91f7 100644
> --- a/arch/x86/kernel/apic/x2apic_uv_x.c
> +++ b/arch/x86/kernel/apic/x2apic_uv_x.c
> @@ -413,9 +413,8 @@ static int __init uv_set_system_type(char *_oem_id, char 
> *_oem_table_id)
>   else
>   uv_hubless_system |= 0x8;
>  
> - /* Copy OEM Table ID and set APIC Mode */
> + /* Copy OEM Table ID */
>   uv_stringify(sizeof(oem_table_id), oem_table_id, _oem_table_id);
> - early_set_apic_mode();

So the patch itself tells me more about what's going on than the
changelog:

  Setting up the APIC mode at this place is wrong.

But it does not tell me WHY. Neither does the changelog because of
useless word salad...

If you can't come up with something sensible anytime soon before the
merge window opens then I'm simply going to revert 41e2da9b5e67 and you
can try again for the next cycle.

Thanks,

tglx

Re: [PATCH v5] hrtimer: avoid retrigger_next_event IPI

2021-04-17 Thread Thomas Gleixner

On Sat, Apr 17 2021 at 18:24, Thomas Gleixner wrote:
> On Fri, Apr 16 2021 at 13:13, Peter Xu wrote:
>> On Fri, Apr 16, 2021 at 01:00:23PM -0300, Marcelo Tosatti wrote:
>>>  
>>> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |   \
>>> +(1U << HRTIMER_BASE_REALTIME_SOFT) |   \
>>> +(1U << HRTIMER_BASE_TAI) | \
>>> +(1U << HRTIMER_BASE_TAI_SOFT))
>>> +
>>> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
>>> +{
>>> +   if (cpu_base->softirq_activated)
>>> +   return true;
>>
>> A pure question on whether this check is needed...
>>
>> Here even if softirq_activated==1 (as softirq is going to happen), as long as
>> (cpu_base->active_bases & CLOCK_SET_BASES)==0, shouldn't it already mean that
>> "yes indeed clock was set, but no need to kick this cpu as no relevant 
>> timer"?
>> As that question seems to be orthogonal to whether a softirq is going to
>> trigger on that cpu.
>
> That's correct and it's not any different from firing the IPI because in
> both cases the update happens with the base lock of the CPU in question
> held. And if there are no active timers in any of the affected bases,
> then there is no need to reevaluate the next expiry because the offset
> update does not affect any armed timers. It just makes sure that the
> next enqueu of a timer on such a base will see the the correct offset.
>
> I'll just zap it.

But the whole thing is still wrong in two aspects:

1) BOOTTIME can be one of the affected clocks when sleep time
   (suspended time) is injected because that uses the same mechanism.

   Sorry for missing that earlier when I asked to remove it, but
   that's trivial to fix by adding the BOOTTIME base back.

2) What's worse is that on resume this might break because that
   mechanism is also used to enforce the reprogramming of the clock
   event devices and there we cannot be selective on clock bases.

   I need to dig deeper into that because suspend/resume has changed
   a lot over time, so this might be just a historical leftover. But
   without proper analysis we might end up with subtle and hard to
   debug wreckage.

Thanks,

tglx

Re: [PATCH v5] hrtimer: avoid retrigger_next_event IPI

2021-04-17 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 13:13, Peter Xu wrote:
> On Fri, Apr 16, 2021 at 01:00:23PM -0300, Marcelo Tosatti wrote:
>>  
>> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) |\
>> + (1U << HRTIMER_BASE_REALTIME_SOFT) |   \
>> + (1U << HRTIMER_BASE_TAI) | \
>> + (1U << HRTIMER_BASE_TAI_SOFT))
>> +
>> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
>> +{
>> +if (cpu_base->softirq_activated)
>> +return true;
>
> A pure question on whether this check is needed...
>
> Here even if softirq_activated==1 (as softirq is going to happen), as long as
> (cpu_base->active_bases & CLOCK_SET_BASES)==0, shouldn't it already mean that
> "yes indeed clock was set, but no need to kick this cpu as no relevant timer"?
> As that question seems to be orthogonal to whether a softirq is going to
> trigger on that cpu.

That's correct and it's not any different from firing the IPI because in
both cases the update happens with the base lock of the CPU in question
held. And if there are no active timers in any of the affected bases,
then there is no need to reevaluate the next expiry because the offset
update does not affect any armed timers. It just makes sure that the
next enqueu of a timer on such a base will see the the correct offset.

I'll just zap it.

Thanks,

tglx

Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-17 Thread Thomas Gleixner

On Tue, Apr 13 2021 at 21:36, Paul E. McKenney wrote:

Bah, hit send too quick.

> + cpumask_clear(_ahead);
> + cpumask_clear(_behind);
> + preempt_disable();

Daft. 

> + testcpu = smp_processor_id();
> + pr_warn("Checking clocksource %s synchronization from CPU %d.\n", 
> cs->name, testcpu);
> + for_each_online_cpu(cpu) {
> + if (cpu == testcpu)
> + continue;
> + csnow_begin = cs->read(cs);
> + smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 
> 1);
> + csnow_end = cs->read(cs);

As this must run with interrupts enabled, that's a pretty rough
approximation like measuring wind speed with a wet thumb.

Wouldn't it be smarter to let the remote CPU do the watchdog dance and
take that result? i.e. split out more of the watchdog code so that you
can get the nanoseconds delta on that remote CPU to the watchdog.

> + delta = (s64)((csnow_mid - csnow_begin) & cs->mask);
> + if (delta < 0)
> + cpumask_set_cpu(cpu, _behind);
> + delta = (csnow_end - csnow_mid) & cs->mask;
> + if (delta < 0)
> + cpumask_set_cpu(cpu, _ahead);
> + delta = clocksource_delta(csnow_end, csnow_begin, cs->mask);
> + cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);

> + if (firsttime || cs_nsec > cs_nsec_max)
> + cs_nsec_max = cs_nsec;
> + if (firsttime || cs_nsec < cs_nsec_min)
> + cs_nsec_min = cs_nsec;
> + firsttime = 0;

  int64_t cs_nsec_max = 0, cs_nsec_min = LLONG_MAX;

and then the firsttime muck is not needed at all.

Thanks,

tglx

Re: [PATCH v8 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-17 Thread Thomas Gleixner

On Tue, Apr 13 2021 at 21:36, Paul E. McKenney wrote:
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 1fc0962c89c0..97eeaf164296 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
>   .read   = kvm_clock_get_cycles,
>   .rating = 400,
>   .mask   = CLOCKSOURCE_MASK(64),
> - .flags  = CLOCK_SOURCE_IS_CONTINUOUS,
> + .flags  = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,

kvm_clock is not marked with CLOCK_SOURCE_MUST_VERIFY, so what's the
point of adding this here? It's not subject to be monitored by the
watchdog muck.

> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index f70dffc2771f..56289170753c 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
>   .mask   = CLOCKSOURCE_MASK(64),
>   .flags  = CLOCK_SOURCE_IS_CONTINUOUS |
> CLOCK_SOURCE_VALID_FOR_HRES |
> -   CLOCK_SOURCE_MUST_VERIFY,
> +   CLOCK_SOURCE_MUST_VERIFY |
> +   CLOCK_SOURCE_VERIFY_PERCPU,

While this one is part of the horror show.

> +static u64 csnow_mid;
> +static cpumask_t cpus_ahead;
> +static cpumask_t cpus_behind;
> +
> +static void clocksource_verify_one_cpu(void *csin)
> +{
> + struct clocksource *cs = (struct clocksource *)csin;
> +
> + csnow_mid = cs->read(cs);
> +}
> +
> +static void clocksource_verify_percpu(struct clocksource *cs)
> +{
> + int64_t cs_nsec, cs_nsec_max, cs_nsec_min;
> + u64 csnow_begin, csnow_end;
> + bool firsttime = 1;
> + int testcpu;
> + s64 delta;
> + int cpu;

int testcpu, cpu; :)

Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected

2021-04-17 Thread Thomas Gleixner

On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
>  #define WATCHDOG_INTERVAL (HZ >> 1)
>  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
> +#define WATCHDOG_MAX_SKEW (NSEC_PER_SEC >> 6)

That's ~15ms which is a tad large I'd say...
  
>  static void clocksource_watchdog_work(struct work_struct *work)
>  {
> @@ -213,9 +214,10 @@ static void clocksource_watchdog_inject_delay(void)
>  static void clocksource_watchdog(struct timer_list *unused)
>  {
>   struct clocksource *cs;
> - u64 csnow, wdnow, cslast, wdlast, delta;
> - int64_t wd_nsec, cs_nsec;
> + u64 csnow, wdnow, wdagain, cslast, wdlast, delta;
> + int64_t wd_nsec, wdagain_delta, wderr_nsec = 0, cs_nsec;
>   int next_cpu, reset_pending;
> + int nretries;
>  
>   spin_lock(_lock);
>   if (!watchdog_running)
> @@ -224,6 +226,7 @@ static void clocksource_watchdog(struct timer_list 
> *unused)
>   reset_pending = atomic_read(_reset_pending);
>  
>   list_for_each_entry(cs, _list, wd_list) {
> + nretries = 0;
>  
>   /* Clocksource already marked unstable? */
>   if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
> @@ -232,11 +235,24 @@ static void clocksource_watchdog(struct timer_list 
> *unused)
>   continue;
>   }
>  
> +retry:
>   local_irq_disable();
> - csnow = cs->read(cs);
> - clocksource_watchdog_inject_delay();
>   wdnow = watchdog->read(watchdog);
> + clocksource_watchdog_inject_delay();
> + csnow = cs->read(cs);
> + wdagain = watchdog->read(watchdog);
>   local_irq_enable();
> + delta = clocksource_delta(wdagain, wdnow, watchdog->mask);
> + wdagain_delta = clocksource_cyc2ns(delta, watchdog->mult, 
> watchdog->shift);
> + if (wdagain_delta > WATCHDOG_MAX_SKEW) {
> + wderr_nsec = wdagain_delta;
> + if (nretries++ < max_read_retries)
> + goto retry;
> + }
> + if (nretries) {
> + pr_warn("timekeeping watchdog on CPU%d: %s read-back 
> delay of %lldns, attempt %d\n",
> + smp_processor_id(), watchdog->name, wderr_nsec, 
> nretries);
> + }
>  
>   /* Clocksource initialized ? */
>   if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||

This can nicely be split out into a read function which avoids brain
overload when reading. Something like the uncompiled below.

I so wish we could just delete all of this horror instead of making it
more horrible.

Thanks,

tglx
---
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -124,6 +124,12 @@ static void __clocksource_change_rating(
 #define WATCHDOG_INTERVAL (HZ >> 1)
 #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
 
+/*
+ * The maximum delay between two consecutive readouts of the watchdog
+ * clocksource to detect SMI,NMI,vCPU preemption.
+ */
+#define WATCHDOG_MAX_DELAY (100 * NSEC_PER_USEC)
+
 static void clocksource_watchdog_work(struct work_struct *work)
 {
/*
@@ -184,12 +190,37 @@ void clocksource_mark_unstable(struct cl
spin_unlock_irqrestore(_lock, flags);
 }
 
+static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
+{
+   unsigned int nretries;
+   u64 wd_end, wd_delta;
+   int64_t wd_delay;
+
+   for (nretries = 0; nretries < max_read_retries; nretries++) {
+   local_irq_disable();
+   *wdnow = watchdog->read(watchdog);
+   clocksource_watchdog_inject_delay();
+   *csnow = cs->read(cs);
+   wd_end = watchdog->read(watchdog);
+   local_irq_enable();
+
+   wd_delta = clocksource_delta(wd_end, *wdnow, watchdog->mask);
+   wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, 
watchdog->shift);
+   if (wd_delay < WATCHDOG_MAX_DELAY)
+   return true;
+   }
+
+   pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, 
%d attempts\n",
+   smp_processor_id(), watchdog->name, wd_delay, nretries);
+   return false;
+}
+
 static void clocksource_watchdog(struct timer_list *unused)
 {
-   struct clocksource *cs;
u64 csnow, wdnow, cslast, wdlast, delta;
-   int64_t wd_nsec, cs_nsec;
int next_cpu, reset_pending;
+   int64_t wd_nsec, cs_nsec;
+   struct clocksource *cs;
 
spin_lock(_lock);
if (!watchdog_running)
@@ -206,10 +237,14 @@ static void clocksource_watchdog(struct
continue;
}
 
-   local_irq_disable();
-   csnow = cs->read(cs);
-   wdnow = watchdog->read(watchdog);
-   local_irq_enable();
+   if (!cs_watchdog_read(cs, , )) {
+   /*
+* No point to continue if the watchdog

Re: [PATCH 05/15] x86: Implement function_nocfi

2021-04-17 Thread Thomas Gleixner

On Sat, Apr 17 2021 at 01:02, Thomas Gleixner wrote:
> On Fri, Apr 16 2021 at 15:37, Kees Cook wrote:
>
>> On Fri, Apr 16, 2021 at 03:20:17PM -0700, Andy Lutomirski wrote:
>>> But obviously there is code that needs real function pointers.  How
>>> about making this a first-class feature, or at least hacking around it
>>> more cleanly.  For example, what does this do:
>>> 
>>> char entry_whatever[];
>>> wrmsrl(..., (unsigned long)entry_whatever);
>>
>> This is just casting. It'll still resolve to the jump table entry.
>>
>>> or, alternatively,
>>> 
>>> extern void func() __attribute__((nocfi));
>>
>> __nocfi says func() should not perform checking of correct jump table
>> membership for indirect calls.
>>
>> But we don't want a global marking for a function to be ignored by CFI;
>> we don't want functions to escape CFI -- we want specific _users_ to
>> either not check CFI for indirect calls (__nocfi) or we want specific
>> passed addresses to avoid going through the jump table
>> (function_nocfi()).
>
> And that's why you mark entire files to be exempt without any rationale
> why it makes sense.

The reason why you have to do that is because function_nocfi() is not
provided by the compiler.

So you need to hack around that with that macro which fails to work
e.g. for the idt data arrays.

Is there any fundamental reason why the compiler does not provide that
in a form which allows to use it everywhere?

It's not too much asked from a tool which provides new functionality to
provide it in a way which is usable.

Thanks,

tglx

Re: [PATCH 04/15] static_call: Use global functions for the self-test

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 23:37, Thomas Gleixner wrote:
> On Fri, Apr 16 2021 at 13:38, Sami Tolvanen wrote:
>>  #ifdef CONFIG_STATIC_CALL_SELFTEST
>>  
>> -static int func_a(int x)
>> +int func_a(int x)
>>  {
>>  return x+1;
>>  }
>>  
>> -static int func_b(int x)
>> +int func_b(int x)
>>  {
>>  return x+2;
>>  }
>
> Did you even compile that?
>
> Global functions without a prototype are generating warnings, but we can
> ignore them just because of sekurity, right?
>
> Aside of that polluting the global namespace with func_a/b just to work
> around a tool shortcoming is beyond hillarious.
>
> Fix the tool not the perfectly correct code.

That said, I wouldn't mind a  __dont_dare_to_rename annotation to help
the compiler, but anything else is just wrong.

Thanks,

tglx

Re: [PATCH 06/15] x86: Avoid CFI jump tables in IDT and entry points

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 16:56, Kees Cook wrote:
> On Sat, Apr 17, 2021 at 12:26:56AM +0200, Thomas Gleixner wrote:
>> Where is the analysis why excluding 
>> 
>> > +CFLAGS_REMOVE_idt.o   := $(CC_FLAGS_CFI)
>> > +CFLAGS_REMOVE_paravirt.o  := $(CC_FLAGS_CFI)
>> 
>> all of idt.c and paravirt.c is correct and how that is going to be
>> correct in the future?
>> 
>> These files are excluded from CFI, so I can add whatever I want to them
>> and circumvent the purpose of CFI, right?
>> 
>> Brilliant plan that. But I know, sekurity ...
>
> *sigh* we're on the same side. :P I will choose to understand your
> comments here as:
>
> "How will enforcement of CFI policy be correctly maintained here if
> the justification for disabling it for whole compilation units is not
> clearly understandable by other developers not familiar with the nuances
> of its application?"

Plus, if there is a justification for disabling it for a whole
compilation unit:

 Where is the tooling which makes sure that this compilation unit is not
 later on filled with code which should be subject to CFI?

Thanks,

tglx

Re: [PATCH 05/15] x86: Implement function_nocfi

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 15:37, Kees Cook wrote:

> On Fri, Apr 16, 2021 at 03:20:17PM -0700, Andy Lutomirski wrote:
>> But obviously there is code that needs real function pointers.  How
>> about making this a first-class feature, or at least hacking around it
>> more cleanly.  For example, what does this do:
>> 
>> char entry_whatever[];
>> wrmsrl(..., (unsigned long)entry_whatever);
>
> This is just casting. It'll still resolve to the jump table entry.
>
>> or, alternatively,
>> 
>> extern void func() __attribute__((nocfi));
>
> __nocfi says func() should not perform checking of correct jump table
> membership for indirect calls.
>
> But we don't want a global marking for a function to be ignored by CFI;
> we don't want functions to escape CFI -- we want specific _users_ to
> either not check CFI for indirect calls (__nocfi) or we want specific
> passed addresses to avoid going through the jump table
> (function_nocfi()).

And that's why you mark entire files to be exempt without any rationale
why it makes sense.

Thanks,

tglx

Re: [PATCH 06/15] x86: Avoid CFI jump tables in IDT and entry points

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 13:38, Sami Tolvanen wrote:
> With CONFIG_CFI_CLANG, the compiler replaces function addresses in C
> code with jump table addresses.

Fine.

> To avoid referring to jump tables in entry code with PTI,

What has this to do with PTI?

> disable CFI for IDT and paravirt code, and use function_nocfi() to
> prevent jump table addresses from being added to the IDT or system
> call entry points.

How does this changelog make sense for anyone not familiar with the
matter at hand?

Where is the analysis why excluding 

> +CFLAGS_REMOVE_idt.o  := $(CC_FLAGS_CFI)
> +CFLAGS_REMOVE_paravirt.o := $(CC_FLAGS_CFI)

all of idt.c and paravirt.c is correct and how that is going to be
correct in the future?

These files are excluded from CFI, so I can add whatever I want to them
and circumvent the purpose of CFI, right?

Brilliant plan that. But I know, sekurity ...

Thanks,

tglx

Re: [PATCH 05/15] x86: Implement function_nocfi

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 14:49, Sami Tolvanen wrote:
> On Fri, Apr 16, 2021 at 2:18 PM Borislav Petkov  wrote:
>> In file included from ./include/linux/ftrace.h:22:0,
>>  from ./include/linux/init_task.h:9,
>>  from init/init_task.c:2:
>> ./include/linux/ftrace.h: In function ‘ftrace_init_nop’:
>> ./arch/x86/include/asm/ftrace.h:9:40: error: implicit declaration of 
>> function ‘function_nocfi’ [-Werror=implicit-function-declaration]
>
> This is defined in linux-next, but I do see another issue, which I'll
> fix in v2. Note that CFI_CLANG itself cannot be selected on 32-bit
> x86.

Sure and because of that it's overrated to make sure that it does not
break the build. I know, sekurity ...

But aside of that when looking at the rest of the series, then I really
have to ask whether the only way to address this is to make a large
amount of code unreadable like this:

-   wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
+   wrmsrl(MSR_LSTAR, (unsigned long)function_nocfi(entry_SYSCALL_64));

plus a gazillion of similar changes.

This is unreadable garbage.

Thanks,

tglx

Re: [PATCH v2] X86: Makefile: Replace -pg with CC_FLAGS_FTRACE

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 13:39, zhaoxiao wrote:
> In preparation for x86 supporting ftrace built on other compiler
> options, let's have the x86 Makefiles remove the $(CC_FLAGS_FTRACE)
> flags, whatever these may be, rather than assuming '-pg'.

s/let's have the/make the/

> There should be no functional change as a result of this patch.

Should be? Either you know that there is no functional change or not. If
you're unsure why are you posting it at all?

So this wants to be:

   No functional change.

Other than that and the fact that this is incomplete as it fails to
catch _all_ instances of -pg in arch/x86 and also leaves stale comments
around I'm ok with the intent.

# git grep '\-pg' arch/x86/
arch/x86/entry/vdso/Makefile:# vDSO code runs in userspace and -pg doesn't help 
with profiling anyway.
arch/x86/kernel/ftrace_64.S: * gcc -pg option adds a call to 'mcount' in most 
functions.
arch/x86/purgatory/Makefile:# Default KBUILD_CFLAGS can have -pg option set 
when FTRACE is enabled. That
arch/x86/um/vdso/Makefile:# vDSO code runs in userspace and -pg doesn't help 
with profiling anyway.
arch/x86/um/vdso/Makefile:CFLAGS_REMOVE_vdso-note.o = -pg -fprofile-arcs 
-ftest-coverage
arch/x86/um/vdso/Makefile:CFLAGS_REMOVE_um_vdso.o = -pg -fprofile-arcs 
-ftest-coverage

grep is not rocket science ...

Thanks,

tglx

Re: [patch] x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 17:13, Mike Galbraith wrote:
> On Fri, 2021-04-16 at 16:44 +0200, Borislav Petkov wrote:
>> On Fri, Apr 16, 2021 at 03:16:07PM +0200, Mike Galbraith wrote:
>> > On Fri, 2021-04-16 at 14:16 +0200, Borislav Petkov wrote:
>> > >
>> > > Please be more verbose and structure your commit message like this:
>> >
>> > Hrmph, I thought it was too verbose for dinky one-liner if anything.
>>
>> Please look at how other commit messages in tip have free text - not
>> only tools output.
>>
>> Also, this looks like a fix for some previous commit. Please dig out
>> which commit introduced the issue and put its commit ID in a Fixes: tag
>> above your S-o-B.
>>
>> If you don't have time or desire to do that, you can say so and I'll do
>> it myself when I get a chance.
>
> Ok, bin it for the nonce.

Can all of you involved stop this sandpit fight and do something useful
to fix that obvious bug already?

OMG

Re: [PATCH 04/15] static_call: Use global functions for the self-test

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 13:38, Sami Tolvanen wrote:

> With CONFIG_CFI_CLANG, the compiler renames static functions. This
> breaks static_call users using static functions, because the current
> implementation assumes functions have stable names by hardcoding them
> in inline assembly. Make the self-test functions global to prevent the
> compiler from renaming them.

Sorry, no.

> Signed-off-by: Sami Tolvanen 
> ---
>  kernel/static_call.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/static_call.c b/kernel/static_call.c
> index 723fcc9d20db..d09f500c2d2a 100644
> --- a/kernel/static_call.c
> +++ b/kernel/static_call.c
> @@ -503,12 +503,12 @@ long __static_call_return0(void)
>  
>  #ifdef CONFIG_STATIC_CALL_SELFTEST
>  
> -static int func_a(int x)
> +int func_a(int x)
>  {
>   return x+1;
>  }
>  
> -static int func_b(int x)
> +int func_b(int x)
>  {
>   return x+2;
>  }

Did you even compile that?

Global functions without a prototype are generating warnings, but we can
ignore them just because of sekurity, right?

Aside of that polluting the global namespace with func_a/b just to work
around a tool shortcoming is beyond hillarious.

Fix the tool not the perfectly correct code.

Thanks,

tglx

Re: [PATCH v8 clocksource 2/5] clocksource: Retry clock read if long delays detected

2021-04-16 Thread Thomas Gleixner

On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
>  #define WATCHDOG_INTERVAL (HZ >> 1)
>  #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)

Didn't we discuss that the threshold is too big ?

Re: [PATCH v8 clocksource 1/5] clocksource: Provide module parameters to inject delays in watchdog

2021-04-16 Thread Thomas Gleixner

On Tue, Apr 13 2021 at 21:35, Paul E. McKenney wrote:
>  
> +static int inject_delay_freq;
> +module_param(inject_delay_freq, int, 0644);
> +static int inject_delay_run = 1;
> +module_param(inject_delay_run, int, 0644);

int? Can't we just make them 'unsigned int'? Negative values are not
that useful.

> +static int max_read_retries = 3;
> +module_param(max_read_retries, int, 0644);

max_read_retries is unused here. Should be in the patch which actually
uses it.

> +static void clocksource_watchdog_inject_delay(void)
> +{
> + int i;
> + static int injectfail = -1;
> +
> + if (inject_delay_freq <= 0 || inject_delay_run <= 0)
> + return;
> + if (injectfail < 0 || injectfail > INT_MAX / 2)
> + injectfail = inject_delay_run;
> + if (!(++injectfail / inject_delay_run % inject_delay_freq)) {

Operator precedence based cleverness is really easy to parse - NOT!

> + pr_warn("%s(): Injecting delay.\n", __func__);
> + for (i = 0; i < 2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC; i++)
> + udelay(1000);
> + pr_warn("%s(): Done injecting delay.\n", __func__);
> + }
> +
> + WARN_ON_ONCE(injectfail < 0);
> +}

Brain melt stage reached by now.

static unsigned int invocations, injections;

if (!inject_delay_period || !inject_delay_repeat)
return;

if (!(invocations % inject_delay_period)) {
mdelay(2 * WATCHDOG_THRESHOLD / NSEC_PER_MSEC);
if (++injections < inject_delay_repeat)
return;
injections = 0;
}

invocations++;
}

Hmm?

Thanks,

tglx

Re: [PATCH v5 net-next 10/10] dt-bindings: net: korina: Add DT bindings for IDT 79RC3243x SoCs

2021-04-16 Thread Thomas Bogendoerfer

On Fri, Apr 16, 2021 at 12:29:46PM +0300, Sergei Shtylyov wrote:
> On 16.04.2021 11:52, Thomas Bogendoerfer wrote:
> 
> > Add device tree bindings for ethernet controller integrated into
> > IDT 79RC3243x SoCs.
> > 
> > Signed-off-by: Thomas Bogendoerfer 
> > ---
> >   .../bindings/net/idt,3243x-emac.yaml  | 74 +++
> >   1 file changed, 74 insertions(+)
> >   create mode 100644 
> > Documentation/devicetree/bindings/net/idt,3243x-emac.yaml
> > 
> > diff --git a/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml 
> > b/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml
> > new file mode 100644
> > index ..3697af5cb66f
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml
> > @@ -0,0 +1,74 @@
> > +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/net/idt,3243x-emac.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: IDT 79rc3243x Ethernet controller
> > +
> > +description: Ethernet controller integrated into IDT 79RC3243x family SoCs
> > +
> > +maintainers:
> > +  - Thomas Bogendoerfer 
> > +
> > +allOf:
> > +  - $ref: ethernet-controller.yaml#
> > +
> > +properties:
> > +  compatible:
> > +const: idt,3243x-emac
> > +
> > +  reg:
> > +maxItems: 3
> > +
> > +  reg-names:
> > +items:
> > +  - const: korina_regs
> > +  - const: korina_dma_rx
> > +  - const: korina_dma_tx
> > +
> > +  interrupts:
> > +items:
> > +  - description: RX interrupt
> > +  - description: TX interrupt
> > +
> > +  interrupt-names:
> > +items:
> > +  - const: korina_rx
> > +  - const: korina_tx
> > +
> > +  clocks:
> > +maxItems: 1
> > +
> > +  clock-names:
> > +items:
> > +  - const: mdioclk
> > +
> > +required:
> > +  - compatible
> > +  - reg
> > +  - reg-names
> > +  - interrupts
> > +  - interrupt-names
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > +  - |
> > +
> > +ethernet@6 {
> > +compatible = "idt,3243x-emac";
> > +
> > +reg = <0x6 0x1>,
> > +  <0x4 0x14>,
> > +  <0x40014 0x14>;
> > +reg-names = "korina_regs",
> > +    "korina_dma_rx",
> > +"korina_dma_tx";
> > +
> > +interrupts-extended = < 0>, < 1>;
> 
>You use this prop, yet don't describe it?

that's just interrupt-parent and interrupts in one statement. And since
make dt_binding_check didn't complained I thought that's good this way.
Rob, do I need to describe interrupts-extended as well ?

I could change that to interrupt-parent/interrupts as the driver no
longer uses dma under/overrun interrupts, which have a different
interrupt-parent.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

[PATCH] ccp: ccp - add support for Green Sardine

2021-04-16 Thread Rijo Thomas

From: Devaraj Rangasamy 

Add a new PCI device entry for Green Sardine APU.

Signed-off-by: Devaraj Rangasamy 
Tested-by: Babulu Ellune 
Signed-off-by: Rijo Thomas 
---
 drivers/crypto/ccp/sp-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/ccp/sp-pci.c b/drivers/crypto/ccp/sp-pci.c
index f471dbaef1fb..f468594ef8af 100644
--- a/drivers/crypto/ccp/sp-pci.c
+++ b/drivers/crypto/ccp/sp-pci.c
@@ -356,6 +356,7 @@ static const struct pci_device_id sp_pci_table[] = {
{ PCI_VDEVICE(AMD, 0x1468), (kernel_ulong_t)_vdata[2] },
{ PCI_VDEVICE(AMD, 0x1486), (kernel_ulong_t)_vdata[3] },
{ PCI_VDEVICE(AMD, 0x15DF), (kernel_ulong_t)_vdata[4] },
+   { PCI_VDEVICE(AMD, 0x1649), (kernel_ulong_t)_vdata[4] },
/* Last entry must be zero */
{ 0, }
 };
-- 
2.17.1

Re: [PATCH] kernel:irq:manage: request threaded irq with a specified priority

2021-04-16 Thread Thomas Gleixner

On Fri, Apr 16 2021 at 12:57, chensong wrote:
> On 2021/4/13 下午4:39, Thomas Gleixner wrote:
>> It breaks because the system designer failed to assign proper priorities
>> to the irq threads int_a, int_b and to the user space process task_a.
>
> yes, it's designers' responsibility to assign proper priorities, but 
> kernel is also obliged to provide interfaces for those designers.

The interface exists. sched_setscheduler(2)

> chrt can help designers in this case, however, the truth is lot of 
> customers are not familiar with it.

The truth is that real-time systems need to be carefully designed and
parametrized. And that's only possible when _all_ of the system
components and their constraints are known. Trying to solve it at the
device driver level of a single device is impossible and a guarantee for
fail.

If the customer does not know how to do it, then I really have to ask
why he's trying to use a real-time system at all. There is no real-time
system which magically tunes itself by pulling the overall system
constraints out of thin air.

> what's more, chrt can also apply to userspace rt task, but userspace
> also has sched_setscheduler to assgin proper priority inside code like
> cyclictest, why can't driver writers have another choice?

There is a very simple reason: The driver writer cannot know about the
requirements of the complete system which is composed of kernel, drivers
and user space applications, unless the driver writer is fully aware of
the overall system design and constraints.

How is that supposed to work on a general purpose kernel which is
utilized for a gazillion of different use cases which all have different
expectations?

It simply cannot work because default A will only work for usecase A and
be completely wrong for all others.

> Further, what if irq handlear thread has to run on the expected priority 
> at the very beginning? This patch helps.

There is no such thing as the expected priority of an interrupt thread
which can be applied upfront.

There are ~5400 instances of request*irq() in the kernel source and
there is no way to make priority decisions for them which work for every
RT system out there.

The kernel sets a default and the system designer, admin, user has to
take care of tuning it to match the expectations and constraints of his
particular application scenario.

The kernel provides an userspace interface to do that. That interface
might be a bit awkward to use, but there are tools out there which help
with that, and if at all we can think about providing a better and
easier to use interface for this.

Trying to solve that at the kernel level is putting the cart before the
horse.

Thanks,

tglx

[PATCH v5 net-next 09/10] net: korina: Make driver COMPILE_TESTable

2021-04-16 Thread Thomas Bogendoerfer

Move structs/defines for ethernet/dma register into driver, since they
are only used for this driver and remove any MIPS specific includes.
This makes it possible to COMPILE_TEST the driver.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/Kconfig  |   2 +-
 drivers/net/ethernet/korina.c | 249 --
 2 files changed, 239 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c059b4bd3f23..453d202a28c1 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -97,7 +97,7 @@ config JME
 
 config KORINA
tristate "Korina (IDT RC32434) Ethernet support"
-   depends on MIKROTIK_RB532
+   depends on MIKROTIK_RB532 || COMPILE_TEST
select MII
help
  If you have a Mikrotik RouterBoard 500 or IDT RC32434
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index e95c8d87d893..5f39a5bba531 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -59,18 +59,244 @@
 #include 
 #include 
 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
 #define DRV_NAME   "korina"
 #define DRV_VERSION"0.20"
 #define DRV_RELDATE"15Sep2017"
 
+struct eth_regs {
+   u32 ethintfc;
+   u32 ethfifott;
+   u32 etharc;
+   u32 ethhash0;
+   u32 ethhash1;
+   u32 ethu0[4];   /* Reserved. */
+   u32 ethpfs;
+   u32 ethmcp;
+   u32 eth_u1[10]; /* Reserved. */
+   u32 ethspare;
+   u32 eth_u2[42]; /* Reserved. */
+   u32 ethsal0;
+   u32 ethsah0;
+   u32 ethsal1;
+   u32 ethsah1;
+   u32 ethsal2;
+   u32 ethsah2;
+   u32 ethsal3;
+   u32 ethsah3;
+   u32 ethrbc;
+   u32 ethrpc;
+   u32 ethrupc;
+   u32 ethrfc;
+   u32 ethtbc;
+   u32 ethgpf;
+   u32 eth_u9[50]; /* Reserved. */
+   u32 ethmac1;
+   u32 ethmac2;
+   u32 ethipgt;
+   u32 ethipgr;
+   u32 ethclrt;
+   u32 ethmaxf;
+   u32 eth_u10;/* Reserved. */
+   u32 ethmtest;
+   u32 miimcfg;
+   u32 miimcmd;
+   u32 miimaddr;
+   u32 miimwtd;
+   u32 miimrdd;
+   u32 miimind;
+   u32 eth_u11;/* Reserved. */
+   u32 eth_u12;/* Reserved. */
+   u32 ethcfsa0;
+   u32 ethcfsa1;
+   u32 ethcfsa2;
+};
+
+/* Ethernet interrupt registers */
+#define ETH_INT_FC_EN  BIT(0)
+#define ETH_INT_FC_ITS BIT(1)
+#define ETH_INT_FC_RIP BIT(2)
+#define ETH_INT_FC_JAM BIT(3)
+#define ETH_INT_FC_OVR BIT(4)
+#define ETH_INT_FC_UND BIT(5)
+#define ETH_INT_FC_IOC 0x00c0
+
+/* Ethernet FIFO registers */
+#define ETH_FIFI_TT_TTH_BIT0
+#define ETH_FIFO_TT_TTH0x007f
+
+/* Ethernet ARC/multicast registers */
+#define ETH_ARC_PROBIT(0)
+#define ETH_ARC_AM BIT(1)
+#define ETH_ARC_AFMBIT(2)
+#define ETH_ARC_AB BIT(3)
+
+/* Ethernet SAL registers */
+#define ETH_SAL_BYTE_5 0x00ff
+#define ETH_SAL_BYTE_4 0xff00
+#define ETH_SAL_BYTE_3 0x00ff
+#define ETH_SAL_BYTE_2 0xff00
+
+/* Ethernet SAH registers */
+#define ETH_SAH_BYTE1  0x00ff
+#define ETH_SAH_BYTE0  0xff00
+
+/* Ethernet GPF register */
+#define ETH_GPF_PTV0x
+
+/* Ethernet PFG register */
+#define ETH_PFS_PFDBIT(0)
+
+/* Ethernet CFSA[0-3] registers */
+#define ETH_CFSA0_CFSA40x00ff
+#define ETH_CFSA0_CFSA50xff00
+#define ETH_CFSA1_CFSA20x00ff
+#define ETH_CFSA1_CFSA30xff00
+#define ETH_CFSA1_CFSA00x00ff
+#define ETH_CFSA1_CFSA10xff00
+
+/* Ethernet MAC1 registers */
+#define ETH_MAC1_REBIT(0)
+#define ETH_MAC1_PAF   BIT(1)
+#define ETH_MAC1_RFC   BIT(2)
+#define ETH_MAC1_TFC   BIT(3)
+#define ETH_MAC1_LBBIT(4)
+#define ETH_MAC1_MRBIT(31)
+
+/* Ethernet MAC2 registers */
+#define ETH_MAC2_FDBIT(0)
+#define ETH_MAC2_FLC   BIT(1)
+#define ETH_MAC2_HFE   BIT(2)
+#define ETH_MAC2_DCBIT(3)
+#define ETH_MAC2_CEN   BIT(4)
+#define ETH_MAC2_PEBIT(5)
+#define ETH_MAC2_VPE   BIT(6)
+#define ETH_MAC2_APE   BIT(7)
+#define ETH_MAC2_PPE   BIT(8)
+#define ETH_MAC2_LPE   BIT(9)
+#define ETH_MAC2_NBBIT(12)
+#define ETH_MAC2_BPBIT(13)
+#define ETH_MAC2_EDBIT(14)
+
+/* Ethernet IPGT register */
+#define ETH_IPGT   0x007f
+
+/* Ethernet IPGR registers */
+#define ETH_IPGR_IPGR2 0x007f
+#define ETH_IPGR_IPGR1 0x7f00
+
+/* Ethernet CLRT registers */
+#define ETH_CLRT_MAX_RET   0x000f
+#

[PATCH v5 net-next 10/10] dt-bindings: net: korina: Add DT bindings for IDT 79RC3243x SoCs

2021-04-16 Thread Thomas Bogendoerfer

Add device tree bindings for ethernet controller integrated into
IDT 79RC3243x SoCs.

Signed-off-by: Thomas Bogendoerfer 
---
 .../bindings/net/idt,3243x-emac.yaml  | 74 +++
 1 file changed, 74 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/idt,3243x-emac.yaml

diff --git a/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml 
b/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml
new file mode 100644
index ..3697af5cb66f
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/idt,3243x-emac.yaml
@@ -0,0 +1,74 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/idt,3243x-emac.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: IDT 79rc3243x Ethernet controller
+
+description: Ethernet controller integrated into IDT 79RC3243x family SoCs
+
+maintainers:
+  - Thomas Bogendoerfer 
+
+allOf:
+  - $ref: ethernet-controller.yaml#
+
+properties:
+  compatible:
+const: idt,3243x-emac
+
+  reg:
+maxItems: 3
+
+  reg-names:
+items:
+  - const: korina_regs
+  - const: korina_dma_rx
+  - const: korina_dma_tx
+
+  interrupts:
+items:
+  - description: RX interrupt
+  - description: TX interrupt
+
+  interrupt-names:
+items:
+  - const: korina_rx
+  - const: korina_tx
+
+  clocks:
+maxItems: 1
+
+  clock-names:
+items:
+  - const: mdioclk
+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - interrupts
+  - interrupt-names
+
+additionalProperties: false
+
+examples:
+  - |
+
+ethernet@6 {
+compatible = "idt,3243x-emac";
+
+reg = <0x6 0x1>,
+  <0x4 0x14>,
+  <0x40014 0x14>;
+reg-names = "korina_regs",
+"korina_dma_rx",
+"korina_dma_tx";
+
+interrupts-extended = < 0>, < 1>;
+interrupt-names = "korina_rx", "korina_tx";
+
+clocks = <>;
+clock-names = "mdioclk";
+};
-- 
2.29.2

[PATCH v5 net-next 07/10] net: korina: Add support for device tree

2021-04-16 Thread Thomas Bogendoerfer

If there is no mac address passed via platform data try to get it via
device tree and fall back to a random mac address, if all fail.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index d6dbbdd43d7c..cd078a5c679b 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -43,6 +43,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -1068,7 +1070,12 @@ static int korina_probe(struct platform_device *pdev)
SET_NETDEV_DEV(dev, >dev);
lp = netdev_priv(dev);
 
-   memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
+   if (mac_addr) {
+   ether_addr_copy(dev->dev_addr, mac_addr);
+   } else {
+   if (of_get_mac_address(pdev->dev.of_node, dev->dev_addr))
+   eth_hw_addr_random(dev);
+   }
 
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
@@ -1148,8 +1155,21 @@ static int korina_remove(struct platform_device *pdev)
return 0;
 }
 
+#ifdef CONFIG_OF
+static const struct of_device_id korina_match[] = {
+   {
+   .compatible = "idt,3243x-emac",
+   },
+   { }
+};
+MODULE_DEVICE_TABLE(of, korina_match);
+#endif
+
 static struct platform_driver korina_driver = {
-   .driver.name = "korina",
+   .driver = {
+   .name = "korina",
+   .of_match_table = of_match_ptr(korina_match),
+   },
.probe = korina_probe,
.remove = korina_remove,
 };
-- 
2.29.2

[PATCH v5 net-next 06/10] net: korina: Only pass mac address via platform data

2021-04-16 Thread Thomas Bogendoerfer

Get rid of access to struct korina_device by just passing the mac
address via platform data and use drvdata for passing netdev to remove
function.

Signed-off-by: Thomas Bogendoerfer 
---
 arch/mips/rb532/devices.c |  5 +++--
 drivers/net/ethernet/korina.c | 11 ++-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/mips/rb532/devices.c b/arch/mips/rb532/devices.c
index dd34f1b32b79..5fc3c8ee4f31 100644
--- a/arch/mips/rb532/devices.c
+++ b/arch/mips/rb532/devices.c
@@ -105,6 +105,9 @@ static struct platform_device korina_dev0 = {
.name = "korina",
.resource = korina_dev0_res,
.num_resources = ARRAY_SIZE(korina_dev0_res),
+   .dev = {
+   .platform_data = _dev0_data.mac,
+   }
 };
 
 static struct resource cf_slot0_res[] = {
@@ -299,8 +302,6 @@ static int __init plat_setup_devices(void)
/* set the uart clock to the current cpu frequency */
rb532_uart_res[0].uartclk = idt_cpu_freq;
 
-   dev_set_drvdata(_dev0.dev, _dev0_data);
-
gpiod_add_lookup_table(_slot0_gpio_table);
return platform_add_devices(rb532_devs, ARRAY_SIZE(rb532_devs));
 }
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 44fad9e924ca..d6dbbdd43d7c 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -1055,7 +1055,7 @@ static const struct net_device_ops korina_netdev_ops = {
 
 static int korina_probe(struct platform_device *pdev)
 {
-   struct korina_device *bif = platform_get_drvdata(pdev);
+   u8 *mac_addr = dev_get_platdata(>dev);
struct korina_private *lp;
struct net_device *dev;
void __iomem *p;
@@ -1068,8 +1068,7 @@ static int korina_probe(struct platform_device *pdev)
SET_NETDEV_DEV(dev, >dev);
lp = netdev_priv(dev);
 
-   bif->dev = dev;
-   memcpy(dev->dev_addr, bif->mac, ETH_ALEN);
+   memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
 
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
@@ -1123,6 +1122,8 @@ static int korina_probe(struct platform_device *pdev)
lp->mii_if.phy_id_mask = 0x1f;
lp->mii_if.reg_num_mask = 0x1f;
 
+   platform_set_drvdata(pdev, dev);
+
rc = register_netdev(dev);
if (rc < 0) {
printk(KERN_ERR DRV_NAME
@@ -1140,9 +1141,9 @@ static int korina_probe(struct platform_device *pdev)
 
 static int korina_remove(struct platform_device *pdev)
 {
-   struct korina_device *bif = platform_get_drvdata(pdev);
+   struct net_device *dev = platform_get_drvdata(pdev);
 
-   unregister_netdev(bif->dev);
+   unregister_netdev(dev);
 
return 0;
 }
-- 
2.29.2

[PATCH v5 net-next 08/10] net: korina: Get mdio input clock via common clock framework

2021-04-16 Thread Thomas Bogendoerfer

With device tree clock is provided via CCF. For non device tree
use a maximum clock value to not overclock the PHY. The non device
tree usage will go away after platform is converted to DT.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index cd078a5c679b..e95c8d87d893 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -57,14 +57,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#include 
-#include 
 #include 
 #include 
 
@@ -146,10 +145,9 @@ struct korina_private {
struct work_struct restart_task;
struct net_device *dev;
struct device *dmadev;
+   int mii_clock_freq;
 };
 
-extern unsigned int idt_cpu_freq;
-
 static dma_addr_t korina_tx_dma(struct korina_private *lp, int idx)
 {
return lp->td_dma + (idx * sizeof(struct dma_desc));
@@ -899,8 +897,8 @@ static int korina_init(struct net_device *dev)
 
/* Management Clock Prescaler Divisor
 * Clock independent setting */
-   writel(((idt_cpu_freq) / MII_CLOCK + 1) & ~1,
-   >eth_regs->ethmcp);
+   writel(((lp->mii_clock_freq) / MII_CLOCK + 1) & ~1,
+  >eth_regs->ethmcp);
writel(0, >eth_regs->miimcfg);
 
/* don't transmit until fifo contains 48b */
@@ -1060,6 +1058,7 @@ static int korina_probe(struct platform_device *pdev)
u8 *mac_addr = dev_get_platdata(>dev);
struct korina_private *lp;
struct net_device *dev;
+   struct clk *clk;
void __iomem *p;
int rc;
 
@@ -1077,6 +1076,16 @@ static int korina_probe(struct platform_device *pdev)
eth_hw_addr_random(dev);
}
 
+   clk = devm_clk_get_optional(>dev, "mdioclk");
+   if (IS_ERR(clk))
+   return PTR_ERR(clk);
+   if (clk) {
+   clk_prepare_enable(clk);
+   lp->mii_clock_freq = clk_get_rate(clk);
+   } else {
+   lp->mii_clock_freq = 2; /* max possible input clk */
+   }
+
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
 
-- 
2.29.2

[PATCH v5 net-next 05/10] net: korina: Use DMA API

2021-04-16 Thread Thomas Bogendoerfer

Instead of messing with MIPS specific macros use DMA API for mapping
descriptors and skbs.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 158 +-
 1 file changed, 98 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 955fe3d3da06..44fad9e924ca 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -110,10 +110,15 @@ struct korina_private {
struct dma_reg __iomem *tx_dma_regs;
struct dma_desc *td_ring; /* transmit descriptor ring */
struct dma_desc *rd_ring; /* receive descriptor ring  */
+   dma_addr_t td_dma;
+   dma_addr_t rd_dma;
 
struct sk_buff *tx_skb[KORINA_NUM_TDS];
struct sk_buff *rx_skb[KORINA_NUM_RDS];
 
+   dma_addr_t rx_skb_dma[KORINA_NUM_RDS];
+   dma_addr_t tx_skb_dma[KORINA_NUM_TDS];
+
int rx_next_done;
int rx_chain_head;
int rx_chain_tail;
@@ -138,10 +143,21 @@ struct korina_private {
struct mii_if_info mii_if;
struct work_struct restart_task;
struct net_device *dev;
+   struct device *dmadev;
 };
 
 extern unsigned int idt_cpu_freq;
 
+static dma_addr_t korina_tx_dma(struct korina_private *lp, int idx)
+{
+   return lp->td_dma + (idx * sizeof(struct dma_desc));
+}
+
+static dma_addr_t korina_rx_dma(struct korina_private *lp, int idx)
+{
+   return lp->rd_dma + (idx * sizeof(struct dma_desc));
+}
+
 static inline void korina_abort_dma(struct net_device *dev,
struct dma_reg *ch)
 {
@@ -176,14 +192,17 @@ static void korina_abort_rx(struct net_device *dev)
 static int korina_send_packet(struct sk_buff *skb, struct net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
-   unsigned long flags;
-   u32 length;
u32 chain_prev, chain_next;
+   unsigned long flags;
struct dma_desc *td;
+   dma_addr_t ca;
+   u32 length;
+   int idx;
 
spin_lock_irqsave(>lock, flags);
 
-   td = >td_ring[lp->tx_chain_tail];
+   idx = lp->tx_chain_tail;
+   td = >td_ring[idx];
 
/* stop queue when full, drop pkts if queue already full */
if (lp->tx_count >= (KORINA_NUM_TDS - 2)) {
@@ -191,26 +210,26 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
 
if (lp->tx_count == (KORINA_NUM_TDS - 2))
netif_stop_queue(dev);
-   else {
-   dev->stats.tx_dropped++;
-   dev_kfree_skb_any(skb);
-   spin_unlock_irqrestore(>lock, flags);
-
-   return NETDEV_TX_OK;
-   }
+   else
+   goto drop_packet;
}
 
lp->tx_count++;
 
-   lp->tx_skb[lp->tx_chain_tail] = skb;
+   lp->tx_skb[idx] = skb;
 
length = skb->len;
-   dma_cache_wback((u32)skb->data, skb->len);
 
/* Setup the transmit descriptor. */
-   td->ca = CPHYSADDR(skb->data);
-   chain_prev = (lp->tx_chain_tail - 1) & KORINA_TDS_MASK;
-   chain_next = (lp->tx_chain_tail + 1) & KORINA_TDS_MASK;
+   ca = dma_map_single(lp->dmadev, skb->data, length, DMA_TO_DEVICE);
+   if (dma_mapping_error(lp->dmadev, ca))
+   goto drop_packet;
+
+   lp->tx_skb_dma[idx] = ca;
+   td->ca = ca;
+
+   chain_prev = (idx - 1) & KORINA_TDS_MASK;
+   chain_next = (idx + 1) & KORINA_TDS_MASK;
 
if (readl(&(lp->tx_dma_regs->dmandptr)) == 0) {
if (lp->tx_chain_status == desc_empty) {
@@ -220,8 +239,8 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
/* Move tail */
lp->tx_chain_tail = chain_next;
/* Write to NDPTR */
-   writel(CPHYSADDR(>td_ring[lp->tx_chain_head]),
-   >tx_dma_regs->dmandptr);
+   writel(korina_tx_dma(lp, lp->tx_chain_head),
+  >tx_dma_regs->dmandptr);
/* Move head to tail */
lp->tx_chain_head = lp->tx_chain_tail;
} else {
@@ -232,12 +251,12 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
lp->td_ring[chain_prev].control &=
~DMA_DESC_COF;
/* Link to prev */
-   lp->td_ring[chain_prev].link =  CPHYSADDR(td);
+   lp->td_ring[chain_prev].link = korina_tx_dma(lp, idx);
/* Move tail */
lp->tx_chain_tail = chain_next;

[PATCH v5 net-next 04/10] net: korina: Remove nested helpers

2021-04-16 Thread Thomas Bogendoerfer

Remove helpers, which are only used in one call site.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 28 +++-
 1 file changed, 3 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index c7abb4a8dd37..955fe3d3da06 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -142,12 +142,6 @@ struct korina_private {
 
 extern unsigned int idt_cpu_freq;
 
-static inline void korina_start_dma(struct dma_reg *ch, u32 dma_addr)
-{
-   writel(0, >dmandptr);
-   writel(dma_addr, >dmadptr);
-}
-
 static inline void korina_abort_dma(struct net_device *dev,
struct dma_reg *ch)
 {
@@ -164,11 +158,6 @@ static inline void korina_abort_dma(struct net_device *dev,
writel(0, >dmandptr);
 }
 
-static inline void korina_chain_dma(struct dma_reg *ch, u32 dma_addr)
-{
-   writel(dma_addr, >dmandptr);
-}
-
 static void korina_abort_tx(struct net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
@@ -183,18 +172,6 @@ static void korina_abort_rx(struct net_device *dev)
korina_abort_dma(dev, lp->rx_dma_regs);
 }
 
-static void korina_start_rx(struct korina_private *lp,
-   struct dma_desc *rd)
-{
-   korina_start_dma(lp->rx_dma_regs, CPHYSADDR(rd));
-}
-
-static void korina_chain_rx(struct korina_private *lp,
-   struct dma_desc *rd)
-{
-   korina_chain_dma(lp->rx_dma_regs, CPHYSADDR(rd));
-}
-
 /* transmit packet */
 static int korina_send_packet(struct sk_buff *skb, struct net_device *dev)
 {
@@ -463,7 +440,7 @@ static int korina_rx(struct net_device *dev, int limit)
rd->devcs = 0;
skb = lp->rx_skb[lp->rx_next_done];
rd->ca = CPHYSADDR(skb->data);
-   korina_chain_rx(lp, rd);
+   writel(CPHYSADDR(rd), >rx_dma_regs->dmandptr);
}
 
return count;
@@ -840,7 +817,8 @@ static int korina_init(struct net_device *dev)
 
writel(0, >rx_dma_regs->dmas);
/* Start Rx DMA */
-   korina_start_rx(lp, >rd_ring[0]);
+   writel(0, >rx_dma_regs->dmandptr);
+   writel(CPHYSADDR(>rd_ring[0]), >rx_dma_regs->dmadptr);
 
writel(readl(>tx_dma_regs->dmasm) &
~(DMA_STAT_FINI | DMA_STAT_ERR),
-- 
2.29.2

[PATCH v5 net-next 02/10] net: korina: Use devres functions

2021-04-16 Thread Thomas Bogendoerfer

Simplify probe/remove code by using devm_ functions.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 64 ---
 1 file changed, 21 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 1b7e1c75ed9e..b56de01f6bb8 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -105,9 +105,9 @@ enum chain_status {
 
 /* Information that need to be kept for each board. */
 struct korina_private {
-   struct eth_regs *eth_regs;
-   struct dma_reg *rx_dma_regs;
-   struct dma_reg *tx_dma_regs;
+   struct eth_regs __iomem *eth_regs;
+   struct dma_reg __iomem *rx_dma_regs;
+   struct dma_reg __iomem *tx_dma_regs;
struct dma_desc *td_ring; /* transmit descriptor ring */
struct dma_desc *rd_ring; /* receive descriptor ring  */
 
@@ -1044,10 +1044,10 @@ static int korina_probe(struct platform_device *pdev)
struct korina_device *bif = platform_get_drvdata(pdev);
struct korina_private *lp;
struct net_device *dev;
-   struct resource *r;
+   void __iomem *p;
int rc;
 
-   dev = alloc_etherdev(sizeof(struct korina_private));
+   dev = devm_alloc_etherdev(>dev, sizeof(struct korina_private));
if (!dev)
return -ENOMEM;
 
@@ -1060,36 +1060,30 @@ static int korina_probe(struct platform_device *pdev)
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_regs");
-   dev->base_addr = r->start;
-   lp->eth_regs = ioremap(r->start, resource_size(r));
-   if (!lp->eth_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_regs");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap registers\n");
-   rc = -ENXIO;
-   goto probe_err_out;
+   return -ENOMEM;
}
+   lp->eth_regs = p;
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_dma_rx");
-   lp->rx_dma_regs = ioremap(r->start, resource_size(r));
-   if (!lp->rx_dma_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_dma_rx");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap Rx DMA registers\n");
-   rc = -ENXIO;
-   goto probe_err_dma_rx;
+   return -ENOMEM;
}
+   lp->rx_dma_regs = p;
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_dma_tx");
-   lp->tx_dma_regs = ioremap(r->start, resource_size(r));
-   if (!lp->tx_dma_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_dma_tx");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap Tx DMA registers\n");
-   rc = -ENXIO;
-   goto probe_err_dma_tx;
+   return -ENOMEM;
}
+   lp->tx_dma_regs = p;
 
lp->td_ring = kmalloc(TD_RING_SIZE + RD_RING_SIZE, GFP_KERNEL);
-   if (!lp->td_ring) {
-   rc = -ENXIO;
-   goto probe_err_td_ring;
-   }
+   if (!lp->td_ring)
+   return -ENOMEM;
 
dma_cache_inv((unsigned long)(lp->td_ring),
TD_RING_SIZE + RD_RING_SIZE);
@@ -1119,7 +1113,8 @@ static int korina_probe(struct platform_device *pdev)
if (rc < 0) {
printk(KERN_ERR DRV_NAME
": cannot register net device: %d\n", rc);
-   goto probe_err_register;
+   kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
+   return rc;
}
timer_setup(>media_check_timer, korina_poll_media, 0);
 
@@ -1127,20 +1122,7 @@ static int korina_probe(struct platform_device *pdev)
 
printk(KERN_INFO "%s: " DRV_NAME "-" DRV_VERSION " " DRV_RELDATE "\n",
dev->name);
-out:
return rc;
-
-probe_err_register:
-   kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
-probe_err_td_ring:
-   iounmap(lp->tx_dma_regs);
-probe_err_dma_tx:
-   iounmap(lp->rx_dma_regs);
-probe_err_dma_rx:
-   iounmap(lp->eth_regs);
-probe_err_out:
-   free_netdev(dev);
-   goto out;
 }
 
 static int korina_remove(struct platform_device *pdev)
@@ -1148,13 +1130,9 @@ static int korina_remove(struct platform_device *pdev)
struct korina_device *bif = platform_get_drvdata(pdev);
struct korina_private *lp = netdev_priv(bif->dev);
 
-   iounmap(lp->eth_regs);
-   iounmap(lp->rx_dma_regs);
-   iounmap(lp->tx_dma_regs);
kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
 
unregister_netdev(bif->dev);
-   free_netdev(bif->dev);
 
return 0;
 }
-- 
2.29.2

[PATCH v5 net-next 03/10] net: korina: Remove not needed cache flushes

2021-04-16 Thread Thomas Bogendoerfer

Descriptors are mapped uncached so there is no need to do any cache
handling for them.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index b56de01f6bb8..c7abb4a8dd37 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -231,7 +231,6 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
dma_cache_wback((u32)skb->data, skb->len);
 
/* Setup the transmit descriptor. */
-   dma_cache_inv((u32) td, sizeof(*td));
td->ca = CPHYSADDR(skb->data);
chain_prev = (lp->tx_chain_tail - 1) & KORINA_TDS_MASK;
chain_next = (lp->tx_chain_tail + 1) & KORINA_TDS_MASK;
@@ -284,7 +283,6 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
lp->tx_chain_tail = chain_next;
}
}
-   dma_cache_wback((u32) td, sizeof(*td));
 
netif_trans_update(dev);
spin_unlock_irqrestore(>lock, flags);
@@ -373,8 +371,6 @@ static int korina_rx(struct net_device *dev, int limit)
u32 devcs, pkt_len, dmas;
int count;
 
-   dma_cache_inv((u32)rd, sizeof(*rd));
-
for (count = 0; count < limit; count++) {
skb = lp->rx_skb[lp->rx_next_done];
skb_new = NULL;
@@ -453,7 +449,6 @@ static int korina_rx(struct net_device *dev, int limit)
~DMA_DESC_COD;
 
lp->rx_next_done = (lp->rx_next_done + 1) & KORINA_RDS_MASK;
-   dma_cache_wback((u32)rd, sizeof(*rd));
rd = >rd_ring[lp->rx_next_done];
writel(~DMA_STAT_DONE, >rx_dma_regs->dmas);
}
@@ -468,7 +463,6 @@ static int korina_rx(struct net_device *dev, int limit)
rd->devcs = 0;
skb = lp->rx_skb[lp->rx_next_done];
rd->ca = CPHYSADDR(skb->data);
-   dma_cache_wback((u32)rd, sizeof(*rd));
korina_chain_rx(lp, rd);
}
 
-- 
2.29.2

[PATCH v5 net-next 01/10] net: korina: Fix MDIO functions

2021-04-16 Thread Thomas Bogendoerfer

Fixed MDIO functions to work reliable and not just by accident.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/Kconfig  |  1 +
 drivers/net/ethernet/korina.c | 56 +++
 2 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index ad04660b97b8..c059b4bd3f23 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -98,6 +98,7 @@ config JME
 config KORINA
tristate "Korina (IDT RC32434) Ethernet support"
depends on MIKROTIK_RB532
+   select MII
help
  If you have a Mikrotik RouterBoard 500 or IDT RC32434
  based system say Y. Otherwise say N.
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 925161959b9b..1b7e1c75ed9e 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -137,7 +138,6 @@ struct korina_private {
struct mii_if_info mii_if;
struct work_struct restart_task;
struct net_device *dev;
-   int phy_addr;
 };
 
 extern unsigned int idt_cpu_freq;
@@ -292,32 +292,48 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
-static int mdio_read(struct net_device *dev, int mii_id, int reg)
+static int korina_mdio_wait(struct korina_private *lp)
+{
+   u32 value;
+
+   return readl_poll_timeout_atomic(>eth_regs->miimind,
+value, value & ETH_MII_IND_BSY,
+1, 1000);
+}
+
+static int korina_mdio_read(struct net_device *dev, int phy, int reg)
 {
struct korina_private *lp = netdev_priv(dev);
int ret;
 
-   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+   ret = korina_mdio_wait(lp);
+   if (ret < 0)
+   return ret;
 
-   writel(0, >eth_regs->miimcfg);
-   writel(0, >eth_regs->miimcmd);
-   writel(mii_id | reg, >eth_regs->miimaddr);
-   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+   writel(phy << 8 | reg, >eth_regs->miimaddr);
+   writel(1, >eth_regs->miimcmd);
 
-   ret = (int)(readl(>eth_regs->miimrdd));
+   ret = korina_mdio_wait(lp);
+   if (ret < 0)
+   return ret;
+
+   if (readl(>eth_regs->miimind) & ETH_MII_IND_NV)
+   return -EINVAL;
+
+   ret = readl(>eth_regs->miimrdd);
+   writel(0, >eth_regs->miimcmd);
return ret;
 }
 
-static void mdio_write(struct net_device *dev, int mii_id, int reg, int val)
+static void korina_mdio_write(struct net_device *dev, int phy, int reg, int 
val)
 {
struct korina_private *lp = netdev_priv(dev);
 
-   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+   if (korina_mdio_wait(lp))
+   return;
 
-   writel(0, >eth_regs->miimcfg);
-   writel(1, >eth_regs->miimcmd);
-   writel(mii_id | reg, >eth_regs->miimaddr);
-   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+   writel(0, >eth_regs->miimcmd);
+   writel(phy << 8 | reg, >eth_regs->miimaddr);
writel(val, >eth_regs->miimwtd);
 }
 
@@ -643,7 +659,7 @@ static void korina_check_media(struct net_device *dev, 
unsigned int init_media)
 {
struct korina_private *lp = netdev_priv(dev);
 
-   mii_check_media(>mii_if, 0, init_media);
+   mii_check_media(>mii_if, 1, init_media);
 
if (lp->mii_if.full_duplex)
writel(readl(>eth_regs->ethmac2) | ETH_MAC2_FD,
@@ -869,12 +885,15 @@ static int korina_init(struct net_device *dev)
 * Clock independent setting */
writel(((idt_cpu_freq) / MII_CLOCK + 1) & ~1,
>eth_regs->ethmcp);
+   writel(0, >eth_regs->miimcfg);
 
/* don't transmit until fifo contains 48b */
writel(48, >eth_regs->ethfifott);
 
writel(ETH_MAC1_RE, >eth_regs->ethmac1);
 
+   korina_check_media(dev, 1);
+
napi_enable(>napi);
netif_start_queue(dev);
 
@@ -1089,11 +1108,10 @@ static int korina_probe(struct platform_device *pdev)
dev->watchdog_timeo = TX_TIMEOUT;
netif_napi_add(dev, >napi, korina_poll, NAPI_POLL_WEIGHT);
 
-   lp->phy_addr = (((lp->rx_irq == 0x2c? 1:0) << 8) | 0x05);
lp->mii_if.dev = dev;
-   lp->mii_if.mdio_read = mdio_read;
-   lp->mii_if.mdio_write = mdio_write;
-   lp->mii_if.phy_id = lp->phy_addr;
+   lp->mii_if.mdio_read = korina_mdio_read;
+   lp->mii_if.mdio_write = korina_mdio_write;
+   lp->mii_if.phy_id = 1;
lp->mii_if.phy_id_mask = 0x1f;
lp->mii_if.reg_num_mask = 0x1f;
 
-- 
2.29.2

[PATCH v5 net-next 00/10] net: Korina improvements

2021-04-16 Thread Thomas Bogendoerfer

While converting Mikrotik RB532 support to use device tree I stumbled
over the korina ethernet driver, which used way too many MIPS specific
hacks. This series cleans this all up and adds support for device tree.

Changes in v5:
  - fixed email address in binding document, which prevented sending it

Changes in v4:
  - improve error returns in mdio_read further
  - added clock name and improved clk handling
  - fixed bindind errors

Changes in v3:
  - use readl_poll_timeout_atomic in mdio_wait
  - return -ETIMEDOUT, if mdio_wait failed
  - added DT binding and changed name to idt,3243x-emac
  - fixed usage of of_get_mac_address for net-next

Changes in v2:
  - added device tree support to get rid of idt_cpu_freq
  - fixed compile test on 64bit archs
  - fixed descriptor current address handling by storing/using mapped
dma addresses (dma controller modifies current address)

Thomas Bogendoerfer (10):
  net: korina: Fix MDIO functions
  net: korina: Use devres functions
  net: korina: Remove not needed cache flushes
  net: korina: Remove nested helpers
  net: korina: Use DMA API
  net: korina: Only pass mac address via platform data
  net: korina: Add support for device tree
  net: korina: Get mdio input clock via common clock framework
  net: korina: Make driver COMPILE_TESTable
  dt-bindings: net: korina: Add DT bindings for IDT 79RC3243x SoCs

 .../bindings/net/idt,3243x-emac.yaml  |  74 +++
 arch/mips/rb532/devices.c |   5 +-
 drivers/net/ethernet/Kconfig  |   3 +-
 drivers/net/ethernet/korina.c | 601 +-
 4 files changed, 511 insertions(+), 172 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/idt,3243x-emac.yaml

-- 
2.29.2

[PATCH v4 net-next 09/10] net: korina: Make driver COMPILE_TESTable

2021-04-16 Thread Thomas Bogendoerfer

Move structs/defines for ethernet/dma register into driver, since they
are only used for this driver and remove any MIPS specific includes.
This makes it possible to COMPILE_TEST the driver.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/Kconfig  |   2 +-
 drivers/net/ethernet/korina.c | 249 --
 2 files changed, 239 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index c059b4bd3f23..453d202a28c1 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -97,7 +97,7 @@ config JME
 
 config KORINA
tristate "Korina (IDT RC32434) Ethernet support"
-   depends on MIKROTIK_RB532
+   depends on MIKROTIK_RB532 || COMPILE_TEST
select MII
help
  If you have a Mikrotik RouterBoard 500 or IDT RC32434
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index e95c8d87d893..5f39a5bba531 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -59,18 +59,244 @@
 #include 
 #include 
 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
 #define DRV_NAME   "korina"
 #define DRV_VERSION"0.20"
 #define DRV_RELDATE"15Sep2017"
 
+struct eth_regs {
+   u32 ethintfc;
+   u32 ethfifott;
+   u32 etharc;
+   u32 ethhash0;
+   u32 ethhash1;
+   u32 ethu0[4];   /* Reserved. */
+   u32 ethpfs;
+   u32 ethmcp;
+   u32 eth_u1[10]; /* Reserved. */
+   u32 ethspare;
+   u32 eth_u2[42]; /* Reserved. */
+   u32 ethsal0;
+   u32 ethsah0;
+   u32 ethsal1;
+   u32 ethsah1;
+   u32 ethsal2;
+   u32 ethsah2;
+   u32 ethsal3;
+   u32 ethsah3;
+   u32 ethrbc;
+   u32 ethrpc;
+   u32 ethrupc;
+   u32 ethrfc;
+   u32 ethtbc;
+   u32 ethgpf;
+   u32 eth_u9[50]; /* Reserved. */
+   u32 ethmac1;
+   u32 ethmac2;
+   u32 ethipgt;
+   u32 ethipgr;
+   u32 ethclrt;
+   u32 ethmaxf;
+   u32 eth_u10;/* Reserved. */
+   u32 ethmtest;
+   u32 miimcfg;
+   u32 miimcmd;
+   u32 miimaddr;
+   u32 miimwtd;
+   u32 miimrdd;
+   u32 miimind;
+   u32 eth_u11;/* Reserved. */
+   u32 eth_u12;/* Reserved. */
+   u32 ethcfsa0;
+   u32 ethcfsa1;
+   u32 ethcfsa2;
+};
+
+/* Ethernet interrupt registers */
+#define ETH_INT_FC_EN  BIT(0)
+#define ETH_INT_FC_ITS BIT(1)
+#define ETH_INT_FC_RIP BIT(2)
+#define ETH_INT_FC_JAM BIT(3)
+#define ETH_INT_FC_OVR BIT(4)
+#define ETH_INT_FC_UND BIT(5)
+#define ETH_INT_FC_IOC 0x00c0
+
+/* Ethernet FIFO registers */
+#define ETH_FIFI_TT_TTH_BIT0
+#define ETH_FIFO_TT_TTH0x007f
+
+/* Ethernet ARC/multicast registers */
+#define ETH_ARC_PROBIT(0)
+#define ETH_ARC_AM BIT(1)
+#define ETH_ARC_AFMBIT(2)
+#define ETH_ARC_AB BIT(3)
+
+/* Ethernet SAL registers */
+#define ETH_SAL_BYTE_5 0x00ff
+#define ETH_SAL_BYTE_4 0xff00
+#define ETH_SAL_BYTE_3 0x00ff
+#define ETH_SAL_BYTE_2 0xff00
+
+/* Ethernet SAH registers */
+#define ETH_SAH_BYTE1  0x00ff
+#define ETH_SAH_BYTE0  0xff00
+
+/* Ethernet GPF register */
+#define ETH_GPF_PTV0x
+
+/* Ethernet PFG register */
+#define ETH_PFS_PFDBIT(0)
+
+/* Ethernet CFSA[0-3] registers */
+#define ETH_CFSA0_CFSA40x00ff
+#define ETH_CFSA0_CFSA50xff00
+#define ETH_CFSA1_CFSA20x00ff
+#define ETH_CFSA1_CFSA30xff00
+#define ETH_CFSA1_CFSA00x00ff
+#define ETH_CFSA1_CFSA10xff00
+
+/* Ethernet MAC1 registers */
+#define ETH_MAC1_REBIT(0)
+#define ETH_MAC1_PAF   BIT(1)
+#define ETH_MAC1_RFC   BIT(2)
+#define ETH_MAC1_TFC   BIT(3)
+#define ETH_MAC1_LBBIT(4)
+#define ETH_MAC1_MRBIT(31)
+
+/* Ethernet MAC2 registers */
+#define ETH_MAC2_FDBIT(0)
+#define ETH_MAC2_FLC   BIT(1)
+#define ETH_MAC2_HFE   BIT(2)
+#define ETH_MAC2_DCBIT(3)
+#define ETH_MAC2_CEN   BIT(4)
+#define ETH_MAC2_PEBIT(5)
+#define ETH_MAC2_VPE   BIT(6)
+#define ETH_MAC2_APE   BIT(7)
+#define ETH_MAC2_PPE   BIT(8)
+#define ETH_MAC2_LPE   BIT(9)
+#define ETH_MAC2_NBBIT(12)
+#define ETH_MAC2_BPBIT(13)
+#define ETH_MAC2_EDBIT(14)
+
+/* Ethernet IPGT register */
+#define ETH_IPGT   0x007f
+
+/* Ethernet IPGR registers */
+#define ETH_IPGR_IPGR2 0x007f
+#define ETH_IPGR_IPGR1 0x7f00
+
+/* Ethernet CLRT registers */
+#define ETH_CLRT_MAX_RET   0x000f
+#

[PATCH v4 net-next 08/10] net: korina: Get mdio input clock via common clock framework

2021-04-16 Thread Thomas Bogendoerfer

With device tree clock is provided via CCF. For non device tree
use a maximum clock value to not overclock the PHY. The non device
tree usage will go away after platform is converted to DT.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index cd078a5c679b..e95c8d87d893 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -57,14 +57,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#include 
-#include 
 #include 
 #include 
 
@@ -146,10 +145,9 @@ struct korina_private {
struct work_struct restart_task;
struct net_device *dev;
struct device *dmadev;
+   int mii_clock_freq;
 };
 
-extern unsigned int idt_cpu_freq;
-
 static dma_addr_t korina_tx_dma(struct korina_private *lp, int idx)
 {
return lp->td_dma + (idx * sizeof(struct dma_desc));
@@ -899,8 +897,8 @@ static int korina_init(struct net_device *dev)
 
/* Management Clock Prescaler Divisor
 * Clock independent setting */
-   writel(((idt_cpu_freq) / MII_CLOCK + 1) & ~1,
-   >eth_regs->ethmcp);
+   writel(((lp->mii_clock_freq) / MII_CLOCK + 1) & ~1,
+  >eth_regs->ethmcp);
writel(0, >eth_regs->miimcfg);
 
/* don't transmit until fifo contains 48b */
@@ -1060,6 +1058,7 @@ static int korina_probe(struct platform_device *pdev)
u8 *mac_addr = dev_get_platdata(>dev);
struct korina_private *lp;
struct net_device *dev;
+   struct clk *clk;
void __iomem *p;
int rc;
 
@@ -1077,6 +1076,16 @@ static int korina_probe(struct platform_device *pdev)
eth_hw_addr_random(dev);
}
 
+   clk = devm_clk_get_optional(>dev, "mdioclk");
+   if (IS_ERR(clk))
+   return PTR_ERR(clk);
+   if (clk) {
+   clk_prepare_enable(clk);
+   lp->mii_clock_freq = clk_get_rate(clk);
+   } else {
+   lp->mii_clock_freq = 2; /* max possible input clk */
+   }
+
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
 
-- 
2.29.2

[PATCH v4 net-next 07/10] net: korina: Add support for device tree

2021-04-16 Thread Thomas Bogendoerfer

If there is no mac address passed via platform data try to get it via
device tree and fall back to a random mac address, if all fail.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index d6dbbdd43d7c..cd078a5c679b 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -43,6 +43,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -1068,7 +1070,12 @@ static int korina_probe(struct platform_device *pdev)
SET_NETDEV_DEV(dev, >dev);
lp = netdev_priv(dev);
 
-   memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
+   if (mac_addr) {
+   ether_addr_copy(dev->dev_addr, mac_addr);
+   } else {
+   if (of_get_mac_address(pdev->dev.of_node, dev->dev_addr))
+   eth_hw_addr_random(dev);
+   }
 
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
@@ -1148,8 +1155,21 @@ static int korina_remove(struct platform_device *pdev)
return 0;
 }
 
+#ifdef CONFIG_OF
+static const struct of_device_id korina_match[] = {
+   {
+   .compatible = "idt,3243x-emac",
+   },
+   { }
+};
+MODULE_DEVICE_TABLE(of, korina_match);
+#endif
+
 static struct platform_driver korina_driver = {
-   .driver.name = "korina",
+   .driver = {
+   .name = "korina",
+   .of_match_table = of_match_ptr(korina_match),
+   },
.probe = korina_probe,
.remove = korina_remove,
 };
-- 
2.29.2

[PATCH v4 net-next 05/10] net: korina: Use DMA API

2021-04-16 Thread Thomas Bogendoerfer

Instead of messing with MIPS specific macros use DMA API for mapping
descriptors and skbs.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 158 +-
 1 file changed, 98 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 955fe3d3da06..44fad9e924ca 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -110,10 +110,15 @@ struct korina_private {
struct dma_reg __iomem *tx_dma_regs;
struct dma_desc *td_ring; /* transmit descriptor ring */
struct dma_desc *rd_ring; /* receive descriptor ring  */
+   dma_addr_t td_dma;
+   dma_addr_t rd_dma;
 
struct sk_buff *tx_skb[KORINA_NUM_TDS];
struct sk_buff *rx_skb[KORINA_NUM_RDS];
 
+   dma_addr_t rx_skb_dma[KORINA_NUM_RDS];
+   dma_addr_t tx_skb_dma[KORINA_NUM_TDS];
+
int rx_next_done;
int rx_chain_head;
int rx_chain_tail;
@@ -138,10 +143,21 @@ struct korina_private {
struct mii_if_info mii_if;
struct work_struct restart_task;
struct net_device *dev;
+   struct device *dmadev;
 };
 
 extern unsigned int idt_cpu_freq;
 
+static dma_addr_t korina_tx_dma(struct korina_private *lp, int idx)
+{
+   return lp->td_dma + (idx * sizeof(struct dma_desc));
+}
+
+static dma_addr_t korina_rx_dma(struct korina_private *lp, int idx)
+{
+   return lp->rd_dma + (idx * sizeof(struct dma_desc));
+}
+
 static inline void korina_abort_dma(struct net_device *dev,
struct dma_reg *ch)
 {
@@ -176,14 +192,17 @@ static void korina_abort_rx(struct net_device *dev)
 static int korina_send_packet(struct sk_buff *skb, struct net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
-   unsigned long flags;
-   u32 length;
u32 chain_prev, chain_next;
+   unsigned long flags;
struct dma_desc *td;
+   dma_addr_t ca;
+   u32 length;
+   int idx;
 
spin_lock_irqsave(>lock, flags);
 
-   td = >td_ring[lp->tx_chain_tail];
+   idx = lp->tx_chain_tail;
+   td = >td_ring[idx];
 
/* stop queue when full, drop pkts if queue already full */
if (lp->tx_count >= (KORINA_NUM_TDS - 2)) {
@@ -191,26 +210,26 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
 
if (lp->tx_count == (KORINA_NUM_TDS - 2))
netif_stop_queue(dev);
-   else {
-   dev->stats.tx_dropped++;
-   dev_kfree_skb_any(skb);
-   spin_unlock_irqrestore(>lock, flags);
-
-   return NETDEV_TX_OK;
-   }
+   else
+   goto drop_packet;
}
 
lp->tx_count++;
 
-   lp->tx_skb[lp->tx_chain_tail] = skb;
+   lp->tx_skb[idx] = skb;
 
length = skb->len;
-   dma_cache_wback((u32)skb->data, skb->len);
 
/* Setup the transmit descriptor. */
-   td->ca = CPHYSADDR(skb->data);
-   chain_prev = (lp->tx_chain_tail - 1) & KORINA_TDS_MASK;
-   chain_next = (lp->tx_chain_tail + 1) & KORINA_TDS_MASK;
+   ca = dma_map_single(lp->dmadev, skb->data, length, DMA_TO_DEVICE);
+   if (dma_mapping_error(lp->dmadev, ca))
+   goto drop_packet;
+
+   lp->tx_skb_dma[idx] = ca;
+   td->ca = ca;
+
+   chain_prev = (idx - 1) & KORINA_TDS_MASK;
+   chain_next = (idx + 1) & KORINA_TDS_MASK;
 
if (readl(&(lp->tx_dma_regs->dmandptr)) == 0) {
if (lp->tx_chain_status == desc_empty) {
@@ -220,8 +239,8 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
/* Move tail */
lp->tx_chain_tail = chain_next;
/* Write to NDPTR */
-   writel(CPHYSADDR(>td_ring[lp->tx_chain_head]),
-   >tx_dma_regs->dmandptr);
+   writel(korina_tx_dma(lp, lp->tx_chain_head),
+  >tx_dma_regs->dmandptr);
/* Move head to tail */
lp->tx_chain_head = lp->tx_chain_tail;
} else {
@@ -232,12 +251,12 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
lp->td_ring[chain_prev].control &=
~DMA_DESC_COF;
/* Link to prev */
-   lp->td_ring[chain_prev].link =  CPHYSADDR(td);
+   lp->td_ring[chain_prev].link = korina_tx_dma(lp, idx);
/* Move tail */
lp->tx_chain_tail = chain_next;

[PATCH v4 net-next 06/10] net: korina: Only pass mac address via platform data

2021-04-16 Thread Thomas Bogendoerfer

Get rid of access to struct korina_device by just passing the mac
address via platform data and use drvdata for passing netdev to remove
function.

Signed-off-by: Thomas Bogendoerfer 
---
 arch/mips/rb532/devices.c |  5 +++--
 drivers/net/ethernet/korina.c | 11 ++-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/mips/rb532/devices.c b/arch/mips/rb532/devices.c
index dd34f1b32b79..5fc3c8ee4f31 100644
--- a/arch/mips/rb532/devices.c
+++ b/arch/mips/rb532/devices.c
@@ -105,6 +105,9 @@ static struct platform_device korina_dev0 = {
.name = "korina",
.resource = korina_dev0_res,
.num_resources = ARRAY_SIZE(korina_dev0_res),
+   .dev = {
+   .platform_data = _dev0_data.mac,
+   }
 };
 
 static struct resource cf_slot0_res[] = {
@@ -299,8 +302,6 @@ static int __init plat_setup_devices(void)
/* set the uart clock to the current cpu frequency */
rb532_uart_res[0].uartclk = idt_cpu_freq;
 
-   dev_set_drvdata(_dev0.dev, _dev0_data);
-
gpiod_add_lookup_table(_slot0_gpio_table);
return platform_add_devices(rb532_devs, ARRAY_SIZE(rb532_devs));
 }
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 44fad9e924ca..d6dbbdd43d7c 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -1055,7 +1055,7 @@ static const struct net_device_ops korina_netdev_ops = {
 
 static int korina_probe(struct platform_device *pdev)
 {
-   struct korina_device *bif = platform_get_drvdata(pdev);
+   u8 *mac_addr = dev_get_platdata(>dev);
struct korina_private *lp;
struct net_device *dev;
void __iomem *p;
@@ -1068,8 +1068,7 @@ static int korina_probe(struct platform_device *pdev)
SET_NETDEV_DEV(dev, >dev);
lp = netdev_priv(dev);
 
-   bif->dev = dev;
-   memcpy(dev->dev_addr, bif->mac, ETH_ALEN);
+   memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
 
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
@@ -1123,6 +1122,8 @@ static int korina_probe(struct platform_device *pdev)
lp->mii_if.phy_id_mask = 0x1f;
lp->mii_if.reg_num_mask = 0x1f;
 
+   platform_set_drvdata(pdev, dev);
+
rc = register_netdev(dev);
if (rc < 0) {
printk(KERN_ERR DRV_NAME
@@ -1140,9 +1141,9 @@ static int korina_probe(struct platform_device *pdev)
 
 static int korina_remove(struct platform_device *pdev)
 {
-   struct korina_device *bif = platform_get_drvdata(pdev);
+   struct net_device *dev = platform_get_drvdata(pdev);
 
-   unregister_netdev(bif->dev);
+   unregister_netdev(dev);
 
return 0;
 }
-- 
2.29.2

[PATCH v4 net-next 04/10] net: korina: Remove nested helpers

2021-04-16 Thread Thomas Bogendoerfer

Remove helpers, which are only used in one call site.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 28 +++-
 1 file changed, 3 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index c7abb4a8dd37..955fe3d3da06 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -142,12 +142,6 @@ struct korina_private {
 
 extern unsigned int idt_cpu_freq;
 
-static inline void korina_start_dma(struct dma_reg *ch, u32 dma_addr)
-{
-   writel(0, >dmandptr);
-   writel(dma_addr, >dmadptr);
-}
-
 static inline void korina_abort_dma(struct net_device *dev,
struct dma_reg *ch)
 {
@@ -164,11 +158,6 @@ static inline void korina_abort_dma(struct net_device *dev,
writel(0, >dmandptr);
 }
 
-static inline void korina_chain_dma(struct dma_reg *ch, u32 dma_addr)
-{
-   writel(dma_addr, >dmandptr);
-}
-
 static void korina_abort_tx(struct net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
@@ -183,18 +172,6 @@ static void korina_abort_rx(struct net_device *dev)
korina_abort_dma(dev, lp->rx_dma_regs);
 }
 
-static void korina_start_rx(struct korina_private *lp,
-   struct dma_desc *rd)
-{
-   korina_start_dma(lp->rx_dma_regs, CPHYSADDR(rd));
-}
-
-static void korina_chain_rx(struct korina_private *lp,
-   struct dma_desc *rd)
-{
-   korina_chain_dma(lp->rx_dma_regs, CPHYSADDR(rd));
-}
-
 /* transmit packet */
 static int korina_send_packet(struct sk_buff *skb, struct net_device *dev)
 {
@@ -463,7 +440,7 @@ static int korina_rx(struct net_device *dev, int limit)
rd->devcs = 0;
skb = lp->rx_skb[lp->rx_next_done];
rd->ca = CPHYSADDR(skb->data);
-   korina_chain_rx(lp, rd);
+   writel(CPHYSADDR(rd), >rx_dma_regs->dmandptr);
}
 
return count;
@@ -840,7 +817,8 @@ static int korina_init(struct net_device *dev)
 
writel(0, >rx_dma_regs->dmas);
/* Start Rx DMA */
-   korina_start_rx(lp, >rd_ring[0]);
+   writel(0, >rx_dma_regs->dmandptr);
+   writel(CPHYSADDR(>rd_ring[0]), >rx_dma_regs->dmadptr);
 
writel(readl(>tx_dma_regs->dmasm) &
~(DMA_STAT_FINI | DMA_STAT_ERR),
-- 
2.29.2

[PATCH v4 net-next 02/10] net: korina: Use devres functions

2021-04-16 Thread Thomas Bogendoerfer

Simplify probe/remove code by using devm_ functions.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 64 ---
 1 file changed, 21 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 1b7e1c75ed9e..b56de01f6bb8 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -105,9 +105,9 @@ enum chain_status {
 
 /* Information that need to be kept for each board. */
 struct korina_private {
-   struct eth_regs *eth_regs;
-   struct dma_reg *rx_dma_regs;
-   struct dma_reg *tx_dma_regs;
+   struct eth_regs __iomem *eth_regs;
+   struct dma_reg __iomem *rx_dma_regs;
+   struct dma_reg __iomem *tx_dma_regs;
struct dma_desc *td_ring; /* transmit descriptor ring */
struct dma_desc *rd_ring; /* receive descriptor ring  */
 
@@ -1044,10 +1044,10 @@ static int korina_probe(struct platform_device *pdev)
struct korina_device *bif = platform_get_drvdata(pdev);
struct korina_private *lp;
struct net_device *dev;
-   struct resource *r;
+   void __iomem *p;
int rc;
 
-   dev = alloc_etherdev(sizeof(struct korina_private));
+   dev = devm_alloc_etherdev(>dev, sizeof(struct korina_private));
if (!dev)
return -ENOMEM;
 
@@ -1060,36 +1060,30 @@ static int korina_probe(struct platform_device *pdev)
lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_regs");
-   dev->base_addr = r->start;
-   lp->eth_regs = ioremap(r->start, resource_size(r));
-   if (!lp->eth_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_regs");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap registers\n");
-   rc = -ENXIO;
-   goto probe_err_out;
+   return -ENOMEM;
}
+   lp->eth_regs = p;
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_dma_rx");
-   lp->rx_dma_regs = ioremap(r->start, resource_size(r));
-   if (!lp->rx_dma_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_dma_rx");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap Rx DMA registers\n");
-   rc = -ENXIO;
-   goto probe_err_dma_rx;
+   return -ENOMEM;
}
+   lp->rx_dma_regs = p;
 
-   r = platform_get_resource_byname(pdev, IORESOURCE_MEM, "korina_dma_tx");
-   lp->tx_dma_regs = ioremap(r->start, resource_size(r));
-   if (!lp->tx_dma_regs) {
+   p = devm_platform_ioremap_resource_byname(pdev, "korina_dma_tx");
+   if (!p) {
printk(KERN_ERR DRV_NAME ": cannot remap Tx DMA registers\n");
-   rc = -ENXIO;
-   goto probe_err_dma_tx;
+   return -ENOMEM;
}
+   lp->tx_dma_regs = p;
 
lp->td_ring = kmalloc(TD_RING_SIZE + RD_RING_SIZE, GFP_KERNEL);
-   if (!lp->td_ring) {
-   rc = -ENXIO;
-   goto probe_err_td_ring;
-   }
+   if (!lp->td_ring)
+   return -ENOMEM;
 
dma_cache_inv((unsigned long)(lp->td_ring),
TD_RING_SIZE + RD_RING_SIZE);
@@ -1119,7 +1113,8 @@ static int korina_probe(struct platform_device *pdev)
if (rc < 0) {
printk(KERN_ERR DRV_NAME
": cannot register net device: %d\n", rc);
-   goto probe_err_register;
+   kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
+   return rc;
}
timer_setup(>media_check_timer, korina_poll_media, 0);
 
@@ -1127,20 +1122,7 @@ static int korina_probe(struct platform_device *pdev)
 
printk(KERN_INFO "%s: " DRV_NAME "-" DRV_VERSION " " DRV_RELDATE "\n",
dev->name);
-out:
return rc;
-
-probe_err_register:
-   kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
-probe_err_td_ring:
-   iounmap(lp->tx_dma_regs);
-probe_err_dma_tx:
-   iounmap(lp->rx_dma_regs);
-probe_err_dma_rx:
-   iounmap(lp->eth_regs);
-probe_err_out:
-   free_netdev(dev);
-   goto out;
 }
 
 static int korina_remove(struct platform_device *pdev)
@@ -1148,13 +1130,9 @@ static int korina_remove(struct platform_device *pdev)
struct korina_device *bif = platform_get_drvdata(pdev);
struct korina_private *lp = netdev_priv(bif->dev);
 
-   iounmap(lp->eth_regs);
-   iounmap(lp->rx_dma_regs);
-   iounmap(lp->tx_dma_regs);
kfree((struct dma_desc *)KSEG0ADDR(lp->td_ring));
 
unregister_netdev(bif->dev);
-   free_netdev(bif->dev);
 
return 0;
 }
-- 
2.29.2

[PATCH v4 net-next 03/10] net: korina: Remove not needed cache flushes

2021-04-16 Thread Thomas Bogendoerfer

Descriptors are mapped uncached so there is no need to do any cache
handling for them.

Reviewed-by: Andrew Lunn 
Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/korina.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index b56de01f6bb8..c7abb4a8dd37 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -231,7 +231,6 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
dma_cache_wback((u32)skb->data, skb->len);
 
/* Setup the transmit descriptor. */
-   dma_cache_inv((u32) td, sizeof(*td));
td->ca = CPHYSADDR(skb->data);
chain_prev = (lp->tx_chain_tail - 1) & KORINA_TDS_MASK;
chain_next = (lp->tx_chain_tail + 1) & KORINA_TDS_MASK;
@@ -284,7 +283,6 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
lp->tx_chain_tail = chain_next;
}
}
-   dma_cache_wback((u32) td, sizeof(*td));
 
netif_trans_update(dev);
spin_unlock_irqrestore(>lock, flags);
@@ -373,8 +371,6 @@ static int korina_rx(struct net_device *dev, int limit)
u32 devcs, pkt_len, dmas;
int count;
 
-   dma_cache_inv((u32)rd, sizeof(*rd));
-
for (count = 0; count < limit; count++) {
skb = lp->rx_skb[lp->rx_next_done];
skb_new = NULL;
@@ -453,7 +449,6 @@ static int korina_rx(struct net_device *dev, int limit)
~DMA_DESC_COD;
 
lp->rx_next_done = (lp->rx_next_done + 1) & KORINA_RDS_MASK;
-   dma_cache_wback((u32)rd, sizeof(*rd));
rd = >rd_ring[lp->rx_next_done];
writel(~DMA_STAT_DONE, >rx_dma_regs->dmas);
}
@@ -468,7 +463,6 @@ static int korina_rx(struct net_device *dev, int limit)
rd->devcs = 0;
skb = lp->rx_skb[lp->rx_next_done];
rd->ca = CPHYSADDR(skb->data);
-   dma_cache_wback((u32)rd, sizeof(*rd));
korina_chain_rx(lp, rd);
}
 
-- 
2.29.2

[PATCH v4 net-next 01/10] net: korina: Fix MDIO functions

2021-04-16 Thread Thomas Bogendoerfer

Fixed MDIO functions to work reliable and not just by accident.

Signed-off-by: Thomas Bogendoerfer 
---
 drivers/net/ethernet/Kconfig  |  1 +
 drivers/net/ethernet/korina.c | 56 +++
 2 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index ad04660b97b8..c059b4bd3f23 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -98,6 +98,7 @@ config JME
 config KORINA
tristate "Korina (IDT RC32434) Ethernet support"
depends on MIKROTIK_RB532
+   select MII
help
  If you have a Mikrotik RouterBoard 500 or IDT RC32434
  based system say Y. Otherwise say N.
diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 925161959b9b..1b7e1c75ed9e 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -137,7 +138,6 @@ struct korina_private {
struct mii_if_info mii_if;
struct work_struct restart_task;
struct net_device *dev;
-   int phy_addr;
 };
 
 extern unsigned int idt_cpu_freq;
@@ -292,32 +292,48 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
-static int mdio_read(struct net_device *dev, int mii_id, int reg)
+static int korina_mdio_wait(struct korina_private *lp)
+{
+   u32 value;
+
+   return readl_poll_timeout_atomic(>eth_regs->miimind,
+value, value & ETH_MII_IND_BSY,
+1, 1000);
+}
+
+static int korina_mdio_read(struct net_device *dev, int phy, int reg)
 {
struct korina_private *lp = netdev_priv(dev);
int ret;
 
-   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+   ret = korina_mdio_wait(lp);
+   if (ret < 0)
+   return ret;
 
-   writel(0, >eth_regs->miimcfg);
-   writel(0, >eth_regs->miimcmd);
-   writel(mii_id | reg, >eth_regs->miimaddr);
-   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+   writel(phy << 8 | reg, >eth_regs->miimaddr);
+   writel(1, >eth_regs->miimcmd);
 
-   ret = (int)(readl(>eth_regs->miimrdd));
+   ret = korina_mdio_wait(lp);
+   if (ret < 0)
+   return ret;
+
+   if (readl(>eth_regs->miimind) & ETH_MII_IND_NV)
+   return -EINVAL;
+
+   ret = readl(>eth_regs->miimrdd);
+   writel(0, >eth_regs->miimcmd);
return ret;
 }
 
-static void mdio_write(struct net_device *dev, int mii_id, int reg, int val)
+static void korina_mdio_write(struct net_device *dev, int phy, int reg, int 
val)
 {
struct korina_private *lp = netdev_priv(dev);
 
-   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+   if (korina_mdio_wait(lp))
+   return;
 
-   writel(0, >eth_regs->miimcfg);
-   writel(1, >eth_regs->miimcmd);
-   writel(mii_id | reg, >eth_regs->miimaddr);
-   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+   writel(0, >eth_regs->miimcmd);
+   writel(phy << 8 | reg, >eth_regs->miimaddr);
writel(val, >eth_regs->miimwtd);
 }
 
@@ -643,7 +659,7 @@ static void korina_check_media(struct net_device *dev, 
unsigned int init_media)
 {
struct korina_private *lp = netdev_priv(dev);
 
-   mii_check_media(>mii_if, 0, init_media);
+   mii_check_media(>mii_if, 1, init_media);
 
if (lp->mii_if.full_duplex)
writel(readl(>eth_regs->ethmac2) | ETH_MAC2_FD,
@@ -869,12 +885,15 @@ static int korina_init(struct net_device *dev)
 * Clock independent setting */
writel(((idt_cpu_freq) / MII_CLOCK + 1) & ~1,
>eth_regs->ethmcp);
+   writel(0, >eth_regs->miimcfg);
 
/* don't transmit until fifo contains 48b */
writel(48, >eth_regs->ethfifott);
 
writel(ETH_MAC1_RE, >eth_regs->ethmac1);
 
+   korina_check_media(dev, 1);
+
napi_enable(>napi);
netif_start_queue(dev);
 
@@ -1089,11 +1108,10 @@ static int korina_probe(struct platform_device *pdev)
dev->watchdog_timeo = TX_TIMEOUT;
netif_napi_add(dev, >napi, korina_poll, NAPI_POLL_WEIGHT);
 
-   lp->phy_addr = (((lp->rx_irq == 0x2c? 1:0) << 8) | 0x05);
lp->mii_if.dev = dev;
-   lp->mii_if.mdio_read = mdio_read;
-   lp->mii_if.mdio_write = mdio_write;
-   lp->mii_if.phy_id = lp->phy_addr;
+   lp->mii_if.mdio_read = korina_mdio_read;
+   lp->mii_if.mdio_write = korina_mdio_write;
+   lp->mii_if.phy_id = 1;
lp->mii_if.phy_id_mask = 0x1f;
lp->mii_if.reg_num_mask = 0x1f;
 
-- 
2.29.2

[PATCH v4 net-next 00/10] net: Korina improvements

2021-04-16 Thread Thomas Bogendoerfer

While converting Mikrotik RB532 support to use device tree I stumbled
over the korina ethernet driver, which used way too many MIPS specific
hacks. This series cleans this all up and adds support for device tree.

Changes in v4:
  - improve error returns in mdio_read further
  - added clock name and improved clk handling
  - fixed bindind errors

Changes in v3:
  - use readl_poll_timeout_atomic in mdio_wait
  - return -ETIMEDOUT, if mdio_wait failed
  - added DT binding and changed name to idt,3243x-emac
  - fixed usage of of_get_mac_address for net-next

Changes in v2:
  - added device tree support to get rid of idt_cpu_freq
  - fixed compile test on 64bit archs
  - fixed descriptor current address handling by storing/using mapped
dma addresses (dma controller modifies current address)

Thomas Bogendoerfer (10):
  net: korina: Fix MDIO functions
  net: korina: Use devres functions
  net: korina: Remove not needed cache flushes
  net: korina: Remove nested helpers
  net: korina: Use DMA API
  net: korina: Only pass mac address via platform data
  net: korina: Add support for device tree
  net: korina: Get mdio input clock via common clock framework
  net: korina: Make driver COMPILE_TESTable
  dt-bindings: net: korina: Add DT bindings for IDT 79RC3243x SoCs

 .../bindings/net/idt,3243x-emac.yaml  |  74 +++
 arch/mips/rb532/devices.c |   5 +-
 drivers/net/ethernet/Kconfig  |   3 +-
 drivers/net/ethernet/korina.c | 601 +-
 4 files changed, 511 insertions(+), 172 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/idt,3243x-emac.yaml

-- 
2.29.2

Re: [PATCH v3 net-next 07/10] net: korina: Add support for device tree

2021-04-16 Thread Thomas Bogendoerfer

On Fri, Apr 16, 2021 at 01:49:07AM +0200, Andrew Lunn wrote:
> > -   memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
> > +   if (mac_addr) {
> > +   ether_addr_copy(dev->dev_addr, mac_addr);
> > +   } else {
> > +   u8 ofmac[ETH_ALEN];
> > +
> > +   if (of_get_mac_address(pdev->dev.of_node, ofmac) == 0)
> > +   ether_addr_copy(dev->dev_addr, ofmac);
> 
> You should be able to skip the ether_addr_copy() by passing 
> dev->dev_addr directly to of_get_mac_address().

good point

> 
> > +   else
> > +   eth_hw_addr_random(dev);
> > +   }
> >  
> > lp->rx_irq = platform_get_irq_byname(pdev, "korina_rx");
> > lp->tx_irq = platform_get_irq_byname(pdev, "korina_tx");
> > @@ -1146,8 +1157,21 @@ static int korina_remove(struct platform_device 
> > *pdev)
> > return 0;
> >  }
> >  
> > +#ifdef CONFIG_OF
> > +static const struct of_device_id korina_match[] = {
> > +   {
> > +   .compatible = "idt,3243x-emac",
> 
> You need to document this compatible somewhere under 
> Documentation/devicetree/binding

checkpatch hinted to put it in an extra patch, it's patch 10 of this
series and looking at my inbox it didn't get through :-(.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 38838 matches

Mail list logo