Re: [PATCH net-next v6 5/5] tools: virtio: introduce vhost_net_test

2024-03-05 Thread Jason Wang
On Tue, Mar 5, 2024 at 5:47 PM Paolo Abeni  wrote:
>
> On Wed, 2024-02-28 at 17:30 +0800, Yunsheng Lin wrote:
> > introduce vhost_net_test for both vhost_net tx and rx basing
> > on virtio_test to test vhost_net changing in the kernel.
> >
> > Steps for vhost_net tx testing:
> > 1. Prepare a out buf.
> > 2. Kick the vhost_net to do tx processing.
> > 3. Do the receiving in the tun side.
> > 4. verify the data received by tun is correct.
> >
> > Steps for vhost_net rx testing:
> > 1. Prepare a in buf.
> > 2. Do the sending in the tun side.
> > 3. Kick the vhost_net to do rx processing.
> > 4. verify the data received by vhost_net is correct.
> >
> > Signed-off-by: Yunsheng Lin 
>
> @Jason: AFAICS this addresses the points you raised on v5, could you
> please have a look?
>
> Thanks!

Looks good to me.

Acked-by: Jason Wang 

Thanks

>
> Paolo
>
>




Re: [PATCH net-next v2 3/3] tun: AF_XDP Tx zero-copy support

2024-03-05 Thread Jason Wang
On Sat, Mar 2, 2024 at 2:40 AM Willem de Bruijn
 wrote:
>
> Maciej Fijalkowski wrote:
> > On Wed, Feb 28, 2024 at 07:05:56PM +0800, Yunjian Wang wrote:
> > > This patch set allows TUN to support the AF_XDP Tx zero-copy feature,
> > > which can significantly reduce CPU utilization for XDP programs.
> >
> > Why no Rx ZC support though? What will happen if I try rxdrop xdpsock
> > against tun with this patch? You clearly allow for that.
>
> This is AF_XDP receive zerocopy, right?
>
> The naming is always confusing with tun, but even though from a tun
> PoV this happens on ndo_start_xmit, it is the AF_XDP equivalent to
> tun_put_user.
>
> So the implementation is more like other device's Rx ZC.
>
> I would have preferred that name, but I think Jason asked for this
> and given tun's weird status, there is something bo said for either.
>

>From the the view of the AF_XDP userspace program, it's the TX path,
and as you said it happens on the TUN xmit path as well. When using
with a VM, it's the RX path.

So TX seems better.

Thanks




Re: [PATCH V4] remoteproc: qcom: q6v5: Get crash reason from specific SMEM partition

2024-03-05 Thread Vignesh Viswanathan



On 3/6/2024 9:25 AM, Bjorn Andersson wrote:
> On Wed, Dec 20, 2023 at 11:25:11AM +0530, Vignesh Viswanathan wrote:
>> q6v5 fatal and watchdog IRQ handlers always retrieves the crash reason
>> information from SMEM global partition (QCOM_SMEM_HOST_ANY).
>>
>> For some targets like IPQ9574 and IPQ5332, crash reason information is
>> present in target specific partition due to which the crash reason is
>> not printed in the current implementation.
>>
>> Add support to pass crash_reason_partition along with crash_reason_item
>> number in qcom_q6v5_init call and use the same to get the crash
>> information from SMEM in fatal and watchdog IRQ handlers.
>>
>> While at it, rename all instances of "crash_reason_smem" with
>> "crash_reason_item" as this reflects the proper meaning.
>>
>> Signed-off-by: Vignesh Viswanathan 
> 
> No concerns with the patch now, but as this depends on the mpd driver,
> which is being refactored, please resubmit this once the driver is being
> accepted.
> 
Thanks Bjorn. Will resubmit this once the refactored MPD driver posted.

>> ---
>> Changes in V4: Rename all instances of crash_reason_smem to crash_reason_item
>> Changes in V3: Updated commit message.
>> Changes in V2: Addressed comments in V1.
> 
> Please review go/upstream and start using b4.

Sure, will give b4 a try.

Thanks,
Vignesh
> 
> Regards,
> Bjorn
> 
>>
>> This patch depends on [1] which adds support for IPQ9574 and IPQ5332
>> remoteproc q5v5_mpd driver.
>>
>> [1]: 
>> https://lore.kernel.org/all/20231110091939.3025413-1-quic_mmani...@quicinc.com/
>>
>>  drivers/remoteproc/qcom_q6v5.c  | 10 +++--
>>  drivers/remoteproc/qcom_q6v5.h  |  6 ++-
>>  drivers/remoteproc/qcom_q6v5_adsp.c | 17 +
>>  drivers/remoteproc/qcom_q6v5_mpd.c  | 13 ---
>>  drivers/remoteproc/qcom_q6v5_mss.c  |  5 ++-
>>  drivers/remoteproc/qcom_q6v5_pas.c  | 59 +++--
>>  drivers/remoteproc/qcom_q6v5_wcss.c | 12 +++---
>>  7 files changed, 66 insertions(+), 56 deletions(-)
>>
>> diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c
>> index 0e32f13c196d..e4a28bf25130 100644
>> --- a/drivers/remoteproc/qcom_q6v5.c
>> +++ b/drivers/remoteproc/qcom_q6v5.c
>> @@ -100,7 +100,7 @@ static irqreturn_t q6v5_wdog_interrupt(int irq, void 
>> *data)
>>  return IRQ_HANDLED;
>>  }
>>  
>> -msg = qcom_smem_get(QCOM_SMEM_HOST_ANY, q6v5->crash_reason, );
>> +msg = qcom_smem_get(q6v5->crash_reason_partition, 
>> q6v5->crash_reason_item, );
>>  if (!IS_ERR(msg) && len > 0 && msg[0])
>>  dev_err(q6v5->dev, "watchdog received: %s\n", msg);
>>  else
>> @@ -121,7 +121,7 @@ irqreturn_t q6v5_fatal_interrupt(int irq, void *data)
>>  if (!q6v5->running)
>>  return IRQ_HANDLED;
>>  
>> -msg = qcom_smem_get(QCOM_SMEM_HOST_ANY, q6v5->crash_reason, );
>> +msg = qcom_smem_get(q6v5->crash_reason_partition, 
>> q6v5->crash_reason_item, );
>>  if (!IS_ERR(msg) && len > 0 && msg[0])
>>  dev_err(q6v5->dev, "fatal error received: %s\n", msg);
>>  else
>> @@ -279,14 +279,16 @@ EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
>>   * Return: 0 on success, negative errno on failure
>>   */
>>  int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>> -   struct rproc *rproc, int crash_reason, const char 
>> *load_state,
>> +   struct rproc *rproc, int crash_reason_partition,
>> +   int crash_reason_item, const char *load_state,
>> void (*handover)(struct qcom_q6v5 *q6v5))
>>  {
>>  int ret;
>>  
>>  q6v5->rproc = rproc;
>>  q6v5->dev = >dev;
>> -q6v5->crash_reason = crash_reason;
>> +q6v5->crash_reason_partition = crash_reason_partition;
>> +q6v5->crash_reason_item = crash_reason_item;
>>  q6v5->handover = handover;
>>  
>>  init_completion(>start_done);
>> diff --git a/drivers/remoteproc/qcom_q6v5.h b/drivers/remoteproc/qcom_q6v5.h
>> index 4e1bb1a68284..cd02372e9856 100644
>> --- a/drivers/remoteproc/qcom_q6v5.h
>> +++ b/drivers/remoteproc/qcom_q6v5.h
>> @@ -40,7 +40,8 @@ struct qcom_q6v5 {
>>  struct completion stop_done;
>>  struct completion spawn_done;
>>  
>> -int crash_reason;
>> +int crash_reason_partition;
>> +int crash_reason_item;
>>  
>>  bool running;
>>  
>> @@ -49,7 +50,8 @@ struct qcom_q6v5 {
>>  };
>>  
>>  int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
>> -   struct rproc *rproc, int crash_reason, const char 
>> *load_state,
>> +   struct rproc *rproc, int crash_reason_partition,
>> +   int crash_reason_item, const char *load_state,
>> void (*handover)(struct qcom_q6v5 *q6v5));
>>  void qcom_q6v5_deinit(struct qcom_q6v5 *q6v5);
>>  
>> diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
>> b/drivers/remoteproc/qcom_q6v5_adsp.c
>> index 6c67514cc493..055764aa201c 100644
>> --- a/drivers/remoteproc/qcom_q6v5_adsp.c
>> +++ 

Re: [PATCH V4] remoteproc: qcom: q6v5: Get crash reason from specific SMEM partition

2024-03-05 Thread Bjorn Andersson
On Wed, Dec 20, 2023 at 11:25:11AM +0530, Vignesh Viswanathan wrote:
> q6v5 fatal and watchdog IRQ handlers always retrieves the crash reason
> information from SMEM global partition (QCOM_SMEM_HOST_ANY).
> 
> For some targets like IPQ9574 and IPQ5332, crash reason information is
> present in target specific partition due to which the crash reason is
> not printed in the current implementation.
> 
> Add support to pass crash_reason_partition along with crash_reason_item
> number in qcom_q6v5_init call and use the same to get the crash
> information from SMEM in fatal and watchdog IRQ handlers.
> 
> While at it, rename all instances of "crash_reason_smem" with
> "crash_reason_item" as this reflects the proper meaning.
> 
> Signed-off-by: Vignesh Viswanathan 

No concerns with the patch now, but as this depends on the mpd driver,
which is being refactored, please resubmit this once the driver is being
accepted.

> ---
> Changes in V4: Rename all instances of crash_reason_smem to crash_reason_item
> Changes in V3: Updated commit message.
> Changes in V2: Addressed comments in V1.

Please review go/upstream and start using b4.

Regards,
Bjorn

> 
> This patch depends on [1] which adds support for IPQ9574 and IPQ5332
> remoteproc q5v5_mpd driver.
> 
> [1]: 
> https://lore.kernel.org/all/20231110091939.3025413-1-quic_mmani...@quicinc.com/
> 
>  drivers/remoteproc/qcom_q6v5.c  | 10 +++--
>  drivers/remoteproc/qcom_q6v5.h  |  6 ++-
>  drivers/remoteproc/qcom_q6v5_adsp.c | 17 +
>  drivers/remoteproc/qcom_q6v5_mpd.c  | 13 ---
>  drivers/remoteproc/qcom_q6v5_mss.c  |  5 ++-
>  drivers/remoteproc/qcom_q6v5_pas.c  | 59 +++--
>  drivers/remoteproc/qcom_q6v5_wcss.c | 12 +++---
>  7 files changed, 66 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/remoteproc/qcom_q6v5.c b/drivers/remoteproc/qcom_q6v5.c
> index 0e32f13c196d..e4a28bf25130 100644
> --- a/drivers/remoteproc/qcom_q6v5.c
> +++ b/drivers/remoteproc/qcom_q6v5.c
> @@ -100,7 +100,7 @@ static irqreturn_t q6v5_wdog_interrupt(int irq, void 
> *data)
>   return IRQ_HANDLED;
>   }
>  
> - msg = qcom_smem_get(QCOM_SMEM_HOST_ANY, q6v5->crash_reason, );
> + msg = qcom_smem_get(q6v5->crash_reason_partition, 
> q6v5->crash_reason_item, );
>   if (!IS_ERR(msg) && len > 0 && msg[0])
>   dev_err(q6v5->dev, "watchdog received: %s\n", msg);
>   else
> @@ -121,7 +121,7 @@ irqreturn_t q6v5_fatal_interrupt(int irq, void *data)
>   if (!q6v5->running)
>   return IRQ_HANDLED;
>  
> - msg = qcom_smem_get(QCOM_SMEM_HOST_ANY, q6v5->crash_reason, );
> + msg = qcom_smem_get(q6v5->crash_reason_partition, 
> q6v5->crash_reason_item, );
>   if (!IS_ERR(msg) && len > 0 && msg[0])
>   dev_err(q6v5->dev, "fatal error received: %s\n", msg);
>   else
> @@ -279,14 +279,16 @@ EXPORT_SYMBOL_GPL(qcom_q6v5_panic);
>   * Return: 0 on success, negative errno on failure
>   */
>  int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
> -struct rproc *rproc, int crash_reason, const char 
> *load_state,
> +struct rproc *rproc, int crash_reason_partition,
> +int crash_reason_item, const char *load_state,
>  void (*handover)(struct qcom_q6v5 *q6v5))
>  {
>   int ret;
>  
>   q6v5->rproc = rproc;
>   q6v5->dev = >dev;
> - q6v5->crash_reason = crash_reason;
> + q6v5->crash_reason_partition = crash_reason_partition;
> + q6v5->crash_reason_item = crash_reason_item;
>   q6v5->handover = handover;
>  
>   init_completion(>start_done);
> diff --git a/drivers/remoteproc/qcom_q6v5.h b/drivers/remoteproc/qcom_q6v5.h
> index 4e1bb1a68284..cd02372e9856 100644
> --- a/drivers/remoteproc/qcom_q6v5.h
> +++ b/drivers/remoteproc/qcom_q6v5.h
> @@ -40,7 +40,8 @@ struct qcom_q6v5 {
>   struct completion stop_done;
>   struct completion spawn_done;
>  
> - int crash_reason;
> + int crash_reason_partition;
> + int crash_reason_item;
>  
>   bool running;
>  
> @@ -49,7 +50,8 @@ struct qcom_q6v5 {
>  };
>  
>  int qcom_q6v5_init(struct qcom_q6v5 *q6v5, struct platform_device *pdev,
> -struct rproc *rproc, int crash_reason, const char 
> *load_state,
> +struct rproc *rproc, int crash_reason_partition,
> +int crash_reason_item, const char *load_state,
>  void (*handover)(struct qcom_q6v5 *q6v5));
>  void qcom_q6v5_deinit(struct qcom_q6v5 *q6v5);
>  
> diff --git a/drivers/remoteproc/qcom_q6v5_adsp.c 
> b/drivers/remoteproc/qcom_q6v5_adsp.c
> index 6c67514cc493..055764aa201c 100644
> --- a/drivers/remoteproc/qcom_q6v5_adsp.c
> +++ b/drivers/remoteproc/qcom_q6v5_adsp.c
> @@ -62,7 +62,7 @@
>  #define LPASS_EFUSE_Q6SS_EVB_SEL 0x0
>  
>  struct adsp_pil_data {
> - int crash_reason_smem;
> + int crash_reason_item;
>   const char *firmware_name;
>  
>   const char *ssr_name;
> @@ -98,7 

Re: [PATCH v6 7/7] Documentation: KVM: Add hypercall for LoongArch

2024-03-05 Thread maobibo




On 2024/3/6 上午2:26, WANG Xuerui wrote:

On 3/4/24 17:10, maobibo wrote:

On 2024/3/2 下午5:41, WANG Xuerui wrote:

On 3/2/24 16:47, Bibo Mao wrote:

[snip]
+Querying for existence
+==
+
+To find out if we're running on KVM or not, cpucfg can be used with 
index
+CPUCFG_KVM_BASE (0x4000), cpucfg range between 0x4000 - 
0x40FF
+is marked as a specially reserved range. All existing and future 
processors

+will not implement any features in this range.
+
+When Linux is running on KVM, cpucfg with index CPUCFG_KVM_BASE 
(0x4000)

+returns magic string "KVM\0"
+
+Once you determined you're running under a PV capable KVM, you can 
now use

+hypercalls as described below.


So this is still the approach similar to the x86 CPUID-based 
implementation. But here the non-privileged behavior isn't specified 
-- I see there is PLV checking in Patch 3 but it's safer to have the 
requirement spelled out here too.


But I still think this approach touches more places than strictly 
needed. As it is currently the case in 
arch/loongarch/kernel/cpu-probe.c, the FEATURES IOCSR is checked for 
a bit IOCSRF_VM that already signifies presence of a hypervisor; if 
this information can be interpreted as availability of the HVCL 
instruction (which I suppose is the case -- a hypervisor can always 
trap-and-emulate in case HVCL isn't provided by hardware), here we 
can already start making calls with HVCL.


We can and should define a uniform interface for probing the 
hypervisor kind, similar to the centrally-managed RISC-V SBI 
implementation ID registry [1]: otherwise future non-KVM hypervisors 
would have to


1. somehow pretend they are KVM and eventually fail to do so, leading 
to subtle incompatibilities,

2. invent another way of probing for their existence,
3. piggy-back on the current KVM definition, which is inelegant 
(reading the LoongArch-KVM-defined CPUCFG leaf only to find it's not 
KVM) and utterly makes the definition here *not* KVM-specific.


[1]: 
https://github.com/riscv-non-isa/riscv-sbi-doc/blob/v2.0/src/ext-base.adoc 



Sorry, I know nothing about riscv. Can you describe how 
sbi_get_mimpid() is implemented in detailed? Is it a simple library or 
need trap into secure mode or need trap into hypervisor mode?


For these simple interfaces you can expect trivial implementation. See 
for example [OpenSBI]'s respective code.


[OpenSBI]: 
https://github.com/riscv-software-src/opensbi/blob/v1.4/lib/sbi/sbi_ecall.c#L29-L34 




My take on this:

To check if we are running on Linux KVM or not, first check IOCSR 0x8 
(``LOONGARCH_IOCSR_FEATURES``) for bit 11 (``IOCSRF_VM``); we are 
running under a hypervisor if the bit is set. Then invoke ``HVCL 0`` 
to find out the hypervisor implementation ID; a return value in 
``$a0`` of 0x004d564b (``KVM\0``) means Linux KVM, in which case the 
rest of the convention applies.


I do not think so. `HVCL 0` requires that hypercall ABIs need be 
unified for all hypervisors. Instead it is not necessary, each 
hypervisor can has its own hypercall ABI.


I don't think agreeing upon the ABI of HVCL 0 is going to affect ABI of 
other hypercalls. Plus, as long as people don't invent something that 
they think is smart and deviate from the platform calling convention, 
I'd expect every hypervisor to have identical ABI apart from the exact 
HVCL operation ID chosen.



+
+KVM hypercall ABI
+=
+
+Hypercall ABI on KVM is simple, only one scratch register a0 (v0) 
and at most
+five generic registers used as input parameter. FP register and 
vector register
+is not used for input register and should not be modified during 
hypercall.
+Hypercall function can be inlined since there is only one scratch 
register.


It should be pointed out explicitly that on hypercall return all 
Well, return value description will added. What do think about the 
meaning of return value for KVM_HCALL_FUNC_PV_IPI hypercall?  The 
number of CPUs with IPI delivered successfully like kvm x86 or simply 
success/failure?
architectural state except ``$a0`` is preserved. Or is the whole 
``$a0 - $t8`` range clobbered, just like with Linux syscalls?



what is advantage with $a0 - > $t8 clobbered?


Because then a hypercall is going to behave identical as an ordinary C 
function call, which is easy for people and compilers to understand.


If you really understand detailed behavior about hypercall/syscall, the 
conclusion may be different.


If T0 - T8 is clobbered with hypercall instruction, hypercall caller 
need save clobbered register, now hypercall exception save/restore all 
the registers during VM exits. If so, hypercall caller need not save 
general registers and it is not necessary scratched for hypercall ABI.


Until now all the discussion the macro level, no detail code level.

Can you show me some example code where T0-T8 need not save/restore 
during LoongArch hypercall exception?


Regards
Bibo Mao

It seems that with linux Loongarch syscall, t0--t8 

[PATCH] ring-buffer: mark racy accesses on work->wait_index

2024-03-05 Thread linke li
Mark data races to work->wait_index as benign using READ_ONCE and WRITE_ONCE. 
These accesses are expected to be racy.

Signed-off-by: linke li 
---
 kernel/trace/ring_buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0699027b4f4c..a47e9e9750cc 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -798,7 +798,7 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, 
int cpu)
rbwork = _buffer->irq_work;
}
 
-   rbwork->wait_index++;
+   WRITE_ONCE(rbwork->wait_index, READ_ONCE(rbwork->wait_index) + 1);
/* make sure the waiters see the new index */
smp_wmb();
 
@@ -906,7 +906,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, 
int full)
 
/* Make sure to see the new wait index */
smp_rmb();
-   if (wait_index != work->wait_index)
+   if (wait_index != READ_ONCE(work->wait_index))
break;
}
 
-- 
2.39.3 (Apple Git-145)




Re: [PATCH net-next v2 3/3] tun: AF_XDP Tx zero-copy support

2024-03-05 Thread Jason Wang
On Mon, Mar 4, 2024 at 7:24 PM wangyunjian  wrote:
>
>
>
> > -Original Message-
> > From: Jason Wang [mailto:jasow...@redhat.com]
> > Sent: Monday, March 4, 2024 2:56 PM
> > To: wangyunjian 
> > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> > bj...@kernel.org; magnus.karls...@intel.com; maciej.fijalkow...@intel.com;
> > jonathan.le...@gmail.com; da...@davemloft.net; b...@vger.kernel.org;
> > net...@vger.kernel.org; linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> > virtualizat...@lists.linux.dev; xudingke ; liwei (DT)
> > 
> > Subject: Re: [PATCH net-next v2 3/3] tun: AF_XDP Tx zero-copy support
> >
> > On Wed, Feb 28, 2024 at 7:06 PM Yunjian Wang 
> > wrote:
> > >
> > > This patch set allows TUN to support the AF_XDP Tx zero-copy feature,
> > > which can significantly reduce CPU utilization for XDP programs.
> > >
> > > Since commit fc72d1d54dd9 ("tuntap: XDP transmission"), the pointer
> > > ring has been utilized to queue different types of pointers by
> > > encoding the type into the lower bits. Therefore, we introduce a new
> > > flag, TUN_XDP_DESC_FLAG(0x2UL), which allows us to enqueue XDP
> > > descriptors and differentiate them from XDP buffers and sk_buffs.
> > > Additionally, a spin lock is added for enabling and disabling operations 
> > > on the
> > xsk pool.
> > >
> > > The performance testing was performed on a Intel E5-2620 2.40GHz
> > machine.
> > > Traffic were generated/send through TUN(testpmd txonly with AF_XDP) to
> > > VM (testpmd rxonly in guest).
> > >
> > > +--+-+-+-+
> > > |  |   copy  |zero-copy| speedup |
> > > +--+-+-+-+
> > > | UDP  |   Mpps  |   Mpps  |%|
> > > | 64   |   2.5   |   4.0   |   60%   |
> > > | 512  |   2.1   |   3.6   |   71%   |
> > > | 1024 |   1.9   |   3.3   |   73%   |
> > > +--+-+-+-+
> > >
> > > Signed-off-by: Yunjian Wang 
> > > ---
> > >  drivers/net/tun.c  | 177
> > +++--
> > >  drivers/vhost/net.c|   4 +
> > >  include/linux/if_tun.h |  32 
> > >  3 files changed, 208 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c index
> > > bc80fc1d576e..7f4ff50b532c 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -63,6 +63,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -86,6 +87,7 @@ static void tun_default_link_ksettings(struct
> > net_device *dev,
> > >struct
> > ethtool_link_ksettings
> > > *cmd);
> > >
> > >  #define TUN_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
> > > +#define TUN_XDP_BATCH 64
> > >
> > >  /* TUN device flags */
> > >
> > > @@ -146,6 +148,9 @@ struct tun_file {
> > > struct tun_struct *detached;
> > > struct ptr_ring tx_ring;
> > > struct xdp_rxq_info xdp_rxq;
> > > +   struct xsk_buff_pool *xsk_pool;
> > > +   spinlock_t pool_lock;   /* Protects xsk pool enable/disable */
> > > +   u32 nb_descs;
> > >  };
> > >
> > >  struct tun_page {
> > > @@ -614,6 +619,8 @@ void tun_ptr_free(void *ptr)
> > > struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr);
> > >
> > > xdp_return_frame(xdpf);
> > > +   } else if (tun_is_xdp_desc_frame(ptr)) {
> > > +   return;
> > > } else {
> > > __skb_array_destroy_skb(ptr);
> > > }
> > > @@ -631,6 +638,37 @@ static void tun_queue_purge(struct tun_file *tfile)
> > > skb_queue_purge(>sk.sk_error_queue);
> > >  }
> > >
> > > +static void tun_set_xsk_pool(struct tun_file *tfile, struct
> > > +xsk_buff_pool *pool) {
> > > +   if (!pool)
> > > +   return;
> > > +
> > > +   spin_lock(>pool_lock);
> > > +   xsk_pool_set_rxq_info(pool, >xdp_rxq);
> > > +   tfile->xsk_pool = pool;
> > > +   spin_unlock(>pool_lock); }
> > > +
> > > +static void tun_clean_xsk_pool(struct tun_file *tfile) {
> > > +   spin_lock(>pool_lock);
> > > +   if (tfile->xsk_pool) {
> > > +   void *ptr;
> > > +
> > > +   while ((ptr = ptr_ring_consume(>tx_ring)) != NULL)
> > > +   tun_ptr_free(ptr);
> > > +
> > > +   if (tfile->nb_descs) {
> > > +   xsk_tx_completed(tfile->xsk_pool,
> > tfile->nb_descs);
> > > +   if (xsk_uses_need_wakeup(tfile->xsk_pool))
> > > +
> > xsk_set_tx_need_wakeup(tfile->xsk_pool);
> > > +   tfile->nb_descs = 0;
> > > +   }
> > > +   tfile->xsk_pool = NULL;
> > > +   }
> > > +   spin_unlock(>pool_lock); }
> > > +
> > >  static void __tun_detach(struct tun_file *tfile, bool clean)  {
> > > struct tun_file *ntfile;
> > > @@ -648,6 +686,11 @@ static void __tun_detach(struct tun_file *tfile, bool
> > clean)
> > > u16 index = 

[POC] !!! Re: [PATCH 0/8] tracing: Persistent traces across a reboot or crash

2024-03-05 Thread Steven Rostedt
I forgot to add [POC] to the topic.

All these patches are a proof of concept.

-- Steve



[PATCH 8/8] ring-buffer: Validate boot range memory events

2024-03-05 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Make sure all the events in each of the sub-buffers that were mapped in a
memory region are valid. This moves the code that walks the buffers for
time-stamp validation out of the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS
ifdef block and is used to validate the content. Only the ring buffer
event meta data is checked and not the data load.

This also has a second purpose. The buffer_page structure that points to
the data sub-buffers has accounting that keeps track of the number of
events that are on the sub-buffer. This updates that counter as well. That
counter is used in reading the buffer and knowing if the ring buffer is
empty or not.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ring_buffer.c | 222 +++--
 1 file changed, 165 insertions(+), 57 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index e74185a4d864..f7b511935fcf 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1610,10 +1610,171 @@ static bool rb_meta_valid(struct ring_buffer_meta 
*meta, int cpu,
subbuf = (void *)subbuf + subbuf_size;
}
 
-   pr_info("Ring buffer meta is from previous boot!\n");
return true;
 }
 
+static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf);
+
+#ifdef CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS
+static DEFINE_PER_CPU(atomic_t, checking);
+static atomic_t ts_dump;
+
+#define buffer_warn_return(fmt, ...)   \
+   do {\
+   /* If another report is happening, ignore this one */   \
+   if (atomic_inc_return(_dump) != 1) { \
+   atomic_dec(_dump);   \
+   goto out;   \
+   }   \
+   atomic_inc(_buffer->record_disabled);   \
+   pr_warn(fmt, ##__VA_ARGS__);\
+   dump_buffer_page(bpage, info, tail);\
+   atomic_dec(_dump);   \
+   /* There's some cases in boot up that this can happen */ \
+   if (WARN_ON_ONCE(system_state != SYSTEM_BOOTING))   \
+   /* Do not re-enable checking */ \
+   return; \
+   } while (0)
+#else
+#define buffer_warn_return(fmt, ...) do { } while (0)
+#endif
+
+static int rb_read_data_buffer(struct buffer_data_page *dpage, int tail, int 
cpu,
+  unsigned long long *timestamp, bool warn)
+{
+   struct ring_buffer_event *event;
+   u64 ts, delta;
+   int events = 0;
+   int e;
+
+   ts = dpage->time_stamp;
+
+   for (e = 0; e < tail; e += rb_event_length(event)) {
+
+   event = (struct ring_buffer_event *)(dpage->data + e);
+
+   switch (event->type_len) {
+
+   case RINGBUF_TYPE_TIME_EXTEND:
+   delta = rb_event_time_stamp(event);
+   ts += delta;
+   break;
+
+   case RINGBUF_TYPE_TIME_STAMP:
+   delta = rb_event_time_stamp(event);
+   delta = rb_fix_abs_ts(delta, ts);
+   if (warn && delta < ts) {
+   buffer_warn_return("[CPU: %d]ABSOLUTE TIME WENT 
BACKWARDS: last ts: %lld absolute ts: %lld\n",
+  cpu, ts, delta);
+   }
+   ts = delta;
+   break;
+
+   case RINGBUF_TYPE_PADDING:
+   if (event->time_delta == 1)
+   break;
+   fallthrough;
+   case RINGBUF_TYPE_DATA:
+   events++;
+   ts += event->time_delta;
+   break;
+
+   default:
+   return -1;
+   }
+   }
+   *timestamp = ts;
+   return events;
+}
+
+static int rb_validate_buffer(struct buffer_data_page *dpage, int cpu)
+{
+   unsigned long long ts;
+   int tail;
+
+   tail = local_read(>commit);
+   return rb_read_data_buffer(dpage, tail, cpu, , false);
+}
+
+/* If the meta data has been validated, now validate the events */
+static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
+{
+   struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+   struct buffer_page *head_page;
+   unsigned long entry_bytes = 0;
+   unsigned long entries = 0;
+   int ret;
+   int i;
+
+   if (!meta || !meta->head_buffer)
+   return;
+
+   /* Do the reader page first */
+   ret = 

[PATCH 7/8] ring-buffer: Add test if range of boot buffer is valid

2024-03-05 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Add a test against the ring buffer memory range to see if it has valid
data. The ring_buffer_meta structure is given a new field called
"first_buffer" which holds the address of the first sub-buffer. This is
used to both determine if the other fields are valid as well as finding
the offset between the old addresses of the sub-buffer from the previous
boot to the new addresses of the current boot.

Since the values for nr_subbufs and subbuf_size is to be the same, check
if the values in the meta page match the values calculated.

Take the range of the first_buffer and the total size of all the buffers
and make sure the saved head_buffer and commit_buffer fall in the range.

Iterate through all the sub-buffers to make sure that the values in the
sub-buffer "commit" field (the field that holds the amount of data on the
sub-buffer) is within the end of the sub-buffer. Also check the index
array to make sure that all the indexes are within nr_subbufs.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ring_buffer.c | 142 ++---
 1 file changed, 134 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1e06ebe36ad1..e74185a4d864 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -44,6 +44,7 @@
 static void update_pages_handler(struct work_struct *work);
 
 struct ring_buffer_meta {
+   unsigned long   first_buffer;
unsigned long   head_buffer;
unsigned long   commit_buffer;
__u32   subbuf_size;
@@ -1554,20 +1555,101 @@ static void *rb_range_buffer(struct 
ring_buffer_per_cpu *cpu_buffer, int idx)
return (void *)ptr;
 }
 
+/*
+ * See if the existing memory contains valid ring buffer data.
+ * As the previous kernel must be the same as this kernel, all
+ * the calculations (size of buffers and number of buffers)
+ * must be the same.
+ */
+static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
+ struct trace_buffer *buffer, int nr_pages)
+{
+   int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+   struct buffer_data_page *subbuf;
+   unsigned long buffers_start;
+   unsigned long buffers_end;
+
+   /* The subbuffer's size and number of subbuffers must match */
+   if (meta->subbuf_size != subbuf_size ||
+   meta->nr_subbufs != nr_pages + 1) {
+   pr_info("Ring buffer boot meta [%d] mismatch of 
subbuf_size/nr_pages\n", cpu);
+   return false;
+   }
+
+   buffers_start = meta->first_buffer;
+   buffers_end = meta->first_buffer + (subbuf_size * meta->nr_subbufs);
+
+   /* Is the head and commit buffers within the range of buffers? */
+   if (meta->head_buffer < buffers_start ||
+   meta->head_buffer >= buffers_end) {
+   pr_info("Ring buffer boot meta [%d] head buffer out of 
range\n", cpu);
+   return false;
+   }
+
+   if (meta->commit_buffer < buffers_start ||
+   meta->commit_buffer >= buffers_end) {
+   pr_info("Ring buffer boot meta [%d] commit buffer out of 
range\n", cpu);
+   return false;
+   }
+
+   subbuf = rb_subbufs_from_meta(meta);
+
+   /* Is the meta buffers and the subbufs themselves have correct data? */
+   for (int i = 0; i < meta->nr_subbufs; i++) {
+   if (meta->buffers[i] < 0 ||
+   meta->buffers[i] >= meta->nr_subbufs) {
+   pr_info("Ring buffer boot meta [%d] array out of 
range\n", cpu);
+   return false;
+   }
+
+   if ((unsigned)local_read(>commit) > subbuf_size) {
+   pr_info("Ring buffer boot meta [%d] buffer invalid 
commit\n", cpu);
+   return false;
+   }
+
+   subbuf = (void *)subbuf + subbuf_size;
+   }
+
+   pr_info("Ring buffer meta is from previous boot!\n");
+   return true;
+}
+
 static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
 {
struct ring_buffer_meta *meta;
+   unsigned long delta;
void *subbuf;
int cpu;
 
for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+   void *next_meta;
+
meta = rb_range_meta(buffer, nr_pages, cpu);
 
+   if (rb_meta_valid(meta, cpu, buffer, nr_pages)) {
+   /* Make the mappings match the current address */
+   subbuf = rb_subbufs_from_meta(meta);
+   delta = (unsigned long)subbuf - meta->first_buffer;
+   meta->first_buffer += delta;
+   meta->head_buffer += delta;
+   meta->commit_buffer += delta;
+   continue;
+   }
+
+   if (cpu < nr_cpu_ids - 1)
+   next_meta = rb_range_meta(buffer, nr_pages, cpu + 1);
+   

[PATCH 5/8] ring-buffer: Add ring_buffer_meta data

2024-03-05 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Populate the ring_buffer_meta array. It holds the pointer to the
head_buffer (next to read), the commit_buffer (next to write) the size of
the sub-buffers, number of sub-buffers and an array that keeps track of
the order of the sub-buffers.

This information will be stored in the persistent memory to help on reboot
to reconstruct the ring buffer.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ring_buffer.c | 207 -
 1 file changed, 182 insertions(+), 25 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 367597dc766b..5a90ada49366 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -42,6 +42,11 @@
 static void update_pages_handler(struct work_struct *work);
 
 struct ring_buffer_meta {
+   unsigned long   head_buffer;
+   unsigned long   commit_buffer;
+   __u32   subbuf_size;
+   __u32   nr_subbufs;
+   int buffers[];
 };
 
 /*
@@ -497,6 +502,7 @@ struct ring_buffer_per_cpu {
struct mutexmapping_lock;
unsigned long   *subbuf_ids;/* ID to subbuf addr */
struct trace_buffer_meta*meta_page;
+   struct ring_buffer_meta *ring_meta;
 
/* ring buffer pages to update, > 0 to add, < 0 to remove */
longnr_pages_to_update;
@@ -1206,6 +1212,11 @@ static void rb_head_page_activate(struct 
ring_buffer_per_cpu *cpu_buffer)
 * Set the previous list pointer to have the HEAD flag.
 */
rb_set_list_to_head(head->list.prev);
+
+   if (cpu_buffer->ring_meta) {
+   struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+   meta->head_buffer = (unsigned long)head->page;
+   }
 }
 
 static void rb_list_head_clear(struct list_head *list)
@@ -1453,50 +1464,124 @@ rb_range_align_subbuf(unsigned long addr, int 
subbuf_size, int nr_subbufs)
 }
 
 /*
- * Return a specific sub-buffer for a given @cpu defined by @idx.
+ * Return the ring_buffer_meta for a given @cpu.
  */
-static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int 
nr_pages, int idx)
+static void *rb_range_meta(struct trace_buffer *buffer, int nr_pages, int cpu)
 {
-   unsigned long ptr;
int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+   unsigned long ptr = buffer->range_addr_start;
+   struct ring_buffer_meta *meta;
int nr_subbufs;
 
-   /* Include the reader page */
-   nr_subbufs = nr_pages + 1;
+   if (!ptr)
+   return NULL;
+
+   /* When nr_pages passed in is zero, the first meta has already been 
initialized */
+   if (!nr_pages) {
+   meta = (struct ring_buffer_meta *)ptr;
+   nr_subbufs = meta->nr_subbufs;
+   } else {
+   meta = NULL;
+   /* Include the reader page */
+   nr_subbufs = nr_pages + 1;
+   }
 
/*
 * The first chunk may not be subbuffer aligned, where as
 * the rest of the chunks are.
 */
-   ptr = buffer->range_addr_start;
-   ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
if (cpu) {
-   unsigned long p;
-
-   ptr += subbuf_size * nr_subbufs;
-
-   /* Save the beginning of this CPU chunk */
-   p = ptr;
-
ptr = rb_range_align_subbuf(ptr, subbuf_size, nr_subbufs);
+   ptr += subbuf_size * nr_subbufs;
 
if (cpu > 1) {
unsigned long size;
+   unsigned long p;
 
+   /* Save the beginning of this CPU chunk */
+   p = ptr;
+   ptr = rb_range_align_subbuf(ptr, subbuf_size, 
nr_subbufs);
ptr += subbuf_size * nr_subbufs;
 
/* Now all chunks after this are the same size */
size = ptr - p;
ptr += size * (cpu - 2);
-
-   ptr = rb_range_align_subbuf(ptr, subbuf_size, 
nr_subbufs);
}
}
-   if (ptr + subbuf_size * nr_subbufs > buffer->range_addr_end)
+   return (void *)ptr;
+}
+
+static void *rb_subbufs_from_meta(struct ring_buffer_meta *meta)
+{
+   int subbuf_size = meta->subbuf_size;
+   unsigned long ptr;
+
+   ptr = (unsigned long)meta;
+   ptr = rb_range_align_subbuf(ptr, subbuf_size, meta->nr_subbufs);
+
+   return (void *)ptr;
+}
+
+/*
+ * Return a specific sub-buffer for a given @cpu defined by @idx.
+ */
+static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
+{
+   struct ring_buffer_meta *meta;
+   unsigned long ptr;
+   int subbuf_size;
+
+   meta = rb_range_meta(cpu_buffer->buffer, 0, cpu_buffer->cpu);
+   if (!meta)
+   return NULL;
+
+   if (WARN_ON_ONCE(idx 

[PATCH 6/8] ring-buffer: Add output of ring buffer meta page

2024-03-05 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Add a buffer_meta per-cpu file for the trace instance that is mapped to
boot memory. This shows the current meta-data and can be used by user
space tools to record off the current mappings to help reconstruct the
ring buffer after a reboot.

It does not expose any virtual addresses, just indexes into the sub-buffer
pages.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ring_buffer.c | 77 ++
 kernel/trace/trace.c   | 30 ++-
 kernel/trace/trace.h   |  2 +
 3 files changed, 107 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5a90ada49366..1e06ebe36ad1 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -31,6 +31,8 @@
 #include 
 #include 
 
+#include "trace.h"
+
 /*
  * The "absolute" timestamp in the buffer is only 59 bits.
  * If a clock has the 5 MSBs set, it needs to be saved and
@@ -1582,6 +1584,81 @@ static void rb_range_meta_init(struct trace_buffer 
*buffer, int nr_pages)
}
 }
 
+static void *rbm_start(struct seq_file *m, loff_t *pos)
+{
+   struct ring_buffer_per_cpu *cpu_buffer = m->private;
+   struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+   unsigned long val;
+
+   if (!meta)
+   return NULL;
+
+   if (*pos > meta->nr_subbufs)
+   return NULL;
+
+   val = *pos;
+   val++;
+
+   return (void *)val;
+}
+
+static void *rbm_next(struct seq_file *m, void *v, loff_t *pos)
+{
+   (*pos)++;
+
+   return rbm_start(m, pos);
+}
+
+static int rb_meta_subbuf_idx(struct ring_buffer_meta *meta, void *subbuf);
+
+static int rbm_show(struct seq_file *m, void *v)
+{
+   struct ring_buffer_per_cpu *cpu_buffer = m->private;
+   struct ring_buffer_meta *meta = cpu_buffer->ring_meta;
+   unsigned long val = (unsigned long)v;
+
+   if (val == 1) {
+   seq_printf(m, "head_buffer:   %d\n",
+  rb_meta_subbuf_idx(meta, (void *)meta->head_buffer));
+   seq_printf(m, "commit_buffer: %d\n",
+  rb_meta_subbuf_idx(meta, (void 
*)meta->commit_buffer));
+   seq_printf(m, "subbuf_size:   %d\n", meta->subbuf_size);
+   seq_printf(m, "nr_subbufs:%d\n", meta->nr_subbufs);
+   return 0;
+   }
+
+   val -= 2;
+   seq_printf(m, "buffer[%ld]:%d\n", val, meta->buffers[val]);
+
+   return 0;
+}
+
+static void rbm_stop(struct seq_file *m, void *p)
+{
+}
+
+static const struct seq_operations rb_meta_seq_ops = {
+   .start  = rbm_start,
+   .next   = rbm_next,
+   .show   = rbm_show,
+   .stop   = rbm_stop,
+};
+
+int ring_buffer_meta_seq_init(struct file *file, struct trace_buffer *buffer, 
int cpu)
+{
+   struct seq_file *m;
+   int ret;
+
+   ret = seq_open(file, _meta_seq_ops);
+   if (ret)
+   return ret;
+
+   m = file->private_data;
+   m->private = buffer->buffers[cpu];
+
+   return 0;
+}
+
 static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
long nr_pages, struct list_head *pages)
 {
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ff986d2a4bd0..b4a7960aed98 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4997,7 +4997,7 @@ static int show_traces_open(struct inode *inode, struct 
file *file)
return 0;
 }
 
-static int show_traces_release(struct inode *inode, struct file *file)
+static int tracing_seq_release(struct inode *inode, struct file *file)
 {
struct trace_array *tr = inode->i_private;
 
@@ -5038,7 +5038,7 @@ static const struct file_operations show_traces_fops = {
.open   = show_traces_open,
.read   = seq_read,
.llseek = seq_lseek,
-   .release= show_traces_release,
+   .release= tracing_seq_release,
 };
 
 static ssize_t
@@ -6840,6 +6840,22 @@ tracing_total_entries_read(struct file *filp, char 
__user *ubuf,
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
 }
 
+static int tracing_buffer_meta_open(struct inode *inode, struct file *filp)
+{
+   struct trace_array *tr = inode->i_private;
+   int cpu = tracing_get_cpu(inode);
+   int ret;
+
+   ret = tracing_check_open_get_tr(tr);
+   if (ret)
+   return ret;
+
+   ret = ring_buffer_meta_seq_init(filp, tr->array_buffer.buffer, cpu);
+   if (ret < 0)
+   __trace_array_put(tr);
+   return ret;
+}
+
 static ssize_t
 tracing_free_buffer_write(struct file *filp, const char __user *ubuf,
  size_t cnt, loff_t *ppos)
@@ -7416,6 +7432,13 @@ static const struct file_operations tracing_entries_fops 
= {
.release= tracing_release_generic_tr,
 };
 
+static const struct file_operations tracing_buffer_meta_fops = {
+   .open   = 

[PATCH 3/8] tracing: Create "boot_mapped" instance for memory mapped buffer

2024-03-05 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Add two global variables trace_buffer_start and trace_buffer_size. If they
are both set, then a "boot_mapped" instance will be created using the
memory specified by these variables as its ring buffer.

The instance will exist in:

  /sys/kernel/tracing/instances/boot_mapped

Note, because the ring buffer is using a defined memory ranged, it will
act just like a memory mapped ring buffer. It will not have a snapshot
buffer, as it can't swap out the buffer. The snapshot files as well as any
tracers that uses a snapshot will not be present in the boot_mapped
instance.

Signed-off-by: Steven Rostedt (Google) 
---
 include/linux/trace.h |  7 +
 kernel/trace/trace.c  | 65 ---
 kernel/trace/trace.h  |  3 ++
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/include/linux/trace.h b/include/linux/trace.h
index fdcd76b7be83..75dab6bb88c9 100644
--- a/include/linux/trace.h
+++ b/include/linux/trace.h
@@ -33,6 +33,13 @@ struct trace_array;
 int register_ftrace_export(struct trace_export *export);
 int unregister_ftrace_export(struct trace_export *export);
 
+/*
+ * If the below are set, then a "boot_mapped" tracing instance will
+ * be created using this memory for its ring buffer.
+ */
+extern unsigned long trace_buffer_start;
+extern unsigned long trace_buffer_size;
+
 /**
  * trace_array_puts - write a constant string into the trace buffer.
  * @tr:The trace array to write to
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ff0b0a999171..ff986d2a4bd0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4900,6 +4900,11 @@ static int tracing_open(struct inode *inode, struct file 
*file)
 static bool
 trace_ok_for_array(struct tracer *t, struct trace_array *tr)
 {
+#ifdef CONFIG_TRACER_SNAPSHOT
+   /* arrays with mapped buffer range do not have snapshots */
+   if (tr->range_addr_start && t->use_max_tr)
+   return false;
+#endif
return (tr->flags & TRACE_ARRAY_FL_GLOBAL) || t->allow_instances;
 }
 
@@ -8670,11 +8675,13 @@ tracing_init_tracefs_percpu(struct trace_array *tr, 
long cpu)
tr, cpu, _entries_fops);
 
 #ifdef CONFIG_TRACER_SNAPSHOT
-   trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu,
-   tr, cpu, _fops);
+   if (!tr->range_addr_start) {
+   trace_create_cpu_file("snapshot", TRACE_MODE_WRITE, d_cpu,
+ tr, cpu, _fops);
 
-   trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu,
-   tr, cpu, _raw_fops);
+   trace_create_cpu_file("snapshot_raw", TRACE_MODE_READ, d_cpu,
+ tr, cpu, _raw_fops);
+   }
 #endif
 }
 
@@ -9211,7 +9218,18 @@ allocate_trace_buffer(struct trace_array *tr, struct 
array_buffer *buf, int size
 
buf->tr = tr;
 
-   buf->buffer = ring_buffer_alloc(size, rb_flags);
+   if (tr->range_addr_start && tr->range_addr_size) {
+   buf->buffer = ring_buffer_alloc_range(size, rb_flags, 0,
+ tr->range_addr_start,
+ tr->range_addr_size);
+   /*
+* This is basically the same as a mapped buffer,
+* with the same restrictions.
+*/
+   tr->mapped++;
+   } else {
+   buf->buffer = ring_buffer_alloc(size, rb_flags);
+   }
if (!buf->buffer)
return -ENOMEM;
 
@@ -9248,6 +9266,10 @@ static int allocate_trace_buffers(struct trace_array 
*tr, int size)
return ret;
 
 #ifdef CONFIG_TRACER_MAX_TRACE
+   /* Fix mapped buffer trace arrays do not have snapshot buffers */
+   if (tr->range_addr_start)
+   return 0;
+
ret = allocate_trace_buffer(tr, >max_buffer,
allocate_snapshot ? size : 1);
if (MEM_FAIL(ret, "Failed to allocate trace buffer\n")) {
@@ -9348,7 +9370,9 @@ static int trace_array_create_dir(struct trace_array *tr)
 }
 
 static struct trace_array *
-trace_array_create_systems(const char *name, const char *systems)
+trace_array_create_systems(const char *name, const char *systems,
+  unsigned long range_addr_start,
+  unsigned long range_addr_size)
 {
struct trace_array *tr;
int ret;
@@ -9374,6 +9398,10 @@ trace_array_create_systems(const char *name, const char 
*systems)
goto out_free_tr;
}
 
+   /* Only for boot up memory mapped ring buffers */
+   tr->range_addr_start = range_addr_start;
+   tr->range_addr_size = range_addr_size;
+
tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS;
 
cpumask_copy(tr->tracing_cpumask, cpu_all_mask);
@@ -9431,9 +9459,24 @@ 

[PATCH 4/8] HACK: Hard code in mapped tracing buffer address

2024-03-05 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Do not submit!

This is for testing purposes only. It hard codes an address that I was
using to store the ring buffer range. How the memory actually gets mapped
will be another project.

Signed-off-by: Steven Rostedt (Google) 
---
 arch/x86/kernel/setup.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 84201071dfac..dcba729349d3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -26,6 +26,8 @@
 #include 
 #include 
 
+#include 
+
 #include 
 
 #include 
@@ -1106,6 +1108,24 @@ void __init setup_arch(char **cmdline_p)
 */
arch_reserve_crashkernel();
 
+   trace_buffer_size = 12582912;
+   {
+   phys_addr_t ftrace_addr;
+   unsigned long phys_start = 0x28540;
+   unsigned long phys_end = phys_start + trace_buffer_size + 
1024*1024;
+
+   ftrace_addr = memblock_phys_alloc_range(trace_buffer_size, 4096,
+   phys_start, phys_end);
+   if (ftrace_addr) {
+   printk("MEMORY ALLOC %lx-%lx\n", (long)ftrace_addr,
+  (long)ftrace_addr + trace_buffer_size);
+   trace_buffer_start = (unsigned long)__va(ftrace_addr);
+   printk("MEMORY ADDR %lx-%lx\n", trace_buffer_start,
+  trace_buffer_start + trace_buffer_size);
+   } else
+   printk("MEMORY FAILED\n");
+   }
+
memblock_find_dma_reserve();
 
if (!early_xdbc_setup_hardware())
-- 
2.43.0





[PATCH 2/8] ring-buffer: Add ring_buffer_alloc_range()

2024-03-05 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

In preparation to allowing the trace ring buffer to be allocated in a
range of memory that is persistent across reboots, add
ring_buffer_alloc_range(). It takes a contiguous range of memory and will
split it up evening for the per CPU ring buffers.

If there's not enough memory to handle all CPUs with the minimum size, it
will fail to allocate the ring buffer.

Signed-off-by: Steven Rostedt (Google) 
---
 include/linux/ring_buffer.h |  17 +++
 kernel/trace/ring_buffer.c  | 221 ++--
 2 files changed, 203 insertions(+), 35 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 0841ba8bab14..17b5508f042c 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -89,6 +89,11 @@ void ring_buffer_discard_commit(struct trace_buffer *buffer,
 struct trace_buffer *
 __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key 
*key);
 
+struct trace_buffer *__ring_buffer_alloc_range(unsigned long size, unsigned 
flags,
+  int order, unsigned long start,
+  unsigned long range_size,
+  struct lock_class_key *key);
+
 /*
  * Because the ring buffer is generic, if other users of the ring buffer get
  * traced by ftrace, it can produce lockdep warnings. We need to keep each
@@ -100,6 +105,18 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, 
struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key);   \
 })
 
+/*
+ * Because the ring buffer is generic, if other users of the ring buffer get
+ * traced by ftrace, it can produce lockdep warnings. We need to keep each
+ * ring buffer's lock class separate.
+ */
+#define ring_buffer_alloc_range(size, flags, order, start, range_size) \
+({ \
+   static struct lock_class_key __key; \
+   __ring_buffer_alloc_range((size), (flags), (order), (start),\
+ (range_size), &__key);\
+})
+
 int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
 __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
  struct file *filp, poll_table *poll_table, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 524b2c185c88..367597dc766b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -41,6 +41,9 @@
 
 static void update_pages_handler(struct work_struct *work);
 
+struct ring_buffer_meta {
+};
+
 /*
  * The ring buffer header is special. We must manually up keep it.
  */
@@ -339,7 +342,8 @@ struct buffer_page {
local_t  entries;   /* entries on this page */
unsigned longreal_end;  /* real end of data */
unsigned order; /* order of the page */
-   u32  id;/* ID for external mapping */
+   u32  id:30; /* ID for external mapping */
+   u32  range:1;   /* Mapped via a range */
struct buffer_data_page *page;  /* Actual data page */
 };
 
@@ -370,7 +374,9 @@ static __always_inline unsigned int rb_page_commit(struct 
buffer_page *bpage)
 
 static void free_buffer_page(struct buffer_page *bpage)
 {
-   free_pages((unsigned long)bpage->page, bpage->order);
+   /* Range pages are not to be freed */
+   if (!bpage->range)
+   free_pages((unsigned long)bpage->page, bpage->order);
kfree(bpage);
 }
 
@@ -520,6 +526,9 @@ struct trace_buffer {
struct rb_irq_work  irq_work;
booltime_stamp_abs;
 
+   unsigned long   range_addr_start;
+   unsigned long   range_addr_end;
+
unsigned intsubbuf_size;
unsigned intsubbuf_order;
unsigned intmax_data_size;
@@ -1431,9 +1440,67 @@ static void rb_check_pages(struct ring_buffer_per_cpu 
*cpu_buffer)
}
 }
 
+/*
+ * Take an address, add the meta data size as well as the array of
+ * array subbuffer indexes, then align it to a subbuffer size.
+ */
+static unsigned long
+rb_range_align_subbuf(unsigned long addr, int subbuf_size, int nr_subbufs)
+{
+   addr += sizeof(struct ring_buffer_meta) +
+   sizeof(int) * nr_subbufs;
+   return ALIGN(addr, subbuf_size);
+}
+
+/*
+ * Return a specific sub-buffer for a given @cpu defined by @idx.
+ */
+static void *rb_range_buffer(struct trace_buffer *buffer, int cpu, int 
nr_pages, int idx)
+{
+   unsigned long ptr;
+   int subbuf_size = buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
+   int nr_subbufs;
+
+   /* Include the reader page */
+   nr_subbufs = nr_pages + 1;
+
+   /*
+* The first 

[PATCH 0/8] tracing: Persistent traces across a reboot or crash

2024-03-05 Thread Steven Rostedt


This is a way to map a ring buffer instance across reboots.
The requirement is that you have a memory region that is not erased.
I tested this on a Debian VM running on qemu on a Debian server,
and even tested it on a baremetal box running Fedora. I was
surprised that it worked on the baremetal box, but it does so
surprisingly consistently.

The idea is that you can reserve a memory region and save it in two
special variables:

  trace_buffer_start and trace_buffer_size

If these are set by fs_initcall() then a "boot_mapped" instance
is created. The memory that was reserved is used by the ring buffer
of this instance. It acts like a memory mapped instance so it has
some limitations. It does not allow snapshots nor does it allow
tracers which use a snapshot buffer (like irqsoff and wakeup tracers).

On boot up, when setting up the ring buffer, it looks at the current
content and does a vigorous test to see if the content is valid.
It even walks the events in all the sub-buffers to make sure the
ring buffer meta data is correct. If it determines that the content
is valid, it will reconstruct the ring buffer to use the content
it has found.

If the buffer is valid, on the next boot, the boot_mapped instance
will contain the data from the previous boot. You can cat the
trace or trace_pipe file, or even run trace-cmd extract on it to
make a trace.dat file that holds the date. This is much better than
dealing with a ftrace_dump_on_opps (I wish I had this a decade ago!)

There are still some limitations of this buffer. One is that it assumes
that the kernel you are booting back into is the same one that crashed.
At least the trace_events (like sched_switch and friends) all have the
same ids. This would be true with the same kernel as the ids are determined
at link time.

Module events could possible be a problem as the ids may not match.

One idea is to just print the raw fields and not process the print formats
for this instance, as the print formats may do some crazy things with
data that does not match.

Another limitation is any print format that has "%pS" will likely not work.
That's because the pointer in the old ring buffer is for an address that
may be different than the function points to now. I was thinking of
adding a file in the boot_mapped instance that holds the delta of the
old mapping to the new mapping, so that trace-cmd and perf could
calculate the current kallsyms from the old pointers.

Finally, this is still a proof of concept. How to create this memory
mapping isn't decided yet. In this patch set I simply hacked into
kexec crash code and hard coded an address that worked for one of my
machines (for the other machine I had to play around to find another
address). Perhaps we could add a kernel command line parameter that
lets people decided, or an option where it could possibly look at
the ACPI (for intel) tables to come up with an address on its own.

Anyway, I plan on using this for debugging, as it already is pretty
featureful but there's much more that can be done.

Basically, all you need to do is:

  echo 1 > /sys/kernel/tracing/instances/boot_mapped/events/enable

Do what ever you want and the system crashes (and boots to the same
kernel). Then:

  cat /sys/kernel/tracing/instances/boot_mapped/trace

and it will have the trace.

I'm sure there's still some gotchas here, which is why this is currently
still just a POC.

Enjoy...

Steven Rostedt (Google) (8):
  ring-buffer: Allow mapped field to be set without mapping
  ring-buffer: Add ring_buffer_alloc_range()
  tracing: Create "boot_mapped" instance for memory mapped buffer
  HACK: Hard code in mapped tracing buffer address
  ring-buffer: Add ring_buffer_meta data
  ring-buffer: Add output of ring buffer meta page
  ring-buffer: Add test if range of boot buffer is valid
  ring-buffer: Validate boot range memory events


 arch/x86/kernel/setup.c |  20 ++
 include/linux/ring_buffer.h |  17 +
 include/linux/trace.h   |   7 +
 kernel/trace/ring_buffer.c  | 826 ++--
 kernel/trace/trace.c|  95 -
 kernel/trace/trace.h|   5 +
 6 files changed, 856 insertions(+), 114 deletions(-)



[PATCH 1/8] ring-buffer: Allow mapped field to be set without mapping

2024-03-05 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

In preparation for having the ring buffer mapped to a dedicated location,
which will have the same restrictions as user space memory mapped buffers,
allow it to use the "mapped" field of the ring_buffer_per_cpu structure
without having the user space meta page mapping.

When this starts using the mapped field, it will need to handle adding a
user space mapping (and removing it) from a ring buffer that is using a
dedicated memory range.

Signed-off-by: Steven Rostedt (Google) 
---
 kernel/trace/ring_buffer.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1d7d7a701867..524b2c185c88 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5171,6 +5171,9 @@ static void rb_update_meta_page(struct 
ring_buffer_per_cpu *cpu_buffer)
 {
struct trace_buffer_meta *meta = cpu_buffer->meta_page;
 
+   if (!meta)
+   return;
+
meta->reader.read = cpu_buffer->reader_page->read;
meta->reader.id = cpu_buffer->reader_page->id;
meta->reader.lost_events = cpu_buffer->lost_events;
@@ -6159,7 +6162,7 @@ rb_get_mapped_buffer(struct trace_buffer *buffer, int cpu)
 
mutex_lock(_buffer->mapping_lock);
 
-   if (!cpu_buffer->mapped) {
+   if (!cpu_buffer->mapped || !cpu_buffer->meta_page) {
mutex_unlock(_buffer->mapping_lock);
return ERR_PTR(-ENODEV);
}
@@ -6217,7 +6220,7 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu)
 
mutex_lock(_buffer->mapping_lock);
 
-   if (cpu_buffer->mapped) {
+   if (cpu_buffer->meta_page) {
err = __rb_inc_dec_mapped(buffer, cpu_buffer, true);
mutex_unlock(_buffer->mapping_lock);
return err;
@@ -6247,7 +6250,7 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu)
raw_spin_lock_irqsave(_buffer->reader_lock, flags);
 
rb_setup_ids_meta_page(cpu_buffer, subbuf_ids);
-   cpu_buffer->mapped = 1;
+   cpu_buffer->mapped++;
 
raw_spin_unlock_irqrestore(_buffer->reader_lock, flags);
 unlock:
-- 
2.43.0





[PATCH 3/5] ARM: dts: qcom: msm8974pro-castor: Remove camera button definitions

2024-03-05 Thread Luca Weiss
>From what I can tell, the camera buttons are not part of Z2 Tablet
hardware even though other devices based on 'shinano' do have them.

Fixes: ab80661883de ("ARM: dts: qcom: msm8974: Add Sony Xperia Z2 Tablet")
Signed-off-by: Luca Weiss 
---
 .../dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts  | 12 
 1 file changed, 12 deletions(-)

diff --git 
a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts 
b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
index da554f72528a..97b55bda9189 100644
--- a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
+++ b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
@@ -34,18 +34,6 @@ key-volume-down {
linux,code = ;
};
 
-   key-camera-snapshot {
-   label = "camera_snapshot";
-   gpios = <_gpios 3 GPIO_ACTIVE_LOW>;
-   linux,code = ;
-   };
-
-   key-camera-focus {
-   label = "camera_focus";
-   gpios = <_gpios 4 GPIO_ACTIVE_LOW>;
-   linux,code = ;
-   };
-
key-volume-up {
label = "volume_up";
gpios = <_gpios 5 GPIO_ACTIVE_LOW>;

-- 
2.44.0




[PATCH 1/5] ARM: dts: qcom: msm8974pro-castor: Clean up formatting

2024-03-05 Thread Luca Weiss
Clean up some easy things do prepare the dts for further changes.

* Move pinctrl-names below pinctrl-*
* Move status as last property
* Remove default linux,input-type value

Signed-off-by: Luca Weiss 
---
 .../qcom-msm8974pro-sony-xperia-shinano-castor.dts | 65 +-
 1 file changed, 27 insertions(+), 38 deletions(-)

diff --git 
a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts 
b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
index ee94741a26ed..2db2ddf00580 100644
--- a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
+++ b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
@@ -23,34 +23,30 @@ chosen {
gpio-keys {
compatible = "gpio-keys";
 
-   pinctrl-names = "default";
pinctrl-0 = <_keys_pin_a>;
+   pinctrl-names = "default";
 
key-volume-down {
label = "volume_down";
gpios = <_gpios 2 GPIO_ACTIVE_LOW>;
-   linux,input-type = <1>;
linux,code = ;
};
 
key-camera-snapshot {
label = "camera_snapshot";
gpios = <_gpios 3 GPIO_ACTIVE_LOW>;
-   linux,input-type = <1>;
linux,code = ;
};
 
key-camera-focus {
label = "camera_focus";
gpios = <_gpios 4 GPIO_ACTIVE_LOW>;
-   linux,input-type = <1>;
linux,code = ;
};
 
key-volume-up {
label = "volume_up";
gpios = <_gpios 5 GPIO_ACTIVE_LOW>;
-   linux,input-type = <1>;
linux,code = ;
};
};
@@ -67,8 +63,8 @@ vreg_bl_vddio: lcd-backlight-vddio {
vin-supply = <_s3>;
startup-delay-us = <7>;
 
-   pinctrl-names = "default";
pinctrl-0 = <_backlight_en_pin_a>;
+   pinctrl-names = "default";
};
 
vreg_vsp: lcd-dcdc-regulator {
@@ -80,8 +76,8 @@ vreg_vsp: lcd-dcdc-regulator {
gpio = <_gpios 20 GPIO_ACTIVE_HIGH>;
enable-active-high;
 
-   pinctrl-names = "default";
pinctrl-0 = <_dcdc_en_pin_a>;
+   pinctrl-names = "default";
};
 
vreg_boost: vreg-boost {
@@ -121,8 +117,8 @@ vreg_wlan: wlan-regulator {
gpio = <_gpios 18 GPIO_ACTIVE_HIGH>;
enable-active-high;
 
-   pinctrl-names = "default";
pinctrl-0 = <_regulator_pin>;
+   pinctrl-names = "default";
};
 };
 
@@ -131,9 +127,10 @@ _uart2 {
 };
 
 _i2c2 {
-   status = "okay";
clock-frequency = <355000>;
 
+   status = "okay";
+
synaptics@2c {
compatible = "syna,rmi4-i2c";
reg = <0x2c>;
@@ -147,8 +144,8 @@ synaptics@2c {
vdd-supply = <_l22>;
vio-supply = <_lvs3>;
 
-   pinctrl-names = "default";
pinctrl-0 = <_int_pin>;
+   pinctrl-names = "default";
 
syna,startup-delay-ms = <100>;
 
@@ -166,9 +163,10 @@ rmi4-f11@11 {
 };
 
 _i2c5 {
-   status = "okay";
clock-frequency = <355000>;
 
+   status = "okay";
+
lp8566_wled: backlight@2c {
compatible = "ti,lp8556";
reg = <0x2c>;
@@ -232,8 +230,8 @@ bluetooth {
compatible = "brcm,bcm43438-bt";
max-speed = <300>;
 
-   pinctrl-names = "default";
pinctrl-0 = <_host_wake_pin>, <_dev_wake_pin>, 
<_reg_on_pin>;
+   pinctrl-names = "default";
 
host-wakeup-gpios = < 95 GPIO_ACTIVE_HIGH>;
device-wakeup-gpios = < 96 GPIO_ACTIVE_HIGH>;
@@ -242,17 +240,16 @@ bluetooth {
 };
 
 _coincell {
-   status = "okay";
-
qcom,rset-ohms = <2100>;
qcom,vset-millivolts = <3000>;
+
+   status = "okay";
 };
 
 _gpios {
gpio_keys_pin_a: gpio-keys-active-state {
pins = "gpio2", "gpio5";
function = "normal";
-
bias-pull-up;
power-source = ;
};
@@ -260,7 +257,6 @@ gpio_keys_pin_a: gpio-keys-active-state {
bt_reg_on_pin: bt-reg-on-state {
pins = "gpio16";
function = "normal";
-
output-low;
power-source = ;
};
@@ -268,7 +264,6 @@ bt_reg_on_pin: bt-reg-on-state {
wlan_sleep_clk_pin: wl-sleep-clk-state {
pins = "gpio17";
function = "func2";
-
output-high;
power-source = ;
};
@@ -276,7 +271,6 @@ wlan_sleep_clk_pin: wl-sleep-clk-state {

[PATCH 4/5] ARM: dts: qcom: msm8974pro-castor: Add debounce-interval for keys

2024-03-05 Thread Luca Weiss
Set the debounce-interval for the GPIO keys.

Signed-off-by: Luca Weiss 
---
 arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts | 2 ++
 1 file changed, 2 insertions(+)

diff --git 
a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts 
b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
index 97b55bda9189..c9f74bf2f8bd 100644
--- a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
+++ b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
@@ -32,12 +32,14 @@ key-volume-down {
label = "volume_down";
gpios = <_gpios 2 GPIO_ACTIVE_LOW>;
linux,code = ;
+   debounce-interval = <15>;
};
 
key-volume-up {
label = "volume_up";
gpios = <_gpios 5 GPIO_ACTIVE_LOW>;
linux,code = ;
+   debounce-interval = <15>;
};
};
 

-- 
2.44.0




[PATCH 0/5] Some devicetree cleanup for MSM8974 Sony Xperia Z2 Tablet

2024-03-05 Thread Luca Weiss
The sony-castor dts has been around for a while, clean up some things to
prepare for further changes including the introduction of the
shinano-based Sony Xperia Z3.

Signed-off-by: Luca Weiss 
---
Luca Weiss (5):
  ARM: dts: qcom: msm8974pro-castor: Clean up formatting
  ARM: dts: qcom: msm8974pro-castor: Add mmc aliases
  ARM: dts: qcom: msm8974pro-castor: Remove camera button definitions
  ARM: dts: qcom: msm8974pro-castor: Add debounce-interval for keys
  ARM: dts: qcom: msm8974pro-castor: Rename wifi node name

 .../qcom-msm8974pro-sony-xperia-shinano-castor.dts | 83 +-
 1 file changed, 32 insertions(+), 51 deletions(-)
---
base-commit: 2e397253aae928c6d318beb18c05bc2236f69a8a
change-id: 20240305-castor-changes-bc6785ba8458

Best regards,
-- 
Luca Weiss 




[PATCH 2/5] ARM: dts: qcom: msm8974pro-castor: Add mmc aliases

2024-03-05 Thread Luca Weiss
Add the mmc0 & mmc1 aliases to make sure internal storage always becomes
/dev/mmcblk0 and SD card becomes /dev/mmcblk1

Signed-off-by: Luca Weiss 
---
 arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts | 2 ++
 1 file changed, 2 insertions(+)

diff --git 
a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts 
b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
index 2db2ddf00580..da554f72528a 100644
--- a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
+++ b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
@@ -12,6 +12,8 @@ / {
chassis-type = "tablet";
 
aliases {
+   mmc0 = _1;
+   mmc1 = _2;
serial0 = _uart2;
serial1 = _uart1;
};

-- 
2.44.0




[PATCH 5/5] ARM: dts: qcom: msm8974pro-castor: Rename wifi node name

2024-03-05 Thread Luca Weiss
Give the wifi node a generic node name 'wifi'.

Signed-off-by: Luca Weiss 
---
 arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts 
b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
index c9f74bf2f8bd..20f98a9e49ea 100644
--- a/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
+++ b/arch/arm/boot/dts/qcom/qcom-msm8974pro-sony-xperia-shinano-castor.dts
@@ -547,7 +547,7 @@ _3 {
 
status = "okay";
 
-   bcrmf@1 {
+   wifi@1 {
compatible = "brcm,bcm4339-fmac", "brcm,bcm4329-fmac";
reg = <1>;
 

-- 
2.44.0




[PATCH] remoteproc: make rproc_class constant

2024-03-05 Thread Ricardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the rproc_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Cc: Greg Kroah-Hartman 
Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Ricardo B. Marliere 
---
 drivers/remoteproc/remoteproc_internal.h | 2 +-
 drivers/remoteproc/remoteproc_sysfs.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_internal.h 
b/drivers/remoteproc/remoteproc_internal.h
index f62a82d71dfa..0cd09e67ac14 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -72,7 +72,7 @@ void rproc_init_debugfs(void);
 void rproc_exit_debugfs(void);
 
 /* from remoteproc_sysfs.c */
-extern struct class rproc_class;
+extern const struct class rproc_class;
 int rproc_init_sysfs(void);
 void rproc_exit_sysfs(void);
 
diff --git a/drivers/remoteproc/remoteproc_sysfs.c 
b/drivers/remoteproc/remoteproc_sysfs.c
index 8c7ea8922638..138e752c5e4e 100644
--- a/drivers/remoteproc/remoteproc_sysfs.c
+++ b/drivers/remoteproc/remoteproc_sysfs.c
@@ -254,7 +254,7 @@ static const struct attribute_group *rproc_devgroups[] = {
NULL
 };
 
-struct class rproc_class = {
+const struct class rproc_class = {
.name   = "remoteproc",
.dev_groups = rproc_devgroups,
 };

---
base-commit: 8b46dc5cfa5ffea279aed0fc05dc4b1c39a51517
change-id: 20240305-class_cleanup-remoteproc2-f1212934f990

Best regards,
-- 
Ricardo B. Marliere 




Re: [PATCH 0/3] Restructure init sequence to set aside reserved memory earlier

2024-03-05 Thread Oreoluwa Babatunde


On 2/9/2024 4:29 PM, Oreoluwa Babatunde wrote:
> The loongarch, openric, and sh architectures allocate memory from
> memblock before it gets the chance to set aside reserved memory regions.
> This means that there is a possibility for memblock to allocate from
> memory regions that are supposed to be reserved.
>
> This series makes changes to the arch specific setup code to call the
> functions responsible for setting aside the reserved memory regions earlier
> in the init sequence.
> Hence, by the time memblock starts being used to allocate memory, the
> reserved memory regions should already be set aside, and it will no
> longer be possible for allocations to come from them.
>
> I am currnetly using an arm64 device, and so I will need assistance from
> the relevant arch maintainers to help check if this breaks anything from
> compilation to device bootup.
>
> Oreoluwa Babatunde (3):
>   loongarch: Call arch_mem_init() before platform_init() in the init
> sequence
>   openrisc: Call setup_memory() earlier in the init sequence
>   sh: Call paging_init() earlier in the init sequence
>
>  arch/loongarch/kernel/setup.c | 2 +-
>  arch/openrisc/kernel/setup.c  | 6 +++---
>  arch/sh/kernel/setup.c| 4 ++--
>  3 files changed, 6 insertions(+), 6 deletions(-)
Hello,

Loongarch patch has already merged for this, but review is still pending
from openrisc and sh architectures.
Could someone please comment on these?

Regards,

Oreoluwa



Re: [RFC PATCH v3 2/5] mfd: add driver for Marvell 88PM886 PMIC

2024-03-05 Thread Karel Balej
Lee Jones, 2024-03-05T11:44:18+00:00:
> > +static struct mfd_cell pm886_devs[] = {
> > +   {
> > +   .name = "88pm886-onkey",
> > +   .num_resources = ARRAY_SIZE(pm886_onkey_resources),
> > +   .resources = pm886_onkey_resources,
> > +   },
> > +   {
> > +   .name = "88pm886-regulator",
> > +   .id = PM886_REGULATOR_ID_LDO2,
>
> Why doesn't PLATFORM_DEVID_AUTO work for this device?

Because I am using the IDs in the regulator driver to determine which
regulator data to use/which regulator to register.

> > +static int pm886_initialize_subregmaps(struct pm886_chip *chip)
> > +{
> > +   struct device *dev = >client->dev;
> > +   struct i2c_client *page;
> > +   struct regmap *regmap;
> > +   int err;
> > +
> > +   /* regulators page */
> > +   page = devm_i2c_new_dummy_device(dev, chip->client->adapter,
> > +   chip->client->addr + 
> > PM886_PAGE_OFFSET_REGULATORS);
> > +   if (IS_ERR(page)) {
> > +   err = PTR_ERR(page);
> > +   dev_err(dev, "Failed to initialize regulators client: %d\n", 
> > err);
> > +   return err;
> > +   }
> > +   regmap = devm_regmap_init_i2c(page, _i2c_regmap);
> > +   if (IS_ERR(regmap)) {
> > +   err = PTR_ERR(regmap);
> > +   dev_err(dev, "Failed to initialize regulators regmap: %d\n", 
> > err);
> > +   return err;
> > +   }
> > +   chip->regmaps[PM886_REGMAP_REGULATORS] = regmap;
>
> Except for the regulator driver, where else is the regulators regmap used?

Nowhere, at least as of now. So you are saying that I should initialize
the regmap in the regulator driver?

Thank you,
K. B.



[PATCH] rpmsg: core: make rpmsg_class constant

2024-03-05 Thread Ricardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the rpmsg_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Cc: Greg Kroah-Hartman 
Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Ricardo B. Marliere 
---
 drivers/rpmsg/rpmsg_char.c |  2 +-
 drivers/rpmsg/rpmsg_core.c | 16 +---
 drivers/rpmsg/rpmsg_ctrl.c |  2 +-
 drivers/rpmsg/rpmsg_internal.h |  2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 1cb8d7474428..d7a342510902 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -423,7 +423,7 @@ static struct rpmsg_eptdev 
*rpmsg_chrdev_eptdev_alloc(struct rpmsg_device *rpdev
init_waitqueue_head(>readq);
 
device_initialize(dev);
-   dev->class = rpmsg_class;
+   dev->class = _class;
dev->parent = parent;
dev->groups = rpmsg_eptdev_groups;
dev_set_drvdata(dev, eptdev);
diff --git a/drivers/rpmsg/rpmsg_core.c b/drivers/rpmsg/rpmsg_core.c
index 4295c01a2861..0fa08266404d 100644
--- a/drivers/rpmsg/rpmsg_core.c
+++ b/drivers/rpmsg/rpmsg_core.c
@@ -20,7 +20,9 @@
 
 #include "rpmsg_internal.h"
 
-struct class *rpmsg_class;
+const struct class rpmsg_class = {
+   .name = "rpmsg",
+};
 EXPORT_SYMBOL(rpmsg_class);
 
 /**
@@ -715,16 +717,16 @@ static int __init rpmsg_init(void)
 {
int ret;
 
-   rpmsg_class = class_create("rpmsg");
-   if (IS_ERR(rpmsg_class)) {
-   pr_err("failed to create rpmsg class\n");
-   return PTR_ERR(rpmsg_class);
+   ret = class_register(_class);
+   if (ret) {
+   pr_err("failed to register rpmsg class\n");
+   return ret;
}
 
ret = bus_register(_bus);
if (ret) {
pr_err("failed to register rpmsg bus: %d\n", ret);
-   class_destroy(rpmsg_class);
+   class_destroy(_class);
}
return ret;
 }
@@ -733,7 +735,7 @@ postcore_initcall(rpmsg_init);
 static void __exit rpmsg_fini(void)
 {
bus_unregister(_bus);
-   class_destroy(rpmsg_class);
+   class_destroy(_class);
 }
 module_exit(rpmsg_fini);
 
diff --git a/drivers/rpmsg/rpmsg_ctrl.c b/drivers/rpmsg/rpmsg_ctrl.c
index c312794ba4b3..28f57945ccd9 100644
--- a/drivers/rpmsg/rpmsg_ctrl.c
+++ b/drivers/rpmsg/rpmsg_ctrl.c
@@ -150,7 +150,7 @@ static int rpmsg_ctrldev_probe(struct rpmsg_device *rpdev)
dev = >dev;
device_initialize(dev);
dev->parent = >dev;
-   dev->class = rpmsg_class;
+   dev->class = _class;
 
mutex_init(>ctrl_lock);
cdev_init(>cdev, _ctrldev_fops);
diff --git a/drivers/rpmsg/rpmsg_internal.h b/drivers/rpmsg/rpmsg_internal.h
index b950d6f790a3..a3ba768138f1 100644
--- a/drivers/rpmsg/rpmsg_internal.h
+++ b/drivers/rpmsg/rpmsg_internal.h
@@ -18,7 +18,7 @@
 #define to_rpmsg_device(d) container_of(d, struct rpmsg_device, dev)
 #define to_rpmsg_driver(d) container_of(d, struct rpmsg_driver, drv)
 
-extern struct class *rpmsg_class;
+extern const struct class rpmsg_class;
 
 /**
  * struct rpmsg_device_ops - indirection table for the rpmsg_device operations

---
base-commit: b03aa6d4e9a74c4289929b6cf3c6bcc80270682d
change-id: 20240305-class_cleanup-remoteproc-2b53e26b2817

Best regards,
-- 
Ricardo B. Marliere 




Re: [PATCH v6 7/7] Documentation: KVM: Add hypercall for LoongArch

2024-03-05 Thread WANG Xuerui

On 3/4/24 17:10, maobibo wrote:

On 2024/3/2 下午5:41, WANG Xuerui wrote:

On 3/2/24 16:47, Bibo Mao wrote:

[snip]
+Querying for existence
+==
+
+To find out if we're running on KVM or not, cpucfg can be used with 
index
+CPUCFG_KVM_BASE (0x4000), cpucfg range between 0x4000 - 
0x40FF
+is marked as a specially reserved range. All existing and future 
processors

+will not implement any features in this range.
+
+When Linux is running on KVM, cpucfg with index CPUCFG_KVM_BASE 
(0x4000)

+returns magic string "KVM\0"
+
+Once you determined you're running under a PV capable KVM, you can 
now use

+hypercalls as described below.


So this is still the approach similar to the x86 CPUID-based 
implementation. But here the non-privileged behavior isn't specified 
-- I see there is PLV checking in Patch 3 but it's safer to have the 
requirement spelled out here too.


But I still think this approach touches more places than strictly 
needed. As it is currently the case in 
arch/loongarch/kernel/cpu-probe.c, the FEATURES IOCSR is checked for a 
bit IOCSRF_VM that already signifies presence of a hypervisor; if this 
information can be interpreted as availability of the HVCL instruction 
(which I suppose is the case -- a hypervisor can always 
trap-and-emulate in case HVCL isn't provided by hardware), here we can 
already start making calls with HVCL.


We can and should define a uniform interface for probing the 
hypervisor kind, similar to the centrally-managed RISC-V SBI 
implementation ID registry [1]: otherwise future non-KVM hypervisors 
would have to


1. somehow pretend they are KVM and eventually fail to do so, leading 
to subtle incompatibilities,

2. invent another way of probing for their existence,
3. piggy-back on the current KVM definition, which is inelegant 
(reading the LoongArch-KVM-defined CPUCFG leaf only to find it's not 
KVM) and utterly makes the definition here *not* KVM-specific.


[1]: 
https://github.com/riscv-non-isa/riscv-sbi-doc/blob/v2.0/src/ext-base.adoc


Sorry, I know nothing about riscv. Can you describe how sbi_get_mimpid() 
is implemented in detailed? Is it a simple library or need trap into 
secure mode or need trap into hypervisor mode?


For these simple interfaces you can expect trivial implementation. See 
for example [OpenSBI]'s respective code.


[OpenSBI]: 
https://github.com/riscv-software-src/opensbi/blob/v1.4/lib/sbi/sbi_ecall.c#L29-L34



My take on this:

To check if we are running on Linux KVM or not, first check IOCSR 0x8 
(``LOONGARCH_IOCSR_FEATURES``) for bit 11 (``IOCSRF_VM``); we are 
running under a hypervisor if the bit is set. Then invoke ``HVCL 0`` 
to find out the hypervisor implementation ID; a return value in 
``$a0`` of 0x004d564b (``KVM\0``) means Linux KVM, in which case the 
rest of the convention applies.


I do not think so. `HVCL 0` requires that hypercall ABIs need be unified 
for all hypervisors. Instead it is not necessary, each hypervisor can 
has its own hypercall ABI.


I don't think agreeing upon the ABI of HVCL 0 is going to affect ABI of 
other hypercalls. Plus, as long as people don't invent something that 
they think is smart and deviate from the platform calling convention, 
I'd expect every hypervisor to have identical ABI apart from the exact 
HVCL operation ID chosen.



+
+KVM hypercall ABI
+=
+
+Hypercall ABI on KVM is simple, only one scratch register a0 (v0) 
and at most
+five generic registers used as input parameter. FP register and 
vector register
+is not used for input register and should not be modified during 
hypercall.
+Hypercall function can be inlined since there is only one scratch 
register.


It should be pointed out explicitly that on hypercall return all 
Well, return value description will added. What do think about the 
meaning of return value for KVM_HCALL_FUNC_PV_IPI hypercall?  The number 
of CPUs with IPI delivered successfully like kvm x86 or simply 
success/failure?
architectural state except ``$a0`` is preserved. Or is the whole ``$a0 
- $t8`` range clobbered, just like with Linux syscalls?



what is advantage with $a0 - > $t8 clobbered?


Because then a hypercall is going to behave identical as an ordinary C 
function call, which is easy for people and compilers to understand.


It seems that with linux Loongarch syscall, t0--t8 are clobber rather 
than a0-t8. Am I wrong?


You're right, my memory has faded a bit. But I think my reasoning still 
holds.



+
+The parameters are as follows:
+
+            
+    Register    IN    OUT
+            
+    a0    function number    Return code
+    a1    1st parameter    -
+    a2    2nd parameter    -
+    a3    3rd parameter    -
+    a4    4th parameter    -
+    a5    5th parameter    -
+            
+
+Return codes can 

Re: [PATCH v2] tracing: Limit trace_marker writes to just 4K

2024-03-05 Thread Mathieu Desnoyers

On 2024-03-04 22:34, Steven Rostedt wrote:

From: "Steven Rostedt (Google)" 

Limit the max print event of trace_marker to just 4K string size. This must
also be less than the amount that can be held by a trace_seq along with
the text that is before the output (like the task name, PID, CPU, state,
etc). As trace_seq is made to handle large events (some greater than 4K).
Make the max size of a trace_marker write event be 4K which is guaranteed
to fit in the trace_seq buffer.

Suggested-by: Mathieu Desnoyers 


From my perspective I only attempted to clarify the point Linus made
about limiting the trace_marker input to 4kB. Feel adapt the
Suggested-by tag accordingly.

Reviewed-by: Mathieu Desnoyers 

Thanks,

Mathieu


Signed-off-by: Steven Rostedt (Google) 
---
Changes since v1: 
https://lore.kernel.org/linux-trace-kernel/20240304192710.4c996...@gandalf.local.home/

- Just make the max limit 4K and not half of the trace_seq size.
   The trace_seq is already made to handle events greater than 4k.

  kernel/trace/trace.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8198bfc54b58..d16b95ca58a7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7293,6 +7293,8 @@ tracing_free_buffer_release(struct inode *inode, struct 
file *filp)
return 0;
  }
  
+#define TRACE_MARKER_MAX_SIZE		4096

+
  static ssize_t
  tracing_mark_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *fpos)
@@ -7320,6 +7322,9 @@ tracing_mark_write(struct file *filp, const char __user 
*ubuf,
if ((ssize_t)cnt < 0)
return -EINVAL;
  
+	if (cnt > TRACE_MARKER_MAX_SIZE)

+   cnt = TRACE_MARKER_MAX_SIZE;
+
meta_size = sizeof(*entry) + 2;  /* add '\0' and possible '\n' */
   again:
size = cnt + meta_size;
@@ -7328,11 +7333,6 @@ tracing_mark_write(struct file *filp, const char __user 
*ubuf,
if (cnt < FAULTED_SIZE)
size += FAULTED_SIZE - cnt;
  
-	if (size > TRACE_SEQ_BUFFER_SIZE) {

-   cnt -= size - TRACE_SEQ_BUFFER_SIZE;
-   goto again;
-   }
-
buffer = tr->array_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
tracing_gen_ctx());


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [RFC PATCH v3 2/5] mfd: add driver for Marvell 88PM886 PMIC

2024-03-05 Thread Lee Jones
On Sun, 03 Mar 2024, Karel Balej wrote:

> From: Karel Balej 
> 
> Marvell 88PM886 is a PMIC which provides various functions such as
> onkey, battery, charger and regulators. It is found for instance in the
> samsung,coreprimevelte smartphone with which this was tested.
> 
> Only implement basic support to allow for the use of regulators and
> onkey omitting the currently unused register definitions and I2C
> subclients which should thus be added with the subdevice drivers which
> need them.
> 
> Signed-off-by: Karel Balej 
> ---
> 
> Notes:
> RFC v3:
> - Drop onkey cell .of_compatible.
> - Rename LDO page offset and regmap to REGULATORS.
> RFC v2:
> - Remove some abstraction.
> - Sort includes alphabetically and add linux/of.h.
> - Depend on OF, remove of_match_ptr and add MODULE_DEVICE_TABLE.
> - Use more temporaries and break long lines.
> - Do not initialize ret in probe.
> - Use the wakeup-source DT property.
> - Rename ret to err.
> - Address Lee's comments:
>   - Drop patched in presets for base regmap and related defines.
>   - Use full sentences in comments.
>   - Remove IRQ comment.
>   - Define regmap_config member values.
>   - Rename data to sys_off_data.
>   - Add _PMIC suffix to Kconfig.
>   - Use dev_err_probe.
>   - Do not store irq_data.
>   - s/WHOAMI/CHIP_ID

I still see 'whoami'.

>   - Drop LINUX part of include guard name.
>   - Merge in the regulator series modifications in order to have more
> devices and modify the commit message accordingly. Changes with
> respect to the original regulator series patches:
> - ret -> err
> - Add temporary for dev in pm88x_initialize_subregmaps.
> - Drop of_compatible for the regulators.
> - Do not duplicate LDO regmap for bucks.
> - Rewrite commit message.
> 
>  drivers/mfd/88pm886.c   | 210 
>  drivers/mfd/Kconfig |  12 +++
>  drivers/mfd/Makefile|   1 +
>  include/linux/mfd/88pm886.h |  46 
>  4 files changed, 269 insertions(+)
>  create mode 100644 drivers/mfd/88pm886.c
>  create mode 100644 include/linux/mfd/88pm886.h
> 
> diff --git a/drivers/mfd/88pm886.c b/drivers/mfd/88pm886.c
> new file mode 100644
> index ..c17220e1b7e2
> --- /dev/null
> +++ b/drivers/mfd/88pm886.c
> @@ -0,0 +1,210 @@
> +// SPDX-License-Identifier: GPL-2.0-only

Header?  Description?  Author?  Copyright?

> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define PM886_REG_INT_STATUS10x05
> +
> +#define PM886_REG_INT_ENA_1  0x0a
> +#define PM886_INT_ENA1_ONKEY BIT(0)
> +
> +#define PM886_REGMAP_CONF_REG_BITS   8
> +#define PM886_REGMAP_CONF_VAL_BITS   8

These 2 are not commonly defined, just the one below please.

> +#define PM886_REGMAP_CONF_MAX_REG0xfe

> +enum pm886_irq_number {
> + PM886_IRQ_ONKEY,
> +
> + PM886_MAX_IRQ
> +};

Seems superfluous for 1 IRQ and an unused MAX.

Looks like I've mentioned this before.

The IRQ number probably shouldn't be arbitrary either.

> +static struct regmap_irq pm886_regmap_irqs[] = {
> + REGMAP_IRQ_REG(PM886_IRQ_ONKEY, 0, PM886_INT_ENA1_ONKEY),
> +};
> +
> +static struct regmap_irq_chip pm886_regmap_irq_chip = {
> + .name = "88pm886",
> + .irqs = pm886_regmap_irqs,
> + .num_irqs = ARRAY_SIZE(pm886_regmap_irqs),
> + .num_regs = 4,
> + .status_base = PM886_REG_INT_STATUS1,
> + .ack_base = PM886_REG_INT_STATUS1,
> + .unmask_base = PM886_REG_INT_ENA_1,
> +};
> +
> +static struct resource pm886_onkey_resources[] = {
> + DEFINE_RES_IRQ_NAMED(PM886_IRQ_ONKEY, "88pm886-onkey"),
> +};
> +
> +static struct mfd_cell pm886_devs[] = {
> + {
> + .name = "88pm886-onkey",
> + .num_resources = ARRAY_SIZE(pm886_onkey_resources),
> + .resources = pm886_onkey_resources,
> + },
> + {
> + .name = "88pm886-regulator",
> + .id = PM886_REGULATOR_ID_LDO2,

Why doesn't PLATFORM_DEVID_AUTO work for this device?

> + },
> + {
> + .name = "88pm886-regulator",
> + .id = PM886_REGULATOR_ID_LDO15,
> + },
> + {
> + .name = "88pm886-regulator",
> + .id = PM886_REGULATOR_ID_BUCK2,
> + },
> +};
> +
> +static const struct regmap_config pm886_i2c_regmap = {
> + .reg_bits = PM886_REGMAP_CONF_REG_BITS,
> + .val_bits = PM886_REGMAP_CONF_VAL_BITS,
> + .max_register = PM886_REGMAP_CONF_MAX_REG,
> +};
> +
/> +static int pm886_power_off_handler(struct sys_off_data *sys_off_data)
> +{
> + struct pm886_chip *chip = sys_off_data->cb_data;
> + struct regmap *regmap = chip->regmaps[PM886_REGMAP_BASE];
> + struct device *dev = >client->dev;
> + int err;
> +
> + err = regmap_update_bits(regmap, 

Re: [PATCH net-next v6 5/5] tools: virtio: introduce vhost_net_test

2024-03-05 Thread Paolo Abeni
On Wed, 2024-02-28 at 17:30 +0800, Yunsheng Lin wrote:
> introduce vhost_net_test for both vhost_net tx and rx basing
> on virtio_test to test vhost_net changing in the kernel.
> 
> Steps for vhost_net tx testing:
> 1. Prepare a out buf.
> 2. Kick the vhost_net to do tx processing.
> 3. Do the receiving in the tun side.
> 4. verify the data received by tun is correct.
> 
> Steps for vhost_net rx testing:
> 1. Prepare a in buf.
> 2. Do the sending in the tun side.
> 3. Kick the vhost_net to do rx processing.
> 4. verify the data received by vhost_net is correct.
> 
> Signed-off-by: Yunsheng Lin 

@Jason: AFAICS this addresses the points you raised on v5, could you
please have a look?

Thanks!

Paolo




Re: [PATCH v3 2/2] riscv: Fix text patching when IPI are used

2024-03-05 Thread Björn Töpel
Conor Dooley  writes:

> On Tue, Mar 05, 2024 at 08:33:30AM +0530, Anup Patel wrote:
>> On Tue, Mar 5, 2024 at 1:54 AM Björn Töpel  wrote:
>> >
>> > Conor Dooley  writes:
>> >
>> > > On Thu, Feb 29, 2024 at 01:10:56PM +0100, Alexandre Ghiti wrote:
>> > >> For now, we use stop_machine() to patch the text and when we use IPIs 
>> > >> for
>> > >> remote icache flushes (which is emitted in patch_text_nosync()), the 
>> > >> system
>> > >> hangs.
>> > >>
>> > >> So instead, make sure every CPU executes the stop_machine() patching
>> > >> function and emit a local icache flush there.
>> > >>
>> > >> Co-developed-by: Björn Töpel 
>> > >> Signed-off-by: Björn Töpel 
>> > >> Signed-off-by: Alexandre Ghiti 
>> > >> Reviewed-by: Andrea Parri 
>> > >
>> > > What commit does this fix?
>> >
>> > Hmm. The bug is exposed when the AIA IPI are introduced, and used
>> > (instead of the firmware-based).
>> >
>> > I'm not sure this is something we'd like backported, but rather a
>> > prerequisite to AIA.
>> >
>> > @Anup @Alex WDYT?
>> >
>> 
>> The current text patching never considered IPIs being injected
>> directly in S-mode from hart to another so we are seeing this
>> issue now with AIA IPIs.
>> 
>> We certainly don't need to backport this fix since it's more
>> of a preparatory fix for AIA IPIs.
>
> Whether or not this is backportable, if it fixes a bug, it should get
> a Fixes: tag for the commit that it fixes. Fixes: isn't "the backport"
> tag, cc: stable is.

I guess the question is if this *is* a fix, or rather a change required
for AIA (not a fix).



Re: [PATCH v3 2/2] riscv: Fix text patching when IPI are used

2024-03-05 Thread Conor Dooley
On Tue, Mar 05, 2024 at 08:33:30AM +0530, Anup Patel wrote:
> On Tue, Mar 5, 2024 at 1:54 AM Björn Töpel  wrote:
> >
> > Conor Dooley  writes:
> >
> > > On Thu, Feb 29, 2024 at 01:10:56PM +0100, Alexandre Ghiti wrote:
> > >> For now, we use stop_machine() to patch the text and when we use IPIs for
> > >> remote icache flushes (which is emitted in patch_text_nosync()), the 
> > >> system
> > >> hangs.
> > >>
> > >> So instead, make sure every CPU executes the stop_machine() patching
> > >> function and emit a local icache flush there.
> > >>
> > >> Co-developed-by: Björn Töpel 
> > >> Signed-off-by: Björn Töpel 
> > >> Signed-off-by: Alexandre Ghiti 
> > >> Reviewed-by: Andrea Parri 
> > >
> > > What commit does this fix?
> >
> > Hmm. The bug is exposed when the AIA IPI are introduced, and used
> > (instead of the firmware-based).
> >
> > I'm not sure this is something we'd like backported, but rather a
> > prerequisite to AIA.
> >
> > @Anup @Alex WDYT?
> >
> 
> The current text patching never considered IPIs being injected
> directly in S-mode from hart to another so we are seeing this
> issue now with AIA IPIs.
> 
> We certainly don't need to backport this fix since it's more
> of a preparatory fix for AIA IPIs.

Whether or not this is backportable, if it fixes a bug, it should get
a Fixes: tag for the commit that it fixes. Fixes: isn't "the backport"
tag, cc: stable is.


signature.asc
Description: PGP signature