Re: [PATCH] virt: acrn: Remove unusted list 'acrn_irqfd_clients'

2024-07-19 Thread Dr. David Alan Gilbert
* Li, Fei1 (fei1...@intel.com) wrote:
> > -Original Message-
> > From: Dr. David Alan Gilbert 
> > Sent: Friday, July 19, 2024 1:44 AM
> > To: Li, Fei1 
> > Cc: linux-kernel@vger.kernel.org; virtualizat...@lists.linux.dev
> > Subject: Re: [PATCH] virt: acrn: Remove unusted list 'acrn_irqfd_clients'
> > 
> > * Fei Li (fei1...@intel.com) wrote:
> > > On 2024-05-18 at 00:12:46 +, Dr. David Alan Gilbert wrote:
> > > > * li...@treblig.org (li...@treblig.org) wrote:
> > > > > From: "Dr. David Alan Gilbert" 
> > > > >
> > > > > It doesn't look like this was ever used.
> > > > >
> > > > > Build tested only.
> > > > >
> > > > > Signed-off-by: Dr. David Alan Gilbert 
> > > >
> > > > Ping
> > >
> > > Acked-by: Fei Li 
> > 
> > Hi Fei,
> >   Do you know which way this is likely to get picked up?
> > (I don't see it in -next)
> > 
> >  Thanks,
> > 
> > Dave
> > 
> > > Thanks.
> 
> Hi Dave
> 
> For this patch, you could refer to 
> https://lore.kernel.org/all/20210910094708.3430340-1-bige...@linutronix.de/ 
> to cc Thomas Gleixner  to help review.

OK, I've added Thomas to the To: on this mail;
note he's not listed in Maintainers for drivers/virt/acrn

Dave

> Thanks.
> 
> > >
> > > >
> > > > > ---
> > > > >  drivers/virt/acrn/irqfd.c | 2 --
> > > > >  1 file changed, 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
> > > > > index d4ad211dce7a3..346cf0be4aac7 100644
> > > > > --- a/drivers/virt/acrn/irqfd.c
> > > > > +++ b/drivers/virt/acrn/irqfd.c
> > > > > @@ -16,8 +16,6 @@
> > > > >
> > > > >  #include "acrn_drv.h"
> > > > >
> > > > > -static LIST_HEAD(acrn_irqfd_clients);
> > > > > -
> > > > >  /**
> > > > >   * struct hsm_irqfd - Properties of HSM irqfd
> > > > >   * @vm:  Associated VM pointer
> > > > > --
> > > > > 2.45.0
> > > > >
> > > > --
> > > >  -Open up your eyes, open up your mind, open up your code ---
> > > > / Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \
> > > > \dave @ treblig.org |   | In Hex /
> > > >  \ _|_ http://www.treblig.org   |___/
> > > >
> > >
> > --
> >  -Open up your eyes, open up your mind, open up your code ---
> > / Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \
> > \dave @ treblig.org |   | In Hex /
> >  \ _|_ http://www.treblig.org   |___/
> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



Re: [PATCH] virt: acrn: Remove unusted list 'acrn_irqfd_clients'

2024-07-18 Thread Dr. David Alan Gilbert
* Fei Li (fei1...@intel.com) wrote:
> On 2024-05-18 at 00:12:46 +, Dr. David Alan Gilbert wrote:
> > * li...@treblig.org (li...@treblig.org) wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > It doesn't look like this was ever used.
> > > 
> > > Build tested only.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > 
> > Ping
> 
> Acked-by: Fei Li 

Hi Fei,
  Do you know which way this is likely to get picked up?
(I don't see it in -next)

 Thanks,

Dave

> Thanks.
> 
> > 
> > > ---
> > >  drivers/virt/acrn/irqfd.c | 2 --
> > >  1 file changed, 2 deletions(-)
> > > 
> > > diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
> > > index d4ad211dce7a3..346cf0be4aac7 100644
> > > --- a/drivers/virt/acrn/irqfd.c
> > > +++ b/drivers/virt/acrn/irqfd.c
> > > @@ -16,8 +16,6 @@
> > >  
> > >  #include "acrn_drv.h"
> > >  
> > > -static LIST_HEAD(acrn_irqfd_clients);
> > > -
> > >  /**
> > >   * struct hsm_irqfd - Properties of HSM irqfd
> > >   * @vm:  Associated VM pointer
> > > -- 
> > > 2.45.0
> > > 
> > -- 
> >  -Open up your eyes, open up your mind, open up your code ---   
> > / Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
> > \dave @ treblig.org |   | In Hex /
> >  \ _|_ http://www.treblig.org   |___/
> > 
> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



Re: [PATCH] virt: acrn: Remove unusted list 'acrn_irqfd_clients'

2024-05-17 Thread Dr. David Alan Gilbert
* li...@treblig.org (li...@treblig.org) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> It doesn't look like this was ever used.
> 
> Build tested only.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Ping

> ---
>  drivers/virt/acrn/irqfd.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
> index d4ad211dce7a3..346cf0be4aac7 100644
> --- a/drivers/virt/acrn/irqfd.c
> +++ b/drivers/virt/acrn/irqfd.c
> @@ -16,8 +16,6 @@
>  
>  #include "acrn_drv.h"
>  
> -static LIST_HEAD(acrn_irqfd_clients);
> -
>  /**
>   * struct hsm_irqfd - Properties of HSM irqfd
>   * @vm:  Associated VM pointer
> -- 
> 2.45.0
> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



Re: [PATCH] ftrace: Remove unused global 'ftrace_direct_func_count'

2024-05-06 Thread Dr. David Alan Gilbert
* li...@treblig.org (li...@treblig.org) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Commit 8788ca164eb4b ("ftrace: Remove the legacy _ftrace_direct API")
> stopped setting the 'ftrace_direct_func_count' variable, but left
> it around.  Clean it up.
> 
> Signed-off-by: Dr. David Alan Gilbert 

FYI this is on top of my earlier 'ftrace: Remove unused list 
'ftrace_direct_funcs'

Dave

> ---
>  include/linux/ftrace.h |  2 --
>  kernel/trace/fgraph.c  | 11 ---
>  kernel/trace/ftrace.c  |  1 -
>  3 files changed, 14 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index b01cca36147ff..e3a83ebd1b333 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -413,7 +413,6 @@ struct ftrace_func_entry {
>  };
>  
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> -extern int ftrace_direct_func_count;
>  unsigned long ftrace_find_rec_direct(unsigned long ip);
>  int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr);
>  int unregister_ftrace_direct(struct ftrace_ops *ops, unsigned long addr,
> @@ -425,7 +424,6 @@ void ftrace_stub_direct_tramp(void);
>  
>  #else
>  struct ftrace_ops;
> -# define ftrace_direct_func_count 0
>  static inline unsigned long ftrace_find_rec_direct(unsigned long ip)
>  {
>   return 0;
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index c83c005e654e3..a130b2d898f7c 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -125,17 +125,6 @@ int function_graph_enter(unsigned long ret, unsigned 
> long func,
>  {
>   struct ftrace_graph_ent trace;
>  
> -#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
> - /*
> -  * Skip graph tracing if the return location is served by direct 
> trampoline,
> -  * since call sequence and return addresses are unpredictable anyway.
> -  * Ex: BPF trampoline may call original function and may skip frame
> -  * depending on type of BPF programs attached.
> -  */
> - if (ftrace_direct_func_count &&
> - ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE))
> - return -EBUSY;
> -#endif
>   trace.func = func;
>   trace.depth = ++current->curr_ret_depth;
>  
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index b18b4ece3d7c9..adf34167c3418 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2538,7 +2538,6 @@ ftrace_find_unique_ops(struct dyn_ftrace *rec)
>  /* Protected by rcu_tasks for reading, and direct_mutex for writing */
>  static struct ftrace_hash __rcu *direct_functions = EMPTY_HASH;
>  static DEFINE_MUTEX(direct_mutex);
> -int ftrace_direct_func_count;
>  
>  /*
>   * Search the direct_functions hash to see if the given instruction pointer
> -- 
> 2.45.0
> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



Re: ftrace_direct_func_count ?

2024-05-06 Thread Dr. David Alan Gilbert
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Sat, 4 May 2024 13:35:26 +
> "Dr. David Alan Gilbert"  wrote:
> 
> > Hi,
> >   I've just posted a patch 'ftrace: Remove unused list 
> > 'ftrace_direct_funcs''
> > that clears out some old code, but while at it I noticed the global
> > 'ftrace_direct_func_count'.
> > 
> > As far as I can tell, it's never assigned (or initialised) but it is tested:
> > 
> > kernel/trace/fgraph.c:
> > #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
> >   /*
> >* Skip graph tracing if the return location is served by direct 
> > trampoline,
> >* since call sequence and return addresses are unpredictable anyway.
> >* Ex: BPF trampoline may call original function and may skip frame
> >* depending on type of BPF programs attached.
> >*/
> >   if (ftrace_direct_func_count &&
> >   ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE))
> > return -EBUSY;
> > #endif
> > 
> > So I wasn't sure whether it was just safe to nuke that section
> > or whether it really needed fixing?
> 
> Yes, after commit 8788ca164eb4bad ("ftrace: Remove the legacy
> _ftrace_direct API") that variable is no longer used.

OK, thanks, I'll send a follow up patch to my other patch to nuke
this as well.

Dave

> -- Steve
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



ftrace_direct_func_count ?

2024-05-04 Thread Dr. David Alan Gilbert
Hi,
  I've just posted a patch 'ftrace: Remove unused list 'ftrace_direct_funcs''
that clears out some old code, but while at it I noticed the global
'ftrace_direct_func_count'.

As far as I can tell, it's never assigned (or initialised) but it is tested:

kernel/trace/fgraph.c:
#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
  /*
   * Skip graph tracing if the return location is served by direct trampoline,
   * since call sequence and return addresses are unpredictable anyway.
   * Ex: BPF trampoline may call original function and may skip frame
   * depending on type of BPF programs attached.
   */
  if (ftrace_direct_func_count &&
  ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE))
return -EBUSY;
#endif

So I wasn't sure whether it was just safe to nuke that section
or whether it really needed fixing?

Dave

-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



Re: [PATCH] module: ban '.', '..' as module names, ban '/' in module names

2024-04-15 Thread Dr. David Alan Gilbert
* Alexey Dobriyan (adobri...@gmail.com) wrote:
> On Sun, Apr 14, 2024 at 01:58:55PM -0700, Luis Chamberlain wrote:
> > On Sun, Apr 14, 2024 at 10:05:05PM +0300, Alexey Dobriyan wrote:
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -3616,4 +3616,12 @@ extern int vfs_fadvise(struct file *file, loff_t 
> > > offset, loff_t len,
> > >  extern int generic_fadvise(struct file *file, loff_t offset, loff_t len,
> > >  int advice);
> > >  
> > > +/*
> > > + * Use this if data from userspace end up as directory/filename on
> > > + * some virtual filesystem.
> > > + */
> > > +static inline bool string_is_vfs_ready(const char *s)
> > > +{
> > > + return strcmp(s, ".") != 0 && strcmp(s, "..") != 0 && !strchr(s, '/');
> > > +}
> > >  #endif /* _LINUX_FS_H */
> > > --- a/kernel/module/main.c
> > > +++ b/kernel/module/main.c
> > > @@ -2893,6 +2893,11 @@ static int load_module(struct load_info *info, 
> > > const char __user *uargs,
> > >  
> > >   audit_log_kern_module(mod->name);
> > >  
> > > + if (!string_is_vfs_ready(mod->name)) {
> > > + err = -EINVAL;
> > > + goto free_module;
> > > + }
> > > +
> > 
> > Sensible change however to put string_is_vfs_ready() in include/linux/fs.h 
> > is a stretch if there really are no other users.
> 
> This is forward thinking patch :-)
> 
> Other subsystems may create files/directories in proc/sysfs, and should
> check for bad names as well:
> 
>   /proc/2821/net/dev_snmp6/eth0
> 
> This looks exactly like something coming from userspace and making it
> into /proc, so the filter function doesn't belong to kernel/module/internal.h

You mean like:

[24180.292204] tuxthe: renamed from tuxthe
root@dalek:/home/dg# ls /sys/class/net/
enp5s0  lo  tuxthe  tuxthe  tuxthe  virbr0  virbr1

?

Dave
> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



Re: [PATCH] bpf/btf: Move tracing BTF APIs to the BTF library

2023-10-11 Thread Alan Maguire
On 10/10/2023 14:54, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) 
> 
> Move the BTF APIs used in tracing to the BTF library code for sharing it
> with others.
> Previously, to avoid complex dependency in a series I made it on the
> tracing tree, but now it is a good time to move it to BPF tree because
> these functions are pure BTF functions.
>

Makes sense to me. Two very small things - usual practice for
bpf-related changes is to specify "PATCH bpf-next" for changes like
this that target the -next tree. Other thing is I'm reasonably sure
no functional changes are intended - it's basically just a matter of
moving code from trace_btf -> btf - but would be good to confirm
that no functional changes are intended or similar in the commit
message. It's sort of implicit when you say "move the BTF APIs", but
would be good to confirm.


> Signed-off-by: Masami Hiramatsu (Google) 

Reviewed-by: Alan Maguire 


> ---
>  include/linux/btf.h|   24 +
>  kernel/bpf/btf.c   |  115 +
>  kernel/trace/Makefile  |1 
>  kernel/trace/trace_btf.c   |  122 
> 
>  kernel/trace/trace_btf.h   |   11 
>  kernel/trace/trace_probe.c |2 -
>  6 files changed, 140 insertions(+), 135 deletions(-)
>  delete mode 100644 kernel/trace/trace_btf.c
>  delete mode 100644 kernel/trace/trace_btf.h
> 
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 928113a80a95..8372d93ea402 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -507,6 +507,14 @@ btf_get_prog_ctx_type(struct bpf_verifier_log *log, 
> const struct btf *btf,
>  int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type 
> prog_type);
>  bool btf_types_are_same(const struct btf *btf1, u32 id1,
>   const struct btf *btf2, u32 id2);
> +const struct btf_type *btf_find_func_proto(const char *func_name,
> +struct btf **btf_p);
> +const struct btf_param *btf_get_func_param(const struct btf_type *func_proto,
> +s32 *nr);
> +const struct btf_member *btf_find_struct_member(struct btf *btf,
> + const struct btf_type *type,
> + const char *member_name,
> + u32 *anon_offset);
>  #else
>  static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
>   u32 type_id)
> @@ -559,6 +567,22 @@ static inline bool btf_types_are_same(const struct btf 
> *btf1, u32 id1,
>  {
>   return false;
>  }
> +static inline const struct btf_type *btf_find_func_proto(const char 
> *func_name,
> +  struct btf **btf_p)
> +{
> + return NULL;
> +}
> +static inline const struct btf_param *
> +btf_get_func_param(const struct btf_type *func_proto, s32 *nr)
> +{
> + return NULL;
> +}
> +static inline const struct btf_member *
> +btf_find_struct_member(struct btf *btf, const struct btf_type *type,
> +const char *member_name, u32 *anon_offset)
> +{
> + return NULL;
> +}
>  #endif
>  
>  static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct 
> btf_type *t)
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 8090d7fb11ef..e5cbf3b31b78 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -912,6 +912,121 @@ static const struct btf_type 
> *btf_type_skip_qualifiers(const struct btf *btf,
>   return t;
>  }
>  
> +/*
> + * Find a function proto type by name, and return the btf_type with its btf
> + * in *@btf_p. Return NULL if not found.
> + * Note that caller has to call btf_put(*@btf_p) after using the btf_type.
> + */
> +const struct btf_type *btf_find_func_proto(const char *func_name, struct btf 
> **btf_p)
> +{
> + const struct btf_type *t;
> + s32 id;
> +
> + id = bpf_find_btf_id(func_name, BTF_KIND_FUNC, btf_p);
> + if (id < 0)
> + return NULL;
> +
> + /* Get BTF_KIND_FUNC type */
> + t = btf_type_by_id(*btf_p, id);
> + if (!t || !btf_type_is_func(t))
> + goto err;
> +
> + /* The type of BTF_KIND_FUNC is BTF_KIND_FUNC_PROTO */
> + t = btf_type_by_id(*btf_p, t->type);
> + if (!t || !btf_type_is_func_proto(t))
> + goto err;
> +
> + return t;
> +err:
> + btf_put(*btf_p);
> + return NULL;
> +}
> +
> +/*
> + * Get function parameter with the number of parameters.
> + * This can

Re: [PATCH 0/4] tracing: improve symbolic printing

2023-10-04 Thread Alan Maguire
On 04/10/2023 22:43, Steven Rostedt wrote:
> On Wed, 4 Oct 2023 22:35:07 +0100
> Alan Maguire  wrote:
> 
>> One thing we've heard from some embedded folks [1] is that having
>> kernel BTF loadable as a separate module (rather than embedded in
>> vmlinux) would help, as there are size limits on vmlinux that they can
>> workaround by having modules on a different partition. We're hoping
>> to get that working soon. I was wondering if you see other issues around
>> BTF adoption for embedded systems that we could put on the to-do list?
>> Not necessarily for this particular use-case (since there are
>> complications with trace data as you describe), but just trying to make
>> sure we can remove barriers to BTF adoption where possible.
> 
> I wonder how easy is it to create subsets of BTF. For one thing, in the
> future we want to be able to trace the arguments of all functions. That is,
> tracing all functions at the same time (function tracer) and getting the
> arguments within the trace.
> 
> This would only require information about functions and their arguments,
> which would be very useful. Is BTF easy to break apart? That is, just
> generate the information needed for function arguments?
>

There has been a fair bit of effort around this from the userspace side;
the BTF gen efforts were focused around applications carrying the
minimum BTF for their needs, so just the structures needed by the
particular BPF programs rather than the full set of vmlinux structures
for example [1].

Parsing BTF in-kernel to pull out the BTF functions (BTF_KIND_FUNC),
their prototypes (BTF_KIND_FUNC_PROTO) and all associated parameters
would be pretty straightforward I think, especially if you don't need
the structures that are passed via pointers. So if you're starting with
the full BTF, creating a subset for use in tracing would be reasonably
straightforward. My personal preference would always be to have the
full BTF where possible, but if that wasn't feasible on some systems
we'd need to add some options to pahole/libbpf to support such trimming
during the DWARF->BTF translation process.

Alan

[1] https://lore.kernel.org/bpf/20220209222646.348365-7-mauri...@kinvolk.io/

> Note, pretty much all functions do not pass structures by values, and this
> would not need to know the contents of a pointer to a structure. This would
> mean that structure layout information is not needed.
> 
> -- Steve
> 



Re: [PATCH 0/4] tracing: improve symbolic printing

2023-10-04 Thread Alan Maguire
On 04/10/2023 18:29, Steven Rostedt wrote:
> On Wed, 4 Oct 2023 09:54:31 -0700
> Jakub Kicinski  wrote:
> 
>> On Wed, 4 Oct 2023 12:35:24 -0400 Steven Rostedt wrote:
>>>> Potentially naive question - the trace point holds enum skb_drop_reason.
>>>> The user space can get the names from BTF. Can we not teach user space
>>>> to generically look up names of enums in BTF?
>>>
>>> That puts a hard requirement to include BTF in builds where it was not
>>> needed before. I really do not want to build with BTF just to get access to
>>> these symbols. And since this is used by the embedded world, and BTF is
>>> extremely bloated, the short answer is "No".  
>>
>> Dunno. BTF is there most of the time. It could make the life of
>> majority of the users far more pleasant.
> 
> BTF isn't there for a lot of developers working in embedded who use this
> code. Most my users that I deal with have minimal environments, so BTF is a
> showstopper.

One thing we've heard from some embedded folks [1] is that having
kernel BTF loadable as a separate module (rather than embedded in
vmlinux) would help, as there are size limits on vmlinux that they can
workaround by having modules on a different partition. We're hoping
to get that working soon. I was wondering if you see other issues around
BTF adoption for embedded systems that we could put on the to-do list?
Not necessarily for this particular use-case (since there are
complications with trace data as you describe), but just trying to make
sure we can remove barriers to BTF adoption where possible.

Thanks!

Alan

[1]
https://lore.kernel.org/bpf/CAHBbfcUkr6fTm2X9GNsFNqV75fTG=abqxfx_8ayk+4hk7he...@mail.gmail.com/

> 
>>
>> I hope we can at least agree that the current methods of generating 
>> the string arrays at C level are... aesthetically displeasing.
> 
> I don't know, I kinda like it ;-)
> 
> -- Steve
> 



Re: [PATCH 2/8] usb: gadget: add anonymous definition in some struct for trace purpose

2023-09-14 Thread Alan Stern
On Fri, Sep 15, 2023 at 09:02:48AM +0800, Linyu Yuan wrote:
> 
> On 9/14/2023 10:54 PM, Alan Stern wrote:
> > You didn't include the version number in the Subject: line.  Undoubtedly
> > Greg's automatic error checker will warn you about this.  Unless the
> > version number is clearly marked for each patch, it's difficult for his
> > programs to tell which email message contains the most recent version.
> > 
> > On Thu, Sep 14, 2023 at 06:02:56PM +0800, Linyu Yuan wrote:
> > > Some UDC trace event will save usb udc information, but it use one int
> > > size buffer to save one bit information of usb udc, it is wast trace
> > > buffer.
> > > 
> > > Add anonymous union which have one u32 member can be used by trace event
> > > during fast assign stage to save more entries with same trace ring buffer
> > > size.
> > > 
> > > Signed-off-by: Linyu Yuan 
> > > ---
> > And you didn't include the version change information here, below the
> > "---" line.
> > 
> > Apart from that, this is a _lot_ better than before!  I don't know if
> > Greg will think this change is worth merging, but at least now it's
> > possible to read the code and understand what's going on.
> 
> 
> according Steven's comment, maybe will always save data in little endian at
> trace event
> 
> fast assign stage.
> 
> it will add definition of bit field back.

Yes, that would be even better because you wouldn't have to change the 
definition of struct usb_gadget or struct usb_endpoint at all.  The fast 
assign stage can simply do:

__entry->dw1 = (g->sg_supported << 0) |
(g->is_otg << 1) |
...

and then you can easily access the individual bits in __entry.  It 
wouldn't be as fast but it would still save a lot of space.

Alan Stern


Re: [PATCH 2/8] usb: gadget: add anonymous definition in some struct for trace purpose

2023-09-14 Thread Alan Stern
You didn't include the version number in the Subject: line.  Undoubtedly 
Greg's automatic error checker will warn you about this.  Unless the 
version number is clearly marked for each patch, it's difficult for his 
programs to tell which email message contains the most recent version.

On Thu, Sep 14, 2023 at 06:02:56PM +0800, Linyu Yuan wrote:
> Some UDC trace event will save usb udc information, but it use one int
> size buffer to save one bit information of usb udc, it is wast trace
> buffer.
> 
> Add anonymous union which have one u32 member can be used by trace event
> during fast assign stage to save more entries with same trace ring buffer
> size.
> 
> Signed-off-by: Linyu Yuan 
> ---

And you didn't include the version change information here, below the 
"---" line.

Apart from that, this is a _lot_ better than before!  I don't know if 
Greg will think this change is worth merging, but at least now it's 
possible to read the code and understand what's going on.

Alan Stern


Re: [PATCH 10/19] USB: gadget/legacy: remove sb_mutex

2023-09-13 Thread Alan Stern
On Wed, Sep 13, 2023 at 08:10:04AM -0300, Christoph Hellwig wrote:
> Creating new a new super_block vs freeing the old one for single instance
> file systems is serialized by the wait for SB_DEAD.
> 
> Remove the superfluous sb_mutex.
> 
> Signed-off-by: Christoph Hellwig 
> ---

You might mention that this is essentially a reversion of commit 
d18dcfe9860e ("USB: gadgetfs: Fix race between mounting and 
unmounting").

Alan Stern

>  drivers/usb/gadget/legacy/inode.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/usb/gadget/legacy/inode.c 
> b/drivers/usb/gadget/legacy/inode.c
> index ce9e31f3d26bcc..a203266bc0dc82 100644
> --- a/drivers/usb/gadget/legacy/inode.c
> +++ b/drivers/usb/gadget/legacy/inode.c
> @@ -229,7 +229,6 @@ static void put_ep (struct ep_data *data)
>   */
>  
>  static const char *CHIP;
> -static DEFINE_MUTEX(sb_mutex);   /* Serialize superblock 
> operations */
>  
>  /*--*/
>  
> @@ -2012,8 +2011,6 @@ gadgetfs_fill_super (struct super_block *sb, struct 
> fs_context *fc)
>   struct dev_data *dev;
>   int rc;
>  
> - mutex_lock(_mutex);
> -
>   if (the_device) {
>   rc = -ESRCH;
>   goto Done;
> @@ -2069,7 +2066,6 @@ gadgetfs_fill_super (struct super_block *sb, struct 
> fs_context *fc)
>   rc = -ENOMEM;
>  
>   Done:
> - mutex_unlock(_mutex);
>   return rc;
>  }
>  
> @@ -2092,7 +2088,6 @@ static int gadgetfs_init_fs_context(struct fs_context 
> *fc)
>  static void
>  gadgetfs_kill_sb (struct super_block *sb)
>  {
> - mutex_lock(_mutex);
>   kill_litter_super (sb);
>   if (the_device) {
>   put_dev (the_device);
> @@ -2100,7 +2095,6 @@ gadgetfs_kill_sb (struct super_block *sb)
>   }
>   kfree(CHIP);
>   CHIP = NULL;
> - mutex_unlock(_mutex);
>  }
>  
>  /*--*/
> -- 
> 2.39.2
> 


Re: [PATCH] USB: Add reset-resume quirk for WD19's Realtek Hub

2021-04-20 Thread Alan Stern
On Wed, Apr 21, 2021 at 01:46:51AM +0800, chris.c...@canonical.com wrote:
> From: Chris Chiu 
> 
> Realtek Hub (0bda:5487) in Dell Dock WD19 sometimes fails to work
> after the system resumes from suspend with remote wakeup enabled
> device connected:
> [ 1947.640907] hub 5-2.3:1.0: hub_ext_port_status failed (err = -71)
> [ 1947.641208] usb 5-2.3-port5: cannot disable (err = -71)
> [ 1947.641401] hub 5-2.3:1.0: hub_ext_port_status failed (err = -71)
> [ 1947.641450] usb 5-2.3-port4: cannot reset (err = -71)
> 
> Information of this hub:
> T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480  MxCh= 5
> D:  Ver= 2.10 Cls=09(hub  ) Sub=00 Prot=02 MxPS=64 #Cfgs=  1
> P:  Vendor=0bda ProdID=5487 Rev= 1.47
> S:  Manufacturer=Dell Inc.
> S:  Product=Dell dock
> C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=  0mA
> I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=01 Driver=hub
> E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms
> I:* If#= 0 Alt= 1 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=02 Driver=hub
> E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms
> 
> The failure results from the ETIMEDOUT by chance when turning on
> the suspend feature for the specified port of the hub. The port
> seems to be in an unknown state so the hub_activate during resume
> fails the hub_port_status, then the hub will fail to work.
> 
> The quirky hub needs the reset-resume quirk to function correctly.
> 
> Signed-off-by: Chris Chiu 
> ---
>  drivers/usb/core/quirks.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
> index 76ac5d6555ae..4e2483e34250 100644
> --- a/drivers/usb/core/quirks.c
> +++ b/drivers/usb/core/quirks.c
> @@ -406,6 +406,7 @@ static const struct usb_device_id usb_quirk_list[] = {
>  
>   /* Realtek hub in Dell WD19 (Type-C) */
>   { USB_DEVICE(0x0bda, 0x0487), .driver_info = USB_QUIRK_NO_LPM },
> + { USB_DEVICE(0x0bda, 0x5487), .driver_info = USB_QUIRK_RESET_RESUME },
>  
>   /* Generic RTL8153 based ethernet adapters */
>   { USB_DEVICE(0x0bda, 0x8153), .driver_info = USB_QUIRK_NO_LPM },

Acked-by: Alan Stern 


Re: [PATCH v3] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-20 Thread Alan Stern
On Tue, Apr 20, 2021 at 03:14:56PM +0800, Chris Chiu wrote:
> On Mon, Apr 19, 2021 at 10:19 PM Alan Stern  wrote:
> >
> > On Mon, Apr 19, 2021 at 01:11:38AM -0400, Chris Chiu wrote:
> > > Sorry that I didn't make myself clear. I found that if I applied 
> > > RESET_RESUME
> > > quirk on the problematic hub, the Set-Port-Feature(suspend) timeout error
> > > disappeared. SInce the timeout is not happening for each suspend by 
> > > default,
> > > I suspect maybe reset-resume take everything back to clean state for the 
> > > hub
> > > and the Set-Port-Feature(suspend) can be taken care of w/o problems.
> >
> > Okay, that's a good solution for system suspend.
> >
> > > I didn't like RESET_RESUME because runtime PM would not work on the 
> > > quirked
> > > device.
> >
> > A more interesting question is whether it will work for devices plugged
> > into the hub.  Even though the hub won't be runtime suspended, the
> > things attached to it might be.
> >
> > >  But if the Set-Port-Feature(suspend) can't be handled and
> > > skipped, I can't
> > > expect the runtime PM to work for all devices connected to the hub either.
> > > Is that right? If what I proposed in the patch can not get better
> > > result than existing
> > > quirk, I think using the RESET_RESUME would be a better option. Any 
> > > suggestions?
> >
> > Try the RESET_RESUME quirk and see how well it works with runtime
> > suspend.
> >
> > Alan Stern
> 
> [  453.064346] usb 3-4: finish reset-resume
> [  453.192387] usb 3-4: reset high-speed USB device number 2 using xhci_hcd
> [  453.339916] usb 3-4: USB quirks for this device: 2

Here 3-4 is problematic RealTek hub, right?

> Seems that even w/ the RESET_RESUME enabled, the connected device still
> can runtime suspend/resume. That's acceptable to me. I'll send the patch
> with the reset-resume quirk later.
> 
> [  626.081068] usb 3-4.3.1: usb auto-suspend, wakeup 0
> [  632.552071] usb 3-4.3.1: usb auto-resume
> [  632.617467] usb 3-4.3.1: Waited 0ms for CONNECT
> [  632.617471] usb 3-4.3.1: finish resume

Then 3-4.3 is another hub plugged into the Realtek hub, and 3-4.3.1 (the 
device being suspended and resumed) is plugged into that other hub.

I'm concerned about devices that are plugged directly into the Realtek 
hub.  For example, did you try allowing the 3-4.3 hub in the experiment 
above to suspend and resume?

Alan Stern


Re: [PATCH] usb: storage: datafab: remove redundant assignment of variable result

2021-04-20 Thread Alan Stern
On Tue, Apr 20, 2021 at 12:38:18PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> The variable result is being assigned with a value that is
> never read, the assignment is redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 
> ---

Acked-by: Alan Stern 

>  drivers/usb/storage/datafab.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/usb/storage/datafab.c b/drivers/usb/storage/datafab.c
> index 588818483f4b..bcc4a2fad863 100644
> --- a/drivers/usb/storage/datafab.c
> +++ b/drivers/usb/storage/datafab.c
> @@ -294,7 +294,6 @@ static int datafab_write_data(struct us_data *us,
>   if (reply[0] != 0x50 && reply[1] != 0) {
>   usb_stor_dbg(us, "Gah! write return code: %02x %02x\n",
>reply[0], reply[1]);
> - result = USB_STOR_TRANSPORT_ERROR;
>   goto leave;
>   }
>  
> -- 
> 2.30.2
> 


Re: [PATCH v3] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-19 Thread Alan Stern
On Mon, Apr 19, 2021 at 01:11:38AM -0400, Chris Chiu wrote:
> Sorry that I didn't make myself clear. I found that if I applied RESET_RESUME
> quirk on the problematic hub, the Set-Port-Feature(suspend) timeout error
> disappeared. SInce the timeout is not happening for each suspend by default,
> I suspect maybe reset-resume take everything back to clean state for the hub
> and the Set-Port-Feature(suspend) can be taken care of w/o problems.

Okay, that's a good solution for system suspend.

> I didn't like RESET_RESUME because runtime PM would not work on the quirked
> device.

A more interesting question is whether it will work for devices plugged 
into the hub.  Even though the hub won't be runtime suspended, the 
things attached to it might be.

>  But if the Set-Port-Feature(suspend) can't be handled and
> skipped, I can't
> expect the runtime PM to work for all devices connected to the hub either.
> Is that right? If what I proposed in the patch can not get better
> result than existing
> quirk, I think using the RESET_RESUME would be a better option. Any 
> suggestions?

Try the RESET_RESUME quirk and see how well it works with runtime 
suspend.

Alan Stern


Re: [PATCH] usb: gadget: dummy_hcd: fix gpf in gadget_setup

2021-04-17 Thread Alan Stern
On Sat, Apr 17, 2021 at 06:22:09PM +0530, Anirudh Rayabharam wrote:
> Fix a general protection fault reported by syzbot due to a race between
> gadget_setup() and gadget_unbind() in raw_gadget.
> 
> The gadget core is supposed to guarantee that there won't be any more
> callbacks to the gadget driver once the driver's unbind routine is
> called. That guarantee is enforced in usb_gadget_remove_driver as
> follows:
> 
> usb_gadget_disconnect(udc->gadget);
> if (udc->gadget->irq)
> synchronize_irq(udc->gadget->irq);
> udc->driver->unbind(udc->gadget);
> usb_gadget_udc_stop(udc);
> 
> usb_gadget_disconnect turns off the pullup resistor, telling the host
> that the gadget is no longer connected and preventing the transmission
> of any more USB packets. Any packets that have already been received
> are sure to processed by the UDC driver's interrupt handler by the time
> synchronize_irq returns.
> 
> But this doesn't work with dummy_hcd, because dummy_hcd doesn't use
> interrupts; it uses a timer instead.  It does have code to emulate the
> effect of synchronize_irq, but that code doesn't get invoked at the
> right time -- it currently runs in usb_gadget_udc_stop, after the unbind
> callback instead of before.  Indeed, there's no way for
> usb_gadget_remove_driver to invoke this code before the unbind
> callback.
> 
> To fix this, move the synchronize_irq() emulation code to dummy_pullup
> so that it runs before unbind. Also, add a comment explaining why it is
> necessary to have it there.
> 
> Suggested-by: Alan Stern 
> Reported-by: syzbot+eb4674092e6cc8d9e...@syzkaller.appspotmail.com
> Signed-off-by: Anirudh Rayabharam 
> ---
>  drivers/usb/gadget/udc/dummy_hcd.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/usb/gadget/udc/dummy_hcd.c 
> b/drivers/usb/gadget/udc/dummy_hcd.c
> index ce24d4f28f2a..d0dae6406612 100644
> --- a/drivers/usb/gadget/udc/dummy_hcd.c
> +++ b/drivers/usb/gadget/udc/dummy_hcd.c
> @@ -903,6 +903,21 @@ static int dummy_pullup(struct usb_gadget *_gadget, int 
> value)
>   spin_lock_irqsave(>lock, flags);
>   dum->pullup = (value != 0);
>   set_link_state(dum_hcd);
> + if (value == 0) {
> + /*
> +  * emulate synchronize_irq(): wait for callbacks to finish
> +  * This seems to be the best to place to emulate the call
> +  * to synchronize_irq(). Doing it in dummy_udc_stop() would
> +  * be too late since it is called after the unbind callback.
> +  * Also, there is no way for core:usb_gadget_remove_driver()
> +  * to invoke this code before the unbind callback.
> +  */

This comment could be edited a little better.  It should start with a 
capital letter, and there should be a period at the end of the first 
line.  There is an extra "to" before "place to emulate".  The last 
sentence isn't really needed.

Also, you could be more specific.  The call to synchronize_irq() which 
we want to emulate is the one in usb_gadget_remove_driver().  And the 
reason we want to do it before the unbind callback is because unbind 
shouldn't be invoked until all the other callbacks are finished.

Once those changes are made, you can add:

Acked-by: Alan Stern 

Alan Stern

> + while (dum->callback_usage > 0) {
> + spin_unlock_irqrestore(>lock, flags);
> + usleep_range(1000, 2000);
> + spin_lock_irqsave(>lock, flags);
> + }
> + }
>   spin_unlock_irqrestore(>lock, flags);
>  
>   usb_hcd_poll_rh_status(dummy_hcd_to_hcd(dum_hcd));
> @@ -1004,14 +1019,6 @@ static int dummy_udc_stop(struct usb_gadget *g)
>   spin_lock_irq(>lock);
>   dum->ints_enabled = 0;
>   stop_activity(dum);
> -
> - /* emulate synchronize_irq(): wait for callbacks to finish */
> - while (dum->callback_usage > 0) {
> - spin_unlock_irq(>lock);
> - usleep_range(1000, 2000);
> - spin_lock_irq(>lock);
> - }
> -
>   dum->driver = NULL;
>   spin_unlock_irq(>lock);
>  
> -- 
> 2.26.2
> 


Re: [RFC PATCH] USB:XHCI:skip hub registration

2021-04-17 Thread Alan Stern
On Sat, Apr 17, 2021 at 02:48:22PM +0800, liulongfang wrote:
> On 2021/4/16 23:20, Alan Stern wrote:
> > On Fri, Apr 16, 2021 at 10:03:21AM +0800, liulongfang wrote:
> >> The current method is an improved method of the above patch.
> >> This patch just make it skip registering USB-3 root hub if that hub has no 
> >> ports,
> > 
> > No, that isn't what this patch does.
> > 
> > If the root hub wasn't registered, hub_probe wouldn't get called.  But 
> > with your patch, the system tries to register the root hub, and it does 
> > call hub_probe, and then that function fails with a warning message.
> > 
> > The way to _really_ akip registering the root hub is to change the 
> > xhci-hcd code.  Make it skip calling usb_add_hcd.
> > 
> 
> If you do not register in the root hub, this will return an error code,

What will return an error code?  Are you talking about xhci_pci_probe()?  
You oight to be able to figure out how to make it work.

> which will make all the XHCI drivers unregister, causing the USB2.0 
> controllers
> on the xhci to be unavailable.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 10:35:20PM +0530, Anirudh Rayabharam wrote:
> On Fri, Apr 16, 2021 at 11:27:34AM -0400, Alan Stern wrote:
> > Actually, I wanted to move this emulation code into a new subroutine and 
> > then call that subroutine from _both_ places.  Would you like to write 
> 
> Does it really need to be called from both places?

You know, I was going to say Yes, but now I think you're right; it's not 
needed in dummy_udc_stop.  This is because core.c always calls 
usb_gadget_disconnect before usb_gadget_udc_stop.  And we can rely on 
this behavior; it's obviously necessary to disconnect from the host 
before stopping the UDC driver.

On the other hand, while checking that fact I noticed that 
soft_connect_store in core.c doesn't call synchronize_irq in between the 
other two, the way usb_gadget_remove_driver does.  That seems like a bug 
-- if it's necessary to synchronize with the IRQ handler on one path, it 
should be necessary on the other path as well.  But that's a matter for 
a separate patch.

Alan Stern

> > and submit a patch that does this?
> 
> Sure! I will do that.
> 
> Thanks!
> 
>   - Anirudh.


Re: [PATCH v3] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 09:24:30AM +0800, Chris Chiu wrote:
> On Fri, Apr 16, 2021 at 2:46 AM Alan Stern  wrote:
> >
> > On Fri, Apr 16, 2021 at 12:13:43AM +0800, Chris Chiu wrote:
> > > One thing worth mentioning here, I never hit the hub_ext_port_status -71
> > > problem if I resume by waking up from the keyboard connected to the hub.
> >
> > I thought you said earlier that the port got into trouble while it was
> > suspending, not while it was resuming.  You wrote:
> >
> > > [ 2789.679812] usb 3-4-port3: can't suspend, status -110
> >
> > So if the problem occurs during suspend, how can it be possible to avoid
> > the problem by taking some particular action later while resuming?
> >
> 
> The ETIMEDOUT is still there, but the port can resume w/o problems and I
> don't really know what the hub did. I can only reset_resume the hub to bring 
> it
> back if I'm not waking up from the connected keyboard.

What if two devices are plugged into the hub, only one of them is 
runtime suspended, and you need to runtime resume that device?  Then you 
can't do a reset-resume of the hub, because the hub isn't suspended.

> > > But the usbcore kernel log shows me wPortStatus: 0503 wPortChane: 0004
> > > of that port while resuming. In normal cases, they are 0507:.
> >
> > The 0004 bit of wPortChange means that the suspend status has changed;
> > the port is no longer suspended because the device attached to that port
> > (your keyboard) issued a wakeup request.
> >
> > >  I don't know how to SetPortFeature() with setting the status change bit 
> > > only.
> >
> > You can't.  Only the hub itself can set the wPortChange bits.
> >
> > > Or maybe it's just some kind of timing issue of the
> > > idle/suspend/resume signaling.
> >
> > Not timing.  It's because you woke the system up from the attached
> > keyboard.
> >
> > Alan Stern
> 
> Got it. I'm just confused by the USB 2.0 spec 11.24.2.7.2 that
> "Hubs may allow setting of the status change bits with a SetPortFeature()
>  request for diagnostic purposes."

Yeah, I don't think very many hubs actually do that.

> So for this case, I think USB_QUIRK_RESET_RESUME would be a
> better option to at least bring back the port. Any suggestion about what
> kind of test I can do on this hub? Thanks

I'm not sure what you're proposing.

If (as mentioned above) the hub doesn't handle the 
Set-Port-Feature(suspend) request properly, then we should avoid issuing 
that request.  Which means runtime suspend attempts should not be 
allowed, as in your most recent patch.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 09:21:12AM +0200, Dmitry Vyukov wrote:
> On Thu, Apr 15, 2021 at 10:59 PM Alan Stern  wrote:
> >
> > On Tue, Apr 13, 2021 at 07:11:11PM +0200, Dmitry Vyukov wrote:
> > > Yes, this is possible:
> > > http://bit.do/syzbot#syzkaller-reproducers
> >
> > That's not really what I had in mind.  I don't want to spend the time
> > and effort installing syskaller on my own system; I want to tell syzbot
> > to run a particular syzkaller program (the one that originally led to
> > this bug report) on a patched kernel.
> >
> > The syzbot instructions say that it can test bugs with reproducers.  The
> > problem here is that there doesn't seem to be any way to tell it to use
> > a particular syzkaller program as a reproducer.
> 
> Hi Alan,
> 
> This makes sense and I've found an existing feature request:
> https://github.com/google/syzkaller/issues/1611
> I've added a reference to this thread there.

Great!  Thank you.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 11:10:35AM +0530, Anirudh Rayabharam wrote:
> On Tue, Apr 13, 2021 at 12:13:11PM -0400, Alan Stern wrote:
> > Maybe we can test this reasoning by putting a delay just before the call 
> > to dum->driver->setup.  That runs in the timer handler, so it's not a 
> > good place to delay, but it may be okay just for testing purposes.
> > 
> > Hopefully this patch will make the race a lot more likely to occur.  Is 
> 
> Hi Alan, 
> 
> Indeed, I was able to reproduce this bug easily on my machine with your
> delay patch applied and using this syzkaller program:
> 
> syz_usb_connect$cdc_ncm(0x1, 0x6e, &(0x7f40)={{0x12, 0x1, 0x0, 0x2, 
> 0x0, 0x0, 0x8, 0x525, 0xa4a1, 0x40, 0x1, 0x2, 0x3, 0x1, [{{0x9, 0x2, 0x5c, 
> 0x2, 0x1, 0x0, 0x0, 0x0, {{0x9, 0x4, 0x0, 0x0, 0x1, 0x2, 0xd, 0x0, 0x0, 
> {{0x5}, {0x5}, {0xd}, {0x6}}, {{0x9, 0x5, 0x81, 0x3, 0x200}}]}}, 
> &(0x7f000480)={0x0, 0x0, 0x0, 0x0, 0x3, [{0x0, 0x0}, {0x0, 0x0}, {0x0, 
> 0x0}]})
> 
> I also tested doing the synchronize_irq emulation in dummy_pullup and it
> fixed the issue. The patch is below.

That's great!  Thanks for testing.

> Thanks!
> 
>   - Anirudh.
> 
> diff --git a/drivers/usb/gadget/udc/dummy_hcd.c 
> b/drivers/usb/gadget/udc/dummy_hcd.c
> index ce24d4f28f2a..931d4612d859 100644
> --- a/drivers/usb/gadget/udc/dummy_hcd.c
> +++ b/drivers/usb/gadget/udc/dummy_hcd.c
> @@ -903,6 +903,12 @@ static int dummy_pullup(struct usb_gadget *_gadget, int 
> value)
>   spin_lock_irqsave(>lock, flags);
>   dum->pullup = (value != 0);
>   set_link_state(dum_hcd);
> + /* emulate synchronize_irq(): wait for callbacks to finish */
> + while (dum->callback_usage > 0) {
> + spin_unlock_irqrestore(>lock, flags);
> + usleep_range(1000, 2000);
> + spin_lock_irqsave(>lock, flags);
> + }

We should do this only if value == 0.  No synchronization is needed when 
the pullup is turned on.

Also, there should be a comment explaining that this is necessary 
because there's no other place to emulate the call made to 
synchronize_irq() in core.c:usb_gadget_remove_driver().

>   spin_unlock_irqrestore(>lock, flags);
>  
>   usb_hcd_poll_rh_status(dummy_hcd_to_hcd(dum_hcd));
> @@ -1005,13 +1011,6 @@ static int dummy_udc_stop(struct usb_gadget *g)
>   dum->ints_enabled = 0;
>   stop_activity(dum);
>  
> - /* emulate synchronize_irq(): wait for callbacks to finish */
> - while (dum->callback_usage > 0) {
> - spin_unlock_irq(>lock);
> - usleep_range(1000, 2000);
> - spin_lock_irq(>lock);
> - }
> -
>   dum->driver = NULL;
>   spin_unlock_irq(>lock);

Actually, I wanted to move this emulation code into a new subroutine and 
then call that subroutine from _both_ places.  Would you like to write 
and submit a patch that does this?

Alan Stern


Re: [RFC PATCH] USB:XHCI:skip hub registration

2021-04-16 Thread Alan Stern
On Fri, Apr 16, 2021 at 10:03:21AM +0800, liulongfang wrote:
> On 2021/4/15 22:43, Alan Stern wrote:
> > On Thu, Apr 15, 2021 at 08:22:38PM +0800, Longfang Liu wrote:
> >> When the number of ports on the USB hub is 0, skip the registration
> >> operation of the USB hub.
> >>
> >> The current Kunpeng930's XHCI hardware controller is defective. The number
> >> of ports on its USB3.0 bus controller is 0, and the number of ports on
> >> the USB2.0 bus controller is 1.
> >>
> >> In order to solve this problem that the USB3.0 controller does not have
> >> a port which causes the registration of the hub to fail, this patch passes
> >> the defect information by adding flags in the quirks of xhci and usb_hcd,
> >> and finally skips the registration process of the hub directly according
> >> to the results of these flags when the hub is initialized.
> >>
> >> Signed-off-by: Longfang Liu 
> > 
> > The objections that Greg raised are all good ones.
> > 
> > But even aside from them, this patch doesn't actually do what the 
> > description says.  The patch doesn't remove the call to usb_add_hcd 
> > for the USB-3 bus.  If you simply skipped that call (and the 
> > corresponding call to usb_remove_hcd) when there are no 
> > ports on the root hub, none of the stuff in this patch would be needed.
> > 
> > Alan Stern
> > 
> 
> "[RFC PATCH] USB:XHCI:Adjust the log level of hub"

I don't understand.  What patch is that?  Do you have a URL for it?

> The current method is an improved method of the above patch.
> This patch just make it skip registering USB-3 root hub if that hub has no 
> ports,

No, that isn't what this patch does.

If the root hub wasn't registered, hub_probe wouldn't get called.  But 
with your patch, the system tries to register the root hub, and it does 
call hub_probe, and then that function fails with a warning message.

The way to _really_ akip registering the root hub is to change the 
xhci-hcd code.  Make it skip calling usb_add_hcd.

> after skipping registering, no port will not report error log,the goal of this
> patch is reached without error log output.

Why do you want to get rid of the error log output?  There really _is_ 
an error, because the USB-3 hardware on your controller is defective.  
Since the hardware is buggy, we _should_ print an error message in the 
kernel log.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-15 Thread Alan Stern
On Tue, Apr 13, 2021 at 07:11:11PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 13, 2021 at 6:57 PM Alan Stern  wrote:
> >
> > On Tue, Apr 13, 2021 at 06:47:47PM +0200, Dmitry Vyukov wrote:
> > > On Tue, Apr 13, 2021 at 6:13 PM Alan Stern  
> > > wrote:
> > > > Hopefully this patch will make the race a lot more likely to occur.  Is
> > > > there any way to tell syzkaller to test it, despite the fact that
> > > > syzkaller doesn't think it has a reproducer for this issue?
> > >
> > > If there is no reproducer the only way syzbot can test it is if it's
> > > in linux-next under CONFIG_DEBUG_AID_FOR_SYZBOT:
> > > http://bit.do/syzbot#no-custom-patches
> >
> > There _is_ a theoretical reproducer: the test that provoked syzkaller's
> > original bug report.  But syzkaller doesn't realize that it is (or may
> > be) a reproducer.
> >
> > It ought to be possible to ask syzkaller to run a particular test that
> > it has done before, with a patch applied -- and without having to add
> > anything to linux-next.
> 
> Yes, this is possible:
> http://bit.do/syzbot#syzkaller-reproducers

That's not really what I had in mind.  I don't want to spend the time 
and effort installing syskaller on my own system; I want to tell syzbot 
to run a particular syzkaller program (the one that originally led to 
this bug report) on a patched kernel.

The syzbot instructions say that it can test bugs with reproducers.  The 
problem here is that there doesn't seem to be any way to tell it to use 
a particular syzkaller program as a reproducer.

Alan Stern

> The log of tests executed before the crash is available under the
> "console output" link:
> console output: https://syzkaller.appspot.com/x/log.txt?x=124adbf6d0
> And this log can be replayed using syz-execprog utility.


Re: [PATCH v3] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-15 Thread Alan Stern
On Fri, Apr 16, 2021 at 12:13:43AM +0800, Chris Chiu wrote:
> One thing worth mentioning here, I never hit the hub_ext_port_status -71
> problem if I resume by waking up from the keyboard connected to the hub.

I thought you said earlier that the port got into trouble while it was 
suspending, not while it was resuming.  You wrote:

> [ 2789.679812] usb 3-4-port3: can't suspend, status -110

So if the problem occurs during suspend, how can it be possible to avoid 
the problem by taking some particular action later while resuming?

> But the usbcore kernel log shows me wPortStatus: 0503 wPortChane: 0004
> of that port while resuming. In normal cases, they are 0507:.

The 0004 bit of wPortChange means that the suspend status has changed; 
the port is no longer suspended because the device attached to that port 
(your keyboard) issued a wakeup request.

>  I don't know how to SetPortFeature() with setting the status change bit only.

You can't.  Only the hub itself can set the wPortChange bits.

> Or maybe it's just some kind of timing issue of the
> idle/suspend/resume signaling.

Not timing.  It's because you woke the system up from the attached 
keyboard.

Alan Stern


Re: [RFC PATCH] USB:XHCI:skip hub registration

2021-04-15 Thread Alan Stern
On Thu, Apr 15, 2021 at 08:22:38PM +0800, Longfang Liu wrote:
> When the number of ports on the USB hub is 0, skip the registration
> operation of the USB hub.
> 
> The current Kunpeng930's XHCI hardware controller is defective. The number
> of ports on its USB3.0 bus controller is 0, and the number of ports on
> the USB2.0 bus controller is 1.
> 
> In order to solve this problem that the USB3.0 controller does not have
> a port which causes the registration of the hub to fail, this patch passes
> the defect information by adding flags in the quirks of xhci and usb_hcd,
> and finally skips the registration process of the hub directly according
> to the results of these flags when the hub is initialized.
> 
> Signed-off-by: Longfang Liu 

The objections that Greg raised are all good ones.

But even aside from them, this patch doesn't actually do what the 
description says.  The patch doesn't remove the call to usb_add_hcd 
for the USB-3 bus.  If you simply skipped that call (and the 
corresponding call to usb_remove_hcd) when there are no 
ports on the root hub, none of the stuff in this patch would be needed.

Alan Stern

> ---
>  drivers/usb/core/hub.c  | 6 ++
>  drivers/usb/host/xhci-pci.c | 4 
>  drivers/usb/host/xhci.c | 5 +
>  drivers/usb/host/xhci.h | 1 +
>  include/linux/usb/hcd.h | 1 +
>  5 files changed, 17 insertions(+)
> 
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index b1e14be..2d6869d 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -1769,9 +1769,15 @@ static int hub_probe(struct usb_interface *intf, const 
> struct usb_device_id *id)
>   struct usb_host_interface *desc;
>   struct usb_device *hdev;
>   struct usb_hub *hub;
> + struct usb_hcd *hcd;
>  
>   desc = intf->cur_altsetting;
>   hdev = interface_to_usbdev(intf);
> + hcd = bus_to_hcd(hdev->bus);
> + if (hcd->usb3_no_port) {
> + dev_warn(>dev, "USB hub has no port\n");
> + return -ENODEV;
> + }
>  
>   /*
>* Set default autosuspend delay as 0 to speedup bus suspend,
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index ef513c2..63b89a4 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -281,6 +281,10 @@ static void xhci_pci_quirks(struct device *dev, struct 
> xhci_hcd *xhci)
>   if (xhci->quirks & XHCI_RESET_ON_RESUME)
>   xhci_dbg_trace(xhci, trace_xhci_dbg_quirks,
>   "QUIRK: Resetting on resume");
> +
> + if (pdev->vendor == PCI_VENDOR_ID_HUAWEI &&
> + pdev->device == 0xa23c)
> + xhci->quirks |= XHCI_USB3_NOPORT;
>  }
>  
>  #ifdef CONFIG_ACPI
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index bee5dec..e3e3573 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -5184,6 +5184,11 @@ int xhci_gen_setup(struct usb_hcd *hcd, 
> xhci_get_quirks_t get_quirks)
>   /* xHCI private pointer was set in xhci_pci_probe for the second
>* registered roothub.
>*/
> + if (xhci->quirks & XHCI_USB3_NOPORT) {
> + xhci_info(xhci, "xHCI host has no port\n");
> + hcd->usb3_no_port = 1;
> + }
> +
>   return 0;
>   }
>  
> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
> index 2c6c4f8..d3c658f 100644
> --- a/drivers/usb/host/xhci.h
> +++ b/drivers/usb/host/xhci.h
> @@ -1874,6 +1874,7 @@ struct xhci_hcd {
>  #define XHCI_RESET_PLL_ON_DISCONNECT BIT_ULL(34)
>  #define XHCI_SNPS_BROKEN_SUSPENDBIT_ULL(35)
>  #define XHCI_RENESAS_FW_QUIRKBIT_ULL(36)
> +#define XHCI_USB3_NOPORT BIT_ULL(37)
>  
>   unsigned intnum_active_eps;
>   unsigned intlimit_active_eps;
> diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
> index 3dbb42c..7df23a0f 100644
> --- a/include/linux/usb/hcd.h
> +++ b/include/linux/usb/hcd.h
> @@ -172,6 +172,7 @@ struct usb_hcd {
>   unsignedtpl_support:1; /* OTG & EH TPL support */
>   unsignedcant_recv_wakeups:1;
>   /* wakeup requests from downstream aren't received */
> + unsignedusb3_no_port:1; /* xHCI main_hcd has no port */
>  
>   unsigned intirq;/* irq allocated */
>   void __iomem*regs;  /* device memory/io */
> -- 
> 2.8.1
> 


Re: UBSAN: array-index-out-of-bounds in ehci_hub_control

2021-04-15 Thread Alan Stern
On Thu, Apr 15, 2021 at 04:10:45PM +0200, Dmitry Vyukov wrote:
> Hi,
> 
> I've got this report while booting v5.10.13 kernel, but upstream code
> seems to be the same.
> The access to port_status, the code is:
> 
> struct ehci_regs {
> u32 port_status[0]; /* up to N_PORTS */
> u32 reserved3[9];
> https://elixir.bootlin.com/linux/v5.12-rc7/source/include/linux/usb/ehci_def.h#L130
> 
> Question: should it be an empty array "[]" to prevent the undefined behavior?
> Or should it be declared as "[9]" to be more explicit?

Arnd has already looked at this:

https://marc.info/?t=15882824021=1=2

I thought we had arrived at an acceptable (though not great) solution, 
but he never posted any finished patches.  :-(

Alan Stern

> UBSAN: array-index-out-of-bounds in drivers/usb/host/ehci-hub.c:893:16
> index 1 is out of range for type 'u32 [0]'
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.13+ #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.13.0-44-g88ab0c15525c-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x183/0x225 lib/dump_stack.c:118
>  ubsan_epilogue lib/ubsan.c:148 [inline]
>  __ubsan_handle_out_of_bounds+0xdb/0x130 lib/ubsan.c:356
>  ehci_hub_control+0x1d27/0x2370 drivers/usb/host/ehci-hub.c:893
>  rh_call_control+0x938/0x11a0 drivers/usb/core/hcd.c:683
>  rh_urb_enqueue drivers/usb/core/hcd.c:842 [inline]
>  usb_hcd_submit_urb+0x2f2/0x5f0 drivers/usb/core/hcd.c:1543
>  usb_start_wait_urb+0x11a/0x550 drivers/usb/core/message.c:58
>  usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
>  usb_control_msg+0x281/0x470 drivers/usb/core/message.c:153
>  hub_power_on+0x1a8/0x3c0 arch/x86/include/asm/bitops.h:219
>  hub_activate+0x330/0x1ba0 drivers/usb/core/hub.c:1076
>  hub_configure+0x19e0/0x2690 drivers/usb/core/hub.c:1680
>  hub_probe+0x82f/0x9b0 drivers/usb/core/hub.c:1882
>  usb_probe_interface+0x67b/0xb70 drivers/usb/core/driver.c:396
>  really_probe+0x58b/0x1580 drivers/base/dd.c:558
>  driver_probe_device+0x15a/0x310 drivers/base/dd.c:740
>  bus_for_each_drv+0x16a/0x1f0 drivers/base/bus.c:431
>  __device_attach+0x2b2/0x4b0 drivers/base/dd.c:914
>  bus_probe_device+0xb8/0x200 drivers/base/bus.c:491
>  device_add+0x8e7/0xcb0 drivers/base/core.c:2954
>  usb_set_configuration+0x1b98/0x2230 drivers/usb/core/message.c:2159
>  usb_generic_driver_probe+0x83/0x140 drivers/usb/core/generic.c:238
>  usb_probe_device+0x13a/0x260 drivers/usb/core/driver.c:293
>  really_probe+0x58b/0x1580 drivers/base/dd.c:558
>  driver_probe_device+0x15a/0x310 drivers/base/dd.c:740
>  bus_for_each_drv+0x16a/0x1f0 drivers/base/bus.c:431
>  __device_attach+0x2b2/0x4b0 drivers/base/dd.c:914
>  bus_probe_device+0xb8/0x200 drivers/base/bus.c:491
>  device_add+0x8e7/0xcb0 drivers/base/core.c:2954
>  usb_new_device+0xa43/0x1120 drivers/usb/core/hub.c:2554
>  register_root_hub+0x214/0x560 drivers/usb/core/hcd.c:1009
>  usb_add_hcd+0x8ee/0x1080 drivers/usb/core/hcd.c:2793
>  usb_hcd_pci_probe+0xa61/0x1280 drivers/usb/core/hcd-pci.c:264
>  local_pci_probe drivers/pci/pci-driver.c:308 [inline]
>  pci_call_probe drivers/pci/pci-driver.c:365 [inline]
>  __pci_device_probe drivers/pci/pci-driver.c:390 [inline]
>  pci_device_probe+0x3ef/0x630 drivers/pci/pci-driver.c:433
>  really_probe+0x544/0x1580 drivers/base/dd.c:554
>  driver_probe_device+0x15a/0x310 drivers/base/dd.c:740
>  device_driver_attach+0x176/0x280 drivers/base/dd.c:1015
>  __driver_attach+0xa7/0x490 drivers/base/dd.c:1092
>  bus_for_each_dev+0x168/0x1d0 drivers/base/bus.c:305
>  bus_add_driver+0x324/0x5e0 drivers/base/bus.c:622
>  driver_register+0x2e9/0x3e0 drivers/base/driver.c:171
>  do_one_initcall+0x1a3/0x410 init/main.c:1217
>  do_initcall_level+0x168/0x218 init/main.c:1290
>  do_initcalls+0x50/0x91 init/main.c:1306
>  kernel_init_freeable+0x33b/0x47f init/main.c:1527
>  kernel_init+0xd/0x290 init/main.c:1415


Re: [PATCH] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-14 Thread Alan Stern
On Wed, Apr 14, 2021 at 01:07:43PM +0800, Chris Chiu wrote:
> Thanks for the instructions. I can hit the same timeout problem with
> runtime PM. The
> fail rate seems the same as normal PM. (around 1/4 ~ 1/7)
> root@:/sys/bus/usb/devices/3-4.3# echo auto > power/control
> root@:/sys/bus/usb/devices/3-4.3# echo on > power/control
> root@:/sys/bus/usb/devices/3-4.3# dmesg -c
> [ 2789.679807] usb 3-4: kworker/7:0 timed out on ep0out len=0/0
> [ 2789.679812] usb 3-4-port3: can't suspend, status -110
> [ 2789.680078] usb 3-4.3: Failed to suspend device, error -110

Since these are random failures, occurring at a low rate, maybe it would 
help simply to retry the transfer that timed out.  Have you tested this?

> > > > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > > > > index 7f71218cc1e5..8478d49bba77 100644
> > > > > --- a/drivers/usb/core/hub.c
> > > > > +++ b/drivers/usb/core/hub.c
> > > > > @@ -3329,8 +3329,11 @@ int usb_port_suspend(struct usb_device *udev, 
> > > > > pm_message_t msg)
> > > > >* descendants is enabled for remote wakeup.
> > > > >*/
> > > > >   else if (PMSG_IS_AUTO(msg) || 
> > > > > usb_wakeup_enabled_descendants(udev) > 0)
> > > > > - status = set_port_feature(hub->hdev, port1,
> > > > > - USB_PORT_FEAT_SUSPEND);
> > > > > + if (udev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND)
> > > >
> > > > You should test hub->hdev->quirks, here, not udev->quirks.  The quirk
> > > > belongs to the Realtek hub, not to the device that's plugged into the
> > > > hub.
> > > >
> > >
> > > Thanks for pointing that out. I'll verify again and propose a V2 after
> > > it's done.
> >
> > Another thing to consider: You shouldn't return 0 from usb_port_suspend
> > if the port wasn't actually suspended.  We don't want to kernel to have
> > a false idea of the hardware's current state.
> >
> So we still need the "really_suspend=false". What if I replace it with
> the following?
> It's a little verbose but expressive enough. Any suggestions?
> 
> +   else if (!(hub->hdev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND) &&
> +   (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 
> 0))
> +   status = set_port_feature(hub->hdev, port1,
> +   USB_PORT_FEAT_SUSPEND);
> else {
> really_suspend = false;
> status = 0;

You should do something more like this:

-   else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0)
-   status = set_port_feature(hub->hdev, port1,
-   USB_PORT_FEAT_SUSPEND);
-   else {
+   else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0) 
{
+   if (hub->hdev->quirks & USB_QUIRK_NO_SUSPEND)
+   status = -EIO;
+   else
+   status = set_port_feature(hub->hdev, port1,
+   USB_PORT_FEAT_SUSPEND);
+   } else {
really_suspend = false;
status = 0;
}

But I would prefer to find a way to make port suspend actually work, 
instead of giving up on it.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-13 Thread Alan Stern
On Tue, Apr 13, 2021 at 06:47:47PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 13, 2021 at 6:13 PM Alan Stern  wrote:
> > Hopefully this patch will make the race a lot more likely to occur.  Is
> > there any way to tell syzkaller to test it, despite the fact that
> > syzkaller doesn't think it has a reproducer for this issue?
> 
> If there is no reproducer the only way syzbot can test it is if it's
> in linux-next under CONFIG_DEBUG_AID_FOR_SYZBOT:
> http://bit.do/syzbot#no-custom-patches

There _is_ a theoretical reproducer: the test that provoked syzkaller's 
original bug report.  But syzkaller doesn't realize that it is (or may 
be) a reproducer.

It ought to be possible to ask syzkaller to run a particular test that 
it has done before, with a patch applied -- and without having to add 
anything to linux-next.

Alan Stern


Re: [syzbot] general protection fault in gadget_setup

2021-04-13 Thread Alan Stern
On Tue, Apr 13, 2021 at 10:12:05AM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 13, 2021 at 10:08 AM syzbot
>  wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:0f4498ce Merge tag 'for-5.12/dm-fixes-2' of git://git.kern..
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=124adbf6d0
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=eb4674092e6cc8d9e0bd
> > userspace arch: i386
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+eb4674092e6cc8d9e...@syzkaller.appspotmail.com
> 
> I suspect that the raw gadget_unbind() can be called while the timer
> is still active. gadget_unbind() sets gadget data to NULL.
> But I am not sure which unbind call this is:
> usb_gadget_remove_driver() or right in udc_bind_to_driver() due to a
> start error.

This certainly looks like a race between gadget_unbind and gadget_setup 
in raw_gadget.

In theory, this race shouldn't matter.  The gadget core is supposed to 
guarantee that there won't be any more callbacks to the gadget driver 
once the driver's unbind routine is called.  That guarantee is enforced 
in usb_gadget_remove_driver as follows:

usb_gadget_disconnect(udc->gadget);
if (udc->gadget->irq)
synchronize_irq(udc->gadget->irq);
udc->driver->unbind(udc->gadget);
usb_gadget_udc_stop(udc);

usb_gadget_disconnect turns off the pullup resistor, telling the host 
that the gadget is no longer connected and preventing the transmission 
of any more USB packets.  Any packets that have already been received 
are sure to processed by the UDC driver's interrupt handler by the time 
synchronize_irq returns.

But this doesn't work with dummy_hcd, because dummy_hcd doesn't use 
interrupts; it uses a timer instead.  It does have code to emulate the 
effect of synchronize_irq, but that code doesn't get invoked at the 
right time -- it currently runs in usb_gadget_udc_stop, after the unbind 
callback instead of before.  Indeed, there's no way for 
usb_gadget_remove_driver to invoke this code before the unbind 
callback,.

I thought the synchronize_irq emulation problem had been completely 
solved, but evidently it hasn't.  It looks like the best solution is to 
add a call of the synchronize_irq emulation code in dummy_pullup.

Maybe we can test this reasoning by putting a delay just before the call 
to dum->driver->setup.  That runs in the timer handler, so it's not a 
good place to delay, but it may be okay just for testing purposes.

Hopefully this patch will make the race a lot more likely to occur.  Is 
there any way to tell syzkaller to test it, despite the fact that 
syzkaller doesn't think it has a reproducer for this issue?

Alan Stern


Index: usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
===
--- usb-devel.orig/drivers/usb/gadget/udc/dummy_hcd.c
+++ usb-devel/drivers/usb/gadget/udc/dummy_hcd.c
@@ -1900,6 +1900,7 @@ restart:
if (value > 0) {
++dum->callback_usage;
spin_unlock(>lock);
+   mdelay(5);
value = dum->driver->setup(>gadget,
);
spin_lock(>lock);


Re: [PATCH] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-13 Thread Alan Stern
On Tue, Apr 13, 2021 at 03:52:14PM +0800, Chris Chiu wrote:
> On Mon, Apr 12, 2021 at 11:12 PM Alan Stern  wrote:
> >
> > On Mon, Apr 12, 2021 at 11:00:06PM +0800, chris.c...@canonical.com wrote:
> > > The USB_PORT_FEAT_SUSPEND is not really necessary due to the
> > > "global suspend" in USB 2.0 spec. It's only for many hub devices
> > > which don't relay wakeup requests from the devices connected to
> > > downstream ports. For this realtek hub, there's no problem waking
> > > up the system from connected keyboard.
> >
> > What about runtime suspend?  That _does_ require USB_PORT_FEAT_SUSPEND.
> 
> It's hard to reproduce the same thing with runtime PM. I also don't
> know the aggressive
> way to trigger runtime suspend. So I'm assuming the same thing will happen in
> runtime PM case because they both go the same usb_port_resume path. Could
> you please suggest a better way to verify this for runtime PM?

To put a USB device into runtime suspend, do this:

echo 0 >/sys/bus/usb/devices/.../bConfigurationValue
echo auto >/sys/bus/usb/devices/.../power/control

where ... is the pathname for the device you want to suspend.  (Note 
that this will unbind the device from its driver, so make sure there's 
no possibility of data loss before you do it.)

To resume the device, write "on" to the power/control file.  You can 
verify the runtime-PM status by reading the files in the power/ 
subdirectory.

> > > This commit bypasses the USB_PORT_FEAT_SUSPEND for the quirky hub.
> > >
> > > Signed-off-by: Chris Chiu 
> > > ---
> >
> >
> > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > > index 7f71218cc1e5..8478d49bba77 100644
> > > --- a/drivers/usb/core/hub.c
> > > +++ b/drivers/usb/core/hub.c
> > > @@ -3329,8 +3329,11 @@ int usb_port_suspend(struct usb_device *udev, 
> > > pm_message_t msg)
> > >* descendants is enabled for remote wakeup.
> > >*/
> > >   else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) 
> > > > 0)
> > > - status = set_port_feature(hub->hdev, port1,
> > > - USB_PORT_FEAT_SUSPEND);
> > > + if (udev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND)
> >
> > You should test hub->hdev->quirks, here, not udev->quirks.  The quirk
> > belongs to the Realtek hub, not to the device that's plugged into the
> > hub.
> >
> 
> Thanks for pointing that out. I'll verify again and propose a V2 after
> it's done.

Another thing to consider: You shouldn't return 0 from usb_port_suspend 
if the port wasn't actually suspended.  We don't want to kernel to have 
a false idea of the hardware's current state.

Alan Stern


Re: [PATCH] USB: Don't set USB_PORT_FEAT_SUSPEND on WD19's Realtek Hub

2021-04-12 Thread Alan Stern
On Mon, Apr 12, 2021 at 11:00:06PM +0800, chris.c...@canonical.com wrote:
> From: Chris Chiu 
> 
> Realtek Hub (0bda:5413) in Dell Dock WD19 sometimes fails to work
> after the system resumes from suspend with remote wakeup enabled
> device connected:
> [ 1947.640907] hub 5-2.3:1.0: hub_ext_port_status failed (err = -71)
> [ 1947.641208] usb 5-2.3-port5: cannot disable (err = -71)
> [ 1947.641401] hub 5-2.3:1.0: hub_ext_port_status failed (err = -71)
> [ 1947.641450] usb 5-2.3-port4: cannot reset (err = -71)
> 
> Information of this hub:
> T:  Bus=01 Lev=02 Prnt=02 Port=02 Cnt=01 Dev#=  9 Spd=480  MxCh= 6
> D:  Ver= 2.10 Cls=09(hub  ) Sub=00 Prot=02 MxPS=64 #Cfgs=  1
> P:  Vendor=0bda ProdID=5413 Rev= 1.21
> S:  Manufacturer=Dell Inc.
> S:  Product=Dell dock
> C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=  0mA
> I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=01 Driver=hub
> E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms
> I:* If#= 0 Alt= 1 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=02 Driver=hub
> E:  Ad=81(I) Atr=03(Int.) MxPS=   1 Ivl=256ms
> 
> The failure results from the ETIMEDOUT by chance when turning on
> the suspend feature of the hub. The usb_resume_device will not be
> invoked since the device state is not set to suspended, then the
> hub fails to activate subsequently.
> 
> The USB_PORT_FEAT_SUSPEND is not really necessary due to the
> "global suspend" in USB 2.0 spec. It's only for many hub devices
> which don't relay wakeup requests from the devices connected to
> downstream ports. For this realtek hub, there's no problem waking
> up the system from connected keyboard.

What about runtime suspend?  That _does_ require USB_PORT_FEAT_SUSPEND.

> This commit bypasses the USB_PORT_FEAT_SUSPEND for the quirky hub.
> 
> Signed-off-by: Chris Chiu 
> ---


> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 7f71218cc1e5..8478d49bba77 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -3329,8 +3329,11 @@ int usb_port_suspend(struct usb_device *udev, 
> pm_message_t msg)
>* descendants is enabled for remote wakeup.
>*/
>   else if (PMSG_IS_AUTO(msg) || usb_wakeup_enabled_descendants(udev) > 0)
> - status = set_port_feature(hub->hdev, port1,
> - USB_PORT_FEAT_SUSPEND);
> + if (udev->quirks & USB_QUIRK_NO_SET_FEAT_SUSPEND)

You should test hub->hdev->quirks, here, not udev->quirks.  The quirk 
belongs to the Realtek hub, not to the device that's plugged into the 
hub.

Alan Stern


Re: [RFC PATCH] usb: core: reduce power-on-good delay time of root hub

2021-04-09 Thread Alan Stern
On Fri, Apr 09, 2021 at 10:39:07AM +0800, Chunfeng Yun wrote:
> Return the exactly delay time given by root hub descriptor,
> this helps to reduce resume time etc.
> 
> Due to the root hub descriptor is usually provided by the host
> controller driver, if there is compatibility for a root hub,
> we can fix it easily without affect other root hub
> 
> Signed-off-by: Chunfeng Yun 

Acked-by: Alan Stern 

> ---
>  drivers/usb/core/hub.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
> index 73f4482d833a..22ea1f4f2d66 100644
> --- a/drivers/usb/core/hub.h
> +++ b/drivers/usb/core/hub.h
> @@ -148,8 +148,10 @@ static inline unsigned hub_power_on_good_delay(struct 
> usb_hub *hub)
>  {
>   unsigned delay = hub->descriptor->bPwrOn2PwrGood * 2;
>  
> - /* Wait at least 100 msec for power to become stable */
> - return max(delay, 100U);
> + if (!hub->hdev->parent) /* root hub */
> + return delay;
> + else /* Wait at least 100 msec for power to become stable */
> + return max(delay, 100U);
>  }
>  
>  static inline int hub_port_debounce_be_connected(struct usb_hub *hub,
> -- 
> 2.18.0
> 


Re: [PATCH v2] USB:ohci:fix ohci interruption problem

2021-04-09 Thread Alan Stern
On Fri, Apr 09, 2021 at 03:47:02PM +0800, Longfang Liu wrote:
> The operating method of the system entering S4 sleep mode:
> echo reboot > /sys/power/disk
> echo disk > /sys/power/state
> 
> When OHCI enters the S4 sleep state, check the log and find that
> the USB sleep process will call check_root_hub_suspend() and
> ohci_bus_suspend() instead ohci_suspend() and ohci_bus_suspend(),
> which will cause the OHCI interrupt to not be closed.
> 
> At this time, if just one device interrupt is reported. the
> driver will not process and close this device interrupt. It will cause
> the entire system to be stuck during sleep, causing the device to
> fail to respond.
> 
> When the abnormal interruption reaches 100,000 times, the system will
> forcibly close the interruption and make the device unusable.
> 
> Because the root cause of the problem is that ohci_suspend is not
> called to perform normal interrupt shutdown operations when the system
> enters S4 sleep mode.
> 
> Therefore, our solution is to specify freeze interface in this mode to
> perform normal suspend_common() operations, and call ohci_suspend()
> after check_root_hub_suspend() is executed through the suspend_common()
> operation.
> 
> Signed-off-by: Longfang Liu 
> ---
> 
> Changes in V2:
>   - Modify comment and patch version information.

Please, instead of sending the same incorrect patch over and over again, 
spend some time figuring out what is really going wrong.  I have already 
explained why this patch is not the right thing to do.

You have to determine why the poweroff callback in hcd-pci.c (which 
points to hcd_pci_suspend) isn't getting called.  That's the real 
explanation for your problem.

Alan Stern

> Changes in V1:
>   - Call suspend_common by adding the hcd_pci_freeze function turn off
>   the interrupt instead of adding a shutdown operation in ohci_bus_suspend
>   to turn off the interrupt.
> 
>  drivers/usb/core/hcd-pci.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> index 1547aa6..c5844a3 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -509,6 +509,11 @@ static int resume_common(struct device *dev, int event)
>  
>  #ifdef   CONFIG_PM_SLEEP
>  
> +static int hcd_pci_freeze(struct device *dev)
> +{
> + return suspend_common(dev, device_may_wakeup(dev));
> +}
> +
>  static int hcd_pci_suspend(struct device *dev)
>  {
>   return suspend_common(dev, device_may_wakeup(dev));
> @@ -605,7 +610,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = {
>   .suspend_noirq  = hcd_pci_suspend_noirq,
>   .resume_noirq   = hcd_pci_resume_noirq,
>   .resume = hcd_pci_resume,
> - .freeze = check_root_hub_suspended,
> + .freeze = hcd_pci_freeze,
>   .freeze_noirq   = check_root_hub_suspended,
>   .thaw_noirq = NULL,
>   .thaw   = NULL,
> -- 
> 2.8.1
> 


Re: [PATCH v4] USB:ehci:fix Kunpeng920 ehci hardware problem

2021-04-09 Thread Alan Stern
On Fri, Apr 09, 2021 at 04:48:01PM +0800, Longfang Liu wrote:
> Kunpeng920's EHCI controller does not have SBRN register.
> Reading the SBRN register when the controller driver is
> initialized will get 0.
> 
> When rebooting the EHCI driver, ehci_shutdown() will be called.
> if the sbrn flag is 0, ehci_shutdown() will return directly.
> The sbrn flag being 0 will cause the EHCI interrupt signal to
> not be turned off after reboot. this interrupt that is not closed
> will cause an exception to the device sharing the interrupt.
> 
> Therefore, the EHCI controller of Kunpeng920 needs to skip
> the read operation of the SBRN register.
> 
> Signed-off-by: Longfang Liu 
> ---

Acked-by: Alan Stern 

> Changes in v4:
>   - Modify the code implementation.
> 
> Changes in v3:
>   - Fix some code style issues.
>   - Update struct name.
> 
> Changes in v2:
>   - Fix some code style issues.
>   - Update function name.
> 
>  drivers/usb/host/ehci-pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/usb/host/ehci-pci.c b/drivers/usb/host/ehci-pci.c
> index 3c3820a..237a346 100644
> --- a/drivers/usb/host/ehci-pci.c
> +++ b/drivers/usb/host/ehci-pci.c
> @@ -291,6 +291,9 @@ static int ehci_pci_setup(struct usb_hcd *hcd)
>   if (pdev->vendor == PCI_VENDOR_ID_STMICRO
>   && pdev->device == PCI_DEVICE_ID_STMICRO_USB_HOST)
>   ;   /* ConneXT has no sbrn register */
> + else if (pdev->vendor == PCI_VENDOR_ID_HUAWEI
> +  && pdev->device == 0xa239)
> + ;   /* HUAWEI Kunpeng920 USB EHCI has no sbrn register */
>   else
>   pci_read_config_byte(pdev, 0x60, >sbrn);
>  
> -- 
> 2.8.1
> 


Re: [PATCH v2 0/2] USB:ehci:fix the no SRBN register problem

2021-04-08 Thread Alan Stern
On Thu, Apr 08, 2021 at 09:49:18PM +0800, Longfang Liu wrote:
> (1) Add a whitelist for EHCI devices without SBRN registers.
> (2) Add Kunpeng920's EHCI device to the whitelist.
> 
> Changes in v2:
>   - Fix some code style issues.
>   - Update function name.
> 
> Longfang Liu (2):
>   USB:ehci:Add a whitelist for EHCI controllers
>   USB:ehci:fix Kunpeng920 ehci hardware problem
> 
>  drivers/usb/host/ehci-pci.c | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)

I don't think we need a whole list, along with an associated lookup 
routine, when there are only two entries.  The total amount of code will 
be smaller if you just add a check for the Kunpeng920 controller to
the existing check for the STMICRO controller.

Alan Stern


Re: [PATCH 4/4] ARM: dts: Fix-up EMMC2 controller's frequency

2021-04-07 Thread Alan Cooper
Nicolas,

I got a better description of the failure and it looks like the bus
clock needs to be limited to 300KHz for a 500MHz core clock.
What's happening is that an internal reset sequence is needed after a
command timeout and the reset signal needs to be asserted for at least
2 ticks of the bus clock. This is done using a 12 bit counter clocked
by the core clock. That means a 500MHz core clock produces a 122KHz
reset signal which is too fast for 2 ticks of the 200KHz bus clock
(100KHz) but is okay for the 300KHz (150Khz) bus clock.

Al

On Mon, Apr 5, 2021 at 4:45 AM Nicolas Saenz Julienne
 wrote:
>
> Hi Alan,
>
> On Thu, 2021-04-01 at 11:23 -0400, Alan Cooper wrote:
> > Nicolas,
> >
> > Sorry, I just noticed this thread.
> > This is a known bug in some newer Arasan cores.
> > The problem happens when the difference between the core clock and the bus
> > clock is too great.
> > Limiting the clock to 200KHz minimum should be a good fix.
>
> Great, that's what I was hoping to hear :). Out of curiosity, can you share
> more details on how the failure occurs?
>
> > In my experience, it's only eMMC that needs the clock to be retried
>
> > below 400KHz and not SD or SDIO. That's because the CMD signal for
> > eMMC starts out as open-drain during identification and the size of
> > the pull-up on the CMD signal can require the <400KHz clock. Once eMMC
> > is out of identification mode the CMD signal is switched to push-pull
> > and can run at much higher clock rates.
>
> Fair enough, I need to do some tests, some of the compute modules use an eMMC.
>
> > I don't think that SD and SDIO have any open-drain signals, so they
> > shouldn't need to retry at slower clock speeds.
>
> Noted.
>
> > I'm trying to get more detail on the bug, like the exact ratio of core
> > clock to bus clock that causes the problem. When I first found this
> > bug I was told that the failure would not happen at 200KHz, but we
> > were using a 405MHz core clock.
>
> That would be nice to have.
>
> > One other question. Why are you using polling for the SD card, this
> > newer controller supports the interrupt driven "Card Inserted" signal
> > and avoids wasting time polling?
>
> I believe the line isn't routed on RPi4.
>
> > Al
>


Re: [PATCH] USB:ohci:fix ohci interruption problem

2021-04-02 Thread Alan Stern
On Fri, Apr 02, 2021 at 05:27:59PM +0800, Longfang Liu wrote:
> The operating method of the system entering S4 sleep mode:
> echo disk > /sys/power/state

This discussion is still not right.

> When OHCI enters the S4 sleep state,

To start with, you should be talking about hibernation (also known as 
suspend-to-disk), not S4.  When the system enters hibernation -- for 
example, when you write "disk" to /sys/power/state -- the controller may 
go into S4 or it may go into some other power-saving state.

>  the USB sleep process will call
> check_root_hub_suspend() and ohci_bus_suspend() instead of
> ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt
> to not be closed.

This isn't true.  The procedure _does_ call ohci_suspend, through the 
.poweroff callback in hcd-pci.c.  That callback goes to the 
hcd_pci_suspend routine, which calls suspend_common and then 
ohci_suspend.

However, these calls happen after the kernel image has be written to the 
storage area on the disk.  As a result, any log messages produced during 
the calls do not get saved, so they don't get reloaded when the system 
resumes from hibernation, and they aren't present in the log after the 
system wakes up.  That's why they didn't appear in the log you included 
in an earlier email.  The only way to observe them is to use a remote 
console, such as a network console.

In fact, that's pretty much the only way to debug problems that occur 
during a hibernation transition.

> At this time, if just one device interrupt is reported. Since rh_state
> has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the
> driver will not process and close this device interrupt.

That's not true either.  The ohci_irq routine does indeed process 
interrupts even when rh_state is set to OHCI_RH_SUSPENDED.  How else 
could it handle a device's wakeup request?

> It will cause
> the entire system to be stuck during sleep, causing the device to
> fail to respond.

During hibernation, the system is powered off.  Obviously the kernel is 
not capable of handling interrupts at this time.

Also, why would a device interrupt be reported at this time?  What 
causes the interrupt request?

> When the abnormal interruption reaches 100,000 times, the system will
> forcibly close the interruption and make the device unusable.
> 
> Because the root cause of the problem is that ohci_suspend is not
> called to perform normal interrupt shutdown operations when the system
> enters S4 sleep mode.
> 
> Therefore, our solution is to specify freeze interface in this mode to
> perform normal suspend_common() operations, and call ohci_suspend()
> after check_root_hub_suspend() is executed through the suspend_common()
> operation.

No.  The freeze interface does not need to power-down the controller.  
All it needs to do is make sure that no communication between the 
computer and the attached USB devices takes place, and this is handled 
by ohci_bus_suspend.

Furthermore, it is a mistake for the freeze routine to change anything 
unless the thaw routine reverses the change.  Your patch leaves the thaw 
callback pointer set to NULL.

> After using this solution, it is verified by the stress test of sleep
> wake up in S4 mode for a long time that this problem no longer occurs.

Something else must be happeneing, something you don't understand.

Alan Stern

> Signed-off-by: Longfang Liu 
> ---
>  drivers/usb/core/hcd-pci.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> index 1547aa6..c5844a3 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -509,6 +509,11 @@ static int resume_common(struct device *dev, int event)
>  
>  #ifdef   CONFIG_PM_SLEEP
>  
> +static int hcd_pci_freeze(struct device *dev)
> +{
> + return suspend_common(dev, device_may_wakeup(dev));
> +}
> +
>  static int hcd_pci_suspend(struct device *dev)
>  {
>   return suspend_common(dev, device_may_wakeup(dev));
> @@ -605,7 +610,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = {
>   .suspend_noirq  = hcd_pci_suspend_noirq,
>   .resume_noirq   = hcd_pci_resume_noirq,
>   .resume = hcd_pci_resume,
> - .freeze = check_root_hub_suspended,
> + .freeze = hcd_pci_freeze,
>   .freeze_noirq   = check_root_hub_suspended,
>   .thaw_noirq = NULL,
>   .thaw   = NULL,
> -- 
> 2.8.1
> 


Re: [syzbot] WARNING in ieee802154_del_seclevel

2021-04-01 Thread Alan Stern
On Wed, Mar 31, 2021 at 02:03:08PM -0700, syzbot wrote:
> syzbot has bisected this issue to:
> 
> commit 416dacb819f59180e4d86a5550052033ebb6d72c
> Author: Alan Stern 
> Date:   Wed Aug 21 17:27:12 2019 +
> 
> HID: hidraw: Fix invalid read in hidraw_ioctl
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=127430fcd0
> start commit:   6e5a03bc ethernet/netronome/nfp: Fix a use after free in n..
> git tree:   net
> final oops: https://syzkaller.appspot.com/x/report.txt?x=117430fcd0
> console output: https://syzkaller.appspot.com/x/log.txt?x=167430fcd0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=daeff30c2474a60f
> dashboard link: https://syzkaller.appspot.com/bug?extid=fbf4fc11a819824e027b
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=13bfe45ed0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1188e31ad0
> 
> Reported-by: syzbot+fbf4fc11a819824e0...@syzkaller.appspotmail.com
> Fixes: 416dacb819f5 ("HID: hidraw: Fix invalid read in hidraw_ioctl")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

It seems likely that the bisection ran off the rails here.  This commit 
could not have caused a problem, although it may have revealed a 
pre-existing problem that previously was hidden.

By the way, what happened to the annotated stack dumps that syzkaller 
used to provide in its bug reports?

Alan Stern


Re: [PATCH 4/4] ARM: dts: Fix-up EMMC2 controller's frequency

2021-04-01 Thread Alan Cooper
Nicolas,

Sorry, I just noticed this thread.
This is a known bug in some newer Arasan cores.
The problem happens when the difference between the core clock and the
bus clock is too great.
Limiting the clock to 200KHz minimum should be a good fix.
In my experience, it's only eMMC that needs the clock to be retried
below 400KHz and not SD or SDIO. That's because the CMD signal for
eMMC starts out as open-drain during identification and the size of
the pull-up on the CMD signal can require the <400KHz clock. Once eMMC
is out of identification mode the CMD signal is switched to push-pull
and can run at much higher clock rates.
I don't think that SD and SDIO have any open-drain signals, so they
shouldn't need to retry at slower clock speeds.
I'm trying to get more detail on the bug, like the exact ratio of core
clock to bus clock that causes the problem. When I first found this
bug I was told that the failure would not happen at 200KHz, but we
were using a 405MHz core clock.

One other question. Why are you using polling for the SD card, this
newer controller supports the interrupt driven "Card Inserted" signal
and avoids wasting time polling?

Al


On Fri, Mar 26, 2021 at 12:17 PM Nicolas Saenz Julienne
 wrote:
>
> On Thu, 2021-03-25 at 20:11 +0100, Stefan Wahren wrote:
> > Am 24.03.21 um 16:34 schrieb Nicolas Saenz Julienne:
> > > Hi Stefan,
> > >
> > > On Wed, 2021-03-24 at 16:16 +0100, Stefan Wahren wrote:
> > > > Hi Nicolas,
> > > >
> > > > Am 22.03.21 um 19:58 schrieb Nicolas Saenz Julienne:
> > > > > From: Nicolas Saenz Julienne 
> > > > >
> > > > > Force emmc2's frequency to 150MHz as the default 100MHz (set by FW)
> > > > > seems to interfere with the VPU clock when setup at frequencies bigger
> > > > > than 500MHz (a pretty common case). This ends up causing unwarranted
> > > > > SDHCI CMD hangs  when no SD card is present.
> > > > >
> > > > > Signed-off-by: Nicolas Saenz Julienne 
> > > > > ---
> > > > >  arch/arm/boot/dts/bcm2711-rpi-4-b.dts | 6 ++
> > > > >  1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/arch/arm/boot/dts/bcm2711-rpi-4-b.dts 
> > > > > b/arch/arm/boot/dts/bcm2711-rpi-4-b.dts
> > > > > index 3b4ab947492a..9aa8408d9960 100644
> > > > > --- a/arch/arm/boot/dts/bcm2711-rpi-4-b.dts
> > > > > +++ b/arch/arm/boot/dts/bcm2711-rpi-4-b.dts
> > > > > @@ -257,6 +257,12 @@  {
> > > > > vqmmc-supply = <_io_1v8_reg>;
> > > > > vmmc-supply = <_vcc_reg>;
> > > > > broken-cd;
> > > > > +   /*
> > > > > +* Force the frequency to 150MHz as the default 100MHz seems 
> > > > > to
> > > > > +* interfere with the VPU clock when setup at frequencies 
> > > > > bigger than
> > > > > +* 500MHz, causing unwarranted CMD hangs.
> > > > > +*/
> > > > > +   clock-frequency = <15000>;
> > > > i don't want to bike-shed here, but is there any chance to solve this in
> > > > clk-bcm2835 in a less hacky way?
> > > What do you have in mind?
> > Sorry, nothing specific.
> > >
> > > All I can think of is adding some kind of heuristic to the clock's 
> > > prepare()
> > > callback. That said, I don't feel it would be a better solution than this.
> >
> > Based on my limited knowledge and an old SD card specification, all
> > possibly connected devices could have different frequencies. So my
> > concern here is, that in case we limit the frequency to a specific value
> > we could break things just to suppress a warning.
>
> SDHCI should be able to handle up to 233MHz IIRC, and there are divisors
> available, it depends on the implementation but the worst kind provide /2^n.
> Not perfect, but good enough for things to work.
>
> Now, I've been having a deeper look into how clocks are handled, and found two
> new clues:
>
>  - First of all RPi4's sdhci-iproc needs SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
>that is, the controller isn't properly identifying the clock frequency fed
>into it, and defaults to saying it's configured at 100MHz. I'm not an SDHCI
>expert, so it's possible changing frequencies also needs a special 
> operation
>to recalculate this variable. But this was making all internal calculations
>wrong when paired with this series.
>
>  - With this flag set SDHCI's core now properly calculates divisor values 
> based
>on whatever clock frequency I set in DT. And guess what, the issue 
> reappears
>even when running on 150MHz. It turns out, as I had some debugging enabled,
>the issue only happens when the controller is configured at 100KHz (that
>only happens while running the card detect thread).
>
> So, I can now do this (note that for card detection try to communicate with 
> the
> card starting at 400KHz down to 100KHz in 100KHz steps):
>
> ->8-
>
> diff --git a/drivers/mmc/host/sdhci-iproc.c b/drivers/mmc/host/sdhci-iproc.c
> index 536c382e2486..e5a5de63f347 100644
> --- a/drivers/mmc/host/sdhci-iproc.c
> +++ b/drivers/mmc/host/sdhci-iproc.c
> @@ -173,6 +173,11 @@ static unsigned 

Re: >20 KB URBs + EHCI = bad performance due to stalls

2021-03-31 Thread Alan Stern
On Wed, Mar 31, 2021 at 08:20:56PM +0200, Maciej S. Szmigiero wrote:
> On 29.03.2021 17:22, Alan Stern wrote:
> > On Sat, Mar 27, 2021 at 04:55:20PM +0100, Maciej S. Szmigiero wrote:
> > > Hi,
> > > 
> > > Is there any specific reason that URBs without URB_SHORT_NOT_OK flag that
> > > span multiple EHCI qTDs have Alternate Next qTD pointer set to the dummy
> > > qTD in their every qTD besides the last one (instead of to the first qTD
> > > of the next URB to that endpoint)?
> > 
> > Quick answer: I don't know.  I can't think of any good reason.  This
> > code was all written a long time ago.  Maybe the issue was overlooked
> > or the details were misunderstood.
> 
> I've dug out the original EHCI driver, that landed in 2.5.2:
> https://marc.info/?l=linux-usb-devel=100875066109269=2
> https://marc.info/?l=linux-usb-devel=100982880716373=2
> 
> It already had the following qTD setup code, roughly similar to what
> the current one does:
> > /* previous urb allows short rx? maybe optimize. */
> > if (!(last_qtd->urb->transfer_flags & USB_DISABLE_SPD)
> > && (epnum & 0x10)) {
> > // only the last QTD for now
> > last_qtd->hw_alt_next = hw_next;
> 
> The "for now" language seems to suggest that ultimately other-than-last
> qTDs were supposed to be set not to stall the queue, too.
> Just the code to handle this case was never written.

Probably it just slipped out of the developer's mind.

> It seems to me though, this should be possible with relatively few
> changes to the code:
> qh_append_tds() will need to patch these other-than-last qTDs
> hw_alt_next pointer to point to the (new) dummy qTD (instead of just
> pointing the last submitted qTD hw_next to it and adding the remaining
> qTDs verbatim to the qH qTD list).

Right.

> Then qh_completions() will need few changes:
> *
> >  } else if (IS_SHORT_READ (token)
> >   && !(qtd->hw_alt_next
> >& EHCI_LIST_END(ehci))) {
> This branch will need to be modified not to mark the queue as stopped
> and request its unlinking when such type of short qTD was processed.

This would be a good place to introduce a macro.  For example:

} else if (IS_SHORT_READ(token) && 
EHCI_PTR_IS_SET(qtd->hw_alt_next)) {

or something similar.

> * The ACTIVE bit should be treated as unset on any qTD following the
> one that hits the above condition until a qTD for a different URB is
> encountered.
> Otherwise the unprocessed remaining qTDs from that short URB will be
> considered pending active qTDs and the code will wait forever for their
> processing,

The treatment shouldn't be exactly the same as if ACTIVE is clear.  The 
following qTDs can be removed from the list and deallocated immediately, 
since the hardware won't look at them.  And they shouldn't affect the 
URB's status.

> * The code that patches the previous qTD hw_next pointer when removing a
> qTD that isn't currently at the qH will need changing to also patch
> hw_alt_next pointers of the qTDs belonging to the previous URB in case
> the previous URB was one of these short-read-ok ones.

Yes.  Awkward but necessary.

Although I know nothing at all about the USB API in Windows, I suspect 
that it manages to avoid this awkwardness entirely by not allowing URBs 
in the middle of the queue to be unlinked.  Or perhaps allowing it only 
for endpoint 0.  I've often wished Linux's API had been written that 
way.

> That's was my quick assessment what is required to handle these
> transactions effectively in the EHCI driver.
> 
> I suspect, however, there may be some corner cases involving
> non-ordinary qTD unlinking which might need fixing, too (like caused
> by usb_unlink_urb(), system suspend or HC removal).
> But I am not sure about this since I don't know this code well.

Those shouldn't present any difficulty.  There are inherently easier to 
handle because the QH won't be actively running when they occur.

> > > This causes that endpoint queue to stall in case of a short read that
> > > does not reach the last qTD (I guess this condition persists until an
> > > URB is (re)submitted to that endpoint, but I am not sure here).
> > 
> > It persists until the driver cleans up the queue.
> 
> I guess by "the driver" you mean the host controller driver, not the USB
> device driver.

Yes, I meant ehci-hcd.

> > > Looking at OHCI and UHCI host controller drivers the equivalent
> > > limits seem to be different there (8 KB and 2 KB), while I don't
> > > see any specific limit in the XHCI case.
> > 
> > I'd have to review the details of ohci-hcd and uh

Re: [PATCH v2 5/6] usb: Iterator for ports

2021-03-30 Thread Alan Stern
On Tue, Mar 30, 2021 at 12:07:35PM +0300, Heikki Krogerus wrote:
> On Mon, Mar 29, 2021 at 02:49:46PM -0400, Alan Stern wrote:
> > On Mon, Mar 29, 2021 at 11:44:25AM +0300, Heikki Krogerus wrote:
> > > Introducing usb_for_each_port(). It works the same way as
> > > usb_for_each_dev(), but instead of going through every USB
> > > device in the system, it walks through the USB ports in the
> > > system.
> > > 
> > > Signed-off-by: Heikki Krogerus 
> > > ---
> > >  drivers/usb/core/usb.c | 46 ++
> > >  include/linux/usb.h|  1 +
> > >  2 files changed, 47 insertions(+)
> > > 
> > > diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> > > index 2ce3667ec6fae..62368c4ed37af 100644
> > > --- a/drivers/usb/core/usb.c
> > > +++ b/drivers/usb/core/usb.c
> > > @@ -398,6 +398,52 @@ int usb_for_each_dev(void *data, int (*fn)(struct 
> > > usb_device *, void *))
> > >  }
> > >  EXPORT_SYMBOL_GPL(usb_for_each_dev);
> > >  
> > > +struct each_hub_arg {
> > > + void *data;
> > > + int (*fn)(struct device *, void *);
> > > +};
> > > +
> > > +static int __each_hub(struct usb_device *hdev, void *data)
> > > +{
> > > + struct each_hub_arg *arg = (struct each_hub_arg *)data;
> > > + struct usb_hub *hub;
> > > + int ret = 0;
> > > + int i;
> > > +
> > > + hub = usb_hub_to_struct_hub(hdev);
> > > + if (!hub)
> > > + return 0;
> > 
> > What happens if the hub is removed exactly now?  Although hdev is 
> > reference-counted (and the loop iterator does take a reference to it), 
> > usb_hub_to_struct_hub doesn't take a reference to hub.  And hub->ports 
> > isn't refcounted at all.
> 
> If the hub is removed right now, and if hub_disconnect() also manages
> to remove the ports before we have time to take the lock below, then
> hdev->maxchild will be 0 by the time we can take the lock. In that
> case nothing happens here.

Okay, good.

> If on the other hand we manage to acquire the usb_port_peer_mutex
> before hub_disconnect(), then hub_disconnect() will simply have to
> wait until we are done, and only after that remove the ports.
> 
> > > + mutex_lock(_port_peer_mutex);
> > > +
> > > + for (i = 0; i < hdev->maxchild; i++) {
> > > + ret = arg->fn(>ports[i]->dev, arg->data);
> > > + if (ret)
> > > + break;
> > > + }
> > > +
> > > + mutex_unlock(_port_peer_mutex);
> > 
> > I have a feeling that it would be better to take and release this mutex 
> > in usb_for_each_port (or its caller), so that it is held over the whole 
> > loop.
> 
> I disagree. The lock is for the ports, not the hubs. We should take
> the lock when we are going through the ports of a hub, but release it
> between the hubs. Otherwise we will be only keeping things on hold for
> a long period of time for no good reason (I for example have to
> evaluate the _PLD of every single port which takes a lot of time). We
> don't need to prevent other things from happening to the hubs at the
> same time.

All right, you convinced me.

Acked-by: Alan Stern 

Alan Stern


Re: [PATCH v2 5/6] usb: Iterator for ports

2021-03-29 Thread Alan Stern
On Mon, Mar 29, 2021 at 11:44:25AM +0300, Heikki Krogerus wrote:
> Introducing usb_for_each_port(). It works the same way as
> usb_for_each_dev(), but instead of going through every USB
> device in the system, it walks through the USB ports in the
> system.
> 
> Signed-off-by: Heikki Krogerus 
> ---
>  drivers/usb/core/usb.c | 46 ++
>  include/linux/usb.h|  1 +
>  2 files changed, 47 insertions(+)
> 
> diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> index 2ce3667ec6fae..62368c4ed37af 100644
> --- a/drivers/usb/core/usb.c
> +++ b/drivers/usb/core/usb.c
> @@ -398,6 +398,52 @@ int usb_for_each_dev(void *data, int (*fn)(struct 
> usb_device *, void *))
>  }
>  EXPORT_SYMBOL_GPL(usb_for_each_dev);
>  
> +struct each_hub_arg {
> + void *data;
> + int (*fn)(struct device *, void *);
> +};
> +
> +static int __each_hub(struct usb_device *hdev, void *data)
> +{
> + struct each_hub_arg *arg = (struct each_hub_arg *)data;
> + struct usb_hub *hub;
> + int ret = 0;
> + int i;
> +
> + hub = usb_hub_to_struct_hub(hdev);
> + if (!hub)
> + return 0;

What happens if the hub is removed exactly now?  Although hdev is 
reference-counted (and the loop iterator does take a reference to it), 
usb_hub_to_struct_hub doesn't take a reference to hub.  And hub->ports 
isn't refcounted at all.

> +
> + mutex_lock(_port_peer_mutex);
> +
> + for (i = 0; i < hdev->maxchild; i++) {
> + ret = arg->fn(>ports[i]->dev, arg->data);
> + if (ret)
> + break;
> + }
> +
> + mutex_unlock(_port_peer_mutex);

I have a feeling that it would be better to take and release this mutex 
in usb_for_each_port (or its caller), so that it is held over the whole 
loop.

Alan Stern

> +
> + return ret;
> +}
> +
> +/**
> + * usb_for_each_port - interate over all USB ports in the system
> + * @data: data pointer that will be handed to the callback function
> + * @fn: callback function to be called for each USB port
> + *
> + * Iterate over all USB ports and call @fn for each, passing it @data. If it
> + * returns anything other than 0, we break the iteration prematurely and 
> return
> + * that value.
> + */
> +int usb_for_each_port(void *data, int (*fn)(struct device *, void *))
> +{
> + struct each_hub_arg arg = {data, fn};
> +
> + return usb_for_each_dev(, __each_hub);
> +}
> +EXPORT_SYMBOL_GPL(usb_for_each_port);
> +
>  /**
>   * usb_release_dev - free a usb device structure when all users of it are 
> finished.
>   * @dev: device that's been disconnected
> diff --git a/include/linux/usb.h b/include/linux/usb.h
> index ddd2f5b2a2827..e4d2eb703cf89 100644
> --- a/include/linux/usb.h
> +++ b/include/linux/usb.h
> @@ -871,6 +871,7 @@ extern int usb_match_one_id(struct usb_interface 
> *interface,
>   const struct usb_device_id *id);
>  
>  extern int usb_for_each_dev(void *data, int (*fn)(struct usb_device *, void 
> *));
> +int usb_for_each_port(void *data, int (*fn)(struct device *, void *));
>  extern struct usb_interface *usb_find_interface(struct usb_driver *drv,
>   int minor);
>  extern struct usb_interface *usb_ifnum_to_if(const struct usb_device *dev,
> -- 
> 2.30.2
> 


Re: [RFC PATCH] USB:ohci:fix ohci interruption problem

2021-03-29 Thread Alan Stern
On Mon, Mar 29, 2021 at 04:38:10PM +0800, liulongfang wrote:
> On 2021/3/26 23:28, Alan Stern wrote:
> > On Fri, Mar 26, 2021 at 04:54:56PM +0800, Longfang Liu wrote:
> >> When OHCI enters the S4 sleep state, the USB sleep process will call
> >> check_root_hub_suspend() and ohci_bus_suspend() instead of
> >> ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt
> >> to not be closed.
> > 
> > What on earth are you talking about?  This isn't true at all.
> > 
> > Can you provide more information about your system?  Are you using a 
> > PCI-based OHCI controller or a platform device (and if so, which one)?  
> > Can you post system logs to back up your statements?
> > The system is UOS, the kernel version is kernel4.19, and the driver
> used is ohci-pci.c based on PCI.
> 
> By adding the log in ohci_suspend, and then viewing the dmesg after sleep,
> I can confirm that the system does not call ohci_suspend in S4 sleep mode.

Then your job is to figure out why not.  Doesn't entry into S4 sleep
call hcd_pci_suspend() in core/hcd-pci.c, and doesn't that call
suspend_common(), and doesn't that call hcd->driver->pci_suspend(),
and isn't that routine ohci_suspend()?

Alan Stern


Re: >20 KB URBs + EHCI = bad performance due to stalls

2021-03-29 Thread Alan Stern
On Sat, Mar 27, 2021 at 04:55:20PM +0100, Maciej S. Szmigiero wrote:
> Hi,
> 
> Is there any specific reason that URBs without URB_SHORT_NOT_OK flag that
> span multiple EHCI qTDs have Alternate Next qTD pointer set to the dummy
> qTD in their every qTD besides the last one (instead of to the first qTD
> of the next URB to that endpoint)?

Quick answer: I don't know.  I can't think of any good reason.  This
code was all written a long time ago.  Maybe the issue was overlooked
or the details were misunderstood.

> This causes that endpoint queue to stall in case of a short read that
> does not reach the last qTD (I guess this condition persists until an
> URB is (re)submitted to that endpoint, but I am not sure here).

It persists until the driver cleans up the queue.

> One of affected drivers here is drivers/net/usb/r8152.c.
> 
> If I simply reduce its per-URB transfer buffer to 20 KB (the maximum
> that fits in a well-aligned qTD) the RX rate increases from around
> 100 Mbps to 200+ Mbps (on an ICH8M controller):

> The driver default is to use 32 KB buffers (which span two qTDs),
> but the device rarely fully fills the first qTD resulting in
> repetitive stalls and more than halving the performance.
> 
> As far as I can see, the relevant code in
> drivers/usb/host/ehci-q.c::qh_urb_transaction() predates the git era.

Like I said, a long time ago.

> The comment in that function before setting the Alternate Next qTD
> pointer:
> > /*
> >  * short reads advance to a "magic" dummy instead of the next
> >  * qtd ... that forces the queue to stop, for manual cleanup.
> >  * (this will usually be overridden later.)
> >  */
> 
> ...suggests the idea was to override that pointer when
> URB_SHORT_NOT_OK is not set, but this is actually done only for
> the last qTD from the URB (also, that's the only one that ends
> with interrupt flag set).

The hw_alt_next field should be updated for all the qTDs in the URB.
Failure to this was probably an oversight.  Or maybe the omission was
to simplify the procedure for cleaning up the queue after a short
transfer.

> Looking at OHCI and UHCI host controller drivers the equivalent
> limits seem to be different there (8 KB and 2 KB), while I don't
> see any specific limit in the XHCI case.

I'd have to review the details of ohci-hcd and uhci-hcd to make
sure.  In principle, the queue isn't supposed to stop merely because
of a short transfer unless URB_SHORT_NOT_OK is set.  However, the UHCI
hardware in particular may offer no other way to handle a short transfer.

> Because of that variance in the URB buffer limit it seems strange
> to me that this should be managed by a particular USB device driver
> rather than by the host controller driver, because this would mean
> every such driver would need to either use the lowest common
> denominator for the URB buffer size (which is very small) or
> hardcode the limit for every host controller that the device can
> be connected to, which seems a bit inefficient.

I don't understand what you're saying in this paragraph.  What do you
think USB device drivers are supposed to be managing?  The URB buffer
size?  They should set that field without regard to the type of host
controller in use.

In short, the behavior you observed is a bug, resulting in a loss of
throughput (though not in any loss of data).  It needs to be fixed.

If you would like to write and submit a patch, that would be great.
Otherwise, I'll try to find time to work on it.

I would appreciate any effort you could make toward checking the code
in qh_completions(); I suspect that the checks it does involving
EHCI_LIST_END may not be right.  At the very least, they should be
encapsulated in a macro so that they are easier to understand.

Alan Stern


Re: [PATCH v3 0/4] scsi: add runtime PM workaround for SD cardreaders

2021-03-28 Thread Alan Stern
On Sun, Mar 28, 2021 at 12:25:27PM +0200, Martin Kepplinger wrote:
> hi,
> 
> In short: there are SD cardreaders that send MEDIA_CHANGED on
> (runtime) resume. We cannot use runtime PM with these devices as
> I/O always fails. I'd like to discuss a way to fix this
> or at least allow us to work around this problem:

In fact, as far as I know _all_ USB SD card readers send Media Changed 
notifications on resume.  Maybe there are some that don't, but I haven't 
heard of any.

Alan Stern


Re: [RFC PATCH] USB:ohci:fix ohci interruption problem

2021-03-26 Thread Alan Stern
On Fri, Mar 26, 2021 at 04:54:56PM +0800, Longfang Liu wrote:
> When OHCI enters the S4 sleep state, the USB sleep process will call
> check_root_hub_suspend() and ohci_bus_suspend() instead of
> ohci_suspend() and ohci_bus_suspend(), this causes the OHCI interrupt
> to not be closed.

What on earth are you talking about?  This isn't true at all.

Can you provide more information about your system?  Are you using a 
PCI-based OHCI controller or a platform device (and if so, which one)?  
Can you post system logs to back up your statements?

The proper order of calls is ohci_bus_suspend, then 
check_root_hub_suspended, then ohci_suspend.  Often the first one is 
called some time before the other two.

> At this time, if just one device interrupt is reported. Since rh_state
> has been changed to OHCI_RH_SUSPENDED after ohci_bus_suspend(), the
> driver will not process and close this device interrupt. It will cause
> the entire system to be stuck during sleep, causing the device to
> fail to respond.
> 
> When the abnormal interruption reaches 100,000 times, the system will
> forcibly close the interruption and make the device unusable.
> 
> Since the problem is that the interrupt is not closed, we copied the
> interrupt shutdown operation of ohci_suspend() into ohci_bus_suspend()
> during the S4 sleep period. We found that this method can solve this
> problem.
> 
> At present, we hope to be able to call ohci_suspend() directly during
> the sleep process of S4. Do you have any suggestions for this
> modification?
> 
> Signed-off-by: Longfang Liu 
> ---
>  drivers/usb/host/ohci-hub.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/host/ohci-hub.c b/drivers/usb/host/ohci-hub.c
> index 634f3c7..d468cef 100644
> --- a/drivers/usb/host/ohci-hub.c
> +++ b/drivers/usb/host/ohci-hub.c
> @@ -315,6 +315,14 @@ static int ohci_bus_suspend (struct usb_hcd *hcd)
>   del_timer_sync(>io_watchdog);
>   ohci->prev_frame_no = IO_WATCHDOG_OFF;
>   }
> +
> + spin_lock_irqsave(>lock, flags);
> + ohci_writel(ohci, OHCI_INTR_MIE, >regs->intrdisable);
> + (void)ohci_readl(ohci, >regs->intrdisable);
> +
> + clear_bit(HCD_FLAG_HW_ACCESSIBLE, >flags);
> + spin_unlock_irqrestore(>lock, flags);

This is completely wrong.  The hardware certainly remains accessible 
when the root hub stops running.  The HW_ACCESSIBLE flag should not be 
cleared here.

And if the Master Interrupt Enable bit is cleared, how will the driver 
ever learn if a remote wakeup request (such as a plug or unplug event) 
occurs?

Alan Stern

> +
>   return rc;
>  }
>  
> @@ -326,7 +334,10 @@ static int ohci_bus_resume (struct usb_hcd *hcd)
>   if (time_before (jiffies, ohci->next_statechange))
>   msleep(5);
>  
> - spin_lock_irq (>lock);
> + spin_lock_irq(>lock);
> + set_bit(HCD_FLAG_HW_ACCESSIBLE, >flags);
> + ohci_writel(ohci, OHCI_INTR_MIE, >regs->intrenable);
> + ohci_readl(ohci, >regs->intrenable);
>  
>   if (unlikely(!HCD_HW_ACCESSIBLE(hcd)))
>   rc = -ESHUTDOWN;
> -- 
> 2.8.1
> 


Re: [PATCH 1/6] usb: Iterator for ports

2021-03-25 Thread Alan Stern
On Thu, Mar 25, 2021 at 05:14:42PM +0200, Heikki Krogerus wrote:
> On Thu, Mar 25, 2021 at 10:41:09AM -0400, Alan Stern wrote:
> > On Thu, Mar 25, 2021 at 03:29:21PM +0300, Heikki Krogerus wrote:
> > > Introducing usb_for_each_port(). It works the same way as
> > > usb_for_each_dev(), but instead of going through every USB
> > > device in the system, it walks through the USB ports in the
> > > system.
> > > 
> > > Signed-off-by: Heikki Krogerus 
> > 
> > This has a couple of nasty errors.
> > 
> > > ---
> > >  drivers/usb/core/usb.c | 43 ++
> > >  include/linux/usb.h|  1 +
> > >  2 files changed, 44 insertions(+)
> > > 
> > > diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> > > index 2ce3667ec6fae..6d49db9a1b208 100644
> > > --- a/drivers/usb/core/usb.c
> > > +++ b/drivers/usb/core/usb.c
> > > @@ -398,6 +398,49 @@ int usb_for_each_dev(void *data, int (*fn)(struct 
> > > usb_device *, void *))
> > >  }
> > >  EXPORT_SYMBOL_GPL(usb_for_each_dev);
> > >  
> > > +struct each_hub_arg {
> > > + void *data;
> > > + int (*fn)(struct device *, void *);
> > > +};
> > > +
> > > +static int __each_hub(struct device *dev, void *data)
> > > +{
> > > + struct each_hub_arg *arg = (struct each_hub_arg *)data;
> > > + struct usb_device *hdev = to_usb_device(dev);
> > 
> > to_usb_device() won't work properly if the struct device isn't embedded 
> > in an actual usb_device structure.  And that will happen, since the USB 
> > bus type holds usb_interface structures as well as usb_devices.
> 
> OK, so I need to filter them out.
> 
> > In fact, you should use usb_for_each_dev here; it already does what you 
> > want.
> 
> Unfortunately I can't use usb_for_each_dev here, because it deals with
> struct usb_device while I need to deal with struct device in the
> callback.

I see; the prototypes of arg->fn are different.  Oh well, it's a shame 
the code can't be reused.  In any case, you should copy what 
usb.c:__each_dev() does.

Alan Stern

> > > + struct usb_hub *hub;
> > > + int ret;
> > > + int i;
> > > +
> > > + hub = usb_hub_to_struct_hub(hdev);
> > > + if (!hub)
> > > + return 0;
> > > +
> > > + for (i = 0; i < hdev->maxchild; i++) {
> > > + ret = arg->fn(>ports[i]->dev, arg->data);
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > 
> > Don't you need some sort of locking or refcounting here?  What would 
> > happen if this hub got removed while the routine was running?
> 
> I'll use a lock then.
> 
> thanks,
> 
> -- 
> heikki


Re: [PATCH 1/6] usb: Iterator for ports

2021-03-25 Thread Alan Stern
On Thu, Mar 25, 2021 at 03:29:21PM +0300, Heikki Krogerus wrote:
> Introducing usb_for_each_port(). It works the same way as
> usb_for_each_dev(), but instead of going through every USB
> device in the system, it walks through the USB ports in the
> system.
> 
> Signed-off-by: Heikki Krogerus 

This has a couple of nasty errors.

> ---
>  drivers/usb/core/usb.c | 43 ++
>  include/linux/usb.h|  1 +
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> index 2ce3667ec6fae..6d49db9a1b208 100644
> --- a/drivers/usb/core/usb.c
> +++ b/drivers/usb/core/usb.c
> @@ -398,6 +398,49 @@ int usb_for_each_dev(void *data, int (*fn)(struct 
> usb_device *, void *))
>  }
>  EXPORT_SYMBOL_GPL(usb_for_each_dev);
>  
> +struct each_hub_arg {
> + void *data;
> + int (*fn)(struct device *, void *);
> +};
> +
> +static int __each_hub(struct device *dev, void *data)
> +{
> + struct each_hub_arg *arg = (struct each_hub_arg *)data;
> + struct usb_device *hdev = to_usb_device(dev);

to_usb_device() won't work properly if the struct device isn't embedded 
in an actual usb_device structure.  And that will happen, since the USB 
bus type holds usb_interface structures as well as usb_devices.

In fact, you should use usb_for_each_dev here; it already does what you 
want.

> + struct usb_hub *hub;
> + int ret;
> + int i;
> +
> + hub = usb_hub_to_struct_hub(hdev);
> + if (!hub)
> + return 0;
> +
> + for (i = 0; i < hdev->maxchild; i++) {
> + ret = arg->fn(>ports[i]->dev, arg->data);
> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}

Don't you need some sort of locking or refcounting here?  What would 
happen if this hub got removed while the routine was running?

Alan Stern


Re: [RFC PATCH] USB:XHCI:Adjust the log level of hub

2021-03-25 Thread Alan Stern
On Thu, Mar 25, 2021 at 02:59:03PM +0100, Greg KH wrote:
> On Thu, Mar 25, 2021 at 09:33:53PM +0800, liulongfang wrote:
> > On 2021/3/25 18:31, Greg KH wrote:
> > > On Thu, Mar 25, 2021 at 06:04:12PM +0800, Longfang Liu wrote:
> > >> When the number of ports of the hub is not between 1 and Maxports,
> > >> it will only exit the registration of the hub on the current controller,
> > >> but it will not affect the function of the controller itself. Its other
> > >> hubs can operate normally, so the log level here can be changed from
> > >> error to information.
> > >>
> > >> Signed-off-by: Longfang Liu 
> > >> ---
> > >>  drivers/usb/core/hub.c | 10 --
> > >>  1 file changed, 4 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > >> index b1e14be..70294ad 100644
> > >> --- a/drivers/usb/core/hub.c
> > >> +++ b/drivers/usb/core/hub.c
> > >> @@ -1409,13 +1409,11 @@ static int hub_configure(struct usb_hub *hub,
> > >>  maxchild = min_t(unsigned, maxchild, USB_SS_MAXPORTS);
> > >>  
> > >>  if (hub->descriptor->bNbrPorts > maxchild) {
> > >> -message = "hub has too many ports!";
> > >> -ret = -ENODEV;
> > >> -goto fail;
> > >> +dev_info(hub_dev, "hub has too many ports!\n");
> > > 
> > > Is this an error?  If so, report it as such, not as "information".
> > > 
> > >> +return -ENODEV;
> > >>  } else if (hub->descriptor->bNbrPorts == 0) {
> > >> -message = "hub doesn't have any ports!";
> > >> -ret = -ENODEV;
> > >> -goto fail;
> > >> +dev_info(hub_dev, "hub doesn't have any ports!\n");
> > > 
> > > Same here.
> > > 
> > > What problem are you trying to solve here?
> > > 
> > > What hub do you have that has no ports, or too many, that you think
> > > should still be able to work properly?
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > > .
> > On our test platform, the xhci usb3 hub has no port.
> 
> Sounds like a broken device, why not fix that?

If this device is used only for testing and not for production, who 
cares how severe the log message is?

> > when initializing the usb3 hub, an error will be reported
> > because the port is 0, but in fact it will not affect
> > the use of usb2, and the usb2 hub is working normally.
> 
> But you can not have a USB3 hub with no ports, isn't that against
> against the USB spec?  How does this device pass the USB-IF
> certification?
> 
> > thanks, therefore, in order to reduce the severity of the log,
> > we hope to lower the level of this log.
> 
> You did not reduce the severity at all, everyone can still see it.
> 
> Please try fixing your hardware :

Alternatively, you could change the xhci-hcd driver.  Make it skip 
registering the USB-3 root hub if that hub has no ports.

But don't change these log messages.  They describe real errors, so they 
should be actual error messages.

Alan Stern


Re: [PATCH] USB: ehci: drop workaround for forced irq threading

2021-03-22 Thread Alan Stern
On Mon, Mar 22, 2021 at 05:59:17PM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-03-22 12:42:00 [-0400], Alan Stern wrote:
> > What happens on RT systems?  Are they smart enough to avoid the whole 
> > problem by enabling interrupts during _all_ callbacks?
> 
> tl;dr: Yes. 
> 
> The referenced commit (id 81e2073c175b) disables interrupts only on !RT
> configs so for RT everything remains unchanged (the backports are
> already adjusted for the old stable trees to use the proper CONFIG_* for
> enabled RT).
> 
> All hrtimer callbacks run as HRTIMER_MODE_SOFT by default. The
> HRTIMER_MODE_HARD ones (which expire in HARDIRQ context) were audited /
> explicitly enabled.
> The same goes irq_work.
> The printk code is different compared to mainline. A printk() on RT in
> HARDIRQ context is printed once the HARDIRQ context is left. So the
> serial/console/… driver never gets a chance to acquire its lock in
> hardirq context.
> 
> An interrupt handler which is not forced-threaded must be marked as such
> and must not use any spinlock_t based locking. lockdep/might_sleep
> complain here already.

Okay, in that case:

Acked-by: Alan Stern 


Re: [PATCH] USB: ehci: drop workaround for forced irq threading

2021-03-22 Thread Alan Stern
On Mon, Mar 22, 2021 at 12:12:49PM +0100, Johan Hovold wrote:
> Force-threaded interrupt handlers used to run with interrupts enabled,
> something which could lead to deadlocks in case a threaded handler
> shared a lock with code running in hard interrupt context (e.g. timer
> callbacks) and did not explicitly disable interrupts.
> 
> Since commit 81e2073c175b ("genirq: Disable interrupts for force
> threaded handlers") interrupt handlers always run with interrupts
> disabled on non-RT so that drivers no longer need to do handle forced
> threading ("threadirqs").

What happens on RT systems?  Are they smart enough to avoid the whole 
problem by enabling interrupts during _all_ callbacks?

Alan Stern


Re: [PATCH v1 2/2] usb: host: ehci-tegra: Select USB_GADGET Kconfig option

2021-03-20 Thread Alan Stern
On Sat, Mar 20, 2021 at 06:19:15PM +0300, Dmitry Osipenko wrote:
> Select USB_GADGET Kconfig option in order to fix build failure which
> happens because ChipIdea driver has a build dependency on both USB_GADGET
> and USB_EHCI_HCD, while USB_EHCI_TEGRA force-selects the ChipIdea driver
> without taking into account the tristate USB_GADGET dependency. It's not
> possible to do anything about the cyclic dependency of the Kconfig
> options, but USB_EHCI_TEGRA is now a deprecated option that isn't used
> by defconfigs and USB_GADGET is wanted on Tegra by default, hence it's
> okay to have a bit clunky workaround for it.
> 
> Fixes: c3590c7656fb ("usb: host: ehci-tegra: Remove the driver")
> Reported-by: kernel test robot 
> Signed-off-by: Dmitry Osipenko 

Acked-by: Alan Stern 

> ---
>  drivers/usb/host/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/host/Kconfig b/drivers/usb/host/Kconfig
> index b94f2a070c05..df9428f1dc5e 100644
> --- a/drivers/usb/host/Kconfig
> +++ b/drivers/usb/host/Kconfig
> @@ -272,6 +272,7 @@ config USB_EHCI_TEGRA
>   select USB_CHIPIDEA
>   select USB_CHIPIDEA_HOST
>   select USB_CHIPIDEA_TEGRA
> + select USB_GADGET
>   help
> This option is deprecated now and the driver was removed, use
> USB_CHIPIDEA_TEGRA instead.
> -- 
> 2.30.2
> 


Re: [PATCH] usb: dwc3: remove 'pm_runtime_set_active' in resume callback

2021-03-17 Thread Alan Stern
On Wed, Mar 17, 2021 at 05:25:20PM +0900, taehyun cho wrote:
> On Mon, Mar 15, 2021 at 10:13:35AM -0400, Alan Stern wrote:
> > On Mon, Mar 15, 2021 at 04:43:17PM +0900, taehyun cho wrote:
> > > 'pm_runtime_set_active' sets a flag to describe rumtime status.
> > > This flag is automatically set in pm_runtime_get_sync/put_sync API.
> > > 'pm_runtime_set_active' checks the runtime status of parent device.
> > > As a result, the below error message is printed.
> > > dwc3 .dwc3: runtime PM trying to activate child device
> > > .dwc3 but parent (.usb) is not active.
> > 
> > This is very suspicious.  That error message indicates a real error is 
> > present; removing these pm_runtime_set_active calls won't fix the error.
> > 
> > You need to determine why the parent platform device .usb isn't 
> > active when the dwc3 probe and resume routines are called.  It seems 
> > likely that there is a bug in the platform device's driver.
> > 
> > Alan Stern
> >
> 
> Alan,
> 
> Thanks to your comments, I checked our platform device driver and found
> the problem. Our parent platform device didn't set active in resume
> callback. This made a problem.

Ah, good.  Does the platform driver set the active flag in its probe 
routine?

>  Thank you for the help and sorry for
> disturbing you.

No problem at all.

Alan Stern


Re: [PATCH] usb: dwc3: remove 'pm_runtime_set_active' in resume callback

2021-03-15 Thread Alan Stern
On Mon, Mar 15, 2021 at 04:43:17PM +0900, taehyun cho wrote:
> 'pm_runtime_set_active' sets a flag to describe rumtime status.
> This flag is automatically set in pm_runtime_get_sync/put_sync API.
> 'pm_runtime_set_active' checks the runtime status of parent device.
> As a result, the below error message is printed.
> dwc3 .dwc3: runtime PM trying to activate child device
> .dwc3 but parent (.usb) is not active.

This is very suspicious.  That error message indicates a real error is 
present; removing these pm_runtime_set_active calls won't fix the error.

You need to determine why the parent platform device .usb isn't 
active when the dwc3 probe and resume routines are called.  It seems 
likely that there is a bug in the platform device's driver.

Alan Stern

> Signed-off-by: taehyun cho 
> ---
>  drivers/usb/dwc3/core.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index 94fdbe502ce9..e7c0e390f885 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -1553,7 +1553,6 @@ static int dwc3_probe(struct platform_device *pdev)
>  
>   spin_lock_init(>lock);
>  
> - pm_runtime_set_active(dev);
>   pm_runtime_use_autosuspend(dev);
>   pm_runtime_set_autosuspend_delay(dev, DWC3_DEFAULT_AUTOSUSPEND_DELAY);
>   pm_runtime_enable(dev);
> @@ -1920,7 +1919,6 @@ static int dwc3_resume(struct device *dev)
>   return ret;
>  
>   pm_runtime_disable(dev);
> - pm_runtime_set_active(dev);
>   pm_runtime_enable(dev);
>  
>   return 0;
> -- 
> 2.26.0


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-12 Thread Alan Stern
On Fri, Mar 12, 2021 at 07:26:31PM +0100, Sedat Dilek wrote:
> On Fri, Mar 12, 2021 at 7:05 PM Alan Stern  wrote:

> > Although it's not conclusive, this log seems to indicate that ata_id
> > is the only program causing resets.  Have you tried preventing the
> > ata_id program from running (for example, by renaming it)?
> >
> 
> This is /lib/udev/ata_id from Debian's udev package.

That does not answer my question.

> > > Your diff now should say; s/SCSI ioctl error/SCSI ioctl info'.
> >
> > No, it shouldn't.  The log message itself is an info, but the event it
> > reports is an error.
> >
> 
> OK.
> Some of these SCSI ioctl errors are not causing a xhci-reset.

Yes, I noticed that.  In fact, the commands that cause a reset are all 
A1 (and not all of them), never 85.

> > > Alan, so "t" flags should be added as a quirks to linux-kernel sources...
> > >
> > > t = NO_ATA_1X  (don't allow ATA(12) and ATA(16) commands, uas only);
> > >
> > > ...for my ASMedia USB-3.0 controller?
> >
> > That's not at all clear.  This is a very common and popular device,
> > and nobody else has reported these problems.  It could be that
> > something is odd about your particular drive or computer, not these
> > drives in general.
> >
> 
> So, the external USB-3.0 HDD is now in "UAS only" mode/status.

Why?  Did you change something?

Alan Stern

> Cannot judge if things got better or not.
> 
> - Sedat -


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-12 Thread Alan Stern
smartd
> [Fri Mar 12 18:26:30 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:31 2021] SCSI ioctl error, cmd 85, prog smartd
> [Fri Mar 12 18:26:39 2021] SCSI ioctl error, cmd A1, prog ata_id
> [Fri Mar 12 18:26:39 2021] SCSI ioctl error, cmd A1, prog ata_id
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd A1, prog ata_id
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:40 2021] SCSI ioctl error, cmd 85, prog hdparm
> [Fri Mar 12 18:26:43 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:43 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:44 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:49 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:49 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:49 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:49 2021] SCSI ioctl error, cmd 85, prog udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:26:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:28:09 2021] SCSI ioctl error, cmd 85, prog smartctl
> [Fri Mar 12 18:28:09 2021] SCSI ioctl error, cmd 85, prog smartctl
> [Fri Mar 12 18:28:10 2021] SCSI ioctl error, cmd 85, prog smartctl
> [Fri Mar 12 18:28:11 2021] SCSI ioctl error, cmd 85, prog smartctl
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:49 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> [Fri Mar 12 18:36:50 2021] SCSI ioctl error, cmd 85, prog pool-udisksd

Although it's not conclusive, this log seems to indicate that ata_id
is the only program causing resets.  Have you tried preventing the 
ata_id program from running (for example, by renaming it)?

> Your diff now should say; s/SCSI ioctl error/SCSI ioctl info'.

No, it shouldn't.  The log message itself is an info, but the event it
reports is an error.

> Alan, so "t" flags should be added as a quirks to linux-kernel sources...
> 
> t = NO_ATA_1X  (don't allow ATA(12) and ATA(16) commands, uas only);
> 
> ...for my ASMedia USB-3.0 controller?

That's not at all clear.  This is a very common and popular device,
and nobody else has reported these problems.  It could be that
something is odd about your particular drive or computer, not these
drives in general.

Alan Stern


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-10 Thread Alan Stern
On Tue, Mar 09, 2021 at 08:04:53PM -0800, Asutosh Das (asd) wrote:
> On 3/9/2021 7:14 PM, Alan Stern wrote:
> > On Tue, Mar 09, 2021 at 07:04:34PM -0800, Asutosh Das (asd) wrote:
> > > Hello
> > > I & Can (thanks CanG) debugged this further:
> > > 
> > > Looks like this issue can occur if the sd probe is asynchronous.
> > > 
> > > Essentially, the sd_probe() is done asynchronously and 
> > > driver_probe_device()
> > > invokes pm_runtime_get_suppliers() before invoking sd_probe().
> > > 
> > > But scsi_probe_and_add_lun() runs in a separate context.
> > > So the scsi_autopm_put_device() invoked from scsi_scan_host() context
> > > reduces the link->rpm_active to 1. And sd_probe() invokes
> > > scsi_autopm_put_device() and starts a timer. And then 
> > > driver_probe_device()
> > > invoked from __device_attach_async_helper context reduces the
> > > link->rpm_active to 1 thus enabling the supplier to suspend before the
> > > consumer suspends.
> > 
> > > I don't see a way around this. Please let me know if you
> > > (@Alan/@Bart/@Adrian) have any thoughts on this.
> > 
> > How about changing the SCSI core so that it does a runtime_get before
> > starting an async probe, and the async probe routine does a
> > runtime_put when it is finished?  In other words, don't allow a device
> > to go into runtime suspend while it is waiting to be probed.
> > 
> > I don't think that would be too intrusive.
> > 
> > Alan Stern
> > 
> 
> Hi Alan
> Thanks for the suggestion.
> 
> Am trying to understand:
> 
> Do you mean something like this:
> 
> int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> {
>   
>   scsi_autopm_get_device(sdev);
>   pm_runtime_get_noresume(>sdev_gendev);
>   [...]
>   scsi_autopm_put_device(sdev);
>   [...]
> }
> 
> static int sd_probe(struct device *dev)
> {
>   [...]
>   pm_runtime_put_noidle(dev);
>   scsi_autopm_put_device(sdp);
>   [...]
> }
> 
> This may work (I'm limited by my imagination in scsi layer :) ).

I'm not sure about this.  To be honest, I did not read the entirety of 
your last message; it had way too much detail.  THere's a time and place 
for that, but when you're brainstorming to figure out the underlying 
cause of a problem and come up with a strategy to fix it, you want to 
concentrate on the overall picture, not the details.

As I understand the situation, you've get a SCSI target with multiple 
logical units, let's say A and B, and you need to make sure that A never 
goes into runtime suspend unless B is already suspended.  In other 
words, B always has to suspend before A and resume after A.

To do this, you register a device link with A as the supplier and B as 
the consumer.  Then the PM core takes care of the ordering for you.

But I don't understand when you set up the device link.  If the timing 
is wrong then, thanks to async SCSI probing, you may have a situation 
where A is registered before B and before the link is set up.  Then 
there's temporarily nothing to stop A from suspending before B.

You also need to prevent each device from suspending before it is 
probed.  That's the easy part I was trying to address before (although 
it may not be so easy if the drivers are in loadable modules and not 
present in the kernel).

You need to think through these issues before proposing actual changes.

> But the pm_runtime_put_noidle() would have to be added to all registered
> scsi_driver{}, perhaps? Or may be I can check for sdp->type?

Like this; it's too early to worry about this sort of thing.

Alan Stern


Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-09 Thread Alan Stern
On Tue, Mar 09, 2021 at 07:04:34PM -0800, Asutosh Das (asd) wrote:
> Hello
> I & Can (thanks CanG) debugged this further:
> 
> Looks like this issue can occur if the sd probe is asynchronous.
> 
> Essentially, the sd_probe() is done asynchronously and driver_probe_device()
> invokes pm_runtime_get_suppliers() before invoking sd_probe().
> 
> But scsi_probe_and_add_lun() runs in a separate context.
> So the scsi_autopm_put_device() invoked from scsi_scan_host() context
> reduces the link->rpm_active to 1. And sd_probe() invokes
> scsi_autopm_put_device() and starts a timer. And then driver_probe_device()
> invoked from __device_attach_async_helper context reduces the
> link->rpm_active to 1 thus enabling the supplier to suspend before the
> consumer suspends.

> I don't see a way around this. Please let me know if you
> (@Alan/@Bart/@Adrian) have any thoughts on this.

How about changing the SCSI core so that it does a runtime_get before
starting an async probe, and the async probe routine does a
runtime_put when it is finished?  In other words, don't allow a device
to go into runtime suspend while it is waiting to be probed.

I don't think that would be too intrusive.

Alan Stern


Re: [PATCH v2 16/18] usb: common: add function to get interval expressed in us unit

2021-03-08 Thread Alan Stern
On Mon, Mar 08, 2021 at 10:52:05AM +0800, Chunfeng Yun wrote:
> Add a new function to convert bInterval into the time expressed
> in 1us unit.
> 
> Signed-off-by: Chunfeng Yun 
> ---
> v2: move kerneldoc next to function's definition suggested by Alan
> ---
>  drivers/usb/common/common.c | 41 +
>  drivers/usb/core/devices.c  | 21 ---
>  drivers/usb/core/endpoint.c | 35 ---
>  include/linux/usb/ch9.h |  3 +++
>  4 files changed, 52 insertions(+), 48 deletions(-)

Acked-by: Alan Stern 


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-07 Thread Alan Stern
On Sun, Mar 07, 2021 at 05:57:39PM +0100, Sedat Dilek wrote:
> On Sun, Mar 7, 2021 at 4:46 PM Alan Stern  wrote:
> >
> > On Sat, Mar 06, 2021 at 09:49:00PM +0100, Sedat Dilek wrote:
> >
> > > For testing purposes, I stopped these systemd services:
> > >
> > > 1. systemctl stop smartmontools.service
> > >
> > > 2. systemctl stop udisks2.service
> > >
> > > Last seen xhci-reset:
> > >
> > > [Sat Mar  6 21:37:40 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> > >
> > > So, that every 10min xhci-reset was caused by pool-udisksd from 
> > > udisks2.service.
> >
> > You have found the cause of your problem!  Great!
> >
> > And now, obviously order to fix the problem, you'll have to look into
> > the udisks2 service.  Maybe you can configure it so that it won't send
> > the problem-causing commands.
> >
> 
> I tried yesterday to add --debug option to the ExexStart line of
> udisks2.service, but did not see anything helpful.
> 
> There exist more user-space than udisks2 causing these xhci-resets.
> The cmd#s are also clear: A1 and 85 - whatever they mean.

Those are the two prefixes which indicate an ATA command is present.  
You can find them listed as ATA_12 and ATA_16 in 
include/scsi/scsi_proto.h.

> As said with Linux v5.10.y and Linux v5.11 I have not seen this.

Have you tried setting the quirk flag we discussed earlier _and_ turning 
off udisks2?  Maybe also turning off the other services which generate 
these commands?  Perhaps you'll find that when the quirk flag is 
present, some of those programs _don't_ generate any ATA commands.

> What about CCing linux-block and linux-scsi people?

Sure, go ahead if you want to.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-07 Thread Alan Stern
On Sat, Mar 06, 2021 at 09:49:00PM +0100, Sedat Dilek wrote:

> For testing purposes, I stopped these systemd services:
> 
> 1. systemctl stop smartmontools.service
> 
> 2. systemctl stop udisks2.service
> 
> Last seen xhci-reset:
> 
> [Sat Mar  6 21:37:40 2021] SCSI ioctl error, cmd 85, prog pool-udisksd
> 
> So, that every 10min xhci-reset was caused by pool-udisksd from 
> udisks2.service.

You have found the cause of your problem!  Great!

And now, obviously order to fix the problem, you'll have to look into 
the udisks2 service.  Maybe you can configure it so that it won't send 
the problem-causing commands.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-06 Thread Alan Stern
On Sat, Mar 06, 2021 at 07:42:30AM +0100, Sedat Dilek wrote:
> No, with Debian-Kernel 5.10.19-1 there are no xhci-resets:

Is the kernel the only thing that is different?  The rest of the 
operating system and environment is exactly the same?

> But I see there is already a quirk enabled and matches my ASmedia USB
> 3.0 controller (as I have *no* usb-storage-quirks enabled):
> 
> root# LC_ALL=C dmesg -T | grep -i quirks | egrep '174c|55aa'
> [Sat Mar  6 06:52:41 2021] usb-storage 4-1:1.0: Quirks match for vid
> 174c pid 55aa: 40

Yes, this is because that type of device already has a quirk entry built 
into the kernel.  You can find it by searching for "174c" in the kernel 
source file drivers/usb/storage/unusual_devs.h.

> Thanks Alan for all the hints and tips in the topic "usb-storage and
> quirks" and your patience.

You can try building a 5.11 kernel with the patch below.  I don't know 
whether it will show anything in the dmesg log when one of these resets 
occurs, but it might.

If that doesn't work out, another possibility is to use git bisect to 
find the commit between 5.10 and 5.11 which caused the problem to start.

Alan Stern


--- usb-devel.orig/block/scsi_ioctl.c
+++ usb-devel/block/scsi_ioctl.c
@@ -258,8 +258,11 @@ static int blk_complete_sghdr_rq(struct
hdr->host_status = host_byte(req->result);
hdr->driver_status = driver_byte(req->result);
hdr->info = 0;
-   if (hdr->masked_status || hdr->host_status || hdr->driver_status)
+   if (hdr->masked_status || hdr->host_status || hdr->driver_status) {
hdr->info |= SG_INFO_CHECK;
+   printk(KERN_INFO "SCSI ioctl error, cmd %02X, prog %s\n",
+   req->cmd[0], current->comm);
+   }
hdr->resid = req->resid_len;
hdr->sb_len_wr = 0;
 



Re: [PATCH v10 1/2] scsi: ufs: Enable power management for wlun

2021-03-06 Thread Alan Stern
On Fri, Mar 05, 2021 at 06:54:24PM -0800, Asutosh Das (asd) wrote:

> Now during my testing I see a weird issue sometimes (1 in 7).
> Scenario - bootups
> 
> Issue:
> The supplier 'ufs_device_wlun 0:0:0:49488' goes into runtime suspend even
> when one/more of its consumers are in RPM_ACTIVE state.
> 
> *Log:
> [   10.056379][  T206] sd 0:0:0:1: [sdb] Synchronizing SCSI cache
> [   10.062497][  T113] sd 0:0:0:5: [sdf] Synchronizing SCSI cache
> [   10.356600][   T32] sd 0:0:0:7: [sdh] Synchronizing SCSI cache
> [   10.362944][  T174] sd 0:0:0:3: [sdd] Synchronizing SCSI cache
> [   10.696627][   T83] sd 0:0:0:2: [sdc] Synchronizing SCSI cache
> [   10.704562][  T170] sd 0:0:0:6: [sdg] Synchronizing SCSI cache
> [   10.980602][T5] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> 
> /** Printing all the consumer nodes of supplier **/
> [   10.987327][T5] ufs_device_wlun 0:0:0:49488: usage-count @ suspend: 0
> <-- this is the usage_count
> [   10.994440][T5] ufs_rpmb_wlun 0:0:0:49476: PM state - 2
> [   11.000402][T5] scsi 0:0:0:49456: PM state - 2
> [   11.005453][T5] sd 0:0:0:0: PM state - 2
> [   11.009958][T5] sd 0:0:0:1: PM state - 2
> [   11.014469][T5] sd 0:0:0:2: PM state - 2
> [   11.019072][T5] sd 0:0:0:3: PM state - 2
> [   11.023595][T5] sd 0:0:0:4: PM state - 0 << RPM_ACTIVE
> [   11.353298][T5] sd 0:0:0:5: PM state - 2
> [   11.357726][T5] sd 0:0:0:6: PM state - 2
> [   11.362155][T5] sd 0:0:0:7: PM state - 2
> [   11.366584][T5] ufshcd-qcom 1d84000.ufshc: __ufshcd_wl_suspend - 8709
> [   11.374366][T5] ufs_device_wlun 0:0:0:49488: __ufshcd_wl_suspend -
> (0) has rpm_active flags
> [   11.383376][T5] ufs_device_wlun 0:0:0:49488:
> ufshcd_wl_runtime_suspend <-- Supplier suspends fine.
> [   12.977318][  T174] sd 0:0:0:4: [sde] Synchronizing SCSI cache
> 
> And the the suspend of sde is stuck now:
> schedule+0x9c/0xe0
> schedule_timeout+0x40/0x128
> io_schedule_timeout+0x44/0x68
> wait_for_common_io+0x7c/0x100
> wait_for_completion_io+0x14/0x20
> blk_execute_rq+0x90/0xcc
> __scsi_execute+0x104/0x1c4
> sd_sync_cache+0xf8/0x2a0
> sd_suspend_common+0x74/0x11c
> sd_suspend_runtime+0x14/0x20
> scsi_runtime_suspend+0x64/0x94
> __rpm_callback+0x80/0x2a4
> rpm_suspend+0x308/0x614
> pm_runtime_work+0x98/0xa8
> 
> I added 'DL_FLAG_RPM_ACTIVE' while creating links.
>   if (hba->sdev_ufs_device) {
>   link = device_link_add(>sdev_gendev,
>   >sdev_ufs_device->sdev_gendev,
>  DL_FLAG_PM_RUNTIME|DL_FLAG_RPM_ACTIVE);
> I didn't expect this to resolve the issue anyway and it didn't.
> 
> Another interesting point here is when I resume any of the above suspended
> consumers, it all goes back to normal, which is kind of expected. I tried
> resuming the consumer and the supplier is resumed and the supplier is
> suspended when all the consumers are suspended.
> 
> Any pointers on this issue please?
> 
> @Bart/@Alan - Do you've any pointers please?

It's very noticeable that although you seem to have isolated a bug in
the power management subsystem (supplier goes into runtime suspend
even when one of its consumers is still active), you did not CC the
power management maintainer or mailing list.

I have added the appropriate CC's.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 08:41:40PM +0100, Sedat Dilek wrote:
> On Fri, Mar 5, 2021 at 8:30 PM Alan Stern  wrote:

> > Okay, that indicates the ATA commands are being sent not by the kernel
> > but by some program.  I'm not sure how you can easily find out which
> > program; probably the best thing to do is turn them off one by one until
> > you find the one responsible.
> >
> 
> I can hardly imagine which user-space tools other than powertop can
> interfere here.
> Any ideas?

No.  Especially since I have no idea what programs are running on your 
computer.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 08:22:22PM +0100, Sedat Dilek wrote:
> The quirks match:
> 
> [Fri Mar  5 20:06:56 2021] usb-storage 4-1:1.0: USB Mass Storage device 
> detected
> [Fri Mar  5 20:06:56 2021] usb-storage 4-1:1.0: Quirks match for vid
> 174c pid 55aa: 40
> 
> That seems not to be the trick:
> 
> root# LC_ALL=C dmesg -T | grep 'usb 4-1:'
> [Fri Mar  5 20:06:55 2021] usb 4-1: new SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 20:06:55 2021] usb 4-1: New USB device found,
> idVendor=174c, idProduct=55aa, bcdDevice= 1.00
> [Fri Mar  5 20:06:55 2021] usb 4-1: New USB device strings: Mfr=2,
> Product=3, SerialNumber=1
> [Fri Mar  5 20:06:55 2021] usb 4-1: Product: MEDION HDDrive-n-GO
> [Fri Mar  5 20:06:55 2021] usb 4-1: Manufacturer: MEDION
> [Fri Mar  5 20:06:55 2021] usb 4-1: SerialNumber: 3180092C
> [Fri Mar  5 20:06:57 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd

Okay, that indicates the ATA commands are being sent not by the kernel 
but by some program.  I'm not sure how you can easily find out which 
program; probably the best thing to do is turn them off one by one until 
you find the one responsible.

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 08:05:49PM +0100, Sedat Dilek wrote:
> On Fri, Mar 5, 2021 at 5:07 PM Alan Stern  wrote:

> > Don't worry about trying to decode the output.  To me it looks like the
> > drive crashes and needs to be reset at times when the computer sends it
> > an ATA command.  (Not all ATA commands, but some.)  You can prevent this
> > by setting the following module parameter for the usb-storage driver:
> >
> > quirks=174c:55aa:t
> >
> > where the two numbers are the Vendor and Product IDs for the external
> > drive, and the 't' is a quirks flag saying not to use any ATA commands.
> > If this module parameter fixes the problem, we can add a permanent quirk
> > setting to the kernel.
> >
> 
> Thanks Alan.
> 
> I did:
> 
> [ /etc/modules-load.d/usb-storage.conf ]
> 
> # Add quirks for ATA commands for usb-storage devices connected to
> ASMedia M1042 USB-3.0 controller
> options usb-storage quirks=174c:55aa:t
> - EOF -
> 
> It is:
> 
> /lib/modules/5.12.0-rc1-11-amd64-clang13-cfi/kernel/drivers/usb/storage/usb-storage.ko
> 
> But:
> 
> root# lsmod | grep usb | grep storage
> usb_storage90112  2 uas
> scsi_mod  307200  6 sd_mod,usb_storage,uas,libata,sg,sr_mod
> usbcore   385024  14
> usbserial,xhci_hcd,ehci_pci,usbnet,usbhid,usb_storage,usb_wwan,uvcvideo,ehci_hcd,btusb,xhci_pci,cdc_ether,uas,option

I don't understand.  What is the point of this listing?

> I have not rebooted yet.

Depending on how your system is set up, the new usb-storage.conf file 
might need to be copied into the initramfs image.

However, you don't need to reload the driver module or reboot.  To make 
the new quirk take effect, all you have to do is write 174c:55aa:t to
/sys/module/usb_storage/parameters/quirks.

> Interferences with PowerTop?

Maybe.  It's entirely possible that PowerTop or some other program is 
issuing the troublesome ATA commands.

> These xhci-resets happen every 10mins in a sequence of 4.
> 
> I have here a powertop.service (systemd) with passing --auto-tune option.
> That was not a problem with previous Linux-kernels >= v5.12-rc1, so.
> 
> Alan, what do you think?

Try turning the service off and see if that makes any difference.

Alan Stern


Re: [PATCH] tools/memory-model: Fix smp_mb__after_spinlock() spelling

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 10:26:50AM -0800, Paul E. McKenney wrote:
> On Fri, Mar 05, 2021 at 04:41:49PM +0100, Björn Töpel wrote:
> > On 2021-03-05 16:36, Alan Stern wrote:
> > > On Fri, Mar 05, 2021 at 11:28:23AM +0100, Björn Töpel wrote:
> > > > From: Björn Töpel 
> > > > 
> > > > A misspelled invokation of git-grep, revealed that
> > > ---^
> > > 
> > > Smetimes my brain is a little slow...  Do you confirm that this is a
> > > joke?
> > > 
> > 
> > I wish, Alan. I wish.
> > 
> > Looks like I can only spel function names correctly.
> 
> Heh!  I missed that one completely.  Please see below for a wortschmied
> commit.
> 
>   Thanx, Paul
> 
> 
> 
> commit 1c737ce34715db9431f6b034f92dbf09d954126d
> Author: Björn Töpel 
> Date:   Fri Mar 5 11:28:23 2021 +0100
> 
> tools/memory-model: Fix smp_mb__after_spinlock() spelling
> 
> A misspelled git-grep regex revealed that smp_mb__after_spinlock()
> was misspelled in explanation.txt.
> 
> This commit adds the missing "_" to smp_mb__after_spinlock().

Strictly speaking, the commit adds a missing "_" to 
smp_mb_after_spinlock().  If it added anything to 
smp_mb__after_spinlock(), the result would be incorrect.

How about just:

A misspelled git-grep regex revealed that smp_mb__after_spinlock()
was misspelled in explanation.txt.  This commit adds the missing "_".

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 09:12:30AM +0800, Boqun Feng wrote:
> On Thu, Mar 04, 2021 at 11:11:42AM -0500, Alan Stern wrote:

> > Forget about local variables for the time being and just consider
> > 
> > dep ; [Plain] ; rfi
> > 
> > For example:
> > 
> > A: r1 = READ_ONCE(x);
> >y = r1;
> > B: r2 = READ_ONCE(y);
> > 
> > Should B be ordered after A?  I don't see how any CPU could hope to 
> > excute B before A, but maybe I'm missing something.
> > 
> 
> Agreed.
> 
> > There's another twist, connected with the fact that herd7 can't detect 
> > control dependencies caused by unexecuted code.  If we have:
> > 
> > A: r1 = READ_ONCE(x);
> > if (r1)
> > WRITE_ONCE(y, 5);
> > r2 = READ_ONCE(y);
> > B: WRITE_ONCE(z, r2);
> > 
> > then in executions where x == 0, herd7 doesn't see any control 
> > dependency.  But CPUs do see control dependencies whenever there is a 
> > conditional branch, whether the branch is taken or not, and so they will 
> > never reorder B before A.
> > 
> 
> Right, because B in this example is a write, what if B is a read that
> depends on r2, like in my example? Let y be a pointer to a memory
> location, and initialized as a valid value (pointing to a valid memory
> location) you example changed to:
> 
>   A: r1 = READ_ONCE(x);
>   if (r1)
>   WRITE_ONCE(y, 5);
>   C: r2 = READ_ONCE(y);
>   B: r3 = READ_ONCE(*r2);
> 
> , then A don't have the control dependency to B, because A and B is
> read+read. So B can be ordered before A, right?

Yes, I think that's right: Both C and B can be executed before A.

> > One last thing to think about: My original assessment or Björn's problem 
> > wasn't right, because the dep in (dep ; rfi) doesn't include control 
> > dependencies.  Only data and address.  So I believe that the LKMM 
> 
> Ah, right. I was mising that part (ctrl is not in dep). So I guess my
> example is pointless for the question we are discussing here ;-(
> 
> > wouldn't consider A to be ordered before B in this example even if x 
> > was nonzero.
> 
> Yes, and similar to my example (changing B to a read).
> 
> I did try to run my example with herd, and got confused no matter I make
> dep; [Plain]; rfi as to-r (I got the same result telling me a reorder
> can happen). Now the reason is clear, because this is a ctrl; rfi not a
> dep; rfi.
> 
> Thanks so much for walking with me on this ;-)

You're welcome.  At this point, it looks like the only remaining 
question is whether to include (dep ; [Plain] ; rfi) in to-r.  This 
doesn't seem to be an urgent question.

Alan


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 01:09:16PM +0100, Sedat Dilek wrote:
> On Mon, Mar 1, 2021 at 4:53 PM Alan Stern  wrote:
> [ ... ]
> > You can use usbmon on bus 4 to record the USB traffic.  It may indicate
> > why the resets occur.
> >
> 
> Hi Alan,
> 
> I followed the instructions in [1].
> 
> root# modprobe -v usbmon
> 
> root# ls /sys/kernel/debug/usb/usbmon
> 0s  0u  1s  1t  1u  2s  2t  2u  3s  3t  3u  4s  4t  4u
> 
> root# cat /sys/kernel/debug/usb/usbmon/4u > /tmp/usbmon-log_4u.txt
> [ Ctrl+C ]
> 
> I recorded 13:03 - 13:04 (one minute).
> 
> So these xhci-resets should be included:
> 
> [Fri Mar  5 13:03:07 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:07 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:27 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:27 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:27 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:28 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> [Fri Mar  5 13:03:28 2021] usb 4-1: reset SuperSpeed Gen 1 USB device
> number 2 using xhci_hcd
> 
> The usbmon-log is attached.
> 
> Unsure how to interpret the log - the kernel-doc says `raw data`.
> How can I bring this into a human-readable format?
> Can you give me a hand?

Don't worry about trying to decode the output.  To me it looks like the 
drive crashes and needs to be reset at times when the computer sends it 
an ATA command.  (Not all ATA commands, but some.)  You can prevent this 
by setting the following module parameter for the usb-storage driver:

quirks=174c:55aa:t

where the two numbers are the Vendor and Product IDs for the external 
drive, and the 't' is a quirks flag saying not to use any ATA commands.  
If this module parameter fixes the problem, we can add a permanent quirk 
setting to the kernel.

Alan Stern


Re: [PATCH] tools/memory-model: Fix smp_mb__after_spinlock() spelling

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 11:28:23AM +0100, Björn Töpel wrote:
> From: Björn Töpel 
> 
> A misspelled invokation of git-grep, revealed that
---^

Smetimes my brain is a little slow...  Do you confirm that this is a 
joke?

Alan


Re: [PATCH 16/17] usb: common: add function to get interval expressed in us unit

2021-03-05 Thread Alan Stern
On Fri, Mar 05, 2021 at 05:02:54PM +0800, Chunfeng Yun wrote:
> Add a new function to convert bInterval into the time expressed
> in 1us unit.
> 
> Signed-off-by: Chunfeng Yun 
> ---

> --- a/drivers/usb/common/common.c
> +++ b/drivers/usb/common/common.c
> @@ -165,6 +165,39 @@ enum usb_dr_mode usb_get_dr_mode(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(usb_get_dr_mode);
>  
> +unsigned int usb_decode_interval(const struct usb_endpoint_descriptor *epd,
> +  enum usb_device_speed speed)
> +{
> + unsigned int interval = 0;
> +
> + switch (usb_endpoint_type(epd)) {
> + case USB_ENDPOINT_XFER_CONTROL:
> + /* uframes per NAK */
> + if (speed == USB_SPEED_HIGH)
> + interval = epd->bInterval;
> + break;
> + case USB_ENDPOINT_XFER_ISOC:
> + interval = 1 << (epd->bInterval - 1);
> + break;
> + case USB_ENDPOINT_XFER_BULK:
> + /* uframes per NAK */
> + if (speed == USB_SPEED_HIGH && usb_endpoint_dir_out(epd))
> + interval = epd->bInterval;
> + break;
> + case USB_ENDPOINT_XFER_INT:
> + if (speed >= USB_SPEED_HIGH)
> + interval = 1 << (epd->bInterval - 1);
> + else
> + interval = epd->bInterval;
> + break;
> + }
> +
> + interval *= (speed >= USB_SPEED_HIGH) ? 125 : 1000;
> +
> + return interval;
> +}
> +EXPORT_SYMBOL_GPL(usb_decode_interval);

> --- a/include/linux/usb/ch9.h
> +++ b/include/linux/usb/ch9.h
> @@ -90,6 +90,17 @@ extern enum usb_ssp_rate usb_get_maximum_ssp_rate(struct 
> device *dev);
>   */
>  extern const char *usb_state_string(enum usb_device_state state);
>  
> +/**
> + * usb_decode_interval - Decode bInterval into the time expressed in 1us unit
> + * @epd: The descriptor of the endpoint
> + * @speed: The speed that the endpoint works as
> + *
> + * Function returns the interval expressed in 1us unit for servicing
> + * endpoint for data transfers.
> + */
> +unsigned int usb_decode_interval(const struct usb_endpoint_descriptor *epd,
> +  enum usb_device_speed speed);

As a general rule, I believe people expect to find the kerneldoc for a 
function next to the function's definition, not next to the declaration 
in a header file.

Alan Stern


Re: XDP socket rings, and LKMM litmus tests

2021-03-04 Thread Alan Stern
On Thu, Mar 04, 2021 at 11:05:15AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 04, 2021 at 10:35:24AM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> > > > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> > 
> > > > > > >  And I cannot immediately think of a situation where
> > > > > > > this approach would break that would not result in a data race 
> > > > > > > being
> > > > > > > flagged.  Or is this yet another failure of my imagination?
> > > > > > 
> > > > > > By definition, an access to a local variable cannot participate in 
> > > > > > a 
> > > > > > data race because all such accesses are confined to a single thread.
> > > > > 
> > > > > True, but its value might have come from a load from a shared 
> > > > > variable.
> > > > 
> > > > Then that load could have participated in a data race.  But the store 
> > > > to 
> > > > the local variable cannot.
> > > 
> > > Agreed.  My thought was that if the ordering from the initial (non-local)
> > > load mattered, then that initial load must have participated in a
> > > data race.  Is that true, or am I failing to perceive some corner case?
> > 
> > Ordering can matter even when no data race is involved.  Just think
> > about how much of the memory model is concerned with ordering of
> > marked accesses, which don't participate in data races unless there is
> > a conflicting plain access somewhere.
> 
> Fair point.  Should I have instead said "then that initial load must
> have run concurrently with a store to that same variable"?

I'm losing track of the point you were originally trying to make.

Does ordering matter when there are no conflicting accesses?  Sure.  
Consider this:

A: r1 = READ_ONCE(x);
B: WRITE_ONCE(y, r1);
   smp_wmb();
C: WRITE_ONCE(z, 1);

Even if there are no other accesses to y at all (let alone any 
conflicting ones), the mere existence of B forces A to be ordered before 
C, and this is easily detectable by a litmus test.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-04 Thread Alan Stern
On Thu, Mar 04, 2021 at 02:33:32PM +0800, Boqun Feng wrote:

> Right, I was thinking about something unrelated.. but how about the
> following case:
> 
>   local_v = 
>   r1 = READ_ONCE(*x); // f
> 
>   if (r1 == 1) {
>   local_v =  // e
>   } else {
>   local_v =  // d
>   }
> 
>   p = READ_ONCE(local_v); // g
> 
>   r2 = READ_ONCE(*p);   // h
> 
> if r1 == 1, we definitely think we have:
> 
>   f ->ctrl e ->rfi g ->addr h
> 
> , and if we treat ctrl;rfi as "to-r", then we have "f" happens before
> "h". However compile can optimze the above as:
> 
>   local_v = 
> 
>   r1 = READ_ONCE(*x); // f
> 
>   if (r1 != 1) {
>   local_v =  // d
>   }
> 
>   p = READ_ONCE(local_v); // g
> 
>   r2 = READ_ONCE(*p);   // h
> 
> , and when this gets executed, I don't think we have the guarantee we
> have "f" happens before "h", because CPU can do optimistic read for "g"
> and "h".

In your example, which accesses are supposed to be to actual memory and 
which to registers?  Also, remember that the memory model assumes the 
hardware does not reorder loads if there is an address dependency 
between them.

> Part of this is because when we take plain access into consideration, we
> won't guarantee a read-from or other relations exists if compiler
> optimization happens.
> 
> Maybe I'm missing something subtle, but just try to think through the
> effect of making dep; rfi as "to-r".

Forget about local variables for the time being and just consider

dep ; [Plain] ; rfi

For example:

A: r1 = READ_ONCE(x);
   y = r1;
B: r2 = READ_ONCE(y);

Should B be ordered after A?  I don't see how any CPU could hope to 
excute B before A, but maybe I'm missing something.

There's another twist, connected with the fact that herd7 can't detect 
control dependencies caused by unexecuted code.  If we have:

A: r1 = READ_ONCE(x);
if (r1)
WRITE_ONCE(y, 5);
r2 = READ_ONCE(y);
B: WRITE_ONCE(z, r2);

then in executions where x == 0, herd7 doesn't see any control 
dependency.  But CPUs do see control dependencies whenever there is a 
conditional branch, whether the branch is taken or not, and so they will 
never reorder B before A.

One last thing to think about: My original assessment or Björn's problem 
wasn't right, because the dep in (dep ; rfi) doesn't include control 
dependencies.  Only data and address.  So I believe that the LKMM 
wouldn't consider A to be ordered before B in this example even if x 
was nonzero.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-04 Thread Alan Stern
On Wed, Mar 03, 2021 at 09:04:07PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 10:21:01PM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:

> > > > >  And I cannot immediately think of a situation where
> > > > > this approach would break that would not result in a data race being
> > > > > flagged.  Or is this yet another failure of my imagination?
> > > > 
> > > > By definition, an access to a local variable cannot participate in a 
> > > > data race because all such accesses are confined to a single thread.
> > > 
> > > True, but its value might have come from a load from a shared variable.
> > 
> > Then that load could have participated in a data race.  But the store to 
> > the local variable cannot.
> 
> Agreed.  My thought was that if the ordering from the initial (non-local)
> load mattered, then that initial load must have participated in a
> data race.  Is that true, or am I failing to perceive some corner case?

Ordering can matter even when no data race is involved.  Just think
about how much of the memory model is concerned with ordering of
marked accesses, which don't participate in data races unless there is
a conflicting plain access somewhere.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Wed, Mar 03, 2021 at 02:03:48PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:
> > On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> > > On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:
> > 
> > > > Local variables absolutely should be treated just like CPU registers, 
> > > > if 
> > > > possible.  In fact, the compiler has the option of keeping local 
> > > > variables stored in registers.
> > > > 
> > > > (Of course, things may get complicated if anyone writes a litmus test 
> > > > that uses a pointer to a local variable,  Especially if the pointer 
> > > > could hold the address of a local variable in one execution and a 
> > > > shared variable in another!  Or if the pointer is itself a shared 
> > > > variable and is dereferenced in another thread!)
> > > 
> > > Good point!  I did miss this complication.  ;-)
> > 
> > I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
> > local variables.
> > 
> > > As you say, when its address is taken, the "local" variable needs to be
> > > treated as is it were shared.  There are exceptions where the pointed-to
> > > local is still used only by its process.  Are any of these exceptions
> > > problematic?
> > 
> > Easiest just to rule out the whole can of worms.
> 
> Good point, given that a global can be used instead of a local for
> any case where an address must be taken.

Another thing to consider: Almost all marked accesses involve using the 
address of the storage location (for example, smp_load_acquire's first 
argument must be a pointer).  As far as I can remember at the moment, 
the only ones that don't are READ_ONCE and WRITE_ONCE.  So although we 
might or might not want to allow READ_ONCE or WRITE_ONCE on a local 
variable, we won't have to worry about any of the other kinds of marked 
accesses.

> > > > But even if local variables are treated as non-shared storage 
> > > > locations, 
> > > > we should still handle this correctly.  Part of the problem seems to 
> > > > lie 
> > > > in the definition of the to-r dependency relation; the relevant portion 
> > > > is:
> > > > 
> > > > (dep ; [Marked] ; rfi)
> > > > 
> > > > Here dep is the control dependency from the READ_ONCE to the 
> > > > local-variable store, and the rfi refers to the following load of the 
> > > > local variable.  The problem is that the store to the local variable 
> > > > doesn't go in the Marked class, because it is notated as a plain C 
> > > > assignment.  (And likewise for the following load.)
> > > > 
> > > > Should we change the model to make loads from and stores to local 
> > > > variables always count as Marked?
> > > 
> > > As long as the initial (possibly unmarked) load would be properly
> > > complained about.
> > 
> > Sorry, I don't understand what you mean.
> 
> I was thinking in terms of something like this in one of the processes:
> 
>   p = gp; // Unmarked!
>   r1 = p;
>   q = r1; // Implicitly marked now?
>   if (q)
>   WRITE_ONCE(x, 1); // ctrl dep from gp???

I hope we won't have to worry about this!  :-)  Treating local variable 
accesses as if they are always marked looks wrong.

> > >  And I cannot immediately think of a situation where
> > > this approach would break that would not result in a data race being
> > > flagged.  Or is this yet another failure of my imagination?
> > 
> > By definition, an access to a local variable cannot participate in a 
> > data race because all such accesses are confined to a single thread.
> 
> True, but its value might have come from a load from a shared variable.

Then that load could have participated in a data race.  But the store to 
the local variable cannot.

> > However, there are other aspects to consider, in particular, the 
> > ordering relations on local-variable accesses.  But if, as Luc says, 
> > local variables are treated just like registers then perhaps the issue 
> > doesn't arise.
> 
> Here is hoping!
> 
> > > > What should have happened if the local variable were instead a shared 
> > > > variable which the other thread didn't access at all?  It seems like a 
> > > > weak point of the memory model that it treats these two things 
> > > > differently.
> > > 
> > > But is this really any different than the

Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Thu, Mar 04, 2021 at 09:26:31AM +0800, Boqun Feng wrote:
> On Wed, Mar 03, 2021 at 03:22:46PM -0500, Alan Stern wrote:

> > Which brings us back to the case of the
> > 
> > dep ; rfi
> > 
> > dependency relation, where the accesses in the middle are plain and 
> > non-racy.  Should the LKMM be changed to allow this?
> > 
> 
> For this particular question, do we need to consider code as the follow?
> 
>   r1 = READ_ONCE(x);  // f
>   if (r == 1) {
>   local_v =  // g
>   do_something_a();
>   }
>   else {
>   local_v = 
>   do_something_b();
>   }
> 
>   r2 = READ_ONCE(*local_v); // e
> 
> , do we have the guarantee that the first READ_ONCE() happens before the
> second one? Can compiler optimize the code as:
> 
>   r2 = READ_ONCE(y);
>   r1 = READ_ONCE(x);

Well, it can't do that because the compiler isn't allowed to reorder
volatile accesses (which includes READ_ONCE).  But the compiler could
do:

r1 = READ_ONCE(x);
r2 = READ_ONCE(y);

>   if (r == 1) {
>   do_something_a();
>   }
>   else {
>   do_something_b();
>   }
> 
> ? Although we have:
> 
>   f ->dep g ->rfi ->addr e

This would be an example of a problem Paul has described on several
occasions, where both arms of an "if" statement store the same value
(in this case to local_v).  This problem arises even when local
variables are not involved.  For example:

if (READ_ONCE(x) == 0) {
WRITE_ONCE(y, 1);
do_a();
} else {
WRITE_ONCE(y, 1);
do_b();
}

The compiler can change this to:

r = READ_ONCE(x);
WRITE_ONCE(y, 1);
if (r == 0)
do_a();
else
do_b();

thus allowing the marked accesses to be reordered by the CPU and
breaking the apparent control dependency.

So the answer to your question is: No, we don't have this guarantee,
but the reason is because of doing the same store in both arms, not
because of the use of local variables.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Wed, Mar 03, 2021 at 06:37:36PM +0100, maranget wrote:
> 
> 
> > On 3 Mar 2021, at 18:12, Alan Stern  wrote:
> > 
> > On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote:
> >> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:
> > 
> >>> This result is wrong, apparently because of a bug in herd7.  There 
> >>> should be control dependencies from each of the two loads in P0 to each 
> >>> of the two stores, but herd7 doesn't detect them.
> >>> 
> >>> Maybe Luc can find some time to check whether this really is a bug and 
> >>> if it is, fix it.
> >> 
> >> I agree that herd7's control dependency tracking could be improved.
> >> 
> >> But sadly, it is currently doing exactly what I asked Luc to make it do,
> >> which is to confine the control dependency to its "if" statement.  But as
> >> usual I wasn't thinking globally enough.  And I am not exactly sure what
> >> to ask for.  Here a store to a local was control-dependency ordered after
> >> a read, and so that should propagate to a read from that local variable.
> >> Maybe treat local variables as if they were registers, so that from
> >> herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> >> chains in multiple "if" statements?
> >> 
> >> Thoughts?
> > 
> > Local variables absolutely should be treated just like CPU registers, if 
> > possible.  In fact, the compiler has the option of keeping local 
> > variables stored in registers.
> > 
> 
> And indeed local variables are treated as registers by herd7.
> 
> 
> > (Of course, things may get complicated if anyone writes a litmus test 
> > that uses a pointer to a local variable,  Especially if the pointer 
> > could hold the address of a local variable in one execution and a 
> > shared variable in another!  Or if the pointer is itself a shared 
> > variable and is dereferenced in another thread!)
> > 
> > But even if local variables are treated as non-shared storage locations, 
> > we should still handle this correctly.  Part of the problem seems to lie 
> > in the definition of the to-r dependency relation; the relevant portion 
> > is:
> 
> In fact, I’d rather change the computation of “dep” here control-dependency 
> “ctrl”. Notice that “ctrl” is computed by herd7 and present in the initial 
> environment of the Cat interpreter.
> 
> I have made a PR to herd7 that performs the change. The commit message states 
> the new definition.

Shouldn't similar reasoning apply to data and address dependencies?

For example, suppose there is a control dependency from a load to a 
register variable, and then a data dependency from the register variable 
to a store.  This should be treated as an overall data dependency from 
the load to the store.

Does your change to herd7 do this?  I couldn't tell from the description 
in the PR.

Also, do you think it's reasonable to add a restriction to herd7 against 
taking the address of a local variable?

> > (dep ; [Marked] ; rfi)
> > 
> > Here dep is the control dependency from the READ_ONCE to the 
> > local-variable store, and the rfi refers to the following load of the 
> > local variable.  The problem is that the store to the local variable 
> > doesn't go in the Marked class, because it is notated as a plain C 
> > assignment.  (And likewise for the following load.)
> > 
> This is a related issue, I am not sure, but perhaps it can be formulated as
> "should rfi and rf on registers behave the  same?”

Aren't they already the same thing?  It's not possible to have an rfe 
from a register, is it?

Alan

> > Should we change the model to make loads from and stores to local 
> > variables always count as Marked?
> > 
> > What should have happened if the local variable were instead a shared 
> > variable which the other thread didn't access at all?  It seems like a 
> > weak point of the memory model that it treats these two things 
> > differently.
> > 
> > Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Wed, Mar 03, 2021 at 09:40:22AM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 12:12:21PM -0500, Alan Stern wrote:

> > Local variables absolutely should be treated just like CPU registers, if 
> > possible.  In fact, the compiler has the option of keeping local 
> > variables stored in registers.
> > 
> > (Of course, things may get complicated if anyone writes a litmus test 
> > that uses a pointer to a local variable,  Especially if the pointer 
> > could hold the address of a local variable in one execution and a 
> > shared variable in another!  Or if the pointer is itself a shared 
> > variable and is dereferenced in another thread!)
> 
> Good point!  I did miss this complication.  ;-)

I suspect it wouldn't be so bad if herd7 disallowed taking addresses of 
local variables.

> As you say, when its address is taken, the "local" variable needs to be
> treated as is it were shared.  There are exceptions where the pointed-to
> local is still used only by its process.  Are any of these exceptions
> problematic?

Easiest just to rule out the whole can of worms.

> > But even if local variables are treated as non-shared storage locations, 
> > we should still handle this correctly.  Part of the problem seems to lie 
> > in the definition of the to-r dependency relation; the relevant portion 
> > is:
> > 
> > (dep ; [Marked] ; rfi)
> > 
> > Here dep is the control dependency from the READ_ONCE to the 
> > local-variable store, and the rfi refers to the following load of the 
> > local variable.  The problem is that the store to the local variable 
> > doesn't go in the Marked class, because it is notated as a plain C 
> > assignment.  (And likewise for the following load.)
> > 
> > Should we change the model to make loads from and stores to local 
> > variables always count as Marked?
> 
> As long as the initial (possibly unmarked) load would be properly
> complained about.

Sorry, I don't understand what you mean.

>  And I cannot immediately think of a situation where
> this approach would break that would not result in a data race being
> flagged.  Or is this yet another failure of my imagination?

By definition, an access to a local variable cannot participate in a 
data race because all such accesses are confined to a single thread.

However, there are other aspects to consider, in particular, the 
ordering relations on local-variable accesses.  But if, as Luc says, 
local variables are treated just like registers then perhaps the issue 
doesn't arise.

> > What should have happened if the local variable were instead a shared 
> > variable which the other thread didn't access at all?  It seems like a 
> > weak point of the memory model that it treats these two things 
> > differently.
> 
> But is this really any different than the situation where a global
> variable is only accessed by a single thread?

Indeed; it is the _same_ situation.  Which leads to some interesting 
questions, such as: What does READ_ONCE(r) mean when r is a local 
variable?  Should it be allowed at all?  In what way is it different 
from a plain read of r?

One difference is that the LKMM doesn't allow dependencies to originate 
from a plain load.  Of course, when you're dealing with a local 
variable, what matters is not the load from that variable but rather the 
earlier loads which determined the value that had been stored there.  
Which brings us back to the case of the

dep ; rfi

dependency relation, where the accesses in the middle are plain and 
non-racy.  Should the LKMM be changed to allow this?

There are other differences to consider.  For example:

r = READ_ONCE(x);
smp_wmb();
WRITE_ONCE(y, 1);

If the write to r were treated as a marked store, the smp_wmb would 
order it (and consequently the READ_ONCE) before the WRITE_ONCE.  
However we don't want to do this when r is a local variable.  Indeed, a 
plain store wouldn't be ordered this way because the compiler might 
optimize the store away entirely, leaving the smp_wmb nothing to act on.

So overall the situation is rather puzzling.  Treating local variables 
as registers is probably the best answer.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-03 Thread Alan Stern
On Tue, Mar 02, 2021 at 03:50:19PM -0800, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 04:14:46PM -0500, Alan Stern wrote:

> > This result is wrong, apparently because of a bug in herd7.  There 
> > should be control dependencies from each of the two loads in P0 to each 
> > of the two stores, but herd7 doesn't detect them.
> > 
> > Maybe Luc can find some time to check whether this really is a bug and 
> > if it is, fix it.
> 
> I agree that herd7's control dependency tracking could be improved.
> 
> But sadly, it is currently doing exactly what I asked Luc to make it do,
> which is to confine the control dependency to its "if" statement.  But as
> usual I wasn't thinking globally enough.  And I am not exactly sure what
> to ask for.  Here a store to a local was control-dependency ordered after
> a read, and so that should propagate to a read from that local variable.
> Maybe treat local variables as if they were registers, so that from
> herd7's viewpoint the READ_ONCE()s are able to head control-dependency
> chains in multiple "if" statements?
> 
> Thoughts?

Local variables absolutely should be treated just like CPU registers, if 
possible.  In fact, the compiler has the option of keeping local 
variables stored in registers.

(Of course, things may get complicated if anyone writes a litmus test 
that uses a pointer to a local variable,  Especially if the pointer 
could hold the address of a local variable in one execution and a 
shared variable in another!  Or if the pointer is itself a shared 
variable and is dereferenced in another thread!)

But even if local variables are treated as non-shared storage locations, 
we should still handle this correctly.  Part of the problem seems to lie 
in the definition of the to-r dependency relation; the relevant portion 
is:

(dep ; [Marked] ; rfi)

Here dep is the control dependency from the READ_ONCE to the 
local-variable store, and the rfi refers to the following load of the 
local variable.  The problem is that the store to the local variable 
doesn't go in the Marked class, because it is notated as a plain C 
assignment.  (And likewise for the following load.)

Should we change the model to make loads from and stores to local 
variables always count as Marked?

What should have happened if the local variable were instead a shared 
variable which the other thread didn't access at all?  It seems like a 
weak point of the memory model that it treats these two things 
differently.

Alan


Re: XDP socket rings, and LKMM litmus tests

2021-03-02 Thread Alan Stern
p_mb(), and also
> the circular buffer in circular-buffers.txt (pre commit 6c43c091bdc5
> ("documentation: Update circular buffer for
> load-acquire/store-release")) is missing the smp_mb() at the
> producer-side.
> 
> I'm trying to wrap my head around why it's OK to remove the smp_mb()
> in the cases above? I'm worried that the current XDP socket ring
> implementation (which is missing smp_mb()) might be broken.

Because of the control dependencies, the smp_mb isn't needed.  The 
dependencies will order both of the stores after both of the loads.

Alan Stern


Re: [PATCH 2/2] usb-storage: revert from scsi_add_host_with_dma() to scsi_add_host()

2021-03-01 Thread Alan Stern
On Fri, Feb 26, 2021 at 06:53:52AM +0100, Christoph Hellwig wrote:
> On Thu, Feb 25, 2021 at 11:35:57AM -0500, Alan Stern wrote:
> > This thread seems to have fallen through the cracks.  Maybe now would be 
> > a good time to address the problem (since we originally planned to fix 
> > it in 5.11!).
> > 
> > The questions listed below are pretty self-contained, although the rest
> > of the discussion isn't.  But I never received any answers.
> 
> usb-storage must use scsi_add_host_with_dma to use the right device
> for DMA mapping and parameters.  The calls to set the DMA options
> on the device are needed so that IOMMU merging doesn't change the
> imposed requirements.  If these requirements slow you down you need
> to relax them, as apparently the hardware is able to handle bigger
> limits.

Hans, don't you have the right equipment to test this approach?

Alan Stern


Re: [xhci] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd

2021-03-01 Thread Alan Stern
On Mon, Mar 01, 2021 at 09:54:38AM +0100, Sedat Dilek wrote:
> On Wed, Feb 24, 2021 at 6:25 PM Sedat Dilek  wrote:
> >
> > On Wed, Feb 24, 2021 at 2:44 PM Sedat Dilek  wrote:
> > >
> > > Hi Mathias,
> > >
> > > I am here on Linux-v5.11-10201-gc03c21ba6f4e.
> > >
> > > I see a lot xhci-resets in my dmesg-log:
> > >
> > > root# LC_ALL=C dmesg -T | grep 'usb 4-1: reset SuperSpeed Gen 1 USB
> > > device number 2 using xhci_hcd' | wc -l
> > > 75
> > >
> > > This is what I have:
> > >
> > > root# lsusb -s 004:001
> > > Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
> > >
> > > root# lsusb -s 004:002
> > > Bus 004 Device 002: ID 174c:55aa ASMedia Technology Inc. ASM1051E SATA
> > > 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge,
> > > ASM1153E SATA 6Gb/s bridge
> > >
> > > My external USB 3.0 HDD contains the partition with my Debian-system
> > > and is attached to the above xhci bus/device.
> > >
> > > Can you enlighten what this means?
> > > Is this a known issue?
> > > Is there a fix around?
> > >
> > > BTW, in which Git tree is the xhci development happening?
> > > Can you point me to it?
> > >
> > > I am attaching my linux-config and full dmesg-log.
> > >
> > > Also I have attached outputs of:
> > >
> > > $ sudo lsusb -vvv -d 1d6b:0003
> > > $ sudo lsusb -vvv -d 174c:55aa
> > >
> > > If you need further information, please let me know.
> > >
> > > Thanks.
> > >
> >
> > Looks like that xhci-reset happens here every 10min.
> >
> 
> [ To Greg ]
> 
> The problem still remains with Linux v5.12-rc1 (see [1]).
> 
> Yesterday, I ran some disk-health checks with smartctl and gsmartcontrol.
> All good.
> 
> For the first time I used badblocks from e2fsprogs Debian package:
> 
> root# LC_ALL=C badblocks -v -p 1 -s /dev/sdc -o
> badblocks-v-p-1-s_dev-sdc_$(uname -r).txt
> Checking blocks 0 to 976762583
> Checking for bad blocks (read-only test): done
> Pass completed, 0 bad blocks found. (0/0/0 errors)
> 
> Good, there is no file-system corruption or badblocks or even a hardware 
> damage.
> 
> Anyway, feedback is much appreciated.
> 
> Thanks.

You can use usbmon on bus 4 to record the USB traffic.  It may indicate 
why the resets occur.

Alan Stern


Re: [RFC PATCH] USB:XHCI:Modify XHCI driver for USB2.0 controller

2021-02-27 Thread Alan Stern
On Sat, Feb 27, 2021 at 11:38:08AM +0800, liulongfang wrote:
> On 2021/2/27 0:30, Alan Stern wrote:
> > On Fri, Feb 26, 2021 at 04:21:37PM +0800, Longfang Liu wrote:
> >> Our current XHCI hardware controller has been customized to only
> >> support USB 2.0 ports. When using the current xhci driver, an xhci
> >> controller device and an ehci controller device will be created
> >> automatically.
> > 
> > That sentence makes no sense at all.  An EHCI controller device is a 
> > piece of hardware.  How can an xHCI driver, which is a piece of 
> > software, create a piece of hardware?
> > 
> > Alan Stern
> > .
> > 
> The hardware device is a complete USB3.0 controller,
> but I hope to support a USB2.0-only mode through software configuration.

Even if it only supports USB-2.0 connections, an xHCI controller is 
still an xHCI controller.  It doesn't magically transform into an EHCI 
controller.

You are not creating an EHCI controller device.  Rather, you are trying 
to restrict an xHCI controller device to make it handle only USB-2.0 
connections.  If you run lsusb on a system that has an xHCI controller, 
you'll see that the controller is bound to two USB buses: a USB-2 bus 
and a USB-3 bus.  But for both buses, the controller is xHCI -- not 
EHCI.

Your patch description is inaccurate.

Alan Stern


Re: [RFC PATCH] USB:XHCI:Modify XHCI driver for USB2.0 controller

2021-02-26 Thread Alan Stern
On Fri, Feb 26, 2021 at 04:21:37PM +0800, Longfang Liu wrote:
> Our current XHCI hardware controller has been customized to only
> support USB 2.0 ports. When using the current xhci driver, an xhci
> controller device and an ehci controller device will be created
> automatically.

That sentence makes no sense at all.  An EHCI controller device is a 
piece of hardware.  How can an xHCI driver, which is a piece of 
software, create a piece of hardware?

Alan Stern


Re: [PATCH] drivers/hid: fix for the big hid report length

2021-02-26 Thread Alan Stern
On Fri, Feb 26, 2021 at 02:13:36PM +0600, Sabyrzhan Tasbolatov wrote:
> On Thu, 25 Feb 2021 10:59:14 -0500, Alan Stern wrote:
> > Won't this cause silent errors?
> 
> Agree. But there are already such as cases like in:
> 
> // net/bluetooth/hidp/core.c
> static void hidp_process_report(..)
> {
>   ..
>   if (len > HID_MAX_BUFFER_SIZE)
>   len = HID_MAX_BUFFER_SIZE;
>   ..
> 
> // drivers/hid/hid-core.c
> int hid_report_raw_event(..)
> {
>   ..
>   rsize = hid_compute_report_size(report);
> 
>   if (report_enum->numbered && rsize >= HID_MAX_BUFFER_SIZE)
>   rsize = HID_MAX_BUFFER_SIZE - 1;
>   else if (rsize > HID_MAX_BUFFER_SIZE)
>   rsize = HID_MAX_BUFFER_SIZE;
>   ..
> 
> // drivers/staging/greybus/hid.c
> static int gb_hid_start(..)
> {
>   ..
>   if (bufsize > HID_MAX_BUFFER_SIZE)
>   bufsize = HID_MAX_BUFFER_SIZE;
>   ..
> 
> > How about instead just rejecting any device which includes a report 
> > whose length is too big (along with an line in the system log explaining 
> > what's wrong)?
> 
> Not everywhere, but there are already such as logs when > HID_MAX_BUFFER_SIZE
> 
> // drivers/hid/hidraw.c
> static ssize_t hidraw_send_report(..)
> {
>   ..
>   if (count > HID_MAX_BUFFER_SIZE) {
>   hid_warn(dev, "pid %d passed too large report\n",
>task_pid_nr(current));
>   ret = -EINVAL;
>   goto out;
>   }
> 
> 
> I've just noticed that hid_compute_report_size() doing the same thing as
> hid_report_len(). So I will replace it with latter one with length check.
> 
> So in [PATCH v2] I will do following:
> 
>  1. replace hid_compute_report_size() with hid_report_len()
> 
>  2. in hid_report_len() we can hid_warn() if length > HID_MAX_BUFFER_SIZE,
> and return HID_MAX_BUFFER_SIZE. Or we can return 0 in hid_report_len() to let
> functions like hid_hw_raw_request(), hid_hw_output_report() to validate
> invalid report length and return -EINVAL. Though I'll need to add !length
> missing checks in other places.
> 
> Please let me know what it's preferred way in 2nd step.

It's been too long since I worked on this stuff; you should check with 
the maintainers.

Another thing to consider: There probably are devices with multiple 
reports, where one of the reports is too big but people only want to use 
the other, smaller reports.  For situations like that, we don't want to 
reject the entire device.

I don't know if parsing a partiall part is a good thing to do.

Alan Stern


Re: [PATCH 2/2] usb-storage: revert from scsi_add_host_with_dma() to scsi_add_host()

2021-02-25 Thread Alan Stern
This thread seems to have fallen through the cracks.  Maybe now would be 
a good time to address the problem (since we originally planned to fix 
it in 5.11!).

The questions listed below are pretty self-contained, although the rest
of the discussion isn't.  But I never received any answers.

Alan Stern

On Mon, Nov 30, 2020 at 03:36:18PM -0500, Alan Stern wrote:
> [Added linux-scsi to CC: list.  When discussing code in a particular 
> subsystem, it's a good idea to include that subsystem's mailing list in 
> the CC:.]
> 
> On Tue, Dec 01, 2020 at 03:01:56AM +0800, Tom Yan wrote:
> > For the record,
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/scsi/scsi_host.h?h=v5.10-rc6#n753
> > 
> > On Tue, 1 Dec 2020 at 02:57, Tom Yan  wrote:
> > >
> > > This maybe? 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/scsi_lib.c?h=v5.10-rc6#n1816
> > >
> > > UAS:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/storage/uas.c?h=v5.10-rc6#n918
> > > BOT (AFAICT):
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/hosts.c?h=v5.10-rc6#n466
> > >
> > > It would explain why the issue is only triggered with UAS drives.
> 
> In brief, a recent change -- calling scsi_add_host_with_dma rather than 
> scsi_add_host -- in the USB uas driver has caused a regression in 
> performance.  (Note that the shost->dma_dev value is set differently as 
> a result of this change.)  Hans has determined that the problem seems 
> to be related to permanent changes in the dma_dev's settings caused by 
> scsi_add_host_with_dma.
> 
> Tom pointed out that __scsi_init_queue contains a couple of questionable 
> assignments:
> 
>   dma_set_seg_boundary(dev, shost->dma_boundary);
> 
> and
> 
>   dma_set_max_seg_size(dev, queue_max_segment_size(q));
> 
> where dev = shost->dma_dev -- in this case, a USB host controller.
> 
> So an important question is why decisions related to a particular SCSI 
> host should affect the DMA settings of a device somewhere else in the 
> heirarchy?  Sure, the properties of the USB controller should constrain 
> the settings available to the SCSI host, but there doesn't seem to be 
> any good reason for restrictions to go in the other direction.
> 
> Doesn't the way we handle DMA permit a child device to impose additional 
> restrictions (such as a smaller max segment size) beyond those of the 
> parent device which actually performs the DMA transfer?
> 
> > > The questions (from me) are:
> > > 1. From the scsi layer POV (as per what __scsi_init_queue() does),
> > > what/which should we use as dma_dev?
> 
> We should be using the USB host controller, because it is the device 
> which actually performs the DMA transfers.
> 
> > > 2. Do we really need to set dma_boundary in the UAS host template (to
> > > PAGE_SIZE - 1)?
> 
> I don't know.  But in theory it should be possible to have settings 
> (like this one) which affect only the transfers carried out by the SCSI 
> host, not the transfers carred out by other drivers which might use the 
> same USB controller.
> 
> > > 3. Kind of the same question as #1: when we clamp hw_max_sectors to
> > > dma max mapping size, should the size actually be "the smaller one
> > > among dev and sysdev"? Or is one of the two sizes *always* the smaller
> > > one?
> 
> I assume you're referring to code in the uas driver.  There the value of 
> dev is meaningless as far as DMA is concerned.  Only sysdev matters.
> 
> Alan Stern
> 
> > > On Tue, 1 Dec 2020 at 02:19, Hans de Goede  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 11/30/20 6:20 PM, Alan Stern wrote:
> > > > > On Mon, Nov 30, 2020 at 02:36:38PM +0100, Hans de Goede wrote:
> > > > >> Hi,
> > > > >>
> > > > >> On 11/30/20 2:30 PM, Greg KH wrote:
> > > > >>> On Mon, Nov 30, 2020 at 02:23:48PM +0100, Hans de Goede wrote:
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> On 11/30/20 1:58 PM, Tom Yan wrote:
> > > > >>>>> It's merely a moving of comment moving for/and a 
> > > > >>>>> no-behavioral-change
> > > > >>>>> adaptation for the reversion.>
> > > > >>>>
> > > > >>>> IMHO the revert of the troublesome commit and the other/new 
> > > > >>>> chang

Re: [PATCH] drivers/hid: fix for the big hid report length

2021-02-25 Thread Alan Stern
On Thu, Feb 25, 2021 at 08:52:15PM +0600, Sabyrzhan Tasbolatov wrote:
> syzbot found WARNING in hid_alloc_report_buf[1] when the raw buffer for
> report is kmalloc() allocated with length > KMALLOC_MAX_SIZE, causing
> order >= MAX_ORDER condition:
> 
> u8 *hid_alloc_report_buf(struct hid_report *report, gfp_t flags)
> {
>   /*
>* 7 extra bytes are necessary to achieve proper functionality
>* of implement() working on 8 byte chunks
>*/
> 
>   u32 len = hid_report_len(report) + 7;
> 
>   return kmalloc(len, flags);
> 
> The restriction with HID_MAX_BUFFER_SIZE (16kb) is, seems, a valid max
> limit. I've come up with this in all hid_report_len() xrefs.
> 
> The fix inside hid_report_len(), not in *hid_alloc_report_buf() is also
> fixing out-of-bounds here in memcpy():
> 
> statc int hid_submit_ctrl(..)
> {
> ..
>   len = hid_report_len(report);
>   if (dir == USB_DIR_OUT) {
>   ..
>   if (raw_report) {
>   memcpy(usbhid->ctrlbuf, raw_report, len);
> ..
> 
> So I've decided to return HID_MAX_BUFFER_SIZE if it the report length is
> bigger than limit, otherwise the return the report length.
> 
> [1]
> Call Trace:
>  alloc_pages include/linux/gfp.h:547 [inline]
>  kmalloc_order+0x40/0x130 mm/slab_common.c:837
>  kmalloc_order_trace+0x15/0x70 mm/slab_common.c:853
>  kmalloc_large include/linux/slab.h:481 [inline]
>  __kmalloc+0x257/0x330 mm/slub.c:3974
>  kmalloc include/linux/slab.h:557 [inline]
>  hid_alloc_report_buf+0x70/0xa0 drivers/hid/hid-core.c:1648
>  __usbhid_submit_report drivers/hid/usbhid/hid-core.c:590 [inline]
> 
> Reported-by: syzbot+ab02336a647181a88...@syzkaller.appspotmail.com
> Signed-off-by: Sabyrzhan Tasbolatov 
> ---
>  drivers/hid/usbhid/hid-core.c | 2 +-
>  include/linux/hid.h   | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
> index 86257ce6d619..4e9077363c96 100644
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -374,7 +374,7 @@ static int hid_submit_ctrl(struct hid_device *hid)
>   raw_report = usbhid->ctrl[usbhid->ctrltail].raw_report;
>   dir = usbhid->ctrl[usbhid->ctrltail].dir;
>  
> - len = ((report->size - 1) >> 3) + 1 + (report->id > 0);
> + len = hid_report_len(report);
>   if (dir == USB_DIR_OUT) {
>   usbhid->urbctrl->pipe = usb_sndctrlpipe(hid_to_usb_dev(hid), 0);
>   usbhid->urbctrl->transfer_buffer_length = len;
> diff --git a/include/linux/hid.h b/include/linux/hid.h
> index c39d71eb1fd0..509a6ffdca00 100644
> --- a/include/linux/hid.h
> +++ b/include/linux/hid.h
> @@ -1156,7 +1156,9 @@ static inline void hid_hw_wait(struct hid_device *hdev)
>  static inline u32 hid_report_len(struct hid_report *report)
>  {
>   /* equivalent to DIV_ROUND_UP(report->size, 8) + !!(report->id > 0) */
> - return ((report->size - 1) >> 3) + 1 + (report->id > 0);
> + u32 len = ((report->size - 1) >> 3) + 1 + (report->id > 0);
> +
> + return len > HID_MAX_BUFFER_SIZE ? HID_MAX_BUFFER_SIZE : len;

Won't this cause silent errors?

How about instead just rejecting any device which includes a report 
whose length is too big (along with an line in the system log explaining 
what's wrong)?

Alan Stern


Re: [PATCH v2 2/2] usb: host: ehci-platform: add ignore_oc DT support

2021-02-23 Thread Alan Stern
On Tue, Feb 23, 2021 at 05:16:44PM +0100, Álvaro Fernández Rojas wrote:
> Over-current reporting isn't supported on some platforms such as bcm63xx.
> These devices will incorrectly report over-current if this flag isn't properly
> activated.
> 
> Signed-off-by: Álvaro Fernández Rojas 
> ---
>  v2: change flag name and improve documentation as suggested by Alan Stern.
> 
>  drivers/usb/host/ehci-platform.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/usb/host/ehci-platform.c 
> b/drivers/usb/host/ehci-platform.c
> index a48dd3fac153..2c587e31d010 100644
> --- a/drivers/usb/host/ehci-platform.c
> +++ b/drivers/usb/host/ehci-platform.c
> @@ -286,6 +286,9 @@ static int ehci_platform_probe(struct platform_device 
> *dev)
>   if (of_property_read_bool(dev->dev.of_node, "big-endian"))
>   ehci->big_endian_mmio = ehci->big_endian_desc = 1;
>  
> + if (of_property_read_bool(dev->dev.of_node, "spurious-oc"))
> + ehci->ignore_oc = 1;
> +
>   if (of_property_read_bool(dev->dev.of_node,
> "needs-reset-on-resume"))
>   priv->reset_on_resume = true;

Acked-by: Alan Stern 


Re: [PATCH 1/2] dt-bindings: usb: generic-ehci: document ignore-oc flag

2021-02-23 Thread Alan Stern
On Tue, Feb 23, 2021 at 05:04:57PM +0100, Álvaro Fernández Rojas wrote:
> Hi Alan,
> 
> > El 23 feb 2021, a las 16:54, Alan Stern  
> > escribió:
> > 
> > On Tue, Feb 23, 2021 at 04:50:04PM +0100, Álvaro Fernández Rojas wrote:
> >> Over-current reporting isn't supported on some platforms such as bcm63xx.
> >> These devices will incorrectly report over-current if this flag isn't 
> >> properly
> >> activated.
> >> 
> >> Signed-off-by: Álvaro Fernández Rojas 
> >> ---
> >> Documentation/devicetree/bindings/usb/generic-ehci.yaml | 5 +
> >> 1 file changed, 5 insertions(+)
> >> 
> >> diff --git a/Documentation/devicetree/bindings/usb/generic-ehci.yaml 
> >> b/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> >> index cf83f2d9afac..294bbf02399e 100644
> >> --- a/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> >> +++ b/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> >> @@ -117,6 +117,11 @@ properties:
> >>   Set this flag if EHCI has a Transaction Translator built into
> >>   the root hub.
> >> 
> >> +  ignore-oc:
> >> +$ref: /schemas/types.yaml#/definitions/flag
> >> +description:
> >> +  Set this flag for HCDs without over-current reporting support.
> > 
> > This is not a good description of a device property.  DT entries are 
> > supposed to described the hardware, not talk about how to use it.
> 
> Any suggestions on a proper description?
> 
> > 
> > When you say that the bcm63xx doesn't support over-current reporting, 
> > what exactly do you mean?  Do you mean that sometimes the hardware turns 
> > on the over-current bit when an over-current isn't actually present?  Or 
> > do you mean something else?
> 
> Yes, the hardware turns on the over-current bit with no over-current present.

Okay, in that case the property should be named something like 
"spurious_oc", and the description should say something like:

Set this flag to indicate that the hardware sometimes turns on 
the OC bit when an over-current isn't actually present.

Alan Stern


Re: [PATCH 1/2] dt-bindings: usb: generic-ehci: document ignore-oc flag

2021-02-23 Thread Alan Stern
On Tue, Feb 23, 2021 at 04:50:04PM +0100, Álvaro Fernández Rojas wrote:
> Over-current reporting isn't supported on some platforms such as bcm63xx.
> These devices will incorrectly report over-current if this flag isn't properly
> activated.
> 
> Signed-off-by: Álvaro Fernández Rojas 
> ---
>  Documentation/devicetree/bindings/usb/generic-ehci.yaml | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/usb/generic-ehci.yaml 
> b/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> index cf83f2d9afac..294bbf02399e 100644
> --- a/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> +++ b/Documentation/devicetree/bindings/usb/generic-ehci.yaml
> @@ -117,6 +117,11 @@ properties:
>Set this flag if EHCI has a Transaction Translator built into
>the root hub.
>  
> +  ignore-oc:
> +$ref: /schemas/types.yaml#/definitions/flag
> +description:
> +  Set this flag for HCDs without over-current reporting support.

This is not a good description of a device property.  DT entries are 
supposed to described the hardware, not talk about how to use it.

When you say that the bcm63xx doesn't support over-current reporting, 
what exactly do you mean?  Do you mean that sometimes the hardware turns 
on the over-current bit when an over-current isn't actually present?  Or 
do you mean something else?

Alan Stern

> +
>needs-reset-on-resume:
>  $ref: /schemas/types.yaml#/definitions/flag
>  description:
> -- 
> 2.20.1
> 


Re: usb: dwc3: gadget: Change runtime pm function for DWC3 runtime suspend

2021-02-16 Thread Alan Stern
On Tue, Feb 16, 2021 at 10:30:52AM +0900, Jung Daehwan wrote:
> Hello, Alan
> 
> On Mon, Feb 15, 2021 at 12:41:45PM -0500, Alan Stern wrote:
> > On Mon, Feb 15, 2021 at 11:38:58AM +0900, Daehwan Jung wrote:
> > > It seems pm_runtime_put calls runtime_idle callback not runtime_suspend 
> > > callback.
> > 
> > How is this fact related to your patch?
> 
> I think we should cause dwc3_runtime_suspend at the time.

Why do you think so?

> That's why I use pm_runtime_put_sync_suspend.
> 
> > 
> > > It's better to use pm_runtime_put_sync_suspend to allow DWC3 runtime 
> > > suspend.
> > 
> > Why do you think it is better?  The advantage of pm_runtime_put is that 
> > it allows the suspend to occur at a later time in a workqueue thread, so 
> > the caller doesn't have to wait for the device to go into suspend.
> > 
> 
> We can assume DWC3 was already in suspend state if pm_runtime_get_sync
> returns 0. DWC3 resumes due to pm_rumtime_get_sync but it doesn't
> re-enter runtime_suspend but runtime_idle. pm_runtime_put decreases
> usage_count but doesn't cause runtime_suspend.
> 
> 1. USB disconnected
> 2. UDC unbinded
> 3. DWC3 runtime suspend
> 4. UDC binded unexpectedly
> 5. DWC3 runtime resume (pm_runtime_get_sync)
> 6. DWC3 runtime idle (pm_runtime_put)
>-> DWC3 runtime suspend again (pm_runtime_put_sync_suspend)

That's what happens with your patch.  Now look at what happens without 
the patch:

1. USB disconnected
2. UDC unbound
3. DWC3 suspend request is added to waitqueue
4. UDC bound unexpectedly
5. DWC3 suspend request is removed from waitqueue
6. DWC3 runtime idle
7. DWC3 runtime suspend

The difference is that this way, we avoid doing an unnecessary suspend 
and resume, and we save the time they would have required.

> I've talked with Wesley in other patch.
> 
> usbb: dwc3: gadget: skip pullup and set_speed after suspend
> patchwork.kernel.org/project/linux-usb/patch/163968-102424-1-git-send-email-dh10.j...@samsung.com
> 
> @ Wesley
> 
> I think We should guarantee DWC3 enters suspend again at the time.

Why do you think we should guarantee this?

Alan Stern


Re: usb: dwc3: gadget: Change runtime pm function for DWC3 runtime suspend

2021-02-15 Thread Alan Stern
On Mon, Feb 15, 2021 at 11:38:58AM +0900, Daehwan Jung wrote:
> It seems pm_runtime_put calls runtime_idle callback not runtime_suspend 
> callback.

How is this fact related to your patch?

> It's better to use pm_runtime_put_sync_suspend to allow DWC3 runtime suspend.

Why do you think it is better?  The advantage of pm_runtime_put is that 
it allows the suspend to occur at a later time in a workqueue thread, so 
the caller doesn't have to wait for the device to go into suspend.

Alan Stern

> Signed-off-by: Daehwan Jung 
> ---
>  drivers/usb/dwc3/gadget.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index aebcf8e..4a4b93b 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2229,7 +2229,7 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int 
> is_on)
>*/
>   ret = pm_runtime_get_sync(dwc->dev);
>   if (!ret || ret < 0) {
> - pm_runtime_put(dwc->dev);
> + pm_runtime_put_sync_suspend(dwc->dev);
>   return 0;
>   }
>  
> -- 
> 2.7.4
> 


Re: [PATCH v3 1/2] kunit: support failure from dynamic analysis tools

2021-02-11 Thread Alan Maguire
  return;
> > +
> > +   kunit_set_failure(current->kunit_test);
> > +

currently kunit_set_failure() is static, but it could be inlined I
suspect.

> > +   /* kunit_err() only accepts literals, so evaluate the args first. */
> > +   va_start(args, fmt);
> > +   len = vsnprintf(NULL, 0, fmt, args) + 1;
> > +   va_end(args);
> > +
> > +   buffer = kunit_kmalloc(current->kunit_test, len, GFP_KERNEL);

kunit_kmalloc()/kunit_kfree() are exported also, but we could probably
dodge allocation with a static buffer.  In fact since we end up
using an on-stack buffer for logging in kunit_log_append(), it might make 
sense to #define __kunit_fail_current_test() instead, i.e.

#define __kunit_fail_current_test(file, line, fmt, ...) \
do {\
kunit_set_failure(current->kunit_test); \
kunit_err(current->kunit_test, "%s:%d: " fmt,   \
  ##__VA_ARGS__);   \
} while (0)

> > +   if (!buffer)
> > +   return;
> > +
> > +   va_start(args, fmt);
> > +   vsnprintf(buffer, len, fmt, args);
> > +   va_end(args);
> > +
> > +   kunit_err(current->kunit_test, "%s:%d: %s", file, line, buffer);

To get kunit_err() to work, we'd need to "static inline" 
kunit_log_append().  It's not a trivial function, but on the plus
side it doesn't call any other exported kunit functions AFAICT.

So while any of the above suggestions aren't intended to block
Daniel's work, does the above seem reasonable for a follow-up
series to get UBSAN working with module-based KUnit? Thanks!

Alan


Re: [PATCH v3 0/2] kunit: fail tests on UBSAN errors

2021-02-10 Thread Alan Maguire


On Tue, 9 Feb 2021, Daniel Latypov wrote:

> v1 by Uriel is here: [1].
> Since it's been a while, I've dropped the Reviewed-By's.
> 
> It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
> hadn't been merged yet, so that caused some kerfuffle with applying them
> previously and the series was reverted.
> 
> This revives the series but makes the kunit_fail_current_test() function
> take a format string and logs the file and line number of the failing
> code, addressing Alan Maguire's comments on the previous version.
> 
> As a result, the patch that makes UBSAN errors was tweaked slightly to
> include an error message.
> 
> v2 -> v3:
>   Fix kunit_fail_current_test() so it works w/ CONFIG_KUNIT=m
>   s/_/__ on the helper func to match others in test.c
> 
> [1] 
> https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguajard...@gmail.com/
>

For the series:

Reviewed-by: Alan Maguire 

Thanks!


Re: [PATCH v2 1/2] kunit: support failure from dynamic analysis tools

2021-02-09 Thread Alan Maguire
On Tue, 9 Feb 2021, Daniel Latypov wrote:

> On Tue, Feb 9, 2021 at 9:26 AM Alan Maguire  wrote:
> >
> > On Fri, 5 Feb 2021, Daniel Latypov wrote:
> >
> > > From: Uriel Guajardo 
> > >
> > > Add a kunit_fail_current_test() function to fail the currently running
> > > test, if any, with an error message.
> > >
> > > This is largely intended for dynamic analysis tools like UBSAN and for
> > > fakes.
> > > E.g. say I had a fake ops struct for testing and I wanted my `free`
> > > function to complain if it was called with an invalid argument, or
> > > caught a double-free. Most return void and have no normal means of
> > > signalling failure (e.g. super_operations, iommu_ops, etc.).
> > >
> > > Key points:
> > > * Always update current->kunit_test so anyone can use it.
> > >   * commit 83c4e7a0363b ("KUnit: KASAN Integration") only updated it for
> > >   CONFIG_KASAN=y
> > >
> > > * Create a new header  so non-test code doesn't have
> > > to include all of  (e.g. lib/ubsan.c)
> > >
> > > * Forward the file and line number to make it easier to track down
> > > failures
> > >
> >
> > Thanks for doing this!
> >
> > > * Declare it as a function for nice __printf() warnings about mismatched
> > > format strings even when KUnit is not enabled.
> > >
> >
> > One thing I _think_ this assumes is that KUnit is builtin;
> > don't we need an
> 
> Ah, you're correct.
> Also going to rename it to have two _ to match other functions used in
> macros like __kunit_test_suites_init.
> 

Great! If you're sending out an updated version with these
changes, feel free to add

Reviewed-by: Alan Maguire 


Re: [PATCH v2 1/2] kunit: support failure from dynamic analysis tools

2021-02-09 Thread Alan Maguire
On Fri, 5 Feb 2021, Daniel Latypov wrote:

> From: Uriel Guajardo 
> 
> Add a kunit_fail_current_test() function to fail the currently running
> test, if any, with an error message.
> 
> This is largely intended for dynamic analysis tools like UBSAN and for
> fakes.
> E.g. say I had a fake ops struct for testing and I wanted my `free`
> function to complain if it was called with an invalid argument, or
> caught a double-free. Most return void and have no normal means of
> signalling failure (e.g. super_operations, iommu_ops, etc.).
> 
> Key points:
> * Always update current->kunit_test so anyone can use it.
>   * commit 83c4e7a0363b ("KUnit: KASAN Integration") only updated it for
>   CONFIG_KASAN=y
> 
> * Create a new header  so non-test code doesn't have
> to include all of  (e.g. lib/ubsan.c)
> 
> * Forward the file and line number to make it easier to track down
> failures
> 

Thanks for doing this!

> * Declare it as a function for nice __printf() warnings about mismatched
> format strings even when KUnit is not enabled.
>

One thing I _think_ this assumes is that KUnit is builtin;
don't we need an

EXPORT_SYMBOL_GPL(_kunit_fail_current_test);

?

Without it, if an analysis tool (or indeed if KUnit) is built
as a module, it won't be possible to use this functionality.

> Example output from kunit_fail_current_test("message"):
> [15:19:34] [FAILED] example_simple_test
> [15:19:34] # example_simple_test: initializing
> [15:19:34] # example_simple_test: lib/kunit/kunit-example-test.c:24: 
> message
> [15:19:34] not ok 1 - example_simple_test
> 
> Co-developed-by: Daniel Latypov 
> Signed-off-by: Uriel Guajardo 
> Signed-off-by: Daniel Latypov 
> ---
>  include/kunit/test-bug.h | 30 ++
>  lib/kunit/test.c | 36 
>  2 files changed, 62 insertions(+), 4 deletions(-)
>  create mode 100644 include/kunit/test-bug.h
> 
> diff --git a/include/kunit/test-bug.h b/include/kunit/test-bug.h
> new file mode 100644
> index ..4963ed52c2df
> --- /dev/null
> +++ b/include/kunit/test-bug.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * KUnit API allowing dynamic analysis tools to interact with KUnit tests
> + *
> + * Copyright (C) 2020, Google LLC.

nit; might want to update copyright year.

> + * Author: Uriel Guajardo 
> + */
> +
> +#ifndef _KUNIT_TEST_BUG_H
> +#define _KUNIT_TEST_BUG_H
> +
> +#define kunit_fail_current_test(fmt, ...) \
> + _kunit_fail_current_test(__FILE__, __LINE__, fmt, ##__VA_ARGS__)
> +
> +#if IS_ENABLED(CONFIG_KUNIT)
> +
> +extern __printf(3, 4) void _kunit_fail_current_test(const char *file, int 
> line,
> + const char *fmt, ...);
> +
> +#else
> +
> +static __printf(3, 4) void _kunit_fail_current_test(const char *file, int 
> line,
> + const char *fmt, ...)
> +{
> +}
> +
> +#endif
> +
> +
> +#endif /* _KUNIT_TEST_BUG_H */
> diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> index ec9494e914ef..7b16aae0ccae 100644
> --- a/lib/kunit/test.c
> +++ b/lib/kunit/test.c
> @@ -7,6 +7,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -16,6 +17,37 @@
>  #include "string-stream.h"
>  #include "try-catch-impl.h"
>  
> +/*
> + * Fail the current test and print an error message to the log.
> + */
> +void _kunit_fail_current_test(const char *file, int line, const char *fmt, 
> ...)
> +{
> + va_list args;
> + int len;
> + char *buffer;
> +
> + if (!current->kunit_test)
> + return;
> +
> + kunit_set_failure(current->kunit_test);
> +
> + /* kunit_err() only accepts literals, so evaluate the args first. */
> + va_start(args, fmt);
> + len = vsnprintf(NULL, 0, fmt, args) + 1;
> + va_end(args);
> +
> + buffer = kunit_kmalloc(current->kunit_test, len, GFP_KERNEL);
> + if (!buffer)
> + return;
> +
> + va_start(args, fmt);
> + vsnprintf(buffer, len, fmt, args);
> + va_end(args);
> +
> + kunit_err(current->kunit_test, "%s:%d: %s", file, line, buffer);
> + kunit_kfree(current->kunit_test, buffer);
> +}
> +
>  /*
>   * Append formatted message to log, size of which is limited to
>   * KUNIT_LOG_SIZE bytes (including null terminating byte).
> @@ -273,9 +305,7 @@ static void kunit_try_run_case(void *data)
>   struct kunit_suite *suite = ctx->suite;
>   struct kunit_case *test_case = ctx->test_case;
>  
> -#if (IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_KUNIT))
>   current->kunit_test = test;
> -#endif /* IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_KUNIT) */
>  
>   /*
>* kunit_run_case_internal may encounter a fatal error; if it does,
> @@ -624,9 +654,7 @@ void kunit_cleanup(struct kunit *test)
>   spin_unlock(>lock);
>   kunit_remove_resource(test, res);
>   }
> -#if (IS_ENABLED(CONFIG_KASAN) && IS_ENABLED(CONFIG_KUNIT))
>   

Re: WARNING in go7007_usb_onboard_write_interrupt/usb_submit_urb

2021-02-07 Thread Alan Stern
Hans:

On Sat, Feb 06, 2021 at 11:40:13PM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:64eaa0fa platform/chrome: cros_ec_typec: Fix call to typec..
> git tree:   
> https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
> console output: https://syzkaller.appspot.com/x/log.txt?x=12d5c090d0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=29ec25b819cb42f3
> dashboard link: https://syzkaller.appspot.com/bug?extid=d4ebc877b1223f20d5a0
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=16b47dd8d0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=164896c4d0
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d4ebc877b1223f20d...@syzkaller.appspotmail.com
> 
> usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> usb 1-1: SerialNumber: syz
> usb 1-1: config 0 descriptor??
> [ cut here ]
> usb 1-1: BOGUS urb xfer, pipe 2 != type 3
> WARNING: CPU: 0 PID: 2608 at drivers/usb/core/urb.c:493 
> usb_submit_urb+0xd27/0x1540 drivers/usb/core/urb.c:493
> Modules linked in:
> CPU: 0 PID: 2608 Comm: kworker/0:2 Not tainted 5.11.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Workqueue: usb_hub_wq hub_event
> RIP: 0010:usb_submit_urb+0xd27/0x1540 drivers/usb/core/urb.c:493
> Code: 84 d4 02 00 00 e8 e9 e9 ba fd 4c 89 ef e8 f1 3d 1d ff 41 89 d8 44 89 e1 
> 4c 89 f2 48 89 c6 48 c7 c7 00 6a 61 86 e8 31 ee f9 01 <0f> 0b e9 81 f8 ff ff 
> e8 bd e9 ba fd 48 81 c5 30 06 00 00 e9 ad f7
> RSP: 0018:c90006cf6c70 EFLAGS: 00010286
> RAX:  RBX: 0003 RCX: 
> RDX: 88810a895040 RSI: 81299db3 RDI: f52000d9ed80
> RBP: 88810155f8a0 R08: 0001 R09: 
> R10: 8149c4ab R11:  R12: 0002
> R13: 8881121670a0 R14: 8881031c06e0 R15: 888102b46f00
> FS:  () GS:8881f680() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 55bfb7b520d0 CR3: 00010a1f1000 CR4: 001506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  usb_start_wait_urb+0x101/0x4c0 drivers/usb/core/message.c:58
>  usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
>  usb_control_msg+0x31c/0x4a0 drivers/usb/core/message.c:153
>  go7007_usb_onboard_write_interrupt+0x26a/0x340 
> drivers/media/usb/go7007/go7007-usb.c:735

It looks like the go7007 driver isn't very careful about checking that 
the endpoints it uses have the right types.  In particular, this bug was 
caused by not checking that ep2 is a control endpoint (highly unusual to 
have a control endpoint other than 0, but allowed).

Alan Stern


Re: [RESEND PATCH v3 12/12] usb: misc: usbtest: update to use the usb_control_msg_{send|recv}() API

2021-01-27 Thread Alan Stern
;
>   return retval;
> @@ -1845,30 +1842,22 @@ static int ctrl_out(struct usbtest_dev *dev,
>   /* write patterned data */
>   for (j = 0; j < len; j++)
>   buf[j] = (u8)(i + j);
> - retval = usb_control_msg(udev, usb_sndctrlpipe(udev, 0),
> - 0x5b, USB_DIR_OUT|USB_TYPE_VENDOR,
> - 0, 0, buf, len, USB_CTRL_SET_TIMEOUT);
> - if (retval != len) {
> + retval = usb_control_msg_send(udev, 0, 0x5b,
> +   USB_DIR_OUT | USB_TYPE_VENDOR, 0,
> +   0, buf, len, USB_CTRL_SET_TIMEOUT,
> +   GFP_KERNEL);
> + if (retval < 0) {
>   what = "write";
> - if (retval >= 0) {
> - ERROR(dev, "ctrl_out, wlen %d (expected %d)\n",
> - retval, len);
> - retval = -EBADMSG;
> - }
>   break;
>   }

Here buf doesn't need to be copied, and a short write will return an 
error code anyway.

>  
>   /* read it back -- assuming nothing intervened!!  */
> - retval = usb_control_msg(udev, usb_rcvctrlpipe(udev, 0),
> - 0x5c, USB_DIR_IN|USB_TYPE_VENDOR,
> - 0, 0, buf, len, USB_CTRL_GET_TIMEOUT);
> - if (retval != len) {
> + retval = usb_control_msg_recv(udev, 0,
> +   0x5c, USB_DIR_IN|USB_TYPE_VENDOR,
> +   0, 0, buf, len, 
> USB_CTRL_GET_TIMEOUT,
> +   GFP_KERNEL);
> + if (retval < 0) {
>   what = "read";
> - if (retval >= 0) {
> - ERROR(dev, "ctrl_out, rlen %d (expected %d)\n",
> - retval, len);
> - retval = -EBADMSG;
> - }

Similar to one of the cases above.

Alan Stern

>   break;
>   }
>  
> -- 
> 2.25.1
> 


  1   2   3   4   5   6   7   8   9   10   >