Re: Identifying the initial bug within the x86 architecture subsystem

2024-01-03 Thread Dileep Sankhla
On Wed, Jan 3, 2024 at 3:28 PM Greg KH  wrote:
> And maybe there are no x86-specific bugs at the moment?

Yes, I think so.

> Try running linux-next and see if you have any issues with any subsystem, if
> so, try working on that.

Sure, I will try it next.

Regards,
Dileep

___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Identifying the initial bug within the x86 architecture subsystem

2024-01-03 Thread Greg KH
On Wed, Jan 03, 2024 at 03:23:13PM +0530, Dileep Sankhla wrote:
> On Wed, Jan 3, 2024 at 12:33 PM Greg KH  wrote:
> > What do you mean by "first bug"?  Why does the location in an
> > arbitrary list matter?
> 
> Hello Greg,
> 
> By "first bug" I meant the initial bug for me to solve in this
> subsystem.
> 
> > Also, bugzilla is not used by many kernel subsystems, so perhaps the
> > items there just aren't relevant for this one either?
> 
> I looked for the bugs in the subsystem's mailing list but did not find
> anything beginning with the subject line "PROBLEM: ".

I have never seen a bug report for the kernel come in with that in the
subject line, sorry.  So that would be one reason why you might not find
anything.

And maybe there are no x86-specific bugs at the moment?  Try running
linux-next and see if you have any issues with any subsystem, if so, try
working on that.

good luck!

greg k-h



Re: Identifying the initial bug within the x86 architecture subsystem

2024-01-03 Thread Dileep Sankhla
On Wed, Jan 3, 2024 at 12:33 PM Greg KH  wrote:
> What do you mean by "first bug"?  Why does the location in an
> arbitrary list matter?

Hello Greg,

By "first bug" I meant the initial bug for me to solve in this subsystem.

> Also, bugzilla is not used by many kernel subsystems, so perhaps the
> items there just aren't relevant for this one either?

I looked for the bugs in the subsystem's mailing list but did not find
anything beginning with the subject line "PROBLEM: ".

Here is the query I used on the LKML archive mirror (see [1]):

subsystem: x86 architecture (32-bit and 64-bit)

query: s:"PROBLEM: " + t:t...@linutronix.de + t:mi...@redhat.com +
t:b...@alien8.de + t:dave.han...@linux.intel.com + t:x...@kernel.org +
c:linux-ker...@vger.kernel.org

Regards,
Dileep

[1]: https://lore.kernel.org/lkml/



Re: Identifying the initial bug within the x86 architecture subsystem

2024-01-02 Thread Greg KH
On Wed, Jan 03, 2024 at 11:12:57AM +0530, Dileep Sankhla wrote:
> Last night, I dedicated time to go through bugs on Bugzilla (see [1]),
> considering their priorities but I could not figure out which one to
> pick. While I found only a couple of bugs with the latest modification
> date, I lack the same hardware as the original poster (OP) to
> reproduce and fix those bugs. How can I identify the first bug in this
> subsystem? Given the relatively low number of bugs, should I consider
> reaching out to the maintainer of `arch/x86/boot` if I cannot find a
> suitable one to fix?

What do you mean by "first bug"?  Why does the location in an
arbitrary list matter?

Also, bugzilla is not used by many kernel subsystems, so perhaps the
items there just aren't relevant for this one either?

good luck!

greg k-h



Identifying the initial bug within the x86 architecture subsystem

2024-01-02 Thread Dileep Sankhla
Hello,

I am new to Linux kernel development and have a keen interest in the
x86 architecture subsystem, particularly in the areas of boot and
feature flags. Before diving into adding or changing features, I would
like to address a few existing bugs. I am proficient in C and have some
knowledge of assembly language.

Last night, I dedicated time to go through bugs on Bugzilla (see [1]),
considering their priorities but I could not figure out which one to
pick. While I found only a couple of bugs with the latest modification
date, I lack the same hardware as the original poster (OP) to
reproduce and fix those bugs. How can I identify the first bug in this
subsystem? Given the relatively low number of bugs, should I consider
reaching out to the maintainer of `arch/x86/boot` if I cannot find a
suitable one to fix?

P.S. I have one commit in the kernel from 2018 (see [2]), but it merely
addressed a `checkpatch.pl` warning.

Regards,
Dileep

[1]: 
https://bugzilla.kernel.org/buglist.cgi?bug_severity=blocking_severity=high_status=NEW_status=ASSIGNED_status=REOPENED=i386=x86-64=on=boot%2Cfeature%20flags_type=anywordssubstr=changeddate%20DESC%2Cpriority%20DESC%2Cbug_severity=P1=P2=P3=Platform%20Specific%2FHardware_format=advanced_platform=Intel_platform=i386_platform=x86-64

[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b79f3f68cc306637b88072804396685dc037c779



Re: BUG: kernel NULL pointer dereference while copying sk_buff struct

2023-05-02 Thread Abdul Matin
Thanks for the tip!


Re: BUG: kernel NULL pointer dereference while copying sk_buff struct

2023-05-02 Thread Alexander Kapshuk
On Tue, May 2, 2023 at 6:35 PM Abdul Matin
 wrote:
>
> I'm initializing the memory for skbPrev at module init function:
>
> static int __init nf_conntrack_my_mod_init(void)
> {
> saddr_m = (union nf_inet_addr* ) kmalloc(sizeof(union nf_inet_addr), 
> GFP_KERNEL);
> skbPrev = (struct sk_buff*) kmalloc(sizeof(struct sk_buff), GFP_KERNEL);
> if (saddr_m == NULL) {
>   printk(KERN_INFO "Can not allocate space for saddr\n");
>   return -ENOMEM;
> }
> if(skbPrev == NULL) {
>   printk(KERN_INFO "Can not allocate space for skbPrev\n");
>   return -ENOMEM;
> }

I see.
Without seeing the code in its entirety it's hard to tell which
pointer is being null-dereferenced.
Examine the OOPS output. The answer may very well be there.
Otherwise, you want to add some pr_info() or printk() calls around the
pointers being dereferenced in your code and see if any of those are
set to NULL.
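A rough userspace sketch of that last suggestion, assuming the pointers of interest can be listed in a small table. In a module the printf() would be a pr_info() call; struct named_ptr, NAMED() and first_null() are names made up for this illustration:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper type: pairs a pointer with its name for logging. */
struct named_ptr {
    const char *name;
    const void *ptr;
};

/* Builds a table entry from a variable, e.g. NAMED(skbPrev). */
#define NAMED(p) { #p, (p) }

/*
 * Log every entry and return the name of the first NULL pointer,
 * or NULL if all pointers in the table are set.
 */
static const char *first_null(const struct named_ptr *tbl, size_t n)
{
    const char *found = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        printf("%s = %p\n", tbl[i].name, (void *)tbl[i].ptr);
        if (!tbl[i].ptr && !found)
            found = tbl[i].name;
    }
    return found;
}
```

In the helper this could be fed a table such as { NAMED(skb), NAMED(skbPrev), NAMED(exp) } right before the first memcpy, so the log shows which pointer is still NULL at that point.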



Re: BUG: kernel NULL pointer dereference while copying sk_buff struct

2023-05-02 Thread Alexander Kapshuk
On Tue, May 2, 2023 at 3:26 PM Abdul Matin
 wrote:
>
> Hi.
> I'm writing a netfilter module where I need to copy a sk_buff in a global 
> variable that I use in another subsequent call. But I crashed the whole 
> kernel. I've tried to add a code snippet to share with you how I'm doing it.
>
> here case1 is always true before case2 (i.e. 1st call of help -> case1 is 
> true, 2nd call of help -> case2 true).
> So, in the 2nd call, case2 is true where we're using exp, ctinfoPrev, saddr_m 
> which have been initialized before in case1.
>
> union nf_inet_addr *saddr_m;
> struct sk_buff* skbPrev;

This declares a file-scope pointer to struct sk_buff without an initialiser,
so it starts out as NULL.
Where do you allocate memory for skbPrev?

> enum ip_conntrack_info ctinfoPrev;
> struct nf_conntrack_expect *exp;
>
> static int help(struct sk_buff *skb,
> unsigned int protoff,
> struct nf_conn *ct,
> enum ip_conntrack_info ctinfo)
> {
>  switch (msgType) {
> case case1:
>
> ctinfoPrev = ctinfo;
> memcpy((void *)skbPrev, (const void *)skb, sizeof(skb));

The NULL pointer dereference probably happens here, as memcpy attempts
 to copy data from skb to skbPrev, which is likely to be NULL.
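Something else worth checking on that line, whatever the oops turns out to be: sizeof(skb) is the size of the pointer, not of struct sk_buff, so the memcpy copies only a few bytes. A userspace sketch, with a made-up struct standing in for sk_buff (struct fake_skb and the function names are hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Made-up stand-in for struct sk_buff; only its size matters here. */
struct fake_skb {
    void *next;
    void *prev;
    unsigned char cb[48];
    unsigned int len;
};

/* What memcpy(dst, src, sizeof(src)) copies: the size of a pointer. */
static size_t bytes_with_sizeof_ptr(const struct fake_skb *src)
{
    return sizeof(src);
}

/* What memcpy(dst, src, sizeof(*src)) copies: the whole struct. */
static size_t bytes_with_sizeof_obj(const struct fake_skb *src)
{
    return sizeof(*src);
}

/*
 * Copy with the pointer-sized length, as the snippet does, and report
 * whether the len field at the end of the struct made it across.
 */
static int len_survives_ptr_sized_copy(void)
{
    struct fake_skb src, dst;

    memset(&src, 0, sizeof(src));
    memset(&dst, 0, sizeof(dst));
    src.len = 1500;
    memcpy(&dst, &src, bytes_with_sizeof_ptr(&src));
    return dst.len == src.len;
}
```

Even with the right size, a raw memcpy of an sk_buff would not duplicate the packet data or take the references it needs. The kernel has skb_copy() and skb_clone() for that, which are probably what you want to look at here.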

> skbPrev->next = (struct sk_buff*) kmalloc(sizeof(struct sk_buff), 
> GFP_KERNEL);
> skbPrev->prev = (struct sk_buff*) kmalloc(sizeof(struct sk_buff), 
> GFP_KERNEL);
> skbPrev->sk = (struct sock*) kmalloc(sizeof(struct sock), GFP_KERNEL);
> memcpy((void *)(skbPrev->next), (const void *)skb->next, 
> sizeof(skb->next));
> memcpy((void *)(skbPrev->prev), (const void *)skb->prev, 
> sizeof(skb->prev));
>memcpy((void *)(skbPrev->sk), (const void *)skb->sk, sizeof(skb->sk));
>
> unsigned int type = (dptr[0] << 8) | dptr[1]; // big endian (network byte order)
> unsigned int length = (dptr[2] << 8) | dptr[3];
> printk(KERN_INFO "type: %hu length: %hu", type, length);
>
> unsigned int ip;
> memcpy(&ip, dptr, 4);
> ip = ntohl(ip) ^ MAGIC_COOKIE_VALUE_HOST;
> exp = nf_ct_expect_alloc(ct);
> if (exp == NULL) {
> printk( KERN_INFO "cannot alloc expectation");
> return NF_DROP;
> }
> tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> nf_ct_expect_init(exp, NF_CT_EXPECT_CLASS_DEFAULT,
> nf_ct_l3num(ct),
> saddr_m, &tuple->dst.u3,
> IPPROTO_UDP, NULL, &tuple->dst.u.udp.port);
>
> pr_debug("expect: ");
> nf_ct_dump_tuple(&exp->tuple);
>
>
> break;
>case case2:
> printk(KERN_INFO "createpermission response\n");
> nf_nat_tftp = rcu_dereference(nf_nat_tftp_hook);
> if (nf_nat_tftp && ct->status & IPS_NAT_MASK)
> ret= nf_nat_tftp(skbPrev, ctinfoPrev, exp);
> else if (nf_ct_expect_related(exp, 0) != 0) {
> printk( KERN_INFO "cannot add expectation");
> nf_ct_helper_log(skb, ct, "cannot add expectation");
> ret = NF_DROP;
> }
>nf_ct_expect_put(exp);
>break;
>}
>return ret;
>   }
> I got this log before the crash:
> 1,589,5743337757,-;BUG: kernel NULL pointer dereference, address:
> 1,590,5743337860,-;#PF: supervisor read access in kernel mode
> 1,591,5743337880,-;#PF: error_code(0x) - not-present page
> 6,592,5743337900,-;PGD 0 P4D 0
> 4,593,5743337974,-;Oops:  [#1] SMP PTI
>
> Is there anything wrong I am doing in copying and initializing?
>



BUG: kernel NULL pointer dereference while copying sk_buff struct

2023-05-02 Thread Abdul Matin
Hi.
I'm writing a netfilter module where I need to copy an sk_buff into a global
variable that I use in a subsequent call, but this crashed the whole
kernel. The code snippet below shows how I'm doing it.

Here case1 always occurs before case2 (i.e. 1st call of help -> case1 is
true, 2nd call of help -> case2 is true).
So in the 2nd call, where case2 is true, we use exp, ctinfoPrev and
saddr_m, which were initialized earlier in case1.

union nf_inet_addr *saddr_m;
struct sk_buff *skbPrev;
enum ip_conntrack_info ctinfoPrev;
struct nf_conntrack_expect *exp;

static int help(struct sk_buff *skb,
                unsigned int protoff,
                struct nf_conn *ct,
                enum ip_conntrack_info ctinfo)
{
    switch (msgType) {
    case case1:

        ctinfoPrev = ctinfo;
        memcpy((void *)skbPrev, (const void *)skb, sizeof(skb));
        skbPrev->next = (struct sk_buff *)kmalloc(sizeof(struct sk_buff),
                                                  GFP_KERNEL);
        skbPrev->prev = (struct sk_buff *)kmalloc(sizeof(struct sk_buff),
                                                  GFP_KERNEL);
        skbPrev->sk = (struct sock *)kmalloc(sizeof(struct sock), GFP_KERNEL);
        memcpy((void *)(skbPrev->next), (const void *)skb->next,
               sizeof(skb->next));
        memcpy((void *)(skbPrev->prev), (const void *)skb->prev,
               sizeof(skb->prev));
        memcpy((void *)(skbPrev->sk), (const void *)skb->sk, sizeof(skb->sk));

        unsigned int type = (dptr[0] << 8) | dptr[1]; // big endian (network byte order)
        unsigned int length = (dptr[2] << 8) | dptr[3];
        printk(KERN_INFO "type: %hu length: %hu", type, length);

        unsigned int ip;
        memcpy(&ip, dptr, 4);
        ip = ntohl(ip) ^ MAGIC_COOKIE_VALUE_HOST;
        exp = nf_ct_expect_alloc(ct);
        if (exp == NULL) {
            printk(KERN_INFO "cannot alloc expectation");
            return NF_DROP;
        }
        tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
        nf_ct_expect_init(exp, NF_CT_EXPECT_CLASS_DEFAULT,
                          nf_ct_l3num(ct),
                          saddr_m, &tuple->dst.u3,
                          IPPROTO_UDP, NULL, &tuple->dst.u.udp.port);

        pr_debug("expect: ");
        nf_ct_dump_tuple(&exp->tuple);

        break;
    case case2:
        printk(KERN_INFO "createpermission response\n");
        nf_nat_tftp = rcu_dereference(nf_nat_tftp_hook);
        if (nf_nat_tftp && ct->status & IPS_NAT_MASK)
            ret = nf_nat_tftp(skbPrev, ctinfoPrev, exp);
        else if (nf_ct_expect_related(exp, 0) != 0) {
            printk(KERN_INFO "cannot add expectation");
            nf_ct_helper_log(skb, ct, "cannot add expectation");
            ret = NF_DROP;
        }
        nf_ct_expect_put(exp);
        break;
    }
    return ret;
}
I got this log before the crash:
1,589,5743337757,-;BUG: kernel NULL pointer dereference, address:
1,590,5743337860,-;#PF: supervisor read access in kernel mode
1,591,5743337880,-;#PF: error_code(0x) - not-present page
6,592,5743337900,-;PGD 0 P4D 0
4,593,5743337974,-;Oops:  [#1] SMP PTI

Am I doing anything wrong in how I copy and initialize these?


Re: maybe a bug in kernel 5.4 since patchlevel 159 - dma error because use ttynull?

2022-10-13 Thread jim . cromie
On Thu, Oct 13, 2022 at 4:48 AM Simon Lindhorst  wrote:
>
> Hello all,
>
>
> when I updated my kernel from version 5.4.155 to 5.4.215 I got a strange
> xhci error:
>
> xhci-hcd f10f.usb3: ERROR unknown event type 37
> xhci-hcd f10f.usb3: ERROR Transfer event TRB DMA ptr not part of current
> TD ep_index 2 comp_code 13
>
> After a lot of these messages, my hardware reboots without any further
> output.
>
> The error only occurs when I add console=null to my kernel bootargs. When I
> add console=ttyS0,115200 instead, no error occurs.
>
> Then I went back through the kernel versions. The error first occurred in
> version 5.4.159. Between patchlevel 158 and 159 there is this change:
>
> --- linux-5.4.158/kernel/printk/printk.c    2021-11-06 13:59:45.0 +0100
> +++ linux-5.4.159/kernel/printk/printk.c    2021-11-12 14:43:05.0 +0100
> @@ -2193,8 +2193,15 @@
>  char *s, *options, *brl_options = NULL;
>  int idx;
>
> -if (str[0] == 0)
> +/*
> + * console="" or console=null have been suggested as a way to
> + * disable console output. Use ttynull that has been created
> + * for exacly this purpose.
> + */
> +if (str[0] == 0 || strcmp(str, "null") == 0) {
> +__add_preferred_console("ttynull", 0, NULL, NULL);
>  return 1;
> +}
>
>  if (_braille_console_setup(&str, &brl_options))
>  return 1;
>
> I checked my kernel config and found that I have no ttynull device configured
> (CONFIG_NULL_TTY=n). Adding CONFIG_NULL_TTY=y to my kernel config made no
> difference.
>
> When I undo the change above, everything works fine.
>
>
> Does anybody know what could be the main trigger for the error above?
>

While there have been lots of changes to printk,
the code you cite is still there.
If that code is a candidate for the root cause,
and you can re-create the error on master,
you are halfway to getting it fixed.

Also, note latest:

commit 3ef4ea3d84ca568dcd57816b9521e82e3bd94f08
Merge: 30d024b5058e 5eb17c1f458c
Author: Linus Torvalds 
Date:   Wed Mar 23 10:54:27 2022 -0700

Merge tag 'printk-for-5.18' of
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:

 - Make %pK behave the same as %p for kptr_restrict == 0 also with
   no_hash_pointers parameter

 - Ignore the default console in the device tree also when console=null
   or console="" is used on the command line

 - Document console=null and console="" behavior


that last one is pertinent.
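
For reference, the check added in 5.4.159 reduces to something like the following userspace sketch. console_override() is a made-up name; in the kernel that branch calls __add_preferred_console(), as in the diff quoted above:

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch of the quoted check: an empty console= value or the
 * literal "null" selects ttynull. Any other value is left to the normal
 * parsing, which is represented here by returning NULL.
 */
static const char *console_override(const char *str)
{
    if (str[0] == '\0' || strcmp(str, "null") == 0)
        return "ttynull";
    return NULL;
}
```

So console=null no longer means "no console was requested"; it asks for a specific console named ttynull, which is why the kernel config and the device tree default console become relevant.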



maybe a bug in kernel 5.4 since patchlevel 159 - dma error because use ttynull?

2022-10-13 Thread Simon Lindhorst

Hello all,


when I updated my kernel from version 5.4.155 to 5.4.215 I got a
strange xhci error:


xhci-hcd f10f.usb3: ERROR unknown event type 37
xhci-hcd f10f.usb3: ERROR Transfer event TRB DMA ptr not part of
current TD ep_index 2 comp_code 13


After a lot of these messages, my hardware reboots without any
further output.


The error only occurs when I add console=null to my kernel bootargs.
When I add console=ttyS0,115200 instead, no error occurs.


Then I went back through the kernel versions. The error first occurred in
version 5.4.159. Between patchlevel 158 and 159 there is this change:


--- linux-5.4.158/kernel/printk/printk.c    2021-11-06 13:59:45.0 +0100
+++ linux-5.4.159/kernel/printk/printk.c    2021-11-12 14:43:05.0 +0100
@@ -2193,8 +2193,15 @@
     char *s, *options, *brl_options = NULL;
     int idx;

-    if (str[0] == 0)
+    /*
+     * console="" or console=null have been suggested as a way to
+     * disable console output. Use ttynull that has been created
+     * for exacly this purpose.
+     */
+    if (str[0] == 0 || strcmp(str, "null") == 0) {
+        __add_preferred_console("ttynull", 0, NULL, NULL);
         return 1;
+    }

     if (_braille_console_setup(&str, &brl_options))
         return 1;

I checked my kernel config and found that I have no ttynull device
configured (CONFIG_NULL_TTY=n). Adding CONFIG_NULL_TTY=y to my
kernel config made no difference.


When I undo the change above, everything works fine.


Does anybody know what could be the main trigger for the error above?


Regards,

Sarah



-- Our statements may contain errors and misunderstandings.
Please check the statements for your own case before making decisions
based on them.

Wiesemann & Theis GmbH, Porschestr. 12, D-42279 Wuppertal
Managing Director: Dipl.-Ing. Ruediger Theis
Court of registration: Amtsgericht Wuppertal, HRB 6377
Data protection information: https://www.wut.de/datenschutz

Tel. +49-202/2680-0, Fax +49-202/2680-265, https://www.wut.de



Re[2]: Kernel bug tracker

2021-09-06 Thread Adverg Ebashinskii

 
> Ok. Here is the simple one. The other comes in a separate mail.
 
Hello Thomas,
 
I will try to submit the simplest one first (my very first patch). I have two
questions:
1. When submitting the patch, should I include you in the copy as the
original author?
2. Can I also co-sign the patch myself?
 
--
Regards,
Adverg Ebashinskii
 


Re: Kernel bug tracker

2021-09-06 Thread Thomas Schmitt
Hi,

> 1. When submitting the patch should I include you in the copy
> as the original author?

I guess this is more a question for the experienced patch submitters.
My cheat sheet points to
  https://www.kernel.org/doc/html/v5.10/process/submitting-patches.html

The code change is trivial enough that i do not claim authorship.
So how about

  Suggested-by: Thomas Schmitt 

I found the bug and suggested an obvious fix.

The most kernel merit i can claim is that i grepped through the kernel
sources for callers of iso_date() and found them all ready for the change
of result type.
So maybe

  Co-developed-by: Thomas Schmitt 
  Signed-off-by: Thomas Schmitt 

But on the other hand you should verify my claims in -cover-letter.patch
before posting your patch. A year has passed since i did my research and
testing.
So my statements create no more authorship than a detailed bug report would.


Have a nice day :)

Thomas




Re[2]: Kernel bug tracker

2021-09-05 Thread Adverg Ebashinskii

 
Hi Thomas,
 
Thanks for the brief explanation of the bugs.
 
>  i'll hand over my patch as guideline, or as 
>  base for own work, or just for review, testing, and posting
 
If you could share your patch here to understand the problem better I would 
gladly dig into it.
 
--
Regards,
Adverg Ebashinskii
 


Re[6]: Kernel bug tracker

2021-09-05 Thread Adverg Ebashinskii

Hi Valdis,
 
> You might want to read this
 
Thanks for the info, it was very interesting to read. My primary motivation for
getting into Linux kernel development is that I was a C/Linux developer on the
user side for years and am pretty well-versed in the Linux user-space API. But
I have almost zero knowledge of what is actually going on under the hood, or of
how to debug and fix complicated problems in the kernel itself. That is why I
am interested in the core subsystems specifically.
 
--
Regards,
Adverg Ebashinskii
 


Re: Kernel bug tracker

2021-09-05 Thread Thomas Schmitt
Hi,

maybe i should not have pasted my patches into a new mail.
My mail client shows the first mail as three mails. Possibly an effect
of the mailbox-like format which it got by pasting in two send-ready
git patches.
Strangely it shows the second mail with the Rock Ridge patch as a
single one.

Sorry for any confusion in the receiving mail boxes.


Have a nice day :)

Thomas




Re: Kernel bug tracker

2021-09-05 Thread Thomas Schmitt
Hi,

second patch proposal for isofs because Adverg Ebashinskii wrote:
> If you could share your patch here to understand the problem better I would
> gladly dig into it.


-cover-letter.patch

From 3d484405f0ad8d10ef490281da157bfdd7450cb6 Mon Sep 17 00:00:00 2001
From: Thomas Schmitt 
Date: Tue, 22 Sep 2020 12:35:52 +0200
Subject: [PATCH 0/1] isofs: truncate oversized Rock Ridge names to 255 bytes

Currently Rock Ridge names of length >= 254 are coarsely truncated by
discarding the whole NM entry where the overflow happened. This yields
name lengths of much less than the permissible 255 bytes.
There is no apparent reason to exclude lengths 254 and 255, and especially none
to truncate by possibly a hundred or more bytes more than necessary.

So i propose to raise the length of permissible names to 255 and to let
truncation yield exactly a string length of 255 bytes. Truncation shall
take care to invalidate UTF-8 debris at the end of the resulting string
(sorry ISO-8859).

---
Tests made:

Create an ISO 9660 image with file names of length 255, using file
/bin/true as input for both files:

  victim1=12345678901234567890123456789012345678901234567890
  victim1="$victim1"12345678901234567890123456789012345678901234567890
  victim1="$victim1"12345678901234567890123456789012345678901234567890
  victim1="$victim1"12345678901234567890123456789012345678901234567890
  victim1="$victim1"12345678901234567890123456789012345678901234567890
  victim1="$victim1"1234E
  victim2=ä
  victim2="$victim2"ä
  victim2="$victim2"ä
  victim2="$victim2"ä
  victim2="$victim2"ä
  victim2="$victim2"xää
  xorriso -outdev /tmp/test_rr_name.iso \
  -blank as_needed \
  -map /bin/true /"$victim1" \
  -map /bin/true /"$victim2"

Currently the names get truncated to byte lengths 93 and 95:

  mount /tmp/test_rr_name.iso /mnt/iso
  /bin/ls /mnt/iso

yields in xterm with bash

   12345678901234567890...60.more.bytes...1234567890123
  'ää'$'\303'

Note the leading blank with the plain ASCII name and the shell characters
with the name that has 2-byte UTF-8 characters.
But

  /bin/ls /mnt/iso | cat

yields

  12345678901234567890...60.more.bytes...1234567890123
  ää�

The extra characters in xterm seem to be triggered by the presence of the
half UTF-8 'ä' at the end. Its byte 0xc3 is there, byte 0xa4 is missing.
(xterm and /bin/ls are from Debian 10.)
If i make the UTF-8 name shorter to avoid truncation or if i move the 'x'
to the start to cause truncation between complete UTF-8 'ä', the extra
characters do not show up in ls to xterm.

After my change in fs/isofs i get from /bin/ls /mnt/iso

  1234567890...230.more.bytes...12345678901234E
  ää...210.more.bytes...ääxää

Both strings have 255 bytes.

xorriso cannot be talked into writing longer Rock Ridge names. So i rather
set the new macro RR_NAME_LEN in rock.h to 33 to force truncation.
The result with /bin/ls -1 /mnt/iso is:

  123456789012345678901234567890123
  _

Note the half 'ä' at the end being mapped to '_'.
So all characters are valid UTF-8 and no oddities of ls or xterm are to
see.
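
To illustrate the intended truncation rule outside the kernel, here is a small userspace sketch. It is not the code from the patch: truncate_name() is a made-up helper, and it assumes that every byte of an incomplete trailing UTF-8 sequence is replaced by '_', so that the result keeps exactly max_len bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the function from the patch.
 * Truncates name (len bytes, in a buffer of at least len + 1 bytes) to
 * max_len bytes and maps an incomplete UTF-8 sequence at the cut to '_'.
 * Returns the resulting length.
 */
static size_t truncate_name(char *name, size_t len, size_t max_len)
{
    size_t lead, have, need, j;
    unsigned char c;

    if (len <= max_len)
        return len;
    len = max_len;
    name[len] = '\0';

    /* Step back over up to three continuation bytes (10xxxxxx). */
    lead = len;
    while (lead > 0 && len - lead < 3 &&
           ((unsigned char)name[lead - 1] & 0xC0) == 0x80)
        lead--;
    if (lead == 0)
        return len;
    lead--;                 /* index of the candidate lead byte */
    c = (unsigned char)name[lead];
    if (c < 0xC0)
        return len;         /* ASCII or stray continuation byte */

    /* Sequence length announced by the lead byte vs. bytes present. */
    need = c >= 0xF0 ? 4 : c >= 0xE0 ? 3 : 2;
    have = len - lead;
    if (have < need)
        for (j = lead; j < len; j++)
            name[j] = '_';
    return len;
}
```

With max_len set to 33, a name consisting only of 2-byte 'ä' characters is cut in the middle of the 17th character, and the lone 0xc3 byte left at the end becomes '_', which matches the behaviour described for the RR_NAME_LEN = 33 test.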

---
Remaining checkpatch.pl warning:

scripts/checkpatch.pl complains about the string
  'ää'$'\303'
in this text by:
  WARNING: Possible unwrapped commit description (prefer a maximum 75
  chars per line)

Maybe it should talk about "bytes" rather than "chars" or learn about
multi-byte characters in UTF-8.

I think it is beneficial if i show the whole mangled name, rather than
describing it by some ASCII-only text.

---

Have a nice day :)

Thomas

Thomas Schmitt (1):
  isofs: truncate oversized Rock Ridge names to 255 bytes

 fs/isofs/rock.c | 73 ++---
 fs/isofs/rock.h |  2 ++
 2 files changed, 71 insertions(+), 4 deletions(-)

--
2.20.1


0001-isofs-truncate-oversized-Rock-Ridge-names-to-255-byt.patch

From 3d484405f0ad8d10ef490281da157bfdd7450cb6 Mon Sep 17 00:00:00 2001
From: Thomas Schmitt 
Date: Tue, 22 Sep 2020 12:34:50 +0200
Subject: [PATCH 1/1] isofs: truncate oversized Rock Ridge names to 255 bytes

Enlarge the limit for name bytes from 253 to 255.
Do not discard all bytes of the NM field where the overflow occurs, but
rather append them to the accumulated name before truncating it to exactly
255 bytes.
Map trailing incomplete UTF-8 bytes to '_'.

Signed-off-by: Thomas Schmitt 
---
 fs/isofs/rock.c | 

Re: Kernel bug tracker

2021-09-05 Thread Thomas Schmitt
Hi,

Adverg Ebashinskii wrote:
> If you could share your patch here to understand the problem better I would
> gladly dig into it.

Ok. Here is the simple one. The other comes in a separate mail.

The following texts stem from git format-patch. If submitting for real,
i would send them by git send-email to linux-ker...@vger.kernel.org and
linux-s...@vger.kernel.org.
(The latter because Jens Axboe committed a few isofs changes in the past
and because isofs is historically related to sr and cdrom.)


-cover-letter.patch

From 154a68527351db091e5de60388ba4cfb1fe779fd Mon Sep 17 00:00:00 2001
From: Thomas Schmitt 
Date: Mon, 21 Sep 2020 18:20:14 +0200
Subject: [PATCH 0/1] isofs: prevent file time rollover after year 2038

The time values in struct inode of isofs result from calls to function
iso_date() in isofs/util.c, which returns seconds in the range of signed
int. This will rollover in 2038.
ISO 9660 directory record timestamps are good for up to year 2155.
(ECMA-119 9.1.5: 1900 + 255)

The only callers of iso_date() are in isofs/inode.c and isofs/rock.c
and put the result into struct inode.i_{a,c,m}time.tv_sec which is
of type time64_t.
The time value of iso_date() essentially stems from mktime64().

So return type time64_t is appropriate for iso_date().

--
Demonstration of the problem:

Create an ISO 9660 filesystem with file date in 2040, using file /bin/true
as victim payload:

  xorriso -outdev /tmp/test_date.iso \
  -blank as_needed \
  -map /bin/true /victim \
  -alter_date m 'Oct 01 22:06:12 2040' /victim --

Inspect the current representation by isofs:

  mount /tmp/test_date.iso /mnt/iso
  ls -l /mnt/iso/victim

This yields with int iso_date():

  ... Aug 26  1904 /mnt/iso/victim

After changing the type of iso_date() to time64_t:

  ... Oct  1  2040 /mnt/iso/victim

For completeness i tested the last possible second:

  xorriso ... -alter_date m 'Dec 31 23:59:59 2155' /victim --

and got properly:

  ... Dec 31  2155 /mnt/iso/victim

(When reproducing this it might be wise to use December 30, to avoid
any potential timezone problems.)
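
The bogus year 1904 can be checked by hand. The following is only a
sketch in Python and not part of the patch. It assumes that the date given
to xorriso is UTC and that the old return type behaves like a cast to a
signed 32-bit int. The helper names wrap_to_int32() and as_utc() are made
up for this illustration.

```python
import calendar
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def wrap_to_int32(seconds):
    """Keep the low 32 bits and read them as a signed two's-complement int."""
    low = seconds & 0xFFFFFFFF
    return low - 0x100000000 if low >= 0x80000000 else low


def as_utc(seconds):
    """Convert seconds since 1970 to a datetime; negative values are fine."""
    return EPOCH + timedelta(seconds=seconds)


# Timestamp from the demonstration above, assumed to be UTC in this sketch.
t64 = calendar.timegm((2040, 10, 1, 22, 6, 12))
t32 = wrap_to_int32(t64)

print(as_utc(t64).date())  # date with a 64-bit time value
print(as_utc(t32).date())  # date after rollover of a signed 32-bit value
```

The wrapped value is negative, i.e. 2^32 seconds (about 136 years) before
the intended date, which lands on 26 Aug 1904 as shown by ls above.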

--

Have a nice day :)

Thomas

Thomas Schmitt (1):
  isofs: prevent file time rollover after year 2038

 fs/isofs/isofs.h | 3 ++-
 fs/isofs/util.c  | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

--
2.20.1


0001-isofs-prevent-file-time-rollover-after-year-2038.patch

From 154a68527351db091e5de60388ba4cfb1fe779fd Mon Sep 17 00:00:00 2001
From: Thomas Schmitt 
Date: Mon, 21 Sep 2020 18:20:06 +0200
Subject: [PATCH 1/1] isofs: prevent file time rollover after year 2038

Change the return type of function iso_date() from int to time64_t.

Signed-off-by: Thomas Schmitt 
---
 fs/isofs/isofs.h | 3 ++-
 fs/isofs/util.c  | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/isofs/isofs.h b/fs/isofs/isofs.h
index 055ec6c586f7..527c0db72ff9 100644
--- a/fs/isofs/isofs.h
+++ b/fs/isofs/isofs.h
@@ -107,7 +107,8 @@ static inline unsigned int isonum_733(u8 *p)
 	/* Ignore bigendian datum due to broken mastering programs */
 	return get_unaligned_le32(p);
 }
-extern int iso_date(u8 *, int);
+
+time64_t iso_date(u8 *, int);

 struct inode;  /* To make gcc happy */

diff --git a/fs/isofs/util.c b/fs/isofs/util.c
index e88dba721661..348af786a8a4 100644
--- a/fs/isofs/util.c
+++ b/fs/isofs/util.c
@@ -16,10 +16,10 @@
  * to GMT.  Thus  we should always be correct.
  */

-int iso_date(u8 *p, int flag)
+time64_t iso_date(u8 *p, int flag)
 {
 	int year, month, day, hour, minute, second, tz;
-	int crtime;
+	time64_t crtime;

 	year = p[0];
 	month = p[1];
--
2.20.1



Have a nice day :)

Thomas




Re: Kernel bug tracker

2021-09-03 Thread Thomas Schmitt
Hi,

Valdis Klētnieks wrote:
> The tricky part is, of course, that for this to work correctly, you need
> to have 64-bit timestamps in the on-disk format.

Initially yes. In
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800627
i sketched what i thought needed to be done.
But by the much more elegant

  https://github.com/torvalds/linux/commit/34be4dbf87fc

the full ISO 9660 date range up to year 2155 would be shown correctly,
if signed int did not roll over in year 2038.
Demo:

  xorriso -outdev /tmp/test_date.iso \
  -blank as_needed \
  -map /bin/true /victim \
  -alter_date m 'Oct 01 22:06:12 2040' /victim --

  mount /tmp/test_date.iso /mnt/iso
  ls -l /mnt/iso/victim

yields currently

  ... Aug 26  1904 /mnt/iso/victim

But after the really simple change to time64_t it yields

  ... Oct  1  2040 /mnt/iso/victim

So this really is low-hanging fruit in fs.
It is still present today in the torvalds GitHub repo.



> > - isofs: truncate oversized Rock Ridge names to 255 bytes
> >   Map trailing incomplete UTF-8 bytes to '_'.

> A better answer would probably be to truncate it at the last complete UTF-8
> that leaves the string at 255 or less.

My patch proposal could be changed accordingly.
But with '_' as placeholder for the bytes of an incomplete UTF-8 character,
a truncated name stays distinguishable from a name which has the same
leading bytes but really ends directly before the character which got cut
apart.
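
To compare both proposals, here is a small userspace sketch in Python. It
is not the code of my kernel patch. The function names are invented for
this mail, and it only inspects the last few bytes, assuming that the rest
of the name is valid UTF-8.

```python
def incomplete_tail_len(data: bytes) -> int:
    """Count trailing bytes which belong to an unfinished UTF-8 character."""
    # A character has at most 4 bytes, so an unfinished one has at most 3.
    for back in range(1, min(3, len(data)) + 1):
        byte = data[-back]
        if byte & 0xC0 == 0x80:
            continue            # continuation byte, look further left
        if byte >= 0xF0:
            need = 4            # lead byte of a 4-byte character
        elif byte >= 0xE0:
            need = 3            # lead byte of a 3-byte character
        elif byte >= 0xC0:
            need = 2            # lead byte of a 2-byte character
        else:
            need = 1            # plain ASCII
        return back if need > back else 0
    return 0


def truncate_with_placeholder(name: bytes, limit: int) -> bytes:
    """Cut to limit bytes and map each dangling byte to '_'."""
    cut = name[:limit]
    tail = incomplete_tail_len(cut)
    return cut[:len(cut) - tail] + b"_" * tail


def truncate_at_character(name: bytes, limit: int) -> bytes:
    """Cut to limit bytes and drop the dangling bytes entirely."""
    cut = name[:limit]
    return cut[:len(cut) - incomplete_tail_len(cut)]


name = ("ä" * 20 + "xää").encode("utf-8")
print(truncate_with_placeholder(name, 33).decode("utf-8"))
print(truncate_at_character(name, 33).decode("utf-8"))
```

With limit 33 the first variant yields 16 'ä' plus '_' (33 bytes), the
second variant yields just the 16 'ä' (32 bytes). Only the first result
differs from a name which really consists of 16 'ä' and nothing else.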

The need for real truncation should rarely occur. Main motivation for
fixing this would be this observation:

Currently Rock Ridge names of length >= 254 are coarsely truncated by
discarding the whole NM entry where the overflow happened. This yields
name lengths of much less than the permissible 255 bytes.
I see no reason to exclude lengths 254 and 255, and especially none to
truncate by possibly a hundred bytes more than necessary.

File names in ISO 9660 + Rock Ridge ISO

  1234567890...230.more.bytes...12345678901234E
  ää...210.more.bytes...ääxää

get shown after mount(8) in xterm with bash by /bin/ls as

   12345678901234567890...60.more.bytes...1234567890123
  'ää'$'\303'

Note the leading blank with the plain ASCII name and the shell characters
with the name that has 2-byte UTF-8 characters.

(Rock Ridge encodes its names in one or more NM entries. Long names often
get split between a NM in the file's ISO 9660 directory record and a NM
in the Continuation Area of the file. That second one gets dropped.)

Unlike the time rollover fix, this problem needs some knowledge about
ISO 9660, which is available for free as ECMA-119, and about SUSP + RRIP
of which specs are available for free too.
Both are really simple, compared with e.g. UDF specs.

I am ready to explain in detail what is needed to understand the problem.
If Adverg Ebashinskii wants, i'll hand over my patch as guideline, or as
base for own work, or just for review, testing, and posting.
I can give instructions how to reproduce each of the three bugs by help
of small ISO images made with xorriso.


Have a nice day :)

Thomas




Re: Kernel bug tracker

2021-09-03 Thread Valdis Klētnieks
On Fri, 03 Sep 2021 13:00:55 +0200, "Thomas Schmitt" said:

> I could offer bugs of isofs with explanations and patch proposals:
>
> - isofs: prevent file time rollover after year 2038
>   Change the return type of function iso_date() from int to time64_t.

The tricky part is, of course, that for this to work correctly, you need
to have 64-bit timestamps in the on-disk format.

> - isofs: truncate oversized Rock Ridge names to 255 bytes
>   Do not discard all bytes of the NM field where the overflow occurs, but
>   rather append them to the accumulated name before truncating it to exactly
>   255 bytes.
>   Map trailing incomplete UTF-8 bytes to '_'.

A better answer would probably be to truncate it at the last complete UTF-8
that leaves the string at 255 or less.






Re[4]: Kernel bug tracker

2021-09-03 Thread Adverg Ebashinskii

 
>Your best source for low-hanging fruit these days is probably drivers/staging,
> as pretty much everything under there is *known* to be less-than-optimal.
 
Thanks for the reply. The reason I looked for some bugs is that I’m not really 
interested in driver development and digging into details of a specific 
hardware. So I tried to get into some core subsystems like fs, net, cgroups, 
etc... 
 
--
Regards,
Adverg Ebashinskii
 


Re: Kernel bug tracker

2021-09-03 Thread Thomas Schmitt
Hi,

Adverg Ebashinskii wrote:
> The reason I looked for some bugs is that I’m not
> really interested in driver development and digging into details of a
> specific hardware. So I tried to get into some core subsystems like fs, net,
> cgroups, etc... 

I could offer bugs of isofs with explanations and patch proposals:

- isofs: prevent file time rollover after year 2038
  Change the return type of function iso_date() from int to time64_t.

- isofs: truncate oversized Rock Ridge names to 255 bytes
  Do not discard all bytes of the NM field where the overflow occurs, but
  rather append them to the accumulated name before truncating it to exactly
  255 bytes.
  Map trailing incomplete UTF-8 bytes to '_'.

- isofs: fix Oops with zisofs and large PAGE_SIZE
  https://lore.kernel.org/linux-scsi/20201120140633.1673-1-scdbac...@gmx.net/T/#u
  (No replies since 2020-11-20. I hope the tester of this patch still
   has the machine to confirm that the patch is still good.)

What my skills obviously lack is the ability to get the attention of
kernel developers for isofs and to win their trust that the proposals
don't make things worse.

As developer of libisofs and libburn i can provide motivations and
facts from that experience. There would be 4 bugs in cdrom and sr to be
fixed, 2 wishlist changes for them, and 2 wishlist changes for isofs.
An example can be seen at
  https://lore.kernel.org/linux-scsi/20201006094026.1730-1-scdbac...@gmx.net/T/#u

(My patch proposals were tested with kernels of a year ago. One of them
meanwhile needs rework due to the demise of the .readpages method:
  isofs: Give zisofs a .readpages() method for use by mm/readahead
My kernel development machine from then meanwhile has a production job.)


Have a nice day :)

Thomas




Re: Re[2]: Kernel bug tracker

2021-08-31 Thread Valdis Klētnieks
On Sun, 29 Aug 2021 12:39:28 +0300, Adverg Ebashinskii said:

> Hi Anatoly,

> Thank you very much for your response. https://bugzilla.kernel.org looks
> exactly what I was looking for.

Note that the Bugzilla probably *isn't* what you're looking for, if you're
looking for small easy patches to start with.

Hint: Many long-time kernel developers say the bugzilla is where kernel bugs go
to die.

That's because if it's an open bug in the bugzilla, one or more of the
following things are probably true:

* The bug has actually already been fixed but nobody ever bothered closing the
bugzilla entry.

* The bug isn't reproducible on a common configuration, either due to specific
hardware requirements (like a specific card at a specific firmware release), or
the software replicator for the issue isn't known, so only one computer can
reliably trigger the issue. (A few years back, Linus and a few others finally
swatted a bug that triggered on *one* system several times a week.  It turned
out to be a race condition, with a window caused by interrupts being re-enabled
3 instructions too early.  So that one system was doing something that hit this
literally billionth-of-a-second wide window several times a week).

* The bug doesn't have an obvious/easy fix, so it's sitting in the bugzilla
while people try to come up with a fix that isn't too ugly to be allowed to
live. Once you get all the git configuration done and working, it's usually
faster to just create and submit the patch rather than open a bugzilla entry,
so bugzilla entries don't get created for obvious patches.

* The bug report requires more information, and the original reporter of the
bug has evaporated.

Your best source for low-hanging fruit these days is probably drivers/staging,
as pretty much everything under there is *known* to be less-than-optimal. There
should even be a TODO file for each driver in there, saying what stuff is known
to need work.  (Note that it's always possible that things get fixed but the
TODO file doesn't get updated - that's a potential source of cleanup patches as
well)

Good luck.  And remember to back up your system before testing patches. :)





Re[2]: Kernel bug tracker

2021-08-29 Thread Adverg Ebashinskii

 
Hi Anatoly,
 
Thank you very much for your response. https://bugzilla.kernel.org looks 
exactly what I was looking for.
 
--
Regards,
Adverg Ebashinskii
 
 


Kernel bug tracker

2021-08-29 Thread Adverg Ebashinskii

Hello.
 
I’m a kernel newbie trying to get involved in kernel development. So I’d
like to start with small bug fixes in any subsystem (fs is preferred
since I am most familiar with it) or something like that.
 
Is there some kernel bug tracker where anybody could pick a bug to fix and then 
send patches?
--
Regards,
Adverg Ebashinskii


Re: Kernel bug tracker

2021-08-29 Thread Anatoly Pugachev
On Sun, Aug 29, 2021 at 10:04 AM Adverg Ebashinskii  wrote:
>
> Hello.
>
> I’m a kernel newbie and try to get involved into the Kernel development. So 
> I’d like to start with small bug fixes related to any subsystem (fs is 
> preferred since I familiar with it the most) or something like that.
>
> Is there some kernel bug tracker where anybody could pick a bug to fix and 
> then send patches?

https://bugzilla.kernel.org/

There are also per-distribution / vendor bug reporting web interfaces,
such as http://bugs.debian.org/, https://bugs.launchpad.net/ and
https://bugzilla.redhat.com/, where users can first post their bugs.



Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-07 Thread 孙世龙 sunshilong
Hi,

Thank you for your help and patience.

>Jul  3 10:23:31 yx kernel: [ 1176.166204] CPU: 0 PID: 1837 Comm:
>rt_cansend Tainted: G   OE 4.19.84-solve-alc-failure #1
>Jul  3 10:23:31 yx kernel: [ 1176.166209] I-pipe domain: Linux
>Jul  3 10:23:31 yx kernel: [ 1176.166218] RIP:
>0010:queued_spin_lock_slowpath+0xd9/0x1a0
...
>  Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
> Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
> Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190

One more question, what's the relation between "queued_spin_lock_slowpath"
and "_raw_spin_lock"?

Best Regards.

孙世龙 sunshilong  于2020年7月4日周六 下午5:13写道:
>
> Hi, Valdis Klētnieks
>
> Thank you for taking the time to respond to me.
> I have a better understanding of this matter.
>
> >> Can I draw the conclusion that continually acquiring the spinlock causes 
> >> the soft
> >> lockup and the CPU has been stuck for 22s?
> >> Can I think in this way?
>
> >No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
>
> I see. So there is a thread that has held the corresponding spinlock
> for more 22s,  and a CPU is sticking(busy acquiring the spinlock) at the
> same duration.
> Can I think in this way?
>
> Thank you for your attention to this matter.
> Best Regards.
>
> Valdis Klētnieks  于2020年7月4日周六 下午4:09写道:
>
> 孙世龙 sunshilong  于2020年7月4日周六 下午5:04写道:
> >
> > Hi, Valdis Klētnieks
> >
> > Thank you for taking the time to respond to me.
> > I have a better understanding of this matter.
> >
> > >> Can I draw the conclusion that continually acquiring the spinlock causes 
> > >> the soft
> > >> lockup and the CPU has been stuck for 22s?
> > >> Can I think in this way?
> >
> > >No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
> >
> > I see. So there is a thread that has held the corresponding spinlock
> > for more 22s.
> > Can I think in this way?
> >
> > Thank you for your attention to this matter.
> > Best Regards.
> >
> > Valdis Klētnieks  于2020年7月4日周六 下午4:09写道:
> > >
> > >
> > > > Can I draw the conclusion that continually acquiring the spinlock 
> > > > causes the soft
> > > > lockup and the CPU has been stuck for 22s?
> > > > Can I think in this way?
> > >
> > > No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
> > >
> > > For comparison - spinlocks are usually used when you need a lock, but the
> > > code protected by the lock is short (things like adding to a linked list, 
> > > etc),
> > > so it should again become available in milliseconds - things where it 
> > > would take
> > > longer to put this thread to sleep and wake another one up than we expect
> > > to be waiting for this lock.



Re: For a bug in lib/vsprintf.c where should a patch be sent to ?

2020-07-06 Thread Greg KH
On Mon, Jul 06, 2020 at 06:57:34AM +0530, Santosh Sivaraj wrote:
> William Tambe  writes:
> 
> > On Sun, Jul 5, 2020 at 8:40 PM William Tambe  wrote:
> >>
> >> For a bug in lib/vsprintf.c where should a patch be sent to ?
> >
> > The bug is in kernel/kallsyms.c instead.
> >
> > So for a bug in kernel/kallsyms.c, where should a patch be sent to ?
> >
> 
> You can use the 'get_maintainer.pl' script in the kernel tree to identify the
> list and maintainers.
> 
> ./scripts/get_maintainer.pl kernel/kallsyms.c
> 
> 
> But in general you can send it to linux-ker...@vger.kernel.org.

If you only do that, it will be ignored, use the output of
get_maintainer.pl on your patch to get the proper list of people and
mailing lists.

thanks,

greg k-h



Re: For a bug in lib/vsprintf.c where should a patch be sent to ?

2020-07-05 Thread Santosh Sivaraj
William Tambe  writes:

> On Sun, Jul 5, 2020 at 8:40 PM William Tambe  wrote:
>>
>> For a bug in lib/vsprintf.c where should a patch be sent to ?
>
> The bug is in kernel/kallsyms.c instead.
>
> So for a bug in kernel/kallsyms.c, where should a patch be sent to ?
>

You can use the 'get_maintainer.pl' script in the kernel tree to identify the
list and maintainers.

./scripts/get_maintainer.pl kernel/kallsyms.c


But in general you can send it to linux-ker...@vger.kernel.org.

Thanks,
Santosh



Re: For a bug in lib/vsprintf.c where should a patch be sent to ?

2020-07-05 Thread William Tambe
On Sun, Jul 5, 2020 at 8:40 PM William Tambe  wrote:
>
> For a bug in lib/vsprintf.c where should a patch be sent to ?

The bug is in kernel/kallsyms.c instead.

So for a bug in kernel/kallsyms.c, where should a patch be sent to ?

>



For a bug in lib/vsprintf.c where should a patch be sent to ?

2020-07-05 Thread William Tambe
For a bug in lib/vsprintf.c where should a patch be sent to ?


Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-04 Thread 孙世龙 sunshilong
Hi, Valdis Klētnieks

Thank you for taking the time to respond to me.
I have a better understanding of this matter.

>> Can I draw the conclusion that continually acquiring the spinlock causes the 
>> soft
>> lockup and the CPU has been stuck for 22s?
>> Can I think in this way?

>No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

I see. So there is a thread that has held the corresponding spinlock
for more than 22s, and a CPU has been stuck (busy trying to acquire the
spinlock) for the same duration.
Can I think in this way?

Thank you for your attention to this matter.
Best Regards.

Valdis Klētnieks  于2020年7月4日周六 下午4:09写道:

孙世龙 sunshilong  于2020年7月4日周六 下午5:04写道:
>
> Hi, Valdis Klētnieks
>
> Thank you for taking the time to respond to me.
> I have a better understanding of this matter.
>
> >> Can I draw the conclusion that continually acquiring the spinlock causes 
> >> the soft
> >> lockup and the CPU has been stuck for 22s?
> >> Can I think in this way?
>
> >No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
>
> I see. So there is a thread that has held the corresponding spinlock
> for more 22s.
> Can I think in this way?
>
> Thank you for your attention to this matter.
> Best Regards.
>
> Valdis Klētnieks  于2020年7月4日周六 下午4:09写道:
> >
> >
> > > Can I draw the conclusion that continually acquiring the spinlock causes 
> > > the soft
> > > lockup and the CPU has been stuck for 22s?
> > > Can I think in this way?
> >
> > No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
> >
> > For comparison - spinlocks are usually used when you need a lock, but the
> > code protected by the lock is short (things like adding to a linked list, 
> > etc),
> > so it should again become available in milliseconds - things where it would 
> > take
> > longer to put this thread to sleep and wake another one up than we expect
> > to be waiting for this lock.



Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-04 Thread 孙世龙 sunshilong
Hi, Valdis Klētnieks

Thank you for taking the time to respond to me.
I have a better understanding of this matter.

>> Can I draw the conclusion that continually acquiring the spinlock causes the 
>> soft
>> lockup and the CPU has been stuck for 22s?
>> Can I think in this way?

>No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

I see. So there is a thread that has held the corresponding spinlock
for more than 22s.
Can I think in this way?

Thank you for your attention to this matter.
Best Regards.

Valdis Klētnieks  于2020年7月4日周六 下午4:09写道:
>
>
> > Can I draw the conclusion that continually acquiring the spinlock causes 
> > the soft
> > lockup and the CPU has been stuck for 22s?
> > Can I think in this way?
>
> No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.
>
> For comparison - spinlocks are usually used when you need a lock, but the
> code protected by the lock is short (things like adding to a linked list, 
> etc),
> so it should again become available in milliseconds - things where it would 
> take
> longer to put this thread to sleep and wake another one up than we expect
> to be waiting for this lock.



Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-04 Thread Valdis Klētnieks

> Can I draw the conclusion that continually acquiring the spinlock causes the 
> soft
> lockup and the CPU has been stuck for 22s?
> Can I think in this way?

No.  It's been stuck for 22s *TRYING* and *FAILING* to get the spinlock.

For comparison - spinlocks are usually used when you need a lock, but the
code protected by the lock is short (things like adding to a linked list, etc),
so it should again become available in milliseconds - things where it would take
longer to put this thread to sleep and wake another one up than we expect
to be waiting for this lock.




Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-04 Thread 孙世龙 sunshilong
Hi, Valdis Klētnieks

Thank you for your generous help.
My understanding of this matter is on a different level with your help.

>>Jul  3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup -
>>CPU#0 stuck for 22s! [rt_cansend:1837]
>>Jul  3 10:23:31 yx kernel: [ 1176.166066] Modules linked in:
>>..
>>Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
>>Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
>>Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
>>
>You get into function can_write() in module advcan.
>That tries to take a spinlock, while something else already has it.
Can I draw the conclusion that continually acquiring the spinlock causes the
soft lockup and the CPU has been stuck for 22s?
Can I think in this way?

Thank you for your attention to this matter.
Best Regards.


Valdis Klētnieks  于2020年7月4日周六 下午12:39写道:
>
> > Could you please give me some hint on how to investigate the cause deeply?
>
> Shortening the call trace to the relevant lines:
>
> >  Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> > Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
> > Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
> > Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190
>
> You get into function can_write() in module advcan.
>
> That tries to take a spinlock, while something else already has it.
>
> The spinlock call is (roughly) 15% of the way through the function 
> can_write().
>
> The 'modules linked in' list includes "advcan(OE)".
>
> The 'O' tells us it's an out-of-tree module, which means you need to talk to
> whoever wrote the module and find out why it's hanging on a spin lock (most
> likely something else is failing to release it).
>
> And that's about as far as we can hint, since we don't have the source for 
> your
> out-of-tree module.  If the people who wrote it would clean it up and get it
> into the base Linux tree, then we'd all have access to it and be able to help
> in much greater detail.
>



Re: How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-03 Thread Valdis Klētnieks
> Could you please give me some hint on how to investigate the cause deeply?

Shortening the call trace to the relevant lines:

>  Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
> Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
> Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
> Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190

You get into function can_write() in module advcan.

That tries to take a spinlock, while something else already has it.

The spinlock call is (roughly) 15% of the way through the function can_write().
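
That figure comes straight from the "+offset/size" part of the trace line.
As a rough sketch (a hypothetical helper, not a kernel tool, and assuming
the usual "symbol+0xoffset/0xsize" layout of call trace entries), the
arithmetic looks like this in Python:

```python
import re

# Matches e.g. "can_write+0x6c/0x2c0": symbol, offset into it, total size.
FRAME = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def frame_position(line):
    """Return (symbol, offset, size, fraction) for one call trace entry."""
    match = FRAME.search(line)
    if match is None:
        raise ValueError("no symbol+offset/size found in: " + line)
    offset = int(match["offset"], 16)
    size = int(match["size"], 16)
    return match["symbol"], offset, size, offset / size


symbol, offset, size, fraction = frame_position(
    "[ 1176.166270]  can_write+0x6c/0x2c0 [advcan]"
)
print(f"{symbol}: byte {offset} of {size} ({fraction:.0%})")
```

So 0x6c is byte 108 of the 704 (0x2c0) bytes of can_write(). The offset
counts machine code bytes, so it only gives a rough idea of where in the
source the call sits.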

The 'modules linked in' list includes "advcan(OE)".

The 'O' tells us it's an out-of-tree module, which means you need to talk to
whoever wrote the module and find out why it's hanging on a spin lock (most
likely something else is failing to release it).

And that's about as far as we can hint, since we don't have the source for your
out-of-tree module.  If the people who wrote it would clean it up and get it
into the base Linux tree, then we'd all have access to it and be able to help
in much greater detail.





How can I investigate the cause of "watchdog: BUG: soft lockup"?

2020-07-03 Thread 孙世龙 sunshilong
Hi, list
I encountered the error "watchdog: BUG: soft lockup" when I sent
data through the CAN bus.
Could you please give me some hints on how to investigate the cause more deeply?
Thank you for your attention to this matter.

The most relevant log lines (the full log follows further below):
 Jul  3 10:22:36 yx kernel: [ 1120.688506] CAN[0][0] RX: FIFO overrun
Jul  3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup -
CPU#0 stuck for 22s! [rt_cansend:1837]
...
 Jul  3 10:23:31 yx kernel: [ 1176.166252] Call Trace:
Jul  3 10:23:31 yx kernel: [ 1176.166261]  _raw_spin_lock+0x20/0x30
Jul  3 10:23:31 yx kernel: [ 1176.166270]  can_write+0x6c/0x2c0 [advcan]
Jul  3 10:23:31 yx kernel: [ 1176.166276]  ? dequeue_signal+0xae/0x1a0
Jul  3 10:23:31 yx kernel: [ 1176.166281]  ? recalc_sigpending+0x1b/0x50
Jul  3 10:23:31 yx kernel: [ 1176.166286]  ? __set_task_blocked+0x3c/0xa0
Jul  3 10:23:31 yx kernel: [ 1176.166292]  __vfs_write+0x3a/0x190
Jul  3 10:23:31 yx kernel: [ 1176.166298]  ? apparmor_file_permission+0x1a/0x20
Jul  3 10:23:31 yx kernel: [ 1176.166302]  ? security_file_permission+0x3b/0xc0
Jul  3 10:23:31 yx kernel: [ 1176.166307]  vfs_write+0xb8/0x1b0
Jul  3 10:23:31 yx kernel: [ 1176.166312]  ksys_write+0x5c/0xe0
Jul  3 10:23:31 yx kernel: [ 1176.166316]  __x64_sys_write+0x1a/0x20
Jul  3 10:23:31 yx kernel: [ 1176.166321]  do_syscall_64+0x87/0x250
Jul  3 10:23:31 yx kernel: [ 1176.166326]
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Here is the full log:
Jul  3 10:06:16 yx kernel: [  140.313856] CAN[0][0] RX: FIFO overrun
Jul  3 10:06:59 yx kernel: [  183.323792] CAN[0][0] RX: FIFO overrun
Jul  3 10:07:42 yx kernel: [  226.329465] CAN[0][0] RX: FIFO overrun
Jul  3 10:08:24 yx kernel: [  268.362822] CAN[0][0] RX: FIFO overrun
Jul  3 10:09:07 yx kernel: [  311.372488] CAN[0][0] RX: FIFO overrun
Jul  3 10:09:50 yx kernel: [  354.377996] CAN[0][0] RX: FIFO overrun
Jul  3 10:10:32 yx kernel: [  396.411726] CAN[0][0] RX: FIFO overrun
Jul  3 10:11:15 yx kernel: [  439.421156] CAN[0][0] RX: FIFO overrun
Jul  3 10:11:58 yx kernel: [  482.426522] CAN[0][0] RX: FIFO overrun
Jul  3 10:12:40 yx kernel: [  524.460688] CAN[0][0] RX: FIFO overrun
Jul  3 10:13:23 yx kernel: [  567.469857] CAN[0][0] RX: FIFO overrun
Jul  3 10:14:06 yx kernel: [  610.475021] CAN[0][0] RX: FIFO overrun
Jul  3 10:14:48 yx kernel: [  652.509597] CAN[0][0] RX: FIFO overrun
Jul  3 10:15:31 yx kernel: [  695.518491] CAN[0][0] RX: FIFO overrun
Jul  3 10:16:14 yx kernel: [  738.523551] CAN[0][0] RX: FIFO overrun
Jul  3 10:16:55 yx kernel: [  779.558139] CAN[0][0] RX: FIFO overrun
Jul  3 10:17:38 yx kernel: [  822.566773] CAN[0][0] RX: FIFO overrun
Jul  3 10:18:21 yx kernel: [  865.571697] CAN[0][0] RX: FIFO overrun
Jul  3 10:19:03 yx kernel: [  907.607049] CAN[0][0] RX: FIFO overrun
Jul  3 10:19:46 yx kernel: [  950.615449] CAN[0][0] RX: FIFO overrun
Jul  3 10:20:29 yx kernel: [  993.620196] CAN[0][0] RX: FIFO overrun
Jul  3 10:21:11 yx kernel: [ 1035.655974] CAN[0][0] RX: FIFO overrun
Jul  3 10:21:54 yx kernel: [ 1078.664116] CAN[0][0] RX: FIFO overrun
Jul  3 10:22:36 yx kernel: [ 1120.688506] CAN[0][0] RX: FIFO overrun
Jul  3 10:23:31 yx kernel: [ 1176.166058] watchdog: BUG: soft lockup -
CPU#0 stuck for 22s! [rt_cansend:1837]
Jul  3 10:23:31 yx kernel: [ 1176.166066] Modules linked in: bnep
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
nls_iso8859_1 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep
snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi intel_rapl
intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp
kvm_intel snd_seq punit_atom_debug crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel snd_seq_device cryptd intel_cstate snd_timer
hci_uart snd lpc_ich advcan(OE) mei_txe btqca soundcore mei btbcm
btintel bluetooth ecdh_generic rfkill_gpio pwm_lpss_platform mac_hid
pwm_lpss parport_pc ppdev lp parport autofs4 i915 kvmgt vfio_mdev mdev
vfio_iommu_type1 vfio kvm irqbypass drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops igb drm dca ahci i2c_algo_bit
libahci video i2c_hid hid
Jul  3 10:23:31 yx kernel: [ 1176.166204] CPU: 0 PID: 1837 Comm:
rt_cansend Tainted: G   OE 4.19.84-solve-alc-failure #1
Jul  3 10:23:31 yx kernel: [ 1176.166209] I-pipe domain: Linux
Jul  3 10:23:31 yx kernel: [ 1176.166218] RIP:
0010:queued_spin_lock_slowpath+0xd9/0x1a0
Jul  3 10:23:31 yx kernel: [ 1176.166223] Code: 48 03 34 c5 00 67 37
91 48 89 16 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
48 85 f6 74 07 0f 0d 0e eb 02 f3 90 <8b> 07 66 85 c0 75 f7 41 89 c0 66
45 31 c0 41 39 c8 0f 84 96 00 00
Jul  3 10:23:31 yx kernel: [ 1176.166226] RSP: 0018:be6f4c17bd08
EFLAGS: 0202 ORIG_RAX: ff13
Jul  3 10:23:31 yx kernel: [ 1176.166231] RAX:  RBX:
 RCX: 
Jul  3 10:23:31 yx kernel: [ 1176.166234] RDX:  RSI:
 RDI: 
Jul  3 10:23:31 yx kernel: [ 1176.166236] RBP: be6f4c17bd08 R08:

Re: BUG: Bad page state in process swapper on new imx8qm board

2019-10-21 Thread Oliver Graute
On 17/10/19, Cengiz Can wrote:
> Hello Oliver,
> 
> > So after some more digging I assume that this error is related to a
> > missing "reserved-memory" node in my devicetree. Now I need to find
> > out how to split up my memory the right way for this imx8qm congatec
> > board.
> 
> I think asking in #linux-imx (freenode) would be a much better idea.
> 
> You can use https://webchat.freenode.net/ if you don't have an IRC
> client handy.

Thanks for the hint. I'll show up there.

Best regards,

Oliver


Re: BUG: Bad page state in process swapper on new imx8qm board

2019-10-21 Thread Oliver Graute
On 17/10/19, Peng Fan wrote:
> 
> 
> > > -----Original Message-----
> > > From: Oliver Graute
> > > Sent: 17 October 2019 15:34
> > To: kernelnewbies@kernelnewbies.org; Aisheng Dong
> > ; Peng Fan 
> > Subject: Re: BUG: Bad page state in process swapper on new imx8qm board
> > 
> > On 16/10/19, Oliver Graute wrote:
> > > Hello list,
> > >
> > > I try to bootup up a new imx8qm congatec board and I have written a
> > > dts file for it and applied some imx8qm related patches which are not
> > > mainline yet but working fine on another imx8qm board (same cpu,
> > > different board vendor).
> > >
> > > The Kernel starts to boot. But unfortunately its stucked after few
> > > seconds with a lot memory/swapper issues.
> > >
> > > BUG: Bad page state in process swapper
> > >
> > > Some clue what`s going on here?
> > 
> > So after some more digging I assume that this error is related to a missing
> > "reserved-memory" node in my devicetree. Now I need to find out how to
> > split up my memory the right way for this imx8qm congatec board.
> 
> Which uboot release are you using? Did you include M4 image in flash.bin?

I'm using the u-boot master branch with my board patches, together with
the M4 image. This particular problem is solved now by adding the right
reserved-memory statement and linux,cma {} statement.

No more kernel crashes so far. Now the kernel starts systemd and gets
stuck there. I get no login prompt. It looks to me like getty can't find
any serial port to attach to. But the kernel is still running and writing
to the serial port (lpuart32).

Best regards,

Oliver


RE: BUG: Bad page state in process swapper on new imx8qm board

2019-10-17 Thread Peng Fan


> -----Original Message-----
> From: Oliver Graute
> Sent: 17 October 2019 15:34
> To: kernelnewbies@kernelnewbies.org; Aisheng Dong
> ; Peng Fan 
> Subject: Re: BUG: Bad page state in process swapper on new imx8qm board
> 
> On 16/10/19, Oliver Graute wrote:
> > Hello list,
> >
> > I try to bootup up a new imx8qm congatec board and I have written a
> > dts file for it and applied some imx8qm related patches which are not
> > mainline yet but working fine on another imx8qm board (same cpu,
> > different board vendor).
> >
> > The Kernel starts to boot. But unfortunately its stucked after few
> > seconds with a lot memory/swapper issues.
> >
> > BUG: Bad page state in process swapper
> >
> > Some clue what`s going on here?
> 
> So after some more digging I assume that this error is related to a missing
> "reserved-memory" node in my devicetree. Now I need to find out how to
> split up my memory the right way for this imx8qm congatec board.

Which U-Boot release are you using? Did you include the M4 image in flash.bin?

Regards,
Peng.

> 
> Any hints?
> 
> Best regards,
> 
> Oliver


Re: BUG: Bad page state in process swapper on new imx8qm board

2019-10-17 Thread Cengiz Can
Hello Oliver,

> So after some more digging I assume that this error is related to a
> missing "reserved-memory" node in my devicetree. Now I need to find
> out how to split up my memory the right way for this imx8qm congatec
> board.

I think asking in #linux-imx (freenode) would be a much better idea.

You can use https://webchat.freenode.net/ if you don't have an IRC
client handy.

Good luck!

-- 
Cengiz Can



Re: BUG: Bad page state in process swapper on new imx8qm board

2019-10-17 Thread Oliver Graute
On 16/10/19, Oliver Graute wrote:
> Hello list,
> 
> I try to bootup up a new imx8qm congatec board and I have written a dts
> file for it and applied some imx8qm related patches which are not
> mainline yet but working fine on another imx8qm board (same cpu,
> different board vendor).
> 
> The Kernel starts to boot. But unfortunately its stucked after few
> seconds with a lot memory/swapper issues.
> 
> BUG: Bad page state in process swapper
> 
> Some clue what`s going on here?

So after some more digging I assume that this error is related to a
missing "reserved-memory" node in my devicetree. Now I need to find out
how to split up my memory the right way for this imx8qm congatec board.

Any hints?

Best regards,

Oliver


BUG: Bad page state in process swapper on new imx8qm board

2019-10-16 Thread Oliver Graute
Hello list,

I am trying to boot a new imx8qm congatec board. I have written a dts
file for it and applied some imx8qm-related patches which are not
mainline yet but work fine on another imx8qm board (same CPU,
different board vendor).

The kernel starts to boot, but unfortunately it gets stuck after a few
seconds with a lot of memory/swapper issues:

BUG: Bad page state in process swapper

Any clue what's going on here?

Best regards,

Oliver

Starting kernel ...

[0.00] Booting Linux on physical CPU 0x00 [0x410fd034]
[0.00] Linux version 5.3.0-rc7-next-20190904-00033-gf19a0a4e7252-dirty 
(alarm@imx8qm) (gcc version 8.3.0 (GCC)) #4 SMP PREEMPT Thu Oct 10 13:23:33 UTC 
2019
[0.00] Machine model: Congatec QMX8 Qseven series
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] cma: Reserved 32 MiB at 0xfe00
[0.00] earlycon: lpuart32 at MMIO 0x5a06 (options '')
[0.00] printk: bootconsole [lpuart32] enabled
[0.00] NUMA: No NUMA configuration found
[0.00] NUMA: Faking a node at [mem 
0x8020-0x00097fff]
[0.00] NUMA: NODE_DATA [mem 0x97f3e2800-0x97f3e3fff]
[0.00] Zone ranges:
[0.00]   DMA32[mem 0x8020-0x]
[0.00]   Normal   [mem 0x0001-0x00097fff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x8020-0x]
[0.00]   node   0: [mem 0x00088000-0x00097fff]
[0.00] Initmem setup node 0 [mem 0x8020-0x00097fff]
[0.00] psci: probing for conduit method from DT.
[0.00] psci: PSCIv1.1 detected in firmware.
[0.00] psci: Using standard PSCI v0.2 function IDs
[0.00] psci: MIGRATE_INFO_TYPE not supported.
[0.00] psci: SMC Calling Convention v1.1
[0.00] percpu: Embedded 22 pages/cpu s52952 r8192 d28968 u90112
[0.00] Detected VIPT I-cache on CPU0
[0.00] CPU features: detected: ARM erratum 845719
[0.00] CPU features: detected: GIC system register CPU interface
[0.00] Speculative Store Bypass Disable mitigation not required
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 1547784
[0.00] Policy zone: Normal
[0.00] Kernel command line: console=ttyLP0,115200 root=/dev/mmcblk2p2 
rootwait rw earlycon
[0.00] Dentry cache hash table entries: 1048576 (order: 11, 8388608 
bytes, linear)
[0.00] Inode-cache hash table entries: 524288 (order: 10, 4194304 
bytes, linear)
[0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
[0.00] software IO TLB: mapped [mem 0xfa00-0xfe00] (64MB)
[0.00] BUG: Bad page state in process swapper  pfn:b0001
[0.00] page:fea00040 refcount:0 mapcount:1 
mapping: index:0x0
[0.00] flags: 0x0()
[0.00] raw:    

[0.00] raw:    

[0.00] page dumped because: nonzero mapcount
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.3.0-rc7-next-20190904-00033-gf19a0a4e7252-dirty #4
[0.00] Hardware name: Congatec QMX8 Qseven series (DT)
[0.00] Call trace:
[0.00]  dump_backtrace+0x0/0x140
[0.00]  show_stack+0x14/0x20
[0.00]  dump_stack+0xb0/0xf4
[0.00]  bad_page+0xec/0x118
[0.00]  free_pages_check_bad+0x70/0xa8
[0.00]  __free_pages_ok+0x2a8/0x2c8
[0.00]  __free_pages+0x4c/0x58
[0.00]  __free_pages_core+0xc0/0xd0
[0.00]  memblock_free_pages+0x10/0x18
[0.00]  memblock_free_all+0x188/0x254
[0.00]  mem_init+0x48/0x58
[0.00]  start_kernel+0x254/0x484
[0.00] Disabling lock debugging due to kernel taint
[0.00] BUG: Bad page state in process swapper  pfn:b0002
[0.00] page:fea00080 refcount:0 mapcount:1 
mapping: index:0x0
[0.00] flags: 0x0()
[0.00] raw:    

[0.00] raw:    

[0.00] page dumped because: nonzero mapcount
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Tainted: GB 
5.3.0-rc7-next-20190904-00033-gf19a0a4e7252-dirty #4
[0.00] Hardware name: Congatec QMX8 Qseven series (DT)
[0.00] Call trace:
[0.00]  dump_backtrace+0x0/0x140
[0.00]  show_stack+0x14/0x20
[0.00]  dump_stack+0xb0/0xf4
[0.00]  bad_page+0xec/0x118
[0.00]  free_pages_check_bad+0x70/0xa8
[0.00]  __free_pages_ok+0x2a8/0x2c8
[0.00]  __free_pages+0x4c/0x58

Re: Documentation bug in kernelnewbies.org/StartKernelHacking

2018-02-05 Thread Tobin C. Harding
On Wed, Jan 24, 2018 at 09:40:30PM -0500, Christopher Díaz Riveros wrote:
> Hi, I was reading the StartKernelHacking section from kernelnewbies.org
> site and found that the command:
> 
> scripts/checkpatch.pl --terse --show-types --strict path/to/source/file
> 
> needs to add the --file option before path/to..., if not, checkpatch.pl
> will complain about the non-diff format.

fixed. thanks

Tobin


Documentation bug in kernelnewbies.org/StartKernelHacking

2018-02-01 Thread Christopher Díaz Riveros
Hi, I was reading the StartKernelHacking section of the kernelnewbies.org
site and found that the command:

scripts/checkpatch.pl --terse --show-types --strict path/to/source/file

needs the --file option before path/to...; otherwise checkpatch.pl
complains that the input is not in diff format:

scripts/checkpatch.pl --terse --show-types --strict --file path/to/source/file

I tried to apply the change myself, but obviously I do not have
permission to do so.

Hope it helps and thank you for maintaining the website,
-- 
Christopher Díaz Riveros
Gentoo Linux Developer
GPG Fingerprint: E517 5ECB 8152 98E4 FEBC  2BAA 4DBB D10F 0FDD 2547


Dangling/orphaned shared library as MAP_DENYWRITE result ( BUG)

2017-11-16 Thread Lev Olshvang
Hello list,

I applied the MAP_DENYWRITE flag to a shared object (kernel 4.8). I
compiled a test shared library and a small executable that uses it.

The executable works as expected, and any attempt to change the shared
library is rejected with an ETXTBSY error. But when the executable
terminates, the library is still busy.

I thought that the kernel would clean up the inode counters if nobody
else references the file (and this is my case: I am the only user of
this lib). Is this behaviour a bug? I mean, the reference count is zero,
but the kernel has not zeroed i_writecount.

I suppose that the memory is unmapped, but how can I confirm it? (I do
not have a PID for pmap.)

Regards,
Lev
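
One way to check whether any process still maps the library, without
knowing a PID in advance, is to scan every /proc/<pid>/maps file for the
library path. The following is a minimal sketch under stated assumptions,
not a definitive tool: the helper names maps_mention and pids_mapping and
the file name "libtest.so" are hypothetical, and reading the maps of
other users' processes usually needs root.

```python
import glob
import os


def maps_mention(maps_text, path_fragment):
    """Return True if any line of a /proc/<pid>/maps dump names path_fragment."""
    return any(path_fragment in line for line in maps_text.splitlines())


def pids_mapping(path_fragment, maps_glob="/proc/[0-9]*/maps"):
    """Return the sorted PIDs whose maps file mentions path_fragment."""
    pids = []
    for maps_file in glob.glob(maps_glob):
        try:
            with open(maps_file, errors="replace") as fh:
                text = fh.read()
        except OSError:
            continue  # process already exited, or we may not read its maps
        if maps_mention(text, path_fragment):
            # The directory name between /proc/ and /maps is the PID.
            pids.append(int(os.path.basename(os.path.dirname(maps_file))))
    return sorted(pids)


if __name__ == "__main__":
    # "libtest.so" is a placeholder for the library under test.
    print(pids_mapping("libtest.so"))
```

An empty list only says that no readable process currently maps the
file; it says nothing about the value of i_writecount itself.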


How to identify the cause of " BUG: Bad page map in process python ..." kernel errors?

2017-01-18 Thread Muni Sekhar
Hi All,

I am observing the following statements in the kernel log while
running driver soak tests and monitoring MIPS and memory usage:

[ 8350.824403] BUG: Bad page map in process python  pte:00c0 pmd:950a5067

[ 8351.457234] BUG: Bad page map in process python
pte:34000c0 pmd:950a5067

[ 8351.458469] BUG: Bad page map in process python
pte:406486b3be206e97 pmd:950a5067

….

[ 8353.266053] BUG: Bad page map in process python
pte:406486b3fb696e97 pmd:950a5067

[ 8353.305982] BUG: Bad page map in process python  pte:00c0 pmd:950a5067

[ 8355.738440] BUG: Bad rss-counter state mm:880138fa7100 idx:1 val:259

[ 8374.905314] general protection fault:  [#1] SMP



In the kernel log, I observed that the kernel printed 60 “BUG: Bad page
map in process python ...” reports and 1 “BUG: Bad rss-counter state”
report, and finally crashed with a “general protection fault”.


I would like to know what “BUG: Bad page map in process python ...”
means and how to identify its cause.


To debug this, should I start from the “general protection fault”
stack trace or from the first “BUG: Bad page map in process …” stack trace?
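
A common starting point is the earliest report in the log, since later
reports may be fallout from the first one; this is a general debugging
heuristic, not a kernel rule. A minimal sketch for picking the earliest
report out of a saved log, assuming each line carries the usual
"[ seconds.micros]" timestamp; the name first_report and the sample list
are illustrative only:

```python
import re

# Timestamp prefix such as "[ 8350.824403]" followed by a report marker.
REPORT_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(BUG: .*|general protection fault.*)")


def first_report(lines):
    """Return (timestamp, message) of the earliest report, or None if there is none."""
    reports = []
    for line in lines:
        match = REPORT_RE.search(line)
        if match:
            reports.append((float(match.group(1)), match.group(2).strip()))
    # Tuples compare by timestamp first, so min() yields the earliest report.
    return min(reports) if reports else None


# Lines taken from the log excerpt above, deliberately out of order.
sample = [
    "[ 8374.905314] general protection fault:  [#1] SMP",
    "[ 8350.824403] BUG: Bad page map in process python  pte:00c0 pmd:950a5067",
    "[ 8355.738440] BUG: Bad rss-counter state mm:880138fa7100 idx:1 val:259",
]

if __name__ == "__main__":
    print(first_report(sample))
```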


-- 
Thanks,
Sekhar


Re: Possible Bug

2016-04-01 Thread Roger H Newell
On Thu, Mar 31, 2016 at 11:41 PM, nick <xerofo...@gmail.com> wrote:
>
>
> On 2016-03-31 04:22 PM, Roger H Newell wrote:
>> On Thu, Mar 31, 2016 at 4:53 PM,  <valdis.kletni...@vt.edu> wrote:
>>> On Thu, 31 Mar 2016 15:46:51 -0230, Roger H Newell said:
>>>
>>>> I had a look inside the .config I used to compile this kernel.
>>>> I think I found the information you're looking for.
>>>>
>>>> # CONFIG_KASAN is not set
>>>> # CONFIG_SLAB is not set
>>>> CONFIG_SLUB=y
>>>> # CONFIG_SLOB is not set
>>>
>>> Well, that cuts down on the amount of code that needs to be stared at.
>>>
>>> I don't suppose we get extra-ordinarily lucky and the system was set up to
>>> do crash dumps, was it?
>>>
>>> I've spent a few more minutes looking at the relevant code, and the more I
>>> stare at it, the more I understand why we see the same stack trace in varied
>>> forums going back over a year - it looks like it only craps out if something
>>> during resume or hotplug or similar processing stomps on memory, and the 
>>> next
>>> call to apparmor_file_alloc_security() has to allocate a new slab.
>>>
>>> Or more correctly, it only dies with *this* traceback under those 
>>> conditions.
>>> If something else is next up to allocate a slab, it gets a different 
>>> traceback.
>>>
>>>
>>
>> No it wasn't. There is a file
>> /var/crash/linux-image-4.5.0+.267545.crash. However, its basically the
>> same output that I pasted from dmesg. I've included it anyway in case
>> there are some hints in it.
>>
>> ProblemType: KernelOops
>> Annotation: Your system might become unstable now and might need to be
>> restarted.
>> Date: Thu Mar 31 12:29:19 2016
>> Failure: oops
>> OopsText:
>>  [961778.803501] BUG: unable to handle kernel NULL pointer dereference
>> at 0805
>>  [961778.809728] IP: [] kmem_cache_alloc_trace+0x7b/0x1e0
>>  [961778.815943] PGD cea04067 PUD abb59067 PMD 0
>>  [961778.822149] Oops:  [#3] SMP
>>  [961778.828328] Modules linked in: binfmt_misc snd_hda_codec_realtek
>> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
>> snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
>> snd_rawmidi snd_seq snd_seq_device snd_timer edac_mce_amd snd joydev
>> kvm_amd input_leds edac_core kvm soundcore serio_raw k10temp i2c_piix4
>> 8250_fintek asus_atk0110 mac_hid irqbypass parport_pc ppdev lp parport
>> autofs4 pata_acpi hid_generic usbhid hid amdkfd amd_iommu_v2 radeon
>> i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt
>> fb_sys_fops drm psmouse ahci pata_atiixp libahci r8169 mii wmi
>>  [961778.849223] CPU: 2 PID: 23118 Comm: sign-file Tainted: G  D
>>   4.5.0+ #28
>>  [961778.856339] Hardware name: System manufacturer System Product
>> Name/M5A78L-M LX PLUS, BIOS 040209/20/2011
>>  [961778.863557] task: 88003dbdc100 ti: 88009ae3c000 task.ti:
>> 88009ae3c000
>>  [961778.870811] RIP: 0010:[]  []
>> kmem_cache_alloc_trace+0x7b/0x1e0
>>  [961778.878175] RSP: 0018:88009ae3fc70  EFLAGS: 00010206
>>  [961778.885522] RAX:  RBX: 024080c0 RCX:
>> 0bd44541
>>  [961778.892949] RDX: 0bd44540 RSI: 024080c0 RDI:
>> 00019b20
>>  [961778.900361] RBP: 88009ae3fcb0 R08: 88012fc99b20 R09:
>> 88012b003cc0
>>  [961778.907810] R10: 0805 R11: fefefefefefefeff R12:
>> 024080c0
>>  [961778.915294] R13: 813736d3 R14: 7f9b2ac8c040 R15:
>> 88012b003cc0
>>  [961778.922812] FS:  7f8546f0a700() GS:88012fc8()
>> knlGS:
>>  [961778.930405] CS:  0010 DS:  ES:  CR0: 80050033
>>  [961778.937994] CR2: 0805 CR3: b9cdc000 CR4:
>> 06e0
>>  [961778.945445] Stack:
>>  [961778.952673]  81214fef 88009ae3fccc 0002
>> 880002c28700
>>  [961778.960013]  880002c28700 88009ae3fef4 7f9b2ac8c040
>> 88009ae3fde0
>>  [961778.967372]  88009ae3fcc8 813736d3 81c9fe80
>> 88009ae3fce8
>>  [961778.974682] Call Trace:
>>  [961778.981902]  [] ? lookup_fast+0x16f/0x320
>>  [961778.989161]  [] apparmor_file_alloc_security+0x23/0x40
>>  [961778.996452]  [] security_file_alloc+0x33/0x50
>>  [961779.003495]  [] get_empty_filp+0x9a/0x1c0
>>  [961779.010284]  [] path_openat+0x2e/0x1400
>>  [96

Re: Possible Bug

2016-03-31 Thread Roger H Newell
On Thu, Mar 31, 2016 at 4:53 PM,  <valdis.kletni...@vt.edu> wrote:
> On Thu, 31 Mar 2016 15:46:51 -0230, Roger H Newell said:
>
>> I had a look inside the .config I used to compile this kernel.
>> I think I found the information you're looking for.
>>
>> # CONFIG_KASAN is not set
>> # CONFIG_SLAB is not set
>> CONFIG_SLUB=y
>> # CONFIG_SLOB is not set
>
> Well, that cuts down on the amount of code that needs to be stared at.
>
> I don't suppose we get extra-ordinarily lucky and the system was set up to
> do crash dumps, was it?
>
> I've spent a few more minutes looking at the relevant code, and the more I
> stare at it, the more I understand why we see the same stack trace in varied
> forums going back over a year - it looks like it only craps out if something
> during resume or hotplug or similar processing stomps on memory, and the next
> call to apparmor_file_alloc_security() has to allocate a new slab.
>
> Or more correctly, it only dies with *this* traceback under those conditions.
> If something else is next up to allocate a slab, it gets a different 
> traceback.
>
>

No, it wasn't. There is a file
/var/crash/linux-image-4.5.0+.267545.crash. However, it's basically the
same output that I pasted from dmesg. I've included it anyway in case
there are some hints in it.

ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be
restarted.
Date: Thu Mar 31 12:29:19 2016
Failure: oops
OopsText:
 [961778.803501] BUG: unable to handle kernel NULL pointer dereference
at 0805
 [961778.809728] IP: [] kmem_cache_alloc_trace+0x7b/0x1e0
 [961778.815943] PGD cea04067 PUD abb59067 PMD 0
 [961778.822149] Oops:  [#3] SMP
 [961778.828328] Modules linked in: binfmt_misc snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
snd_rawmidi snd_seq snd_seq_device snd_timer edac_mce_amd snd joydev
kvm_amd input_leds edac_core kvm soundcore serio_raw k10temp i2c_piix4
8250_fintek asus_atk0110 mac_hid irqbypass parport_pc ppdev lp parport
autofs4 pata_acpi hid_generic usbhid hid amdkfd amd_iommu_v2 radeon
i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops drm psmouse ahci pata_atiixp libahci r8169 mii wmi
 [961778.849223] CPU: 2 PID: 23118 Comm: sign-file Tainted: G  D
  4.5.0+ #28
 [961778.856339] Hardware name: System manufacturer System Product
Name/M5A78L-M LX PLUS, BIOS 040209/20/2011
 [961778.863557] task: 88003dbdc100 ti: 88009ae3c000 task.ti:
88009ae3c000
 [961778.870811] RIP: 0010:[]  []
kmem_cache_alloc_trace+0x7b/0x1e0
 [961778.878175] RSP: 0018:88009ae3fc70  EFLAGS: 00010206
 [961778.885522] RAX:  RBX: 024080c0 RCX:
0bd44541
 [961778.892949] RDX: 0bd44540 RSI: 024080c0 RDI:
00019b20
 [961778.900361] RBP: 88009ae3fcb0 R08: 88012fc99b20 R09:
88012b003cc0
 [961778.907810] R10: 0805 R11: fefefefefefefeff R12:
024080c0
 [961778.915294] R13: 813736d3 R14: 7f9b2ac8c040 R15:
88012b003cc0
 [961778.922812] FS:  7f8546f0a700() GS:88012fc8()
knlGS:
 [961778.930405] CS:  0010 DS:  ES:  CR0: 80050033
 [961778.937994] CR2: 0805 CR3: b9cdc000 CR4:
06e0
 [961778.945445] Stack:
 [961778.952673]  81214fef 88009ae3fccc 0002
880002c28700
 [961778.960013]  880002c28700 88009ae3fef4 7f9b2ac8c040
88009ae3fde0
 [961778.967372]  88009ae3fcc8 813736d3 81c9fe80
88009ae3fce8
 [961778.974682] Call Trace:
 [961778.981902]  [] ? lookup_fast+0x16f/0x320
 [961778.989161]  [] apparmor_file_alloc_security+0x23/0x40
 [961778.996452]  [] security_file_alloc+0x33/0x50
 [961779.003495]  [] get_empty_filp+0x9a/0x1c0
 [961779.010284]  [] path_openat+0x2e/0x1400
 [961779.016817]  [] ? walk_component+0x3a/0x470
 [961779.023241]  [] ? alloc_pages_vma+0xbe/0x240
 [961779.029590]  [] do_filp_open+0x7e/0xe0
 [961779.035858]  [] ?
lru_cache_add_active_or_unevictable+0x36/0xb0
 [961779.042118]  [] ? handle_mm_fault+0x1253/0x19e0
 [961779.048323]  [] ? kmem_cache_alloc+0x17a/0x1d0
 [961779.054493]  [] ? __alloc_fd+0x46/0x190
 [961779.060674]  [] do_sys_open+0x124/0x210
 [961779.066821]  [] SyS_open+0x1e/0x20
 [961779.072981]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
 [961779.079150] Code: 08 65 4c 03 05 3f 3e e2 7e 49 83 78 10 00 4d 8b
10 0f 84 14 01 00 00 4d 85 d2 0f 84 0b 01 00 00 49 63 41 20 48 8d 4a
01 49 8b 39 <49> 8b 1c 02 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb
49 63
 [961779.085893] RIP  [] kmem_cache_alloc_trace+0x7b/0x1e0
 [961779.092359]  RSP 
 [961779.098773] CR2: 0805
 [961779.105231] ---[ end trace e7adb7015192b3a5 ]---

Re: Possible Bug

2016-03-31 Thread Valdis . Kletnieks
On Thu, 31 Mar 2016 15:46:51 -0230, Roger H Newell said:

> I had a look inside the .config I used to compile this kernel.
> I think I found the information you're looking for.
>
> # CONFIG_KASAN is not set
> # CONFIG_SLAB is not set
> CONFIG_SLUB=y
> # CONFIG_SLOB is not set

Well, that cuts down on the amount of code that needs to be stared at.

I don't suppose we get extra-ordinarily lucky and the system was set up to
do crash dumps, was it?

I've spent a few more minutes looking at the relevant code, and the more I
stare at it, the more I understand why we see the same stack trace in varied
forums going back over a year - it looks like it only craps out if something
during resume or hotplug or similar processing stomps on memory, and the next
call to apparmor_file_alloc_security() has to allocate a new slab.

Or more correctly, it only dies with *this* traceback under those conditions.
If something else is next up to allocate a slab, it gets a different traceback.





Re: Possible Bug

2016-03-31 Thread Valdis . Kletnieks
On Thu, 31 Mar 2016 14:59:51 -0230, Roger H Newell said:

> I reverted the previous change, and applied the if(f) test in
> file_free. There are no error messages in dmesg and I can mount the
> USB device.

That's because Nick's patch is *still* wrong, as the *real* problem
appears to be a memory corruption issue elsewhere.  You don't see it
every mount because it only explodes if a new slab needs to be allocated
after the memory corruption has happened.



Re: Possible Bug

2016-03-31 Thread Roger H Newell
On Thu, Mar 31, 2016 at 3:22 PM,   wrote:
> On Thu, 31 Mar 2016 09:30:01 -0700, John Johansen said:
>
>> hrmm, the only thing apparmor is doing in this kernel here is a kzalloc and
>> assigning it to f_security, expanding out the aa_alloc_file_context
>> abstraction (which should probably just be dropped) we get.
>>
>>   file->f_security =  kzalloc(sizeof(struct aa_file_cxt), GFP_KERNEL);
>>   if (!file->f_security)
>>   return -ENOMEM;
>>   return 0;
>>
>> So unless we are getting a NULL for the file I don't see how apparmor can be
>> causing the NULL pointer dereference
>
> Now here's the odd part - just before that, we have:
>
> f->f_cred = get_cred(cred);
> error = security_file_alloc(f);
>
> so if f-> was NULL, we should have exploded just *before* the 
> security_file_alloc()
> call.
>
>>> [952620.397309] IP: [] kmem_cache_alloc_trace+0x7b/0x1e0
>
> Aha.  Smoking gun - I should have spotted this before.  f-> isn't the null
> pointer - it's exploding trying to alloc a slab.  You're right, John - it 
> looks
> like somebody did the fandango all over the memory allocator.
>
> Roger - can you find out if this kernel was using SLAB, SLOB, or SLUB as
> the allocator?  And is KASAN enabled or not? (I see a kasan_kmalloc() lurking
> in slab.h)
>
I had a look inside the .config I used to compile this kernel.
I think I found the information you're looking for.

# CONFIG_KASAN is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
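
The four lines above already answer the allocator question: only
CONFIG_SLUB is set. The same check can be run against a full .config
with a short script. This is a minimal sketch; the function name
enabled_options is hypothetical, and it treats an option as enabled only
when it is set to y or m.

```python
def enabled_options(config_text, names):
    """Map each CONFIG_ name to True if it is set to y or m, else False."""
    state = {name: False for name in names}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments and mean "disabled".
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in state:
            state[key] = value in ("y", "m")
    return state


# The fragment quoted in this message.
sample_config = """\
# CONFIG_KASAN is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
"""

if __name__ == "__main__":
    wanted = ["CONFIG_KASAN", "CONFIG_SLAB", "CONFIG_SLUB", "CONFIG_SLOB"]
    print(enabled_options(sample_config, wanted))
```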


Re: Possible Bug

2016-03-31 Thread Valdis . Kletnieks
On Thu, 31 Mar 2016 09:30:01 -0700, John Johansen said:

> hrmm, the only thing apparmor is doing in this kernel here is a kzalloc and
> assigning it to f_security, expanding out the aa_alloc_file_context
> abstraction (which should probably just be dropped) we get.
>
>   file->f_security = kzalloc(sizeof(struct aa_file_cxt), GFP_KERNEL);
>   if (!file->f_security)
>           return -ENOMEM;
>   return 0;
>
> So unless we are getting a NULL for the file I don't see how apparmor can be
> causing the NULL pointer dereference

Now here's the odd part - just before that, we have:

f->f_cred = get_cred(cred);
error = security_file_alloc(f);

so if f-> was NULL, we should have exploded just *before* the
security_file_alloc() call.

>> [952620.397309] IP: [] kmem_cache_alloc_trace+0x7b/0x1e0

Aha.  Smoking gun - I should have spotted this before.  f-> isn't the null
pointer - it's exploding trying to alloc a slab.  You're right, John - it looks
like somebody did the fandango all over the memory allocator.

Roger - can you find out if this kernel was using SLAB, SLOB, or SLUB as
the allocator?  And is KASAN enabled or not? (I see a kasan_kmalloc() lurking
in slab.h)




Re: Possible Bug

2016-03-31 Thread Roger H Newell
On Thu, Mar 31, 2016 at 2:12 PM, nick  wrote:
>
>
> On 2016-03-31 12:30 PM, Carlo Caione wrote:
>> On Thu, Mar 31, 2016 at 5:08 PM, Roger H Newell  
>> wrote:
>>> On Thu, Mar 31, 2016 at 12:18 PM, nick  wrote:


>>>> On 2016-03-31 08:34 AM, Roger H Newell wrote:
>>>> In the fs/file_table.c file as from the root directory of your kernel tree
>>>> change in the function,
>>>> get_empty_filp change these lines:
>>>>         if (unlikely(error)) {
>>>>                 file_free(f);
>>>>                 return ERR_PTR(error);
>>>>         }
>>>> to:
>>>>         if (unlikely(error))
>>>>                 return ERR_PTR(error);
>>>> and tell me if that fixes your issue.
>>>> Nick
>>>
>>>
>>> Seems to have worked, the error is is gone and I can mount the USB device.
>>
>> That's not a fix, you are leaking f.
>>
> Good catch, it seems:
> static inline void file_free(struct file *f)
> {
>         percpu_counter_dec(&nr_files);
>         if (f)
>                 call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
> }
> Roger, can you test this and see if it fixes your issue? The file
> is fs/file_table.c from the root of the kernel directory.
> Thanks,
> Nick

I reverted the previous change, and applied the if(f) test in
file_free. There are no error messages in dmesg and I can mount the
USB device.


Re: Possible Bug

2016-03-31 Thread Valdis . Kletnieks
On Thu, 31 Mar 2016 13:55:57 -0230, nick said:

> >>> In the fs/file_table.c file as from the root directory of your kernel 
> >>> tree change in the function,
> >>> get_empty_filp change these lines:
> >>>         if (unlikely(error)) {
> >>>                 file_free(f);
> >>>                 return ERR_PTR(error);
> >>>         }
> >>> to:
> >>>         if (unlikely(error))
> >>>                 return ERR_PTR(error);
> >>> and tell me if that fixes your issue.
> >>> Nick

This is an incorrect fix, as the crash happens in security_file_alloc() -
before it ever even *reaches* the if statement.

In addition, you just leaked a reference on f->f_cred by
bypassing the put_cred() that file_free() calls.

If this happens to work, it's by accident, and is merely papering over
a more serious problem.

Spotting the reference leak is (or should have been) a 3 or 5 minute task -
look at the code, see there's a get_FOO() call, and ask where the matching
put_FOO() is. There's a get_cred() you need to have hit to get here - so
*somebody* needs to do a put_cred(). And then looking at the body of
file_free() *should* have shown you that your proposed fix is incredibly
incorrect.
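
The get/put pairing described above can be modelled in a few lines. This
is a toy Python model under stated assumptions, not kernel code: the
class Cred and the functions open_with_cleanup and open_without_cleanup
are hypothetical stand-ins for a refcounted object and for the two
versions of the error path. It only shows that skipping the cleanup call
leaves the count one too high.

```python
class Cred:
    """Toy stand-in for a reference-counted object such as struct cred."""

    def __init__(self):
        self.usage = 1  # the reference held by the original owner


def get_cred(cred):
    """Take a reference."""
    cred.usage += 1
    return cred


def put_cred(cred):
    """Drop a reference."""
    cred.usage -= 1


def open_with_cleanup(cred, fail):
    """Error path drops the reference it took, as the cleanup call does."""
    ref = get_cred(cred)
    if fail:
        put_cred(ref)
        return None
    return ref


def open_without_cleanup(cred, fail):
    """Error path returns early without put_cred(): the reference leaks."""
    ref = get_cred(cred)
    if fail:
        return None
    return ref


if __name__ == "__main__":
    balanced, leaky = Cred(), Cred()
    open_with_cleanup(balanced, fail=True)
    open_without_cleanup(leaky, fail=True)
    print(balanced.usage, leaky.usage)
```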

Seriously Nick - please stop this. You're detracting from valuable developer
resources by submitting these incorrect fixes.




Re: Possible Bug

2016-03-31 Thread Carlo Caione
On Thu, Mar 31, 2016 at 5:08 PM, Roger H Newell  wrote:
> On Thu, Mar 31, 2016 at 12:18 PM, nick  wrote:
>>
>>
>> On 2016-03-31 08:34 AM, Roger H Newell wrote:
>> In fs/file_table.c (relative to the root of your kernel tree), in the
>> function get_empty_filp, change these lines:
>>  if (unlikely(error)) {
>>  file_free(f);
>>  return ERR_PTR(error);
>>  }
>> to:
>> if (unlikely(error))
>> return ERR_PTR(error);
>> and tell me if that fixes your issue.
>> Nick
>
>
> Seems to have worked; the error is gone and I can mount the USB device.

That's not a fix, you are leaking f.

-- 
Carlo Caione



Re: Possible Bug

2016-03-31 Thread Roger H Newell
On Thu, Mar 31, 2016 at 1:41 PM, nick <xerofo...@gmail.com> wrote:
>
>
> On 2016-03-31 11:08 AM, Roger H Newell wrote:
>> On Thu, Mar 31, 2016 at 12:18 PM, nick <xerofo...@gmail.com> wrote:
>>>
>>>
>>> On 2016-03-31 08:34 AM, Roger H Newell wrote:
>>>> Hi:
>>>>
>>>> I think I may have stumbled upon a USB bug. Before I send it off to
>>>> one of the larger lists I thought I should run it through here to be
>>>> sure its a bug and I have all the information. Could someone have a
>>>> look and advise ?
>>>>
>>>> I was having a problem mounting up a USB drive, so I had a look at
>>>> dmesg. The output is as follows. I'm running 4.5.0+ from gregs
>>>> staging-testing tree.
>>>>
>>>> [952620.256859] usb 1-6: new high-speed USB device number 4 using ehci-pci
>>>> [952620.389797] usb 1-6: New USB device found, idVendor=0781, 
>>>> idProduct=5530
>>>> [952620.389807] usb 1-6: New USB device strings: Mfr=1, Product=2,
>>>> SerialNumber=3
>>>> [952620.389813] usb 1-6: Product: Cruzer
>>>> [952620.389818] usb 1-6: Manufacturer: SanDisk
>>>> [952620.389823] usb 1-6: SerialNumber: 20060876510A09733592
>>>> [952620.397158] BUG: unable to handle kernel NULL pointer dereference
>>>> at 0805
>>>> [952620.397309] IP: [] kmem_cache_alloc_trace+0x7b/0x1e0
>>>> [952620.397427] PGD 3db56067 PUD cb6cd067 PMD 0
>>>> [952620.397511] Oops:  [#1] SMP
>>>> [952620.397573] Modules linked in: binfmt_misc snd_hda_codec_realtek
>>>> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
>>>> snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
>>>> snd_rawmidi snd_seq snd_seq_device snd_timer edac_mce_amd snd joydev
>>>> kvm_amd input_leds edac_core kvm soundcore serio_raw k10temp i2c_piix4
>>>> 8250_fintek asus_atk0110 mac_hid irqbypass parport_pc ppdev lp parport
>>>> autofs4 pata_acpi hid_generic usbhid hid amdkfd amd_iommu_v2 radeon
>>>> i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt
>>>> fb_sys_fops drm psmouse ahci pata_atiixp libahci r8169 mii wmi
>>>> [952620.398620] CPU: 1 PID: 18445 Comm: mtp-probe Not tainted 4.5.0+ #28
>>>> [952620.398726] Hardware name: System manufacturer System Product
>>>> Name/M5A78L-M LX PLUS, BIOS 040209/20/2011
>>>> [952620.398884] task: 88009bf68d00 ti: 8800499f task.ti:
>>>> 8800499f
>>>> [952620.399006] RIP: 0010:[]  []
>>>> kmem_cache_alloc_trace+0x7b/0x1e0
>>>> [952620.399158] RSP: 0018:8800499f3c70  EFLAGS: 00010206
>>>> [952620.399246] RAX:  RBX: 024080c0 RCX:
>>>> 0ae98088
>>>> [952620.399362] RDX: 0ae98087 RSI: 024080c0 RDI:
>>>> 00019b20
>>>> [952620.399477] RBP: 8800499f3cb0 R08: 88012fc59b20 R09:
>>>> 88012b003cc0
>>>> [952620.399593] R10: 0805 R11: fefefefefefefeff R12:
>>>> 024080c0
>>>> [952620.399709] R13: 813736d3 R14: 7f9bfa435040 R15:
>>>> 88012b003cc0
>>>> [952620.399826] FS:  7f550c9a48c0() GS:88012fc4()
>>>> knlGS:
>>>> [952620.399956] CS:  0010 DS:  ES:  CR0: 80050033
>>>> [952620.400050] CR2: 0805 CR3: ce839000 CR4:
>>>> 06e0
>>>> [952620.400165] Stack:
>>>> [952620.400201]  024080c0 8120bb2c 0002
>>>> 88000227d500
>>>> [952620.400335]  88000227d500 8800499f3ef4 7f9bfa435040
>>>> 8800499f3de0
>>>> [952620.400467]  8800499f3cc8 813736d3 81c9fe80
>>>> 8800499f3ce8
>>>> [952620.400599] Call Trace:
>>>> [952620.400649]  [] ? get_empty_filp+0x5c/0x1c0
>>>> [952620.400748]  [] 
>>>> apparmor_file_alloc_security+0x23/0x40
>>>> [952620.400861]  [] security_file_alloc+0x33/0x50
>>>> [952620.400961]  [] get_empty_filp+0x9a/0x1c0
>>>> [952620.401057]  [] path_openat+0x2e/0x1400
>>>> [952620.401149]  [] ? walk_component+0x3a/0x470
>>>> [952620.401246]  [] ? path_init+0x1d9/0x330
>>>> [952620.401339]  [] ? __inc_zone_page_state+0x35/0x40
>>>> [952620.401444]  [] ? putname+0x54/0x60
>>>> [952620.401530]  [] do_filp_

Re: Possible Bug

2016-03-31 Thread Roger H Newell
On Thu, Mar 31, 2016 at 12:18 PM, nick <xerofo...@gmail.com> wrote:
>
>
> On 2016-03-31 08:34 AM, Roger H Newell wrote:
>> Hi:
>>
>> I think I may have stumbled upon a USB bug. Before I send it off to
>> one of the larger lists I thought I should run it through here to be
>> sure its a bug and I have all the information. Could someone have a
>> look and advise ?
>>
>> I was having a problem mounting up a USB drive, so I had a look at
>> dmesg. The output is as follows. I'm running 4.5.0+ from gregs
>> staging-testing tree.
>>
>> [952620.256859] usb 1-6: new high-speed USB device number 4 using ehci-pci
>> [952620.389797] usb 1-6: New USB device found, idVendor=0781, idProduct=5530
>> [952620.389807] usb 1-6: New USB device strings: Mfr=1, Product=2,
>> SerialNumber=3
>> [952620.389813] usb 1-6: Product: Cruzer
>> [952620.389818] usb 1-6: Manufacturer: SanDisk
>> [952620.389823] usb 1-6: SerialNumber: 20060876510A09733592
>> [952620.397158] BUG: unable to handle kernel NULL pointer dereference
>> at 0805
>> [952620.397309] IP: [] kmem_cache_alloc_trace+0x7b/0x1e0
>> [952620.397427] PGD 3db56067 PUD cb6cd067 PMD 0
>> [952620.397511] Oops:  [#1] SMP
>> [952620.397573] Modules linked in: binfmt_misc snd_hda_codec_realtek
>> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
>> snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
>> snd_rawmidi snd_seq snd_seq_device snd_timer edac_mce_amd snd joydev
>> kvm_amd input_leds edac_core kvm soundcore serio_raw k10temp i2c_piix4
>> 8250_fintek asus_atk0110 mac_hid irqbypass parport_pc ppdev lp parport
>> autofs4 pata_acpi hid_generic usbhid hid amdkfd amd_iommu_v2 radeon
>> i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt
>> fb_sys_fops drm psmouse ahci pata_atiixp libahci r8169 mii wmi
>> [952620.398620] CPU: 1 PID: 18445 Comm: mtp-probe Not tainted 4.5.0+ #28
>> [952620.398726] Hardware name: System manufacturer System Product
>> Name/M5A78L-M LX PLUS, BIOS 040209/20/2011
>> [952620.398884] task: 88009bf68d00 ti: 8800499f task.ti:
>> 8800499f
>> [952620.399006] RIP: 0010:[]  []
>> kmem_cache_alloc_trace+0x7b/0x1e0
>> [952620.399158] RSP: 0018:8800499f3c70  EFLAGS: 00010206
>> [952620.399246] RAX:  RBX: 024080c0 RCX:
>> 0ae98088
>> [952620.399362] RDX: 0ae98087 RSI: 024080c0 RDI:
>> 00019b20
>> [952620.399477] RBP: 8800499f3cb0 R08: 88012fc59b20 R09:
>> 88012b003cc0
>> [952620.399593] R10: 0805 R11: fefefefefefefeff R12:
>> 024080c0
>> [952620.399709] R13: 813736d3 R14: 7f9bfa435040 R15:
>> 88012b003cc0
>> [952620.399826] FS:  7f550c9a48c0() GS:88012fc4()
>> knlGS:
>> [952620.399956] CS:  0010 DS:  ES:  CR0: 80050033
>> [952620.400050] CR2: 0805 CR3: ce839000 CR4:
>> 06e0
>> [952620.400165] Stack:
>> [952620.400201]  024080c0 8120bb2c 0002
>> 88000227d500
>> [952620.400335]  88000227d500 8800499f3ef4 7f9bfa435040
>> 8800499f3de0
>> [952620.400467]  8800499f3cc8 813736d3 81c9fe80
>> 8800499f3ce8
>> [952620.400599] Call Trace:
>> [952620.400649]  [] ? get_empty_filp+0x5c/0x1c0
>> [952620.400748]  [] apparmor_file_alloc_security+0x23/0x40
>> [952620.400861]  [] security_file_alloc+0x33/0x50
>> [952620.400961]  [] get_empty_filp+0x9a/0x1c0
>> [952620.401057]  [] path_openat+0x2e/0x1400
>> [952620.401149]  [] ? walk_component+0x3a/0x470
>> [952620.401246]  [] ? path_init+0x1d9/0x330
>> [952620.401339]  [] ? __inc_zone_page_state+0x35/0x40
>> [952620.401444]  [] ? putname+0x54/0x60
>> [952620.401530]  [] do_filp_open+0x7e/0xe0
>> [952620.401620]  [] ? kmem_cache_alloc_trace+0x1c5/0x1e0
>> [952620.401728]  [] ? kmem_cache_alloc+0x17a/0x1d0
>> [952620.401829]  [] ? getname_flags+0x56/0x1f0
>> [952620.401924]  [] ? __alloc_fd+0x46/0x190
>> [952620.402016]  [] do_sys_open+0x124/0x210
>> [952620.402107]  [] ? SyS_access+0x1e8/0x230
>> [952620.402200]  [] SyS_open+0x1e/0x20
>> [952620.402286]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
>> [952620.402391] Code: 08 65 4c 03 05 3f 3e e2 7e 49 83 78 10 00 4d 8b
>> 10 0f 84 14 01 00 00 4d 85 d2 0f 84 0b 01 00 00 49 63 41 20 48 8d 4a
>> 01 49 8b 39 <49> 8b 1c 02 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb
>> 49 63
>> [952620.402934] RIP  [] kmem_cache_

Re: Possible Bug

2016-03-31 Thread Valdis . Kletnieks
On Thu, 31 Mar 2016 10:04:47 -0230, Roger H Newell said:

> I think I may have stumbled upon a USB bug. Before I send it off to

Looks like an apparmor bug, not USB. Quite likely the same problem as these
guys hit, as the traceback is the same:

http://askubuntu.com/questions/748119/ubuntu-15-10-hangs-after-suspend-resume-inspiron-7559
https://github.com/IRATI/stack/issues/470
(And other hits)

Seems to be a long-standing issue, that second link is from Feb 2015. On
the other hand, all the hits appear to be in mailing lists *other* than
ones where apparmor guys were likely to see it.

I'm adding a cc: to the apparmor guys.

> I was having a problem mounting up a USB drive, so I had a look at
> dmesg. The output is as follows. I'm running 4.5.0+ from gregs
> staging-testing tree.
>
> [952620.256859] usb 1-6: new high-speed USB device number 4 using ehci-pci
> [952620.389797] usb 1-6: New USB device found, idVendor=0781, idProduct=5530
> [952620.389807] usb 1-6: New USB device strings: Mfr=1, Product=2, 
> SerialNumber=3
> [952620.389813] usb 1-6: Product: Cruzer
> [952620.389818] usb 1-6: Manufacturer: SanDisk
> [952620.389823] usb 1-6: SerialNumber: 20060876510A09733592
> [952620.397158] BUG: unable to handle kernel NULL pointer dereference at 
> 0805
> [952620.397309] IP: [] kmem_cache_alloc_trace+0x7b/0x1e0
> [952620.397427] PGD 3db56067 PUD cb6cd067 PMD 0
> [952620.397511] Oops:  [#1] SMP
> [952620.397573] Modules linked in: binfmt_misc snd_hda_codec_realtek
> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
> snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
> snd_rawmidi snd_seq snd_seq_device snd_timer edac_mce_amd snd joydev
> kvm_amd input_leds edac_core kvm soundcore serio_raw k10temp i2c_piix4
> 8250_fintek asus_atk0110 mac_hid irqbypass parport_pc ppdev lp parport
> autofs4 pata_acpi hid_generic usbhid hid amdkfd amd_iommu_v2 radeon
> i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops drm psmouse ahci pata_atiixp libahci r8169 mii wmi
> [952620.398620] CPU: 1 PID: 18445 Comm: mtp-probe Not tainted 4.5.0+ #28
> [952620.398726] Hardware name: System manufacturer System Product 
> Name/M5A78L-M LX PLUS, BIOS 040209/20/2011
> [952620.398884] task: 88009bf68d00 ti: 8800499f task.ti: 
> 8800499f
> [952620.399006] RIP: 0010:[]  [] 
> kmem_cache_alloc_trace+0x7b/0x1e0
> [952620.399158] RSP: 0018:8800499f3c70  EFLAGS: 00010206
> [952620.399246] RAX:  RBX: 024080c0 RCX: 
> 0ae98088
> [952620.399362] RDX: 0ae98087 RSI: 024080c0 RDI: 
> 00019b20
> [952620.399477] RBP: 8800499f3cb0 R08: 88012fc59b20 R09: 
> 88012b003cc0
> [952620.399593] R10: 0805 R11: fefefefefefefeff R12: 
> 024080c0
> [952620.399709] R13: 813736d3 R14: 7f9bfa435040 R15: 
> 88012b003cc0
> [952620.399826] FS:  7f550c9a48c0() GS:88012fc4() 
> knlGS:
> [952620.399956] CS:  0010 DS:  ES:  CR0: 80050033
> [952620.400050] CR2: 0805 CR3: ce839000 CR4: 
> 06e0
> [952620.400165] Stack:
> [952620.400201]  024080c0 8120bb2c 0002 
> 88000227d500
> [952620.400335]  88000227d500 8800499f3ef4 7f9bfa435040 
> 8800499f3de0
> [952620.400467]  8800499f3cc8 813736d3 81c9fe80 
> 8800499f3ce8
> [952620.400599] Call Trace:
> [952620.400649]  [] ? get_empty_filp+0x5c/0x1c0
> [952620.400748]  [] apparmor_file_alloc_security+0x23/0x40
> [952620.400861]  [] security_file_alloc+0x33/0x50
> [952620.400961]  [] get_empty_filp+0x9a/0x1c0
> [952620.401057]  [] path_openat+0x2e/0x1400
> [952620.401149]  [] ? walk_component+0x3a/0x470
> [952620.401246]  [] ? path_init+0x1d9/0x330
> [952620.401339]  [] ? __inc_zone_page_state+0x35/0x40
> [952620.401444]  [] ? putname+0x54/0x60
> [952620.401530]  [] do_filp_open+0x7e/0xe0
> [952620.401620]  [] ? kmem_cache_alloc_trace+0x1c5/0x1e0
> [952620.401728]  [] ? kmem_cache_alloc+0x17a/0x1d0
> [952620.401829]  [] ? getname_flags+0x56/0x1f0
> [952620.401924]  [] ? __alloc_fd+0x46/0x190
> [952620.402016]  [] do_sys_open+0x124/0x210
> [952620.402107]  [] ? SyS_access+0x1e8/0x230
> [952620.402200]  [] SyS_open+0x1e/0x20
> [952620.402286]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
> [952620.402391] Code: 08 65 4c 03 05 3f 3e e2 7e 49 83 78 10 00 4d 8b 10 0f 
> 84 14 01 00 00 4d 85 d2 0f 84 0b 01 00 00 49 63 41 20 48 8d 4a 01 49 8b 39 
> <49> 8b 1c 02 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63
> [952620.402934] RIP  [] kmem_cache_alloc_trace+0x7b/0x1e0
> [952620.403047]  RSP 
> [952620.403106] CR2: 0805
> [95

Possible Bug

2016-03-31 Thread Roger H Newell
Hi:

I think I may have stumbled upon a USB bug. Before I send it off to
one of the larger lists I thought I should run it through here to be
sure it's a bug and I have all the information. Could someone have a
look and advise?

I was having a problem mounting a USB drive, so I had a look at
dmesg. The output is as follows. I'm running 4.5.0+ from Greg's
staging-testing tree.

[952620.256859] usb 1-6: new high-speed USB device number 4 using ehci-pci
[952620.389797] usb 1-6: New USB device found, idVendor=0781, idProduct=5530
[952620.389807] usb 1-6: New USB device strings: Mfr=1, Product=2,
SerialNumber=3
[952620.389813] usb 1-6: Product: Cruzer
[952620.389818] usb 1-6: Manufacturer: SanDisk
[952620.389823] usb 1-6: SerialNumber: 20060876510A09733592
[952620.397158] BUG: unable to handle kernel NULL pointer dereference
at 0805
[952620.397309] IP: [] kmem_cache_alloc_trace+0x7b/0x1e0
[952620.397427] PGD 3db56067 PUD cb6cd067 PMD 0
[952620.397511] Oops:  [#1] SMP
[952620.397573] Modules linked in: binfmt_misc snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
snd_rawmidi snd_seq snd_seq_device snd_timer edac_mce_amd snd joydev
kvm_amd input_leds edac_core kvm soundcore serio_raw k10temp i2c_piix4
8250_fintek asus_atk0110 mac_hid irqbypass parport_pc ppdev lp parport
autofs4 pata_acpi hid_generic usbhid hid amdkfd amd_iommu_v2 radeon
i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops drm psmouse ahci pata_atiixp libahci r8169 mii wmi
[952620.398620] CPU: 1 PID: 18445 Comm: mtp-probe Not tainted 4.5.0+ #28
[952620.398726] Hardware name: System manufacturer System Product
Name/M5A78L-M LX PLUS, BIOS 040209/20/2011
[952620.398884] task: 88009bf68d00 ti: 8800499f task.ti:
8800499f
[952620.399006] RIP: 0010:[]  []
kmem_cache_alloc_trace+0x7b/0x1e0
[952620.399158] RSP: 0018:8800499f3c70  EFLAGS: 00010206
[952620.399246] RAX:  RBX: 024080c0 RCX:
0ae98088
[952620.399362] RDX: 0ae98087 RSI: 024080c0 RDI:
00019b20
[952620.399477] RBP: 8800499f3cb0 R08: 88012fc59b20 R09:
88012b003cc0
[952620.399593] R10: 0805 R11: fefefefefefefeff R12:
024080c0
[952620.399709] R13: 813736d3 R14: 7f9bfa435040 R15:
88012b003cc0
[952620.399826] FS:  7f550c9a48c0() GS:88012fc4()
knlGS:
[952620.399956] CS:  0010 DS:  ES:  CR0: 80050033
[952620.400050] CR2: 0805 CR3: ce839000 CR4:
06e0
[952620.400165] Stack:
[952620.400201]  024080c0 8120bb2c 0002
88000227d500
[952620.400335]  88000227d500 8800499f3ef4 7f9bfa435040
8800499f3de0
[952620.400467]  8800499f3cc8 813736d3 81c9fe80
8800499f3ce8
[952620.400599] Call Trace:
[952620.400649]  [] ? get_empty_filp+0x5c/0x1c0
[952620.400748]  [] apparmor_file_alloc_security+0x23/0x40
[952620.400861]  [] security_file_alloc+0x33/0x50
[952620.400961]  [] get_empty_filp+0x9a/0x1c0
[952620.401057]  [] path_openat+0x2e/0x1400
[952620.401149]  [] ? walk_component+0x3a/0x470
[952620.401246]  [] ? path_init+0x1d9/0x330
[952620.401339]  [] ? __inc_zone_page_state+0x35/0x40
[952620.401444]  [] ? putname+0x54/0x60
[952620.401530]  [] do_filp_open+0x7e/0xe0
[952620.401620]  [] ? kmem_cache_alloc_trace+0x1c5/0x1e0
[952620.401728]  [] ? kmem_cache_alloc+0x17a/0x1d0
[952620.401829]  [] ? getname_flags+0x56/0x1f0
[952620.401924]  [] ? __alloc_fd+0x46/0x190
[952620.402016]  [] do_sys_open+0x124/0x210
[952620.402107]  [] ? SyS_access+0x1e8/0x230
[952620.402200]  [] SyS_open+0x1e/0x20
[952620.402286]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
[952620.402391] Code: 08 65 4c 03 05 3f 3e e2 7e 49 83 78 10 00 4d 8b
10 0f 84 14 01 00 00 4d 85 d2 0f 84 0b 01 00 00 49 63 41 20 48 8d 4a
01 49 8b 39 <49> 8b 1c 02 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb
49 63
[952620.402934] RIP  [] kmem_cache_alloc_trace+0x7b/0x1e0
[952620.403047]  RSP 
[952620.403106] CR2: 0805
[952620.445606] ---[ end trace e7adb7015192b3a3 ]---



Re: Is there a bug in dgnc.ko?

2016-02-24 Thread Navy Cheng
On Wed, Feb 24, 2016 at 05:33:11PM +0530, Sudip Mukherjee wrote:
> On Wed, Feb 24, 2016 at 5:27 PM, Navy Cheng  wrote:
> > On Tue, Feb 23, 2016 at 09:43:56PM -0800, Greg KH wrote:
> >> On Wed, Feb 24, 2016 at 12:57:42PM +0800, Navy Cheng wrote:
> >> > Hi,
> >> >
> >> > My kernel version is v4.4, and I have built drivers/staging/dgnc/dgnc.ko.
> >> > I change to *dir*/drivers/staging/dgnc and do like this:
> >> >
> >> > sudo insmod ./dgnc.ko
> >>
> >> Do you have the hardware that this driver controls?
> >
> > I'm not sure. My laptop is Dell Inspiron 14R - 5437 and I don't know if
> > there is the right hardware. I often don't know about what a driver is used
> > for in drivers/staging/. Is there any good way to know the function of a
> > driver or module?
> >
> >>
> >> > sudo lsmod | grep dgnc
> >>
> >> Does that show anything?
> >
> > Output: dgnc   65536  0
> >
> >> > sudo rmmod ./dgnc
> 
> What did dmesg show after you did rmmod?
> 

*dmesg* shows nothing after I rmmod dgnc. I guess something is wrong with
dgnc_cleanup_module(), which is called when dgnc is removed.




Re: Is there a bug in dgnc.ko?

2016-02-24 Thread Greg KH
On Wed, Feb 24, 2016 at 07:57:01PM +0800, Navy Cheng wrote:
> On Tue, Feb 23, 2016 at 09:43:56PM -0800, Greg KH wrote:
> > On Wed, Feb 24, 2016 at 12:57:42PM +0800, Navy Cheng wrote:
> > > Hi,
> > > 
> > > My kernel version is v4.4, and I have built drivers/staging/dgnc/dgnc.ko.
> > > I change to *dir*/drivers/staging/dgnc and do like this:
> > > 
> > > sudo insmod ./dgnc.ko
> > 
> > Do you have the hardware that this driver controls?
> 
> I'm not sure. My laptop is Dell Inspiron 14R - 5437 and I don't know if
> there is the right hardware. I often don't know about what a driver is used
> for in drivers/staging/. Is there any good way to know the function of a
> driver or module?

If you don't think you have the hardware, then almost always, you don't
have the hardware, it's pretty simple :)

> > 
> > > sudo lsmod | grep dgnc
> > 
> > Does that show anything?
> 
> Output: dgnc       65536  0

Great, it loaded, then crashes when you unload, congratulations, you can
now work on fixing that bug!

good luck,

greg k-h



Re: Is there a bug in dgnc.ko?

2016-02-24 Thread Navy Cheng
On Wed, Feb 24, 2016 at 12:37:40AM -0500, valdis.kletni...@vt.edu wrote:
> On Wed, 24 Feb 2016 12:57:42 +0800, Navy Cheng said:
> > Hi,
> >
> > My kernel version is v4.4, and I have built drivers/staging/dgnc/dgnc.ko.
> > I change to *dir*/drivers/staging/dgnc and do like this:
> >
> > sudo insmod ./dgnc.ko
> 
> What output, if any, did this generate?

No output. I used *dmesg* to find more information:

[  572.915977] dgnc: module is from the staging directory, the quality is
   unknown, you have been warned.

> > sudo lsmod | grep dgnc
> 
> Again, what messages?

Output:
dgnc   65536  0

> > sudo rmmod ./dgnc
> 
> Again, what happened?

No output. *dmesg* shows no new messages in the ring buffer.

> > sudo insmod ./dgnc.ko
> 
> And here?

No output. 


> > After I re-insmod the dgnc module, my laptop is breakdown.
>
> What does "breakdown" mean?  Did it hang entirely? Did you get a message
> in your dmesg output and/or on the console?  Other?

The GUI stops working and no key on my laptop works. The *Caps Lock LED*
on the keyboard is flashing.

> > My OS is debian 8.0. Is there a bug in dgnc.ko or something wrong with my
> > OS or kernel. If there is a bug, How can I find it?
> 
> Start by providing enough info to see if there's a bug.








Re: Is there a bug in dgnc.ko?

2016-02-24 Thread Sudip Mukherjee
On Wed, Feb 24, 2016 at 5:27 PM, Navy Cheng  wrote:
> On Tue, Feb 23, 2016 at 09:43:56PM -0800, Greg KH wrote:
>> On Wed, Feb 24, 2016 at 12:57:42PM +0800, Navy Cheng wrote:
>> > Hi,
>> >
>> > My kernel version is v4.4, and I have built drivers/staging/dgnc/dgnc.ko.
>> > I change to *dir*/drivers/staging/dgnc and do like this:
>> >
>> > sudo insmod ./dgnc.ko
>>
>> Do you have the hardware that this driver controls?
>
> I'm not sure. My laptop is Dell Inspiron 14R - 5437 and I don't know if
> there is the right hardware. I often don't know about what a driver is used
> for in drivers/staging/. Is there any good way to know the function of a
> driver or module?
>
>>
>> > sudo lsmod | grep dgnc
>>
>> Does that show anything?
>
> Output: dgnc   65536  0
>
>> > sudo rmmod ./dgnc

What did dmesg show after you did rmmod?

regards
sudip



Re: Is there a bug in dgnc.ko?

2016-02-24 Thread Navy Cheng
On Tue, Feb 23, 2016 at 09:43:56PM -0800, Greg KH wrote:
> On Wed, Feb 24, 2016 at 12:57:42PM +0800, Navy Cheng wrote:
> > Hi,
> > 
> > My kernel version is v4.4, and I have built drivers/staging/dgnc/dgnc.ko.
> > I change to *dir*/drivers/staging/dgnc and do like this:
> > 
> > sudo insmod ./dgnc.ko
> 
> Do you have the hardware that this driver controls?

I'm not sure. My laptop is Dell Inspiron 14R - 5437 and I don't know if
there is the right hardware. I often don't know about what a driver is used
for in drivers/staging/. Is there any good way to know the function of a
driver or module?

> 
> > sudo lsmod | grep dgnc
> 
> Does that show anything?

Output: dgnc   65536  0

> > sudo rmmod ./dgnc
> > sudo insmod ./dgnc.ko
> > 
> > After I re-insmod the dgnc module, my laptop is breakdown.
> 
> Then there's a bug to fix in the driver, it must not clean up everything
> properly.  Based on a quick read of it, there is lots of things that
> need to be fixed in it, that's why it is in staging.  If you are
> interested, I would suggest fixing this issue would be a great start.

I'm very glad to get your advice on fixing this issue. As a kernel newbie, I
have sent you two code-cleanup patches, and they were merged into
the kernel tree. I'm really interested in fixing this bug to improve my
understanding of the kernel.

Thank you.




Re: Is there a bug in dgnc.ko?

2016-02-23 Thread Greg KH
On Wed, Feb 24, 2016 at 12:57:42PM +0800, Navy Cheng wrote:
> Hi,
> 
> My kernel version is v4.4, and I have built drivers/staging/dgnc/dgnc.ko.
> I change to *dir*/drivers/staging/dgnc and do like this:
> 
> sudo insmod ./dgnc.ko

Do you have the hardware that this driver controls?

> sudo lsmod | grep dgnc

Does that show anything?

> sudo rmmod ./dgnc
> sudo insmod ./dgnc.ko
> 
> After I re-insmod the dgnc module, my laptop is breakdown.

Then there's a bug to fix in the driver; it must not clean up everything
properly.  Based on a quick read of it, there are lots of things that
need to be fixed in it; that's why it is in staging.  If you are
interested, I would suggest fixing this issue would be a great start.
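
The kind of cleanup bug suspected here can be illustrated with a userspace
toy. This is not the dgnc code, and the three resource names are
hypothetical; it only assumes a driver that claims several global
registrations at load time. It shows the usual init/exit symmetry (unwind
in reverse order) and what happens to a second load when the exit path
forgets one resource.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy "global registrations"; each may be claimed only once. */
static bool toy_chrdev_registered;
static bool toy_class_created;
static bool toy_pci_registered;

static int toy_claim(bool *slot)
{
    if (*slot)
        return -1;          /* still held by a previous load */
    *slot = true;
    return 0;
}

static void toy_release(bool *slot)
{
    assert(*slot);          /* releasing something never claimed is a bug */
    *slot = false;
}

/* Claims three resources, unwinding in reverse order on failure. */
static int toy_module_init(void)
{
    if (toy_claim(&toy_chrdev_registered))
        goto fail;
    if (toy_claim(&toy_class_created))
        goto undo_chrdev;
    if (toy_claim(&toy_pci_registered))
        goto undo_class;
    return 0;

undo_class:
    toy_release(&toy_class_created);
undo_chrdev:
    toy_release(&toy_chrdev_registered);
fail:
    return -1;
}

/* `complete` == false models an exit path that forgets one resource. */
static void toy_module_exit(bool complete)
{
    toy_release(&toy_pci_registered);
    if (complete)
        toy_release(&toy_class_created);
    toy_release(&toy_chrdev_registered);
}
```

In the toy the second load fails cleanly with an error. In a real kernel,
stale state left behind by an incomplete exit path can also lead to a crash
or a hang.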

good luck!

greg k-h



Is there a bug in dgnc.ko?

2016-02-23 Thread Navy Cheng
Hi,

My kernel version is v4.4, and I have built drivers/staging/dgnc/dgnc.ko.
I changed to *dir*/drivers/staging/dgnc and ran the following:

sudo insmod ./dgnc.ko
sudo lsmod | grep dgnc
sudo rmmod ./dgnc
sudo insmod ./dgnc.ko

After I re-insmod the dgnc module, my laptop is breakdown.

My OS is Debian 8.0. Is there a bug in dgnc.ko, or is something wrong with my
OS or kernel? If there is a bug, how can I find it?

Thanks.




Re: How to find a bug with lost network messages

2016-02-02 Thread Arthur Pichlkostner

I just know that netif_rx() should be updated to netif_rx_ni() for newer 
kernels.
Without the change I had NOHZ errors in the log; the same change was made in 
SLCAN.
Maybe this is the origin of your problem.

You can try our fork at https://github.com/tjohann/sllin which includes many 
improvements and fixes compared to the original driver from 2013.

On Tue, Feb 02, 2016 at 10:09:20AM +0100, Sandro Stiller wrote:
> Hello,
> 
> I'm struggeling with a network driver (sllin[1]) which is not in the 
> official kernel.
> It has a lot in common with the slcan driver but is used for LIN networks.
> The problem is, that sometimes messages sent to the network layer via 
> netif_rx() don't arrive in all listening programs.
> 
> This is how the driver works:
> 1. The application sends CAN messages to the network interface
> 2. The driver forwards it to the UART (tty)
> 3. The UART receives the same message (single-wire connection, RX and TX 
> connected) and sends it back to the network layer
> 4. The sending application receives the previously sent message and can 
> check for transmission errors and appended LIN slave replies.
> 
> Sometimes the last point (4.) does not work after 10 - 40 seconds of 
> transmission.
> The application does not receive the message using a blocking read() on 
> the socket, but other processes receive it (running candump on the 
> interface). netif_rx() always returns 0.
> 
> If more programs are listening (running multiple instances of candump), 
> the problem appears less often or never.
> On my PC there is no problem, it occures on ARM only.
> I'm using kernel 4.1.
> 
> Can you give me a hint where to search for the cause of this behaviour?
> 
> Thank you very much.
> 
> Sandro
> 
> 
> [1]: https://github.com/sstiller/sllin/tree/master/sllin
> 



How to find a bug with lost network messages

2016-02-02 Thread Sandro Stiller
Hello,

I'm struggling with a network driver (sllin[1]) which is not in the 
official kernel.
It has a lot in common with the slcan driver but is used for LIN networks.
The problem is that sometimes messages sent to the network layer via 
netif_rx() don't arrive in all listening programs.

This is how the driver works:
1. The application sends CAN messages to the network interface
2. The driver forwards it to the UART (tty)
3. The UART receives the same message (single-wire connection, RX and TX 
connected) and sends it back to the network layer
4. The sending application receives the previously sent message and can 
check for transmission errors and appended LIN slave replies.

Sometimes the last point (4.) does not work after 10 - 40 seconds of 
transmission.
The application does not receive the message using a blocking read() on 
the socket, but other processes receive it (running candump on the 
interface). netif_rx() always returns 0.

If more programs are listening (running multiple instances of candump), 
the problem appears less often or never.
On my PC there is no problem; it occurs on ARM only.
I'm using kernel 4.1.

Can you give me a hint where to search for the cause of this behaviour?

Thank you very much.

Sandro


[1]: https://github.com/sstiller/sllin/tree/master/sllin



Re: System freezes with Kernel BUG at kernel/time/timer.c:1108 run_timer_softirq()

2015-10-19 Thread Mulyadi Santosa
On Wed, Oct 14, 2015 at 11:02 AM, Raghavendra. S
 wrote:
>
> Hi,
>
>   On my x86 laptop I observe random panics with a fatal exception in
> interrupt. We are testing Wi-Fi with our own driver; the issue occurs randomly
> after 2-3 hours of testing.
>
>   The panic is due to a fatal exception in interrupt, and the kernel stack trace is
> run_timer_softirq()->__run_timers()->cascade()->BUG_ON(tbase_get_base(timer->base) != base);
>
>
>
>
>
>   Any pointers to debug this will be of great help.
>
>
> -Raghu
>
First of all, please send email in plain text mode. If you need to
send a picture, it is better to upload it somewhere else and provide us with
the link.

I am just guessing:
it might be that when the timer is about to run, or is in the middle of
running, the data structure that represents the timer gets deleted.

Sounds like a race condition, then.

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com



Do you know the bug of EXPORT_SYMBOL()

2015-07-24 Thread Navy
Hi
To my understanding, EXPORT_SYMBOL() is used to export a symbol from the
kernel or from modules. The addresses of all symbols are in /proc/kallsyms.
Only symbols exported by EXPORT_SYMBOL() have their CRC information listed
in Module.symvers. So I think the CRC is the key to exporting a symbol.
I did an experiment:

   ---mdir
   |
   |---Mod1
   |   |---mod1.c
   |   |---Makefile
   |
   |---Mod2
       |---mod2.c
       |---Makefile

mod1.c defines the function *void myfunc(void)* and exports it with
EXPORT_SYMBOL(), and the CRC info is shown in Module.symvers. mod2.c
references *myfunc* and compiles successfully. BUT when mod2.ko is insmoded,
an unknown symbol is reported. mod2.ko CAN'T BE INSMODed.
I solved this problem using the method in Documentation/kbuild/modules.txt and
heard this has been a bug since kernel 2.6.

Why has this bug not been fixed?




Re: Do you know the bug of EXPORT_SYMBOL()

2015-07-24 Thread Pranay Srivastava
On Fri, Jul 24, 2015 at 11:19 AM, Navy nav...@126.com wrote:
> Hi
> To my understanding, EXPORT_SYMBOL() is used to export a symbol from the
> kernel or from modules. The addresses of all symbols are in /proc/kallsyms.
> Only symbols exported by EXPORT_SYMBOL() have their CRC information listed
> in Module.symvers. So I think the CRC is the key to exporting a symbol.
> I did an experiment:
>
>    ---mdir
>    |
>    |---Mod1
>    |   |---mod1.c
>    |   |---Makefile
>    |
>    |---Mod2
>        |---mod2.c
>        |---Makefile
>
> mod1.c defines the function *void myfunc(void)* and exports it with
> EXPORT_SYMBOL(), and the CRC info is shown in Module.symvers. mod2.c
> references *myfunc* and compiles successfully. BUT when mod2.ko is insmoded,
> an unknown symbol is reported. mod2.ko CAN'T BE INSMODed.

Your Mod1 must be live before you load Mod2.

The symbols a module exports are kept in separate sections of its ELF
file, named something like __ksymtab_; you can see them with readelf. When
the module is loaded, these symbols are recorded so that find_symbol can
locate them.

When you load a module that depends on those symbols, the load_module
function uses find_symbol to resolve them.

The CRC is checked in check_version only after the symbol has been found,
and even then only if you have CONFIG_MODVERSIONS set in your config.
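
To make the order of operations above concrete, here is a small userspace
sketch in C. It is not kernel code: the table, toy_find_symbol() and
toy_resolve() are made-up stand-ins for the kernel's exported-symbol tables,
find_symbol and check_version, and the CRC values are arbitrary. It only
models the two steps described: look the name up among the symbols exported
by modules that are already loaded, then compare CRCs only if version
checking is enabled.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of one exported symbol: name, owning module, version CRC. */
struct toy_export {
	const char *name;
	const char *owner;
	unsigned long crc;
};

enum toy_result { TOY_OK, TOY_UNKNOWN_SYMBOL, TOY_CRC_MISMATCH };

/* Step 1: search the symbols exported by modules that are already live. */
static const struct toy_export *
toy_find_symbol(const struct toy_export *live, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(live[i].name, name) == 0)
			return &live[i];
	return NULL;	/* nothing live exports this name */
}

/*
 * Step 2: only once the symbol has been found, and only when version
 * checking is enabled (modversions != 0), compare the CRCs.
 */
static enum toy_result
toy_resolve(const struct toy_export *live, size_t n, const char *name,
	    unsigned long wanted_crc, int modversions)
{
	const struct toy_export *sym = toy_find_symbol(live, n, name);

	if (!sym)
		return TOY_UNKNOWN_SYMBOL;
	if (modversions && sym->crc != wanted_crc)
		return TOY_CRC_MISMATCH;
	return TOY_OK;
}
```

With an empty table (mod1 not loaded yet), resolving myfunc ends in the
unknown-symbol result whatever the CRC is, which matches the symptom
reported when mod2.ko is inserted first.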


> I solved this problem with the method in Documentation/kbuild/modules.txt and
> heard that this has been a bug since kernel 2.6.
>
> Why has this bug not been fixed?

We are a long way from 2.6 now. Can you send some details about this bug?




-- 
---P.K.S



Re: Do you know the bug of EXPORT_SYMBOL()

2015-07-24 Thread Navy
On Fri, Jul 24, 2015 at 01:48:57PM +0530, Pranay Srivastava wrote:
> On Fri, Jul 24, 2015 at 11:19 AM, Navy nav...@126.com wrote:
>> Hi
>> To my understanding, EXPORT_SYMBOL() is used to export a symbol in
>>
>> Why has this bug not been fixed?
>
> We are a long way from 2.6 now. Can you send some details about this bug?

Hi Pranay,
The details are here:
https://bugzilla.kernel.org/show_bug.cgi?id=12446#c11
Maybe other people think out-of-tree modules should not be supported.
I don't know how in-tree modules reference symbols from other modules and
can be loaded with insmod without this problem. If you can help me with
this, I will be deeply grateful.




Re: Do you know the bug of EXPORT_SYMBOL()

2015-07-24 Thread Greg KH
On Fri, Jul 24, 2015 at 01:49:42PM +0800, Navy wrote:
> mod1.c defines the function *void myfunc(void)* and exports it with
> EXPORT_SYMBOL(), and the CRC info shows up in Module.symvers. mod2.c
> references *myfunc* and compiles successfully. BUT when mod2.ko is loaded
> with insmod, it complains about an unknown symbol. mod2.ko CAN'T BE LOADED.

Use 'modprobe' after properly installing the kernel modules to the
correct location, and then all will be fine.

greg k-h



Re: Do you know the bug of EXPORT_SYMBOL()

2015-07-24 Thread Abhishek bist
Hi,
This is where modprobe comes into play. If your module depends on another
module, it is recommended to use modprobe.
After compiling your module, run:
1. depmod -a
2. modprobe mod1
3. modprobe mod2

And as far as I know, CONFIG_MODVERSIONS is basically for the module
signature, because of which you can't directly insert a .ko file compiled
on another system.





Re: Do you know the bug of EXPORT_SYMBOL()

2015-07-24 Thread Tal Shorer
With in-tree modules, the compilation process can determine
dependencies and that's why modprobe works with them (insmod doesn't).
How do you want the kernel to know where the symbol comes from? Why
load your mod1 and not my mod3 that also defines myfunc()? Who's going
to call init_module on your mod1? Where in the filesystem should it
look for it?
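
A rough userspace sketch in C of the point about dependencies: insmod is
handed a single file, while modprobe consults a dependency list (depmod
writes one) and loads the prerequisites first. Everything here is
hypothetical: toy_module, toy_lookup(), toy_load() and the fixed-size table
are illustrations, not the data structures of the real tools.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DEPS 4

/* Toy description of a module: its name and the modules it depends on. */
struct toy_module {
	const char *name;
	const char *deps[TOY_MAX_DEPS];	/* unused slots are NULL */
	int loaded;
};

static struct toy_module *
toy_lookup(struct toy_module *mods, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(mods[i].name, name) == 0)
			return &mods[i];
	return NULL;
}

/*
 * modprobe-like load: load every dependency first, then the module itself.
 * Appends names to order[] in the order they were loaded and returns the
 * new count, or -1 if a module is not in the table.
 * Assumes the dependency graph has no cycles.
 */
static int
toy_load(struct toy_module *mods, size_t n, const char *name,
	 const char **order, int count)
{
	struct toy_module *m = toy_lookup(mods, n, name);
	size_t i;

	if (!m)
		return -1;
	if (m->loaded)
		return count;
	for (i = 0; i < TOY_MAX_DEPS && m->deps[i]; i++) {
		count = toy_load(mods, n, m->deps[i], order, count);
		if (count < 0)
			return -1;
	}
	m->loaded = 1;
	order[count++] = m->name;
	return count;
}
```

Asking this sketch for mod2 loads mod1 first because the table says so; a
bare insmod has no such table to consult, which is the gap the questions
above point at.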




opening a BUG in kernel bugzilla (https://bugzilla.kernel.org/)

2015-06-12 Thread Kevin Wilson
Hi all,
Can we open a bug in the kernel bugzilla (https://bugzilla.kernel.org/)
for a problem which was found on an image built from the most recent state
of Linus' tree (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
but is not reproducible (yet) on the official kernel images on
https://www.kernel.org/ ?

Or could it be even worse? I mean, could we open bugs against **each**
of the images under kernel.org (for example, 4.1-rc7, which is the
latest)? Or only for those labeled as longterm or stable?

Regards,
Kevin



Re: opening a BUG in kernel bugzilla (https://bugzilla.kernel.org/)

2015-06-12 Thread Greg KH
On Fri, Jun 12, 2015 at 11:09:10AM +0300, Kevin Wilson wrote:
> Hi all,
> Can we open a bug in the kernel bugzilla (https://bugzilla.kernel.org/)
> for a problem which was found on an image built from the most recent state
> of Linus' tree (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
> but is not reproducible (yet) on the official kernel images on
> https://www.kernel.org/ ?
>
> Or could it be even worse? I mean, could we open bugs against **each**
> of the images under kernel.org (for example, 4.1-rc7, which is the
> latest)? Or only for those labeled as longterm or stable?

Even better, email the developers and the mailing list for the area of
the bug and don't worry about bugzilla, and hopefully the bug will be
resolved before the final release.

thanks,

greg k-h



bug kernel v4.0.1

2015-05-04 Thread Albino Biasutti Neto
Hi,

I compiled kernel v4.0.1 from the linux-stable git tree, and it has a bug.

The kernel panics and does not get through booting. I went back to 3.16.


Albino



Re: bug kernel v4.0.1

2015-05-04 Thread nick


On 2015-05-04 12:37 PM, Albino Biasutti Neto wrote:
> Hi
>
> I compiled kernel v4.0.1 from the linux-stable git tree, and it has a bug.
>
> The kernel panics and does not get through booting. I went back to 3.16.
>
> Albino

Albino,
Since this is a kernel panic, it would be nice if you could send us the
backtrace of function calls shown on your screen when the panic happens.
Otherwise it is impossible for us to trace a kernel panic during boot.
Nick
 



when I use kgdb debugging kernel, I have added the kgdbwait to the kernel boot options, bug the system can't stop at booting for my gdb to connect

2015-01-09 Thread sizel
When I use kgdb to debug the kernel, I have added kgdbwait to the kernel
boot options, but the system does not stop during boot for my gdb to connect.


In the gdb session, I type continue. It outputs: The program is not run


BUG: unable to handle kernel NULL pointer dereference at (null)

2014-10-23 Thread Kevin Wilson
Hello,
I am getting this message when running a patched kernel:
BUG: unable to handle kernel NULL pointer dereference at (null)

Is there some kernel configuration item which enables getting the proper
value instead of (null)? (I assume it should be a method name.)


Regards,
Kevin



Re: BUG: unable to handle kernel NULL pointer dereference at (null)

2014-10-23 Thread Paul Bolle
On Thu, 2014-10-23 at 09:58 +0300, Kevin Wilson wrote:
> I am getting this message when running a patched kernel:
> BUG: unable to handle kernel NULL pointer dereference at (null)

That is just what you get when the kernel dereferences NULL, isn't it?
(If the kernel dereferences, say, something halfway into some struct, you'll
see a hex address instead of (null).) Assuming x86, see:
arch/x86/mm/fault.c:show_fault_oops() and lib/vsprintf.c:pointer().
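
A userspace C sketch of the distinction above, for illustration only.
toy_fault_address() is a made-up helper that imitates the formatting
described (it is not the code in fault.c or vsprintf.c), and struct toy_dev
is an arbitrary example struct. Accessing a member through a NULL struct
pointer faults at the member's offset rather than at address zero, which is
why the message then shows a small hex address instead of (null).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct toy_dev {
	char name[16];
	int  irq;		/* sits at a non-zero offset in the struct */
};

/*
 * Format a faulting address the way the message in this thread reads:
 * "(null)" for address zero, a zero-padded hex number otherwise.
 */
static void toy_fault_address(uintptr_t addr, char *buf, size_t len)
{
	if (addr == 0)
		snprintf(buf, len, "(null)");
	else
		snprintf(buf, len, "%016llx", (unsigned long long)addr);
}

/* Address that would fault when reading dev->irq with dev == NULL. */
static uintptr_t toy_null_member_address(void)
{
	return (uintptr_t)offsetof(struct toy_dev, irq);
}
```

So the (null) in the message is the faulting data address, not a missing
function name; the function involved is what the backtrace is for.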

> Is there some kernel configuration item which enables getting the proper
> value instead of (null)? (I assume it should be a method name.)

Doesn't your BUG also print a backtrace that points at a (possible)
culprit?


Paul Bolle




Bug Patch

2014-09-08 Thread nick
Hey Guys,
Found a bug and attempted to fix it. I am attaching the patch. There are no
build or checkpatch errors, and I also checked whether I need to clean up
any memory when returning, and this seems to be true.
Nick
From d5f7b8929bebcf0b12d8e402932b790f61786168 Mon Sep 17 00:00:00 2001
From: Nicholas Krause xerofo...@gmail.com
Date: Mon, 8 Sep 2014 05:57:09 -0400
Subject: [PATCH] staging: Fix ieee_80211_rx.c to check for Null allocated skb

In ieee_80211_rx.c we may have a Null allocated sub in parse_subframe
and need to check if the allocated skb is NUll. If it is return -ENOMEM.

Signed-off-by: Nicholas Krause xerofo...@gmail.com
---
 drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c b/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c
index 73410cc..dc8520d 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211_rx.c
@@ -847,6 +847,8 @@ static u8 parse_subframe(struct sk_buff *skb,
 #else
 			/* Allocate new skb for releasing to upper layer */
 			sub_skb = dev_alloc_skb(nSubframe_Length + 12);
+			if (!sub_skb)
+				return -ENOMEM;
 			skb_reserve(sub_skb, 12);
 			data_ptr = (u8 *)skb_put(sub_skb, nSubframe_Length);
 			memcpy(data_ptr,skb->data,nSubframe_Length);
--
1.9.1



Re: Bug Patch

2014-09-08 Thread Sudip Mukherjee
On Mon, Sep 8, 2014 at 4:51 PM, Tobias S. Josefowitz
t.josefow...@gmail.com wrote:
> Hi Nick,
>
> parse_subframe() is a static function and the only caller assumes that
> skb is != NULL and would be in trouble way before parse_subframe() if
> skb was indeed NULL.
>
> Tobi

Hi Tobi,
I think Nick's patch is about dev_alloc_skb(nSubframe_Length + 12).
There is no error check on the return value of dev_alloc_skb, and
it can return NULL if the allocation fails.
I admit that return -ENOMEM is wrong, but I still think Nick has found
something this time.

sudip




Re: Bug Patch

2014-09-08 Thread Doug Wilson
Tobi, Nick,
> I think Nick's patch is about dev_alloc_skb(nSubframe_Length + 12).
> There is no error check on the return value of dev_alloc_skb, and
> it can return NULL if the allocation fails.
> I admit that return -ENOMEM is wrong, but I still think Nick has found
> something this time.


Nick, the patch you sent is doing the right thing, but like Tobi
mentioned, -ENOMEM is wrong.
dev_alloc_skb internally calls __netdev_alloc_skb, and the comment on
top of that function says
*%NULL is returned if there is no free memory*. So could you change
the patch accordingly?

For example:
   if (sub_skb == NULL)
           return NULL;
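
The hunk header in the patch shows parse_subframe declared as "static u8",
which is one way to see why a negative errno does not fit there. The
userspace C sketch below is only an illustration: toy_parse_bad() and
toy_parse_ok() are hypothetical stand-ins, malloc() stands in for
dev_alloc_skb(), and returning 0 on failure is an assumption about what a
caller could check, not a statement about how the driver handles it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t u8;

/*
 * Returning -ENOMEM from a function declared u8: the negative value is
 * converted to an unsigned 8-bit number, so the caller sees a positive
 * value instead of an error.
 */
static u8 toy_parse_bad(int alloc_fails)
{
	int ret = alloc_fails ? -ENOMEM : 1;

	return ret;		/* implicit int -> u8 conversion */
}

/*
 * Alternative sketch: treat the return value as a count and report
 * "nothing parsed" when the allocation fails.
 */
static u8 toy_parse_ok(size_t len)
{
	void *buf = malloc(len);	/* stand-in for dev_alloc_skb() */

	if (buf == NULL)
		return 0;		/* allocation failed, no subframes */
	free(buf);
	return 1;			/* one subframe handled */
}
```

A caller of toy_parse_bad() cannot tell the converted error apart from a
legitimate non-zero count, which is the practical problem with the
-ENOMEM version.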

- Doug



Kconfig set default macro bug?

2014-04-09 Thread Matthias Brugger
Hi all,

I'm just playing with the early printk macros of the low level printk.
The address of the UART port is set to a default value [1].
So if you save the config and then enter menuconfig again, you are
not able to get the value changed to a different default value (I suppose
because Kconfig sees the value as set), and there is no way to find out
what the values for other UART ports would be, e.g. when you have
accidentally selected the wrong port.

The only solution is to disable the low level debugging functions,
save the config and open it with menuconfig again.

I'm not quite sure if this is the most convenient way of doing this.
At least the default address should be readable from the config so that
the value can be changed by hand.

Does anyone know if we can consider this to be a bug?

Cheers,
Matthias

[1] 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/Kconfig.debug?id=refs/tags/v3.14#n1007

-- 
motzblog.wordpress.com



Re: Selecting a Linux Kernel Bug

2014-03-25 Thread Greg Freemyer


On March 24, 2014 9:23:01 AM EDT, valdis.kletni...@vt.edu wrote:
> On Mon, 24 Mar 2014 12:22:58 +0530, sanjeev sharma said:
>
>> Thanks, let me subscribe so that I can start working on bugs.
>
> Subscribing to lkml almost guarantees you won't have enough time to
> actually work on bugs.
>
> Note that nobody reads every post in linux-kernel. In fact, nobody who expects
> to have time left over to actually do any real kernel work will read even half.
> Except Alan Cox, but he's actually not human, but about a thousand gnomes
> working in under-ground caves in Swansea. None of the individual gnomes read
> all the postings either, they just work together really well. -- Linus Torvalds
>
> The linux-kernel list has about 4 times the traffic now as when Linus said that.

I subscribe to a few subsystem lists, that is more than I can keep up with.  
I've never had the urge to even try lkml.  For me it's ide/libata, ext4, xfs.  
I should drop ext4 as I rarely read it these days.

Greg
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.



Re: Selecting a Linux Kernel Bug

2014-03-24 Thread sanjeev sharma
Hi Greg

Where are kernel bugs tracked, so that I can look at the open ones?

Regards
Sanjeev Sharma






Re: Selecting a Linux Kernel Bug

2014-03-24 Thread Madper Xie

sanjeev sharma sanjeevsharmae...@gmail.com writes:

> Hi Greg
>
> Where are kernel bugs tracked, so that I can look at the open ones?

Not very sure. Maybe:
https://bugzilla.kernel.org/describecomponents.cgi

I always report bugs to LKML or the related mailing list (e.g. linux-efi),
so you'd better subscribe to them.

-- 
Sent with my mu4e




Re: Selecting a Linux Kernel Bug

2014-03-24 Thread sanjeev sharma
Thanks, let me subscribe so that I can start working on bugs.

Regards
Sanjeev Sharma .






Re: Selecting a Linux Kernel Bug

2014-03-24 Thread Valdis . Kletnieks
On Mon, 24 Mar 2014 12:22:58 +0530, sanjeev sharma said:

> Thanks, let me subscribe so that I can start working on bugs.

Subscribing to lkml almost guarantees you won't have enough time to
actually work on bugs.

Note that nobody reads every post in linux-kernel. In fact, nobody who expects
to have time left over to actually do any real kernel work will read even half.
Except Alan Cox, but he's actually not human, but about a thousand gnomes
working in under-ground caves in Swansea. None of the individual gnomes read
all the postings either, they just work together really well. -- Linus Torvalds

The linux-kernel list has about 4 times the traffic now as when Linus said that.




Re: Selecting a Linux Kernel Bug

2014-03-19 Thread Greg KH
On Wed, Mar 19, 2014 at 09:48:13PM +0530, Ashwin Jha wrote:
 
> Hi All,
>
> I am a first year graduate student at Indian Statistical Institute, Kolkata,
> India. I have to do a small OS assignment. As I always wanted to contribute to
> Linux, I am thinking of working on a Linux kernel bug or a new feature as part
> of my assignment.
>
> I have never worked on Linux kernel before. So, I need some help for selecting
> a bug or a feature that can be resolved in a month's time. The time period is
> not strict but desired. What I really want is a problem that will help me in
> building a good understanding of Linux kernel.

How about looking at drivers/staging/*/TODO ?  There's lots of things
there that need cleanups and help.

Good luck,

greg k-h



Re: My Kernel bug is celebrating 2 years. Can you help me fix it?

2014-03-07 Thread walter harms


On 04.03.2014 22:26, Peter Senna Tschudin wrote:
> I have reported a bug more than two years ago and it is still
> affecting me. The bug report gives some information:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=787299
>
> I have tried basic debug instructions from:
> https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt
>
> And everything works as expected when:
> # echo freeze > /sys/power/state
> # echo disk > /sys/power/state
>
> I have asked for help for fixing it:
> https://lkml.org/lkml/2013/11/1/186
>
> But I don't have a serial port. How can I debug this issue without a
> real serial port? Or what else can I try?

You need to build a kernel with network support and set up netconsole;
you also need a second computer to receive the messages.

> How can I explore the hint
> about the problem only happening with VT-d enabled in BIOS? How can I
> explore the hint about the problem not happening if the option
> nox2apic is passed to the Kernel?

I had never heard about that until now. Obviously there is a bug in several
ACPI implementations that can be triggered. If your system works with
nox2apic as a boot parameter, you should be happy, since a workaround is
available. Information about where it is used can be found here:
http://lxr.free-electrons.com/ident?a=microblaze;i=nox2apic

re,
 wh


> Thank you,
>
> Peter
>
> P.S Yes, it works on Windows.



Re: My Kernel bug is celebrating 2 years. Can you help me fix it?

2014-03-07 Thread Bjorn Helgaas
[+cc Stoney, Yinghai, Suresh, Joerg, Jiang, Pavel, Rafael, linux-pm]

Let's add some folks who know about x2apic and VT-d.  It's hard for
people to magically pick stuff out of the LKML firehose :)

On Tue, Mar 4, 2014 at 2:26 PM, Peter Senna Tschudin
peter.se...@gmail.com wrote:
> I have reported a bug more than two years ago and it is still
> affecting me. The bug report gives some information:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=787299
>
> I have tried basic debug instructions from:
> https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt
>
> And everything works as expected when:
> # echo freeze > /sys/power/state
> # echo disk > /sys/power/state
>
> I have asked for help for fixing it:
> https://lkml.org/lkml/2013/11/1/186
>
> But I don't have a serial port. How can I debug this issue without a
> real serial port? Or what else can I try? How can I explore the hint
> about the problem only happening with VT-d enabled in BIOS? How can I
> explore the hint about the problem not happening if the option
> nox2apic is passed to the Kernel?
>
> Thank you,
>
> Peter
>
> P.S Yes, it works on Windows.



Re: My Kernel bug is celebrating 2 years. Can you help me fix it?

2014-03-07 Thread Peter Hurley
On 03/06/2014 01:27 PM, Bjorn Helgaas wrote:

>> P.S Yes, it works on Windows.

Windows 7 doesn't use x2apic mode.
Windows 8 does and this same problem happened with this model: see
http://forums.toshiba.com/t5/Windows-8-8-1/Resume-from-standby-issue-on-R830-PT321A-01K002/td-p/330110

Seems like your model was recently added to Toshiba's Windows 8
compatibility list: http://support.toshiba-tie.co.jp/windows8/list_au.htm

Check for a more recent BIOS update that may fix this problem.

Regards,
Peter Hurley

PS - kernel bugs are better filed on the kernel bugzilla 
https://bugzilla.kernel.org/



Re: My Kernel bug is celebrating 2 years. Can you help me fix it?

2014-03-07 Thread Peter Senna Tschudin
On Thu, Mar 6, 2014 at 8:08 PM, Peter Hurley pe...@hurleysoftware.com wrote:

> Windows 7 doesn't use x2apic mode.
> Windows 8 does and this same problem happened with this model: see
> http://forums.toshiba.com/t5/Windows-8-8-1/Resume-from-standby-issue-on-R830-PT321A-01K002/td-p/330110

Thank you for the information. The model is similar to mine, probably
the same motherboard. My tests were with Windows 7.

> Seems like your model was recently added to Toshiba's Windows 8
> compatibility list: http://support.toshiba-tie.co.jp/windows8/list_au.htm
>
> Check for a more recent BIOS update that may fix this problem.

I have been doing that weekly since February last year, and there are no
updates available. The bad news is that the model was discontinued, so it is
possible that there will be no more BIOS updates. I'll write to
Toshiba Europe GmbH asking for a fix, but my hopes of getting an answer
are low.

> Regards,
> Peter Hurley
>
> PS - kernel bugs are better filed on the kernel bugzilla
> https://bugzilla.kernel.org/

Thank you!


-- 
Peter



Re: My Kernel bug is celebrating 2 years. Can you help me fix it?

2014-03-05 Thread Peter Senna Tschudin
On Mar 5, 2014 9:27 AM, walter harms wha...@bfs.de wrote:



> On 04.03.2014 22:26, Peter Senna Tschudin wrote:
>> But I don't have a serial port. How can I debug this issue without a
>> real serial port? Or what else can I try?
> You need to build a kernel with network support and set up netconsole;
> you also need a second computer to receive the messages.

I have tried, and it does not work. The kernel does not start running
properly after resume, so there is no network...

>> How can I explore the hint
>> about the problem only happening with VT-d enabled in BIOS? How can I
>> explore the hint about the problem not happening if the option
>> nox2apic is passed to the Kernel?
>
> I had never heard about that until now. Obviously there is a bug in several
> ACPI implementations that can be triggered. If your system works with
> nox2apic as a boot parameter, you should be happy, since a workaround is
> available. Information about where it is used can be found here:
> http://lxr.free-electrons.com/ident?a=microblaze;i=nox2apic

Sorry, I'm not happy with the workaround, I want it to work. Thank you for
the link, I'll check it.

> re,
>  wh
>
>> Thank you,
>>
>> Peter
>>
>> P.S Yes, it works on Windows.


Re: My Kernel bug is celebrating 2 years. Can you help me fix it?

2014-03-05 Thread Thomas Petazzoni
Dear Peter Senna Tschudin,

On Tue, 4 Mar 2014 22:26:37 +0100, Peter Senna Tschudin wrote:
 I have reported a bug more than two years ago and it is still
 affecting me. The bug report gives some information:
 
 https://bugzilla.redhat.com/show_bug.cgi?id=787299
 
 I have tried basic debug instructions from:
 https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt
 
 And everything works as expected when:
 # echo freeze > /sys/power/state
 # echo disk > /sys/power/state
 
 I have asked for help for fixing it:
 https://lkml.org/lkml/2013/11/1/186
 
 But I don't have a serial port. How can I debug this issue without a
 real serial port? Or what else can I try? How can I explore the hint
 about the problem only happening with VT-d enabled in BIOS? How can I
 explore the hint about the problem not happening if the option
 nox2apic is passed to the Kernel?

Something I would try in this situation is to boot with mem=<some value
smaller than the amount of RAM>, and then have the kernel write some
debugging information manually at a fixed location in RAM that has
been reserved by lowering the amount of RAM using mem=. Then, when you
reboot, you can dump what has been left in this memory location. Of
course this requires that 1/ this memory location is not overwritten by
the BIOS/bootloader and 2/ that you can do a warm reset so as not to lose
the contents of the memory. Since I don't do much x86 kernel hacking,
I never had to do that on x86, but I've used this trick a few times on
ARM platforms.

If that works, then it means you can put some debugging details all
over the kernel to find where things hang exactly during the resume
process.
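
A rough sketch of the bookkeeping around this trick, under several
assumptions: RAM is treated as one contiguous range (real x86 memory maps
have holes, so the computed base address is only a starting point), and the
record layout (a magic word, a sequence number and a checkpoint id) is a
hypothetical format that the kernel-side debugging code would have to write.
Neither helper is an existing tool.

```python
import struct

# Hypothetical marker so that stale RAM contents are not read as a record.
MAGIC = 0x42524344
# Little-endian record: magic, sequence number, checkpoint id.
RECORD = struct.Struct("<III")


def mem_param(total_mib, reserved_mib):
    """Return the mem= value and the physical base of the reserved tail.

    Assumes RAM is contiguous from address 0 up to total_mib.
    """
    usable_mib = total_mib - reserved_mib
    if reserved_mib <= 0 or usable_mib <= 0:
        raise ValueError("reserved size must be positive and below total RAM")
    return f"mem={usable_mib}M", usable_mib * 1024 * 1024


def decode_breadcrumbs(raw):
    """Return (sequence, checkpoint) pairs from a dump of the reserved area.

    Decoding stops at the first slot that does not start with MAGIC.
    """
    found = []
    for offset in range(0, len(raw) - RECORD.size + 1, RECORD.size):
        magic, seq, checkpoint = RECORD.unpack_from(raw, offset)
        if magic != MAGIC:
            break
        found.append((seq, checkpoint))
    return found
```

The last checkpoint id returned by decode_breadcrumbs() after the warm reset
would then point at the last debugging marker reached before the hang.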

Another embedded trick is to find a LED that you can easily turn
on/off. It is very useful to see if you reach a given portion of code
or not.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



My Kernel bug is celebrating 2 years. Can you help me fix it?

2014-03-04 Thread Peter Senna Tschudin
I reported a bug more than two years ago and it is still
affecting me. The bug report gives some information:

https://bugzilla.redhat.com/show_bug.cgi?id=787299

I have tried basic debug instructions from:
https://www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt

And everything works as expected when:
# echo freeze > /sys/power/state
# echo disk > /sys/power/state
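
The same document also describes test levels that can be written to
/sys/power/pm_test before triggering a transition, so that the suspend
sequence stops early and resumes by itself. A minimal sketch of stepping
through them, assuming a kernel built with CONFIG_PM_DEBUG and root access;
the function name and the default state "mem" are assumptions made for
illustration only.

```python
from pathlib import Path

# Test levels listed in basic-pm-debugging.txt, from shallowest to deepest.
PM_TEST_LEVELS = ("freezer", "devices", "platform", "processors", "core")


def run_pm_test(level, state="mem", sysfs=Path("/sys/power")):
    """Select one pm_test level, trigger the transition, then reset to none."""
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level}")
    (sysfs / "pm_test").write_text(level)
    try:
        # On a real system this write returns once the test cycle is over.
        (sysfs / "state").write_text(state)
    finally:
        # Always restore normal behaviour, even if the transition failed.
        (sysfs / "pm_test").write_text("none")
```

Running the levels in the order of PM_TEST_LEVELS and noting the first one
that does not come back narrows down which stage of the sequence is involved.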

I have asked for help for fixing it:
https://lkml.org/lkml/2013/11/1/186

But I don't have a serial port. How can I debug this issue without a
real serial port? Or what else can I try? How can I explore the hint
about the problem only happening with VT-d enabled in BIOS? How can I
explore the hint about the problem not happening if the option
nox2apic is passed to the Kernel?

Thank you,

Peter

P.S Yes, it works on Windows.

-- 
Peter


