Re: [PATCH v2 2/6] modpost: fix broken sym->namespace for external module builds

2019-10-03 Thread Shaun Ruffell
On Thu, Oct 03, 2019 at 04:58:22PM +0900, Masahiro Yamada wrote:
> Currently, external module builds produce tons of false-positives:
> 
>   WARNING: module <mod> uses symbol <sym> from namespace <ns>, but does not import it.
> 
> Here, the <ns> part shows a random string.
> 
> When you build external modules, the symbol info of vmlinux and
> in-kernel modules are read from $(objtree)/Module.symvers, but
> read_dump() is buggy in multiple ways:
> 
> [1] When the modpost is run for vmlinux and in-kernel modules,
> sym_extract_namespace() allocates memory for the namespace. On the
> other hand, read_dump() does not, so sym->namespace will point to
> somewhere in the line buffer of get_next_line(). The data in the
> buffer will be replaced soon, and sym->namespace will end up
> pointing to unrelated data. As a result, check_exports() will show
> random strings in the warning messages.
> 
> [2] When there is no namespace, sym_extract_namespace() returns NULL.
> On the other hand, read_dump() sets namespace to an empty string "".
> (However, it will later be replaced with unrelated data due to bug [1].)
> check_exports() shows a warning unless exp->namespace is NULL,
> so every symbol read from read_dump() emits the warning, which is
> mostly a false positive.
> 
> To address [1], sym_add_exported() calls strdup() for s->namespace.
> The namespace from sym_extract_namespace() must be freed to avoid
> a memory leak.
> 
> For [2], I changed the if-conditional in check_exports().
> 
> This commit also fixes sym_add_exported() to set s->namespace correctly
> when the symbol is preloaded.
> 
> Signed-off-by: Masahiro Yamada 
> Reviewed-by: Matthias Maennich 
> ---
> 
> Changes in v2:
>   - Change the approach to deal with ->preloaded
> 
>  scripts/mod/modpost.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 2c644086c412..936d3ad23c83 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -166,7 +166,7 @@ struct symbol {
>   struct module *module;
>   unsigned int crc;
>   int crc_valid;
> - const char *namespace;
> + char *namespace;
>   unsigned int weak:1;
>   unsigned int vmlinux:1;/* 1 if symbol is defined in vmlinux */
>   unsigned int kernel:1; /* 1 if symbol is from kernel
> @@ -348,7 +348,7 @@ static enum export export_from_sec(struct elf_info *elf, unsigned int sec)
>   return export_unknown;
>  }
>  
> -static const char *sym_extract_namespace(const char **symname)
> +static char *sym_extract_namespace(const char **symname)
>  {
>   char *namespace = NULL;
>   char *ns_separator;
> @@ -373,7 +373,6 @@ static struct symbol *sym_add_exported(const char *name, const char *namespace,
>  
>   if (!s) {
>   s = new_symbol(name, mod, export);
> - s->namespace = namespace;
>   } else {
>   if (!s->preloaded) {
>   warn("%s: '%s' exported twice. Previous export was in %s%s\n",
> @@ -384,6 +383,8 @@ static struct symbol *sym_add_exported(const char *name, const char *namespace,
>   s->module = mod;
>   }
>   }
> + free(s->namespace);
> + s->namespace = namespace ? strdup(namespace) : NULL;
>   s->preloaded = 0;
>   s->vmlinux   = is_vmlinux(mod->name);
>   s->kernel= 0;
> @@ -670,7 +671,8 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
>   unsigned int crc;
>   enum export export;
>   bool is_crc = false;
> - const char *name, *namespace;
> + const char *name;
> + char *namespace;
>  
>   if ((!is_vmlinux(mod->name) || mod->is_dot_o) &&
>   strstarts(symname, "__ksymtab"))
> @@ -745,6 +747,7 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
>   name = symname + strlen("__ksymtab_");
>   namespace = sym_extract_namespace(&name);
>   sym_add_exported(name, namespace, mod, export);
> + free(namespace);
>   }
>   if (strcmp(symname, "init_module") == 0)
>   mod->has_init = 1;
> @@ -2193,7 +2196,7 @@ static int check_exports(struct module *mod)
>   else
>   basename = mod->name;
>  
> - if (exp->namespace) {
> + if (exp->namespace && exp->namespace[0]) {
>       add_names
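
The two behavioural changes in this patch (copying the namespace in sym_add_exported() and treating "" like NULL in check_exports()) can be sketched in isolation. This is only an illustration under stated assumptions, not modpost code: struct sym, set_namespace() and has_namespace() are hypothetical names chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for modpost's 'struct symbol'. */
struct sym {
	char *namespace;	/* owned by the symbol, or NULL */
};

/*
 * Mirrors the idea of the two added lines in sym_add_exported():
 * drop whatever was stored before and keep a private copy of the
 * caller's string. Like the patch, a failed strdup() is not treated
 * specially here; it simply leaves the symbol without a namespace.
 */
static void set_namespace(struct sym *s, const char *namespace)
{
	assert(s);
	free(s->namespace);	/* free(NULL) is a no-op */
	s->namespace = namespace ? strdup(namespace) : NULL;
}

/*
 * Mirrors the new condition in check_exports(): both NULL and the
 * empty string mean "this symbol is not in a namespace".
 */
static bool has_namespace(const char *namespace)
{
	return namespace && namespace[0] != '\0';
}
```

With this shape, a symbol read from a Module.symvers line with an empty namespace field stores "" but is still reported as having no namespace, so no warning would be emitted for it.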

Re: [PATCH] modpost: Copy namespace string into 'struct symbol'

2019-10-01 Thread Shaun Ruffell
On Tue, Oct 01, 2019 at 05:19:23PM +0100, Matthias Maennich wrote:
> On Mon, Sep 30, 2019 at 04:20:46PM -0500, Shaun Ruffell wrote:
> > On Fri, Sep 27, 2019 at 09:03:46AM +0100, Matthias Maennich wrote:
> > > On Thu, Sep 26, 2019 at 05:24:46PM -0500, Shaun Ruffell wrote:
> > > > When building an out-of-tree module I was receiving many warnings from
> > > > modpost like:
> > > >
> > > >  WARNING: module dahdi_vpmadt032_loader uses symbol __kmalloc from namespace ts/dahdi-linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
> > > >  WARNING: module dahdi_vpmadt032_loader uses symbol vpmadtreg_register from namespace linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
> > > >  WARNING: module dahdi_vpmadt032_loader uses symbol param_ops_int from namespace ahdi-linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
> > > >  WARNING: module dahdi_vpmadt032_loader uses symbol __init_waitqueue_head from namespace ux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
> > > >  ...
> > > >
> > > > The fundamental issue appears to be that read_dump() is passing a
> > > > pointer to a statically allocated buffer for the namespace which is
> > > > reused as the file is parsed.
> > > 
> > > Hi Shaun,
> > > 
> > > Thanks for working on this. I think you are right about the root cause
> > > of this. I will have a closer look at your fix later today.
> > 
> > Thanks Matthias.
> 
> In the meantime, Masahiro came up with an alternative approach to
> address this problem:
> https://lore.kernel.org/lkml/20190927093603.9140-2-yamada.masah...@socionext.com/
> What do you think about it? It ignores the memory allocation problem that
> you addressed as modpost is a host tool after all. As part of the patch
> series, an alternative format for the namespace ksymtab entry is
> suggested that also changes the way modpost has to deal with it.

Masahiro's patch set looks good to me.

My only comment would be that I felt it preferable for
sym_add_exported() to treat the two string arguments passed to it the
same way. I feel the way it is currently violates the principle of least
surprise. However, I accept this is just my personal opinion.

> > > > @@ -672,7 +696,6 @@ static void handle_modversions(struct module *mod, 
> > > > struct elf_info *info,
> > > > unsigned int crc;
> > > > enum export export;
> > > > bool is_crc = false;
> > > > -   const char *name, *namespace;
> > > >
> > > > if ((!is_vmlinux(mod->name) || mod->is_dot_o) &&
> > > > strstarts(symname, "__ksymtab"))
> > > > @@ -744,9 +767,13 @@ static void handle_modversions(struct module *mod, 
> > > > struct elf_info *info,
> > > > default:
> > > > /* All exported symbols */
> > > > if (strstarts(symname, "__ksymtab_")) {
> > > > +   const char *name, *namespace;
> > > > +
> > > > name = symname + strlen("__ksymtab_");
> > > > namespace = sym_extract_namespace(&name);
> > > > sym_add_exported(name, namespace, mod, export);
> > > > +   if (namespace)
> > > > +   free((char *)name);
> > > 
> > > This probably should free namespace instead.
> > 
> > Given the implementation of sym_extract_namespace below, I believe
> > free((char *)name) is correct.
> 
> Yeah, you are right. I was just noticing the inconsistency and thought
> it was obviously wrong. So, I was wrong. Sorry and thanks for the
> explanation.
> 
> > 
> >  static const char *sym_extract_namespace(const char **symname)
> >  {
> > size_t n;
> > char *dupsymname;
> > 
> > n = strcspn(*symname, ".");
> > if (n < strlen(*symname) - 1) {
> > dupsymname = NOFAIL(strdup(*symname));
> > dupsymname[n] = '\0';
> > *symname = dupsymname;
> > return dupsymname + n + 1;
> > }
> > 
> > return NULL;
> >  }
> > 
> > I agree that freeing name instead of namespace is a little surprising
> > unless you know the implementation of sym_extract_namespace.
> > 
> > I thought about changing the signature of sym_extract_namespace() to
> > make it clear when the symname is used to return a new allocation or
> > not, and given your comment, perhaps I should have.
> 
> I would rather follow-up with Masahiro's approach for now. What do you
> think?

I agree that following-up with Masahiro's patch set is the better
option.

Cheers,
Shaun


Re: [PATCH] modpost: Copy namespace string into 'struct symbol'

2019-09-30 Thread Shaun Ruffell
On Fri, Sep 27, 2019 at 09:03:46AM +0100, Matthias Maennich wrote:
> On Thu, Sep 26, 2019 at 05:24:46PM -0500, Shaun Ruffell wrote:
> > When building an out-of-tree module I was receiving many warnings from
> > modpost like:
> > 
> >  WARNING: module dahdi_vpmadt032_loader uses symbol __kmalloc from namespace ts/dahdi-linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
> >  WARNING: module dahdi_vpmadt032_loader uses symbol vpmadtreg_register from namespace linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
> >  WARNING: module dahdi_vpmadt032_loader uses symbol param_ops_int from namespace ahdi-linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
> >  WARNING: module dahdi_vpmadt032_loader uses symbol __init_waitqueue_head from namespace ux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
> >  ...
> > 
> > The fundamental issue appears to be that read_dump() is passing a
> > pointer to a statically allocated buffer for the namespace which is
> > reused as the file is parsed.
> 
> Hi Shaun,
> 
> Thanks for working on this. I think you are right about the root cause
> of this. I will have a closer look at your fix later today.

Thanks Matthias.

> > @@ -672,7 +696,6 @@ static void handle_modversions(struct module *mod, 
> > struct elf_info *info,
> > unsigned int crc;
> > enum export export;
> > bool is_crc = false;
> > -   const char *name, *namespace;
> > 
> > if ((!is_vmlinux(mod->name) || mod->is_dot_o) &&
> > strstarts(symname, "__ksymtab"))
> > @@ -744,9 +767,13 @@ static void handle_modversions(struct module *mod, 
> > struct elf_info *info,
> > default:
> > /* All exported symbols */
> > if (strstarts(symname, "__ksymtab_")) {
> > +   const char *name, *namespace;
> > +
> > name = symname + strlen("__ksymtab_");
> > namespace = sym_extract_namespace(&name);
> > sym_add_exported(name, namespace, mod, export);
> > +   if (namespace)
> > +   free((char *)name);
> 
> This probably should free namespace instead.

Given the implementation of sym_extract_namespace below, I believe
free((char *)name) is correct.

  static const char *sym_extract_namespace(const char **symname)
  {
size_t n;
char *dupsymname;
  
n = strcspn(*symname, ".");
if (n < strlen(*symname) - 1) {
dupsymname = NOFAIL(strdup(*symname));
dupsymname[n] = '\0';
*symname = dupsymname;
return dupsymname + n + 1;
}
  
return NULL;
  }

I agree that freeing name instead of namespace is a little surprising
unless you know the implementation of sym_extract_namespace.

I thought about changing the signature of sym_extract_namespace() to
make it clear when the symname is used to return a new allocation or
not, and given your comment, perhaps I should have.
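
To make the ownership rule above concrete, here is a minimal sketch of why the start of the allocation (name) is what has to be freed, while the returned namespace is only an interior pointer. It is a stand-alone approximation of the function quoted above, not the modpost source: extract_namespace() and release_extracted() are hypothetical names, plain strdup() with an abort() replaces NOFAIL(), and the split condition is written as an explicit check for a '.' that is not the last character, which should be equivalent for non-empty input.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * For input of the form "name.namespace": duplicate the string, cut it
 * at the first '.', point *symname at the copy, and return a pointer to
 * the namespace part INSIDE that same allocation.
 * For input without a usable '.', leave *symname alone and return NULL.
 */
static const char *extract_namespace(const char **symname)
{
	size_t n;
	char *dup;

	assert(symname && *symname);
	n = strcspn(*symname, ".");
	if ((*symname)[n] != '.' || (*symname)[n + 1] == '\0')
		return NULL;

	dup = strdup(*symname);
	if (!dup)
		abort();
	dup[n] = '\0';
	*symname = dup;		/* caller's name now points at the copy */
	return dup + n + 1;	/* interior pointer: must not be freed */
}

/*
 * The only pointer that free() may be given is the start of the
 * allocation, which is 'name' after a successful extraction.
 */
static void release_extracted(const char *name, const char *namespace)
{
	if (namespace)
		free((char *)name);
}
```

Calling extract_namespace() on a made-up input such as "my_symbol.MY_NS" leaves name pointing at "my_symbol" and returns "MY_NS", exactly one byte past the terminator of name. Passing the returned pointer to free() would be invalid, because it is not the address strdup() returned.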


[PATCH] modpost: Copy namespace string into 'struct symbol'

2019-09-26 Thread Shaun Ruffell
When building an out-of-tree module I was receiving many warnings from
modpost like:

  WARNING: module dahdi_vpmadt032_loader uses symbol __kmalloc from namespace ts/dahdi-linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
  WARNING: module dahdi_vpmadt032_loader uses symbol vpmadtreg_register from namespace linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
  WARNING: module dahdi_vpmadt032_loader uses symbol param_ops_int from namespace ahdi-linux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
  WARNING: module dahdi_vpmadt032_loader uses symbol __init_waitqueue_head from namespace ux/drivers/dahdi/dahdi-version.o: ..., but does not import it.
  ...

The fundamental issue appears to be that read_dump() is passing a
pointer to a statically allocated buffer for the namespace which is
reused as the file is parsed.

This change makes it so that 'struct symbol' holds a copy of the
namespace string in the same way that it holds a copy of the symbol
string. Because a copy is being made, handle_modversions() can now free
the temporary copy.

Fixes: cb9b55d21fe0 ("modpost: add support for symbol namespaces")
Cc: Martijn Coenen 
Cc: Joel Fernandes (Google) 
Cc: Greg Kroah-Hartman 
Cc: Matthias Maennich 
Cc: Jessica Yu 
Signed-off-by: Shaun Ruffell 
---

Hi,

I didn't test that this change works with the namespaces, or investigate why
read_dump() is only called first while building out-of-tree modules, but it does
seem correct to me for the symbol to own the memory backing the namespace
string.
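
For illustration only, the failure mode can be reproduced outside modpost with a parser that reuses one static line buffer. Everything named here (line_buf, load_line(), struct sym, record_namespace()) is hypothetical and merely imitates the pattern of get_next_line() and read_dump(); it is a sketch of the idea, not the actual code.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One shared buffer, overwritten on every call, as a line reader might do. */
static char line_buf[128];

/* Hypothetical stand-in for reading the next line of a dump file. */
static char *load_line(const char *text)
{
	assert(text);
	snprintf(line_buf, sizeof(line_buf), "%s", text);
	return line_buf;
}

struct sym {
	const char *borrowed;	/* points into line_buf: goes stale */
	char *owned;		/* private copy: stays valid */
};

/* Store the same field both ways so the difference can be observed. */
static void record_namespace(struct sym *s, char *field)
{
	s->borrowed = field;
	s->owned = strdup(field);
	if (!s->owned)
		abort();
}
```

If record_namespace() is given load_line("MY_NS") and another line is loaded afterwards, s.owned still reads "MY_NS" while s.borrowed now shows whatever the later line contained, which matches the kind of unrelated text seen in the warnings above.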

I also realize I'm jumping the gun a bit by testing against master before
5.4-rc1 is tagged.

Shaun

 scripts/mod/modpost.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 3961941e8e7a..349832ead200 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -364,6 +364,24 @@ static const char *sym_extract_namespace(const char **symname)
return NULL;
 }
 
+static const char *dup_namespace(const char *namespace)
+{
+   if (!namespace || (namespace[0] == '\0'))
+   return NULL;
+   return NOFAIL(strdup(namespace));
+}
+
+static bool is_equal(const char *n1, const char *n2)
+{
+   if (n1 && !n2)
+   return false;
+   if (!n1 && n2)
+   return false;
+   if (!n1 && !n2)
+   return true;
+   return strcmp(n1, n2) == 0;
+}
+
 /**
  * Add an exported symbol - it may have already been added without a
  * CRC, in this case just update the CRC
@@ -375,7 +393,7 @@ static struct symbol *sym_add_exported(const char *name, const char *namespace,
 
if (!s) {
s = new_symbol(name, mod, export);
-   s->namespace = namespace;
+   s->namespace = dup_namespace(namespace);
} else {
if (!s->preloaded) {
warn("%s: '%s' exported twice. Previous export was in %s%s\n",
@@ -384,6 +402,12 @@ static struct symbol *sym_add_exported(const char *name, const char *namespace,
} else {
/* In case Module.symvers was out of date */
s->module = mod;
+
+   /* In case the namespace was out of date */
+   if (!is_equal(s->namespace, namespace)) {
+   free((char *)s->namespace);
+   s->namespace = dup_namespace(namespace);
+   }
}
}
s->preloaded = 0;
@@ -672,7 +696,6 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
unsigned int crc;
enum export export;
bool is_crc = false;
-   const char *name, *namespace;
 
if ((!is_vmlinux(mod->name) || mod->is_dot_o) &&
strstarts(symname, "__ksymtab"))
@@ -744,9 +767,13 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
default:
/* All exported symbols */
if (strstarts(symname, "__ksymtab_")) {
+   const char *name, *namespace;
+
name = symname + strlen("__ksymtab_");
namespace = sym_extract_namespace(&name);
sym_add_exported(name, namespace, mod, export);
+   if (namespace)
+   free((char *)name);
}
if (strcmp(symname, "init_module") == 0)
mod->has_init = 1;
-- 
2.17.1



Re: [igb] AER timeout - resend.

2015-07-01 Thread Shaun Ruffell
On Wed, Jul 01, 2015 at 11:18:36PM +0200, Ian Kumlien wrote:
> It was actually fixed with a bios upgrade from Super Micro (in this case)
> so I'd investigate that first... =)

Hmm...interesting. Thanks for the reply!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [igb] AER timeout - resend.

2015-07-01 Thread Shaun Ruffell
On Mon, Feb 23, 2015 at 03:56:56PM +0100, Ian Kumlien wrote:
> Sending this to both netdev and kernel since i don't know if it's the
> driver or the pcie AER that does something odd - the machine was
> stable before 3.19 and PCIE AER.
> 
> Everything started out like i first sent to linux nics () intel:
> --
> 
> And today i had some issues and wondered why things were broken, i was met with:
> 
> [950016.366477] pcieport :00:04.0: AER: Uncorrected (Non-Fatal)
> error received: id=0500
> [950016.366495] igb :05:00.0: PCIe Bus Error: severity=Uncorrected
> (Non-Fatal), type=Transaction Layer, id=0500(Requester ID)
> [950016.366502] igb :05:00.0:   device [8086:1521] error
> status/mask=4000/
> [950016.366509] igb :05:00.0:[14] Completion Timeout
> [950016.366519] igb :05:00.0: broadcast error_detected message
> [950016.379742] br0: port 1(enp5s0f0) entered disabled state
> [950016.488213] igb :05:00.0: broadcast slot_reset message
> [950016.588014] igb :05:00.0: broadcast resume message
> [950016.752654] igb :05:00.0: AER: Device recovery successful
> [950019.817249] igb :05:00.1 enp5s0f1: igb: enp5s0f1 NIC Link is
> Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> [950020.699773] igb :05:00.0 enp5s0f0: igb: enp5s0f0 NIC Link is
> Up 1000 Mbps Full Duplex, Flow Control: RX
> [950020.701485] br0: port 1(enp5s0f0) entered forwarding state
> [950020.701504] br0: port 1(enp5s0f0) entered forwarding state
> [976152.448092] ata5: exception Emask 0x50 SAct 0x0 SErr 0x4090800
> action 0xe frozen
> [976152.448100] ata5: irq_stat 0x00400040, connection status changed
> [976152.448107] ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
> [976152.448117] ata5: hard resetting link
> [976152.448134] ata6: exception Emask 0x50 SAct 0x0 SErr 0x4090800
> action 0xe frozen
> [976152.448140] ata6: irq_stat 0x00400040, connection status changed
> [976152.448147] ata6: SError: { HostInt PHYRdyChg 10B8B DevExch }
> [976152.448155] ata6: hard resetting link
> [976153.171195] ata6: SATA link down (SStatus 0 SControl 300)
> [976158.174058] ata6: hard resetting link
> [976158.174110] ata5: SATA link down (SStatus 0 SControl 300)
> [976163.176997] ata5: hard resetting link
> [976163.480133] ata6: SATA link down (SStatus 0 SControl 300)
> [976163.480147] ata6: limiting SATA link speed to 1.5 Gbps
> [976168.483028] ata6: hard resetting link
> [976168.483095] ata5: SATA link down (SStatus 0 SControl 300)
> [976168.483108] ata5: limiting SATA link speed to 1.5 Gbps
> [976173.485907] ata5: hard resetting link
> [976173.789066] ata6: SATA link down (SStatus 0 SControl 310)
> [976173.789080] ata6.00: disabled
> [976173.791066] ata6: EH complete
> [976173.791078] ata5: SATA link down (SStatus 0 SControl 310)
> [976173.791085] ata6.00: detaching (SCSI 5:0:0:0)
> [976173.791090] ata5.00: disabled
> [976173.794073] ata5: EH complete
> [976173.794100] ata5.00: detaching (SCSI 4:0:0:0)
> [976173.794968] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
> [976173.795073] sd 5:0:0:0: [sdb] Synchronize Cache(10) failed:
> Result: hostbyte=0x04 driverbyte=0x00
> [976173.795080] sd 5:0:0:0: [sdb] Stopping disk
> [976173.795108] sd 5:0:0:0: [sdb] Start/Stop Unit failed: Result:
> hostbyte=0x04 driverbyte=0x00
> [976173.797180] sd 4:0:0:0: [sda] Synchronizing SCSI cache
> [976173.797254] sd 4:0:0:0: [sda] Synchronize Cache(10) failed:
> Result: hostbyte=0x04 driverbyte=0x00
> [976173.797261] sd 4:0:0:0: [sda] Stopping disk
> [976173.797285] sd 4:0:0:0: [sda] Start/Stop Unit failed: Result:
> hostbyte=0x04 driverbyte=0x00
> 
> So two out of two disks just failed and aren't replying anymore?
> 
> Seven hours after an AER, this machine's Intel SSDs, which are idle, just
> fail to respond? ;)
> 
> Anyway, will reboot it when i get home - any idea/suggestion is more
> than welcome.

Hi Ian,

Did you ever find a resolution to this? I'm seeing something very
similar where a customer upgrades to 3.19 and then there are AER
errors and the links are brought down but 3.10 works fine.

Thanks,
Shaun



Re: [PATCH 1/2] kexec: Create a new config option CONFIG_KEXEC_FILE for new syscall

2014-08-14 Thread Shaun Ruffell
On Wed, Aug 13, 2014 at 10:42:59AM -0400, Vivek Goyal wrote:
> Currently new system call kexec_file_load() and all the associated code
> compiles if CONFIG_KEXEC=y. But new syscall also compiles purgatory code
> which currently uses gcc option -mcmodel=large. This option seems to be
> available only from gcc 4.4 onwards.
> 
> Hiding new functionality behind a new config option will not break
> existing users of old gcc. Those who wish to enable new functionality
> will require new gcc. Having said that, I am trying to figure out how
> can I move away from using -mcmodel=large but that can take a while.
> 
> I think there are other advantages of introducing this new config
> option. As this option will be enabled only on x86_64, other arches
> don't have to compile generic kexec code which will never be used.
> This new code selects CRYPTO=y and CRYPTO_SHA256=y. And all other
> arches had to do this for CONFIG_KEXEC. Now with introduction
> of new config option, we can remove crypto dependency from other
> arches.
> 
> Now CONFIG_KEXEC_FILE is available only on x86_64. So wherever I
> had CONFIG_X86_64 defined, I got rid of that.
> 
> For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed
> it to "depends on CRYPTO=y". This should be safer as "select" is
> not recursive.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  arch/x86/Kbuild|  4 +---
>  arch/x86/Kconfig   | 18 ++
>  arch/x86/Makefile  |  5 +
>  arch/x86/kernel/Makefile   |  2 +-
>  arch/x86/kernel/crash.c|  6 ++
>  arch/x86/kernel/machine_kexec_64.c | 11 +++
>  arch/x86/purgatory/Makefile|  5 +
>  kernel/kexec.c | 11 +++
>  8 files changed, 42 insertions(+), 20 deletions(-)


Thanks Vivek. It is no surprise but applying this patch resolved my
issue.

Shaun


Re: [PATCH 11/15] purgatory: Core purgatory functionality

2014-08-11 Thread Shaun Ruffell
FYI, it looks like the following patch (committed in
8fc5b4d4121c95482b2583) adds a new requirement to use at least gcc
4.4 to build the kernel?

On Thu, Jun 26, 2014 at 04:33:40PM -0400, Vivek Goyal wrote:
> Create a stand alone relocatable object purgatory which runs between two
> kernels. This name, concept and some code has been taken from kexec-tools.
> Idea is that this code runs after a crash and it runs in minimal environment.
> So keep it separate from rest of the kernel and in long term we will have
> to practically do no maintenance of this code.
> 
> This code also has the logic to verify sha256 hashes of various
> segments which have been loaded into memory. So first we verify that
> the kernel we are jumping to is fine and has not been corrupted and
> make progress only if checksums are verified.
> 
> This code also takes care of copying some memory contents to backup region.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  arch/x86/Kbuild   |   4 ++
>  arch/x86/Makefile |   8 +++
>  arch/x86/purgatory/Makefile   |  30 +++
>  arch/x86/purgatory/entry64.S  | 101 ++
>  arch/x86/purgatory/purgatory.c|  72 +++
>  arch/x86/purgatory/setup-x86_64.S |  58 ++
>  arch/x86/purgatory/stack.S|  19 +++
>  arch/x86/purgatory/string.c   |  13 +
>  8 files changed, 305 insertions(+)
>  create mode 100644 arch/x86/purgatory/Makefile
>  create mode 100644 arch/x86/purgatory/entry64.S
>  create mode 100644 arch/x86/purgatory/purgatory.c
>  create mode 100644 arch/x86/purgatory/setup-x86_64.S
>  create mode 100644 arch/x86/purgatory/stack.S
>  create mode 100644 arch/x86/purgatory/string.c
> 

[snip]

> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> new file mode 100644
> index 000..e5829dd
> --- /dev/null
> +++ b/arch/x86/purgatory/Makefile
> @@ -0,0 +1,30 @@
> +purgatory-y := purgatory.o stack.o setup-x86_$(BITS).o sha256.o entry64.o string.o
> +
> +targets += $(purgatory-y)
> +PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
> +
> +LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib -z nodefaultlib
> +targets += purgatory.ro
> +
> +# Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
> +# in turn leaves some undefined symbols like __fentry__ in purgatory and not
> +# sure how to relocate those. Like kexec-tools, use custom flags.
> +
> +KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes -fno-zero-initialized-in-bss -fno-builtin -ffreestanding -c -MD -Os -mcmodel=large

The above "-mcmodel=large" compiler flag produces the following output on GCC 4.1.2:

  $ make modules_prepare ; gcc --version
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
CC  arch/x86/purgatory/purgatory.o
  arch/x86/purgatory/purgatory.c:1: sorry, unimplemented: code model ‘large’ 
not supported yet
  make[1]: *** [arch/x86/purgatory/purgatory.o] Error 1
  make: *** [archprepare] Error 2
  gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54)
  Copyright (C) 2006 Free Software Foundation, Inc.
  This is free software; see the source for copying conditions.  There is NO
  warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I did a quick search for a discussion that indicates this compiler is now
officially too old to build the kernel, but did not find one.

If this is required, maybe Documentation/Changes needs to be updated
with the new minimum required version?

Cheers,
Shaun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

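A minimal sketch of how a minimum-compiler requirement like the one
discussed above could be expressed in C. The version encoding mirrors the
GCC_VERSION macro in the kernel's compiler-gcc.h; the cut-off value and the
demo_* names are assumptions made for illustration only. The report itself
only establishes that GCC 4.1.2 rejects -mcmodel=large.

```c
#include <assert.h>

/* Same encoding the kernel uses for GCC_VERSION:
 * major * 10000 + minor * 100 + patchlevel. */
static int gcc_version_code(int major, int minor, int patchlevel)
{
	assert(major >= 0 && minor >= 0 && minor < 100 &&
	       patchlevel >= 0 && patchlevel < 100);
	return major * 10000 + minor * 100 + patchlevel;
}

/* Hypothetical cut-off (4.4.0). This is an assumed placeholder, not a
 * value confirmed by GCC or kernel documentation. */
#define DEMO_MIN_GCC_VERSION 40400

/* Return 1 if the given GCC release meets the assumed minimum. */
static int demo_gcc_new_enough(int major, int minor, int patchlevel)
{
	return gcc_version_code(major, minor, patchlevel) >= DEMO_MIN_GCC_VERSION;
}
```

In a real translation unit the arguments would come from the predefined
macros __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__, and the check
would more likely be a preprocessor #if than a run-time call.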

Regression in 3.14: BUG in get_next_timer_interrupt()

2014-04-04 Thread Shaun Ruffell
I just updated one of my development machines to 3.14 from 3.12.5
and hit a BUG in get_next_timer_interrupt().

In order to get to this point, I had to apply commit (30f2555 "ipv6:
some ipv6 statistic counters failed to disable bh") from the current
master on top of 3.14 to get around a lockdep splat. I then also
just updated to the current master, as of April 3 2014 -
4a4389abdd9822fdf3c, and still hit the BUG.

It is consistently reproducible when I log in to the machine over
ssh. I don't have time right now to start searching for the source
but thought I would throw this out there in case someone else hits
it.

   BUG: unable to handle kernel paging request at 6b6b6b77
   IP: [<c104cb10>] get_next_timer_interrupt+0x140/0x230
   *pde = 
   Oops:  [#1] SMP
   Modules linked in: ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 
ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables tun bridge stp 
cpufreq_powersave llc autofs4 cpufreq_ondemand ip6t_REJECT nf_conntrack_ipv6 
nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev 
iTCO_wdt iTCO_vendor_support parport_pc parport microcode ipmi_si 
ipmi_msghandler video pcspkr sg i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp 
pps_core acpi_cpufreq ext4 jbd2 mbcache sd_mod crc_t10dif crct10dif_common ahci 
libahci dm_mirror dm_region_hash dm_log dm_mod
   CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.14.0.sruffell.debug+ #33
   Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS 
S1200BT.86B.02.00.0035.030220120927 03/02/2012
   task: f4dfda00 ti: f4e2a000 task.ti: f4e2a000
   EIP: 0060:[<c104cb10>] EFLAGS: 00210007 CPU: 2
   EIP is at get_next_timer_interrupt+0x140/0x230
   EAX: 6b6b6b6b EBX: 3ffd091a ECX: f4e54a24 EDX: 
   ESI: 0011 EDI: f4e5499c EBP: f4e2befc ESP: f4e2bec8
DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
   CR0: 80050033 CR2: 6b6b6b77 CR3: 019a3000 CR4: 000407d0
   Stack:
 00fffd0a fffd091a 3ffd0919 f4e54160 000a f4e5499c f4e54b9c
f4e54d9c f4e54f9c d98e39d9  f54c2260 f4e2bf54 c10b7325 0002
0001  c10b7520 9d48a980 0018 c1074e4e 9d4a60c0 0018
   Call Trace:
    [<c10b7325>] tick_nohz_stop_sched_tick+0x265/0x3d0
    [<c10b7520>] ? __tick_nohz_idle_enter+0x90/0x130
    [<c1074e4e>] ? sched_clock_cpu+0x13e/0x150
    [<c10b7520>] __tick_nohz_idle_enter+0x90/0x130
    [<c1078e75>] ? set_cpu_sd_state_idle+0x85/0xa0
    [<c10b7622>] tick_nohz_idle_enter+0x32/0x60
    [<c1084f93>] cpu_idle_loop+0x23/0x1c0
    [<c10b4792>] ? clockevents_config_and_register+0x22/0x30
    [<c108514f>] cpu_startup_entry+0x1f/0x30
    [<c102d956>] start_secondary+0x96/0xb0
   Code: 8b 45 dc 05 3c 0e 00 00 89 45 f0 8b 45 d0 83 e0 3f 89 45 e0 89 c6 90 
8d 74 26 00 8b 04 f7 8d 0c f7 39 c8 74 1f 8d b6 00 00 00 00 <f6> 40 0c 01 75 0d 
8b 50 08 39 da 0f 48 da ba 01 00 00 00 8b 00
   EIP: [<c104cb10>] get_next_timer_interrupt+0x140/0x230 SS:ESP 0068:f4e2bec8
   CR2: 6b6b6b77
   ---[ end trace b98242504b80ebf5 ]---

Cheers,
Shaun

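One possible reading of the register dump above, offered as a sketch and
not as a diagnosis: EAX holds 6b6b6b6b and the faulting address in CR2 is
6b6b6b77, i.e. EAX plus 0xc. 0x6b is the byte the slab allocator writes
into freed objects when poisoning is enabled (POISON_FREE in
include/linux/poison.h), so the pattern may mean that a pointer was read
out of already-freed memory and then dereferenced at offset 0xc. The
helper names below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Slab free-poison byte; same value as POISON_FREE in the kernel. */
#define DEMO_POISON_FREE 0x6bu

/* Return 1 if every byte of a 32-bit value equals the free-poison byte. */
static int demo_is_free_poison32(uint32_t value)
{
	return value == 0x01010101u * DEMO_POISON_FREE;
}

/* Offset of the faulting access relative to the suspect base pointer. */
static uint32_t demo_fault_offset(uint32_t fault_addr, uint32_t base)
{
	assert(fault_addr >= base);	/* sketch handles non-negative offsets only */
	return fault_addr - base;
}
```

With the values from the oops, demo_is_free_poison32(0x6b6b6b6b) is true
and demo_fault_offset(0x6b6b6b77, 0x6b6b6b6b) is 0xc. Which object was
freed cannot be determined from the dump alone.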

Re: [3.9-rc1] Bug in bootup code or debug code?

2013-03-20 Thread Shaun Ruffell
On Wed, Mar 20, 2013 at 04:32:14PM +0000, Yu, Fenghua wrote:
> > From: Shaun Ruffell [mailto:sruff...@digium.com]
> > On Tue, Mar 19, 2013 at 10:12:39PM +0000, Yu, Fenghua wrote:
> > > > From: Tetsuo Handa [mailto:penguin-ker...@i-love.sakura.ne.jp]
> > > > H. Peter Anvin wrote:
>  
> > Hi Fenghua,
> > 
> > I ran into the same issue on a test system I use (not a virtual
> > machine) and went through basically the same process as Dave
> > Hansen w/bisecting before finding this thread.
> > 
> > Any chance you could send the patch to the mailing list and I could
> > also throw it on my test system?
> Hi, Shaun,
> 
> The patch is in tip.git tree now. You can get it from:
> http://git.kernel.org/tip/c83a9d5e425d4678b05ca058fec6254f18601474
> 
> Please let us know if the patch fixes the issue you saw.

Thanks for the link. That patch applied on 3.9-rc3 did allow me to boot with my
default kernel config.

Not related to this patch, and not sure it really matters, but FYI: I just
noticed the following warning when building the patched kernel:

WARNING: vmlinux.o(.text+0x2a1a7): Section mismatch in reference from the function apply_microcode_early() to the function .cpuinit.text:print_ucode()
The function apply_microcode_early() references
the function __cpuinit print_ucode().
This is often because apply_microcode_early lacks a __cpuinit
annotation or the annotation of print_ucode is wrong.

Thanks,
Shaun


Re: [3.9-rc1] Bug in bootup code or debug code?

2013-03-20 Thread Shaun Ruffell
On Tue, Mar 19, 2013 at 10:12:39PM +0000, Yu, Fenghua wrote:
> > From: Tetsuo Handa [mailto:penguin-ker...@i-love.sakura.ne.jp]
> > H. Peter Anvin wrote:
> > > This is a CONFIG_DEBUG_VIRTUAL configuration, isn't it?
> > 
> > Yes. CONFIG_MICROCODE_INTEL_EARLY=y && CONFIG_64BIT=n &&
> > CONFIG_DEBUG_VIRTUAL=y
> > on VMware Workstation/Player environment.
> 
> Tetsuo,
> 
> I just now sent out a patch to fix this issue and you are in the list.
> 
> Could you please verify if it fixes the issue you saw?

Hi Fenghua,

I ran into the same issue on a test system I use (not a virtual
machine) and went through basically the same process as Dave
Hansen w/bisecting before finding this thread.

Any chance you could send the patch to the mailing list and I could
also throw it on my test system?

Thanks,
Shaun


Re: [PATCH] x86/perf_events: Fix "section type conflict" build error.

2013-01-20 Thread Shaun Ruffell
Bump. Still an issue with v3.8-rc4. Am I mistaken in thinking this
patch is trivial and obviously correct?

Thanks,
Shaun

On Wed, Jan 09, 2013 at 03:59:42PM -0600, Shaun Ruffell wrote:
> From: Jan Beulich <jbeul...@suse.com>
> 
> This patch fixes a build regression first introduced in 3.7 with
> (e09df47 "perf/x86: Update/fix generic events on P6 PMU").
> 
> At least some older versions of gcc, like (GCC) 4.1.2 20080704 (Red Hat
> 4.1.2-51), dislike mixing constant and non-const data in the same
> section. Without this patch a build will fail with the following error:
> 
>   CC      arch/x86/kernel/cpu/perf_event_p6.o
>   arch/x86/kernel/cpu/perf_event_p6.c:22: error: p6_hw_cache_event_ids causes a section type conflict
>   make[3]: *** [arch/x86/kernel/cpu/perf_event_p6.o] Error 1
> 
> Newer versions of gcc simply emit the section as writable (which isn't what
> we want, but also is not a big problem as it gets discarded post-init anyway).
> 
> Also get the Knight's Corner definitions in sync.
> 
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> Cc: sta...@vger.kernel.org # 3.7.x only
> [sruff...@digium.com: Added details to the commit message.]
> Signed-off-by: Shaun Ruffell <sruff...@digium.com>
> 
> ---
> 
> [I had the wrong stable email address when I previously sent this.
> Sorry about the extra noise.]
> 
> Hans, Thomas,
> 
> Any chance of picking this up for 3.8? This doesn't seem too
> controversial since it only makes sure that data in the __initconst
> section is const.
> 
> Rob Landley also reported this to the list here:
> https://lkml.org/lkml/2012/12/14/511. 
> 
> Thanks,
> Shaun
> 
>  arch/x86/kernel/cpu/perf_event_knc.c | 4 ++--
>  arch/x86/kernel/cpu/perf_event_p6.c  | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_knc.c b/arch/x86/kernel/cpu/perf_event_knc.c
> index 4b7731b..838fa87 100644
> --- a/arch/x86/kernel/cpu/perf_event_knc.c
> +++ b/arch/x86/kernel/cpu/perf_event_knc.c
> @@ -17,7 +17,7 @@ static const u64 knc_perfmon_event_map[] =
>[PERF_COUNT_HW_BRANCH_MISSES]  = 0x002b,
>  };
>  
> -static __initconst u64 knc_hw_cache_event_ids
> +static const u64 __initconst knc_hw_cache_event_ids
>   [PERF_COUNT_HW_CACHE_MAX]
>   [PERF_COUNT_HW_CACHE_OP_MAX]
>   [PERF_COUNT_HW_CACHE_RESULT_MAX] =
> @@ -284,7 +284,7 @@ static struct attribute *intel_knc_formats_attr[] = {
>   NULL,
>  };
>  
> -static __initconst struct x86_pmu knc_pmu = {
> +static const struct x86_pmu knc_pmu __initconst = {
>   .name   = "knc",
>   .handle_irq = knc_pmu_handle_irq,
>   .disable_all= knc_pmu_disable_all,
> diff --git a/arch/x86/kernel/cpu/perf_event_p6.c b/arch/x86/kernel/cpu/perf_event_p6.c
> index f2af39f..b1e2fe1 100644
> --- a/arch/x86/kernel/cpu/perf_event_p6.c
> +++ b/arch/x86/kernel/cpu/perf_event_p6.c
> @@ -19,7 +19,7 @@ static const u64 p6_perfmon_event_map[] =
>  
>  };
>  
> -static __initconst u64 p6_hw_cache_event_ids
> +static const u64 __initconst p6_hw_cache_event_ids
>   [PERF_COUNT_HW_CACHE_MAX]
>   [PERF_COUNT_HW_CACHE_OP_MAX]
>   [PERF_COUNT_HW_CACHE_RESULT_MAX] =
> -- 
> 1.8.1


[PATCH] x86/perf_events: Fix "section type conflict" build error.

2013-01-09 Thread Shaun Ruffell
From: Jan Beulich <jbeul...@suse.com>

This patch fixes a build regression first introduced in 3.7 with
(e09df47 "perf/x86: Update/fix generic events on P6 PMU").

At least some older versions of gcc, like (GCC) 4.1.2 20080704 (Red Hat
4.1.2-51), dislike mixing constant and non-const data in the same
section. Without this patch a build will fail with the following error:

  CC      arch/x86/kernel/cpu/perf_event_p6.o
  arch/x86/kernel/cpu/perf_event_p6.c:22: error: p6_hw_cache_event_ids causes a section type conflict
  make[3]: *** [arch/x86/kernel/cpu/perf_event_p6.o] Error 1

Newer versions of gcc simply emit the section as writable (which isn't what
we want, but also is not a big problem as it gets discarded post-init anyway).

Also get the Knight's Corner definitions in sync.

Signed-off-by: Jan Beulich <jbeul...@suse.com>
Cc: sta...@vger.kernel.org # 3.7.x only
[sruff...@digium.com: Added details to the commit message.]
Signed-off-by: Shaun Ruffell <sruff...@digium.com>

---

[I had the wrong stable email address when I previously sent this.
Sorry about the extra noise.]

Hans, Thomas,

Any chance of picking this up for 3.8? This doesn't seem too
controversial since it only makes sure that data in the __initconst
section is const.

Rob Landley also reported this to the list here:
https://lkml.org/lkml/2012/12/14/511. 

Thanks,
Shaun

 arch/x86/kernel/cpu/perf_event_knc.c | 4 ++--
 arch/x86/kernel/cpu/perf_event_p6.c  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_knc.c b/arch/x86/kernel/cpu/perf_event_knc.c
index 4b7731b..838fa87 100644
--- a/arch/x86/kernel/cpu/perf_event_knc.c
+++ b/arch/x86/kernel/cpu/perf_event_knc.c
@@ -17,7 +17,7 @@ static const u64 knc_perfmon_event_map[] =
   [PERF_COUNT_HW_BRANCH_MISSES]  = 0x002b,
 };
 
-static __initconst u64 knc_hw_cache_event_ids
+static const u64 __initconst knc_hw_cache_event_ids
 	[PERF_COUNT_HW_CACHE_MAX]
 	[PERF_COUNT_HW_CACHE_OP_MAX]
 	[PERF_COUNT_HW_CACHE_RESULT_MAX] =
@@ -284,7 +284,7 @@ static struct attribute *intel_knc_formats_attr[] = {
 	NULL,
 };
 
-static __initconst struct x86_pmu knc_pmu = {
+static const struct x86_pmu knc_pmu __initconst = {
 	.name		= "knc",
 	.handle_irq	= knc_pmu_handle_irq,
 	.disable_all	= knc_pmu_disable_all,
diff --git a/arch/x86/kernel/cpu/perf_event_p6.c b/arch/x86/kernel/cpu/perf_event_p6.c
index f2af39f..b1e2fe1 100644
--- a/arch/x86/kernel/cpu/perf_event_p6.c
+++ b/arch/x86/kernel/cpu/perf_event_p6.c
@@ -19,7 +19,7 @@ static const u64 p6_perfmon_event_map[] =
 
 };
 
-static __initconst u64 p6_hw_cache_event_ids
+static const u64 __initconst p6_hw_cache_event_ids
 	[PERF_COUNT_HW_CACHE_MAX]
 	[PERF_COUNT_HW_CACHE_OP_MAX]
 	[PERF_COUNT_HW_CACHE_RESULT_MAX] =
-- 
1.8.1

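The rule the patch enforces, that everything placed in a read-only init
section is itself const, can be sketched in userspace roughly as follows.
DEMO_INITCONST is a hypothetical stand-in for the kernel's __initconst: it
only names the section, and nothing is discarded after init here. The
table values and the demo_* names are placeholders, not real event
encodings.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical userspace stand-in for the kernel's __initconst. */
#define DEMO_INITCONST __attribute__((section(".init.rodata")))

enum { DEMO_CACHE_MAX = 2, DEMO_OP_MAX = 2 };

/* const first, section attribute after the type, as in the patch.
 * Placeholder values only. */
static const u64 DEMO_INITCONST demo_cache_event_ids
				[DEMO_CACHE_MAX]
				[DEMO_OP_MAX] = {
	[0] = { [0] = 0x0043, [1] = 0x0045 },
	[1] = { [0] = 0x0280, [1] = 0x0281 },
};

/* A second object in the same section. It is const as well, so the
 * section keeps one consistent, read-only type. The struct holds only
 * integers on purpose, to keep relocations out of the picture. */
struct demo_pmu {
	int version;
	int max_events;
};

static const struct demo_pmu demo_pmu_desc DEMO_INITCONST = {
	.version	= 1,
	.max_events	= 2,
};

/* Bounds-checked lookup into the read-only table. */
static u64 demo_event_id(unsigned int cache, unsigned int op)
{
	assert(cache < DEMO_CACHE_MAX && op < DEMO_OP_MAX);
	return demo_cache_event_ids[cache][op];
}

static int demo_max_events(void)
{
	return demo_pmu_desc.max_events;
}
```

Dropping const from either definition while leaving the other one const is
the kind of mix that produces the "section type conflict" error quoted in
the commit message.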

[PATCH] x86/perf_events: Fix "section type conflict" build error.

2013-01-09 Thread Shaun Ruffell
From: Jan Beulich <jbeul...@suse.com>

This patch fixes a build regression first introduced in 3.7 with
(e09df47 "perf/x86: Update/fix generic events on P6 PMU").

At least some older versions of gcc, like (GCC) 4.1.2 20080704 (Red Hat
4.1.2-51), dislike mixing constant and non-const data in the same
section. Without this patch a build will fail with the following error:

  CC      arch/x86/kernel/cpu/perf_event_p6.o
  arch/x86/kernel/cpu/perf_event_p6.c:22: error: p6_hw_cache_event_ids causes a section type conflict
  make[3]: *** [arch/x86/kernel/cpu/perf_event_p6.o] Error 1

Newer versions of gcc simply emit the section as writable (which isn't what
we want, but also is not a big problem as it gets discarded post-init anyway).

Also get the Knight's Corner definitions in sync.

Signed-off-by: Jan Beulich <jbeul...@suse.com>
Cc: sta...@kernel.org # 3.7.x only
[sruff...@digium.com: Added details to the commit message.]
Signed-off-by: Shaun Ruffell <sruff...@digium.com>

---

Hans, Thomas,

Any chance of picking this up for 3.8? This doesn't seem too
controversial since it only makes sure that data in the __initconst
section is const.

Rob Landley also reported this to the list here:
https://lkml.org/lkml/2012/12/14/511. 

Thanks,
Shaun

 arch/x86/kernel/cpu/perf_event_knc.c | 4 ++--
 arch/x86/kernel/cpu/perf_event_p6.c  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_knc.c b/arch/x86/kernel/cpu/perf_event_knc.c
index 4b7731b..838fa87 100644
--- a/arch/x86/kernel/cpu/perf_event_knc.c
+++ b/arch/x86/kernel/cpu/perf_event_knc.c
@@ -17,7 +17,7 @@ static const u64 knc_perfmon_event_map[] =
   [PERF_COUNT_HW_BRANCH_MISSES]  = 0x002b,
 };
 
-static __initconst u64 knc_hw_cache_event_ids
+static const u64 __initconst knc_hw_cache_event_ids
 	[PERF_COUNT_HW_CACHE_MAX]
 	[PERF_COUNT_HW_CACHE_OP_MAX]
 	[PERF_COUNT_HW_CACHE_RESULT_MAX] =
@@ -284,7 +284,7 @@ static struct attribute *intel_knc_formats_attr[] = {
 	NULL,
 };
 
-static __initconst struct x86_pmu knc_pmu = {
+static const struct x86_pmu knc_pmu __initconst = {
 	.name		= "knc",
 	.handle_irq	= knc_pmu_handle_irq,
 	.disable_all	= knc_pmu_disable_all,
diff --git a/arch/x86/kernel/cpu/perf_event_p6.c b/arch/x86/kernel/cpu/perf_event_p6.c
index f2af39f..b1e2fe1 100644
--- a/arch/x86/kernel/cpu/perf_event_p6.c
+++ b/arch/x86/kernel/cpu/perf_event_p6.c
@@ -19,7 +19,7 @@ static const u64 p6_perfmon_event_map[] =
 
 };
 
-static __initconst u64 p6_hw_cache_event_ids
+static const u64 __initconst p6_hw_cache_event_ids
 	[PERF_COUNT_HW_CACHE_MAX]
 	[PERF_COUNT_HW_CACHE_OP_MAX]
 	[PERF_COUNT_HW_CACHE_RESULT_MAX] =
-- 
1.8.1


[PATCH] x86/perf_events: Fix section type conflict build error.

2013-01-09 Thread Shaun Ruffell
From: Jan Beulich jbeul...@suse.com

This patch fixes a build regression first introduced in 3.7 with
(e09df47 perf/x86: Update/fix generic events on P6 PMU).

At least some older versions of gcc, like (GCC) 4.1.2 20080704 (Red Hat
4.1.2-51), dislike mixing constant and non-const data in the same
section. Without this patch a build will fail with the following error:

CC  arch/x86/kernel/cpu/perf_event_p6.o
  arch/x86/kernel/cpu/perf_event_p6.c:22: error: p6_hw_cache_event_ids causes a 
section type conflict
  make[3]: *** [arch/x86/kernel/cpu/perf_event_p6.o] Error 1

Newer versions of gcc simply emits the section as writable (which isn't what
we want, but also is not a big problem as it gets discarded post-init anyway).

Also get the Knight's Corner definitions in sync.

Signed-off-by: Jan Beulich jbeul...@suse.com
Cc: sta...@kernel.org # 3.7.x only
[sruff...@digium.com: Added details to the commit message.]
Signed-off-by: Shaun Ruffell sruff...@digium.com

---

Hans, Thomas,

Any chance of picking this up for 3.8? This doesn't seem too
controversial since it only makes sure that data in the __initconst
section is const.

Rob Landley also reported this to the list here:
https://lkml.org/lkml/2012/12/14/511. 

Thanks,
Shaun

 arch/x86/kernel/cpu/perf_event_knc.c | 4 ++--
 arch/x86/kernel/cpu/perf_event_p6.c  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_knc.c b/arch/x86/kernel/cpu/perf_event_knc.c
index 4b7731b..838fa87 100644
--- a/arch/x86/kernel/cpu/perf_event_knc.c
+++ b/arch/x86/kernel/cpu/perf_event_knc.c
@@ -17,7 +17,7 @@ static const u64 knc_perfmon_event_map[] =
   [PERF_COUNT_HW_BRANCH_MISSES]		= 0x002b,
 };
 
-static __initconst u64 knc_hw_cache_event_ids
+static const u64 __initconst knc_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
 				[PERF_COUNT_HW_CACHE_RESULT_MAX] =
@@ -284,7 +284,7 @@ static struct attribute *intel_knc_formats_attr[] = {
 	NULL,
 };
 
-static __initconst struct x86_pmu knc_pmu = {
+static const struct x86_pmu knc_pmu __initconst = {
 	.name		= "knc",
 	.handle_irq	= knc_pmu_handle_irq,
 	.disable_all	= knc_pmu_disable_all,
diff --git a/arch/x86/kernel/cpu/perf_event_p6.c b/arch/x86/kernel/cpu/perf_event_p6.c
index f2af39f..b1e2fe1 100644
--- a/arch/x86/kernel/cpu/perf_event_p6.c
+++ b/arch/x86/kernel/cpu/perf_event_p6.c
@@ -19,7 +19,7 @@ static const u64 p6_perfmon_event_map[] =
 
 };
 
-static __initconst u64 p6_hw_cache_event_ids
+static const u64 __initconst p6_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
 				[PERF_COUNT_HW_CACHE_RESULT_MAX] =
-- 
1.8.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] x86/perf_events: Fix section type conflict build error.

2013-01-09 Thread Shaun Ruffell
From: Jan Beulich jbeul...@suse.com

This patch fixes a build regression first introduced in 3.7 with
(e09df47 "perf/x86: Update/fix generic events on P6 PMU").

At least some older versions of gcc, like (GCC) 4.1.2 20080704 (Red Hat
4.1.2-51), dislike mixing constant and non-const data in the same
section. Without this patch a build will fail with the following error:

  CC      arch/x86/kernel/cpu/perf_event_p6.o
  arch/x86/kernel/cpu/perf_event_p6.c:22: error: p6_hw_cache_event_ids causes a section type conflict
  make[3]: *** [arch/x86/kernel/cpu/perf_event_p6.o] Error 1

Newer versions of gcc simply emit the section as writable (which isn't what
we want, but also is not a big problem as it gets discarded post-init anyway).

Also get the Knight's Corner definitions in sync.

Signed-off-by: Jan Beulich <jbeul...@suse.com>
Cc: sta...@vger.kernel.org # 3.7.x only
[sruff...@digium.com: Added details to the commit message.]
Signed-off-by: Shaun Ruffell <sruff...@digium.com>

---

[I had the wrong stable email address when I previously sent this.
Sorry about the extra noise.]

Hans, Thomas,

Any chance of picking this up for 3.8? This doesn't seem too
controversial since it only makes sure that data in the __initconst
section is const.

Rob Landley also reported this to the list here:
https://lkml.org/lkml/2012/12/14/511. 

Thanks,
Shaun

 arch/x86/kernel/cpu/perf_event_knc.c | 4 ++--
 arch/x86/kernel/cpu/perf_event_p6.c  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_knc.c b/arch/x86/kernel/cpu/perf_event_knc.c
index 4b7731b..838fa87 100644
--- a/arch/x86/kernel/cpu/perf_event_knc.c
+++ b/arch/x86/kernel/cpu/perf_event_knc.c
@@ -17,7 +17,7 @@ static const u64 knc_perfmon_event_map[] =
   [PERF_COUNT_HW_BRANCH_MISSES]		= 0x002b,
 };
 
-static __initconst u64 knc_hw_cache_event_ids
+static const u64 __initconst knc_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
 				[PERF_COUNT_HW_CACHE_RESULT_MAX] =
@@ -284,7 +284,7 @@ static struct attribute *intel_knc_formats_attr[] = {
 	NULL,
 };
 
-static __initconst struct x86_pmu knc_pmu = {
+static const struct x86_pmu knc_pmu __initconst = {
 	.name		= "knc",
 	.handle_irq	= knc_pmu_handle_irq,
 	.disable_all	= knc_pmu_disable_all,
diff --git a/arch/x86/kernel/cpu/perf_event_p6.c b/arch/x86/kernel/cpu/perf_event_p6.c
index f2af39f..b1e2fe1 100644
--- a/arch/x86/kernel/cpu/perf_event_p6.c
+++ b/arch/x86/kernel/cpu/perf_event_p6.c
@@ -19,7 +19,7 @@ static const u64 p6_perfmon_event_map[] =
 
 };
 
-static __initconst u64 p6_hw_cache_event_ids
+static const u64 __initconst p6_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
 				[PERF_COUNT_HW_CACHE_RESULT_MAX] =
-- 
1.8.1


Re: Incorrect accounting of irq into the running task

2013-01-05 Thread Shaun Ruffell
On Fri, Jan 04, 2013 at 10:22:12AM -0800, Sadasivan Shaiju wrote:
> Hi  Venkatesh,
> 
> I have applied the following patches for the incorrect accounting
> of irq into the running task .
> 
> 
> [PATCH] x86: Add IRQ_TIME_ACCOUNTING
> [e82b8e4ea4f3dffe6e7939f90e78da675fcc450e]
> [PATCH] sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time
> [b52bfee445d315549d41eacf2fa7c156e7d153d5]
> 
> [PATCH] sched: Do not account irq time to current task
> [305e6835e05513406fa12820e40e4a8ecb63743c]
> [PATCH] sched: Export ns irqtimes through /proc/stat
> [abb74cefa9c682fb38ba86c17ca3c86fed6cc464]
> 
> 
> 
> But still the stime and utime of the process in /proc/pid/stat is
> high. I think the above patches does not update The stime and
> utime values in /proc/pid/stat.
> 
> 
> Or am I missing anything?

Just checking that you do have CONFIG_IRQ_TIME_ACCOUNTING=y in your
kernel config?

Cheers,
Shaun


Re: Minimum toolchain requirements?

2012-12-27 Thread Shaun Ruffell
Hi Rob,

On Fri, Dec 14, 2012 at 04:25:10PM -0600, Rob Landley wrote:
> Although the README and Documentation/Changes both say the kernel
> builds with gcc 3.2, this is no longer the case. In reality the new
> 3.7 kernel no longer builds under unpatched gcc 4.2.1 (the last
> GPLv2 release).
> 
> Building for i686 breaks with
> "arch/x86/kernel/cpu/perf_event_p6.c:22: error:
> p6_hw_cache_event_ids causes a section type conflict" (trivial
> workaround: patch kernel so CONFIG_BROKEN_RODATA defaults to y).

I came across your email while searching for a solution to the above
build error.

Besides setting CONFIG_BROKEN_RODATA=y, there is a patch [1] that Jan
Beulich sent to LKML which also fixes this build problem for me.

[1] https://lkml.org/lkml/2012/11/23/308

It doesn't appear to have hit Linus' tree yet, but with luck
someone will pick it up before 3.8 final.

Cheers,
Shaun


Re: [PATCH] x86/perf_events: build fix

2012-12-27 Thread Shaun Ruffell
[forgot to copy lkml]

Hi Jan,

On Fri, Nov 23, 2012 at 04:28:32PM +0000, Jan Beulich wrote:
> At least some older gcc versions dislike mixing constant and non-const
> data in the same section ("... causes a section type conflict"). Newer
> gcc simply emits the section as writable (which isn't what we want, but
> also is not a big problem as it gets discarded post-init anyway).
> 
> Also get the Knight's Corner definitions in sync.
> 
> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> 
> ---
>  arch/x86/kernel/cpu/perf_event_knc.c |4 ++--
>  arch/x86/kernel/cpu/perf_event_p6.c  |2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> --- 3.7-rc6/arch/x86/kernel/cpu/perf_event_knc.c
> +++ 3.7-rc6-x86-perf-initconst/arch/x86/kernel/cpu/perf_event_knc.c
> @@ -17,7 +17,7 @@ static const u64 knc_perfmon_event_map[]
>[PERF_COUNT_HW_BRANCH_MISSES]  = 0x002b,
>  };
>  
> -static __initconst u64 knc_hw_cache_event_ids
> +static const u64 __initconst knc_hw_cache_event_ids
>   [PERF_COUNT_HW_CACHE_MAX]
>   [PERF_COUNT_HW_CACHE_OP_MAX]
>   [PERF_COUNT_HW_CACHE_RESULT_MAX] =
> @@ -284,7 +284,7 @@ static struct attribute *intel_knc_forma
>   NULL,
>  };
>  
> -static __initconst struct x86_pmu knc_pmu = {
> +static const struct x86_pmu knc_pmu __initconst = {
>   .name   = "knc",
>   .handle_irq = knc_pmu_handle_irq,
>   .disable_all= knc_pmu_disable_all,
> --- 3.7-rc6/arch/x86/kernel/cpu/perf_event_p6.c
> +++ 3.7-rc6-x86-perf-initconst/arch/x86/kernel/cpu/perf_event_p6.c
> @@ -19,7 +19,7 @@ static const u64 p6_perfmon_event_map[]
>  
>  };
>  
> -static __initconst u64 p6_hw_cache_event_ids
> +static const u64 __initconst p6_hw_cache_event_ids
>   [PERF_COUNT_HW_CACHE_MAX]
>   [PERF_COUNT_HW_CACHE_OP_MAX]
>   [PERF_COUNT_HW_CACHE_RESULT_MAX] =
> 
> 
> 
> --

I was testing 3.8-rc1 when I ran into the problem this patch
resolves, building with gcc (GCC) 4.1.2 20080704 (Red Hat
4.1.2-51).

Were you given a reason why this shouldn't be needed?

It looks like the build error was introduced by (e09df47 "perf/x86:
Update/fix generic events on P6 PMU") which is in v3.7 as well.
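
For anyone skimming the thread, here is a minimal userspace sketch of the
pattern the patch applies: every object placed in the same read-only section
is also declared const, so the compiler sees one consistent section type.
The macro name my_initconst, the section name, and the table contents are
made up for illustration (the kernel's __initconst has its own definition),
and the sketch assumes gcc or clang targeting ELF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __initconst annotation. */
#define my_initconst __attribute__((section(".my_init_rodata")))

/* Both objects are const, so the section is consistently read-only. */
static const uint64_t event_map[3] my_initconst = { 0x003c, 0x00c0, 0x002b };

static const uint64_t cache_ids[2][2] my_initconst = {
	{ 0x0043, 0x0045 },
	{ 0x0080, 0x0028 },
};

/*
 * Dropping "const" from only one of the two declarations mirrors the
 * pre-patch code, which some gcc versions reject with
 * "causes a section type conflict".
 */

/* Sum every table entry so the sketch has something observable to check. */
uint64_t sum_tables(void)
{
	uint64_t sum = 0;
	size_t i, j;

	for (i = 0; i < 3; i++)
		sum += event_map[i];
	for (i = 0; i < 2; i++)
		for (j = 0; j < 2; j++)
			sum += cache_ids[i][j];
	return sum;
}
```

The attribute is written after the declarator here, as in the knc_pmu hunk;
placing it between the type and the name, as in the *_hw_cache_event_ids
hunks, should be equivalent for gcc.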

Thanks,
Shaun


[PATCH] edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.

2012-09-22 Thread Shaun Ruffell
Fix potential NULL pointer dereference in edac_unregister_sysfs() on
system boot introduced in 3.6-rc1. This patch is dependent on
Fengguang's "edac_mc: fix messy kfree calls in the error path".

Since commit 7a623c039 ("edac: rewrite the sysfs code to use struct
device") edac_mc_alloc() no longer initializes embedded kobjects in
struct mem_ctl_info. Therefore edac_mc_free() can no longer simply
decrement a kobject reference count to free the allocated memory
unless the memory controller driver module had also called
edac_mc_add_mc().

Now edac_mc_free() will check if the newly embedded struct device
has been registered with sysfs before using either the standard
device release functions or freeing the data structures itself with
logic pulled out of the error path of edac_mc_alloc().

The BUG this patch resolves for me:

  BUG: unable to handle kernel NULL pointer dereference at   (null)
  IP: [<c045e195>] __wake_up_common+0x1a/0x6a
  *pde = 7f0c6067
  Oops:  [#1] SMP
  Modules linked in: parport_pc parport floppy e7xxx_edac(+) ide_cd_mod 
edac_core intel_rng cdrom microcode(+) dm_snapshot dm_zero dm_mirror 
dm_region_hash d
  Pid: 933, comm: modprobe Tainted: G        W    3.6.0-rc2-00111-gc1999ee #12 Dell Computer Corporation PowerEdge 2600 /0F0364
  EIP: 0060:[<c045e195>] EFLAGS: 00010093 CPU: 3
  EIP is at __wake_up_common+0x1a/0x6a
  EAX: f47b0984 EBX: fff4 ECX:  EDX: 0003
  ESI: f47b0984 EDI: 0282 EBP: f3dc7d38 ESP: f3dc7d1c
   DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  CR0: 8005003b CR2:  CR3: 347d4000 CR4: 07d0
  DR0:  DR1:  DR2:  DR3: 
  DR6: 0ff0 DR7: 0400
  Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
  Stack:
     0003 c046701a f47b0980 f47b0984 0282 f3dc7d54
   c046703f   f47b08b0 f47b08b0  f3dc7d74 c06961ce
   f3dc7d74 f3dc7d80 c05e2837 c094c4cc f47b08b0 f47b08b0 f3dc7d88 c068d56d
  Call Trace:
   [<c046701a>] ? complete_all+0x1a/0x50
   [<c046703f>] complete_all+0x3f/0x50
   [<c06961ce>] device_pm_remove+0x23/0xa2
   [<c05e2837>] ? kobject_put+0x5b/0x5d
   [<c068d56d>] device_del+0x34/0x142
   [<f8547884>] edac_unregister_sysfs+0x3b/0x5c [edac_core]
   [<f8545041>] edac_mc_free+0x29/0x2f [edac_core]
   [<f860163f>] e7xxx_probe1+0x268/0x311 [e7xxx_edac]
   [<c0603d00>] ? __pci_enable_device_flags+0x8f/0xd3
   [<f8601b0b>] e7xxx_init_one+0x56/0x61 [e7xxx_edac]
   [<c0604f85>] local_pci_probe+0x13/0x15
  ...

Cc: Mauro Carvalho Chehab <mche...@redhat.com>
Cc: Shaohui Xie <shaohui@freescale.com>
Signed-off-by: Shaun Ruffell <sruff...@digium.com>
---

Hi Linus, I did not bother to resend the third patch [1] since it's
not really *necessary* to boot my system. Fengguang's patch and this
one are sufficient.

[1] http://marc.info/?l=linux-kernel&m=134764597921761&w=2

 drivers/edac/edac_mc.c | 59 +-
 1 file changed, 39 insertions(+), 20 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 9037ffa..d5dc9da 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -199,6 +199,36 @@ void *edac_align_ptr(void **p, unsigned size, int n_elems)
 	return (void *)(((unsigned long)ptr) + align - r);
 }
 
+static void _edac_mc_free(struct mem_ctl_info *mci)
+{
+	int i, chn, row;
+	struct csrow_info *csr;
+	const unsigned int tot_dimms = mci->tot_dimms;
+	const unsigned int tot_channels = mci->num_cschannel;
+	const unsigned int tot_csrows = mci->nr_csrows;
+
+	if (mci->dimms) {
+		for (i = 0; i < tot_dimms; i++)
+			kfree(mci->dimms[i]);
+		kfree(mci->dimms);
+	}
+	if (mci->csrows) {
+		for (row = 0; row < tot_csrows; row++) {
+			csr = mci->csrows[row];
+			if (csr) {
+				if (csr->channels) {
+					for (chn = 0; chn < tot_channels; chn++)
+						kfree(csr->channels[chn]);
+					kfree(csr->channels);
+				}
+				kfree(csr);
+			}
+		}
+		kfree(mci->csrows);
+	}
+	kfree(mci);
+}
+
 /**
  * edac_mc_alloc: Allocate and partially fill a struct mem_ctl_info structure
  * @mc_num:Memory controller number
@@ -413,26 +443,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
return mci;
 
 error:
-   if (mci->dimms) {
-   for (i = 0; i < tot_dimms; i++)
-   kfree(mci->dimms[i]);
-   kfree(mci->dimms);
-   }
-   if (mci->csrows) {
-   for (row = 0; row < tot_csrows; row++) {
-   csr = mci->csrows[row];
-   if (csr) {
-   if (csr->channels) {
-   for (chn = 0; chn < 

Re: Linux 3.6-rc6

2012-09-21 Thread Shaun Ruffell
On Sun, Sep 16, 2012 at 03:59:09PM -0700, Linus Torvalds wrote:
> 
> Please do test things out, I'd really like to be able to do the final
> 3.6 soonish..

Linus,

Just a heads up in case you are about to tag v3.6.

v3.6-rc6 still has a regression with edac_mc_alloc()/edac_mc_free()
introduced in commit de3910eb79ac8c0f29a11224661c0ebaaf813039.
edac_mc_free() assumes that struct mem_ctl_info is registered in
sysfs but there are error paths where this is not always the case.
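
To make the failure mode concrete, here is a userspace sketch of the kind
of check the fix relies on: the free routine has to know whether the object
was ever registered, and take the unregister path only if it was. All names
here (fake_mci, fake_mc_add, and so on) are hypothetical stand-ins rather
than the EDAC API, and the registered flag merely plays the role of asking
the driver core whether the embedded device is registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mem_ctl_info. */
struct fake_mci {
	bool registered;	/* true only after fake_mc_add() */
	unsigned int tot_dimms;
	int **dimms;
};

/* Counters so the two teardown paths can be observed. */
int direct_frees;
int unregister_frees;

/* Free everything the allocator set up; safe on a partially built object. */
static void fake_mc_release(struct fake_mci *mci)
{
	unsigned int i;

	if (mci->dimms) {
		for (i = 0; i < mci->tot_dimms; i++)
			free(mci->dimms[i]);
		free(mci->dimms);
	}
	free(mci);
}

/* Allocate an unregistered object with tot_dimms (> 0) entries. */
struct fake_mci *fake_mc_alloc(unsigned int tot_dimms)
{
	struct fake_mci *mci = calloc(1, sizeof(*mci));
	unsigned int i;

	if (!mci)
		return NULL;
	mci->tot_dimms = tot_dimms;
	mci->dimms = calloc(tot_dimms, sizeof(*mci->dimms));
	if (!mci->dimms) {
		fake_mc_release(mci);
		return NULL;
	}
	for (i = 0; i < tot_dimms; i++)
		mci->dimms[i] = calloc(1, sizeof(int));
	return mci;
}

/* Models a successful registration step. */
void fake_mc_add(struct fake_mci *mci)
{
	mci->registered = true;
}

/* Check registration before choosing how to tear the object down. */
void fake_mc_free(struct fake_mci *mci)
{
	assert(mci != NULL);
	if (!mci->registered) {
		/* Probe failed before registration: just free the memory. */
		direct_frees++;
		fake_mc_release(mci);
		return;
	}
	/* Registered: undo the registration first, then release. */
	mci->registered = false;
	unregister_frees++;
	fake_mc_release(mci);
}
```

A driver probe that bails out between fake_mc_alloc() and fake_mc_add()
lands in the first branch, which is the case the oops below comes from.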

I posted patches [1,2,3] that resolve the issue for me. Shaohui Xie
also hit the issue and posted a slightly different patch [4]. The
patches are currently waiting for Mauro, who I understand is
catching up since returning from San Diego, to check them out.

[1] http://marc.info/?l=linux-kernel&m=134764595921752&w=2
[2] http://marc.info/?l=linux-kernel&m=134764594721747&w=2
[3] http://marc.info/?l=linux-kernel&m=134764597921761&w=2
[4] http://marc.info/?l=linux-kernel&m=134753579818528&w=2

Without the patches I always hit this on boot:

[  36.703479] BUG: unable to handle kernel NULL pointer dereference at   (null)
[  36.703479] IP: [<c045e195>] __wake_up_common+0x1a/0x6a
[  36.703479] *pde = 7f0c6067
[  36.703479] Oops:  [#1] SMP
[  36.703479] Modules linked in: parport_pc parport floppy e7xxx_edac(+) ide_cd_mod edac_core intel_rng cdrom microcode(+) dm_snapshot dm_zero dm_mirror dm_region_hash d
[  36.703479] Pid: 933, comm: modprobe Tainted: G        W    3.6.0-rc2-00111-gc1999ee #12 Dell Computer Corporation PowerEdge 2600 /0F0364
[  36.703479] EIP: 0060:[<c045e195>] EFLAGS: 00010093 CPU: 3
[  36.703479] EIP is at __wake_up_common+0x1a/0x6a
[  36.703479] EAX: f47b0984 EBX: fff4 ECX:  EDX: 0003
[  36.703479] ESI: f47b0984 EDI: 0282 EBP: f3dc7d38 ESP: f3dc7d1c
[  36.703479]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  36.703479] CR0: 8005003b CR2:  CR3: 347d4000 CR4: 07d0
[  36.703479] DR0:  DR1:  DR2:  DR3: 
[  36.703479] DR6: 0ff0 DR7: 0400
[  36.703479] Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
[  36.703479] Stack:
[  36.703479]    0003 c046701a f47b0980 f47b0984 0282 f3dc7d54
[  36.703479]  c046703f   f47b08b0 f47b08b0  f3dc7d74 c06961ce
[  36.703479]  f3dc7d74 f3dc7d80 c05e2837 c094c4cc f47b08b0 f47b08b0 f3dc7d88 c068d56d
[  36.703479] Call Trace:
[  36.703479]  [<c046701a>] ? complete_all+0x1a/0x50
[  36.703479]  [<c046703f>] complete_all+0x3f/0x50
[  36.703479]  [<c06961ce>] device_pm_remove+0x23/0xa2
[  36.703479]  [<c05e2837>] ? kobject_put+0x5b/0x5d
[  36.703479]  [<c068d56d>] device_del+0x34/0x142
[  36.703479]  [<f8547884>] edac_unregister_sysfs+0x3b/0x5c [edac_core]
[  36.703479]  [<f8545041>] edac_mc_free+0x29/0x2f [edac_core]
[  36.703479]  [<f860163f>] e7xxx_probe1+0x268/0x311 [e7xxx_edac]
[  36.703479]  [<c0603d00>] ? __pci_enable_device_flags+0x8f/0xd3
[  36.703479]  [<f8601b0b>] e7xxx_init_one+0x56/0x61 [e7xxx_edac]
[  36.703479]  [<c0604f85>] local_pci_probe+0x13/0x15
...

Cheers,
Shaun


Re: [PATCH] edac/85xx: fix error handle of mpc85xx_mc_err_probe

2012-09-18 Thread Shaun Ruffell
On Wed, Sep 19, 2012 at 03:43:35AM +0000, Xie Shaohui-B21989 wrote:
> > On Mon, Sep 17, 2012 at 10:32:59AM +0000, Xie Shaohui-B21989 wrote:
> > >
> > > BTW: seems you are using a different kernel tree with mine.
> > 
> > On the chance that I am missing something important: Why do you say
> > I was running a different kernel tree? I was against 3.6-rc2
> > when I originally hit the problem.
>
> [S.H] I'm using 3.6-rc4, and some codes in your patch I did not
> find them in 3.6-rc4.

Is it because there were three patches in the series?

  [1/3] https://lkml.org/lkml/2012/9/14/473
  [2/3] https://lkml.org/lkml/2012/9/14/469
  [3/3] https://lkml.org/lkml/2012/9/14/474

They are also the last three commits on my edac_debug_v2 branch
here, which is on top of 3.6-rc6:

https://github.com/sruffell/linux/commits/edac_debug_v2/

Cheers,
Shaun


Re: [PATCH] edac/85xx: fix error handle of mpc85xx_mc_err_probe

2012-09-18 Thread Shaun Ruffell
On Mon, Sep 17, 2012 at 10:32:59AM +0000, Xie Shaohui-B21989 wrote:
> > -Original Message-
> > From: Shaun Ruffell [mailto:sruff...@digium.com]
> > Sent: Saturday, September 15, 2012 2:22 AM
> > To: Xie Shaohui-B21989
> > Cc: linux-e...@vger.kernel.org; linuxppc-...@lists.ozlabs.org;
> > a...@linux-foundation.org; avoront...@mvista.com; linux-
> > ker...@vger.kernel.org; grant.lik...@secretlab.ca
> > Subject: Re: [PATCH] edac/85xx: fix error handle of mpc85xx_mc_err_probe
> > 
> > On Thu, Sep 13, 2012 at 06:55:29PM +0800, Shaohui Xie wrote:
> > > Error handle in case of DDR ECC off is wrong, sysfs entries
> > > have not been created, so edac_mc_free which frees a mci
> > > instance should not be called.
> > > Also, free mci's memory in this case.
> > 
> > Just FYI: I ran into the same error in edac_mc_free() which I
> > resolved in a slightly different way in some patches I sent
> > previously. [1]
> > 
> > [1] https://lkml.org/lkml/2012/9/14/475
>
> [S.H] Thanks! I did not aware of this patch when one of my
> colleague asked me to have a look at the issue, It could save me
> some time if I saw this patch earlier. :(
> 
> BTW: seems you are using a different kernel tree with mine.

On the chance that I am missing something important: Why do you say I
was running a different kernel tree? I was working against 3.6-rc2
when I originally hit the problem.


Re: [PATCH] edac/85xx: fix error handle of mpc85xx_mc_err_probe

2012-09-18 Thread Shaun Ruffell
On Wed, Sep 19, 2012 at 03:43:35AM +, Xie Shaohui-B21989 wrote:
> > On Mon, Sep 17, 2012 at 10:32:59AM +, Xie Shaohui-B21989 wrote:
> > 
> > > BTW: it seems you are using a different kernel tree from mine.
> > 
> > On the chance that I am missing something important: why do you say
> > I was running a different kernel tree? I was working against 3.6-rc2
> > when I originally hit the problem.
>
> [S.H] I'm using 3.6-rc4, and I could not find some of the code in
> your patch in 3.6-rc4.

Is it because there were three patches in the series?

  [1/3] https://lkml.org/lkml/2012/9/14/473
  [2/3] https://lkml.org/lkml/2012/9/14/469
  [3/3] https://lkml.org/lkml/2012/9/14/474

They are also the last three commits on my edac_debug_v2 branch
here, which is on top of 3.6-rc6:

https://github.com/sruffell/linux/commits/edac_debug_v2/

Cheers,
Shaun


Re: [PATCH] edac/85xx: fix error handle of mpc85xx_mc_err_probe

2012-09-14 Thread Shaun Ruffell
On Thu, Sep 13, 2012 at 06:55:29PM +0800, Shaohui Xie wrote:
> Error handle in case of DDR ECC off is wrong, sysfs entries have not been
> created, so edac_mc_free which frees a mci instance should not be called.
> Also, free mci's memory in this case.

Just FYI: I ran into the same error in edac_mc_free() which I
resolved in a slightly different way in some patches I sent
previously. [1]

[1] https://lkml.org/lkml/2012/9/14/475 

Cheers,
Shaun


Re: [PATCH v2 1/3] edac_mc: fix kfree calls in the error path

2012-09-14 Thread Shaun Ruffell
On Fri, Sep 14, 2012 at 12:58:16PM -0500, Shaun Ruffell wrote:
> From: Fengguang Wu <fengguang...@intel.com>
> 
> From: Fengguang Wu <fengguang...@intel.com>
 
Darn it. If you push this through, mind fixing the above?


[PATCH v2 3/3] edac: edac_mc no longer deals with kobjects directly.

2012-09-14 Thread Shaun Ruffell
There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
a comment that does not reflect the code anymore.

Signed-off-by: Shaun Ruffell <sruff...@digium.com>
---
 drivers/edac/edac_mc.c | 7 ---
 include/linux/edac.h   | 1 -
 2 files changed, 8 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index d5dc9da..02f0d3e 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -433,13 +433,6 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 
 	mci->op_state = OP_ALLOC;
 
-	/* at this point, the root kobj is valid, and in order to
-	 * 'free' the object, then the function:
-	 *  edac_mc_unregister_sysfs_main_kobj() must be called
-	 * which will perform kobj unregistration and the actual free
-	 * will occur during the kobject callback operation
-	 */
-
 	return mci;
 
 error:
diff --git a/include/linux/edac.h b/include/linux/edac.h
index bab9f84..aeddb3f 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -14,7 +14,6 @@
 
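 #include <linux/atomic.h>
 #include <linux/device.h>
-#include <linux/kobject.h>
 #include <linux/completion.h>
 #include <linux/workqueue.h>
 #include <linux/debugfs.h>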
-- 
1.7.11.2



[PATCH v2 0/3] Fix edac_mc crash in e7xxx_edac error path.

2012-09-14 Thread Shaun Ruffell
v2:
Use '!device_is_registered(&mci->dev)' instead of 'if (!mci->bus.name)' to
check if mem_ctl_info has been registered with sysfs.

v1:

With kernel version 3.6-rc2 on a Dell PowerEdge 2600 I experienced a NULL
pointer dereference that did not occur on 3.5. I believe the error is
related to commit de3910eb79a "edac: change the mem allocation scheme to make
Documentation/kobject.txt happy" [1] and the fact that my system is going
through an error path in the e7xxx_edac driver.

[1] 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=de3910eb79ac8c0f29a11224661c0ebaaf813039

This is the OOPS:

 [  36.703479] BUG: unable to handle kernel NULL pointer dereference at   (null)
 [  36.703479] IP: [<c045e195>] __wake_up_common+0x1a/0x6a
 [  36.703479] *pde = 7f0c6067
 [  36.703479] Oops:  [#1] SMP
 [  36.703479] Modules linked in: parport_pc parport floppy e7xxx_edac(+) 
ide_cd_mod edac_core intel_rng cdrom microcode(+) dm_snapshot dm_zero dm_mirror 
dm_region_hash d
 [  36.703479] Pid: 933, comm: modprobe Tainted: GW
3.6.0-rc2-00111-gc1999ee #12 Dell Computer Corporation PowerEdge 2600   
  /0F0364
 [  36.703479] EIP: 0060:[<c045e195>] EFLAGS: 00010093 CPU: 3
 [  36.703479] EIP is at __wake_up_common+0x1a/0x6a
 [  36.703479] EAX: f47b0984 EBX: fff4 ECX:  EDX: 0003
 [  36.703479] ESI: f47b0984 EDI: 0282 EBP: f3dc7d38 ESP: f3dc7d1c
 [  36.703479]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 [  36.703479] CR0: 8005003b CR2:  CR3: 347d4000 CR4: 07d0
 [  36.703479] DR0:  DR1:  DR2:  DR3: 
 [  36.703479] DR6: 0ff0 DR7: 0400
 [  36.703479] Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 
task.ti=f3dc6000)
 [  36.703479] Stack:
 [  36.703479]    0003 c046701a f47b0980 f47b0984 0282 
f3dc7d54
 [  36.703479]  c046703f   f47b08b0 f47b08b0  f3dc7d74 
c06961ce
 [  36.703479]  f3dc7d74 f3dc7d80 c05e2837 c094c4cc f47b08b0 f47b08b0 f3dc7d88 
c068d56d
 [  36.703479] Call Trace:
 [  36.703479]  [<c046701a>] ? complete_all+0x1a/0x50
 [  36.703479]  [<c046703f>] complete_all+0x3f/0x50
 [  36.703479]  [<c06961ce>] device_pm_remove+0x23/0xa2
 [  36.703479]  [<c05e2837>] ? kobject_put+0x5b/0x5d
 [  36.703479]  [<c068d56d>] device_del+0x34/0x142
 [  36.703479]  [<f8547884>] edac_unregister_sysfs+0x3b/0x5c [edac_core]
 [  36.703479]  [<f8545041>] edac_mc_free+0x29/0x2f [edac_core]
 [  36.703479]  [<f860163f>] e7xxx_probe1+0x268/0x311 [e7xxx_edac]
 [  36.703479]  [<c0603d00>] ? __pci_enable_device_flags+0x8f/0xd3
 [  36.703479]  [<f8601b0b>] e7xxx_init_one+0x56/0x61 [e7xxx_edac]
 [  36.703479]  [<c0604f85>] local_pci_probe+0x13/0x15
 ...


Fengguang Wu (1):
  edac_mc: fix kfree calls in the error path

Shaun Ruffell (2):
  edac: edac_mc_free() cannot assume mem_ctl_info is registered in
sysfs.
  edac: edac_mc no longer deals with kobjects directly.

 drivers/edac/edac_mc.c | 64 ++
 include/linux/edac.h   |  1 -
 2 files changed, 39 insertions(+), 26 deletions(-)

-- 
1.7.11.2



[PATCH v2 2/3] edac: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.

2012-09-14 Thread Shaun Ruffell
edac_mc_free() may need to deallocate any memory associated with struct
mem_ctl_info directly if the structure was never registered with sysfs in
edac_mc_add_mc(). This moves the error handling code from edac_mc_alloc() into a
dedicated function to be called by edac_mc_free() as well if necessary.

This resolves a NULL pointer dereference from the following code path first
introduced in 3.6-rc1:

  EDAC MC: Ver: 3.0.0
  EDAC DEBUG: edac_mc_sysfs_init: device mc created
  EDAC DEBUG: e7xxx_init_one:
  EDAC DEBUG: e7xxx_probe1: mci
  EDAC DEBUG: edac_mc_alloc: errcount layer 0 size 8
  EDAC DEBUG: edac_mc_alloc: errcount layer 1 size 16
  EDAC DEBUG: edac_mc_alloc: allocating 48 error counters
  EDAC DEBUG: edac_mc_alloc: allocating 1068 bytes for mci data (16 ranks, 16 
csrows/channels)
  EDAC DEBUG: e7xxx_probe1: init mci
  EDAC DEBUG: e7xxx_probe1: init pvt
  EDAC e7xxx: error reporting device not found:vendor 8086 device 0x2541 
(broken BIOS?)
  EDAC DEBUG: edac_mc_free:
  Floppy drive(s): fd0 is 1.44M
  EDAC DEBUG: edac_unregister_sysfs: Unregistering device (null)

Signed-off-by: Shaun Ruffell <sruff...@digium.com>
---
 drivers/edac/edac_mc.c | 59 +-
 1 file changed, 39 insertions(+), 20 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 9037ffa..d5dc9da 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -199,6 +199,36 @@ void *edac_align_ptr(void **p, unsigned size, int n_elems)
 	return (void *)(((unsigned long)ptr) + align - r);
 }
 
+static void _edac_mc_free(struct mem_ctl_info *mci)
+{
+	int i, chn, row;
+	struct csrow_info *csr;
+	const unsigned int tot_dimms = mci->tot_dimms;
+	const unsigned int tot_channels = mci->num_cschannel;
+	const unsigned int tot_csrows = mci->nr_csrows;
+
+	if (mci->dimms) {
+		for (i = 0; i < tot_dimms; i++)
+			kfree(mci->dimms[i]);
+		kfree(mci->dimms);
+	}
+	if (mci->csrows) {
+		for (row = 0; row < tot_csrows; row++) {
+			csr = mci->csrows[row];
+			if (csr) {
+				if (csr->channels) {
+					for (chn = 0; chn < tot_channels; chn++)
+						kfree(csr->channels[chn]);
+					kfree(csr->channels);
+				}
+				kfree(csr);
+			}
+		}
+		kfree(mci->csrows);
+	}
+	kfree(mci);
+}
+
 /**
  * edac_mc_alloc: Allocate and partially fill a struct mem_ctl_info structure
  * @mc_num:		Memory controller number
@@ -413,26 +443,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 	return mci;
 
 error:
-	if (mci->dimms) {
-		for (i = 0; i < tot_dimms; i++)
-			kfree(mci->dimms[i]);
-		kfree(mci->dimms);
-	}
-	if (mci->csrows) {
-		for (row = 0; row < tot_csrows; row++) {
-			csr = mci->csrows[row];
-			if (csr) {
-				if (csr->channels) {
-					for (chn = 0; chn < tot_channels; chn++)
-						kfree(csr->channels[chn]);
-					kfree(csr->channels);
-				}
-				kfree(csr);
-			}
-		}
-		kfree(mci->csrows);
-	}
-	kfree(mci);
+	_edac_mc_free(mci);
 
 	return NULL;
 }
@@ -447,6 +458,14 @@ void edac_mc_free(struct mem_ctl_info *mci)
 {
 	edac_dbg(1, "\n");
 
+	/* If we're not yet registered with sysfs free only what was allocated
+	 * in edac_mc_alloc().
+	 */
+	if (!device_is_registered(&mci->dev)) {
+		_edac_mc_free(mci);
+		return;
+	}
+
 	/* the mci instance is freed here, when the sysfs object is dropped */
 	edac_unregister_sysfs(mci);
 }
-- 
1.7.11.2
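
Editorial note: the decision this patch adds to edac_mc_free() can be
modeled outside the kernel. What follows is a minimal userspace sketch in C,
not kernel code. struct fake_dev, struct fake_mci, the `registered` flag and
all fake_* functions are hypothetical stand-ins for the embedded struct
device, struct mem_ctl_info, device_is_registered() and the EDAC
alloc/add/free calls. It only illustrates that an object which was never
registered has to be released directly, while a registered one goes through
the unregister path.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model types; these are not the kernel definitions. */
struct fake_dev {
    bool registered;    /* stands in for device_is_registered() */
};

struct fake_mci {
    struct fake_dev dev;
    int *counters;      /* stands in for memory owned by the mci */
};

enum free_path { FREED_DIRECTLY, FREED_VIA_UNREGISTER };

/* Allocate the object; nothing is registered at this point. */
static struct fake_mci *fake_mc_alloc(void)
{
    struct fake_mci *mci = calloc(1, sizeof(*mci));

    if (!mci)
        return NULL;
    mci->counters = calloc(48, sizeof(*mci->counters));
    if (!mci->counters) {
        free(mci);
        return NULL;
    }
    return mci;
}

/* Models a successful registration step. */
static void fake_mc_add(struct fake_mci *mci)
{
    mci->dev.registered = true;
}

/* Releases everything fake_mc_alloc() allocated. */
static void fake_release(struct fake_mci *mci)
{
    free(mci->counters);
    free(mci);
}

/* Pick the release path based on whether registration ever happened. */
static enum free_path fake_mc_free(struct fake_mci *mci)
{
    if (!mci->dev.registered) {
        /* Never registered: there is nothing to unregister. */
        fake_release(mci);
        return FREED_DIRECTLY;
    }
    mci->dev.registered = false;    /* models unregistration */
    fake_release(mci);              /* models the release callback */
    return FREED_VIA_UNREGISTER;
}
```

Calling fake_mc_free() straight after fake_mc_alloc() takes the direct
path, which corresponds to the e7xxx_probe1() error path in the debug log
above. Calling fake_mc_add() first takes the unregister path.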



[PATCH v2 1/3] edac_mc: fix kfree calls in the error path

2012-09-14 Thread Shaun Ruffell
From: Fengguang Wu <fengguang...@intel.com>

From: Fengguang Wu <fengguang...@intel.com>

We need to free up memory in this order:

  free csrows[i]->channels[j]
  free csrows[i]->channels
  free csrows[i]
  free csrows

Signed-off-by: Fengguang Wu <fengguang...@intel.com>
---
 drivers/edac/edac_mc.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 616d90b..9037ffa 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -419,14 +419,16 @@ error:
 		kfree(mci->dimms);
 	}
 	if (mci->csrows) {
-		for (chn = 0; chn < tot_channels; chn++) {
-			csr = mci->csrows[chn];
+		for (row = 0; row < tot_csrows; row++) {
+			csr = mci->csrows[row];
 			if (csr) {
-				for (chn = 0; chn < tot_channels; chn++)
-					kfree(csr->channels[chn]);
+				if (csr->channels) {
+					for (chn = 0; chn < tot_channels; chn++)
+						kfree(csr->channels[chn]);
+					kfree(csr->channels);
+				}
 				kfree(csr);
 			}
-			kfree(mci->csrows[i]);
 		}
 		kfree(mci->csrows);
 	}
-- 
1.7.11.2
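
Editorial note: the free order listed in the commit message (children
before parents, with NULL checks for partially built tables) can be tried
in userspace. This is a minimal sketch under stated assumptions, not the
kernel code: struct channel, struct csrow, build_csrows(), free_csrows()
and the free_calls counter are hypothetical names introduced only for
illustration, and calloc()/free() stand in for kzalloc()/kfree().

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the EDAC structures. */
struct channel {
    int idx;
};

struct csrow {
    struct channel **channels;
};

/* Counts non-NULL frees so the sketch can be checked. */
static int free_calls;

static void counted_free(void *p)
{
    if (p)
        free_calls++;
    free(p);
}

/*
 * Free in the order given in the commit message:
 *   csrows[i]->channels[j], csrows[i]->channels, csrows[i], csrows.
 * Separate loop variables are used for rows and channels, and every
 * level is checked for NULL so a half-built table is handled.
 */
static void free_csrows(struct csrow **csrows, unsigned nr_rows,
                        unsigned nr_chans)
{
    unsigned row, chn;

    if (!csrows)
        return;
    for (row = 0; row < nr_rows; row++) {
        struct csrow *csr = csrows[row];

        if (!csr)
            continue;
        if (csr->channels) {
            for (chn = 0; chn < nr_chans; chn++)
                counted_free(csr->channels[chn]);
            counted_free(csr->channels);
        }
        counted_free(csr);
    }
    counted_free(csrows);
}

/* Build only rows_built rows to mimic an allocation that failed midway. */
static struct csrow **build_csrows(unsigned nr_rows, unsigned nr_chans,
                                   unsigned rows_built)
{
    struct csrow **csrows = calloc(nr_rows, sizeof(*csrows));
    unsigned row, chn;

    if (!csrows)
        return NULL;
    for (row = 0; row < rows_built && row < nr_rows; row++) {
        csrows[row] = calloc(1, sizeof(**csrows));
        if (!csrows[row])
            break;
        csrows[row]->channels = calloc(nr_chans, sizeof(struct channel *));
        if (!csrows[row]->channels)
            break;
        for (chn = 0; chn < nr_chans; chn++)
            csrows[row]->channels[chn] = calloc(1, sizeof(struct channel));
    }
    return csrows;
}
```

The code removed by the patch reused `chn` for both the outer and the
inner loop and never freed csr->channels. The sketch keeps the two
indices separate for that reason.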



Re: [PATCH 0/3] Fix edac_mc crash in e7xxx_edac error path.

2012-09-08 Thread Shaun Ruffell
Just a friendly reminder that I'm still seeing this NULL pointer
dereference on boot with 3.6-rc4.

On Sat, Aug 18, 2012 at 11:11:21PM -0500, Shaun Ruffell wrote:
> With kernel version 3.6-rc2 on a Dell PowerEdge 2600 I experienced a NULL
> pointer dereference that did not occur on 3.5. I believe the error is
> related to commit de3910eb79a "edac: change the mem allocation scheme to make
> Documentation/kobject.txt happy" [1] and the fact that my system is going
> through an error path in the e7xxx_edac driver.
> 
> [1] 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=de3910eb79ac8c0f29a11224661c0ebaaf813039
> 
> This is the OOPS:
> 
>  [  36.703479] BUG: unable to handle kernel NULL pointer dereference at   
> (null)
>  [  36.703479] IP: [<c045e195>] __wake_up_common+0x1a/0x6a
>  [  36.703479] *pde = 7f0c6067 
>  [  36.703479] Oops:  [#1] SMP 
>  [  36.703479] Modules linked in: parport_pc parport floppy e7xxx_edac(+) 
> ide_cd_mod edac_core intel_rng cdrom microcode(+) dm_snapshot dm_zero 
> dm_mirror dm_region_hash d
>  [  36.703479] Pid: 933, comm: modprobe Tainted: GW
> 3.6.0-rc2-00111-gc1999ee #12 Dell Computer Corporation PowerEdge 2600 
> /0F0364
>  [  36.703479] EIP: 0060:[<c045e195>] EFLAGS: 00010093 CPU: 3
>  [  36.703479] EIP is at __wake_up_common+0x1a/0x6a
>  [  36.703479] EAX: f47b0984 EBX: fff4 ECX:  EDX: 0003
>  [  36.703479] ESI: f47b0984 EDI: 0282 EBP: f3dc7d38 ESP: f3dc7d1c
>  [  36.703479]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>  [  36.703479] CR0: 8005003b CR2:  CR3: 347d4000 CR4: 07d0
>  [  36.703479] DR0:  DR1:  DR2:  DR3: 
>  [  36.703479] DR6: 0ff0 DR7: 0400
>  [  36.703479] Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 
> task.ti=f3dc6000)
>  [  36.703479] Stack:
>  [  36.703479]    0003 c046701a f47b0980 f47b0984 
> 0282 f3dc7d54
>  [  36.703479]  c046703f   f47b08b0 f47b08b0  
> f3dc7d74 c06961ce
>  [  36.703479]  f3dc7d74 f3dc7d80 c05e2837 c094c4cc f47b08b0 f47b08b0 
> f3dc7d88 c068d56d
>  [  36.703479] Call Trace:
>  [  36.703479]  [<c046701a>] ? complete_all+0x1a/0x50
>  [  36.703479]  [<c046703f>] complete_all+0x3f/0x50
>  [  36.703479]  [<c06961ce>] device_pm_remove+0x23/0xa2
>  [  36.703479]  [<c05e2837>] ? kobject_put+0x5b/0x5d
>  [  36.703479]  [<c068d56d>] device_del+0x34/0x142
>  [  36.703479]  [<f8547884>] edac_unregister_sysfs+0x3b/0x5c [edac_core]
>  [  36.703479]  [<f8545041>] edac_mc_free+0x29/0x2f [edac_core]
>  [  36.703479]  [<f860163f>] e7xxx_probe1+0x268/0x311 [e7xxx_edac]
>  [  36.703479]  [<c0603d00>] ? __pci_enable_device_flags+0x8f/0xd3
>  [  36.703479]  [<f8601b0b>] e7xxx_init_one+0x56/0x61 [e7xxx_edac]
>  [  36.703479]  [<c0604f85>] local_pci_probe+0x13/0x15
>  [  36.703479]  [<c0605115>] pci_call_probe+0x1c/0x1e
>  [  36.703479]  [<c0605158>] __pci_device_probe+0x41/0x4e
>  [  36.703479]  [<c060579c>] pci_device_probe+0x26/0x39
>  [  36.703479]  [<c06901ed>] really_probe+0x101/0x2a1
>  [  36.703479]  [<c0690680>] ? __driver_attach+0x3d/0x6e
>  [  36.703479]  [<c0690680>] ? __driver_attach+0x3d/0x6e
>  [  36.703479]  [] ? quirk_usb_disable_ehci+0xa3/0x141
>  [  36.703479]  [<c06903c2>] driver_probe_device+0x35/0x79
>  [  36.703479]  [<c06906af>] __driver_attach+0x6c/0x6e
>  [  36.703479]  [<c068eb25>] bus_for_each_dev+0x44/0x62
>  [  36.703479]  [<c068ff00>] driver_attach+0x1e/0x20
>  [  36.703479]  [<c0690643>] ? device_attach+0x98/0x98
>  [  36.703479]  [<c068fa12>] bus_add_driver+0xc5/0x1c8
>  [  36.703479]  [<c0605beb>] ? store_new_id+0xfa/0xfa
>  [  36.703479]  [<c0690bfb>] driver_register+0x52/0xd6
>  [  36.703479]  [<f8604000>] ? 0xf8603fff
>  [  36.703479]  [<c0605a30>] __pci_register_driver+0x4b/0x73
>  [  36.703479]  [<f8604000>] ? 0xf8603fff
>  [  36.703479]  [<f8604055>] e7xxx_init+0x55/0x57 [e7xxx_edac]
>  [  36.703479]  [<c040120e>] do_one_initcall+0xa3/0xe0
>  [  36.703479]  [<c0491023>] sys_init_module+0x70/0x1af
>  [  36.703479]  [<c04823a9>] ? trace_hardirqs_on_caller+0x56/0xf9
>  [  36.703479]  [<c05ec7e8>] ? trace_hardirqs_on_thunk+0xc/0x10
>  [  36.703479]  [<c07f678c>] sysenter_do_call+0x12/0x32
>  [  36.703479] Code: 5d c3 55 89 e5 3e 8d 74 26 00 e8 8f ff ff ff 5d c3 55 89 
> e5 57 56 53 83 ec 10 3e 8d 74 26 00 89 55 ec 89 4d e8 8b 58 28 83 eb 0c <8b> 
> 53 0c 83 c0 28
>  [  36.703479] EIP: [<c045e195>] __wake_up_common+0x1a/0x6a SS:ESP 
> 0068:f3dc7d1c
>  [  36.703479] CR2: 
>  [  36.703479] ---[ end trace 6fcfddc0eef7bbd8 ]---
> 
> When I enabled edac debugging I saw the following printed to the kernel log
> prior to the above BUG:
> 
>   EDAC MC: Ver: 3.0.0
>   EDAC DEBUG: edac_mc_sysfs_init: device mc created
>   EDAC DEBUG: e7xxx_init_one:
>   EDAC DEBUG: e7xxx_probe1: mci
>   EDAC DEBUG: edac_mc_alloc: errcount layer 0 size 8
>   EDAC DEBUG: edac_mc_alloc: errcount layer 1 size 16
>   EDAC DEBUG: edac_mc_alloc: allocating 48 error counters
>   EDAC DEBUG: edac_mc_alloc: allocating 106

Re: BUG: unable to handle kernel paging request at 00010016

2012-08-19 Thread Shaun Ruffell
[ Fixing netdev cc to use proper address... ]

On Sun, Aug 19, 2012 at 12:21:17PM +0400, Artem Savkov wrote:
> On Sat, Aug 18, 2012 at 11:25:43PM -0500, Shaun Ruffell wrote:
> > Adding linux-net to the CC list.
> > 
> > On Fri, Aug 17, 2012 at 11:57:56PM +0100, Dave Haywood wrote:
> > > [1.] One line summary of the problem:
> > > BUG: unable to handle kernel paging request at 00010016
> > > 
> > >   System boots then crashes a 5-10 or so seconds after getting to the 
> > > login prompt
> > >   Booting without the network cable attached prevents the crash (no 
> > > evidence beyond 10 minutes after boot)
> > > 
> > >   Diagnostics:
> > >   Captured the boot and managed a login + dmesg before the crash
> > >   Some of the log looks corrupted. Probably my crappy usb dongle serial 
> > > flow control but left it in anyway
> > 
> > [snip]
> > 
> > Just a note that I see this as well. It happens reliably for me after trying to
> > login to the machine via ssh.
> > 
> > Here is the back trace I collected on the serial port.
>
> There is a patch posted on netdev that fixes this for me:
> http://patchwork.ozlabs.org/patch/178525/

Thanks, I just applied that patch and confirmed that it does indeed
resolve the crash for me as well.

Cheers,
Shaun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] edac: edac_mc no longer deals with kobjects directly.

2012-08-18 Thread Shaun Ruffell
There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
a comment that does not reflect the code anymore.

Signed-off-by: Shaun Ruffell <sruff...@digium.com>
---
 drivers/edac/edac_mc.c | 7 ---
 include/linux/edac.h   | 1 -
 2 files changed, 8 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index a58facc..65c59b1 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -433,13 +433,6 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 
 	mci->op_state = OP_ALLOC;
 
-	/* at this point, the root kobj is valid, and in order to
-	 * 'free' the object, then the function:
-	 *      edac_mc_unregister_sysfs_main_kobj() must be called
-	 * which will perform kobj unregistration and the actual free
-	 * will occur during the kobject callback operation
-	 */
-
 	return mci;
 
 error:
diff --git a/include/linux/edac.h b/include/linux/edac.h
index bab9f84..aeddb3f 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -14,7 +14,6 @@
 
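 #include <linux/atomic.h>
 #include <linux/device.h>
-#include <linux/kobject.h>
 #include <linux/completion.h>
 #include <linux/workqueue.h>
 #include <linux/debugfs.h>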
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/3] Fix edac_mc crash in e7xxx_edac error path.

2012-08-18 Thread Shaun Ruffell
With kernel version 3.6-rc2 on a Dell PowerEdge 2600 I experienced a NULL
pointer dereference that did not occur on 3.5. I believe the error is
related to commit de3910eb79a "edac: change the mem allocation scheme to make
Documentation/kobject.txt happy" [1] and the fact that my system is going
through an error path in the e7xxx_edac driver.

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=de3910eb79ac8c0f29a11224661c0ebaaf813039

This is the OOPS:

 [  36.703479] BUG: unable to handle kernel NULL pointer dereference at   (null)
 [  36.703479] IP: [] __wake_up_common+0x1a/0x6a
 [  36.703479] *pde = 7f0c6067 
 [  36.703479] Oops:  [#1] SMP 
 [  36.703479] Modules linked in: parport_pc parport floppy e7xxx_edac(+) 
ide_cd_mod edac_core intel_rng cdrom microcode(+) dm_snapshot dm_zero dm_mirror 
dm_region_hash d
 [  36.703479] Pid: 933, comm: modprobe Tainted: GW
3.6.0-rc2-00111-gc1999ee #12 Dell Computer Corporation PowerEdge 2600   
  /0F0364
 [  36.703479] EIP: 0060:[] EFLAGS: 00010093 CPU: 3
 [  36.703479] EIP is at __wake_up_common+0x1a/0x6a
 [  36.703479] EAX: f47b0984 EBX: fff4 ECX:  EDX: 0003
 [  36.703479] ESI: f47b0984 EDI: 0282 EBP: f3dc7d38 ESP: f3dc7d1c
 [  36.703479]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 [  36.703479] CR0: 8005003b CR2:  CR3: 347d4000 CR4: 07d0
 [  36.703479] DR0:  DR1:  DR2:  DR3: 
 [  36.703479] DR6: 0ff0 DR7: 0400
 [  36.703479] Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 
task.ti=f3dc6000)
 [  36.703479] Stack:
 [  36.703479]    0003 c046701a f47b0980 f47b0984 0282 
f3dc7d54
 [  36.703479]  c046703f   f47b08b0 f47b08b0  f3dc7d74 
c06961ce
 [  36.703479]  f3dc7d74 f3dc7d80 c05e2837 c094c4cc f47b08b0 f47b08b0 f3dc7d88 
c068d56d
 [  36.703479] Call Trace:
 [  36.703479]  [] ? complete_all+0x1a/0x50
 [  36.703479]  [] complete_all+0x3f/0x50
 [  36.703479]  [] device_pm_remove+0x23/0xa2
 [  36.703479]  [] ? kobject_put+0x5b/0x5d
 [  36.703479]  [] device_del+0x34/0x142
 [  36.703479]  [] edac_unregister_sysfs+0x3b/0x5c [edac_core]
 [  36.703479]  [] edac_mc_free+0x29/0x2f [edac_core]
 [  36.703479]  [] e7xxx_probe1+0x268/0x311 [e7xxx_edac]
 [  36.703479]  [] ? __pci_enable_device_flags+0x8f/0xd3
 [  36.703479]  [] e7xxx_init_one+0x56/0x61 [e7xxx_edac]
 [  36.703479]  [] local_pci_probe+0x13/0x15
 [  36.703479]  [] pci_call_probe+0x1c/0x1e
 [  36.703479]  [] __pci_device_probe+0x41/0x4e
 [  36.703479]  [] pci_device_probe+0x26/0x39
 [  36.703479]  [] really_probe+0x101/0x2a1
 [  36.703479]  [] ? __driver_attach+0x3d/0x6e
 [  36.703479]  [] ? __driver_attach+0x3d/0x6e
 [  36.703479]  [] ? quirk_usb_disable_ehci+0xa3/0x141
 [  36.703479]  [] driver_probe_device+0x35/0x79
 [  36.703479]  [] __driver_attach+0x6c/0x6e
 [  36.703479]  [] bus_for_each_dev+0x44/0x62
 [  36.703479]  [] driver_attach+0x1e/0x20
 [  36.703479]  [] ? device_attach+0x98/0x98
 [  36.703479]  [] bus_add_driver+0xc5/0x1c8
 [  36.703479]  [] ? store_new_id+0xfa/0xfa
 [  36.703479]  [] driver_register+0x52/0xd6
 [  36.703479]  [] ? 0xf8603fff
 [  36.703479]  [] __pci_register_driver+0x4b/0x73
 [  36.703479]  [] ? 0xf8603fff
 [  36.703479]  [] e7xxx_init+0x55/0x57 [e7xxx_edac]
 [  36.703479]  [] do_one_initcall+0xa3/0xe0
 [  36.703479]  [] sys_init_module+0x70/0x1af
 [  36.703479]  [] ? trace_hardirqs_on_caller+0x56/0xf9
 [  36.703479]  [] ? trace_hardirqs_on_thunk+0xc/0x10
 [  36.703479]  [] sysenter_do_call+0x12/0x32
 [  36.703479] Code: 5d c3 55 89 e5 3e 8d 74 26 00 e8 8f ff ff ff 5d c3 55 89 
e5 57 56 53 83 ec 10 3e 8d 74 26 00 89 55 ec 89 4d e8 8b 58 28 83 eb 0c <8b> 53 
0c 83 c0 28
 [  36.703479] EIP: [] __wake_up_common+0x1a/0x6a SS:ESP 0068:f3dc7d1c
 [  36.703479] CR2: 
 [  36.703479] ---[ end trace 6fcfddc0eef7bbd8 ]---

When I enabled edac debugging I saw the following printed to the kernel log
prior to the above BUG:

  EDAC MC: Ver: 3.0.0
  EDAC DEBUG: edac_mc_sysfs_init: device mc created
  EDAC DEBUG: e7xxx_init_one:
  EDAC DEBUG: e7xxx_probe1: mci
  EDAC DEBUG: edac_mc_alloc: errcount layer 0 size 8
  EDAC DEBUG: edac_mc_alloc: errcount layer 1 size 16
  EDAC DEBUG: edac_mc_alloc: allocating 48 error counters
  EDAC DEBUG: edac_mc_alloc: allocating 1068 bytes for mci data (16 ranks, 16 
csrows/channels)
  EDAC DEBUG: e7xxx_probe1: init mci
  EDAC DEBUG: e7xxx_probe1: init pvt
  EDAC e7xxx: error reporting device not found:vendor 8086 device 0x2541 
(broken BIOS?)
  EDAC DEBUG: edac_mc_free:
  Floppy drive(s): fd0 is 1.44M
  EDAC DEBUG: edac_unregister_sysfs: Unregistering device (null)

There are probably better ways to accomplish what the following patches are
doing but I thought I would send along what I had if only to motivate any
discussion. I also have resent Fengguang Wu's patch in this series since I found
that it was required as well.

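Shaun Ruffell (2):
  edac: Remove invalid kfree in error path of edac_mc_allocate().
  edac: edac_mc_free() cannot assume mem_ctl_info is registered in
    sysfs.

 drivers/edac/edac_mc.c | 60 +-
 1 file changed, 35 insertions(+), 25 deletions(-)

-- 
1.7.11.2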

[PATCH 2/3] edac: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.

2012-08-18 Thread Shaun Ruffell
edac_mc_free() may need to deallocate any memory associated with struct
mem_ctl_info directly if the structure was never registered with sysfs in
edac_mc_add_mc(). This moves the error handling code from edac_mc_alloc() into a
dedicated function to be called by edac_mc_free() as well if necessary.

This resolves a NULL pointer dereference from the following code path first
introduced in 3.6-rc1:

  EDAC MC: Ver: 3.0.0
  EDAC DEBUG: edac_mc_sysfs_init: device mc created
  EDAC DEBUG: e7xxx_init_one:
  EDAC DEBUG: e7xxx_probe1: mci
  EDAC DEBUG: edac_mc_alloc: errcount layer 0 size 8
  EDAC DEBUG: edac_mc_alloc: errcount layer 1 size 16
  EDAC DEBUG: edac_mc_alloc: allocating 48 error counters
  EDAC DEBUG: edac_mc_alloc: allocating 1068 bytes for mci data (16 ranks, 16 
csrows/channels)
  EDAC DEBUG: e7xxx_probe1: init mci
  EDAC DEBUG: e7xxx_probe1: init pvt
  EDAC e7xxx: error reporting device not found:vendor 8086 device 0x2541 
(broken BIOS?)
  EDAC DEBUG: edac_mc_free:
  Floppy drive(s): fd0 is 1.44M
  EDAC DEBUG: edac_unregister_sysfs: Unregistering device (null)

Signed-off-by: Shaun Ruffell <sruff...@digium.com>
---
 drivers/edac/edac_mc.c | 59 +-
 1 file changed, 39 insertions(+), 20 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 9037ffa..a58facc 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -199,6 +199,36 @@ void *edac_align_ptr(void **p, unsigned size, int n_elems)
 	return (void *)(((unsigned long)ptr) + align - r);
 }
 
+static void _edac_mc_free(struct mem_ctl_info *mci)
+{
+	int i, chn, row;
+	struct csrow_info *csr;
+	const unsigned int tot_dimms = mci->tot_dimms;
+	const unsigned int tot_channels = mci->num_cschannel;
+	const unsigned int tot_csrows = mci->nr_csrows;
+
+	if (mci->dimms) {
+		for (i = 0; i < tot_dimms; i++)
+			kfree(mci->dimms[i]);
+		kfree(mci->dimms);
+	}
+	if (mci->csrows) {
+		for (row = 0; row < tot_csrows; row++) {
+			csr = mci->csrows[row];
+			if (csr) {
+				if (csr->channels) {
+					for (chn = 0; chn < tot_channels; chn++)
+						kfree(csr->channels[chn]);
+					kfree(csr->channels);
+				}
+				kfree(csr);
+			}
+		}
+		kfree(mci->csrows);
+	}
+	kfree(mci);
+}
+
 /**
  * edac_mc_alloc: Allocate and partially fill a struct mem_ctl_info structure
  * @mc_num:Memory controller number
@@ -413,26 +443,7 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
 	return mci;
 
 error:
-	if (mci->dimms) {
-		for (i = 0; i < tot_dimms; i++)
-			kfree(mci->dimms[i]);
-		kfree(mci->dimms);
-	}
-	if (mci->csrows) {
-		for (row = 0; row < tot_csrows; row++) {
-			csr = mci->csrows[row];
-			if (csr) {
-				if (csr->channels) {
-					for (chn = 0; chn < tot_channels; chn++)
-						kfree(csr->channels[chn]);
-					kfree(csr->channels);
-				}
-				kfree(csr);
-			}
-		}
-		kfree(mci->csrows);
-	}
-	kfree(mci);
+	_edac_mc_free(mci);
 
 	return NULL;
 }
@@ -447,6 +458,14 @@ void edac_mc_free(struct mem_ctl_info *mci)
 {
 	edac_dbg(1, "\n");
 
+	/* If we're not yet registered with sysfs free only what was allocated
+	 * in edac_mc_alloc().
+	 */
+	if (!mci->bus.name) {
+		_edac_mc_free(mci);
+		return;
+	}
+
 	/* the mci instance is freed here, when the sysfs object is dropped */
 	edac_unregister_sysfs(mci);
 }
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
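
Archive note: the idea in patch 2/3 (free directly when the object was never
registered, otherwise let the registration layer's release callback do the
free) can be modelled in plain userspace C. The sketch below is only an
illustration under stated assumptions: struct object, object_register() and
the "registered" flag are hypothetical names standing in for struct
mem_ctl_info, edac_mc_add_mc() and the mci->bus.name test; none of them are
kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model object; "registered" plays the role that the patch
 * gives to the mci->bus.name check. */
struct object {
	bool registered;
	int refcount;
	void (*release)(struct object *obj);
};

/* Counters so a caller can see which path freed the memory. */
static int direct_frees;
static int release_frees;

/* Release callback: runs when the last reference is dropped. */
static void object_release(struct object *obj)
{
	release_frees++;
	free(obj);
}

static struct object *object_alloc(void)
{
	/* calloc leaves registered == false and release == NULL */
	return calloc(1, sizeof(struct object));
}

/* Stand-in for registration: from here on, release() owns the free. */
static void object_register(struct object *obj)
{
	obj->registered = true;
	obj->refcount = 1;
	obj->release = object_release;
}

static void object_unregister(struct object *obj)
{
	assert(obj->registered && obj->release);
	if (--obj->refcount == 0)
		obj->release(obj);
}

/* Safe on both paths: never touches registration state that was not set up. */
static void object_free(struct object *obj)
{
	if (!obj->registered) {
		direct_frees++;
		free(obj);
		return;
	}
	object_unregister(obj);
}
```

Calling object_free() on an object straight out of object_alloc() takes the
direct path, which corresponds to the probe failure shown in the debug log
above; calling it after object_register() goes through the release callback.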


[PATCH 1/3] edac_mc: fix kfree calls in the error path

2012-08-18 Thread Shaun Ruffell
From: Fengguang Wu <fengguang...@intel.com>

We need to free up memory in this order:

  free csrows[i]->channels[j]
  free csrows[i]->channels
  free csrows[i]
  free csrows

Signed-off-by: Fengguang Wu <fengguang...@intel.com>
---
 drivers/edac/edac_mc.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 616d90b..9037ffa 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -419,14 +419,16 @@ error:
 		kfree(mci->dimms);
 	}
 	if (mci->csrows) {
-		for (chn = 0; chn < tot_channels; chn++) {
-			csr = mci->csrows[chn];
+		for (row = 0; row < tot_csrows; row++) {
+			csr = mci->csrows[row];
 			if (csr) {
-				for (chn = 0; chn < tot_channels; chn++)
-					kfree(csr->channels[chn]);
+				if (csr->channels) {
+					for (chn = 0; chn < tot_channels; chn++)
+						kfree(csr->channels[chn]);
+					kfree(csr->channels);
+				}
 				kfree(csr);
 			}
-			kfree(mci->csrows[i]);
 		}
 		kfree(mci->csrows);
 	}
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
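
Archive note: the free order listed in patch 1/3 (channels[j], then the
channels array, then the csrow, then the csrows array) is the usual
children-before-parents rule for nested allocations. Below is a minimal
userspace sketch of that rule, not the kernel code: struct controller,
counted_calloc(), counted_free() and the fail_at_row parameter are
hypothetical and exist only so the example can simulate an allocation
failure part-way through and check that nothing leaks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the EDAC structures, not the kernel definitions. */
struct channel { int idx; };
struct csrow { struct channel **channels; };
struct controller {
	struct csrow **csrows;
	unsigned int nr_csrows;
	unsigned int nr_channels;
};

static unsigned int live_allocs;	/* allocations not yet freed */

static void *counted_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (!p)
		return;		/* like kfree(), NULL is a no-op */
	assert(live_allocs > 0);
	live_allocs--;
	free(p);
}

/* Free children before parents; tolerate a partially built tree. */
static void controller_free(struct controller *mc)
{
	unsigned int row, chn;

	if (!mc)
		return;
	if (mc->csrows) {
		for (row = 0; row < mc->nr_csrows; row++) {
			struct csrow *csr = mc->csrows[row];

			if (!csr)
				continue;
			if (csr->channels) {
				for (chn = 0; chn < mc->nr_channels; chn++)
					counted_free(csr->channels[chn]);
				counted_free(csr->channels);
			}
			counted_free(csr);
		}
		counted_free(mc->csrows);
	}
	counted_free(mc);
}

/* Build the tree; give up at row fail_at_row to simulate an allocation error. */
static struct controller *controller_alloc(unsigned int nr_csrows,
					   unsigned int nr_channels,
					   unsigned int fail_at_row)
{
	struct controller *mc = counted_calloc(1, sizeof(*mc));
	unsigned int row, chn;

	if (!mc)
		return NULL;
	mc->nr_csrows = nr_csrows;
	mc->nr_channels = nr_channels;
	mc->csrows = counted_calloc(nr_csrows, sizeof(*mc->csrows));
	if (!mc->csrows)
		goto error;
	for (row = 0; row < nr_csrows; row++) {
		struct csrow *csr;

		if (row == fail_at_row)
			goto error;
		csr = counted_calloc(1, sizeof(*csr));
		if (!csr)
			goto error;
		mc->csrows[row] = csr;
		csr->channels = counted_calloc(nr_channels, sizeof(*csr->channels));
		if (!csr->channels)
			goto error;
		for (chn = 0; chn < nr_channels; chn++) {
			csr->channels[chn] = counted_calloc(1, sizeof(struct channel));
			if (!csr->channels[chn])
				goto error;
		}
	}
	return mc;

error:
	controller_free(mc);
	return NULL;
}
```

Passing fail_at_row equal to nr_csrows means the simulated failure never
triggers; any smaller value exercises the error path with some rows still
NULL, which is the case the added NULL checks in the patch guard against.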


Re: BUG: unable to handle kernel paging request at 00010016

2012-08-18 Thread Shaun Ruffell
Adding linux-net to the CC list.

On Fri, Aug 17, 2012 at 11:57:56PM +0100, Dave Haywood wrote:
> [1.] One line summary of the problem:
> BUG: unable to handle kernel paging request at 00010016
> 
>   System boots then crashes a 5-10 or so seconds after getting to the 
> login prompt
>   Booting without the network cable attached prevents the crash (no 
> evidence beyond 10 minutes after boot)
> 
>   Diagnostics:
>   Captured the boot and managed a login + dmesg before the crash
>   Some of the log looks corrupted. Probably my crappy usb dongle serial 
> flow control but left it in anyway

[snip]

> [6.] Output of Oops.. message (if applicable) with symbolic information
>  resolved (see Documentation/oops-tracing.txt)
> [   62.907899] BUG: unable to handle kernel paging request at 00010016
> [   62.908002] IP: [<c15acfc9>] inet6_sk_rx_dst_set+0x29/0x40
> [   62.908002] *pde = 
> [   62.908002] Oops:  [#1] SMP
> [   62.908002] Pid: 2168, comm: mprime Not tainted 3.6.0-rc2 #297 Compaq 
> Deskpro/06C4h
> [   62.908002] EIP: 0060:[<c15acfc9>] EFLAGS: 00010202 CPU: 0
> [   62.908002] EIP is at inet6_sk_rx_dst_set+0x29/0x40
> [   62.908002] EAX: ce738508 EBX: ce73a760 ECX: cf377000 EDX: 00010002
> [   62.908002] ESI: ca06c900 EDI: ce738000 EBP: cf80bc7c ESP: cf80bc7c
> [   62.908002]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [   62.908002] CR0: 80050033 CR2: 00010016 CR3: 0a036000 CR4: 07d0
> [   62.908002] DR0:  DR1:  DR2:  DR3: 
> [   62.908002] DR6: 0ff0 DR7: 0400
> [   62.908002] Process mprime (pid: 2168, ti=cf80a000 task=ce4f7390 
> task.ti=ca052000)
> [   62.908002] Stack:
> [   62.908002]  cf80bc9c c15449cb  ce436780  ce738508 
> ca06c900 ce738000
> [   62.908002]  cf80bcc0 c154337e c167545e cf80bcec c124b318 ce436780 
> ce738508 ce738000
> [   62.908002]  ce436780 cf80bd48 c15aff9d  0001 cf80bcec 
> c1236880 ce691ec0
> [   62.908002] Call Trace:
> [   62.908002]  [<c15449cb>] tcp_create_openreq_child+0x3b/0x4a0
> [   62.908002]  [<c154337e>] tcp_v4_syn_recv_sock+0x2e/0x2a0
> [   62.908002]  [<c167545e>] ? _raw_spin_unlock_bh+0xe/0x10
> [   62.908002]  [<c124b318>] ? selinux_netlbl_sock_rcv_skb+0x18/0x190
> [   62.908002]  [<c15aff9d>] tcp_v6_syn_recv_sock+0x3ed/0x6d0
> [   62.908002]  [<c1236880>] ? selinux_parse_skb+0x50/0xb0
> [   62.908002]  [<c15450b3>] tcp_check_req+0x283/0x450
> [   62.908002]  [<c1541191>] tcp_v4_hnd_req+0x51/0x140
> [   62.908002]  [<c1542e69>] tcp_v4_do_rcv+0x129/0x1b0
> [   62.908002]  [<c14f5b45>] ? sk_filter+0x25/0xb0
> [   62.908002]  [<c154407e>] tcp_v4_rcv+0x5fe/0x730
> [   62.908002]  [<c15209b0>] ? ip_rcv_finish+0x2f0/0x2f0
> [   62.908002]  [<c1520a3c>] ip_local_deliver_finish+0x8c/0x260
> [   62.908002]  [<c15206c0>] ? inet_del_protocol+0x30/0x30
> [   62.908002]  [<c1520d9f>] ip_local_deliver+0x7f/0x90
> [   62.908002]  [<c15209b0>] ? ip_rcv_finish+0x2f0/0x2f0
> [   62.908002]  [<c15207b1>] ip_rcv_finish+0xf1/0x2f0
> [   62.908002]  [<c15206c0>] ? inet_del_protocol+0x30/0x30
> [   62.908002]  [<c1521002>] ip_rcv+0x252/0x320
> [   62.908002]  [<c15206c0>] ? inet_del_protocol+0x30/0x30
> [   62.908002]  [<c14e19bb>] __netif_receive_skb+0x46b/0x670
> [   62.908002]  [<c14e4b72>] netif_receive_skb+0x22/0x80
> [   62.908002]  [<c13eb6e2>] rtl8139_rx+0xd2/0x370
> [   62.908002]  [<c13eb9c2>] rtl8139_poll+0x42/0xb0
> [   62.908002]  [<c14e56dd>] net_rx_action+0xed/0x1c0
> [   62.908002]  [<c12c6110>] ? fbcon_add_cursor_timer+0xd0/0xd0
> [   62.908002]  [<c10427a7>] __do_softirq+0xa7/0x200
> [   62.908002]  [<c1042700>] ? local_bh_enable_ip+0x80/0x80
> [   62.908002]  <IRQ>
> [   62.908002]  [<c1042b0e>] ? irq_exit+0x6e/0x90
> [   62.908002]  [<c1004176>] ? do_IRQ+0x46/0xb0
> [   62.908002]  [<c1042af7>] ? irq_exit+0x57/0x90
> [   62.908002]  [<c10224a6>] ? smp_apic_timer_interrupt+0x56/0x90
> [   62.908002]  [<c1676289>] ? common_interrupt+0x29/0x30
> [   62.908002] Code: 90 90 55 8b 4a 48 89 e5 83 e1 fe 3e ff 41 40 89 88 8c 00 
> 00 00 8b 52 74 89 90 cc 01 00 00 8b 51 58 85 d2 74 0c 8b 80 a0 01 00 00 <8b> 
> 52 14 89 50 68 5d c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90
> [   62.908002] EIP: [<c15acfc9>] inet6_sk_rx_dst_set+0x29/0x40 SS:ESP 0068:cf80bc7c
> [   62.908002] CR2: 00010016
> [   63.212118] ---[ end trace 1fcc7fe92846c9d3 ]---
> [   63.216734] Kernel panic - not syncing: Fatal exception in interrupt

Just a note that I see this as well. It happens reliably for me after trying to
login to the machine via ssh.

Here is the back trace I collected on the serial port.

[   67.258206] BUG: unable to handle kernel paging request at 00010016
[   67.260010] IP: [<f93a4ae6>] inet6_sk_rx_dst_set+0x3a/0x89 [ipv6]
[   67.260010] *pde = 
[   67.260010] Oops:  [#1] SMP
[   67.260010] Modules linked in: bluetooth rfkill crc16 lockd sunrpc ipv6 
dm_multipath lp sg pcspkr serio_raw e1000 ata_piix libata floppy parport_pc pa 
rport e7xxx_edac edac_core ide_cd_mod cdrom intel_rng dm_snapshot dm_zero 
dm_mirror dm_region_hash dm_log dm_mod megaraid_mbox megaraid_mm sd_mod scsi_mo
d ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
[   67.260010] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc2-00117-g741badf #14 
Dell Computer Corporation PowerEdge 2600 /0F0364
[   67.260010] EIP: 0060:[] EFLAGS: 
