Re: [PATCH v3] vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-02-05 Thread Pearson, Greg
On 02/05/2014 06:39 AM, Vivek Goyal wrote:
> On Tue, Feb 04, 2014 at 04:25:52PM -0700, Greg Pearson wrote:
>
> [..]
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index 2ca7ba0..88d4585 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -468,17 +468,24 @@ static int __init update_note_header_size_elf64(const 
>> Elf64_Ehdr *ehdr_ptr)
>>  return rc;
>>  }
>>  nhdr_ptr = notes_section;
>> -while (real_sz < max_sz) {
>> -if (nhdr_ptr->n_namesz == 0)
>> -break;
>> +while (nhdr_ptr->n_namesz != 0) {
>>  sz = sizeof(Elf64_Nhdr) +
>>  ((nhdr_ptr->n_namesz + 3) & ~3) +
>>  ((nhdr_ptr->n_descsz + 3) & ~3);
>> +if ((real_sz + sz) > max_sz) {
>> +pr_warn("Warning: Exceeded p_memsz, dropping 
>> PT_NOTE entry n_namesz=0x%x, n_descsz=0x%x\n",
>> +nhdr_ptr->n_namesz, nhdr_ptr->n_descsz);
> You will need line break in pr_warn(). Too long a line. Limit it 80
> columns per line.

The checkpatch.pl script issues a warning when you break quoted strings, 
I have no personal preference. Just let me know if you want me to ignore 
the checkpatch warning and conform to the 80 column per line limit.

Also, I'm still debugging the reason why the corrupt entries show up, 
I'll continue to work on that.

Thanks

--
Greg


>
>> +break;
>> +}
>>  real_sz += sz;
>>  nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
>>  }
>>  kfree(notes_section);
>>  phdr_ptr->p_memsz = real_sz;
>> +if (real_sz == 0) {
>> +pr_warn("Warning: Zero PT_NOTE entries found\n");
>> +return -EINVAL;
> Given the fact that this is the first time I have heard about a PT_NOTE
> being corrup, I will be fine with this patch too. If one encounters
> an empty PT_NOTE, error out and don't create vmcore.
>
> So please repost this patch with line lenght fixed.
>
> Thanks
> Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3] vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-02-05 Thread Pearson, Greg
On 02/05/2014 06:39 AM, Vivek Goyal wrote:
 On Tue, Feb 04, 2014 at 04:25:52PM -0700, Greg Pearson wrote:

 [..]
 diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
 index 2ca7ba0..88d4585 100644
 --- a/fs/proc/vmcore.c
 +++ b/fs/proc/vmcore.c
 @@ -468,17 +468,24 @@ static int __init update_note_header_size_elf64(const 
 Elf64_Ehdr *ehdr_ptr)
  return rc;
  }
  nhdr_ptr = notes_section;
 -while (real_sz  max_sz) {
 -if (nhdr_ptr-n_namesz == 0)
 -break;
 +while (nhdr_ptr-n_namesz != 0) {
  sz = sizeof(Elf64_Nhdr) +
  ((nhdr_ptr-n_namesz + 3)  ~3) +
  ((nhdr_ptr-n_descsz + 3)  ~3);
 +if ((real_sz + sz)  max_sz) {
 +pr_warn(Warning: Exceeded p_memsz, dropping 
 PT_NOTE entry n_namesz=0x%x, n_descsz=0x%x\n,
 +nhdr_ptr-n_namesz, nhdr_ptr-n_descsz);
 You will need line break in pr_warn(). Too long a line. Limit it 80
 columns per line.

The checkpatch.pl script issues a warning when you break quoted strings, 
I have no personal preference. Just let me know if you want me to ignore 
the checkpatch warning and conform to the 80 column per line limit.

Also, I'm still debugging the reason why the corrupt entries show up, 
I'll continue to work on that.

Thanks

--
Greg



 +break;
 +}
  real_sz += sz;
  nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
  }
  kfree(notes_section);
  phdr_ptr-p_memsz = real_sz;
 +if (real_sz == 0) {
 +pr_warn(Warning: Zero PT_NOTE entries found\n);
 +return -EINVAL;
 Given the fact that this is the first time I have heard about a PT_NOTE
 being corrup, I will be fine with this patch too. If one encounters
 an empty PT_NOTE, error out and don't create vmcore.

 So please repost this patch with line lenght fixed.

 Thanks
 Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-02-03 Thread Pearson, Greg
On 02/03/2014 02:38 PM, Vivek Goyal wrote:
> On Mon, Feb 03, 2014 at 01:18:38PM -0700, Greg Pearson wrote:
>
> [..]
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index 2ca7ba0..051c803 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -468,12 +468,14 @@ static int __init update_note_header_size_elf64(const 
>> Elf64_Ehdr *ehdr_ptr)
>>  return rc;
>>  }
>>  nhdr_ptr = notes_section;
>> -while (real_sz < max_sz) {
>> -if (nhdr_ptr->n_namesz == 0)
>> -break;
>> +while (nhdr_ptr->n_namesz != 0) {
>>  sz = sizeof(Elf64_Nhdr) +
>>  ((nhdr_ptr->n_namesz + 3) & ~3) +
>>  ((nhdr_ptr->n_descsz + 3) & ~3);
>> +if ((real_sz + sz) > max_sz) {
>> +pr_warn("Warning: dropping PT_NOTE entry\n");
>> +break;
>> +}
> Hi Greg,
>
> Couple of minor nits.
>
> I think it is a good idea to give more data in warning which tells why
> are we dropping a note entry. May be something like.
>
> "Warning: Total note entry size exceeded PT_NOTE memsz. Dropping PT_NOTE 
> entry, n_namesz=<> n_descsz=<>".

Sounds good I'll add more information to the pr_warn().

>
> Secondly, if there is only on note entry in a PT_NOTE header and we drop
> it, then that PT_NOTE header is empty and needs to be cleaned up.
>
> I think you will have to modify get_note_number_and_size_elf64() and
> other relevant functions which are not expecting ->p_memsz=0.

What about treating this as an error condition and adding a check to the 
update_note_header_size_elf32()/update_note_header_size_elf64() routines 
that would return a failure, something like the following at the end of 
the routine:

 if (real_sz == 0)
 return -EINVAL

I could also add a pr_warn() with a message indicating no PT_NOTE 
entries were found.

This seems like a lower risk change to handle the case I'm currently 
seeing as opposed to changing the code to handle a p_memsz==0.

Thoughts?

--
Greg





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-02-03 Thread Pearson, Greg
On 02/03/2014 08:47 AM, Vivek Goyal wrote:
> On Sun, Feb 02, 2014 at 02:25:25PM -0800, Eric W. Biederman wrote:
>> Andrew Morton  writes:
>>
>>> On Sat, 1 Feb 2014 01:07:29 + "Pearson, Greg"  
>>> wrote:
>>>
>>>> As far as I know the only consequence of dropping a PT_NOTE entry is
>>>> that it would not be available in the crash dump for use in debugging.
>>>> I'm not sure how important this data might be for triage. I'm guessing
>>>> that in cases where one of these strange PT_NOTE entries shows up with a
>>>> size that causes an overflow it probably isn't even a real PT_NOTE entry
>>>> so dropping it won't matter, but that's a guess at this point since I'm
>>>> still trying to figure out how the bogus entries were created.
>>> Can we detect the crazy-huge notes, skip them and then proceed with
>>> the following sanely-sized ones?
>> The only way we can have following sanely-sized notes is if they are in
>> a separate note segment (one of our extensions for kdump and
>> /proc/vmcore merges them together).
> This processing is happening before we have merged ELF notes. Previous
> kernel/kexec-tools prepared per cpu PT_NOTE type ELF note. One for
> each cpu. And by default it prepares only one ELF note per PT_NOTE. So
> there should not be more notes in the same PT_NOTE.
>
> Also even if there are, n_namesz and n_descsz values seem so high that
> after skipping these nothing valid should be after that.
>
> So I will not be too worried about skipping seemingly corrupted ELf
> notes. I think giving a warning makes sense though. Is somebody
> overwriting the memory area in kenrel reserved for per cpu PT_NOTE.

I haven't figured out the cause of the strange second PT_NOTE entries 
yet, but I suspect some type of memory corruption.

I'll re-cut the patch and add a pr_warn() when we drop an entry.

--
Greg

>
> Thanks
> Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-02-03 Thread Pearson, Greg
On 02/03/2014 08:47 AM, Vivek Goyal wrote:
 On Sun, Feb 02, 2014 at 02:25:25PM -0800, Eric W. Biederman wrote:
 Andrew Morton a...@linux-foundation.org writes:

 On Sat, 1 Feb 2014 01:07:29 + Pearson, Greg greg.pear...@hp.com 
 wrote:

 As far as I know the only consequence of dropping a PT_NOTE entry is
 that it would not be available in the crash dump for use in debugging.
 I'm not sure how important this data might be for triage. I'm guessing
 that in cases where one of these strange PT_NOTE entries shows up with a
 size that causes an overflow it probably isn't even a real PT_NOTE entry
 so dropping it won't matter, but that's a guess at this point since I'm
 still trying to figure out how the bogus entries were created.
 Can we detect the crazy-huge notes, skip them and then proceed with
 the following sanely-sized ones?
 The only way we can have following sanely-sized notes is if they are in
 a separate note segment (one of our extensions for kdump and
 /proc/vmcore merges them together).
 This processing is happening before we have merged ELF notes. Previous
 kernel/kexec-tools prepared per cpu PT_NOTE type ELF note. One for
 each cpu. And by default it prepares only one ELF note per PT_NOTE. So
 there should not be more notes in the same PT_NOTE.

 Also even if there are, n_namesz and n_descsz values seem so high that
 after skipping these nothing valid should be after that.

 So I will not be too worried about skipping seemingly corrupted ELf
 notes. I think giving a warning makes sense though. Is somebody
 overwriting the memory area in kenrel reserved for per cpu PT_NOTE.

I haven't figured out the cause of the strange second PT_NOTE entries 
yet, but I suspect some type of memory corruption.

I'll re-cut the patch and add a pr_warn() when we drop an entry.

--
Greg


 Thanks
 Vivek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-02-03 Thread Pearson, Greg
On 02/03/2014 02:38 PM, Vivek Goyal wrote:
 On Mon, Feb 03, 2014 at 01:18:38PM -0700, Greg Pearson wrote:

 [..]
 diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
 index 2ca7ba0..051c803 100644
 --- a/fs/proc/vmcore.c
 +++ b/fs/proc/vmcore.c
 @@ -468,12 +468,14 @@ static int __init update_note_header_size_elf64(const 
 Elf64_Ehdr *ehdr_ptr)
  return rc;
  }
  nhdr_ptr = notes_section;
 -while (real_sz  max_sz) {
 -if (nhdr_ptr-n_namesz == 0)
 -break;
 +while (nhdr_ptr-n_namesz != 0) {
  sz = sizeof(Elf64_Nhdr) +
  ((nhdr_ptr-n_namesz + 3)  ~3) +
  ((nhdr_ptr-n_descsz + 3)  ~3);
 +if ((real_sz + sz)  max_sz) {
 +pr_warn(Warning: dropping PT_NOTE entry\n);
 +break;
 +}
 Hi Greg,

 Couple of minor nits.

 I think it is a good idea to give more data in warning which tells why
 are we dropping a note entry. May be something like.

 Warning: Total note entry size exceeded PT_NOTE memsz. Dropping PT_NOTE 
 entry, n_namesz= n_descsz=.

Sounds good I'll add more information to the pr_warn().


 Secondly, if there is only on note entry in a PT_NOTE header and we drop
 it, then that PT_NOTE header is empty and needs to be cleaned up.

 I think you will have to modify get_note_number_and_size_elf64() and
 other relevant functions which are not expecting -p_memsz=0.

What about treating this as an error condition and adding a check to the 
update_note_header_size_elf32()/update_note_header_size_elf64() routines 
that would return a failure, something like the following at the end of 
the routine:

 if (real_sz == 0)
 return -EINVAL

I could also add a pr_warn() with a message indicating no PT_NOTE 
entries were found.

This seems like a lower risk change to handle the case I'm currently 
seeing as opposed to changing the code to handle a p_memsz==0.

Thoughts?

--
Greg





--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-01-31 Thread Pearson, Greg
On 01/31/2014 04:16 PM, Andrew Morton wrote:
> On Fri, 31 Jan 2014 16:06:06 -0700 Greg Pearson  wrote:
>
>> Currently, update_note_header_size_elf64() and
>> update_note_header_size_elf32() will add the size
>> of a PT_NOTE entry to real_sz even if that causes real_sz
>> to exceeds max_sz. This patch corrects the while loop logic
>> in those routines to ensure that does not happen.
>>
>> ...
>>
>> Occasionally, a second entry is encountered with very
>> large n_namesz and n_descsz sizes:
>>
>>n_namesz = 0x8008
>>n_descsz = 0x510ae163
>>n_type   = 0x8008
> Hang on.
>
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -468,12 +468,13 @@ static int __init update_note_header_size_elf64(const 
>> Elf64_Ehdr *ehdr_ptr)
>>  return rc;
>>  }
>>  nhdr_ptr = notes_section;
>> -while (real_sz < max_sz) {
>> -if (nhdr_ptr->n_namesz == 0)
>> -break;
>> +while (nhdr_ptr->n_namesz != 0) {
>>  sz = sizeof(Elf64_Nhdr) +
>>  ((nhdr_ptr->n_namesz + 3) & ~3) +
>>  ((nhdr_ptr->n_descsz + 3) & ~3);
>> +/* Silently drop further PT_NOTE entries */
>> +if ((real_sz + sz) > max_sz)
>> +break;
> If we are encountering notes with these crazy sizes then what is
> preventing (real_sx + sz) from wrapping through zero, which would
> defeat this check?
>
>

As far as I know the only consequence of dropping a PT_NOTE entry is 
that it would not be available in the crash dump for use in debugging. 
I'm not sure how important this data might be for triage. I'm guessing 
that in cases where one of these strange PT_NOTE entries shows up with a 
size that causes an overflow it probably isn't even a real PT_NOTE entry 
so dropping it won't matter, but that's a guess at this point since I'm 
still trying to figure out how the bogus entries were created.

I think the wrap case is handled ok by the current code since "real_sz" 
and "sz" are both declared as u64, while the "n_namesz" and "n_descsz" 
fields are declared as u32. This is true for both the elf32 and elf64 case.

Adding a pr_warn() is probably a good idea, then I can remove the comment.

--
Greg




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] vmcore: prevent PT_NOTE p_memsz overflow during header update

2014-01-31 Thread Pearson, Greg
On 01/31/2014 04:16 PM, Andrew Morton wrote:
 On Fri, 31 Jan 2014 16:06:06 -0700 Greg Pearson greg.pear...@hp.com wrote:

 Currently, update_note_header_size_elf64() and
 update_note_header_size_elf32() will add the size
 of a PT_NOTE entry to real_sz even if that causes real_sz
 to exceeds max_sz. This patch corrects the while loop logic
 in those routines to ensure that does not happen.

 ...

 Occasionally, a second entry is encountered with very
 large n_namesz and n_descsz sizes:

n_namesz = 0x8008
n_descsz = 0x510ae163
n_type   = 0x8008
 Hang on.

 --- a/fs/proc/vmcore.c
 +++ b/fs/proc/vmcore.c
 @@ -468,12 +468,13 @@ static int __init update_note_header_size_elf64(const 
 Elf64_Ehdr *ehdr_ptr)
  return rc;
  }
  nhdr_ptr = notes_section;
 -while (real_sz  max_sz) {
 -if (nhdr_ptr-n_namesz == 0)
 -break;
 +while (nhdr_ptr-n_namesz != 0) {
  sz = sizeof(Elf64_Nhdr) +
  ((nhdr_ptr-n_namesz + 3)  ~3) +
  ((nhdr_ptr-n_descsz + 3)  ~3);
 +/* Silently drop further PT_NOTE entries */
 +if ((real_sz + sz)  max_sz)
 +break;
 If we are encountering notes with these crazy sizes then what is
 preventing (real_sx + sz) from wrapping through zero, which would
 defeat this check?



As far as I know the only consequence of dropping a PT_NOTE entry is 
that it would not be available in the crash dump for use in debugging. 
I'm not sure how important this data might be for triage. I'm guessing 
that in cases where one of these strange PT_NOTE entries shows up with a 
size that causes an overflow it probably isn't even a real PT_NOTE entry 
so dropping it won't matter, but that's a guess at this point since I'm 
still trying to figure out how the bogus entries were created.

I think the wrap case is handled ok by the current code since real_sz 
and sz are both declared as u64, while the n_namesz and n_descsz 
fields are declared as u32. This is true for both the elf32 and elf64 case.

Adding a pr_warn() is probably a good idea, then I can remove the comment.

--
Greg




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] PCI/AER: Provide reset_link for firmware first root port

2013-05-29 Thread Pearson, Greg
On 05/28/2013 12:48 PM, Betty Dall wrote:
> The firmware first path does not install the aerdrv root port
> service driver, so the firmware first root port does not have
> a reset_link callback. When a firmware first root port does not have
> a reset_link callback, use a new default reset_link similar to what
> we already do for the default_downstream_reset_link(). The firmware
> first default reset_link brings the root port out of SBR if firmware
> left it in SBR.
>
> Signed-off-by: Betty Dall 
> ---
>
>   drivers/pci/pcie/aer/aerdrv_core.c |   37 
> 
>   1 files changed, 37 insertions(+), 0 deletions(-)
>
>
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
> b/drivers/pci/pcie/aer/aerdrv_core.c
> index 8ec8b4f..6862fe3 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -413,6 +413,40 @@ static pci_ers_result_t 
> default_downstream_reset_link(struct pci_dev *dev)
>   return PCI_ERS_RESULT_RECOVERED;
>   }
>   
> +/**
> + * default_ff_root_port_reset_link - default reset function for firmware
> + *   first Root Port
> + * @dev: pointer to root port's pci_dev data structure
> + *
> + * Invoked when performing link reset at Root Port w/ no aer driver.
> + * This happens through the firmware first path. Firmware may leave
> + * the Root Port in SBR and it is the OS responsiblity to bring it out
> + * of SBR.
> + */
> +static pci_ers_result_t default_ff_root_port_reset_link(struct pci_dev *dev)
> +{
> + u16 p2p_ctrl;
> +
> + /* Read Secondary Bus Reset */
> + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, _ctrl);
> + p2p_ctrl |= PCI_BRIDGE_CTL_BUS_RESET;

Remove "p2p_ctrl |= PCI_BRIDGE_CTL_BUS_RESET;" otherwise the following 
conditional is always taken.

--
Greg

> +
> + /* De-assert Secondary Bus Reset, if it is set */
> + if (p2p_ctrl & PCI_BRIDGE_CTL_BUS_RESET) {
> + p2p_ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
> + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);
> +
> + /*
> +  * System software must wait for at least 100ms from the end
> +  * of a reset of one or more device before it is permitted
> +  * to issue Configuration Requests to those devices.
> +  */
> + msleep(200);
> + dev_dbg(>dev, "Root Port link has been reset\n");
> + }
> + return PCI_ERS_RESULT_RECOVERED;
> +}
> +
>   static int find_aer_service_iter(struct device *device, void *data)
>   {
>   struct pcie_port_service_driver *service_driver, **drv;
> @@ -460,6 +494,9 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
>   status = driver->reset_link(udev);
>   } else if (pci_pcie_type(udev) == PCI_EXP_TYPE_DOWNSTREAM) {
>   status = default_downstream_reset_link(udev);
> + } else if (pci_pcie_type(udev) == PCI_EXP_TYPE_ROOT_PORT &&
> + pcie_aer_get_firmware_first(udev)) {
> + status = default_ff_root_port_reset_link(udev);
>   } else {
>   dev_printk(KERN_DEBUG, >dev,
>   "no link-reset support at upstream device %s\n",
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] ACPI/APEI: Force fatal AER severity when bus has been reset

2013-05-29 Thread Pearson, Greg
On 05/28/2013 12:48 PM, Betty Dall wrote:
> The CPER error record has a reset bit that indicates that the platform
> has reset the bus. The reset bit can be set for any severity error
> including recoverable.  From the AER code path's perspective,
> any error is fatal if the bus has been reset. This patch upgrades the
> severity of the AER recovery to AER_FATAL whenever the CPER error record
> indicates that the bus has been reset.
>
> Signed-off-by: Betty Dall 
> ---
>
>   drivers/acpi/apei/ghes.c |   21 -
>   1 files changed, 20 insertions(+), 1 deletions(-)
>
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d668a8a..4a42afc 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -451,7 +451,26 @@ static void ghes_do_proc(struct ghes *ghes,
>   int aer_severity;
>   devfn = PCI_DEVFN(pcie_err->device_id.device,
> pcie_err->device_id.function);
> - aer_severity = cper_severity_to_aer(sev);
> + /*
> +  * Some Firmware First implementations
> +  * put the device in SBR to contain
> +  * the error. This is indicated by the
> +  * CPER Section Decriptor Flags reset

Nit Pick, change "Decriptor" to "Descriptor".

--
Greg

> +  * bit which means the component must
> +  * be re-initialized or re-enabled
> +  * prior to use. Promoting the AER
> +  * serverity to FATAL will cause the
> +  * AER code to link_reset and allow
> +  * drivers to reprogram their cards.
> +  */
> + if (gdata->flags & CPER_SEC_RESET)
> + aer_severity = cper_severity_to_aer(
> + CPER_SEV_FATAL);
> + else
> + aer_severity =
> + cper_severity_to_aer(sev);
> +
> +
>   aer_recover_queue(pcie_err->device_id.segment,
> pcie_err->device_id.bus,
> devfn, aer_severity);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/3] ACPI/APEI: Force fatal AER severity when bus has been reset

2013-05-29 Thread Pearson, Greg
On 05/28/2013 12:48 PM, Betty Dall wrote:
 The CPER error record has a reset bit that indicates that the platform
 has reset the bus. The reset bit can be set for any severity error
 including recoverable.  From the AER code path's perspective,
 any error is fatal if the bus has been reset. This patch upgrades the
 severity of the AER recovery to AER_FATAL whenever the CPER error record
 indicates that the bus has been reset.

 Signed-off-by: Betty Dall betty.d...@hp.com
 ---

   drivers/acpi/apei/ghes.c |   21 -
   1 files changed, 20 insertions(+), 1 deletions(-)


 diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
 index d668a8a..4a42afc 100644
 --- a/drivers/acpi/apei/ghes.c
 +++ b/drivers/acpi/apei/ghes.c
 @@ -451,7 +451,26 @@ static void ghes_do_proc(struct ghes *ghes,
   int aer_severity;
   devfn = PCI_DEVFN(pcie_err-device_id.device,
 pcie_err-device_id.function);
 - aer_severity = cper_severity_to_aer(sev);
 + /*
 +  * Some Firmware First implementations
 +  * put the device in SBR to contain
 +  * the error. This is indicated by the
 +  * CPER Section Decriptor Flags reset

Nit Pick, change Decriptor to Descriptor.

--
Greg

 +  * bit which means the component must
 +  * be re-initialized or re-enabled
 +  * prior to use. Promoting the AER
 +  * serverity to FATAL will cause the
 +  * AER code to link_reset and allow
 +  * drivers to reprogram their cards.
 +  */
 + if (gdata-flags  CPER_SEC_RESET)
 + aer_severity = cper_severity_to_aer(
 + CPER_SEV_FATAL);
 + else
 + aer_severity =
 + cper_severity_to_aer(sev);
 +
 +
   aer_recover_queue(pcie_err-device_id.segment,
 pcie_err-device_id.bus,
 devfn, aer_severity);
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] PCI/AER: Provide reset_link for firmware first root port

2013-05-29 Thread Pearson, Greg
On 05/28/2013 12:48 PM, Betty Dall wrote:
 The firmware first path does not install the aerdrv root port
 service driver, so the firmware first root port does not have
 a reset_link callback. When a firmware first root port does not have
 a reset_link callback, use a new default reset_link similar to what
 we already do for the default_downstream_reset_link(). The firmware
 first default reset_link brings the root port out of SBR if firmware
 left it in SBR.

 Signed-off-by: Betty Dall betty.d...@hp.com
 ---

   drivers/pci/pcie/aer/aerdrv_core.c |   37 
 
   1 files changed, 37 insertions(+), 0 deletions(-)


 diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
 b/drivers/pci/pcie/aer/aerdrv_core.c
 index 8ec8b4f..6862fe3 100644
 --- a/drivers/pci/pcie/aer/aerdrv_core.c
 +++ b/drivers/pci/pcie/aer/aerdrv_core.c
 @@ -413,6 +413,40 @@ static pci_ers_result_t 
 default_downstream_reset_link(struct pci_dev *dev)
   return PCI_ERS_RESULT_RECOVERED;
   }
   
 +/**
 + * default_ff_root_port_reset_link - default reset function for firmware
 + *   first Root Port
 + * @dev: pointer to root port's pci_dev data structure
 + *
 + * Invoked when performing link reset at Root Port w/ no aer driver.
 + * This happens through the firmware first path. Firmware may leave
 + * the Root Port in SBR and it is the OS responsiblity to bring it out
 + * of SBR.
 + */
 +static pci_ers_result_t default_ff_root_port_reset_link(struct pci_dev *dev)
 +{
 + u16 p2p_ctrl;
 +
 + /* Read Secondary Bus Reset */
 + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);
 + p2p_ctrl |= PCI_BRIDGE_CTL_BUS_RESET;

Remove p2p_ctrl |= PCI_BRIDGE_CTL_BUS_RESET; otherwise the following 
conditional is always taken.

--
Greg

 +
 + /* De-assert Secondary Bus Reset, if it is set */
 + if (p2p_ctrl  PCI_BRIDGE_CTL_BUS_RESET) {
 + p2p_ctrl = ~PCI_BRIDGE_CTL_BUS_RESET;
 + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);
 +
 + /*
 +  * System software must wait for at least 100ms from the end
 +  * of a reset of one or more device before it is permitted
 +  * to issue Configuration Requests to those devices.
 +  */
 + msleep(200);
 + dev_dbg(dev-dev, Root Port link has been reset\n);
 + }
 + return PCI_ERS_RESULT_RECOVERED;
 +}
 +
   static int find_aer_service_iter(struct device *device, void *data)
   {
   struct pcie_port_service_driver *service_driver, **drv;
 @@ -460,6 +494,9 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
   status = driver-reset_link(udev);
   } else if (pci_pcie_type(udev) == PCI_EXP_TYPE_DOWNSTREAM) {
   status = default_downstream_reset_link(udev);
 + } else if (pci_pcie_type(udev) == PCI_EXP_TYPE_ROOT_PORT 
 + pcie_aer_get_firmware_first(udev)) {
 + status = default_ff_root_port_reset_link(udev);
   } else {
   dev_printk(KERN_DEBUG, dev-dev,
   no link-reset support at upstream device %s\n,
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/