Re: [PATCH] lpfc: Avoid to disable pci_dev twice
On 08/28/2014 02:34 AM, James Smart wrote:
> Mike,
>
> Can you confirm - the "nulls" this patch corrects are because the probe_one and error_detect threads are running concurrently, thus battling ? If so - this fix looks insufficient and we should rework it.

Yes, it is. My patch is just a workaround for this bug.

> Q: why are they allowed to run concurrently ? I could see this solved at the platform level to let probe_one finish before error_detect is called (and therefore stating error_detect only makes sense to call if probe_one was successful). It's also a much more driver-friendly solution. I could see other drivers having much the same issue with concurrency and data structure teardown - and if locks aren't allowed in the error-detect path... it's not good.

I agree with you on this point, a platform solution is much better. So maybe use a lock or a flag to show it is in such a state; this may also happen when the driver is in the remove state.

Thanks,
Mike

> -- james s
>
> On 7/31/2014 10:16 PM, Mike Qiu wrote:
>> On 07/17/2014 02:32 PM, Mike Qiu wrote:
>> Hi, all
>>
>> How about this patch ? Any idea ?
>>
>>> In IBM Power servers, when hardware error occurs during probe state, EEH subsystem will call driver's error_detected interface, which will call pci_disable_device().
>>>
>>> But driver's probe function also call pci_disable_device() in this situation.
>>>
>>> So pci_dev will be disabled twice:
>>>
>>> Device lpfc disabling already-disabled device
>>> [ cut here ]
>>> WARNING: at drivers/pci/pci.c:1407
>>> CPU: 0 PID: 8744 Comm: kworker/0:0 Tainted: GW 3.10.42-2002.pkvm2_1_1.6.ppc64 #1
>>> Workqueue: events .work_for_cpu_fn
>>> task: c0274e3f5400 ti: c027d3958000 task.ti: c027d3958000
>>> NIP: c0471b8c LR: c0471b88 CTR: c043ebe0
>>> REGS: c027d395b650 TRAP: 0700 Tainted: GW (3.10.42-2002.pkvm2_1_1.6.ppc64)
>>> MSR: 900100029032 CR: 28b52b44 XER: 2000
>>> CFAR: c0879ab8 SOFTE: 1
>>> ...
NIP .pci_disable_device+0xcc/0xe0 LR .pci_disable_device+0xc8/0xe0 Call Trace: .pci_disable_device+0xc8/0xe0 (unreliable) .lpfc_disable_pci_dev+0x50/0x80 [lpfc] .lpfc_pci_probe_one+0x870/0x21a0 [lpfc] .local_pci_probe+0x68/0xb0 .work_for_cpu_fn+0x38/0x60 .process_one_work+0x1a4/0x4d0 .worker_thread+0x37c/0x490 .kthread+0xf0/0x100 .ret_from_kernel_thread+0x5c/0x80 Signed-off-by: Mike Qiu --- drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_init.c | 59 +++ 2 files changed, 55 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index 434e903..0c7bad9 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -813,6 +813,7 @@ struct lpfc_hba { #define VPD_MASK0xf /* mask for any vpd data */ uint8_t soft_wwn_enable; +uint8_t probe_done; struct timer_list fcp_poll_timer; struct timer_list eratt_poll; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 06f9a5b..c2e67ae 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -9519,6 +9519,9 @@ lpfc_pci_probe_one_s3(struct pci_dev *pdev, const struct pci_device_id *pid) } } +/* Set the probe flag */ +phba->probe_done = 1; + /* Perform post initialization setup */ lpfc_post_init_setup(phba); @@ -9795,6 +9798,9 @@ lpfc_sli_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) { +if (phba) +return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2710 PCI channel disable preparing for reset\n"); @@ -9812,7 +9818,8 @@ lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli_disable_intr(phba); -pci_disable_device(phba->pcidev); +if (phba->probe_done && phba->pcidev) +pci_disable_device(phba->pcidev); } /** @@ -10282,6 +10289,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid) goto out_disable_intr; } +/* Set probe_done flag */ +phba->probe_done = 1; + /* Log the current active interrupt mode */ 
phba->intr_mode = intr_mode; lpfc_log_intr_mode(phba, intr_mode); @@ -10544,6 +10554,9 @@ lpfc_sli4_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) { +if (!phba) +return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2826 PCI channel disable preparing for reset\n"); @@ -10562,7 +10575,9 @@ lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli4_disable_intr(phba); lpfc_sli4_queue_destroy(phba); -pci_disable_device(phba->pcidev); + +if (phba->probe_done && phba->pcidev) +pci_disable_devic
Re: [PATCH] lpfc: Avoid to disable pci_dev twice
On 07/17/2014 02:32 PM, Mike Qiu wrote: Hi, all How about this patch ? Any idea ? In IBM Power servers, when hardware error occurs during probe state, EEH subsystem will call driver's error_detected interface, which will call pci_disable_device(). But driver's probe function also call pci_disable_device() in this situation. So pci_dev will be disabled twice: Device lpfc disabling already-disabled device [ cut here ] WARNING: at drivers/pci/pci.c:1407 CPU: 0 PID: 8744 Comm: kworker/0:0 Tainted: GW 3.10.42-2002.pkvm2_1_1.6.ppc64 #1 Workqueue: events .work_for_cpu_fn task: c0274e3f5400 ti: c027d3958000 task.ti: c027d3958000 NIP: c0471b8c LR: c0471b88 CTR: c043ebe0 REGS: c027d395b650 TRAP: 0700 Tainted: GW (3.10.42-2002.pkvm2_1_1.6.ppc64) MSR: 900100029032 CR: 28b52b44 XER: 2000 CFAR: c0879ab8 SOFTE: 1 ... NIP .pci_disable_device+0xcc/0xe0 LR .pci_disable_device+0xc8/0xe0 Call Trace: .pci_disable_device+0xc8/0xe0 (unreliable) .lpfc_disable_pci_dev+0x50/0x80 [lpfc] .lpfc_pci_probe_one+0x870/0x21a0 [lpfc] .local_pci_probe+0x68/0xb0 .work_for_cpu_fn+0x38/0x60 .process_one_work+0x1a4/0x4d0 .worker_thread+0x37c/0x490 .kthread+0xf0/0x100 .ret_from_kernel_thread+0x5c/0x80 Signed-off-by: Mike Qiu --- drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_init.c | 59 +++ 2 files changed, 55 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index 434e903..0c7bad9 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -813,6 +813,7 @@ struct lpfc_hba { #define VPD_MASK0xf /* mask for any vpd data */ uint8_t soft_wwn_enable; + uint8_t probe_done; struct timer_list fcp_poll_timer; struct timer_list eratt_poll; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 06f9a5b..c2e67ae 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -9519,6 +9519,9 @@ lpfc_pci_probe_one_s3(struct pci_dev *pdev, const struct pci_device_id *pid) } } + /* Set the probe flag */ + 
phba->probe_done = 1; + /* Perform post initialization setup */ lpfc_post_init_setup(phba); @@ -9795,6 +9798,9 @@ lpfc_sli_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) { + if (phba) + return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2710 PCI channel disable preparing for reset\n"); @@ -9812,7 +9818,8 @@ lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli_disable_intr(phba); - pci_disable_device(phba->pcidev); + if (phba->probe_done && phba->pcidev) + pci_disable_device(phba->pcidev); } /** @@ -10282,6 +10289,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid) goto out_disable_intr; } + /* Set probe_done flag */ + phba->probe_done = 1; + /* Log the current active interrupt mode */ phba->intr_mode = intr_mode; lpfc_log_intr_mode(phba, intr_mode); @@ -10544,6 +10554,9 @@ lpfc_sli4_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) { + if (!phba) + return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2826 PCI channel disable preparing for reset\n"); @@ -10562,7 +10575,9 @@ lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli4_disable_intr(phba); lpfc_sli4_queue_destroy(phba); - pci_disable_device(phba->pcidev); + + if (phba->probe_done && phba->pcidev) + pci_disable_device(phba->pcidev); } /** @@ -10893,9 +10908,21 @@ static pci_ers_result_t lpfc_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba; + struct lpfc_hba *phba; pci_ers_result_t rc = PCI_ERS_RESULT_DISCONNECT; + if (!shost) + /* Run here means it may during probe state and +* Scsi_Host has not been created and We can do nothing +* in this state so call for hotplug*/ + return PCI_ERS_RESULT_NONE; + + phba = ((struct lpfc_vport 
*)shost->hostdata)->phba; + + if (!phba || !phba->probe_done) + /* Run here means it may during probe state */ + return PCI_ERS_RESULT_NONE; + switch (phba->pci_dev_grp) { case LPFC_PCI_DEV_LP:
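The guard this patch adds can be sketched in plain user-space C. This is only a minimal model under stated assumptions: every `fake_*` name is hypothetical and stands in for the real `struct pci_dev`, `struct lpfc_hba`, and `pci_disable_device()`, and the sequence runs on one thread, so it shows the intent of `probe_done` but not the concurrency James asks about. It follows the `if (!phba)` form of the s4 hunk; the s3 hunk as posted tests `if (phba)`, which looks inverted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: just an enable count. */
struct fake_pci_dev {
	int enable_cnt;
	int warnings;	/* number of "already-disabled" warnings raised */
};

/* Hypothetical stand-in for struct lpfc_hba with the patch's new flag. */
struct fake_hba {
	struct fake_pci_dev *pcidev;
	bool probe_done;
};

static void fake_pci_enable(struct fake_pci_dev *dev)
{
	dev->enable_cnt++;
}

/* Loosely models the check behind "disabling already-disabled device". */
static void fake_pci_disable(struct fake_pci_dev *dev)
{
	if (dev->enable_cnt <= 0) {
		dev->warnings++;
		return;
	}
	dev->enable_cnt--;
}

/* Error-handler side, guarded the way the patch guards prep_dev_for_reset. */
static void fake_prep_dev_for_reset(struct fake_hba *hba)
{
	if (!hba)
		return;
	if (hba->probe_done && hba->pcidev)
		fake_pci_disable(hba->pcidev);
}

/* Probe failure path: probe owns the disable until probe_done is set. */
static void fake_probe_fail(struct fake_hba *hba)
{
	fake_pci_disable(hba->pcidev);
}

/* Replays "error during probe" once and returns the warning count. */
static int fake_error_during_probe(void)
{
	struct fake_pci_dev dev = { 0, 0 };
	struct fake_hba hba = { &dev, false };

	fake_pci_enable(&dev);
	fake_prep_dev_for_reset(&hba);	/* error arrives mid-probe: skipped */
	fake_probe_fail(&hba);		/* probe cleans up exactly once */
	return dev.warnings;
}
```

Because `probe_done` is a plain flag with no ordering guarantee, a real handler running on another CPU could still observe it at the wrong moment, which is why the thread treats the patch as a workaround.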
Re: WARNING: at kernel/cpuset.c:1139
On 07/24/2014 08:27 AM, Li Zefan wrote:
> On 2014/7/23 23:12, Tejun Heo wrote:
>> On Wed, Jul 23, 2014 at 10:50:29AM +0800, Mike Qiu wrote:
>>> commit 734d45130cb ("cpuset: update cs->effective_{cpus, mems} when config changes") introduce the below warning in my server.
>>>
>>> [ 35.652137] [ cut here ]
>>> [ 35.652141] WARNING: at kernel/cpuset.c:1139
>>
>> Hah, can you reproduce it? If so, can you detail how?
>
> It's a typo.
>
> WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
>         nodes_equal(cp->mems_allowed, cp->effective_mems));
>
> should be
>
> WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
>         !nodes_equal(cp->mems_allowed, cp->effective_mems));

Yes, it is. This warning disappeared after this patch.

Reported-and-Tested-by: Mike Qiu

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
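The effect of the one-character fix is easy to see as two predicates. A minimal sketch, assuming `on_dfl` stands for the result of `cgroup_on_dfl()` and `masks_equal` for the result of `nodes_equal()`; both function names below are illustrative and not kernel API.

```c
#include <stdbool.h>

/* Condition as committed: fires when the two masks ARE equal (the typo). */
static bool warn_as_committed(bool on_dfl, bool masks_equal)
{
	return !on_dfl && masks_equal;
}

/* Condition after the fix: fires only when the two masks differ. */
static bool warn_as_fixed(bool on_dfl, bool masks_equal)
{
	return !on_dfl && !masks_equal;
}
```

Off the default hierarchy with matching masks, which the fix implies is the expected state, the committed form warns and the fixed form stays silent.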
WARNING: at kernel/cpuset.c:1139
commit 734d45130cb ("cpuset: update cs->effective_{cpus, mems} when config changes") introduce the below warning in my server. [ 35.652137] [ cut here ] [ 35.652141] WARNING: at kernel/cpuset.c:1139 [ 35.652142] Modules linked in: ebtable_nat xt_CHECKSUM bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser iptable_mangle nf_conntrack_ipv4 rdma_cm nf_defrag_ipv4 xt_conntrack iw_cm nf_conntrack ib_cm ib_sa ib_mad ebtable_filter ib_core ebtables ip6_tables ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi e1000e ses ptp enclosure pps_core be2net shpchp vhost_net tun macvtap macvlan vhost kvm binfmt_misc uinput lpfc scsi_transport_fc ipr [ 35.652185] CPU: 36 PID: 1363 Comm: libvirtd Not tainted 3.16.0-rc5-next-20140721+ #93 [ 35.652187] task: c003b3443a00 ti: c003bb008000 task.ti: c003bb008000 [ 35.652189] NIP: c015ff38 LR: c015ff2c CTR: [ 35.652190] REGS: c003bb00b850 TRAP: 0700 Not tainted (3.16.0-rc5-next-20140721+) [ 35.652191] MSR: 90029032 CR: 24004824 XER: [ 35.652196] CFAR: c045f6cc SOFTE: 1 GPR00: c015ff04 c003bb00bad0 c145acf8 0001 GPR04: c003b3dae5d0 0100 GPR08: c003b3dae548 0004 0004 GPR12: 0001 cfeea200 008066727bd8 008066727a30 GPR16: 0080667dfa08 008066727a68 0080667279f8 0080667279d0 GPR20: c166acf8 c003b3dae530 c1311990 c003b3dae5d0 GPR24: c003b3dae530 c003b3dadc00 c003b3dae400 0001 GPR28: c1311968 c003b1873100 c003b3dae400 [ 35.652219] NIP [c015ff38] .cpuset_write_resmask+0x438/0x8c0 [ 35.652221] LR [c015ff2c] .cpuset_write_resmask+0x42c/0x8c0 [ 35.65] Call Trace: [ 35.652224] [c003bb00bad0] [c015ff04] .cpuset_write_resmask+0x404/0x8c0 (unreliable) [ 35.652227] [c003bb00bba0] [c0156f08] .cgroup_file_write+0x78/0x190 [ 35.652230] [c003bb00bc50] [c030c490] .kernfs_fop_write+0x150/0x1e0 [ 35.652233] [c003bb00bcf0] [c026b6d0] .vfs_write+0xe0/0x270 [ 35.652235] [c003bb00bd90] [c026be24] .SyS_write+0x64/0x110 [ 35.652238] [c003bb00be30] [c000a158] syscall_exit+0x0/0x98 [ 35.652239] Instruction 
dump: [ 35.652240] e93a 39549528 e9290118 7fa95000 419e0024 7ea3ab78 7ee4bb78 38a00100 [ 35.652243] 482ff719 6000 2fa3 419e0008 <0fe0> 7f43d378 4bfffa71 813a006c [ 35.652247] ---[ end trace f91b0c3aadfe71a6 ]--- Thanks, Mike
Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port
On 07/22/2014 10:51 PM, Mike Qiu wrote:
> In ata_sas_port_alloc(), it haven't initialized scsi_host field in ata_port, although scsi_host is in parameters list and unused in this function.
>
> With commit 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32") ata_qc_new() try to use scsi_host, while it is a NULL pointer for ipr IOA and error message shows below:
>
> ...
>
> While scsi_host is unused in ata_sas_port_alloc(), better to set it in ata_sas_port_alloc() instead of in driver.
>
> Signed-off-by: Mike Qiu
> ---
>  drivers/ata/libata-scsi.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 0586f66..a472b6f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
>  	ap->flags |= port_info->flags;
>  	ap->ops = port_info->port_ops;
>  	ap->cbl = ATA_CBL_SATA;
> +	ap->scsi_host = shost;

What about my patch itself? ata_sas_port_alloc() has "shost" in its parameter list, but it is unused. Maybe it is better to set ap->scsi_host here: it is very convenient, and drivers like ipr may forget to set this field. Otherwise "shost" needs to be removed from the parameter list, I think.

Thanks,
Mike

>
>  	return ap;
>  }
Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port
I have tested with the ipr IOA, passed.

Reviewed-and Tested-by: Mike Qiu

On 07/23/2014 04:11 AM, Tejun Heo wrote:
> Hello,
>
> Can you please test the following patch?
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index d19c37a7..773f4e6 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -4798,9 +4798,8 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
>  static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
>  {
>  	struct ata_queued_cmd *qc = NULL;
> -	unsigned int i, tag, max_queue;
> -
> -	max_queue = ap->scsi_host->can_queue;
> +	unsigned int max_queue = ap->host->n_tags;
> +	unsigned int i, tag;
>
>  	/* no command while frozen */
>  	if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
> @@ -6094,6 +6093,7 @@ void ata_host_init(struct ata_host *host, struct device *dev,
>  {
>  	spin_lock_init(&host->lock);
>  	mutex_init(&host->eh_mutex);
> +	host->n_tags = ATA_MAX_QUEUE;
>  	host->dev = dev;
>  	host->ops = ops;
>  }
> @@ -6179,11 +6179,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
>  	 * The max queue supported by hardware must not be greater than
>  	 * ATA_MAX_QUEUE.
>  	 */
> -	if (sht->can_queue > ATA_MAX_QUEUE) {
> -		dev_err(host->dev, "BUG: the hardware max queue is too large\n");
> -		WARN_ON(1);
> -		return -EINVAL;
> -	}
> +	host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE);
>
>  	/* host must have been started */
>  	if (!(host->flags & ATA_HOST_STARTED)) {
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 5ab4e3a..92abb49 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -593,6 +593,7 @@ struct ata_host {
>  	struct device		*dev;
>  	void __iomem * const	*iomap;
>  	unsigned int		n_ports;
> +	unsigned int		n_tags;		/* nr of NCQ tags */
>  	void			*private_data;
>  	struct ata_port_operations *ops;
>  	unsigned long		flags;
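The clamping step in this patch, which replaces the old "fail registration" check, can be illustrated outside the kernel. A minimal sketch, assuming `ATA_MAX_QUEUE` is 32 as stated elsewhere in the thread; `clamp_int()` is a plain-C stand-in for the kernel's `clamp()` macro and `host_n_tags()` is a hypothetical helper, not a libata function.

```c
#define ATA_MAX_QUEUE 32	/* value quoted in the thread */

/* Plain-C stand-in for the kernel's clamp() macro. */
static int clamp_int(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/* Hypothetical helper: the tag count a host would store at registration. */
static int host_n_tags(int can_queue)
{
	return clamp_int(can_queue, 1, ATA_MAX_QUEUE);
}
```

Computing the bound once at registration keeps the per-command path to a single field read, which is the concern raised about putting conditionals in ata_qc_new().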
Re: [PATCH 1/2] libata: Fix scsi_host can_queue issue in ata_qc_new()
On 07/22/2014 11:42 PM, Tejun Heo wrote:
> Hello,
>
> (cc'ing Dan)
>
> On Tue, Jul 22, 2014 at 10:50:19AM -0400, Mike Qiu wrote:
>> The can_queue in scsi_host can be more than ATA_MAX_QUEUE (32), for example, in ipr, it can be 100 or more.
>>
>> Also, some drivers, like ipr driver, haven't filled the field scsi_host in ata_port, and will lead a call trace, so add check for that.
>>
>> Signed-off-by: Mike Qiu
>> ---
>>  drivers/ata/libata-core.c | 15 ---
>>  1 file changed, 4 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index 259d879..a5b9c70 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -4734,7 +4734,10 @@ static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
>>  	struct ata_queued_cmd *qc = NULL;
>>  	unsigned int i, tag, max_queue;
>>
>> -	max_queue = ap->scsi_host->can_queue;
>> +	if (ap->scsi_host && ap->scsi_host->can_queue <= ATA_MAX_QUEUE)
>> +		max_queue = ap->scsi_host->can_queue;
>> +	else
>> +		max_queue = ATA_MAX_QUEUE;
>>
>>  	/* no command while frozen */
>>  	if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
>> @@ -6109,16 +6112,6 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
>>  {
>>  	int i, rc;
>>
>> -	/*
>> -	 * The max queue supported by hardware must not be greater than
>> -	 * ATA_MAX_QUEUE.
>> -	 */
>> -	if (sht->can_queue > ATA_MAX_QUEUE) {
>> -		dev_err(host->dev, "BUG: the hardware max queue is too large\n");
>> -		WARN_ON(1);
>> -		return -EINVAL;
>> -	}
>> -
>
> So, ummm, I really don't like that we're adding the conditionals to the hot path (yeah, its implementation is slow but still). Maybe we

Yes, agree ..., not a good idea to do this...

Thanks
Mike

> need to store the chosen queue depth after all? Dan?
>
> Thanks.
Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port
[+cc Wendy, Brian King, Stephen] On 07/22/2014 10:51 PM, Mike Qiu wrote: In ata_sas_port_alloc(), it haven't initialized scsi_host field in ata_port, although scsi_host is in parameters list and unused in this function. With commit 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32") ata_qc_new() try to use scsi_host, while it is a NULL pointer for ipr IOA and error message shows below: Unable to handle kernel paging request for data at address 0x0114 Faulting instruction address: 0xc05c2580 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c05c2580] .ata_qc_new_init+0x30/0x1f0 LR [c05c9384] .ata_scsi_translate+0x44/0x230 Call Trace: 0xc003ad332280 (unreliable) .ata_scsi_translate+0x44/0x230 .ipr_queuecommand+0x2e0/0x780 [ipr] .scsi_dispatch_cmd+0xec/0x400 .scsi_request_fn+0x52c/0x670 .__blk_run_queue+0x5c/0x80 .blk_execute_rq_nowait+0xf8/0x1c0 .blk_execute_rq+0x88/0x150 .scsi_execute+0xf0/0x1f0 .scsi_execute_req_flags+0xc4/0x170 .scsi_probe_and_add_lun+0x2d4/0xe00 .__scsi_scan_target+0x1a4/0x790 .scsi_scan_channel.part.3+0x80/0xc0 .scsi_scan_host_selected+0x1a0/0x240 .do_scan_async+0x30/0x210 .async_run_entry_fn+0x78/0x1c0 .process_one_work+0x1c4/0x4a0 .worker_thread+0x184/0x600 .kthread+0x10c/0x130 .ret_from_kernel_thread+0x58/0x7c While scsi_host is unused in ata_sas_port_alloc(), better to set it in ata_sas_port_alloc() instead of in driver. 
Signed-off-by: Mike Qiu
---
 drivers/ata/libata-scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0586f66..a472b6f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
 	ap->flags |= port_info->flags;
 	ap->ops = port_info->port_ops;
 	ap->cbl = ATA_CBL_SATA;
+	ap->scsi_host = shost;

 	return ap;
 }
[PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port
In ata_sas_port_alloc(), it haven't initialized scsi_host field in ata_port, although scsi_host is in parameters list and unused in this function. With commit 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32") ata_qc_new() try to use scsi_host, while it is a NULL pointer for ipr IOA and error message shows below: Unable to handle kernel paging request for data at address 0x0114 Faulting instruction address: 0xc05c2580 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c05c2580] .ata_qc_new_init+0x30/0x1f0 LR [c05c9384] .ata_scsi_translate+0x44/0x230 Call Trace: 0xc003ad332280 (unreliable) .ata_scsi_translate+0x44/0x230 .ipr_queuecommand+0x2e0/0x780 [ipr] .scsi_dispatch_cmd+0xec/0x400 .scsi_request_fn+0x52c/0x670 .__blk_run_queue+0x5c/0x80 .blk_execute_rq_nowait+0xf8/0x1c0 .blk_execute_rq+0x88/0x150 .scsi_execute+0xf0/0x1f0 .scsi_execute_req_flags+0xc4/0x170 .scsi_probe_and_add_lun+0x2d4/0xe00 .__scsi_scan_target+0x1a4/0x790 .scsi_scan_channel.part.3+0x80/0xc0 .scsi_scan_host_selected+0x1a0/0x240 .do_scan_async+0x30/0x210 .async_run_entry_fn+0x78/0x1c0 .process_one_work+0x1c4/0x4a0 .worker_thread+0x184/0x600 .kthread+0x10c/0x130 .ret_from_kernel_thread+0x58/0x7c While scsi_host is unused in ata_sas_port_alloc(), better to set it in ata_sas_port_alloc() instead of in driver. 
Signed-off-by: Mike Qiu
---
 drivers/ata/libata-scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0586f66..a472b6f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
 	ap->flags |= port_info->flags;
 	ap->ops = port_info->port_ops;
 	ap->cbl = ATA_CBL_SATA;
+	ap->scsi_host = shost;

 	return ap;
 }
-- 
1.8.1.4
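The failure mode and the one-line fix can be reduced to a small user-space model. This is only a sketch under assumptions: the `fake_*` types and the `port_*` helpers are hypothetical stand-ins for `struct Scsi_Host`, `struct ata_port`, and the allocation path, and `port_queue_depth()` returns -1 at the point where the real code would dereference a NULL pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct Scsi_Host and struct ata_port. */
struct fake_shost {
	int can_queue;
};

struct fake_port {
	struct fake_shost *scsi_host;
};

/* Before the patch: shost is accepted as a parameter but never stored. */
static void port_init_before(struct fake_port *ap, struct fake_shost *shost)
{
	(void)shost;
	ap->scsi_host = NULL;
}

/* With the patch: the allocator records shost itself. */
static void port_init_patched(struct fake_port *ap, struct fake_shost *shost)
{
	ap->scsi_host = shost;
}

/* Returns can_queue, or -1 where the real code would hit a NULL pointer. */
static int port_queue_depth(const struct fake_port *ap)
{
	return ap->scsi_host ? ap->scsi_host->can_queue : -1;
}
```

Setting the field in the allocator means a driver such as ipr does not have to remember to do it, which is the argument made in the commit message.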
[PATCH 1/2] libata: Fix scsi_host can_queue issue in ata_qc_new()
The can_queue in scsi_host can be more than ATA_MAX_QUEUE (32), for example, in ipr, it can be 100 or more.

Also, some drivers, like ipr driver, haven't filled the field scsi_host in ata_port, and will lead a call trace, so add check for that.

Signed-off-by: Mike Qiu
---
 drivers/ata/libata-core.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 259d879..a5b9c70 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4734,7 +4734,10 @@ static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
 	struct ata_queued_cmd *qc = NULL;
 	unsigned int i, tag, max_queue;

-	max_queue = ap->scsi_host->can_queue;
+	if (ap->scsi_host && ap->scsi_host->can_queue <= ATA_MAX_QUEUE)
+		max_queue = ap->scsi_host->can_queue;
+	else
+		max_queue = ATA_MAX_QUEUE;

 	/* no command while frozen */
 	if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
@@ -6109,16 +6112,6 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
 {
 	int i, rc;

-	/*
-	 * The max queue supported by hardware must not be greater than
-	 * ATA_MAX_QUEUE.
-	 */
-	if (sht->can_queue > ATA_MAX_QUEUE) {
-		dev_err(host->dev, "BUG: the hardware max queue is too large\n");
-		WARN_ON(1);
-		return -EINVAL;
-	}
-
 	/* host must have been started */
 	if (!(host->flags & ATA_HOST_STARTED)) {
 		dev_err(host->dev, "BUG: trying to register unstarted host\n");
-- 
1.8.1.4
[PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port
ata_sas_port_alloc() does not initialize the scsi_host field in ata_port, although shost is in its parameter list and is otherwise unused in this function.

Since commit 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32"), ata_qc_new() dereferences scsi_host, which is a NULL pointer for the ipr IOA, producing the error below:

Unable to handle kernel paging request for data at address 0x0114
Faulting instruction address: 0xc05c2580
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c05c2580] .ata_qc_new_init+0x30/0x1f0
LR [c05c9384] .ata_scsi_translate+0x44/0x230
Call Trace:
0xc003ad332280 (unreliable)
.ata_scsi_translate+0x44/0x230
.ipr_queuecommand+0x2e0/0x780 [ipr]
.scsi_dispatch_cmd+0xec/0x400
.scsi_request_fn+0x52c/0x670
.__blk_run_queue+0x5c/0x80
.blk_execute_rq_nowait+0xf8/0x1c0
.blk_execute_rq+0x88/0x150
.scsi_execute+0xf0/0x1f0
.scsi_execute_req_flags+0xc4/0x170
.scsi_probe_and_add_lun+0x2d4/0xe00
.__scsi_scan_target+0x1a4/0x790
.scsi_scan_channel.part.3+0x80/0xc0
.scsi_scan_host_selected+0x1a0/0x240
.do_scan_async+0x30/0x210
.async_run_entry_fn+0x78/0x1c0
.process_one_work+0x1c4/0x4a0
.worker_thread+0x184/0x600
.kthread+0x10c/0x130
.ret_from_kernel_thread+0x58/0x7c

Since shost is otherwise unused in ata_sas_port_alloc(), it is better to set the field there than in each driver.

Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
---
 drivers/ata/libata-scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0586f66..a472b6f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
 	ap->flags |= port_info->flags;
 	ap->ops = port_info->port_ops;
 	ap->cbl = ATA_CBL_SATA;
+	ap->scsi_host = shost;
 
 	return ap;
 }
--
1.8.1.4
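To illustrate the one-line fix, here is a small userspace model of an allocator that receives the SCSI host as an argument and records it in the port before returning. It is only a sketch: the structs are reduced to the fields involved, and model_port_alloc() and model_port_alloc_demo() are hypothetical names, not the libata functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures. */
struct Scsi_Host { int can_queue; };
struct ata_port  { struct Scsi_Host *scsi_host; };

/*
 * Model of an allocator that is handed shost. Without the marked line the
 * field stays NULL (calloc zeroes it), and any later read of
 * ap->scsi_host->can_queue dereferences a NULL pointer.
 */
static struct ata_port *model_port_alloc(struct Scsi_Host *shost)
{
	struct ata_port *ap = calloc(1, sizeof(*ap));

	if (!ap)
		return NULL;
	ap->scsi_host = shost;	/* the assignment the patch adds */
	return ap;
}

/* Usage: the caller can rely on the back-pointer after allocation. */
static void model_port_alloc_demo(void)
{
	struct Scsi_Host host = { .can_queue = 100 };
	struct ata_port *ap = model_port_alloc(&host);

	assert(ap != NULL);
	assert(ap->scsi_host == &host);
	assert(ap->scsi_host->can_queue == 100);
	free(ap);
}
```

Setting the field in the allocator means a driver cannot forget it, which is the argument the patch description makes for placing the assignment there.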
Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port
[+cc Wendy, Brian King, Stephen]

On 07/22/2014 10:51 PM, Mike Qiu wrote:
> ata_sas_port_alloc() does not initialize the scsi_host field in ata_port,
> although shost is in its parameter list and is otherwise unused in this
> function.
>
> Since commit 1871ee134b73 ("libata: support the ata host which implements
> a queue depth less than 32"), ata_qc_new() dereferences scsi_host, which is
> a NULL pointer for the ipr IOA, producing the error below:
>
> Unable to handle kernel paging request for data at address 0x0114
> Faulting instruction address: 0xc05c2580
> Oops: Kernel access of bad area, sig: 11 [#1]
> ...
> NIP [c05c2580] .ata_qc_new_init+0x30/0x1f0
> LR [c05c9384] .ata_scsi_translate+0x44/0x230
> Call Trace:
> 0xc003ad332280 (unreliable)
> .ata_scsi_translate+0x44/0x230
> .ipr_queuecommand+0x2e0/0x780 [ipr]
> .scsi_dispatch_cmd+0xec/0x400
> .scsi_request_fn+0x52c/0x670
> .__blk_run_queue+0x5c/0x80
> .blk_execute_rq_nowait+0xf8/0x1c0
> .blk_execute_rq+0x88/0x150
> .scsi_execute+0xf0/0x1f0
> .scsi_execute_req_flags+0xc4/0x170
> .scsi_probe_and_add_lun+0x2d4/0xe00
> .__scsi_scan_target+0x1a4/0x790
> .scsi_scan_channel.part.3+0x80/0xc0
> .scsi_scan_host_selected+0x1a0/0x240
> .do_scan_async+0x30/0x210
> .async_run_entry_fn+0x78/0x1c0
> .process_one_work+0x1c4/0x4a0
> .worker_thread+0x184/0x600
> .kthread+0x10c/0x130
> .ret_from_kernel_thread+0x58/0x7c
>
> Since shost is otherwise unused in ata_sas_port_alloc(), it is better to
> set the field there than in each driver.
>
> Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
> ---
>  drivers/ata/libata-scsi.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 0586f66..a472b6f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
>  	ap->flags |= port_info->flags;
>  	ap->ops = port_info->port_ops;
>  	ap->cbl = ATA_CBL_SATA;
> +	ap->scsi_host = shost;
>
>  	return ap;
>  }
Re: [PATCH 1/2] libata: Fix scsi_host can_queue issue in ata_qc_new()
On 07/22/2014 11:42 PM, Tejun Heo wrote:
> Hello,
>
> (cc'ing Dan)
>
> On Tue, Jul 22, 2014 at 10:50:19AM -0400, Mike Qiu wrote:
>> The can_queue in scsi_host can be more than ATA_MAX_QUEUE (32); for
>> example, in ipr it can be 100 or more.
>>
>> Also, some drivers, such as ipr, do not fill in the scsi_host field of
>> ata_port, which leads to a call trace, so add a check for that.
>>
>> Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
>> ---
>>  drivers/ata/libata-core.c | 15 ++++-----------
>>  1 file changed, 4 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index 259d879..a5b9c70 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -4734,7 +4734,10 @@ static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
>>  	struct ata_queued_cmd *qc = NULL;
>>  	unsigned int i, tag, max_queue;
>>
>> -	max_queue = ap->scsi_host->can_queue;
>> +	if (ap->scsi_host && ap->scsi_host->can_queue <= ATA_MAX_QUEUE)
>> +		max_queue = ap->scsi_host->can_queue;
>> +	else
>> +		max_queue = ATA_MAX_QUEUE;
>>
>>  	/* no command while frozen */
>>  	if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
>> @@ -6109,16 +6112,6 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
>>  {
>>  	int i, rc;
>>
>> -	/*
>> -	 * The max queue supported by hardware must not be greater than
>> -	 * ATA_MAX_QUEUE.
>> -	 */
>> -	if (sht->can_queue > ATA_MAX_QUEUE) {
>> -		dev_err(host->dev, "BUG: the hardware max queue is too large\n");
>> -		WARN_ON(1);
>> -		return -EINVAL;
>> -	}
>> -
>
> So, ummm, I really don't like that we're adding the conditionals to
> the hot path (yeah, its implementation is slow but still). Maybe we

Yes, agree ..., not a good idea to do this...

Thanks
Mike

> need to store the chosen queue depth after all? Dan?
>
> Thanks.
Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port
I have tested with the ipr IOA, passed.

Reviewed-and Tested-by: Mike Qiu <qiud...@linux.vnet.ibm.com>

On 07/23/2014 04:11 AM, Tejun Heo wrote:
> Hello,
>
> Can you please test the following patch?
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index d19c37a7..773f4e6 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -4798,9 +4798,8 @@ void swap_buf_le16(u16 *buf, unsigned int buf_words)
>  static struct ata_queued_cmd *ata_qc_new(struct ata_port *ap)
>  {
>  	struct ata_queued_cmd *qc = NULL;
> -	unsigned int i, tag, max_queue;
> -
> -	max_queue = ap->scsi_host->can_queue;
> +	unsigned int max_queue = ap->host->n_tags;
> +	unsigned int i, tag;
>
>  	/* no command while frozen */
>  	if (unlikely(ap->pflags & ATA_PFLAG_FROZEN))
> @@ -6094,6 +6093,7 @@ void ata_host_init(struct ata_host *host, struct device *dev,
>  {
>  	spin_lock_init(&host->lock);
>  	mutex_init(&host->eh_mutex);
> +	host->n_tags = ATA_MAX_QUEUE;
>  	host->dev = dev;
>  	host->ops = ops;
>  }
> @@ -6179,11 +6179,7 @@ int ata_host_register(struct ata_host *host, struct scsi_host_template *sht)
>  	 * The max queue supported by hardware must not be greater than
>  	 * ATA_MAX_QUEUE.
>  	 */
> -	if (sht->can_queue > ATA_MAX_QUEUE) {
> -		dev_err(host->dev, "BUG: the hardware max queue is too large\n");
> -		WARN_ON(1);
> -		return -EINVAL;
> -	}
> +	host->n_tags = clamp(sht->can_queue, 1, ATA_MAX_QUEUE);
>
>  	/* host must have been started */
>  	if (!(host->flags & ATA_HOST_STARTED)) {
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 5ab4e3a..92abb49 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -593,6 +593,7 @@ struct ata_host {
>  	struct device		*dev;
>  	void __iomem * const	*iomap;
>  	unsigned int		n_ports;
> +	unsigned int		n_tags;			/* nr of NCQ tags */
>  	void			*private_data;
>  	struct ata_port_operations *ops;
>  	unsigned long		flags;
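The idea in Tejun's patch, computing the usable queue depth once and reading a stored value on the command path, can be sketched in a userspace model as follows. This is an illustration under assumptions, not the kernel code: clamp_int() stands in for the kernel's clamp() macro, and the model_* function names are hypothetical; n_tags, can_queue and ATA_MAX_QUEUE come from the patch.

```c
#include <assert.h>

#define ATA_MAX_QUEUE 32

/* Hypothetical stand-in for struct ata_host, reduced to the new field. */
struct ata_host { unsigned int n_tags; };

/* Plain-C substitute for the kernel's clamp() macro. */
static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Init time: default to the full tag space. */
static void model_host_init(struct ata_host *host)
{
	host->n_tags = ATA_MAX_QUEUE;
}

/* Registration time: clamp the template's can_queue exactly once. */
static void model_host_register(struct ata_host *host, int can_queue)
{
	host->n_tags = (unsigned int)clamp_int(can_queue, 1, ATA_MAX_QUEUE);
}

/* Command path: a plain load, with no NULL check and no comparison. */
static unsigned int model_qc_max_queue(const struct ata_host *host)
{
	return host->n_tags;
}

static void model_n_tags_demo(void)
{
	struct ata_host host;

	model_host_init(&host);
	assert(model_qc_max_queue(&host) == ATA_MAX_QUEUE);

	model_host_register(&host, 100);	/* deeper than libata supports */
	assert(model_qc_max_queue(&host) == ATA_MAX_QUEUE);

	model_host_register(&host, 16);		/* shallower host */
	assert(model_qc_max_queue(&host) == 16);
}
```

Because the default is set at init time, a host that never goes through registration still reads a valid depth in this model, which is why the command path no longer needs the SCSI host pointer.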
Re: [PATCH 2/2] libata: Fix NULL pointer of scsi_host in ata_port
On 07/22/2014 10:51 PM, Mike Qiu wrote:
> ata_sas_port_alloc() does not initialize the scsi_host field in ata_port,
> although shost is in its parameter list and is otherwise unused in this
> function.
>
> Since commit 1871ee134b73 ("libata: support the ata host which implements
> a queue depth less than 32"), ata_qc_new() dereferences scsi_host, which is
> a NULL pointer for the ipr IOA, producing the error below:
>
> ...
>
> Since shost is otherwise unused in ata_sas_port_alloc(), it is better to
> set the field there than in each driver.
>
> Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
> ---
>  drivers/ata/libata-scsi.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 0586f66..a472b6f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -4070,6 +4070,7 @@ struct ata_port *ata_sas_port_alloc(struct ata_host *host,
>  	ap->flags |= port_info->flags;
>  	ap->ops = port_info->port_ops;
>  	ap->cbl = ATA_CBL_SATA;
> +	ap->scsi_host = shost;

What about my patch itself? ata_sas_port_alloc() has shost in its parameter list, but it is unused. It may be better to set ap->scsi_host here: it is convenient, and drivers such as ipr may forget to set this field. Otherwise, I think shost should be removed from the parameter list.

Thanks,
Mike

>
>  	return ap;
>  }
WARNING: at kernel/cpuset.c:1139
Commit 734d45130cb ("cpuset: update cs->effective_{cpus, mems} when config changes") introduces the warning below on my server.

[ 35.652137] ------------[ cut here ]------------
[ 35.652141] WARNING: at kernel/cpuset.c:1139
[ 35.652142] Modules linked in: ebtable_nat xt_CHECKSUM bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser iptable_mangle nf_conntrack_ipv4 rdma_cm nf_defrag_ipv4 xt_conntrack iw_cm nf_conntrack ib_cm ib_sa ib_mad ebtable_filter ib_core ebtables ip6_tables ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi e1000e ses ptp enclosure pps_core be2net shpchp vhost_net tun macvtap macvlan vhost kvm binfmt_misc uinput lpfc scsi_transport_fc ipr
[ 35.652185] CPU: 36 PID: 1363 Comm: libvirtd Not tainted 3.16.0-rc5-next-20140721+ #93
[ 35.652187] task: c003b3443a00 ti: c003bb008000 task.ti: c003bb008000
[ 35.652189] NIP: c015ff38 LR: c015ff2c CTR:
[ 35.652190] REGS: c003bb00b850 TRAP: 0700 Not tainted (3.16.0-rc5-next-20140721+)
[ 35.652191] MSR: 90029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24004824 XER:
[ 35.652196] CFAR: c045f6cc SOFTE: 1
GPR00: c015ff04 c003bb00bad0 c145acf8 0001
GPR04: c003b3dae5d0 0100
GPR08: c003b3dae548 0004 0004
GPR12: 0001 cfeea200 008066727bd8 008066727a30
GPR16: 0080667dfa08 008066727a68 0080667279f8 0080667279d0
GPR20: c166acf8 c003b3dae530 c1311990 c003b3dae5d0
GPR24: c003b3dae530 c003b3dadc00 c003b3dae400 0001
GPR28: c1311968 c003b1873100 c003b3dae400
[ 35.652219] NIP [c015ff38] .cpuset_write_resmask+0x438/0x8c0
[ 35.652221] LR [c015ff2c] .cpuset_write_resmask+0x42c/0x8c0
[ 35.65] Call Trace:
[ 35.652224] [c003bb00bad0] [c015ff04] .cpuset_write_resmask+0x404/0x8c0 (unreliable)
[ 35.652227] [c003bb00bba0] [c0156f08] .cgroup_file_write+0x78/0x190
[ 35.652230] [c003bb00bc50] [c030c490] .kernfs_fop_write+0x150/0x1e0
[ 35.652233] [c003bb00bcf0] [c026b6d0] .vfs_write+0xe0/0x270
[ 35.652235] [c003bb00bd90] [c026be24] .SyS_write+0x64/0x110
[ 35.652238] [c003bb00be30] [c000a158] syscall_exit+0x0/0x98
[ 35.652239] Instruction dump:
[ 35.652240] e93a 39549528 e9290118 7fa95000 419e0024 7ea3ab78 7ee4bb78 38a00100
[ 35.652243] 482ff719 6000 2fa3 419e0008 0fe0 7f43d378 4bfffa71 813a006c
[ 35.652247] ---[ end trace f91b0c3aadfe71a6 ]---

Thanks,
Mike
Re: [PATCH] lpfc: Avoid to disable pci_dev twice
On 07/17/2014 10:15 PM, Joe Lawrence wrote: [ +cc linux-pci and Bjorn, comments inline/below ... ] On Thu, 17 Jul 2014 02:32:31 -0400 Mike Qiu wrote: In IBM Power servers, when hardware error occurs during probe state, EEH subsystem will call driver's error_detected interface, which will call pci_disable_device(). But driver's probe function also call pci_disable_device() in this situation. So pci_dev will be disabled twice: Device lpfc disabling already-disabled device [ cut here ] WARNING: at drivers/pci/pci.c:1407 CPU: 0 PID: 8744 Comm: kworker/0:0 Tainted: GW 3.10.42-2002.pkvm2_1_1.6.ppc64 #1 Workqueue: events .work_for_cpu_fn task: c0274e3f5400 ti: c027d3958000 task.ti: c027d3958000 NIP: c0471b8c LR: c0471b88 CTR: c043ebe0 REGS: c027d395b650 TRAP: 0700 Tainted: GW (3.10.42-2002.pkvm2_1_1.6.ppc64) MSR: 900100029032 CR: 28b52b44 XER: 2000 CFAR: c0879ab8 SOFTE: 1 ... NIP .pci_disable_device+0xcc/0xe0 LR .pci_disable_device+0xc8/0xe0 Call Trace: .pci_disable_device+0xc8/0xe0 (unreliable) .lpfc_disable_pci_dev+0x50/0x80 [lpfc] .lpfc_pci_probe_one+0x870/0x21a0 [lpfc] .local_pci_probe+0x68/0xb0 .work_for_cpu_fn+0x38/0x60 .process_one_work+0x1a4/0x4d0 .worker_thread+0x37c/0x490 .kthread+0xf0/0x100 .ret_from_kernel_thread+0x5c/0x80 Signed-off-by: Mike Qiu --- drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_init.c | 59 +++ 2 files changed, 55 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index 434e903..0c7bad9 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -813,6 +813,7 @@ struct lpfc_hba { #define VPD_MASK0xf /* mask for any vpd data */ uint8_t soft_wwn_enable; + uint8_t probe_done; struct timer_list fcp_poll_timer; struct timer_list eratt_poll; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 06f9a5b..c2e67ae 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -9519,6 +9519,9 @@ lpfc_pci_probe_one_s3(struct pci_dev *pdev, const 
struct pci_device_id *pid) } } + /* Set the probe flag */ + phba->probe_done = 1; + /* Perform post initialization setup */ lpfc_post_init_setup(phba); @@ -9795,6 +9798,9 @@ lpfc_sli_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) { + if (phba) + return; + Should that be "if *not* phba" like the others below? Yes, should be ... if (!phba) lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2710 PCI channel disable preparing for reset\n"); @@ -9812,7 +9818,8 @@ lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli_disable_intr(phba); - pci_disable_device(phba->pcidev); + if (phba->probe_done && phba->pcidev) + pci_disable_device(phba->pcidev); } /** @@ -10282,6 +10289,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid) goto out_disable_intr; } + /* Set probe_done flag */ + phba->probe_done = 1; + /* Log the current active interrupt mode */ phba->intr_mode = intr_mode; lpfc_log_intr_mode(phba, intr_mode); @@ -10544,6 +10554,9 @@ lpfc_sli4_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) { + if (!phba) + return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2826 PCI channel disable preparing for reset\n"); @@ -10562,7 +10575,9 @@ lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli4_disable_intr(phba); lpfc_sli4_queue_destroy(phba); - pci_disable_device(phba->pcidev); + + if (phba->probe_done && phba->pcidev) + pci_disable_device(phba->pcidev); } /** @@ -10893,9 +10908,21 @@ static pci_ers_result_t lpfc_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba; + struct lpfc_hba *phba; pci_ers_result_t rc = PCI_ERS_RESULT_DISCONNECT; + if (!shost) + /* Run here means it may during probe state and +* Scsi_Host has 
not been created and We can do nothing +* in this state so call for hotplug*/ + return PCI_ERS_RESULT_NONE; Is it possible to get here during device removal, ie lpfc_pci_remove_one? If so, we may have shost in hand now, but can these routines race? Same for similar instances
[PATCH] lpfc: Avoid to disable pci_dev twice
In IBM Power servers, when hardware error occurs during probe state, EEH subsystem will call driver's error_detected interface, which will call pci_disable_device(). But driver's probe function also call pci_disable_device() in this situation. So pci_dev will be disabled twice: Device lpfc disabling already-disabled device [ cut here ] WARNING: at drivers/pci/pci.c:1407 CPU: 0 PID: 8744 Comm: kworker/0:0 Tainted: GW 3.10.42-2002.pkvm2_1_1.6.ppc64 #1 Workqueue: events .work_for_cpu_fn task: c0274e3f5400 ti: c027d3958000 task.ti: c027d3958000 NIP: c0471b8c LR: c0471b88 CTR: c043ebe0 REGS: c027d395b650 TRAP: 0700 Tainted: GW (3.10.42-2002.pkvm2_1_1.6.ppc64) MSR: 900100029032 CR: 28b52b44 XER: 2000 CFAR: c0879ab8 SOFTE: 1 ... NIP .pci_disable_device+0xcc/0xe0 LR .pci_disable_device+0xc8/0xe0 Call Trace: .pci_disable_device+0xc8/0xe0 (unreliable) .lpfc_disable_pci_dev+0x50/0x80 [lpfc] .lpfc_pci_probe_one+0x870/0x21a0 [lpfc] .local_pci_probe+0x68/0xb0 .work_for_cpu_fn+0x38/0x60 .process_one_work+0x1a4/0x4d0 .worker_thread+0x37c/0x490 .kthread+0xf0/0x100 .ret_from_kernel_thread+0x5c/0x80 Signed-off-by: Mike Qiu --- drivers/scsi/lpfc/lpfc.h | 1 + drivers/scsi/lpfc/lpfc_init.c | 59 +++ 2 files changed, 55 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h index 434e903..0c7bad9 100644 --- a/drivers/scsi/lpfc/lpfc.h +++ b/drivers/scsi/lpfc/lpfc.h @@ -813,6 +813,7 @@ struct lpfc_hba { #define VPD_MASK0xf /* mask for any vpd data */ uint8_t soft_wwn_enable; + uint8_t probe_done; struct timer_list fcp_poll_timer; struct timer_list eratt_poll; diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 06f9a5b..c2e67ae 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -9519,6 +9519,9 @@ lpfc_pci_probe_one_s3(struct pci_dev *pdev, const struct pci_device_id *pid) } } + /* Set the probe flag */ + phba->probe_done = 1; + /* Perform post initialization setup */ 
lpfc_post_init_setup(phba); @@ -9795,6 +9798,9 @@ lpfc_sli_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) { + if (phba) + return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2710 PCI channel disable preparing for reset\n"); @@ -9812,7 +9818,8 @@ lpfc_sli_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli_disable_intr(phba); - pci_disable_device(phba->pcidev); + if (phba->probe_done && phba->pcidev) + pci_disable_device(phba->pcidev); } /** @@ -10282,6 +10289,9 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid) goto out_disable_intr; } + /* Set probe_done flag */ + phba->probe_done = 1; + /* Log the current active interrupt mode */ phba->intr_mode = intr_mode; lpfc_log_intr_mode(phba, intr_mode); @@ -10544,6 +10554,9 @@ lpfc_sli4_prep_dev_for_recover(struct lpfc_hba *phba) static void lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) { + if (!phba) + return; + lpfc_printf_log(phba, KERN_ERR, LOG_INIT, "2826 PCI channel disable preparing for reset\n"); @@ -10562,7 +10575,9 @@ lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba) /* Disable interrupt and pci device */ lpfc_sli4_disable_intr(phba); lpfc_sli4_queue_destroy(phba); - pci_disable_device(phba->pcidev); + + if (phba->probe_done && phba->pcidev) + pci_disable_device(phba->pcidev); } /** @@ -10893,9 +10908,21 @@ static pci_ers_result_t lpfc_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state) { struct Scsi_Host *shost = pci_get_drvdata(pdev); - struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba; + struct lpfc_hba *phba; pci_ers_result_t rc = PCI_ERS_RESULT_DISCONNECT; + if (!shost) + /* Run here means it may during probe state and +* Scsi_Host has not been created and We can do nothing +* in this state so call for hotplug*/ + return PCI_ERS_RESULT_NONE; + + phba = ((struct lpfc_vport *)shost->hostdata)->phba; + + if (!phba || !phba->probe_done) + /* 
Run here means it may during probe state */ + return PCI_ERS_RESULT_NONE; + switch (phba->pci_dev_grp) { case LPFC_PCI_DEV_LP: rc = lpfc_io_error_detected_s3(pdev, state); @@ -10930,9 +10957,20 @@ static pci_ers_result_t lpfc_io_slot_reset(struct pci_dev *pdev) { struct
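The guard this patch introduces can be illustrated with a small userspace model. Everything below is a simplified sketch under assumptions: pci_dev_model, hba_model and the model_* functions are hypothetical, and model_pci_disable() only imitates the "disabling already-disabled device" complaint with a counter instead of reproducing the kernel's pci_disable_device(). Only the probe_done flag and the shape of the check come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PCI device: an enable count plus a warning tally. */
struct pci_dev_model { int enable_cnt; int warnings; };

/* Hypothetical model of the HBA with the flag the patch introduces. */
struct hba_model { struct pci_dev_model *pcidev; bool probe_done; };

/* Simplified: complain if already disabled, otherwise drop the count. */
static void model_pci_disable(struct pci_dev_model *dev)
{
	if (dev->enable_cnt <= 0) {
		dev->warnings++;	/* "disabling already-disabled device" */
		return;
	}
	dev->enable_cnt--;
}

/* Error-recovery path with the guard from the patch. */
static void model_prep_dev_for_reset(struct hba_model *phba)
{
	if (!phba)
		return;
	if (phba->probe_done && phba->pcidev)
		model_pci_disable(phba->pcidev);
}

/* Probe failure path: the probe routine disables the device itself. */
static void model_probe_fail(struct hba_model *phba)
{
	model_pci_disable(phba->pcidev);
}

static void probe_done_demo(void)
{
	struct pci_dev_model dev = { .enable_cnt = 1, .warnings = 0 };
	struct hba_model hba = { .pcidev = &dev, .probe_done = false };

	model_prep_dev_for_reset(&hba);	/* error hits during probe: skipped */
	model_probe_fail(&hba);		/* probe unwinds and disables once */
	assert(dev.enable_cnt == 0 && dev.warnings == 0);

	model_pci_disable(&dev);	/* an unguarded second disable warns */
	assert(dev.warnings == 1);
}
```

The model is single-threaded. As the review comments in this thread point out, a plain flag does not by itself rule out a race between the probe and error-handling paths.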
Re: Bug_ON with patch: bio: modify __bio_add_page() to accept pages that don't start a new segment
On 07/15/2014 04:41 PM, Jens Axboe wrote:
> On 15/07/2014, at 10.14, Mike Qiu wrote:
>> My Power7 box fails to boot with commit
>> 254c4407cb84a6dec90336054615b0f0e996bb7c
>> "bio: modify __bio_add_page() to accept pages that don't start a new segment".
>>
>> Just reverting it works for me.
>
> I have reverted it yesterday in my tree.

OK, that's fine :)

Thanks
Mike
Bug_ON with patch: bio: modify __bio_add_page() to accept pages that don't start a new segment
My Power7 box fails to boot with commit
254c4407cb84a6dec90336054615b0f0e996bb7c
("bio: modify __bio_add_page() to accept pages that don't start a new segment").
Reverting it works for me. See below:

[ 22.659431] [ cut here ]
[ 22.659437] kernel BUG at fs/direct-io.c:747!
[ 22.659501] Oops: Exception in kernel mode, sig: 5 [#1]
[ 22.659528] SMP NR_CPUS=1024 NUMA PowerNV
[ 22.659533] Modules linked in: e1000e vhost_net tun ses(+) macvtap macvlan enclosure ptp pps_core vhost be2net(+) shpchp kvm binfmt_misc uinput lpfc scsi_transport_fc ipr
[ 22.659688] CPU: 8 PID: 772 Comm: lvm Not tainted 3.16.0-rc5-next-20140714+ #76
[ 22.659755] task: c003b0a7dc20 ti: c003b0afc000 task.ti: c003b0afc000
[ 22.659823] NIP: c02ba854 LR: c02bad80 CTR: 0010
[ 22.659890] REGS: c003b0aff450 TRAP: 0700 Not tainted (3.16.0-rc5-next-20140714+)
[ 22.659957] MSR: 90029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24222844 XER: 2000
[ 22.660114] CFAR: c02bad90 SOFTE: 1
GPR00: c02bad80 c003b0aff6d0 c145c148
GPR04: c0b6e7c8 0001
GPR08: 0001 0010 f000
GPR12: 24222844 cfee2400 0010 c003b914
GPR16: 0001 c003b914 00047bff 0001
GPR20: f0cb0fdc 0001 0001
GPR24: 0001 c003b0afc000
GPR28: 023dff80 c003fcb10380 c003b9140028
[ 22.660980] NIP [c02ba854] .__blockdev_direct_IO+0x1584/0x3960
[ 22.661036] LR [c02bad80] .__blockdev_direct_IO+0x1ab0/0x3960
[ 22.661092] Call Trace:
[ 22.661116] [c003b0aff6d0] [c02bad80] .__blockdev_direct_IO+0x1ab0/0x3960 (unreliable)
[ 22.661208] [c003b0aff980] [c02b6114] .blkdev_direct_IO+0x64/0x80
[ 22.661276] [c003b0affa20] [c01dd430] .generic_file_read_iter+0x5b0/0x690
[ 22.661355] [c003b0affb50] [c02b5a40] .blkdev_read_iter+0x60/0x90
[ 22.661423] [c003b0affbd0] [c0269d28] .new_sync_read+0xa8/0x120
[ 22.661491] [c003b0affcf0] [c026b280] .vfs_read+0xc0/0x1f0
[ 22.661559] [c003b0affd90] [c026b674] .SyS_read+0x64/0x110
[ 22.661628] [c003b0affe30] [c000a158] syscall_exit+0x0/0x98
[ 22.661695] Instruction dump:
[ 22.661729] e88100d8 80a100e4 80c100e0 f92100c0 3920 912100a8 4814fe15 6000
[ 22.661841] 812100e4 78630020 7f891800 419ef880 <0fe0> 6000 6042 e9410118
[ 22.661955] ---[ end trace 6248a5bb36020fd2 ]---

Thanks,
Mike
Re: [PATCH v2] powerpc: Avoid circular dependency with zImage.%
This v2 patch is good.

Tested-by: Mike Qiu

On 06/11/2014 11:40 PM, Michal Marek wrote:
> The rule to create the final images uses a zImage.% pattern.
> Unfortunately, this also matches the names of the zImage.*.lds linker
> scripts, which appear as a dependency of the final images. This somehow
> worked when $(srctree) used to be an absolute path, but now the pattern
> matches too much. List only the images from $(image-y) as the target of
> the rule, to avoid the circular dependency.
>
> Signed-off-by: Michal Marek
> ---
> v2:
>  - Filter out duplicates in the target list
>  - fix the platform argument to cmd_wrap
>
>  arch/powerpc/boot/Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 426dce7..ccc25ed 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -333,8 +333,8 @@ $(addprefix $(obj)/, $(initrd-y)): $(obj)/ramdisk.image.gz
>  $(obj)/zImage.initrd.%: vmlinux $(wrapperbits)
>  	$(call if_changed,wrap,$*,,,$(obj)/ramdisk.image.gz)
>
> -$(obj)/zImage.%: vmlinux $(wrapperbits)
> -	$(call if_changed,wrap,$*)
> +$(addprefix $(obj)/, $(sort $(filter zImage.%, $(image-y)))): vmlinux $(wrapperbits)
> +	$(call if_changed,wrap,$(subst $(obj)/zImage.,,$@))
>
>  # dtbImage% - a dtbImage is a zImage with an embedded device tree blob
>  $(obj)/dtbImage.initrd.%: vmlinux $(wrapperbits) $(obj)/%.dtb
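To see why the zImage.% pattern "matches too much", the following is a small sketch in C of how a single-% make pattern is matched against a target name. It is only a model of the matching rule (literal prefix, non-empty stem, literal suffix), not code from make or from the kernel build; make_pattern_matches() is a hypothetical helper, and zImage.coff.lds is used here merely as an illustrative name of the zImage.*.lds form mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Model of matching a make pattern containing at most one '%'.
 * The '%' stands for a non-empty stem; text before and after it
 * must match the target literally.
 */
static bool make_pattern_matches(const char *pattern, const char *target)
{
	assert(pattern != NULL && target != NULL);

	const char *pct = strchr(pattern, '%');
	if (!pct)  /* no wildcard: plain string comparison */
		return strcmp(pattern, target) == 0;

	size_t prefix_len = (size_t)(pct - pattern);
	size_t suffix_len = strlen(pct + 1);
	size_t target_len = strlen(target);

	/* The stem must contain at least one character. */
	if (target_len <= prefix_len + suffix_len)
		return false;
	if (strncmp(target, pattern, prefix_len) != 0)
		return false;
	return strcmp(target + target_len - suffix_len, pct + 1) == 0;
}
```

With this model, make_pattern_matches("zImage.%", "zImage.pseries") is true, but so is make_pattern_matches("zImage.%", "zImage.coff.lds"), which is the unwanted match. Listing the targets explicitly from $(image-y), as the patch does, avoids relying on the wildcard at all.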
Re: [RFC 08/10] irqdomain: Refactor irq_domain_associate_many()
On 2013/6/10 8:49, Grant Likely wrote:
> Originally, irq_domain_associate_many() was designed to unwind the mapped
> irqs on a failure of any individual association. However, that proved to
> be a problem with certain IRQ controllers. Some of them only support a
> subset of irqs, and will fail when attempting to map a reserved IRQ. In
> those cases we want to map as many IRQs as possible, so instead it is
> better for irq_domain_associate_many() to make a best-effort attempt to
> map irqs, but not fail if any or all of them don't succeed. If a caller
> really cares about how many irqs got associated, then it should instead
> go back and check that all of the irqs it cares about were mapped.
>
> The original design open-coded the individual association code into the
> body of irq_domain_associate_many(), but with no longer needing to unwind
> associations, the code becomes simpler to split out
> irq_domain_associate() to contain the bulk of the logic, and
> irq_domain_associate_many() to be a simple loop wrapper.
>
> This patch also adds a new error check to the associate path to make sure
> it isn't called for an irq larger than the controller can handle, and
> adds locking so that the irq_domain_mutex is held while setting up a new
> association.
>
> Signed-off-by: Grant Likely
> ---
>  include/linux/irqdomain.h |  22 +++---
>  kernel/irq/irqdomain.c    | 185 +++---
>  2 files changed, 101 insertions(+), 106 deletions(-)
>
> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
> index fd4b26f..f9e8e06 100644
> --- a/include/linux/irqdomain.h
> +++ b/include/linux/irqdomain.h
> @@ -103,6 +103,7 @@ struct irq_domain {
>  	struct irq_domain_chip_generic *gc;
>
>  	/* reverse map data. The linear map gets appended to the irq_domain */
> +	irq_hw_number_t hwirq_max;
>  	unsigned int revmap_direct_max_irq;
>  	unsigned int revmap_size;
>  	struct radix_tree_root revmap_tree;
> @@ -110,8 +111,8 @@ struct irq_domain {
>  };
>
>  #ifdef CONFIG_IRQ_DOMAIN
> -struct irq_domain *__irq_domain_add(struct device_node *of_node,
> -				    int size, int direct_max,
> +struct irq_domain *__irq_domain_add(struct device_node *of_node, int size,
> +				    irq_hw_number_t hwirq_max, int direct_max,
>  				    const struct irq_domain_ops *ops,
>  				    void *host_data);
>  struct irq_domain *irq_domain_add_simple(struct device_node *of_node,
> @@ -140,14 +141,14 @@ static inline struct irq_domain *irq_domain_add_linear(struct device_node *of_no
>  					 const struct irq_domain_ops *ops,
>  					 void *host_data)
>  {
> -	return __irq_domain_add(of_node, size, 0, ops, host_data);
> +	return __irq_domain_add(of_node, size, size, 0, ops, host_data);
>  }
>  static inline struct irq_domain *irq_domain_add_nomap(struct device_node *of_node,
>  					 unsigned int max_irq,
>  					 const struct irq_domain_ops *ops,
>  					 void *host_data)
>  {
> -	return __irq_domain_add(of_node, 0, max_irq, ops, host_data);
> +	return __irq_domain_add(of_node, 0, max_irq, max_irq, ops, host_data);
>  }
>  static inline struct irq_domain *irq_domain_add_legacy_isa(
>  				struct device_node *of_node,
> @@ -166,14 +167,11 @@ static inline struct irq_domain *irq_domain_add_tree(struct device_node *of_node
>
>  extern void irq_domain_remove(struct irq_domain *host);
>
> -extern int irq_domain_associate_many(struct irq_domain *domain,
> -				     unsigned int irq_base,
> -				     irq_hw_number_t hwirq_base, int count);
> -static inline int irq_domain_associate(struct irq_domain *domain, unsigned int irq,
> -					irq_hw_number_t hwirq)
> -{
> -	return irq_domain_associate_many(domain, irq, hwirq, 1);
> -}
> +extern int irq_domain_associate(struct irq_domain *domain, unsigned int irq,
> +				irq_hw_number_t hwirq);
> +extern void irq_domain_associate_many(struct irq_domain *domain,
> +				      unsigned int irq_base,
> +				      irq_hw_number_t hwirq_base, int count);
>
>  extern unsigned int irq_create_mapping(struct irq_domain *host,
>  				       irq_hw_number_t hwirq);
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 280b804..80e9249 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -35,8 +35,8 @@ static struct irq_domain *irq_default_domain;
>   * register allocated irq_domain with irq_domain_register(). Returns pointer
>   * to IRQ domain, or NULL on failure.
>   */
> -struct irq_domain *__irq_domain_add(struct device_node *of_node,
> -				    int size, int direct_max,
> +struct irq_domain *__irq_domain_add(struct
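The split described in the commit message (a single-association function that can fail, wrapped by a best-effort loop that returns nothing) can be sketched in user-space C roughly as follows. This is a toy model under stated assumptions, not the kernel implementation: struct toy_domain, toy_associate() and toy_associate_many() are hypothetical names, the fixed-size arrays and the -1 error value are inventions for the example, and the irq_domain_mutex locking mentioned in the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HWIRQ 8u

/* Toy stand-in for an irq domain; NOT the kernel's struct irq_domain. */
struct toy_domain {
	unsigned int hwirq_max;               /* valid hwirqs are below this */
	bool reserved[TOY_MAX_HWIRQ];         /* hwirqs the controller refuses */
	unsigned int virq_of[TOY_MAX_HWIRQ];  /* 0 means "not mapped" */
};

/* Associate one virq with one hwirq; returns 0 on success, -1 on failure. */
static int toy_associate(struct toy_domain *d, unsigned int virq,
			 unsigned int hwirq)
{
	assert(d != NULL);

	/* New-style range check: reject hwirqs the controller cannot handle. */
	if (hwirq >= d->hwirq_max || hwirq >= TOY_MAX_HWIRQ)
		return -1;
	if (d->reserved[hwirq])       /* controller does not support it */
		return -1;
	if (d->virq_of[hwirq] != 0)   /* already mapped */
		return -1;

	d->virq_of[hwirq] = virq;
	return 0;
}

/*
 * Best-effort wrapper: try every association and ignore individual
 * failures instead of unwinding the ones that already succeeded.
 */
static void toy_associate_many(struct toy_domain *d, unsigned int virq_base,
			       unsigned int hwirq_base, unsigned int count)
{
	for (unsigned int i = 0; i < count; i++)
		(void)toy_associate(d, virq_base + i, hwirq_base + i);
}
```

A caller that needs to know which associations succeeded inspects the mapping afterwards (here, virq_of[]), which mirrors the advice in the commit message to "go back and check" rather than rely on a return value.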
Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
On 2013/5/22 8:15, Benjamin Herrenschmidt wrote:
> On Tue, 2013-05-21 at 16:45 +0200, Alexander Gordeev wrote:
>> On Tue, Jan 15, 2013 at 03:38:53PM +0800, Mike Qiu wrote:
>>> The test results are shown by 'cat /proc/interrupts':
>>>        CPU0    CPU1    CPU2    CPU3
>>> 16:  240458  261601  226310  200425  XICS Level IPI
>>> 17:       0       0       0       0  XICS Level RAS_EPOW
>>> 18:      10       0       3       2  XICS Level hvc_console
>>> 19:  122182   28481   28527   28864  XICS Level ibmvscsi
>>> 20:5067388226108118 XICS Level eth0
>>> 21:       6       5       5       5  XICS Level host1-0
>>> 22:817814816813 XICS Level host1-1
>>
>> Hi Mike,
>>
>> I am curious if pSeries firmware allows changing affinity masks
>> independently for multiple MSIs? I.e. in your example, would it be
>> possible to assign IRQ21 and IRQ22 to different CPUs?
>
> Yes. Each interrupt has its own affinity, whether it's an MSI or not,
> the affinity is not driven by the address.
>
> Cheers,
> Ben.

Hi Ben,

May this patch be accepted? If so, I will send out the 3.9 version.

As Michael Ellerman says, he wants to see the performance data, but this
depends on the driver. It is something like MSI, and the driver can use
more than one MSI. That is to say, the driver has more interrupt resources
to use, but whether the driver makes full use of them is outside this
patch's control.

I tested this patch with the ipr driver, to which others have added
multiple-MSI support, and it works.

Thanks
Mike

>> Thanks!
>>
>>> LOC: 398077 316725 231882 203049 Local timer interrupts
>>> SPU: 1659919961903 Spurious interrupts
>>> CNT: 0 0 0 0 Performance monitoring interrupts
>>> MCE: 0 0 0 0 Machine check exceptions
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/27 17:28, Chen Gang F T wrote:
> On 2013-04-26 11:54, Mike Qiu wrote:
>> On 2013/4/26 11:42, Chen Gang wrote:
>>> On 2013-04-26 11:25, Chen Gang wrote:
>>>> On 2013-04-26 11:08, Mike Qiu wrote:
>>>>> On 2013/4/26 10:06, Chen Gang wrote:
>>>>>> On 2013-04-26 10:03, Mike Qiu wrote:
>>>>>>> On 2013/4/26 9:36, Chen Gang wrote:
>>>>>>>> On 2013-04-26 09:18, Chen Gang wrote:
>>>>>>>>> On 2013-04-26 09:06, Chen Gang wrote:
>>>>>>>>>>> CFAR is the Come From Register. It saves the location of the last
>>>>>>>>>>> branch and is hence overwritten by any branch.
>>>>>>>>>>
>>>>>>>>>> Do we process it just like others done (e.g. 0x300, 0xe00, 0xe20 ...) ?
>>>>>>>>>> 	. = 0x900
>>>>>>>>>> 	.globl decrementer_pSeries
>>>>>>>>>> decrementer_pSeries:
>>>>>>>>>> 	HMT_MEDIUM_PPR_DISCARD
>>>>>>>>>> 	SET_SCRATCH0(r13)
>>>>>>>>>> 	b decrementer_pSeries_0
>>>>>>>>>>
>>>>>>>>>> 	...
>>>>>>>>
>>>>>>>> Oh, it seems EXCEPTION_PROLOG_1 will save the registers related to
>>>>>>>> CFAR, so I think we need to move EXCEPTION_PROLOG_1 near 0x900.
>>>>>>>
>>>>>>> I will try your diff V2, to see if the machine can boot up
>>>>>>
>>>>>> OK, thanks. (hope it can work)
>>>>>
>>>>> It seems that the machine can boot up in powernv mode, but I'm not
>>>>> sure if my machine calls that module.
>>>>>
>>>>> At least my machine can boot up
>>>>
>>>> Please reference commit number: 1707dd161349e6c54170c88d94fed012e3d224e3
>>>> (1707dd1 powerpc: Save CFAR before branching in interrupt entry paths)
>>>>
>>>> What our diff v2 has done is just the fix for our patch v2 (just like
>>>> the commit 1707dd1 has done).
>>>>
>>>> Please check, thanks.
>>>>
>>>> :-)
>>
>> I will check this evening or tomorrow, I have something else to do this
>> afternoon.
>
> I think the diff v2 is correct, but it is not the best one for this issue.
>
> I prefer Paul's patch for this issue, which has better performance.
>
> :-)

Yes, I used your patch and it works; Paul's patch works too.

Thanks.
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/26 11:42, Chen Gang wrote:
> On 2013-04-26 11:25, Chen Gang wrote:
>> On 2013-04-26 11:08, Mike Qiu wrote:
>>> On 2013/4/26 10:06, Chen Gang wrote:
>>>> On 2013-04-26 10:03, Mike Qiu wrote:
>>>>> On 2013/4/26 9:36, Chen Gang wrote:
>>>>>> On 2013-04-26 09:18, Chen Gang wrote:
>>>>>>> On 2013-04-26 09:06, Chen Gang wrote:
>>>>>>>>> CFAR is the Come From Register. It saves the location of the last
>>>>>>>>> branch and is hence overwritten by any branch.
>>>>>>>>
>>>>>>>> Do we process it just like others done (e.g. 0x300, 0xe00, 0xe20 ...) ?
>>>>>>>> 	. = 0x900
>>>>>>>> 	.globl decrementer_pSeries
>>>>>>>> decrementer_pSeries:
>>>>>>>> 	HMT_MEDIUM_PPR_DISCARD
>>>>>>>> 	SET_SCRATCH0(r13)
>>>>>>>> 	b decrementer_pSeries_0
>>>>>>>>
>>>>>>>> 	...
>>>>>>
>>>>>> Oh, it seems EXCEPTION_PROLOG_1 will save the registers related to
>>>>>> CFAR, so I think we need to move EXCEPTION_PROLOG_1 near 0x900.
>>>>>
>>>>> I will try your diff V2, to see if the machine can boot up
>>>>
>>>> OK, thanks. (hope it can work)
>>>
>>> It seems that the machine can boot up in powernv mode, but I'm not
>>> sure if my machine calls that module.
>>>
>>> At least my machine can boot up
>>
>> Please reference commit number: 1707dd161349e6c54170c88d94fed012e3d224e3
>> (1707dd1 powerpc: Save CFAR before branching in interrupt entry paths)
>>
>> What our diff v2 has done is just the fix for our patch v2 (just like
>> the commit 1707dd1 has done).
>>
>> Please check, thanks.
>>
>> :-)

I will check this evening or tomorrow, I have something else to do this
afternoon.

> Thank you for your information!
>
> I have checked the disassembly with powerpc64-linux-gnu-objdump; it seems
> all we have done for 0x900 is almost the same as what was originally done
> for 0x200.
>
> I am just learning about the CFAR (googling it), and I plan to wait for a
> day; if all things go smoothly, I will send patch v3.
>
> :-)
Re: "attempt to move .org backwards" still show up
On 2013/4/25 14:25, Paul Mackerras wrote:
> On Thu, Apr 25, 2013 at 12:05:54PM +0800, Mike Qiu wrote:
>> This has blocked my work now, so I hope you can take a look ASAP.
>>
>> Thanks :)
>>
>> Mike
>
> As a quick fix, turn on CONFIG_KVM_BOOK3S_64_HV. That will eliminate
> the immediate problem.

Thanks, got it. I will give it a try.

> Paul.
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/26 10:06, Chen Gang wrote:
> On 2013-04-26 10:03, Mike Qiu wrote:
>> On 2013/4/26 9:36, Chen Gang wrote:
>>> On 2013-04-26 09:18, Chen Gang wrote:
>>>> On 2013-04-26 09:06, Chen Gang wrote:
>>>>>> CFAR is the Come From Register. It saves the location of the last
>>>>>> branch and is hence overwritten by any branch.
>>>>>
>>>>> Do we process it just like others done (e.g. 0x300, 0xe00, 0xe20 ...) ?
>>>>> 	. = 0x900
>>>>> 	.globl decrementer_pSeries
>>>>> decrementer_pSeries:
>>>>> 	HMT_MEDIUM_PPR_DISCARD
>>>>> 	SET_SCRATCH0(r13)
>>>>> 	b decrementer_pSeries_0
>>>>>
>>>>> 	...
>>>
>>> Oh, it seems EXCEPTION_PROLOG_1 will save the registers related to
>>> CFAR, so I think we need to move EXCEPTION_PROLOG_1 near 0x900.
>>
>> I will try your diff V2, to see if the machine can boot up
>
> OK, thanks. (hope it can work)

It seems that the machine can boot up in powernv mode, but I'm not sure
if my machine calls that module.

At least my machine can boot up

Thanks
Mike

> :-)
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/26 9:36, Chen Gang wrote:
> On 2013-04-26 09:18, Chen Gang wrote:
>> On 2013-04-26 09:06, Chen Gang wrote:
>>>> CFAR is the Come From Register. It saves the location of the last
>>>> branch and is hence overwritten by any branch.
>>>
>>> Do we process it just like others done (e.g. 0x300, 0xe00, 0xe20 ...) ?
>>> 	. = 0x900
>>> 	.globl decrementer_pSeries
>>> decrementer_pSeries:
>>> 	HMT_MEDIUM_PPR_DISCARD
>>> 	SET_SCRATCH0(r13)
>>> 	b decrementer_pSeries_0
>>>
>>> 	...
>
> Oh, it seems EXCEPTION_PROLOG_1 will save the registers related to
> CFAR, so I think we need to move EXCEPTION_PROLOG_1 near 0x900.

I will try your diff V2, to see if the machine can boot up

> -diff v2 begin-
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index e789ee7..f0489c4 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -254,7 +254,15 @@ hardware_interrupt_hv:
>  	STD_EXCEPTION_PSERIES(0x800, 0x800, fp_unavailable)
>  	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x800)
>
> -	MASKABLE_EXCEPTION_PSERIES(0x900, 0x900, decrementer)
> +	. = 0x900
> +	.globl decrementer_pSeries
> +decrementer_pSeries:
> +	HMT_MEDIUM_PPR_DISCARD
> +	SET_SCRATCH0(r13)		/* save r13 */
> +	EXCEPTION_PROLOG_0(PACA_EXGEN)
> +	EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, 0x900)
> +	b decrementer_pSeries_0
> +
>  	STD_EXCEPTION_HV(0x980, 0x982, hdecrementer)
>
>  	MASKABLE_EXCEPTION_PSERIES(0xa00, 0xa00, doorbell_super)
> @@ -536,6 +544,11 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_206)
>  #endif
>
>  	.align	7
> +	/* moved from 0x900 */
> +decrementer_pSeries_0:
> +	EXCEPTION_PROLOG_PSERIES_1(decrementer_common, EXC_STD)
> +
> +	.align	7
>  	/* moved from 0xe00 */
>  	STD_EXCEPTION_HV_OOL(0xe02, h_data_storage)
>  	KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0xe02)
>
> -diff v2 end---
>
>> Such as the fix below, is it OK (just like 0x300 or 0x200 has done) ?
>>
>> Please check, thanks.
>>
>> ---diff begin-
>>
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index e789ee7..a0a5ff2 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -254,7 +254,14 @@ hardware_interrupt_hv:
>>  	STD_EXCEPTION_PSERIES(0x800, 0x800, fp_unavailable)
>>  	KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x800)
>>
>> -	MASKABLE_EXCEPTION_PSERIES(0x900, 0x900, decrementer)
>> +	. = 0x900
>> +	.globl decrementer_pSeries
>> +decrementer_pSeries:
>> +	HMT_MEDIUM_PPR_DISCARD
>> +	SET_SCRATCH0(r13)		/* save r13 */
>> +	EXCEPTION_PROLOG_0(PACA_EXGEN)
>> +	b decrementer_pSeries_0
>> +
>>  	STD_EXCEPTION_HV(0x980, 0x982, hdecrementer)
>>
>>  	MASKABLE_EXCEPTION_PSERIES(0xa00, 0xa00, doorbell_super)
>> @@ -536,6 +543,12 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_206)
>>  #endif
>>
>>  	.align	7
>> +	/* moved from 0x900 */
>> +decrementer_pSeries_0:
>> +	EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, 0x900)
>> +	EXCEPTION_PROLOG_PSERIES_1(decrementer_common, EXC_STD)
>> +
>> +	.align	7
>>  	/* moved from 0xe00 */
>>  	STD_EXCEPTION_HV_OOL(0xe02, h_data_storage)
>>  	KVM_HANDLER_SKIP(PACA_EXGEN, EXC_HV, 0xe02)
>>
>> ---diff end---
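As background for why both diffs above move part of the 0x900 handler out of line, here is a small arithmetic sketch in C. It assumes fixed 4-byte instructions and uses only the vector addresses that appear in the diffs (0x900, 0x980, 0xa00); slot_capacity_insns() and handler_fits() are hypothetical helpers, and no claim is made about how many instructions the real macros expand to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: every instruction occupies 4 bytes. */
#define INSN_BYTES 4u

/* Number of instructions that fit between a vector and the next one. */
static uint32_t slot_capacity_insns(uint32_t start, uint32_t next)
{
	assert(next >= start);
	return (next - start) / INSN_BYTES;
}

/*
 * True if a handler of n_insns instructions placed at 'start' ends at or
 * before 'next'. If it does not, a following ". = next" directive would
 * have to move the location counter backwards, which the assembler rejects.
 */
static bool handler_fits(uint32_t start, uint32_t next, uint32_t n_insns)
{
	return n_insns <= slot_capacity_insns(start, next);
}
```

Under that assumption the slot between 0x900 and 0x980 holds 32 instructions, so anything longer has to branch to an out-of-line tail, which is the shape both diffs take with decrementer_pSeries_0.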
Re: "attempt to move .org backwards" still show up
On 2013/4/25 19:16, Chen Gang wrote:
> On 2013-04-25 14:25, Paul Mackerras wrote:
>> On Thu, Apr 25, 2013 at 12:05:54PM +0800, Mike Qiu wrote:
>>> This has blocked my work now,
>>> so I hope you can take a look ASAP.
>>>
>>> Thanks :)
>>> Mike
>> As a quick fix, turn on CONFIG_KVM_BOOK3S_64_HV. That will eliminate
>> the immediate problem.
>
> Yes, that is what I suggested in my original reply to Mike as a way to
> bypass it, but I got no reply. I guess he has to use
> CONFIG_KVM_BOOK3S_64_PR.
>
> I am fixing it now; when I finish a patch, please help check it.

Actually, the build passes with your patch, but when I saw Michael Neuling's reply I stopped testing it and waited for your new patch :)

Now I will use your v2 patch to build.

Thanks
Mike

> Thanks.
Re: [PATCH] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/25 16:21, Chen Gang wrote:
> Hello Mike:
>
> Please try this patch. At least it compiles with the config file you
> provided in my cross-compiling environment.
>
> I have not run-tested it yet, so it is better to try running the new
> kernel with this patch.

OK, I will use your patch, and I will send out the result later.

Thanks
Mike

> Thanks.
>
> On 2013-04-25 16:18, Chen Gang wrote:
>> When CONFIG_KVM_BOOK3S_64_PR is enabled,
>> MASKABLE_EXCEPTION_PSERIES(0x900 ...) includes __KVMTEST, so it
>> runs past 0x980, which STD_EXCEPTION_HV(0x980 ...) uses, and the
>> build fails.
>>
>> The related errors:
>>   arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
>>   arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
>>   make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
>>
>> Signed-off-by: Chen Gang
>> ---
>>  arch/powerpc/include/asm/kvm_asm.h   |    2 +-
>>  arch/powerpc/kernel/exceptions-64s.S |    6 +++---
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
>> index b9dd382..2c65bae 100644
>> --- a/arch/powerpc/include/asm/kvm_asm.h
>> +++ b/arch/powerpc/include/asm/kvm_asm.h
>> @@ -86,7 +86,7 @@
>>  #define BOOK3S_INTERRUPT_PROGRAM        0x700
>>  #define BOOK3S_INTERRUPT_FP_UNAVAIL     0x800
>>  #define BOOK3S_INTERRUPT_DECREMENTER    0x900
>> -#define BOOK3S_INTERRUPT_HV_DECREMENTER 0x980
>> +#define BOOK3S_INTERRUPT_HV_DECREMENTER 0x988
>>  #define BOOK3S_INTERRUPT_SYSCALL        0xc00
>>  #define BOOK3S_INTERRUPT_TRACE          0xd00
>>  #define BOOK3S_INTERRUPT_H_DATA_STORAGE 0xe00
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index e789ee7..bb0e677 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -255,7 +255,7 @@ hardware_interrupt_hv:
>>      KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0x800)
>>
>>      MASKABLE_EXCEPTION_PSERIES(0x900, 0x900, decrementer)
>> -    STD_EXCEPTION_HV(0x980, 0x982, hdecrementer)
>> +    STD_EXCEPTION_HV(0x988, 0x982, hdecrementer)
>>
>>      MASKABLE_EXCEPTION_PSERIES(0xa00, 0xa00, doorbell_super)
>>      KVM_HANDLER_PR(PACA_EXGEN, EXC_STD, 0xa00)
>> @@ -698,7 +698,7 @@ machine_check_common:
>>      STD_EXCEPTION_COMMON_ASYNC(0x500, hardware_interrupt, do_IRQ)
>>      STD_EXCEPTION_COMMON_ASYNC(0x900, decrementer, .timer_interrupt)
>> -    STD_EXCEPTION_COMMON(0x980, hdecrementer, .hdec_interrupt)
>> +    STD_EXCEPTION_COMMON(0x988, hdecrementer, .hdec_interrupt)
>>  #ifdef CONFIG_PPC_DOORBELL
>>      STD_EXCEPTION_COMMON_ASYNC(0xa00, doorbell_super, .doorbell_exception)
>>  #else
>> @@ -802,7 +802,7 @@ hardware_interrupt_relon_hv:
>>      STD_RELON_EXCEPTION_PSERIES(0x4700, 0x700, program_check)
>>      STD_RELON_EXCEPTION_PSERIES(0x4800, 0x800, fp_unavailable)
>>      MASKABLE_RELON_EXCEPTION_PSERIES(0x4900, 0x900, decrementer)
>> -    STD_RELON_EXCEPTION_HV(0x4980, 0x982, hdecrementer)
>> +    STD_RELON_EXCEPTION_HV(0x4988, 0x982, hdecrementer)
>>      MASKABLE_RELON_EXCEPTION_PSERIES(0x4a00, 0xa00, doorbell_super)
>>      STD_RELON_EXCEPTION_PSERIES(0x4b00, 0xb00, trap_0b)
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/26 10:06, Chen Gang wrote:
> On 2013-04-26 10:03, Mike Qiu wrote:
>> On 2013/4/26 9:36, Chen Gang wrote:
>>> On 2013-04-26 09:18, Chen Gang wrote:
>>>> On 2013-04-26 09:06, Chen Gang wrote:
>>>>>> CFAR is the Come From Register. It saves the location of the last
>>>>>> branch and is hence overwritten by any branch.
>>>>>>
>>>>> Do we process it just like the others do (e.g. 0x300, 0xe00, 0xe20 ...)?
>>>>>     . = 0x900
>>>>>     .globl decrementer_pSeries
>>>>> decrementer_pSeries:
>>>>>     HMT_MEDIUM_PPR_DISCARD
>>>>>     SET_SCRATCH0(r13)
>>>>>     b decrementer_pSeries_0
>>>>>
>>>>>     ...
>>>>>
>>> Oh, it seems EXCEPTION_PROLOG_1 saves the registers related to
>>> CFAR, so I think we need to move EXCEPTION_PROLOG_1 to near 0x900.
>> I will try your diff v2 to see if the machine can boot up.
>
> OK, thanks. (hope it can work)

It seems that the machine can boot up in powernv mode, but I'm not sure whether my machine calls that module.

At least my machine can boot up.

Thanks
Mike

> :-)
Re: attempt to move .org backwards still show up
On 2013/4/25 14:25, Paul Mackerras wrote:
> On Thu, Apr 25, 2013 at 12:05:54PM +0800, Mike Qiu wrote:
>> This has blocked my work now,
>> so I hope you can take a look ASAP.
>>
>> Thanks :)
>> Mike
> As a quick fix, turn on CONFIG_KVM_BOOK3S_64_HV. That will eliminate
> the immediate problem.

Thanks, got it. I will give it a try.

> Paul.
Re: [PATCH v2] PowerPC: kernel: compiling issue, make additional room in exception vector area
On 2013/4/26 11:42, Chen Gang wrote:
> On 2013-04-26 11:25, Chen Gang wrote:
>> On 2013-04-26 11:08, Mike Qiu wrote:
>>> On 2013/4/26 10:06, Chen Gang wrote:
>>>> On 2013-04-26 10:03, Mike Qiu wrote:
>>>>> On 2013/4/26 9:36, Chen Gang wrote:
>>>>>> On 2013-04-26 09:18, Chen Gang wrote:
>>>>>>> On 2013-04-26 09:06, Chen Gang wrote:
>>>>>>>>> CFAR is the Come From Register. It saves the location of the last
>>>>>>>>> branch and is hence overwritten by any branch.
>>>>>>>>>
>>>>>>>> Do we process it just like the others do (e.g. 0x300, 0xe00, 0xe20 ...)?
>>>>>>>>     . = 0x900
>>>>>>>>     .globl decrementer_pSeries
>>>>>>>> decrementer_pSeries:
>>>>>>>>     HMT_MEDIUM_PPR_DISCARD
>>>>>>>>     SET_SCRATCH0(r13)
>>>>>>>>     b decrementer_pSeries_0
>>>>>>>>
>>>>>>>>     ...
>>>>>>>>
>>>>>> Oh, it seems EXCEPTION_PROLOG_1 saves the registers related to
>>>>>> CFAR, so I think we need to move EXCEPTION_PROLOG_1 to near 0x900.
>>>>> I will try your diff v2 to see if the machine can boot up.
>>>> OK, thanks. (hope it can work)
>>> It seems that the machine can boot up in powernv mode, but I'm not
>>> sure whether my machine calls that module.
>>>
>>> At least my machine can boot up.
>> Please refer to commit 1707dd161349e6c54170c88d94fed012e3d224e3
>> (1707dd1 powerpc: Save CFAR before branching in interrupt entry paths).
>>
>> What our diff v2 does is just the fix for our patch v2 (just as
>> commit 1707dd1 did).
>>
>> Please check, thanks.
>>
>> :-)

I will check this evening or tomorrow; I have something else to do this afternoon.

Thank you for your information!

> I have checked the disassembly with powerpc64-linux-gnu-objdump; it seems
> that what we have done for 0x900 is almost the same as what was originally
> done for 0x200.
>
> I am just learning about the CFAR (googling it), and I plan to wait for
> a day. If all goes smoothly, I will send patch v3.
>
> :-)
Re: "attempt to move .org backwards" still show up
On 2013/4/25 9:05, Chen Gang wrote:
> On 2013-04-24 20:47, Mike wrote:
>> On Wed, 2013-04-24 at 20:37 +1000, Michael Neuling wrote:
>>> Mike Qiu wrote:
>>>> On 2013/4/24 16:31, Michael Ellerman wrote:
>>>>> On Wed, Apr 24, 2013 at 04:22:53PM +0800, Mike Qiu wrote:
>>>>>> Hi all
>>>>>>
>>>>>> I get an error message when I compile the source code on a Power7
>>>>>> platform using the newest upstream kernel.
>>>>> Hi Mike,
>>>>>
>>>>> It depends on what your .config is. What defconfig are you building?
>>>> I just copied the config file from /boot/config.* to .config and
>>>> ran make menuconfig, changed nothing manually, then saved.
>>> Can you post the resulting config here?
>>>
>>> Do you have this commit in your tree?
>>>
>>> commit 087aa036eb79f24b856893190359ba812b460f45
>>> Author: Chen Gang
>>>     powerpc: make additional room in exception vector area
>> Sure, that commit is certainly in my git tree.
>>
>> I also tried removing the code and re-cloning the source from
>> upstream; the problem still happens.
>>
>> I will post the config file as an attachment :)
>>
>> Thanks
> I will try, and plan to get a result within this week (2013-04-28).
>
> Thanks.

Hi

This has blocked my work now,
so I hope you can take a look ASAP.

Thanks :)
Mike
Re: "attempt to move .org backwards" still show up
On 2013/4/24 16:31, Michael Ellerman wrote:
> On Wed, Apr 24, 2013 at 04:22:53PM +0800, Mike Qiu wrote:
>> Hi all
>>
>> I get an error message when I compile the source code on a Power7
>> platform using the newest upstream kernel.
> Hi Mike,
>
> It depends on what your .config is. What defconfig are you building?
>
> cheers

And I do know how to build the source code on this machine . . .

Thanks
Re: "attempt to move .org backwards" still show up
On 2013/4/24 16:31, Michael Ellerman wrote:
> On Wed, Apr 24, 2013 at 04:22:53PM +0800, Mike Qiu wrote:
>> Hi all
>>
>> I get an error message when I compile the source code on a Power7
>> platform using the newest upstream kernel.
> Hi Mike,
>
> It depends on what your .config is. What defconfig are you building?

I just copied the config file from /boot/config.* to .config and ran make menuconfig, changed nothing manually, then saved.

> cheers
"attempt to move .org backwards" still show up
Hi all

I get an error message when I compile the source code on a Power7 platform using the newest upstream kernel.

[root@feng linux]# make -j60
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CC      scripts/mod/devicetable-offsets.s
  GEN     scripts/mod/devicetable-offsets.h
  HOSTCC  scripts/mod/file2alias.o
  CALL    scripts/checksyscalls.sh
  HOSTLD  scripts/mod/modpost
  CHK     include/generated/compile.h
  CALL    arch/powerpc/kernel/systbl_chk.sh
  CALL    arch/powerpc/kernel/prom_init_check.sh
  AS      arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
make[1]: *** [arch/powerpc/kernel/head_64.o] Error 1
make: *** [arch/powerpc/kernel] Error 2
make: *** Waiting for unfinished jobs

I see this should have been fixed by commit 087aa036eb79f24b856893190359ba812b460f45, but it still fails on my P7 machine.

The kernel source code info:

git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[root@feng linux]# git log
commit 824282ca7d250bd7c301f221c3cd902ce906d731
Merge: f83b293 3b5e50e
Author: Linus Torvalds
Date:   Mon Apr 22 15:00:59 2013 -0700

    Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

    Pull MIPS fix from Ralf Baechle:
     "Revert the change of the definition of PAGE_MASK which was prettier
      but broke a few relativly rare platforms"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
      Revert "MIPS: page.h: Provide more readable definition for PAGE_MASK."

commit 3b5e50edaf500f392f4a372296afc0b99ffa7e70
Author: Ralf Baechle
Date:   Mon Apr 22 17:57:54 2013 +0200

[root@feng linux]# git branch
* master
[root@feng linux]# git diff
[root@feng linux]#

That means I have not changed anything in the kernel source.

Thanks
Mike
[PATCH] PowerNV/PCI: Fix NULL PCI controller
In pnv_pci_read_config() and pnv_pci_write_config(), we never check whether the PCI controller is valid before converting it into the platform-dependent one, which is very dangerous: hose->private_data is read before hose is compared against NULL. To avoid this potential risk, the patch checks the PCI controller before using it.

Signed-off-by: Mike Qiu
---
 arch/powerpc/platforms/powernv/pci.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index b8b8e0b..e7b7f1a 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -286,11 +286,11 @@ static int pnv_pci_read_config(struct pci_bus *bus,
                                int where, int size, u32 *val)
 {
        struct pci_controller *hose = pci_bus_to_host(bus);
-       struct pnv_phb *phb = hose->private_data;
+       struct pnv_phb *phb = hose ? hose->private_data : NULL;
        u32 bdfn = (((uint64_t)bus->number) << 8) | devfn;
        s64 rc;

-       if (hose == NULL)
+       if (!phb)
                return PCIBIOS_DEVICE_NOT_FOUND;

        switch (size) {
@@ -330,10 +330,10 @@ static int pnv_pci_write_config(struct pci_bus *bus,
                                 int where, int size, u32 val)
 {
        struct pci_controller *hose = pci_bus_to_host(bus);
-       struct pnv_phb *phb = hose->private_data;
+       struct pnv_phb *phb = hose ? hose->private_data : NULL;
        u32 bdfn = (((uint64_t)bus->number) << 8) | devfn;

-       if (hose == NULL)
+       if (!phb)
                return PCIBIOS_DEVICE_NOT_FOUND;

        cfg_dbg("pnv_pci_write_config bus: %x devfn: %x +%x/%x -> %08x\n",
--
1.7.10.1
Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
On 2013/3/6 13:42, Michael Ellerman wrote:
> On Wed, Mar 06, 2013 at 01:34:58PM +0800, Mike Qiu wrote:
>> On 2013/3/6 11:54, Michael Ellerman wrote:
>>> On Tue, Mar 05, 2013 at 03:19:57PM +0800, Mike Qiu wrote:
>>>> On 2013/3/5 10:23, Michael Ellerman wrote:
>>>>> On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:
>>>>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>>>>> index 96f3a1d..38648e6 100644
>>>>>> --- a/kernel/irq/irqdomain.c
>>>>>> +++ b/kernel/irq/irqdomain.c
>>>>>> @@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base,
>>>>>>  }
>>>>>>  EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
>>>>>>
>>>>>> +/**
>>>>>> + * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
>>>>>> + * @domain: domain owning the interrupt range
>>>>>> + * @hwirq_base: beginning of continuous hardware IRQ range
>>>>>> + * @count: Number of interrupts to map
>>>>> For multiple-MSI the allocated interrupt numbers must be a power-of-2,
>>>>> and must be naturally aligned. I don't /think/ that's a requirement for
>>>>> the virtual numbers, but it's probably best that we do it anyway.
>>>>>
>>>>> So this API needs to specify that it will give you back a power-of-2
>>>>> block that is naturally aligned - otherwise you can't use it for MSI.
>>>> rtas_call will return the number of hardware interrupts, and it should
>>>> be a power-of-2, so I think this does not need to be specified.
>>> You're confusing hardware interrupt numbers and virtual interrupt
>>> numbers. My comment is about irq_create_mapping_many(), which returns
>>> virtual interrupt numbers.
>>>
>>> As I said I don't think there is a requirement that the virtual
>>> interrupt numbers are also a power-of-2 naturally aligned block, but we
>>> should allocate them as one anyway, to avoid any issues in future.
>> But the virtual interrupt numbers should be a power-of-2 naturally
>> aligned block, because they must be continuous, as MSI-HOWTO.txt says:
>>
>>   4.2.2 pci_enable_msi_block
>>
>>   int pci_enable_msi_block(struct pci_dev *dev, int count)
>>
>>   This variation on the above call allows a device driver to request
>>   multiple MSIs. The MSI specification only allows interrupts to be
>>   allocated in powers of two, up to a maximum of 2^5 (32).
>>
>>   If this function returns 0, it has succeeded in allocating at least
>>   as many interrupts as the driver requested (it may have allocated
>>   more in order to satisfy the power-of-two requirement). In this case,
>>   the function enables MSI on this device and updates dev->irq to be
>>   the lowest of the new interrupts assigned to it. The other interrupts
>>   assigned to the device are in the range dev->irq to
>>   dev->irq + count - 1.
>>
>> See the last line: it means the virtual interrupts must be a
>> continuous block.
> In practice I think things could work if we didn't, because we are not
> using the mask routines that assume that layout. But you're right, we
> must implement the API as it's specified, so the virtual interrupt
> numbers must be a naturally aligned power-of-2.

Yes, your point is also right: because the API requires a naturally aligned power-of-2 block of interrupt numbers, we need to implement it that way.

>>> cheers
> cheers
Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
On 2013/3/5 10:41, Paul Mundt wrote:
> On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:
>> Adding a function irq_create_mapping_many() which can associate
>> multiple MSIs to a continuous irq mapping.
>>
>> This is needed to enable multiple MSI support for pSeries.
>>
>> +int irq_create_mapping_many(struct irq_domain *domain,
>> +		irq_hw_number_t hwirq_base, int count)
>> +{
>
> Other than the other review comments already made, I think you can
> simplify this considerably by simply doing what
> irq_create_strict_mappings() does, and relaxing the irq_base
> requirements.
>
> In any event, as you are creating a new interface, I don't think you
> want to carry around half of the legacy crap that irq_create_mapping()
> has to deal with. We made the decision to avoid this with
> irq_create_strict_mappings() intentionally, too.

Oh, yes, you are right. I will send out V2 of my patch to make it cleaner, and I hope you can review it again.

Thanks
Mike
Re: [PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
On 2013/3/5 10:23, Michael Ellerman wrote:
> On Tue, Jan 15, 2013 at 03:38:55PM +0800, Mike Qiu wrote:
>> Adding a function irq_create_mapping_many() which can associate
>> multiple MSIs to a continuous irq mapping.
>>
>> This is needed to enable multiple MSI support for pSeries.
>>
>> Signed-off-by: Mike Qiu
>> ---
>>  include/linux/irq.h       |  2 +
>>  include/linux/irqdomain.h |  3 ++
>>  kernel/irq/irqdomain.c    | 61 +
>>  3 files changed, 66 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/irq.h b/include/linux/irq.h
>> index 60ef45b..e00a7ec 100644
>> --- a/include/linux/irq.h
>> +++ b/include/linux/irq.h
>> @@ -592,6 +592,8 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>>  #define irq_alloc_desc_from(from, node) \
>>  	irq_alloc_descs(-1, from, 1, node)
>> +#define irq_alloc_desc_n(nevc, node) \
>> +	irq_alloc_descs(-1, 0, nevc, node)
>
> This has been superseded by irq_alloc_descs_from(), which is the right
> way to do it.

Yes, but irq_alloc_descs_from() is just for 1 irq, and if I change the API, a lot of places which call this function may be affected.

>> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
>> index 0d5b17b..831dded 100644
>> --- a/include/linux/irqdomain.h
>> +++ b/include/linux/irqdomain.h
>> @@ -168,6 +168,9 @@ extern int irq_create_strict_mappings(struct irq_domain *domain,
>>  					unsigned int irq_base,
>>  					irq_hw_number_t hwirq_base, int count);
>>
>> +extern int irq_create_mapping_many(struct irq_domain *domain,
>> +				irq_hw_number_t hwirq_base, int count);
>> +
>>  static inline int irq_create_identity_mapping(struct irq_domain *host,
>>  					irq_hw_number_t hwirq)
>>  {
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 96f3a1d..38648e6 100644
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base,
>>  }
>>  EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
>>
>> +/**
>> + * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
>> + * @domain: domain owning the interrupt range
>> + * @hwirq_base: beginning of continuous hardware IRQ range
>> + * @count: Number of interrupts to map
>
> For multiple-MSI the allocated interrupt numbers must be a power-of-2,
> and must be naturally aligned. I don't /think/ that's a requirement for
> the virtual numbers, but it's probably best that we do it anyway.
>
> So this API needs to specify that it will give you back a power-of-2
> block that is naturally aligned - otherwise you can't use it for MSI.

rtas_call will return the number of hardware interrupts, and it should be a power-of-2, so I think this does not need to be specified.

>> + * This routine is used for allocating and mapping a range of hardware
>> + * irqs to virtual IRQs where the virtual irq numbers are not at pre-defined
>> + * locations.
>
> This comment doesn't make sense to me.
>
>> + *
>> + * Greater than 0 is returned upon success, while any failure to establish a
>> + * static mapping is treated as an error.
>> + */
>> +int irq_create_mapping_many(struct irq_domain *domain,
>> +		irq_hw_number_t hwirq_base, int count)
>> +{
>> +	int ret, irq_base;
>> +	int virq, i;
>> +
>> +	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq_base);
>
> I'd like to see this whole function rewritten to reduce the duplication
> vs irq_create_mapping(). I don't see any reason why this can't be the
> core routine, and irq_create_mapping() becomes a caller of it, passing a
> count of 1 ?

It's a good suggestion.

>> +	/* Look for default domain if nececssary */
>> +	if (!domain)
>> +		domain = irq_default_domain;
>> +	if (!domain) {
>> +		pr_warn("irq_create_mapping called for NULL domain, hwirq=%lx\n"
>> +			, hwirq_base);
>> +		WARN_ON(1);
>> +		return 0;
>> +	}
>> +	pr_debug("-> using domain @%p\n", domain);
>> +
>> +	/* For IRQ_DOMAIN_MAP_LEGACY, get the first virtual interrupt number */
>> +	if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
>> +		return irq_domain_legacy_revmap(domain, hwirq_base);
>
> The above doesn't work.

Why doesn't it work?

>> +	/* Check if mapping already exists */
>> +	for (i = 0; i < count; i++) {
>> +		virq = irq_find_mapping(domain, hwirq_base+i);
>> +		if (virq) {
>> +			pr_debug("existing mapping on virq %d,"
>> +					" now dispose it first\n", virq);
>> +			irq_dispose_mapping(virq);
>
> You might have just disposed of someone elses mapping, we shouldn't do
> that. It should be an error to the caller.

It's a good question. If the interrupt is used by someone else, why can I apply for it from the system? So it may be that someone else forgot to dispose of the mapping, and it would never be used by others as I
Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
On 2013/3/1 11:54, Michael Ellerman wrote:
> On Fri, Mar 01, 2013 at 11:08:45AM +0800, Mike wrote:
>> Hi all
>>
>> Any comments? or any questions about my patchset?
>
> You were going to get some performance numbers that show a definite
> benefit for using more than one MSI.

Yes, but my patch just enables the kernel to support this feature; whether to use it depends on the device driver.

And this feature was merged into the kernel for x86 a long time ago. See commits:
5ca72c4f7c412c2002363218901eba5516c476b1
51906e779f2b13b38f8153774c4c7163d412ffd9

Actually, I'm trying to do the test, but it is difficult, because it mostly depends on how the device driver uses this feature, and the ipr driver patch was written by another person. There has also been no reply from her.

> cheers
Re: [PATCH 0/3] Enable multiple MSI feature in pSeries
On 2013/2/4 13:56, Michael Ellerman wrote:
> On Mon, 2013-02-04 at 11:49 +0800, Mike Qiu wrote:
>>> On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
>>>> Currently, multiple MSI feature hasn't been enabled in pSeries,
>>>> These patches try to enable this feature.
>>>
>>> Hi Mike,
>>>
>>>> These patches have been tested by using ipr driver, and the driver
>>>> patch has been made by Wen Xiong:
>>>
>>> So who wrote these patches? Normally we would expect the original
>>> author to post the patches if at all possible.
>>
>> Hi Michael
>>
>> These Multiple MSI patches were written by myself. You know this
>> feature has not been enabled, and it needs a device driver to test
>> whether it works properly. So I tested my patches using Wen Xiong's
>> ipr patches, which have been sent out to the mailing list.
>>
>> I'm the original author :)
>
> Ah OK, sorry, that was more or less clear from your mail but I just
> misunderstood.
>
>>>> [PATCH 0/7] Add support for new IBM SAS controllers
>>>
>>> I would like to see the full series, including the driver enablement.
>>
>> Yep, but the driver patches were written by Wen Xiong and have been
>> sent out.
>
> OK, you mean this series?
>
> http://thread.gmane.org/gmane.linux.scsi/79639

Yes, exactly.

>> I just use her patches to test my patches. All devices that support
>> Multiple MSI can use my feature, not only IBM SAS controllers. I also
>> tested my patches using the broadcom wireless card tg3, and it also
>> works OK.
>
> You mean drivers/net/ethernet/broadcom/tg3.c ? I don't see where it
> calls pci_enable_msi_block() ?

Yes, I just modified the driver to support multiple MSI.

> All devices /can/ use it, but the driver needs to be updated. Currently
> we have two drivers that do so (in Linus' tree), plus the updated IPR.

Not all devices: only devices which support multiple MSI in hardware can use it.

>>>> Test platform: One partition of pSeries with one cpu core(4 SMTs) and
>>>> RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
>>>> OS version: SUSE Linux Enterprise Server 11 SP2 (ppc64) with 3.8-rc3 kernel
>>>>
>>>> IRQ 21 and 22 are assigned to the ipr device which support 2 multiple MSI.
>>>>
>>>> The test results are shown by 'cat /proc/interrupts':
>>>>        CPU0   CPU1   CPU2   CPU3
>>>> 21:       6      5      5      5   XICS Level host1-0
>>>> 22:     817    814    816    813   XICS Level host1-1
>>>
>>> This shows that you are correctly configuring two MSIs.
>>>
>>> But the key advantage of using multiple interrupts is to distribute
>>> load across CPUs and improve performance. So I would like to see some
>>> performance numbers that show that there is a real benefit for all the
>>> extra complexity in the code.
>>
>> Yes, the system only supports two MSIs. Anyway, I will try to do some
>> performance tests to show the real benefit. But actually it needs the
>> driver to do so. As the data above show, there seems to be some
>> problem in how the interrupts are used: irq 21 is used little and irq
>> 22 takes most of the load. I will discuss with the driver author to
>> see why, and once she has fixed it I will give out the performance
>> results.
>
> Yeah that would be good.
>
> I really dislike that we have a separate API for multi-MSI vs MSI-X, and
> pci_enable_msi_block() also pushes the contiguous power-of-2 allocation
> into the irq domain layer, which is unpleasant. So if we really must do
> multi-MSI I would like to do it differently.

Yes, but multi-MSI needs hardware support; it is an extension of MSI. A device may support MSI and multiple MSI, but not MSI-X. For these devices, we'd better use multiple MSI, which is more efficient than plain MSI.

Multi-MSI can use no more than 32 interrupts.

Thanks

> cheers
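The limit of 32 interrupts mentioned at the end of this message follows from how MSI stores the vector count: the Multiple Message Capable and Multiple Message Enable fields of the MSI Message Control register hold log2 of the number of vectors, with 5 (32 vectors) as the largest defined value. A minimal user-space sketch of that encoding follows; the function name msi_log2_nvec is an assumption made up for this illustration and is not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: map a vector count to the log2 value stored in the
 * MSI "Multiple Message Enable" field. Returns -1 when the count is not a
 * power of two in the range 1..32, which MSI cannot express.
 */
int msi_log2_nvec(unsigned int nvec)
{
	int order = 0;

	if (nvec == 0 || nvec > 32 || (nvec & (nvec - 1)) != 0)
		return -1;

	while ((1u << order) < nvec)
		order++;
	return order;
}
```

A driver that wants, say, 3 vectors therefore has to ask for 4, which is why pci_enable_msi_block() may allocate more interrupts than requested.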
[PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
Adding a function irq_create_mapping_many() which can associate
multiple MSIs to a continuous irq mapping.

This is needed to enable multiple MSI support for pSeries.

Signed-off-by: Mike Qiu
---
 include/linux/irq.h       |  2 +
 include/linux/irqdomain.h |  3 ++
 kernel/irq/irqdomain.c    | 61 +
 3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 60ef45b..e00a7ec 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -592,6 +592,8 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
 #define irq_alloc_desc_from(from, node)		\
 	irq_alloc_descs(-1, from, 1, node)
 
+#define irq_alloc_desc_n(nevc, node)		\
+	irq_alloc_descs(-1, 0, nevc, node)
 
 void irq_free_descs(unsigned int irq, unsigned int cnt);
 int irq_reserve_irqs(unsigned int from, unsigned int cnt);
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 0d5b17b..831dded 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -168,6 +168,9 @@ extern int irq_create_strict_mappings(struct irq_domain *domain,
 					unsigned int irq_base,
 					irq_hw_number_t hwirq_base, int count);
 
+extern int irq_create_mapping_many(struct irq_domain *domain,
+				irq_hw_number_t hwirq_base, int count);
+
 static inline int irq_create_identity_mapping(struct irq_domain *host,
 					irq_hw_number_t hwirq)
 {
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 96f3a1d..38648e6 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base,
 }
 EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
 
+/**
+ * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
+ * @domain: domain owning the interrupt range
+ * @hwirq_base: beginning of continuous hardware IRQ range
+ * @count: Number of interrupts to map
+ *
+ * This routine is used for allocating and mapping a range of hardware
+ * irqs to virtual IRQs where the virtual irq numbers are not at pre-defined
+ * locations.
+ *
+ * Greater than 0 is returned upon success, while any failure to establish a
+ * static mapping is treated as an error.
+ */
+int irq_create_mapping_many(struct irq_domain *domain,
+		irq_hw_number_t hwirq_base, int count)
+{
+	int ret, irq_base;
+	int virq, i;
+
+	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq_base);
+
+	/* Look for default domain if nececssary */
+	if (!domain)
+		domain = irq_default_domain;
+	if (!domain) {
+		pr_warn("irq_create_mapping called for NULL domain, hwirq=%lx\n"
+			, hwirq_base);
+		WARN_ON(1);
+		return 0;
+	}
+	pr_debug("-> using domain @%p\n", domain);
+
+	/* For IRQ_DOMAIN_MAP_LEGACY, get the first virtual interrupt number */
+	if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
+		return irq_domain_legacy_revmap(domain, hwirq_base);
+
+	/* Check if mapping already exists */
+	for (i = 0; i < count; i++) {
+		virq = irq_find_mapping(domain, hwirq_base+i);
+		if (virq) {
+			pr_debug("existing mapping on virq %d,"
+					" now dispose it first\n", virq);
+			irq_dispose_mapping(virq);
+		}
+	}
+
+	/* Allocate the continuous virtual interrupt numbers */
+	irq_base = irq_alloc_desc_n(count, of_node_to_nid(domain->of_node));
+	if (unlikely(irq_base < 0))
+		return irq_base;
+
+	ret = irq_domain_associate_many(domain, irq_base, hwirq_base, count);
+	if (unlikely(ret < 0)) {
+		irq_free_descs(irq_base, count);
+		return ret;
+	}
+
+	return irq_base;
+}
+EXPORT_SYMBOL_GPL(irq_create_mapping_many);
+
 unsigned int irq_create_of_mapping(struct device_node *controller,
 				   const u32 *intspec, unsigned int intsize)
 {
-- 
1.7.7.6
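Two of the review comments on this patch were that an already existing mapping should be reported to the caller as an error rather than disposed of, and that the single-interrupt routine could become a caller of the range routine with a count of 1. A minimal user-space sketch of that shape follows. It is a toy model only: toy_revmap, toy_next_virq, toy_create_mapping_many and toy_create_mapping are hypothetical names invented here, the choice of -EBUSY is an assumption, and nothing below is the irqdomain API.

```c
#include <assert.h>
#include <errno.h>

#define TOY_NR_HWIRQS 64

/* Toy reverse map: toy_revmap[hwirq] is the mapped virq, 0 means unmapped. */
unsigned int toy_revmap[TOY_NR_HWIRQS];
/* Toy allocator state: next free virtual interrupt number. */
unsigned int toy_next_virq = 16;

/* Map 'count' consecutive hwirqs to consecutive virqs; return the first virq. */
int toy_create_mapping_many(unsigned int hwirq_base, unsigned int count)
{
	unsigned int i, virq_base;

	if (count == 0 || hwirq_base >= TOY_NR_HWIRQS ||
	    count > TOY_NR_HWIRQS - hwirq_base)
		return -EINVAL;

	/* Refuse the request instead of tearing down someone else's mapping. */
	for (i = 0; i < count; i++)
		if (toy_revmap[hwirq_base + i])
			return -EBUSY;

	virq_base = toy_next_virq;
	toy_next_virq += count;
	for (i = 0; i < count; i++)
		toy_revmap[hwirq_base + i] = virq_base + i;

	return (int)virq_base;
}

/* The single-interrupt variant is just the range variant with count 1. */
int toy_create_mapping(unsigned int hwirq)
{
	return toy_create_mapping_many(hwirq, 1);
}
```

Note that the toy checks the whole range before changing any state, so a failed call leaves the map untouched. The toy allocator does not align the virq block; a real implementation for MSI would also need the natural alignment discussed in the review.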
[PATCH 0/3] Enable multiple MSI feature in pSeries
Currently, the multiple MSI feature hasn't been enabled in pSeries.
These patches try to enable this feature.

These patches have been tested using the ipr driver, and the driver patch
was made by Wen Xiong:

[PATCH 0/7] Add support for new IBM SAS controllers

Test platform: One partition of pSeries with one cpu core (4 SMTs) and
RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
OS version: SUSE Linux Enterprise Server 11 SP2 (ppc64) with 3.8-rc3 kernel

IRQ 21 and 22 are assigned to the ipr device, which supports 2 multiple MSI.

The test results are shown by 'cat /proc/interrupts':
          CPU0       CPU1       CPU2       CPU3
 16:    240458     261601     226310     200425   XICS Level   IPI
 17:         0          0          0          0   XICS Level   RAS_EPOW
 18:        10          0          3          2   XICS Level   hvc_console
 19:    122182      28481      28527      28864   XICS Level   ibmvscsi
 20:5067388226108118   XICS Level   eth0
 21:         6          5          5          5   XICS Level   host1-0
 22:       817        814        816        813   XICS Level   host1-1
LOC:    398077     316725     231882     203049   Local timer interrupts
SPU: 1659919961903   Spurious interrupts
CNT:         0          0          0          0   Performance monitoring interrupts
MCE:         0          0          0          0   Machine check exceptions

Mike Qiu (3):
  irq: Set multiple MSI descriptor data for multiple IRQs
  irq: Add hw continuous IRQs map to virtual continuous IRQs support
  powerpc/pci: Enable pSeries multiple MSI feature

 arch/powerpc/kernel/msi.c            |  4 --
 arch/powerpc/platforms/pseries/msi.c | 62 -
 include/linux/irq.h                  |  4 ++
 include/linux/irqdomain.h            |  3 ++
 kernel/irq/chip.c                    | 40 -
 kernel/irq/irqdomain.c               | 61 +
 6 files changed, 158 insertions(+), 16 deletions(-)

-- 
1.7.7.6
[PATCH 1/3] irq: Set multiple MSI descriptor data for multiple IRQs
Multiple MSI only requires the IRQ in msi_desc entry to be set as
the value of irq_base.

This patch implements the above mentioned technique.

Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
---
 include/linux/irq.h |    2 ++
 kernel/irq/chip.c   |   40 ++--
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index fdf2c4a..60ef45b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -528,6 +528,8 @@ extern int irq_set_handler_data(unsigned int irq, void *data);
 extern int irq_set_chip_data(unsigned int irq, void *data);
 extern int irq_set_irq_type(unsigned int irq, unsigned int type);
 extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
+extern int irq_set_multiple_msi_desc(unsigned int irq_base, unsigned int nvec,
+				struct msi_desc *entry);
 extern struct irq_data *irq_get_irq_data(unsigned int irq);
 
 static inline struct irq_chip *irq_get_chip(unsigned int irq)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 3aca9f2..c4c39d3 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -90,6 +90,35 @@ int irq_set_handler_data(unsigned int irq, void *data)
 EXPORT_SYMBOL(irq_set_handler_data);
 
 /**
+ *	irq_set_multiple_msi_desc - set Multiple MSI descriptor data
+ *	for multiple IRQs
+ *	@irq_base:	Interrupt number base
+ *	@nvec:	The number of interrupts
+ *	@entry:	Pointer to MSI descriptor data
+ *
+ *	Set IRQ descriptors for multiple MSIs
+ */
+int irq_set_multiple_msi_desc(unsigned int irq_base, unsigned int nvec,
+				struct msi_desc *entry)
+{
+	unsigned long flags, i;
+	struct irq_desc *desc;
+
+	for (i = 0; i < nvec; i++) {
+		desc = irq_get_desc_lock(irq_base + i, &flags,
+					IRQ_GET_DESC_CHECK_GLOBAL);
+		if (!desc)
+			return -EINVAL;
+		desc->irq_data.msi_desc = entry;
+		if (i == 0 && entry)
+			entry->irq = irq_base;
+		irq_put_desc_unlock(desc, flags);
+	}
+
+	return 0;
+}
+
+/**
  *	irq_set_msi_desc - set MSI descriptor data for an irq
  *	@irq:	Interrupt number
  *	@entry:	Pointer to MSI descriptor data
@@ -98,16 +127,7 @@ EXPORT_SYMBOL(irq_set_handler_data);
  */
 int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry)
 {
-	unsigned long flags;
-	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
-
-	if (!desc)
-		return -EINVAL;
-	desc->irq_data.msi_desc = entry;
-	if (entry)
-		entry->irq = irq;
-	irq_put_desc_unlock(desc, flags);
-	return 0;
+	return irq_set_multiple_msi_desc(irq, 1, entry);
 }
 
 /**
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
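The idea in this patch, one msi_desc shared by nvec consecutive IRQs with only the base IRQ recorded in it, can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: `model_msi_desc`, `model_irq_table`, `MODEL_NR_IRQS` and the function names are hypothetical stand-ins, and the descriptor locking done by irq_get_desc_lock() is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_IRQS 64	/* hypothetical table size for this model */

/* Stand-in for struct msi_desc: only the field the patch touches. */
struct model_msi_desc {
	unsigned int irq;	/* base IRQ of the block */
};

/* One slot per IRQ number, standing in for irq_desc->irq_data.msi_desc. */
static struct model_msi_desc *model_irq_table[MODEL_NR_IRQS];

/*
 * Point nvec consecutive IRQs at the same descriptor and record only the
 * base IRQ in it. Like the patch, this stops at the first invalid IRQ and
 * leaves earlier slots already updated. Returns 0 on success, -1 on error.
 */
static int model_set_multiple_msi_desc(unsigned int irq_base,
				       unsigned int nvec,
				       struct model_msi_desc *entry)
{
	unsigned int i;

	for (i = 0; i < nvec; i++) {
		if (irq_base + i >= MODEL_NR_IRQS)
			return -1;
		model_irq_table[irq_base + i] = entry;
		if (i == 0 && entry)
			entry->irq = irq_base;
	}
	return 0;
}

/* Usage: IRQs 21 and 22 share one descriptor whose irq field is 21. */
static void model_example_shared_desc(void)
{
	static struct model_msi_desc desc;

	assert(model_set_multiple_msi_desc(21, 2, &desc) == 0);
	assert(model_irq_table[21] == &desc && model_irq_table[22] == &desc);
	assert(desc.irq == 21);		/* only the base is stored */
}
```

With nvec set to 1 the model degenerates to the single-IRQ case, which is how the patch reimplements irq_set_msi_desc().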
[PATCH 3/3] powerpc/pci: Enable pSeries multiple MSI feature
PCI devices support MSI, MSIX as well as multiple MSI.
But pSeries does not support multiple MSI yet.

This patch enables the multiple MSI feature in pSeries.

Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/msi.c            |    4 --
 arch/powerpc/platforms/pseries/msi.c |   62 -
 2 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 8bbc12d..46b1470 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -20,10 +20,6 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
 		return -ENOSYS;
 	}
 
-	/* PowerPC doesn't support multiple MSI yet */
-	if (type == PCI_CAP_ID_MSI && nvec > 1)
-		return 1;
-
 	if (ppc_md.msi_check_device) {
 		pr_debug("msi: Using platform check routine.\n");
 		return ppc_md.msi_check_device(dev, nvec, type);
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index e5b0847..6633b18 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -132,13 +132,17 @@ static int rtas_query_irq_number(struct pci_dn *pdn, int offset)
 static void rtas_teardown_msi_irqs(struct pci_dev *pdev)
 {
 	struct msi_desc *entry;
+	int nvec, i;
 
 	list_for_each_entry(entry, &pdev->msi_list, list) {
 		if (entry->irq == NO_IRQ)
 			continue;
 
 		irq_set_msi_desc(entry->irq, NULL);
-		irq_dispose_mapping(entry->irq);
+		nvec = entry->msi_attrib.is_msix ? 1 : 1 <<
+			entry->msi_attrib.multiple;
+		for (i = 0; i < nvec; i++)
+			irq_dispose_mapping(entry->irq + i);
 	}
 
 	rtas_disable_msi(pdev);
@@ -392,6 +396,55 @@ static int check_msix_entries(struct pci_dev *pdev)
 	return 0;
 }
 
+static int setup_multiple_msi_irqs(struct pci_dev *pdev, int nvec)
+{
+	struct pci_dn *pdn;
+	int hwirq, virq_base, i, hwirq_base = 0;
+	struct msi_desc *entry;
+	struct msi_msg msg;
+
+	pdn = get_pdn(pdev);
+	entry = list_entry(pdev->msi_list.next, typeof(*entry), list);
+
+	/*
+	 * Get the hardware IRQ base and ensure the retrieved
+	 * hardware IRQs are continuous
+	 */
+	for (i = 0; i < nvec; i++) {
+		hwirq = rtas_query_irq_number(pdn, i);
+		if (i == 0)
+			hwirq_base = hwirq;
+
+		if (hwirq < 0 || hwirq != (hwirq_base + i)) {
+			pr_debug("rtas_msi: Failure to get %d IRQs on"
+				"PCI device %04x:%02x:%02x.%01x\n", nvec,
+				pci_domain_nr(pdev->bus), pdev->bus->number,
+				PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
+			return hwirq;
+		}
+	}
+
+	virq_base = irq_create_mapping_many(NULL, hwirq_base, nvec);
+	if (virq_base <= 0) {
+		pr_debug("rtas_msi: Failure to map IRQs (%d, %d) "
+			"for PCI device %04x:%02x:%02x.%01x\n",
+			hwirq_base, nvec, pci_domain_nr(pdev->bus),
+			pdev->bus->number, PCI_SLOT(pdev->devfn),
+			PCI_FUNC(pdev->devfn));
+		return -ENOSPC;
+	}
+
+	entry->msi_attrib.multiple = ilog2(nvec & 0x3f);
+	irq_set_multiple_msi_desc(virq_base, nvec, entry);
+	for (i = 0; i < nvec; i++) {
+		/* Read config space back so we can restore after reset */
+		read_msi_msg(virq_base + i, &msg);
+		entry->msg = msg;
+	}
+
+	return 0;
+}
+
 static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 {
 	struct pci_dn *pdn;
@@ -444,11 +497,16 @@ again:
 		return rc;
 	}
 
+	if (type == PCI_CAP_ID_MSI && nvec > 1) {
+		rc = setup_multiple_msi_irqs(pdev, nvec);
+		return rc;
+	}
+
 	i = 0;
 	list_for_each_entry(entry, &pdev->msi_list, list) {
 		hwirq = rtas_query_irq_number(pdn, i++);
 		if (hwirq < 0) {
-			pr_debug("rtas_msi: error (%d) getting hwirq\n", rc);
+			pr_debug("rtas_msi: error (%d) getting hwirq\n", nvec);
 			return hwirq;
 		}
 
-- 
1.7.7.6
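Two small pieces of arithmetic carry this patch: the number of vectors behind one descriptor (1 for MSI-X, otherwise 1 << multiple) and the check that the hardware IRQs returned by the firmware form one consecutive run. Here is a rough user-space sketch of both, assuming the hwirq numbers have already been collected into an array. The function names are hypothetical, and unlike the patch, which returns the offending hwirq, the sketch returns -1 for every failure.

```c
#include <assert.h>

/*
 * Number of vectors covered by one descriptor. MSI-X entries are always
 * single; for MSI the count is stored as a log2 value in "multiple".
 */
static int model_entry_nvec(int is_msix, unsigned int multiple)
{
	return is_msix ? 1 : 1 << multiple;
}

/*
 * Return the base of the range if hwirqs[0..nvec-1] are all non-negative
 * and consecutive, otherwise -1.
 */
static int model_contiguous_base(const int *hwirqs, int nvec)
{
	int i;

	if (nvec <= 0)
		return -1;
	for (i = 0; i < nvec; i++) {
		if (hwirqs[i] < 0 || hwirqs[i] != hwirqs[0] + i)
			return -1;
	}
	return hwirqs[0];
}

/* Usage: a consecutive pair is accepted, a pair with a gap is rejected. */
static void model_example_contiguity(void)
{
	const int good[] = { 21, 22 };
	const int gap[]  = { 21, 23 };

	assert(model_contiguous_base(good, 2) == 21);
	assert(model_contiguous_base(gap, 2) == -1);
	assert(model_entry_nvec(0, 1) == 2);	/* multiple = 1 means 2 vectors */
	assert(model_entry_nvec(1, 3) == 1);	/* MSI-X: one vector per entry */
}
```

The teardown loop in the patch relies on the same count, so entry->irq + i stays inside the block that was mapped at setup time.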
[PATCH 0/3] Enable multiple MSI feature in pSeries
Currently, the multiple MSI feature hasn't been enabled in pSeries.
These patches try to enable this feature.

These patches have been tested by using the ipr driver, and the driver patch
has been made by Wen Xiong <wenxi...@linux.vnet.ibm.com>:

[PATCH 0/7] Add support for new IBM SAS controllers

Test platform: One partition of pSeries with one cpu core(4 SMTs) and
               RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
OS version: SUSE Linux Enterprise Server 11 SP2 (ppc64) with 3.8-rc3 kernel

IRQ 21 and 22 are assigned to the ipr device, which supports 2 multiple MSI.

The test results are shown by 'cat /proc/interrupts':

          CPU0       CPU1       CPU2       CPU3
 16:    240458     261601     226310     200425   XICS Level  IPI
 17:         0          0          0          0   XICS Level  RAS_EPOW
 18:        10          0          3          2   XICS Level  hvc_console
 19:    122182      28481      28527      28864   XICS Level  ibmvscsi
 20:5067388226108118                              XICS Level  eth0
 21:         6          5          5          5   XICS Level  host1-0
 22:817814816813                                  XICS Level  host1-1
LOC:    398077     316725     231882     203049   Local timer interrupts
SPU: 1659919961903                                Spurious interrupts
CNT:         0          0          0          0   Performance monitoring interrupts
MCE:         0          0          0          0   Machine check exceptions

Mike Qiu (3):
  irq: Set multiple MSI descriptor data for multiple IRQs
  irq: Add hw continuous IRQs map to virtual continuous IRQs support
  powerpc/pci: Enable pSeries multiple MSI feature

 arch/powerpc/kernel/msi.c            |    4 --
 arch/powerpc/platforms/pseries/msi.c |   62 -
 include/linux/irq.h                  |    4 ++
 include/linux/irqdomain.h            |    3 ++
 kernel/irq/chip.c                    |   40 -
 kernel/irq/irqdomain.c               |   61 +
 6 files changed, 158 insertions(+), 16 deletions(-)

-- 
1.7.7.6
[PATCH 2/3] irq: Add hw continuous IRQs map to virtual continuous IRQs support
Adding a function irq_create_mapping_many() which can associate
multiple MSIs to a continuous irq mapping.

This is needed to enable multiple MSI support for pSeries.

Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
---
 include/linux/irq.h       |    2 +
 include/linux/irqdomain.h |    3 ++
 kernel/irq/irqdomain.c    |   61 +
 3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 60ef45b..e00a7ec 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -592,6 +592,8 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
 
 #define irq_alloc_desc_from(from, node)		\
 	irq_alloc_descs(-1, from, 1, node)
+#define irq_alloc_desc_n(nevc, node)		\
+	irq_alloc_descs(-1, 0, nevc, node)
 
 void irq_free_descs(unsigned int irq, unsigned int cnt);
 int irq_reserve_irqs(unsigned int from, unsigned int cnt);
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 0d5b17b..831dded 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -168,6 +168,9 @@ extern int irq_create_strict_mappings(struct irq_domain *domain,
 				      unsigned int irq_base,
 				      irq_hw_number_t hwirq_base, int count);
 
+extern int irq_create_mapping_many(struct irq_domain *domain,
+				irq_hw_number_t hwirq_base, int count);
+
 static inline int irq_create_identity_mapping(struct irq_domain *host,
 					      irq_hw_number_t hwirq)
 {
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 96f3a1d..38648e6 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -636,6 +636,67 @@ int irq_create_strict_mappings(struct irq_domain *domain, unsigned int irq_base,
 }
 EXPORT_SYMBOL_GPL(irq_create_strict_mappings);
 
+/**
+ * irq_create_mapping_many - Map a range of hw IRQs to a range of virtual IRQs
+ * @domain: domain owning the interrupt range
+ * @hwirq_base: beginning of continuous hardware IRQ range
+ * @count: Number of interrupts to map
+ *
+ * This routine is used for allocating and mapping a range of hardware
+ * irqs to virtual IRQs where the virtual irq numbers are not at pre-defined
+ * locations.
+ *
+ * Greater than 0 is returned upon success, while any failure to establish a
+ * static mapping is treated as an error.
+ */
+int irq_create_mapping_many(struct irq_domain *domain,
+		irq_hw_number_t hwirq_base, int count)
+{
+	int ret, irq_base;
+	int virq, i;
+
+	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq_base);
+
+	/* Look for default domain if nececssary */
+	if (!domain)
+		domain = irq_default_domain;
+	if (!domain) {
+		pr_warn("irq_create_mapping called for NULL domain, hwirq=%lx\n"
+			, hwirq_base);
+		WARN_ON(1);
+		return 0;
+	}
+	pr_debug("-> using domain @%p\n", domain);
+
+	/* For IRQ_DOMAIN_MAP_LEGACY, get the first virtual interrupt number */
+	if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
+		return irq_domain_legacy_revmap(domain, hwirq_base);
+
+	/* Check if mapping already exists */
+	for (i = 0; i < count; i++) {
+		virq = irq_find_mapping(domain, hwirq_base+i);
+		if (virq) {
+			pr_debug("existing mapping on virq %d,"
+				"now dispose it first\n", virq);
+			irq_dispose_mapping(virq);
+		}
+	}
+
+	/* Allocate the continuous virtual interrupt numbers */
+	irq_base = irq_alloc_desc_n(count, of_node_to_nid(domain->of_node));
+	if (unlikely(irq_base < 0))
+		return irq_base;
+
+	ret = irq_domain_associate_many(domain, irq_base, hwirq_base, count);
+	if (unlikely(ret < 0)) {
+		irq_free_descs(irq_base, count);
+		return ret;
+	}
+
+	return irq_base;
+}
+EXPORT_SYMBOL_GPL(irq_create_mapping_many);
+
 unsigned int irq_create_of_mapping(struct device_node *controller,
 				   const u32 *intspec, unsigned int intsize)
 {
-- 
1.7.7.6
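The flow of irq_create_mapping_many() (dispose stale mappings, allocate a consecutive run of virtual IRQs, associate them one to one with the hardware range) can be sketched in user space as below. This is only a model under stated assumptions: the fixed-size `model_virq_to_hwirq` table, the first-fit search and every function name are hypothetical, no irq_domain is involved, and keeping virq 0 reserved is a modelling choice reflecting that 0 means "no mapping".

```c
#include <assert.h>

#define MODEL_NR_VIRQS 32	/* hypothetical size of the virq space */

/* model_virq_to_hwirq[v] holds the hwirq mapped at virq v, or -1 if free. */
static int model_virq_to_hwirq[MODEL_NR_VIRQS];

static void model_reset(void)
{
	int v;

	for (v = 0; v < MODEL_NR_VIRQS; v++)
		model_virq_to_hwirq[v] = -1;
}

/* Drop any existing mapping for hwirq, like the dispose loop in the patch. */
static void model_dispose_hwirq(int hwirq)
{
	int v;

	for (v = 1; v < MODEL_NR_VIRQS; v++)
		if (model_virq_to_hwirq[v] == hwirq)
			model_virq_to_hwirq[v] = -1;
}

/*
 * Map count consecutive hwirqs onto count consecutive free virqs using a
 * first-fit search. Returns the first virq (> 0) on success, -1 otherwise.
 */
static int model_create_mapping_many(int hwirq_base, int count)
{
	int base, i;

	if (count <= 0)
		return -1;
	for (i = 0; i < count; i++)
		model_dispose_hwirq(hwirq_base + i);

	for (base = 1; base + count <= MODEL_NR_VIRQS; base++) {
		for (i = 0; i < count; i++)
			if (model_virq_to_hwirq[base + i] != -1)
				break;
		if (i < count)
			continue;	/* run is interrupted, try the next base */
		for (i = 0; i < count; i++)
			model_virq_to_hwirq[base + i] = hwirq_base + i;
		return base;
	}
	return -1;
}

/* Usage: virq 2 is taken, so hwirqs 40 and 41 land on virqs 3 and 4. */
static void model_example_mapping(void)
{
	model_reset();
	model_virq_to_hwirq[2] = 100;	/* pre-existing, unrelated mapping */

	assert(model_create_mapping_many(40, 2) == 3);
	assert(model_virq_to_hwirq[3] == 40 && model_virq_to_hwirq[4] == 41);
	/* Mapping the same range again disposes the old one first. */
	assert(model_create_mapping_many(40, 2) == 3);
}
```

The property the pSeries code depends on is that virq_base + i always corresponds to hwirq_base + i, which is what lets a single msi_desc describe the whole block.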
[PATCH] No need to call irq_domain_legacy_revmap() for twice
Function irq_create_mapping() calls irq_find_mapping(). The latter
function has checked if the indicated IRQ domain has hw IRQ mapped to
virtual IRQ through legacy mode or not and returns the value of the
legacy irq number by calling irq_domain_legacy_revmap(). We needn't
call irq_domain_legacy_revmap() to do the same check in
irq_create_mapping() again.

The patch removes the duplicate call.

Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
---
 kernel/irq/irqdomain.c |    7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 49a7772..286d672 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -547,9 +547,12 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 		return virq;
 	}
 
-	/* Get a virtual interrupt number */
+	/*
+	 * For IRQ domain with type of IRQ_DOMAIN_MAP_LEGACY, we needn't
+	 * create the IRQ mapping for non-existing one, so just return 0.
+	 */
 	if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
-		return irq_domain_legacy_revmap(domain, hwirq);
+		return 0;
 
 	/* Allocate a virtual interrupt number */
 	hint = hwirq % nr_irqs;
-- 
1.7.7.6
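The argument in this patch can be checked on a toy model: if the lookup step already consults the legacy reverse map, then by the time the create step reaches its legacy branch the lookup has returned 0, and returning 0 directly gives the same result as a second lookup. The sketch below assumes a legacy domain is a fixed linear window from hwirq to virq; the window constants and function names are hypothetical and nothing here is kernel code.

```c
#include <assert.h>

#define MODEL_FIRST_VIRQ 16	/* hypothetical start of the legacy window */
#define MODEL_SIZE        8	/* hypothetical number of legacy IRQs */

/* Legacy reverse map: a fixed linear window; 0 means "not mapped". */
static unsigned int model_legacy_revmap(unsigned int hwirq)
{
	if (hwirq >= MODEL_SIZE)
		return 0;
	return MODEL_FIRST_VIRQ + hwirq;
}

/* Lookup step: for a legacy domain this is just the reverse map. */
static unsigned int model_find_mapping(unsigned int hwirq)
{
	return model_legacy_revmap(hwirq);
}

/* Create step after the patch: no second reverse-map call. */
static unsigned int model_create_mapping(unsigned int hwirq)
{
	unsigned int virq = model_find_mapping(hwirq);

	if (virq)
		return virq;
	/* The lookup missed, so a repeated legacy lookup would also give 0. */
	return 0;
}

/* Usage: results match the reverse map for mapped and unmapped hwirqs. */
static void model_example_legacy(void)
{
	unsigned int hwirq;

	assert(model_create_mapping(3) == 19);
	assert(model_create_mapping(MODEL_SIZE) == 0);
	for (hwirq = 0; hwirq < 2 * MODEL_SIZE; hwirq++)
		assert(model_create_mapping(hwirq) == model_legacy_revmap(hwirq));
}
```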